Jan Leike
Person information
- affiliation: Anthropic PBC, San Francisco, CA, USA
- affiliation (former): OpenAI, San Francisco, CA, USA
- affiliation (PhD): Australian National University, Canberra, ACT, Australia
- affiliation: University of Freiburg, Germany
2020 – today
- 2024
- [c29] Hunter Lightman, Vineet Kosaraju, Yuri Burda, Harrison Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, Karl Cobbe: Let's Verify Step by Step. ICLR 2024
- [c28] Collin Burns, Pavel Izmailov, Jan Hendrik Kirchner, Bowen Baker, Leo Gao, Leopold Aschenbrenner, Yining Chen, Adrien Ecoffet, Manas Joglekar, Jan Leike, Ilya Sutskever, Jeffrey Wu: Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision. ICML 2024
- [i42] Leo Gao, Tom Dupré la Tour, Henk Tillman, Gabriel Goh, Rajan Troll, Alec Radford, Ilya Sutskever, Jan Leike, Jeffrey Wu: Scaling and evaluating sparse autoencoders. CoRR abs/2406.04093 (2024)
- [i41] Nat McAleese, Rai Michael Pokorny, Juan Felipe Ceron Uribe, Evgenia Nitishinskaya, Maja Trebacz, Jan Leike: LLM Critics Help Catch LLM Bugs. CoRR abs/2407.00215 (2024)
- [i40] Jan Hendrik Kirchner, Yining Chen, Harri Edwards, Jan Leike, Nat McAleese, Yuri Burda: Prover-Verifier Games improve legibility of LLM outputs. CoRR abs/2407.13692 (2024)
- 2023
- [i39] Hunter Lightman, Vineet Kosaraju, Yura Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, Karl Cobbe: Let's Verify Step by Step. CoRR abs/2305.20050 (2023)
- [i38] Collin Burns, Pavel Izmailov, Jan Hendrik Kirchner, Bowen Baker, Leo Gao, Leopold Aschenbrenner, Yining Chen, Adrien Ecoffet, Manas Joglekar, Jan Leike, Ilya Sutskever, Jeff Wu: Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision. CoRR abs/2312.09390 (2023)
- 2022
- [c27] Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul F. Christiano, Jan Leike, Ryan Lowe: Training language models to follow instructions with human feedback. NeurIPS 2022
- [i37] Matthew Rahtz, Vikrant Varma, Ramana Kumar, Zachary Kenton, Shane Legg, Jan Leike: Safe Deep RL in 3D Environments using Human Feedback. CoRR abs/2201.08102 (2022)
- [i36] Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul F. Christiano, Jan Leike, Ryan Lowe: Training language models to follow instructions with human feedback. CoRR abs/2203.02155 (2022)
- [i35] William Saunders, Catherine Yeh, Jeff Wu, Steven Bills, Long Ouyang, Jonathan Ward, Jan Leike: Self-critiquing models for assisting human evaluators. CoRR abs/2206.05802 (2022)
- 2021
- [j3] Carina E. A. Prunkl, Carolyn Ashurst, Markus Anderljung, Helena Webb, Jan Leike, Allan Dafoe: Institutionalizing ethics in AI through broader impact requirements. Nat. Mach. Intell. 3(2): 104-110 (2021)
- [c26] Adam Gleave, Michael Dennis, Shane Legg, Stuart Russell, Jan Leike: Quantifying Differences in Reward Functions. ICLR 2021
- [i34] Carina E. A. Prunkl, Carolyn Ashurst, Markus Anderljung, Helena Webb, Jan Leike, Allan Dafoe: Institutionalising Ethics in AI through Broader Impact Requirements. CoRR abs/2106.11039 (2021)
- [i33] Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Pondé de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian, Clemens Winter, Philippe Tillet, Felipe Petroski Such, Dave Cummings, Matthias Plappert, Fotios Chantzis, Elizabeth Barnes, Ariel Herbert-Voss, William Hebgen Guss, Alex Nichol, Alex Paino, Nikolas Tezak, Jie Tang, Igor Babuschkin, Suchir Balaji, Shantanu Jain, William Saunders, Christopher Hesse, Andrew N. Carr, Jan Leike, Joshua Achiam, Vedant Misra, Evan Morikawa, Alec Radford, Matthew Knight, Miles Brundage, Mira Murati, Katie Mayer, Peter Welinder, Bob McGrew, Dario Amodei, Sam McCandlish, Ilya Sutskever, Wojciech Zaremba: Evaluating Large Language Models Trained on Code. CoRR abs/2107.03374 (2021)
- [i32] Jeff Wu, Long Ouyang, Daniel M. Ziegler, Nisan Stiennon, Ryan Lowe, Jan Leike, Paul F. Christiano: Recursively Summarizing Books with Human Feedback. CoRR abs/2109.10862 (2021)
- 2020
- [c25] Siddharth Reddy, Anca D. Dragan, Sergey Levine, Shane Legg, Jan Leike: Learning Human Objectives by Evaluating Hypothetical Behavior. ICML 2020: 8020-8029
- [c24] Stuart Armstrong, Jan Leike, Laurent Orseau, Shane Legg: Pitfalls of Learning a Reward Function Online. IJCAI 2020: 1592-1600
- [i31] Stuart Armstrong, Jan Leike, Laurent Orseau, Shane Legg: Pitfalls of learning a reward function online. CoRR abs/2004.13654 (2020)
- [i30] Adam Gleave, Michael Dennis, Shane Legg, Stuart Russell, Jan Leike: Quantifying Differences in Reward Functions. CoRR abs/2006.13900 (2020)
- [i29] David Krueger, Tegan Maharaj, Jan Leike: Hidden Incentives for Auto-Induced Distributional Shift. CoRR abs/2009.09153 (2020)
- [i28] David Krueger, Jan Leike, Owain Evans, John Salvatier: Active Reinforcement Learning: Observing Rewards at a Cost. CoRR abs/2011.06709 (2020)
2010 – 2019
- 2019
- [c23] Dzmitry Bahdanau, Felix Hill, Jan Leike, Edward Hughes, Seyed Arian Hosseini, Pushmeet Kohli, Edward Grefenstette: Learning to Understand Goal Specifications by Modelling Reward. ICLR (Poster) 2019
- [i27] Siddharth Reddy, Anca D. Dragan, Sergey Levine, Shane Legg, Jan Leike: Learning Human Objectives by Evaluating Hypothetical Behavior. CoRR abs/1912.05652 (2019)
- 2018
- [j2] Jan Leike, Marcus Hutter: On the computability of Solomonoff induction and AIXI. Theor. Comput. Sci. 716: 28-49 (2018)
- [c22] Dzmitry Bahdanau, Felix Hill, Jan Leike, Edward Hughes, Pushmeet Kohli, Edward Grefenstette: Jointly Learning "What" and "How" from Instructions and Goal-States. ICLR (Workshop) 2018
- [c21] Borja Ibarz, Jan Leike, Tobias Pohlen, Geoffrey Irving, Shane Legg, Dario Amodei: Reward learning from human preferences and demonstrations in Atari. NeurIPS 2018: 8022-8034
- [c20] Jan Leike, Matthias Heizmann: Geometric Nontermination Arguments. TACAS (2) 2018: 266-283
- [i26] Dzmitry Bahdanau, Felix Hill, Jan Leike, Edward Hughes, Pushmeet Kohli, Edward Grefenstette: Learning to Follow Language Instructions with Adversarial Reward Induction. CoRR abs/1806.01946 (2018)
- [i25] Borja Ibarz, Jan Leike, Tobias Pohlen, Geoffrey Irving, Shane Legg, Dario Amodei: Reward learning from human preferences and demonstrations in Atari. CoRR abs/1811.06521 (2018)
- [i24] Jan Leike, David Krueger, Tom Everitt, Miljan Martic, Vishal Maini, Shane Legg: Scalable agent alignment via reward modeling: a research direction. CoRR abs/1811.07871 (2018)
- [i23] Miljan Martic, Jan Leike, Andrew Trask, Matteo Hessel, Shane Legg, Pushmeet Kohli: Scaling shared model governance via model splitting. CoRR abs/1812.05979 (2018)
- 2017
- [c19] Sean Lamont, John Aslanides, Jan Leike, Marcus Hutter: Generalised Discount Functions applied to a Monte-Carlo AIμ Implementation. AAMAS 2017: 1589-1591
- [c18] John Aslanides, Jan Leike, Marcus Hutter: Universal Reinforcement Learning Algorithms: Survey and Experiments. IJCAI 2017: 1403-1410
- [c17] Jan Leike, Tor Lattimore, Laurent Orseau, Marcus Hutter: On Thompson Sampling and Asymptotic Optimality. IJCAI 2017: 4889-4893
- [c16] Paul F. Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei: Deep Reinforcement Learning from Human Preferences. NIPS 2017: 4299-4307
- [i22] Sean Lamont, John Aslanides, Jan Leike, Marcus Hutter: Generalised Discount Functions applied to a Monte-Carlo AImu Implementation. CoRR abs/1703.01358 (2017)
- [i21] John Aslanides, Jan Leike, Marcus Hutter: Universal Reinforcement Learning Algorithms: Survey and Experiments. CoRR abs/1705.10557 (2017)
- [i20] Paul F. Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei: Deep reinforcement learning from human preferences. CoRR abs/1706.03741 (2017)
- [i19] Jan Leike, Miljan Martic, Victoria Krakovna, Pedro A. Ortega, Tom Everitt, Andrew Lefrancq, Laurent Orseau, Shane Legg: AI Safety Gridworlds. CoRR abs/1711.09883 (2017)
- 2016
- [c15] Daniel Filan, Jan Leike, Marcus Hutter: Loss Bounds and Time Complexity for Speed Priors. AISTATS 2016: 1394-1402
- [c14] Matthias Heizmann, Daniel Dietsch, Marius Greitschus, Jan Leike, Betim Musa, Claus Schätzle, Andreas Podelski: Ultimate Automizer with Two-track Proofs - (Competition Contribution). TACAS 2016: 950-953
- [c13] Jan Leike, Tor Lattimore, Laurent Orseau, Marcus Hutter: Thompson Sampling is Asymptotically Optimal in General Environments. UAI 2016
- [c12] Jan Leike, Jessica Taylor, Benya Fallenstein: A Formal Solution to the Grain of Truth Problem. UAI 2016
- [i18] Jan Leike, Tor Lattimore, Laurent Orseau, Marcus Hutter: Thompson Sampling is Asymptotically Optimal in General Environments. CoRR abs/1602.07905 (2016)
- [i17] Daniel Filan, Marcus Hutter, Jan Leike: Loss Bounds and Time Complexity for Speed Priors. CoRR abs/1604.03343 (2016)
- [i16] Jan Leike: Exploration Potential. CoRR abs/1609.04994 (2016)
- [i15] Jan Leike, Jessica Taylor, Benya Fallenstein: A Formal Solution to the Grain of Truth Problem. CoRR abs/1609.05058 (2016)
- [i14] Jan Leike, Matthias Heizmann: Geometric Nontermination Arguments. CoRR abs/1609.05207 (2016)
- [i13] Jan Leike: Nonparametric General Reinforcement Learning. CoRR abs/1611.08944 (2016)
- 2015
- [j1] Jan Leike, Matthias Heizmann: Ranking Templates for Linear Loops. Log. Methods Comput. Sci. 11(1) (2015)
- [c11] Mayank Daswani, Jan Leike: A Definition of Happiness for Reinforcement Learning Agents. AGI 2015: 231-240
- [c10] Tom Everitt, Jan Leike, Marcus Hutter: Sequential Extensions of Causal and Evidential Decision Theory. ADT 2015: 205-221
- [c9] Jan Leike, Marcus Hutter: Solomonoff Induction Violates Nicod's Criterion. ALT 2015: 349-363
- [c8] Jan Leike, Marcus Hutter: On the Computability of Solomonoff Induction and Knowledge-Seeking. ALT 2015: 364-378
- [c7] Jan Leike, Marcus Hutter: Bad Universal Priors and Notions of Optimality. COLT 2015: 1244-1259
- [c6] Matthias Heizmann, Daniel Dietsch, Jan Leike, Betim Musa, Andreas Podelski: Ultimate Automizer with Array Interpolation - (Competition Contribution). TACAS 2015: 455-457
- [c5] Jan Leike, Marcus Hutter: On the Computability of AIXI. UAI 2015: 464-473
- [i12] Mayank Daswani, Jan Leike: A Definition of Happiness for Reinforcement Learning Agents. CoRR abs/1505.04497 (2015)
- [i11] Tom Everitt, Jan Leike, Marcus Hutter: Sequential Extensions of Causal and Evidential Decision Theory. CoRR abs/1506.07359 (2015)
- [i10] Jan Leike, Marcus Hutter: Solomonoff Induction Violates Nicod's Criterion. CoRR abs/1507.04121 (2015)
- [i9] Jan Leike, Marcus Hutter: On the Computability of Solomonoff Induction and Knowledge-Seeking. CoRR abs/1507.04124 (2015)
- [i8] Jan Leike, Marcus Hutter: Bad Universal Priors and Notions of Optimality. CoRR abs/1510.04931 (2015)
- [i7] Jan Leike, Marcus Hutter: On the Computability of AIXI. CoRR abs/1510.05572 (2015)
- 2014
- [c4] Jan Leike, Marcus Hutter: Indefinitely Oscillating Martingales. ALT 2014: 321-335
- [c3] Jan Leike, Matthias Heizmann: Ranking Templates for Linear Loops. TACAS 2014: 172-186
- [c2] Jan Leike, Ashish Tiwari: Synthesis for Polynomial Lasso Programs. VMCAI 2014: 434-452
- [i6] Jan Leike, Matthias Heizmann: Ranking Templates for Linear Loops. CoRR abs/1401.5338 (2014)
- [i5] Matthias Heizmann, Jochen Hoenicke, Jan Leike, Andreas Podelski: Linear Ranking for Linear Lasso Programs. CoRR abs/1401.5347 (2014)
- [i4] Jan Leike: Ranking Function Synthesis for Linear Lasso Programs. CoRR abs/1401.5351 (2014)
- [i3] Jan Leike, Matthias Heizmann: Geometric Series as Nontermination Arguments for Linear Lasso Programs. CoRR abs/1405.4413 (2014)
- [i2] Jan Leike, Marcus Hutter: Indefinitely Oscillating Martingales. CoRR abs/1408.3169 (2014)
- 2013
- [c1] Matthias Heizmann, Jochen Hoenicke, Jan Leike, Andreas Podelski: Linear Ranking for Linear Lasso Programs. ATVA 2013: 365-380
- [i1] Jan Leike, Ashish Tiwari: Synthesis for Polynomial Lasso Programs. CoRR abs/1311.4046 (2013)
last updated on 2024-10-24 21:29 CEST by the dblp team
all metadata released as open data under CC0 1.0 license