Evan Hubinger

Conference and Workshop Papers

2024
[c2]
  persistent URL:
  - https://dblp.org/rec/conf/acl/RimskyGSTHT24
Nina Rimsky, Nick Gabrieli, Julian Schulz, Meg Tong, Evan Hubinger, Alexander Matt Turner:
Steering Llama 2 via Contrastive Activation Addition. ACL (1) 2024: 15504-15522
2023
[c1]
  persistent URL:
  - https://dblp.org/rec/conf/acl/PerezRLNCHPOKKJ23
Ethan Perez, Sam Ringer, Kamile Lukosiute, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, Andy Jones, Anna Chen, Benjamin Mann, Brian Israel, Bryan Seethor, Cameron McKinnon, Christopher Olah, Da Yan, Daniela Amodei, Dario Amodei, Dawn Drain, Dustin Li, Eli Tran-Johnson, Guro Khundadze, Jackson Kernion, James Landis, Jamie Kerr, Jared Mueller, Jeeyoon Hyun, Joshua Landau, Kamal Ndousse, Landon Goldberg, Liane Lovitt, Martin Lucas, Michael Sellitto, Miranda Zhang, Neerav Kingsland, Nelson Elhage, Nicholas Joseph, Noemí Mercado, Nova DasSarma, Oliver Rausch, Robin Larson, Sam McCandlish, Scott Johnston, Shauna Kravec, Sheer El Showk, Tamera Lanham, Timothy Telleen-Lawton, Tom Brown, Tom Henighan, Tristan Hume, Yuntao Bai, Zac Hatfield-Dodds, Jack Clark, Samuel R. Bowman, Amanda Askell, Roger Grosse, Danny Hernandez, Deep Ganguli, Evan Hubinger, Nicholas Schiefer, Jared Kaplan:
Discovering Language Model Behaviors with Model-Written Evaluations. ACL (Findings) 2023: 13387-13434

Informal and Other Publications

2024
[i12]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2401-05566
Evan Hubinger, Carson Denison, Jesse Mu, Mike Lambert, Meg Tong, Monte MacDiarmid, Tamera Lanham, Daniel M. Ziegler, Tim Maxwell, Newton Cheng, Adam S. Jermyn, Amanda Askell, Ansh Radhakrishnan, Cem Anil, David Duvenaud, Deep Ganguli, Fazl Barez, Jack Clark, Kamal Ndousse, Kshitij Sachan, Michael Sellitto, Mrinank Sharma, Nova DasSarma, Roger Grosse, Shauna Kravec, Yuntao Bai, Zachary Witten, Marina Favaro, Jan Brauner, Holden Karnofsky, Paul F. Christiano, Samuel R. Bowman, Logan Graham, Jared Kaplan, Sören Mindermann, Ryan Greenblatt, Buck Shlegeris, Nicholas Schiefer, Ethan Perez:
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training. CoRR abs/2401.05566 (2024)
[i11]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2405-01576
Olli Järviniemi, Evan Hubinger:
Uncovering Deceptive Tendencies in Language Models: A Simulated Company AI Assistant. CoRR abs/2405.01576 (2024)
[i10]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2406-10162
Carson Denison, Monte MacDiarmid, Fazl Barez, David Duvenaud, Shauna Kravec, Samuel Marks, Nicholas Schiefer, Ryan Soklaski, Alex Tamkin, Jared Kaplan, Buck Shlegeris, Samuel R. Bowman, Ethan Perez, Evan Hubinger:
Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models. CoRR abs/2406.10162 (2024)
2023
[i9]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2302-00805
Evan Hubinger, Adam S. Jermyn, Johannes Treutlein, Rubi Hudson, Kate Woolverton:
Conditioning Predictive Models: Risks and Strategies. CoRR abs/2302.00805 (2023)
[i8]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2307-11768
Ansh Radhakrishnan, Karina Nguyen, Anna Chen, Carol Chen, Carson Denison, Danny Hernandez, Esin Durmus, Evan Hubinger, Jackson Kernion, Kamile Lukosiute, Newton Cheng, Nicholas Joseph, Nicholas Schiefer, Oliver Rausch, Sam McCandlish, Sheer El Showk, Tamera Lanham, Tim Maxwell, Venkatesa Chandrasekaran, Zac Hatfield-Dodds, Jared Kaplan, Jan Brauner, Samuel R. Bowman, Ethan Perez:
Question Decomposition Improves the Faithfulness of Model-Generated Reasoning. CoRR abs/2307.11768 (2023)
[i7]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2307-13702
Tamera Lanham, Anna Chen, Ansh Radhakrishnan, Benoit Steiner, Carson Denison, Danny Hernandez, Dustin Li, Esin Durmus, Evan Hubinger, Jackson Kernion, Kamile Lukosiute, Karina Nguyen, Newton Cheng, Nicholas Joseph, Nicholas Schiefer, Oliver Rausch, Robin Larson, Sam McCandlish, Sandipan Kundu, Saurav Kadavath, Shannon Yang, Thomas Henighan, Timothy Maxwell, Timothy Telleen-Lawton, Tristan Hume, Zac Hatfield-Dodds, Jared Kaplan, Jan Brauner, Samuel R. Bowman, Ethan Perez:
Measuring Faithfulness in Chain-of-Thought Reasoning. CoRR abs/2307.13702 (2023)
[i6]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2308-03296
Roger B. Grosse, Juhan Bae, Cem Anil, Nelson Elhage, Alex Tamkin, Amirhossein Tajdini, Benoit Steiner, Dustin Li, Esin Durmus, Ethan Perez, Evan Hubinger, Kamile Lukosiute, Karina Nguyen, Nicholas Joseph, Sam McCandlish, Jared Kaplan, Samuel R. Bowman:
Studying Large Language Model Generalization with Influence Functions. CoRR abs/2308.03296 (2023)
[i5]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2312-06681
Nina Rimsky, Nick Gabrieli, Julian Schulz, Meg Tong, Evan Hubinger, Alexander Matt Turner:
Steering Llama 2 via Contrastive Activation Addition. CoRR abs/2312.06681 (2023)
2022
[i4]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2211-09169
Adam S. Jermyn, Nicholas Schiefer, Evan Hubinger:
Engineering Monosemanticity in Toy Models. CoRR abs/2211.09169 (2022)
[i3]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2212-09251
Ethan Perez, Sam Ringer, Kamile Lukosiute, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, Andy Jones, Anna Chen, Ben Mann, Brian Israel, Bryan Seethor, Cameron McKinnon, Christopher Olah, Da Yan, Daniela Amodei, Dario Amodei, Dawn Drain, Dustin Li, Eli Tran-Johnson, Guro Khundadze, Jackson Kernion, James Landis, Jamie Kerr, Jared Mueller, Jeeyoon Hyun, Joshua Landau, Kamal Ndousse, Landon Goldberg, Liane Lovitt, Martin Lucas, Michael Sellitto, Miranda Zhang, Neerav Kingsland, Nelson Elhage, Nicholas Joseph, Noemí Mercado, Nova DasSarma, Oliver Rausch, Robin Larson, Sam McCandlish, Scott Johnston, Shauna Kravec, Sheer El Showk, Tamera Lanham, Timothy Telleen-Lawton, Tom Brown, Tom Henighan, Tristan Hume, Yuntao Bai, Zac Hatfield-Dodds, Jack Clark, Samuel R. Bowman, Amanda Askell, Roger Grosse, Danny Hernandez, Deep Ganguli, Evan Hubinger, Nicholas Schiefer, Jared Kaplan:
Discovering Language Model Behaviors with Model-Written Evaluations. CoRR abs/2212.09251 (2022)
2020
[i2]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2012-07532
Evan Hubinger:
An overview of 11 proposals for building safe advanced AI. CoRR abs/2012.07532 (2020)
2019
[i1]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-1906-01820
Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, Scott Garrabrant:
Risks from Learned Optimization in Advanced Machine Learning Systems. CoRR abs/1906.01820 (2019)
