Lee Sharkey

Lee D. Sharkey

2020 – today

2025
[j1]
  persistent URL:
  - https://dblp.org/rec/journals/tmlr/SharkeyCBLWBGHOBBGCNRWS25
Lee Sharkey, Bilal Chughtai, Joshua Batson, Jack Lindsey, Jeffrey Wu, Lucius Bushnaq, Nicholas Goldowsky-Dill, Stefan Heimersheim, Alejandro Ortega, Joseph Isaac Bloom, Stella Biderman, Adrià Garriga-Alonso, Arthur Conmy, Neel Nanda, Jessica Rumbelow, Martin Wattenberg, Nandi Schoots, Joseph Miller, William Saunders, Eric J. Michaud, Stephen Casper, Max Tegmark, David Bau, Eric Todd, Atticus Geiger, Mor Geva, Jesse Hoogland, Daniel Murfet, Tom McGrath:
Open Problems in Mechanistic Interpretability. Trans. Mach. Learn. Res. 2025 (2025)
[c6]
  persistent URL:
  - https://dblp.org/rec/conf/iclr/LeaskBPBTMSN25
Patrick Leask, Bart Bussmann, Michael T. Pearce, Joseph Isaac Bloom, Curt Tigges, Noura Al Moubayed, Lee Sharkey, Neel Nanda:
Sparse Autoencoders Do Not Find Canonical Units of Analysis. ICLR 2025
[c5]
  persistent URL:
  - https://dblp.org/rec/conf/iclr/PearceDROS25
Michael T. Pearce, Thomas Dooms, Alice Rigg, José Oramas, Lee Sharkey:
Bilinear MLPs enable weight-based mechanistic interpretability. ICLR 2025
[i16]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2501-14926
Dan Braun, Lucius Bushnaq, Stefan Heimersheim, Jake Mendel, Lee Sharkey:
Interpretability in Parameter Space: Minimizing Mechanistic Description Length with Attribution-based Parameter Decomposition. CoRR abs/2501.14926 (2025)
[i15]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2501-16496
Lee Sharkey, Bilal Chughtai, Joshua Batson, Jack Lindsey, Jeff Wu, Lucius Bushnaq, Nicholas Goldowsky-Dill, Stefan Heimersheim, Alejandro Ortega, Joseph Isaac Bloom, Stella Biderman, Adrià Garriga-Alonso, Arthur Conmy, Neel Nanda, Jessica Rumbelow, Martin Wattenberg, Nandi Schoots, Joseph Miller, Eric J. Michaud, Stephen Casper, Max Tegmark, William Saunders, David Bau, Eric Todd, Atticus Geiger, Mor Geva, Jesse Hoogland, Daniel Murfet, Tom McGrath:
Open Problems in Mechanistic Interpretability. CoRR abs/2501.16496 (2025)
[i14]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2502-04878
Patrick Leask, Bart Bussmann, Michael T. Pearce, Joseph Isaac Bloom, Curt Tigges, Noura Al Moubayed, Lee Sharkey, Neel Nanda:
Sparse Autoencoders Do Not Find Canonical Units of Analysis. CoRR abs/2502.04878 (2025)
[i13]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2504-00194
Brianna Chrisman, Lucius Bushnaq, Lee Sharkey:
Identifying Sparsely Active Circuits Through Local Loss Landscape Decomposition. CoRR abs/2504.00194 (2025)
[i12]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2504-12170
Charlotte Stix, Matteo Pistillo, Girish Sastry, Marius Hobbhahn, Alejandro Ortega, Mikita Balesni, Annika Hallensleben, Nix Goldowsky-Dill, Lee Sharkey:
AI Behind Closed Doors: a Primer on The Governance of Internal Deployment. CoRR abs/2504.12170 (2025)
[i11]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2504-19475
Sonia Joseph, Praneet Suresh, Lorenz Hufe, Edward Stevinson, Robert Graham, Yash Vadi, Danilo Bzdok, Sebastian Lapuschkin, Lee Sharkey, Blake Aaron Richards:
Prisma: An Open Source Toolkit for Mechanistic Interpretability in Vision and Video. CoRR abs/2504.19475 (2025)
[i10]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2506-20790
Lucius Bushnaq, Dan Braun, Lee Sharkey:
Stochastic Parameter Decomposition. CoRR abs/2506.20790 (2025)
2024
[c4]
  persistent URL:
  - https://dblp.org/rec/conf/fat/CasperESKCBHWSH24
Stephen Casper, Carson Ezell, Charlotte Siegmann, Noam Kolt, Taylor Lynn Curtis, Benjamin Bucknall, Andreas A. Haupt, Kevin Wei, Jérémy Scheurer, Marius Hobbhahn, Lee Sharkey, Satyapriya Krishna, Marvin Von Hagen, Silas Alberti, Alan Chan, Qinyi Sun, Michael Gerovitch, David Bau, Max Tegmark, David Krueger, Dylan Hadfield-Menell:
Black-Box Access is Insufficient for Rigorous AI Audits. FAccT 2024: 2254-2272
[c3]
  persistent URL:
  - https://dblp.org/rec/conf/iclr/HubenCRES24
Robert Huben, Hoagy Cunningham, Logan Riggs Smith, Aidan Ewart, Lee Sharkey:
Sparse Autoencoders Find Highly Interpretable Features in Language Models. ICLR 2024
[c2]
  persistent URL:
  - https://dblp.org/rec/conf/nips/BraunTGS24
Dan Braun, Jordan Taylor, Nicholas Goldowsky-Dill, Lee Sharkey:
Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning. NeurIPS 2024
[i9]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2401-14446
Stephen Casper, Carson Ezell, Charlotte Siegmann, Noam Kolt, Taylor Lynn Curtis, Benjamin Bucknall, Andreas Alexander Haupt, Kevin Wei, Jérémy Scheurer, Marius Hobbhahn, Lee Sharkey, Satyapriya Krishna, Marvin Von Hagen, Silas Alberti, Alan Chan, Qinyi Sun, Michael Gerovitch, David Bau, Max Tegmark, David Krueger, Dylan Hadfield-Menell:
Black-Box Access is Insufficient for Rigorous AI Audits. CoRR abs/2401.14446 (2024)
[i8]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2405-12241
Dan Braun, Jordan Taylor, Nicholas Goldowsky-Dill, Lee Sharkey:
Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning. CoRR abs/2405.12241 (2024)
[i7]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2410-08417
Michael T. Pearce, Thomas Dooms, Alice Rigg, José Oramas M., Lee Sharkey:
Bilinear MLPs enable weight-based mechanistic interpretability. CoRR abs/2410.08417 (2024)
[i6]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2410-11179
Kola Ayonrinde, Michael T. Pearce, Lee Sharkey:
Interpretability as Compression: Reconsidering SAE Explanations of Neural Activations with MDL-SAEs. CoRR abs/2410.11179 (2024)
2023
[i5]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2305-03452
Lee Sharkey:
A technical note on bilinear layers for interpretability. CoRR abs/2305.03452 (2023)
[i4]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2309-08600
Hoagy Cunningham, Aidan Ewart, Logan Riggs Smith, Robert Huben, Lee Sharkey:
Sparse Autoencoders Find Highly Interpretable Features in Language Models. CoRR abs/2309.08600 (2023)
2022
[c1]
  persistent URL:
  - https://dblp.org/rec/conf/icml/LangoscoKSPK22
Lauro Langosco di Langosco, Jack Koch, Lee D. Sharkey, Jacob Pfau, David Krueger:
Goal Misgeneralization in Deep Reinforcement Learning. ICML 2022: 12004-12019
[i3]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2211-12312
Sid Black, Lee Sharkey, Léo Grinsztajn, Eric Winsor, Dan Braun, Jacob Merizian, Kip Parker, Carlos Ramón Guevara, Beren Millidge, Gabriel Alfour, Connor Leahy:
Interpreting Neural Networks through the Polytope Lens. CoRR abs/2211.12312 (2022)
[i2]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2212-11415
Lee Sharkey:
Circumventing interpretability: How to defeat mind-readers. CoRR abs/2212.11415 (2022)
2021
[i1]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2105-14111
Jack Koch, Lauro Langosco, Jacob Pfau, James Le, Lee Sharkey:
Objective Robustness in Deep Reinforcement Learning. CoRR abs/2105.14111 (2021)
