For over two decades, the Machine Intelligence Research Institute (MIRI) has worked to understand and prepare for the critical challenges that humanity will face as it transitions to a world with artificial superintelligence.
Neuroscientist @Janelia, HHMI
www.ahrenslab.org
✨ mechanistic interpretability research scientist @ Goodfire | deep learning, math, biology | creating a more beautiful future
NLP & Interpretability | PhD Student @ University of Trieste & Laboratory of Data Engineering of Area Science Park | Prev MPI-IS
Ph.D. student at @jhuclsp, a human LM that hallucinates. Formerly @MetaAI, @uwnlp, and @AWS. they/them 🏳️‍🌈 #NLProc #NLP Crossposting on X.
Tell me about challenges, the unbelievable, the human mind and artificial intelligence, thoughts, social life, family life, science and philosophy.
Laplace Junior Chair, Machine Learning
ENS Paris. (prev ETH Zurich, Edinburgh, Oxford...)
Working on mathematical foundations/probabilistic interpretability of ML (what NNs learn 🤷‍♂️, disentanglement 🤔, king-man+woman=queen? 👌…)
Postdoc at ETH. Formerly, PhD student at the University of Cambridge :)
PhD @ ETHZ - LLM Interpretability
alestolfo.github.io
Independent Mechanistic Interpretability Researcher
PhD Student at the ILLC / UvA doing work at the intersection of (mechanistic) interpretability and cognitive science. Current Anthropic Fellow.
hannamw.github.io
PhD student at Northeastern, previously at EpochAI. Doing AI interpretability.
diatkinson.github.io
Ph.D. Student at UNC NLP | Prev: Apple, Amazon, Adobe (Intern) vaidehi99.github.io | Undergrad @IITBombay
PhD student in Responsible NLP at the University of Edinburgh, curious about interpretability and alignment
CS Ph.D. Candidate @ Northeastern | Interpretability + Data Science | BS/MS @ Brown
koyenapal.github.io
Senior Research Scientist at Google DeepMind.
🌐 jasmijn.bastings.me
Research Engineer @ FAR.AI
taufeeque9.github.io
Interpretability researcher at @eleutherai.bsky.social
Physics, Visualization and AI PhD @ Harvard | Embedding visualization and LLM interpretability | Love pretty visuals, math, physics and pets | Currently into manifolds
Wanna meet and chat? Book a meeting here: https://zcal.co/shivam-raval
Linguist in AI & CogSci 🧠👩‍💻🤖 PhD student @ ILLC, University of Amsterdam
🌐 https://mdhk.net/
🐘 https://scholar.social/@mdhk
🐦 https://twitter.com/mariannedhk
Tenure-track faculty at the Max Planck Institute for Software Systems
Previously postdoc at UW and AI2, working on Natural Language Processing
Recruiting PhD students!
🌐 https://lasharavichander.github.io/
Postdoc AI Researcher (NLP) @ ITU Copenhagen
🧭 https://mxij.me
Comm tech & social media research professor by day, symphony violinist by night, outside as much as possible otherwise. German/American Pacific Northwestern New Englander, #firstgen academic, she/her, 🏳️‍🌈
https://anne-oeldorf-hirsch.uconn.edu
Machine Learner by day, 🦮 Statistician at ❤️
In search of statistical intuition for modern ML & simple explanations for complex things👀
Interested in the mysteries of modern ML, causality & all of stats. Opinions my own.
https://aliciacurth.github.io
Technology specialist at the EU AI Office / AI Safety / Prev: University of Amsterdam, EleutherAI, BigScience
Thoughts & opinions are my own and do not necessarily represent my employer.
Assistant Professor at PoliTo 🇮🇹 |
Former Visiting scholar at UCSC 🇺🇸 |
she/her | TrustworthyAI, XAI, Fairness in AI
https://elianap.github.io/
PhD Candidate in Interpretability @FraunhoferHHI | 📍Berlin, Germany
dilyabareeva.github.io
Organic machine turning tea into theorems ☕️
AI @ Microsoft Research ➡️ Goal: Teach models (and humans) to reason better
Let’s connect re: AI for social good, graphs & network dynamics, discrete math, logic 🧩, 🥾, 🎨
Organizing for democracy.🗽
www.rlaw.me
CS PhD Student, Northeastern University - Machine Learning, Interpretability https://ericwtodd.github.io
member of technical staff @stanfordnlp.bsky.social
Postdoc at the interpretable deep learning lab at Northeastern University, deep learning, LLMs, mechanistic interpretability
ai interpretability research and running • thinking about how models think • prev @MIT cs + physics
Assistant Professor @HopkinsMedicine @JHUPath
https://scholar.google.com/citations?user=dGBD72YAAAAJ
ML/AI researcher @JohnsHopkins
PhDing @AIM_Harvard @MassGenBrigham | PhD Fellow @Google | Previously @Bos_CHIP @BrandeisU
More robustness and explainabilities 🧐 for Health AI.
shanchen.dev
Associate Professor @UAntwerp, sqIRL/IDLab, imec.
#RepresentationLearning, #Model #Interpretability & #Explainability
A guy who plays with toy bricks, enjoys research and gaming.
Opinions are my own
idlab.uantwerpen.be/~joramasmogrovejo
PhD student @CMU LTI - working on model #interpretability, student researcher @google; prev predoc @ai2; intern @MSFT
nishantsubramani.github.io
PhD at EPFL with Robert West, Master's at ETHZ
Mainly interested in Language Model Interpretability and Model Diffing.
MATS 7.0 Winter 2025 Scholar w/ Neel Nanda
jkminder.ch
Aspiring 10x reverse engineer at Google DeepMind
PhD student at UC Berkeley. NLP for signed languages and LLM interpretability. kayoyin.github.io
🏂🎹🚵‍♀️🥋
Human/AI interaction. ML interpretability. Visualization as design, science, art. Professor at Harvard, and part-time at Google DeepMind.
Statistician. PhD @Harvard • Masters degree in pure math • Follow me for fun, nerdy content.
Sign up to my newsletter: kareemcarr.substack.com
Associate Professor, Department of Psychology, Harvard University. Computation, cognition, development.
Professor, Department of Psychology and Center for Brain Science, Harvard University
https://gershmanlab.com/
Director, MIT Computational Psycholinguistics Lab. President, Cognitive Science Society. Chair of the MIT Faculty. Open access & open science advocate. He.
Lab webpage: http://cpl.mit.edu/
Personal webpage: https://www.mit.edu/~rplevy
Associate professor of computer science at Northeastern University. Natural language processing, digital humanities, OCR, computational bibliography, and computational social sciences. Artificial intelligence is an archival science.
Asst Prof at Johns Hopkins Cognitive Science • Director of the Group for Language and Intelligence (GLINT) ✨• Interested in all things language, cognition, and AI
jennhu.github.io
Associate Professor at Harvard & Kempner Institute. Applying computational frameworks & machine learning to decode multi-scale neural processes. Marathoner. Rescue dog mom. https://www.rajanlab.com/
🥇 LLMs together (co-created model merging, BabyLM, textArena.ai)
🥈 Spreading science over hype in #ML & #NLP
Proud shareLM💬 Donor
@IBMResearch & @MIT_CSAIL
| Cellular/Molecular-turned-Computational Neuroscientist |
| What do neurons even do?? | Neural Computation with Dendrites |
| Biophysical Optimization | AI <-> Neuro |
| Postdoctoral Research Fellow at the Harvard Kempner Institute |
| www.ilenna.com
Senior Machine Learning Researcher, Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard. Board Member, Neuromatch. she/her. Views are my own.
HBI brings together neuroscience researchers from different parts of Harvard and its affiliated hospitals.
Throughout all that we do, we aspire to build and nurture a scientific community that is diverse, inclusive, and welcoming.
https://brain.harvard.edu
Postdoc at CBS, Harvard University
(New around here)
(jolly good) Fellow at the Kempner Institute @kempnerinstitute.bsky.social, incoming assistant professor at UBC Linguistics (and by courtesy CS, Sept 2025). PhD @stanfordnlp.bsky.social with the lovely @jurafsky.bsky.social
isabelpapad.com
Neuroscience Professor at Harvard University. Personal account and posts here. Research group website: https://vnmurthylab.org.
Robotics and Reinforcement Learning tinkerer.
brandonrohrer.com
Wrangler of algorithms for Confluence @ Atlassian.
Eater of bread. Sipper of whisky.
Reports to a Shih Tzu.
research scientist @deepmind. language & multi-agent rl & interpretability. phd @BrownUniversity '22 under ellie pavlick (she/her)
https://roma-patel.github.io
I make sure that OpenAI et al. aren't the only people who are able to study large scale AI systems.
Author of Interpretable Machine Learning and other books
Newsletter: https://mindfulmodeler.substack.com/
Website: https://christophmolnar.com/
Robustness, Data & Annotations, Evaluation & Interpretability in LLMs
http://mimansajaiswal.github.io/
Enjoy not enjoying ideals | Interpretability of modular convnets applied to 👁️ and 🛰️🐝 | she/her 🦒💕
variint.github.io
NLP assistant prof at KU Leuven, PI @lagom-nlp.bsky.social. I like syntax more than most people. Also multilingual NLP, interpretability, mountains and beer. (She/her)
Assistant Professor in NLP (Fairness, Interpretability and lately interested in Political Science) at the University of Copenhagen ✨
Before: PostDoc in NLP at Uni of CPH, PhD student in ML at TU Berlin
INSERM group leader @ Neuromodulation Institute and NeuroSpin (Paris) in computational neuroscience.
How and why are computations enabling cognition distributed across the brain?
Expect neuroscience and ML content.
jbarbosa.org
Full of childlike wonder. Building friendly robots. UT Austin PhD student, MIT ‘20.
Associate professor at IT University of Copenhagen: NLP, language models, interpretability, AI & society. Co-editor-in-chief of ACL Rolling Review. #NLProc #NLP
Postdoc at Linköping University 🇸🇪. Doing NLP, particularly explainability, language adaptation, modular LLMs. I'm also into 🌋🏕️🚴.
AI for storytelling, games, explainability, safety, ethics. Professor at Georgia Tech. Director of ML Center at GT. Time travel expert. Geek. Dad. he/him
Professor of Statistical Machine Learning at the University of Adelaide.
https://sejdino.github.io/
Explainability, Computer Vision, Neuro-AI.🪴 Kempner Fellow @Harvard.
Prev. PhD @Brown, @Google, @GoPro. Crêpe lover.
📍 Boston | 🔗 thomasfel.me
Principal Researcher @ CENTAI.eu | Leading the Responsible AI Team. Building Responsible AI through Explainable AI, Fairness, and Transparency. Researching Graph Machine Learning, Data Science, and Complex Systems to understand collective human behavior.
Research in NLP (mostly LM interpretability & explainability).
Assistant prof at UMD CS + CLIP.
Previously @ai2.bsky.social @uwnlp.bsky.social
Views my own.
sarahwie.github.io
Reverse engineering neural networks at Anthropic. Previously Distill, OpenAI, Google Brain. Personal account.
Master student at ENS Paris-Saclay / aspiring AI safety researcher / improviser
Prev research intern @ EPFL w/ wendlerc.bsky.social and Robert West
MATS Winter 7.0 Scholar w/ neelnanda.bsky.social
https://butanium.github.io
Postdoc at Northeastern and incoming Asst. Prof. at Boston U. Working on NLP, interpretability, causality. Previously: JHU, Meta, AWS
Interpretable Deep Networks. http://baulab.info/ @davidbau
https://mega002.github.io
Gemini Post-Training ⚫️ Research Scientist at Google DeepMind ⚫️ PhD from ETH Zurich
AI Safety Research // Software Engineering
Postdoc @ Northeastern, @ndif-team.bsky.social w/ @davidbau.bsky.social. Interpretability ∩ HCI ∩ #NLProc. Built @inseq.org. Prev: PhD @gronlp.bsky.social, ML @awscloud.bsky.social
gsarti.com
Waiting on a robot body. All opinions are universal and held by both employers and family. ML/NLP professor.
nsaphra.net
Machine learning haruspex
NLP PhD student at Imperial College London and Apple AI/ML Scholar.
Machine learning PhD student @ Blei Lab in Columbia University
Working in mechanistic interpretability, nlp, causal inference, and probabilistic modeling!
Previously at Meta for ~3 years on the Bayesian Modeling & Generative AI teams.
🔗 www.sweta.dev
Machine Learning PhD Student
@ Blei Lab & Columbia University.
Working on probabilistic ML | uncertainty quantification | LLM interpretability.
Excited about everything ML, AI and engineering!
PhD student at Vector Institute / University of Toronto. Building tools to study neural nets and find out what they know. He/him.
www.danieldjohnson.com
Mechanistic interpretability
Creator of https://github.com/amakelov/mandala
prev. Harvard/MIT
machine learning, theoretical computer science, competition math.
Post-doc @ Harvard. PhD UMich. Spent time at FAIR and MSR. ML/NLP/Interpretability
Computer Science PhD student | AI interpretability | Vision + Language | Cognitive Science. Prev. intern @MicrosoftResearch.
https://martinagvilas.github.io/
ml/nlp phding @ usc, currently visiting harvard, scientisting @ startup;
interpretability & training & reasoning
iglee.me