
Sean Trott

@seantrott

67 Followers · 27 Following · 29 Posts · Joined 07.12.2024

Latest posts by Sean Trott @seantrott

Thank you!

17.12.2025 17:12 👍 1 🔁 0 💬 0 📌 0

Feel free to reach out here or elsewhere if any of this interests you!

17.12.2025 04:55 👍 0 🔁 0 💬 0 📌 0

I'm also really interested in epistemological questions about how best to study LMs, relating primarily to construct validity (are the tasks we use appropriate?) and external validity (are the findings we obtain generalizable, and to which "populations" of LMs?).

17.12.2025 04:55 👍 0 🔁 0 💬 1 📌 0

Or whether and to what extent human behavior on mental state reasoning tasks (e.g., the false belief task) can be approximated by LMs trained solely on the distributional statistics of language—and how said LMs appear to solve such tasks.

17.12.2025 04:55 👍 1 🔁 0 💬 1 📌 0

E.g., asking whether human representations of ambiguous words (as operationalized by behavior on psycholinguistic tasks) can be approximated by the continuous representations in transformer LMs.

17.12.2025 04:55 👍 0 🔁 0 💬 1 📌 0

Much (though not all) of my current research uses language models as "model organisms" to test theories about human cognition—and, increasingly, adapts methods and conceptual frameworks from Cognitive Science to better understand the behaviors and internal mechanisms of LMs.

17.12.2025 04:55 👍 0 🔁 0 💬 1 📌 0

Our lab will work on questions at the intersection of language, cognition, and computation, like how humans represent ambiguous words; which factors plausibly give rise to our ability to reason about mental states; and how linguistic representations are integrated with sensorimotor experience.

17.12.2025 04:55 👍 0 🔁 0 💬 1 📌 0

Very excited to announce that I'll be starting as an Assistant Professor in the Psychology department at Rutgers University-Newark in January 2026!

17.12.2025 04:55 👍 7 🔁 2 💬 2 📌 1

(Thanks to @camrobjones.bsky.social, Pam Rivière, Oisín Parkinson-Coombs, and Kola Ayonrinde for valuable comments on various iterations of this work!)

02.12.2025 17:34 👍 0 🔁 0 💬 0 📌 0

In general I think there's tons of interesting work to be done exploring what *kinds of claims* generalize across *which kinds of model instances*!

Paper link here: openreview.net/pdf?id=sZZIO...

02.12.2025 17:34 👍 0 🔁 0 💬 1 📌 0

It's also possible that we live in a world where many mechanisms won't generalize at all, or won't generalize along most of these dimensions. But knowing that depends on having the investigatory framework in the first place—this paper is a first stab at systematizing that.

02.12.2025 17:34 👍 0 🔁 0 💬 1 📌 0

Another problem is that this is simply very hard to implement: we don't have random seeds for most models! Indeed, the problem is even worse: available models are *not* a representative sample of possible models! But this should just make us more cautious about our conclusions.

02.12.2025 17:34 👍 0 🔁 0 💬 1 📌 0

I conclude by discussing potential objections. E.g., if interpretability is intended to be idiographic rather than nomothetic, then we don't really need a framework for generalizability. But if we do want to generalize, then organizing principles are key.

02.12.2025 17:34 👍 0 🔁 0 💬 1 📌 0
Larger models show earlier onset, higher peak, and steeper slope of 1-back attention.

Additionally, seeds of larger models generally show earlier onsets, higher peaks, and steeper slopes of 1-back attention developmentally. There's also some positional variation, albeit with putative 1-back heads usually appearing in earlier layers.

02.12.2025 17:34 👍 0 🔁 0 💬 1 📌 0
Figure showing development of putative 1-back attention heads across seeds in Pythia models.

I then test select axes with a very simple example (1-back attention) across random seeds of the Pythia suite. Consistent with other work, I find strong *developmental convergence* across seeds and also (to a lesser extent) across architectures. (Red line = GAM predictions across all models.)

02.12.2025 17:34 👍 0 🔁 0 💬 1 📌 0
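(Not from the paper, but for readers who want a concrete handle on what a "putative 1-back head" is: a minimal sketch of how a head could be scored for 1-back attention from its attention matrix. The scoring function and example matrix are illustrative assumptions, not the paper's method.)

```python
def one_back_score(attn):
    """Score how strongly an attention head attends to the immediately
    preceding token. `attn` is a seq_len x seq_len attention matrix:
    row i is the distribution over positions that query position i
    attends to (each row sums to 1)."""
    seq_len = len(attn)
    # Average the attention mass each position places on the token one
    # step back; position 0 has no previous token, so it is skipped.
    prev_mass = [attn[i][i - 1] for i in range(1, seq_len)]
    return sum(prev_mass) / len(prev_mass)

# A head that always attends to the previous token scores 1.0;
# scores near 1.0 across inputs would flag a putative 1-back head.
perfect = [[1, 0, 0, 0],
           [1, 0, 0, 0],
           [0, 1, 0, 0],
           [0, 0, 1, 0]]
print(one_back_score(perfect))  # → 1.0
```

Tracking this score across training checkpoints and random seeds is one way to operationalize the developmental comparison described above.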
Axes include Functional (~same behavior and responsiveness to ablations), Developmental (emerge at similar points in training), Positional (at similar layers/depths), Relational (interact with other components), and Configurational (similar weight-space regions).

Drawing on recent interp literature, I first identify/propose five potential *axes of correspondence* along which the generalizability of mechanistic claims could be investigated. You can think of this as a set of organizing principles to guide investigations about generalizability/universality.

02.12.2025 17:34 👍 0 🔁 0 💬 1 📌 0

The issue of generalizability is not limited to mechinterp—we see it in psychology (e.g., "WEIRD" subjects) and in work on LLM behavior more generally. But the nature of interp research raises another question: what does it even mean to say two instances have the "same" circuit?

02.12.2025 17:34 👍 0 🔁 0 💬 1 📌 0

Mechinterp research typically aims to identify *circuits* implementing functions in particular model instances. But it's unclear whether and when findings *generalize* to other model instances.

02.12.2025 17:34 👍 0 🔁 0 💬 1 📌 0
Screenshot of paper title.

Will be presenting a new paper on generalizability in mechinterp research at the 2025 NeurIPS MechInterp workshop! Thread below. #NeurIPS

02.12.2025 17:34 👍 15 🔁 1 💬 1 📌 0

Does vision training change how language is represented and used in meaningful ways?🤔The answer is a nuanced yes! Comparing VLM-LM minimal pairs, we find that while the taxonomic organization of the lexicon is similar, VLMs are better at _deploying_ this knowledge. [1/9]

22.07.2025 04:45 👍 18 🔁 4 💬 1 📌 4

I've been working on a related question: along which *correspondence axes* (developmental, etc.) can we reasonably expect mechanistic claims to generalize across instances? Will be presenting this work (along with a case study) at the NeurIPS 2025 MechInterp workshop: openreview.net/pdf?id=sZZIO...

24.11.2025 18:38 👍 3 🔁 0 💬 0 📌 0

I think understanding which factors lead to convergence and divergence (both in behavior and internal mechanisms) across networks is crucial to understanding what kinds of systems we're studying and what kinds of claims we can generalize across model instances. Very cool work!

24.11.2025 18:35 👍 2 🔁 0 💬 1 📌 0

This is really cool! I've been doing some related work on seed-wise variability, which I'll actually be presenting at the NeurIPS MechInterp workshop (openreview.net/pdf?id=sZZIO...). Will try to make it to your presentation/poster!

24.11.2025 18:34 👍 3 🔁 0 💬 1 📌 0

📍Excited to share that our paper was selected as a Spotlight at #NeurIPS2025!

arxiv.org/pdf/2410.03972

It started from a question I kept running into:

When do RNNs trained on the same task converge/diverge in their solutions?
🧵⬇️

24.11.2025 16:43 👍 108 🔁 27 💬 5 📌 6

A confounding thing for the linguistics of LMs: the best way to assess their grammatical ability is string probability. Yet string probability and grammaticality are famously not the same!

Really excited to have this out, where we give a formal account, w/ experiments, of how to make sense of that!

10.11.2025 22:23 👍 11 🔁 1 💬 1 📌 0
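(A toy illustration of the confound above, not drawn from the paper: under a simple unigram model with made-up corpus frequencies, an ungrammatical string of very frequent words can outscore a famously grammatical but improbable sentence. All probabilities here are invented for illustration.)

```python
import math

# Hypothetical unigram log-probabilities (illustrative, not real counts).
unigram_logp = {
    "the": math.log(0.06),
    "colorless": math.log(1e-7),
    "green": math.log(1e-4),
    "ideas": math.log(1e-5),
    "sleep": math.log(1e-5),
    "furiously": math.log(1e-6),
}

def string_logp(tokens):
    """Log-probability of a string under the toy unigram model."""
    return sum(unigram_logp[t] for t in tokens)

grammatical = ["colorless", "green", "ideas", "sleep", "furiously"]
ungrammatical = ["the", "the", "the", "the", "the"]

# Frequency, not grammaticality, drives string probability:
print(string_logp(ungrammatical) > string_logp(grammatical))  # → True
```

Real LMs condition on context rather than using unigram counts, but the same wedge between probability and grammaticality persists, which is what motivates a formal account.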

(By @raphaelmilliere.com and @cameronbuckner.bsky.social)

10.10.2025 00:54 👍 1 🔁 0 💬 0 📌 0
Raphaël Millière & Cameron Buckner, Interventionist Methods for Interpreting Deep Neural Networks - PhilPapers Recent breakthroughs in artificial intelligence have primarily resulted from training deep neural networks (DNNs) with vast numbers of adjustable parameters on enormous datasets. Due to their complex ...

Thank you! I'd also recommend this philosophy-oriented overview of "interventionist" methods for studying neural networks: philpapers.org/rec/MILIMF-2

10.10.2025 00:53 👍 3 🔁 0 💬 1 📌 0

Congratulations!

02.06.2025 17:56 👍 1 🔁 0 💬 1 📌 0

Hard to process the news about Harvard and international students. Other universities should stand in solidarity with our colleagues who are being persecuted.

23.05.2025 04:40 👍 1 🔁 1 💬 0 📌 0
Why I’m Resigning from the NSF and Library of Congress: I cannot participate in systems that require dishonesty as the price of belonging.

Please read my essay in TIME, which @science.org did not do carefully before publishing this assertion. time.com/7285045/resi...

14.05.2025 16:23 👍 589 🔁 172 💬 6 📌 10