I'm excited to share that this paper was accepted at ICLR 2026! We show that language models encode one of the most basic ingredients of a world model: the ability to distinguish plausible from implausible states. Check out the paper for more details!
See you in Rio!
Paper: arxiv.org/abs/2507.12553
26.02.2026 00:22
👍 30
🔁 6
💬 3
📌 0
I wrote a short article on AI Model Evaluation for the Open Encyclopedia of Cognitive Science 📕👇
Hope this is helpful for anyone who wants a super broad, beginner-friendly intro to the topic!
Thanks @mcxfrank.bsky.social and @asifamajid.bsky.social for this amazing initiative!
12.02.2026 22:22
👍 51
🔁 22
💬 0
📌 1
With some trepidation, I'm putting this out into the world:
gershmanlab.com/textbook.html
It's a textbook called Computational Foundations of Cognitive Neuroscience, which I wrote for my class.
My hope is that this will be a living document, continuously improved as I get feedback.
09.01.2026 01:27
👍 585
🔁 237
💬 16
📌 10
Hopkins Cog Sci is hiring! We have two open faculty positions: one in vision, and one in language. Please repost!
12.12.2025 18:18
👍 32
🔁 34
💬 0
📌 2
Yeah exactly -- @kanishka.bsky.social in examples like yours above, if we assume that g=1 and those strings aren't likely to be ungrammatical realizations of some other messages, then diffs in p(string) will reflect diffs in p(m). Which is what we want, no?
11.11.2025 16:17
👍 2
🔁 0
💬 1
📌 0
Screenshot of paper title and list of authors. The title of the paper is: "What Can String Probability Tell Us About Grammaticality?" The authors are: Jennifer Hu, Ethan Gotlieb Wilcox, Siyuan Song, Kyle Mahowald, and Roger P. Levy.
This work was done with an amazing team: @wegotlieb.bsky.social, @siyuansong.bsky.social, @kmahowald.bsky.social, @rplevy.bsky.social
Preprint (pre-TACL version): arxiv.org/abs/2510.16227
10/10
10.11.2025 22:11
👍 11
🔁 1
💬 1
📌 0
Our work also raises new Qs. If LMs virtually always produce grammatical strings, then why is there so much overlap between the probs assigned to grammatical/ungrammatical strings?
This connects to tensions btwn language generation/identification (e.g., openreview.net/forum?id=FGT...)
9/10
10.11.2025 22:11
👍 2
🔁 0
💬 1
📌 0
An offshoot of our analysis: if you use minimal pairs that are not tightly controlled, you risk underestimating the grammatical competence of models, due to differences in underlying message probabilities. 8/10
10.11.2025 22:11
👍 2
🔁 0
💬 1
📌 0
Screenshot of a figure with two panels, labeled (a) and (b). The caption reads: "Figure 4: Evaluation of Prediction 3. (a) Distributions of scores are highly overlapping across grammatical and ungrammatical sentences (pooled across datasets). (b) Poor separability (area under receiver operating characteristic curve, or AUC) achieved by each model and probability transformation. Horizontal line at 0.5 indicates no separation. For dataset-specific results, see Section B, Figures 5 and 7."
As mentioned above, Prediction #3 shows that the recently criticized overlap in probabilities across gram/ungram strings should NOT be interpreted as a failure of probability to tell us about grammaticality.
This overlap is to be expected if prob is influenced by factors other than gram. 7/10
10.11.2025 22:11
👍 2
🔁 0
💬 1
📌 0
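The separability measure in the figure (AUC) can be computed directly from pooled scores as a rank statistic. Here's a minimal stdlib-only sketch with invented logprobs (not the paper's data), just to show what "poor separation" looks like numerically:

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve, computed as the probability that a
    randomly drawn positive outscores a randomly drawn negative
    (the Mann-Whitney U formulation). 0.5 means no separation."""
    wins = sum(
        (p > n) + 0.5 * (p == n)
        for p in pos_scores for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))

# Invented log-probabilities: grammatical strings score only slightly
# higher on average, so the two distributions overlap heavily.
gram_logprobs = [-10.0, -12.0, -15.0, -20.0]
ungram_logprobs = [-11.0, -14.0, -18.0, -22.0]

print(auc(gram_logprobs, ungram_logprobs))  # 0.625: far from perfect separation
```

Even though every grammatical string here beats its "neighbor," the pooled distributions overlap enough that the AUC lands well below 1.0.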
Screenshot of a figure with two panels, labeled (a) and (b). The caption reads: "Figure 2: (a) Prediction 1a: Logprobs of paired grammatical (x-axis) and ungrammatical (y-axis) sentences are correlated. Dashed line: x = y. (b) Prediction 1b: Correlation between grammatical and ungrammatical logprobs (y-axis) generally decreases as within-pair cosine distance (x-axis) increases."
We use our framework to derive 3 predictions, which we validate empirically:
1. Correlation btwn the probs of grammatical and ungrammatical strings within minimal pairs
2. Correlation btwn LMs’ and humans’ deltas within minimal pairs
3. Poor separation btwn prob of unpaired grammatical and ungrammatical strings
6/10
10.11.2025 22:11
👍 2
🔁 0
💬 1
📌 0
In other words, when messages aren’t controlled for, gram strings won't always be more probable than ungram strings.
This phenomenon has previously been used to argue that probability is a bad tool for measuring grammatical knowledge -- but in fact, it follows directly from our framework! 5/10
10.11.2025 22:11
👍 4
🔁 0
💬 1
📌 0
Minimal pairs are pairs of strings with the same underlying m but different values of g.
Good LMs have low P(g=0), so they prefer the grammatical string in the minimal pair.
But for non-minimal string pairs with different underlying messages, differences in P(m) can overwhelm even good LMs. 4/10
10.11.2025 22:11
👍 3
🔁 0
💬 2
📌 0
Returning to first principles:
In our framework, the probability of a string comes from two latent variables: m, the message to be conveyed; and g, whether the message is realized grammatically.
Ungrammatical strings get probability mass when g=0: the message is not realized grammatically. 3/10
10.11.2025 22:11
👍 0
🔁 0
💬 1
📌 0
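The generative story in this post can be written down as a toy computation. This is only an illustrative sketch, not the paper's actual model: all messages, strings, and probabilities below are invented.

```python
# Toy sketch of the framework: P(s) marginalizes over a latent
# message m and a grammaticality flag g (g=1: realized grammatically).

P_m = {"m1": 0.99, "m2": 0.01}   # message prior: m1 is far more likely
P_g = {1: 0.90, 0: 0.10}         # a good LM puts low mass on g=0

# Each (m, g) pair maps to one surface string in this toy domain.
realization = {
    ("m1", 1): "the dog runs",
    ("m1", 0): "the dog run",
    ("m2", 1): "the cats sleep",
    ("m2", 0): "the cats sleeps",
}

def p_string(s):
    """P(s) = sum over m, g of P(m) * P(g) * P(s | m, g)."""
    return sum(
        P_m[m] * P_g[g]
        for (m, g), string in realization.items()
        if string == s
    )

# Within a minimal pair (same m), the grammatical string always wins:
assert p_string("the dog runs") > p_string("the dog run")

# But across messages, P(m) can overwhelm P(g): the *ungrammatical*
# realization of a likely message beats the *grammatical* realization
# of an unlikely one (0.099 vs 0.009, up to float rounding).
assert p_string("the dog run") > p_string("the cats sleep")
```

The same two lines capture both halves of the thread: minimal pairs hold m fixed so differences reflect P(g), while unpaired comparisons let P(m) dominate.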
Here we develop and give evidence for a formal framework that reconciles these two observations.
Our framework provides theoretical justification for the widespread practice of using *minimal pairs* to test what grammatical generalizations LMs have acquired. 2/10
10.11.2025 22:11
👍 1
🔁 0
💬 1
📌 0
Screenshot of a figure with two panels, labeled (a) and (b). The caption reads: "Figure 1: (a) Illustration of messages (left) and strings (right) in toy domain. Blue = grammatical strings. Red = ungrammatical strings. (b) Surprisal (negative log probability) assigned to toy strings by GPT-2."
New work to appear @ TACL!
Language models (LMs) are remarkably good at generating novel well-formed sentences, leading to claims that they have mastered grammar.
Yet they often assign higher probability to ungrammatical strings than to grammatical strings.
How can both things be true? 🧵👇
10.11.2025 22:11
👍 91
🔁 20
💬 2
📌 3
It’s grad school application season, and I wanted to give some public advice.
Caveats:
-*-*-*-*
> These are my opinions, based on my experiences, they are not secret tricks or guarantees
> They are general guidelines, not meant to cover a host of idiosyncrasies and special cases
06.11.2025 14:55
👍 112
🔁 58
💬 4
📌 7
Interested in doing a PhD at the intersection of human and machine cognition? ✨ I'm recruiting students for Fall 2026! ✨
Topics of interest include pragmatics, metacognition, reasoning, & interpretability (in humans and AI).
Check out JHU's mentoring program (due 11/15) for help with your SoP 👇
04.11.2025 14:44
👍 27
🔁 15
💬 0
📌 1
New preprint!
"Non-commitment in mental imagery is distinct from perceptual inattention, and supports hierarchical scene construction"
(by Li, Hammond, & me)
link: doi.org/10.31234/osf...
-- the title's a bit of a mouthful, but the nice thing is that it's a pretty decent summary
14.10.2025 13:22
👍 65
🔁 22
💬 5
📌 0
At #COLM2025 and would love to chat all things cogsci, LMs, & interpretability 🍁🥯 I'm also recruiting!
👉 I'm presenting at two workshops (PragLM, Visions) on Fri
👉 Also check out "Language Models Fail to Introspect About Their Knowledge of Language" (presented by @siyuansong.bsky.social Tue 11-1)
07.10.2025 01:39
👍 25
🔁 6
💬 0
📌 0
Can AI models introspect? What does introspection even mean for AI?
We revisit a recent proposal by Comșa & Shanahan, and provide new experiments + an alternate definition of introspection.
Check out this new work w/ @siyuansong.bsky.social, @harveylederman.bsky.social, & @kmahowald.bsky.social 👇
26.08.2025 17:59
👍 21
🔁 5
💬 1
📌 0
Due to popular demand, we are extending the CogInterp submission deadline again! 🗓️🥳
Submit by *8/27* (midnight AoE)
22.08.2025 12:53
👍 10
🔁 2
💬 0
📌 0
🗓️ The submission deadline for CogInterp @ NeurIPS has officially been *extended* to 8/22 (AoE)! 👇
Looking forward to seeing your submissions!
14.08.2025 13:22
👍 4
🔁 0
💬 0
📌 0
Heading to CogSci this week! ✈️
Find me giving talks on:
💬 Prod-comp asymmetry in children and LMs (Thu 7/31)
💬 How people make sense of nonsense (Sat 8/2)
📣 Also, I’m recruiting grad students + postdocs for my new lab at Hopkins! 📣
If you’re interested in language / cognition / AI, let’s chat! 😄
28.07.2025 16:04
👍 21
🔁 3
💬 1
📌 0
Join us at NeurIPS in San Diego this December for talks by experts in the field, including James McClelland, @cgpotts.bsky.social, @scychan.bsky.social, @ari-holtzman.bsky.social, @mtoneva.bsky.social, & @sydneylevine.bsky.social!
🗓️ Submit your 4-page paper (non-archival) by August 15!
4/4
16.07.2025 13:08
👍 11
🔁 0
💬 0
📌 0
We're bringing together researchers in fields such as machine learning, psychology, linguistics, and neuroscience to discuss new empirical findings + theories which help us interpret high-level cognitive abilities in deep learning models.
3/4
16.07.2025 13:08
👍 4
🔁 0
💬 1
📌 0
Deep learning models (e.g. LLMs) show impressive abilities. But what generalizations have these models acquired? What algorithms underlie model behaviors? And how do these abilities develop?
Cognitive science offers a rich body of theories and frameworks which can help answer these questions.
2/4
16.07.2025 13:08
👍 4
🔁 0
💬 2
📌 0
Excited to announce the first workshop on CogInterp: Interpreting Cognition in Deep Learning Models @ NeurIPS 2025! 📣
How can we interpret the algorithms and representations underlying complex behavior in deep learning models?
🌐 coginterp.github.io/neurips2025/
1/4
16.07.2025 13:08
👍 58
🔁 19
💬 1
📌 3
Happy to announce the first workshop on Pragmatic Reasoning in Language Models — PragLM @ COLM 2025! 🎉
How do LLMs engage in pragmatic reasoning, and what core pragmatic capacities remain beyond their reach?
🌐 sites.google.com/berkeley.edu/praglm/
📅 Submit by June 23rd
28.05.2025 18:21
👍 41
🔁 18
💬 1
📌 4
Our work also suggests a new way of using AI models to study cognition: not just as black boxes mapping stimuli to outputs, but potentially also as processing models.
Excited about future work using mechanistic interpretability to make new, testable predictions about human cognition!
(11/12)
20.05.2025 14:26
👍 2
🔁 0
💬 1
📌 0