
Martin Schrimpf

@mschrimpf

NeuroAI Prof @EPFL 🇨🇭. ML + Neuro 🤖🧠. Brain-Score, CORnet, Vision, Language. Previously: PhD @MIT, ML @Salesforce, Neuro @HarvardMed, & co-founder @Integreat. go.epfl.ch/NeuroAI

2,804 Followers · 63 Following · 86 Posts · Joined 16.12.2023

Latest posts by Martin Schrimpf @mschrimpf

Preview: Compact deep neural network models of the visual cortex (Nature). Parsimonious deep neural network models can be used for prediction of visual neuron responses.

DNN models of the brain are getting bigger. Are we replacing one complicated system in vivo with another in silico?

In new work, we seek the *smallest* DNN models of visual cortex, balancing prediction with parsimony.

It turns out these compact models are surprisingly small!

rdcu.be/e5H8G

26.02.2026 22:32 👍 72 🔁 19 💬 1 📌 1

I believe the results in your and Ebrahim's paper, but I do not understand why this particular configuration is so important. If you agree with point 2, then that is much more general than what we did in 2021 (more models & data) and with a more stringent metric -- and the core claim stands.

17.02.2026 18:18 👍 0 🔁 0 💬 0 📌 0

2. NWP task performance is correlated with brain alignment in a larger set of models and datasets (going beyond our 2021 set).

I understand your pushback to be that NWP-correlates-brain-alignment does _not_ hold when using the *exact* 2021 models and datasets, *but* with a different metric.

17.02.2026 18:18 👍 1 🔁 0 💬 1 📌 0

(maybe more as a personal summary from this, don't feel obliged to respond.)
I believe we agree on two things:
1. The results from Schrimpf et al. 2021 with the exact same specifications (datasets, metrics, models) are perfectly reproducible from the open-source code.

17.02.2026 18:18 👍 0 🔁 0 💬 1 📌 0

I personally see Brain-Score as an evolving set of benchmarks that is improved over time (and not as a static goalpost). Indeed our community is updating it with more rigorous alignment tests and better models. I hope you will consider contributing!

17.02.2026 07:57 👍 2 🔁 0 💬 0 📌 0

In vision, Yamins & Hong et al 2014 first established a correspondence between object classification accuracy and ventral stream alignment on a dataset that is very easy by today's standards; this has since been extended to ImageNet, larger and more diverse neural datasets, etc. See Brain-Score.org/vision

17.02.2026 07:57 👍 2 🔁 0 💬 1 📌 0

I guess what you mean is whether we should move past the particular methodologies we used in the 2021 paper by testing alignment more stringently and building even better brain models -- I absolutely think so!

17.02.2026 07:57 👍 1 🔁 0 💬 1 📌 0

The core claim you mean is "Models that perform better at predicting the next word in a sequence also better predict brain measurements" -- and yes, that indeed has been validated and extended by many follow-up studies. As you said yourself, the results can also be perfectly replicated.

17.02.2026 07:57 👍 1 🔁 0 💬 2 📌 0
Preview: Re-Align Hackathon Leaderboard, a Hugging Face Space by representational-alignment. Submit Blue/Red hackathon JSON and rank by alignment scores.

🚀 The Re-Align Challenge is now LIVE!

We're inviting you to explore what properties of vision models and data lead to convergences and divergences in representational alignment.

🔗 Get started: huggingface.co/spaces/repre...

🧵👇

16.02.2026 15:14 👍 6 🔁 1 💬 1 📌 2

What's your sense as to why that is? Our intuition from the 2025 EMNLP paper is that larger-scale models develop many more capabilities beyond formal "core" language processing; I'm curious whether you agree.

16.02.2026 18:18 👍 1 🔁 0 💬 1 📌 0

correction: the original implementation was incorrect and @kartikpradeepan.bsky.social updated the model PR thanks to @ebrahimfeghhi.bsky.social linking the open source code. Updates here: bsky.app/profile/msch...

16.02.2026 10:00 👍 2 🔁 0 💬 0 📌 0

hi Ebrahim, I responded in this thread: bsky.app/profile/msch.... Happy to discuss more

16.02.2026 09:59 👍 1 🔁 0 💬 0 📌 0

Either way I'm glad the OASM model is now part of the open-source community platform; it will be a great reference point. With the new benchmarks soon on Brain-Score, we can encourage the development of models that generalize much better than what we did 5 years ago.

16.02.2026 09:57 👍 0 🔁 0 💬 0 📌 0

Regarding the original claims: Badr & others reproduced the correlation with NWP performance using the new benchmarks and newer models, so I see no reason to consider the 2021 claims invalid. The new benchmarks enforce stronger generalization (great!), but that doesn't mean the old ones were wrong.

16.02.2026 09:57 👍 1 🔁 0 💬 2 📌 0

The AlKhamissi et al 2025 benchmarks are the most stringent afaict, since they split on stories instead of contiguous k-folds, which prevents leakage from temporal autocorrelation within a story. (L)LMs indeed score much higher than OASM here. I'm glad OASM is now integrated in Brain-Score as a useful reference!

16.02.2026 09:57 👍 1 🔁 0 💬 2 📌 0
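[Editor's sketch] The story-split idea in the post above can be illustrated in a few lines. Shapes, story counts, and the use of scikit-learn's GroupKFold are illustrative assumptions, not Brain-Score's actual implementation:

```python
# Sketch of story-level vs contiguous cross-validation splits.
# Data shapes and story structure are hypothetical, for illustration only.
import numpy as np
from sklearn.model_selection import GroupKFold, KFold

rng = np.random.default_rng(0)
story_ids = np.repeat(np.arange(6), 20)    # 6 stories x 20 sentences each
X = rng.normal(size=(len(story_ids), 32))  # model features per sentence

# Contiguous k-fold: fold boundaries cut through stories, so temporally
# autocorrelated neighbors from the same story leak into training.
leaky = any(
    not set(story_ids[tr]).isdisjoint(story_ids[te])
    for tr, te in KFold(n_splits=5, shuffle=False).split(X)
)
print(leaky)  # True: at least one story spans both train and test

# Story split: hold out whole stories, removing within-story leakage.
clean = all(
    set(story_ids[tr]).isdisjoint(story_ids[te])
    for tr, te in GroupKFold(n_splits=6).split(X, groups=story_ids)
)
print(clean)  # True: no story appears in both train and test
```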

Thanks @kartikpradeepan.bsky.social for confirming that this model indeed scores highly on the earlier benchmarks with part-of-sentence splits! Building on Feghhi & Hadidi et al 2024, AlKhamissi et al 2025 had identified the most stringent benchmarks. We should have merged this PR sooner. 1/

16.02.2026 09:57 👍 1 🔁 0 💬 1 📌 2
Preview: Beyond linear regression: mapping models in cognitive neuroscience should align with research goals. Many cognitive neuroscience studies use large feature sets to predict and interpret brain activity patterns. Feature sets take many forms, from human stimulus annotations to representations in deep ne...

Some re-mapping is necessary even for predicting one brain's activity from another, especially in higher areas. Linear regression is one of the more restrictive ways to achieve this between two brains, so we use the same mapping for models. @neuranna.bsky.social wrote about this here: arxiv.org/abs/2208.10668

11.02.2026 18:18 👍 1 🔁 0 💬 0 📌 0
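[Editor's sketch] A toy version of the linear-mapping approach mentioned above, on fully synthetic data with arbitrary dimensions; this is not the analysis pipeline from the paper:

```python
# Sketch: fit a linear (ridge) mapping from model features to "brain"
# responses and score it by held-out predictivity. Synthetic data only.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))                # model activations (made up)
W = rng.normal(size=(50, 10))                 # hidden ground-truth mapping
Y = X @ W + 0.1 * rng.normal(size=(200, 10))  # noisy "brain" responses

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=0)
mapping = Ridge(alpha=1.0).fit(X_tr, Y_tr)
Y_pred = mapping.predict(X_te)

# Median per-"voxel" Pearson r between predicted and held-out responses
r = [np.corrcoef(Y_pred[:, i], Y_te[:, i])[0, 1] for i in range(Y.shape[1])]
median_r = float(np.median(r))
print(median_r > 0.9)  # high predictivity on this low-noise toy data
```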

I'll continue in this thread where @ebrahimfeghhi.bsky.social has been helpful with linking the code. I would like to remind you that there is a human at the other end of the screen and that no information will be lost by keeping this friendly.

11.02.2026 18:06 👍 4 🔁 0 💬 1 📌 0

Thanks Ebrahim! Would you be interested in submitting this model directly to Brain-Score? Alternatively, I can let Cursor attempt it again, but as you pointed out, it doesn't necessarily get it right.

11.02.2026 18:03 👍 0 🔁 0 💬 1 📌 0

Nima you're very much welcome to update the PR. You are even more welcome to use the Brain-Score platform as we stated previously. I don't know how we can reach common ground if you don't either use the same benchmark implementation, or release your model code.

11.02.2026 16:21 👍 0 🔁 0 💬 1 📌 0

I am of course happy to be proven wrong, but I find the framing of this preprint a bit frustrating. We gave similar feedback before, yet the manuscript doesn't seem to engage with the counter-evidence. I would appreciate clarification on the results discrepancy -- please feel free to update the PR!

10.02.2026 11:17 👍 20 🔁 2 💬 3 📌 0

This is significantly lower than the paper's reported number and far below gpt2-xl (which in the paper is outperformed by OASM). So something does not track here, either in the preprint's re-implementation of the benchmark or in my reconstruction of the model.

10.02.2026 11:17 👍 12 🔁 0 💬 1 📌 1
Preview: add OASM model from Hadidi et al. 2025 by mschrimpf · Pull Request #355 · brain-score/language. Cursor-aided implementation based on the paper. Preliminary results from local run: 0.34 on Pereira2018-linear.

3. I implemented and submitted the authors' model to Brain-Score (see PR#355 github.com/brain-score/...). The implementation follows the paper as I could not find a code release. It obtains a ceiling-normalized score of 0.34 on the criticized Pereira2018 benchmark.

10.02.2026 11:17 👍 10 🔁 0 💬 2 📌 0
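[Editor's sketch] For readers unfamiliar with ceiling-normalized scores: the raw predictivity is divided by an estimate of how well the data could be predicted at all given its noise. A minimal sketch with invented numbers; this is not the PR's code:

```python
# Sketch: ceiling-normalized scoring. The ceiling (e.g. estimated from
# split-half reliability of the recordings) caps achievable predictivity;
# the numbers below are invented for illustration.
def ceiling_normalize(raw_score: float, ceiling: float) -> float:
    """Express a raw predictivity score as a fraction of the noise ceiling."""
    if ceiling <= 0:
        raise ValueError("ceiling must be positive")
    return raw_score / ceiling

# e.g. a raw r of 0.17 against an estimated ceiling of 0.50
print(round(ceiling_normalize(0.17, 0.50), 2))  # prints 0.34
```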

-- this work includes null models such as randomly-assigned stimuli responses. Brain-Score language includes benchmarks that use this stronger form of generalization, which we flagged about a year ago.

10.02.2026 11:17 👍 6 🔁 0 💬 1 📌 0

2. Splitting across larger temporal chunks (eg stories) is indeed a stronger form of generalization than smaller chunks (eg sentences). @bkhmsi.bsky.social tackled this in his EMNLP'25 paper, where we identified the most stringent evaluation of brain alignment to be linear predictivity with story splits.

10.02.2026 11:17 👍 9 🔁 0 💬 1 📌 0

Perhaps the strongest support for this point is that recent LLMs confirm the original prediction: as their task performance improved, their alignment to the human brain further increased (see e.g. Shen et al. 2025).

10.02.2026 11:17 👍 8 🔁 0 💬 1 📌 0

Thank you Dan for the ping! As far as I can tell, all of the original claims hold, for the following reasons:

1. The relationship between next-word prediction performance and brain alignment has been replicated in several other studies (eg Caucheteux et al 2022; De Varda et al 2025; Mischler 2024).

10.02.2026 11:17 👍 12 🔁 0 💬 2 📌 2

Looking forward to presenting at the #AAAI #NeuroAI workshop, including 3 projects that were just accepted to ICLR! arxiv.org/abs/2509.24597, arxiv.org/abs/2510.03684, arxiv.org/abs/2506.13331 🧪🧠🤖

27.01.2026 06:24 👍 21 🔁 3 💬 0 📌 0

🎉 Re-Align is back for its 4th edition at ICLR 2026!

📣 We invite submissions on representational alignment, spanning ML, Neuroscience, CogSci, and related fields.

πŸ“ Tracks: Short (≀5p), Long (≀10p), Challenge (blog)

⏰ Deadline: Feb 5, 2026 for papers

🔗 representational-alignment.github.io/2026/

07.01.2026 16:27 👍 14 🔁 9 💬 1 📌 4

One week left to apply to the EPFL computer science PhD program www.epfl.ch/education/ph.... It's an amazing environment to do impactful research 🧪 (with unparalleled compute)! My NeuroAI group is hiring 🧠🤖. Consider this review service by our fantastic PhD students: www.linkedin.com/posts/spnesh...

08.12.2025 14:20 👍 3 🔁 4 💬 0 📌 0