Our work on contrastive SAE steering for personalizing literary machine translation was accepted to EACL main! 🎉 Check it out! ⬇️
(Tagging people who may have an opinion about this :))
@mdlhx.bsky.social @bjerva.bsky.social @wpoelman.bsky.social @estherploeger.bsky.social @tiedeman.bsky.social
📢 Paper alert!
We know typological features can drive the difficulty of language modeling & machine translation in highly controlled setups (w/ relatively small monolingual models)
But do they also drive MT quality in the age of massively multilingual LLMs?
See @v-hirak.bsky.social’s thread ⬇️
👀 Look what 🎅 has brought just before Christmas 🎁: a brand new Research Master in Natural Language Processing at @facultyofartsug.bsky.social @rug.nl
Program: www.rug.nl/masters/natu...
Applications (2026/2027) are open! Come and study with us (you will also learn why we have a 🐮 in our logo)
Wrapping up my oral presentations today with our TACL paper "QE4PE: Quality Estimation for Human Post-editing" at the Interpretability morning session #EMNLP2025 (Room A104, 11:45 China time)!
Paper: arxiv.org/abs/2503.03044
Slides/video/poster: underline.io/lecture/1315...
Interested in agent simulations of language change & pragmatic naming behavior?
Come check our poster TODAY (Fri, Nov 7, 12:30 - 13:30) #EMNLP!
Benchmarks of linguistic minimal pairs are key for LM evaluation & help us overcome the English-centric bias in NLP research
Come to our poster TODAY (Fr 7 Nov 10.30-12.00) #EMNLP to meet TurBLiMP, a new benchmark for Turkish, revealing how LLMs deal with free-order, morphologically rich languages
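For readers unfamiliar with the minimal-pair paradigm behind BLiMP-style benchmarks like TurBLiMP: an LM is scored by whether it assigns higher probability to the grammatical member of each sentence pair. A minimal sketch of that scoring protocol, using a hypothetical toy bigram scorer in place of a real LM (all scores below are made up for illustration):

```python
# Toy bigram "LM" standing in for a real model: these log-probabilities
# are hypothetical and only illustrate the scoring protocol.
BIGRAM_LOGPROBS = {
    ("the", "cat"): -1.0, ("the", "cats"): -1.2,
    ("cat", "sleeps"): -1.0, ("cat", "sleep"): -5.0,
    ("cats", "sleep"): -1.0, ("cats", "sleeps"): -5.0,
}

def sentence_logprob(sentence: str) -> float:
    """Sum of bigram log-probabilities over adjacent word pairs."""
    toks = sentence.lower().split()
    return sum(
        BIGRAM_LOGPROBS.get((a, b), -10.0)  # unseen bigrams get a floor score
        for a, b in zip(toks, toks[1:])
    )

def minimal_pair_accuracy(pairs) -> float:
    """Fraction of (grammatical, ungrammatical) pairs ranked correctly."""
    correct = sum(sentence_logprob(good) > sentence_logprob(bad)
                  for good, bad in pairs)
    return correct / len(pairs)

pairs = [
    ("the cat sleeps", "the cat sleep"),    # subject-verb agreement pair
    ("the cats sleep", "the cats sleeps"),
]
print(minimal_pair_accuracy(pairs))  # → 1.0
```

In practice the scorer would be a real LM's sentence log-likelihood; the comparison logic is the same.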
I'm in Suzhou to present our work on MultiBLiMP, Friday @ 11:45 in the Multilinguality session (A301)!
Come check it out if you're interested in multilingual linguistic evaluation of LLMs (there will be parse trees on the slides! There's still use for syntactic structure!)
arxiv.org/abs/2504.02768
Interested in developmentally plausible LMs, and the role of child-directed language data?
Come to our poster TODAY (Fr 7 Nov, 10.30-12.00) #EMNLP!
Through repeated interactions & shifts in communication needs, the lexicon of a community evolves, eventually leading to language change
We show that NN simulations can help us unravel these complex processes, next to human experiments & corpus studies
See @yuqing0304.bsky.social’s thread below ⬇️
There’s more to Neural Nets than big fat LLMs!
We’ve built a NN-agent framework to simulate how people choose the best word in a given communication context (i.e. pragmatic naming behavior).
With @yuqing0304.bsky.social, @ecesuurker.bsky.social, Tessa Verhoef, @gboleda.bsky.social
- neural-agent simulations of language change (@yuqing0304.bsky.social)
- child-directed language & syntax learning in LMs (@frap98.bsky.social)
- Turkish benchmark of grammatical minimal pairs (@ezgibasar.bsky.social) & a massively multilingual one, MultiBLiMP (@jumelet.bsky.social)
...and more!
InCLow topics #EMNLP2025:
- MT error prediction techniques & their reception by professional translators (@gsarti.com)
- thinking language in Large Reasoning Models (@jiruiqi.bsky.social)
- effect of stereotypes on LLM’s implicit personalization (@veraneplenbroek.bsky.social)
....
Thrilled to be heading to Suzhou with a big team of GroNLP'ers 🐮
Interested in Interpretable, Cognitively inspired, Low-resource LMs? Don't miss our posters & talks #EMNLP2025!
[1/]💡New Paper
Large reasoning models (LRMs) are strong in English — but how well do they reason in your language?
Our latest work uncovers their limitations and a clear trade-off:
Controlling Thinking Trace Language Comes at the Cost of Accuracy
📄Link: arxiv.org/abs/2505.22888
𝐃𝐨 𝐲𝐨𝐮 𝐫𝐞𝐚𝐥𝐥𝐲 𝐰𝐚𝐧𝐭 𝐭𝐨 𝐬𝐞𝐞 𝐰𝐡𝐚𝐭 𝐦𝐮𝐥𝐭𝐢𝐥𝐢𝐧𝐠𝐮𝐚𝐥 𝐞𝐟𝐟𝐨𝐫𝐭 𝐥𝐨𝐨𝐤𝐬 𝐥𝐢𝐤𝐞? 🇨🇳🇮🇩🇸🇪
Here’s the proof! 𝐁𝐚𝐛𝐲𝐁𝐚𝐛𝐞𝐥𝐋𝐌 is the first Multilingual Benchmark of Developmentally Plausible Training Data available for 45 languages to the NLP community 🎉
arxiv.org/abs/2510.10159
📢 Announcing the First Workshop on Multilingual and Multicultural Evaluation (MME) at #EACL2026 🇲🇦
MME focuses on resources, metrics & methodologies for evaluating multilingual systems! multilingual-multicultural-evaluation.github.io
📅 Workshop Mar 24–29, 2026
🗓️ Submit by Dec 19, 2025
Delighted to share that our paper "Reading Between the Prompts: How Stereotypes Shape LLM's Implicit Personalization" (joint work with @arianna-bis.bsky.social and Raquel Fernández) got accepted to the main conference of #EMNLP
Can't wait to discuss our work at #EMNLP2025 in Suzhou this November!
We hope our work will advance the evaluation of LLMs in Turkish and, in general, encourage more research on the robustness of modern language technologies to typological diversity.
Finally, our experimental paradigms reveal that even LLMs excelling on general minimal pairs can be brittle to variations in word orders & subordination strategies, unlike human speakers.
See paper for results with 13 LLMs, including mono- and multilingual models of different sizes!
We also collect human acceptability judgements & show that *overall* harder phenomena for LLMs are also harder for people, but there are some notable exceptions.
TurBLiMP expands the shortlist of existing language-specific BLiMPs with 2 important properties: high word order freedom & agglutination.
To study LLMs' robustness to these properties, we create experimental paradigms testing syntactic skills w/ different word orders & subordination strategies:
This is hard, slow-paced work going well beyond benchmark translation (let alone LLM-assisted benchmark generation!). It requires real *linguistic* expertise & long discussions on what makes a phenomenon representative of a language. Here's our proposal, inspired by the English BLiMP w/ major adaptations:
Grammatical benchmarks are essential to drive progress in truly multilingual Language Modeling & to overcome the linguistic biases we inherit from the English-centeredness of our field.
I'm particularly happy to contribute to this for a language I spent years learning and still found fascinating!
Proud to introduce TurBLiMP, the 1st benchmark of minimal pairs for Turkish, a free-order, morphologically rich language!
Pre-print: arxiv.org/abs/2506.13487
Fruit of an almost year-long project by amazing MS student @ezgibasar.bsky.social in collab w/ @frap98.bsky.social and @jumelet.bsky.social
Happy to hear you find the analysis useful, Marco! If you have any extra questions, don’t hesitate to contact @jiruiqi.bsky.social
One step further in our quest to bring interpretability techniques to the service of MT end users: Are uncertainty & model-internals based metrics a viable alternative to supervised word-level quality estimation?
New paper w/ @gsarti.com
@zouharvi.bsky.social @malvinanissim.bsky.social
Large Reasoning Models are raising the bar for answer accuracy & transparency, but how does that work in multilingual settings? Can LRMs reason in your language, and what does that entail?
New preprint led by @jiruiqi.bsky.social and @shan23chen.bsky.social!
Proud to share the first key output of my Vidi project team w/ @frap98.bsky.social @jumelet.bsky.social @yevgenm.bsky.social who all took this topic to heart, as proved by the many overtime discussions at lunch time 😉
See Francesca’s thread & arXiv link below
Excited to see how the BabyLM community will take on this challenge @alexwarstadt.bsky.social @lchoshen.bsky.social @tallinzen.bsky.social @fourtassi.bsky.social and many more