Our work on contrastive SAE steering for personalizing literary machine translation was accepted to EACL main! 🎉 Check it out! ⬇️
(Tagging people who may have an opinion about this :))
@mdlhx.bsky.social @bjerva.bsky.social @wpoelman.bsky.social @estherploeger.bsky.social @tiedeman.bsky.social
📢 Paper alert!
We know typological features can drive the difficulty of language modeling & machine translation in highly controlled setups (w/ relatively small monolingual models)
But do they also drive MT quality in the age of massively multilingual LLMs?
See @v-hirak.bsky.social’s thread ⬇️
👀 Look what 🎅 has brought just before Christmas 🎁: a brand new Research Master in Natural Language Processing at @facultyofartsug.bsky.social @rug.nl
Program: www.rug.nl/masters/natu...
Applications (2026/2027) are open! Come and study with us (you will also learn why we have a 🐮 in our logo)
Wrapping up my oral presentations today with our TACL paper "QE4PE: Quality Estimation for Human Post-editing" at the Interpretability morning session #EMNLP2025 (Room A104, 11:45 China time)!
Paper: arxiv.org/abs/2503.03044
Slides/video/poster: underline.io/lecture/1315...
Interested in agent simulations of language change & pragmatic naming behavior?
Come check our poster TODAY (Fri, Nov 7, 12:30 - 13:30) #EMNLP!
Benchmarks of linguistic minimal pairs are key for LM evaluation & help us overcome the English-centric bias in NLP research
Come to our poster TODAY (Fr 7 Nov 10.30-12.00) #EMNLP to meet TurBLiMP, a new benchmark for Turkish, revealing how LLMs deal with free-order, morphologically rich languages
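For readers unfamiliar with the minimal-pair paradigm behind BLiMP-style benchmarks like TurBLiMP: an LM is scored by whether it assigns higher probability to the grammatical member of each sentence pair. A minimal sketch of that scoring protocol, using a hypothetical toy bigram scorer in place of a real LM (all scores below are made up for illustration):

```python
# Toy bigram "LM" standing in for a real model: these log-probabilities
# are hypothetical and only illustrate the scoring protocol.
BIGRAM_LOGPROBS = {
    ("the", "cat"): -1.0, ("the", "cats"): -1.2,
    ("cat", "sleeps"): -1.0, ("cat", "sleep"): -5.0,
    ("cats", "sleep"): -1.0, ("cats", "sleeps"): -5.0,
}

def sentence_logprob(sentence: str) -> float:
    """Sum of bigram log-probabilities over adjacent word pairs."""
    toks = sentence.lower().split()
    return sum(
        BIGRAM_LOGPROBS.get((a, b), -10.0)  # unseen bigrams get a floor score
        for a, b in zip(toks, toks[1:])
    )

def minimal_pair_accuracy(pairs) -> float:
    """Fraction of (grammatical, ungrammatical) pairs ranked correctly."""
    correct = sum(sentence_logprob(good) > sentence_logprob(bad)
                  for good, bad in pairs)
    return correct / len(pairs)

pairs = [
    ("the cat sleeps", "the cat sleep"),    # subject-verb agreement pair
    ("the cats sleep", "the cats sleeps"),
]
print(minimal_pair_accuracy(pairs))  # → 1.0
```

In practice the scorer would be a real LM's sentence log-likelihood; the comparison logic is the same.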
I'm in Suzhou to present our work on MultiBLiMP, Friday @ 11:45 in the Multilinguality session (A301)!
Come check it out if you're interested in multilingual linguistic evaluation of LLMs (there will be parse trees on the slides! There's still use for syntactic structure!)
arxiv.org/abs/2504.02768
Interested in developmentally plausible LMs, and the role of child-directed language data?
Come to our poster TODAY (Fr 7 Nov, 10.30-12.00) #EMNLP!
Through repeated interactions & shifts in communication needs, the lexicon of a community evolves, eventually leading to language change
We show that NN simulations can help us unravel these complex processes, next to human experiments & corpus studies
See @yuqing0304.bsky.social’s thread below ⬇️
There’s more to Neural Nets than big fat LLMs!
We’ve built a NN-agent framework to simulate how people choose the best word in a given communication context (i.e. pragmatic naming behavior).
With @yuqing0304.bsky.social, @ecesuurker.bsky.social, Tessa Verhoef, @gboleda.bsky.social
- neural-agent simulations of language change (@yuqing0304.bsky.social)
- child-directed language & syntax learning in LMs (@frap98.bsky.social)
- Turkish benchmark of grammatical minimal pairs (@ezgibasar.bsky.social) & a massively multilingual one, MultiBLiMP (@jumelet.bsky.social)
...and more!
InCLow topics #EMNLP2025:
- MT error prediction techniques & their reception by professional translators (@gsarti.com)
- thinking language in Large Reasoning Models (@jiruiqi.bsky.social)
- effect of stereotypes on LLM’s implicit personalization (@veraneplenbroek.bsky.social)
....
Thrilled to be heading to Suzhou with a big team of GroNLP'ers 🐮
Interested in Interpretable, Cognitively inspired, Low-resource LMs? Don't miss our posters & talks #EMNLP2025!
[1/]💡New Paper
Large reasoning models (LRMs) are strong in English — but how well do they reason in your language?
Our latest work uncovers their limitations and a clear trade-off:
Controlling Thinking Trace Language Comes at the Cost of Accuracy
📄Link: arxiv.org/abs/2505.22888
𝐃𝐨 𝐲𝐨𝐮 𝐫𝐞𝐚𝐥𝐥𝐲 𝐰𝐚𝐧𝐭 𝐭𝐨 𝐬𝐞𝐞 𝐰𝐡𝐚𝐭 𝐦𝐮𝐥𝐭𝐢𝐥𝐢𝐧𝐠𝐮𝐚𝐥 𝐞𝐟𝐟𝐨𝐫𝐭 𝐥𝐨𝐨𝐤𝐬 𝐥𝐢𝐤𝐞? 🇨🇳🇮🇩🇸🇪
Here’s the proof! 𝐁𝐚𝐛𝐲𝐁𝐚𝐛𝐞𝐥𝐋𝐌 is the first Multilingual Benchmark of Developmentally Plausible Training Data available for 45 languages to the NLP community 🎉
arxiv.org/abs/2510.10159
📢 Announcing the First Workshop on Multilingual and Multicultural Evaluation (MME) at #EACL2026 🇲🇦
MME focuses on resources, metrics & methodologies for evaluating multilingual systems! multilingual-multicultural-evaluation.github.io
📅 Workshop Mar 24–29, 2026
🗓️ Submit by Dec 19, 2025
Delighted to share that our paper "Reading Between the Prompts: How Stereotypes Shape LLM's Implicit Personalization" (joint work with @arianna-bis.bsky.social and Raquel Fernández) got accepted to the main conference of #EMNLP
Can't wait to discuss our work at #EMNLP2025 in Suzhou this November!
We hope our work will advance the evaluation of LLMs in Turkish and, in general, encourage more research on the robustness of modern language technologies to typological diversity.
Finally, our experimental paradigms reveal that even LLMs excelling on general minimal pairs can be brittle to variations in word orders & subordination strategies, unlike human speakers.
See paper for results with 13 LLMs, including mono- and multilingual models of different sizes!
We also collect human acceptability judgements & show that *overall* harder phenomena for LLMs are also harder for people, but there are some notable exceptions.
TurBLiMP expands the shortlist of existing language-specific BLiMPs with 2 important properties: high word order freedom & agglutination.
To study LLMs' robustness to these properties, we create experimental paradigms testing syntactic skills w/ different word orders & subordination strategies:
This is hard, slow-paced work going well beyond benchmark translation (let alone LLM-assisted benchmark generation!). It requires real *linguistic* expertise & long discussions on what makes a phenomenon representative of a language. Here's our proposal, inspired by the English BLiMP w/ major adaptations:
Grammatical benchmarks are essential to drive progress in truly multilingual Language Modeling & to overcome the linguistic biases we inherit from the English-centeredness of our field.
I'm particularly happy to contribute to this for a language I spent years learning and still found fascinating!
Proud to introduce TurBLiMP, the 1st benchmark of minimal pairs for Turkish, a free-order, morphologically rich language!
Pre-print: arxiv.org/abs/2506.13487
Fruit of an almost year-long project by amazing MS student @ezgibasar.bsky.social in collab w/ @frap98.bsky.social and @jumelet.bsky.social
Happy to hear you find the analysis useful, Marco! If you have any extra questions, don’t hesitate to contact @jiruiqi.bsky.social
One step further in our quest to bring interpretability techniques to the service of MT end users: Are uncertainty & model-internals based metrics a viable alternative to supervised word-level quality estimation?
New paper w/ @gsarti.com
@zouharvi.bsky.social @malvinanissim.bsky.social
Large Reasoning Models are raising the bar for answer accuracy & transparency, but how does that work in multilingual settings? Can LRMs reason in your language, and what does that entail?
New preprint led by @jiruiqi.bsky.social and @shan23chen.bsky.social!
Proud to share the first key output of my Vidi project team w/ @frap98.bsky.social @jumelet.bsky.social @yevgenm.bsky.social who all took this topic to heart, as proved by the many overtime discussions at lunch time 😉
See Francesca’s thread & arXiv link below
Excited to see how the BabyLM community will take on this challenge @alexwarstadt.bsky.social @lchoshen.bsky.social @tallinzen.bsky.social @fourtassi.bsky.social and many more