@bowphs

28 Followers · 24 Following · 24 Posts
Joined 11.04.2025

Latest posts by @bowphs


ACL 2025, the world’s largest NLP conference with almost 2,000 papers presented, just took place in Vienna! 🎓✨ Here is a quick snapshot of the event via a short interview with one of the authors whose work caught my attention.
🎥 Watch: youtu.be/GBISWggsQOA

14.09.2025 11:49 👍 3 🔁 2 💬 0 📌 0

ACL paper: aclanthology.org/2023.acl-lon...
Models: github.com/Heidelberg-N...
Read more: cl.uni-heidelberg.de/nlpgroup/new...
Morphological Analysis Demo: huggingface.co/spaces/bowph...
Machine Translation Demo: huggingface.co/spaces/bowphs/
Best Thesis Award: www.gscl.org/en/activitie...

14.09.2025 09:13 👍 2 🔁 0 💬 0 📌 0

I am honored to receive the 2025 #GSCL Best Thesis Award at #KONVENS in Hildesheim for my Master’s thesis, which investigates multilinguality and develops language models for Ancient Greek and Latin. Thank you to my mentors and collaborators. I look forward to what comes next.

14.09.2025 09:13 👍 4 🔁 1 💬 1 📌 1

Looking at Bruegel's Tower of Babel in Vienna makes you wonder: how can multilingual language models overcome language barriers? Find out tomorrow!
📍 Level 1 (ironic, right?), Room 1.15-1
🕐 2 PM
#ACL2025NLP

27.07.2025 21:11 👍 3 🔁 0 💬 0 📌 1

Read the full paper here: arxiv.org/pdf/2506.01629

Reach out if you have any questions or if you are attending ACL and want to say hi. 🙋

07.06.2025 10:11 👍 2 🔁 0 💬 0 📌 0
Table comparing text generations between early and late checkpoints for the concepts "earthquake" and "joy". Early checkpoint generations show language-specific text, while late checkpoint generations demonstrate a shift toward "language-agnostic" (= English) text.

This phenomenon has a visible effect on text generation: In BLOOM-560m, activating 'earthquake' neurons derived from Spanish data at checkpoint 10,000 generates Spanish text. At checkpoint 400,000, the same method yields English text!
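The mechanism behind "activating neurons" can be sketched on a toy feed-forward layer: clamp a chosen set of hidden units to a high value during the forward pass and observe that the output changes. The layer sizes, weights, and expert-neuron indices below are illustrative stand-ins, not BLOOM's actual experts:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy FFN layer standing in for one transformer MLP block.
W_in = rng.normal(size=(8, 16))   # input -> hidden
W_out = rng.normal(size=(16, 8))  # hidden -> output

def forward(x, boost_neurons=None, boost_value=5.0):
    h = np.maximum(x @ W_in, 0.0)            # ReLU hidden activations
    if boost_neurons is not None:
        h[..., boost_neurons] = boost_value  # clamp the "expert" neurons
    return h @ W_out

x = rng.normal(size=(1, 8))
earthquake_neurons = [2, 7, 11]  # hypothetical expert-neuron indices

base = forward(x)
steered = forward(x, boost_neurons=earthquake_neurons)
```

In a real model the same clamp would be applied via a forward hook on the MLP activations at every generation step, and the steered logits then drive decoding.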

07.06.2025 10:11 👍 2 🔁 0 💬 1 📌 0
Average overlap proportion of expert neurons across layers and training checkpoints. Later checkpoints exhibit more shared neurons, particularly in the middle layers.

This is not a bug; it's a feature! These layers are repurposing the space to form cross-lingual abstractions.
We track this by examining how specific concepts (like "earthquake" or "joy") align across languages.
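The overlap statistic can be sketched as a set comparison between per-language expert-neuron indices at one layer; the Jaccard measure and the index lists here are illustrative assumptions, not the paper's exact definition:

```python
# Sketch: overlap proportion between per-language expert-neuron sets,
# assuming experts are identified as the most concept-selective units.
def overlap_proportion(experts_a, experts_b):
    """Fraction of shared expert neurons (Jaccard: intersection over union)."""
    a, b = set(experts_a), set(experts_b)
    return len(a & b) / len(a | b)

# Hypothetical "earthquake" expert indices at one layer.
spanish_experts_early = [3, 17, 42, 56, 90]
english_experts_early = [8, 21, 42, 63, 77]   # little overlap early on
spanish_experts_late  = [5, 17, 42, 63, 90]
english_experts_late  = [5, 17, 42, 63, 88]   # mostly shared later

early = overlap_proportion(spanish_experts_early, english_experts_early)
late = overlap_proportion(spanish_experts_late, english_experts_late)
```

Averaging this quantity over concepts and language pairs, per layer and per checkpoint, yields the kind of layer-by-checkpoint overlap curve described above.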

07.06.2025 10:11 👍 2 🔁 0 💬 1 📌 0

We ask a probing classifier: "Given this hidden state from layer l, what is the language of the source text?" The results are striking: earlier checkpoints consistently solve this with high accuracy across layers. Later checkpoints, however, exhibit clear performance drops.
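The probing setup can be sketched with a linear classifier on hidden states. The synthetic "hidden states" below (two languages whose representations are offset by a language-specific shift that shrinks over training) and the logistic-regression probe are illustrative stand-ins, not the paper's actual data or probe:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def make_hidden_states(n_per_lang=200, dim=32, shift=1.0):
    """Toy layer-l hidden states for two languages, offset by a language-specific shift."""
    lang0 = rng.normal(0.0, 1.0, (n_per_lang, dim)) + shift
    lang1 = rng.normal(0.0, 1.0, (n_per_lang, dim)) - shift
    X = np.vstack([lang0, lang1])
    y = np.array([0] * n_per_lang + [1] * n_per_lang)  # language-ID labels
    return X, y

def probe_accuracy(X, y):
    """Train a linear probe to predict the source language from hidden states."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return probe.score(X_te, y_te)

# "Early checkpoint": strong language signal -> the probe succeeds.
acc_early = probe_accuracy(*make_hidden_states(shift=1.0))
# "Late checkpoint": representations collapse into a shared space -> probe degrades.
acc_late = probe_accuracy(*make_hidden_states(shift=0.05))
```

Running the same probe per layer and per checkpoint gives the accuracy-over-layers curves the thread describes.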

07.06.2025 10:11 👍 2 🔁 0 💬 1 📌 0
Probing classifier performance comparison between early and late checkpoints across layers. While the early checkpoint shows uniformly high performance, the later checkpoint exhibits relatively high variance across layers.

How and when do multilingual LMs achieve cross-lingual generalization during pre-training? And why do later, supposedly more advanced, checkpoints lose some language-identification ability in the process? Our #ACL2025 paper investigates.

07.06.2025 10:11 👍 3 🔁 2 💬 1 📌 1



Debates aren’t always black and white—opposing sides often share common ground. These partial agreements are key to meaningful compromises.
Presenting “Perspectivized Stance Vectors” (PSVs), an interpretable method for identifying nuanced (dis)agreements.

📜 arxiv.org/abs/2502.09644
🧵 More details below
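The core idea can be sketched as stance vectors over named perspectives, with partial agreement read off dimension by dimension. The perspective names, the {-1, 0, +1} encoding, and the comparison rule below are illustrative assumptions, not the paper's actual construction:

```python
# Sketch: each debater gets a stance per perspective
# (+1 pro, -1 con, 0 unexpressed); partial agreements are the
# perspectives where opposing sides take the same non-neutral stance.
def partial_agreements(psv_a, psv_b):
    """Perspectives where both sides take the same (non-neutral) stance."""
    return [p for p in psv_a
            if psv_a[p] != 0 and psv_a[p] == psv_b.get(p, 0)]

# Hypothetical debaters on nuclear energy.
pro_nuclear = {"climate": +1, "cost": -1, "safety": +1}
anti_nuclear = {"climate": +1, "cost": -1, "safety": -1}

common_ground = partial_agreements(pro_nuclear, anti_nuclear)
```

Even with opposite overall stances, the two sides here agree on the climate and cost perspectives—exactly the kind of nuanced common ground the method is meant to surface.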

21.02.2025 16:08 👍 4 🔁 3 💬 1 📌 0
An Annotated Dataset of Errors in Premodern Greek and Baselines for Detecting Them Creston Brooks, Johannes Haubold, Charlie Cowen-Breen, Jay White, Desmond DeVaul, Frederick Riemenschneider, Karthik R Narasimhan, Barbara Graziosi. Findings of the Association for Computational Lingu...

Read the full paper: aclanthology.org/2025.finding...

Work by Creston Brooks, Johannes Haubold, Charlie Cowen-Breen, Jay White, Desmond DeVaul, me, Karthik Narasimhan, and Barbara Graziosi

01.05.2025 11:29 👍 7 🔁 1 💬 0 📌 0

Our work brings new computational methods to a field traditionally dominated by manual scholarship, potentially accelerating the discovery of textual errors that have remained hidden for centuries.

01.05.2025 11:29 👍 3 🔁 0 💬 1 📌 0

Perhaps most surprising: even powerful models like GPT-4 performed barely above random chance on this specialized task! This highlights the limitations of general-purpose LLMs when dealing with ancient text restoration.

01.05.2025 11:29 👍 3 🔁 0 💬 1 📌 0

We tested several error detection methods and found that our discriminator-based approach outperforms all others. Interestingly, scribal errors (the oldest type) are universally more difficult to detect than print or digitization errors across ALL methods.
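The discriminator-based detection loop can be sketched as scoring each word in a passage and flagging those the model considers implausible. The threshold and the tiny vocabulary-lookup scorer below are toy stand-ins for the paper's trained discriminator:

```python
# Sketch of discriminator-style error detection: score every token's
# plausibility in context and flag those below a threshold.
def flag_errors(tokens, score_fn, threshold=0.1):
    """Return (index, token) pairs whose plausibility falls below threshold."""
    return [(i, t) for i, t in enumerate(tokens) if score_fn(tokens, i) < threshold]

# Toy scorer: plausibility from a small reference vocabulary;
# a real discriminator would score each token from its context.
reference_vocab = {"λόγος": 0.9, "καὶ": 0.95, "ἦν": 0.9}

def toy_score(tokens, i):
    return reference_vocab.get(tokens[i], 0.01)  # unseen forms look suspicious

passage = ["ἦν", "καὶ", "λόγοσ"]  # final word carries a sigma slip
flags = flag_errors(passage, toy_score)
```

The flagged positions would then go to a domain expert for review, matching the annotate-then-verify workflow described in the thread.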

01.05.2025 11:29 👍 2 🔁 0 💬 1 📌 0

Prior work has only evaluated error detection on artificially generated errors. Our dataset contains REAL errors that naturally accumulated over centuries - the subtle mistakes that survived precisely because they often appear perfectly reasonable.

01.05.2025 11:29 👍 1 🔁 0 💬 1 📌 0

Creating this dataset was painstaking! Our domain expert spent over 100 hours reviewing potential errors, categorizing them as scribal errors (from manuscript copying), print errors (from creating editions), or digitization errors (from converting to digital).

01.05.2025 11:29 👍 2 🔁 0 💬 1 📌 0

In "An Annotated Dataset of Errors in Premodern Greek and Baselines for Detecting Them," we introduce the first expert-labeled dataset of real errors in ancient texts, enabling proper evaluation of error detection methods on authentic textual problems.

01.05.2025 11:29 👍 3 🔁 1 💬 1 📌 0

What did Aristotle actually write? We think we know, but reality is messy. As ancient Greek texts traveled through 2,500 years of history, they were copied and recopied countless times, accumulating subtle errors with each generation. Our new #NAACL2025 paper tackles this fascinating challenge.

01.05.2025 11:29 👍 13 🔁 4 💬 1 📌 2