Ilker Kesen

@ilkerkesen

Postdoctoral Scientist at University of Copenhagen. I am currently more focused on developing pixel language models. #nlproc #multilinguality #multimodality

60 Followers · 144 Following · 22 Posts · Joined 11.11.2024

Latest posts by Ilker Kesen @ilkerkesen

I have been thinking about some of the consequences of closed vs open research. Closed research can slow down scientific progress and concentrate knowledge, which results in what I call “model archaeology”. I discuss this idea in my ICLR 2026 blog post.

Short thread 🧵and link👇

02.03.2026 14:56 👍 11 🔁 2 💬 1 📌 1
CommonLID: Re-evaluating State-of-the-Art Language Identification Performance on Web Data Language identification (LID) is a fundamental step in curating multilingual corpora. However, LID models still perform poorly for many languages, especially on the noisy and heterogeneous web data of...

Announcing our latest paper: CommonLID

In collaboration with @commoncrawl.bsky.social @mlcommons.org @jhu.edu we built a LID benchmark on actual Common Crawl text covering 109 languages. Existing evaluations overestimate how well LangID works on web data.

arxiv.org/abs/2601.18026
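
For context, a minimal sketch of the off-the-shelf LID step such benchmarks evaluate, using fastText's publicly released lid.176.bin model; this is purely illustrative, not the CommonLID evaluation code:

```python
# Off-the-shelf language identification with fastText's public
# lid.176.bin model (downloadable from fasttext.cc). Illustrative only;
# not the CommonLID evaluation pipeline.
import fasttext

model = fasttext.load_model("lid.176.bin")

def identify(text: str, k: int = 3):
    # fastText's predict() expects a single line, so strip newlines
    # (web-crawled text is full of them).
    labels, probs = model.predict(text.replace("\n", " "), k=k)
    return [(label.removeprefix("__label__"), float(p))
            for label, p in zip(labels, probs)]

print(identify("Dette er en sætning på dansk."))
# e.g. [('da', 0.98), ('no', 0.01), ...] -- clean text is easy;
# noisy, heterogeneous web data is where LID models break down.
```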

13.02.2026 19:27 👍 22 🔁 12 💬 1 📌 0

Our paper on ✨populism✨ and how LLMs struggle to detect it in political debates 🗣️ has been accepted to the EACL main conference.
Meet the lead author @kiddothe2b.bsky.social in Rabat to discuss PolSci🤝NLP

05.02.2026 16:26 👍 8 🔁 2 💬 0 📌 0
A photograph of sunny Copenhagen in the summer!

📢 I am hiring a highly motivated Ph.D. student at the University of Copenhagen to work on tokenization-free NLP.

Read our previous work on this topic: aclanthology.org/2025.emnlp-m...
aclanthology.org/2023.emnlp-m...
openreview.net/forum?id=FkS...

Apply by March 8: employment.ku.dk/phd/?show=1563

04.02.2026 10:40 👍 19 🔁 9 💬 0 📌 0
Form and Meaning in Intrinsic Multilingual Evaluations Intrinsic evaluation metrics for conditional language models, such as perplexity or bits-per-character, are widely used in both mono- and multilingual settings. These metrics are rather straightforwar...

New EACL paper (with @mdlhx.bsky.social)! We tested if comparing perplexity of parallel data across languages is fair. Turns out: it depends. We show the choice of test set (even with consistent meaning) can flip conclusions about which language is easier to model.

Paper: arxiv.org/abs/2601.10580
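
For readers unfamiliar with the metrics involved, a minimal sketch (all numbers invented): the paper's argument concerns test-set choice, but even the choice of normalizer already shows how cross-lingual rankings can flip.

```python
# Perplexity normalizes the model's total negative log-likelihood by
# token count; bits-per-character (BPC) normalizes by character count.
# All figures below are invented to illustrate the metrics.
import math

def perplexity(total_nll_nats: float, n_tokens: int) -> float:
    return math.exp(total_nll_nats / n_tokens)

def bits_per_char(total_nll_nats: float, n_chars: int) -> float:
    return total_nll_nats / n_chars / math.log(2)  # nats -> bits

# Language A: 1200 nats over 400 tokens / 2200 characters.
# Language B: 1150 nats over 300 tokens / 2400 characters.
print(perplexity(1200, 400), bits_per_char(1200, 2200))  # ~20.1, ~0.79
print(perplexity(1150, 300), bits_per_char(1150, 2400))  # ~46.2, ~0.69
# A looks easier by perplexity, B looks easier by BPC: conclusions
# depend on how (and on what data) you normalize.
```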

28.01.2026 13:25 👍 10 🔁 3 💬 0 📌 0

Our paper has been accepted to EACL 2026!🎉 We systematically evaluate several vision-language models (VLMs) and language-only models, measuring their alignment with brain responses to concept words. Our results show that vision-language models offer a promising tool for modeling human concept processing.

23.01.2026 12:02 👍 14 🔁 4 💬 1 📌 0

Excited to share that the 📏Cetvel benchmark has been accepted to #EACL2026! 📏Cetvel is designed to evaluate LLMs in Turkish. Please see the thread below for more details. #NLProc

21.01.2026 12:47 👍 1 🔁 1 💬 0 📌 0
İlker

İlker and friends

This week, it was our pleasure to welcome @ilkerkesen.bsky.social, who shared some of his and his group's latest findings with our group.
His talk showed how multimodal learning can help models transcend the boundaries of writing systems.
We enjoyed İlker's company and look forward to his future work!

19.12.2025 08:55 👍 4 🔁 1 💬 0 📌 0

This week I’m in Utrecht! I’m visiting Utrecht University to give an invited seminar talk on pixel language models, hosted by the NLP Research Group led by Albert Gatt. Note: Don’t be fooled by the image; the weather is cloudy as expected! #NLProc

16.12.2025 15:30 👍 4 🔁 1 💬 0 📌 0

This week, I'm attending #EurIPS here in Copenhagen, where I'll present our work on pretraining a multilingual pixel language model at the #ELLIS UnConference. Find me tomorrow at 4 pm in the poster session at stand no. 59 to learn more about multilingual pixel language models.

01.12.2025 14:40 👍 4 🔁 0 💬 0 📌 0

I'm forcing GPT-5.1 to translate some English text into a target language that I don't know. It decides to use its thinking feature, and then reasons its way into switching to the target language for the entire output, including the explanations and conversational parts. Sigh.

14.11.2025 14:31 👍 0 🔁 0 💬 0 📌 0

This week at #EMNLP2025, I'll present our research on pretraining a multilingual pixel language model. Join the multilinguality session on Friday at 10:30 in Room A301 to learn more about pixel models and their benefits in multilingual settings. (Unfortunately I’ll be on Zoom)

03.11.2025 17:39 👍 2 🔁 0 💬 0 📌 1

When is a language hard to model? Previous research has suggested that morphological complexity both does and does not play a role, but those studies relate language-model performance to corpus statistics of words or subword tokens in isolation.
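
For readers unfamiliar with that methodology, a toy sketch of the kind of isolated corpus statistics such studies correlate with modelling difficulty (illustrative only, not the paper's code):

```python
# Toy corpus statistics of the kind prior work correlates with
# language-modelling difficulty: lexical diversity and word length.
from collections import Counter

def corpus_stats(sentences):
    tokens = [tok for sent in sentences for tok in sent.split()]
    types = Counter(tokens)
    return {
        "type_token_ratio": len(types) / len(tokens),
        "mean_word_length": sum(map(len, tokens)) / len(tokens),
    }

print(corpus_stats(["the cat sat on the mat", "the dog sat too"]))
# {'type_token_ratio': 0.7, 'mean_word_length': 2.9}
```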

03.11.2025 11:53 👍 7 🔁 3 💬 1 📌 0

We used to develop Knet, but after the rise of HuggingFace we stopped using it. Before HF, it was also painful to port each model to Julia code and arrays when the models were already available in PyTorch/TensorFlow. Still, I guess you can find BERT/GPT implementations in Knet somewhere.

08.10.2025 12:37 👍 0 🔁 0 💬 0 📌 0

For more details about 📏Cetvel, please check our preprint.

📜Paper: arxiv.org/abs/2508.16431
💻Code: github.com/KUIS-AI/cetvel
📊Leaderboard: huggingface.co/spaces/KUIS-...

05.09.2025 13:39 👍 0 🔁 0 💬 0 📌 0

Furthermore, we assessed the informativeness of each task using Gini coefficients. We found that grammatical error correction, machine translation, and extractive QA (on Turkish and Islamic history) are the most informative tasks for evaluating LLMs in Turkish within 📏Cetvel.
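
A minimal sketch of the idea (not the Cetvel analysis code): the Gini coefficient of per-model scores on a task measures how unequally models perform on it, so a higher value means the task separates models better.

```python
# Gini coefficient of per-model scores as a task-informativeness proxy:
# near-identical scores give a Gini near 0 (uninformative task), while
# well-spread scores give a higher Gini. Scores below are invented.
import numpy as np

def gini(scores):
    # Mean-absolute-difference formulation: sum |x_i - x_j| / (2 n^2 mu).
    x = np.asarray(scores, dtype=float)
    n = len(x)
    return np.abs(x[:, None] - x[None, :]).sum() / (2 * n * n * x.mean())

flat_task = [0.71, 0.72, 0.73, 0.74]    # every model scores about the same
spread_task = [0.10, 0.30, 0.55, 0.90]  # scores separate the models
print(gini(flat_task), gini(spread_task))  # ~0.009 vs ~0.36
```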

05.09.2025 13:39 👍 0 🔁 0 💬 1 📌 0

and lastly, (iii) the Turkish-centric 8B-parameter model Cere-Llama-3-8B outperforms even the 70B-parameter Llama-3.3-70B on some Turkish-centric tasks such as grammatical error correction.

05.09.2025 13:39 👍 0 🔁 0 💬 1 📌 0

We tested 33 widely used open-weight LLMs covering different model families up to 70B parameters. We find that (i) LLMs tailored for Turkish underperform compared to general-purpose LLMs, (ii) Llama 3 models dominate other LLMs at the same parameter scale, [...]

05.09.2025 13:39 👍 0 🔁 0 💬 1 📌 0

Second, 📏Cetvel also offers NLP tasks linguistically and culturally grounded in Turkish, such as proverb understanding, circumflex-based word sense disambiguation, and extractive QA centered on Turkish and Islamic history.

05.09.2025 13:39 👍 0 🔁 0 💬 1 📌 0

First, 📏Cetvel goes beyond multiple-choice QA in contrast to existing Turkish benchmarks. It spans 23 tasks across 7 categories, including grammatical error correction, machine translation, summarization, and extractive QA.

05.09.2025 13:39 👍 0 🔁 0 💬 1 📌 0

So, why another Turkish benchmark? The answer is that existing benchmarks often fall short in either task diversity or culturally relevant Turkish content. 📏Cetvel addresses both shortcomings.

05.09.2025 13:39 👍 0 🔁 0 💬 1 📌 0

📢New preprint: We introduce 📏Cetvel, a unified benchmark for evaluating language understanding, generation, and cultural capacity of LLMs in Turkish🇹🇷 #AI #LLM #NLProc

Joint work with Abrek Er, @gozdegulsahin.bsky.social, @aykuterdem.bsky.social from KUIS AI Center.

05.09.2025 13:39 👍 2 🔁 0 💬 1 📌 1

Excited to share that our paper "Multilingual Pretraining for Pixel Language Models" has been accepted to the #EMNLP2025 main conference! Please see the thread below and the paper itself for more details.

21.08.2025 12:42 👍 3 🔁 0 💬 0 📌 0
Multilingual Pretraining for Pixel Language Models Pixel language models operate directly on images of rendered text, eliminating the need for a fixed vocabulary. While these models have demonstrated strong capabilities for downstream cross-lingual tr...

For more details about PIXEL-M4, please check our preprint.

Paper: arxiv.org/abs/2505.21265
Model: huggingface.co/Team-PIXEL/p...
Code: github.com/ilkerkesen/p...

In collaboration with Jonas F. Lotz, Ingo Ziegler, Phillip Rust and Desmond Elliott @delliott.bsky.social

04.06.2025 13:44 👍 0 🔁 0 💬 0 📌 0

Data-efficiency analysis on the Indic NER benchmark also demonstrated that PIXEL-M4 excels at cross-lingual transfer learning in low-resource settings.

04.06.2025 13:44 👍 0 🔁 0 💬 1 📌 0

Analyses of the learned multilingual hidden representations reveal a strong semantic alignment between pretraining languages in the later layers, particularly for the English-Ukrainian and English-Hindi pairs.
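
A minimal sketch of this kind of layer-wise analysis, assuming per-layer hidden states for a translation pair have already been extracted (an illustration of the method, not the paper's code):

```python
# Layer-wise alignment between two languages: mean-pool each sentence's
# hidden states per layer and compare translations by cosine similarity.
# states_l1 / states_l2: lists over layers of (seq_len, dim) arrays for
# one parallel sentence pair, obtained however the model exposes them.
import numpy as np

def layer_alignment(states_l1, states_l2):
    sims = []
    for h1, h2 in zip(states_l1, states_l2):
        v1, v2 = h1.mean(axis=0), h2.mean(axis=0)  # mean-pool positions
        cos = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))
        sims.append(float(cos))
    return sims  # one similarity per layer; alignment rising with depth
```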

04.06.2025 13:44 👍 0 🔁 0 💬 1 📌 0

Word-level probing analyses illustrate that PIXEL-M4 captures linguistic features better, even for languages and writing systems not seen during pretraining.

04.06.2025 13:44 👍 0 🔁 0 💬 1 📌 0

Downstream experiments on text classification, dependency parsing, and named entity recognition show that PIXEL-M4 outperforms its English-only-pretrained counterpart PIXEL-BIGRAMS on almost all non-Latin-script languages.

04.06.2025 13:44 👍 0 🔁 0 💬 1 📌 0

Announcing our recent work “Multilingual Pretraining for Pixel Language Models”! We introduce PIXEL-M4, a pixel language model pretrained on four visually & linguistically diverse scripts: English, Hindi, Ukrainian & Simplified Chinese. #NLProc
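
For readers new to pixel models, a minimal sketch of the input representation (the released PIXEL code has its own renderer; the font, sizes, and patching below are illustrative):

```python
# Render text to a grayscale strip and slice it into fixed-size patches;
# these patches, rather than subword tokens, are a pixel model's input,
# so any script works without a vocabulary. Font path is illustrative.
import numpy as np
from PIL import Image, ImageDraw, ImageFont

def render_to_patches(text, patch=16, height=16, width=512,
                      font_path="GoNotoCurrent.ttf"):
    img = Image.new("L", (width, height), color=255)      # white strip
    ImageDraw.Draw(img).text(
        (0, 0), text, fill=0,
        font=ImageFont.truetype(font_path, size=height - 2))
    pixels = np.asarray(img, dtype=np.float32) / 255.0
    # Split the strip into (width // patch) square patches.
    return np.stack(np.split(pixels, width // patch, axis=1))

patches = render_to_patches("नमस्ते дума 你好")  # mixed scripts, no tokenizer
print(patches.shape)  # (32, 16, 16): 32 patches of 16x16 pixels
```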

04.06.2025 13:44 👍 1 🔁 1 💬 1 📌 1

Today we are releasing Kaleidoscope 🎉

A comprehensive multimodal & multilingual benchmark for VLMs! It contains real questions from exams in different languages.

🌍 20,911 questions and 18 languages
📚 14 subjects (STEM → Humanities)
📸 55% multimodal questions

10.04.2025 10:31 👍 25 🔁 6 💬 1 📌 1