Deadline extension:
- Paper submission: Jan 2
- Commitment for pre-reviewed papers: Jan 10
sites.google.com/view/vardial...
Interested in developing LLMs that work for dialectal Arabic? Introducing the AMIYA shared task: Arabic Modeling In Your Accent, just accepted to VarDial 2026. Please consider submitting and joining us in Morocco if you do! sites.google.com/view/vardial...
DistaLs: A Comprehensive Collection of Language Distance Measures
Rob van der Goot, Esther Ploeger, @verenablaschke.bsky.social, Tanja Samardžić
aclanthology.org/2025.emnlp-d...
A convenient toolkit for obtaining distance measures across languages
www.youtube.com/watch?v=SSk9...
It was fun to do a bit of science outreach, but I also found it super interesting to get a look behind the scenes of how a TV segment is made
The work I talked about is mainly described in this paper on German dialect ASR:
bsky.app/profile/vere...
@barbaraplank.bsky.social, Hinrich Schütze, and I were featured on TV, talking about how AI tools struggle with dialects!
www.ardmediathek.de/video/capric...
We're investigating how publishers handle name changes and the barriers scholars face. If you've changed your name (or are considering it) and dealt with updating your academic publications, we want to hear from you. We welcome researchers who have changed their name for any reason, such as gender transition, marriage, divorce, immigration, cultural reasons, or citation formatting issues. Whether you've successfully updated your work, are currently trying, or decided not to because of barriers, your opinion matters. Your input will help us advocate for better, more inclusive policies in academic publishing. The survey takes around 5-10 minutes to complete. Survey Link: https://forms.cloud.microsoft/e/E0XXBmZdEP Please share with anyone who might benefit.
We're surveying researchers about name changes in academic publishing.
If you've changed your name and dealt with updating publications, we want to hear your experience. Any reason counts: transition, marriage, cultural reasons, etc.
forms.cloud.microsoft/e/E0XXBmZdEP
Timeline:
- Paper submission: Dec 19
- Commitment for pre-reviewed papers: Jan 2
- Acceptance notifs: Jan 23
- Camera-ready: Feb 3
- Workshop: TBD (Mar 24-29)
Organizers:
Yves Scherrer, Noëmi Aepli, @tosaja.bsky.social, Nikola Ljubešić, Preslav Nakov, @tiedeman.bsky.social, Marcos Zampieri & me
VarDial @ EACL 2026, with important dates (see next post for text version). Photo CC-0.
VarDial 2026 will be colocated with @eaclmeeting.bsky.social! We're looking forward to your papers on NLP for similar languages, varieties and dialects :)
Deadline: Dec 19 (Jan 2 for pre-reviewed ARR papers)
sites.google.com/view/vardial...
Slide: "Dialect NLP: How (and why) to process non-standard language varieties"
Moin! I'm on my way to Hamburg to meet the @ds-hamburg.bsky.social group and give a talk about dialect NLP!
Thanks a lot!
Has the "Black LLMirror" work already been published / is it going to be turned into a publication? I'd love to read more about it!
#Interspeech2025 had a science fair today with lots of interactive speech tech demos, not just for conference attendees but also/especially for curious laypeople! The demos were fun, and I like the idea of combining a conference w/ a bit of scicomm for the local public
Check out the...
- talk on Mon Aug 18, 15:50–16:10
- preprint: arxiv.org/abs/2506.02894
- suppl. material: github.com/mainlp/betth...
Joint work w/ Miriam Winkler & @barbaraplank.bsky.social from @mainlp.bsky.social, and Constantin Förster & Gabriele Wenger-Glemser from Bayerischer Rundfunk!
Automatic metrics like WER and human quality judgements are moderately correlated. Dialectal words are often rendered as nonsense. Dialectal syntactic structures are often retained in the output; whether this is acceptable in Std German is hit-or-miss.
All ASR models we benchmark perform much better on Standard German than on dialectal audio. Whether the transcriptions of the dialectal audio tend to be closer to the Std German references or to the dialectal references depends on the model decoder type.
A sentence from the dataset with a Standard German and a dialectal transcription that differ on the word and phrase level.
Betthupferl contains sentences from three dialect groups spoken in southeast Germany, as well as Std German sentences for comparison. The dialectal sentences have both dialectal and Std German gold transcriptions, showing differences between pronunciation, word choice and morphosyntax.
Paper title ("A multi-dialectal dataset for German dialect ASR and dialect-to-standard speech translation") and a map of the German state Bavaria showing where the Franconian, Bavarian, and Alemannic dialect groups are spoken
At #Interspeech2025 I'm going to present Betthupferl, a dataset for German dialect ASR & dialect-to-standard speech translation! We analyze differences between dialectal & Standard German transcriptions, benchmark ASR models, and examine shortcomings of current ASR models & evaluation metrics.
UPDATE: Our poster presentation got moved to Tuesday, 16:00–17:30 (session 10)! #ACL2025NLP
The poster presentation slot got moved to Tuesday, 16:00–17:30!
Joint work with Masha Fedzechkina and @maartjeterhoeve.bsky.social produced during my internship at Apple last year!
See you at the Findings poster reception on Monday July 28 (18:00-19:30) :)
Preprint: arxiv.org/abs/2501.14491
In practice, selecting a transfer language based on just one relevant similarity measure or the transfer results on a similar NLP task w/ similar input representations works well -- although it's best to compare multiple promising transfer candidates.
... Topic classification based on n-grams is sensitive to string overlap (+ correlated linguistic measures), but topic classification based on mBERT embeddings doesn't show any strong correlations; here, inclusion in the pre-training data is important instead.
Fortunately, the patterns confirm our intuitions, e.g., syntactic similarity matters for parsing but not for topic classification. However, input representations matter too....
Correlations between transfer results per experiment (parsing, POS tagging, topic classification with different input representations) and similarity measures. The results vary a lot across experiments and measures; some are described in the next posts.
At #ACL2025NLP I'll present our analysis of the effect of linguistic similarity on cross-lingual transfer! We looked at how 10 similarity measures correlate w/ transfer results btwn 263 languages across 3 NLP tasks. Different similarity measures matter for diff. experiments (no one-size-fits-all)!
My ACL 2024 keynote talk on "Are LLMs Narrowing Our Horizon? Let's Embrace Variation in NLP!" is online now:
underline.io/events/466/s...
2024.aclweb.org/program/keyn...
It was a huge honor for me to give a keynote at last year's flagship NLP conference in Bangkok 🇹🇭
Your Bavarian doesn't stop at "Servus" and "Pfiade"? Then you're exactly who we're looking for!
We're looking for Bavarian speakers who would like to answer a short survey about AI-generated Bavarian for a master's thesis.
With every participation, we bring the Bavarian dialect a little further into the digital world!
Bavarian dialect speakers needed! Our MSc student Miriam wants to find out 1. how good/bad LLM-generated "Bavarian" is, and 2. whether dialect speakers agree with each other on this. The survey takes <5 min: survey.ifkw.lmu.de/dialquali25/ Thank you for sharing/participating!
The first archival *CL Queer in AI workshop will kick off in about 15 min! Join us in-person if you're at NAACL or virtually
We will have presentations from our amazing contributors and invited speakers. Read on for more details 🧵
Happening now at #NAACL2025 in room Pecos.
Kicking off with amazing talks and a panel by Monojit Choudhury, Isabelle Augenstein, and Katia Shutova