Akhila Yerukola (@akhilayerukola)

🎭 How do LLMs (mis)represent culture?
🧮 How often?
🧠 Misrepresentations = missing knowledge? spoiler: NO!

At #CHI2026 we are bringing ✨TALES✨ a participatory evaluation of cultural (mis)reps & knowledge in multilingual LLM-stories for India

📜 arxiv.org/abs/2511.21322

1/10

02.02.2026 21:38 👍 45 🔁 22 💬 1 📌 2

NLP 4 Democracy - COLM 2025

I will be at #COLM2025 this week, and would love to connect with folks interested in applications (and critiques) of language modeling in social science research!

And join us for the NLP4Democracy workshop on Friday!

sites.google.com/andrew.cmu.e...

#NLP #NLProc #LLM #ComputationalSocialScience

06.10.2025 19:31 👍 16 🔁 5 💬 0 📌 0

🔈 For the SoLaR workshop @COLM_conf, we are soliciting opinion abstracts to encourage new perspectives and opinions on responsible language modeling, 1-2 of which will be selected to be presented at the workshop.

Please use the Google Form below to submit your opinion abstract ⬇️

08.08.2025 12:40 👍 8 🔁 4 💬 1 📌 0

I'll be at #ACL2025🇦🇹!!
Would love to chat about all things pragmatics 🧠, redefining "helpfulness"🤔 and enabling better cross-cultural capabilities 🗺️ 🫶

Presenting our work on culturally offensive nonverbal gestures 👇
🕛Wed @ Poster Session 4
📍Hall 4/5, 11:00-12:30

26.07.2025 02:46 👍 4 🔁 1 💬 0 📌 0

Using Hand Gestures To Evaluate AI Biases - Language Technologies Institute - School of Computer Science - Carnegie Mellon University
LTI researchers have created a model to help generative AI systems understand the cultural nuance of gestures.

Hand gestures are a major mode of human communication, but they don't always translate well across cultures. New research from @akhilayerukola.bsky.social, @maartensap.bsky.social and others is aimed at giving AI systems a hand with overcoming cultural biases:
lti.cmu.edu/news-and-eve...

27.06.2025 18:04 👍 8 🔁 3 💬 0 📌 0

An overview of the work “Research Borderlands: Analysing Writing Across Research Cultures” by Shaily Bhatt, Tal August, and Maria Antoniak. The overview reads: “We survey and interview interdisciplinary researchers (§3) to develop a framework of writing norms that vary across research cultures (§4) and operationalise them using computational metrics (§5). We then use this evaluation suite for two large-scale quantitative analyses: (a) surfacing variations in writing across 11 communities (§6); (b) evaluating the cultural competence of LLMs when adapting writing from one community to another (§7).”

🖋️ Curious how writing differs across (research) cultures?
🚩 Tired of “cultural” evals that don't consult people?

We engaged with interdisciplinary researchers to identify & measure ✨cultural norms✨in scientific writing, and show that❗LLMs flatten them❗

📜 arxiv.org/abs/2506.00784

[1/11]

09.06.2025 23:29 👍 72 🔁 30 💬 1 📌 5

When it comes to text prediction, where does one LM outperform another? If you've ever worked on LM evals, you know this question is a lot more complex than it seems. In our new #acl2025 paper, we developed a method to find fine-grained differences between LMs:

🧵1/9

09.06.2025 13:47 👍 72 🔁 21 💬 2 📌 2

NLP 4 Democracy - COLM 2025

📣 Super excited to organize the first workshop on ✨NLP for Democracy✨ at COLM @colmweb.org!!

Check out our website: sites.google.com/andrew.cmu.e...

Call for submissions (extended abstracts) due June 19, 11:59pm AoE

#COLM2025 #LLMs #NLP #NLProc #ComputationalSocialScience

21.05.2025 16:39 👍 47 🔁 18 💬 1 📌 6

Yes! tbh this method is probably much more immediately useful for helping one understand subtle differences between [models trained on] subtly different data subsets, vs a loftier goal of helping one find "the" best data mixture -- to anyone considering this method, please feel free to reach out :)

06.05.2025 04:16 👍 2 🔁 1 💬 0 📌 0

These days RAG systems have gotten popular for boosting LLMs—but they're brittle💔. Minor shifts in phrasing (✍️ style, politeness, typos) can wreck the pipeline. Even advanced components don’t fix the issue.

Check out this extensive eval by @neelbhandari.bsky.social and @tianyucao.bsky.social!

18.04.2025 01:49 👍 1 🔁 1 💬 0 📌 0

📖For our last @MilaNLProc lab seminar, it was a pleasure to have @akhilayerukola.bsky.social presenting "Need for Culturally Contextual Safety Guardrails: A Case Study in Non-Verbal Gestures".

14.03.2025 16:16 👍 8 🔁 3 💬 0 📌 0

🚀 New #ICLR2025 Paper Alert! 🚀

Can Audio Foundation Models like Moshi and GPT-4o truly engage in natural conversations? 🗣️🔊

We benchmark their turn-taking abilities and uncover major gaps in conversational AI. 🧵👇

📜: arxiv.org/abs/2503.01174

05.03.2025 16:03 👍 9 🔁 6 💬 1 📌 0

Check out Akhila's VERY cool work on culturally contextual hand gestures and how current systems (can't) handle them 🤖

26.02.2025 22:29 👍 6 🔁 2 💬 0 📌 0

My PhD student Akhila's been doing some incredible cultural work in the last few years! Check out our latest work on cultural safety and hand gestures, showing most vision and/or language AI systems are very cross-culturally unsafe!

26.02.2025 17:20 👍 27 🔁 3 💬 0 📌 0

Also, this work began while I interned with Nanyun Peng and @skgabrie.bsky.social at sunny UCLA under the guidance of my advisor @maartensap.bsky.social! Grateful for their mentorship throughout! 🙌

26.02.2025 18:29 👍 2 🔁 0 💬 0 📌 0

Special thanks to: @sunipadev.bsky.social, @841io.bsky.social, @nouhadziri.bsky.social, Jocelyn Shen, @shaily99.bsky.social, @simi97k.bsky.social, @vijaytarian.bsky.social, @apratapa.xyz for helpful discussions and feedback on this work!

26.02.2025 16:22 👍 1 🔁 0 💬 1 📌 0

Huge shoutout to my amazing collaborators: @skgabrie.bsky.social, Nanyun (Violet) Peng, @maartensap.bsky.social!!

26.02.2025 16:22 👍 2 🔁 0 💬 1 📌 0

🚀 I'm passionate about developing culturally contextual safety guardrails to make AI more sensitive and aware. If this work interests you, please feel free to reach out—I’d love to connect!

26.02.2025 16:22 👍 1 🔁 0 💬 1 📌 0

Mind the Gesture: Evaluating AI Sensitivity to Culturally Offensive Non-Verbal Gestures
Gestures are an integral part of non-verbal communication, with meanings that vary across cultures, and misinterpretations that can have serious social and diplomatic consequences. As AI systems becom...

For more interesting findings, please check out our preprint 📜 arxiv.org/abs/2502.17710

Data 📚 github.com/Akhila-Yeruk...

26.02.2025 16:22 👍 2 🔁 1 💬 1 📌 0

The cross-cultural safety risks aren’t theoretical – they’re already impacting several applications, such as:
✈️ AI-powered travel guides
🎭 AI-generated ad visuals
🤖 Automated content moderation
Culturally contextual safety guardrails are needed for AI systems!

26.02.2025 16:22 👍 1 🔁 0 💬 1 📌 0

🔬 Key Takeaway 🥉
All models—T2I, LLMs, and VLMs—exhibit US-centric biases, with higher accuracy in identifying offensive gestures in US contexts than in non-US ones (e.g., middle finger 🖕 in US vs UK)

26.02.2025 16:22 👍 0 🔁 0 💬 1 📌 0
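
One way to picture the US-centric gap described in this takeaway is as a difference in accuracy between US and non-US contexts. The snippet below is only a minimal sketch: the record layout, the function names (`accuracy`, `us_centric_gap`), and the toy rows are all made up for illustration and are not results or code from the paper.

```python
from typing import Dict, List


def accuracy(records: List[Dict]) -> float:
    """Fraction of records where the model's offensiveness call matches the label."""
    if not records:
        raise ValueError("need at least one record")
    correct = sum(r["predicted_offensive"] == r["is_offensive"] for r in records)
    return correct / len(records)


def us_centric_gap(records: List[Dict]) -> float:
    """Accuracy in US contexts minus accuracy in non-US contexts.

    A positive value means the model does better in US contexts.
    """
    us = [r for r in records if r["country"] == "US"]
    non_us = [r for r in records if r["country"] != "US"]
    return accuracy(us) - accuracy(non_us)


# Toy rows, invented purely to exercise the functions (NOT data from MC-SIGNS).
toy_records = [
    {"country": "US", "is_offensive": True, "predicted_offensive": True},
    {"country": "US", "is_offensive": False, "predicted_offensive": False},
    {"country": "A", "is_offensive": True, "predicted_offensive": False},
    {"country": "A", "is_offensive": False, "predicted_offensive": False},
    {"country": "B", "is_offensive": True, "predicted_offensive": False},
    {"country": "B", "is_offensive": True, "predicted_offensive": True},
]

gap = us_centric_gap(toy_records)
print(f"US minus non-US accuracy: {gap:.2f}")
```

The paper's actual metrics may be defined differently; this only shows the shape of the comparison.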

🔬 Key Takeaway 🥈
All models—T2I, LLMs, and VLMs—often default to US-centric interpretations of universal concepts (e.g., "good luck" → 🤞), overlooking the cultural variation in gestures used to express them

26.02.2025 16:22 👍 1 🔁 0 💬 1 📌 0

🔬 Key Takeaway 🥇
(a) T2I models struggle to reject offensive gestures. LLMs tend to overflag gestures as offensive. VLMs show mixed results, with some performing near chance and others over-flagging
(b) Adding scene context doesn't affect LLMs but worsens T2I and VLM performance

26.02.2025 16:22 👍 1 🔁 0 💬 1 📌 0

Table outlining different prompt formulations used to evaluate T2I (Text-to-Image), LLM (Large Language Model), and VLM (Vision-Language Model) responses to gestures, illustrated with the ‘fingers-crossed’ gesture in Vietnam. The table categorizes prompts into three conditions: (1) Explicit: Country – directly stating both ‘fingers-crossed’ and 'Vietnam', (2) Explicit: Country + Scene – adding contextual details such as a 'women’s community gathering,' and (3) Implicit Mention – referencing the gesture’s meaning ('wishing someone luck') without explicitly naming the gesture, while still mentioning Vietnam. The table also specifies evaluation metrics: RQ1 and RQ3 focus on rejection and offensiveness classification rates, while RQ2 measures error rates.

We assess how well T2I systems, LLMs, and VLMs understand cross-cultural gestures—revealing gaps in AI’s ability to navigate nonverbal communication safely. 💫

26.02.2025 16:22 👍 1 🔁 0 💬 1 📌 0
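
The three prompt conditions in the table (Explicit: Country, Explicit: Country + Scene, Implicit Mention) could be generated along these lines. This is a rough sketch: the template wording and the function name `build_prompts` are assumptions for illustration, not the prompts used in the paper. Only the example values (fingers-crossed, Vietnam, women's community gathering, wishing someone luck) come from the table description.

```python
from typing import Dict


def build_prompts(gesture: str, country: str, scene: str, meaning: str) -> Dict[str, str]:
    """Return one prompt per condition. Template wording is hypothetical."""
    return {
        # Names both the gesture and the country.
        "explicit_country": f"A person making the {gesture} gesture in {country}",
        # Adds scene context on top of gesture and country.
        "explicit_country_scene": (
            f"A person making the {gesture} gesture at a {scene} in {country}"
        ),
        # Mentions only the gesture's meaning, never the gesture itself.
        "implicit_mention": f"A person making a gesture for {meaning} in {country}",
    }


prompts = build_prompts(
    gesture="fingers-crossed",
    country="Vietnam",
    scene="women's community gathering",
    meaning="wishing someone luck",
)

for condition, text in prompts.items():
    print(f"{condition}: {text}")
```

Keeping the gesture name out of the implicit condition is the point of that condition: it tests which gesture a model reaches for by default.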

Table displaying examples of aggregated annotations from MC-SIGNS, listing gestures, their associated cultural meanings, contexts where they may be inappropriate, and their offensiveness ratings. The table includes gestures such as 'Horns' in Brazil (infidelity), 'Fig Sign' in Indonesia (female genitalia), and 'OK' in Turkey (homophobic). Each gesture is rated for offensiveness (Off/Obs) or hatefulness (Hate) based on annotations from five evaluators, with specific scenarios suggested for avoidance, such as public spaces, professional settings, or LGBTQ+ forums.

🌍 Introducing MC-SIGNS — a testbed of 288 gesture-country pairs across 25 gestures & 85 countries, carefully annotated by cultural experts for:
1️⃣Offensiveness – how inappropriate a gesture is
2️⃣Confidence score
3️⃣Cultural meaning – associated gloss
4️⃣Contextual factors – when/where it may be risky

26.02.2025 16:22 👍 1 🔁 0 💬 1 📌 0
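
For readers who want a mental model of the four annotation fields listed above, a single gesture-country pair might be represented roughly as follows. This is a sketch under assumptions: the class name, field names, and types are hypothetical and may not match the released data format. The example values come from the table description above ('Horns' in Brazil, glossed as infidelity); fields the post does not specify are left empty rather than guessed.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class GestureAnnotation:
    """One gesture-country pair. Field names are hypothetical."""

    gesture: str
    country: str
    offensiveness: Optional[str] = None      # 1) how inappropriate the gesture is
    confidence: Optional[float] = None       # 2) annotator confidence score
    cultural_meaning: Optional[str] = None   # 3) associated gloss
    contexts_to_avoid: List[str] = field(default_factory=list)  # 4) when/where it may be risky

    def key(self) -> Tuple[str, str]:
        """Identify the record by its (gesture, country) pair."""
        return (self.gesture, self.country)


# Only the gesture, country, and gloss are taken from the post;
# the remaining fields stay unset because the post does not give them.
example = GestureAnnotation(
    gesture="Horns",
    country="Brazil",
    cultural_meaning="infidelity",
)

print(example.key(), example.cultural_meaning)
```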

Why does this matter? 🤔
Humans can resolve such misunderstandings through social cues and context.
But AI? It generates STATIC content — ads 🎭, travel tips 🛫🏝️, and images 📸 — without accounting for the cross-cultural safety risks.

26.02.2025 16:22 👍 2 🔁 0 💬 1 📌 0

Figure showing that interpretations of gestures vary dramatically across regions and cultures. ‘Crossing your fingers,’ commonly used in the US to wish for good luck, can be deeply offensive to female audiences in parts of Vietnam. Similarly, the 'fig gesture,' a playful 'got your nose' game with children in the US, carries strong sexual connotations in Japan and can be highly offensive.

Did you know that gestures used to express universal concepts, like wishing for luck, vary DRAMATICALLY across cultures?
🤞 means luck in the US but is deeply offensive in Vietnam 🚨

📣 We introduce MC-SIGNS, a test bed to evaluate how LLMs/VLMs/T2I handle such nonverbal behavior!

📜: arxiv.org/abs/2502.17710

26.02.2025 16:22 👍 33 🔁 7 💬 1 📌 3

Coding agents often don’t ask follow-up clarifying questions 🤷‍♀️

But interactivity isn’t about asking more questions—it’s about asking better questions! 🤖💬

Check out this new work led by Sanidhya Vijay! www.linkedin.com/in/sanidhya-...

19.02.2025 20:34 👍 2 🔁 0 💬 0 📌 0

could I be added too?

22.11.2024 04:36 👍 1 🔁 0 💬 0 📌 0

🙋‍♀️

21.11.2024 00:48 👍 1 🔁 0 💬 0 📌 0
