(1/3) Please check out our new paper with @skgabrie.bsky.social and her amazing students, to appear in #EMNLP2025!
(🚨 Offensive Content Warning)
arxiv.org/abs/2507.05455
(2/3) Toxicity detection is shaped by norms, identity, & context, which existing approaches overlook. Enter MODELCITIZENS: a new dataset designed to address this.
✔️ 6.8K posts, 40K annotations across diverse groups
✔️ Context-augmented scenarios
✔️ New fine-tuned models that beat GPT-4o-mini by 5.5%
(3/3) See full thread on X: x.com/suvarna_ashi...
I'm not personally attached to the generative linguistics apparatus per se, but I was asked by the journal to write this paper as a response to another paper, and that paper is primarily opining about the possible "end of (generative) linguistics as we know it."
I didn't say that social relevance will guarantee generative linguistics' survival (note that there is a subtle difference between "theoretical" and "generative"), but rather that social irrelevance will likely guarantee its demise.
I'm glad you liked it! (I am the author)
There are a couple of points of incommensurability between your reaction and my intentions in writing this piece, which I'll explain below.
I keep thinking "Bluesky" is a Slavic patronymic
Screenshot of the paper title "What Goes Into a LM Acceptability Judgment? Rethinking the Impact of Frequency and Length"
💬 Have you or a loved one compared LM probabilities to human linguistic acceptability judgments? You may be overcompensating for the effect of frequency and length!
📄 In our new paper, we rethink how we should be controlling for these factors 🧵: