Sophie Hao

@profsophie

Assistant professor of Linguistics and Data Science at Boston University. NLP, computational linguistics, interpretability, social bias and fairness. she/her. https://www.notaphonologist.com/

256
Followers
329
Following
7
Posts
20.11.2024
Joined

Latest posts by Sophie Hao @profsophie

Ashima Suvarna🌻 on X: "1/ 🧵 New #EMNLP2025 Paper !! Toxicity detection is subjective; shaped by norms, identity, & context. Existing models and dataset overlook this nuance. Enter MODELCITIZENS: a new dataset designed to address this. ✔️ 6.8K posts, 40K annotations across diverse groups ✔️"

(3/3) See full thread on X: x.com/suvarna_ashi...

26.08.2025 18:14 👍 0 🔁 0 💬 0 📌 0

(2/3) Toxicity detection is shaped by norms, identity, & context, which existing approaches overlook. Enter MODELCITIZENS: a new dataset designed to address this.
βœ”οΈ 6.8K posts, 40K annotations across diverse groups
βœ”οΈ Context-augmented scenarios
βœ”οΈ New fine-tuned models that beat GPT-4o-mini by 5.5%

26.08.2025 18:14 👍 0 🔁 0 💬 1 📌 0
ModelCitizens: Representing Community Voices in Online Safety
Automatic toxic language detection is critical for creating safe, inclusive online spaces. However, it is a highly subjective task, with perceptions of toxic language shaped by community norms and liv...

(1/3) Please check out our new paper with @skgabrie.bsky.social and her amazing students, to appear in #EMNLP2025!

(🚨 Offensive Content Warning)

arxiv.org/abs/2507.05455

26.08.2025 18:12 👍 2 🔁 1 💬 1 📌 0

I'm not personally attached to the generative linguistics apparatus per se, but I was asked by the journal to write this paper as a response to another paper, and that paper is primarily opining about the possible "end of (generative) linguistics as we know it."

14.04.2025 18:57 👍 1 🔁 0 💬 1 📌 0

I didn't say that social relevance will guarantee generative linguistics's survival (note that there is a subtle difference between "theoretical" and "generative"), but rather that social irrelevance will likely guarantee its demise.

14.04.2025 18:53 👍 1 🔁 0 💬 1 📌 0

I'm glad you liked it! (I am the author)

There are a couple of points of incommensurability between your reaction and my intentions in writing this piece, which I'll explain below.

14.04.2025 18:51 👍 1 🔁 0 💬 0 📌 0

I keep thinking "Bluesky" is a Slavic patronymic

16.12.2024 05:42 👍 4 🔁 0 💬 0 📌 0
Screenshot of the paper title "What Goes Into a LM Acceptability Judgment? Rethinking the Impact of Frequency and Length"

💬 Have you or a loved one compared LM probabilities to human linguistic acceptability judgments? You may be overcompensating for the effect of frequency and length!
🌟 In our new paper, we rethink how we should be controlling for these factors 🧵:

20.11.2024 18:07 👍 84 🔁 19 💬 1 📌 4