
Melanie Sclar

@melaniesclar

PhD student @uwnlp.bsky.social @uwcse.bsky.social | Visiting Researcher @MetaAI FAIR | Prev. Lead ML Engineer @ASAPP | πŸ‡¦πŸ‡·

868 Followers · 69 Following · 8 Posts · Joined 11.11.2024

Latest posts by Melanie Sclar @melaniesclar

Check out our work on preference modeling through latent (& interpretable) attribute representation learning!

PrefPalette allows you to understand _why_ something is preferred and _how_ preference varies depending on context 🎨

22.07.2025 19:52 πŸ‘ 4 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

WHY do you prefer one thing over another?

Reward models treat preference as a black box πŸ˜Άβ€πŸŒ«οΈ but human brains 🧠 decompose decisions into hidden attributes

We built the first system to mirror how people really make decisions in our recent COLM paper 🎨PrefPalette✨

Why it matters πŸ‘‰πŸ»πŸ§΅

22.07.2025 14:58 πŸ‘ 7 πŸ” 2 πŸ’¬ 1 πŸ“Œ 2
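For intuition, here is a minimal sketch of attribute-decomposed preference scoring. The attribute names, weights, and the random stand-in scorer are all illustrative assumptions, not PrefPalette's actual models:

```python
import numpy as np

# Hypothetical attribute set; the real system's attributes differ.
ATTRIBUTES = ["helpfulness", "politeness", "humor", "directness"]

def attribute_scores(text: str) -> np.ndarray:
    """Stand-in for learned per-attribute predictors: one score per attribute.
    In a real system each value would come from a trained model head."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.uniform(0.0, 1.0, size=len(ATTRIBUTES))

def prefer(text_a: str, text_b: str, context_weights: np.ndarray) -> str:
    """Score each candidate as a context-weighted sum over attributes, so the
    per-attribute terms explain *why* one response wins in *this* context."""
    score_a = float(context_weights @ attribute_scores(text_a))
    score_b = float(context_weights @ attribute_scores(text_b))
    return "A" if score_a >= score_b else "B"

# A context that values humor can flip the preference a politeness-heavy
# context would give; the weight vector makes the reason explicit.
humor_context = np.array([0.3, 0.1, 0.5, 0.1])
print(prefer("witty reply", "formal reply", humor_context))
```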

See our work on procedurally generating challenging reasoning problems for detecting inconsistencies in stories! FlawedFictions is a great example of what I'm most excited about: reliable synthetic data for reasoning in under-explored domains.

(I'm at ICLR and happy to chat, DMs open!)

24.04.2025 02:26 πŸ‘ 3 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
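To illustrate the procedural-generation idea, here is a toy sketch that injects a single contradiction into a list of facts and records where it is. The fact base and helpers are made up for this example; FlawedFictions perturbs full stories, not simple fact lists:

```python
import random

# Toy facts and contradiction table; purely illustrative.
FACTS = [
    ("Mara", "owns", "a red bicycle"),
    ("Mara", "lives in", "Lisbon"),
    ("the shop", "closes at", "6 pm"),
]
CONTRADICTIONS = {"a red bicycle": "no bicycle", "Lisbon": "Porto", "6 pm": "noon"}

def make_flawed_story(facts, rng):
    """Build a (story, flaw_index) pair: state every fact, then re-assert one
    of them with a contradictory value, creating a detectable plot hole."""
    lines = [f"{s} {v} {o}." for s, v, o in facts]
    i = rng.randrange(len(facts))
    s, v, o = facts[i]
    lines.append(f"Later we learn that {s} {v} {CONTRADICTIONS[o]}.")
    return " ".join(lines), i

story, flaw_idx = make_flawed_story(FACTS, random.Random(0))
print(story)                           # story with one injected inconsistency
print("flawed fact index:", flaw_idx)  # gold label for the detection task
```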

Excited to be at #ICLR2025 🀩

I'll be giving an oral presentation for Creativity Index on Fri 25th at 11:06, Garnet 212&219 πŸŽ™οΈ

I'll also be presenting posters:
πŸ“ExploreToM, Sat 26th 10:00, Hall 3 + 2B #49
πŸ“CreativityIndex, Fri 25th 15:00, Hall 3 + 2B #618

Hope to see you there!

24.04.2025 02:25 πŸ‘ 8 πŸ” 1 πŸ’¬ 0 πŸ“Œ 0
A screenshot of the first page of the paper, showing the title, Finding Flawed Fictions: Evaluating Complex Reasoning in Language Models via Plot Hole Detection, and the authors: Kabir Ahuja, Melanie Sclar, and Yulia Tsvetkov. All three authors are from the CSE department at the University of Washington in Seattle, USA, and can be reached at {kahuja,msclar,yuliats}@cs.washington.edu

πŸ“’ New Paper!

Tired 😴 of reasoning benchmarks full of math & code? In our work we consider the problem of reasoning about plot holes in stories -- inconsistencies in a storyline that break the internal logic or rules of a story's world 🌎

w/ @melaniesclar.bsky.social and @tsvetshop.bsky.social

1/n

22.04.2025 18:50 πŸ‘ 10 πŸ” 4 πŸ’¬ 1 πŸ“Œ 1

🚨New Paper! o3-mini and R1 seem to excel at math & coding. But how good are they in domains where verifiable rewards are not easily available, such as theory of mind (ToM)? Do they show similar behavioral patterns? πŸ€” What if I told you it's... interesting, like the thread below? 🧡

20.02.2025 17:34 πŸ‘ 22 πŸ” 5 πŸ’¬ 3 πŸ“Œ 1

Would love to be added, thank you!

25.11.2024 02:41 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Great point! In general, higher temperature does lead to a higher Creativity Index, but compared to the gap between humans and LLMs, the improvement is minimal. We only tried temperatures in the usual [0, 1] range. @gximing.bsky.social will be able to share many more details!

23.11.2024 20:53 πŸ‘ 3 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
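For readers unfamiliar with the knob being discussed, this is standard temperature sampling. The function below is a generic textbook sketch, not code from the paper:

```python
import numpy as np

def sample_with_temperature(logits: np.ndarray, temperature: float) -> int:
    """Standard temperature sampling: divide logits by T before the softmax.
    T near 0 approaches greedy decoding; larger T flattens the distribution,
    which is why higher T tends to produce more novel text (and a higher
    Creativity Index), at some cost to fluency."""
    scaled = logits / max(temperature, 1e-8)
    probs = np.exp(scaled - scaled.max())  # subtract max for numerical stability
    probs /= probs.sum()
    return int(np.random.choice(len(logits), p=probs))

logits = np.array([2.0, 1.0, 0.5])
print(sample_with_temperature(logits, 0.2))  # near-greedy
print(sample_with_temperature(logits, 1.0))  # standard sampling
```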
A screenshot of the linked paper's Figure 1. The figure is a fairly complicated three-column figure, but in essence it sketches how the authors compare LLM-generated sequences and human-written text against the pretraining data. Humans write more novel n-gram sequences.

LLMs generate novel word sequences not contained in their pretraining data. However, compared to humans, models generate significantly fewer novel n-grams.

RLHF = 30% *more* copying than base!

Awesome work from the awesome Ximing Lu (gloriaximinglu.github.io) et al. 🀩

arxiv.org/pdf/2410.04265

22.11.2024 06:14 πŸ‘ 314 πŸ” 46 πŸ’¬ 6 πŸ“Œ 2
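As a toy illustration of the n-gram novelty idea: the paper matches against web-scale data, while this sketch uses a tiny in-memory corpus and is not the paper's actual algorithm:

```python
def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def novel_ngram_fraction(text, corpus_texts, n=5):
    """Fraction of the text's word n-grams that never occur in the reference
    corpus. Higher means more linguistically novel relative to that corpus."""
    corpus = set()
    for doc in corpus_texts:
        corpus.update(ngrams(doc.lower().split(), n))
    grams = ngrams(text.lower().split(), n)
    if not grams:
        return 0.0
    return sum(g not in corpus for g in grams) / len(grams)

corpus = ["the cat sat on the mat and looked at the moon"]
print(novel_ngram_fraction("the cat sat on the mat and purred loudly", corpus, n=3))
```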

Are LLMs πŸ€– as creative as humans πŸ‘©β€πŸŽ“? Not quite!

Introducing CREATIVITY INDEX: a metric that quantifies the linguistic creativity of a text by reconstructing it from existing text snippets on the web. Spoiler: professional human writers like Hemingway are still far more creative than LLMs! 😲

22.11.2024 02:00 πŸ‘ 43 πŸ” 6 πŸ’¬ 3 πŸ“Œ 3

Does last name count? πŸ˜›

21.11.2024 10:06 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Would love to be added, thank you!!

21.11.2024 06:39 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Would love to be on the list, thank you so much for making this happen!

19.11.2024 06:57 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0