Sarah Griebel

@sgriebel

IS PhD student at UIUC

122 Followers · 129 Following · 3 Posts · Joined 22.11.2024

Latest posts by Sarah Griebel @sgriebel

This new study combines continued pretraining of Qwen2.5 on historical documents with supervised fine-tuning and reinforcement learning to produce a more historically accurate CoT-tuned model. Cool methods! arxiv.org/pdf/2504.09488

26.05.2025 21:26
Can Language Models Represent the Past without Anachronism? Before researchers can use language models to simulate the past, they need to understand the risk of anachronism. We find that prompting a contemporary model with examples of period prose does not pro...

New preprint from @lauraknelson.bsky.social, @mattwilkens.bsky.social, and me tests different ways of simulating the past with LLMs. We don't fully answer the title question here; we just show that simple strategies based on prompting and fine-tuning are insufficient. +

02.05.2025 12:47
Table 1 from the paper. The table description reads: R² for different representations of text on different social variables. 0.25 indicates that documents were represented by the quartile of passages with highest precocity; 1.0 indicates that they were represented by all passages.

In every case, we find that the most pioneering quarters of the texts correspond most closely with social evidence.
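As a rough illustration of the difference between the 0.25 and 1.0 representations in Table 1, here is a minimal Python sketch. It assumes each passage already has a numeric precocity score (higher means more forward-looking) and that a document's score is the mean over its selected passages; the function name `document_precocity` and the mean aggregation are hypothetical choices for illustration, not taken from the paper.

```python
import math
from statistics import mean


def document_precocity(passage_scores, fraction=0.25):
    """Summarize a document by its most precocious passages.

    Hypothetical sketch: keeps the top `fraction` of passages by score
    (fraction=1.0 keeps every passage) and averages them. The mean is an
    assumed aggregation, not necessarily the one used in the paper.
    """
    if not passage_scores:
        raise ValueError("document has no passages")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Number of passages to keep; always at least one.
    k = max(1, math.ceil(len(passage_scores) * fraction))
    # Highest-scoring (most forward-looking) passages first.
    top = sorted(passage_scores, reverse=True)[:k]
    return mean(top)


# Made-up scores for an eight-passage document.
scores = [0.1, 0.4, 0.9, 0.2, 0.8, 0.3, 0.05, 0.6]
print(document_precocity(scores, 0.25))  # mean of the top two passages
print(document_precocity(scores, 1.0))   # mean of all eight passages
```

With the top-quartile setting, a document with a few strikingly forward-looking passages can score high even if the rest of it is conventional, which is the intuition behind the "most forward-looking moments" reading of the results.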

Here's a link to our paper: arxiv.org/abs/2411.15068

26.11.2024 21:30
Abstract: Measures of textual similarity and divergence are increasingly used to study cultural change. But which measures align, in practice, with social evidence about change? We apply three different representations of text (topic models, document embeddings, and word-level perplexity) to three different corpora (literary studies, economics, and fiction). In every case, works by highly cited authors and younger authors are textually ahead of the curve. We don't find clear evidence that one representation of text is to be preferred over the others. But alignment with social evidence is strongest when texts are represented through the top quartile of passages, suggesting that a text's impact may depend more on its most forward-looking moments than on sustaining a high level of innovation throughout.

There are many ways to identify texts that seem ahead of their time. Our CHR 2024 paper asks which measures of textual precocity align best with social evidence about influence and change.

26.11.2024 21:25