Powerful LLMs and agent workflows have led to a whole lot of very specific "we did a thing" papers. How are people evaluating these?
Screenshot of a plot showing ELO vs. parameter count for different OCR models
There is no best VLM OCR model - rankings can flip completely by document type.
I built ocr-bench: run open OCR models on YOUR documents, get a per-collection leaderboard.
VLM-as-judge with Bradley-Terry ELO, all running on @hf.co. No local GPU needed.
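ocr-bench's exact scoring code isn't shown here, but as a rough sketch of the idea, pairwise judge verdicts can be folded into Elo-style ratings per document collection. Everything below (function names, K-factor, starting rating) is a hypothetical illustration, not ocr-bench's implementation:

```python
def elo_update(r_a, r_b, winner, k=32.0):
    """One Elo update from a single pairwise judge verdict.
    winner: 'a' or 'b', whichever model's output the judge preferred."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    score_a = 1.0 if winner == "a" else 0.0
    r_a += k * (score_a - expected_a)
    r_b += k * ((1.0 - score_a) - (1.0 - expected_a))
    return r_a, r_b

def rank(judgments, models, rounds=20):
    """Build a per-collection leaderboard from (model_a, model_b, winner)
    triples, replaying the verdicts for several rounds so ratings settle."""
    ratings = {m: 1000.0 for m in models}
    for _ in range(rounds):
        for a, b, winner in judgments:
            ratings[a], ratings[b] = elo_update(
                ratings[a], ratings[b], "a" if winner == a else "b")
    return sorted(ratings.items(), key=lambda kv: -kv[1])
```

Because ratings come only from head-to-head comparisons on one collection, the leaderboard is local to that collection, which is exactly why rankings can flip across document types.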
The center of gravity in NLP is shifting.
This year's #EMNLP2026 Special Theme is "New Missions for NLP Research." We welcome empirical, theoretical, or position and survey papers that reframe our collective research goals.
Find out more:
2026.emnlp.org/calls/main_c...
Excellent venue for computational humanities work, colocated with ACL in San Diego on July 6. Please share!
Lots of museums have good datasets. What methods are they learning?
Here is the announcement for Cornell's talk
NLP4DH 2026 deadline has been extended to March 13! Submission link here: openreview.net/group?id=NLP...
Come join TRAILS as a postdoc at UMD (and work w folks at GW, MSU & Cornell) to conduct research and scholarship focused on approaches to AI that advance trust and trustworthiness with a great group of colleagues!
go.umd.edu/trails-postd...
Summer/Fall 2026 start
Fork away!
My favorite finding: "Surprisingly, a minimal set of eight words is sufficient to obtain 0.74 AUC on the training and test sets without any degradation in test performance. These words are of, in, to, had to indicate longer duration and you, said, it, he to indicate shorter duration."
I got frustrated copying quotes from PDFs with line breaks, and used Claude to make this little tool: mimno.github.io/copyoneline/
Paste text into the box, it removes newlines and puts the result back in your clipboard, adding quotation marks if desired.
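The tool itself runs in the browser, but the core transformation is simple. Here is a Python sketch of the same idea; the de-hyphenation step is my own assumption about how PDF line breaks are best handled and may not match the tool's actual behavior:

```python
import re

def copy_one_line(text, add_quotes=False):
    """Collapse hard line breaks from a PDF copy-paste into one line."""
    # Re-join words hyphenated across a line break (assumed behavior).
    text = re.sub(r"-\s*\n\s*", "", text)
    # Collapse remaining newlines, and the spaces around them, to one space.
    text = re.sub(r"\s*\n\s*", " ", text).strip()
    return f'"{text}"' if add_quotes else text
```

For example, `copy_one_line("inter-\nnational\nlaw", add_quotes=True)` yields `"international law"` with the quotation marks included.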
1/
New preprint! Can stories make arguments more persuasive? And which narrative features matter most? In ARGUS we build a framework to study this in Reddit's r/ChangeMyView, with @saranabhani.bsky.social, Khalid Al-Khatib, and @malvinanissim.bsky.social
arxiv.org/abs/2602.24109
Yes! Many libraries have a good process for depositing a zip file, but an interactive page isn't their strength. And having a thing people can cite is great for academic visibility.
Text-based data files in a GitHub repository with an AI-prototyped web front end running on GitHub Pages?
Not an archival solution, but a good compromise of user access, data access, and dev cost.
Do you want to improve your knowledge of medieval manuscripts from England? Book now for this summer school course, in person, in London, 8-12 June. #medievalsky Please repost!
palaeography.uk/study/short-...
A picture of Joe Halpern smiling in a green shirt in front of a blue background.
Today arXiv remembers our colleague Joe Halpern, who was instrumental in founding arXiv's CS section.
Joe's passions ranged far & wide and we're lucky that arXiv was one of them. Joe, thank you for giving so much to arXiv - you are missed.
blog.arxiv.org/2026/02/27/remembering-joe-halpern
One mismatch I see: institutional AI efforts seem focused on enabling model training, but most of the applications I see are inference. Batch uses like "upload a spreadsheet of prompts, get results back in a few hours" or "apply this prompt to these volumes and return a spreadsheet".
Circular rainbow zen pic
What do you have in mind? Libraries need exciting initiatives. Mostly it's "how do we deal with this year's budget cut" and "how do we deal with the latest demands from publishers"
… or build it out of heavy, low-value, durable materials in a remote and inaccessible part of a vast desert
Building benchmarks is only one way scholars can help steer AI development. We can also measure the effects of AI on students, build better datasets, or tune new open models.

Openness itself could be our most important contribution. Universities have huge libraries, and the legal doctrine of fair use should protect models trained on those collections for a nonprofit educational purpose. At the moment, we are not pressing this advantage. Higher education has been so cautious about fair use that the private sector can now train more freely on our libraries (via Google Books) than is possible for academic AI researchers.

We need to be bolder: it is our duty to ensure library collections remain open to the public in a form that empowers 21st-century readers. If our intellectual heritage gets enclosed in proprietary tools, we will find ourselves making the same bad bargain we made with scientific publishers, who sell our own research back to us at a steep markup.
We're in a strange situation rn where Google can train freely on books from university libraries, but researchers *at* universities have limited access. I'm optimistic this can be fixed, but if you're in admin or working at a foundation, please know: univs are failing here & resources are needed.
Only a few more days (full consideration deadline: March 1) to apply to our lecturer position at @cornelltech.bsky.social!
Students have perceived problem sets as real and reading as optional because for psets it's much easier to verify that they did something. Ironically, AI has leveled the playing field to a new low baseline.
Announcing the March Sadness 1990s edition bracket
marchxness.com - brackets due no later than midnight 2/28/26
Fun game for the history nerds! Note that you'll need to specify CE or BCE sometimes. I got within twenty years for nine out of ten but was out by a century for the other, which I'm feeling slightly sheepish about.
when-was-this-war.web.app
I've been getting a lot of questions recently about optimizing annotation workflows - many new NLP projects are starting atm!
To share some of our tips, I put together a blog post featuring examples inspired by real use cases and a checklist to help you get started.
explosion.ai/blog/optimiz...
US copyright only applies to works with human authors, so a terms-of-service violation is the only possible complaint that Anthropic can make, right?
Can artists and authors impose terms of service? Or are they restricted to copyright?
On Tuesday, Feb. 24 at 1PM eastern, we're hosting a second webinar for the @schmidtsciences.bsky.social HAVI program. Feel free to jump on if you'd like to learn more about the program. Register here: schmidtentities.zoom.us/webinar/regi...
HIRING
The I School invites applications for up to three new full-time Bellwether Postdoctoral Scholars to start as soon as July 2026!
This program will allow researchers to develop their own research while collaborating with leading faculty.
Next review date is Feb 28! #academicsky
Nice write-up by @uwnews.uw.edu about our research into the most read canonical American authors in Seattle, drawing on library data.
It was so fun to work on this project with @neel2112.bsky.social and a stellar group of undergraduate students.
www.washington.edu/news/2026/01...