LLMs are increasingly used as agents for decisions under uncertainty, e.g. medical diagnosis. But do they act like rational agents with coherent beliefs and preferences? Much of the difficulty is telling whether a model's response to a prompt ("What is the probability of X?") is a "real" belief.
09.02.2026 22:10
In applications like medical diagnosis, the answer is... sometimes! In some settings, we can prove that no rational agent could hold the beliefs expressed by the model. But in others, particularly for stronger models, outputs are close to consistent with rational belief.
09.02.2026 22:10
We give a framework to test whether the model's stated belief functions *as if it were* a rational agent's subjective probability by comparing with its decisions. We give empirically checkable conditions that don't require any assumptions about the model's "utility function".
09.02.2026 22:10
You might think that models don't have coherent beliefs at all. Or, you might think that they don't report truthfully in response to any given prompt. How could we possibly tell?
09.02.2026 22:10
Totally agree! I think the fundamental distinction is more between people using AI in their own work vs AI being in a decision-making role that everyone is subject to
05.12.2025 22:07
As UKRI explores using LLMs to review grants, it's a good time to revisit Bryan Wilder's excellent blog post.
There are a lot of naive reasons to oppose AI review ("you'll never automate human intuition!"). But there are also good reasons, including the *load-bearing role of human disagreement.*
05.12.2025 15:25
Come talk to me and Angela at NeurIPS on Friday! We argue that "AI for social impact" needs to get more rigorous about evaluating deployments of AI, but also that there are many other forms of impact that get overlooked right now
01.12.2025 19:41
I gave talks at MIT and Harvard this week about "Science with synthetic data". How can generative models help us learn about the actual world (e.g., social systems) in a principled way? Lots of interesting conversations -- more convinced than ever that there are nuanced issues to navigate here.
14.11.2025 19:02
I'm recruiting students this upcoming cycle at UIUC! I'm excited about Qs on societal impact of AI, especially human-AI collaboration, multi-agent interactions, incentives in data sharing, and AI policy/regulation (all from both a theoretical and applied lens). Apply through CS & select my name!
06.11.2025 18:52
How can synthetic data from LLMs be used, e.g. for social science, in a principled way? Check out Emily's thread on our NeurIPS paper! Generating paired real-synthetic samples and using both in a method-of-moments framework enables valid inference that benefits when synthetic data is informative.
10.10.2025 16:39
Urban Data Science & Equitable Cities | EAAMO Bridges
EAAMO Bridges Urban Data Science & Equitable Cities working group: biweekly talks, paper studies, and workshops on computational urban data analysis to explore and address inequities.
Are you a researcher using computational methods to understand cities?
@mfranchi.bsky.social @jennahgosciak.bsky.social and I organize an EAAMO Bridges working group on Urban Data Science and we are looking for new members!
Fill in the interest form on our page: urban-data-science-eaamo.github.io
03.09.2025 15:05
Screenshot of paper abstract, with text: "A core ethos of the Economics and Computation (EconCS) community is that people have complex private preferences and information of which the central planner is unaware, but which an appropriately designed mechanism can uncover to improve collective decisionmaking. This ethos underlies the community's largest deployed success stories, from stable matching systems to participatory budgeting. I ask: is this choice and information aggregation "worth it"? In particular, I discuss how such systems induce heterogeneous participation: those already relatively advantaged are, empirically, more able to pay time costs and navigate administrative burdens imposed by the mechanisms. I draw on three case studies, including my own work: complex democratic mechanisms, resident crowdsourcing, and school matching. I end with lessons for practice and research, challenging the community to help reduce participation heterogeneity and design and deploy mechanisms that meet a "best of both worlds" north star: use preferences and information from those who choose to participate, but provide a "sufficient" quality of service to those who do not."
New piece, out in the Sigecom Exchanges! It's my first solo-author piece, and the closest thing I've written to being my "manifesto." #econsky #ecsky
arxiv.org/abs/2507.03600
11.08.2025 13:25
Call for Posters
We seek poster contributions from different fields that offer insights into the intersectional design and impacts of algorithms, optimization, and mechanism design with a grounding in the social scien...
Submit an abstract to present a poster at EAAMO, deadline July 25! EAAMO is one of my favorite conferences, and a great place for anyone working on ML/algorithms/optimization in social settings. The conference is in Pittsburgh this November.
conference.eaamo.org/cfp/call_for...
16.07.2025 14:57
My takeaway is that algorithm designers should think more broadly about the goals for algorithms in policy settings. It's tempting to just train ML models to maximize predictive performance, but services might be improved a lot with even modest alterations for other goals.
08.07.2025 14:59
Using historical data from human services, we then look at how severe learning-targeting tradeoffs really are. It turns out, not that bad! We get most of the possible targeting performance while giving up only a little bit of learning compared to the ideal RCT.
08.07.2025 14:59
We introduce a framework for designing allocation policies that optimally trade off between targeting high-need people and learning a treatment effect as accurately as possible. We give efficient algorithms and finite-sample guarantees using a duality-based characterization of the optimal policy.
08.07.2025 14:59
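The targeting-vs-learning tradeoff described in this thread can be illustrated with a deliberately toy objective (this is not the paper's duality-based algorithm; the functional form and parameter `lam` are my own illustration): each person's treatment probability trades off targeting value against a standard variance proxy for treatment-effect estimation.

```python
# Toy illustration, NOT the paper's algorithm: choose each person's
# treatment probability p_i to maximize
#     p_i * risk_i  -  lam / (p_i * (1 - p_i)),
# where the second term is the per-person variance contribution of an
# IPW-style treatment-effect estimator. lam ~ 0 pushes high-risk people
# toward near-deterministic treatment (pure targeting); large lam pushes
# everyone toward p_i = 0.5, i.e. a classic RCT.
import numpy as np

def tradeoff_policy(risk, lam, grid=None):
    if grid is None:
        grid = np.linspace(0.05, 0.95, 91)  # keep probabilities bounded away from 0/1
    # The objective is separable across people, so optimize each p_i alone.
    probs = []
    for r in risk:
        obj = grid * r - lam / (grid * (1.0 - grid))
        probs.append(grid[np.argmax(obj)])
    return np.array(probs)

risk = np.array([0.9, 0.5, 0.1])
print(tradeoff_policy(risk, lam=0.01))  # targeting-leaning: higher risk, higher p
print(tradeoff_policy(risk, lam=1.0))   # learning-leaning: everyone near 0.5
```

Even in this toy form, the qualitative finding from the thread shows up: a small learning weight leaves treatment probabilities ordered by risk, so most targeting value is retained while randomization is preserved.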
A big factor is that randomizing conflicts with the targeting goal: running an RCT means that people with high predicted risk won't get prioritized for treatment. We wanted to know how sharp the tradeoff really is: does learning treatment effects require giving up on targeting entirely?
08.07.2025 14:59
These days, public services are often targeted with predictive algorithms. Targeting helps prioritize people who might be most in need. But, we don't typically have good causal evidence about whether the program we're targeting actually improves outcomes. Why not run RCTs?
08.07.2025 14:59
Learning treatment effects while treating those in need
Many social programs attempt to allocate scarce resources to people with the greatest need. Indeed, public services increasingly use algorithmic risk assessments motivated by this goal. However, targe...
Excited to share that our paper "Learning treatment effects while treating those in need" received the exemplary paper award for AI at EC 2025! This paper grew out of collaborations with Allegheny County's human services department and my co-author Pim Welle (at ACDHS).
arxiv.org/abs/2407.07596
08.07.2025 14:59
Excited to have this work out at ICML this year! Do LLMs make correlated errors? Yes: models from the same company make more correlated errors, and so do more accurate/later-generation models, increasing algorithmic monoculture.
arxiv.org/abs/2506.07962
03.07.2025 13:06
Still thinking about this post. The broader point, which should resonate way beyond the specific issue of "peer review," is that human disagreement is not friction and waste. It's a load-bearing, functional part of social and intellectual systems.
28.05.2025 13:41
I don't know one way or another, but it's at least a clearer capability to benchmark. And, if an LLM *could* summarize existing papers well enough, arXiv using it for lower-bar moderation decisions wouldn't distort paper-writing in the future.
27.05.2025 14:24
Thoughtful take on one aspect of the increasing problem of LLMs leading to "centralization" of thought/writing/etc.
26.05.2025 23:45
The arXiv summarization use case sounds a lot more sensible. Clear value judgment specified up-front, not outsourced to the LLM: papers should have easily summarized claims and evidence. Resulting incentives for authors seem ok (making sure LLMs can at least parse the paper probably isn't bad).
27.05.2025 02:13