
Bryan Wilder

@brwilder

Assistant Professor at Carnegie Mellon. Machine Learning and social impact. https://bryanwilder.github.io/

1,206
Followers
218
Following
25
Posts
14.11.2024
Joined

Latest posts by Bryan Wilder @brwilder

LLMs are increasingly used as agents for decisions under uncertainty, e.g. medical diagnosis. But do they act like rational agents with coherent beliefs and preferences? Much of the difficulty is telling whether a model's response to a prompt ("What is the probability of X?") is a "real" belief.

09.02.2026 22:10 πŸ‘ 3 πŸ” 1 πŸ’¬ 1 πŸ“Œ 0
Do LLMs Act Like Rational Agents? Measuring Belief Coherence in Probabilistic Decision Making Large language models (LLMs) are increasingly deployed as agents in high-stakes domains where optimal actions depend on both uncertainty about the world and consideration of utilities of different out...

Paper here: arxiv.org/abs/2602.06286. Led by my excellent PhD student Khurram Yamin

09.02.2026 22:10 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

In applications based on medical diagnosis, the answer is... sometimes! In some settings, we can prove that no rational agent could hold the beliefs expressed by the model. But in others, particularly for stronger models, outputs are close to consistent with rational beliefs.

09.02.2026 22:10 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

We give a framework to test whether the model's stated belief functions *as if it were* a rational agent's subjective probability, by comparing it with the model's decisions. We derive empirically checkable conditions that don't require any assumptions about the model's "utility function".

09.02.2026 22:10 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
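A minimal sketch of the kind of consistency check the thread describes (the function and the decision setup are illustrative assumptions, not the paper's actual framework): an expected-utility agent with subjective probability p of a bad outcome should make treat/no-treat decisions that are monotone in the treatment's cost, with the switch point bracketing p, and this can be checked without knowing the agent's utilities.

```python
def coherent_with_belief(p, decisions):
    """Check whether binary treat/no-treat decisions at varying
    cost thresholds could come from an expected-utility agent
    whose subjective probability of the bad outcome is p.

    decisions: dict mapping a normalized cost threshold c in
    (0, 1) to True (treat) / False (don't treat). A rational
    agent treats exactly when p >= c, so choices must be
    monotone in c and the implied switch interval must
    contain the stated p. (Illustrative sketch only.)
    """
    treat_costs = [c for c, d in decisions.items() if d]
    skip_costs = [c for c, d in decisions.items() if not d]
    # Monotonicity: every cost at which the agent treats must be
    # no larger than every cost at which it declines.
    if treat_costs and skip_costs and max(treat_costs) > min(skip_costs):
        return False
    # The stated belief must lie in the implied switch interval.
    lo = max(treat_costs, default=0.0)
    hi = min(skip_costs, default=1.0)
    return lo <= p <= hi
```

For example, stating p = 0.2 while treating at cost 0.3 is flagged as incoherent, since treating there implies a belief of at least 0.3.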

You might think that models don't have coherent beliefs at all. Or, you might think that they don't report truthfully in response to any given prompt. How could we possibly tell?

09.02.2026 22:10 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0


Totally agree! I think the fundamental distinction is more between people using AI in their own work vs AI being in a decision-making role that everyone is subject to

05.12.2025 22:07 πŸ‘ 2 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

As UKRI explores using LLMs to review grants, it's a good time to revisit Bryan Wilder's excellent blog post.

There are a lot of naive reasons to oppose AI review ("you'll never automate human intuition!"). But there are also good reasons, including the *load-bearing role of human disagreement.*

05.12.2025 15:25 πŸ‘ 17 πŸ” 3 πŸ’¬ 4 πŸ“Œ 0

Come talk to me and Angela at NeurIPS on Friday! We argue that "AI for social impact" needs to get more rigorous about evaluating deployments of AI, but also that there are many other forms of impact that get overlooked right now

01.12.2025 19:41 πŸ‘ 9 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Valid Inference with Imperfect Synthetic Data Predictions and generations from large language models are increasingly being explored as an aid in limited data regimes, such as in computational social science and human subjects research. While pri...

Based on joint work led by @yewonbyun.bsky.social, with @donskerclass.bsky.social. See our NeurIPS paper, arxiv.org/abs/2508.06635, for more!

14.11.2025 19:02 πŸ‘ 5 πŸ” 0 πŸ’¬ 0 πŸ“Œ 1

I gave talks at MIT and Harvard this week about "Science with synthetic data". How can generative models help us learn about the actual world (e.g., social systems) in a principled way? Lots of interesting conversations -- more convinced than ever that there are nuanced issues to navigate here.

14.11.2025 19:02 πŸ‘ 6 πŸ” 1 πŸ’¬ 1 πŸ“Œ 0

I’m recruiting students this upcoming cycle at UIUC! I’m excited about Qs on societal impact of AI, especially human-AI collaboration, multi-agent interactions, incentives in data sharing, and AI policy/regulation (all from both a theoretical and applied lens). Apply through CS & select my name!

06.11.2025 18:52 πŸ‘ 41 πŸ” 18 πŸ’¬ 1 πŸ“Œ 0
Call for Proposals: Host the 2026 ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization! EAAMO is seeking proposals from universities, institutes and other appropriate venues interested in hosting the 2026 ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization (AC...

We're in the process of selecting the location for next year's ACM EAAMO conference! If you're interested in bringing the EAAMO community to your institution, please check out the open call here and get in touch. conference.eaamo.org/call_for_loc...

28.10.2025 14:58 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

How can synthetic data from LLMs be used, e.g. for social science, in a principled way? Check out Emily's thread on our NeurIPS paper! Generating paired real-synthetic samples and using both in a method-of-moments framework enables valid inference that benefits when synthetic data is informative.

10.10.2025 16:39 πŸ‘ 8 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
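A toy version of the debiasing idea sketched above (function and variable names are my own, not the paper's API): correct the mean of a large synthetic sample by the average real-synthetic gap on a small paired subset, so the estimate stays valid even when the generator is biased, and gains precision when the synthetic data tracks the real data closely.

```python
import numpy as np

def corrected_mean(synthetic_all, real_paired, synthetic_paired):
    """Debiased mean estimate combining a large synthetic sample
    with a small set of paired real/synthetic observations.
    Illustrative sketch of the moment-correction idea, not the
    paper's estimator."""
    real_paired = np.asarray(real_paired, dtype=float)
    synthetic_paired = np.asarray(synthetic_paired, dtype=float)
    # Estimate the generator's bias on the paired subset...
    bias = np.mean(real_paired - synthetic_paired)
    # ...and shift the (cheap, plentiful) synthetic mean by it.
    return float(np.mean(synthetic_all) + bias)
```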
Urban Data Science & Equitable Cities | EAAMO Bridges EAAMO Bridges Urban Data Science & Equitable Cities working group: biweekly talks, paper studies, and workshops on computational urban data analysis to explore and address inequities.

Are you a researcher using computational methods to understand cities?

@mfranchi.bsky.social @jennahgosciak.bsky.social and I organize an EAAMO Bridges working group on Urban Data Science and we are looking for new members!

Fill out the interest form on our page: urban-data-science-eaamo.github.io

03.09.2025 15:05 πŸ‘ 8 πŸ” 8 πŸ’¬ 1 πŸ“Œ 1
Screenshot of paper abstract, with text: "A core ethos of the Economics and Computation (EconCS) community is that people have complex private preferences and information of which the central planner is unaware, but which an appropriately designed mechanism can uncover to improve collective decisionmaking. This ethos underlies the community’s largest deployed success stories, from stable matching systems to participatory budgeting. I ask: is this choice and information aggregation β€œworth it”? In particular, I discuss how such systems induce heterogeneous participation: those already relatively advantaged are, empirically, more able to pay time costs and navigate administrative burdens imposed by the mechanisms. I draw on three case studies, including my own work – complex democratic mechanisms, resident crowdsourcing, and school matching. I end with lessons for practice and research, challenging the community to help reduce participation heterogeneity and design and deploy mechanisms that meet a β€œbest of both worlds” north star: use preferences and information from those who choose to participate, but provide a β€œsufficient” quality of service to those who do not."


New piece, out in the Sigecom Exchanges! It's my first solo-author piece, and the closest thing I've written to being my "manifesto." #econsky #ecsky
arxiv.org/abs/2507.03600

11.08.2025 13:25 πŸ‘ 44 πŸ” 9 πŸ’¬ 2 πŸ“Œ 3
Call for Posters We seek poster contributions from different fields that offer insights into the intersectional design and impacts of algorithms, optimization, and mechanism design with a grounding in the social scien...

Submit an abstract to present a poster at EAAMO, deadline July 25! EAAMO is one of my favorite conferences, and a great place for anyone working on ML/algorithms/optimization in social settings. The conference is in Pittsburgh this November.

conference.eaamo.org/cfp/call_for...

16.07.2025 14:57 πŸ‘ 1 πŸ” 1 πŸ’¬ 0 πŸ“Œ 0
Call for Posters We seek poster contributions from different fields that offer insights into the intersectional design and impacts of algorithms, optimization, and mechanism design with a grounding in the social scien...

ACM EAAMO, which is coming to Pitt this Fall, has two events for students: a doctoral consortium and a poster session, both of which are due July 25th
- poster session conference.eaamo.org/cfp/call_for...
- doctoral consortium
conference.eaamo.org/cfp/call_for...

15.07.2025 14:09 πŸ‘ 3 πŸ” 2 πŸ’¬ 0 πŸ“Œ 0

My takeaway is that algorithm designers should think more broadly about the goals for algorithms in policy settings. It's tempting to just train ML models to maximize predictive performance, but services might be improved a lot with even modest alterations for other goals.

08.07.2025 14:59 πŸ‘ 2 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Using historical data from human services, we then look at how severe learning-targeting tradeoffs really are. It turns out, not that bad! We get most of the possible targeting performance while giving up only a little bit of learning compared to the ideal RCT.

08.07.2025 14:59 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

We introduce a framework for designing allocation policies that optimally trade off between targeting high-need people and learning a treatment effect as accurately as possible. We give efficient algorithms and finite-sample guarantees using a duality-based characterization of the optimal policy.

08.07.2025 14:59 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
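A simplified stand-in for the kind of policy the thread describes (this greedy-plus-randomization rule is my illustration, not the paper's duality-based optimal policy): spend most of the budget on the highest predicted-risk people, and randomize the remaining slots so treatment effects stay estimable.

```python
import random

def allocate(risk_scores, budget, explore_frac, seed=0):
    """Toy allocation trading off targeting and learning:
    treat the top predicted-risk people with a (1 - explore_frac)
    share of the budget, and assign the remaining slots uniformly
    at random among everyone else. Returns the set of treated
    indices. Illustrative only."""
    rng = random.Random(seed)
    order = sorted(range(len(risk_scores)), key=lambda i: -risk_scores[i])
    n_target = int(budget * (1 - explore_frac))
    treated = set(order[:n_target])          # targeting: highest risk first
    pool = order[n_target:]                  # everyone not deterministically treated
    treated.update(rng.sample(pool, budget - n_target))  # learning: randomized slots
    return treated
```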

A big factor is that randomizing conflicts with the targeting goal: running an RCT means that people with high predicted risk won't get prioritized for treatment. We wanted to know how sharp the tradeoff really is: does learning treatment effects require giving up on targeting entirely?

08.07.2025 14:59 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

These days, public services are often targeted with predictive algorithms. Targeting helps prioritize people who might be most in need. But, we don't typically have good causal evidence about whether the program we're targeting actually improves outcomes. Why not run RCTs?

08.07.2025 14:59 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Learning treatment effects while treating those in need Many social programs attempt to allocate scarce resources to people with the greatest need. Indeed, public services increasingly use algorithmic risk assessments motivated by this goal. However, targe...

Excited to share that our paper "Learning treatment effects while treating those in need" received the exemplary paper award for AI at EC 2025! This paper grew out of collaborations with Allegheny County's human services department and my co-author Pim Welle (at ACDHS).
arxiv.org/abs/2407.07596

08.07.2025 14:59 πŸ‘ 25 πŸ” 1 πŸ’¬ 1 πŸ“Œ 0
Human-AI Complementarity Workshop - NSF AI Institute for Societal Decision Making - Carnegie Mellon University Landing page that provides details for the annual AI-SDM workshop on Human-AI Complementarity for Decision Making

CMU is hosting a workshop on Human-AI Complementarity for Decision Making this September! Abstract submissions due July 15, travel will be covered for accepted presenters.

www.cmu.edu/ai-sdm/resea...

07.07.2025 15:33 πŸ‘ 2 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Excited to have this work out at ICML this year! Do LLMs make correlated errors? Yes: models from the same company are more correlated, and so are more accurate/later-generation models -- increasing algorithmic monoculture

arxiv.org/abs/2506.07962

03.07.2025 13:06 πŸ‘ 38 πŸ” 3 πŸ’¬ 1 πŸ“Œ 3
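One simple way to quantify correlated errors between two models (an illustrative metric, not necessarily the one used in the paper): compare the rate at which both models are wrong on the same items against what independent errors would predict. Values above 1 indicate correlated errors, i.e. monoculture risk.

```python
def error_correlation(labels, preds_a, preds_b):
    """Ratio of the observed both-wrong rate to the rate expected
    if the two models' errors were independent. Returns > 1 when
    errors are positively correlated. Illustrative sketch."""
    n = len(labels)
    err_a = [p != y for p, y in zip(preds_a, labels)]
    err_b = [p != y for p, y in zip(preds_b, labels)]
    both = sum(a and b for a, b in zip(err_a, err_b)) / n
    expected = (sum(err_a) / n) * (sum(err_b) / n)
    return both / expected if expected > 0 else float("nan")
```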

Still thinking about this post. The broader point, which should resonate way beyond the specific issue of "peer review," is that human disagreement is not friction and waste. It's a load-bearing, functional part of social and intellectual systems.

28.05.2025 13:41 πŸ‘ 133 πŸ” 31 πŸ’¬ 12 πŸ“Œ 1

I don't know one way or another, but it's at least a clearer capability to benchmark. And, if an LLM *could* summarize well enough on existing papers, arxiv using it for lower-bar moderation decisions wouldn't distort paper-writing in the future.

27.05.2025 14:24 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Thoughtful take on one aspect of the increasing problem of LLMs leading to β€œcentralization” of thought/writing/etc.

26.05.2025 23:45 πŸ‘ 3 πŸ” 1 πŸ’¬ 0 πŸ“Œ 0

The arXiv summarization use case sounds a lot more sensible. Clear value judgment specified up-front, not outsourced to the LLM: papers should have easily summarized claims and evidence. Resulting incentives for authors seem ok (making sure LLMs can at least parse the paper probably isn't bad).

27.05.2025 02:13 πŸ‘ 3 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0