Check out more here: www.nature.com/articles/s41...
As videoconferencing becomes standard in work, health, education, and the legal system, people without reliable high-speed internet may face systematic disadvantages. The internet is often called an equalizer, but if internet access is unequal, it may instead exacerbate disadvantage. (6/6)
And not all glitches are equally harmful: the more uncanny a glitch feels, the more it undermines evaluations of the person on screen. (5/6)
We found that glitches undermined interpersonal judgments only in video calls that simulate face-to-face interaction (and therefore produce uncanniness), showing that their negative effect goes beyond mere disruptiveness, comprehension difficulties, and negative attributions. (4/6)
Why does this happen? Glitches disrupt the illusion of real face-to-face interaction. Distorted faces, choppy motion, and audio hiccups create a strange, creepy, or eerie feeling termed "uncanniness," and that feeling undermines interpersonal judgments. (3/6)
For example, glitches during job interviews decrease hiring likelihood, and in an analysis of actual parole hearings, the presence of glitches was associated with a 12-percentage-point difference in whether someone was granted parole (48% vs. 60%). (2/6)
Loved seeing GLITCHES in yesterday's NYT crossword: perfect timing for our new paper w/ Jacqueline Rifkin (co-first author) and Jeff Johnson, which examines how minor video-call glitches (even when no information is lost) meaningfully affect important decisions. (1/6)
7/ Read the full paper: journals.plos.org/plosone/arti...
Happy to discuss or answer questions!
6/ Why it matters
Prompt architecture can influence outputs in high-stakes domains:
• Hiring decisions
• Medical triage
• Policy or scientific research summaries
• Your research papers!
In each case, prompt architecture could silently skew results, unless we actively correct for it.
5/ Mitigation strategy
Instead of searching for a "perfect" prompt, we propose Prompt Aggregation: by asking the same question multiple ways and combining the answers, we can cancel out these biases.
In our "honey vs. maple" example, 5 of the 8 prompt variants favor honey, so aggregation favors honey. Try it out yourself!
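The thread doesn't spell out how answers are combined, so here is a minimal sketch that assumes simple majority voting over two options. The hypothetical `Response` record stores, for each prompt variant, which letter pointed to which option, whether the question asked "closer" or "further", and the model's raw reply. Each reply is decoded back to an underlying option (and inverted for "further" framing) before votes are counted. No model is called here; the replies are supplied by hand.

```python
from collections import Counter
from typing import NamedTuple


class Response(NamedTuple):
    """One answered prompt variant (hypothetical record, not from the paper)."""
    labels: dict[str, str]  # letter -> option as presented, e.g. {"A": "honey", "B": "maple syrup"}
    framing: str            # "closer" or "further"
    answer: str             # raw model reply, assumed to start with the chosen letter


def aggregate(responses: list[Response]) -> Counter:
    """Majority-vote sketch: count how often each option is judged 'closer'."""
    votes: Counter = Counter()
    for r in responses:
        letter = r.answer.strip().upper()[:1]
        if letter not in r.labels:
            continue  # unparseable reply: skip it rather than guess
        chosen = r.labels[letter]
        if r.framing == "further":
            # Picking X as "further" is a vote for the other option being closer
            # (assumes exactly two distinct options).
            chosen = next(o for o in r.labels.values() if o != chosen)
        votes[chosen] += 1
    return votes


# A model with a pure label bias that always answers "B".
always_b = [
    Response({"A": "honey", "B": "maple syrup"}, "closer", "B"),
    Response({"A": "maple syrup", "B": "honey"}, "closer", "B"),
    Response({"A": "honey", "B": "maple syrup"}, "further", "B"),
    Response({"A": "maple syrup", "B": "honey"}, "further", "B"),
]
print(aggregate(always_b))
```

With these four hand-written replies, the "always B" bias splits evenly (2 votes each), which illustrates how varying labels and framing can cancel a bias that a single fixed prompt would silently bake in.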
4/ Implication: There is no neutral prompt
You can't write your way around prompt architecture effects because any prompt must have some order, some framing, some structure.
GPT-3, GPT-4, and Llama 3.1 all exhibited different prompt architecture biases.
3/ Core insight
We found LLMs are systematically biased by seemingly trivial prompt architecture:
• Option order (e.g., "honey or maple" vs "maple or honey")
• Option labels (e.g., A/B vs B/A)
• Question framing (e.g., "closer" vs "further")
• Asking for justification
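To make these four factors concrete, here is a minimal sketch that enumerates every combination of them for a two-option question (2 × 2 × 2 × 2 = 16 prompts). `build_prompt_variants` is a hypothetical helper, and the template and the "Explain your answer." suffix are assumptions modeled on the honey/maple example; the study's actual prompts may differ.

```python
from itertools import product


def build_prompt_variants(option_1: str, option_2: str, target: str) -> list[dict]:
    """Hypothetical helper: full factorial over the four prompt-architecture factors."""
    variants = []
    for (first, second), (l1, l2), relation, justify in product(
        [(option_1, option_2), (option_2, option_1)],  # option order
        [("A", "B"), ("B", "A")],                      # option labels
        ["closer", "further"],                         # question framing
        [False, True],                                 # ask for justification?
    ):
        prompt = f"Is {l1} {first} or {l2} {second} {relation} to {target}?"
        if justify:
            prompt += " Explain your answer."  # assumed wording
        variants.append({
            "prompt": prompt,
            "labels": {l1: first, l2: second},  # needed later to decode a letter answer
            "framing": relation,
            "justify": justify,
        })
    return variants


for v in build_prompt_variants("honey", "maple syrup", "sugar")[:4]:
    print(v["prompt"])
```

Keeping the label mapping and framing alongside each prompt is a design choice: without them, a reply like "B" cannot be compared across variants.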
Seems straightforward... until you flip the order. Same question, different answers. But why?
User: Is A honey or B maple syrup closer to sugar?
ChatGPT: The answer is B, maple syrup. Maple syrup is closer to pure sugar than honey.
1/ Setup
Imagine you ask a simple question to ChatGPT:
New paper alert with Olivier Toubia! We show how prompt architecture introduces systematic error in LLM responses.
🧵 Key findings from our study on prompt structure (and how to mitigate silent bias in your research):