NEW: Haowen Zheng, Robert Andersen, Anders Holm, Kristian Bernt Karlson, "Is College Really “the” Equalizer? New Evidence Addressing Unobserved Selection." sociologicalscience.com/articles-v13...
Lecture A09 - I get mildly ranty about the low quality of papers on discrimination and somehow also introduce generalized linear models for events and illustrate post-stratification. The theme continues next week with modeling sensitivity to unmeasured confounding. I will try to be less ranty.
A slacking occupational structure in Britain? Reflections on my substack on a piece in the @financialtimes.com by John Burn-Murdoch. And what does this piece imply for the debate on overeducation?
@data.ft.com hermwerf.substack.com/p/a-slacking...
Problem about the loneliness epidemic is, it's everywhere except in representative survey data. Let's look at where the claim comes from. 1/
Simulated null distribution for data with a sample size of 100, difference in group means of 5, and a p-value of 0.142
Simulated null distribution of a slope of 0.8 and p-value of 0.002
Finally, we have to decide if the p-value meets an evidentiary standard or threshold that would provide us with enough evidence that we aren’t in the null world (or, in more statsy terms, enough evidence to reject the null hypothesis). There are lots of possible thresholds. By convention, most people use a threshold (often shortened to α) of 0.05, or 5%. But that’s not required! You could have a lower standard with an α of 0.1 (10%), or a higher standard with an α of 0.01 (1%).
Statistically significant: The p-value is < 0.001 and our threshold for α is 0.05. In a world where there is no relationship between x and y, the probability of seeing a slope of at least 0.901 is < 0.1%. Since < 0.001 is less than 0.05, we have enough evidence to say that the slope is statistically significant.
Evidentiary standards
When thinking about p-values and thresholds, I like to imagine myself as a judge or a member of a jury. Many legal systems around the world have formal evidentiary thresholds or standards of proof. If prosecutors provide evidence that meets a threshold (i.e. goes beyond a reasonable doubt, or shows evidence on a balance of probabilities), the judge or jury can rule guilty. If there’s not enough evidence to clear the standard or threshold, the judge or jury has to rule not guilty.
With p-values: If the probability of seeing an effect or difference (or δ) in a null world is less than 5% (or whatever the threshold is), we rule it statistically significant and say that the difference does not fit in that world. We’re pretty confident that it’s not zero. If the p-value is larger than the threshold, we do not have enough evidence to claim that δ doesn’t come from a world where there’s no difference. We don’t know if it’s not zero.
Importantly, if the difference is not significant, that does not mean that there is no difference. It just means that we can’t detect one if there is. If a prosecutor doesn’t provide sufficient evidence to clear a standard or threshold, it does not mean that the defendant didn’t do whatever they’re charged with; it means that the judge or jury can’t detect guilt.
I just whipped up this little #QuartoPub site last week that demonstrates how I teach p-values/hyp-testing through simulation both with live OJS and with #rstats, and I think it's super neat! It has examples for diff-in-means, diff-in-props, and regression slopes nullworlds.andrewheiss.com #statsky
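The null-world idea in this post can be sketched in a few lines. This is only an illustrative permutation test for a difference in means, not the code behind the linked site: the data are made up, and the names `diff_in_means` and `simulated_p_value` are hypothetical.

```python
import random
import statistics


def diff_in_means(values, labels):
    """Mean of group "a" minus mean of group "b"."""
    a = [v for v, g in zip(values, labels) if g == "a"]
    b = [v for v, g in zip(values, labels) if g == "b"]
    return statistics.mean(a) - statistics.mean(b)


def simulated_p_value(values, labels, n_sims=2000, seed=1234):
    """Two-sided p-value from a simulated null world.

    Shuffling the group labels breaks any link between group and outcome,
    so each shuffle is one draw from a world with no difference.
    """
    rng = random.Random(seed)
    observed = diff_in_means(values, labels)
    shuffled = list(labels)
    at_least_as_extreme = 0
    for _ in range(n_sims):
        rng.shuffle(shuffled)
        if abs(diff_in_means(values, shuffled)) >= abs(observed):
            at_least_as_extreme += 1
    return observed, at_least_as_extreme / n_sims


# Made-up data (an assumption for illustration): 100 observations, two groups.
data_rng = random.Random(42)
labels = ["a"] * 50 + ["b"] * 50
values = [data_rng.gauss(55, 15) for _ in range(50)] + [
    data_rng.gauss(50, 15) for _ in range(50)
]

observed, p_value = simulated_p_value(values, labels)
alpha = 0.05  # conventional threshold, not a requirement
verdict = "statistically significant" if p_value < alpha else "not enough evidence"
print(f"difference = {observed:.2f}, p = {p_value:.3f}: {verdict}")
```

The p-value here is just the share of shuffled worlds that produce a difference at least as large as the observed one, which is the quantity the alt-text above compares against α.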
This time I try to explain group-level confounding and some ways to deal with it. Lecture B04 of Statistical Rethinking 2026 - fixed effects, Mundlak machines, latent Mundlak machines, intro to social network analysis and the social relations model. Full lecture list: github.com/rmcelreath/s...
As a statistical educator, it had not occurred to me that I need to caution students against regressing a variable on a function of itself. My naivete is unbounded.
The Peri & Sparber paper (linked below) looks really good! It has synthetic data analyses and everything.
And we're live, Lecture A1 is online. Introduction to Bayesian workflow, generative models, estimands, estimators, estimates, error checking, beginnings of probability theory and Bayesian updating. www.youtube.com/watch?v=ztbY...
Here are my favourite 2025 papers on climate policy/politics (listed in no particular order).
1. Ascari, Guido, Andrea Colciago, Timo Haber, and Stefan Wöhrmüller. 2025. ‘Inequality along the European Green Transition’. Economic Journal.
doi.org/10.1093/ej/u...
What do Britons know and think about #irregularmigration?
Read new @iclaimeu.bsky.social report out today
i-claim.eu/project/publ...
On the morning of Keir Starmer's conference speech here's a new post on an odd psychopathology in British politics - our main parties don't like the people who vote for them - the dreaded Professional Managerial Class. And so they are acting out like a divorced dad seeking cooler voters. 1/n
Young researchers in social policy, submit!
#rstats
It is with profound sadness that I heard that my long-time friend and colleague, John Fox, passed away this week.
He was the author of {car}, {effects}, {Rcmdr}, ... and numerous influential books. I will miss him greatly.
www.john-fox.ca
After more than 10 years of “the Danish Model”, nativism is hegemonic in the country, the far right polls near level highs again, and the Social Democrats lost Copenhagen and poll at a historic low.
European Social Democrats should look at the facts, not the myths!
Me in @theguardian.com
Visas are a key tool for states to regulate incoming mobility from abroad, which can have ramifications for the establishment and perpetuation of global inequalities. In this article, we systematically analyze visa appointment wait times in German embassies and consulates worldwide. Using computational methods, we collect—and publish—fine-grained longitudinal data on the closest available appointment dates for various visa types, covering a total of 16,182 visa appointment requests. Our analysis reveals strong and systematic variance: the poorer the country a diplomatic mission is based in, the longer the wait time and the lower the chances of finding an available appointment (which ranges from almost 0 to 100 percent). We also argue that Germany’s system is quite opaque compared to other established immigration countries such as the U.S. These core findings raise important questions in light of current debates about global justice, legal pathways to migration, and efforts to attract foreign talent.
Graph that shows that 44.1 percent of requests did not lead to an appointment that could be selected. For the 55.9 percent where an appointment was available the distribution of wait times follows a steep curve with short wait times in many cases and a long tail of few cases with very long wait times of up to 98 days.
The average wait times and chances to find an appointment varied a lot between Germany's diplomatic missions. The latter range from almost 0 to 100 percent.
This variance is not random. Rather, economic wellbeing (GDP per capita) is a key predictor of wait times and chances of finding an appointment. The poorer the country a German embassy/consulate is based in, the longer the wait time and the lower the chances of finding an appointment.
New #openaccess study
We made >16,000 visa appointment requests at German embassies and consulates worldwide
Key finding: The poorer the country, the longer the wait time and the lower the chance to get an appointment.
"A time penalty for the Global South?"
shorturl.at/ZiAFb
“Little boxes” and “Coat of many colors”
The electoral outcome most strongly linked to deprivation is not any party’s vote share, but turnout. Across almost all indicators, turnout is markedly lower in more deprived areas, with only barriers to housing & services and quality in the living environment showing weaker correlations.
Pleased to see this out in print - detailing MAIHDA's desirable statistical properties.
"MAIHDA is especially valuable when inequalities are subtle or data for marginalised intersections are sparse - conditions common in practice"
journals.sagepub.com/doi/10.1177/...
@clarerevans.bsky.social
We just published a new report synthesizing more than 7 years of research on the impact of digital technologies on employment in Europe carried out with my team in the JRC. Lots of evidence and ideas for discussion! #EconSky #sociology
@sergiotorrejon.com @lauranurski.bsky.social
Ever asked yourself how to detect and extract social groups from texts with computational social science? @haukelicht.bsky.social and I have a solution for you out at @bjpols.bsky.social. You can also find the pre-trained models on huggingface!
Abstract
What do unions do? On average they make the members about $870k more wealthy over time, new findings at Social Forces show.
Models as Prediction Machines: How to Convert Confusing Coefficients into Clear Quantities Abstract Psychological researchers usually make sense of regression models by interpreting coefficient estimates directly. This works well enough for simple linear models, but is more challenging for more complex models with, for example, categorical variables, interactions, non-linearities, and hierarchical structures. Here, we introduce an alternative approach to making sense of statistical models. The central idea is to abstract away from the mechanics of estimation, and to treat models as “counterfactual prediction machines,” which are subsequently queried to estimate quantities and conduct tests that matter substantively. This workflow is model-agnostic; it can be applied in a consistent fashion to draw causal or descriptive inference from a wide range of models. We illustrate how to implement this workflow with the marginaleffects package, which supports over 100 different classes of models in R and Python, and present two worked examples. These examples show how the workflow can be applied across designs (e.g., observational study, randomized experiment) to answer different research questions (e.g., associations, causal effects, effect heterogeneity) while facing various challenges (e.g., controlling for confounders in a flexible manner, modelling ordinal outcomes, and interpreting non-linear models).
Figure illustrating model predictions. On the X-axis the predictor, annual gross income in Euro. On the Y-axis the outcome, predicted life satisfaction. A solid line marks the curve of predictions on which individual data points are marked as model-implied outcomes at incomes of interest. Comparing two such predictions gives us a comparison. We can also fit a tangent to the line of predictions, which illustrates the slope at any given point of the curve.
A figure illustrating various ways to include age as a predictor in a model. On the x-axis age (predictor), on the y-axis the outcome (model-implied importance of friends, including confidence intervals). Illustrated are 1. age as a categorical predictor, resulting in the predictions bouncing around a lot with wide confidence intervals, 2. age as a linear predictor, which forces a straight line through the data points that has a very tight confidence band, and 3. age splines, which lie somewhere in between, as they smoothly follow the data but have more uncertainty than the straight line.
Ever stared at a table of regression coefficients & wondered what you're doing with your life?
Very excited to share this gentle introduction to another way of making sense of statistical models (w @vincentab.bsky.social)
Preprint: doi.org/10.31234/osf...
Website: j-rohrer.github.io/marginal-psy...
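A toy sketch of the "prediction machine" idea from the abstract and the first figure: query a fitted model for predictions, compare two predictions, and take the slope at a point. This does not use the marginaleffects API. The model `predict` and its coefficients are made up for illustration, and `comparison` and `slope` are hypothetical helper names.

```python
import math


def predict(income_eur):
    """Hypothetical fitted model: life satisfaction rises with log income.

    The coefficients 2.0 and 0.5 are invented for illustration only.
    """
    return 2.0 + 0.5 * math.log(income_eur)


def comparison(model, low, high):
    """Difference between two model predictions (a 'comparison')."""
    return model(high) - model(low)


def slope(model, x, eps=1.0):
    """Numerical slope of the prediction curve at x (central difference)."""
    return (model(x + eps) - model(x - eps)) / (2 * eps)


for income in (20_000, 40_000, 80_000):
    print(f"predicted satisfaction at {income} EUR: {predict(income):.2f}")

print(f"comparison 20k -> 40k: {comparison(predict, 20_000, 40_000):.3f}")
print(f"slope at 20k: {slope(predict, 20_000):.7f}")
print(f"slope at 80k: {slope(predict, 80_000):.7f}")
```

Because the curve is non-linear, the slope depends on where it is evaluated, which is one reason a single coefficient can be hard to interpret directly.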
Check out my new article in the Journal of Organizational Sociology, where I examine how technology limits the autonomy of entry-level workers. I theorize two subtypes of technical control and discuss their implications for gender inequality.
www.degruyterbrill.com/document/doi...
Not recent, but may be of interest onlinelibrary.wiley.com/doi/epdf/10....
New how-to guide now available on the European Network for Open Criminology website. This time @asiermoneva.com shares advice on writing reproducible and readable analysis code. Highly recommended! esc-enoc.github.io/how-to/repro...
The outsourcing boom of the Major-Blair years saved money in the short run but left the state without the capacity to do anything but buy in services from canny private providers who have us over a barrel and are raking it in
🚨 Major release alert
We’re thrilled to launch lissyrtools v0.2.0 — our R package that makes working with LIS & LWS microdata simpler, faster, and clearer 📦
🧵 1/12
Interested in employment and social security research? Please follow the account below (we've moved from X and need to rebuild our following!)
mlmRev::egsingle |>
  performance::check_group_variation(
    select = c("female", "grade", "math"),
    by = c("schoolid", "childid"),
    include_by = TRUE
  )
#> Check schoolid variation
#>
#> Variable | Variation | Design
#> ------------------------------
#> childid  | both      | nested
#> female   | within    | crossed
#> grade    | both      |
#> math     | both      |
#>
#> Check childid variation
#>
#> Variable | Variation | Design
#> -----------------------------
#> schoolid | between   |
#> female   | between   |
#> grade    | both      |
#> math     | both      |
🆕 Introducing check_group_variation() in the {performance} #Rstats package! 🎉
This function makes it easy to check if variables vary within or between levels of grouping variables.
Perfect for understanding and designing mixed models 🚀
easystats.github.io/performance/...
#stats #easystats
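For readers outside R, the within/between distinction can be sketched in plain Python. This is a deliberately simplified illustration and not the algorithm used by {performance}: here a variable "varies within" if any group holds more than one distinct value, and "varies between" if group means differ. The function name `group_variation` and the tiny dataset are made up.

```python
from collections import defaultdict


def group_variation(rows, variable, by):
    """Classify a numeric variable as 'within', 'between', 'both', or 'none'.

    Simplified rule (an assumption, not the package's definition):
    within  = at least one group has more than one distinct value
    between = group means are not all equal
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row[variable])
    varies_within = any(len(set(vals)) > 1 for vals in groups.values())
    means = {round(sum(vals) / len(vals), 12) for vals in groups.values()}
    varies_between = len(means) > 1
    if varies_within and varies_between:
        return "both"
    if varies_within:
        return "within"
    if varies_between:
        return "between"
    return "none"


# Made-up long-format data: two children, each observed in two grades.
rows = [
    {"childid": 1, "female": 1, "grade": 1, "math": 10.0},
    {"childid": 1, "female": 1, "grade": 2, "math": 12.0},
    {"childid": 2, "female": 0, "grade": 1, "math": 9.0},
    {"childid": 2, "female": 0, "grade": 2, "math": 14.0},
]

for variable in ("female", "grade", "math"):
    print(variable, group_variation(rows, variable, by="childid"))
```

A variable that only varies between groups (such as `female` by child here) cannot identify a within-group effect, which is why this kind of check is useful before specifying a mixed model.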
Writing some paragraphs about odds ratio and, more generally, different scales in nonlinear models.
Any favorite articles on odds ratios?
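One point such a discussion of scales often turns on is that a fixed odds ratio implies different changes on the probability scale depending on the baseline. A minimal arithmetic sketch, with the helper name `apply_odds_ratio` and the baselines chosen purely for illustration:

```python
def apply_odds_ratio(baseline_prob, odds_ratio):
    """Probability implied by multiplying the baseline odds by an odds ratio."""
    odds = baseline_prob / (1 - baseline_prob) * odds_ratio
    return odds / (1 + odds)


odds_ratio = 2.0
for baseline in (0.01, 0.10, 0.50, 0.90):
    new_prob = apply_odds_ratio(baseline, odds_ratio)
    print(
        f"baseline {baseline:.2f} -> {new_prob:.3f} "
        f"(risk difference {new_prob - baseline:+.3f})"
    )
```

The odds ratio is the same in every row, while the risk difference is not, so the two scales can tell different stories about the same model.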