Sam Clark

@samclark.net

Demographer, epidemiologist, and data scientist who develops new methods and does research in demography and epidemiology - samclark.net. Mountain bike rider.

1,854 Followers
1,263 Following
10 Posts
Joined 12.02.2024

Latest posts by Sam Clark @samclark.net

DuckDB is an in-process SQL OLAP database management system. Simple, feature-rich, fast & open source.

Use DuckDB for SQL: 

duckdb.org

07.04.2025 19:16 👍 1 🔁 0 💬 0 📌 0
Trump Administration Ends Global Health Research Program
The Demographic and Health Surveys were the only sources of reliable information in many countries on metrics such as mortality, nutrition and education.

RIP DHS. We expected this, but it’s still a shock.

www.nytimes.com/2025/02/26/h...

27.02.2025 00:51 👍 10 🔁 1 💬 0 📌 0

Annual budget of USAID is about $50B. Annual budget of USA is about $7T. Eliminating USAID does not save anything in this context.

07.02.2025 13:45 👍 5 🔁 0 💬 0 📌 0
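
The comparison in the post above comes down to one division. A minimal sketch, using only the post's own rounded figures (about $50B and about $7T), which are approximations rather than official budget numbers:

```python
# Rounded figures taken from the post above; both are approximations.
usaid_budget = 50e9      # about $50 billion per year
federal_budget = 7e12    # about $7 trillion per year

# Share of the federal budget that the quoted USAID figure represents.
share = usaid_budget / federal_budget
print(f"USAID share of the federal budget: {share:.2%}")
```

With these inputs the share comes to well under one percent.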
Causal inference methods for treatment effect estimation usually assume independent units. However, this assumption is often questionable because units may interact, resulting in spillover effects between them. We develop augmented inverse probability weighting (AIPW) for estimation and inference of the expected average treatment effect (EATE) with observational data from a single (social) network with spillover effects. In contrast to overall effects such as the global average treatment effect (GATE), the EATE measures, in expectation and on average over all units, how the outcome of a unit is causally affected by its own treatment, marginalizing over the spillover effects from other units. We develop cross-fitting theory with plugin machine learning to obtain a semiparametric treatment effect estimator that converges at the parametric rate and asymptotically follows a Gaussian distribution. The asymptotics are developed using the dependency graph rather than the network graph, which makes explicit that we allow for spillover effects beyond immediate neighbors in the network. We apply our AIPW method to the Swiss StudentLife Study data to investigate the effect of hours spent studying on exam performance accounting for the students' social network.

"Treatment Effect Estimation with Observational Network Data using Machine Learning"

Arxiv: arxiv.org/abs/2206.14591
#rstats code: github.com/corinne-rahe...

#stats

20.01.2025 03:02 👍 14 🔁 4 💬 1 📌 0
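
For readers new to AIPW, the sketch below shows the classic doubly robust score for the average treatment effect under independent units. It is not the paper's estimator: the paper targets the EATE on a network with spillover effects, uses cross-fitting, and derives its variance from a dependency graph, none of which appears here. The function name `aipw_ate` and its arguments are hypothetical, and the nuisance predictions (propensity scores and outcome regressions) are assumed to have been fitted elsewhere.

```python
from math import sqrt
from statistics import mean, stdev


def aipw_ate(y, w, e_hat, m1_hat, m0_hat):
    """Classic AIPW estimate of the average treatment effect (i.i.d. units).

    y       observed outcomes
    w       binary treatment indicators (0 or 1)
    e_hat   estimated propensity scores P(W=1 | X), strictly between 0 and 1
    m1_hat  predicted outcomes under treatment, E[Y | X, W=1]
    m0_hat  predicted outcomes under control,   E[Y | X, W=0]
    """
    # One score per unit: regression contrast plus inverse-probability-weighted
    # residual corrections for the treated and control arms.
    scores = [
        m1 - m0 + wi * (yi - m1) / e - (1 - wi) * (yi - m0) / (1 - e)
        for yi, wi, e, m1, m0 in zip(y, w, e_hat, m1_hat, m0_hat)
    ]
    estimate = mean(scores)
    # This standard error is valid only under independence; with spillovers
    # it would ignore the dependence between units.
    std_error = stdev(scores) / sqrt(len(scores))
    return estimate, std_error


# Toy inputs, made up purely for illustration.
est, se = aipw_ate(
    y=[3.0, 1.0, 5.0, 2.0],
    w=[1, 0, 1, 0],
    e_hat=[0.5, 0.5, 0.5, 0.5],
    m1_hat=[2.0, 3.0, 5.0, 5.0],
    m0_hat=[1.0, 1.0, 2.0, 2.0],
)
print(est, se)
```

The score stays consistent if either the propensity model or the outcome model is correct, which is the usual motivation for pairing it with flexible machine learning fits.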
A red squirrel poses for a photo

Welcome to our Crib

07.12.2024 08:04 👍 9656 🔁 575 💬 147 📌 43
Post image

Huge congratulations to my Florida State Univ Population Center colleagues Mike McFarland and Matt Hauer (@drdemography.bsky.social) on this important and very newsworthy (!!) paper on the impact of leaded gasoline on US public health. #demography

acamh.onlinelibrary.wiley.com/doi/abs/10.1...

05.12.2024 23:30 👍 31 🔁 8 💬 2 📌 1

www.statnews.com/2023/03/13/m...

06.12.2024 18:38 👍 0 🔁 0 💬 0 📌 0

After I moved to Canada a couple of years ago I realized that I was no longer constantly running a massive stress routine in the background of my mind worrying about health care and guns. It was weirdly noticeable only when it stopped.

06.12.2024 12:59 👍 28198 🔁 3091 💬 868 📌 278
Graduate Program - Department of Demography
UC Berkeley Demography offers three graduate degree tracks independently and in conjunction with the department of Sociology. Ph.D. in Demography: The doctoral program is intended to p...

@ucberkeleyofficial.bsky.social is accepting applications for Fall 2025 for the PhD program in Demography AND the Graduate Group in Sociology & Demography. Seeking a diverse and strong cohort; applications DUE 12/17/2024.

Learn more about the program:
www.demog.berkeley.edu/graduate-pro...

04.12.2024 20:46 👍 28 🔁 21 💬 1 📌 2
Video thumbnail

Falling #fertility across the world will lead to significant changes in countries' age pyramids. By 2100, when today's newborns are in their 70s, they (or their elders!) will be the largest age group in many countries.

#demography

#rstats code: github.com/schmert/bone...

04.12.2024 13:42 👍 16 🔁 7 💬 1 📌 0
Beware the myth: learning styles affect parents’, children’s, and teachers’ thinking about children’s academic potential - npj Science of Learning

Looks like it might be time to reiterate what psychologists have been screaming from the rooftops for years: the idea of learning styles, as it is presented to the general public, is a myth, and it damages students’ sense of efficacy www.nature.com/articles/s41...

03.12.2024 12:18 👍 66 🔁 33 💬 2 📌 2

Scientists, academics, researchers: We’re excited to share that @altmetric.com is now tracking mentions of your research on Bluesky! 🧪

03.12.2024 14:10 👍 29668 🔁 5025 💬 458 📌 280
How our team at Our World in Data became a global data source on COVID-19
Our small team made COVID-19 data clear, reliable, and accessible to a global audience. This is how it happened.

Saloni, Edouard, and Lucas wrote up the history of Our World in Data during the COVID pandemic.

It's about the impact we hoped to achieve and how it felt to us during that time.

ourworldindata.org/owid-covid-h...

24.11.2024 10:22 👍 85 🔁 24 💬 2 📌 1

Is there an equivalent graphic for water fluoridation and tooth decay?

23.11.2024 17:18 👍 5 🔁 3 💬 0 📌 0
Book outline

Over the past decade, embeddings — numerical representations of machine learning features used as input to deep learning models — have become a foundational data structure in industrial machine learning systems. TF-IDF, PCA, and one-hot encoding have always been key tools in machine learning systems as ways to compress and make sense of large amounts of textual data. However, traditional approaches were limited in the amount of context they could reason about with increasing amounts of data. As the volume, velocity, and variety of data captured by modern applications has exploded, creating approaches specifically tailored to scale has become increasingly important. Google’s Word2Vec paper made an important step in moving from simple statistical representations to semantic meaning of words. The subsequent rise of the Transformer architecture and transfer learning, as well as the latest surge in generative methods has enabled the growth of embeddings as a foundational machine learning data structure. This survey paper aims to provide a deep dive into what embeddings are, their history, and usage patterns in industry.

Cover image

Just realized BlueSky allows sharing valuable stuff cause it doesn't punish links. 🤩

Let's start with "What are embeddings" by @vickiboykis.com

The book is a great summary of embeddings, from history to modern approaches.

The best part: it's free.

Link: vickiboykis.com/what_are_emb...

22.11.2024 11:13 👍 652 🔁 101 💬 22 📌 6
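
The abstract quoted above names one-hot encoding and TF-IDF as the traditional starting points. A minimal sketch of both in plain Python, assuming whitespace tokenization, non-empty documents, and the simple `tf * log(N / df)` weighting (one of several TF-IDF variants; libraries often add smoothing). The function names and the toy documents are made up for illustration and are not taken from the book.

```python
import math
from collections import Counter


def one_hot(vocab, token):
    """Return a vector with a single 1 at the position of `token` in `vocab`."""
    return [1 if term == token else 0 for term in vocab]


def tf_idf(docs):
    """Return (vocab, vectors) using tf * log(N / df) with no smoothing."""
    # Assumption: lowercase whitespace tokenization is good enough here.
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({token for doc in tokenized for token in doc})
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(token for doc in tokenized for token in set(doc))
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([
            (counts[term] / len(doc)) * math.log(n_docs / doc_freq[term])
            for term in vocab
        ])
    return vocab, vectors


docs = ["the cat sat", "the dog sat", "the cat ran"]
vocab, vectors = tf_idf(docs)
print(vocab)
print(one_hot(vocab, "dog"))
print(vectors[1])
```

A term that occurs in every document, such as "the" here, gets weight zero, and neither representation captures word order or meaning. That is the kind of limitation the abstract says learned embeddings were developed to address.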
Looking for Maintainers to Support First-Time Contributors
Announcing a Community Call and Coworking sessions to support first contributions

At @rOpenSci.hachyderm.io.ap.brid.gy we're pairing first-time code contributors with experienced maintainers. If you are an rOpenSci or other #RStats package author and want to help build the road for new contributors and get co-maintainers, sign up for co-working!

ropensci.org/blog/2024/10...

23.11.2024 16:06 👍 26 🔁 17 💬 0 📌 0
FSU board OKs removal of over 400 courses from general education offerings after review
“We’re living through an era of legislature-driven higher education reform,” FSU Provost Jim Clark said.

Important to understand that (1) political appointees, not the university administration, are doing this; (2) they're not cancelling courses, but removing them from the list of courses that satisfy breadth requirements (i.e. death by strangling rather than a knife to the back).

www.tallahassee.com/story/news/l...

23.11.2024 16:11 👍 27 🔁 15 💬 1 📌 0
Researchers say an AI-powered transcription tool used in hospitals invents things no one ever said
Whisper is a popular transcription tool powered by artificial intelligence, but it has a major flaw. It makes things up that were never said.

AI for medical transcription - in this case Whisper sneaks in its own hallucinatory phrases
apnews.com/article/ai-a...

though i wish the AI did invent ‘hyperactivated antibiotics’ we are going to need them soon 😏

h/t @placentadoc.bsky.social

#MedSky

23.11.2024 12:27 👍 26 🔁 7 💬 0 📌 1
Guyana - Wikipedia

For the Thanksgiving break I will be in Guyana visiting one of our children who is working there for two years.

en.wikipedia.org/wiki/Guyana

22.11.2024 13:53 👍 2 🔁 0 💬 0 📌 0

Just set up an account for the openVA Team @openva.net where I will post things related to the group.

21.11.2024 15:15 👍 3 🔁 1 💬 0 📌 0
Post image

Backyard now!

21.11.2024 14:02 👍 5 🔁 0 💬 0 📌 0


CGD's very own starter pack... experts and staff former and present...

bsky.app/starter-pack...

20.11.2024 16:09 👍 22 🔁 14 💬 1 📌 2
Post image

Can anyone give lit tips for papers showing this qualitative age pattern of a mortality rate ratio (e.g. frail vs not, sick vs not, high SES vs low, in nursing home vs general pop, with disease vs without)?

21.11.2024 10:46 👍 8 🔁 3 💬 2 📌 1

SQL! SQL. JUST USE SQL

19.11.2024 02:24 👍 2039 🔁 117 💬 187 📌 38
Post image Post image

Some recent beautiful evenings

16.11.2024 19:41 👍 5 🔁 0 💬 0 📌 0
Post image

Reading Peter Turchin’s interesting and provocative books. This characterization of social science disciplines in ‘Ultrasociety’ is amusing:

19.03.2024 19:47 👍 1 🔁 0 💬 0 📌 0