
@datascienceweekly

63 Followers
353 Following
41 Posts
Joined 13.11.2024

Latest posts by @datascienceweekly

Data Science Weekly - Issue 641 Curated news, articles and jobs related to Data Science, AI, & Machine Learning

Data Science Weekly - Issue 641, by @DataSciNews open.substack.com/pub/datascie...

05.03.2026 18:13 👍 0 🔁 0 💬 0 📌 0

Congratulations, Bruno!

05.03.2026 10:58 👍 1 🔁 0 💬 0 📌 0
A Few Claude Skills for R Users – R Works The community has come together to create some great Claude Skills that you can try out today.

I rounded up a few Claude Skills for #RStats users.

Huge thanks to the creators who developed them. They share Skills for everything from tidyverse code to brand.yml files to learning while using AI.

Hope the list is useful, and please let me know what I missed! 🧡

rworks.dev/posts/claude...

03.03.2026 14:05 👍 134 🔁 38 💬 4 📌 4

You can now visualize how a color palette distributes across OKHsv, OKHsl, OKLCh and CIELab, and compare 6 distance metrics side by side. Zero dependencies, raw WebGL2.

Took me 3 years to make something I wasn't too embarrassed to share 🙃

meodai.github.io/color-palett...

04.03.2026 21:40 👍 989 🔁 133 💬 27 📌 3
Data Science Weekly - Issue 640 Curated news, articles and jobs related to Data Science, AI, & Machine Learning

Data Science Weekly - Issue 640, by @DataSciNews open.substack.com/pub/datascie...

26.02.2026 13:54 👍 0 🔁 0 💬 0 📌 0
Data Science Weekly - Issue 639 Curated news, articles and jobs related to Data Science, AI, & Machine Learning

Data Science Weekly - Issue 639, by @DataSciNews open.substack.com/pub/datascie...

19.02.2026 22:01 👍 0 🔁 0 💬 0 📌 0
Data Science Weekly - Issue 638 Curated news, articles and jobs related to Data Science, AI, & Machine Learning

Data Science Weekly - Issue 638, by @DataSciNews open.substack.com/pub/datascie...

12.02.2026 19:36 👍 0 🔁 0 💬 0 📌 0
Simulated null distribution for data with a sample size of 100, difference in group means of 5, and a p-value of 0.142


Simulated null distribution of a slope of 0.8 and p-value of 0.002


Finally, we have to decide if the p-value meets an evidentiary standard or threshold that would provide us with enough evidence that we aren’t in the null world (or, in more statsy terms, enough evidence to reject the null hypothesis).

There are lots of possible thresholds. By convention, most people use a threshold (often shortened to α) of 0.05, or 5%. But that’s not required! You could have a lower standard with an α of 0.1 (10%), or a higher standard with an α of 0.01 (1%).

Statistically significant
The p-value is < 0.001 and our threshold for α is 0.05

In a world where there is no relationship between x and y, the probability of seeing a slope of at least 0.901 is < 0.1%

Since < 0.001 is less than 0.05, we have enough evidence to say that the slope is statistically significant.
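The decision rule described above can be sketched as a shuffle-based simulation. This is a minimal Python illustration, not the code behind the linked site (which uses OJS and R): the data are made up, the helper names `slope` and `null_world_p_value` are hypothetical, and the two-sided comparison and the `+ 1` correction are choices made here rather than anything stated in the post.

```python
import random

def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

def null_world_p_value(x, y, n_sims=2000, seed=1234):
    """Simulate a null world by shuffling y, which breaks any x-y relationship.

    Returns the observed slope and the share of shuffled slopes that are at
    least as extreme as the observed one (two-sided).
    """
    rng = random.Random(seed)
    observed = slope(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_sims):
        rng.shuffle(shuffled)
        if abs(slope(x, shuffled)) >= abs(observed):
            hits += 1
    # The +1 counts the observed data as one draw, so the p-value is never exactly 0.
    return observed, (hits + 1) / (n_sims + 1)

# Made-up data with a real relationship between x and y (assumption for illustration).
data_rng = random.Random(42)
x = [data_rng.uniform(0, 10) for _ in range(100)]
y = [0.9 * xi + data_rng.gauss(0, 2) for xi in x]

alpha = 0.05  # the conventional threshold; 0.1 or 0.01 would work the same way
observed, p_value = null_world_p_value(x, y)
significant = p_value < alpha

print(f"slope = {observed:.3f}, p-value = {p_value:.4f}, significant = {significant}")
```

Changing `alpha` moves the evidentiary bar without changing the simulated null world or the p-value itself.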


Evidentiary standards

When thinking about p-values and thresholds, I like to imagine myself as a judge or a member of a jury. Many legal systems around the world have formal evidentiary thresholds or standards of proof. If prosecutors provide evidence that meets a threshold (i.e. goes beyond a reasonable doubt, or shows evidence on a balance of probabilities), the judge or jury can rule guilty. If there’s not enough evidence to clear the standard or threshold, the judge or jury has to rule not guilty.

With p-values:

If the probability of seeing an effect or difference (or δ) in a null world is less than 5% (or whatever the threshold is), we rule it statistically significant and say that the difference does not fit in that world. We’re pretty confident that it’s not zero.
If the p-value is larger than the threshold, we do not have enough evidence to claim that δ doesn’t come from a world where there’s no difference. We don’t know if it’s not zero.
Importantly, if the difference is not significant, that does not mean that there is no difference. It just means that we can’t detect one if there is. If a prosecutor doesn’t provide sufficient evidence to clear a standard or threshold, it does not mean that the defendant didn’t do whatever they’re charged with†—it means that the judge or jury can’t detect guilt.
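The point that "not significant" is different from "no difference" can also be checked by simulation. The sketch below is an illustrative assumption, not part of the original post: it builds many small experiments in which a real difference exists by construction, then counts how often a shuffle-based test clears the 5% threshold. The effect size, group size, and function names are all hypothetical choices.

```python
import random

def diff_in_means(a, b):
    """Difference between the mean of group a and the mean of group b."""
    return sum(a) / len(a) - sum(b) / len(b)

def permutation_p_value(a, b, rng, n_sims=200):
    """Two-sided p-value from shuffling group labels (a simulated null world)."""
    observed = abs(diff_in_means(a, b))
    pooled = a + b  # new list, so the inputs are left untouched
    hits = 0
    for _ in range(n_sims):
        rng.shuffle(pooled)
        if abs(diff_in_means(pooled[:len(a)], pooled[len(a):])) >= observed:
            hits += 1
    return (hits + 1) / (n_sims + 1)

rng = random.Random(2026)
alpha = 0.05
true_difference = 0.5  # a real, nonzero effect (in standard-deviation units)
n_per_group = 10       # deliberately small sample
n_experiments = 200

detected = 0
for _ in range(n_experiments):
    treated = [rng.gauss(true_difference, 1) for _ in range(n_per_group)]
    control = [rng.gauss(0, 1) for _ in range(n_per_group)]
    if permutation_p_value(treated, control, rng) < alpha:
        detected += 1

detection_rate = detected / n_experiments
print(f"Real effect detected in {detection_rate:.0%} of small experiments")
```

Every experiment here contains a true difference, so each one that fails to clear the threshold is a case where the test could not detect an effect that does exist.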


I just whipped up this little #QuartoPub site last week that demonstrates how I teach p-values/hyp-testing through simulation both with live OJS and with #rstats, and I think it's super neat! It has examples for diff-in-means, diff-in-props, and regression slopes nullworlds.andrewheiss.com #statsky

11.02.2026 21:14 👍 139 🔁 26 💬 3 📌 5

Malaysia’s R community is growing! 🇲🇾 From a small network into a platform that actively connects students, researchers, and industry practitioners

r-consortium.org/posts/bringi...

#rstats #opensource #datascience #Malaysia #Shiny #tidyverse #community #analytics

09.02.2026 22:22 👍 6 🔁 2 💬 0 📌 0
A screenshot of an interactive map of all medals won so far at the 2026 Winter Olympics by place of birth of the winners


Winter Olympics 2026 medalists by place of birth: an interactive map I built (again) thanks to @wikipedia, @wikidata and #rstats.

Check out the interactive version of the map: https://giocomai.github.io/olympics2026nuts/medalists_map.html

11.02.2026 07:23 👍 9 🔁 5 💬 1 📌 0

📈 A while back I did promise to put together a post on how I generate publication-ready figures. Before I am off to #CNY2026, I finally found the time. May this be useful to some...
Also I am curious to hear what other tricks are out there.
jaquent.github.io/2026/02/crea...

#rstats #ggplot #dataviz

10.02.2026 10:14 👍 26 🔁 9 💬 4 📌 0
A hexagon R package logo, with the package name “whistledown” in a calligraphy-style font. A silhouette of a quill representing Lady Whistledown’s letters, and three bees for the Bridgerton family crest.


Dearest Gentle Reader,
I’m happy to announce the release of my new R package, “whistledown”, with color palettes from the hit show #Bridgerton!

#RStats #ggplot

11.02.2026 20:02 👍 20 🔁 3 💬 1 📌 0
Transport Modeling in R High-performance tools for transport modeling - network processing, route enumeration, and traffic assignment in R. The package implements the Path-Sized Logit model for traffic assignment - Ben-Akiva...

I’m thrilled to introduce flownet (sebkrantz.github.io/flownet/), a new R package for transport modeling, supporting stochastic or deterministic traffic assignment to large networks, and powerful tools for (multimodal) network processing/simplification: sebkrantz.github.io/Rblog/2026/0... #Rstats

09.02.2026 19:06 👍 21 🔁 4 💬 1 📌 0
Data Science Weekly - Issue 637 Curated news, articles and jobs related to Data Science, AI, & Machine Learning

Data Science Weekly - Issue 637, by @DataSciNews open.substack.com/pub/datascie...

05.02.2026 22:55 👍 0 🔁 0 💬 0 📌 0
Some Favorite Data Science Tools Going into 2026 – Practical Significance A blog post highlighting some of the data science tools I’m excited about going into the new year.

On a positive note, here's a new blog post highlighting some polyglot data science tools in R and Python that I've enjoyed lately

#rstats #pydata

www.practicalsignificance.com/posts/favori...

23.01.2026 00:00 👍 15 🔁 2 💬 2 📌 2

The first data science book that has a chapter on monads reproducible-data-science.dev

Learn how to build robust #DataScience pipelines with #RStats, #Python, #Julia and #Nix!

01.02.2026 11:47 👍 26 🔁 9 💬 0 📌 2
Using R to extract results from Stata log files – Ben Harrap

Are you a #Stata user? Maybe you work with one?

Have you ever found yourself copy-pasting from the results window?

It's annoying as hell! And terrible practice. So I wrote a blog post on using #rstats to extract results from Stata log files

benharrap.com/post/2026-02...

04.02.2026 04:34 👍 10 🔁 4 💬 3 📌 1
On this page
What’s the difference between statistical significance and substantial significance?
Can we measure substantial significance with statistics?
What are all the different ways we can look at model coefficients?
Print the object name
Use summary()
Use tidy() from the {broom} package
Use model_parameters() and model_details() from the {parameters} and {performance} packages
Make nice polished side-by-side regression tables with {modelsummary}
Make automatic coefficient plots with modelplot() from {modelsummary}
Plot model predictions and marginal effects
Automatic interpretation with {report}


Posted a helpful little set of FAQs about regression for my causal inference class, including illustrations of statistical vs. substantive significance and all the different things you can do with #rstats model objects

evalsp26.classes.andrewheiss.com/news/2026-02...

03.02.2026 19:49 👍 68 🔁 10 💬 3 📌 1
Hemingway-bench AI Writing Leaderboard Stop rewarding slop. Hemingway-bench is an AI writing leaderboard that takes real-world writing tasks and puts them in front of master wordsmiths. Our goal: to push AI writing from two-second vibes to...

A new creative writing style bench and leaderboard for LLMs surgehq.ai/blog/hemingw...

05.02.2026 07:38 👍 22 🔁 5 💬 0 📌 1
CS 860 - Algorithms for Private Data Analysis - Fall 2020

Did you learn differential privacy (in part or in whole) from my course? Either the videos, lecture notes, or some combination? Please send me a DM or an email, I'm trying to gather some info.

(In case you missed it, here's the course: www.gautamkamath.com/CS860-fa2020..., ft full notes & videos)

05.02.2026 15:23 👍 8 🔁 3 💬 1 📌 0
Data Science Weekly - Issue 636 Curated news, articles and jobs related to Data Science, AI, & Machine Learning

Data Science Weekly - Issue 636, by @DataSciNews open.substack.com/pub/datascie...

29.01.2026 13:50 👍 0 🔁 0 💬 0 📌 0
dbreg

Things are grim. But in more frivolous news...

@jamesbrandecon.bsky.social and I have been chipping away at `dbreg`, a 📦 for running big regression models on database backends. For the right kinds of problems, the speed-ups are near magical.

Website: grantmcdermott.com/dbreg/

#rstats

[1/2]

26.01.2026 16:57 👍 68 🔁 15 💬 4 📌 1
The Warehouse

I’ve been thinking about how to find R packages by functionality when you don’t already know the package name.

So over the holidays, Claude Code and I built The Warehouse: a functionality-first R package directory that helps you find packages by what they do.
rwarehouse.netlify.app

#rstats

27.01.2026 15:12 👍 69 🔁 21 💬 3 📌 3
Designing a Declarative Data Stack: From Theory to Practice Explore the journey of building a declarative data stack - from architectural decisions to practical implementation. Learn how to separate business logic from technical implementation using templates, automation, and modern orchestration tools.

Two approaches to generating pipelines:

Parametric: define parameters, tool generates SQL
Template-based: write SQL templates with variables

dbt took templates. Automation tools took parametric. Neither is wrong - they optimize for different teams.

29.01.2026 08:32 👍 5 🔁 2 💬 0 📌 0
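The two approaches named in the post above can be contrasted in a few lines. This is a rough Python sketch under stated assumptions: `string.Template` stands in for a real templating engine, `generate_aggregate_sql` stands in for a parametric tool, and the table and column names are invented. It is not how dbt or any specific automation product is implemented.

```python
from string import Template

# Template-based: a person writes the SQL and leaves variables to fill in.
DAILY_TOTALS = Template(
    "SELECT $date_col, SUM($amount_col) AS total\n"
    "FROM $table\n"
    "GROUP BY $date_col"
)

def render_template(**variables):
    """Fill the hand-written SQL template with concrete names."""
    return DAILY_TOTALS.substitute(**variables)

# Parametric: a person supplies parameters and the tool owns the SQL text.
ALLOWED_AGGREGATES = {"SUM", "AVG", "MIN", "MAX", "COUNT"}

def generate_aggregate_sql(table, group_by, measure, agg="SUM"):
    """Build an aggregate query from parameters (hypothetical generator)."""
    if agg not in ALLOWED_AGGREGATES:
        raise ValueError(f"unsupported aggregate: {agg}")
    return (
        f"SELECT {group_by}, {agg}({measure}) AS total\n"
        f"FROM {table}\n"
        f"GROUP BY {group_by}"
    )

# Hypothetical table and column names, used only for illustration.
templated = render_template(table="orders", date_col="order_date", amount_col="amount")
generated = generate_aggregate_sql("orders", "order_date", "amount")

print(templated)
```

Both calls produce the same query here. The difference is who controls the SQL text: the template author can write anything SQL allows, while the parametric caller can only reach what the generator exposes. Neither function escapes identifiers, so this sketch should not be fed untrusted input.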
Data Science Weekly - Issue 635 Curated news, articles and jobs related to Data Science, AI, & Machine Learning

Data Science Weekly - Issue 635, by @DataSciNews open.substack.com/pub/datascie...

22.01.2026 21:19 👍 1 🔁 1 💬 0 📌 0

crying and laughing at the same time is good for the soul

22.01.2026 20:58 👍 0 🔁 0 💬 0 📌 0

Don't know who needs to hear this, but don't be afraid of starting a blog in 2026. SEO will be brutal at first, but you can find an audience through BlueSky

Regarding stack options, Quarto is good if you are doing an R/Python heavy data science site. Or for a modern CMS I recommend Statamic

08.01.2026 16:05 👍 13 🔁 2 💬 2 📌 0

For my #popgen nerds out there, I've finished building a Python package called hapnet - an easy-to-use, straight-out-of-the-box tool for building #haplotype networks. Just plug your data in, run a single line of code and voila! A pretty color-coded network. Still beta testing!
pypi.org/project/hapn...

10.01.2026 03:52 👍 8 🔁 4 💬 2 📌 0

We just published a JOSIS paper on what spatial data science languages have in common and what they still need. Insights from across the R, Python & Julia ecosystems.

URL: doi.org/10.5311/JOSI...

#SpatialDataScience #GISchat #OpenSource #RSpatial #GeoPython #JuliaGeo

11.01.2026 16:01 👍 53 🔁 20 💬 0 📌 2