Data Science Weekly - Issue 641, by @DataSciNews open.substack.com/pub/datascie...
Congratulations, Bruno!
I rounded up a few Claude Skills for #RStats users.
Huge thanks to the creators who developed them. They share Skills for everything from tidyverse code to brand.yml files to learning while using AI.
Hope the list is useful, and please let me know what I missed! 🧡
rworks.dev/posts/claude...
You can now visualize how a color palette distributes across OKHsv, OKHsl, OKLCh and CIELab, and compare 6 distance metrics side by side. Zero dependencies, raw WebGL2.
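The post doesn't name its six distance metrics. As a minimal sketch of one common option, assuming Björn Ottosson's published sRGB-to-OKLab matrices, the code below converts hex colors to OKLab, derives OKLCh (lightness, chroma, hue), and takes plain Euclidean distance in OKLab. The function names (`hex_to_oklab`, `oklab_to_oklch`, `oklab_distance`) are hypothetical and are not part of the tool.

```python
import math


def srgb_to_linear(c):
    """Undo the sRGB transfer curve for one channel in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def hex_to_oklab(hex_color):
    """Convert '#rrggbb' to (L, a, b) in OKLab (hypothetical helper)."""
    h = hex_color.lstrip("#")
    r, g, b = (srgb_to_linear(int(h[i:i + 2], 16) / 255) for i in (0, 2, 4))
    # Linear sRGB -> approximate cone responses (LMS); all coefficients are
    # positive, so in-gamut colors never give negative values here.
    l = 0.4122214708 * r + 0.5363325363 * g + 0.0514459929 * b
    m = 0.2119034982 * r + 0.6806995451 * g + 0.1073969566 * b
    s = 0.0883024619 * r + 0.2817188376 * g + 0.6299787005 * b
    # Cube-root nonlinearity, then the second matrix gives L, a, b
    l_, m_, s_ = (v ** (1 / 3) for v in (l, m, s))
    return (
        0.2104542553 * l_ + 0.7936177850 * m_ - 0.0040720468 * s_,
        1.9779984951 * l_ - 2.4285922050 * m_ + 0.4505937099 * s_,
        0.0259040371 * l_ + 0.7827717662 * m_ - 0.8086757660 * s_,
    )


def oklab_to_oklch(L, a, b):
    """Polar form of OKLab: lightness, chroma, hue angle in degrees."""
    return L, math.hypot(a, b), math.degrees(math.atan2(b, a)) % 360


def oklab_distance(c1, c2):
    """Euclidean distance between two hex colors in OKLab."""
    return math.dist(hex_to_oklab(c1), hex_to_oklab(c2))


# Pairwise distances across a small example palette
palette = ["#1b9e77", "#d95f02", "#7570b3"]
for i, c1 in enumerate(palette):
    for c2 in palette[i + 1:]:
        print(c1, c2, round(oklab_distance(c1, c2), 3))
```

OKLab is built so that black and white sit about 1.0 apart on the L axis, which makes raw Euclidean distances easy to read; CIELab-based metrics such as CIEDE2000 add extra weighting terms and would be separate functions.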
Took me 3 years to make something I wasn't too embarrassed to share 🙃
meodai.github.io/color-palett...
Data Science Weekly - Issue 640, by @DataSciNews open.substack.com/pub/datascie...
Data Science Weekly - Issue 639, by @DataSciNews open.substack.com/pub/datascie...
Data Science Weekly - Issue 638, by @DataSciNews open.substack.com/pub/datascie...
Simulated null distribution for data with a sample size of 100, difference in group means of 5, and a p-value of 0.142
Simulated null distribution of a slope of 0.8 and p-value of 0.002
Finally, we have to decide if the p-value meets an evidentiary standard or threshold that would provide us with enough evidence that we aren’t in the null world (or, in more statsy terms, enough evidence to reject the null hypothesis). There are lots of possible thresholds. By convention, most people use a threshold (often shortened to α) of 0.05, or 5%. But that’s not required! You could have a lower standard with an α of 0.1 (10%), or a higher standard with an α of 0.01 (1%).
Statistically significant: The p-value is < 0.001 and our threshold for α is 0.05. In a world where there is no relationship between x and y, the probability of seeing a slope of at least 0.901 is < 0.1%. Since p < 0.001 is less than 0.05, we have enough evidence to say that the slope is statistically significant.
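The null-world simulation can be sketched in a few lines of Python. This is a minimal sketch under assumptions, not the code behind the site. It builds each null world by shuffling y (a permutation test, one common way to simulate "no relationship between x and y"). It then counts how often a shuffled slope is at least as large in absolute value as the observed one, and it compares that p-value to α. The function names are hypothetical.

```python
import random


def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def null_world_p_value(x, y, n_sims=5000, seed=1):
    """Two-sided permutation p-value for the slope (hypothetical helper).

    Shuffling y breaks any link between x and y, so every shuffled slope
    is one draw from the null world where the true slope is zero.
    """
    rng = random.Random(seed)
    observed = slope(x, y)
    y_null = list(y)
    extreme = 0
    for _ in range(n_sims):
        rng.shuffle(y_null)
        if abs(slope(x, y_null)) >= abs(observed):
            extreme += 1
    return observed, extreme / n_sims


def is_significant(p_value, alpha=0.05):
    """Evidentiary threshold: reject the null world only if p < alpha."""
    return p_value < alpha


x = list(range(20))
y = [2 * v + 1 for v in x]
observed, p = null_world_p_value(x, y)
print(observed, p, is_significant(p))
```

A simulated p-value of 0 only means that no shuffled slope was as extreme, i.e. p < 1/n_sims. Raising α (say to 0.1) lowers the evidentiary bar, and lowering it (to 0.01) raises the bar.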
Evidentiary standards
When thinking about p-values and thresholds, I like to imagine myself as a judge or a member of a jury. Many legal systems around the world have formal evidentiary thresholds or standards of proof. If prosecutors provide evidence that meets a threshold (i.e. goes beyond a reasonable doubt, or shows evidence on a balance of probabilities), the judge or jury can rule guilty. If there’s not enough evidence to clear the standard or threshold, the judge or jury has to rule not guilty.
With p-values:
If the probability of seeing an effect or difference (or δ) in a null world is less than 5% (or whatever the threshold is), we rule it statistically significant and say that the difference does not fit in that world. We’re pretty confident that it’s not zero.
If the p-value is larger than the threshold, we do not have enough evidence to claim that δ doesn’t come from a world where there’s no difference. We don’t know if it’s not zero.
Importantly, if the difference is not significant, that does not mean that there is no difference. It just means that we can’t detect one if there is. If a prosecutor doesn’t provide sufficient evidence to clear a standard or threshold, it does not mean that the defendant didn’t do whatever they’re charged with; it means that the judge or jury can’t detect guilt.
I just whipped up this little #QuartoPub site last week that demonstrates how I teach p-values/hyp-testing through simulation both with live OJS and with #rstats, and I think it's super neat! It has examples for diff-in-means, diff-in-props, and regression slopes nullworlds.andrewheiss.com #statsky
Malaysia’s R community is growing! 🇲🇾 From a small network into a platform that actively connects students, researchers, and industry practitioners
r-consortium.org/posts/bringi...
#rstats #opensource #datascience #Malaysia #Shiny #tidyverse #community #analytics
A screenshot of an interactive map of all medals won so far at the 2026 Winter Olympics by place of birth of the winners
Winter Olympics 2026 medalists by place of birth: an interactive map I built (again) thanks to @wikipedia, @wikidata and #rstats.
Check out the interactive version of the map: https://giocomai.github.io/olympics2026nuts/medalists_map.html
📈 A while back I promised to put together a post on how I generate publication-ready figures. Before I head off for #CNY2026, I finally found the time. May this be useful to some...
Also I am curious to hear what other tricks are out there.
jaquent.github.io/2026/02/crea...
#rstats #ggplot #dataviz
A hexagon R package logo, with the package name “whistledown” in a calligraphy-style font. A silhouette of a quill representing Lady Whistledown’s letters, and three bees for the Bridgerton family crest.
Dearest Gentle Reader,
I’m happy to announce the release of my new R package, “whistledown”, with color palettes from the hit show #Bridgerton!
#RStats #ggplot
I’m thrilled to introduce flownet (sebkrantz.github.io/flownet/), a new R package for transport modeling, supporting stochastic or deterministic traffic assignment to large networks, and powerful tools for (multimodal) network processing/simplification: sebkrantz.github.io/Rblog/2026/0... #Rstats
Data Science Weekly - Issue 637, by @DataSciNews open.substack.com/pub/datascie...
On a positive note, here's a new blog post highlighting some polyglot data science tools in R and Python that I've enjoyed lately
#rstats #pydata
www.practicalsignificance.com/posts/favori...
The first data science book that has a chapter on monads reproducible-data-science.dev
Learn how to build robust #DataScience pipelines with #RStats, #Python, #Julia and #Nix!
Are you a #Stata user? Maybe you work with one?
Have you ever found yourself copy-pasting from the results window?
It's annoying as hell! And terrible practice. So I wrote a blog post on using #rstats to extract results from Stata log files
benharrap.com/post/2026-02...
On this page:
- What’s the difference between statistical significance and substantial significance?
- Can we measure substantial significance with statistics?
- What are all the different ways we can look at model coefficients?
- Print the object name
- Use summary()
- Use tidy() from the {broom} package
- Use model_parameters() and model_details() from the {parameters} and {performance} packages
- Make nice polished side-by-side regression tables with {modelsummary}
- Make automatic coefficient plots with modelplot() from {modelsummary}
- Plot model predictions and marginal effects
- Automatic interpretation with {report}
Posted a helpful little set of FAQs about regression for my causal inference class, including illustrations of statistical vs. substantive significance and all the different things you can do with #rstats model objects
evalsp26.classes.andrewheiss.com/news/2026-02...
A new creative writing style bench and leaderboard for LLMs surgehq.ai/blog/hemingw...
Did you learn differential privacy (in part or in whole) from my course? Either the videos, lecture notes, or some combination? Please send me a DM or an email, I'm trying to gather some info.
(In case you missed it, here's the course: www.gautamkamath.com/CS860-fa2020..., ft full notes & videos)
Data Science Weekly - Issue 636, by @DataSciNews open.substack.com/pub/datascie...
Things are grim. But in more frivolous news...
@jamesbrandecon.bsky.social and I have been chipping away at `dbreg`, a 📦 for running big regression models on database backends. For the right kinds of problems, the speed-ups are near magical.
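The post doesn't describe how `dbreg` works internally, so this is not its implementation. It is a minimal sketch of the general idea of pushing regression work into a database, assuming a single predictor and SQLite from Python's standard library. The database computes five aggregates (n, Σx, Σy, Σxy, Σx²), and only those numbers leave the backend, never the raw rows. The table, column, and function names are hypothetical.

```python
import sqlite3


def ols_in_database(conn, table, x_col, y_col):
    """Fit y = intercept + slope * x from SQL aggregates (hypothetical helper)."""
    # Identifiers can't be bound as query parameters; validate them in real code.
    query = (
        f"SELECT COUNT(*), SUM({x_col}), SUM({y_col}), "
        f"SUM({x_col} * {y_col}), SUM({x_col} * {x_col}) FROM {table}"
    )
    n, sx, sy, sxy, sxx = conn.execute(query).fetchone()
    # Closed-form simple-regression estimates from the sufficient statistics
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (x REAL, y REAL)")
conn.executemany("INSERT INTO obs VALUES (?, ?)", [(v, 3 * v - 2) for v in range(10)])
intercept, slope = ols_in_database(conn, "obs", "x", "y")
print(intercept, slope)
```

This textbook sum-of-squares formula can lose precision on very large or badly scaled data. Centering x first helps. With several predictors, you would aggregate XᵀX and Xᵀy in SQL the same way and solve the normal equations locally.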
Website: grantmcdermott.com/dbreg/
#rstats
[1/2]
I’ve been thinking about how to find R packages by functionality when you don’t already know the package name.
So over the holidays, Claude Code and I built The Warehouse: a functionality-first R package directory that helps you find packages by what they do.
rwarehouse.netlify.app
#rstats
Two approaches to generating pipelines:
Parametric: define parameters, tool generates SQL
Template-based: write SQL templates with variables
dbt took templates. Automation tools took parametric. Neither is wrong; they optimize for different teams.
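A minimal sketch contrasting the two approaches, assuming a simple filtered SELECT. dbt itself uses Jinja templates, so Python's standard-library `string.Template` stands in for Jinja here. All function, table, and column names are hypothetical. Values are pasted straight into the SQL text, which makes this illustrative only and not safe against SQL injection.

```python
from string import Template


def parametric_pipeline(table, columns, date_column, since):
    """Parametric style: the tool owns the SQL shape; users supply parameters."""
    cols = ", ".join(columns)
    return f"SELECT {cols} FROM {table} WHERE {date_column} >= '{since}'"


# Template style: users own the SQL text; the tool only fills in variables.
ORDERS_TEMPLATE = Template("SELECT $cols FROM $table WHERE $date_column >= '$since'")


def template_pipeline(template, **variables):
    """Render a SQL template; raises KeyError if a variable is missing."""
    return template.substitute(**variables)


print(parametric_pipeline("orders", ["id", "amount"], "created_at", "2026-01-01"))
print(template_pipeline(ORDERS_TEMPLATE, cols="id, amount", table="orders",
                        date_column="created_at", since="2026-01-01"))
```

Both calls produce the same query. The difference lies in who can change its shape: adding a JOIN means changing the generator code in the parametric version, but only editing the template text in the template version.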
Data Science Weekly - Issue 635, by @DataSciNews open.substack.com/pub/datascie...
crying and laughing at the same time is good for the soul
Don't know who needs to hear this, but don't be afraid of starting a blog in 2026. SEO will be brutal at first, but you can find an audience through Bluesky.
Regarding stack options, Quarto is good if you are doing an R/Python heavy data science site. Or for a modern CMS I recommend Statamic
For my #popgen nerds out there, I've finished building a Python package called hapnet - an easy to use, straight out of the box tool for building #haplotype networks. Just plug your data in, run a single line of code and voila! A pretty color coded network. Still beta testing!
pypi.org/project/hapn...
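The post doesn't say which network algorithm hapnet uses, so this is only a sketch of one common approach, assuming aligned sequences of equal length. It collapses identical sequences into haplotypes (node sizes are their counts). It then links the haplotypes with a minimum spanning tree over pairwise Hamming distances, built with Prim's algorithm. The names are hypothetical. Median-joining and TCS networks are other widely used methods that can add inferred intermediate haplotypes.

```python
from collections import Counter


def hamming(a, b):
    """Number of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def haplotype_network(sequences):
    """Return (haplotype counts, MST edges as (hap_a, hap_b, mutations))."""
    counts = Counter(sequences)
    haps = sorted(counts)
    if not haps:
        return counts, []
    in_tree = {haps[0]}
    edges = []
    # Prim's algorithm: repeatedly attach the closest haplotype not yet linked;
    # ties are broken alphabetically via tuple comparison.
    while len(in_tree) < len(haps):
        d, a, b = min(
            (hamming(a, b), a, b)
            for a in in_tree
            for b in haps
            if b not in in_tree
        )
        in_tree.add(b)
        edges.append((a, b, d))
    return counts, edges


counts, edges = haplotype_network(["AAAA", "AAAA", "AAAT", "AATT", "TAAA"])
print(counts)
print(edges)
```

A full minimum spanning network keeps all equally short alternative links. This tree keeps just one, so it is a simplification.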
We just published a JOSIS paper on what spatial data science languages have in common and what they still need. Insights from across the R, Python & Julia ecosystems.
URL: doi.org/10.5311/JOSI...
#SpatialDataScience #GISchat #OpenSource #RSpatial #GeoPython #JuliaGeo