Skrub

@skrub-data

skrub is a Python library to ease preprocessing and feature engineering for tabular machine learning. Our long-term goal is to directly connect database tables to machine learning estimators. https://skrub-data.org https://discord.gg/ABaPnm7fDC

630 Followers
48 Following
130 Posts
Joined 19.11.2024

Latest posts by Skrub @skrub-data

Join the Skrub Discord Server! Check out the Skrub community on Discord – hang out with 106 other members and enjoy free voice and text chat.

You can contact us either here or on our Discord server: discord.gg/ABaPnm7fDC

18.02.2026 10:27 👍 2 🔁 0 💬 0 📌 0

In addition, we will begin crediting specific contributors here on Bluesky when a contributor has worked on the subject of the post. We will use GitHub handles for this purpose. If you prefer your handle not to be used or would like to be credited by name instead, please let us know.

18.02.2026 10:27 👍 0 🔁 0 💬 1 📌 0

As a follow-up, we would like to clarify how we’ll be crediting contributors moving forward.

Currently, all contributions to the repository are tracked in the changelog and highlighted in the release notes, where each PR and the GitHub handle of its author are listed.

18.02.2026 10:27 👍 0 🔁 0 💬 1 📌 0

Thanks to e-strauss for writing this example!

18.02.2026 10:25 👍 0 🔁 0 💬 0 📌 1
Using PyTorch (via skorch) in DataOps This example shows how to wrap a PyTorch model with skorch and plug it into a skrub DataOps plan. The main goal here is to show the integration pattern: PyTorch defines the model (an nn.Module), sk...

While skrub Data Ops shine when preparing dataframes, their capabilities extend beyond that. For example, they can be used alongside libraries like PyTorch and skorch to work with images, and tune the model size to find the best set of hyperparameters:

skrub-data.org/stable/auto_...

18.02.2026 10:25 👍 0 🔁 0 💬 1 📌 0

- A new example has been added to show how skrub Data Ops can be used with PyTorch and skorch to solve an image classification task.

skrub-data.org/stable/auto_...

10.02.2026 13:32 👍 0 🔁 1 💬 0 📌 0

Main changes:
- The StringEncoder now exposes the vocabulary parameter, allowing it to be passed to the underlying TfidfVectorizer.
- The function compute_ngram_distance has been made private to reduce clutter.
- The repository wheel has been made smaller by removing some benchmarking material.

10.02.2026 13:32 👍 1 🔁 1 💬 1 📌 0
Release Skrub release 0.7.2 · skrub-data/skrub ✨ skrub version 0.7.2 has been released ✨ In this release we squashed more bugs, improved the API reference, and added a new example. Main changes: The StringEncoder now exposes the vocabulary par...

✨ skrub version 0.7.2 has been released ✨

In this release we squashed more bugs, improved the API reference, and added a new example.

github.com/skrub-data/s...

10.02.2026 13:32 👍 2 🔁 1 💬 1 📌 0
Tuning DataOps with Optuna This example shows how to use Optuna to tune the hyperparameters of a skrub DataOp. As seen in the previous example, skrub DataOps can contain “choices”, objects created with choose_from(), choose_...

Here is a full example of how to use skrub Data Ops with Optuna:

skrub-data.org/stable/auto_...

05.02.2026 08:52 👍 0 🔁 1 💬 0 📌 0

At the end, you get a fully-fledged Optuna study to work with. Of course, that includes support for the Optuna dashboard and access to the Optuna reporting and plotting interfaces.

05.02.2026 08:52 👍 0 🔁 1 💬 1 📌 0
Three snippets of Python code showing how to use skrub Data Ops with the Optuna optimization library. The first snippet shows a standard randomized search with the Data Ops. The second snippet adds the parameter "backend", which is set to "optuna". The third snippet uses the Optuna visualization API to plot information from the study.

Did you know that skrub Data Ops support Optuna as a backend for running hyperparameter search?

It's as easy as writing "backend='optuna'": this will set up a default Optuna study (and the TPE sampler) to replace the standard random sampler.

05.02.2026 08:52 👍 4 🔁 2 💬 1 📌 0
Release Skrub release 0.7.1 · skrub-data/skrub Release 0.7.1 New features A new dataset, fetch_california_housing(), has been added to the skrub.datasets module. It allows to get a redundancy copy of the scikit-learn fetch_california_housing()...

Happy new year! 🎉🎉🎉

Let's celebrate 2026 with a bugfix release that implements some fixes, brings some documentation improvements and adds a new dataset fetcher:

github.com/skrub-data/s...

15.01.2026 08:59 👍 1 🔁 0 💬 0 📌 0

The course covers:
- How to explore and sanitize data with skrub
- How to use the skrub transformers for powerful and reliable feature engineering
- How to put everything together in a machine learning pipeline

Skrub Data Ops are not included (yet).

19.12.2025 09:44 👍 0 🔁 0 💬 0 📌 0
skrub like a pro: clean, prepare, and transform your data faster - Inria Academy

Do you want to learn how to use skrub like a pro? Then you're in luck!

Inria Academy is providing an introductory course on skrub aimed at IT personnel, engineers, data scientists, and data analysts.

www.inria-academy.fr/formation/sk...

19.12.2025 09:44 👍 3 🔁 0 💬 1 📌 0
Skrub: machine learning for dataframes (YouTube video by PyData)

The recording of the talk we gave at @pydataparis.bsky.social 2025 is now available on the PyData YouTube channel! 🚀

You can find it here, if you want to check it out 👀

www.youtube.com/watch?v=k9MN...

16.12.2025 21:20 👍 3 🔁 0 💬 0 📌 1
Release Skrub release 0.7.0 · skrub-data/skrub Release 0.7.0 ✨ Highlights Data Ops can now be tuned with Optuna. It is now possible to pass extra named arguments to an estimator through DataOps.skb.apply. The TableReport now supports numpy arr...

Skrub 0.7.0 is here! 🎉

✨ Main highlights:
- Tune hyperparameter choices with Optuna
- Added support for Pandas 3.0
- Estimators in data ops can now take additional kwargs

16 new contributors helped with this release 👥

Check out the full changelog: github.com/skrub-data/s...

12.12.2025 09:55 👍 3 🔁 0 💬 0 📌 0
Clean code in Data Science - Gael Varoquaux - Skrub DataOps, Probabl (YouTube video by dotconferences)

@skrub-data.bsky.social: better data-science primitives for clean code on dataframes

Watch my dotAI talk, it's fun (live coding)!
www.youtube.com/watch?v=bQS4...
skrub really makes it easy to do machine learning with dataframes

17.11.2025 17:07 👍 27 🔁 8 💬 0 📌 0
ApplyToFrame Gallery examples: Hands-On with Column Selection and Transformers

skrub-data.org/stable/refer...

08.10.2025 12:43 👍 0 🔁 0 💬 0 📌 0
ApplyToCols Gallery examples: Getting Started Hands-On with Column Selection and Transformers

skrub-data.org/stable/refer...

08.10.2025 12:43 👍 1 🔁 0 💬 1 📌 0
Hands-On with Column Selection and Transformers In previous examples, we saw how skrub provides powerful abstractions like TableVectorizer and tabular_pipeline() to create pipelines. In this new example, we show how to create more flexible pipel...

Example: skrub-data.org/stable/auto_...

08.10.2025 12:43 👍 0 🔁 0 💬 1 📌 0

For even more control over column selection, skrub provides a collection of selectors that let you partition dataframes by data type, column name, or user-specified functions.

08.10.2025 12:43 👍 0 🔁 0 💬 1 📌 0

All these transformers can be combined and inserted in a scikit-learn pipeline to build a feature matrix with complex column selection operations, and can be seen as an alternative to the scikit-learn ColumnTransformer.

08.10.2025 12:43 👍 0 🔁 0 💬 1 📌 0

ApplyToFrame selects columns in the same way, but then uses all of them at the same time as input to the transformer: this is useful for dimensionality reduction.
SelectCols and DropCols can be used as "filtering blocks" in a pipeline.

08.10.2025 12:43 👍 0 🔁 0 💬 1 📌 0

Skrub includes a powerful set of transformers and selectors that let you transform columns based on various conditions.

ApplyToCols lets you select a subset of columns in your dataframe, then applies a transformer to each selected column separately.

08.10.2025 12:43 👍 3 🔁 0 💬 1 📌 0

Have we told you yet that Skrub is cool? And that @riccardocappuzzo.com's talk was really great? We did tell you, right?
skrub-data.org/skrub-materi...

07.10.2025 14:44 👍 3 🔁 1 💬 0 📌 0
Skrub learning materials – Skrub

Slides:
skrub-data.org/skrub-materi...

07.10.2025 14:36 👍 0 🔁 0 💬 0 📌 0

Thanks to @riccardocappuzzo.com, @glemaitre58.bsky.social and Jérôme Dockès for preparing the talk and mentoring at the sprint!

07.10.2025 14:36 👍 0 🔁 0 💬 1 📌 0

The sprint was also a big hit, with both new and old contributors working on issues and getting to know the repository.

And to cap it all off, thanks to P16 we have stickers now 🚀

07.10.2025 14:36 👍 0 🔁 0 💬 1 📌 0
The skrub sticker on the back of a laptop

@pydataparis.bsky.social 2025 is over, and it was a big success!

Our talk was very well received, and we got a lot of great questions, especially about scalability and how to interface with other libraries in production environments.

07.10.2025 14:36 👍 5 🔁 0 💬 1 📌 1

What a banger skrub is, @skrub-data.bsky.social!

Big thumbs up for the sklearn team & the maintainer of this package

01.10.2025 08:23 👍 14 🔁 4 💬 1 📌 0