Hynek Kydlíček

@hynky

MLE @huggingface 🤗 Prague, CZ 🇪🇺 eu/acc

18 Followers
3 Following
2 Posts
Joined 27.11.2024


Latest posts by Hynek Kydlíček @hynky

HuggingFaceFW/fineweb-2 · Datasets at Hugging Face

Going far beyond our original FineWeb, we've created something massive - 1,893 script-language pairs with almost 3 trillion words spanning 8TB of compressed files! 📚

It's fully open-source, released under ODC-By 1.0, with fully reproducible code! 💻

huggingface.co/datasets/Hug...

08.12.2024 09:27

We heard you liked the FineWeb, so we made a second one: FineWeb 2! 🥂 Now supporting thousands of languages! 🌎

True to our standard, the fermentation process is of the highest quality; it beats all other datasets in 83% of tracked languages 📈.

08.12.2024 09:27