
Adrian Brudaru

@datateam

Data engineer & Cofounder @dlthub. Building out the tooling I wish I had.

2,202 Followers · 1,300 Following · 163 Posts · Joined 27.10.2024

Latest posts by Adrian Brudaru @datateam

Production-grade data ingestion using dlt and Snowflake. Build production-grade data pipelines using dlt and Snowflake. You’ll learn core loading patterns, external S3 staging, in-Snowflake execution with SPCS, and how Snowflake Native Apps enable integrated ingestion workflows.

Next up:
- Run dlt inside Snowflake using Snowpark Container Services (SPCS)
- Build integrated ingestion workflows with the Snowflake Native App model
- Run pipelines directly in Snowsight notebooks (Warehouse Runtime + Container Runtime)

Enroll ↓

08.03.2026 16:46 👍 1 🔁 0 💬 0 📌 0

Spring brings good things and more than just flowers. 🌸 ❄️

Catch up on Module 1 of our dlt + Snowflake course before Module 2 drops next week!

Learn nested data normalization, schema evolution, incremental loading, and merge strategies (upsert, SCD2), all in plain Python.
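The merge strategies mentioned above can be sketched in plain Python. This is a minimal upsert (merge on a primary key), with all dlt-specific wiring omitted; the table, key, and record names are purely illustrative. An SCD2 strategy would instead keep the old row and append a versioned one rather than replacing it.

```python
def upsert(table, records, key="id"):
    """Merge incoming records into a table keyed by `key`:
    existing rows are replaced, new rows are appended (upsert)."""
    by_key = {row[key]: row for row in table}
    for rec in records:
        by_key[rec[key]] = rec  # replace existing row or insert new one
    return list(by_key.values())

# Existing destination table and a fresh batch from the source.
users = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bob"}]
batch = [{"id": 2, "name": "Bobby"}, {"id": 3, "name": "Cy"}]

merged = upsert(users, batch)
# id 2 updated, id 3 appended → 3 rows total
```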

08.03.2026 16:46 👍 5 🔁 0 💬 1 📌 0
Tasman Analytics prototypes client pipelines with dltHub Pro. Tasman Analytics cut scoping from 2 weeks to 20 min with dltHub Pro. See how they prototype any client pipeline in a single meeting and deliver faster than ever.

Now Thomas in't Veld and his team prototype live in client meetings.

Real connectors, real data, before the meeting ends.

Scoping turns into a demo. Proposals turn into commitments.

Mid-level engineers do what once required seniors.

Case study ↓

05.03.2026 08:32 👍 1 🔁 0 💬 0 📌 0

Tasman.ai runs data engineering projects for mid-market and enterprise clients.

Their biggest challenge? Scoping.

Every new client meant figuring out which APIs to connect, how long it would take, and what the data actually looked like, often before seeing a single row.

05.03.2026 08:32 👍 0 🔁 0 💬 1 📌 0
Ontology-driven Dimensional Modeling. To answer questions about the world from data models, we don't need semantic layers; we need ontologies.

Data models describe the data, ontologies describe the world.

With ontologies, an agent can reason over data as opposed to retrieving it and hallucinating meaning.

The ontology-model mapping is what agents need for data literacy.

This is NOW. Blog + demo

26.02.2026 19:24 👍 0 🔁 0 💬 0 📌 0

The engine behind the insights:

dlt → automated ingestion + schema evolution

dbt → reproducible SQL transformations

Metabase API → BI-as-code

A declarative, portable pipeline from source to visualization.

Full breakdown: https://dlthub.com/blog/ufc-analyser-dlt-dbt-metabase

22.02.2026 12:05 👍 2 🔁 0 💬 0 📌 0

Who is the UFC GOAT? 🥊📊

We turned that curiosity into a full-stack pipeline analyzing 30+ years of UFC fights.

Everything programmatic, even dashboard creation via API.

Production-grade insights. Full traceability. Zero manual overhead.

22.02.2026 12:05 👍 1 🔁 0 💬 1 📌 0
Debugging Our Docs RAG, Part 2: Testing New Generation. By upgrading only the generative model, we achieved a 3x accuracy boost but hit a hard ceiling, proving that a stronger LLM alone is not enough; retrieval quality matters too.

Part 2 of our RAG debug series is out.

We froze retrieval, prompts & dataset, and tested newer models only.

Result: 3/14 → ~10/14 correct answers.
3× improvement just by upgrading the model.

Same system. Same eval set.
Retrieval is next.
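The ablation described here, freezing everything except the generative model and re-running the same eval set, can be sketched as follows. The scoring function, question set, and stand-in "models" are all hypothetical placeholders, not the actual eval harness:

```python
def evaluate(answer_fn, eval_set):
    """Score a model against a frozen eval set: one point per
    question whose answer matches the expected string."""
    correct = sum(1 for q, expected in eval_set if answer_fn(q) == expected)
    return correct, len(eval_set)

# Frozen eval set; two stand-in "models" differing only in generation quality.
eval_set = [("capital of France?", "Paris"), ("2+2?", "4"), ("HTTP port?", "80")]
old_model = lambda q: {"capital of France?": "Paris"}.get(q, "?")
new_model = lambda q: {"capital of France?": "Paris", "2+2?": "4", "HTTP port?": "80"}[q]

old_score = evaluate(old_model, eval_set)  # (1, 3)
new_score = evaluate(new_model, eval_set)  # (3, 3)
```

Holding retrieval, prompts, and the eval set constant is what makes the two scores comparable.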

Read more 👇

20.02.2026 16:38 👍 0 🔁 0 💬 0 📌 0
AI Memory: Understanding Modeling for Unstructured Data. For the data engineering crowd, here’s an explainer of how unstructured AI memory works, through the lens of what we know from working with structured data.

AI agents fail without memory & context. @cognee.bsky.social turns data into self-improving, structured memory at scale.

If you’ve built a modern data stack (ingest → transform → access), you already know the pattern.

Backed by pebblebed, congrats on the $7.5M.

19.02.2026 18:11 👍 0 🔁 1 💬 0 📌 0

Stack: dlt → dbt → Metabase
Prefect + Scaleway

Mon: Validate dlt
Tue: Add sources
Wed: Move to self-hosted worker
Thu: Remove Airbyte
Fri: Stabilize + Slack alerting

Timeline: 5 days.
Enabler: Claude Code.

Our blog on moving from Airbyte to dlt 👇

https://dlthub.com/blog/convert-airbyte

16.02.2026 12:03 👍 1 🔁 0 💬 0 📌 0

Ingestion shouldn’t be a maintenance trap.

From Airbyte to dlt in one week.

Slides 👇

https://docs.google.com/presentation/d/e/2PACX-1vQvJapgEkJxgpsWqoMlmEw-ctV3gZe0LLc5oZBHaJNezBGAYKYoyir1aQi-37tO37SjFGaYjmQJhi_r/pub?start=false&loop=false&delayms=3000&slide=id.g175a817e68e_3_932

16.02.2026 12:03 👍 1 🔁 0 💬 1 📌 0

We’ll go beyond basic ETL:

- Handling nested & evolving schemas
- Accelerating pipeline creation with LLMs
- Moving from scripts to reliable ingestion workflows
- Validating data and schema changes using the dlt dashboard and dlt MCP
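One of the topics above, nested schema handling, can be illustrated with a tiny normalizer. This is a conceptual sketch only (dlt's actual normalizer also unpacks lists into child tables and tracks schema evolution); the double-underscore path convention mirrors how dlt names flattened columns:

```python
def flatten(record, parent="", sep="__"):
    """Flatten a nested dict into a single level, joining key paths
    with `sep`; lists are left as-is in this simplified sketch."""
    flat = {}
    for key, value in record.items():
        path = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, path, sep))
        else:
            flat[path] = value
    return flat

row = flatten({"id": 7, "user": {"name": "Ada", "geo": {"city": "Berlin"}}})
# → {"id": 7, "user__name": "Ada", "user__geo__city": "Berlin"}
```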

12.02.2026 18:05 👍 1 🔁 0 💬 1 📌 0
From APIs to Warehouses: AI-Assisted Data Ingestion with dlt · Luma This hands-on workshop focuses on building reliable data ingestion pipelines to data warehouses (for example, Snowflake) using dlt (data load tool), enhanced…

From APIs to Warehouses 📦

On Feb 17 (16:30 CET), together with DataTalks.Club, Aashish Nair will walk through building end-to-end ingestion pipelines with dlt, from raw APIs to production-ready warehouse loads.

Register here 👇

12.02.2026 18:05 👍 0 🔁 0 💬 1 📌 0
The Last Mile is Solved by Slop I didn't vibe-build a product. I wrote a messy scaffold that runs a pipeline, grabs the schema, and forces an agent to build a star schema. It works shockingly well.

What if dimensional modeling didn’t mean hours of boilerplate SQL?

We built an AI workflow that turns raw data into semantic models in minutes, powered by 20 questions.

Rethinking data transformation 👇

12.02.2026 16:40 👍 2 🔁 0 💬 0 📌 0

Who’s speaking in Berlin 👇

- Francesco Mucio: integrating 20+ APIs
- Bijan Soltani: real-world analytics
- Nemanja Bibic: ingestion for AI memory
- Ken Schröder: analyst-friendly ingestion w/ dlt on AWS
- Violetta Mishechkina: AI agents, data quality & what’s next

See you there 🚀

11.02.2026 21:04 👍 0 🔁 0 💬 0 📌 0
dltHub Community Meetup in Berlin with Cognee, Untitled Data Company, Gemma Analytics & Babbel · Luma Join us for the dltHub Community Meetup in Berlin. This evening is for curious minds who want to learn more about what we’re building at dltHub. We’ll share a…

Berlin, it’s meetup time!

Join us for the dltHub Community Meetup, an evening of real-world demos, lessons learned, and conversations with builders.

📍 Rosebud, Berlin
📆 Feb 17 | 18:00 – 21:00

Curious about what we’re building at dltHub? Come by 👋

11.02.2026 21:04 👍 0 🔁 1 💬 1 📌 0

Production pipelines don’t fail loudly, they drift.

Feb 12 · 16:00 CET - Online
Hands-on workshop on operating pipelines in production:
• schema changes
• backfills
• CI/CD
• long-term reliability

Register → https://community.dlthub.com/workshop-maintaining-servicing-production-data-pipelines

10.02.2026 17:27 👍 0 🔁 0 💬 1 📌 0

💘 Data Valentine Challenge started today.

5 days. 5 live data sessions with: 
@datarecce.bsky.social, Greybeam, @databasetycoon.bsky.social, @bauplan.bsky.social

Our slot: Wednesday → Pipelines That Don’t Ghost You

Feb 9–13 | 9am PT | Online

https://reccehq.com/data-valentine-week-challenge/

09.02.2026 22:16 👍 1 🔁 0 💬 0 📌 0
Create, debug & maintain dlt pipelines in production - dltHub Workspace Go from writing pipeline code to ingesting data and delivering reports via Notebooks - all in one flow. Discover over 10,100 REST API data sources today.

• Markets (Kalshi, Polymarket, DEX Screener)
• AI platforms (fal, Jina AI, Kie AI)
• Macro data (World Bank, Finnhub, Alpha Vantage, Frankfurter)
• Entertainment (PokéAPI, OpenF1)

Explore the contexts 👇

09.02.2026 10:52 👍 0 🔁 0 💬 0 📌 0

January’s Rising Stars in the dlt ecosystem 👇

Builders are vibe coding pipelines around real-time markets, AI dev platforms, macro data, and more.

What’s trending right now:

09.02.2026 10:52 👍 0 🔁 0 💬 1 📌 0
3.7x Faster Pipelines: Benchmarking Arrow & ADBC vs. SQLAlchemy for EL Moved 5M rows from DuckDB to MySQL 3.7x faster, reducing time from 344s to 92s by switching from SQLAlchemy’s row-by-row path to Arrow + ADBC’s columnar pipeline.

Arrow + ADBC + dlt just broke the EL speed limit.

5M rows DuckDB→MySQL:
SQLAlchemy 344s
Arrow + ADBC 92s (3.7× faster)

One line of code. Columnar end-to-end.
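The shape of the speedup, one round trip per batch instead of one per row, can be illustrated with the stdlib. This uses sqlite3 as a stand-in, not the actual Arrow/ADBC path from the benchmark; the table and data are invented for the sketch:

```python
import sqlite3

rows = [(i, f"name-{i}") for i in range(1000)]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")

# Row-by-row: one statement per row (the slow, ORM-style path).
for row in rows[:500]:
    conn.execute("INSERT INTO t VALUES (?, ?)", row)

# Batched: one call for the whole chunk (closer in spirit to a
# columnar transfer, where entire buffers move at once).
conn.executemany("INSERT INTO t VALUES (?, ?)", rows[500:])

count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
# → 1000
```

With Arrow + ADBC the per-row overhead disappears entirely, since data moves as columnar buffers rather than Python tuples.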

Benchmarks:

03.02.2026 18:00 👍 0 🔁 0 💬 0 📌 0
The Builder: Outliving the Modern Data Stack We were told that democratization meant 'safety,' but all we got were expensive cages. The era of the SaaS hostage is ending; the era of the sovereign Builder has begun.

The Modern Data Stack™ was a comfy lie that turned data engineers into passive consumers. Now the bill is due, and the market is splitting into vendor-locked hell vs. builder freedom.

Read the Builder's Manifesto:

28.01.2026 15:45 👍 1 🔁 0 💬 0 📌 0

📍 Amsterdam
🗓 Jan 29 · 3–6 PM (GMT+1)

Agenda highlights:
• Vision — Matthaus Krzykowski (@matthausk.bsky.social) & Julian Alves | dltHub
• Demos — Vincent D. Warmerdam (@koaning.bsky.social) | Marimo, Mehdi Ouazza (@mehdio.com) | MotherDuck
• First impressions — Thomas in't Veld | Tasman Analytics

24.01.2026 13:08 👍 0 🔁 0 💬 0 📌 0

Together with @motherduck.com, @duckdb.org, and marimo, we’re bringing together a toolkit built for full-stack data developers:
🔹 ingest with dlt
🔹 query fast
🔹 serve instantly

Built for builders, not enterprise overhead.

24.01.2026 13:08 👍 0 🔁 0 💬 1 📌 0
dltHub ❤️ Marimo ❤️ MotherDuck · Luma dltHub, Marimo, and MotherDuck are having a child. What are we looking at? dltHub provides the ELT, runtime, and execution layer, turning production data…

Want to influence the tools you use every day?

We’re hosting a Builder’s Data Stack meetup focused on developer flow, fast iteration, and shaping the roadmap with the community.

Pull up a chair:

24.01.2026 13:08 👍 1 🔁 0 💬 1 📌 0
The Plutonium Protocol: Engineering Safety for the LLM Intern Era The “data is oil” era is over. With LLMs, data is plutonium: powerful, toxic. Shift left and secure the reactor with 5 quality pillars.

An AI agent ignored a code freeze, wiped a prod DB, then hallucinated data to cover it up.

Data quality in the LLM era isn’t optional, it’s a safety problem.

We call it data as plutonium - powerful and dangerous without containment.

21.01.2026 18:55 👍 2 🔁 2 💬 0 📌 0

🎤 Call for Speakers
Using dlt in your projects? We’re opening the mic to the community for short 10–15 min talks sharing:
🛠️ real use cases
📚 lessons learned

If this sounds like you, reach out via the event page.

Let’s learn from each other in Paris!

15.01.2026 15:45 👍 0 🔁 0 💬 0 📌 0
dlt Paris Community Meetup #2 with dltHub & Polycea · Luma Join us for an evening of community and conversation! Co-hosted by dltHub and Polycea, this meetup brings people together for short talks and networking with…

🇫🇷 Paris data folks 💛

We’re hosting a dlt Community Meetup in Paris on Feb 4th (6–9 PM) together with Polycea.

A community meetup focused on practical takeaways, shared learnings, and conversations with people using dlt hands-on.

Join us here:

15.01.2026 15:45 👍 0 🔁 0 💬 1 📌 0

Data quality is the vegetables of data engineering: everyone agrees it's important, but nobody wants to implement it.

To increase your 𝚟̶𝚎̶𝚐̶𝚎̶𝚝̶𝚊̶𝚋̶𝚕̶𝚎̶ ̶𝚒̶𝚗̶𝚝̶𝚊̶𝚔̶𝚎̶ test coverage, check out these 11 delicious recipes.

https://dlthub.com/blog/practical-data-quality-recipes-with-dlt
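In the spirit of those recipes, here is one simple, serving-suggestion-only sketch: a row-level check that flags records missing required columns. The column names and helper are illustrative, not taken from the post:

```python
def check_required(rows, required):
    """Return rows that are missing, or have None in, any required column."""
    return [r for r in rows if any(r.get(col) is None for col in required)]

rows = [
    {"id": 1, "email": "a@x.io"},
    {"id": 2, "email": None},   # fails: null email
    {"id": 3},                  # fails: email column missing entirely
]
bad = check_required(rows, required=["id", "email"])
# → the two rows without a usable email
```

Running checks like this on every load turns data quality from a chore into a cheap, automatic habit.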

11.01.2026 17:07 👍 0 🔁 0 💬 0 📌 0

This gives you a self-healing system that keeps your semantic layer in sync as your data changes.

Huge thanks to Julien Hurault and Hussain Sultan for their contributions.

07.01.2026 21:31 👍 0 🔁 0 💬 0 📌 0