The data engineer job title is due for an update.
Not because AI is replacing the role, but because AI is finally revealing what the role was always actually about.
Moving data was never the point. Meaning is.
Read more:
At the end of 2026, we will talk about the "AI Fan Effect" [en.wikipedia.org/wik...] and the invention of a new field: Psychology for AI. Perhaps this is the future of software engineering.
As we move from dashboards to autonomous agents, something breaks.
Systems of record capture what happened, not why.
Why data platforms need Truth Registries + Context Graphs for the agentic era:
www.dataengineeringw...
#DataEngineering #AgenticAI #Graphs #LLMs
Data Engineering Weekly's 254th edition is out. Context Graph is the new talk of the town!!
The companies that build the most boring data stack often win the market!!!
Prove me wrong.
Data Contract: There was no shortage of activity around the topic. Definitions were proposed and refined. Conceptual boundaries were drawn and redrawn.
I penned a reflection on Data Contracts here:
www.dataengineeringweekly.com/p/data-contr...
How to build a scalable shopping agent?
Here's a wild thought:
What if, and hear me out, we let humans click that Buy Now button? Just throwing ideas out there.
This week, it is mostly about Multi-Agent Architecture. Do you think the data infrastructure is ready for a multi-agent architecture? Where is the gap?
Is a semantic spec good enough to run an enterprise system? I listed the challenges of adopting the Iceberg REST Catalog.
Continuing Data Engineering Weekly's yearly Year in Review tradition, we published the 2025 Year in Review. What do you think is the most notable trend of 2025?
www.dataengineeringw...
Look at the tech stack IBM now controls:
Compute: Red Hat (Linux/OpenShift)
IaC: HashiCorp (Terraform)
FinOps: Kubecost
Streaming: Confluent (Kafka)
Vector/AI: DataStax (Cassandra)
Query Engine: Ahana (Presto)
Ingest: StreamSets
LinkedIn moves FishDB to Rust, DoorDash builds AI swarms, and Dropbox masters context engineering. 🤯 Data Engineering Weekly #247 is packed with system design deep dives from the best engineering teams.
If the Data Catalog is the answer for AI, the question was wrong.
We stopped asking if data was useful because storage got cheap. Now, "Dark Data" is actively poisoning your AI context windows with hallucination vectors.
Read about the Data Sustainability index
Open-source companies built their success on top of open-source platforms and benefited from community contributions and adoption, but now must abandon open-source principles to survive commercially.
The 244th edition of Data Engineering Weekly dives into:
AI agents as execution engines, LLM inference economics, databases for AI, personalization, and product evidence.
Read more: www.dataengineeringw...
#DataEngineering #AI #LLMs
Cricket has been India's greatest force in overcoming centuries of colonial suppression. Today's Women's World Cup win echoes the spirit of 1983, a triumph that will inspire generations to come. 🇮🇳
This is the most personal essay that I have written in Data Engineering Weekly. I shared a few key moments in my life and how fortunate I was to meet mentors along my professional journey, which shaped my career.
Data Vault vs. Dimensional Modeling vs. Medallion Architecture: when viewed through a modern enterprise data lens, these techniques interlock.
I break down how in Part 2 of my "Revisiting the Medallion Architecture" series.
Fivetran and dbt form a strong foundation for modern data infrastructure, known for bringing simplicity to complex engineering workflows. That said, calling it "open" data infrastructure feels like a stretch.
Should we update the definition of an "Analytics Engineer"?
As a data engineer, you can't treat zero-party (consent) and third-party (inferred) data the same way. This distinction is critical for building systems that are scalable, private, and trustworthy.
Here's my guide:
Could be. Composable CDP has not gained significant market share, as identity resolution is a key component that is often proprietary.
With Census already in with Fivetran and with dbt, it is most likely to evolve into a composable CDP.
Airbnb: Real-Time Key-Value Store
Airbnb's next-gen key-value store supports real-time ingestion and bulk uploads with sub-second latency, powering feature stores and fraud detection.
Read the full story here: www.dataengineeringw...
Grab: Partner Gateway Metrics at Sub-Second Speed
Real-time partner analytics at scale is tough. Grab uses Apache Pinot, Kafka-Flink ingestion, partitioning, and Star-tree indexing to cut query latency to <300 ms, enabling efficient API monitoring and fast issue resolution.