www.youtube.com/watch?v=syPE...
You're probably sick of me saying "B-tree" but these impact SO MUCH of database performance. They're used all over the place in Postgres, MySQL, and SQLite.
This week I broke down B-tree lookups and how the page cache makes lookups faster.
Choose your storage layer carefully.
Good catch. Fixed.
Blood, sweat, and tears
But seriously, these were JS + GSAP, and in this case Cursor helped quite a bit too.
Introducing pg_strict for Postgres.
Our new extension adds a safety net to Postgres, catching dangerous queries before they run.
www.youtube.com/watch?v=noPn...
If databases fascinate you like they do me, this article's for you!
Every time you interact with a website, database transactions are keeping your data consistent, safe, and isolated.
I wrote an interactive guide to how they work ⬇️
Tuning your database just right can be counter-intuitive, unless you understand all levels of the system.
Intuitively, most would say "more work_mem = better" for building indexes, but this hurts performance due to L3 cache behavior.
Great article by Tomas Vondra.
vondra.me/posts/dont-g...
A key difference from B-trees: searching for a single rectangle may require descending multiple tree paths! B-trees offer O(log n) search in the ideal case, but because R-tree bounding rectangles can overlap, worst-case search degrades to O(n).
Moving up the tree, the bounding rectangles get larger and larger, up to the root node storing a small number of large bounding boxes.
Entries in an R-tree are bounding rectangles. At the leaves we store the minimum bounding rectangle (MBR) for each region being stored, with a reference to the full geometry stored on-disk elsewhere. The parent entry of a leaf MBR stores a bounding rectangle that fully bounds all children.
They function similarly to B-trees: a tree structure with multiple entries per page-aligned node. This generally keeps the tree nice and shallow, and allows for efficient lookups across millions of elements stored on-disk. Like B+trees, they generally store data values only at the leaves.
R-trees are a powerful structure for indexing geometric data.
They're used by MySQL, and Postgres uses an R-tree-like structure via GiST in PostGIS.
🧵
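The overlap-driven search described in this thread can be sketched in a few lines. This is not from any post above — just a minimal Python sketch with a hand-built tree and hypothetical data values ("park", "lake", "tower"); a real R-tree would also handle inserts and node splits.

```python
def intersects(a, b):
    """Axis-aligned overlap test; rectangles are (xmin, ymin, xmax, ymax)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

class Node:
    def __init__(self, mbr, children=(), data=None):
        self.mbr = mbr              # bounding rectangle of everything below
        self.children = list(children)
        self.data = data            # geometry reference, leaves only

def search(node, query):
    """Descend EVERY child whose MBR overlaps the query rectangle.
    Unlike a B-tree lookup, several subtrees may qualify."""
    if not intersects(node.mbr, query):
        return []
    if node.data is not None:
        return [node.data]
    hits = []
    for child in node.children:
        hits.extend(search(child, query))
    return hits

# Two leaf MBRs overlap, so one point query can hit both paths.
park = Node((0, 0, 2, 2), data="park")
lake = Node((1, 1, 3, 3), data="lake")
tower = Node((8, 8, 9, 9), data="tower")
root = Node((0, 0, 9, 9), children=[
    Node((0, 0, 3, 3), children=[park, lake]),
    Node((8, 8, 9, 9), children=[tower]),
])

print(search(root, (1.5, 1.5, 1.6, 1.6)))  # both "park" and "lake" overlap
```

Note how the query rectangle falls inside two overlapping leaf MBRs, forcing the multi-path descent the thread describes.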
TLDR: io_uring won't help much if treated as a drop-in replacement for existing database I/O architectures. Better performance usually requires architectural changes, but when those changes are made, there are big performance gains to be had.
Here's the paper: arxiv.org/pdf/2512.04859
I'm excited about the database performance io_uring will unlock.
Last year I benchmarked Postgres 17 vs 18 to test the initial io_uring upgrades. I was surprised to see they weren't always a clear win for TPC-C.
This paper studies the potential, and the future looks good.
At PlanetScale we observe this first-hand for both MySQL and Postgres. We then get to go tackle these hard-but-fun engineering problems, so our customers don't have to.
Software at scale reveals the cracks.
Managing a system for a single use-case (databases or otherwise) can make it seem like a perfect solution. It just might be for that narrow environment!
At scale you see all the edge cases because you're operating on so many workloads.
This makes for fast change detection (O(log n)), and saves bandwidth since we only need to re-sync files that we know have been modified.
When syncing local file changes with a remote server (like Git) we can quickly tell if changes were made by comparing the client's root hash to the server's. If they differ, the tree is navigated to find the leaf node(s) with changes, and only those files need to be re-synced.
We then build a tree mirroring the directory structure. A parent node's hash is the hash of the concatenation of its children's hashes, so every inner node's hash depends on all of its descendants. This culminates in the root node, whose hash covers ALL the tracked source files.
What do Git, Cursor, and Dynamo have in common?
Merkle trees!
A great data structure for tracking file changes, facilitating incremental sync with remote servers.
Say we want to track changes to a codebase at a per-file level. We compute a hash for each source file, and these become leaf nodes.
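The leaf-and-root construction from this thread fits in a few lines. Not from Git or Dynamo themselves — a minimal Python sketch with hypothetical file contents, pairing child hashes upward until one root remains:

```python
import hashlib

def sha(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def merkle_root(leaf_hashes):
    """Hash pairs of children upward until a single root hash remains."""
    level = list(leaf_hashes)
    while len(level) > 1:
        pairs = [level[i:i + 2] for i in range(0, len(level), 2)]
        level = [sha("".join(pair).encode()) for pair in pairs]
    return level[0]

# Hypothetical tracked files; each leaf is a per-file content hash.
files = {"a.py": b"print('a')", "b.py": b"print('b')", "c.py": b"print('c')"}
leaves = [sha(files[name]) for name in sorted(files)]
root_before = merkle_root(leaves)

# Edit one file: its leaf hash changes, so the root changes too.
files["b.py"] = b"print('B')"
leaves = [sha(files[name]) for name in sorted(files)]
root_after = merkle_root(leaves)

assert root_before != root_after  # differing roots signal "something changed"
```

Comparing the two roots is the single cheap check that tells a client and server whether any sync work is needed at all.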
Need a break from AI in the timeline?
Listen to me talk about data organization instead :)
Friday's stream was a fun one. Sequential writes, binary search trees, block I/O devices, and B-trees. The latest slice dropped this afternoon.
www.youtube.com/watch?v=84b_...
Equip yourself with the fundamental building-blocks of software systems. This combined with the right LLM tooling can take you very far.
Merkle trees, consistent hashing, vector clocks, gossip protocols, and quorum algorithms were all previously known and had been used to build other software. But their unique combination to build Dynamo worked super well for Amazon's use-case, and helped them to scale to millions of DAU.
Though it was published in 2007 (nearly 20 years ago!), it was revolutionary in its day, and it's an example of how already-known technologies can be combined into something new and extremely successful.
If 2026 is the year of AI, it's also the year to read more papers.
LLMs make writing code cheaper. This places greater emphasis on architectural choices, understanding design tradeoffs, ensuring security, and building things people actually need.
Great example: yesterday I read the Dynamo paper.
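One of those Dynamo building blocks, consistent hashing, can be sketched quickly. This isn't Dynamo's implementation — just a minimal Python ring with made-up node names, using virtual nodes to smooth out load between physical nodes:

```python
import hashlib
from bisect import bisect_right

def ring_pos(key: str) -> int:
    """Position on the hash ring for a key or virtual node."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """A key is owned by the first node clockwise from its ring position."""
    def __init__(self, nodes, vnodes=8):
        # Each physical node gets several virtual positions on the ring.
        self.ring = sorted(
            (ring_pos(f"{node}#{i}"), node)
            for node in nodes for i in range(vnodes)
        )

    def node_for(self, key: str) -> str:
        positions = [pos for pos, _ in self.ring]
        i = bisect_right(positions, ring_pos(key)) % len(self.ring)
        return self.ring[i][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("user:42")  # deterministic owner for this key
```

The payoff is the property Dynamo relies on: removing a node only remaps the keys that node owned, while every other key keeps its current owner.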
2026 is the year to end TikTok brain.
Instead, learn database internals on YouTube.
Speaking of which, another dropped today (link below).
Cameras, lenses, framing, and everything in-between have fascinated me for many years.
This morning I read Bartosz Ciechanowski's article on the subject. It's the best explainer I've seen. The interactivity really sells it.
Great article to kick off your year with:
ciechanow.ski/cameras-and-...