Jepsen

@jepsen.mastodon.jepsen.io.ap.brid.gy

Breaking distributed systems, one fault at a time. [bridged from https://mastodon.jepsen.io/@jepsen on the fediverse by https://fed.brid.gy/ ]

2,032 followers · 1 following · 48 posts · joined 30.04.2024

Latest posts by Jepsen @jepsen.mastodon.jepsen.io.ap.brid.gy

Filesystem sync errors unchecked · Issue #7629 · nats-io/nats-server Observed behavior There are multiple places in server/filestore.go where the file .Sync() call is used: https://github.com/search?q=repo%3Anats-io%2Fnats-server%20%22.Sync()%22%20path%3Aserver%2Ffi...

In the vein of https://research.cs.wisc.edu/adsl/Publications/cuttlefs-tos21.pdf, Lee Brotherston writes that NATS sometimes neglects to check whether calls to `fsync` actually succeeded: https://github.com/nats-io/nats-server/issues/7629

09.12.2025 17:27 👍 3 🔁 0 💬 0 📌 0
Jepsen: NATS 2.12.1

A new #Jepsen report: we demonstrate data loss and persistent split-brain in the NATS streaming system, in response to simulated power failures/OS crashes.

https://jepsen.io/analyses/nats-2.12.1

08.12.2025 18:47 👍 24 🔁 12 💬 1 📌 1

Jepsen and Antithesis worked together to write a glossary for anyone building, testing, and operating distributed systems. It covers the basics of concurrency, consistency models and phenomena, faults, and some testing approaches:

https://antithesis.com/resources/reliability_glossary/ […]

02.12.2025 22:52 👍 16 🔁 5 💬 0 📌 0

A new #Jepsen release, 0.3.10, brings improved support for controllable random value generation, and running tests inside Antithesis. Jepsen's composable generator system has also been extracted to a minimal library, making it easier to re-use in other systems […]

02.12.2025 22:40 👍 4 🔁 0 💬 0 📌 0

The latest Jepsen talk, from Systems Distributed in June, goes live in 15 minutes. We'll be doing a live chat during the premiere, if you want to chat about databases and testing. :-)

https://www.youtube.com/watch?v=dpTxWePmW5Y

18.08.2025 16:50 👍 5 🔁 4 💬 0 📌 0

A new #Jepsen report! We tested early builds of Capela, an unreleased distributed programming environment, and found twenty-two issues, including four language problems, fourteen crashes or non-fatal panics, performance degradation, and three safety issues including lost update […]

07.08.2025 14:44 👍 8 🔁 3 💬 0 📌 0

An interview with Kaivalya Apte, on The GeekNarrator podcast. We talk about mapping properties to tests, type I and II errors, performance, LLMs, and more.

https://www.youtube.com/watch?v=IvE1VbOol88

06.08.2025 15:15 👍 7 🔁 1 💬 0 📌 0

The video of my BugBash talk, "Jepsen 17: ACID Jazz" is out now! https://www.youtube.com/watch?v=v8cG2hh10SQ

05.08.2025 17:54 👍 6 🔁 1 💬 0 📌 0
A distributed systems reliability glossary A list of key concepts for building and testing reliable distributed systems, with basic definitions and deep references.

Antithesis and Jepsen are releasing a glossary of terms useful in distributed systems testing: https://antithesis.com/resources/reliability_glossary/

15.07.2025 14:42 👍 18 🔁 13 💬 0 📌 0
A parody of John Waters' "Serial Mom", except it's "Serializable Mom". I'm holding scissors (to partition the network) and trying to channel my best homage to Kathleen Turner.

Systems Distributed. June 19-20, Amsterdam.

https://systemsdistributed.com/

07.06.2025 15:53 👍 50 🔁 6 💬 3 📌 4
Two bugs sitting on cozy chairs, sewing. The Tiger Beetle bug is gesturing "No! Stop!" to the other bug, who is sewing a shirt all wrong. I think the other bug is Jepsen. :D

The companion blog post from TigerBeetle is great too--dives into detail on the bug we found in the index-intersection query code:
https://tigerbeetle.com/blog/2025-06-06-fuzzer-blind-spots-meet-jepsen/

06.06.2025 11:34 👍 12 🔁 1 💬 0 📌 0

A new #Jepsen report! We worked with TigerBeetle to find seven crashes, elevated latencies during single-node failures, and requests which were retried forever in version 0.16.11. We found only two safety issues: missing results for queries with multiple predicates, and incorrect timestamps in a […]

06.06.2025 10:53 👍 42 🔁 12 💬 1 📌 2
Release 0.3.9 · jepsen-io/jepsen A medium-sized release, this version makes several quality-of-life improvements. We (finally!) download log files only once, rather than twice at the end of a run; this should make multi-GB log dow...

Jepsen 0.3.9 is now available, including a module for restarting flaky databases, a few improvements to downloading logs, more capable generators, and friendlier error messages. https://github.com/jepsen-io/jepsen/releases/tag/v0.3.9

07.05.2025 13:53 👍 10 🔁 1 💬 0 📌 0
How To Understand That Jepsen Report There is a new Jepsen report which, as always, is an entertaining read and gives some insight into the way things might be broken in subtle ways. I learned...

Justin Jaffray has written up a lovely companion to this piece, diving into Snapshot Isolation and how to understand transaction dependency structures. https://buttondown.com/jaffray/archive/how-to-understand-that-jepsen-report/

06.05.2025 00:58 👍 16 🔁 4 💬 0 📌 1

Thanks to AWS's Sergey Melnik, as well as HN commenters matashii and Ants Aasma, we now know that Long Fork in PostgreSQL clusters is caused by a disagreement between primaries and secondaries on the order in which transactions become visible […]

03.05.2025 15:11 👍 11 🔁 6 💬 1 📌 1

A small issue in Amazon RDS for PostgreSQL: at the "Repeatable Read" isolation level, which in PostgreSQL normally means Snapshot Isolation, Amazon RDS for PostgreSQL clusters appear to exhibit Long Fork. We observed this behavior in healthy clusters, in versions ranging from 13.15 to 17.4 […]

29.04.2025 14:29 👍 20 🔁 15 💬 1 📌 1

Added four new phenomena to Jepsen's docs: P4 (Lost Update), A5A (Read Skew), A5B (Write Skew) and Process.

https://jepsen.io/consistency/phenomena#sql

30.03.2025 20:28 👍 11 🔁 2 💬 0 📌 0

For the daytime crew: Jepsen's distributed systems class starts next week. The accompanying workshop, where we practice writing and debugging our own distributed systems, follows the week after […]

10.03.2025 18:27 👍 0 🔁 1 💬 0 📌 0
Release 0.3.8 · jepsen-io/jepsen The centerpiece of 0.3.8 is a new nemesis for corrupting files: jepsen.nemesis.file. This nemesis can be scoped to specific regions of a file, is aware of chunk structure (e.g. database pages) and ...

Jepsen 0.3.8 is now available. It includes a new nemesis for file corruption, and improvements to clock-skew tests. https://github.com/jepsen-io/jepsen/releases/tag/v0.3.8

06.03.2025 23:42 👍 5 🔁 0 💬 0 📌 0

What IS Strong Serializability, really? Ever want to try writing your own gossip service? Two open sessions of Jepsen's training classes are coming up: the Distributed Systems Fundamentals class, and (for the first time!) its accompanying workshop […]

05.03.2025 02:43 👍 0 🔁 2 💬 1 📌 0
BugBash 2025: software reliability conference Join us at a tech conference organized by Antithesis to explore reliable software development. April 3-4, 2024 at Yours Truly DC hotel, Washington D.C.

I'll be speaking on Jepsen at Bug Bash (DC, April 3-4), and Systems Distributed (Amsterdam, June 19-20). Come join!

https://bugbash.antithesis.com/

https://systemsdistributed.com/

05.03.2025 00:12 👍 11 🔁 3 💬 0 📌 0

Woke up to a bunch of excellent emails--y'all rock. Will try to write back to everyone in the next hour or so. ❤️

22.01.2025 14:40 👍 1 🔁 0 💬 1 📌 0

So, uh, the last time I had to call malloc() was a quarter century ago, and I am struggling to do basic tasks without corrupting the heap. I would love to hire one of you stripey-socked C witches for a very small contract to help finish […]

21.01.2025 23:35 👍 25 🔁 20 💬 1 📌 0

Added descriptions of the SQL isolation level anomalies P0, P1, P2, and P3 to the phenomena page: https://jepsen.io/consistency/phenomena#sql

22.12.2024 21:02 👍 21 🔁 4 💬 0 📌 0
GitHub - jepsen-io/maelstrom: A workbench for writing toy implementations of distributed systems. A workbench for writing toy implementations of distributed systems. - jepsen-io/maelstrom

Released version 0.2.4 of Maelstrom, Jepsen's workbench for writing toy distributed systems: https://github.com/jepsen-io/maelstrom

04.12.2024 18:49 👍 8 🔁 0 💬 0 📌 0
Distributed Systems Fundamentals Learn the theory and practice behind distributed systems, from safety and liveness to deployment and monitoring.

There's a few tickets left for the distributed systems class coming up in just over a week. If you'd like to join, now's the time. :-)

https://www.eventbrite.com/e/distributed-systems-fundamentals-registration-1060426286569?aff=mastodon

04.12.2024 18:29 👍 9 🔁 4 💬 0 📌 0
Jepsen Test on Patroni: A PostgreSQL High Availability Solution | Bin Wang - My Personal Blog

Bin Wang put together a Jepsen test for Patroni, a PostgreSQL replication system. All sorts of good stuff in here, including that the cluster can't handle a series of single-node failures: https://www.binwang.me/2024-12-02-PostgreSQL-High-Availability-Solutions-Part-1.html

03.12.2024 22:29 👍 16 🔁 3 💬 0 📌 0

Antithesis, Buf, and Jepsen are running a joint webinar on December 5th. We'll discuss a Kafka protocol safety issue, talk about the challenges of distributed systems testing, and show how Jepsen and Antithesis helped identify critical safety errors in Bufstream. Come watch Antithesis pause […]

27.11.2024 16:02 👍 19 🔁 8 💬 0 📌 0

Thanks to everyone who wrote in objecting to the report's description of data loss due to auto-commit. Some experiments this morning suggest that we got it wrong (at least for the official Java client). We've published an update to the report: https://jepsen.io/analyses/bufstream-0.1.0#updates

13.11.2024 16:59 👍 6 🔁 0 💬 0 📌 0

"That's never been my understanding of auto-commit, that would be a crazy default wouldn't it?"

I feel like there's a real need for database user research--building a quantitative and qualitative picture of how actual users understand the systems they use […]

12.11.2024 15:49 👍 8 🔁 0 💬 1 📌 0