1. 24
    1. 5

      Another great, thorough writeup. And kudos to the Dgraph team (and all of the other previous Jepsen customers) for putting their money where their mouth is and having their work publicly torture-tested like this.

      @aphyr, 3 questions if you don’t mind the tangent:

      • Any chance of a FoundationDB analysis?
      • Are there a lot of Jepsen users out in the wild? I imagine some bigger places have engineers with the need and capability to use it within their own organization, but I don’t know what kind of feedback (if any) they give you.
      • Any thoughts on integrating Jepsen with more formal methods? Any engineering pros/cons would be insightful.

      Thanks

      1. 7

        Thank you! To answer your questions…

        1. There’s no FDB analysis planned; they haven’t approached me, and since I just moved and took a couple months off, I really need to focus on taking paying gigs and rebuilding funds. My next client is all lined up though, and I should have more results to show in winter. :)

        2. It’s hard to say! I get PRs from maybe 5 active orgs, and I know of… maybe a dozen orgs who use it independently? I’ve also trained… maybe 150 people in writing Jepsen tests, but I don’t necessarily know whether those folks went on to use Jepsen at their orgs internally, adapted the techniques to their own test suites, or moved on to other things. I think the techniques are more important than the tool itself, so even if folks aren’t using Jepsen itself, I’m happy that they’re doing more testing, fault injection, and generative testing!

        3. I have sort of a “part of this balanced breakfast” take on Jepsen–it exists on a spectrum of correctness methods: normal proofs, machine-checked proofs, model checking, simulation testing, the usual unit & integration tests, Jepsen-style tests, internal self-checks, production telemetry, fault injection/chaos engineering, and user reports. In the early design phase, you want provable algorithms, but complexity might force you to give up on machine-checked proofs and move to model checking; model checking covers weird parts of the state space but isn’t exhaustive, so it’ll miss some things. The map is never the territory, so we need simulations and tests for individual code components and the system as a whole, to verify that each piece and the abstraction boundaries between them hold up correctly.

        As you move to bigger tests, you cover more system interactions, but the state space generally explodes: larger tests explore less of the state space. Jepsen’s at the far end of that testing continuum, looking at all the interactions of a real production system, but only over short, experimentally accessible trajectories–a simulation test, like FDB does, is going to cover a lot more ground in the core algorithm, but may not catch bugs at the simulation layer itself or in untested components, e.g. a weird interaction between the filesystem and database which wouldn’t arise in an in-memory test.

        And Jepsen is specifically constrained to simple, testable workloads; it’s never gonna hit the data or request volumes, or query diversity, that real users will push at the system–that’s why we need user reports, telemetry from production, self-checks, etc.

        There’s a lot of “formal methods” in Jepsen; every test encodes, more or less explicitly, an abstract model of the system being evaluated. We take a range of approaches for performance and coverage reasons, so some actually involve walking graphs of state spaces, and others are just checking for hand-proved invariants. Developing new and faster checkers is a great place to apply your formal methods knowledge, if you’re looking to contribute!
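        The “checking hand-proved invariants” end of that spectrum can be sketched in a few lines. This toy Python checker (not Jepsen’s actual API, which is Clojure; the history format here is a simplified assumption) verifies one invariant over a completed history of operations against a single register: every successful read returns a value that some earlier write actually wrote.

        ```python
        # Toy sketch of an invariant checker over an operation history.
        # Each op is a dict like {"f": "write", "value": 1} or
        # {"f": "read", "value": 1}, listed in completion order.

        def check_reads_follow_writes(history):
            """Return a validity verdict: reads must only see written values."""
            written = {None}  # the register starts empty (reads of None are OK)
            for op in history:
                if op["f"] == "write":
                    written.add(op["value"])
                elif op["f"] == "read" and op["value"] not in written:
                    # This read observed a value nobody ever wrote.
                    return {"valid": False, "bad-op": op}
            return {"valid": True}

        ok_history  = [{"f": "write", "value": 1}, {"f": "read", "value": 1}]
        bad_history = [{"f": "write", "value": 1}, {"f": "read", "value": 2}]

        print(check_reads_follow_writes(ok_history))   # {'valid': True}
        print(check_reads_follow_writes(bad_history))  # {'valid': False, 'bad-op': {'f': 'read', 'value': 2}}
        ```

        Real checkers for properties like linearizability are much harder, since they must search over the orderings permitted by concurrent operations rather than a single completion order; this single-pass check is the cheap, hand-proved-invariant style mentioned above.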

        1. 1

          re FoundationDB: I wanted to see that, too. His reasons for not doing it are described here. In that thread, he said he hasn’t tested it because “their testing appears to be waaaay more rigorous than mine.” It might still be good to have an independent replication, though. Plenty of scientific papers look rigorous until you find that they missed something, used an incorrect algorithm, or just made stuff up for fame or fortune.

          I say that as someone who was wowed by FoundationDB. Hopefully, Jepsen just confirms it was as good as it appeared. If not, people get to fix any problems he finds. It’s an all-win scenario unless he finds a problem they can’t fix somehow.

          1. 3

            That was based on a phone conversation I had with one of the FDB team members–they were doing a bunch of tests, like hardware faults and simulation tests, that weren’t really feasible for Jepsen because a.) I didn’t have custom hardware, and b.) simulation testing has to be built into the database code itself, and Jepsen takes a black-box approach. FDB also spun up their own Jepsen test, but I can’t tell you how deeply they explored there.

            Then FDB got eaten by Apple, and fell off my radar–but I’m happy it’s re-emerging now! We don’t have any plans to work together right now, and I’ve got my hands full with other clients, but I’d be happy to work on FDB tests in the future. :-)
