Systematic testing is such a powerful thing. We built it into the Verona runtime with an approach based on Pantazis Deligiannis’s PhD work. In systematic testing mode, we have a single thread actually running at a time and points in the runtime that may yield, with a deterministic pseudorandom number generator that decides whether to yield. Anything that involves communication between threads is a yield point. We can write tests and then run them with thousands of different interleavings in CI, but when one fails we can reproduce the failure locally by rerunning the test with the same seed.
I didn’t build any of this, but it was amazing while working on the runtime and things that use the runtime. CI would find bugs and then I could locally debug the one interleaving that caused problems. Pantazis went on to build something else that, I think, Azure is using around an evolution of the same ideas. Great to see it in more places.
Not done through the blog post yet, but I was struck by the list of early clients: MongoDB (okay), Ethereum Foundation (urgh), Palantir (“no comment”). I wish it was possible to build software that makes the world a better place without first empowering entities whose notion of “better place” is so different from mine.
The first customers for a product typically pay a higher proportion of the up-front R&D costs and effectively subsidise it for all later customers. I’m quite happy when they’re organisations that I’d prefer to have less money.
They also get more influence on the design choices and implementation decisions, directing them in a way that benefits them directly (and may benefit future users less) and premium access to the people doing the work (so more direct skill transfer, opportunities to poach them to work in-house, etc.), and of course they start extracting value of the product months or years in advance of their competitors. I don’t think it’s as bad a deal as you seem to suggest.
I don’t want to get too excited, but this is the product I’ve wanted for the longest time. I’m really curious about the technical details here - it almost sounds like they’ve created a custom container runtime that simulates an OS. Other than that, I can’t see how they could ensure determinism in any meaningful way. Or maybe it’s using hermit?
Anyway, if this was coming from anyone other than the FoundationDB team, I’d be very very skeptical. But this sounds like the real deal.
I hadn’t read about FoundationDB’s testing process before now, but “We never got tested by Jepsen because Kyle thought our tests were better” is one hell of an endorsement.
We thought about this and decided to just go all out and write a hypervisor which emulates a deterministic computer. Consequently, we can force anything inside it to be deterministic.
It appears that they also have language level integrations that give them the ability to affect thread/task scheduling as well, at least in some cases. I imagine they could modify the scheduler in the machine hosting the test workload to do the same, for systems which use OS threads.
This is very cool and a great way to sidestep the otherwise huge integration costs of model-based testing. Hopefully they release a free or lower-cost tier for FOSS projects some day.
That was my first thought as well. They can probably still make money off of the test runner hosting plus the container orchestration. But the hypervisor part at least should probably be open source.
I thought software verification was considered “hard” by complexity theorists. I guess the question now is what bugs exist in the hypervisor? Did they run it in a hyper-hypervisor?
My understanding is that this is basically a reproducible fuzzy tester on steroids for distributed systems. It can find bugs and help at reproducing them even if its correctness is not formally verified.
The “zero bugs” claim means “zero bugs reported by users”, not “mathematically guaranteed to have no bugs”
This looks very interesting! Looking forward to knowing more. Also, I’ll just put this in my bag for later use:
But starting a company is about banishing that thought and telling yourself that when you reach the pit full of poisonous spikes, you’ll figure something out.
Systematic testing is such a powerful thing. We built it into the Verona runtime with an approach based on Pantazis Deligiannis’s PhD work. In systematic testing mode, we have a single thread actually running at a time and points in the runtime that may yield, with a deterministic pseudorandom number generator that decides whether to yield. Anything that involves communication between threads is a yield point. We can write tests and then run them with thousands of different interleavings in CI, but when one fails we can reproduce the failure locally by rerunning the test with the same seed.
I didn’t build any of this, but it was amazing while working on the runtime and things that use the runtime. CI would find bugs and then I could locally debug the one interleaving that caused problems. Pantazis went on to build something else that, I think, Azure is using around an evolution of the same ideas. Great to see it in more places.
the foundationdb people built flow as an extension of C++ for similar reasons.
Not done through the blog post yet, but I was struck by the list of early clients: MongoDB (okay), Ethereum Foundation (urgh), Palantir (“no comment”). I wish it was possible to build software that makes the world a better place without first empowering entities whose notion of “better place” is so different from mine.
The first customers for a product typically pay a higher proportion of the up-front R&D costs and effectively subsidise it for all later customers. I’m quite happy when they’re organisations that I’d prefer to have less money.
They also get more influence on the design choices and implementation decisions, directing them in a way that benefits them directly (and may benefit future users less) and premium access to the people doing the work (so more direct skill transfer, opportunities to poach them to work in-house, etc.), and of course they start extracting value of the product months or years in advance of their competitors. I don’t think it’s as bad a deal as you seem to suggest.
I don’t want to get too excited, but this is the product I’ve wanted for the longest time. I’m really curious about the technical details here - it almost sounds like they’ve created a custom container runtime that simulates an OS. Other than that, I can’t see how they could ensure determinism in any meaningful way. Or maybe it’s using hermit?
Anyway, if this was coming from anyone other than the FoundationDB team, I’d be very very skeptical. But this sounds like the real deal.
I hadn’t read about FoundationDB’s testing process before now, but “We never got tested by Jepsen because Kyle thought our tests were better” is one hell of an endorsement.
It appears that they also have language level integrations that give them the ability to affect thread/task scheduling as well, at least in some cases. I imagine they could modify the scheduler in the machine hosting the test workload to do the same, for systems which use OS threads.
This is an incredible idea.
This is very cool and a great way to sidestep the otherwise huge integration costs of model-based testing. Hopefully they release a free or lower-cost tier for FOSS projects some day.
That was my first thought as well. They can probably still make money off of the test runner hosting plus the container orchestration. But the hypervisor part at least should probably be open source.
I thought software verification was considered “hard” by complexity theorists. I guess the question now is what bugs exist in the hypervisor? Did they run it in a hyper-hypervisor?
My understanding is that this is basically a reproducible fuzzy tester on steroids for distributed systems. It can find bugs and help at reproducing them even if its correctness is not formally verified.
The “zero bugs” claim means “zero bugs reported by users”, not “mathematically guaranteed to have no bugs”
This looks very interesting! Looking forward to knowing more. Also, I’ll just put this in my bag for later use: