Castlevania [So called because it is a Metroidvania game set in a Castle.]
I am fully aware the author did this on purpose and I’m still mad.
(For those not in the know: Castlevania 1 is not a metroidvania, and I think the term was coined to describe Castlevania: Symphony of the Night (1997) and the following entries in the series, which in terms of gameplay are more closely related to Metroid than the previous Castlevania games.)
Castlevania 2 was a metroidvania before it was cool to call it that.
Castlevania 2 is on my long list of games that are so close to greatness but just totally flubbed the execution.
Tried Bisqwit’s translation yet?
I really enjoyed this. I wish I had more faith that this kind of software quality product could get traction at most shops, but IME it requires a different kind of effort that’s harder to find. I’m remembering my experiences trying to introduce property-based testing or TLA+ at previous workplaces: you have to know what you want in more detail than normal and be able to specify it formally. Then, after you’ve written it in strange syntax, you have to be willing to debug your tests and specs, often realizing the source of your bugs isn’t what you thought it was. It’s overhead, and most people don’t see the promise on the other side.
@hwayne (a TLA+ expert and consultant) and I had a conversation about why “unit tests + mocking” seems ubiquitous (and, after interviewing hundreds of engineers, synonymous in their minds with “testing” or “quality control”) where other approaches have failed to get traction. I like what he said, which was (paraphrasing; if you don’t like it, then I misremembered it): “writing unit tests takes the same skills as programming application code, so there’s very little to learn or fail at before getting to basic proficiency. Most other approaches don’t map as well.”
Anyway, good luck Antithesis!
(Blimey, I don’t like being advertised to, precisely because people are so good at making ads. This ad got me good.)
I don’t understand something about Antithesis. They say this isn’t fuzzing, but the result looks exactly like fuzzing to me: testing random things until you find the codepath you’re looking for, without regard to efficiency or common user inputs.
This is most obvious by the final TA”S” at the bottom where you can see the full run and Simon just spends a lot of time doing irrelevant jumping and whipping.
Why does Antithesis say that they’ve invented something other than fuzzing?
The fuzzing is just part of the system; the really cool part is the deterministic hypervisor (which is deterministic down to the CPU instruction level). That means all your code, even stuff that does IO/threading/distributed network IPC, is deterministic, for free. So you can take the same Docker containers you push to prod and run them all deterministically. The fuzzer generates inputs and uses coverage metrics/hints to guide what parts of the system are explored. I think there’s fault injection too, so you can simulate OS/disk/network failures and replay the whole run once stuff breaks in a way that you’ve hinted matters (kind of like property-based testing invariants).
So yeah, it’s more of a deterministic virtual machine + coverage-guided fuzzer with hinting + property-based testing system all rolled into one.
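To make the "coverage-guided fuzzer + deterministic replay" part concrete, here is a toy sketch in Python. Everything in it is made up for illustration (`run_game`, `fuzz`, the button alphabet, the planted bug); it is not how Antithesis works internally and it does not model the hypervisor at all. It only shows the general idea: all randomness flows from one seed, inputs that reach new coverage are kept and extended, and a failing input can be replayed exactly.

```python
import random

WIDTH = 8  # hypothetical level width; the player cannot walk past it


def run_game(inputs):
    """Toy system under test. Returns (coverage, crashed).

    The result depends only on `inputs`, so every run is reproducible.
    """
    coverage = set()
    x = 0
    for button in inputs:
        if button == "R":
            x = min(WIDTH, x + 1)
            coverage.add(("moved_right", x))
        elif button == "L":
            x = max(0, x - 1)
        elif button == "A" and x >= 3:
            coverage.add(("jumped_at", x))
            if x == 5:
                return coverage, True  # the planted "bug"
    return coverage, False


def fuzz(seed, max_iters=100_000, burst=6):
    """Coverage-guided search: keep inputs that reach new coverage, extend them."""
    rng = random.Random(seed)  # single source of randomness
    corpus = [""]
    seen = set()
    for _ in range(max_iters):
        parent = rng.choice(corpus)
        child = parent + "".join(rng.choice("LRA") for _ in range(burst))
        coverage, crashed = run_game(child)
        if crashed:
            return child  # replaying this string reproduces the bug
        if not coverage <= seen:  # reached something new, so keep it
            seen |= coverage
            corpus.append(child)
    return None


if __name__ == "__main__":
    trace = fuzz(seed=1)
    print("failing input:", trace)
    print("replay crashes:", run_game(trace)[1])
```

The returned trace will usually contain plenty of pointless button presses, which is the same effect as Simon’s irrelevant jumping and whipping: the search only cares about reaching new states, not about playing efficiently.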
(I work at Antithesis)
We use a lot of fuzzing techniques but we wrote our own fuzzer. We iterated on it by playing a lot of NES games like this until we had our deterministic hypervisor.
Now the same fuzzer controls the hypervisor to fuzz distributed systems/arbitrary Linux programs.
They invented a simulated environment that allows for “perfect reproducibility”.
What is it exactly? It’s always hard to tell from marketing materials what the thing is.
Is it fuzzing with a recorder so you can replay? I thought fuzzers already did that.
https://antithesis.com/blog/deterministic_hypervisor/
My understanding is that this is less fuzzing with a recorder and more sans-I/O at scale.
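For anyone who hasn’t run into the term: "sans-I/O" means writing the core logic so it never touches a socket or a clock itself, and the caller passes in bytes and timestamps instead. A minimal sketch in Python, where the class and method names (`LineTracker`, `receive`, `is_stale`) are hypothetical and exist only for this example:

```python
class LineTracker:
    """Sans-I/O core: no sockets and no clock reads.

    The caller supplies received bytes and the current time as plain values,
    so any run can be reproduced by feeding in the same values again.
    """

    def __init__(self, timeout):
        self.timeout = timeout
        self.buffer = b""
        self.last_seen = None

    def receive(self, data, now):
        """Feed raw bytes; return the complete lines they finish."""
        self.buffer += data
        # Everything before the last newline is complete; the rest stays buffered.
        *lines, self.buffer = self.buffer.split(b"\n")
        if lines:
            self.last_seen = now
        return [line.decode() for line in lines]

    def is_stale(self, now):
        """True if no complete line has arrived within `timeout`."""
        return self.last_seen is None or now - self.last_seen > self.timeout


if __name__ == "__main__":
    tracker = LineTracker(timeout=5)
    print(tracker.receive(b"he", now=0))
    print(tracker.receive(b"llo\nwor", now=1))
    print(tracker.is_stale(now=7))
```

The analogy, as I read it, is that a deterministic hypervisor gives you this property for the whole machine (network, disk, threads, time) without you having to restructure your code this way by hand.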