Neat that it was straight-forward to solve in both TLA+ and Hypothesis. The latter looks extremely readable, too, partly thanks to Python itself. I did a quick Google to see if anyone tried it in a proof assistant or something. Didn’t look past page 1 but that had a number theory solution:
This post couldn’t have been timed better. I’m exploring formal methods for proving certain invariant about a system I’m designing, and I stumbled accoss the lamport tla programs.
I spent half a day trying to get TLA running and failed. As far as I can tell tla+ was an ocaml program that relied on a bunch of random SAT solvers from various universities (which were very difficult to run). It seems like maybe it’s been rewritten in java now?
If someone who uses TLA+ would make a Dockerfile that would be much appreciated.
The python version looks quite interesting, is someone familiar with the problem space able to chime in on the features of Hypothesis vs TLA+?
Hi, Hypothesis author here (though I’m only passingly familiar with TLA+).
The big difference is that Hypothesis is not a formal methods system, it’s a way of doing normal testing augmented by an external source of data and decision making. This is both its advantage and its disadvantage over TLA+.
It’s an advantage because it’s flexible and easy to use (the bit shown in the article is actually one of the least user friendly bits of Hypothesis IMO) - there are a lot of things that you can readily test with Hypothesis that it would probably be a real struggle to model in TLA+ - anything you could write tests in Python for you can test using Hypothesis with almost no extra work.
It’s a disadvantage because this means that there are bugs that TLA+ will catch that there’s just no way that Hypothesis will ever find. Hypothesis is a well tuned and somewhat smart random tester, but there exist “deep” bugs that no random tester will ever find that formal methods can just spot immediately because they understand the system thoroughly.
Fortunately it turns out that a lot of interesting bugs are surprisingly shallow, and a lot of software has such bugs, so in practice I think you get a much better effort : reward ratio by starting with something like Hypothesis, but once you’ve done that if you still wanted to try TLA+ it would probably flush out some interesting bugs that Hypothesis had missed.
The other big difference is that TLA+ is explicitly about modelling concurrent systems and I haven’t done much with concurrency in Hypothesis, mostly because Python.
The other other big difference is that you can do useful stuff with Hypothesis after like an hour while you have to invest a couple of days into learning TLA+ to get any benefit. I love TLA+ as much as the next guy but I’m not gonna pretend it’s user-friendly.
The best way to start with TLA+ is probably by downloading the TLA+ Toolbox, which is an all-in-one IDE for modelling and testing. The model checker (TLC) does an exhaustive search, while the proof system (TLAPS) uses SAT solvers on the backend. For the most part you can get pretty far with just TLC, and I’ve not yet had a need to write full proofs.
As for TLA+ vs Hypothesis, I want to caveat that I’m much more familiar with TLA+ than Hypothesis, so I can’t speak to accurately to it. From my understanding, though, they’re for different things. Hypothesis is designed for testing actual code: it goes right in your test suite and can partially replace your unit tests. That’s what makes it so fantastic: there’s almost no overhead to start using it and it provides much more rigorous testing than simple unit tests do.
TLA+, on the other hand, is more for testing designs: you have to translate it to code, and that leave room for errors. But since it’s declarative, it can specify complex properties in very simple ways. And since it’s abstract, it natively handles liveness and concurrency problems a lot better than a concrete tool, like Hypothesis, can.
So I think they’re complementary: Hypothesis works great for testing your code, while TLA+ works great for making sure the algorithm itself isn’t broken.