Heck yeah! Always awesome to see another TLA+ enthusiast here :D
My one nit here: while the time spent modeling in TLA+ is essential complexity, we shouldn’t brush off model-checking time, which is a serious cost. Complex models can take days or even weeks to run, and model optimization is a skill you have to learn to get really good at TLA+. That said, TLC is a pretty old model checker, and there have been a lot of advances in model checking that could make it faster. I’ve seen some interesting proposals for offloading some computations to SAT or SMT solvers and only switching to brute force for the stuff they can’t handle.
Absolutely, and I did try to point that out. I’ve found that one hidden time cost of creating specs is the debugging/iterating cycle. I’ve had a few times where I think the spec is at a good checking point and I run the model. Then I’m sitting there waiting for five minutes, having to judge for myself that this seems like way more states than it should be. That’s not a problem you typically run into with a test.
I haven’t really gotten into the optimization side much yet, since my specs are still pretty simple. It’s interesting that even what feels like the ultimate abstraction tool turns out to be leaky once you try to do something complex with it (e.g. model check it). I guess there’s no escaping the law of leaky abstractions.
Btw, thanks for Learn TLA! That’s what helped me bridge the gap from Specifying Systems to writing my own specs.
Then I’m sitting there waiting for five minutes, having to judge for myself that this seems like way more states than it should be. That’s not a problem you typically run into with a test.
One trick that helps here is to define a TypeInvariant that turns model boundedness into a checkable safety property. Something like:
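A minimal sketch of such an invariant; the variable names (counter, queue) and constants (MaxCounter, MaxQueueLen, Message) are illustrative, not from any particular spec:

```tla
\* Illustrative only: assumes a spec with variables counter and queue
\* and constants MaxCounter, MaxQueueLen, Message. Bounding the type
\* and size of each variable keeps the reachable state space finite.
TypeInvariant ==
    /\ counter \in 0..MaxCounter
    /\ queue \in Seq(Message)
    /\ Len(queue) <= MaxQueueLen
```

Checked as an ordinary invariant in TLC, this turns “the model stays within its bounds” into a safety property that fails fast instead of letting the checker grind on forever.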
Then if the model has an infinite number of states, TypeInvariant fails and you catch the problem early. Another trick is to watch the state queue and model diameter on the “model results” page as it’s checking. If the diameter suddenly shoots up to 30, something’s probably wrong.
Btw, thanks for Learn TLA! That’s what helped me bridge the gap from Specifying Systems to writing my own specs.
Glad to help :)
interesting proposals on offloading some computations to SAT or SMT solvers
Specifically doing this from TLA+? I’d be interested in links, if you have any.
It looks like I was misremembering a little. Most of the work on applying SMT to TLA+ is focused on the proof system. There’s Alloy Meets TLA+, where they used Alloy’s SAT translations to enumerate initial states for TLA+, but since then they’ve instead been porting TLA+ ideas into Alloy (that’s Electrum). There’s also Apalache, which is trying to build a symbolic model checker for TLA+, though I’m unsure whether they intend to replace TLC or just augment it.
Formal methods do not require any setup code because they are not tied to the actual components under test.
This is what makes me a bit wary of verifying things outside of the program itself (unlike e.g. types or contracts). We have a bunch of tools like doctest because we can’t even manage to keep docstring examples in sync with the implementation that occurs right below them.
Are there ways to bridge such gaps that could be added to a CI pipeline alongside other tests and verifiers (i.e. checking that the implementation only makes transitions the model allows, and that the model’s transitions are actually possible in the implementation)?
This is the main concern that’s come up as I’ve been sharing TLA+ with coworkers.
The purpose of docstrings is to give meaning to your code (i.e. explain the implementation). To fulfill their purpose, docs have to stay in sync. The problem is that there’s no automated way to evaluate whether they are in sync, because they’re words and, as you pointed out, we’re bad at checking words.
What I find interesting about lightweight formal methods is that they are useful entirely apart from the code. If the code doesn’t match the spec at all, that doesn’t diminish the value of the spec. They are a way of describing your thinking and then checking it. So I guess my response is that maybe we don’t need to automate checking the model against the implementation.
For a better idea of how this can all fit together, I highly recommend reading Toward Zero-Defect Programming by Allan M. Stavely. It walks through the Cleanroom methodology, which is a way of incorporating formal methods into the software development cycle. My main takeaway is that there’s a lot of value in keeping the spec separate from the implementation. Decoupling the two lets the whole team build a shared understanding of the program.
I wish articles like this had more concrete examples. Whilst reading, all the examples that came to mind from my experience fell into 1-3 of their proposed categories, making their taxonomy less fundamentally (if subjectively) interesting to me.
This makes me think of something I didn’t get into here: in practice, erroneous behavior is often the result of several errors. So if I have a concurrency issue, debugging it might also uncover a logic issue. They get fixed as part of the same bug but are different errors. The separation I presented here is a simplification of reality, but I think there’s value in analyzing errors through this lens of thinking vs. implementation.

As I was working through this blog post, it reminded me of when I first started decomposing programs into stateful and stateless pieces. At first, from an OO point of view, it didn’t make much sense because state was everywhere. Then, after doing functional programming in a functional language, the split became much more natural and valuable. TLA+ feels like the functional language here, in that you’re forced to think about what the thinking problem is, and bringing the implementation problems in only gets in the way.
After writing all that, I realize that doesn’t make it any more concrete!