1. 48

  2. 3

    Is TLA+ useful for people writing single-threaded (and safety critical) software?

    I have the impression most use cases are in distributed/parallel computing.

    1. 4

      I’ve used it to verify terminating algorithms, like graph sorting, but you’re correct it’s more oriented to concurrency problems. If you’re writing something deterministic and threaded, the problem domain is a lot simpler and tools like Dafny and Frama-C become more feasible.

      1. 3

        It’s useful for code where order matters. I’ve used it for checking front end state machines (like Redux).

      2. 2

        Enjoyed the article. I’ve been slowing introducing formal methods as a design simulation and engineering thought tool. Basically, matching what the article discusses as debuggable design. One thing that I’ve found is hard for most software engineers is that they structure their specs like a system, not like a set of purer logic.

        To help get of over that boundary I’ve spent a lot of time with mCrl2 which allows the spec to look much closer to a system. But, I’m curious if anyone has advice on limitations of that toolset in comparison with TLA+ or Alloy?

        Perhaps, I need to write a mCrl2 version of articles example to show how it would look different…

        1. 3

          But, I’m curious if anyone has advice on limitations of that toolset in comparison with TLA+ or Alloy?

          Whoops, accidentally downloaded mCrl2

          Whoops, accidentally installed it

          Whoops, accidentally started reading the documentation

          1. 2

            For those that are interested in formal methods and haven’t seen mCrl2 before, I’m including my notes on what I belive makes it unique. I’ll include some links to parts of it’s toolset which aren’t covered specifically in it’s tutorials which fill roles you may be used to like generating counter examples and checking invariants.

            1. It excels at modeling how a system communicates. It models actions (which I think of as observable elements of a process) and state as 2 separate concerns. This has proven a valuable separation when modeling separate components/services of a larger system.
            2. It has a tremendous amount of tooling to describe how two specs are different. This really helps understand how a spec can change over time, and if those changes require updates to the different components. It also can help in finding simpler models in theory but I’m less sure how that process would be realized.
            3. The simulation helps both writers and readers of the specifications gain confidence it matches their expectations. It’s like having a usable REPL to each of your specs. The simulation also comes both with state visualization GUI tools which helps less technical folks remain engaged in the spec.
            4. It’s really helpful to be able to start with an FSM and convert that back to a process definition. My mind was blown when I realized the power of it’s different formats. Don’t skip on understanding the LTS Formats what what it means to check them for similarities. It might be a different way to specify things, but I’ve found it to be a great way to explore different solution approaches.
            5. The way mCrl2 deals with time, and asserting properties over time is really hard to grasp. Somehow I can’t internalize the meaning of the mu-calculus it uses. It seems like the most powerful tool it has, but I can’t find a way to describe it to others; so I’m sure I don’t understand it yet.
            6. It took me too long to find the tooling for asserting invariants. But it seems like a nice enough setup to it. I imagine this is the “easy-mode” where most examples use the mu-calculus I’m still trying to understand enough to use.
            7. Most tools can generate counter examples with a flag. Usually -c, but you often have to review it as a graph or then check it for other things like deadlocks to get a clear trace, like you get directly out of TLA+.

            Overall, I think it sits in an interesting place for formal methods. It’s less abstract (and less like pure logic) which makes harder to write directly off of requirements. Having said that I think the tutorials show lots of examples where going from a set of requirement can be specified and tested. When I compare it with TLA+, Alloy or VDM it’s exists somewhere between the TLA+ and VDM, perhaps closer to B or Z; but I have no experience with those. Finally, the simulation side of it has been the mostly incredibly valuable tool for sharing the specs. It also helps validation while writing which I now miss when utilizing other methods.

            I would love to hear anyone else’s experience with either mCrl2 or how they share spec’s with teams and get the teams to gain confidence in them.

        2. 2

          Just an readability tip: I didn’t realize that the name of each subject (company/project) was linked to a specific event, so I didn’t click on any name that I already recognized.

          On topic: nice resource, the copy is on point and focusing on time/money makes FM look really attractive.

          1. 1

            Thanks! Do you think it would be better with a note saying all of them are links, or as “Case Study: Company (link)”?

            1. 2

              Since you have succinct copy in each section, I would probably link text like

              Two days of TLA+ specification caught several major bugs


              Cockroach labs used TLA+ to model their parallel commit optimization

              But that is probably subjective. I’m not sure what other people think about linking (almost) full clauses.

          2. 1

            Nice writeup. I have a question about this claim:

            Your unit tests won’t find this.

            Doesn’t this depend on what your units are and what tests you’re writing?

            If I were writing this system, I would fully expect to have function objects representing this entire process, and to have an injected dependency like VerifyOwnership, and then a test case where a double of that dependency returns false, and then an assertion that the transfer didn’t go through.

            I’m not saying the proposed bug isn’t realistic… only that the implicit argument is essentially… “If you have an incorrect model of the system and are only unit testing at too fine a level of granularity, then you will miss this kind of bug.” From which you can’t conclude “only FMs will catch this bug.”

            I suppose my objection boils down to… whether you’re using FMs or ordinary development methods, some person has to think through all this correctly at some point, and you can’t get around that.

            Is the argument that writing a FM spec will make it much more likely that this correct thinking through occurs? Can a mistake not be made in writing the spec? Is it just less likely?

            Please lmk if I’m misunderstanding….

            1. 4

              Good question. It’s true that a good enough unit tester could catch this with unit tests, depending on the existing implementation of the system. If this involved to microservices, this would need to be an integration test instead, but it’s still possible to catch with tests.

              But catching it with tests means seeing that 1) this is an issue, and 2) this sequence of events might cause it. By contrast, the tla+ is a bit above beginner level, but definitely something my workshop students could write by the end of the workshop. You need a lot more skill to catch this with unit tests than with a software model.

              This is also a very simple business case. Imagine if we added in proper trades, or paying money for credits, or escrow. The number of possible things you’d have to test will grow exponentially, while the tla+ model will scale linearly. It handles finding the event sequences for you.

              (I also think that it’s easier to “think correctly” at the spec level, but that’s a lot harder to sell to businesses!)

              1. 1

                I’m not the author. I think the general argument is:

                1. specs are quicker and easier to write
                2. the computer is much better at testing for problems in the spec than you are likely to be

                So, trivially, yes, any bug you find with TLA could be found with unit tests, but you would have to have thought about the case carefully and you would have to have implemented the system. Seeing as you’re going to have to put all that effort into simulating the program in your head, why not just describe it in a concise modelling language and then have the computer prove if your desired properties hold?

                And, yes, you can absolutely make mistakes when writing a spec, but there is orders of magnitude less code in the spec than in the implementation. I think there’s a reasonable argument that the theorem prover will help you find bugs in the spec, too.

                1. 1

                  Okay, I figured that was the gist of it. Thanks.

                  And, yes, you can absolutely make mistakes when writing a spec, but there is orders of magnitude less code in the spec than in the implementation.

                  But to be clear, the relevant difference we’re discussing isn’t between the spec and the implementation, but the “spec as written in TLA as a way to protect against impl errors” and “the spec as written in a series of (hopefully) comprehensive unit tests as a way to protect against impl errors”.

                  Not being familiar with TLA, I could take the argument on faith that the former still might have a better chance of success, but, while willing to be proved wrong, I’d be skeptical of orders of magnitude difference between those two quantities. Mainly because I think failures of the human understanding of the domain are the high order bit in both cases, and that will always be at play.

              2. 1

                Hmm, not sure if it’s possible (I’ve read somewhere on Hacker News that Erlang companies say they’re Java shops to deceive competitors and retain their edge), but it would be super awesome to name the people who could back up and vouch for these quotes. Especially for light technical or non-technical stakeholders, having a network of people to reach out to in case things go wrong may help de-risk taking that first step. Just a friendly suggestion :)

                1. 11

                  I’ve read somewhere on Hacker News that Erlang companies say they’re Java shops to deceive competitors and retain their edge

                  Hah, I always thought this was a bit overblown. From my understanding, talking with insiders at companies, their “edge” is never really a specific language or anything. It’s always something like “we have a team of Math and CS PhDs who spend all day writing faster compression algorithms” or “we spend tons of money on developer training” or other human factors like that. Things that, even if you knew what they were doing, would be really hard to repeat for yourself.

                  IME the main reason companies don’t talk about their tech is for brand management. If a company says “yeah we use Erlang” on the record, they’re now That Company That Uses Erlang. If they eventually decide to stop using Erlang, they’re now That Company That Decided Erlang Was a Bad Idea, or even worse That Company That Keeps Chasing New Technology.

                  That’s one reason all of the case studies are things that are on-the-record, where a person at the company wrote a thing where they explicitly say “here’s how FM benefited us.” I have quotes by individuals on my consulting page. I want to add more, but getting them takes time.

                  1. 2

                    Yeah, I guess that makes sense. I talked to one company during one interview round that mentioned they did a rewrite of a service from Ruby to Elixir and back to Ruby, and now they have a particular connotation in my mind now despite my best intentions.

                    Whoops, I didn’t see that those are clickable links with named citations! I thought it was just “Amazon the company found TLA+ useful”. That’s totes on me. I think my only suggestion then might be adding names with the company name; “$NAME and $NAME at $COMPANY used TLA+” vs. “$COMPANY used TLA+”.

                    1. 3

                      Given you and @iamnearlythere had exactly the same issue, it’s 100% a UI failing on my end.