1. 26
  1.  

  2. 10

    I was recently working with coreos/etcd’s Raft implementation, specifically it’s PreVote feature which was something alluded to in the Raft dissertation. I convinced myself that specifying PreVote in TLA+ would help me better understand it (etcd’s PreVote implementation has had some issues as of late), which I did here: irfansharif/raft.tla.

    Some other resources that came in handy learning TLA+ then were Leslie Lamport’s video series on the subject and Microsoft Research’s Dr. TLA+ series.

    1. 3

      I convinced myself that specifying PreVote in TLA+ would help me better understand it

      How much did it help?

      1. 5

        Quite a bit, I was sure to familiarize myself with the original Raft TLA+ spec (Appendix B.2 from the thesis paper) which itself was a very succinct representation of the possible state changes in vanilla Raft. Also note what I mentioned above is about understanding raw TLA+, the post linked covers PlusCAL which adds a C like pseudo-code interface to TLA+ that is essentially transpiled to TLA+ – not something I covered.

        EDIT: welp, didn’t realize I was responding to the author of the post.

        1. 6

          EDIT: welp, didn’t realize I was responding to the author of the post.

          Hah, no worries! I’m just glad to get all of the info I can about how people use TLA+. I mostly pitch it as a way to find bugs in a spec, but just as beneficial is using it to understand a spec. It’s a much more subtle bonus, but writing a spec means not being able to handwave or glaze over what you actually want to happen. The guide is something I really care about, and part of caring means constantly revising it to make it more useful to readers.

          the post linked covers PlusCAL which adds a C like pseudo-code interface to TLA+ that is essentially transpiled to TLA+ – not something I covered.

          PlusCal is sort of an interesting case. I chose to focus on it because I think it’s easier to learn and, for a lot of common use cases, is “good enough” for most people. But it’s definitely less powerful than raw TLA+. For example, in PlusCal you can’t simulate a system reset by forcing all processes to simultaneously go back to their starting state, or have one process create a second at runtime. I think that if TLA+ ever becomes popular, most people will be working in DSLs like Pluscal – C to TLA+’s assembly language.

      1. 3

        Oh sorry, did not see it.

        1. 4

          Good thing its worth the repost!

      2. 4

        The big thing I’m confused about with TLA that I haven’t seen any of the commentary about it handle is, how can I use TLA to say something useful about software that I’m currently working on? For instance, right now I’m working on an interpreter for a toy programming language, and I have no idea how I could apply TLA to make that codebase less buggy, or make my toy language design better in some way.

        1. 3

          I’ve not worked on interpreters, so I can’t think of a good example off the top of my head, but I’ve written and presented on how TLA+ has been useful for adding functionality to a web app. I also like this article on detecting thread deadlocks, and this example of simulating trading algorithms.

          1. 3

            Just wanna let you know that all your work is appreciated, I’ve got “learn TLA” as my main go-to task over the upcoming Christmas break. Waiting on my copy of LL’s book to arrive before then as well, I hope :)

            Edit: Me fail english? That’s unpossible!

            1. 2

              Aww, thank you! If it doesn’t arrive in time, it’s also downloadable free here.

            2. 2

              Two questions (as I work through this list).

              I’m just finishing watching your Strangeloop talk, got to the part about “Redacted” where you describe TLA+ finding that your fixes to satisfy the invariant cause another invariant to no longer hold. I was curious if TLA+ is also capable of finding contradictory invariants; specifically non-obvious ones.

              My second question relates to the first, at $work, I’m responsible for maintaining and designing the infrastructure surrounding one of our main applications. I’m not alone in this task, but I’m significantly more experienced (most of the team is new as of the beginning of this year, I’ve been there for 5 years) than the rest of the team so naturally a lot of things fall to me. The existing architecture we maintain is gnarly and old, having been designed by poorly trained orangutans gentlemen before my time. We are simultaneously planning a port to a new infrastructure with some architectural changes, and I’m curious about applying TLA+ to the design of this new infrastructure. Part of the issue with the old design is the plethora of constraints applied to everything due to old contracts, SLA expectations, etc; a major concern that I have been working through is ensuring that we aren’t painted into an impossible corner, and while my background is in math, doing that much of it by hand is not… y’know… ideal. I was curious if you know of any resources on infrastructural/low-level/devops-y design with TLA+. It seems like it’d be a natural fit (esp. as I try to get us to a newer model, but also for diagnosing systemic errors in the old).

              EDIT: Also, just finished the talk as I wrote this, great talk – everyone should go watch it, it’s the one hwayne links above.

              1. 2

                I’m just finishing watching your Strangeloop talk, got to the part about “Redacted” where you describe TLA+ finding that your fixes to satisfy the invariant cause another invariant to no longer hold. I was curious if TLA+ is also capable of finding contradictory invariants; specifically non-obvious ones.

                Ooh, good question. The short answer is “probably”. The long answer is it depends.

                Invariants in TLA+ don’t have any special status: they’re just operators that return booleans. When you declare it an invariant, the model checker raises an error if it’s ever false. If you want to find out if two invariants are contradictory, that’s logically equivalent to determining if, in all states, both don’t hold. So you could make your invariant Contradictory == ~(A /\ B). If that doesn’t hold, TLA+ will give you a state where both invariants are true at the same time.

                The two caveats are 1) TLA+ only checks finite state spaces, so if the model passes, you aren’t guaranteed it will always be contradictory. It might hold for a larger model. And 2) This is only applies to safety invariants, aka “bad things never happen” invariants. There’s also liveness invariants, which are things like “all messages sent are eventually received” or “the database will eventually reach consistency.” Those can’t be composed in the same way.

                I was curious if you know of any resources on infrastructural/low-level/devops-y design with TLA+. It seems like it’d be a natural fit (esp. as I try to get us to a newer model, but also for diagnosing systemic errors in the old).

                One of the problems with TLA+ as it currently exists is the community is tiny and most of the specs are private. I did a quick demo of modeling zero-downtime deployments but that’s the only public devops thing I can think of. “Redacted” was a pretty heavy infra project, though, so I guess I can vouch for TLA+’s applicability in those kinds of situations.

                1. 1

                  Invariants in TLA+ don’t have any special status: they’re just operators that return booleans…

                  Interesting. I suppose once you have the invariants in a reasonably notation, it’s not too hard to put them through some other sort of Satisfier and have it look for contradictions. Half the battle is probably getting to the point of a formal notation for them.

                  I did a quick demo of modeling zero-downtime deployments… so I guess I can vouch for TLA+’s applicability…

                  This is good, this is the sort of thing that helps me justify this isn’t a rabbit hole when I tell my boss I’m working on it. :D

                  1. 2

                    Half the battle is probably getting to the point of a formal notation for them.

                    It’s not actually that bad. TLA+ is a first order logic, so it’s pretty easy to do stuff like

                      OneTargetPerWorker ==
                        \A w1 \in Workers, w2 \in Workers:
                            target[w1] = target[w2] =>
                                target[w1] = NULL \/ (w1 = w2)
                    

                    Most of my trouble comes from running into TLA+ footguns, which is why I spend so much time writing about how to avoid TLA+ footguns.

                    This is good, this is the sort of thing that helps me justify this isn’t a rabbit hole when I tell my boss I’m working on it. :D

                    Awesome! Feel free to message me if you need any help with anything!

            3. 1

              It’s kind of the reverse of the “rest of the fucking owl” problem :)

              1. 1

                If it has concurrency, it’s possible TLA+ could help with ordering problems. Additionally if it was fault-tolerant with several keeping in sync with some protocol. Otherwise, Alloy for data structures or SPARK Ada for correctness would be more appropriate far as lightweight, formal methods go.

                http://alloy.mit.edu/alloy/

                https://en.wikipedia.org/wiki/SPARK_(programming_language)

                1. 2

                  I’ve started learning Alloy and it’s really, really cool. I want to spend more time improving before I make sweeping statements about it, but I think it might actually beat TLA+ for the case of non-concurrent time modeling. That’s because you can represent a single data structure undergoing mutation as an ordered sequence of structures where each pair is restricted to differ by a mutation. Alloy can then look at the entire “timeline” at once, while TLA+ is restricted to treating the mutations as separate states in a behavior.

                  The reason TLA+ does that, of course, is to provide first-class temporal support, so adding multiple structures changing over time is trivial, while for Alloy you’d need to add an explicit time variable and then everything gets really messy. I think it’ll be good to know both.

                  Also the Alloy resources are so much better than the TLA+ resources. I wonder if I would have been willing to learn TLA+ if I started with Alloy. I hope I would have.

                  1. 1

                    I only recently came across Alloy, it does indeed seem really interesting and the tutorial I found looks great. Unfortunately for me the UI NPEs the instant it receives a keypress, so I can’t use the tools. At some point I might extract the sources and try to debug it, but I’ve not decided to act on that yet because I fear it’s likely to lead to a lot of work :)

                    1. 1

                      Very interesting observation on time modeling. I was hoping you’d learn Alloy. You have a talent for turning this odd tech into stuff people can understand and use. I do encourage you to wait on the writeup so you can apply Alloy to various structures and use-cases first. Definitely skim through the publication list to see the kinds of things they used it on. That was everything from data structures to security policies to component repair from contracts. Think of the things you’ve seen real-world developers deal with that’s similar. Small examples with tricky-to-test flaws or that would require brute-force with time consuming pile of tests are good examples. Attempts at Alloy on such diverse problems will show you its strengths and weaknesses. If you commit to it, I’m sure we’ll enjoy your write-up at least as much as the TLA+ one. :)

                      I’m glad you learned TLA+ first, though, as I might have never had a chance to learn it if I had to face the documentation and guides you did. They did not look beginner-friendly compared to Alloy. I had seen it first in a high-assurance VPN [1]. Quite a few people used it for configuration verification. Esp networked- or component-based. I bet that still has potential with all the black boxes I see ops people flinging together in submissions.

                      [1] https://lobste.rs/s/whvs2u/lessons_learned_from_building_high