1. 18

  2. 4

    Here is the crux: for professional programmers, writing code that implements an unambiguous formal specification is easy.

    Really? These specs exist at an abstraction layer that has no notion of details like sockets, queues, syscalls, filesystems, schedulers, etc., and all of the errors that come from those things. In my experience, that’s the stuff that determines whether or not a system is successful. The spec is the easy part! The difficult bit is the translation.

    1. 3

      That’s what I was saying here. I think there are broadly two kinds of systems: ones where the implementation is more complicated than the spec, and ones where the spec and implementation are very closely related. I think this quote is true of the latter, and there are real systems like this. But many applications are more like the former.

      One really interesting thing about diving into the formal idea of implementation (via refinement) is that if there’s any error in the socket / syscall / scheduler logic, then the implementation does not correctly implement the spec. That’s what’s so powerful about having the spec - yes it’s much simpler, but that simplicity can wrangle the complexity of all of the implementation-level craziness.

      1. 1


        if there’s any error in the socket / syscall / scheduler logic, then the implementation does not correctly implement the spec

        It’s been my experience that the socket / syscall / scheduler logic can’t really be evaluated as containing an “error” or not. Similarly, any spec defined in a language like TLA+ doesn’t really have a “correct” implementation in practice. The simplifying assumptions made by those higher levels of abstraction tend to be impossible to satisfy in any real-world system. (Reliable message delivery is a common example – you don’t get it from normal TCP connections!) So there’s always some fudge factor in the translation, which is where all of the complexity tends to live.

        1. 9

          Unreliable message delivery is pretty straightforward. Let’s say for simplicity that “message delivery” is just a status variable:

          SendMsg ==
            /\ status' = "sending"
          DeliverMsg ==
            /\ status = "sending"
            /\ status' = "sent"
          \* Our property
          Liveness ==  <>(status = "sent")

          Then unreliable delivery is just a nondeterministic selection of a different status:

          DeliverMsg ==
            /\ status = "sending"
            /\  \/ status' = "sent"
                \/ status' = "fail"
          RetryMsg ==
            /\ status = "fail"
            /\ status' = "sending"

          Now Liveness doesn’t hold as a property, because we could have the infinite loop DeliverMsg -> RetryMsg -> DeliverMsg -> RetryMsg ...

          What does hold, though, is Liveness modulo strong fairness:

          WeakerLiveness == SF_status(DeliverMsg /\ status' = "sent") => Liveness

          I.e., Liveness holds as long as we eventually get through the retry loop, even if it takes a long time. We can extend this to cover “the message was received but the sender doesn’t know it was received” by splitting the state of the message into separate state machines. I’m doing that in a project right now and it works out well.

          (What about an undeliverable message? We can add a third end-state to DeliverMsg, like status' = "error", that can’t be retried. Under that circumstance WeakerLiveness doesn’t hold either.)
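          A minimal sketch of that third end-state, extending the DeliverMsg action above (only the "error" branch is new):

          DeliverMsg ==
            /\ status = "sending"
            /\  \/ status' = "sent"
                \/ status' = "fail"   \* transient: RetryMsg can re-send
                \/ status' = "error"  \* permanent: no action leaves this state

          Once status = "error", the action DeliverMsg /\ status' = "sent" is never enabled again, so the strong-fairness antecedent is vacuously satisfied while Liveness fails, which is exactly why WeakerLiveness no longer holds.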

          1. 3

            Neat! I’ve not seen that technique.

          2. 3

            Isn’t that assuming that message delivery errors are missing from the spec? That’s something I would definitely put in a spec, even if it’s as high level as “an error occurred.”

            Specs can have error handling in them, what you abstract and what you choose to model is up to you.

            1. 3

              In TLA+ it’s pretty standard to model messages getting lost or delivered out of order. Usually this is done by putting all messages in transit in a big unordered set in a network variable, then having a LoseMessage action nondeterministically remove an element of the set (or have the message not be put in the set in the first place). Recipients nondeterministically select a message in the set addressed to them to process.
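              A minimal sketch of that pattern (the variable and action names here are illustrative, not from any particular spec):

              VARIABLE network  \* set of messages currently in transit

              Send(m) ==
                network' = network \cup {m}

              \* Any message in transit can be dropped at any time.
              LoseMessage ==
                \E m \in network : network' = network \ {m}

              \* A recipient nondeterministically picks a message addressed to it.
              Receive(r) ==
                \E m \in network :
                  /\ m.dest = r
                  /\ network' = network \ {m}

              Modeling the network as an unordered set also gives you reordering for free: Receive can pick any in-transit message, not just the oldest one.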

              1. 4

                Right. Whenever I hear “this can’t be evaluated,” it generally means that there’s some aspect of the system that’s taken to be magical, when all that we have to do is explicitly model that part. I’ve yet to encounter some kind of magical algorithm in the real world that can’t be modeled.

              2. 1

                There’s actually no way for a node to reliably know if a message it sends to a remote node has been delivered or not! (At least not without a consensus protocol – which has to be specified against a network layer that doesn’t provide reliable delivery 🙃) Receiving an error does not mean the message was not received by the remote peer, and not receiving an error does not mean the message was received by the remote peer.

        2. 2

          I like your point that an ironclad proof showing that the implementation refines the spec isn’t really necessary, because implementation is the easy part. Then you go on to wonder if LLMs can be used to check the implementation against the spec. I’ve actually been thinking about that same idea a little differently: if we get AI good enough to reliably produce code (without hallucinating libraries, etc.), then writing formal specs for the AI could be a big advantage over going back and forth with the AI in natural language.

          1. 4

            because implementation is the easy part

            That’s not always true; in fact, one of the best uses of refinement is when the spec is much simpler than the implementation. This is very often the case in business software deployed on the web, where the business rules are relatively simple, but the logic is spread out across a client + multiple services, or something like that.

            Also, my favorite use of refinement is generating tests. Proving refinement is a whole lot of effort, but checking for refinement can be done pretty much automatically when given a spec and an implementation. Examples here and here.
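            For concreteness, one common way to check (rather than prove) refinement in TLA+ is to write a refinement mapping and let TLC model-check the implication; a sketch, where the module names and the abstraction function AbstractView are hypothetical:

            \* In the low-level module, with concrete variable implState:
            Abstract == INSTANCE HighLevelSpec WITH status <- AbstractView(implState)

            \* TLC can check that every low-level behavior is also a
            \* high-level behavior by using Abstract!Spec as the property:
            THEOREM ImplSpec => Abstract!Spec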

            1. 1

              I agree, it depends on how concrete your specification is. If you only have an abstract functional specification, it might be worth refining it to get closer to what the actual implementation will be like. Like in your example of logic being spread across multiple services, it could be easier to reason through that situation in a modeling language before moving on to the full implementation.

              The articles you linked look interesting, I’ll definitely be going through your Alloy articles.

          2. 2

            Super cool example. This is definitely TLA+’s sweet spot: a distributed data management system.

            Much of the difficulty when writing code comes from unclear requirements and uncertainty about how the code will interact with the rest of the system. Working from a pre-existing formal specification ensures all that heavy lifting has already been done!

            I wish I could upvote this a thousand times. As programmers, and probably especially as programmers in the agile generation, we conflate design and implementation. Agile railed against formal documentation and up-front design, but what if we threw the baby out with the bathwater?

            I see specifications as a big time saver in practice. The key is being able to incrementally specify, and only specify the new change that you’re working on. In practice, I don’t see how to work on a team without doing this, and every minute we’ve spent refining feature-level specifications has been rewarded with people on the team being able to work independently, i.e. up-front specs increase concurrency and throughput.

            This has gotten me really interested in spec- and model-based workflows. Actually writing down the spec formally isn’t that much extra work, but once you have it, it opens up a slew of other doors (model checking, model-based test generation, etc.).