1. 25
    1. 11

      From my misspent time in physical sciences: All models are wrong, some models are useful.

      1. 5

        One of my favorite quotes. A model has to leave details out at the end of the day. So all models are definitely wrong. But they become quite useful when they’re wrong in inconsequential ways.

        1. 5

          Alternative phrasing, “a map is not the territory”:


          1. 2

            That’s actually even better. Maps were created because the territory was unnavigable. That’s exactly what I find useful about models - they make implementations navigable.

      2. 5

        All models are wrong: database models of information, mental models of systems, code models of reality. It’s our job as software engineers to make these wrong models useful.

        1. 2

          This should be the tagline for “Data and Reality”

    2. 4

      I know, vaguely, that modeling and formal methods have been explained at length numerous times on here, but could anyone point me to their personal favorite introduction to designing and writing models? I found this article compelling, but I’m not sure what choices make for good models, nor how to answer some of the questions raised in the article (e.g. the verification gap).

      1. 8

        Here’s one of my favorite intros: modeling the browser same-origin policy in Alloy. To get a model of the same-origin policy, you have to model browsers, servers, URLs, and HTTP requests. Real-world implementations of these are obviously gigantic, but the model is small and digestible. As @ahelwer mentioned, Specifying Systems is like the holy grail of modeling books. Big +1 there.

        I’ll also add that modeling itself doesn’t have to be tied to formal methods. I’ve modeled several calculations at work in spreadsheets, and they come in handy in a few different ways.

        I’ve also created different models in some previous posts:

        Property-Based Testing Against a Model of a Web Application uses an executable model to express the domain logic of a web app and uses it in a test against the implementation - that’s one solution to the verification gap.

        In Misspecification I built a model of a business-logic-esque feature and checked a property about it.

        I’m biased, but I think these are good intros because they focus on “typical” business applications.
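
        The pattern in that first post can be sketched roughly like this. Everything here is a hypothetical stand-in (a tiny vote counter instead of a real web app), not code from the linked posts:

        ```python
        import random

        # Hypothetical "implementation": stores raw events and tallies on demand.
        class VoteCounter:
            def __init__(self):
                self._events = []

            def vote(self, item):
                self._events.append(item)

            def tally(self, item):
                return sum(1 for e in self._events if e == item)

        # Executable model: the same domain logic in the simplest possible form.
        class VoteCounterModel:
            def __init__(self):
                self._counts = {}

            def vote(self, item):
                self._counts[item] = self._counts.get(item, 0) + 1

            def tally(self, item):
                return self._counts.get(item, 0)

        # Property-based test: random operation sequences must leave the
        # implementation and the model in agreement at every observation point.
        def check_against_model(num_runs=100, seed=0):
            rng = random.Random(seed)
            for _ in range(num_runs):
                impl, model = VoteCounter(), VoteCounterModel()
                for _ in range(rng.randint(1, 50)):
                    item = rng.choice(["a", "b", "c"])
                    if rng.random() < 0.7:
                        impl.vote(item)
                        model.vote(item)
                    else:
                        assert impl.tally(item) == model.tally(item)
                for item in ["a", "b", "c"]:
                    assert impl.tally(item) == model.tally(item)
            return True
        ```

        The model stays trivially auditable, so disagreement almost always points at a bug in the implementation rather than the spec.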

      2. 4

        I’m a great fan of TLA+ so I recommend learning that. @hwayne on here wrote a textbook about it and has the website https://learntla.com/; personally I learned from the textbook Specifying Systems by Leslie Lamport.

        Everyone new to formal specification worries about the verification gap. It is a question asked at every single intro presentation. It’s difficult to convince people but it is literally Not A Problem. If the core logic of your app changes enough that the spec becomes outdated, then of course you’d want to update the spec for the same reasons justifying writing the spec in the first place! If you’re firing off updates to the core logic of your system without such validation, well… what happens happens. Specs are usually high-level enough that it isn’t as though you need to be changing them with every commit, or even every ten commits.

        1. 6

          It’s worth noting that most tools for formal models are tailored to a specific kind of problem (implicitly, if not by design), and so the right tool for the job depends on the problem that you’re trying to solve. For example, we use Sail for ISA modelling, but I wouldn’t recommend it for anything else (though, in theory, it should be able to model most forms of computation over things that can be mapped to bit vectors, i.e. basically anything that you can compute on a real machine).

          Whether you get to verification or not, simply having a formal model gives you a proof that a correct implementation is possible. This is not always true, and I’ve seen people build quite complex systems before realising that their approach was unsound and having to throw it away and start again.

          If you are proving things about your model then that does add quite a bit to the maintenance burden: redoing proofs when some of the early steps are no longer valid is time-consuming. On the other hand, it should also give you more confidence that the approach is correct and reduce the need for changes. I generally like the approach of prototype, then formal model, then real implementation, then test the implementation against the model (tandem verification: do recorded execution traces through the implementation match permitted traces through the model?), then prove properties of the model. If your problem is small enough, then prove correspondence between the model and the implementation.
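
          The tandem-verification step can be sketched very minimally: treat the model as an explicit transition relation and check that a recorded implementation trace only takes permitted steps. The turnstile here is a hypothetical stand-in, nothing like a real ISA model:

          ```python
          # Model: permitted transitions of a toy turnstile, as an explicit relation
          # mapping (state, event) -> next state. Anything absent is forbidden.
          MODEL_TRANSITIONS = {
              ("locked", "coin"): "unlocked",
              ("unlocked", "push"): "locked",
              ("locked", "push"): "locked",   # pushing a locked turnstile does nothing
          }

          def trace_permitted(initial_state, trace):
              """Check a recorded (event, observed_state) trace from the
              implementation against the model's transition relation."""
              state = initial_state
              for event, observed in trace:
                  nxt = MODEL_TRANSITIONS.get((state, event))
                  if nxt is None or nxt != observed:
                      return False  # implementation did something the model forbids
                  state = nxt
              return True
          ```

          Real tandem verification replaces the dictionary with the formal model's next-state relation and feeds in execution traces logged from the implementation.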

          To put that concretely: for my current project we have a formal model of the ISA that we’ve used with an SMT solver to find counterexamples to some properties that we now have, and thought we had earlier. Some friends in Oxford are proving that the Verilog implementation corresponds to the ISA spec. Some friends in Cambridge have previously proven some security properties of an older version of the ISA and we hope to rerun their proofs soon. The core of our software TCB is around 300 instructions. We can combine these with the ISA model to export something into a theorem prover and try to prove things about it (hoping to start that next year), but the first step of that is unambiguously articulating the properties that we require there, and that step alone is valuable for the security audit.

    3. 2

      One thing not covered by the author is the existence of symbolic model checkers such as Apalache for TLA+, which have a different performance profile compared to finite model checking (enumerating all states and transitions between them as a BFS, basically).
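
      That enumerate-all-states-as-a-BFS loop is small enough to sketch directly. The two-counter system below is a hypothetical toy, not how TLC or Apalache are implemented:

      ```python
      from collections import deque

      def check_invariant(initial, next_states, invariant):
          """Explicit-state model checking: BFS over all reachable states,
          checking the invariant at each one. Returns a violating state,
          or None if the invariant holds everywhere reachable."""
          seen = {initial}
          queue = deque([initial])
          while queue:
              state = queue.popleft()
              if not invariant(state):
                  return state
              for s in next_states(state):
                  if s not in seen:
                      seen.add(s)
                      queue.append(s)
          return None

      # Toy system: two counters that can each independently increment up to 3.
      def next_states(state):
          x, y = state
          out = []
          if x < 3:
              out.append((x + 1, y))
          if y < 3:
              out.append((x, y + 1))
          return out

      # Invariant: the counters never sum past 6 (holds, so this returns None).
      violation = check_invariant((0, 0), next_states, lambda s: s[0] + s[1] <= 6)
      ```

      A symbolic checker avoids this state-by-state enumeration by encoding the transition relation and invariant as constraints for a solver, which is why the performance profiles differ.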

      For the second paragraph about models as oracles, the term of art is “model-based testing”, and it has two flavors: using the model to generate events which are then pushed onto the system to check that the resulting state is as expected, or collecting traces from the running system and checking them against the model to see whether each is a valid trace of the model. A. Jesse Jiryu Davis et al. tested both these approaches at MongoDB: https://arxiv.org/abs/2006.00915

      Good article!

      1. 1

        Big +1 on symbolic model checking. And if I could pick a favorite buzzword, right now it’s “model-based testing” :D. I got interested in models from thinking about how to improve testing strategies. Anything related to testing and verification ultimately ends up with: well, you need a specification of your desired behavior to determine what’s “correct.” Then the next logical jump is: once you have your specification in the form of a model, why not use that to just generate your tests? It opens up a totally different workflow.

        Like you said, there’s different ways of going about the actual test generation, and that’s an interesting research area. I think there are plenty of new things to figure out there.

    4. 2

      This is an outstanding post. One of the best posts I’ve read this year. I’m looking forward to diving into other content from the same author/site.

      1. 3

        I really appreciate that. What are some topics that you’re interested in?

        1. 2

          Since my last comment I’ve already found several of your posts interesting. I’m interested in any techniques that give us more leverage in terms of specification / testing / correctness. I’m interested in approachable introductions / tutorials / examples of model based testing and formal models.

    5. 1

      Speaking of the P language, does anyone know why they “regressed” from generating production C code from the model in the Windows USB case (~2014) to re-implementing the model as another program for AWS?

      I watched the Strange Loop 2022 video and that was something that stuck out. The creator said “performance” was the reason, but I see the first case as much more demanding and performance-sensitive than the second.

      I asked that here: https://lobste.rs/s/k0qhav/implementing_state_machines#c_youwtm

      Talk: https://www.youtube.com/watch?v=5YjsSDDWFDY&t=1s

      1. 1

        I don’t know the reason first-hand, but I noticed that too. I’m working on a model-based language / compiler in my spare time, and if I were to guess, it’s because synthesizing production code from a model is actually hard, and probably impossible to do in the general case.

        So in my case, I switched to supporting macros which give you control over how to derive the implementation. It’s more work, but also more control. There are just so many different design patterns / optimizations you can use when going from a model to an implementation that I think it’s much harder to generalize than “regular” languages. By “regular,” I mean languages where the compilation doesn’t introduce any abstractions and just aims for a one-to-one translation of the source program.

    6. 1

      One problem I periodically noodle on is, once I have say 100 tests for a text editor widget, how can I get the big picture of its semantics?

      To keep the conversation concrete, here’s the set of tests I’m currently thinking about:

      initial state
      draw text
      draw wrapping text
      draw word wrapping text
      draw text wrapping within word
      draw wrapping text containing non ascii
      # mouse
      click moves cursor
      click to left of line
      click takes margins into account
      click on empty line
      click on wrapping line
      click on wrapping line takes margins into account
      click on wrapping line rendered from partway at top of screen
      click past end of wrapping line
      click past end of wrapping line containing non ascii
      click past end of word wrapping line
      # cursor movement
      move left
        move left to previous line
      move right
        move right to next line
      move to start of word
        move to start of previous word
        move to start of word on previous line
      move past end of word
        move past end of word on next line
      skip to previous word
        skip past tab to previous word
        skip multiple spaces to previous word
      skip to next word
        skip past tab to next word
        skip multiple spaces to next word
      # mutating test
      insert first character
      edit wrapping text
      insert newline
      insert newline at start of line
      insert from clipboard
      backspace from start of final line
      backspace past line boundary
      backspace over selection
      backspace over selection reverse
      backspace over multiple lines
      backspace to end of line
      backspace to start of line
      # scroll
      pagedown often shows start of wrapping line
      pagedown can start from middle of long wrapping line
      pagedown never moves up
      down arrow moves cursor
      down arrow scrolls down by one line
      down arrow scrolls down by one screen line
      down arrow scrolls down by one screen line after splitting within word
      pagedown followed by down arrow does not scroll screen up
      up arrow moves cursor
      up arrow scrolls up by one line
      up arrow scrolls up by one screen line
      up arrow scrolls up to final screen line
      up arrow scrolls up to empty line
      pageup scrolls up by screen line
      pageup scrolls up from middle screen line
      enter on bottom line scrolls down
      enter on final line avoids scrolling down when not at bottom
      inserting text on final line avoids scrolling down when not at bottom
      typing on bottom line scrolls down
      left arrow scrolls up in wrapped line
      right arrow scrolls down in wrapped line
      home scrolls up in wrapped line
      end scrolls down in wrapped line
      position cursor on recently edited wrapping line
      backspace can scroll up
      backspace can scroll up screen line
      # selection
      select text
      select text using mouse
      select text using mouse and shift
      select text repeatedly using mouse and shift
      cursor movement without shift resets selection
      edit deletes selection
      edit with shift key deletes selection
      deleting selection may scroll
      copy does not reset selection
      cut without selection
      paste replaces selection
      # search
      search upwards
      search wrap
      search wrap upwards
      # undo
      undo insert text
      undo delete text
      undo restores selection

      The full set of tests is 2kLoC. Has anyone built a model for something like this? Found inconsistencies in the model? Was the model denser than the tests?

      1. 3

        Spitballing here, but:

        • Daniel Jackson (https://people.csail.mit.edu/dnj/) has done a bunch of modeling of photo manipulation software in Alloy, so he’d be a good person to talk to, and he’s super friendly and open to email.
        • The Philosophy of Software Design guy (John Ousterhout?) explicitly uses a text editor in his example; it wasn’t formal models, but he might still have leads.
        • State machines????
        • Yeah I’d probably first try modeling it as a state machine
        • The best approach would likely not be one all-encompassing model, but several individual models for different aspects that assume the other aspects are “already correct”.
        • Right now I’m fixated on making sure that the undo is always correct; that’s a standard formal methods example problem, and starting with the intricate stuff is good
      2. 3

        Ok, there’s quite a bit of functionality here. I started out modeling it, just as a TypeScript class here. The key idea is that the class state represents the core system state, and the methods represent user actions. It’s definitely nowhere near all of the functionality you’ve shown here, but it’s a fun example either way.

        The first thing I noticed about your test cases is that a lot of effort is being spent making sure that word wrapping works correctly. So the first thing I did was think about what the core data structures of the editor should be so that wrapping could be supported. I don’t know anything about implementing real text editors (other than that they use some interesting data structures like ropes), but that’s ok, because I want the model to be simple. So I first modeled “content” as a 2D array of lines of text, separate from the buffer data structure that’s rendered to the screen, which includes wrapping. This seemed simplest to me because, as you’re typing, you don’t have to worry about word wrapping or anything related to display; you just keep appending to the line until a newline character is hit.

        That right there is an example of the benefit of modeling: you can use simpler data structures, because the purpose is to define behavior, not to release this to the world.

        After that, I also wrote a simple property-based test to see that no matter what the content is, when it’s rendered to the screen the word wrapping never creates a line greater than the configured line width. This is one way to check that the model itself is correct, by checking invariants.
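
        That invariant check has roughly the following shape; this is my own Python re-sketch, not the TypeScript class, and it uses naive character wrapping rather than word-aware wrapping to stay short:

        ```python
        import random

        def render(lines, width):
            """Model of rendering: hard-wrap each logical content line into
            screen lines of at most `width` characters. (Character wrapping
            only; a word-aware version would prefer to split at spaces.)"""
            screen = []
            for line in lines:
                if line == "":
                    screen.append("")
                while line:
                    screen.append(line[:width])
                    line = line[width:]
            return screen

        def check_wrap_invariant(num_runs=200, seed=0):
            """Property: no rendered screen line ever exceeds the configured
            width, and rendering never loses or invents characters."""
            rng = random.Random(seed)
            for _ in range(num_runs):
                lines = ["".join(rng.choice("ab cd") for _ in range(rng.randint(0, 40)))
                         for _ in range(rng.randint(0, 5))]
                width = rng.randint(1, 10)
                screen = render(lines, width)
                assert all(len(s) <= width for s in screen)
                assert "".join(screen) == "".join(lines)
            return True
        ```

        The second assertion is the kind of bonus you get for free once the model is executable: checking that wrapping is a pure presentation concern that never mutates the content.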

        Last thing - don’t forget, you can always just check individual test cases on the model if there’s anything you’re uncertain about. Those tests will be easier to set up and faster to run because the model is much simpler.

        Of course, as you model more and more functionality, the model will become denser, and you run the risk of misspecifying it. But, I don’t think test suites solve that problem either. When you make a logic change, how do you know exactly which test cases should break or be changed ahead of time?

        1. 1

          Thanks so much for engaging with my example.

          I actually use exactly this data structure for my “real world” implementation :)

          I find it very interesting that the conversation about model checking transitioned so seamlessly to property testing. Do they seem like adjacent concepts to you? It seems right with hindsight, but I hadn’t suspected this before, and a superficial reading of the wikipedia links doesn’t seem to show this either. So thanks for the insight. I’ve used property testing extensively in the past, so perhaps I’m more mentally prepared to get into model checking than I’d realized.

          1. 3

            I find it very interesting that the conversation about model checking transitioned so seamlessly to property testing. Do they seem like adjacent concepts to you? It seems right with hindsight, but I hadn’t suspected this before, and a superficial reading of the wikipedia links doesn’t seem to show this either.

            They’re very related, it just doesn’t look that way online because they come from different academic silos. That’s also why there wasn’t (until recently) much overlap between PBT and fuzzing, and (still) PBT and metamorphic testing.

          2. 2

            I actually use exactly this data structure for my “real world” implementation :)

            Ha, nice. I’m sure it gets you pretty far.

            I absolutely think model checking and property testing are related. In both cases, you have a “property” which is just a logical predicate that you want to be true in all scenarios. Model checking just means you check that the property holds in all cases up to some bound, so that the checking is finite. Most people associate property testing with checking random inputs, so it’s not exhaustive, but because of that you can control how long you run it for / how deep you explore.

            It’s just one of those things where people come up with idiomatic phrases, and they stick, but they share the same high-level idea. I think model checking has a lot more nuance to it, because you can evaluate things symbolically and don’t ever need to execute an actual program. But at the end of the day, in both cases you have a description of a system and are checking logical predicates about it.
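
            The contrast can be made concrete with a toy property (a hypothetical example, nothing from the thread):

            ```python
            import itertools
            import random

            # A toy property over pairs of small integers: addition commutes.
            def prop(x, y):
                return x + y == y + x

            # Model-checking style: exhaustive up to a bound, so a pass really
            # does cover every input within that bound.
            def check_bounded(prop, bound):
                return all(prop(x, y)
                           for x, y in itertools.product(range(bound), repeat=2))

            # Property-testing style: random sampling. Not exhaustive, but the
            # number of trials (and hence runtime) is under your control even
            # when the input space is enormous.
            def check_random(prop, bound, trials=1000, seed=0):
                rng = random.Random(seed)
                return all(prop(rng.randrange(bound), rng.randrange(bound))
                           for _ in range(trials))
            ```

            Same predicate, same system description; the two styles differ only in how the input space is explored.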