1. 25
  1. 11

    From my misspent time in physical sciences: All models are wrong, some models are useful.

    1. 5

      One of my favorite quotes. A model has to leave details out at the end of the day. So all models are definitely wrong. But they become quite useful when they’re wrong in inconsequential ways.

      1. 5

        Alternative phrasing: “a map is not the territory”.


        1. 2

          That’s actually even better. Maps were created because the territory was unnavigable. That’s exactly what I find useful about models - they make implementations navigable.

      2. 5

        All models are wrong: database models of information, mental models of systems, code models of reality. It’s our job as software engineers to make these wrong models useful.

        1. 2

          This should be the tagline for “Data and Reality”

      3. 4

        I know, vaguely, that modeling and formal methods have been explained at length numerous times on here, but could anyone point me to their personal favorite introduction to designing and writing models? I found this article compelling, but I’m not sure what choices make for good models, nor how to answer some of the questions raised in the article (e.g. the verification gap).

        1. 8

          Here’s one of my favorite intros: modeling the browser same-origin policy in Alloy. To get a model of the same-origin policy, you have to model browsers, servers, URLs, and HTTP requests. Real-world implementations of these are obviously gigantic, but the model is small and digestible. As @ahelwer mentioned, Specifying Systems is like the holy grail of modeling books. Big +1 there.
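          The spirit of that exercise can be shown in miniature: model an origin as a (scheme, host, port) triple and same-origin as triple equality. This is a drastic simplification of what browsers actually do - which is exactly the point of a model. (Toy code of my own, not from the linked article.)

```python
# Toy model of the browser same-origin policy.
# An origin is a (scheme, host, port) triple; two URLs are
# same-origin iff their triples are equal. Real browsers add many
# subtleties that this model deliberately leaves out.
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def origin(url):
    parts = urlsplit(url)
    # Fall back to the scheme's default port when none is given.
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)

def same_origin(a, b):
    return origin(a) == origin(b)
```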

          I’ll also add that modeling itself doesn’t have to be tied to formal methods. I’ve modeled several calculations at work in spreadsheets, and they come in handy in a few different ways.

          I’ve also created different models in some previous posts:

          Property-Based Testing Against a Model of a Web Application uses an executable model to express the domain logic of a web app and uses it in a test against the implementation - that’s one solution to the verification gap.
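          The shape of that test is simple enough to sketch with a toy counter (hypothetical names of my own; the real post uses a web app):

```python
import random

class CounterModel:
    """Executable model: the simplest thing that captures the logic."""
    def __init__(self):
        self.value = 0
    def increment(self):
        self.value += 1
    def reset(self):
        self.value = 0

class CounterImpl:
    """Stand-in for the real implementation under test."""
    def __init__(self):
        self._n = 0
    def increment(self):
        self._n += 1
    def reset(self):
        self._n = 0
    def read(self):
        return self._n

def test_against_model(seed=0, steps=200):
    """Drive model and implementation with the same random actions
    and check that they agree after every step."""
    rng = random.Random(seed)
    model, impl = CounterModel(), CounterImpl()
    for _ in range(steps):
        action = rng.choice(["increment", "reset"])
        getattr(model, action)()
        getattr(impl, action)()
        assert impl.read() == model.value
    return True
```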

          In Misspecification I built a model of a business-logic-esque feature and checked a property about it.

          I’m biased, but I think these are good intros because they focus on “typical” business applications.

          1. 4

            I’m a great fan of TLA+ so I recommend learning that. @hwayne on here wrote a textbook about it and has the website https://learntla.com/; personally I learned from the textbook Specifying Systems by Leslie Lamport.

            Everyone new to formal specification worries about the verification gap. It is a question asked at every single intro presentation. It’s difficult to convince people but it is literally Not A Problem. If the core logic of your app changes enough that the spec becomes outdated, then of course you’d want to update the spec for the same reasons justifying writing the spec in the first place! If you’re firing off updates to the core logic of your system without such validation, well… what happens happens. Specs are usually high-level enough that it isn’t as though you need to be changing them with every commit, or even every ten commits.

            1. 6

              It’s worth noting that most tools for formal models are tailored to a specific kind of problem (implicitly, if not by design), and so the right tool for the job depends on the problem that you’re trying to solve. For example, we use Sail for ISA modelling, but I wouldn’t recommend it for anything else (though, in theory, it should be able to model most forms of computation over things that can be mapped to bit vectors - basically anything that you can compute on a real machine).

              Whether you get to verification or not, simply having a formal model gives you a proof that a correct implementation is possible. This is not always true, and I’ve seen people build quite complex systems before realising that their approach is unsound and having to throw it away and start again.

              If you are proving things about your model then that does add quite a bit to the maintenance burden: redoing proofs when some of the early steps are no longer valid is time-consuming. On the other hand, it should also give you more confidence that the approach is correct and reduce the need for changes. I generally like the approach of: prototype, then formal model, then real implementation, then test the implementation against the model (tandem verification: do recorded execution traces through the implementation match permitted traces through the model?), then prove properties of the model. If your problem is small enough, then prove correspondence between the model and the implementation.
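              The tandem-verification step can be sketched in a few lines: record (state, event, next-state) triples from the implementation and replay them against the model’s transition relation (a toy traffic light here, purely illustrative):

```python
# Toy model: the permitted transitions of a traffic light.
PERMITTED = {
    ("red", "go"): "green",
    ("green", "caution"): "yellow",
    ("yellow", "stop"): "red",
}

def trace_permitted(trace):
    """Does every step of a recorded implementation trace match a
    transition that the model allows?"""
    return all(
        PERMITTED.get((state, event)) == next_state
        for state, event, next_state in trace
    )

# Recorded execution traces from a hypothetical implementation:
good = [("red", "go", "green"), ("green", "caution", "yellow")]
bad = [("red", "go", "green"), ("green", "stop", "red")]  # model forbids this
```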

              To put that concretely: for my current project we have a formal model of the ISA that we’ve used with an SMT solver to find counterexamples to properties that we thought we had earlier (and now actually have). Some friends in Oxford are proving that the Verilog implementation corresponds to the ISA spec. Some friends in Cambridge have previously proven some security properties of an older version of the ISA, and we hope to rerun their proofs soon. The core of our software TCB is around 300 instructions. We can combine these with the ISA model, export something into a theorem prover, and try to prove things about it (hoping to start that next year), but the first step of that is unambiguously articulating what properties we require there, and that step alone is valuable for the security audit.

          2. 2

            One thing not covered by the author is the existence of symbolic model checkers such as Apalache for TLA+, which have a different performance profile compared to explicit-state model checking (basically, enumerating all states and the transitions between them as a BFS).

            For the second paragraph about models as oracles, the term of art is “model-based testing”, and it has two flavors: using the model to generate events that are then pushed onto the system to check that the resulting state is as expected, or collecting traces from the running system and checking them against the model to see whether each is a valid trace of the model. A. Jesse Jiryu Davis et al. tested both of these approaches at MongoDB: https://arxiv.org/abs/2006.00915

            Good article!

            1. 1

              Big +1 on symbolic model checking. And if I could pick a favorite buzzword, right now it’s “model-based testing” :D. I got interested in models from thinking about how to improve testing strategies. Anything related to testing and verification ultimately ends up with: well, you need a specification of your desired behavior to determine what’s “correct.” Then the next logical jump is: once you have your specification in the form of a model, why not use that to just generate your tests? It opens up a totally different workflow.

              Like you said, there’s different ways of going about the actual test generation, and that’s an interesting research area. I think there are plenty of new things to figure out there.

            2. 2

              This is an outstanding post. One of the best posts I’ve read this year. I’m looking forward to diving into other content from the same author/site.

              1. 3

                I really appreciate that. What are some topics that you’re interested in?

                1. 2

                  Since my last comment I’ve already found several of your posts interesting. I’m interested in any techniques that give us more leverage in terms of specification / testing / correctness. I’m interested in approachable introductions / tutorials / examples of model based testing and formal models.

              2. 1

                Speaking of the P language, does anyone know why they “regressed” from generating production C code from the model in the Windows USB case (~2014) to re-implementing the model as another program for AWS?

                I watched the Strange Loop 2022 video, and that was something that stuck out. The creator said “performance” was the reason, but I see the first case as much more demanding and sensitive performance-wise than the second.

                I asked that here: https://lobste.rs/s/k0qhav/implementing_state_machines#c_youwtm

                Talk: https://www.youtube.com/watch?v=5YjsSDDWFDY&t=1s

                1. 1

                  I don’t know the reason first-hand, but I noticed that too. I’m working on a model-based language / compiler in my spare time, and if I were to guess, it’s because synthesizing production code from a model is actually hard, and probably impossible to do in the general case.

                  So in my case, I switched to supporting macros which give you control over how to derive the implementation. It’s more work, but also more control. There are just so many different design patterns / optimizations you can use when going from a model to an implementation that I think it’s much harder to generalize than “regular” languages. By “regular,” I mean where the compilation is not introducing any abstractions and just trying to get a one-to-one translation of the source program.

                2. 1

                  One problem I periodically noodle on is: once I have, say, 100 tests for a text editor widget, how can I get the big picture of its semantics?

                  To keep the conversation concrete, here’s the set of tests I’m currently thinking about:

                  initial state
                  draw text
                  draw wrapping text
                  draw word wrapping text
                  draw text wrapping within word
                  draw wrapping text containing non ascii
                  # mouse
                  click moves cursor
                  click to left of line
                  click takes margins into account
                  click on empty line
                  click on wrapping line
                  click on wrapping line takes margins into account
                  click on wrapping line rendered from partway at top of screen
                  click past end of wrapping line
                  click past end of wrapping line containing non ascii
                  click past end of word wrapping line
                  # cursor movement
                  move left
                    move left to previous line
                  move right
                    move right to next line
                  move to start of word
                    move to start of previous word
                    move to start of word on previous line
                  move past end of word
                    move past end of word on next line
                  skip to previous word
                    skip past tab to previous word
                    skip multiple spaces to previous word
                  skip to next word
                    skip past tab to next word
                    skip multiple spaces to next word
                  # mutating test
                  insert first character
                  edit wrapping text
                  insert newline
                  insert newline at start of line
                  insert from clipboard
                  backspace from start of final line
                  backspace past line boundary
                  backspace over selection
                  backspace over selection reverse
                  backspace over multiple lines
                  backspace to end of line
                  backspace to start of line
                  # scroll
                  pagedown often shows start of wrapping line
                  pagedown can start from middle of long wrapping line
                  pagedown never moves up
                  down arrow moves cursor
                  down arrow scrolls down by one line
                  down arrow scrolls down by one screen line
                  down arrow scrolls down by one screen line after splitting within word
                  pagedown followed by down arrow does not scroll screen up
                  up arrow moves cursor
                  up arrow scrolls up by one line
                  up arrow scrolls up by one screen line
                  up arrow scrolls up to final screen line
                  up arrow scrolls up to empty line
                  pageup scrolls up by screen line
                  pageup scrolls up from middle screen line
                  enter on bottom line scrolls down
                  enter on final line avoids scrolling down when not at bottom
                  inserting text on final line avoids scrolling down when not at bottom
                  typing on bottom line scrolls down
                  left arrow scrolls up in wrapped line
                  right arrow scrolls down in wrapped line
                  home scrolls up in wrapped line
                  end scrolls down in wrapped line
                  position cursor on recently edited wrapping line
                  backspace can scroll up
                  backspace can scroll up screen line
                  # selection
                  select text
                  select text using mouse
                  select text using mouse and shift
                  select text repeatedly using mouse and shift
                  cursor movement without shift resets selection
                  edit deletes selection
                  edit with shift key deletes selection
                  deleting selection may scroll
                  copy does not reset selection
                  cut without selection
                  paste replaces selection
                  # search
                  search upwards
                  search wrap
                  search wrap upwards
                  # undo
                  undo insert text
                  undo delete text
                  undo restores selection

                  The full set of tests is 2kLoC. Has anyone built a model for something like this? Found inconsistencies in the model? Was the model denser than the tests?

                  1. 3

                    Spitballing here, but:

                    • Daniel Jackson (https://people.csail.mit.edu/dnj/) has done a bunch of modeling of photo manipulation software in Alloy, so he’d be a good person to talk to, and he’s super friendly and open to email.
                    • The Philosophy of Software Design Guy (John Ousterhout?) explicitly uses a text editor in his example, but it wasn’t formal models; still, he might have leads.
                    • State machines????
                    • Yeah I’d probably first try modeling it as a state machine
                    • The best approach would likely not be one all-encompassing model, but several individual models for different aspects that assume the other aspects are “already correct”.
                    • Right now I’m too fixated on making sure that the undo is always correct - that’s a standard formal methods example problem. Starting with the intricate stuff is good.
                    1. 3

                      Ok, there’s quite a bit of functionality here. I started out modeling it, just as a TypeScript class here. The key is that the class state represents the core system state and the methods represent user actions. It’s definitely nowhere near all of the functionality you’ve shown here, but it’s a fun example either way.

                      The first thing I noticed about your test cases is that a lot of effort is being spent trying to make sure that word wrapping works correctly. So the first thing I did was think about what the core data structures of the editor should be so that wrapping could be supported. I don’t know anything about implementing real text editors (other than that there are some interesting data structures like ropes), but that’s ok, because I want the model to be simple. So I first modeled “content” as a 2d array of lines of text, separate from the buffer data structure that’s rendered to the screen, which includes wrapping. This seemed simplest to me because, as you’re typing, you don’t have to worry about word wrapping or anything related to display; you just keep appending to the line until a newline character is hit.
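                      A rough sketch of that separation (the names and the fixed-width, character-level wrapping are my own simplifications, not from the linked class):

```python
class EditorModel:
    """Content is a plain list of logical lines; wrapping is a pure
    function applied only when rendering, never stored in the content."""
    def __init__(self, width):
        self.width = width
        self.lines = [""]  # logical lines, no wrapping

    def type_char(self, ch):
        if ch == "\n":
            self.lines.append("")
        else:
            self.lines[-1] += ch  # typing never touches display state

    def render(self):
        """Derive the wrapped screen buffer from the logical content."""
        screen = []
        for line in self.lines:
            while len(line) > self.width:
                screen.append(line[:self.width])
                line = line[self.width:]
            screen.append(line)
        return screen
```

Only render() knows about the width; everything else manipulates logical lines.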

                      That right there is an example of the benefit of modeling: you can use simpler data structures, because the purpose is to define behavior, not to release this to the world.

                      After that, I also wrote a simple property-based test to check that, no matter what the content is, when it’s rendered to the screen the word wrapping never creates a line longer than the configured line width. This is one way to check that the model itself is correct: by checking invariants.
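                      That kind of invariant check needs only a handful of lines of plain random testing (the width, the generator, and the character-level wrap here are arbitrary choices of mine):

```python
import random
import string

def wrap(text, width):
    """Character-level wrap: split a logical line into screen lines."""
    if text == "":
        return [""]
    return [text[i:i + width] for i in range(0, len(text), width)]

def check_wrap_invariants(trials=500, width=10, seed=42):
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randrange(0, 50)
        text = "".join(rng.choice(string.ascii_letters + " ") for _ in range(n))
        screen = wrap(text, width)
        # Invariant 1: no rendered line exceeds the configured width.
        assert all(len(line) <= width for line in screen)
        # Invariant 2: wrapping loses no content.
        assert "".join(screen) == text
    return True
```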

                      Last thing - don’t forget, you can always just check individual test cases on the model if there’s anything you’re uncertain about. Those tests will be easier to set up and faster to run because the model is much simpler.

                      Of course, as you model more and more functionality, the model will become denser, and you run the risk of misspecifying it. But, I don’t think test suites solve that problem either. When you make a logic change, how do you know exactly which test cases should break or be changed ahead of time?

                      1. 1

                        Thanks so much for engaging with my example.

                        I actually use exactly this data structure for my “real world” implementation :)

                        I find it very interesting that the conversation about model checking transitioned so seamlessly to property testing. Do they seem like adjacent concepts to you? It seems right with hindsight, but I hadn’t suspected this before, and a superficial reading of the Wikipedia links doesn’t seem to show this either. So thanks for the insight. I’ve used property testing extensively in the past, so perhaps I’m more mentally prepared to get into model checking than I’d realized.

                        1. 3

                          I find it very interesting that the conversation about model checking transitioned so seamlessly to property testing. Do they seem like adjacent concepts to you? It seems right with hindsight, but I hadn’t suspected this before, and a superficial reading of the Wikipedia links doesn’t seem to show this either.

                          They’re very related, it just doesn’t look that way online because they come from different academic silos. That’s also why there wasn’t (until recently) much overlap between PBT and fuzzing, and (still) PBT and metamorphic testing.

                          1. 2

                            I actually use exactly this data structure for my “real world” implementation :)

                            Ha, nice. I’m sure it gets you pretty far.

                            I absolutely think model checking and property testing are related. In both cases, you have a “property” which is just a logical predicate that you want to be true in all scenarios. Model checking just means you check that the property holds in all cases up to some bound, so that the checking is finite. Most people associate property testing with checking random inputs, so it’s not exhaustive, but because of that you can control how long you run it for / how deep you explore.
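                            The contrast is easy to make concrete: a bounded checker enumerates every reachable state up to some depth, where a property test only samples paths. A toy explicit-state checker (entirely illustrative, names are my own):

```python
from collections import deque

def check_invariant(init, actions, invariant, depth):
    """BFS over all states reachable within `depth` steps, verifying
    the invariant in each - exhaustive up to the bound, unlike random
    property testing, which only samples."""
    seen = {init}
    frontier = deque([(init, 0)])
    while frontier:
        state, d = frontier.popleft()
        if not invariant(state):
            return state  # counterexample found
        if d < depth:
            for act in actions:
                nxt = act(state)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, d + 1))
    return None  # invariant holds everywhere within the bound

# A counter with increment and reset: "value stays below 5"
# holds up to depth 4 but is violated at depth 5.
actions = [lambda s: s + 1, lambda s: 0]
```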

                            It’s just one of those things where people come up with idiomatic phrases, and they stick, but they share the same high-level idea. I think model checking has a lot more nuance to it, because you can evaluate things symbolically and don’t ever need to execute an actual program. But at the end of the day, in both cases you have a description of a system and are checking logical predicates about it.