1. 28

I keep seeing books such as Software Foundations show up on forums, recommended to people interested in formal verification. I’ve watched people try and fail to learn this stuff for years. Very few make it. Some even turn bitter against the concept of formal methods because they attempted the hardest kind. Those who enjoy it outside of CompSci’s heavy hitters are usually doing toy problems for fun. I’d rather advise people new to the topic in a way that gets more people into the field over time, using whichever productive formal methods they were able to learn.

So, based on a prior comment here, I just drafted up this advice to drop on any thread like that I see:


Here’s the structure of the argument/advice:

  1. Tell them how hard it is. Most don’t really want to deal with that.

  2. Tell them about formal methods that are easy to learn with immediate benefit. Suggest they start with those.

  3. Tell them what resources to use to learn the hard stuff. Tell them it will take lots of time to learn.

  4. Conditional: If they fail, tell them where No 2 can take them. If they succeed, tell them about best works in No 3 to increase odds theirs will be similar.

It seems pretty straightforward. I’m sure there’s room for improvement, though.


  2. 14

    I would absolutely love a “beginners guide to formal methods” resource. Right now the field is mostly impenetrable, but it doesn’t have to be. I think a lot of that is because what formal methods used to be useful for is much narrower than what they’re useful for now, but resources are still aimed at that old, narrow domain. Here are my thoughts on the structure you’re outlining:

    Tell them how hard it is. Most don’t really want to deal with that.

    I don’t think formal methods are necessarily that hard. Like most of us understand type systems, and that’s a kind (if not a very robust kind) of formal method. There’s a spectrum of difficulty/power, and I think that’s worth emphasizing. So maybe we could adjust this to:

    • There are many different kinds of formal methods, just like there are many different kinds of type systems or bicycles.
    • Flyweight Methods, like TLA+ or Alloy, are much easier to learn and easier to use. Formal Verification is more rigorous, but much harder and slower.
    • Start with flyweight methods. For 99% of people, they will be much more useful in their problem domain. For the remaining 1%, they’ll give you the solid foundation you’ll need for learning formal verification.
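
    To make “easier to use” concrete, here is a minimal Python sketch (all names invented for illustration) of what a flyweight tool like TLC (TLA+’s model checker) or Alloy’s analyzer does at heart: exhaustively explore a small state space and hand back a counterexample trace when an invariant breaks.

```python
from collections import deque

# Tiny two-process mutual-exclusion model, explored exhaustively the
# way a model checker does. A state is a pair (pc0, pc1); each process
# is "idle", "waiting", or "critical". The protocol is deliberately
# buggy: there is no lock, so a waiting process may always enter.

STEPS = {"idle": "waiting", "waiting": "critical", "critical": "idle"}

def next_states(state):
    """All successor states: either process takes its next step."""
    for i in (0, 1):
        pcs = list(state)
        pcs[i] = STEPS[pcs[i]]
        yield tuple(pcs)

def check(invariant, init=("idle", "idle")):
    """Breadth-first search of the reachable states; returns a shortest
    trace ending in a state that violates the invariant, or None."""
    seen = {init}
    queue = deque([[init]])
    while queue:
        trace = queue.popleft()
        if not invariant(trace[-1]):
            return trace  # counterexample trace, like TLC would print
        for nxt in next_states(trace[-1]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(trace + [nxt])
    return None

# The invariant we'd like: never both processes in the critical section.
mutex = lambda state: state != ("critical", "critical")
print(check(mutex))  # a short trace showing the mutual-exclusion violation
```

    The real tools add a specification language, symmetry reductions, and SAT/SMT backends, but the “explore everything, report a trace” core is the same, and it’s why the feedback loop is so fast.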

    Tell them about formal methods that are easy to learn with immediate benefit. Suggest they start with those.

    Woop! I’d add to this that it’s worth explaining why it’s easier and more beneficial.

    Tell them what resources to use to learn the hard stuff. Tell them it will take lots of time to learn.

    This is a bit more extreme, but what if this is extracted out? Instead of saying “here are the resources for learning verification”, we say “go here to find the resources for formal verification”. It’s a small change and adds a bit more maintenance, but it might make it more obvious that it’s not a great starting point.

    EDIT: I’d be happy to write a brief compare/contrast on TLA+ vs Alloy if that would help the expansion.

    1. 2

      I’ll look into how to easily explain the benefit of lighter methods. The heavy part could be extracted out into its own introductory article. David A. Wheeler’s High Assurance FLOSS article already has some material to draw on for both of these. Regarding the compare/contrast, that would be pretty nice to write up, as I’ve only seen maybe one of those. There are a lot of people who will ask how they compare to each other if they keep seeing them together.

      1. 1

        Like most of us understand type systems, and that’s a kind (if not a very robust kind) of formal method.

        Agreed. I have seen it suggested that static type systems are the most widely used kind of formal method. That, of course, is debatable … still, food for thought.

      2. 9

        I think an expansion of this would be an enormously popular and beneficial resource. Lay out what the major topics are and draw the graph of their prerequisites and what resources are available for learning those stair-step topics. (And avoid the temptation to be comprehensive or make some kind of general tool for mapping knowledge!)

        1. 3

          Appreciate the feedback.

          And avoid the temptation to be comprehensive or make some kind of general tool for mapping knowledge!

          Lol. Yeah, that persistent habit of mine alternates so much between getting sighs and props that it’s hard to imagine how best to improve on it. I’m still going to try. Far as this proposal, it should be pretty easy since I already kept the HN version concise. Maybe do the next version like college papers, where I limit size to a page or two (or less!) while still attempting a clean presentation.

          1. 2

            And avoid the temptation to be comprehensive or make some kind of general tool for mapping knowledge!

            Or build this guidance into an existing one!

            I’ve been using the Learn Anything map for linear algebra. There is a small stub of one for formal methods that could use some work.

          2. 8

            In general, I think there is a major lack across much of the ‘hard fields’ (not just STEM, but in the humanities too) of meta-information. I’m a serial autodidact, so I’ve gathered a reasonable amount of data with respect to my approach to new topics. Specifically, the pattern is something like:

            1. Identify a problem I want to solve/thing I want to build
            2. Look for prior art – has someone done the thing/built the thing I want before me? Can I crib from them?
            3. Look for best practices – how do people usually approach this? What are the tools/techniques/ideas I need to know to try to do something new here?
            4. Based on 2 and 3, compile a list of sources to study
            5. Study the sources from 4.
            6. Attempt to solve the problem/build the thing
            7. If successful, repeat from #1 with new problem (perhaps a subproblem of the main idea, maybe something wholly new), if unsuccessful, repeat from #2 with the same problem until no new information can be found.

            This approach has two major problems. First, I have no idea what to do with the second branch of #7 if there is no new information found after the loop. I call that problem “Reality Sucks” and have not found any good solution for it.

            The second issue is understanding the gravity of attempting to walk down this path for any given topic. In my wheelhouse (Math and CS) I can more or less assess the ‘mass’ of a particular problem just by intuition. I have enough experience to have a horse-sense about it, and that’s good enough. For me, therefore, the advice you give would be redundant (that doesn’t mean it’s not valuable, just that I already have that information). However, this class of idea is very good. If someone told me before I dove into my more recent Mechanical Engineering bit that it would be a lot deeper rabbit hole than I thought it’d be, I might’ve started with a smaller problem/more limited scope.

            The meta-information – the horse-sense that people have about problems – is incredibly valuable. It helps to add context which can guide someone away from areas they are not prepared for, and can act as a meterstick for progress.

            This is a longwinded way of saying, “Yes, I think this is valuable, I wish there were more people around who did this class of thing for other areas.” But also to support @pushcx’s point, don’t necessarily try to make the resource generic; but I’d add that it is worth knowing the pointers off the map, rather than “Here be dragons”, it’s much nicer if you can say, “And if you go down that edge, you land off the map and in a scary place we call ‘Category Theory’ – it’s full of Haskellers, we try not to talk about it.”

            1. 2

              That’s a decent method to approach problems. I do something similar, except I build way less to cover more ground. Instead of building, I dig into lessons-learned reports, case studies, and articles from veterans to do meta-studies of sorts. That’s why you’ll see me have a lot of useful info on a lot of topics but also slip up on something ridiculously obvious to a practitioner on occasion. I mainly do that to bring info to people. In the general case, I recommend someone build something in each sub-field as you describe.

              Far as the warnings, yeah, I can be more specific. One I was giving out to a lot of people before was the article below showing what it took to verify a simple system for landing gear. The thing I emphasized is that the formal specs exploded as they went from abstract requirements to detailed design. They also seem like piles of little things. It’s a combination of what people track in their heads about code that’s implicit, along with limitations of the logics that are good for this sort of thing. Optionally, show them separation logic verifying simple code with pointers. Then, I tell them to imagine trying to do that for a real program they’d write, more complex than a landing-gear state machine but with pointers, and to imagine how the extra combinations would explode the number of logical terms they’d have to track. Good luck.


            2. 5

              I think there is a lack of imagination in your comment and your tone. Suppose someone wanted to make a fire. Your same train of thought sounds like this:

              1. Making fire is hard. Most people don’t want to deal with that.
              2. There are other ways to be warm. Have you tried wearing furs? Way less hassle, immediate benefit, you should start there.
              3. Here’s a list of resources about how to make a fire.
              4. If you can make a fire, here are the things I need you to burn. Don’t hold your knowledge hostage, use it for the purposes me and my friends have already enumerated in the following list.

              You are aware that I made it through about half of Software Foundations. Don’t delude yourself into thinking if I had finished the book I would be helping you with formal verification of boring security software. I took the class because I know how to program and wanted a better understanding of math. All technology is like fire: applicable to many things, often not foreseen by the inventor or discoverer. Your list of things you need burned, is it going to include making pizza, wax seals, and hardening steel?

              If every programmer spends one day with Coq, the improvement in the general understanding of what the word “formal” means might yield deep benefits. To see that one day’s investment in math as failure because it might not blossom into full-blown you-hood is a very limited perspective. I got what I wanted to get from that class: exposure therapy and a better understanding of mathematics itself. You will have to be content with that.

              1. 3

                You bring up a good point that many people explore this stuff for reasons other than its utility in specific problem domains. My version above assumes the opposite in how I present it. So, I should probably change the next one to be posed as a question about what the person wants to achieve. If it’s similar to my aims, then I offer them the next version of this. Otherwise, I tell them I hope they enjoy themselves on the adventure into formal methods.

                1. 2

                  Thank you!

              2. 4

                I think part of point (1) that’s important to address is: this is not programming. Or, to break it down further, formal methods are a different sort of thinking, have a different experience as you work through them, and arrive at different sorts of outcomes than you are accustomed to with many sorts of problem solving, and particularly with programming.

                On your HN comment someone asked what they would get out of using Alloy. The best answer I could give was “deep insight” and “maybe a really interesting test suite”. This is sort of true but also not a great way to actually discuss what using these tools will produce.

                Another part of (1) that’s important to address is: it’s probably going to be a big process of learning that you were wrong about things in really subtle ways. This, I feel, is one of the big barriers to entry on learning proofs generally: until you’ve gotten the hang of formalized reasoning you really, really want to believe that your own intuition is sufficient. If you’re open-minded then you’re in for a lot of stings as you learn a new mechanism of evidence and begin to create a healthy distrust for your intuition. It’s actually emotionally difficult.

                So, I think all of this leads in to (2): immediate benefits and drives are super necessary to push through the initial emotional and mental difficulty of turning the corner on formal methods. Clearly articulating what “success” is and providing fast wins builds confidence. I aimed at this a bit above with “test suite” in that it’d be cool to not just identify flaws but be able to produce counterexamples that can be valuable when sharing what you learned with others.
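
                To sketch what “produce counterexamples” can look like without any special tooling, here is a minimal property-testing loop in Python (all names invented): it searches for an input that breaks a property, then greedily shrinks it into a small, shareable failing case.

```python
import random

def broken_sort(xs):
    """Deliberately buggy 'sort': a single bubble pass, which only
    manages to move the largest element toward the end."""
    xs = list(xs)
    for i in range(len(xs) - 1):
        if xs[i] > xs[i + 1]:
            xs[i], xs[i + 1] = xs[i + 1], xs[i]
    return xs

def prop_sorts_correctly(xs):
    return broken_sort(xs) == sorted(xs)

def find_counterexample(prop, trials=1000, seed=0):
    """Random search for a failing input, then greedy shrinking:
    keep deleting elements while the property still fails."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randrange(10) for _ in range(rng.randrange(8))]
        if not prop(xs):
            i = 0
            while i < len(xs):
                smaller = xs[:i] + xs[i + 1:]
                if not prop(smaller):
                    xs = smaller  # still fails without this element
                else:
                    i += 1
            return xs  # a small failing input you can show someone
    return None

print(find_counterexample(prop_sorts_correctly))
```

                Libraries like Hypothesis or QuickCheck do this far better, and model checkers give you whole traces rather than single inputs, but the “fast win” is the same shape: a concrete, minimal artifact that settles an argument.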

                Generally, I think a good goal for starting someone off on formal methods is: use them to quickly take some kind of uncertainty you personally feel and translate it into a tool for building confidence and communicating what you learn with others. Type systems are, I think, particularly bad for this. They don’t have to be, but the fast compiler feedback loop is too tempting. The dependent systems get it right where they force you to go through multiple iterations of logical design as you build out an interface, but they’re waaaaay too difficult to get started on. Model checkers, OTOH, are great, as I think many have discovered.

                (3) I think is worth poking at as well. Things can take a long time to learn and someone should be aware that the rabbit hole exists. On the other hand, there are lots of TLA+ success stories where people get whole teams on board relatively quickly. You can have fast success and if that’s what you want then: go get it.

                The goals of the student thus become important: how deep do they want to go? Discuss this with them and scale it appropriately. On one side, you can start off reading about logical foundations, constructivism, constraint solving. On the other, you can learn a convenient modeling language and delve into ever more complex practical situations.

                I think all routes here are harder than they ought to be, though. Ironically, the “if you want to build a type system, first you must invent the universe” direction is probably the “easiest” in that it’s been done over as a school curriculum for years. Other routes have faster bang-for-buck but require delving through more partial texts. I think this is often what someone is asking for when they “want to learn formal methods”.

                1. 1

                  Yeah, thinking of it as programming made some of it hard for me. Very different mindset. I probably need to look into explaining your great points about how the methods challenge people’s intuition. That’s what good science does in other areas. Like with some experiments, the results of the formal methods fight with what people expect. They might reject good data or internalize failings too much.

                  Your point about counterexamples for others was one of the reasons formal methods were applied in high-assurance security and systems. The idea is that certification would produce concrete evidence of certain properties that the supplier could hand to customers: descriptions, formal specs/proofs, and tests in combination. Prior work shows this is helpful if the reviewing party takes the time to thoroughly read and check all of it.

                  No 3 was about things like Coq and Isabelle, not TLA+. The easy stuff like TLA+ was No 2. I’ll try to clarify that in the next version. I agree with your point on TLA+. If I’m reading you correctly, I also agree most people wanting to learn formal methods want practical techniques to improve their programming instead of the foundational stuff you describe. That might interest them, but we need to be sure it’s the journey they want. Hence, my proposal. The partial-texts problem is a big one that combines rather painfully with the semi-complete tooling that Sophistfunk described. The good news is the lightweight ones have one set of good docs (Alloy) and one that’s in progress courtesy of a Lobster. ;) Once anything hits mainstream, I expect a lot of resources to emerge organically as people help each other and share models.

                  The heavyweight stuff still needs work, though. It’s too foreign to me to say exactly what to do. The current material seems to pile things on people. We need a bunch of resources that try to do it like the better books on programming. I will say the requirement is an incremental process of building up skills through a series of practical exercises.

                2. 3

                  My biggest issue with learning any of these things is the way no two parts of it are kept in matching, working order.

                  If the tools work, the completed documentation is for a version from 6 years ago with a different syntax. If the documentation is good, the tools don’t even build, or they crash out. Or they’re just 404s due to link-rot or the company that made them has been eaten by a conglomerate that sells guided missiles and airbag controllers.

                  … or finally, the documentation and tools are kept in working order, but they’re in Smalltalk, so the UI not only looks like your 3-year-old did it, but even the most basic behaviours you expect are completely alien, and once you get lost in some part of the IDE that has nothing to do with the actual tool you’re trying to use, you just give up because image persistence means you can’t quit and start over.

                  1. 1

                    The classic CompSci problem that also sometimes affects the industrial tools in this space. That’s a good point. People trying to find ways to contribute who don’t want to learn formal verification might help by just maintaining the darned packages. I wonder what the lightest method to do this would be. Maybe a different, trimmed VM or whatever for each project, or one VM with a bunch of verification tools for many current ones. Anything worth keeping around might just get updated to the latest version. This is already sounding complicated, isn’t it? ;)

                    Yeah, this could be a problem for some time to come. I figure packaging up some key projects in ACL2, Coq, Isabelle/HOL, and HOL4 could have impact if they’re the ones many others build on. Then, project by project add the others that are worth it to the collection. Maybe some test framework to run the proofs or extractions for each on occasion to automatically spot when they break.

                  2. 3

                    I wouldn’t discourage people, I would just make it clear that this is still an evolving field and it is tough stuff, and not to judge their successes on what this field might become. One alternative for folks interested in this stuff might be Edwin Brady’s Idris book.

                    1. 1

                      I hear you. Might try to be careful about that. A few people I know gave up on formal methods because of Idris being too hard to learn. I think that was before the book, though. Do you know of a lot of people that aren’t math strong who were able to follow the book and apply its lessons?

                    2. 2
                      1. Tell them why it’s useful, so they can decide whether to move on to step 2, before discovering that they can’t really miss out step 1. I didn’t understand the point or the application of formal methods until I had the basics of the underlying maths taught to me (formal proofs in Boolean logic and set theory, in the context of the Z notation). But that was explained so well that even when I’m doing Python or JavaScript, I’m thinking in terms of maths and converting that into code. Then I can see how notations that let me capture more details of the maths, like design by contract, OCL, TLA+ and other formal methods, would be useful.

                      I will grant that some people will want a tool that can be applied to “make code better”. My experience is that many of the tools that have been introduced assume a certain way of thinking about expressing your problem, and have been ignored (design by contract, relational and logic programming, constraint programming; some would put TDD and object-oriented programming here) or “considered harmful” (structured programming, object-oriented programming, and TDD in their “as practised” forms) when they’re applied without the associated change in thought. Which is not surprising, when you consider that people are both searching for tools and pitching the tools they promote in the camp of your step 2, skipping over (for adopters) or glossing over (for promoters) step 1.

                      1. 2

                        Yeah, people learning new paradigms do often say the benefits came after they shifted their thinking to work with the paradigm. Newcomers should be warned that this shift is necessary for these methods, which similarly pay off once it happens. I made the mistake a long time ago of glossing over the difficulties because experts in the field were overselling the methods. These days I try to be a little more careful.

                      2. 2

                        As many here have said, “formal methods” could be a lot of different things. People will have different motivations, too. Some may want to master something difficult or esoteric for its own sake, or for reasons of ego and status. Others are genuinely in search of practical ways to make more trustworthy software, but there are lots of possibilities there too. Are they designing a system, or writing a spec? How much control do they have over the implementation? The most difficult verification projects I can think of involve proving that a pre-existing implementation (written for performance, probably in C) obeys a frozen, pre-existing spec (hopefully in some formal language!). As far as I know, even FM experts can only do this for very small, well-defined systems.

                        On the other hand, merely choosing to build your software in a language that offers some safety guarantees can avoid some headaches altogether. Gaining proficiency and fluency in these languages necessarily involves learning to think mathematically, with both precision and discretion. These habits of mind have not been fully appreciated by most practitioners in industry, nor are they given much incentive there. (Academic computer science, where logical rigor can become an end in itself, has its own perverse incentives, which may help explain why its artifacts are so often inaccessible.)

                        “Almost anything in software can be implemented, sold, and even used given enough determination. There is nothing a mere scientist can say that will stand against the flood of a hundred million dollars. But there is one quality that cannot be purchased in this way — and that is reliability. The price of reliability is the pursuit of the utmost simplicity. It is a price which the very rich find most hard to pay.” – Sir Tony Hoare

                        Managing complexity is an essential skill, and difficult in that it involves a discipline uncommon in our modern resource-rich computing culture, where orders of magnitude worth of code bloat have been progressively normalized for generations. Worse, the people making the decisions about which new features to implement rarely have much insight into, or responsibility for, the cumulative cost of these features. I think that engineers investigating formal methods should first try to attain reasonable expectations for these techniques.

                        You’re absolutely right that there’s a continuum from easy to difficult, with different payoff curves for different kinds of projects. The perfect is the enemy of the good, and effective engineering solutions shouldn’t be conflated with mathematically elegant proofs. If you have control over both the program spec and its implementation, there are plenty of gradual approaches possible. If your project doesn’t have any tests, write some. If unit tests can’t cover enough, try property testing. From there it’s not a big leap to design by contract: check your assertions at runtime, not just at testing time. If you decide that runtime exceptions are not acceptable, you’ll be more ready for something like Daphny, where an SMT solver runs against your code as it co-evolves with its spec. Having come that far, Coq and similar “industrial strength” verification systems won’t seem so intimidating any more. But even if you don’t get that far (and most projects really shouldn’t bother), your program will be somewhat safer, and you’ll understand it somewhat better.
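
                        The “check your assertions at runtime” rung of that ladder can be tried in an afternoon; here is a minimal design-by-contract sketch in Python (the decorator and all names are invented for illustration):

```python
import functools

def contract(pre=None, post=None):
    """Minimal design-by-contract decorator: check the precondition on
    the arguments and the postcondition on the result, at runtime."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            if pre is not None:
                assert pre(*args, **kwargs), f"precondition of {fn.__name__} failed"
            result = fn(*args, **kwargs)
            if post is not None:
                assert post(result, *args, **kwargs), f"postcondition of {fn.__name__} failed"
            return result
        return inner
    return wrap

@contract(pre=lambda xs: len(xs) > 0,
          post=lambda r, xs: min(xs) <= r <= max(xs))
def mean(xs):
    return sum(xs) / len(xs)

print(mean([1, 2, 3]))  # 2.0 -- while mean([]) would trip the precondition
```

                        Tools like Dafny take the same pre/postconditions and discharge them statically with an SMT solver instead of checking them at runtime, which is exactly the gradual step up the ladder described above.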

                        1. 2

                          If you decide that runtime exceptions are not acceptable, you’ll be more ready for something like Daphny, where an SMT solver runs against your code as it co-evolves with its spec

                          Yeah, Dafny is a great example here (https://github.com/Microsoft/dafny) and so is Whiley (http://whiley.org). Check out this demo, which shows formal methods don’t have to be that hard:


                          1. 2

                            Thanks for the spelling correction, and the Whiley link! Looks quite interesting.

                            1. 1

                              I was about to ask why you push Whiley in this thread so much when we have SPARK and Dafny in quite a usable state. SPARK has already been applied in IRONSIDES DNS and industry. MS Research is doing some amazing stuff on top of or using Dafny. Too much to write up. Then, I see your profile.

                              Well, pleasure to have you here. Also, thanks for contributing effort to high-assurance software esp in a language people might use for a productivity boost on practical stuff down the line. I think I still want to ask you, albeit nicely, in terms non-verification-experts can understand, “What advantages or disadvantages does Whiley have compared to the other two?” It’s worth asking given the small ecosystems of both verified code and tooling to support it building up around those languages. Those make SPARK and Dafny my default recommendations in this space if it’s about using stuff for practical reasons. There’s still people just enjoying a language for various reasons or contributing to one that’s different than others. Maybe even practical advantages if we’re lucky. :)

                              EDIT: Sorry, I forgot that yours was object/functional, not imperative. I warn that I got serious memory problems so I slip on that. Still want to hear your selling points on the current or future state, though.

                              1. 2

                                What advantages or disadvantages does Whiley have compared to the other two?

                                Ah, this is the question I get asked all the time!! It’s a fair question. There are different answers:

                                • Compared with SPARK, I think there are a lot of differences. Compile-Time Verification in SPARK is still relatively limited in what can be done. My goal is to go way beyond this. Of course, SPARK has a largish community and has been used in lots of real-world industrial projects. It is hard to compete with that.

                                • Compared with Dafny, well actually they are quite comparable in many ways. Whiley and Dafny actually started around the same time (roughly 2009). When I started, I simply didn’t know about it! There are some differences in the syntax which I personally believe are significant (but I am biased here). The main thing though is that Dafny builds on Z3, whereas Whiley builds on its own theorem prover. That’s good and bad, but it just provides another point in the space. It’s easy to work with Whiley and extend it as you want. At the moment, the theorem prover in Whiley is not as good as Z3, though I am catching up pretty fast now.

                                • Ultimately, the real answer to your question: my goal in building Whiley was just to learn everything about building such a system. That’s why it’s a start-from-scratch project, and also why I built the theorem prover as well.

                                I warn that I got serious memory problems so I slip on that.

                                Well, it has imperative syntax and you shouldn’t underestimate the benefit this gives (with respect to memory). For example, when compiling arrays to Java they go to Java arrays. The only question then is how often you need to clone these arrays (to achieve functional semantics) and, in many cases, you hardly ever have to. But, I do still have more work to do on efficient backends …

                                1. 2

                                  In SPARK and Frama-C, they use Why3 to leverage the results of SAT and SMT solver advancements, getting boosts over time. Then, the number of projects building on Coq, Isabelle/HOL, and HOL4 means they’re advantageous in how your efforts might get easier over time: smart frontends or heuristics might be built for them to better suit your language.

                                  So, why did you decide to write your own prover? I’m curious since trying to exceed SPARK in industry or Dafny in MS-funded R&D is already pretty challenging.

                                  1. 2

                                    So, why did you decide to write your own prover?

                                    Well, just because I wanted to know how they work. I don’t want my system to rely on a black box that I don’t understand and/or cannot fix/update, etc. And I can provide the best experience possible by having a tight integration between my language and my prover.

                                    I’m curious since trying to exceed SPARK in industry or Dafny in MS-funded R&D is already pretty challenging.

                                    I think you might be surprised. MS is not investing heavily in either Dafny or Z3 any more, though they are still being developed. Building a theorem prover is not as hard as you might think, despite what everyone says. Especially in the context of a language like Whiley / Dafny where the generated verification conditions are pretty small. Something like Z3 can handle boolean formulas with 100K+ variables. But, you’ll never see something like that when verifying whiley/dafny programs. For SPARK, well I know someone who was on the team for the SPARK 2014 update. There’s no rocket science going on here. It’s not magic. They have some people like me working on it 9-5. But, the mythical man month always applies and they have other concerns to worry about as well.

                                    1. 1

                                      “Well, just because I wanted to know how they work.”

                                      Makes sense. I feel you on the black-box thing, as ground-up, trustworthy systems is one of my research areas. I thought about trying to learn the stuff myself at one point. People told me the book The Little Prover, by the Little Schemer people, was nice. I’m trying to avoid the heavy texts. I also thought studying and trying to extend Davis’ Milawa prover might be fruitful given what it already accomplished. Maybe put a Prolog in it that ties into the prior ones that were partially verified. Then, type systems, verification generators, or some other things might be prototyped in something that verifies them.

                                      Note: You might wonder at some point why I talk about more angles than I could possibly be building on. I’ll say upfront that I mostly work on a high level so I can run through as many concepts as possible to filter down to good recommendations for the specialists that do build things. Also, to integrate things across fields that are often siloed. I’m basically an applied-science version of meta-studies, someone who tries to facilitate serendipity in R&D among hands-on researchers. I don’t build a lot myself. Of the sub-fields, I’m most ignorant in hands-on formal verification and hardware development, but I still get and give useful ideas here and there talking to specialists. :) I can send you a few of my summary write-ups that came from that process if you’re curious. Back to the conversation.

                                      “I think you might be surprised. MS is not investing heavily in either Dafny or Z3 any more, though they are still being developed. “

                                      WHAT!? That’s crazy. MS Research seemed to be in the lead developing so much jaw-dropping stuff on every part of the stack, shallow vs deep verification, and even some stuff easier to learn (eg Dafny). I expected MS to piss on their own good work but not MSR. They seemed wiser…

                                      “There’s no rocket science going on here. It’s not magic. They have some people like me working on it 9-5.”

                                      It’s good to know. You’ll probably be fine on existing capabilities of something like SPARK over time in that case. There are still, apparently, challenges handling algorithms with lots of pointers or dynamic allocation. The examples I’ve seen take separation logic at the least. Rust’s borrow-checker cheats, turning it into basically a heuristic exercise of figuring out what it will accept. I told Yannick McCoy they need to get that into Ada or SPARK pronto to remain the reigning champ in safe, systems languages. I wonder what the odds are of getting the equivalent of a borrow-checker into a useful subset of C. Your using a language with a GC might let you dodge that one, but it’s a huge TCB. You might want to consider compiling with a borrow-checker to Clight or something at some point. Your current model might at least get significant impact in mission-critical Java, a market SPARK doesn’t serve.

                                      1. 1

                                        I can send you a few of my summary write-ups that came from that process if you’re curious

                                        Sure, interested.

                                        MS Research seemed to be in the lead developing so much jaw-dropping stuff on every part of the stack

                                        Well, Spec# didn’t make it into production. I guess they don’t make much money off Z3, though they must make some. They definitely don’t make any off Dafny, and the lead guy has moved to Amazon (I think it is). Maybe they’ll continue development.

                                        You might want to consider compiling with a borrow-checker to Clight or something at some point

                                        Rust is definitely interesting to me. See:

                                        http://whiley.org/2016/05/28/reference-lifetimes-in-whiley/
                                        http://homepages.ecs.vuw.ac.nz/~djp/files/MSCThesisSebastian.pdf

                                        1. 1

                                          Sure, interested.

                                          I can share some of my experiences with TLA+ and Alloy, if that interests you too. Throw me a message!

                            2. 2

                              That’s a great quote by Hoare. I also like your incremental approach. I’ve been leaning in that direction, talking up a default of DbC w/ property-based and random testing. They can always selectively do more where they want to. That’s the thing I liked about Cleanroom, where all the structure was the same but they encouraged people to be reasonable about how each part was verified. I probably should include a reminder about this in the next version of this article, as some will interpret the push as an all-or-nothing method, like how some languages do OOP or functional programming.

                            3. 2

                              Great advice! Thinking about your three points made me think of the following, I hope you find it useful.

                              In your comment you linked, I think the list of all the projects is extremely useful for seeing what’s going on. Related to your point (1): each of these projects is impressive because they built something within a theorem prover. E.g., no one would be impressed if someone wrote a C compiler in C++, but verifying non-trivial theorems about a C compiler is impressive.

                              Perhaps a better way to look at the problem of “how to get into formal methods?” is to look at it as “what do you want to get from your efforts?”.

                              This, I believe, expands on your point (2): if you want to reduce the number of bugs in your code, perhaps just fuzzing is the way to go. This probably has the highest return on investment for existing projects. I know this is probably no longer “formal” methods, but I would consider fuzzing at least a distant cousin. Extending beyond this are more heavyweight techniques like bounded model checking, abstract interpretation, and concolic execution: these again are not full-blown verification but provide some formal guarantees. These are maybe stepping stones before the lightweight stuff mentioned in your comment, or perhaps are around the same area.
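                              To make the fuzzing point concrete, here’s a minimal random fuzzer sketched in Python. The toy parser and all names are mine, purely for illustration; real fuzzers like AFL or libFuzzer are far smarter about input generation, but the oracle idea is the same.

```python
import random

def parse_length_prefixed(data: bytes) -> bytes:
    # Toy target: first byte is a length prefix, the rest is the payload.
    if len(data) == 0:
        raise ValueError("empty input")
    n = data[0]
    if n > len(data) - 1:
        raise ValueError("length prefix exceeds payload")
    return data[1:1 + n]

def fuzz(target, trials=1000, max_len=16, seed=0):
    # Throw random byte strings at the target. ValueError is the
    # documented rejection path; any other exception is a finding.
    rng = random.Random(seed)
    findings = []
    for _ in range(trials):
        data = bytes(rng.randrange(256) for _ in range(rng.randrange(max_len)))
        try:
            target(data)
        except ValueError:
            pass                          # expected, documented failure mode
        except Exception as exc:
            findings.append((data, exc))  # unexpected crash: a real bug
    return findings

assert fuzz(parse_length_prefixed) == []  # the correct parser has no findings
```

                              Pointed at a buggy target, the same loop hands back the crashing inputs alongside the exceptions, ready to be minimized into test cases.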

                              If you want to learn how to encode proofs into your type system, then maybe the first few chapters for Software Foundations or some other theorem prover textbook would be great: you won’t build anything “real”, but perhaps will have some extra appreciation for types. This relates to your point (1).

                              Related also to (1) is the difficulty in formalizing “correctness” of a program: often writing the specification for a program is more difficult than writing the program itself. A contrived example is specifying the correctness of int max(int a, int b). Especially for a “real engineering” problem, where the problem itself is not very well specified, coming up with a notion of correctness is in and of itself difficult. Stuff like using the runtime sanitizers (ASAN, UBSAN, MSAN) makes encoding some notion of correctness into your existing tests easy.

                              Anyway, I feel this comment was a bit rambling, but perhaps the ideas will be useful to you: I agree that equating formal methods with dependent types is maybe going too far, and I wanted to show some of the other techniques for reducing the number of bugs, which sometimes are not officially included in “formal methods”. (As you can maybe tell, my own interests are in fuzzing/bounded model checking/symbolic execution/abstract interpretation instead of dependent types ;)

                              1. 2

                                To illustrate then, here’s the max function written in Whiley:

                                function max(int a, int b) -> (int r)
                                ensures (r >= a) && (r >= b)
                                ensures (r == a) || (r == b):
                                     if a >= b:
                                         return a
                                     else:
                                         return b

                                (try it out for yourself at http://whileylabs.com)

                                But, yes, in this case the specification is as long as the body itself. Here’s another example: the well-known indexOf function:

                                function indexOf(int[] items, int item) -> (int r)
                                // If valid index returned, element matches item
                                ensures r >= 0 ==> items[r] == item
                                // If invalid index return, no element matches item
                                ensures r <  0 ==> all { i in 0..|items| | items[i] != item }
                                // Return value is between -1 and size of items
                                ensures r >= -1 && r < |items|:
                                    int i = 0
                                    while i < |items|
                                        where i >= 0
                                        where all { k in 0 .. i | items[k] != item }:
                                        if items[i] == item:
                                            return i
                                        i = i + 1
                                    return -1

                                Again, overall, probably as much spec stuff as code. But, in the end, is it such a problem? Think about the lines of unit test code you would write to test this function. Likewise, the lines of documentation you would write for it anyway.
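                                 For comparison, here’s roughly what that test code looks like. This is my own sketch in Python, not anything from Whiley’s tooling: a straightforward port of indexOf, with the three ensures clauses restated as runtime assertions over random inputs. It ends up about as long as the Whiley spec.

```python
import random

def index_of(items, item):
    # Python port of the Whiley indexOf: first matching index, else -1.
    for i, x in enumerate(items):
        if x == item:
            return i
    return -1

def check_index_of(trials=1000, seed=0):
    # Re-state the three 'ensures' clauses as assertions over random inputs.
    rng = random.Random(seed)
    for _ in range(trials):
        items = [rng.randrange(5) for _ in range(rng.randrange(8))]
        item = rng.randrange(5)
        r = index_of(items, item)
        assert -1 <= r < len(items)               # result is in bounds
        if r >= 0:
            assert items[r] == item               # valid index matches item
        else:
            assert all(x != item for x in items)  # no element matches item

check_index_of()
```

                                 The difference, of course, is that the random check only samples the input space, while the verified spec covers all of it.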

                                1. 2

                                  Thanks for the thorough response with the examples.

                                  In retrospect the max example was probably not the best for the point I was trying to make: I didn’t mean to say that “writing the spec is too hard”. I definitely agree, especially given the comments+tests you’d write anyway, that it’s preferable to write the spec and prove it correct rather than write tests.

                                  But what about cases where it’s hard to determine what correct even means? For example, what if you wanted to write a spec that captures an arbitrary webpage loading? You can’t look at stuff like exceptions being thrown and not caught, since so many webpages do this without being “incorrect.” This is probably a moot point when talking about verifying stuff, since it implies you need to know the specification first: so, what I was trying to say is that often actually coming up with the spec is hard, let alone proving it’s correct, thus making the problem even more difficult.

                                  1. 2

                                    what if you wanted to write a spec that captures an arbitrary webpage loading?

                                    Right, I see. Yeah, so this is really about whether or not we always try to make our specifications “complete” in some sense. The key thing to remember here is that we can still get value from a spec without specifying absolutely everything. In your example, it could be too hard to specify that. And that’s fine! Or, we could try to capture it with e.g. a boolean flag to indicate when it’s loaded or not. Then we can specify things over this flag (e.g. cannot call this method unless the flag is true, etc). That’s a kind of proxy approach.

                                    Other things like e.g. “this page must load within 200ms” are also pretty hard to specify. So, it’s not a silver bullet … but it can still be useful (much like types are).
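                                    A quick sketch of that proxy approach in Python (the class, flag, and method names are all mine, purely illustrative):

```python
class Page:
    """Proxy approach: a boolean flag stands in for the
    hard-to-specify notion of 'the page has loaded'."""

    def __init__(self, url: str):
        self.url = url
        self.loaded = False

    def load(self):
        # ... actual fetching/parsing elided ...
        self.loaded = True

    def query_dom(self, selector: str) -> str:
        # Contract over the flag: callable only once the page is loaded.
        assert self.loaded, "query_dom() called before load()"
        return f"<element {selector} on {self.url}>"
```

                                    We never specified what “loaded” really means, but we can still state, and check, ordering properties over the flag.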

                                    1. 2

                                      Sorry, just to add to that. My pet peeve (which I think is relevant to the original post) is that formal methodists do often seem to get carried away with always trying to specify everything. I personally think we need to move away from that mindset to a more middle ground.

                                      1. 3

                                        I’m gonna start calling myself a formal methodist from now on.

                                        Also, I completely agree. This is my favorite “don’t get carried away” piece on verification.

                                        1. 1

                                          I’m gonna start calling myself a formal methodist from now on

                                          You’ll like this quote then:

                                          it is clear to “Formal Methodists” like ourselves, for some of whom formal methods can be something of a “religion”, that introducing greater rigor into software development will improve the software development process, and result in software that is better structured, more maintainable, and with fewer errors


                                          1. 3

                                            Oh wait, that’s good for a separate reason.


                                            I told Ron Pressler (pron) that his vision of future formal methods was the past: that is, verifying the algorithms, not the code, in formal specifications with a combo of model-checkers and provers depending on utility. He said TLA+ was superior because it was basic mathematics: set theory and (iirc) first-order logic with concurrency/time support. I told him he basically just described Z + CSP/SPIN as used for high-assurance systems back in the day. The work proceeded to go deeper than the algorithm level due to the problems at the code, compiler, and assembly levels they ran into. He’s right that there’s still tons of mileage to get out of that level, as prior work showed with low defects.

                                            Then, I kept telling @hwayne that one might have substituted TLA+ for SPIN combined with Z for its benefits on sequential. Otherwise, one could have emulated the same benefits in TLA+ itself if it’s capable (I’m not specialist enough to know). Then, your link has the TLA+ inventor himself doing exactly that. Wow. Then, the audience has no interest in it. Z’s tool support combined with TLA+’s ease of use could’ve sped up deployment of lightweight methods that would’ve drawn more funding to them after successes came in, like with ASMs, SPIN, and Alloy. And they had no interest. Unbelievable…

                                            Well, I appreciate you drawing it to my attention since it was neat to see he tried it. He was right that he did have the benefit of no Z baggage. Maybe it was for the better they didn’t listen and he made TLA+. Had Z really taken off, the opposite might have been true as in Worse is Better effect of hitching a ride on a bandwagon.

                                            1. 2

                                              Then, I kept telling @hwayne that one might have substituted TLA+ for SPIN combined with Z for its benefits on sequential.

                                              I don’t remember you telling me that and now I feel really bad for forgetting. Sorry dude :(

                                              Regardless, though, I agree. I’m a huge fan of TLA+, but I recognize that it’s in part due to sentimental reasons. TLA+ is the first formal method I learned and also how I’ve managed to contribute back to tech. I wrote Learn TLA+ because there weren’t good TLA+ resources but also because I didn’t really know about any other formal methods techniques when I started. If I started with, say, Alloy, I probably would have seen how much harder TLA+ was to learn and how much smaller the community was, gone “nah”, and never learned it.

                                              I still think TLA+ is amazing, but now I recognize that 1) it’s “the best tool in a lot of cases” and not “the only tool in all cases” and 2) most of the difficulty using it is UI/UX, not essential complexity. It’s a rough diamond that needs to be cut.

                                              Alternatively, someone may be inspired by it to make something better. That’d be awesome, too.

                                              1. 1

                                                I don’t remember you telling me that and now I feel really bad for forgetting. Sorry dude :(

                                                I tell people here about a lot of stuff. Forgetting something is inevitable. No worries.

                                              2. 2

                                                He was right that he did have the benefit of no Z baggage.

                                                Right, and I think that’s a big deal. Z is very big and somewhat bloated, I think. Even attempts to mechanise the language were challenging, with tools like CZT appearing to help. That’s why B spun out with its tool support, and then eventually Event-B.

                                              3. 1


                                                What book is that chapter you’ve referenced from? I found another chapter:


                                                … ?

                                                1. 3

                                                  Peter Gutmann. He’s super nice! I emailed him gushing about how much I loved that essay and he didn’t even call me weird for reading his PhD thesis for fun.

                                                  1. 3

                                                    ok, no worries I figured it out:


                                                    Another New Zealander no less!!

                                                    1. 3

                                                      It was Peter Gutmann. I eventually started giving it to people to keep them from being overly optimistic about what formal verification can accomplish in general or alone. Of course, high-assurance security required a lot of assurance activities that all acted as a check against each other. Formal verification itself was done only on the small, central component enforcing the security policy. As CompSci operates in silos, many in the formal-methods community thought they’d solve everything that way, much like we saw with OOP and unit-testing zealots.

                                                      The old saying goes, “Everything in moderation.” It just seems more true the longer I’m around. I do dig the whole-stack efforts like CLI, Verisoft, and DeepSpec, though. We can keep some teams trying to verify all the core or at least practical stuff while the rest of us exercise a bit more sense for more productivity. I expected trickle-down effects to kick in eventually, and they were delivered at least with CompCert and CakeML, given their usefulness in security-critical C and CompSci tooling (esp. compilers or extracted code). Push-button simple, eliminating whole classes of errors. We probably should count those checking the SCOOP concurrency model or certified WCET analyses too, given the obvious benefits of both in various sectors.

                                                      So, my final outlook is that formal verification should usually be applied to the simplest, most-critical parts of probably only a handful of software projects. Most should do stuff like DbC w/ generated tests in memory-safe languages. That, with the modifier that even those people can benefit if some teams push the heavy stuff as hard as possible.

                                                2. 1

                                                  That was the position of reviews in Cleanroom methodology. They kept everything structured for easy analysis but level of formality depended on what they actually needed.

                                          2. 1

                                            On top of it being in a theorem prover, most of them are done in an “LCF-style” theorem prover designed to make it hard to screw up the proof process. Both Isabelle and Coq also have work done on foundations where the verifiers or extractors can be verified themselves to some degree. So, people can focus more on the specs and extracted code than with a prover not designed that way. If the specs are peer-reviewed and right, this leads to a level of confidence that’s hard to achieve without formal methods. I dare say near impossible, at least for optimizing compilers, based on empirical evidence.

                                            Far as reducing bugs, you first have to know what your code is supposed to do. That’s why I push behavioral contracts a la Design by Contract. These can be combined with methods such as fuzzing to help you spot where it broke with laser precision. Those same contracts can also be used for generation of tests (e.g. Eiffel) or verification conditions (e.g. SPARK). It’s harder to do it the other way around. Hence my recommending DbC combined with automated analysis and tests. That’s not a substitute for review or manual tests where appropriate (esp. hard-to-specify stuff).
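                                            A minimal sketch of that workflow in Python (the contract decorator and all names here are mine, not Eiffel’s or SPARK’s): one contract serves as a runtime check, a fuzzing oracle, and a source of generated tests.

```python
import random

def contract(pre, post):
    # DbC-style decorator: attach a precondition and postcondition.
    def wrap(f):
        def g(*args):
            assert pre(*args), "precondition violated"
            r = f(*args)
            assert post(r, *args), "postcondition violated"
            return r
        g.pre, g.post = pre, post  # expose for test generation
        return g
    return wrap

@contract(pre=lambda xs: len(xs) > 0,
          post=lambda r, xs: r in xs and all(r >= x for x in xs))
def my_max(xs):
    return max(xs)

def generated_tests(f, gen, trials=500, seed=0):
    # Use the contract as the oracle: random inputs satisfying 'pre'
    # must yield outputs satisfying 'post' (checked inside the wrapper).
    rng = random.Random(seed)
    for _ in range(trials):
        xs = gen(rng)
        if f.pre(xs):
            f(xs)

generated_tests(my_max, lambda rng: [rng.randrange(100) for _ in range(rng.randrange(6))])
```

                                            If an implementation bug breaks the postcondition, the assertion fires on the exact input that exposed it, which is the laser-precision part.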

                                            The problem you mention on difficulty of specification is real. My shortcut, as above, is to just tell people to switch to tests for anything that’s hard. They can Google prior work if they want to see if anyone did it and how. As minimax was suggesting, I might mention something about this when describing tradeoffs. Far as abstract interpretation, Astrée Analyzer was one of my favorite results of the software-assurance field, given the prevalence of C in embedded. I wonder what it would take for a CompSci team to crank out a FOSS version with equivalent capabilities. Far as dependent-types equivalence, it’s definitely too far, since people who give up on Coq and Idris often like TLA+, Alloy, SPIN, or even Z.