1. 24
  1. 3

    That sounds pretty cool.

    One question I have is, if the SMT solver in CBMC doesn’t find a solution, can you guide it towards one?

    How fast do these checks run? Are they cheap enough to run in CI? (i.e. about one second in average per check, say).

    Do you ever have a problem where you upgrade or change the settings for CBMC and then find a large number of tests blow up because the model checker no longer solves them because of some small change? Or… do the solvers tend to be really robust?

    Fwiw one advantage that the current #[cfg(rmc)] syntax has is they it does surely admit the possibility of other competing model checkers. ;)

    1. 2

      I think it is a problem if the solver cannot find a solution, as it just loops endlessly and you eventually give up. To help this, you need to reduce the scope of the search (e.g. reduce the size of arrays you are considering). Obviously that’s not ideal.

      As for changes in CBMC … well actually I don’t have enough experience with that tool yet to comment. However, with other tools (e.g. Z3) I have experienced exactly this problem. Something that works, no longer does. And, conversely, something I couldn’t verify worked after I upgraded Z3!! It’s not a super fun experience at times, but I guess that’s beauty of being on the “edge” :)

      1. 2

        Something that works, no longer does. And, conversely, something I couldn’t verify worked after I upgraded Z3

        This is something that F* solved quite early on. When you verify something it records enough hints that it can push Z3 into the same solution in the future even if the tactics in Z3 change between releases. It doesn’t solve the other problem though, that sometimes a solution exists but it’s a million years away in the path that Z3 is exploring. In general, solvers (including Z3) address this kind of thing by providing a mechanism for user-defined tactics to guide the exploration. Surfacing that in the language is hard, but I think F* has a few tricks to try to do this.

        It’s a general problem with the approach though, the possible results are always ‘pass’, ‘counterexample’ or ‘solver timeout’. C++ is an inspiration here because C++ programmers are already to accept this (C++ compilers either succeed, report that the input is invalid, or say template / constexpr recursion depth exceeded) so as long as you can make the failures deterministic and understandable then there’s a good chance that something like this could be possible.

        1. 1

          Hey David,

          That’s interesting about F* … neat trick.

          solvers (including Z3) address this kind of thing by providing a mechanism for user-defined

          Yeah, Boogie has “triggers” which you can provide to help. In fact, adding triggers to the underlying boogie will allow this example to verify. But, I have now syntax for triggers in Whiley. Furthermore, I don’t want to hack in specific Boogie-related syntax, but something which is somehow more general and would apply across different verification systems.

        2. 1

          Thank you.

      2. 1

        How does this sort of thing compare with formal verification? I’m not entirely well versed in formal verification. Is this the same thing?

        1. 1

          Well, it is a form of lightweight formal verification. Its not full static verification because, in most cases, it only checks a program is correct up to some bound (e.g. arrays of most size 3 in my example). Therefore, its possible that there are bugs exposed only for larger arrays. In contrast, full formal verification checks for all possible sizes (but is actually quite a lot harder to do).

        2. 1

          How does this relate to the approach in QuickCheck? From a quick scan of the article it seems similar.

          1. 2

            Well, the way you go about using tools like QuickCheck and RMC is perhaps similar, but the way they actually work under-the-hood is actually quite different. For example, RMC uses CBMC which means it does symbolic execution of the program — that is, it doesn’t actually execute the code. Whereas QuickCheck does actually execute the code and must try a lot of inputs to find useful stuff.