1. 14

  2. 9

    This was a really exciting paper for me. It’s a young, proof-like approach to software correctness that’s only just becoming practical, thanks to the improvements in SAT/SMT solvers that began in the mid-2000s. The paper doesn’t get much into it, but the technique can also be inverted, in a sense, to act as a foundation for program synthesis: generating programs to exercise tests or automatically patch bugs. I’m going to play with these tools and read more papers to get a feel for the space. I’m very tempted to build an industry testing tool on top of it. This research has been oriented towards binaries and C-like languages, but an expression-oriented language with control of side effects might be able to sweep away quite a lot of the pitfalls and minimize/memoize the state space.

    1. 5

      Somewhat related, I read the paper from https://news.ycombinator.com/item?id=15626816 for “collapsing” nested interpreters and compilers. Their technique is to make each language’s elimination forms (function calls, if/then/else, etc.) into parameters, and pass in different implementations depending on whether we’re interpreting (e.g. f(x) should call f with x), compiling (e.g. f(x) should generate code which, if executed, will call f with x), etc.
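      If I’ve understood their trick, it can be sketched in a few lines (hypothetical names, plain Python rather than their setting): the program is written only against abstract elimination forms, and swapping in “run now” vs. “emit code” implementations turns the same definition into an interpreter or a compiler.

```python
def staged_abs(if_, app, lt0, neg, x):
    """abs(x), written only in terms of the supplied elimination forms
    (if_ for branching, app for function calls) and primitives."""
    return if_(lt0(x), lambda: app(neg, x), lambda: x)

# Interpreter instantiation: x is an actual number, so we branch and call now.
run = lambda x: staged_abs(
    lambda c, t, e: t() if c else e(),   # if: decide the branch immediately
    lambda f, a: f(a),                   # call: apply immediately
    lambda v: v < 0,
    lambda v: -v,
    x)

# Compiler instantiation: x is the *name* of a variable, so we emit code text.
emit = lambda x: staged_abs(
    lambda c, t, e: f"({t()} if {c} else {e()})",  # if: generate code for both arms
    lambda f, a: f"{f}({a})",                      # call: generate a call expression
    lambda v: f"{v} < 0",
    "neg",
    x)
```

      Here run(-3) evaluates to 3 directly, while emit("x") produces the string "(neg(x) if x < 0 else x)" — the same definition, specialized by which elimination-form implementations we pass in.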

      It dawned on me that (at least in pure functional languages), it’s only the elimination forms that we can’t reduce without input data, and hence they are the only thing stopping us from executing programs ahead of time. Symbolic execution solves this by “taking both branches” and accumulating formulas which describe all possible paths. We could do something else: reduce until we hit an elimination form, then take one of the branches (as if we had a value, rather than a symbolic input), but store a formula which records the branch we took.

      For example, if we get stuck at if x then foo else bar, we can pick a branch arbitrarily (like a fuzzer or property checker). Say we pick the then branch; we’d store the following formula:

      For all a and b, if x then a else b reduces to a

      Thanks to this new reduction rule, our reduction becomes “unstuck”, since we can reduce if x then foo else bar to foo. We can carry on reducing until we get stuck again. Note that we shouldn’t encounter any inconsistency, since any other branches on x will be reduced the same way. This lets us trace execution paths through any (pure) function without having to obtain or generate any input; I don’t think we even need to know the input type, if we know that the function type-checks and pattern matches are exhaustive.
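      A minimal sketch of that loop (invented AST encoding, not any paper’s tool): reduce a tiny pure expression language, and when reduction gets stuck branching on an unknown input, commit to a branch, record the assumed reduction rule, and replay it for every later branch on the same variable so the trace stays consistent.

```python
def reduce(expr, env, rules, pick=True):
    """expr is a nested-tuple AST; `rules` maps a stuck scrutinee (an unknown
    variable) to the boolean branch we committed to, i.e. the recorded rule
    "for all a, b: if x then a else b ~> a" when rules[x] is True."""
    kind = expr[0]
    if kind == "lit":
        return expr[1]
    if kind == "var":
        return env[expr[1]]
    if kind == "if":
        _, cond, then_, else_ = expr
        if cond[0] == "var" and cond[1] not in env:
            scrutinee = cond[1]
            if scrutinee not in rules:
                # Stuck on unknown input: arbitrarily commit to a branch
                # and record the reduction rule we assumed.
                rules[scrutinee] = pick
            # Any later branch on the same scrutinee reuses the rule,
            # so the execution path stays self-consistent.
            return reduce(then_ if rules[scrutinee] else else_, env, rules, pick)
        return reduce(then_ if reduce(cond, env, rules, pick) else else_,
                      env, rules, pick)
    raise ValueError(kind)

# "if x then (if x then 1 else 2) else 3" with x unknown: both branches on x
# reduce the same way, giving 1 under the single recorded rule {"x": True}.
rules = {}
result = reduce(("if", ("var", "x"),
                 ("if", ("var", "x"), ("lit", 1), ("lit", 2)),
                 ("lit", 3)), {}, rules)
```

      No input value or generator is needed; the accumulated rules dictionary is exactly the path formula describing which branch of the input space this trace explored.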

      Note that the reduction rules we accumulate are actually co-patterns which (perhaps partially) define the input value.

      This is very similar to symbolic execution, but also related to techniques like LazySmallCheck. If we have control of the language’s elimination forms (either by hooking into a meta-level, or by writing our own interpreter) then we don’t need pesky things like data generators. AFAIK fuzzers already allow this, but they manage it by instrumenting at the machine-code level, where all data is just words/bytes/bits and all branching is binary; this makes the implementation easy (in theory!), but any interpretation of the results will be quite far removed from the original language’s semantics.

      I think I’ll write this up somewhere… :)

      1. 3

        Please do write this up and submit it. This was interesting with useful references and I’m very curious to learn more.

        I can’t claim to fully understand what you’re proposing, but it sounds quite a bit like how languages with immutable data almost always handle mutation by passing modified values at function calls, in effect moving mutability into the call stack. Similarly, this seems to move the path constraints into the call stack of the solver.
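        A trivial illustration of that analogy (hypothetical example, not the proposal itself): each “mutation” is a fresh value handed to the next call, so the evolving state lives in the call chain rather than in place.

```python
def deposit(account, amount):
    # No mutation: build a new dict, leaving the old value untouched.
    return {**account, "balance": account["balance"] + amount}

a0 = {"owner": "ada", "balance": 10}
a1 = deposit(deposit(a0, 5), 7)  # each "update" is a fresh value passed onward
```

        After both calls a1 carries the updated balance while a0 is unchanged — the history of states exists only as values threaded through the calls, much as the path constraints above exist only as formulas threaded through the reduction.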

        I’m also curious as to what happens if symbolic execution/concolic testing starts from the invariants and solves by working bottom-up through the AST from the return points, rather than top-down from the start of the function. I don’t have the mathematical intuition or hands-on tinkering experience to know whether this would be an improvement or even viable, and a brief search didn’t turn up relevant papers from people taking a bottom-up approach to symbolic execution.