Excerpts from Introduction: “A loop invariant is a predicate that holds for all possible executions of the loop. Loops are technically interesting because they can encode an infinite number of behaviors. The goal of an invariant inferencer is to find a loop invariant, a predicate that is valid for all behaviors that the loop can have. This inference problem is hard in general and tools apply a myriad of sophisticated techniques to this end. In contrast, an invariant checker proves whether a given predicate is an invariant or not, a substantially easier task. Unsurprisingly, existing static analyses for invariant checking are more mature than their inference counterparts.
We break down the problem of loop invariant inference into two phases. In the guess phase, the loop is executed on a few inputs and the observed program states are logged. These logs constitute our data. Subsequently, a learner guesses a candidate invariant from the data. This task is substantially easier than actual inference as the output of the learner is only a candidate invariant: there can be executions of the loop (that are absent in the data) for which the learned candidate does not hold. In the check phase, a static analysis proves whether the candidate invariant is an actual invariant or not. In the former case, the inference succeeds. In the latter case, we repeat the process with additional data. We refer to this overall approach that alternates between guessing and checking as Guess-and-Check.
In contrast to a static analysis, the learner infers candidate invariants solely from data and does not look at the program text at all. The program text needs to be analyzed only by the checker that solves the easier problem of invariant checking. Therefore, a Guess-and-Check based approach can succeed where a traditional static analysis can fail. Next, this separation allows us to leverage the significant advances achieved in statistical machine learning. Since machine learning techniques are very different from the techniques routinely used in verification, the Guess-and-Check based inference engines have different strengths and weaknesses compared to traditional static analysis based inference engines. Finally, the decomposition substantially simplifies the architecture of a verifier and leads to simpler implementations. The learners are just data crunching algorithms which are easy to implement. For checking, we use off-the-shelf SMT solvers and take advantage of recent advances in automatic theorem proving.”