I’m aggregating all of the formally-verified Leftpads people wrote during the Theorem Prover Showdown. I asked participants to submit descriptions of their languages and explanations of their proofs so that people can compare and contrast the different tools. I’m still waiting on a few submissions, but there are enough up now that it’s worth sharing.
People are, of course, welcome to submit proofs for languages not yet present.
Thanks for this amazing project!
If I may, I’d like to gently complain about this sentence:
Thinking about total guarantees in this way confused me about proving code for a long time. Whenever I saw a “proven” piece of software, I could immediately think of several possible ways for it to nevertheless be wrong, so it was unclear how the “proof” was qualitatively different from other verification strategies.
Maybe a more accurate description could be:
Provably correct code is code that has been rigorously shown to implement a thorough specification using proof techniques from mathematical and logical praxis.
Yeah, I’m against using the word “guarantee” or anything absolute like that. It hurt the formal methods field in the past when practitioners were overpromising. Maybe something like: “Provably correct code is mathematically proven to implement its intended behavior. If the specification is correct, code built this way usually has few to no errors in practice.”
Something along those lines. Maybe even with a link to CompCert’s testing results or something like that to illustrate the point.
Correctness is a relation between the input and output of a program. We say a program is correct if, for every possible input, execution of the program eventually terminates with the expected output.
The correctness relation is called functional if for every input only a single output is related. That is, each input uniquely determines an output.
Not all correctness relations have a correct program. An input-output relation for which no correct program exists is called uncomputable (or, for yes/no problems, undecidable). On the other hand, some correctness relations have multiple correct programs.
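To make this concrete, here is a small sketch (in Python, with illustrative names not taken from any particular submission) of a functional correctness relation. For leftpad, each input triple is related to exactly one output, so the relation is functional:

```python
def leftpad(c: str, n: int, s: str) -> str:
    """Pad s on the left with c until its length is at least n."""
    return c * max(n - len(s), 0) + s

def relates(c: str, n: int, s: str, out: str) -> bool:
    """The correctness relation, stated as a predicate on (input, output)."""
    pad_len = len(out) - len(s)
    return (len(out) == max(n, len(s))      # length is max(n, |s|)
            and out.endswith(s)             # the original string is a suffix
            and out[:pad_len] == c * pad_len)  # everything before it is padding

# The implementation's output is related to its input...
assert relates("!", 5, "foo", leftpad("!", 5, "foo"))
# ...and the relation rejects a wrong output.
assert not relates("!", 5, "foo", "!foo")
```

Because each `(c, n, s)` determines one and only one correct `out`, the relation is functional in the sense above.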
Typically, a programming language itself is already a specification language that allows one to establish an input-output relation in a rigorous manner. Sometimes one wishes to give a functional/relational specification before implementing a program, typically written down using mathematical notation.
Recent tools (proof assistants and program verifiers) allow us to express functional/relational specifications on the computer too.
This is, of course, a simplification. Typically, the system one wishes to formalize is not functional, and thus requires more effort in abstract specification and refinement.
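As a down-to-earth analogue of what those tools do, the specification can be written as executable code and checked over a small, finite slice of the input space. This is only a sketch: a proof assistant discharges the same obligation for *every* input, not just the enumerated ones.

```python
from itertools import product

def leftpad(c, n, s):
    return c * max(n - len(s), 0) + s

def spec_holds(c, n, s):
    """Machine-checkable form of leftpad's functional specification."""
    out = leftpad(c, n, s)
    pad_len = len(out) - len(s)
    return (len(out) == max(n, len(s))
            and out[pad_len:] == s
            and out[:pad_len] == c * pad_len)

# Exhaustive check over a tiny finite input space; a theorem prover
# generalizes this to all characters, all lengths, all strings.
ok = all(spec_holds(c, n, s)
         for c, n, s in product("ab", range(6), ["", "x", "xy", "xyz"]))
assert ok
```

The gap between this check and a proof is exactly the gap the thread is discussing: enumeration builds confidence on the inputs tried, while a proof covers the whole (infinite) relation, modulo the trusted base.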
The words “rigorous” and “thorough” are not needed.
There’s a difference between establishing all that in math and it actually being true for the real code. It requires your specification to be correct, along with any specs/logics it depends on, the proofs, the checker, usually a parser, usually some I/O, the hardware, and so on. In this case, some of the proofs needed automated solvers to be correct, which are pretty complicated. I’m not sure all of them produce witness certificates (or whatever they’re called) yet. To be fair to proof engineers, many of them mention all of the above in their papers in the section describing assumptions and/or the TCB.
So, mbrock’s point that calling it a guarantee is too much seems to stand. I mean, even the seL4 people didn’t call it a total guarantee. They said the code should follow the spec assuming (long list here) isn’t wrong or broken.
I agree with your points. A proof is worth as much as you want it to be. See the DeepSpec project, which tries to obtain proofs for a practical computing stack: hardware specifications, OS specifications, prover specifications, et cetera. However, this still does not guarantee anything absolute.
My point is still that “Rigorously shown:” is vague. When is a method that shows correctness rigorous, when is it not? “Thorough specification:” is vague. When is a specification thorough, when not? Both are matters of trust, it seems to me.
“But there is a difference between showing that in math and it really being true for code.” Well, if one mathematically deduces a property of some code, and it turns out to be really false, one should start to doubt the foundations or mechanism by which the deduction took place. In principle, any deduction should be true for the real code.
Although in some logics, not all true properties can be deduced (incompleteness), it does not mean that we should accept logics in which false properties can be deduced (unsoundness).
Or did I misunderstand what you mean by “the real code”?
“My point is still that “Rigorously shown:” is vague”
Ok. That’s a decent point. It is kind of vague. It does have a subjective meaning that a lot of people already understand. They think that a “rigorous” analysis, testing, or whatever phase means piles of effort were put in vs a non-rigorous approach. They expect to get significantly more results or fewer problems. The fact that people rarely use that word for software QA might make that perception sink in even more.
I think I mainly keep using it for the psychological impact on people who might not understand the formal terminology well. I could tell them it relates to the specification, yet they might think UML when I say that. So, personally, I use a mix of precise terms from the field and terms that seem to convey understanding to those outside of it. I have no proof this term is better: by all means experiment to see if it works in your conversations, or find a better way to say it. Talking rigorous vs typical QA has just worked for me.

From there, I might say low, medium, or high rigor, confidence, and so on. Although the field is called assurance, I’m thinking of ditching that word, too, since “confidence” clicks in more people’s heads. I’ve been seeing some people talk about “high-confidence” software or techniques that “increase confidence that software works as intended.” Robustness and resilience are the other two terms many are using.
It would be great if the README went somewhere. It just stops after explaining why we’re proving leftpad. What’s next? Am I supposed to start browsing the code in the folders? Are there some worked examples elsewhere in the repo?
Ya, each folder has its own proof in that language along with an explanation of the implementation.