This is a nice worked example of proving various implementations of the greatest common divisor using SPARK. The ‘Ghost’ in the title refers to using Ghost procedures which are procedures that only exist at proving time, not runtime. Similar to proof functions in ATS.

As in other threads, I’ll warn that the effort on verifying this simple algorithm looks like more work than it is. One must compare it to what it replaces: full, manual verification of imperative code in theorem provers or something like B method. You’d have to learn something like the “Software Foundations” or “Certified Programming with Dependent Types” books to do that. Whereas, with SPARK’s automation, you just learn to do specifications of what the algorithm does plus experiment with what simplifications are necessary to get it through automated prover. Quantum leap in usability for what assurance you get.

Note: I always say something similar about Rust’s borrow checker. People compare it to GC’s but they have a cost for safety. Proper comparison is to separation or matching logic used to prove temporal safety of low-level programs in C and such. Fighting with borrow-checker a bit is much easier than learning separation logic for most people.

Honest question: What use is writing well over two screenfuls of code to mechanically verify something that you can easily be proven in half a page of paper? I’m all for mechanical verification when it saves work, but here that doesn’t seem to be the case.

There’s an element of “Proof golf” here. Similar to code golf in languages like J/APL where people try to get the shortest or most interesting solution to a problem, so is there a challenge to “How much can I prove”. In real world code maybe you wouldn’t go as far. But by going that far you learn techniques about what is possible if you do need to prove things.

Mechanical proofs are better than “half a page of paper” because it is machine checked. Changes to the code don’t require redoing paper proofs. It can be checked automatically. They are like tests but have the confidence of working across the entire range of possible inputs to the pre/post conditions vs just the input data or edge cases being tested.

Mechanical proofs are better than “half a page of paper” because it is machine checked.

The essence of a proof is in the ideas contained in it, not in the methodology and tools you used to come up with it. For instance, I can understand if you say that a proof is better than another because it applies to a wider class of structures, or because it can be carried out using less complex conceptual machinery. But, to say that a proof is better because it was checked by a machine strikes me as no different from saying that a literature work is more valuable because it was written using a word processor. It doesn’t make much sense.

Changes to the code don’t require redoing paper proofs.

Changes to the code are themselves changes to the proof. What is a ghost procedure, if not a proof object?

It can be checked automatically.

Automatic checking is the first best thing. The zeroth best thing is writing proofs so simple they can be checked by hand effortlessly.

They are like tests but have the confidence of working across the entire range of possible inputs to the pre/post conditions vs just the input data or edge cases being tested.

This applies equally well to handwritten and machine-(verified|generated) proofs.

The simplest argument against your point is that handwritten proofs can have mistakes. It’s the same argument for using type systems instead of “just” conforming to the types.

In an ideal situation, our machine-verified proofs can be easy enough to write that they represent not much more effort than the hand-written ones we might want anyways.

to say that a proof is better because it was checked by a machine strikes me as no different from saying that a literature work is more valuable because it was written using a word processor

I posit that the proof checked by the machine is better because it checks the actual implementation being used. A proof on paper checks what the proof writer thinks the implementation is. It can miss things where what the programmer wrote doesn’t match what they think they wrote.

Changes to the code are themselves changes to the proof.

Not necessarily. For example. Pre and Post conditions at the specification level are not affected by changes in the implementation level.

It can miss things where what the programmer wrote doesn’t match what they think they wrote.

It happened in cryptography with padding that got left out of the elegant, little proofs. The defeat of security properties came through padding issues. Another popular one is where math might use natural numbers but implementers will use modular arithmetic. Another is where they simply can’t handle due to all the work involved but computers can brute force it with a SAT solver or whatever.

They used to do proofs by hand in early, high-assurance security. The process caught problems. Yet, there was a lot of mumbling that one might not trust 200 pages of proof terms, obligations, lemmas, ghost code, whatever is involved. There could be mistakes in there where it looks right but is inconsistent. Heck, the proofs are usually several times larger than the code that we didn’t trust developers to write correctly in the first place. ;) Machine-checked proofs can spot common mistakes with automated analysis of all the material. Some tools can even heuristically hint at or synthesize solutions to your problem. It’s why almost all work in formal verification of software today is done with machine assistance (i.e. proof assistants). Some good example problems in general mathematics of what hand proofs are facing in this article:

There’s also the parallel notion of reducing trust. You want to have to trust as little of the process as possible. That usually means you prevent problems or can catch them with something much simpler than what introduces them. For example, LCF-style provers like Isabelle/HOL and Coq are designed such that everything done during proving can be checked for logical accuracy by a tiny, checking program. That is, the complex process that drives those can be buggy as hell but you’ll know whether it’s done right long as you get one, simple part right. Then, you just have to worry about that component and the specs you feed it which encode the design or properties. This approach has eliminated a lot out of trusted part of code in various projects: parsers, pretty-printers, interpreters, compilers, extraction from specs, basic I/O, data structures, control flow, some side channels recently, and a few even partly verify the verifier itself. It’s getting to the point where the final check might trust only logical specs and a few hundred to a thousand lines of hand-verifiable code. Here’s an example showing how small the trust can be by Appel and Michael on certifying safety of 3rd-party code w/ proofs and tiny verifier:

An industrial example is CompCert verifying compiler for C language. Regehr’s team used Csmith to throw millions of attempts at breaking it and major compilers. Broke them in all kinds of ways with CompCert having just two errors related to bad specs. The implementation and middle end were unusual among compilers: “The striking thing about our CompCert results is that the middle-end bugs we found in all other compilers are absent. As of early 2011, the under-development version of CompCert is the only compiler we have tested for which Csmith cannot find wrong-code errors. This is not for lack of trying: we have devoted about six CPU-years to the task. The apparent unbreakability of CompCert supports a strong argument that developing compiler optimizations within a proof framework, where safety checks are explicit and machine-checked, has tangible benefits for compiler users.”

Now, you can try to do that proof or similar ones checking each step by hand. However, the correctness proof is 31,920 lines long. Isn’t it more reasonable to trust a developer to get a few hundred to a thousand lines of straight-forward code right (in proof-checker) vs 31,920 lines of proof? And how many people can even read that many pieces of boring logic without their eyes glazing over at some point?

It depends on the language. ATS proofs tend to be a bit more entwined in the implementation. Here’s a factorial example. Liquid Haskell looks interesting - it’s quite readable. Both of those are functional languages whereas SPARK tends towards proving more imperative implementations.

A test might be which does an imperative, lower-level programmer understand better and which performs better? Performance/efficiency takes priority here since embedded programmers won’t use anything making unpredictable or wasteful object code.

This is a nice worked example of proving various implementations of the greatest common divisor using SPARK. The ‘Ghost’ in the title refers to using Ghost procedures which are procedures that only exist at proving time, not runtime. Similar to proof functions in ATS.

As in other threads, I’ll warn that the effort on verifying this simple algorithm looks like more work than it is. One must compare it to what it replaces: full, manual verification of imperative code in theorem provers or something like B method. You’d have to learn something like the “Software Foundations” or “Certified Programming with Dependent Types” books to do that. Whereas, with SPARK’s automation, you just learn to do specifications of what the algorithm does plus experiment with what simplifications are necessary to get it through automated prover. Quantum leap in usability for what assurance you get.

Note: I always say something similar about Rust’s borrow checker. People compare it to GC’s but they have a cost for safety. Proper comparison is to separation or matching logic used to prove temporal safety of low-level programs in C and such. Fighting with borrow-checker a bit is much easier than learning separation logic for most people.

Honest question: What use is writing well over two screenfuls of code to mechanically verify something that you can easily be proven in half a page of paper? I’m all for mechanical verification when it saves work, but here that doesn’t seem to be the case.

There’s an element of “Proof golf” here. Similar to code golf in languages like J/APL where people try to get the shortest or most interesting solution to a problem, so is there a challenge to “How much can I prove”. In real world code maybe you wouldn’t go as far. But by going that far you learn techniques about what is possible if you do need to prove things.

Mechanical proofs are better than “half a page of paper” because it is machine checked. Changes to the code don’t require redoing paper proofs. It can be checked automatically. They are like tests but have the confidence of working across the entire range of possible inputs to the pre/post conditions vs just the input data or edge cases being tested.

The essence of a proof is in the ideas contained in it, not in the methodology and tools you used to come up with it. For instance, I can understand if you say that a proof is better than another because it applies to a wider class of structures, or because it can be carried out using less complex conceptual machinery. But, to say that a proof is better because it was checked by a machine strikes me as no different from saying that a literature work is more valuable because it was written using a word processor. It doesn’t make much sense.

Changes to the code are themselves changes to the proof. What is a ghost procedure, if not a proof object?

Automatic checking is the first best thing. The zeroth best thing is writing proofs so simple they can be checked by hand effortlessly.

This applies equally well to handwritten and machine-(verified|generated) proofs.

The simplest argument against your point is that handwritten proofs can have mistakes. It’s the same argument for using type systems instead of “just” conforming to the types.

In an ideal situation, our machine-verified proofs can be easy enough to write that they represent not much more effort than the hand-written ones we might want anyways.

I posit that the proof checked by the machine is better because it checks the actual implementation being used. A proof on paper checks what the proof writer thinks the implementation is. It can miss things where what the programmer wrote doesn’t match what they think they wrote.

Not necessarily. For example. Pre and Post conditions at the specification level are not affected by changes in the implementation level.

It happened in cryptography with padding that got left out of the elegant, little proofs. The defeat of security properties came through padding issues. Another popular one is where math might use natural numbers but implementers will use modular arithmetic. Another is where they simply can’t handle due to all the work involved but computers can brute force it with a SAT solver or whatever.

They used to do proofs by hand in early, high-assurance security. The process caught problems. Yet, there was a lot of mumbling that one might not trust 200 pages of proof terms, obligations, lemmas, ghost code, whatever is involved. There could be mistakes in there where it looks right but is inconsistent. Heck, the proofs are usually several times larger than the code that we didn’t trust developers to write correctly in the first place. ;) Machine-checked proofs can spot common mistakes with automated analysis of all the material. Some tools can even heuristically hint at or synthesize solutions to your problem. It’s why almost all work in formal verification of software today is done with machine assistance (i.e. proof assistants). Some good example problems in general mathematics of what hand proofs are facing in this article:

https://www.sciencenews.org/article/how-really-trust-mathematical-proof

There’s also the parallel notion of reducing trust. You want to have to trust as little of the process as possible. That usually means you prevent problems or can catch them with something much simpler than what introduces them. For example, LCF-style provers like Isabelle/HOL and Coq are designed such that everything done during proving can be checked for logical accuracy by a tiny, checking program. That is, the complex process that drives those can be buggy as hell but you’ll know whether it’s done right long as you get one, simple part right. Then, you just have to worry about that component and the specs you feed it which encode the design or properties. This approach has eliminated a lot out of trusted part of code in various projects: parsers, pretty-printers, interpreters, compilers, extraction from specs, basic I/O, data structures, control flow, some side channels recently, and a few even partly verify the verifier itself. It’s getting to the point where the final check might trust only logical specs and a few hundred to a thousand lines of hand-verifiable code. Here’s an example showing how small the trust can be by Appel and Michael on certifying safety of 3rd-party code w/ proofs and tiny verifier:

https://www.cs.princeton.edu/~appel/papers/flit.pdf

An industrial example is CompCert verifying compiler for C language. Regehr’s team used Csmith to throw millions of attempts at breaking it and major compilers. Broke them in all kinds of ways with CompCert having just two errors related to bad specs. The implementation and middle end were unusual among compilers: “The striking thing about our CompCert results is that the middle-end bugs we found in all other compilers are absent. As of early 2011, the under-development version of CompCert is the only compiler we have tested for which Csmith cannot find wrong-code errors. This is not for lack of trying: we have devoted about six CPU-years to the task. The apparent unbreakability of CompCert supports a strong argument that developing compiler optimizations within a proof framework, where safety checks are explicit and machine-checked, has tangible benefits for compiler users.”

Now, you can try to do that proof or similar ones checking each step by hand. However, the correctness proof is 31,920 lines long. Isn’t it more reasonable to trust a developer to get a few hundred to a thousand lines of straight-forward code right (in proof-checker) vs 31,920 lines of proof? And how many people can even read that many pieces of boring logic without their eyes glazing over at some point?

[Comment removed by author]

It depends on the language. ATS proofs tend to be a bit more entwined in the implementation. Here’s a factorial example. Liquid Haskell looks interesting - it’s quite readable. Both of those are functional languages whereas SPARK tends towards proving more imperative implementations.

Here’s one that cheats a bit by using a recursive language w/ no mention of performance optimizations.

https://www.cs.cornell.edu/courses/cs3110/2011sp/lectures/lec16-verif/gcd.htm

A test might be which does an imperative, lower-level programmer understand better and which performs better? Performance/efficiency takes priority here since embedded programmers won’t use anything making unpredictable or wasteful object code.

No idea.