I don’t understand your skepticism about CakeML and and correctness. From what I heard and can read around it’s a
compiler that has verified passes in the sense that they proved theorems about the behavior that can happen in every one of them. In the end you can combine all you proofs about the little passes in a bigger proof for you compiler and prove your compiler correct up to the specification introduced by the theorems.

What does it mean to prove a theorem about some code?

You might prove,say, that some algorithm never exceeds some limit; and the code is an implementation of that algorithm. So, modulo bugs in the code, it will never exceed the limit proved.

This does not prove the code is correct, only that the limit is not exceeded.

What can be said about a sequence of chunks of code, each associated with an algorithm that has an associated proof of some property? Individually they have the proven properties, but as a whole? What of emergent behavior?

This argument can be used to attack pretty much any security guarantee provided by everything from cryptography to access control to formal proofs. You’re not really saying anything here and you’re certainly not actually digging into the proof to point out gaps in their logic. Its a value free over-generalization that adds nothing to the conversation. Flaws in methodology, even formal methods methodology, do not automatically make a method “snake oil” as your original post put it.

He did eventually tell me what he thought about that. Lobsters search isn’t giving me the exact comment, though. He’s an empiricist that wants to see theories confirmed with experimental data. Most paper on formal verification do proof but no testing or even peer review. Write-up’s like Guttman’s showed some historical programs with proofs failed on one execution or inspection. My current guess is that he’d prefer each of these theorems to have a good number of experiments and review showing that they work. Then something similar for each integration with or layer on top of them.

An example of some validation of verified program was Csmith testing of multiple compilers, including CompCert. The testing found piles of errors in other compilers but only 2 or 3 in CompCert’s specs (not just code). This offered, for such inputs, evidence that CompCert’s methods had high quality, may be better than others (apples to oranges comparison), and code didn’t add additional errors on top of spec [1].

Although a fan of verification research, I support the position I just described. A mix of methods was the default for high-assurance security and other reliability/security-focused projects in the past. Here’s an example (pdf) showing why: different methods catch different problems. It’s the assurance version of defense in depth. In another example, the seL4 team put all their eggs in the formal verification basket. Whereas, INTEGRITY-178B had to do a lot more to get certified at EAL6+ High Robustness. The requirements, listed in bottom right boxes, included a mix of formal methods, thorough testing, leak prevention/analysis, and independent pentesting. My own requirements for high-assurance are a superset of what they did.

Oh OK. I can’t evaluate that since I can’t read their specs and theorems. If I validated it, I was going to recruit one of them to just tell me what each means so I could write test generators and checkers for them. Manly was interested in WordLang and DataLang as verified backends for non-ML compilers.

For anyone who wants an actual understanding of whats involved here, go to the CakeML page, download the code, and compile it, and start disassembling it to see what it is that these sort of techniques are doing on the processor level. Then step through the proof. Work like this takes literally years to accomplish, and its going to take even sophisticated users, i.e. programmers, mathematicians, security folks, more then the ten minutes you spend flying through the paper.

The claims in this paper would not get past the advertising standards authority.

The usual development process was followed, i.e., specification, implementation, compilation.

Then some theorems were stated. Voila, the code is verified.

As always with formal methods snake oil, the term correctness keeps popping up. They even have a “single end-to-end correctness theorem”.

I don’t understand your skepticism about CakeML and and correctness. From what I heard and can read around it’s a compiler that has verified passes in the sense that they proved theorems about the behavior that can happen in every one of them. In the end you can combine all you proofs about the little passes in a bigger proof for you compiler and prove your compiler correct up to the specification introduced by the theorems.

What does it mean to prove a theorem about some code?

You might prove,say, that some algorithm never exceeds some limit; and the code is an implementation of that algorithm. So, modulo bugs in the code, it will never exceed the limit proved.

This does not prove the code is correct, only that the limit is not exceeded.

What can be said about a sequence of chunks of code, each associated with an algorithm that has an associated proof of some property? Individually they have the proven properties, but as a whole? What of emergent behavior?

This argument can be used to attack pretty much any security guarantee provided by everything from cryptography to access control to formal proofs. You’re not really saying anything here and you’re certainly not actually digging into the proof to point out gaps in their logic. Its a value free over-generalization that adds nothing to the conversation. Flaws in methodology, even formal methods methodology, do not automatically make a method “snake oil” as your original post put it.

He did eventually tell me what he thought about that. Lobsters search isn’t giving me the exact comment, though. He’s an empiricist that wants to see theories confirmed with experimental data. Most paper on formal verification do proof but no testing or even peer review. Write-up’s like Guttman’s showed some historical programs with proofs failed on one execution or inspection. My current guess is that he’d prefer each of these theorems to have a good number of experiments and review showing that they work. Then something similar for each integration with or layer on top of them.

An example of

somevalidation of verified program was Csmith testing of multiple compilers, including CompCert. The testing found piles of errors in other compilers but only 2 or 3 in CompCert’s specs (not just code). This offered, for such inputs, evidence that CompCert’s methods had high quality, may be better than others (apples to oranges comparison), and code didn’t add additional errors on top of spec [1].Although a fan of verification research, I support the position I just described. A mix of methods was the default for high-assurance security and other reliability/security-focused projects in the past. Here’s an example (pdf) showing why: different methods catch different problems. It’s the assurance version of defense in depth. In another example, the seL4 team put all their eggs in the formal verification basket. Whereas, INTEGRITY-178B had to do a lot more to get certified at EAL6+ High Robustness. The requirements, listed in bottom right boxes, included a mix of formal methods, thorough testing, leak prevention/analysis, and independent pentesting. My own requirements for high-assurance are a superset of what they did.

[1] @derek-jones, am I properly qualifying that?

My response has nothing to do with experimental proof.

The claims being made (e.g., correctness) are not justified by the theory work that has been done.

Oh OK. I can’t evaluate that since I can’t read their specs and theorems. If I validated it, I was going to recruit one of them to just tell me what each means so I could write test generators and checkers for them. Manly was interested in WordLang and DataLang as verified backends for non-ML compilers.

For anyone who wants an actual understanding of whats involved here, go to the CakeML page, download the code, and compile it, and start disassembling it to see what it is that these sort of techniques are doing on the processor level. Then step through the proof. Work like this takes literally years to accomplish, and its going to take even sophisticated users, i.e. programmers, mathematicians, security folks, more then the ten minutes you spend flying through the paper.