I worked a little bit with formal verification undergrad. I asked this exact question to the grad students I was working with: “How do you know that your specification is correct?”. They had no idea how to answer the question for the general case, but if you are building a library used by other systems which are formally verified, the answer is simple: if the end user can prove the things they want to prove about their system using yours, then it is correctly specified.

Basically kick the can down the line. If the consumers of your proof are happy, then mission accomplished. It is interesting to think that proofs can have dual purpose as axioms for other proofs and as verification that the code works.

I think this article undersells tests. The critiques of tests all make sense, but the conclusion that you should toss out most of your test suite in favor of properties doesn’t seem right to me.

Until we have automated systems for proving that an implementation precisely matches a formal spec (or alternately, a general-purpose way to generate complete implementations of complex systems from formal specs) there will be implementation bugs that no amount of spec analysis can possibly uncover. Tests are not the best way to prove that a specification is correct, but they’re still a pretty good way to increase confidence that the code does what you expect it to, sanity-check that you haven’t broken existing functionality in the course of making a change, expose places where your implementation is awkward to interact with, and identify components whose implementations are unnecessarily coupled. The fact that tests can also increase confidence that the high-level spec is what you want is a bonus, but it’s rarely the primary motivation for writing tests.

We also actually understood the desired requirements here, what if we don’t even have a firm grasp on that?

Glad this line was included. In a lot of environments, this is the biggest problem by an overwhelming margin, but sadly it’s mostly a people problem, not a technical one.

The post shows the visualizer for the counterexample but you can also see valid examples. It’s been a while since I’ve used Alloy but my first step after writing a spec was to check out some valid examples. Often I would see an example that shouldn’t be valid and was a misspecification. For me, that was one of Alloy’s greatest strengths.

I worked a little bit with formal verification undergrad. I asked this exact question to the grad students I was working with: “How do you know that your specification is correct?”. They had no idea how to answer the question for the general case, but if you are building a library used by other systems which are formally verified, the answer is simple: if the end user can prove the things they want to prove about their system using yours, then it is correctly specified.

Basically kick the can down the line. If the consumers of your proof are happy, then mission accomplished. It is interesting to think that proofs can have dual purpose as axioms for other proofs and as verification that the code works.

I think this article undersells tests. The critiques of tests all make sense, but the conclusion that you should toss out most of your test suite in favor of properties doesn’t seem right to me.

Until we have automated systems for proving that an implementation precisely matches a formal spec (or alternately, a general-purpose way to generate complete implementations of complex systems from formal specs) there will be implementation bugs that no amount of spec analysis can possibly uncover. Tests are not the best way to prove that a specification is correct, but they’re still a pretty good way to increase confidence that the code does what you expect it to, sanity-check that you haven’t broken existing functionality in the course of making a change, expose places where your implementation is awkward to interact with, and identify components whose implementations are unnecessarily coupled. The fact that tests can also increase confidence that the high-level spec is what you want is a bonus, but it’s rarely the primary motivation for writing tests.

Glad this line was included. In a lot of environments, this is the biggest problem by an overwhelming margin, but sadly it’s mostly a people problem, not a technical one.

The post shows the visualizer for the counterexample but you can also see valid examples. It’s been a while since I’ve used Alloy but my first step after writing a spec was to check out some valid examples. Often I would see an example that shouldn’t be valid and was a misspecification. For me, that was one of Alloy’s greatest strengths.