1. 5

A rebuttal by proponents of program verification to the previously submitted 1979 paper arguing it was useless. They give examples to support their claim that it is practical and useful.


  2. 3

    The question is not whether these techniques are practical and useful, but whether they are cost effective (after discounting the claims that soap powder manufacturers would not get away with).

    Would you buy second hand software from a formal methods researcher?

    Does using formal methods mean anything?

    1. 1

      whether they are cost effective

      For whom? Doesn’t that depend on a budget and an assessment of the cost of failure?

      1. 1

        The vendor gets to make the cost decision, based on what the client is willing to pay.

      2. 1

        I don’t trust soap claims any more than the FDA. Plus, advertisers are among the most full-of-shit groups out there. They claim they’re about one thing but really only care about pushing product. It’s always weird to me that you’re using that as the point of comparison instead of evaluations like Underwriters Laboratories or Consumer Reports or something. I’m glad most formal methods research uses techniques with more evidence of effectiveness than soap ads.

        As far as your other links go, I’m already in agreement that something like an unverified C parser needs extensive validation before anything depending on it is “verified” in any way. On the errors-in-proofs link, I noticed you’re not differentiating between two things that even that 1979 critique did: human-oriented proofs that leave out key details vs. proofs done in systems that machine-check each low-level step for consistency. Most or all of the proof errors I’ve seen were in the former. The errors in the latter that people have shown me were only specification errors so far, despite a lot of proof terms. The LCF-style, machine-checked approach that “most mathematicians” avoid out of tradition, before going on to make their mistakes, seems like it would help them out. That means your claim, if matched against empirical data, weakens to “there could be errors in the specs while the proofs are usually robust.” And then fairness would note that formal specs usually reduce errors over informal ones. And then we’re at the cost-effectiveness assessment, which is a fair critique for the average use case that wouldn’t tolerate the cost.
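        To make that spec-vs-proof distinction concrete, here’s a minimal, hypothetical Lean 4 sketch (my own toy illustration, not from any of the linked work): the theorem machine-checks, so there is no proof error, but the specification is too weak to rule out an obviously wrong implementation.

        ```lean
        -- Hypothetical example: a deliberately wrong "maximum" function.
        def badMax (a b : Nat) : Nat := a + b

        -- This theorem is machine-checked, so the proof itself is fine.
        -- The defect is in the specification: it only asks for an upper
        -- bound on both arguments, not that the result equals one of them,
        -- so badMax satisfies it without computing a maximum at all.
        theorem badMax_isUpperBound (a b : Nat) :
            a ≤ badMax a b ∧ b ≤ badMax a b := by
          unfold badMax
          omega
        ```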

        Another problem I saw was that VDM was a major player but not really the best point of comparison for cost-effectiveness. Cleanroom, Praxis’ work, or maybe SPIN on distributed systems would be. Those had the most industrial success, with the first two aimed at regular software: Cleanroom studies showed low defect rates at good cost-effectiveness, sometimes at no extra cost; Praxis’ Correct-by-Construction with Z and SPARK Ada delivered code with tiny defect rates at around a 50% premium. Both had companies warranting their code at specific defect rates, too. Cleanroom faded due to IBM charging for the method, and I hear less of the Praxis stuff since Altran bought them out. Oh well.

        Since there are plenty of SPARK case studies already, I just got Stavely’s Cleanroom book to review that method. It’s the most approachable one I’ve seen yet, with key features that should give a lower defect rate based on the results of prior, similar methods. However, the careful, algebraic methods might fall apart on modern libraries or on big, messy problems that require basic science to handle effectively under realistic constraints. ;) I’m mainly reading it to assess it for a future study with a modern language and tooling, to test effectiveness in a variety of scenarios. Especially combined with something like SPARK or Haskell; a rough sketch of what that style might look like is below.
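        As a very rough, hypothetical sketch of what I mean by combining that style with Haskell (my own toy example, not taken from Stavely’s book): Cleanroom’s functional verification attaches an “intended function” to each piece of code, and a reviewer argues, case by case, that the code computes exactly that function.

        ```haskell
        -- Toy illustration of Cleanroom-style intended functions in Haskell.
        -- Each definition carries the function it is intended to compute;
        -- a reviewer checks that the code equals that function.

        -- Intended function: [result := xs with every negative element replaced by 0]
        clampNegatives :: [Int] -> [Int]
        clampNegatives = map (max 0)

        -- Intended function: [result := sum of the clamped elements of xs]
        clampedTotal :: [Int] -> Int
        clampedTotal xs = sum (clampNegatives xs)

        main :: IO ()
        main = do
          -- Spot checks of the intended functions on a small input.
          print (clampNegatives [-2, 3, -1, 5])  -- expected: [0,3,0,5]
          print (clampedTotal   [-2, 3, -1, 5])  -- expected: 8
        ```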