1. 18
  1.  

  2. 10

    I was curious as to how this bug could have occurred and found on the CompCert C page that

    This part of CompCert (transformation of C source text to CompCert C abstract syntax trees) is not formally verified.

    So it seems CompCert C’s guarantee is a bit weaker than

    the generated executable code behaves exactly as prescribed by the semantics of the source program

    as claimed on http://compcert.inria.fr/

    And it’s more like, “the generated executable behaves exactly as prescribed by the semantics of the AST we parse your C source program into”

    1. 4

      Right: the bug was actually in the parser, not the compiler proper. Maybe I should edit the title?

      Seemed like a big deal anyway.

      Edit: Noted parser in title.

      1. 6

        http://compcert.inria.fr/man/manual001.html#sec10 shows the project structure, and claims the parser is in fact verified.

        Edit: https://xavierleroy.org/publi/validated-parser.pdf has the full story on the verified parser. In brief, it’s verified with respect to an input grammar. Which, in this case, contained an error.

        1. 5

          [CompCert’s parser is] verified with respect to an input grammar

          If that’s so, why doesn’t the commit that fixed this bug contain a change to the grammar or the proof?

          1. 3

            A look through the patch shows I was wrong about it being a grammar bug. Those do happen, but the actual cause of failure was a little more complex here. I should be more careful about speculating out loud!

            All the changes were in the “typed elaborator” phase of the compiler, which translates the raw parsed AST into the type-annotated AST that is the top of the verified stack. It looks like a basic error in handling variable scopes, triggered by the shadowed variable names.
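
            For a sense of what exercises that scope handling, shadowed declarations look something like the hypothetical snippet below. This is only an illustration of the pattern, not the actual reproducer from the bug report.

                /* Hypothetical illustration of nested shadowing; not the actual
                   reproducer. A front end that resolves an identifier to the wrong
                   declaration (or gives it the wrong type) will miscompile code
                   like this. */
                #include <stdio.h>

                int x = 1;                  /* file-scope x */

                int main(void) {
                    int x = 2;              /* shadows the global x */
                    {
                        long x = 3;         /* shadows again, with a different type */
                        printf("%ld\n", x); /* innermost x: prints 3 */
                    }
                    printf("%d\n", x);      /* back to the block-scope int x: prints 2 */
                    return 0;
                }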

            Formally reasoning about scope extrusion is difficult and tedious… but I bet verifying the typed elaborator just rose a little higher in the CompCert development priority queue.

          2. 4

            Deleted my other comment since I didn’t know the parser had been verified. My bad. Related to it, I will say the tally of bugs found in specs vs. implementation will remain interesting, since the weakest version of the proof says the implementation will be at least as accurate, or as flawed, as the specification. The Csmith bugs were spec errors IIRC that the implementation faithfully followed. ;)

            Anyway, such results continue to remind folks that formal methods need to be supplemented with other verification techniques.

          3. 2

            I’m being pedantic, but the “runtime error” occurred in a program that was compiled by CompCert (and its parser), so the title still isn’t really accurate. (Just saying “error” would be enough…)

            1. 1

              Hey, no better venue for pedantry than a story like this. :-)

              I’d edit the title again, but I can’t. I’d speculate that there’s some kind of time limit or score threshold on title edits… not going to go read the Lobsters source to find out exactly, though.

          4. -3

            Formal methods is so intellectually dishonest.

            Once I have a translator for language-X, I can claim to have created a formally verified compiler for that language, using the same arguments made by the authors of the Compcert papers.

            Papers involving formal methods should have to pass the same truthfulness and accuracy requirements as soap powder adverts: http://shape-of-code.coding-guidelines.com/2013/03/10/verified-compilers-and-soap-powder-advertising/

            1. 9

              You can’t claim to have created a verified compiler until you have a formal semantics of the source, of the target, a spec of your transformations, and a proof that they do what they say they do. It’s so hard that they started with mini-Pascals before working up to C subsets (e.g., C0 -> TALC) and then to the subset in CompCert. There’s also a C semantics by another team to catch undefined behavior.
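
              Schematically, the claim you have to prove has roughly the shape sketched below (my own rough Lean-style rendering of a semantic-preservation statement, not CompCert’s actual Coq theorem, which is phrased in terms of simulations):

                  -- Rough sketch: program semantics are modeled as predicates over
                  -- observable behaviors, and compilation may fail on invalid input.
                  def semanticPreservation {Src Tgt Behavior : Type}
                      (srcSem : Src → Behavior → Prop)  -- behaviors the source allows
                      (tgtSem : Tgt → Behavior → Prop)  -- behaviors the compiled code exhibits
                      (compile : Src → Option Tgt) : Prop :=
                    ∀ p tp b, compile p = some tp → tgtSem tp b → srcSem p b

              Getting to the point where you can even state that for real C and a real target is already a large part of the difficulty.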

              So, you just made that crap up with no evidence. Further, what empirical evidence we have from looking at data in the field shows verification works at reducing defects in various parts of the lifecycle. As far as CompCert goes, Csmith empirically confirmed the CompCert team’s claims of strong defect reduction when they threw tons of CPU time at trying to find defects. They found just a few spec errors in CompCert but piles of errors in other compilers. Regehr et al. were amazed by its quality.

              Comments about the worth of formal methods should have to use the scientific method, or something like it, when making claims. In this case:

              1. Define correctness requirements for compilers. There may be several.

              2. Assess how well various compilers met them, how often, and with what development methods.

              3. Determine if formally-verified compilers achieved higher quality than those without it. Also, compare to other QA methods such as code review or unit testing.

              CompCert passes across the board with such criteria. That’s not to say I don’t still want, as in EAL7 requirements or DO-178C, to throw more exhaustive testing and review at it to check for any errors in the specs. I won’t rely on only one method, since the empirical data says to use a diverse array of them for highest confidence. It’s already orders of magnitude better in defect rate than other compilers, though.

              1. 2

                Formal methods is so intellectually dishonest.

                I think you mean the compcert people.

                Once I have a translator for language-X, I can claim to have created a formally verified compiler for that language, using the same arguments made by the authors of the Compcert papers.

                I think you can only claim that, starting from the C code, what your frontend produces will be converted to machine code as faithfully as is technically possible using the best technology available.

                Edit:

                To your article’s point about undefined behaviors, I’m pretty sure it is possible to prove the absence of those problems in C code using Frama-C and ACSL. The problems will always be pushed to the boundaries until you have done this for the entire system.

                1. 0

                  I think you mean the compcert people.

                  No, I mean formal methods people in general. The compcert people have done some interesting work, but they are not advertising what they have actually done.

                  … undefined behaviors, I’m pretty sure it is possible to prove absence…

                  I’m not sure about proving, but it should be possible to get a high level of confidence.

                  Yes, there are tools to do this, but CompCert did not (still does not?) use them (or do it itself these days?).

                  1. 3

                    Try using Frama-C with the -wp-rte option to see what I mean about runtime errors. CompCert compiles valid C; other tools help you write valid C.
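
                    To give a (hypothetical, untested) flavor of it: you write an ACSL contract and run the WP plugin with -wp-rte, which also turns potential runtime errors (arithmetic overflow, out-of-bounds accesses) into proof obligations, roughly frama-c -wp -wp-rte file.c on something like:

                        /* Hypothetical sketch of an ACSL-annotated function whose
                           runtime-error obligations (generated by -wp-rte) should
                           be provable; untested. */
                        /*@ requires 0 <= n <= 1000000;
                            requires \valid_read(a + (0 .. n-1));
                            requires \forall integer i; 0 <= i < n ==> 0 <= a[i] <= 1000;
                            ensures 0 <= \result;
                        */
                        long sum(const int *a, int n) {
                            long s = 0;
                            /*@ loop invariant 0 <= i <= n;
                                loop invariant 0 <= s <= 1000 * i;
                                loop assigns i, s;
                                loop variant n - i;
                            */
                            for (int i = 0; i < n; i++) {
                                s += a[i];  /* bounded by 1000 * 1000000, so no overflow */
                            }
                            return s;
                        }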

                    1. 2

                      “The compcert people have done some interesting work, but they are not advertising what they have actually done.”

                      The papers I read note the specific goals, limitations, and techniques used in their methods. They did publish a lot of stuff. Maybe I missed something. What do you think they’re lying about, and in which papers about the CompCert tool?

                      “No, I mean formal methods people in general”

                      Much of the formal methods literature done by pros lists the specific methods used, plus their dependencies and what parts can fail. I rarely see people outside the formal methods community talking about reliability or security do that. Looking for “TCB” is actually one of the techniques I use in Google searches to find the papers I submit. It’s that rare for it to pop up in papers outside either formal methods or high-assurance systems, despite it being the foundation of their claims.

                      There’s certainly zealotry to call out. There are certainly problems with peer review and reproducible results. These are true in most if not all sub-fields of CompSci, and maybe in experimental science in general, since a lot of articles in recent years complain about non-reproducible or fake research. For some reason, though, you only bring it up on formal methods threads. I wonder why you find the other threads acceptable with their imprecise and/or non-reproducible claims, but drop into a formal methods thread to throw under the bus researchers who note strengths, weaknesses, and dependencies, plus publish quite a bit of source code. Quite a double standard you have, Sir!