1. 10

    I was curious how this bug could have occurred, and found on the CompCert C page that

    This part of CompCert (transformation of C source text to CompCert C abstract syntax trees) is not formally verified.

    So it seems CompCert C’s guarantee is a bit weaker than

    the generated executable code behaves exactly as prescribed by the semantics of the source program

    as claimed on http://compcert.inria.fr/

    And it’s more like, “the generated executable behaves exactly as prescribed by the semantics of the AST we parse your C source program into”

    1. 4

      Right: the bug was actually in the parser, not the compiler proper. Maybe I should edit the title?

      Seemed like a big deal anyway.

      Edit: Noted parser in title.

      1. 6

        http://compcert.inria.fr/man/manual001.html#sec10 shows the project structure, and claims the parser is in fact verified.

        Edit: https://xavierleroy.org/publi/validated-parser.pdf has the full story on the verified parser. In brief, it’s verified with respect to an input grammar. Which, in this case, contained an error.

        1. 5

          [CompCert’s parser is] verified with respect to an input grammar

          If that’s so, why doesn’t the commit that fixed this bug contain a change to the grammar or the proof?

          1. 3

            A look through the patch shows I was wrong about it being a grammar bug. Those do happen, but the actual cause of failure was a little more complex here. I should be more careful about speculating out loud!

            All the changes were in the “typed elaborator” phase of the compiler, which transforms the raw parsed AST into the type-annotated AST that sits at the top of the verified stack. It looks like a basic error in handling variable scopes, triggered by the shadowed variable names.

            Formally reasoning about scope extrusion is difficult and tedious… but I bet verifying the typed elaborator just rose a little higher in the CompCert development priority queue.

          2. 4

            Deleted my other comment since I didn’t know the parser had been verified. My bad. Related to it, I will say the tally of bugs found in specs vs. implementations will remain interesting, as the weakest version of the proof says the implementation will be at least as accurate (or as flawed) as the specification. The Csmith bugs were spec errors IIRC that the implementation faithfully followed. ;)

            Anyway, such results continue to remind folks that formal methods need to be supplemented with other verification techniques.

          3. 2

            I’m being pedantic, but the “runtime error” occurred in a program that was compiled by CompCert (and its parser), so the title still isn’t really accurate. (Just saying “error” would be enough…)

            1. 1

              Hey, no better venue for pedantry than a story like this. :-)

              I’d edit the title again, but I can’t. I’d speculate that there’s some kind of time limit or score threshold on title edits… not going to go read the Lobsters source to find out exactly, though.

          4. -3

            Formal methods is so intellectually dishonest.

            Once I have a translator for language-X, I can claim to have created a formally verified compiler for that language, using the same arguments made by the authors of the Compcert papers.

            Papers involving formal methods should have to pass the same truthfulness and accuracy requirements as soap powder adverts: http://shape-of-code.coding-guidelines.com/2013/03/10/verified-compilers-and-soap-powder-advertising/

            1. 9

              You can’t claim to have created a verified compiler until you have a formal semantics of the source, a formal semantics of the target, a spec of your transformations, and proof that they do what they’re specified to do. It’s so hard that they started with mini-Pascals before working up to C subsets (e.g. C0->TALC) to the subset in CompCert. There’s also a formal semantics by another team to catch undefined behavior.

              So, you just made that crap up with no evidence. Further, what empirical evidence we have from the field shows verification works at reducing defects in various parts of the lifecycle. As far as CompCert goes, Csmith empirically confirmed the CompCert team’s claims of strong defect reduction when they threw tons of CPU time at trying to find defects. They found just a few spec errors in CompCert but piles of errors in the others. Regehr et al. were amazed by its quality.

              Comments about the worth of formal methods should have to use the scientific method or something when making claims. In this case:

              1. Define correctness requirements for compilers. There may be several.

              2. Assess how well various compilers met them, how often, and with what development methods.

              3. Determine whether formally-verified compilers achieved higher quality than those built without verification. Also, compare to other QA methods such as code review or unit testing.

              CompCert passes across the board by such criteria. That’s not to say I don’t still want, as in EAL7 requirements or DO-178C, to throw more exhaustive testing and review at it to check for any errors in the specs. I won’t rely on only one method, as empirical data says to use a diverse array for the highest confidence. It’s already orders of magnitude better in defect rate than other compilers, though.

              1. 2

                Formal methods is so intellectually dishonest.

                I think you mean the CompCert people.

                Once I have a translator for language-X, I can claim to have created a formally verified compiler for that language, using the same arguments made by the authors of the Compcert papers.

                I think you can only claim that, starting from the C code, what your frontend produces will be converted to machine code as faithfully as is technically possible using the best technology available.

                Edit:

                To your article’s point about undefined behaviors, I’m pretty sure it is possible to prove the absence of those problems in C code using Frama-C and ACSL. The problems will always be pushed to the boundaries until you have done this for the entire system.

                1. 0

                  I think you mean the CompCert people.

                  No, I mean formal methods people in general. The CompCert people have done some interesting work, but they are not advertising what they have actually done.

                  … undefined behaviors, I’m pretty sure it is possible to prove the absence…

                  I’m not sure about proving, but it should be possible to get a high level of confidence.

                  Yes, there are tools to do this, but CompCert did not (still does not?) use them (or do it itself these days?).

                  1. 3

                    Try using frama-c with the -wp-rte plugin to see what I mean about runtime errors. CompCert compiles valid C; other tools help you write valid C.

                    1. 2

                      “The CompCert people have done some interesting work, but they are not advertising what they have actually done.”

                      The papers I read note the specific goals, limitations, and techniques used in their methods. They did publish a lot of stuff. Maybe I missed something. What do you think they’re lying about, and in which papers about the CompCert tool?

                      “No, I mean formal methods people in general”

                      Much of the formal methods literature done by pros lists their specific methods plus their dependencies and which parts can fail. I rarely see people outside the formal methods community talking about reliability or security do that. Looking for “TCB” is actually one of the techniques I use in Google searches to find the papers I submit. It’s that rare for it to pop up in papers outside either formal methods or high-assurance systems, despite it being the foundation of their claims.

                      There’s certainly zealotry to call out. There are certainly problems with peer review or reproducible results. These are true in most if not all sub-fields of CompSci. Maybe for experimental science in general, since a lot of articles in recent years are complaining about non-reproducible or fake research. For some reason, though, you only bring it up on formal methods threads. I wonder why you find the other threads acceptable with their imprecise and/or non-reproducible claims, but drop into a formal methods thread to throw under the bus researchers who note strengths, weaknesses, and dependencies, plus publish quite a bit of source code. Quite a double standard you have, Sir!

              1. 1

                with the exception of the zero-width assertions, with only these features it’s possible to match any POSIX regular expression…

                I’m skeptical that zero-width assertions are an exception. What’s an example of a backreference-free POSIX regexp that does not specify a regular language?

                1. 1

                  POSIX REs per se don’t have backreferences. The point I’m making in that sentence is that POSIX has zero-width assertions (as a feature, they are compatible with regular grammars) but since the toy regexp engine doesn’t have them, any POSIX regexp that depends on them is not replicable with it.

                  1. 1

                    POSIX REs per se don’t have backreferences

                    Do you have a citation?

                    According to http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html :

                    9.5.2 RE and Bracket Expression Grammar

                    This section presents the grammar for basic regular expressions, including the bracket expression grammar that is common to both BREs and EREs. … %token BACKREF L_ANCHOR R_ANCHOR

                    That, as well as other parts of the doc, seem to indicate that back references are part of POSIX regexps.

                    1. 1

                      Sorry, I meant POSIX extended REs. Only basic REs have backreferences. I’ll clear that up when I update the article.

                      (Why does ‘extended’ mean ‘has fewer features’? That’s committee-think for you.)

                    2. 1

                      any POSIX regexp that depends on [zero-width assertions] is not replicable with [a regexp engine that lacks them].

                      I believe this is false. Again, why do you think it isn’t “possible” to transform regexps with zero-width assertions into equivalent regexps without them? Note that your original sentence even specifies, “after applying the appropriate transformation.”

                      1. 1

                        Can you explain a method to transform arbitrary uses of zero-width assertions (^, $, and \b) into equivalent regexps without them?

                        1. 1

                          A straightforward but inefficient method is to convert the source regexp to a DFA, then convert the DFA back to a regexp that uses only union, concatenation, and star.
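
                          Here’s a rough Python sketch of the second half of that method: the state-elimination (GNFA) construction that turns a DFA back into a regexp using only union, concatenation, and star. The DFA encoding and names are my own, just for illustration, and the first half (compiling the regexp, anchors included, down to a DFA) is skipped:

                          ```python
                          # State elimination: repeatedly remove a state, rerouting every path
                          # through it as  in (loop)* out.  A sketch, not tuned for output size;
                          # real implementations simplify the intermediate regexps as they go.
                          def dfa_to_regex(states, delta, start, accepting):
                              S, A = "_start_", "_accept_"   # fresh start/accept states
                              edges = {}                     # (src, dst) -> regexp string

                              def add(src, dst, r):
                                  if (src, dst) in edges:    # parallel edges become a union
                                      edges[(src, dst)] = "(%s|%s)" % (edges[(src, dst)], r)
                                  else:
                                      edges[(src, dst)] = r

                              add(S, start, "")              # empty edge into the old start
                              for q in accepting:
                                  add(q, A, "")              # empty edges out of old accepts
                              for (q, sym), q2 in delta.items():
                                  add(q, q2, sym)

                              for rip in list(states):       # eliminate states one by one
                                  loop = edges.pop((rip, rip), None)
                                  star = "(%s)*" % loop if loop else ""
                                  incoming = [(s, r) for (s, d), r in edges.items() if d == rip]
                                  outgoing = [(d, r) for (s, d), r in edges.items() if s == rip]
                                  for k in [k for k in edges if rip in k]:
                                      del edges[k]
                                  for s, rin in incoming:
                                      for d, rout in outgoing:
                                          add(s, d, rin + star + rout)

                              return edges.get((S, A), "")   # regexp for the whole language

                          # DFA for "even number of b's" over {a, b}:
                          delta = {("even", "a"): "even", ("even", "b"): "odd",
                                   ("odd", "a"): "odd", ("odd", "b"): "even"}
                          print(dfa_to_regex({"even", "odd"}, delta, "even", {"even"}))
                          # e.g. ((a)*|(a)*b((a|b(a)*b))*b(a)*)
                          ```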

                    1. 1

                      With Go and No-Limit Poker falling to bots, I don’t think there are any games left where humans dominate bots, just games that bots haven’t been written for yet.

                      1. 3

                        Ha! I can’t wait to see the Starcraft results. The complexity is way higher than in these toy games, which greatly simplify the interactions between combatants. I want to see it combine 3D object recognition, planning, replanning, detecting/making bluffs, and so on. These games, especially ones with simple mechanics plus billions of analyzed moves, make it too easy for the AIs to outdo humans. I want to see harder situations closer to the messiness that is the real world. Then, see them do the same with as little data as we work with on the games themselves.

                        1. 2
                          1. 2

                            I know. So far, though, all the best bots for Starcraft have been defeated trivially by humans, despite defeating other bots with amazing displays of micro or planning prowess. The humans just spotted their patterns and worked around them. Occasionally, they bluffed them to hilarious effect.

                            An example was the bot that decided battles between units using a scoring mechanism to determine the strength of a unit, a group, and so on. The human sensed this. The counter was to send a large group of units at an enemy group to make them scatter, then split that group into smaller ones to go after each individual enemy unit. The scoring made the enemy units look weaker than what was sent after them, so the enemy units would always flee without firing a single shot back at the human players, despite actually being able to defeat a bunch of them. Another example was even simpler: the human just ran some units randomly through the enemy bases while building an army to hit them. The enemy AI was unprepared, as it wasted all its focus countering the “attack” since it couldn’t call a bluff.

                            Examples:

                            https://www.cs.mun.ca/~dchurchill/starcraftaicomp/history.shtml

                          2. 2

                            I agree with Nick that we’re far from bots dominating all games. I’m not sure Starcraft will hold out that long once they get going on it because computers have an inherent edge in the reflex-based/micro parts of the game.

                            I’d love to see someone work on competitive bots for more complex games of imperfect information, like Magic or Netrunner. Although I suppose computers have a similar edge to the one they have in Starcraft, in that they can count cards and compute probabilities much faster and more reliably than humans.
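
                            For instance, a toy example (mine, not from any real bot) of the kind of arithmetic a program gets for free, using hypergeometric draw odds:

                            ```python
                            # Chance of seeing at least one of the 4 copies of a card in a
                            # 7-card opening hand from a 60-card deck.
                            from math import comb
                            p = 1 - comb(56, 7) / comb(60, 7)   # 1 - P(all 7 draws miss the 4 copies)
                            print(round(p, 3))                  # ~0.399
                            ```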

                            I’m also interested in competitive card games because metagaming (i.e. having a good understanding of the dominant strategies and styles of play, how to beat the others with yours, etc.) is necessary to do well. How will we give the machines metagame knowledge? Having a metagame also complicates the definition of “fair” for human vs. computer play.

                            1. 2

                              Indeed, even amateur bot authors report a significant edge:

                              This achieves an “idle” APM of about 500 and its battle APM is between 1000 and 2000 (as measured by SC2’s replay counter.)

                              I’m not into SC2, but I found some fans saying tournament-level humans operate around 150 APM with spikes into the 300-600 range.

                              I think metagaming is a distinction without a difference to AIs like Alpha Go.

                              1. 1

                                I think metagaming is a distinction without a difference to AIs like Alpha Go.

                                I don’t think you’re correct, at least not for all games.

                                Consider this scenario:

                                I’m trying to build an AI for a rock-paper-scissors-lizard-spock tournament. An AI like Alpha Go will ultimately find the optimal strategy: pick uniformly at random. However, humans suffer from biases, so a human who has studied their opponent’s habits may place better in the tournament than the AI.
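
                                A toy sketch of that scenario (plain rock-paper-scissors for brevity; the biased opponent and the numbers are made up): a simple frequency-counting exploiter beats a biased player, while the “optimal” uniform strategy only breaks even in expectation:

                                ```python
                                import random
                                from collections import Counter

                                BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}
                                MOVES = list(BEATS)

                                def biased_player():                 # a human who favors rock
                                    return random.choices(MOVES, weights=[2, 1, 1])[0]

                                def score(mine, theirs):             # +1 win, 0 tie, -1 loss
                                    return 0 if mine == theirs else (1 if BEATS[theirs] == mine else -1)

                                seen, uniform, exploit = Counter(), 0, 0
                                for _ in range(10000):
                                    move = biased_player()
                                    uniform += score(random.choice(MOVES), move)   # "optimal" play
                                    if seen:                                       # exploit the habit
                                        habit = seen.most_common(1)[0][0]
                                        exploit += score(BEATS[habit], move)
                                    seen[move] += 1

                                print(uniform, exploit)              # ~0 vs. ~+2500
                                ```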

                                In games like Go, this doesn’t seem to end up being particularly important. In games like Starcraft, I could be convinced that it’s not too important. In games like MtG, I think it’s quite important, at least until the AI successfully models deck construction as part of the game, which is qualitatively different than what Alpha Go has already done.

                                1. 2

                                  There’s metagame in Chess and Go, though they’re certainly much smaller than in MtG. The popularity of openings and responses to common gambits has changed over the years, just like it does in MtG. It didn’t matter.

                                  “Metagame” is a convenient distinction for people to talk about playing matches in the larger game of deckbuilding, but that doesn’t make it any less of a game. The task of analyzing and responding to “metagame” strategies is exactly the same as the task of playing the live game itself.

                                  It’s going to be years before a state-of-the-art AI implementer turns to MtG, but I’m not going out on a limb when I say that humans will lose.

                                  1. 1

                                    but I’m not going out on a limb when I say that humans will lose.

                                    I agree that humans will lose eventually, but I think your original claim of “distinction without a difference” is too strong, because it asserts that humans will lose to AI that is largely of the same form as the AI now.

                                    I would not be surprised (though I am not confident it is the case) if the success of Alpha Go’s architecture is limited in MtG and if competing with humans requires a different architecture.

                              2. 1

                                computers have an inherent edge in the reflex-based/micro parts of the game

                                Tournament organizers might initially allow computers to have that edge, but eventually, to keep the games interesting, I expect they would cap the AI’s actions per minute to the level of a top human.
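
                                If organizers wanted to enforce that, a rough sketch (my own, not from any actual tournament harness) is a token bucket: the bot’s action budget refills at the allowed rate, and anything beyond it gets dropped or queued:

                                ```python
                                import time

                                class ApmLimiter:
                                    def __init__(self, apm, burst=10):
                                        self.rate = apm / 60.0       # tokens per second
                                        self.capacity = burst        # allow short bursts, like humans
                                        self.tokens = float(burst)
                                        self.last = time.monotonic()

                                    def try_action(self):
                                        now = time.monotonic()
                                        self.tokens = min(self.capacity,
                                                          self.tokens + (now - self.last) * self.rate)
                                        self.last = now
                                        if self.tokens >= 1.0:
                                            self.tokens -= 1.0
                                            return True              # action allowed
                                        return False                 # over the cap: drop or queue

                                limiter = ApmLimiter(apm=300)        # roughly top-human territory
                                ```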

                                1. 3

                                  That’s what I recommend at first. It will help us assess what they can pull off given similar constraints to humans. I’m sure someone will complain that it’s basically cheating for humans because the computers could do better. In that case, I would be fair by limiting the computer to the size and energy consumption of the human brain but removing the actions per minute limit. The bot authors would ask to reinstate the APM limit instead.

                              3. 1

                                I want to see it combine 3D object recognition

                                Here’s an approach.

                              4. 1

                                Your comment made me look up the state of Arimaa, a game designed to be difficult for computers using then-standard techniques: TIL that computers beat humans at it in 2015 (source).

                              1. 3

                                http://www.keithschwarz.com/darts-dice-coins/ gives a step-by-step derivation of the alias method from simpler algorithms.
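
                                For anyone who doesn’t want to click through, here’s a minimal Python sketch of where that derivation ends up (Vose’s variant of the alias method, as I read the page; the names are mine): O(n) setup, O(1) per sample:

                                ```python
                                import random

                                def build_alias(weights):
                                    # Scale the weights so they average 1, then pair each under-full
                                    # column with an over-full "alias" column until all are level.
                                    n = len(weights)
                                    total = float(sum(weights))
                                    prob = [w * n / total for w in weights]
                                    alias = [0] * n
                                    small = [i for i, p in enumerate(prob) if p < 1.0]
                                    large = [i for i, p in enumerate(prob) if p >= 1.0]
                                    while small and large:
                                        s, l = small.pop(), large.pop()
                                        alias[s] = l                     # s's slack is covered by l
                                        prob[l] -= 1.0 - prob[s]         # l donated that much mass
                                        (small if prob[l] < 1.0 else large).append(l)
                                    for i in small + large:              # numerical leftovers are full
                                        prob[i] = 1.0
                                    return prob, alias

                                def sample(prob, alias):
                                    i = random.randrange(len(prob))      # pick a column uniformly
                                    return i if random.random() < prob[i] else alias[i]

                                prob, alias = build_alias([1, 1, 2, 4])  # P = 1/8, 1/8, 1/4, 1/2
                                ```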

                                1. 1

                                  thanks! my reaction to discovering the algorithm was precisely the same as the author’s:

                                  Second, and perhaps more surprisingly, this algorithm has been known for decades, but I had not once encountered it! Considering how much processing time is dedicated to simulation, I would have expected this technique to be better-known.

                                1. 11

                                  [F]rameworks… Or libraries. I never understood the difference.

                                  Your code calls libraries. A framework calls your code.
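
                                  A toy illustration in Python (the micro-“framework” is made up, just to show the inversion of control):

                                  ```python
                                  # Library: your code drives, and calls in.
                                  import json
                                  data = json.loads('{"n": 1}')          # you call the library

                                  # Framework: it drives, and calls your code back.
                                  class App:                             # hypothetical micro-framework
                                      def __init__(self):
                                          self.handlers = {}
                                      def route(self, path):             # you hand it your code...
                                          def register(fn):
                                              self.handlers[path] = fn
                                              return fn
                                          return register
                                      def run(self, requests):           # ...and its loop calls you
                                          for path in requests:
                                              print(self.handlers[path]())

                                  app = App()

                                  @app.route("/hello")
                                  def hello():                           # called by the framework
                                      return "hello, world"

                                  app.run(["/hello"])
                                  ```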