
  2. 8

    This has been posted a couple of times, but I think it’s too long to get much traction, which is a shame, since it’s a great read. Some highlights:


    Checking code deeply requires understanding the code’s semantics. The most basic requirement is that you parse it. Parsing is considered a solved problem. Unfortunately, this view is naïve, rooted in the widely believed myth that programming languages exist.


    Many (all?) compilers diverge from the standard. Compilers have bugs. Or are very old. Written by people who misunderstand the specification (not just for C++). Or have numerous extensions. The mere presence of these divergences causes the code they allow to appear. If a compiler accepts construct X, then given enough programmers and code, eventually X is typed, not rejected, then encased in the code base, where the static tool will, not helpfully, flag it as a parse error.
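
    (Not from the article, but as a concrete illustration of the kind of construct meant here: the statement expression below is a GNU C extension that GCC and Clang accept, while a strictly conforming ISO C parser flags it as a parse error. A minimal sketch:)

        /* GNU C statement expression: accepted by GCC and Clang as an
           extension, rejected as a syntax error by a strict ISO C front end. */
        int twice_plus_one(int x) {
            int y = ({ int t = x * 2; t + 1; });  /* ({ ... }) is not ISO C */
            return y;
        }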

    The tool can’t simply ignore divergent code, since significant markets are awash in it. For example, one enormous software company once viewed conformance as a competitive disadvantage, since it would let others make tools usable in lieu of its own. Embedded software companies make great tool customers, given the bug aversion of their customers; users don’t like it if their cars (or even their toasters) crash. Unfortunately, the space constraints in such systems and their tight coupling to hardware have led to an astonishing oeuvre of enthusiastically used compiler extensions.
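
    (Again not from the article, just a sketch of the embedded extensions being described: the snippet below is Keil C51-style 8051 code. The sbit/xdata/interrupt keywords are vendor extensions, so a conforming C parser rejects it outright; details vary by toolchain.)

        #include <reg51.h>                 /* Keil-provided 8051 SFR definitions */

        sbit LED = P1^0;                   /* extension: name a single bit of SFR P1 */
        unsigned char xdata buf[64];       /* extension: place the buffer in external RAM */

        void timer0_isr(void) interrupt 1  /* extension: register as the timer-0 ISR */
        {
            LED = !LED;                    /* toggle the pin on every timer interrupt */
        }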


    If divergence-induced parse errors are isolated events scattered here and there, then they don’t matter. An unsound tool can skip them. Unfortunately, failure often isn’t modular. In a sad, too-common story line, some crucial, purportedly “C” header file contains a blatantly illegal non-C construct. It gets included by all files. The no-longer-potential customer is treated to a constant stream of parse errors as your compiler rips through the customer’s source files, rejecting each in turn. The customer’s derisive stance is, “Deep source code analysis? Your tool can’t even compile code. How can it find bugs?”


    The award for most widely used extension should, perhaps, go to Microsoft support for precompiled headers. Among the most nettlesome troubles is that the compiler skips all the text before an inclusion of a precompiled header. The implication of this behavior is that the following code can be compiled without complaint:

        I can put whatever I want here.
        It doesn’t have to compile.
        If your compiler gives an error, it sucks.
        #include <some-precompiled-header.h>
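
    (My note, not the article’s: you can see this with MSVC’s precompiled-header flags. Roughly, and with hypothetical file names: build the header once with something like “cl /c /Ycpch.h pch.cpp”, then compile the garbage-prefixed file with “cl /c /Yupch.h weird.cpp”; everything before the #include of the designated precompiled header is skipped rather than parsed.)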


    Do bugs matter? Companies buy bug-finding tools because they see bugs as bad. However, not everyone agrees that bugs matter. The following event has occurred during numerous trials. The tool finds a clear, ugly error (memory corruption or use-after-free) in important code, and the interaction with the customer goes like thus:

    “So?”

    “Isn’t that bad? What happens if you hit it?”

    “Oh, it’ll crash. We’ll get a call.” [Shrug.]

    If developers don’t feel pain, they often don’t care. Indifference can arise from lack of accountability; if QA cannot reproduce a bug, then there is no blame. Other times, it’s just odd:

    “Is this a bug?”

    “I’m just the security guy.”

    “That’s not a bug; it’s in third-party code.”

    “A leak? Don’t know. The author left years ago…”

    1. 3

      This is a good one too:

      “Why is it when I run your tool, I have to reinstall my Linux distribution from CD?”

      This was indeed a puzzling question. Some poking around exposed the following chain of events: the company’s make used a novel format to print out the absolute path of the directory in which the compiler ran; our script misparsed this path, producing the empty string that we gave as the destination to the Unix “cd” (change directory) command, causing it to change to the top level of the system; it ran “rm -rf *” (recursive delete) during compilation to clean up temporary files.

      I recall Mozilla seemed to put a lot of support into Elsa/Elkhound and some of McPeak’s other stuff (a static analyzer, something to do with Pork?). I often wondered about that. Lately I’ve been forced to use a static analysis service to scan binaries; has the industry given up on source code static analysis?

      1. 1

        Well, Mozilla created Rust because they felt it was the only viable way forward, I guess.

        Oh, also Facebook created Infer: https://fbinfer.com/

        1. 1

          Rust started as a personal project that then eventually got funded by Mozilla Research. Not sure that counts as “created”.

          1. 1

            I think it counts. Rust in its current form, influenced by Mozilla’s needs, is pretty far from GH’s original Rust which was (my impression) a C-ified OCaml.

            EDIT: although I should correct myself on Infer, technically FB acquired it when they bought Monoidics.

        2. 1

          “has the industry given up on source code static analysis?”

          If anything, there’s more going on in that field than there used to be. There are lots of products to choose from, and more of them are low-noise than before. We also have tools like RV-Match, built on a formal semantics of C, tools like Facebook’s Infer that work on massive codebases, and academics continuing to build better prototypes, with the academic work converging on common foundations. Beyond the algorithms themselves, powerful hardware that can scale up or out is cheaper than ever, and the clouds have plenty of it too.

          It’s the golden age of static analysis. Building on the above, combined with automated test generation, is what I’d have focused my work on had the crisis not shifted my priorities a bit.