1. 15

  2. 2

    This is a great piece with some definitely actionable advice. The one problem I have with it is that it implies issues like developers ignoring the warnings are problems with static analysis. Those are people problems that should be fixed by management. It’s long been known that you get more quality by getting senior management’s buy-in to put it in company culture or to require at least a percentage of time spent on it. In my company, there are certain reviews and reports that have to get done, with managers checking a sample of them to make sure we didn’t pretend to do them. If someone ignores reviews or problems, management takes action to force them to address them, ranging from a warning, to a write-up with more management review of their activities, to termination after lots of write-ups.

    The same thing could, probably should, be done in a company like Google. That changes the bar to integrating static analysis as a step in their build system. Note that a lot of good advice in the article still applies at that point about suppressing false positives, prioritizing/triaging bugs, focusing on analyses that come with fixes, allowing customization, and so on. These would still benefit given you get more out of tools the developers like than out of ones they are just forced to use. They could even split analysis between the quick, compiler-focused tools they developed and the slower, more-thorough tools that run over time. Just about everything they’re doing still applies. They just get even more results when people are pushed to address quality issues.

    And, while we’re at it, I’ve always wondered why Google didn’t just buy one of the top vendors of static analysis tools. They could buy them and make it the first priority to do the kind of work in the article to improve usability and integration with Google, then release that to the vendor’s other paying customers. The tools would continue to improve with both the vendor’s revenue and the contributions from Google described in the article. That’s on top of the improvements from CompSci folks those vendors pick up regularly. At this point, though, they’ve put so much into their tooling with good results that they probably couldn’t justify such a purchase to management. Might have been a good idea earlier.

    1. 3

      The article focused a lot on the experiments to integrate static analysis into developers’ workflow. That is addressing the people problem. Since when does senior management have much influence on how programmers work?

      1. 2

        Of course management have a lot of influence on how programmers work. If not, management is incompetent.

        1. 3

          Competent management will fund effort to incorporate improvements into existing workflows rather than telling programmers that they need to change their patterns and tools.

          1. 2

            Management can change programmers’ incentives, and with them the false positive rate they will tolerate, which Google reports as 10% in this article.

            Let’s say you have an analysis with a 20% false positive rate. Google’s approach is to wait until the analysis improves. I think what nickpsecurity is saying is that with management buy-in, you can successfully deploy the 20% analysis. Since 80% of its warnings are real bugs, this can improve software quality a lot. I believe this actually happened with Microsoft and SAL.

            1. 1

              Exactly. Microsoft’s SDL was a great example that dramatically reduced 0-days and crashes in the Windows kernel by embedding strong reviews into each part of the development process. Interestingly enough, it was actually optimized for development pace. The mandatory use of their driver verifier also eliminated most blue screens. Earlier, OpenVMS teams alternated: one week building features, a huge number of tests run over the weekend, one week fixing, and repeat. Great reliability. IBM also had mandated inspections and verifications under Fagan’s Inspection Process and Mills’ Cleanroom, respectively.

              As far as static analysis goes, it’s really common in safety-critical industries to force developers to use and review the output of the tools. They know they lose some time to fighting with false positives. Yet the fact that they deliver low-defect code day in and day out with workflows that include strong reviews, machine analysis, and testing proves it can be done. One NASA team actually used four static analyzers, with the author saying they each caught things the others missed. There are also tools that focus on minimizing the number of false positives so they don’t overload developers. If that was a priority, then a company could use a mix of those to start with. That’s actually something I advocate more often these days given so much developer resistance.

              Edit: Wait, I thought you meant SDL with SAL being a typo. Your other comment makes me think you might have intentionally been talking about benefits of SAL. So, count this as complementary evidence since both brought major benefits.

        2. 1

          “Since when does senior management have much influence on how programmers work?”

          A combo of senior and middle management already dictated all kinds of practices at Google from using the monorepo or internal tools to their job compensation and performance ratings. They can mandate more time addressing bugs, too. It’s happened at other companies. Most write-ups on getting companies to do more QA in software also mention the importance of senior management buy-in so everyone is pushed from the top down to keep it a priority. Without it, people ignoring it might get away with it.

        3. 2

          To your first point, I think that they partially address this with their points on developer happiness. Google is big enough that they have to consider the effects of onboarding a developer that doesn’t have any experience with static analysis tools or that has had negative experiences with them. You want your tooling to be perceived by all developers as adding value (the vast majority of the time) so that everyone feels comfortable with it being part of the workflow. Anything short of that adds to the friction and Google could lose a developer that has already gotten over the hurdles of the hiring process over… toolchain arguments. Just to be clear, I do think the answer is more and more usable static analysis, but getting people to change their mind about things is more than just a simple engineering challenge.

          And to your final point - I suspect it’s fear of messing with the secret sauce. I heard (somewhere) that Google, Microsoft and other places with gigantic C/C++ code bases already are the biggest paying customers of static analysis tools but are content to buy without exposing what’s behind the curtain.

          1. 1

            “Anything short of that adds to the friction and google could lose a developer that has already gotten over the hurdles of the hiring process over… toolchain arguments.”

            I see where you’re going there as this is a valid concern. I think I just disagree about what they should be optimizing for. In this case, we’re talking about one of the companies that almost everyone fights to join to get its salaries, perks, and prestige. I think a small fraction of the day vetting reports from static analysis won’t make most productive Googlers quit. If it does, I predict their places will be taken by others that will accept a QA step. In some companies, there are even specific people or teams that do this sort of thing working alongside the other developers.

            “I heard (somewhere) that Google, Microsoft and other places with gigantic c/c++ code bases already are the biggest paying customers of static analysis tools but are content to buy without exposing what’s behind the curtain.”

            Might be true but I can’t assume too much. Microsoft is heavily invested in them via Microsoft Research. They publish a lot of reports on their activities and even FOSS some of it. The Microsoft SAL that sanxiyn mentioned, PREfix/PREfast (older), Dafny, VCC (used in Hyper-V), Code Contracts (Design-by-Contract), F*, the Midori project, and so on come to mind. A lot of their stuff requires more significant investment, though, since it aims for stronger correctness. The SDL process, driver verification, and Code Contracts were examples that didn’t add too much overhead for their standard development pace and priorities. Microsoft also does writeups on using their tools with a random example I just got out of Google.

          2. 2

            If you’re going to start firing programmers for ignoring static analysis warnings instead of trusting their judgement, you’d wanna pray for a serious improvement in static analysis.

            And your best programmers will probably quit, so there’s that.

            1. 4

              Microsoft deployed SAL annotation top-down, resulting in much improved security of Windows. The initial SAL paper reports a 28% false positive rate.

              I claim: even with a high false positive rate, forcing programmers to fix static analysis warnings works. Serious improvement in static analysis is welcome but not necessary. Also, as far as I know, the best programmers at Microsoft didn’t quit over SAL.

              1. 1

                I think adding a contract system into your source tree wouldn’t really come under the umbrella of adding analysis to existing code, but either way I didn’t know about SAL and found it super-interesting, cheers!
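
The false positive rates traded back and forth in this thread (Google's reported 10% tolerance, the hypothetical 20% analysis, and the 28% reported for SAL) can be compared with some back-of-the-envelope arithmetic. The sketch below is only an illustration: the batch of 1,000 warnings, the 5 minutes spent per false alarm, and the `estimate_triage` helper are assumptions made up for the example, not figures or code from the article or from any commenter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriageEstimate:
    real_bugs: int        # warnings that point at a genuine defect
    false_alarms: int     # warnings a developer has to dismiss
    wasted_minutes: int   # time spent only on dismissing false alarms


def estimate_triage(warnings: int,
                    false_positive_pct: int,
                    minutes_per_false_alarm: int = 5) -> TriageEstimate:
    """Split a batch of warnings into real bugs and false alarms.

    `minutes_per_false_alarm` is a hypothetical cost, not a measured one.
    Integer percentages keep the arithmetic exact.
    """
    false_alarms = warnings * false_positive_pct // 100
    real_bugs = warnings - false_alarms
    return TriageEstimate(real_bugs, false_alarms,
                          false_alarms * minutes_per_false_alarm)


# Rates mentioned in the thread; the batch size of 1,000 is an assumption.
RATES = {
    "Google's reported tolerance": 10,
    "hypothetical analysis": 20,
    "initial SAL paper": 28,
}

for label, pct in RATES.items():
    est = estimate_triage(1000, pct)
    print(f"{label} ({pct}%): {est.real_bugs} real bugs, "
          f"{est.false_alarms} false alarms, "
          f"{est.wasted_minutes} minutes on false alarms")
```

Under these assumptions, moving from a 10% to a 28% false positive rate nearly triples the time lost to false alarms, while most warnings in the batch still point at real bugs. That is the trade-off the two sides of the thread are weighing.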