1. 29

  2. 15

    “The reason for this high percentage is because Windows has been written mostly in C and C++”

    That’s part of the reason. The other part is that Microsoft is intentionally not spending enough money applying the tools for finding bugs, or proving their absence, in C/C++ code. These are tools Microsoft can afford not merely to license: they could acquire the entire vendor, apply its tooling to their code base, fix everything flagged (false positives included), and still have billions of dollars left. At one point they might even have cornered the market on C/C++ analyzers by buying all of them. They just don’t care.

    Corporate apathy in the product divisions of Microsoft is the major cause here. MS Research should be highlighted for steadily inventing awesome, more-practical-than-usual tools for these things. One knocked out most of their driver issues (aka blue screens). The product divisions don’t adopt more of them, even the automated ones. Even Midori, with all its advances, was only fielded in one product that I know of.

    They also use C/C++. The empirical evidence tells us that teams with little focus on, or time for, QA who use those languages will produce more vulnerable code than teams using memory-safe languages. They should probably move off C/C++ slowly over time if they are minimizing QA and not throwing all kinds of verification tooling at what their teams produce. They need a safe-by-default language with the ability to selectively tune modules so we at least know where the dragons will be. The rest might be fine… by default.
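    The safe-by-default-with-tunable-modules idea is roughly what Rust’s lint levels give you today. A minimal sketch (the module and function names are made up for illustration):

    ```rust
    // Unsafe code is a compile error everywhere in this crate...
    #![deny(unsafe_code)]

    // ...except in this one clearly-marked module, so reviewers know
    // exactly where the dragons are.
    #[allow(unsafe_code)]
    mod dragons {
        /// Reinterpret validated ASCII bytes as a &str without re-checking UTF-8.
        pub fn ascii_to_str(data: &[u8]) -> Option<&str> {
            if data.iter().all(|b| b.is_ascii()) {
                // SAFETY: every byte is ASCII, hence valid UTF-8.
                Some(unsafe { std::str::from_utf8_unchecked(data) })
            } else {
                None
            }
        }
    }

    fn main() {
        assert_eq!(dragons::ascii_to_str(b"hello"), Some("hello"));
        assert_eq!(dragons::ascii_to_str(&[0xFF]), None);
    }
    ```

    Everything outside `dragons` is safe by default; auditing effort concentrates on the one module that opts back in.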

    That’s why dodging C/C++ is the best fit for these kinds of companies and projects. Which is most of them, as far as I can tell.

    1. 27

      That’s part of the reason.

      It’s actually the entire reason.

      The other part is that Microsoft is intentionally not spending enough money applying the tools for finding bugs, or proving their absence, in C/C++ code. These are tools Microsoft can afford not merely to license: they could acquire the entire vendor, apply its tooling to their code base, fix everything flagged (false positives included), and still have billions of dollars left. At one point they might even have cornered the market on C/C++ analyzers by buying all of them. They just don’t care.

      This is an extremely misinformed and baseless opinion. I used to work on MS Office and I can’t imagine a company spending more time and effort on securing C/C++ codebases with amazing static analysis tooling, insanely smart people working on teams entirely dedicated to security, the SDL, etc. It’s a herculean effort.

      1. 8

        Microsoft spent a long time not doing security. This left them with a legacy codebase full of problems, especially in design. They also intentionally used obfuscated formats to block competition. That achieved its goal, but it made long-term maintainability and security worse. While their security features were basic, smaller companies were designing security kernels into their OS’s, which they were building in safer languages such as Pascal, Ada, and Gypsy. Some of this work predates the existence of Microsoft. Some of those vendors tried to sell to Microsoft, which hasn’t been interested in those or similar products to this day. They build their own stuff with less assurance than projects like this. See the Layered Design and Assurance sections, which deserve the phrase “herculean effort.”

        Note: That project, using a basic analysis for timing channels, found side channels in caches using relatively low-cost labor. I heard Microsoft is offering six digits for anyone who finds those with critical impact. Maybe they should’ve just applied public methods that worked. Data61 did, finding lots of new components to hit.

        So, Microsoft ignored high-assurance security despite being one of the few companies that could afford to apply it, at least on new code. They got hit by every preventable defect you could think of. Eventually, they caved in to pressure to save their image by hiring Steve Lipner from VAX VMM. Microsoft’s priority was quickly shipping massive piles of features not designed with security in mind, connected to legacy code with the same problem. So, Lipner’s philosophy there was to dramatically water down the TCSEC into the SDL, adding some review, testing, and other steps to each phase of their lifecycle. This made huge improvements to the security of their codebase but fell totally short of what Microsoft was capable of doing with the money on hand.

        Today, we’ve seen both small companies and FOSS projects delivering useful functionality with great security. They often had access to fewer tools than Microsoft. The trick was that they used careful design, minimalism in implementation style, and whatever tools they had available. Unlike Microsoft’s code, theirs did way better during pentesting. We also have some suppliers formally verifying their OS’s or core protocols. MS Research has world-class people in that area whom Microsoft’s managers largely don’t use. I remember Hyper-V using VCC, with the company using some of their static analysis tools. They applied Driver Verifier, which eliminated most blue screens. That’s about it. There are also tools like RV-Match, TrustInSoft, and Astrée Analyzer that can prove the absence of the kinds of errors hitting Microsoft’s codebase. As far as I know, they’re not using them. There are also about five types of automated test generation that knock out piles of bugs. Their developers didn’t mention Microsoft using them, even though MS Research could’ve clean-slated them in better form in short order if management requested it.

        Just piles and piles of methods out there, mostly from the 1970’s–1990’s, that do better than what Microsoft is doing now. Altran/Praxis said their Correct-by-Construction methodology cost about 50% extra. Cleanroom costs anywhere from negative (lifecycle savings) up to that. You’re telling me Microsoft is throwing herculean effort at their codebase, but the results consistently have more defects than smaller players’ builds? I doubt that. I think they’re likely throwing herculean effort at inferior design techniques, implementation methods (including C/C++), and inadequate tooling. That’s great for checklists, even producing a lot of results. For actual security, their money-spent-to-vulnerabilities-found ratio is way worse than what those smaller firms were achieving on projects with a handful of developers. A firm with billions a year in profit employing people with similar skill should do better. If they cared and listened to their best experts in security and verification. ;)

        1. 6

          They applied Driver Verifier, which eliminated most blue screens. That’s about it. There are also tools like RV-Match, TrustInSoft, and Astrée Analyzer that can prove the absence of the kinds of errors hitting Microsoft’s codebase. As far as I know, they’re not using them.

          I don’t know why you’re persisting here, but you have absolutely no information. Microsoft has an internal version of SAL/PREfast that does static analysis beyond the compiler, and it’s amazing; they’ve been using it for far longer than most of the industry has cared about static analysis. It can find things like unsafe uses of raw buffers, unsafe array accesses, return values that go unchecked, etc.

          1. 1

            Microsoft has an internal version of SAL/PREfast that does static analysis beyond the compiler and it’s amazing

            I mentioned their use of internal tooling for static analysis in my comment, on top of a few others (SLAM and VCC). Apparently I did have information, since you just corroborated it. Since you’re continuing with unsubstantiated claims of rigour, let’s just compare what others were doing to Microsoft. Just tell me which of these methods have been in use in your groups. I’d like the names of the tools, too, so any corrections I get tell me about tools I can pass on to other developers. I’m leaving off review since I know Lipner had that in every part of the SDL.

            1. Formal specifications. Z, VDM, ASM’s… companies like IBM started specifying requirements and/or high-level design in a way that could capture properties about the data and interfaces. Those tools caught all kinds of errors. VDM and ASM at different points had code generators. Many companies used SPIN for analyzing concurrency in apps and protocols with similar successes. TLA+ is taking over that niche now. Eiffel and Praxis were using Design by Contract for their stuff. Were the requirements and/or design of Microsoft’s components formally specified to prevent errors due to ambiguities, incorrect bounds, and/or interface mistakes?

            2. Static analyzers. The standard practice at places with funding is to use more than one, since each tends to catch errors the others miss. They’ll use at least one badass commercial tool they have to pay for, combined with a mix and match of open-source tools with proven track records. NASA’s teams, with less funding than Microsoft, were using about four. Some folks mix the commercial ones with at least one that finds tons of things with false positives, combined with one of the “no-false-positive,” sound tools. Which three or four other static analyzers was Microsoft using in addition to their internal tool?

            3. Dynamic analysis. Did they use any of those tools to supplement the static analysis and test generation through program analysis? Teams loading up on tooling will typically try at least one. It’s usually Daikon.

            4. Testing. Path-based, symbolic, combinatorial, spec/model/contract-based, concolic… lots of phrases and methods in Lobsters submissions for automatically generating tests that hit high coverage throughout the software. If I were Microsoft, I’d use one of each, just alternating them during overnight runs. The rest of the time before morning would be fuzzing. Which automated test generators were they using while you were at Microsoft? Did they at least try KLEE on production code? And how much fuzzing did they do?

            5. Concurrency. I mentioned model-checking in SPIN. However, Eiffel had a model called SCOOP that makes concurrent code about as easy to write as sequential code. There have been a few tools that automatically insert locks based on annotations. IBM had a tool for their Java apps that would automatically find lots of concurrency errors. Was Microsoft following high-assurance practice in using something like Ada Ravenscar or SCOOP to be immune to some of these heisenbugs? Followed by an analysis tool and/or model-checking to catch the dead/live-locks that might slip through?

            6. Aside from OS’s, a number of projects like cryptlib and the OP web browser (later Chrome, too) were building forms of security kernels inside the app, with isolation mechanisms, enforcement of policies, and/or dropping privileges of the whole app or specific components when unneeded. Gutmann adopted what were called “assured pipelines” inside his library: you specify exactly what order functions may be called in, in which states of the program, and calls go through the security kernel instead of just jumps. That kernel can be an actual kernel underneath processes or just a module that checks calls. Did Microsoft’s products use such security mechanisms to prevent situations such as opening a Word document taking control of an entire computer?

            7. Compilers had a bad habit of introducing bugs and/or removing security checks. The optimizations also screwed things up a lot. A number of teams in academia started writing their compilers in strongly-typed, safe, functional languages, using a combo of intermediate languages and the type system to improve quality. A few verified compilers for simple languages, with CompCert doing one for C. AdaCore modified their IDEs to let developers see the assembly easily for human inspection. Did you hear about or see the compiler teams at Microsoft use 1–5 on top of the safe, functional, type-driven approaches that were working? Did they license CompCert to protect their C code, like companies in safety-critical fields are doing? Did they develop or contract their own certifying compiler for C and/or C++?

            8. Drivers and protocols. SLAM was a nice success story. It would be great if they forced all drivers to also undergo at least 2–4 above. They could also acquire AbsInt to get both CompCert and Astrée Analyzer. Astrée proves the absence of all the common problems in code structured the way one might write device drivers. That’s on top of probably many other things in the Windows ecosystem. At one point, Astrée was the only tool that could do this. Did you hear about Microsoft buying AbsInt, or just licensing Astrée to use on as many drivers and protocols as it can, with passing being mandatory for interoperability certification and branding?

            9. Formal proof. MS Research is awesome at all that stuff. They have tools that make it easier than usual. They even applied some of them to MS products as part of their funded research activities. Microsoft adopted a few. They have more, though, ranging from proving code correctness in Dafny to proving type-safety in assembly. Did you see them using any of that, at least on the riskiest and most-used routines? Did any assembly code you saw use their type- and/or memory-safe variants? Were they using Frama-C or SPARK Ada on critical routines?
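            To make the “assured pipelines” in item 6 concrete, here’s a compile-time sketch of the idea in Rust (all names are invented; Gutmann’s cryptlib enforces call ordering at runtime through its kernel, whereas this variant leans on the type system):

            ```rust
            // A document must pass open -> validate -> render, in that order.
            // Skipping `validate` is a type error, because `render` only
            // accepts the `Validated` state.
            struct Opened(String);
            struct Validated(String);

            fn open_doc(raw: &str) -> Opened {
                Opened(raw.to_string())
            }

            fn validate(doc: Opened) -> Result<Validated, String> {
                if doc.0.contains("macro:") {
                    Err("active content rejected".to_string())
                } else {
                    Ok(Validated(doc.0))
                }
            }

            fn render(doc: Validated) -> String {
                format!("rendered: {}", doc.0)
            }

            fn main() {
                let ok = validate(open_doc("hello")).map(render);
                assert_eq!(ok.unwrap(), "rendered: hello");
                // A malicious document never reaches `render`.
                assert!(validate(open_doc("macro:evil")).is_err());
            }
            ```

            A “document controls the computer” bug would then have to defeat `validate` itself, not merely find a code path that forgot to call it.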

            So, there’s a few, just based on what academics, smaller companies, defense contractors, smartcard companies, and a random assortment of players in other industries have been doing. If they can do it, so can Microsoft. Some of this was done on tech that predates Microsoft, so it would’ve been easier for Microsoft than for the forerunners. These are also the diverse array of things industries do when they really want to achieve a goal: they throw everything at it that they can afford. Most can’t afford to do all of this. Some do, though, with those NASA teams and Praxis having way, way less money than Microsoft’s operations side. They still put in a herculean effort, with some of the lowest-defect, most-reliable software in existence coming out. If Microsoft did likewise, you’d be able to name a lot of tools and approaches in 1–9.

            I actually hope you do, since I’d love to be wrong. The problem is that many of these are trick questions: many CVE’s Microsoft got hit by would be impossible if they had applied these methods. So the empirical data either refutes your point or refutes my belief that using 1–9 creates low-defect software. Most evidence indicates my belief that 1–9 bottoms out the defects is true. So, I defaulted to believing Microsoft wasn’t using these methods… wasn’t putting in herculean effort like other companies, some academics, and some folks FOSSing stuff were. Their employees might not even know much of this stuff exists. That would be more damning given MS Research has bright folks who certainly pushed adoption of some of it. What of 1–9 did Microsoft do in your group and other groups doing production software?

            Note: That excludes Midori even though it was used in a production app. It was mostly just for research. I haven’t heard about any migrations of their services to it.
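            As one concrete instance of the automated test generation in item 4: even a hand-rolled property test (everything here is illustrative; real generators like KLEE or fuzzers derive inputs far more cleverly) catches whole bug classes overnight:

            ```rust
            // Property: decoding an encoding returns the original input,
            // checked against hundreds of pseudo-randomly generated inputs.

            // Tiny deterministic PRNG (linear congruential) for input generation.
            fn lcg(seed: &mut u64) -> u64 {
                *seed = seed
                    .wrapping_mul(6364136223846793005)
                    .wrapping_add(1442695040888963407);
                *seed
            }

            // Function pair under test: run-length encode/decode.
            fn rle(data: &[u8]) -> Vec<(u8, u32)> {
                let mut out: Vec<(u8, u32)> = Vec::new();
                for &b in data {
                    match out.last_mut() {
                        Some((v, n)) if *v == b => *n += 1,
                        _ => out.push((b, 1)),
                    }
                }
                out
            }

            fn unrle(runs: &[(u8, u32)]) -> Vec<u8> {
                let mut out = Vec::new();
                for &(b, n) in runs {
                    out.extend(std::iter::repeat(b).take(n as usize));
                }
                out
            }

            fn main() {
                let mut seed = 42;
                for _ in 0..1000 {
                    let len = (lcg(&mut seed) % 32) as usize;
                    let data: Vec<u8> =
                        (0..len).map(|_| (lcg(&mut seed) % 4) as u8).collect();
                    // Round-trip property; an off-by-one in rle/unrle fails fast.
                    assert_eq!(unrle(&rle(&data)), data);
                }
            }
            ```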

            1. 3

              This is a really weird conversation. Do you want to know more because you’re curious or are you quizzing me? I honestly can’t tell.

              1. 3

                I keep track of what methods different companies use to assure software, what the results are, and (if the results are good) their availability. I do comparisons among them. I do it mostly to learn and pass things on. I also use it to objectively assess what they’re doing for software correctness, reliability, and security. I assessed Microsoft based on their simultaneous decline in 0-days and blue screens (the success of their methods) plus evidence, in what remains, that they were not using the strongest methods that they certainly could afford. I claimed apathy was the reason they’d ignore those methods, especially automated ones, in their assurance activities. They could give their teams the training and budget if they wanted.

                You said they were doing all kinds of things, putting in massive effort, and that I had no clue what I was talking about. You kept insisting on that last part without evidence past an argument from authority (“I was there, I know, trust me”). I very rarely have people tell me I’m clueless about verification tech, especially after seeing my Lobsters submissions and comments. Maybe you’re new, I thought, and missing stuff like this.

                In any case, whether I’m right or wrong, I decided to test your claim by presenting different classes of assurance techniques that companies outside Microsoft were using (especially projects/companies with fewer resources), describing some benefits/examples, and then asking what Microsoft was doing there. If I’m wrong, you’ll have examples of most of that, maybe even with descriptions of or links to the tools. If I’m right, you’ll be looking at that list saying “we didn’t do almost any of that stuff.” You might even wonder whether it was even possible to do that stuff given the budget and time-to-market requirements forced on you. If so, that makes it even more likely I’m right about them intentionally limiting what security you could achieve to maximize their numbers for the execs.

                So, I’m simply testing your claims in a way that gets you to give us actual data to work with, maybe teaches us about awesome QA methods, and maybe teaches you about some that you weren’t aware of. I try to be helpful even in debate. There’s also others reading that might benefit from specific techniques and assessments in these comments. They’ll tune out as the thread gets deeper and more flamish. Hence, me just switching right to asking about specific methods already used outside Microsoft, some for decades, to see if Microsoft was matching that effort or doing less-effective efforts.

                1. 1

                  I’m sorry, but without primary sources, you’re appealing to (self-)authority to make your claim. You seem to be self-assured in your position, enough to analyze internal Microsoft decisions.

                  1. 1

                    I’m really not. I’ve endlessly submitted examples of these activities countering problems Microsoft is experiencing. If they’re experiencing them, then it’s a valid inference that Microsoft isn’t using them.

                    The other person is doing argument from authority by saying we should trust a huge claim contradicting empirical observations just because they claim to know what Microsoft was doing. No evidence past that. Weird that you call my comment out rather than theirs, or both of ours, if you wanted provable, primary sources or evidence.

                    1. 1

                      If they’re experiencing them, then it’s a valid inference that Microsoft isn’t using them.

                      No, you’re creating a black-and-white issue where there isn’t one. Formal methods (as you undoubtedly know) have many issues when used in practice. Maybe Microsoft did try to use these tools (especially given how many of them come from MSR), but found their codebases intractable or the amount of instrumentation too onerous to ship without a major rewrite; or maybe they did run these tools and classes of bugs were still not found, or there were too many false positives to justify using a tool. It feels overly simplistic, and almost disingenuous, to insinuate that, just by having certain classes of bugs, they didn’t run the formal checkers or implement the formal methods that counteract them.

                      1. 1

                        Although you bring up sensible points, let me illustrate why I’m not using them in a Microsoft assessment by applying your comment pre-SDL. That was back when Microsoft was hit by vulnerabilities left and right. We claimed they weren’t investing in security based on that field evidence. We also cited OS’s and products with fewer vulnerabilities and/or better containment as evidence of what they could’ve been doing. You could’ve come in with the same counter:

                        “No, you’re creating a black-and-white issue where there isn’t one. Security engineering (as you undoubtedly know) has many issues when used in commercial products. Maybe Microsoft did try to use such methods (especially given how many in MSR know about them), but found their codebases intractable or the amount of instrumentation too onerous to ship without a major rewrite; or maybe they did do reviews and run security tools and classes of bugs were still not found, or there were too many constraints or false positives to justify attempting to secure their legacy products or use security tools. It feels overly simplistic, and almost disingenuous, to insinuate that, just by having certain classes of bugs, they didn’t have a security team in place or use the techniques that counter them.”

                        Yet, we’d have been right, because it wasn’t simplistic: methods and tools that consistently improve things in specific ways would’ve likely done it for them. Then, they adopted a few of them, new results came in, and those results matched our outcome-based predictions (i.e. corroboration). Piles of companies, including IBM, were using some of the things on my list. Some of those uses were on big projects, new and legacy. Hell, fuzzers are a low-effort tool that works on about everything. Like shooting fish in a barrel, I’m told.

                        I doubt Microsoft is doing something so unique that no extra techniques could’ve helped them. If anything, you’re introducing a new claim against decades of field evidence: that Microsoft’s code is so unique that universal QA methods all fail on it except the few specific tools they’re already using. Since mine is the status quo about QA results, the burden of proof is on you to show Microsoft’s code is the exception to the rule.

                        That brings me to another problem with your counter: you exclusively focus on formal methods, the smallest part of my comment. If that’s all I had said, I could about half-concede it to you, at least on the legacy C/C++ code they had. However, my assessment had a range of techniques, including push-button tools like source-based test generators, fuzzers, and third-party static analyzers. We’re talking low-to-no-effort tooling that works on fairly arbitrary code. You’re telling me Microsoft is about the only company that couldn’t benefit from extra tools like that? And with what evidence?

                        The simplest test of your counter is quickly DuckDuckGoing whether anyone used a fuzzer on Microsoft products and got results with relatively little effort. Yup. I stand by my method of extrapolating likely internal investment in techniques from the results those suppliers achieved, as assessed by independent parties hacking the software. If one is low and the other is high, then they probably aren’t doing enough of the good things.

        2. 3

          The point is that they could do more. I believe Microsoft can invest more in security, but probably can’t do so economically now. Which is, as you said, herculean, since I believe other companies can invest more in security, even in an economically sound manner.

          1. 2

            The Office team is like a completely separate company compared to how other MS teams work, so maybe Office was better than the others at this?

            1. 1

              Could very well be!

        3. 8

          My summary of most of the defensive comments I’ve seen to this article: C doesn’t kill people, people kill people!

          1. 0

            This is interesting, but also unsurprising, as writing correct C or C++ may as well be impossible for anything but trivial programs.

            I don’t see why this post is tagged with rust, considering Ada is a much better language and, unlike Rust, has actually been used in critical software with real repercussions, along with having real standards that don’t constantly change. Ada has an “always safe unless you explicitly use a subprogram that can violate the rules” attitude, whereas Rust has an attitude of “you can do whatever you want in the straitjacket, and if that’s too restrictive you’re on your own.”

            It’s possible to write a large and featureful Ada program without using anything unsafe, whereas it’s my understanding this isn’t practical at all with Rust.

            Ada is used in rockets and automobiles and other situations where people die unintentionally if things go wrong. Rust is used in Firefox. That says it all, really.

            1. 12

              Are there stats on how widely Ada’s used? My understanding was the people who use it really like it, but most automobile code is still in C.

              Rust is used in Firefox. That says it all, really.

              Rust is much younger than Ada, so we can’t infer much from this.

              A more interesting question: how much Ada code in the wild is Ada12 vs an older version?

              1. 8

                I agree it is irrelevant to tag it rust, I’ve removed the tag.

                Some notes:

                whereas it’s my understanding this isn’t practical at all with rust

                That isn’t true; you can write large and featureful programs without unsafe.

                Ada is used in rockets and automobiles and other situations where people die unintentionally if things go wrong. Rust is used in Firefox. That says it all, really.

                that feels fairly dishonest to me. Also, the browser is a security-critical component deployed on essentially every user-facing computer manufactured. I can’t think of a more important place to have very safe tools. Reminds me of that (possibly apocryphal) story about memory leaks and cruise missiles.

                1. 7

                  It’s definitely possible to write large Rust programs with very few or no uses of the unsafe keyword. It is the case though that the Rust standard library sometimes uses unsafe-marked code with safe abstractions wrapped around it. Ideally these safe abstractions should actually be safe, but it’s possible in principle that there’s a bug in one of those abstractions that lets the memory-unsafe behavior leak in certain circumstances.
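                  A minimal sketch of that pattern (the wrapper is hypothetical, not actual standard-library code): the `unsafe` block is sound only because of the bounds check, and a mistake in that one line is exactly the kind of bug that lets unsafety leak through a “safe” API.

                  ```rust
                  /// Safe abstraction over an unchecked slice read.
                  fn get_at(slice: &[u32], index: usize) -> Option<u32> {
                      if index < slice.len() {
                          // SAFETY: `index` was just checked against `slice.len()`.
                          Some(unsafe { *slice.get_unchecked(index) })
                      } else {
                          None
                      }
                  }

                  fn main() {
                      let data = [10, 20, 30];
                      assert_eq!(get_at(&data, 1), Some(20));
                      assert_eq!(get_at(&data, 9), None); // out of range stays safe
                  }
                  ```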

                  I’m curious if this is the case for Ada as well - does Ada also wrap unsafe abstractions in ostensibly-safe ones that might prove unsafe because of a bug, or does it have some way of guaranteeing the safety of those safe abstractions themselves?

                  1. 4

                    I think it’s mostly that Ada puts safety at the top, and will add checks to code, whereas the Rust devs have convinced themselves that they can get C++ programmers to switch if and only if there’s never any runtime penalty for safety.

                    1. 11

                      Yes, Rust aims slightly differently than Ada. If Ada checks all your boxes and the runtime properties are fine with you, by all means, use it.

                      I also heard from some AdaCore people that they enjoy having Rust as a competitor, as it brings their topics back onto the table and raises awareness.

                      1. 1

                        I was thinking AdaCore should become a supplier of Rust IDEs, a second compiler, a provable subset like SPARK, and corporate libraries. They might even help improve the Rust language with their experience.

                        They can continue supporting Ada for legacy customers while simultaneously having a lucrative transition path to Rust. Maybe even automated tooling they make some consulting revenue on, too.

                        1. 1

                          I’m not trying to badmouth Rust, I’m a big fan. I’d be using it right now if there was any sort of stable UI solution that was made within Rust (not bindings to things like QT and friends). For now it seems there’s a lot of churn in that space, with too many projects too young.

                          1. 2

                            All good, I didn’t understand it as such. I just wanted to make the point that we see Ada as a competitor we can grow with.

                        2. 1

                          Do existing Ada implementations have moderately-quick compile times?

                        3. 1

                          it’s possible in principle that there’s a bug in one of those abstractions that lets the memory-unsafe behavior leak in certain circumstances

                          No, it actually happened, not long ago, in the standard library.

                          1. 14

                            Yes, and we’re quite upfront about that. I hate it when some people paint it in another way. There is a point where you need to take a fundamentally unsafe thing (memory and processors) and start building a base of a safe system.

                            There are advantages in Rust, e.g. being able to fix that abstraction in one go and ship an update to everyone sharing it, but the danger is there.

                            That’s also the reason why there is a large ongoing project that formalizes lifetimes, borrowing, and several types we consider as important as primitives (e.g. Mutex). They regularly find subtle bugs. Not day-by-day, but, like, every quarter.

                    2. 1

                      This is an unfair comparison, as Ada is, what, 40 years old, whereas Rust is 10 years old.

                      1. 13

                        Rust, in a fashion that can be adopted, is ~4 years old. Before that, it can be considered a research language.

                        Also, for that reason, I have no problem with it currently not being used in high-safety-critical places, this is a thing that has to build over time. And I want people to take that time.

                        1. 3

                          I applaud your honesty in saying that.

                        2. 6

                          I see that only as a point in Ada’s favor in this context, where “tried and true” actually means something.

                          I’m not neophobic; I love exciting new technology as much as the next guy. I even like seeing new developments in the “safer alternatives to C” category, which is something that sorely needs constant research and new inventions. I just think that all else being equal, the older tech has a leg up simply for being older, in this area.

                          Rust is a fantastic breeding ground for ideas, and I’m convinced that at some point, Rust (or something like it) will be so obviously superior to the Ada dinosaur that it will be better despite its age difference. We’re just not there yet.

                          1. 1

                            Isn’t that “leg up” the reason why Adam said it wasn’t a fair comparison?

                        3. 0

                          I don’t agree with either language in this context. I like Rust well enough, and Ada might be cool though I don’t know much about it, but there’s zero chance of Microsoft moving any of their critical codebases to either. Getting them to stop active development and get serious about static analysis, fuzzing, etc might be possible. Gradually converting some things over to a safe subset of C++ might be possible. A total rewrite of their major products in an unproven language is a non-starter.