1. 22
  1. 9

    For a sense of the relative magnitude of vulnerability types, I highly recommend looking at the 2019 Common Weakness Enumeration (CWE) Top 25! For 2019, the Top 25 was based on quantitative analysis of vulnerabilities reported in the NIST National Vulnerability Database (NVD; which itself includes all Common Vulnerabilities & Exposures [CVE] reports). This article describes in greater detail how NVD entries were categorized, how scores were calculated, which CWE entries almost made the cut, and the limitations of the approach.

    Unsafe deserialization (CWE 502) is #23 on the list, for example.

    Full disclosure: I work at MITRE, and personally know and have worked with members of the CWE team. I do not myself work on CWE.

    1. 6

      There are vulnerabilities that can only exist in memory-safe languages (e.g. use of eval on untrusted inputs; eval tends to only exist in very high-level languages, which are all memory-safe)

      eval exists for C programs on any system that can shell out to gcc. I’m not sure whether this is a nitpick or not, given the prevalence of gcc. Hopefully these kinds of security issues are rare in any programming language.
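      A minimal sketch of what that “eval via gcc” looks like, assuming a Unix-like system with gcc and dlopen available (the file names and the generated answer function are made up for illustration):

        /* ceval.c: compile a C snippet at runtime, then load and call it.
           Build the driver itself with: cc ceval.c -ldl */
        #include <dlfcn.h>
        #include <stdio.h>
        #include <stdlib.h>

        int main(void) {
            /* Write the source to be "evaluated" to a temp file. */
            FILE *f = fopen("/tmp/snippet.c", "w");
            if (!f) return 1;
            fputs("int answer(void) { return 6 * 7; }\n", f);
            fclose(f);

            /* Shell out to gcc to turn it into a shared object... */
            if (system("gcc -shared -fPIC -o /tmp/snippet.so /tmp/snippet.c") != 0)
                return 1;

            /* ...then load it and call the freshly compiled code. */
            void *lib = dlopen("/tmp/snippet.so", RTLD_NOW);
            if (!lib) return 1;
            int (*answer)(void) = (int (*)(void))dlsym(lib, "answer");
            if (!answer) return 1;
            printf("%d\n", answer());  /* prints 42 */
            dlclose(lib);
            return 0;
        }

      If the snippet came from untrusted input, this would be every bit as dangerous as eval in a scripting language.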

      1. 9

        (Author here)

        eval exists on x86: just JMP to a buffer! (This worked better before no-exec became a ubiquitous thing; see the sketch below.) Still, I think it’s important to recognize that, as an empirical matter, eval of source code doesn’t exist in memory-unsafe languages in any meaningful sense.

        An early draft of this post said that I honestly wasn’t sure what belonged in this category :-)
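        A minimal sketch of that “JMP to a buffer” trick on x86-64 Linux (hand-written machine code for illustration; on platforms that enforce W^X you would have to write the page first and mprotect it to executable afterwards, and hardened systems may refuse outright):

          /* Treat a byte buffer as code: the machine-code equivalent of eval.
             Assumes x86-64 and POSIX mmap. */
          #include <stdio.h>
          #include <string.h>
          #include <sys/mman.h>

          int main(void) {
              /* mov eax, 42 ; ret */
              unsigned char code[] = { 0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3 };

              /* Ask for a page that is writable *and* executable; no-exec
                 policies are exactly what make this request suspicious today. */
              void *buf = mmap(NULL, sizeof code,
                               PROT_READ | PROT_WRITE | PROT_EXEC,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
              if (buf == MAP_FAILED) return 1;

              memcpy(buf, code, sizeof code);
              int (*fn)(void) = (int (*)(void))buf;
              printf("%d\n", fn());  /* prints 42 */
              return 0;
          }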

        1. 3

          “Static typing where possible, dynamic typing where needed” argues that eval is quite pervasive:

          Many people believe that the ability to dynamically eval strings as programs is what sets dynamic languages apart from static languages. This is simply not true; any language that can dynamically load code in some form or another, either via DLLs or shared libraries or dynamic class loading, has the ability to do eval. The real question is whether you really need runtime code generation, and if so, what is the best way to achieve this.

          This is not just a theoretical argument: dynamic compilation and class loading are not unusual in Java, it’s how Ruby’s current draft JIT works, and Haskell wraps its own compiler in hint. Varnish’s config language works in a similar way.

          I still agree it’s rather fringe, but not unheard of and I’ve seen it considered in practice.

          1. 2

            Your paragraph on vulnerabilities specific to memory-managed languages mostly talks about “unsafe deserialization”. I don’t see how unsafe deserialization is limited to memory-safe languages. Is that what you meant to imply?

        2. 4

          Until you have the evidence, don’t bother with hypothetical notions that someone can write 10 million lines of C without ubiquitous memory-unsafety vulnerabilities – it’s just Flat Earth Theory for software engineers.

          Couldn’t say it better than this. I’m pretty irritated by the myth of genius programmers who use C so well that they never create a bug in it; in reality, most of the time this just means they don’t actually write as much code as they claim.

          1. 2

            Lots of software also gets much less scrutiny than people think. Just because your code is popular and hasn’t been exploited doesn’t mean it’s not exploitable.

          2. 3

            Until you have the evidence, don’t bother with hypothetical notions that someone can write 10 million lines of C without ubiquitous memory-unsafety vulnerabilities – it’s just Flat Earth Theory for software engineers.

            Or you could advocate for smaller dependencies in general. By these metrics, 1 million lines of C has fewer bugs than 10 million lines of Rust.

            1. 6

              That’s kind of an orthogonal tactic though. If you can restrict a project to 1m lines of C you could also restrict it to 1m lines of Rust. Changing the slope of the line is a win no matter the size of the project.

              1. 1

                I don’t follow at all. Small projects aren’t necessarily higher quality. If anything, many small dependencies pay a higher cost in overhead. A large project with substantial functionality is more likely to attract an ecosystem of contributors and therefore be able to amortize costs like security contacts and processes to ensure code quality.

              2. 3

                Excellent work. But from the viewpoint of science, you’re missing a part of the process: looking at large projects in memory-safe languages (C#, Python, JS, whatever) and seeing how they stack up. You’ve gathered data and discovered your trend; now, to actually confirm it’s real, you need to turn the question around and double-check.

                That’s a harder task though, since you’re comparing less apples-to-apples. You’re also looking for the absence of something, which is harder.

                1. 2

                  https://lobste.rs/s/o0xtns/some_were_meant_for_c describes a hypothetical C compiler that emits dynamic checks for undefined behavior such as reading outside the bounds of an allocation. It would be interesting to see what percentage of vulnerabilities would be prevented this way, and at what runtime cost.

                  Similarly tools like nanoprocesses or RLBox might be a cheap way to retrofit existing code to limit the extent of any vulnerabilities, seeing as we’re not going to run out of legacy C code any time soon.

                  Other reasons to look at nanoprocesses or similar:

                  There’s been a sudden burst of research in memory-safe, runtime-less languages. In a future where we write libraries in Rust, Rust++, Verdigris, etc., how do those libraries talk to each other safely? Do we all have to agree on a common lifetime system?

                  There’s been some concern lately about the size of dependency trees and the possibility of widely used libraries being hijacked. It would be nice if my json parsing library didn’t have the ability to read my private ssh key and tweet it.

                  1. 6

                    describes a hypothetical C compiler that emits dynamic checks for undefined behavior such as reading outside the bounds of an allocation

                    This compiler exists, and it is called clang: the address and memory sanitizers (ASan and MSan) emit exactly this kind of check, and there are also thread and undefined-behavior sanitizers (TSan and UBSan).
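                    For example (a minimal sketch; the file name is made up), an out-of-bounds heap read that AddressSanitizer catches at runtime:

                      /* oob.c: build with: clang -g -fsanitize=address oob.c && ./a.out
                         ASan aborts with a heap-buffer-overflow report on the read below. */
                      #include <stdlib.h>

                      int main(void) {
                          int *a = malloc(4 * sizeof *a);
                          int x = a[4];  /* reads one element past the allocation */
                          free(a);
                          return x;
                      }

                    As for the runtime cost the parent asks about: the instrumentation isn’t free (the ASan docs cite a typical slowdown of about 2x), which is why it’s mostly used in testing and fuzzing rather than production builds.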