1. 46
    1. 17

      This is a very interesting post! One takeaway is that you don’t need to re-write the world. Transitioning new development to a memory safe language can bring meaningful improvements. This is much easier (and cheaper) than needing to port everything over in order to get an effect.

      1. 2

        Agree! I love Rust, but Swift’s interop story with C++ is very intriguing. Facebook shared in a podcast the difficulties of C++ interop with Rust’s async/await paradigm. I feel the hard problem is that C++ is difficult to use as a lingua franca.

      2. 10

        The argument is that most vulnerabilities come from recently-added code, so writing all the new code in a safe language (without touching old code) is effective at reducing the amount of vulnerabilities, because after a few years only safe code has been recently added, and older code is much less likely to still contain vulnerabilities. (More precisely, they claim that vulnerabilities have an exponentially-decreasing lifetime, pointing at experimental findings from previous research.)
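        A toy model makes the claimed dynamic concrete (the half-life and the per-year vulnerability yield below are illustrative assumptions, not figures from the post or the paper):

```python
# Toy model of the "exponential decline" argument. All numbers
# (half-life, vulnerabilities per year of unsafe code) are illustrative
# assumptions, not figures from the blog post or the paper it cites.
HALF_LIFE_YEARS = 2.5     # assumed vulnerability half-life
NEW_VULNS_PER_YEAR = 100  # assumed yield of one year of unsafe development
DECAY = 0.5 ** (1 / HALF_LIFE_YEARS)

def simulate(years_total, switch_year):
    """Per-year expected vulnerability counts; from switch_year onward,
    all new code is written in a safe language, so nothing is added."""
    vulns = 0.0
    history = []
    for year in range(years_total):
        vulns *= DECAY  # old vulnerabilities get found and fixed over time
        if year < switch_year:
            vulns += NEW_VULNS_PER_YEAR  # unsafe development continues
        history.append(vulns)
    return history

history = simulate(years_total=12, switch_year=6)
for year, count in enumerate(history):
    note = "  <- all new code now memory safe" if year == 6 else ""
    print(f"year {year:2d}: {count:6.1f}{note}")
```

        Under these assumptions the count plateaus while unsafe development continues, then declines exponentially once new code stops adding vulnerabilities; note the whole argument hinges on the decay assumption itself.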

        I find the claim rather hard to believe; it is implausible, and my intuition is that it is completely wrong for many codebases. For example, if I have an unsafe-language codebase that has very few users and does not change often, by the reasoning above we could wait a few years and all bugs would have evaporated on their own? Obviously this is not true, so the claim that vulnerabilities have an exponentially-decreasing lifetime must only hold under certain conditions of usage and scrutiny for the software. Looking at the abstract of the academic publication they use to back their claim, the researchers looked at vulnerability lifetimes in Chromium and OpenSSL. Those are two of the most actively audited codebases for security vulnerabilities, and the vast majority of software out there does not have this level of scrutiny. Google has set up some automated fuzzing for open source infrastructure software, but is that level of scrutiny enough to get into the “exponential decay” regime?

        So my intuition is that the claim should be rephrased as:

        • if your unsafe-language software gets similar level of security scrutiny as Chromium or OpenSSL
        • and you start writing all new code in a safe language or idiom
        • and you keep actively looking for vulnerabilities in the unsafe code
        • then after a few years most safety vulnerabilities will be gone (or at least very hard to find), even if a large fraction of your codebase remains unsafe

        Phrased like this, this starts sounding plausible. It is also completely different from the messaging in the blog post, which makes much, much broader claims.

        (The post reads as if Google security people make recommendations to other software entities assuming that everyone has development and security practices similar to Google’s. This is obviously not the case, and it would be very strange if the Google security people believed that. They probably have a much narrower audience in mind, but miscommunicated?)

        1. 7

          For example, if I have an unsafe-language codebase that has very few users and does not change often,

          I think another difference between Google’s perspective and yours, in addition to that their old code gets vulnerabilities actively hunted, is that they’re focussing on codebases where large amounts of new code are added every year, as they add features to their products.

          1. 6

            If the alternative is “keep doing what you’re doing” (and “rewrite everything in a safe language” not being an option), I’m sure everyone’s better off adding new stuff in safe languages, even if the unsafe bits don’t get as much scrutiny as Google’s stuff. Eventually, you’ll probably rewrite bits you have to touch anyway in a safe language because you’ll feel more proficient in it.

            1. 3

              Okay, yeah, “your software will be safer if you write new stuff in a safe language” sounds very true. But the claims in the blog post are quite a bit stronger than that. Let me quote the second paragraph:

              This post demonstrates why focusing on Safe Coding for new code quickly and counterintuitively reduces the overall security risk of a codebase, finally breaking through the stubbornly high plateau of memory safety vulnerabilities and starting an exponential decline, all while being scalable and cost-effective.

              An exponential decline in vulnerabilities is a rather strong claim.

              1. 7

                But it’s an extremely realistic claim for any code base that is being actively worked on with bugs being fixed as they are found. That may not apply to your code bases, but I think it’s a very reasonable claim in the context of this blog, which is making something that is widely used much safer.

                1. 2

                  I don’t find it realistic. Bugs in general, sure: we find bugs by daily usage of the software, report them, and they get fixed over time – the larger the bug, the sooner it is found by a user by chance. But security vulnerabilities? You need people actively looking for those to find them (at least by running automated vuln-finding tools), and most software out there has no one doing that on a regular basis.

                  1. 1

                    Because most software is not critical to safety? At least yet, because there are juicier targets?

                2. 4

                  They support the claim with real world measurements over several years.

                  1. 1

                    I went to look a bit more at the PDF. One selection criterion is:

                    The project should have a considerable number of reported CVEs. In order to allow a thorough analysis of all projects, we limited ourselves to those with at least 100 CVEs to ensure meaningful results

                    How many CVEs have been reported against the software that you are writing? For mine, I believe that the answer is “2” – and it is used by thousands of people.

                    My intuition is that the experiments in the paper (that claim exponential decay) only apply to specific software development practices that do not generalize at all to how the rest of us write software.

                  2. 1

                    Yeah, that sounds like hyperbole for sure

                3. 5

                  That claim is based on some Google Project Zero work, but it’s not aligned with my experience either. I suspect that it’s an artefact of the following flow:

                  1. Find a new vulnerability.
                  2. Search the code for similar code patterns.
                  3. Fix all of the instances you find.

                  Imagine that you fix all of the occurrences of bug class A in a codebase. Now you write some new code. A year later, you look for instances of bug class A. They will all be in the new code. In practice, you don’t fix all instances, but you fix a big chunk. Now you’ll see exponential decay.
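                  As a toy model of that measurement artifact (illustrative numbers, not data from any paper): a bug class introduced at a constant rate would show uniformly spread ages at discovery, but a single sweep makes every later find look young:

```python
# Toy model of the measurement artifact described above. Parameters are
# illustrative. Bug class A is introduced at a constant rate, so with no
# sweep the age of a bug at discovery would be spread uniformly. A single
# pattern-based sweep in SWEEP_YEAR fixes every existing instance, so an
# audit later only ever finds "young" bugs, which looks like decay.
BUGS_PER_YEAR = 50
SWEEP_YEAR = 5   # the year a pattern search fixed all known instances
AUDIT_YEAR = 10  # the year we measure the ages of surviving instances

# Represent each bug by the year it was introduced.
introduced = [year for year in range(AUDIT_YEAR) for _ in range(BUGS_PER_YEAR)]
surviving = [year for year in introduced if year >= SWEEP_YEAR]

ages = [AUDIT_YEAR - year for year in surviving]
print(f"introduced: {len(introduced)}, surviving at audit: {len(surviving)}")
print(f"oldest surviving bug: {max(ages)} years old, even though the "
      f"introduction rate was constant for {AUDIT_YEAR} years")
```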

                  The converse is also common: Find an instance of a bug class, add a static analyser check for it, never see it in new code that’s committed to the project.

                  The problem with all of these claims is that there’s no ground truth. If you could enumerate all of the bugs in, say, Linux, then you could (moderately) easily map them back to the commits that introduced them. If you could do this, you could also ship a 100% bug-free version of Linux. In practice, you only have data on the bugs that are found. That tends to be bursty as people find new techniques for identifying bugs.

                  In the things that we’ve ported to CHERI, I don’t think we’ve seen evidence that memory-safety bugs are more likely to be present in new code. Quite a few of the bugs we’ve found and fixed have been over 20 years old. There is certainly an effect that bugs that cause frequent crashes get fixed quickly, but the more pernicious ones where you’ve got a small out-of-bounds write, or a use-after-free that depends on concurrency and doesn’t trigger deterministically, are much more likely to hide in codebases for a long time.

                  1. 3

                    Quite a few of the bugs we’ve found and fixed have been over 20 years old.

                    Nota bene, one from a code base of similar heritage is about to drop with an incredibly wide attack surface.

                    1. 2

                      In the things that we’ve ported to CHERI, I don’t think we’ve seen evidence that memory-safety bugs are more likely to be present in new code. Quite a few of the bugs we’ve found and fixed have been over 20 years old.

                      Doesn’t this undermine an argument you’ve used for why to use an old TCP stack in C rather than a newly written one in Rust? As I recall, the thinking went that the old TCP stack was well tested and studied, and thus likely to be better, both in terms of highly visible bugs and in security bugs, than a newly written Rust version.

                      1. 5

                        Doesn’t this undermine an argument you’ve used for why to use an old TCP stack in C rather than a newly written one in Rust? As I recall, the thinking went that the old TCP stack was well tested and studied, and thus likely to be better, both in terms of highly visible bugs and in security bugs, than a newly written Rust version.

                        Possibly. I’d like to see a new TCP/IP stack in Rust that we could use (which needs some Rust compiler support first, which is on my list…) but yes, I would expect very new code to be buggy.

                        I think I expect something less of a decay. Very new code has had little real-world testing. A lot of things tend to be shaken out in the first couple of years. Very old code likely has a lot of bugs hiding in it that no one has looked at properly with more modern tooling. I’m not sure where the sweet spot is.

                        My main worry with a new TCP/IP stack is not that it’s new code, it’s that the relevant domain knowledge is rare. There’s a big difference between a new project and new code in an existing project. Someone contributing code to an existing TCP/IP stack will have it reviewed by people who have already learned (often the painful way) about many ways to introduce vulnerabilities in network protocol implementations. If these people learned Rust and wrote a new stack, they’d probably do a better job (modulo second system problems) than if they did the same in C. But finding people who are experts in Rust, experts in network stack implementations, and experts in resource-constrained embedded systems is hard. Even any two out of three is pretty tricky.

                        1. 4

                          The most popular Rust TCP stack for embedded is, I think, smoltcp, which was created by someone who I am very sure is an expert in both Rust and resource-constrained embedded systems, but I have no idea how to evaluate their expertise in network stack implementations, nor the expertise of its current maintainers.

                          It might not be suitable anyway since it is missing a bunch of features.

                          1. 6

                            We use smoltcp at Oxide, so it is at least good enough for production use, if it fits your use case. As you say at the end, it’s possible there are requirements that may make that not work out.

                  2. 4

                    This is completely beside the point, but what does “95% faster” mean? new time = 0.95 * old time? new time = old time - 0.95 * old time? new time = old time - 0.95 * new time? I’m honestly not sure if they mean “slightly better speed” (not as impressive as the context suggests), “about 2x speed” (pretty good tbh), or “about 20x speed” (wow that sounds way faster than 95%).

                    1. 3

                      If sandboxing has some fixed overhead even for trivial operations (such as generating a QR code) then a 2x speedup seems reasonable, and is also the meaning that I would expect. IMO, the only reasonable interpretation (which doesn’t necessarily mean the one people use) of “x% faster” is “new_rate = old_rate + old_rate * x%”.

                      1. 3

                        And since rate=1/time, that gives:

                        new_rate = (1 + x%) * old_rate
                        
                        1 / new_time = (1 + x%) / old_time
                        
                        new_time = old_time / (1 + x%)
                        

                        E.g. if the old time was 100ms, then the new time is

                        100ms / 1.95 = 51.3ms
                        

                        This all being under Lonjil’s Reasonable Interpretation of Speedup Numbers.
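                        For concreteness, the illustrative 100ms baseline from above, worked out under each of the three readings of “95% faster”:

```python
# The three readings of "95% faster" discussed above, applied to the
# illustrative 100 ms baseline from the parent comments.
old_time = 100.0  # ms

# (a) rate-based: new_rate = (1 + 95%) * old_rate, i.e. new = old / 1.95
a = old_time / 1.95
# (b) 95% of the time saved: new_time = old_time - 0.95 * old_time
b = old_time - 0.95 * old_time
# (c) new_time = 0.95 * old_time
c = 0.95 * old_time

print(f"(a) rate-based:       {a:5.1f} ms  (about 2x speed)")
print(f"(b) 95% time saved:   {b:5.1f} ms  (about 20x speed)")
print(f"(c) 0.95 * old_time:  {c:5.1f} ms  (barely faster)")
```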

                    2. 4

                      Diluting the old unsafe code with new safe code sounds a bit like herd immunity in epidemiology.

                      1. 4

                        I don’t think there’s the same parallel because it often takes one memory safety bug to get arbitrary code execution. The WannaCry ransomware attacks were enabled by a single memory-safety bug in the SMB stack in Windows, for example, and cost billions in damages. It didn’t matter that Microsoft had used a load of static analysis on the code and fixed a lot more bugs, one was enough.

                      2. 2

                          The article repeatedly mentions reducing the proportion of vulnerabilities (including in Android) caused by memory unsafety, but only explains in the middle why this matters, for readers who don’t already know, so I’ll highlight that explanation:

                        As we noted in a previous post, memory safety vulnerabilities tend to be significantly more severe, more likely to be remotely reachable, more versatile, and more likely to be maliciously exploited than other vulnerability types. As the number of memory safety vulnerabilities have dropped, the overall security risk has dropped along with it.

                        1. 2

                          In the December 2022 blog post they link to, it is stated that there are 1.5 million lines of Rust code in AOSP. Does anyone know of a more up to date number?

                          1. 2

                              Glad to see C++ is used to show it can be written in a safe way.

                            1. 1

                              This was certainly an interesting article. Apparently written with the Kotlin supplementing Java perspective in mind, and also that of Rust supplementing C and C++.

                              My takeaway though was that for C code bases, Rust may be “a bridge too far”, and hence it struck me that the “betterC” subset of D (dlang.org) may be an easier sell. Or in its GDC form ‘-fno-druntime’.

                                That’s because D in that subset form, hence sans GC, has some worthwhile improvements over C, a large degree of interoperability in a mixed source code base, and does not require too much additional learning; i.e., it would be an easy transition. My preference would probably be for Zig, but since that is still in flux, it would also be too difficult a sell with colleagues.

                              1. 11

                                D and Zig aren’t memory safe though?

                                It is very odd to me that people will see some result about how moving to some chunk of a codebase to Rust improves some aspect of the codebase because of Rust’s memory safety, and suggest that this means that codebases should move to their favourite non-memory safe language instead in order to get those benefits.

                                1. 2

                                    I’d suggest that “memory safety” is not an absolute state, but something which comes in degrees: Rust merely offers more (at a cost); it’s not that D and Zig are without safety features.

                                    So the ability to have bounds checks on arrays, and arrays/slices passed as first-class objects, closes one significant memory issue (buffer overflow / access beyond array bounds). The ability to have pointers forbidding pointer arithmetic, and pointers which forbid holding nil references, is another. In the absence of parallel code, those alone are probably the major gains.

                                  Note that the original article was (at least as I read it) not just about moving to Rust (it also seemed to suggest moving to Kotlin), but about how adding new code in a safer language (as opposed to rewrite it all) has measurable, and significant improvements.

                                    Which then offers some hope for those of us who view adopting Rust as impractical for various reasons.

                                  BTW - if we deem GC languages as “safe”, one can observe that D can be run using a GC, and hence used in a “safe” fashion.

                                  1. 6

                                    BTW - if we deem GC languages as “safe”, one can observe that D can be run using a GC, and hence used in a “safe” fashion.

                                      I don’t know enough about D to be able to evaluate this, but I want to point out that having a GC does not make your language safe. For example, Go is not memory safe in the presence of data races.

                                    1. 6

                                      So, ‘memory safety’ has a meaning, and it isn’t ‘has safety features that are related to memory’. Neither Zig nor D are memory safe, and neither is D’s GC mode, any more than C or C++ are memory safe just because you can run them with GC.

                                      Rust, Kotlin, and some iterations of Visual Basic (under some constraints) are all examples of memory safe languages; the reason that Rust is specifically being considered for Android stuff (and also for Linux kernel stuff) is because it is being used in an area that (at least notionally) needs the kinds of things that only a C-alike can provide, and Rust is the only practical C-alike that is also memory safe, hence it being a practical option to replace C in places where C had not thus far been able to be replaced by a non C-alike language.

                                      When you increase the scope to ‘replacing C in places where C should never have been used in the first place, as you could have used Ruby or something instead’ then yes, Rust is probably not the best option - you could just use Ruby or something; but Zig and D are definitely not viable options. In that kind of case, you should of course be looking at a way higher level language (that is actually memory safe), and there are an infinitude of excellent options.

                                      Anyway, the point is that the article gives no reason to move from C to Zig or D at all. You can still of course argue that Zig or D are good to move to because you like them better than C or Rust, but that is quite a different argument.

                                      1. 2

                                        In the corporate world, a lot of this is being driven by a push from the US government, and they seem to be using a more inclusive definition of “memory safe”, which doesn’t exclude data races.

                                          At least some of the documents they’ve produced list the following as MSLs: C#, Go, Java, Python, Delphi/Object-Pascal, Rust, Swift, Ruby, Ada.

                                        We know it is still possible to have data races in Go depending upon how one writes code.

                                        The point of the push from the USA seems to be mainly to take a step change away from C and C++ (or assembly), and toward those with forms of automatic memory management, and bounds checking. Which then places lots of other languages in scope. Not just Rust and Kotlin.

                                        1. 4

                                          Man, you really keep moving the goalposts.

                                          How is that relevant to the article, which you were commenting on, where in your comment you were suggesting that in response to the article, there could be a justification to move to Zig or D instead of Rust? Even under some kind of colloquial and lax view of memory safety, Zig and D are not memory safe, unless you also count C and C++ as memory safe, so…

                                          If the argument is ‘I think we could fool the US Government’s safety push by claiming D or Zig are memory safe, so we can adopt those instead of going memory safe’ then… okay, but uhhh what? And also, how does this connect to the rest of what you were saying?

                                          Edit: If your argument was a more general ‘hey, they have shown some improvements by doing a thing, but I (or hypothetically my workmates or etc) don’t like it, so maybe we could do the opposite but in a way that I (or hypothetically my workmates or etc) like and that would also help’, then, yes, that is an argument you could make, but it is weird to kind of pretend the ‘we could do the opposite’ bit in there is not going to get some strong pushback.

                                          1. 2

                                            Nope, in the corporate world the reason to pay ANY attention to this is that the US gov is pushing it. What they’re pushing is something simpler than “adopt Rust”, one of the docs being:

                                            https://media.defense.gov/2022/Nov/10/2003112742/-1/-1/0/CSI_SOFTWARE_MEMORY_SAFETY.PDF
                                            

                                            “Using a memory safe language can help prevent programmers from introducing certain types of memory-related issues. Memory is managed automatically as part of the computer language; it does not rely on the programmer adding code to implement memory protections. The language institutes automatic protections using a combination of compile time and runtime checks. These inherent language features protect the programmer from introducing memory management mistakes unintentionally. Examples of memory safe language include Python®, Java®, C#, Go, Delphi/Object Pascal, Swift®, Ruby™, Rust®, and Ada.”

                                              So, in the context of the article, my takeaway was that simply using some form of MSL for new code has worthwhile gains. I then used the expansive definition of MSL which the US gov has been advocating as what one could add, not one party’s particular favourite language. So one could add code to a largely C-based project in any language which would provide some additional form of “guard rails”.

                                            Now if you wish to interpret the original article as simply “this is evidence only for adopting Rust/Kotlin”, then sure feel free to take that perspective.

                                            1. 2

                                              Now if you wish to interpret the original article as simply “this is evidence only for adopting Rust/Kotlin”, then sure feel free to take that perspective.

                                              The article was only providing evidence for adopting actually memory safe languages, so, yes, I take the perspective that the article said what it said, rather than pretending it said something else.

                                              Edit: And of the languages that they list, only Rust and maybe Ada work in cases where you need a C-like. So, in cases where you need a C-like (or where stakeholders are going to act like you need a C-like), the solution is in fact Rust (or maybe Ada). Which is what was under discussion. Did you even read my comments before replying to them?

                                          2. 4

                                            Considering Go is the only popular language with GC that fails to be memory safe in the presence of data races, I find it far more likely that it’s simply an oversight, rather than them purposefully not caring about data races.

                                            In the corporate world, a lot of this is being driven by a push from the US government

                                            But this doesn’t have anything whatsoever to do with the topic of this post. Android started using Rust in 2019, so they aren’t using it due to the US gov’t encouraging “memory safe languages” under some non-standard definition of memory safe, they’re using it because it solves real problems they’ve had for a long time.

                                            But also, even a laxer understanding of memory safety still excludes Zig, and presumably D. The US gov’t would obviously recommend neither, so I don’t get what your point is.

                                    2. 1

                                      My takeaway though was that for C code bases, Rust may be “a bridge too far”, and hence it struck me that the “betterC” subset of D (dlang.org) may be an easier sell. Or in its GDC form ‘-fno-druntime’.

                                      Maybe it’s a bridge too far for C programmers, but I would expect it to be much easier to add Rust to a C codebase than to a C++ codebase, and also easier to slowly migrate code.