1. 32
  1.  

  2. 6

    He asked: why is there no argument to memcpy() to specify the maximum destination length?

    That’s the third one.

    If you really insist, #define safe_memcpy(d, s, dn, sn) memcpy(d, s, min(dn, sn))?

    1. 4

      Yeah, also, I don’t understand why they would want that.

      Imagine calling memcpy(d, 10, s, 15) and having your data not copied entirely, leaving your d buffer with cropped data. Garbage, essentially. How would that be better?

      edit: to be clear, I’m not complaining about your suggestion, but about the reasoning of the presenter on this.

      1. 4

        Yeah, also, I don’t understand why they would want that.

        Imagine calling memcpy(d, 10, s, 15) and having your data not copied entirely, leaving your d buffer with cropped data. Garbage, essentially. How would that be better?

        Cropped data would be a logic error in your application. With standard memcpy, the additional 5 bytes overwrite whatever is in memory after the d buffer. This can even enable an attacker to introduce execution of their own code. That’s why, e.g., Microsoft ships a memcpy_s.

        Reading materials:

        1. 8

          But the unanswered question is why you’re calling memcpy(d, s, 15) instead of memcpy(d, s, 10)? At some level the problem is calling the function with the wrong argument, and adding more arguments maybe doesn’t help.

          1. 4

            Every security exploit can be drilled down to “why were you doing this?!”. If there were an obvious answer, security exploits would be a thing of the past. Meanwhile, advocating harm reduction is as good as we can get: even if calling memcpy with a size larger than the destination is wrong to begin with, truncated data still has a better chance of ending in a non-exploitable crash than a plain old buffer overflow, which often ends in reliable code execution.

            1. 3

              But why do we assume this extra parameter is better than the other parameter which we have assumed is incorrect? Why not add another extra parameter? memcpy_reallysafe(dest, src, destsize, srcsize, destsize_forserious, doublechecksize)

              1. 3

                Because in ten years a line of code can change, and the assumptions that made one variable the right one will break. Suddenly you’ve got the wrong variable in there. Personally, I think this is where asserts belong: to codify the assumptions over a long span of time and multiple developers.

                1. 3

                  A common use case of memcpy is to copy one buffer over another. The way programs are structured, we often end up with a srcsize and a dstsize that match their buffers. The error comes from the implicit contract that dstsize is always at least as large as srcsize. Sure, good code would ensure this is always true. Actual code has many instances where it is not. Adding dstsize to memcpy means that this contract is now explicit and can be asserted by the very function that put it in place.

                  I mean, at this point we are not arguing about a hypothetical scenario; we have a whole history of this bug class happening over and over again. Simply keeping track of the semantics (copy one buffer to the other) and asking for all the properties required (the buffers and their sizes) is a low-effort and easy way to prevent many of those bugs.

                  1. 1

                    Yeah, keeping track of the buffer size is a very good idea. But if you want it to always be correct, it should be done without requiring the programmer to manually carry the buffer size along in a separate variable from the buffer pointer.

                    Either something like “Managed C++”, where the allocator data structures are queried to figure out the size of the buffer, or something like Rust slices:

                    #include <assert.h>
                    #include <stddef.h>
                    #include <string.h>

                    typedef struct {
                        char *ptr;
                        size_t len;
                    } slice_t;
                    slice_t slice(slice_t slice, size_t start, size_t end) {
                        assert(start <= end);
                        assert(end <= slice.len);
                        slice.ptr += start;
                        slice.len = end - start;
                        return slice;
                    }
                    slice_t slice_front(slice_t slice, size_t start) {
                        assert(start <= slice.len);
                        slice.ptr += start;
                        slice.len -= start;
                        return slice;
                    }
                    slice_t slice_back(slice_t slice, size_t end) {
                        assert(end <= slice.len);
                        slice.len = end;
                        return slice;
                    }
                    void slicecpy(slice_t dest, slice_t src) {
                        assert(dest.len == src.len);
                        memcpy(dest.ptr, src.ptr, dest.len);
                    }
                    

                    The point being to make it harder to mix up which len goes with which ptr, plus providing assert-assisted pointer manipulation in addition to the safe memcpy itself. A safe abstraction needs to account for the entire life cycle of its bounds check, not just the point of use.

                    Also, this would really, really benefit from templates.

      2. 5

        I really like this talk, because it covers how C _APIs_ can be a really big source of issues. Programming languages can provide foot guns, but nice API designs can take them away, or at least hide them to make it less likely that you accidentally use them.

        So much C code, even in ‘well written’ codebases, mixes so many levels of abstraction together (the obvious one being memory management) in ways that wouldn’t fly in many other communities’ codebases. But instead of going for “rewrite everything in Rust” approaches, first trying to clean up the interfaces can already help with catching a lot of issues.

        Refactoring might be hard in macro-land, but it’s one of my favorite bug-squashing tools.

        1. -3

          “Considered Harmful” Essays Considered Harmful (I think “considered dangerous” falls in the same category)

          It’s not difficult to use C correctly. Don’t blame your vulnerabilities on C when the real culprit is your own sloth.

          I’ll concede that C (and its API) has quite a few foot guns, but I’ve learned how to avoid them pretty effectively, and I should be able to expect the same from kernel devs. The whole “rewrite everything in <insert promising new lang here>” mentality doesn’t work for large projects (like kernels). To rewrite the Linux kernel in Rust would take months (even if you had all hands on deck). And, who’s to say that Rust wouldn’t change incompatibly three times in the middle?

          1. 21

            It’s not difficult to use C correctly.

            [citation needed]

            There is no evidence to suggest that large codebases written in C can maintain memory safety. The counter-evidence, that writing code in C/C++ tends to produce large volumes of vulnerabilities for reasons explained by language choice, is plentiful. To wit: every major OS (Windows, Linux, macOS), every major browser (Chrome, Firefox, Edge, Safari), every major anti-virus program, every major image-parsing library; I could keep going for a while.

            Denialism about the dangers of memory unsafety is not productive; we need to move on to discussing how we address this.

            1. 0

              There is no evidence to suggest that large codebases written in C can maintain memory safety in the face of that.

              Using C correctly means not making large codebases. C isn’t a language for programming in the large.

              1. [Comment removed by author]

                1. 7

                  Yes there is. The default failure mode of safe languages doing common things is not potential code injection. The default for C is. Given the same bug count, using C will lead to more severe problems. The field results confirm that, from fuzzing to CVEs.

                  1. 4

                    Yes there is. The default failure mode of safe languages doing common things is not potential code injection.

                    I don’t think this is wrong, exactly, but there are a hundred exploits related to Python pickle, etc., as counterexamples. And Java serialization, etc.

                    1. 3

                      Do the memory-safe parts have the memory errors of C (a) at all or (b) as much? And do libraries in concurrency-safe languages show the same number of races as their equivalents in multithreaded C, or fewer?

                      You’re going to find vulnerabilities in all of them. My side is saying that C amplifies that number by default, or that others greatly reduce it by default. That’s all we’re saying. I think the evidence already supports that.

                      1. 1

                        “Amplifies” requires some comparative numbers.

                        1. 2

                          The numbers on using C are that the common operations lead to piles of vulnerabilities with code injection. This happens a lot on average. It happens less with veterans, but it still happens. That’s irrefutable. The numbers on safe languages show the problems mostly lead to compile-time failures or DoSes from runtime checks. The burden of proof is on your side, given that your side’s stuff is getting smashed the hardest all the time, whether the app is small or big.

                          What numbers do you have showing C is safer for the average developer than Ada, Rust, and so on? I’m especially interested in fuzzing results showing how many findings potentially lead to code injection among new, half-assed, or just time-constrained programmers in C vs. the same in safe systems languages.

                          1. 1

                            You don’t even have good examples of large-scale systems built using some other language that are substantially safer. Until you do, it’s just folklore.

                    2. 0

                      I see a real shortage of examples of large-scale systems, constructed in any language, that are secure and bug-free, but I am happy to look at references. Like, what do we have comparable to Qmail, written in something better, that has fewer bugs? I know that C has numerous limitations, but in CS we tend to embrace projects that claim a win by hiding a problem, e.g. by using pragmas to do the things that are the most buggy, as if pushing the problem into the corner made it go away.

                      And the code injection bugs I see are all examples of bad engineering, not of bad programming.

                      1. 3

                        There are bugs, and there are serious bugs that the language causes. The latter are what hackers hit the most. The latter are what we’re talking about, not just bugs in general. The size of the program also doesn’t matter, since the safe language is immune to the latter by design. Scaling code up just increases the odds of severe vulnerabilities in the unsafe language.

                        Java and .NET apps are what to look at if you want big ones. Very few CVEs of the kind you see in C apps are posted against those apps. The ones that are posted are usually in the C/C++ runtimes or support libraries of such languages. That just illustrates the problem more. The languages whose runtimes aren’t written in C have fewer of those, since they’re immune or contain them by design.

                        1. 1

                          My impression is that (a) the reason those C/C++ runtimes show up so much is that these languages delegate the most dangerous code, such as parsing of raw input or packets and complex interaction with the OS, to the C/C++ runtimes, where it is possible to do that work, and (b) the same errors show up in different forms in different languages. The massive prevalence of scripting exploits is not due to C but to lazy interface construction where, for example, user inputs are treated as parts of database scripts, etc. I do not think that “do all the hard stuff in pragmas or C libraries” actually limits vulnerabilities.

                          1. 1

                            “where it is possible to do that work”

                            The first part is true. That part isn’t. They think a lower-level language is better for speed, bit handling, or interfacing with the OS. The second part implies you need C to do that work. There are systems languages which can do that work with more safety than C. So it’s “possible to do that work” in them without C’s drawbacks. Many low-level programs and OSes were written in PL/0, PL/S, Ada, Modula-2, Oberon, Modula-3, Clay, and so on. They’re safe by default, turning safety off only where you need to. C doesn’t do that, since its designers didn’t care when they were hacking on a PDP-11 for personal use.

                            “b) the same errors show up in different form in different languages. The massive prevalence of scripting exploits is not due to C but to lazy interface construction where, for example, user inputs are treated as parts of database scripts etc etc.”

                            Aside from anything language-specific, the logic errors that happen in scripting languages can happen in C, too. You get those errors, plus C’s errors, plus the catastrophic effects that come with them being in C. Let’s say you wrote the interpreter in Ada or Rust with safety checks on. Most of the errors in the interpreter won’t lead to hacks. The extensions would have the same property if built on the base language, just as extensions to C-based programs are often in C and have the same problems. Platforms like Java that built libraries on C are hit heavily in those C dependencies.

                            Additionally, the extensions could leverage aspects of these languages, such as type or module systems, designed for knocking out integration errors. Finally, with Ada 2012 and SPARK, they can eliminate runtime checks in performance-critical code by using the provers to show the checks aren’t needed if specific preconditions pass early on. Unlike with Frama-C, they get a good baseline on code they hurried through and the highest assurance on what they proved.

                            1. 1

                              Data would help. These arguments from what seems sensible to different people don’t go anywhere.

                2. 16

                  To rewrite the Linux kernel in Rust would take months (even if you had all hands on deck).

                  Months? It would take at least 10 years, regardless of headcount.

                  I’ve learned how to avoid them pretty effectively, and I should be able to expect the same from kernel devs.

                  I’m impressed with your abilities, but then something nags me about the order-of-magnitude mistake in your rewrite estimate. Hmm.

                  1. 13

                    It’s not difficult to use C correctly. Don’t blame your vulnerabilities on C when the real culprit is your own sloth. I’ll concede that C (and its API) has quite a few foot guns, but I’ve learned how to avoid them pretty effectively, and I should be able to expect the same from kernel devs. The whole “rewrite everything in <insert promising new lang here>” mentality doesn’t work for large projects (like kernels). To rewrite the Linux kernel in Rust would take months (even if you had all hands on deck). And, who’s to say that Rust wouldn’t change incompatibly three times in the middle?

                    I suggest you read the linked article first. The title is clickbait, but the content is solid. No one even mentioned Rust or anything else… The guy talks about their effort to reduce the foot guns in the kernel code…

                    Here is a quote for the lazy:

                    Kees Cook gave a presentation on some of the dangers that come with programs written in C. In particular, of course, the Linux kernel is mostly written in C, which means that the security of our systems rests on a somewhat dangerous foundation. But there are things that can be done to help firm things up by “Making C Less Dangerous” as the title of his talk suggested.

                    1. 4

                      I suggest you read the linked article first.

                      Ok, you got me, I only skimmed the article and I didn’t see any mention of rewrite until the comments (it was literally the first response to the second comment). Although I do hear that mentality about other large projects (such as Firefox) as well. I guess I should’ve said “Clickbait considered harmful” ;-)

                      I’ve read some more of the article and he seems to know what he’s talking about but I would like to see the original talk.

                      As far as reducing foot guns goes, I guess Linux did start out as just one guy, so I can understand a lot of foot-shooting, but it’s been years and I would’ve thought that things like VLAs would’ve been avoided in the kernel. Then again, I’ve never worked on a project as large as Linux, so I guess I’m not the best judge of such things.

                      1. 4

                        Ok, you got me, I only skimmed the article and I didn’t see any mention of rewrite until the comments (it was literally the first response to the second comment). Although I do hear that mentality about other large projects (such as Firefox) as well.

                        Agreed. It’s annoying as hell, and the loud-mouths never do the work.

                        I guess I should’ve said “Clickbait considered harmful” ;-)

                        Funny, because the talk is titled ‘Making C Less Dangerous’; the LWN reporter is actually responsible for the horrible title that misrepresents the content and invites rewrite talk. I think this is the first time I’m using the lobste.rs ‘suggest a title’ option to rename the link to ‘Making C Less Dangerous’, disrespecting the reporter’s chosen title. This is an abstract of the talk, so the title should stay close to the content.

                    2. 7

                      Literally 20+ years of unending computer security exploits disagree with you.