1. 48
    1. 42

      In case anyone wants to cross-check, out of the 23 curl CVEs in 2016, at least 10 (1, 2, 3, 4, 5, 6, 7, 8, 9, 10) are due to C’s manual memory management or weak typing and would be impossible in a memory-safe, strongly-typed language. (Note that, while I like Rust and it seems to have been the motivator for this post, many modern languages meet this bar.) While “slightly more than half” as non-C-related vulnerabilities may technically be “most”, I’m not sure it’s fitting the spirit of the term.

      There are some very compelling advantages to C, certainly, which the author enumerates; in particular, its portability to nearly every platform in existence is something Rust (and, to the best of my knowledge, any other competitor) cannot match at the moment. But it’s very important to note that nontrivial C code practically always contains serious vulnerabilities, and nothing we’ve tried (especially “code better”, the standard advice for avoiding C vulnerabilities) works to prevent them. We should be conscious that, by writing C, we are trading away security in favor of whatever benefits C provides at that moment.

      edit: It’s worth noticing and noting, as I failed to, that 2016 was an unusual year for curl vulns. /u/amaurea on Reddit helpfully counted and cataloged all the vulns on that page, and 2016 is an obvious outlier for raw count, strongly suggesting an audit or new static analysis tool or something. However, the proportion of C to not-C bugs is not wildly varied over the entire list, so the point stands.

      1. 9

        […] 2016 is an obvious outlier for raw count, strongly suggesting an audit or new static analysis tool or something.

        It was an audit.

      2. 5

        especially “code better”, the standard advice for avoiding C vulnerabilities

        If the curl codebase is as bad as its API then this is honestly a completely fair response.

        We had this code recently:

        int status;
        void * some_pointer;
        curl_easy_getinfo( curl, CURLINFO_RESPONSE_CODE, &status );
        

        which trashes some_pointer on 64bit Linux because curl_easy_getinfo( CURLINFO_RESPONSE_CODE ) takes a pointer to a long and not an int. The compiler would normally warn about that, but curl_easy_getinfo is a varargs function, which brings no benefits and means the compiler can’t check the types of its arguments. WTF seriously? Why would you do that??
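
        To make the failure mode concrete, here is a minimal sketch that does not use libcurl at all. `fake_getinfo` and `FAKE_INFO_RESPONSE_CODE` are hypothetical stand-ins for a varargs getter that stores a `long` through whatever pointer it is handed, and `get_response_code` is an assumed typed wrapper, not part of any real API. The point is only that a fixed prototype gives the compiler something to check:

        ```c
        #include <assert.h>
        #include <stdarg.h>
        #include <stddef.h>

        /* Hypothetical info code, standing in for CURLINFO_RESPONSE_CODE. */
        enum { FAKE_INFO_RESPONSE_CODE = 1 };

        /* Hypothetical varargs getter: the compiler cannot check what the
         * caller passes after `info`, so it trusts the pointer to be long *. */
        static int fake_getinfo(int info, ...)
        {
            va_list ap;
            int rc = -1;

            va_start(ap, info);
            if (info == FAKE_INFO_RESPONSE_CODE) {
                long *out = va_arg(ap, long *);
                *out = 200L; /* writes sizeof(long) bytes, 8 on 64-bit Linux */
                rc = 0;
            }
            va_end(ap);
            return rc;
        }

        /* Typed wrapper: passing an int * here draws a compiler diagnostic
         * instead of silently overwriting a neighbouring variable. */
        static int get_response_code(long *out)
        {
            assert(out != NULL);
            return fake_getinfo(FAKE_INFO_RESPONSE_CODE, out);
        }
        ```

        Calling `get_response_code(&status)` with `long status` works; with `int status` the compiler complains about the incompatible pointer type. For the snippet above, the fix is to declare `status` as `long`.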

        I also recall reading somewhere that curl is over 100k LOC, which is insane. If the HTTP spec actually requires the implementation to be that large (and it wouldn’t surprise me if it does), then you are free to, and absolutely should, just not implement all of it. If the spec is so unwieldy that nobody could possibly get it right, then why try? Implement a sensible subset and call it a day.

        If you know you’re not going to be using many HTTP features, it’s not hard to implement it yourself and treat anything that isn’t part of the tiny subset you chose as an error. For example, it’s only a few hundred lines to implement synchronous GET requests with non-multipart responses and timeouts, and that’s often good enough.

        1. 5

          I also recall reading somewhere that curl is over 100k LOC, which is insane. If the HTTP spec actually requires the implementation to be that large (and it wouldn’t surprise me if it does), then you are free to, and absolutely should, just not implement all of it.

          curl supports a lot more protocols than just http though.

          1. 3

            Indeed. From the man page.

            curl is a tool to transfer data from or to a server, using one of the supported protocols (DICT, FILE, FTP, FTPS, GOPHER, HTTP, HTTPS, IMAP, IMAPS, LDAP, LDAPS, POP3, POP3S, RTMP, RTSP, SCP, SFTP, SMB, SMBS, SMTP, SMTPS, TELNET and TFTP).

            1. 1

              damn, that’s a juicy attack surface

        2. 3

          CURL is highly compatible with a lot of the strange behaviors that browsers do support and are usually outside of (or even prohibited by) the spec/standard. Just implementing the spec doesn’t quite make it useful to the world, when the world isn’t even spec compliant. Even if you write down the standard, the real standard is what all the other browsers do, not what a piece of paper says.

          Example: https://github.com/curl/curl/issues/791

          1. 1

            But it is useful even if you only implement a tiny subset of HTTP, because most use cases involve sending trivial requests to sensible servers.

            1. 3

              The point is that cURL isn’t a project that supplies that subset, regardless of it being useful or not. cURL supplies a complete and comprehensive package that runs pretty much anywhere and supports pretty much any protocol you might need at some point (and some you might not need).

              Nothing wrong in making a slimmed down works-most-of-the-time-and-will-be-enough-for-most-people project, it might be very useful indeed, but that’s not the goal of the cURL project. There’s space for both.

            2. 1

              This is the way. Start small. I would assume that 90% of the use cases for curl are just some simple HTTP(S) queries, and that can be implemented in any language quite quickly.

              For example, D currently has curl in its standard library, which will probably be deprecated and removed. For simple HTTP(S) queries, there is requests, which is pure D except for the ssl and crypto stuff.

      3. 8

        nothing we’ve tried works to prevent them

        Formal verification actually works. seL4 exists.

        1. 10

          Verifying seL4 took a few years and it was roughly 10000 LoC. Curl has an order of magnitude more. 113316 as counted by sloccount on the Github repo right now. Verification is getting easier, but only very slowly.

          There is no immediate commercial advantage since curl works fine. This leaves it to academia to get the ball rolling.

          1. 4

            Verifying seL4 took a few years and it was roughly 10000 LoC.

            Formally verifying 15,000ish lines of Haskell-generated C in seL4 took ~200,000 lines of proof, actually, per this. Formally verifying all of curl would easily run into the millions of lines of proof – and you’d basically be rewriting it into C-writing Haskell to boot.

        2. 3

          seL4 has two versions, a Haskell version that’s used to verify model safety and a C version that’s just a translation of the Haskell version. It may actually be a bit of a counter-example to your claim (that formal verification on C works in practice).

          1. 1

            This is incorrect. The seL4 project actually proved that the C version is equivalent to (technically, refines) the Haskell version. They then (semi-automatically) proved that the generated assembly is equivalent to (refines) the C, so they don’t need to rely on C compiler correctness.

      4. 2

        Yes but a lot of these are only published and fixed because curl is so widely used—and scrutinized. For example number 2 on your list:

        If a username is set directly via CURLOPT_USERNAME (or curl’s -u, --user option), this vulnerability can be triggered. The name has to be at least 512MB big in a 32bit system. Systems with 64 bit versions of the size_t type are not affected by this issue.

        Literally this doesn’t matter.

        Also, how would Rust prevent this? I’m pretty sure multiplication overflow happens in Rust too.

        1. 14

          Rust specifies that:

          1. If overflow happens, it is a “program error,” but is well-defined as two’s complement wrapping.
          2. In debug builds, overflow must be checked for and panic.

          In the future, if overflow checking is cheap enough, this gives us the ability to require it. Who knows when that’ll ever be :)

          Also note that this means it might lead to a logic error, but not a memory safety error. Just by making it defined helps a lot.

          1. 3

            Is there a formal or semi-formal Rust specification anywhere?

            1. 9

              Not quite yet; or at least, it’s not all in one place. While all those universities are working on formalisms, we’re not working hard to get one in place, since it’d have to take that work into account, which would mean throwing stuff out and re-writing it that way, I’d imagine.

              There is some work going on to make the reference (linking to nightly docs since some work has recently landed to split it up into manageable chunks) closer to a spec; there’s also been an RFC accepted that says before stabilization, we must have the reference up-to-date with the changes, but we have to backfill all the older ones. So currently, it’s always accurate but not complete.

              This area is well-specified though, in RFC 560 https://github.com/rust-lang/rfcs/blob/master/text/0560-integer-overflow.md (one RFC I refer to so often I remember its number by heart)

              1. 1

                Thank ye

          2. 2

            That’s neat! Still, I find it hard to believe anything would have coverage of all multiplication errors in allocations, even if it were written in Rust. If anyone can show me a single Rust project that deliberately trips the debug panic for multiplication errors during allocation in its unit tests, I’ll be impressed. But I’ll bet the only way to really be robust against this class of error is to use something like OpenBSD’s reallocarray. That’s equally possible in C and Rust.
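
            For reference, the reallocarray-style guard is small. The following is a sketch of the idea only, not OpenBSD’s actual implementation; `checked_mul` and `checked_alloc_array` are hypothetical names:

            ```c
            #include <assert.h>
            #include <errno.h>
            #include <stdint.h>
            #include <stdlib.h>

            /* Stores a * b in *out and returns 0, or returns -1 if the product
             * would not fit in size_t. */
            static int checked_mul(size_t a, size_t b, size_t *out)
            {
                assert(out != NULL);
                if (b != 0 && a > SIZE_MAX / b)
                    return -1;
                *out = a * b;
                return 0;
            }

            /* Hypothetical helper: refuse the request instead of letting the
             * size computation wrap around to a small number. */
            static void *checked_alloc_array(size_t nmemb, size_t size)
            {
                size_t total;

                if (checked_mul(nmemb, size, &total) != 0) {
                    errno = ENOMEM;
                    return NULL;
                }
                return malloc(total);
            }
            ```

            The division-based test does not depend on the width of `size_t`, so the overflow branch can be unit-tested on a 64-bit machine by passing counts near `SIZE_MAX`, without gigabytes of input.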

            1. 3

              I do have a few overflow tests in one of my projects, but not for that specifically: https://github.com/steveklabnik/semver-parser/blob/master/src/range.rs#L682

              We have pretty decent fuzzer support, seems like that might be something it would be likely to find.

              1. 2

                I guess that depends on how often you run your fuzzer on 32 bit systems long enough for it to accumulate gigabytes of input.

                The example here triggers after half a gig, but many of this class of bug would need more.

    2. 9

      Maybe the lesson here is more along the lines of: complicated protocols suck because they beget bugs.

    3. 6

      It’s been over a decade since I’ve programmed in C. I read through the source code for this project and now I remember what I hated about C.

      There’s a ton of laziness in this code about checking types. It’s not self-documenting at all. There’s a lot of if (!timeout), if (!ptr), if (!...) where that ... could be an int, a pointer, a char, or any number of other things. Then you see a lot of commits trying to fix bad assumptions that those values would be 0. For example, in the timeout case the code initially only checked whether it was 0 or nonzero. Then later there was a commit to take into account the fact that the timeout could be negative.

      I have no idea what this function does but in any sane modern language that function would return a type that you could switch over if the expected behaviors are for timeout < 0, timeout == 0, and timeout > 0.
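
      Even in C you can get part of the way there. A rough sketch, with hypothetical names (`timeout_kind`, `classify_timeout`, `describe_timeout`) and no claim about what the three cases mean in curl itself:

      ```c
      #include <assert.h>

      /* Hypothetical names; the three cases mirror the ones listed above. */
      enum timeout_kind {
          TIMEOUT_NEGATIVE,
          TIMEOUT_ZERO,
          TIMEOUT_POSITIVE
      };

      static enum timeout_kind classify_timeout(long timeout_ms)
      {
          if (timeout_ms < 0)
              return TIMEOUT_NEGATIVE;
          if (timeout_ms == 0)
              return TIMEOUT_ZERO;
          return TIMEOUT_POSITIVE;
      }

      /* A switch over the enum with no default lets compilers warn
       * (e.g. GCC's -Wswitch) when a case is missing, which a bare
       * `if (!timeout)` never can. */
      static const char *describe_timeout(long timeout_ms)
      {
          switch (classify_timeout(timeout_ms)) {
          case TIMEOUT_NEGATIVE: return "negative";
          case TIMEOUT_ZERO:     return "zero";
          case TIMEOUT_POSITIVE: return "positive";
          }
          assert(0 && "unhandled timeout_kind");
          return "unknown";
      }
      ```

      The caller is then forced to name each case, so a later “what if it is negative?” fix becomes a compiler warning rather than a bug report.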

      The commit that changed

      else if('\\' == *ptr) {
      

      to this

      else if('\\' == *ptr && ptr[1]) {
      

      is only because of C’s uniquely bad string type
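
      My guess at why the extra `ptr[1]` test matters, as a standalone sketch rather than curl’s actual code (`unescaped_length` is a hypothetical helper): a backslash that is the last character before the terminating NUL must not be treated as the start of a two-character escape.

      ```c
      #include <assert.h>
      #include <stddef.h>

      /* Hypothetical helper: counts the characters in `s` after resolving
       * backslash escapes, where "\x" counts as the single character x. */
      static size_t unescaped_length(const char *s)
      {
          size_t n = 0;
          const char *ptr = s;

          assert(s != NULL);
          while (*ptr) {
              /* Without the ptr[1] test, a trailing backslash would make
               * the loop step over the terminating NUL and keep reading
               * past the end of the buffer. */
              if ('\\' == *ptr && ptr[1])
                  ptr += 2;
              else
                  ptr += 1;
              n++;
          }
          return n;
      }
      ```

      With a length-carrying string type the same mistake would be a bounds-check failure instead of a read past the end of the buffer.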

      There are literally hundreds of commits fixing memory leaks.

      My bet is that if you were to re-write curl in a newer language, you might miss out on some functionality but you wouldn’t have nearly as many security holes.

      1. [Comment removed by author]

        1. 10

          I’m super curious where you thought you were going with this? Some weird form of “since there’s no alternative, (this is the best way|stop bitching about it)”? Or maybe “well if it’s so bad, then you get in there and fix it!”?

          Are we not allowed to comment on bad source code simply because it’s widely used or because we haven’t written a similarly popular library? Objectively bad source code can and does exist. This code wasn’t written by just one person, it was written by many.

          Maybe you were genuinely asking for examples on other software. If so, you should know that your demand for examples comes off as vaguely hostile. And then to ask everyone who doesn’t understand you to “grow up” reinforces the perceived hostility in your original comment.

          My apologies if you’re already aware of all this, the last thing I want is a confrontation.

          1. 1

            I read an airy

            My bet is that if you were to re-write curl in a newer language, you might miss out on some functionality but you wouldn’t have nearly as many security holes.

            and I asked for backup on the implied comparison. It’s easy to assert that something that works is terrible compared to what you imagine it should be, but there are actual engineering reasons why such a high percentage of the internet machinery is written in C/C++ or Java and not in some “newer language”. What are the examples of solid, widely used systems components written in something newer or better, and what does the CVE track record of those examples look like? If you have them, that would be interesting. If you don’t, then the critique is lacking traction.

        2. [Comment removed by author]

    4. 6

      I think this is spot-on. I love Rust, and write it as often as I can, but sometimes members of the RIIR crowd can get a little carried away, and imo, actually damage the credibility of the Rust community by advocating for rewriting everything-but-the-kitchen-sink in Rust.

      1. 24

        The last large group running an “IN RUST WE TRUST” viral thingy was The Register. What should we do in this case? They are not “the Rust community”. There’s a notable bunch of people who never tried Rust in practice who root for this. Explaining to them that this doesn’t make any sense is the more annoying part of my work.

        You may note that the Rust project itself takes a very conscious non-aggressive stance on this issue (putting down other people’s work is generally frowned upon). We cannot do much more than this.

        Re-implementations are obviously an interesting thing, but Daniel puts it right: he’s the maintainer of curl, not of a curl-compatible library. He lays out the practical points very well. He also welcomes re-implementations in that post, which would be the very FOSS way.

        Also note that he literally mentions Rust once, in the intro, I don’t think that’s his beef.

        The kitchen sink, I’d rewrite though. Mine’s leaking again.

        1. 4

          You may note that the Rust project itself takes a very conscious non-aggressive stance on this issue (putting down other people’s work is generally frowned upon). We cannot do much more than this.

          It is a rough place to be in, and you have my condolences. :(

          There’s got to be some way of saying, though, “Hey, if you’re doing this, please please please knock it off.”

        2. 3

          I think my comment was poorly worded, in that I don’t think “the Rust community” has done anything wrong, or not-done something “they” should have done. My only fear is that the RIIR crowd, when viewed from outside the community, might give others cause to negatively stereotype “the Rust community,” though this would be unfair. I also agree that Daniel doesn’t seem to have any beef with Rust or the community, but I definitely was thinking of the RIIR crowd while reading Daniel’s post. This is my own fault, and I didn’t mean to ascribe any motivations to Daniel that weren’t actually there.

          1. 4

            Okay, I can see that. I must say that my biggest fear for Rust is it being pushed into too many places too soon, in a too cargo-culty manner. That might cost us dearly.

            Luckily, I know that most community members in positions where they could actually make such a push are really careful about that.

      2. -3

        I assume they’re advocating to rewrite the kitchen sink too.

    5. 3

      Pretty good article which I mostly agree with aside from his assertions about most of curl’s bugs not being C-centric.

      1. 1

        …but a disproportionate share of the CVE bugs are, FWIW.