1. 36

  2. 8

    The last two paragraphs are really important.

    Be knowledgeable about what’s actually in the C and C++ standards since these are what compiler writers are going by. Avoid repeating tired maxims like “C is a portable assembly language” and “trust the programmer.”

    Unfortunately, C and C++ are mostly taught the old way, as if programming in them isn’t like walking in a minefield. Nor have the books about C and C++ caught up with the current reality. These things must change.

    There’s no excuse for invoking UB anymore. Either you have a use case that absolutely requires the ridiculous features of C and C++ (in which case you better be damn sure you’re not stepping on a mine), or you’re performing malpractice by using an inappropriate language.

    1. 20

      I think a lot of people don’t understand just how easy it is to write code with undefined behavior in it. Because C and C++ programmers often aren’t taught about it when learning the language, when they do hear about it, they think “oh, that’s something that other people’s code has to deal with. I write good code, so this doesn’t affect me. If it did, I would have been taught it when I learned the language.”

      This is why I value John Regehr’s work so much. He’s been beating the drum of UB as a danger to take seriously for a while, and I’m really happy to see his stuff getting more traction in the programming community.

      I’m going to be teaching a class on C programming soon, and I’ll absolutely put UB front and center in it; maybe with people like John Regehr pushing for better education and better tooling, more people will do the same.
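
      To make the "how easy it is" point concrete, here is a minimal sketch of an overflow check that looks reasonable but invokes UB, next to a well-defined version. The function names are made up for illustration.

      ```c
      #include <limits.h>
      #include <stdbool.h>

      /* Looks like a sensible "would x + 1 overflow?" check. But when
         x == INT_MAX, evaluating x + 1 is signed overflow, which is
         undefined behavior. A compiler may assume that never happens
         and reduce the whole function to "return false". */
      bool will_overflow_broken(int x) {
          return x + 1 < x;
      }

      /* Well-defined alternative: compare against the limit before
         doing any arithmetic, so no overflow is ever evaluated. */
      bool will_overflow(int x) {
          return x == INT_MAX;
      }
      ```

      Nothing in the broken version looks suspicious at first glance, which is rather the problem. Building with `-fsanitize=undefined` on GCC or Clang should flag it at runtime if it is ever called with `INT_MAX`.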

      1. 1

        What resources do you guys recommend to learn C?

        1. 2

          I think anyone trying to learn the language properly needs, at a very minimum, the (draft) standards. There are far too many books, tutorials, etc. that just gloss over important details or make grossly misleading simplifications.

          Then read (good) man pages for library functions. Read code. Write code. Lots of code. Keep an eye on some projects and watch as bugs get fixed. Try to understand the bug and the fix.

          A book might or might not help. I’ve only read one book on C – Expert C Programming. It was an ok read but I didn’t learn too much from it. A newcomer might find it more useful however.

          EDIT: taossa ch6 is also a great read for anyone doing C. http://ptgmedia.pearsoncmg.com/images/0321444426/samplechapter/Dowd_ch06.pdf

          Alternative link: https://trailofbits.github.io/ctf/vulnerabilities/references/Dowd_ch06.pdf

      2. 5

        I could agree with that, but there’s a lot of code written 20 years ago or more. And it’s not so easy to simply use an old compiler. Want to run arm64? Get a new compiler. And the arm64 platform would be rather less interesting if you simply banned all old software.

        1. 1

          That’s a great point about old software being used on new hardware. In such cases, would it be better to use a compiler with defined (and safe) behavior in all UB? (Presumably the performance of the new hardware is better than the old hardware it was originally written for, so you could afford things like symbolic addresses, bounds checking, “gcc-x86-like” overflow, etc.)

          It still seems like it would be better to fix the old software so it doesn’t invoke UB, but we all agree how hard that is.
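
          For the overflow part specifically, here is a rough sketch of what "defined" behavior could look like when written by hand in portable C: one helper that wraps the way x86 hardware does, and one that refuses to overflow at all. Both names are hypothetical, and the wrapping helper assumes `UINT_MAX == 2 * INT_MAX + 1` (no padding bits), which holds on mainstream platforms.

          ```c
          #include <limits.h>
          #include <stdbool.h>

          /* Two's-complement wrapping add with no UB: unsigned arithmetic
             wraps by definition, and the result is mapped back to int by
             hand to avoid an implementation-defined conversion.
             Assumption: UINT_MAX == 2 * INT_MAX + 1. */
          int wrapping_add(int a, int b) {
              unsigned int sum = (unsigned int)a + (unsigned int)b;
              if (sum <= (unsigned int)INT_MAX) {
                  return (int)sum;
              }
              /* sum encodes a negative value; UINT_MAX - sum fits in int. */
              return -(int)(UINT_MAX - sum) - 1;
          }

          /* Checked add: returns false instead of overflowing, and only
             writes *out when the true sum is representable. */
          bool checked_add(int a, int b, int *out) {
              if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
                  return false;
              }
              *out = a + b;
              return true;
          }
          ```

          GCC and Clang also accept `-fwrapv`, which gives signed overflow wrapping semantics for the whole translation unit, so old code gets the first behavior without being rewritten. That covers only one kind of UB, though.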

          1. 2

            There’s quite a lot of UB in C, much of it designed so that different platforms can each use their “native” support for various things like addressing modes, alignment, etc. If you really wanted to define all behavior to be essentially equivalent to what gcc on x86 traditionally does, this would amount to a pretty significant runtime layer on non-x86 platforms emulating x86-like behavior. At which point why even use C, not something that actually has a portable runtime and/or bytecode defined from the start? I guess purely for legacy software it’d make sense.
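
            Alignment is a good example of that kind of platform-driven UB. A minimal sketch (function names are illustrative only): the cast version tends to work on x86, but it is undefined when the pointer is not suitably aligned for `uint32_t` and may trap on stricter hardware, while the `memcpy` version is defined everywhere.

            ```c
            #include <stdint.h>
            #include <string.h>

            /* Undefined behavior if p is not suitably aligned for uint32_t
               (the cast also breaks the aliasing rules). x86 hardware
               tolerates the misaligned load; other targets may fault. */
            uint32_t load_u32_cast(const unsigned char *p) {
                return *(const uint32_t *)p;
            }

            /* Well-defined for any p that points at four readable bytes.
               The result uses the host's byte order. Compilers typically
               turn this into a single load where the target allows it. */
            uint32_t load_u32(const unsigned char *p) {
                uint32_t value;
                memcpy(&value, p, sizeof value);
                return value;
            }
            ```

            Making the cast version "just work" on every target is exactly the sort of emulation layer described above, whereas the `memcpy` version lets each platform use whatever it natively supports.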

            1. 1

              “That’s a great point about old software being used on new hardware. In such cases, would it be better to use a compiler with defined (and safe) behavior in all UB?”

              There are so many people proposing this that eventually one might start to believe it is actually a thing.

              But I have not seen such a compiler and I doubt I ever will.