1. 27
  1.  

  2. 17

    Could have? It’s been known since at least 1989 (okay, November of 1988) that gets() was a bad function, but it was probably too late to get it removed from the C89 standard. It was deprecated in 1999 (C99) and removed in 2011 (C11). The code in question was written in 2013! Who ever wrote the code did not bother reading any compiler warnings, or didn’t know C well enough to know that pointers (and arrays) don’t include limits. I find it amazing in this day and age that this went seven years without notice.

    1. 9

      I think it has been known since about 1975 that gets() is bad*. Modern language communities no longer take 24 years to officially deprecate an API after it has been recognized as bad, and another 12 years to remove, so maybe that’s progress.

      *Mike Lesk wrote the Portable I/O Library for C around 1972-1973. Stdio was in use by 1975; it was written as a replacement: much faster, not fully backwards compatible. As far as I can reconstruct, gets() was left in stdio to help migrate code from the old library to stdio (but I’m missing access to some historical documents that would verify this). fgets() used a deliberately different interface, because the buffer overflow problem of gets() was obvious even in 1975. K&R first edition (1978) documents stdio and fgets(), but leaves out gets(), providing source code for a getline() function instead that reads stdin using a safe interface.

      1. 4

        Modern language communities don’t have to deal with OS vendors, because they have their own packages and package management, and modern language communities can be that independent because of the Internet. You can see the beginnings of this in how gcc and GNU libc spread to become de-facto standards even at sites where the OS maker didn’t package them, because they were an FTP away.

      2. 4

        November of 1988

        What event are you referring to here?

        1. 6

          The Morris Worm was the first malware attacking the internet, using a gets() buffer overflow in fingerd to take control of thousands of Unix systems. Read about it here (Gene Spafford’s Nov 1988 analysis): https://spaf.cerias.purdue.edu/tech-reps/823.pdf

        2. 2

          The code in question was written in 2013!

          Not really, that example was always part of uthash since its first release so it was written by Troy D. Hanson somewhere between 2004 and 2006.

          Moreover I have no doubt that Troy knew his code wasn’t safe. But it was a simple example in a manual so he probably didn’t care.

        3. 16

          Q. Where did the names “C” and “C++” come from?

          A. They were grades.

          (Courtesy of The Unix Hater’s Handbook).

          If this isn’t a poster child for just how bad C really is, I don’t know what is.

          1. 4

            C and C++ are IMO super dangerous — on the level of using a chainsaw without gloves or eye protection — unless you at least turn on -Wall -Werror. In my projects I turn on a few dozen more Clang warning flags too, and my debug builds always have the Address Sanitizer and Undefined Behavior Sanitizer enabled. Then it’s a pretty reasonable experience where stupid mistakes are usually caught at compile time or in unit tests.

            It would be nice if all those warnings were on by default and made into errors, but I guess it would break too much old code…

          2. 6

            There’s another solution that’s conformant and doesn’t require any new language features:

            #define gets (sizeof(struct{char _;_Static_assert(0,"can't use gets");}),gets)
            

            (This protects you from code that creates pointers to gets, which the naïve _Static_assert solution doesn’t.)

            Freebsd went a step further and removed the function entirely; so you can’t even link against it.

            In fact, you should use -Wall -Wextra (on GCC or Clang) or -W4 (on MSVC) and fix everything it complains about

            Disagree. Many warnings are trivial, and which ones are valid can depend on project idiosyncrasies. (That being said, warnings about implicit declarations are universally valid.)

            1. 3

              Git has something like this, which also doesn’t require any new language features: https://github.com/git/git/blob/master/banned.h

              1. 2

                Yes; my solution is slightly nicer, though, as it doesn’t inhibit static analysis or error recovery. It also prevents you from taking pointers to banned functions, which git doesn’t.

                1. 2

                  Git prevents taking the address of the function doesn’t it? If you write, say, void *ptr = &strcat, that will expand to void *ptr = &sorry_strcat_is_a_banned_function, which will error (with a reasonably nice error message).

                  1. 2

                    No. The definition is #define strcat(x,y). It only matches function-like macro invocations, not others.

              2. 2

                You’d need to add args to the macro, otherwise it’s not backwards-compatible:

                char *gets = "gnu ets";
                

                Replacing a function with a macro has other side effects. In this case they’re probably just fine, but I’ve been burned when I’ve tried to patch up API of my library with macros:

                • #undef macro starts affecting the behavior
                • The preprocessor doesn’t understand inline initializers like (int[]){1,2,3} and parses it as (int[]){1 followed by two other macro arguments.
                1. 2

                  shadowing

                  Good point…not sure if there’s a good solution there.

                  designated initializers

                  You can paper this over with an extra set of parentheses, as in ((int[]){1,2,3}). It’s not always practical to, but sometimes I do a variadic macro and parenthesize __VA_ARGS__ before passing the result off to another macro.

              3. 5

                Instead of giving the possibility of a buffer overflow, now it gives the certainty of a segfault!

                Reads like a feature to me.

                Of course blowing up at compile time would have been even better

                1. 5

                  Another option could have been to redefine gets to write at most 1 character. The buffer is guaranteed to be at least 1 char large, so that would make gets safe. That would even be backwards-compatible for applications that just read “continue Y/N?” :)

                  1. 4

                    (Replace bool with int and it would be perfectly valid C89 code.)

                    Minor nitpick, but you’d also have to move the declaration of r to the start of the block. Being able to declare variables anywhere in C99 is a significant change from C89.