Could have? It’s been known since at least 1989 (okay, November of 1988) that gets() was a bad function, but it was probably too late to get it removed from the C89 standard. It was deprecated in 1999 (C99) and removed in 2011 (C11). The code in question was written in 2013! Whoever wrote the code did not bother reading any compiler warnings, or didn’t know C well enough to know that pointers (and arrays) don’t carry any bounds information. I find it amazing that, in this day and age, this went seven years without notice.
I think it has been known since about 1975 that gets() is bad*. Modern language communities no longer take 24 years to officially deprecate an API after it has been recognized as bad, and another 12 years to remove, so maybe that’s progress.
*Mike Lesk wrote the Portable I/O Library for C around 1972-1973. Stdio was in use by 1975; it was written as a replacement: much faster, not fully backwards compatible. As far as I can reconstruct, gets() was left in stdio to help migrate code from the old library to stdio (but I’m missing access to some historical documents that would verify this). fgets() used a deliberately different interface, because the buffer overflow problem of gets() was obvious even in 1975. K&R first edition (1978) documents stdio and fgets(), but leaves out gets(), providing source code for a getline() function instead that reads stdin using a safe interface.
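To see the interface difference concretely, here is a minimal sketch (illustrative only, not code from the thread): fgets() is told the buffer size and stops before overrunning it, while gets() has no way to be told how large the buffer is.

    #include <stdio.h>

    int main(void)
    {
        char line[64];

        /* gets(line);  no size parameter, so input longer than 63
           characters would overflow line[]                         */

        /* fgets() takes the size and never writes past it; the
           trailing '\n' is kept if it fit in the buffer */
        if (fgets(line, sizeof line, stdin) != NULL)
            printf("read: %s", line);

        return 0;
    }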
Modern language communities don’t have to deal with OS vendors, because they have their own packages and package management, and modern language communities can be that independent because of the Internet. You can see the beginnings of this in how gcc and GNU libc spread to become de-facto standards even at sites where the OS maker didn’t package them, because they were an FTP away.
What event are you referring to here?
The Morris Worm was the first malware to attack the Internet, using a gets() buffer overflow in fingerd to take control of thousands of Unix systems. Read about it here (Gene Spafford’s Nov 1988 analysis): https://spaf.cerias.purdue.edu/tech-reps/823.pdf
Not really, that example has been part of uthash since its first release, so it was written by Troy D. Hanson somewhere between 2004 and 2006.
Moreover I have no doubt that Troy knew his code wasn’t safe. But it was a simple example in a manual, so he probably didn’t care.
(Courtesy of The Unix Hater’s Handbook).
If this isn’t a poster child for just how bad C really is, I don’t know what is.
C and C++ are IMO super dangerous — on the level of using a chainsaw without gloves or eye protection — unless you at least turn on -Wall -Werror. In my projects I turn on a few dozen more Clang warning flags too, and my debug builds always have the Address Sanitizer and Undefined Behavior Sanitizer enabled. Then it’s a pretty reasonable experience where stupid mistakes are usually caught at compile time or in unit tests.
It would be nice if all those warnings were on by default and made into errors, but I guess it would break too much old code…
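For illustration, one plausible way to set that up for a debug build (the exact flag list above isn’t given, so these are just common Clang/GCC flags, not the commenter’s):

    clang -std=c11 -Wall -Wextra -Werror \
          -g -O1 -fsanitize=address,undefined \
          main.c -o main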
There’s another solution that’s conformant and doesn’t require any new language features:
#define gets (sizeof(struct{char _;_Static_assert(0,"can't use gets");}),gets)
(This protects you from code that creates pointers to gets, which the naïve _Static_assert solution doesn’t.)
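For example, putting that define after #include <stdio.h>, both of the following uses trip the assertion at compile time (the snippet is deliberately a compile error):

    #include <stdio.h>
    #define gets (sizeof(struct{char _;_Static_assert(0,"can't use gets");}),gets)

    int main(void)
    {
        char buf[16];
        gets(buf);                  /* error: static assertion failed: "can't use gets" */
        char *(*p)(char *) = gets;  /* also an error: merely naming gets expands the macro */
        return 0;
    }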
FreeBSD went a step further and removed the function entirely, so you can’t even link against it.
In fact, you should use -Wall -Wextra (on GCC or Clang) or -W4 (on MSVC) and fix everything it complains about.
Disagree. Many warnings are trivial, and which ones are valid can depend on project idiosyncrasies. (That being said, warnings about implicit declarations are universally valid.)
Git has something like this, which also doesn’t require any new language features: https://github.com/git/git/blob/master/banned.h
Yes; my solution is slightly nicer, though, as it doesn’t inhibit static analysis or error recovery. It also prevents you from taking pointers to banned functions, which git doesn’t.
Git prevents taking the address of the function, doesn’t it? If you write, say, void *ptr = &strcat, that will expand to void *ptr = &sorry_strcat_is_a_banned_function, which will error (with a reasonably nice error message).
No. The definition is #define strcat(x,y). It only matches function-like macro invocations, not others.
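A paraphrased sketch of that style of definition (not the verbatim banned.h) shows why call sites are caught but plain references slip through: a function-like macro only expands when the name is followed by an opening parenthesis.

    #include <string.h>

    #define BANNED(func) sorry_##func##_is_a_banned_function
    #undef strcat
    #define strcat(x,y) BANNED(strcat)

    void demo(char *dst, const char *src)
    {
        /* strcat(dst, src);   would expand to sorry_strcat_is_a_banned_function,
           an undeclared identifier, so it fails to build                        */

        char *(*p)(char *, const char *) = &strcat;  /* no '(' after the name, so no
                                                        expansion: this still compiles */
        (void)dst; (void)src; (void)p;
    }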
You’d need to add args to the macro, otherwise it’s not backwards-compatible:
char *gets = "gnu ets";  /* with the object-like #define, even this declaration expands and breaks */
Replacing a function with a macro has other side effects. In this case they’re probably just fine, but I’ve been burned when I’ve tried to patch up the API of my library with macros:
#undef of the macro starts affecting the behavior
The preprocessor doesn’t understand inline initializers like (int[]){1,2,3} and parses it as (int[]){1 followed by two other macro arguments.
Good point…not sure if there’s a good solution there.
designated initializers
You can paper this over with an extra set of parentheses, as in ((int[]){1,2,3}). It’s not always practical, but sometimes I do a variadic macro and parenthesize __VA_ARGS__ before passing the result off to another macro.
Reads like a feature to me.
Of course blowing up at compile time would have been even better.
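To make the brace-versus-parenthesis point concrete, a small illustration (WRAP and WRAP2 are made-up macros, not anything from the thread):

    #include <stdio.h>

    static void use(const int *a) { printf("%d\n", a[0]); }

    #define WRAP(x)    use(x)
    #define WRAP2(...) use((__VA_ARGS__))   /* re-parenthesize before passing on */

    int main(void)
    {
        /* WRAP((int[]){1,2,3});     error: the preprocessor sees three arguments,
                                     "(int[]){1", "2" and "3}"                     */

        WRAP(((int[]){1,2,3}));      /* OK: the extra parentheses make it one argument */
        WRAP2((int[]){1,2,3});       /* OK: __VA_ARGS__ swallows the commas            */
        return 0;
    }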
Another option could have been to redefine gets to write at most 1 character. The buffer is guaranteed to be at least 1 char large, so that would make gets safe. That would even be backwards-compatible for applications that just read “continue Y/N?” :)
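The snippet the next two comments refer to isn’t shown in the thread, so here is only a sketch of the idea, with assumed names (the bool and the mid-block declaration of r are taken from those comments):

    #include <stdio.h>
    #include <stdbool.h>

    /* A gets() stand-in that consumes the whole line but writes at most
       one byte (the terminating '\0'), so any buffer of at least 1 char
       is safe. One could wire it in with e.g. #define gets(buf) gets_max1(buf). */
    char *gets_max1(char *buf)
    {
        int c = getchar();
        if (c == EOF)
            return NULL;            /* like gets(): NULL at end of input */
        bool r = false;             /* did we have to throw anything away? */
        while (c != '\n' && c != EOF) {
            r = true;
            c = getchar();          /* discard the rest of the line */
        }
        if (r)
            fputs("input truncated\n", stderr);
        buf[0] = '\0';              /* never store more than the terminator */
        return buf;
    }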
(Replace bool with int and it would be perfectly valid C89 code.)
Minor nitpick, but you’d also have to move the declaration of r to the start of the block. Being able to declare variables anywhere in C99 is a significant change from C89.