1. 16
  1.  

  2. 22

    Clang and GCC both support type-checking the parameters to printf and related functions; you just have to enable the right warning flag (which I don’t have handy right now on my lawn chair). You can even enable this checking on your own functions that take format strings + varargs, by tagging them with an attribute.
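
    For the record, the flag is -Wformat (which -Wall enables), and the attribute is GCC/Clang’s format attribute. A minimal sketch of tagging your own logging function (the function name is illustrative):

    #include <stdarg.h>
    #include <stdio.h>

    // Tell the compiler that argument 1 is a printf-style format string and
    // that type-checking of the variadic arguments starts at argument 2.
    __attribute__((format(printf, 1, 2)))
    static void log_message(const char *fmt, ...)
    {
        va_list ap;
        va_start(ap, fmt);
        vfprintf(stderr, fmt, ap);
        va_end(ap);
    }

    int main(void)
    {
        log_message("%d items\n", 3);  // fine
        log_message("%s items\n", 3);  // -Wformat: '%s' expects 'char *', not 'int'
        return 0;
    }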

    This has been around for more than 10 years. Given how easy it is to make mistakes with printf, every C or C++ programmer should enable these warnings (and -Werror) if their compiler supports them.

    (Yeah, C++ has its own formatted output library, and I sometimes use it, but it generates very inefficient code.)

    1. 8

      -Wall -Werror. And when developing, -fsanitize=memory.

      1. 10

        I have nothing against -Werror for CI, or for development, or for anything like that.

        But please, please don’t use -Werror by default if you’re working on an open-source project where you want others to come and compile your code. Inevitably, a new compiler version comes around which introduces a new warning or makes an existing warning trigger in more situations. There are so, so many cases where I’ve cloned a project, then followed the build instructions exactly, only to realize that my compiler is slightly different than the one the author is using, and the build fails due to -Werror. It’s never fun to have to dive deep into an unknown build system to disable -Werror when you just want to compile something.

        -Werror for CI and for development. -Wno-error for the default debug and release builds. Please.

        Even if you’re the only developer and user, -Werror is probably gonna bite you while bisecting. Your 2 year old commits probably contain code which your current compiler will warn about.

        1. 0

          I mean you can just disable -Werror while you’re developing, no big deal.

          1. 3

            What do you mean? My main points are about the experience for a user who wants to compile your software, not the experience for the developer.

            1. 0

              I mean if they want to compile it then presumably they know their way around a Makefile or a config script. Users shouldn’t impose development practices on the project developers.

              1. 4

                What? The needs of your users shouldn’t affect development? That literally doesn’t make sense. You’re developing software for your users, no? Presumably all development should be influenced by the needs of your users, right?

                If you make software where no user is going to compile your software from source, and where you’re not looking for contributors, and won’t ever ask people to make a build with debug symbols enabled to get a stack trace, and you release binaries for every conceivable system, then sure, do whatever you want. But most open source projects will have some users who want to compile from source, and you should probably adapt the software to the needs of your users. Most software acknowledges this by putting build instructions prominently in the readme.

                You can at least put in the slightest amount of effort to make your software nicer to use for your users. If you care at all, avoid intentionally breaking their build by default. Please.

                1. 1

                  EDIT: btw, your point:

                  -Wno-error for the default debug and release builds.

                  Is super reasonable and I totally agree for release builds.

                  But for development…

                  The needs of your users shouldn’t affect development? That literally doesn’t make sense. You’re developing software for your users, no? Presumably all development should be influenced by the needs of your users, right?

                  The needs of users should influence the user experience of the software, not the development process of the software. If a user wants to compile from source, it would surely be easier if the compilation disabled safety checks like, say, the type-checker phase. But it would also be massively inconvenient for the developers. Like I said, when it comes to the development process, the needs of the developers outweigh the needs of users.

                  If you care at all, avoid intentionally breaking their build by default. Please.

                  Developers don’t intentionally break builds. At least, assuming a reasonable developer. What we don’t like is demands from users that would compromise our development practices. If a compiler warning breaks a build, the user of the open source software hopefully takes the time to file a ticket so the dev is aware and can fix it. Open source is a give-and-take, it’s not one side (developers) continuously caving to user demands.

            2. 1

              Use it for CI if you want it. Otherwise the point about compilers stands.

              1. 1

                No, it doesn’t, as I explained elsewhere in this thread. By default disabling safety checks during the normal development process just leads to more buggy and unmaintainable code over time.

                1. 1

                  They’re warnings for a reason. If you want to ensure your checked-in code has no warnings, put them in CI. But don’t burden your users.

                  1. 1

                    Don’t optimize for the niche cases. Users typically don’t compile software themselves, they install from pre-compiled packages or download pre-built executables. In the normal course of things it’s much more likely that packagers will be the ones actually compiling/setting up builds. And if their packaging system is any good, they will lock down the versions of things like C compiler as much as possible, to ensure reproducibility.

          2. 4

            I would make that -Wall -Wextra -Werror=format -Werror=switch, plus -Werror and -fsanitize=… for the developer build.

            -Wall is effectively the first warning level. There are also -Wextra and -Wpedantic.

            I would reserve -Werror for the developer build, unless you want to break the ability to compile your current software with future compilers (since compiler writers come up with new warnings all the time). In particular, if you have users, you don’t want them to have to debug this.

            The problem is of course false warnings – some warnings are pure noise in my opinion:

            if(size == argc) // -Wsign-compare
            

            Lastly, if you care about specific warnings (who doesn’t?), it’s safer (safe enough, I suppose) to turn just those into errors, such as with -Werror=format (which catches this) and -Werror=switch (which catches unhandled switch cases). Those are some quality warnings off the top of my head; there’s no reason to tolerate them once your code is written to that standard.
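
            To illustrate -Werror=switch: -Wswitch (part of -Wall) flags a switch over an enum that silently misses an enumerator, and -Werror=switch turns just that warning into an error. A minimal example:

            enum Color { Red, Green, Blue };

            const char *name(Color c) {
                switch (c) {          // -Wswitch: enumeration value 'Blue' not handled
                case Red:   return "red";
                case Green: return "green";
                }
                return "unknown";     // adding a default: would silence the warning,
            }                         // but would also hide future missing cases

            int main() { return name(Red) == nullptr; }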

            Also, -Werror=sometimes-uninitialized for clang.

            1. 2

              At least the clang policy is for -Wextra to include things that have a lot of false positives, whereas -Wall includes things that have very low false-positive rates. You should think very carefully about enabling it by default for two reasons:

              • Every false positive a developer encounters for a warning makes them more likely to just do the thing that silences the warning, rather than fix the issue.
              • Your codebase will include an increasing number of things that just silence warnings, which makes the code less readable and makes it easier for bugs to sneak in.
              1. 1

                Are you sure you don’t mean -Weverything? Apart from -Wsign-compare (which is in -Wall in g++ and -Wextra in gcc), which I would agree has a lot of false positives, I find -Wextra and even -Wpedantic quite livable.

                1. 4

                  At least the last time we tried to write a policy for this, the consensus was:

                  • -Wall is things that are unlikely to cause false positives and we’d like to recommend everyone uses.
                  • -Wextra is things with too high a false positive rate for -Wall.
                  • -Weverything is all warnings including experimental ones that may give completely nonsense results.

                  I wouldn’t recommend -Weverything for anyone who is not working on the compiler. As I recall, -Wsign-compare had quite a bit of discussion because it does have a fairly high false positive rate, but it also catches a lot of very subtle bugs and it encourages people to be more consistent about their use of types, so leads to better code overall.

                  1. 1

                    Ok. My take on -Wsign-compare is that it mixes up equality vs inequality comparisons. I wish it didn’t:

                    if(size < argc) // Bad: Result depends on signedness.
                    if(size == argc) // Ok: Result does not depend on signedness.
                    

                    Only one of these comparisons is problematic, yet -Wsign-compare complains about both! As such, half of this warning is legit; the other half never is, and the need to silence it is just detrimental.
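
                    To make the asymmetry concrete (a small self-contained example): with one int and one unsigned operand, the usual arithmetic conversions turn the signed value into an unsigned one before comparing.

                    #include <stdio.h>

                    int main(void) {
                        int size = -1;
                        unsigned argc = 1;

                        if (size < argc)         // -1 converts to UINT_MAX, so this is false
                            puts("smaller");
                        else
                            puts("not smaller"); // printed, which is rarely what was meant

                        if (size == argc)        // still false after the conversion: no surprise
                            puts("equal");
                        return 0;
                    }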

                    You’re right, I wouldn’t use -Wextra: I would use -Wextra -Wno-sign-compare, because that’s the unreasonable part.

                    1. 2

                      The second is not, by itself, a problem, but it can depend on a cast that might be a problem (you must have constructed size from something, and what happens if that size is in the range that is negative for int? Is your logic still correct in this case?). It can also mask problems on some platforms because of the exciting integer promotion rules in C that are invoked here. Unfortunately, C and POSIX APIs are incredibly inconsistent about when they use signed and unsigned values (argc really should be a size_t, but that would break backwards compatibility[1]).

                      C also doesn’t have a good way of asking the question that you really want to ask, which is ‘are both of these values contained in the range that can be expressed in both types and, if so, are they identical?’. The integer promotion rules correctly handle that if there is a type that can express the range of both, but fails with annoying corner cases where there isn’t one. You really want a 31- or 63-bit unsigned type in a lot of cases so that you have something that can be unambiguously cast to a signed or unsigned 32- or 64-bit integer.

                      [1] I’d love for C++ to define a new entry point that provided a pair of iterators for the arguments and have the compiler synthesise the prologue in main, but that’s probably never going to happen.

                  2. 4

                    -Wextra has warnings for a lot of things which don’t necessarily indicate a problem. The biggest example of this is the unused parameter warning, which warns when a function has an unused parameter.

                    Sometimes, an unused parameter indicates a problem, sure. But a lot of the time, the parameters have to exist because the function has to implement some particular signature to be used as a function pointer (or to override a virtual method in C++), but doesn’t need to use one of the arguments. In C++, a common pattern is to comment out the name of the parameter - something like, void foo(int /*count*/, int *whatever) { ... }, but that’s illegal syntax in C.
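
                    A small illustration of that situation (the callback type and names here are made up):

                    // The signature is fixed by the callback type, so the parameter must
                    // exist even if this particular handler never reads it.
                    typedef void (*event_handler)(int event_id, void *user_data);

                    // C++ allows omitting (or commenting out) the parameter name, which
                    // also avoids -Wunused-parameter; C (before C23) requires the name.
                    void on_quit(int /*event_id*/, void *user_data) {
                        (void)user_data; // placeholder body
                    }

                    int main() {
                        event_handler h = on_quit; // must match the full signature
                        h(0, nullptr);
                    }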

                    I find things like -Wsign-compare to be a bit annoying sometimes, but mostly reasonable. It could possibly catch errors, and it’s easy enough to do the cast (or use the right signedness in for loops). But it’s certainly another example of something which has a lot of false positives.

                    I often use -Wall -Wextra -Wpedantic -Wno-unused-parameter in my makefiles.

                    1. 3

                      You can avoid that warning in C too. I think the commonest way is to cast it to void:

                      void api(int unused) {
                          (void)unused;
                      }
                      
                      1. 3

                        You can silence most of the warnings. The question is whether the extra noise in the source code, by lowering overall readability, is going to cause more bugs than the warning prevents.

              2. 1

                Except -Wall doesn’t turn on all warnings, just “a lot”, and in my experience omits some I consider important. (I’m sure this is compiler- and maybe version-dependent.)

                IIRC, -Wpedantic turns on “too many” warnings in Clang but “about right” in GCC.

                In Xcode, I use a curated xcconfig file that turns on nearly every warning; then in my own config file I turn off the ones that are so annoying I don’t use them, mostly super pedantic stuff like implicit sign conversions.

              3. 1

                (Yeah, C++ has its own formatted output library, and I sometimes use it, but it generates very inefficient code.)

                Not to mention terrible ergonomics.

                1. 6

                  In what way? The API is almost the same as printf, except that the format strings are in braces, the qualifiers can be a bit more readable, and it generates better code. It also provides a much richer API allowing things like compiling the format string so that you get compile-time type checking even in cases where the string and the arguments are provided in different places. You can also provide formatters for your own types that have the same type safety guarantees.
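
                  A short sketch of what that looks like with C++20’s std::format (the fmt library’s API is essentially the same):

                  #include <cstdio>
                  #include <format>

                  int main() {
                      // The format string is parsed at compile time and the argument types
                      // are checked against it, so a mismatch is a compile error, not UB.
                      std::puts(std::format("{} items, {:.2f} total", 3, 1.5).c_str());

                      // std::format("{:d}", 1.5);  // ill-formed: '{:d}' needs an integer
                  }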

                  1. 2

                    Wow, I thought they were talking about ostream. I’m not up to date on C++20; the last I used was C++17. This is cool! It seems just like Rust’s format! macro.

                    1. 2

                      C++20 has std::format, which is basically a stable subset of the fmt APIs. It’s sufficiently easy to add fmt to a project that I’d only use std::format for things that aggressively want to trim dependencies, though.

                      Completely agreed on ostream. It’s not amenable to localisation, it’s hard to use in a multithreaded context, and the APIs for actually doing the stringification are horrible (depending on pushing state into the stream to control formatting). The core bits of iostream are actually quite nice: separating the buffer management from the I/O interface, for example, but the high-level bits are awful. They’re over 20 years old at this point (they’re inherited from STL, before C++98’s standard library), so also predate multithreading as a common pattern.

              4. 7

                The quote from the C standard is not quite the right one. %d is a perfectly valid conversion specification. The problem arises with the sentence afterward:

                If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined.

                Also note that the cast syntax in this example is only valid in C++. If you try to compile that code as C, you’ll get an error. It’s a bit strange to me that the author used some ambiguous-looking C++-specific syntax to argue against writing C.

                That said, I agree that variadic functions can be very error-prone. You miss out on type-checking for the variadic arguments, so they should be only used sparingly and with extra caution.

                1. 11

                  The code, as presented, won’t compile under any actual C compiler. The line:

                  printf("%d\n",double(42));
                  

                  is not valid C syntax. The fact that he used #include <cstdio> tells me he used a C++ compiler, which may very well allow such dubious syntax (I don’t know, I don’t program in C++). This blog post reads to me as an anti-C screed by someone who doesn’t program in C.

                  1. 3

                    double(1) is a call to the double primitive type’s intrinsic constructor. It’s not really a type cast, though that’s effectively what happens (at least as I understand it).

                    1. 1

                      It’s completely legit syntax in C++.

                      The principle behind it is that user-defined types should generally be able to do everything primitive types can do, and to some extent vice versa. This is why C++ supports operator overloading, for example.

                      In this case allowing primitive types to use constructor-like syntax means you can write something like T(42) in a template and it will compile and do the expected thing whether T is a user-defined class or the primitive double type.

                      1. 3

                        Nobody said it was legitimate syntax in C++. They said it was illegitimate syntax in C.

                  2. 5

                    TLDR: Doing undefined behavior can cause unexpected (undefined?) results. Therefore: “don’t write C code”.

                    1. 8

                      It’s more: there is no way of expressing an API in C that does not involve undefined behaviour if you use it wrong; therefore, use a language that can express this kind of API in such a way that incorrect use is a static type-checking error.

                      It’s worth noting that C++11 is the first version of C++ in which it was possible to implement a type-safe printf.
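
                      A minimal sketch of the idea (names are illustrative; the placeholder is a bare ‘%’ just to keep it short): with C++11 variadic templates, each argument keeps its static type, so formatting is chosen by overload resolution instead of being trusted from the format string.

                      #include <iostream>

                      // Base case: no arguments left, print the rest of the string.
                      void safe_printf(const char *fmt) {
                          std::cout << fmt;
                      }

                      template <typename T, typename... Rest>
                      void safe_printf(const char *fmt, T value, Rest... rest) {
                          for (; *fmt; ++fmt) {
                              if (*fmt == '%') {
                                  std::cout << value;            // formatting picked from T's type
                                  safe_printf(fmt + 1, rest...); // handle the rest of the string
                                  return;
                              }
                              std::cout << *fmt;
                          }
                      }

                      int main() {
                          safe_printf("% plus % is %\n", 1, 2.5, "three point five");
                      }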

                    2. 3

                      Printf is actually perfectly type-safe if you’re using the right language :-)

                      1. 3

                        I think you meant this language.

                        1. 1

                          How type-safe is it? I only looked at it briefly without knowing Rust, and it seems to me they sidestep the issue by making it possible to plug any type into a printf? Or am I mistaken?

                          1. 3

                            Not a Rust programmer, but I know that format! is a macro. What could be happening is that at compile time the fmt string is parsed, and display-specific functions for the specifier take the positional argument. If that argument isn’t the right type for the specifier, it’s a compile-time type error.

                            My guess is that Rust doesn’t allow dynamic format strings, or if it does, it’s a runtime error, no different than OCaml, when the format is incompatible.

                            Would love to be corrected.

                            1. 3

                              What could be happening is that at compile time the fmt string is parsed, and display-specific functions for the specifier take the positional argument.

                              Good guess, this is exactly what happens!

                              My guess is that Rust doesn’t allow dynamic format strings

                              2/2, format strings must be static, and they’re checked at compile time.

                      2. 3

                        On my system, g++ by default prints

                        pr.cc: In function ‘int main()’:
                        pr.cc:4:35: warning: format ‘%d’ expects argument of type ‘int’, but argument 2 has type ‘double’ [-Wformat=]
                             printf("%d\n", double(2101253));
                        

                        And yes, I do use -Wall (even though not needed for this example), and -fsanitize, and C++ smart pointers. Bugs caused by unsafe memory access are pretty rare, and when they happen, I get verbose runtime error messages that make the problem easy to fix. It’s easy to exaggerate the difficulties of C++ memory unsafety if you don’t regularly use the language and know the idioms and tools.

                        1. 3

                          Then why do memory unsafety CVEs keep happening in the most heavily used C++ projects in the world? Are their developers idiots?