1. 24
  1.  

  2. 16

    Don’t use macros.

    Why not? Macros can be very useful. For example, say I have a dispatch table to call functions with a common signature and set of local variables. If there are 30 different functions, a macro defining the function and declaring the common variables means that if something changes I only have to change it in one place. This is more than just an ease-of-coding thing: if I change from signed to unsigned, or change the width of an integer, and forget to change it in one place, there can be serious and hard-to-find consequences.
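
    As a concrete (and entirely hypothetical) sketch of that dispatch-table idea — the names DEFINE_HANDLER, handler_fn, and the two ops are invented here — one macro stamps out each function with the shared signature and shared locals:

    ```c
    #include <stdint.h>

    typedef int32_t (*handler_fn)(int32_t arg);

    /* One macro defines the function shape and the common locals, so a
     * signedness or width change happens in exactly one place. */
    #define DEFINE_HANDLER(name, body)                  \
        static int32_t name(int32_t arg)                \
        {                                               \
            int32_t result = 0;   /* common locals */   \
            int32_t scratch = arg;                      \
            body;                                       \
            return result;                              \
        }

    DEFINE_HANDLER(op_double, { result = scratch * 2; })
    DEFINE_HANDLER(op_negate, { result = -scratch; })

    static handler_fn dispatch[] = { op_double, op_negate };
    ```

    Changing int32_t to uint32_t inside DEFINE_HANDLER would update every handler at once.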

    Don’t use fixed-size buffers.

    Always use static, fixed-sized buffers allocated in the bss, if you can get away with it (that is, you know the maximum size at compile time). Allocation can fail at runtime, and adding checks everywhere for this is error-prone. If you’re allocating and freeing chunks of memory at runtime, you run the risk of use-after-free, reference miscounts, etc.

    If the size of a block isn’t known until runtime, but is known at startup, allocate the necessary memory at startup and free it at shutdown.

    Only as a last resort should you be doing allocation and freeing repeatedly during runtime, when the set of objects and their sizes depends on data only accessible while running.
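
    The middle option — size known at startup, allocated once, freed at shutdown — might look like this minimal sketch (app_startup, app_shutdown, and g_slots are invented names):

    ```c
    #include <stdlib.h>
    #include <stddef.h>

    static double *g_slots = NULL;
    static size_t g_nslots = 0;

    /* The one place in the program where allocation can fail. */
    int app_startup(size_t nslots)
    {
        g_slots = calloc(nslots, sizeof *g_slots);
        if (g_slots == NULL)
            return -1;
        g_nslots = nslots;
        return 0;
    }

    void app_shutdown(void)
    {
        free(g_slots);
        g_slots = NULL;
        g_nslots = 0;
    }
    ```

    Everything between startup and shutdown can then use the buffer without any allocation-failure paths.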

    1. 11

      I feel the writer is not so experienced with C.

      Not only are there generic recommendations like Prefer maintainability (when should we not prefer maintainability?) or Use a disciplined workflow (yes, but what kind of workflow?), some of them go against common C best practices, like Do not use a typedef to hide a pointer or avoid writing “struct”.

      Taking into account that opaque pointers are standard in the C library and highly recommended for hiding complexity and allowing code to change, I don’t know where he got these ideas.

      1. 3

        Opaque pointers hidden behind typedefs are something I’ve never been totally comfortable with, though I guess I’ve been using them without knowing! Where in libc are they used?

        1. 4

          typedef void* lobster_handle_t; is probably the most common way I’m aware of to expose types and structs for public consumption without giving away internal implementation details to users. This is doubly useful if you have, for example, the same interface implemented differently on different platforms: your _win32.c and _posix.c variants are chosen based on #ifdefs, but user code including your headers only ever sees the opaque pointer.
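
          Running with the (hypothetical) lobster example, the pattern looks roughly like this — header exposes only the typedef’d pointer, and each platform file defines struct lobster however it likes:

          ```c
          /* lobster.h -- public header: users see only the opaque handle.
           * All names here are invented for illustration. */
          typedef struct lobster *lobster_handle_t;

          lobster_handle_t lobster_open(int claws);
          int  lobster_claws(lobster_handle_t h);
          void lobster_close(lobster_handle_t h);

          /* lobster_posix.c (or lobster_win32.c) -- one per platform; the
           * struct layout never appears in the public header. */
          #include <stdlib.h>

          struct lobster {
              int claws;    /* platform-specific fields would live here */
          };

          lobster_handle_t lobster_open(int claws)
          {
              struct lobster *l = malloc(sizeof *l);
              if (l) l->claws = claws;
              return l;
          }

          int lobster_claws(lobster_handle_t h) { return h->claws; }

          void lobster_close(lobster_handle_t h) { free(h); }
          ```

          Note this uses a pointer-to-incomplete-struct typedef rather than void*, which keeps some type checking at the call sites.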

          1. 5

            Wouldn’t a lobster handle just be a claw?

            1. 3

              Or the tail

            2. 4

              Forward declaration is the new hotness:

              struct T;
              void f( T * x ); // feel free to pass around T*, but you don't get to see inside
              

              It brings no benefits to C code because all pointer types implicitly cast to each other, but in C++ they don’t and it’s definitely preferred there.

              1. 4

                It brings no benefits to C code because all pointer types implicitly cast to each other

                Whoa, no they don’t. void * implicitly converts to any other type of (non-function) pointer, and vice-versa, but that’s it.

                (many compilers do allow for function pointer <-> void * conversions, even implicitly, but I think that’s an extension for POSIX compatibility.)
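
                A small illustration of those rules (the function name is invented):

                ```c
                /* void * converts implicitly to and from object-pointer
                 * types; two distinct object-pointer types do not. */
                static int square_first(void *p)   /* accepts any object pointer */
                {
                    int *ip = p;     /* ok: void * -> int *, no cast needed in C */
                    return *ip * *ip;
                }

                /* By contrast, this would be a constraint violation without
                 * an explicit cast:
                 *     long *lp = ip;    // int * -> long *: diagnostic required
                 */
                ```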

                1. 1

                  MSVC/GCC/clang all allow it, but they do warn about it by default.

                2. 3

                  T isn’t a valid type name in C. You have to use struct T unless you supply a typedef.
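
                  For completeness, a valid-C version of the snippet above, with and without the typedef:

                  ```c
                  struct T;                 /* incomplete type: no size, no members */
                  void f(struct T *x);      /* pointers to incomplete types are fine */

                  typedef struct T T;       /* with a typedef, plain `T *` works too */
                  void g(T *x);

                  /* Only the implementation file needs the full definition: */
                  struct T { int v; };
                  void f(struct T *x) { x->v = 1; }
                  void g(T *x)        { x->v = 2; }
                  ```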

              2. 2

                FILE, for example.

                1. 3

                  Correct me if I’m wrong, but doesn’t one usually use a FILE * rather than working with a raw FILE?

                  1. 1

                    Sorry I was thinking just “opaque pointer” not one hidden behind a typedef. An example of a completely opaque type (from the perspective of the standard library) is va_list. Extending beyond the C standard library, you have things like pthread_t in POSIX (which could be “the standard library” depending on your definition), which is of unspecified type.

                    1. 1

                      Keep in mind, va_list is not necessarily a pointer, and it’s only opaque in the sense that its contents are undefined and unportable. On x86-64 Linux, for example, it’s a 24 byte struct, and may be defined (depending on your compiler, headers, and phase of moon) as:

                      struct __va_list_struct {
                          unsigned int gp_offset;
                          unsigned int fp_offset;
                          union {
                              unsigned int overflow_offset;
                              char *overflow_arg_area;
                          };
                          char *reg_save_area;
                      };
                      
                      1. 1

                        Right, I was trying to think of an example that is an explicitly opaque type hiding behind a typedef. It’s always interesting to see how POSIX and/or C sometimes leaves some things completely undefined by type, but not others. jmp_buf has to be an array type, for example, but is not specified beyond that, and va_list may be of any type at all.

                        1. 1

                          time_t: Standard C does not mandate a definition at all (it could be an integer, could be a float, could be a structure). POSIX defines it, though.

                          1. 2

                            Time is an illusion. Lunchtime doubly so.

                2. 1

                  FILE * is the more visible example.

              3. 4

                if I change from signed to unsigned or change the width of an integer and forget to change it in one place, there can be serious and hard-to-find consequences.

                Agree, which is why using typedefs to make maximal use of C’s sad type system is a better move than a mere macro. Also, macros can do weird things when expanded in code, and it’s easy to end up with a codebase that is unreadable and ungreppable because of having to continually expand non-intuitive macros. They’re handy, in moderation, but overuse is not so great.

                Only as a last resort should you be doing allocation and freeing repeatedly during runtime, when the set of objects and their sizes depends on data only accessible while running.

                Spoken like a true Fortran programmer! ;)

                More seriously, anything that is actually interactive and of any real practical use is easier coded with dynamic allocation. Also, the number of people that properly write fixed-size allocation code without leaving gigantic security holes and undefined behavior open is small. Better just to use malloc and free and know that you have problems than to hope somebody didn’t mismatch a buffer size with a differently-spec'ed memmove call.

                That said, in a library, if you don’t allow users to specify their own allocation routines you are bad and you should feel bad.

                ~

                Overall, I agree that this advice is not so great, probably because the author hasn’t had to deal with producing libraries for others to consume. That very much colors how these things are evaluated.

                1. 6

                  Fortran

                  curls up in a ball, rocks back and forth, crying

                  They’re handy, in moderation, but overuse is not so great.

                  That’s true of just about anything, but yes, macros are a sharp tool. It’s very easy to hurt yourself if they’re not used carefully, but like any sharp tool there’s sometimes a good use case. Never say never. :)

                  More seriously, anything that is actually interactive and of any real practical use is easier coded with dynamic allocation.

                  True, but not everything need be interactive. The most critical code I work on right now is highly dynamic at runtime, but does no memory allocation after startup. We calculate the sizes of various structures based on parameters provided by the system at startup, and allocate memory once. This is necessary for various reasons, but most importantly because of performance; we deal with tens-of-thousands of work units a second, of varying size. Repeatedly allocating and freeing blocks would rapidly result in fragmentation.

                  We originally thought about allocating fixed-size blocks, since most modern allocators would handle that well so long as there weren’t any other allocations happening. Things like tcmalloc would still probably be okay, but at the end of the day we decided to use a static allocation scheme with what amounts to a large array with chase pointers in each slot, making allocation an O(1) operation with zero fragmentation (basically a slab allocator). Additionally, we can use mlock to keep those pages in memory to avoid any indeterminacy with swapping.
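
                  That scheme can be sketched roughly like this (all names invented; real code would presumably size the pool at startup rather than compile time, and mlock the pages):

                  ```c
                  #include <stddef.h>

                  /* A static pool of slots threaded onto a free list (the
                   * "chase pointers"), giving O(1) alloc/free and zero
                   * fragmentation -- essentially a slab allocator. */
                  #define NSLOTS 1024

                  struct slot {
                      struct slot *next_free;   /* the chase pointer */
                      char payload[64];
                  };

                  static struct slot pool[NSLOTS];
                  static struct slot *free_head;

                  void slab_init(void)
                  {
                      for (size_t i = 0; i < NSLOTS - 1; i++)
                          pool[i].next_free = &pool[i + 1];
                      pool[NSLOTS - 1].next_free = NULL;
                      free_head = &pool[0];
                  }

                  struct slot *slab_alloc(void)   /* O(1): pop the free list */
                  {
                      struct slot *s = free_head;
                      if (s) free_head = s->next_free;
                      return s;
                  }

                  void slab_free(struct slot *s)  /* O(1): push it back on */
                  {
                      s->next_free = free_head;
                      free_head = s;
                  }
                  ```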

                  Variable-sized data is fed into a ring buffer with chase pointers and we keep pointers to things in the ring in the slab-allocated structures; we never copy out of the ring. We track the ring pointers and invalidate any data in a block that gets overwritten while in use (which is surprisingly cheap if you do it right).

                  (Sorry, that was a big digression, but I really like working on that code.)

                  Also, the number of people that properly write fixed-size allocation code without leaving gigantic security holes and undefined behavior open is small.

                  I would argue that writing strncpy(foo, bar, BUFSIZE) is less error-prone than strncpy(foo, bar, dynamically_allocated_size_that_changes). (I admit that’s a contrived example.)

                  Again, obviously, not everything can work this way. There are times when you have to use dynamic allocation, but, at least in my experience, people have a bigger problem tracking reference counts and avoiding use-after-free than they do dealing with fixed-size buffers.

                  1. 4

                    it’s easy to end up with a codebase that is unreadable and ungreppable because of having to continually expand non-intuitive macros

                    That’s true, although macros are also sometimes used to fix the problem that C codebases are often hard to grep in the first place. The Linux kernel uses a whole series of WARN macros partly for that reason. Lots easier to grep for WARN_ONCE in a big source tree than have to pore through every inline use of printk.
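
                    The idea can be sketched like this (warn_emit and the counter are invented for illustration; the kernel’s WARN_ONCE does considerably more):

                    ```c
                    #include <stdio.h>

                    static int warn_count;   /* for illustration; real code just prints */

                    static void warn_emit(const char *file, int line, const char *msg)
                    {
                        warn_count++;
                        fprintf(stderr, "WARN %s:%d: %s\n", file, line, msg);
                    }

                    /* A greppable name at every warning site, and each site
                     * fires at most once however often it runs. */
                    #define WARN_ONCE(msg)                               \
                        do {                                             \
                            static int warned_;                          \
                            if (!warned_) {                              \
                                warned_ = 1;                             \
                                warn_emit(__FILE__, __LINE__, (msg));    \
                            }                                            \
                        } while (0)

                    static void hot_path(void)
                    {
                        WARN_ONCE("slow fallback in use");
                    }
                    ```

                    Grepping for WARN_ONCE finds every warning site directly, which is the point the comment above makes about the kernel.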

                2. 6

                  Ignore the C advice, and just read the non-specific principles, and then the post isn’t so bad. The C specific stuff is pretty bad advice on average though. I’ll take one example:

                  Do not assume the architecture is x86

                  Do assume the architecture is x86 if that’s what you’re using. You will most likely not be porting any significant C program to a new architecture without spending significant time on the port. If you don’t have the dev board and some CI to actually test on it, you likely don’t know what you’re doing. I’d avoid any fake porting.

                  1. 2

                    Wow, never use inline? High-quality inline functions in headers are much better than macros or unnecessary performance overhead of function calls. Zero cost predicates for bit fields are one example.

                    No GNU? What about error(3) or asprintf(3)?

                    1. 5

                      Using static inline functions as a replacement for macros is a good idea. I think that explicitly marking functions as inline as a performance optimization is a bad idea though. It’s better to leave those decisions up to the compiler.
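
                      For example, the bit-field predicate mentioned above as a static inline rather than a macro (flag names invented): it is type-checked, debuggable, and compilers will inline it on their own.

                      ```c
                      #include <stdint.h>
                      #include <stdbool.h>

                      #define FLAG_DIRTY  (1u << 0)
                      #define FLAG_LOCKED (1u << 3)

                      /* Zero-cost predicate: compiles to a single AND + test. */
                      static inline bool flag_set(uint32_t flags, uint32_t mask)
                      {
                          return (flags & mask) != 0;
                      }
                      ```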

                    2. 2

                      Keep your build system simple and transparent. Don’t use stupid hacky crap just because it’s a cool way of solving the problem.

                      Sorry, I just taught a class on doing Angular today, and this is downright hilarious.

                      //I don’t like Angular, but big companies do, and they pay me.

                      1. 1

                        I remember finding out about Angular. I thought it had some interesting features and ideas, but wondered how the hell it did some of the things that it did, so I looked into it. Oh, of course, stringifying function objects and running regexes on them! That’s a perfectly sane idea. I’ll just go over here now.

                      2. 2

                        Use only standard features. Do not assume the platform is Linux. Do not assume the compiler is gcc. Do not assume the libc is glibc. Do not assume the architecture is x86. Do not assume the coreutils are GNU. Do not define _GNU_SOURCE.

                        Writing 100% portable code is at best a waste of time, and probably actually impossible for anything non-trivial. You have to make assumptions at some point.

                        Of course you can assume float is what it should be. Of course you can assume NULL is 0. Of course you can assume errno is TLS. How could you sell a platform where those weren’t true? Even in the language lawyer strawman that is “embedded”, “Btw our platform is broken/a huge pain in the ass, but you should still choose us over our dozens of competitors” is not going to fly.

                        If you’re writing PC software, of course you can assume little endian, twos complement and SSE2, there’s no need to waste your time on fallback code when there’s no reasonable hardware in use today that requires it.

                        1. 8

                          Maybe my PC is a raspberry pi and doesn’t have SSE?

                          1. 7

                            If you’re writing PC software, of course you can assume little endian, twos complement and SSE2,

                            It’s easy to write code that doesn’t care about endianness. Compilers optimize it well, and therefore it doesn’t even have a performance cost. SSE2 is not present on many devices – ARM is not exactly rare, or an unlikely target to port to. And while assuming two’s complement sounds sane at first glance, anything you do with that assumption is likely to be undefined behavior in C. Thanks to the attitudes of modern compiler developers, that means your code is likely to be miscompiled. It’s not a bad assumption, but it requires parsing the spec pretty closely.
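
                            For instance, the usual shift-and-or read of a little-endian 32-bit value works on any host, and current compilers turn it into a single load (plus a byte swap on big-endian targets):

                            ```c
                            #include <stdint.h>

                            /* Endianness-independent: operates on byte values,
                             * never on the host's in-memory integer layout. */
                            static uint32_t get_le32(const uint8_t *p)
                            {
                                return (uint32_t)p[0]
                                     | (uint32_t)p[1] << 8
                                     | (uint32_t)p[2] << 16
                                     | (uint32_t)p[3] << 24;
                            }
                            ```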

                            1. 1

                              Twos complement signed integers have undefined behavior around the edges (INT_MIN and INT_MAX, and shifting past the integer width). Unsigned integers have sane semantics (modulo).

                              The assumptions I make these days are:

                              • Byte addressable with 8-bit bytes
                              • Twos complement (I have never seen a sign magnitude or ones complement computer and I’ve been programming since the mid 80s).
                              • IEEE 754 floating point math (last time I used non-IEEE 754 floating point it was BASIC on a 6809 in the 80s)
                              • memset(&pointer, 0, sizeof(pointer)) results in a NULL pointer (again, I have never seen a machine with virtual memory where the NULL address wasn’t all zero bits—I’m sure one exists somewhere).

                              And so far, my code has been ported to Linux, Solaris, Mac OS X and (some time ago) Windows with few (if any) changes. The only time endian issues come up is networking or working with certain binary files (not as often as you might think).

                              1. 3

                                memset(&pointer, 0, sizeof(pointer)) results in a NULL pointer (again, I have never seen a machine with virtual memory where the NULL address wasn’t all zero bits

                                POSIX actually requires this now, although not going so far as to fully equate null and all-zero-bits, and only since the very recent 2016 update. What it now requires is that a pointer made up of all zero bits is guaranteed to be a null pointer, so your example is standard now (on POSIX systems). However, there are still allowed to be multiple physical representations of a null pointer, so you cannot conclude the inverse, that a pointer not made up of all zero bits is necessarily a non-null pointer. I’m not sure why they felt it useful to keep that leeway.

                                1. 3

                                  I’m not sure why they felt it useful to keep that leeway.

                                  So that architectures using segmented memory architectures or that have flag bits in the high part of the pointer can still be POSIX compliant, I would imagine. I’ve never programmed one (alas!), but I know that pointers on Crays and some of the big iron IBM machines have funny formats.

                                  Old versions of classic MacOS would use the top byte of addresses for flags, and such practices were not unheard of in the Amiga world either. In those cases, though, that wasn’t actually a requirement of the hardware but rather just space-saving hacks that ended up biting future generations in the ass.

                                2. 3

                                  Twos complement signed integers

                                  Twos complement only applies to signed values.

                                  1. 1

                                    Since reading https://commandcenter.blogspot.co.nz/2012/04/byte-order-fallacy.html I haven’t had any problems with network or binary file endian issues.

                                  2. 0

                                    It’s easy to write code that doesn’t care about endianness.

                                    Agreed, but it’s even easier to just not write it. (even easier to not have to go out and find big endian hardware to test it on too)

                                    SSE2 is not present on many devices – ARM is not exactly rare, or an unlikely target to port to.

                                    SSE2 support is at 100.00% in the Steam hardware survey!

                                    Thanks to the attitudes of modern compiler developers, that means your code is likely to be miscompiled.

                                    I don’t think the UB clowns have much influence outside GCC/clang, and both of those have -fno-strict-overflow.

                                    1. 5

                                      Agreed, but it’s even easier to just not write it.

                                      I don’t really find x = get_le32(buf) to be especially hard.

                                      SSE2 support is at 100.00% in the Steam hardware survey!

                                      Weird, most of the devices my code runs on don’t have SSE – none of the Android devices, for sure. And I kind of like doing the bulk of development on my desktop because the tools for doing it are better.

                                      I don’t think the UB clowns have much influence outside GCC/clang

                                      The only two compilers that are widely used on anything other than Windows. No big deal – it’s not like anyone uses anything other than C# on Windows anyways. Also, I’m pretty sure that MSVC has similar – but different – stupidity in the face of undefined behavior.

                                      1. 1

                                        I don’t really find x = get_le32(buf) to be especially hard.

                                        Of course, but what are you going to test it on?

                                        none of the Android devices

                                        Again of course mobile devices don’t have SSE, which is why I said PC software.

                                        The only two compilers that are widely used on anything other than Windows. No big deal – it’s not like anyone uses anything other than C# on Windows anyways.

                                        MSVC doesn’t chase pointless optimisations as hard as GCC/clang do. I expect icc doesn’t either since from what I’ve seen the Intel tools are more about guiding you towards optimising the code yourself.

                                  3. 5

                                    There are a whole lot of ARM devices out there, meaning you can’t assume architecture as much as you might want to. If your library will work with minor tweaks on both Linux-on-the-desktop and Linux-on-the-device, it’s going to get picked and paid for before something that assumes x86.

                                    I’ve written reasonably large applications that compile and run unchanged on Linux on x86, x86_64, ARM, and PowerPC. It’s not impossible, and in the long run it paid off.

                                    1. 3

                                      SSE2

                                      Can’t say I have ever had the urge to write anything that explicitly depended on SSE2.

                                      I have often used standard C functions that gcc cunningly implements for me as SIMD instructions on whatever platform I’m using.

                                      I have also used gcc __builtin functions where appropriate that are implemented as SIMD instructions.

                                      My code usually is run on a mixed bag of intels, arms and oddly enough sparcs.