1. 62

  2. 12

    The first rule of C is don’t write C if you can avoid it.

    It really depends on the domain. I almost exclusively write my stuff in C. And in general, if you think the Unix way, you’ll quickly see that the real goal should be to keep your C programs small and combine them at a higher level with something else (sh, …). Use the language you are most efficient in. You can definitely bork a C program, but secure practices can easily be employed with enough experience and the right data structures, including segregation.

    1. [Comment removed by author]

      1. 3

        What do you think of C++? It seems to be, alas, all over the place in quant finance.

        I’m also really into advanced functional programming languages, though I find C compelling. C++, on the other hand, holds no appeal for me. I’d use it if there were a strong reason to do so, but if I were to pull out a high-level language, I’d probably reach for something more to-the-purpose than C++.

        1. 9

          C++ has a lot of features which make it very easy to write unreadable and slow code, and suck all the fun out of programming when your superiors expect you to use them:

          • namespaces. I’ve personally never encountered a problem that namespaces solve, but having to write CompanyName::OverlyLongModuleName::foo for every library call gets tedious very quickly.
          • classes. Classes/OOP by themselves aren’t necessarily terrible, but C++ classes are quite a bit more “powerful” than in other languages. For example, a = b can do literally anything (and usually allocates memory), because you can overload the assignment operator. Note that the copy in a = b also happens implicitly when you call f( a ), unless f takes a reference as its first argument – which is invisible at the call site.
          • templates. Template abuse makes your code unreadable and sends compile times into orbit. It’s normally done so you can have the fastest implementation for several different types. In practice, when you hide the implementation behind templates, you tend to miss optimisations that would be obvious if you just wrote separate functions.

          C++ has lots of traps you’ll fall into if you aren’t aware of them. This is on top of all of C’s traps, of which there are more than enough already.

          The STL is a pile of shit. It gives you a load of generic containers (see what I said about templates), and generic algorithms which work on said containers through their iterator interface. The idea is that you can write your code using some data structure, then swap it out for another without having to change any of the code that uses it. This has a few problems:

          • You’re adding a lot of complexity for something that never happens.
          • If you do actually swap your data structure, you can run performance into the ground when operations silently become an order of magnitude more expensive.
          • IT DOESN’T EVEN WORK. All the standard containers have iterators which work differently, so you can’t just plug them into each other anyway.

          Finally, @FRIGN mentioned in one of his posts:

          If you are still thinking about using VLAs in your code, take a look at the GCC implementation.

          If you pick any C++ feature at random and substitute it for VLAs in that quote, 90% of his argument will still hold water.

          1. 1

            Sounds like a shitload of fun, without the fun part– just the shitload.

            I used C++ for 6 months at Google and didn’t like it, but I’ve grown and aged and recognized that my negative experience was too brief to judge the language, especially because even people who loved C++ agreed that the code I was told to maintain was a pile of shit. I enjoy C and assembler quite a bit, although I’ve never built anything large using them. I do remember C++ iterator invalidation. Has that not been fixed?

            I think that C++ has two things keeping it in place. The first is that it sells itself as offering C-level performance (because you can do anything in C in C++, at least in theory) while being a high-ish level language (it isn’t much of one, but never mind that). The second, I think, is that there are a lot of quants who see the nastiness of the language, and the fact that it’s not taught anymore (C is, but not C++), as a source of job security. On one hand, I tend to support the existence of a barrier to entry, because someone who’s not willing to learn a new language really shouldn’t be in this industry; on the other, it doesn’t seem like having to use C++ is worth it. This is probably compounded by the fact that many quants (like many data scientists) have an active hostility to going deeper down the programming rabbit hole for fear that it’ll put them in a worse bonus pool (and, alas, that fear isn’t unfounded for quants).

            1. 3

              I think it’s important to keep in mind that the number of quants, even under the most relaxed of definitions, is dwarfed by the number of people who write software for Windows, where I’d wager obscurantism is normally distributed.

              C++ always made more sense to me in theory than in practice, as “pay only for what you use” sounds much more appealing than “a billion mutually but subtly incompatible dialects”. I wrote some in anger this summer for the first time in a decade or so, and it was better than I’d remembered but not really enjoyable.

    2. 7

      I love how he promotes the usage of VLAs, only to give plenty of warnings later on about how easily they can fail for larger objects and how a user can exploit that to crash your program. At least with malloc, you can check whether the allocation failed. With VLAs, you just have to live with the stack overflow if you request too much stack at once.
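
      A minimal sketch of the difference (both functions are made up for illustration):

      ```c
      #include <stdio.h>
      #include <stdlib.h>

      /* With malloc, an oversized request fails visibly and can be handled. */
      int with_malloc(size_t n)
      {
          double *buf = malloc(n * sizeof *buf);
          if (buf == NULL) {          /* allocation failure is detectable */
              perror("malloc");
              return -1;
          }
          /* ... use buf ... */
          free(buf);
          return 0;
      }

      /* With a VLA, the same oversized n blows the stack with no error path. */
      int with_vla(size_t n)
      {
          double buf[n];              /* may crash right here if n is too large */
          buf[0] = 0.0;
          /* ... use buf ... */
          return 0;
      }
      ```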

      The only thing I agree on is the stdint.h usage. It greatly improves readability. Most of the other points are more of an experimental nature or don’t matter (personal taste).

      1) “C99 allows variable declarations anywhere” / “C99 allows for loops to declare counters inline”

      Why is it “bad practice” to declare the variables at the top of the function? Just because you can doesn’t mean you should. And if your functions grow too large, you might have to think about splitting them up a bit, not scattering your variable declarations all over the place. This way you end up with more cruft in the end, not less.
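
      For reference, the two styles side by side (a trivial made-up function):

      ```c
      #include <stddef.h>

      /* C89 style: every declaration at the top, visible at a glance. */
      static int sum_squares_c89(const int *v, size_t n)
      {
          size_t i;
          int acc;

          acc = 0;
          for (i = 0; i < n; i++)
              acc += v[i] * v[i];
          return acc;
      }

      /* C99 style: declarations where first used, scope kept minimal. */
      static int sum_squares_c99(const int *v, size_t n)
      {
          int acc = 0;
          for (size_t i = 0; i < n; i++)
              acc += v[i] * v[i];
          return acc;
      }
      ```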

      2) “#pragma once”

      Way to go for portability. If you only care about the gcc/clang monoculture, this may seem logical, but it’s non-standard, so don’t use it. :P
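
      The portable alternative is the classic include guard (header name is illustrative):

      ```c
      /* foo.h */
      #ifndef FOO_H
      #define FOO_H

      /* declarations ... */
      int foo(int x);

      #endif /* FOO_H */
      ```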

      3) “restrict-keyword”

      If you do numerical mathematics, go ahead and use it. I use it for my work as well. Most people, however, don’t even know how restrict works exactly and just add it everywhere, thinking it’s safe. In most cases the speed benefit won’t matter anyway, because your program is stuck in I/O 99% of the time.
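
      For what it’s worth, the canonical use looks something like this (a made-up vector add):

      ```c
      #include <stddef.h>

      /* restrict asserts that out, a and b never overlap, so the compiler
       * may vectorise freely. If the caller passes aliasing pointers
       * anyway, the behaviour is undefined. */
      void add_vectors(size_t n, double *restrict out,
                       const double *restrict a, const double *restrict b)
      {
          for (size_t i = 0; i < n; i++)
              out[i] = a[i] + b[i];
      }
      ```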

      4) “Return Parameter Types”

      The convention of ‘0’ for success and ‘1’ for error is common knowledge. The bool proposal was kind of stupid, because you end up setting up conventions there as well: does returning true mean error or success?
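
      A hedged illustration of the ambiguity (the logging functions are made up):

      ```c
      #include <stdbool.h>
      #include <stdio.h>

      /* Which way around is it? With bool you check the docs every time. */
      bool write_log(const char *msg)      /* hypothetical: true = ...? */
      {
          return fputs(msg, stderr) >= 0;
      }

      /* The int convention needs no guessing: 0 = success, non-zero = failure. */
      int write_log2(const char *msg)      /* hypothetical */
      {
          return fputs(msg, stderr) >= 0 ? 0 : -1;
      }
      ```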

      5) “Never use malloc, use calloc”

      Seriously? This can actually mask bugs in your program (forgotten 0-terminators on dynamic strings) which can fuck things up later on. Also, it’s slower. If you use calloc everywhere, you basically admit that your data structures are messed up and that you have let your program grow too much. Or that you have simply not understood the language/machine.
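
      A hypothetical sketch of the masking effect: with malloc, the missing terminator below shows up quickly as garbage at the end of the string, while calloc’s zero fill quietly hides it.

      ```c
      #include <stdlib.h>
      #include <string.h>

      /* Copies the first n bytes of s into a fresh buffer. */
      char *dup_prefix(const char *s, size_t n)   /* hypothetical helper */
      {
          char *p = malloc(n + 1);
          if (p == NULL)
              return NULL;
          memcpy(p, s, n);
          /* BUG: p[n] = '\0' is missing. With malloc the string ends in
           * garbage and the bug is found; with calloc it stays hidden. */
          return p;
      }
      ```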

      And the most important point: make up your own minds, people! If you prefer your own coding style, then use it. If it’s too weird, people may be reluctant to contribute, but in C you can’t go too wrong anyway. Nevertheless, I like the gofmt approach. :) Also, always take these “how to’s” with a grain of salt. This is merely a reflection of the author’s opinion. Hell, take what I say with a grain of salt. Read the docs, read the standards(!) and inform yourself. C is simple enough that you can make up your own mind on these technical details. If you are still thinking about using VLAs in your code, take a look at the GCC implementation.

      Guides like this are the reason so many people are still writing bad code, because they let others think for them instead of informing themselves.

      1. 4

        I don’t think 1 for error is that common a convention, though I agree that non-zero is a relatively common way to signal failure. In a lot of the code I work on, almost everything returns an int which is 0 for success and -1 for failure.

        I personally think it’s a “bad practice” (whatever that means – to be avoided, I guess) to declare variables outside the scope in which they are used. If you need a variable inside one arm of an if statement, put it in there, not at the top of the block. Inline loop counter declaration is essentially the same thing.
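
        Something like this (names made up), so the variable can’t leak into code that shouldn’t touch it:

        ```c
        #include <stdbool.h>

        int try_send(void) { return 0; }     /* hypothetical stub */

        void send_with_retries(bool retries_enabled)
        {
            if (retries_enabled) {
                int attempts = 0;            /* exists only where it is used */
                while (attempts < 3 && try_send() != 0)
                    attempts++;
            }
            /* attempts is out of scope here, as it should be */
        }
        ```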

        1. 1

          Regarding (1), declaring variables as needed instead of at the beginning of the block can help you, in my experience. In ANSI C, it is easy to miss that a variable has not been initialized or has actually vanished from the code. Also, patterns of variable reuse (like “I am going to reuse i here…”) probably don’t emerge as often.

          So it’s not necessarily that declaring variables at the top is bad; it’s just nicer to declare them as you go.

          1. 2

            In the end it doesn’t matter. I often reuse my loop variables; you probably don’t. I guess even if we worked on a project together this wouldn’t be too much of an issue, and the rest isn’t important.

          2. 0

            Way to go for portability. If you only care about the gcc/clang monoculture, this may seem logical, but it’s non-standard, so don’t use it. :P

            I actually think of this as a nice bonus! :P

            Every time I have used some compiler other than gcc/clang, there have been horrible headaches in every corner (especially with IAR, damn it!). Although I must say that all my experience outside the gcc/clang world has been with proprietary compilers. I might be somewhat biased.

            1. 2

              Try PCC. You might be pleasantly surprised? :)

          3. 3

            -Os helps if your concern is cache efficiency (which it should be)

            I have yet to see a benchmark where the famous cache efficiency of -Os is proven. Are you aware of any?

            1. 2

              I can’t speak to the statement on caches, but generally -Os reduces the binary size. There’s no need to do -O2 or anything; in my experience, I’ve even seen cases where -Os was faster than -O2.

              1. 1

                Of course this depends on the workload, but I wonder how the effect of -Os fares for “google-style” workloads, given that those binaries are huge, their i-cache working sets already exceed the i-cache, and they are growing at up to 27% per year, as per Kanev et al. in ISCA 2015 (see sec. 6).

                I suspect that the effect of -Os is insignificant for any real application now, and what that paper tells us is that any argument you see for -Os based on SPEC is suspect, as all of those programs are much, much smaller than the things we’re actually running.

                1. 3

                  Yes, the best way is still to just write lean programs instead of forcing the compiler to try to optimize away the cruft in all the huge libraries people are using; the latter is not the way to go. If you use statically linked applications, -Os makes an even bigger difference.

            2. 5

              This seems to be more of a guide of what not to do.

              1. 8

                I would be more likely to point to this with the disclaimer “You see this guy’s opinions? Do the opposite of what he says.”

                Some of the compiler features he mentions are non-standard. This matters to me: I actually use a C compiler that isn’t GCC or clang on a regular basis (pcc). -march=native is often unacceptable for downstream distributors, and generally I’m annoyed when programs ignore my CFLAGS in favor of their own ridiculous optimizations. Usually I value fast compilation far more than non-hot parts of the code being sprinkled with magic. As others have mentioned, “#pragma once” is also non-standard, and variable-length arrays (i.e. alloca) can be a security risk.

                No specific comments on types (though you should certainly use char to refer to UTF-8 octets, otherwise people who have to use your libraries or read your code will be annoyed). I use “unsigned” when I want an integer that’s at least 16 bits and don’t care about the specifics. That’s in line with the standard.
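
                A small sketch of both conventions (the function is made up):

                ```c
                #include <stddef.h>

                /* char for UTF-8 octets; plain unsigned where "at least
                 * 16 bits, don't care" is enough. Both match the standard. */
                unsigned count_ascii(const char *utf8, size_t len)
                {
                    unsigned n = 0;
                    for (size_t i = 0; i < len; i++)
                        if ((unsigned char)utf8[i] < 0x80)   /* ASCII range */
                            n++;
                    return n;
                }
                ```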

                There are valid arguments for separating declarations from code, especially when you have resources you want to allocate and free. for loops are perhaps a case where this rule can be broken – not sure I have a strong opinion here.
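
                For the resource case, something like the usual goto-cleanup idiom (names and sizes are made up):

                ```c
                #include <stdio.h>
                #include <stdlib.h>

                int load_file(const char *path)   /* hypothetical */
                {
                    FILE *f = NULL;               /* declared up front so the  */
                    char *buf = NULL;             /* cleanup path below can    */
                    int rc = -1;                  /* release them safely       */

                    f = fopen(path, "rb");
                    if (f == NULL)
                        goto out;

                    buf = malloc(4096);
                    if (buf == NULL)
                        goto out;

                    /* ... read and process ... */
                    rc = 0;
                out:
                    free(buf);                    /* free(NULL) is a no-op */
                    if (f != NULL)
                        fclose(f);
                    return rc;
                }
                ```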

                1. 4

                  “You see this guy’s opinions? Do the opposite of what he says.”

                  That’s exactly what I meant. My sentence was obviously ambiguous.

                  1. 2

                    Ah, ok, that makes more sense. Thanks.

                2. 1

                  Isn’t that at least half of any effective programming guide? Knowing how to write a program that compiles and runs in a given language is easy. Knowing how to write a good program that minimizes errors and maximizes readability, performance, security, and refactorability is hard.

                3. 4

                  The first rule of C is don’t write C if you can avoid it.

                  I agree with pretty much the whole thing, but this is probably the best advice.

                  1. [Comment removed by author]

                    1. 0

                      I can and do avoid C completely. I don’t think there are any such use cases for new projects in 2016.

                      1. 14

                        Umm, new filesystems? Kernel modules? Those are two of the things I work on that have had new projects in the past year alone. How about embedded programming, where C/C++ is about all there is? And no, Rust does not yet qualify to even be considered. Maybe in 5 years, once it has a few notches in its belt. Right now the devil we know is a better choice.

                        Just because you can avoid C doesn’t mean C can be avoided.

                        1. 5

                          This is exactly why I can’t wait until Dropbox publishes stuff about their deployment of Rust. It’s been in production almost a month, so it’ll take some time to see how it really shakes out, as opposed to the tests they’ve been doing.

                          1. 2

                            Filesystems or drivers can live in userspace. (Maybe not for a module for an existing kernel - but then that’s not really a new project). Embedded toolchains these days tend to be based on gcc or LLVM and in either case you have alternatives to C (and even if all you have is a C compiler, there’s always transpiling). And in the worst case, when the devil we know is this problematic, I would sooner take my chances with rust.

                            1. 1

                              Filesystems or drivers can live in userspace.

                              A NIC driver requires access to raw memory, handling of interrupts and integration into the kernel’s networking infrastructure. On which OSes can you write one in userspace, and which languages can you use for that?

                              transpiling

                              Compiling, please. That’s what translating from one language to another is called.

                              Embedded toolchains these days tend to be based on gcc or LLVM and in either case you have alternatives to C (and even if all you have is a C compiler, there’s always transpiling).

                              The issue is not toolchains, but cramming all the functionality you need into 16KB of text+rodata+data and 4KB of data+bss+stack. The Go runtime, for instance, isn’t even close to fitting into such a space.

                              I would sooner take my chances with rust.

                              Rust is interesting, yes.

                              1. 1

                                A NIC driver requires access to raw memory, handling of interrupts and integration into the kernel’s networking infrastructure. On which OSes can you write one in userspace, and which languages can you use for that?

                                I’d assume QNX; people do do them on Linux (DPDK etc.) though, for different reasons. There’s no particular language restriction – I’ve seen people write bit-banging drivers in Python (high-level languages can still give you a way to access raw memory, they just don’t do it by default for everything). (Yes, for most NIC use cases you’d want something more performant than Python. That doesn’t have to mean C.)

                                Compiling, please. That’s what translating from one language to another is called.

                                Language is a tool for communication. I find the distinction useful.

                                The issue is not toolchains, but cramming all the functionality you need into 16KB of text+rodata+data and 4KB of data+bss+stack. The Go runtime, for instance, isn’t even close to fitting into such a space.

                                That kind of use case is disappearing (these days it’s cheaper to get a standardized SoC with 64+ MB than a specialized device with less memory). Though yes, there are cases where that is a requirement (but I’d still think Rust, or C generated by a higher-level language program, would be a better option).

                                1. 1

                                  I’d assume QNX; […]

                                  Interesting.

                                  That kind of use case is disappearing

                                  It’s just becoming less visible, but it’s growing, with IoT and all.

                                  (these days it’s cheaper to get a standardized SoC with 64+ MB than a specialized device with less memory).

                                  SoCs with less memory are also “standardized” (as in, readily available). Even if bigger devices are indeed cheaper to get (cheaper than 1€ with RAM and FLASH, really?), they’re much more expensive to run off a battery. Low-power variants of ARM Cortex are all the rage these days.

                                  Though yes,there are cases where that is a requirement (but I’d still think rust, or C generated by a higher-level language program, would be a better option).

                                  Rust is a special case, but I’d wait for it to mature a bit before judging. Compiling (either via C or directly) a program in a high-level language with a garbage collector, for an environment where most people feel they don’t have enough RAM to afford malloc, is problematic.