  2. 25

    I know it’s a really unpopular opinion these days, but I love C. Is it unsafe? Yes. Are there tools that can do everything it can do? Yes. Can at least some of them do the job just as fast or as minimally? Usually. Is it dangerous, and should it probably not be used anymore? Probably.

    Here’s the thing: so few other languages are so incredibly internally consistent. All of C’s data structures, and how they work, are so consistent and predictable once you understand them.

    Here’s a really superficial example:

    In C, there are three functions for outputting data (roughly):

    1. put
    2. print
    3. write

    Given only those three names, can you guess what the three functions are for inputting data (again, roughly)?

    1. get
    2. scan
    3. read

    Put more plainly, the language actually feels like it was designed coherently. Think about $FAVORITE_LANGUAGE: is it so well-designed?
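
    To make the symmetry concrete, here is a small sketch pairing those standard-library names (a rough sketch only: fgets stands in for the unsafe gets, and write/read come from POSIX rather than ISO C):

    #include <stdio.h>     /* puts/printf and fgets/scanf */
    #include <unistd.h>    /* write/read (POSIX, not ISO C) */

    int main(void) {
        char line[64];
        int n;

        /* output: put / print / write */
        puts("hello");                            /* put: a whole string plus a newline */
        printf("pi is roughly %.2f\n", 3.14);     /* print: formatted output */
        write(STDOUT_FILENO, "raw bytes\n", 10);  /* write: raw, unbuffered bytes */

        /* input: get / scan / read -- the mirror images */
        if (fgets(line, sizeof line, stdin))      /* get: a whole line (bounded) */
            printf("you typed: %s", line);
        if (scanf("%d", &n) == 1)                 /* scan: formatted input */
            printf("you entered %d\n", n);
        char buf[16];
        ssize_t got = read(STDIN_FILENO, buf, sizeof buf);  /* read: raw bytes (mixing this
                                                               with stdio is just for
                                                               illustration) */
        (void)got;
        return 0;
    }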

    More substantively, C’s arrays get a lot of flak, but the flak they get really is not their fault (the most common complaint being that they don’t have bounds-checking). Here’s the thing: when you think of a “C array”, what you’re probably thinking of doesn’t exist. When you do something with an array in C, you’re actually doing something with a pointer that points into an array. Pointers (in C) are just numbers—you cannot expect numbers to know about the ends of arrays (though that is a solid argument for why we should do pointers differently in other languages).

    To drive the point home, arr[2] doesn’t actually index an array (named arr) to its third element. What is actually happening is that you have added the offset 2 to a pointer (named arr) and dereferenced it. More simply, arr[2] is identical to *(arr + 2). In fact, [] is actually just syntactic sugar for this offset-and-dereference, and as a result, the following is legal (and functions): 2[arr]. This is why *arr is equivalent to arr[0] (and actually, the primary reason C counts from zero; pretty much every other language that counts from zero does so out of tradition, not reason (which is why some languages like Lua—which is another tremendously internally consistent language—abandoned it)).
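
    A quick sketch of that equivalence (the array name and contents here are arbitrary):

    #include <stdio.h>

    int main(void) {
        int arr[] = { 10, 20, 30, 40 };

        /* all four expressions denote the same element */
        printf("%d %d %d %d\n", arr[2], *(arr + 2), 2[arr], *(2 + arr));   /* 30 30 30 30 */

        /* and *arr is just arr[0] */
        printf("%d %d\n", *arr, arr[0]);                                   /* 10 10 */
        return 0;
    }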

    Now, when you manipulate a struct, you are doing something very similar to what happens with arrays. In fact, sct.nme is essentially *(sct + nme) (given that nme is really just a name for an offset). Further, psct->nme is exactly the same as sct.nme except that it dereferences its left operand first: *((*psct) + nme).
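
    Here is a rough illustration of that view, spelled out with offsetof rather than the shorthand above (the struct and field names are made up):

    #include <stdio.h>
    #include <stddef.h>   /* offsetof */

    struct point { int x; int y; };

    int main(void) {
        struct point pt = { 3, 7 };
        struct point *ppt = &pt;

        /* pt.y is "the base address of pt, plus the offset of y, dereferenced" */
        int *via_offset = (int *)((char *)&pt + offsetof(struct point, y));
        printf("%d %d\n", pt.y, *via_offset);   /* 7 7 */

        /* ppt->y just dereferences the left-hand operand first */
        printf("%d %d\n", ppt->y, (*ppt).y);    /* 7 7 */
        return 0;
    }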

    This is an incredible level of internal consistency that almost no other language out there that I have looked into seems to have. I would love to use something safer (I’ve looked a lot at Haskell), but I just cannot separate myself from the amazing level of consistency.

    How about C’s type system? There are tons of stricter type systems (like those of Haskell, Idris, Ada, Java, Python and a laundry list of others). But those type systems have a different goal than C’s. Those systems are there to make the language, and the code you write, safer and more reliable. C’s type system has nothing to do with safety. Its whole reason for being is to simplify work for you. Using bitwise operators, pointers and */&, you can actually do essentially everything (you can still do this with the void * type if you’re feeling adventurous). The point of C’s type system is to make your life simpler—to give you shortcuts for integral arithmetic so that you don’t have to implement it yourself using bitwise operators.
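
    As a toy illustration of that “shortcut” framing, here is addition built out of nothing but bitwise operators (add_bitwise is a made-up helper, not anything standard):

    #include <stdio.h>

    /* a toy adder built only from bitwise operators */
    static unsigned add_bitwise(unsigned a, unsigned b) {
        while (b != 0) {
            unsigned carry = a & b;   /* bits that would carry */
            a = a ^ b;                /* sum without the carries */
            b = carry << 1;           /* shift the carries into place */
        }
        return a;
    }

    int main(void) {
        printf("%u %u\n", add_bitwise(19, 23), 19u + 23u);   /* 42 42 */
        return 0;
    }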

    I love the notion of type-safety. I love that Haskell’s type system gives me a ton of guarantees about the code that I write. I love that if I have an Idris program with %default total enabled (and I adhere to it), once it compiles, it is essentially mathematically proven to always be correct. But I also love that C was designed with no regard—whatsoever—for safety, but instead for consistency, flexibility and power. “Trust the programmer” might be outmoded, but it’s still something I appreciate.

    1. 11

      Is it a really unpopular opinion? (On this forum, of all forums?)

      (edit: while I’m here, I feel like a lot of the assertions about arrays/pointers/structs you’re giving fall down in the face of aliasing rules. C isn’t so straightforward after all.)

      1. 6

        I feel like it is (though it may not be). I see far more articles condemning C than offering anything positive about it (but again, that may very well just be my experience).

        As for aliasing rules, they are not there to offer you any type-safety (that isn’t their goal whatsoever). The goal of aliasing rules is to make life easier for compiler writers (particularly those who work on optimizations). You can easily get around all the aliasing rules using -O0 (or better yet, -fno-strict-aliasing on a compiler that supports it), or, if you want to be standards-compliant, there are two cases where you can always alias (both are sketched in the example after this list):

        1. when you are aliasing to a char * (any type can be aliased to a char * safely)
        2. when you are doing your aliasing through a union (which exists exclusively for the purpose of punning the type system).
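
        A minimal sketch of both escape hatches (the union and variable names are just illustrative, and the exact bit pattern printed depends on the platform’s float representation):

        #include <stdio.h>

        union pun {              /* case 2: aliasing through a union */
            float    f;
            unsigned bits;       /* assumes float and unsigned are both 32 bits here */
        };

        int main(void) {
            float x = 1.0f;

            /* case 1: any object may be inspected through a char pointer */
            unsigned char *bytes = (unsigned char *)&x;
            printf("first byte of x: %02x\n", bytes[0]);

            /* case 2: type punning through a union */
            union pun p;
            p.f = 1.0f;
            printf("bit pattern of 1.0f: %08x\n", p.bits);   /* 3f800000 on IEEE-754 */
            return 0;
        }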

        So, I am not sure which assertions of mine you think fall down in the face of aliasing rules.

        However, I would go further; I am not sure I would claim that C is straightforward. There are all sorts of incredibly unfortunate choices that went into the language that, were I to redo it now, I would change. The biggest one is that I would radically curtail the amount of undefined behavior present in the standard.

        But then, I would not even say the things I talked about above actually make C straightforward; they’re just internally consistent. It’s not that they make C easier to understand or more approachable—just that they offer you a sense of cohesion once you understand it.

        1. 6

          It’s not that they make C easier to understand or more approachable—just that they offer you a sense of cohesion once you understand it.

          I think this is right. I don’t suffer the same disquiet when visiting C as when visiting e.g. C++; mostly this is because C is one of the first languages I used in anger, of course, but also because of this consistency.

          1. 1

            I’m not making any statement about types or type-safety; what I mean is that assertions like “More simply, arr[2] is identical to *(arr + 2)” fall down in the face of analyses that C compilers are allowed to make by the spec; the paper “Subtleties of the ANSI/ISO C standard” conveys this wonderfully:

            #include <stdio.h>
            #include <string.h>   /* memcmp */
            int main(void) {
                int y = 31, x = 30;
                int *p = &x + 1, *q = &y;   /* one-past-the-end of x vs. the address of y */
                if (memcmp(&p, &q, sizeof(p)) == 0) {   /* the two pointers are bitwise identical... */
                    *p = 10;
                    printf("%d %d\n", *p, *q);          /* ...and yet this can print "10 31" */
                }
                return 0;
            }
            

            The above will readily print “10 31” when compiled with gcc -O1 (varies per version, but it’s spec-conformant). As the paper quotes:

            Implementations are permitted to track the origins of a bit-pattern and treat those representing an indeterminate value as distinct from those representing a determined value. They may also treat pointers based on different origins as distinct even though they are bitwise identical.

            This is great fun! But it also means you can’t hold that arr[n] is equivalent to *(arr + n) for all values of arr and n, at least not transitively.

            Sorry if I’ve missed your point here! I’m just reminded — particularly by papers like these — that there’s an awfully deep rabbit-hole one can find oneself at the bottom of sometimes, even in something as apparently consistent as C. Stuff like this ruins the sense of cohesion I find myself building up in usual programming work.

            1. 1

              Sorry for the late reply; I just saw this comment.

              First of all, let me say that I fibbed a little. arr[n] is not actually identical to *(arr + n). In fact, you will usually find that the code generated for the two is different (of course, typically, the difference is just whether the offset happens before or after the dereference, but it is still a difference, despite being minimal). Having said that, assuming arr and n do not change types between arr[n] and *(arr + n), they should yield the same result unless arr is already a pointer type—casting between T[] and T* is only an issue if T is already a pointer type (since pointers-to-arrays can get weird).

              Second, yep, pointers-to-arrays get weird :) Again, I would not make the claim that C is perfect or that it is perfectly consistent; there are definitely edge cases where all manner of odd things happen. But the line where that happens (at least in my opinion) is so much further down the line than it is in other languages, that it’s kept me happy for years. In fact, in the many years now (I’m getting old D:) that I’ve been using C as my primary language, I have only ever run into those edge cases once or twice, and usually it was because I was doing something I shouldn’t have been doing anyway :P

              Finally, I just want to say that when people critique C, it doesn’t bother me; there are plenty of things about C that should be changed in C2x, and unless we have honest and level-headed critiques of C, those changes won’t happen. The only reason I felt the need to post something positive in the manner that I did was because I usually feel like the critiques I read are made by people who don’t actually understand what they are critiquing and are more railing about how C is old than about how it could be made better. Even drawing the conclusion that C should be completely abandoned would be okay in the face of a really solid alternative proposition (actually, I’m hoping Idris will play that role eventually, but it still has a long way to go before it’s ready for that).

        2. 6

          This is a great post.

          C is arcane from a modern perspective, but it’s minimalistic in an attractive way. It’s far more structured than assembly, but has all the capability. You don’t have a runtime intermediating between you and the machine.

          I would guess that, when C was designed, short programs and command-line tools were a lot more common than large, multi-developer projects. I may be going out on a limb here, but I suspect that programs were smaller in 1970 and that multi-developer projects were the exception rather than the rule. Resource constraints were also more of a pressure than safety, and adversarial users weren’t even in the purview of most systems, because the Internet was in its infancy and did not exist in a form that the modern world would recognize.

          For its time and original purpose, C was great. It’s still a great language at its level of abstraction, if used by people who know what they’re doing. And, moreover, we know that very reliable, safe programs can be written in C. JPL does so. It’s just not how 2016-era web startups do things; for them, it may not be economical.

          1. 3

            I agree with your judgements on C the language, but I think your statements about large multi-developer projects and adversarial users might be off the mark.

            One point of reference is that UNIX and C were born from the Multics project—a large, multi-developer project. Multics itself had design goals which included high availability and security.

            Beyond that, Bell used Unix and C in its network management systems and switches. Reading old BSTJ issues is an exercise in depression at how much we, as an industry, forget and relearn.

            1. 1

              Unix was, of course, a multi-developer project. Bell Labs had the resources for such things. What I meant to say was that typical commercial software outfits were (presumably) less likely to do multi-developer projects.

              You certainly can use C for multi-developer projects. It was done then and it’s done now. It’s expensive, but quite possible. However, I think that in 2016, the concept of “program” more often brings to mind a large single-program project. As for whether that’s a good thing or not, that’s another discussion. I tend to think that management pathologies are not a small part of the Big Code syndrome.

          2. 1

            I don’t understand your definition of consistency. Yes, arrays and structs are implemented with pointers, but I don’t see what’s consistent about that. Almost all languages implement most of the language in terms of simpler constructs in the language. Almost all of every language is convenience. What is it you find inconsistent about e.g. Haskell?

            1. 3

              In some respects, after getting familiar with the λ-calculus, Haskell feels quite consistent to me. In others, far less so than I would wish.

              Starting again with that superficial example: this is actually something that almost every language I know of gets wrong. In Haskell, the canonical way of outputting text is putStrLn (which makes some sense: “put a String ending in a new Line”), but the canonical way of inputting text is getLine. put and get are opposites and correctly aligned, so that’s a good start (better than many languages), but since getLine gets a String delimited by a Line, why isn’t it getStrLn? Haskell also has arbitrary redefinition, so I could actually just “fix” this oversight in my own code if I wanted—though everyone would immediately hate using my code :P

              Next up, Haskell, despite being touted as a much safer language than many others, still does not have a total Prelude. The most obvious example of this is head, which will throw an exception if you pass it an empty list. That actually makes some sense, but it segues into my larger point for this section. Haskell has a massive array of available ways to throw exceptions or errors (e.g., the error function, explicitly using undefined for unhandled cases, type-checked exceptions from something like control-monad-exception, and so many others). But built into the language’s type system, there are two fully-functional systems for handling possible failure (namely Maybe and Either) that make me wonder whether any of these are necessary.

              In fact, Idris (which does have a total Prelude) offers two versions of several of its Prelude functions—one which uses dependent types to accomplish totality, and the other which simply uses Maybe or Either. Haskell is just starting to get really solid dependent typing and I’ve barely gotten my feet wet with how Haskell does it, but it does not surprise me that Haskell’s Prelude does not take advantage of such things. However, it really does surprise me that Haskell’s Prelude doesn’t default to the Maybe/Either variants, so that it defaults to totality rather than partiality.

              Furthermore, where in C it’s not just that structs and arrays are implemented from pointers—it’s that everything in C is reducible to only a few ideas that all work together—Haskell’s construction feels like it was not remotely as well thought out. For more of what I mean here, let’s talk about strings. C gets a lot of flak for not having a real string type, but most of that flak (as far as I can tell) really comes from C manipulating arrays through pointers, and from C’s strings using an unenforced encoding (the NUL terminator). With only a couple of changes, I would be very happy with C strings. Namely, char literals in C (now) actually are ints—on 32-bit Linux, they will be 32 bits wide—but C strings are canonically char [] or char *. If C2x added a new type glyph or rune that was guaranteed to be UTF-32, and strings became glyph [] or rune * (or whatever), that would be plenty enough for me to be happy with C strings.
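
              As a rough sketch of what that might look like, using C11’s char32_t and U"" string literals as a stand-in for the hypothetical glyph/rune type:

              #include <uchar.h>   /* char32_t (C11) */
              #include <stdio.h>

              typedef char32_t rune;   /* stand-in for the wished-for UTF-32 type */

              int main(void) {
                  const rune *greeting = U"héllo";   /* each element is one UTF-32 code point */
                  size_t n = 0;
                  while (greeting[n] != 0)           /* still NUL-terminated, like char strings */
                      n++;
                  printf("%zu code points\n", n);    /* prints 5, even though é is multi-byte in UTF-8 */
                  return 0;
              }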

              Haskell’s strings, on the other hand, have so many problems. First of all, I understand that linked lists are an incredibly powerful and simple data structure which lets you implement most other aggregate types. I also understand that they are kind of the bread-and-butter of λ-calculus-based languages and that they offer special advantages in lazy languages (à la infinite lists). But is there really a reason for Haskell’s String type to be based on them? In fact, almost anyone using Haskell in production will tell you that you shouldn’t use String (a synonym for [Char]); they will tell you to use Data.Text because it is performant. Data.Text is not in the Prelude, and actually conflicts with many Prelude-exported functions, so now you have to deal with namespacing just to have solid strings. Also, there are two variants of Data.Text for different use-cases, so now you have to decide which one you want (I just want to be clear, I like that these choices exist; I just feel like it is an oversight that a solid generic choice was not made for the Prelude).

              On Haskell’s lists more generally, the [] syntax is special syntax for lists themselves, despite the fact that the Haskell community has begun to realize that lists really are not the greatest thing out there and that the Prelude should become more general (so that other things that are list-like can have their day). FTP landed, but because the syntax has been reserved for lists, you now have to enable a language pragma to allow other solid list-esque data types to use such clean syntax.

              I really like Haskell, and I have used it (and will continue to use it) in production on a few systems because it is so good at certain things. But the direction and design of the language is far less cohesive than C’s (part of that is actually just because Haskell continues to grow and be refactored incrementally—which is not bad). I also want to say very clearly that this post is not meant to rag on Haskell (because, again, I actually really like the language); many of these design inconsistencies are present in many languages—Haskell is just an example that I use because I am relatively familiar with it and because it was mentioned both in my OP and in the parent’s response.


              For a quick, non-Haskell example, let’s talk about APL. I love the idea behind APL, but there were some decisions made in its creation that I dislike. For example, one of APL’s major ideas is that the orthography of the language can itself be a tool, so it used Unicode characters to create incredibly concise representations of ideas. Awesome! Except it disallowed users from using Unicode characters for user-defined functions, so user-defined functions can never be as concise or well-represented. Or how about / and \? In APL, / is the cognate of Haskell’s foldr1. It is totally understandable that a right-side fold would be chosen, because right-side folds can be simpler to implement (and because APL evaluates lines from right to left instead of having an operator precedence order). However, \ (the scan/expand operator) is a left-side scan. In essentially every other functional language you will find, scans are defined in terms of their same-side fold counterpart. So, in Haskell, scanr1 is defined in terms of foldr1 and scanl1 is defined in terms of foldl1. To get the equivalent of APL’s \ operator in Haskell, you have to jump through a ton of hoops.


              Now, C is by no means a perfect language (not even close). But it does manage this incredible level of cohesiveness, so that once you grasp it, there is a certain zen about the language that I have yet to find in another (Lua is actually really close, but there are a few more practical issues I have with it). This does not stop me from liking or using other languages—but it means that C will likely always have a place for me :)

              1. 2

                Eh. I just don’t feel that sense of cohesion. To be honest, the biggest reason I find C unusable is that it lacks a native tagged union type, and so you end up with inconsistent implementations of that - I guess if you’re not using that kind of style, maybe it looks more cohesive? But even then there are things like: all the control flow constructs are special cases built into the language (and therefore they’re not values, except setjmp/longjmp); all the operators are special cases that only exist for the built-in types (the precedence table is longer than in most languages and includes weird operators such as ,); overflow behaviour varies depending on signedness, and the rules for type promotion when mixing signed and unsigned are surprising; IIRC returning functions is somehow different from passing them…
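
                For what it’s worth, a typical hand-rolled version looks something like the following sketch; all of the names are made up, and the fact that every codebase invents its own variation of this is exactly the inconsistency being described:

                #include <stdio.h>

                enum shape_kind { SHAPE_CIRCLE, SHAPE_RECT };

                struct shape {
                    enum shape_kind kind;       /* the tag */
                    union {                     /* the payload */
                        struct { double r; }    circle;
                        struct { double w, h; } rect;
                    } as;
                };

                static double area(const struct shape *s) {
                    switch (s->kind) {          /* nothing forces this switch to be exhaustive */
                    case SHAPE_CIRCLE: return 3.14159265358979 * s->as.circle.r * s->as.circle.r;
                    case SHAPE_RECT:   return s->as.rect.w * s->as.rect.h;
                    }
                    return 0.0;
                }

                int main(void) {
                    struct shape c = { .kind = SHAPE_CIRCLE, .as = { .circle = { .r = 2.0 } } };
                    printf("%f\n", area(&c));
                    return 0;
                }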

          3. 9

            I don’t think the “X is terrible” rhetorical device serves to really tell us anything about X. I get the idea of a flashy header: it’ll get more reads, generate more discussion, etc.

            I think it’s more useful to focus on the tradeoffs inherent in using a tool.

            Inheritance gives us a way to treat objects polymorphically and enforces a hierarchy of specialization on those objects.

            The tradeoffs for this behaviour are:

            • high cost of maintenance for any children of a given class (in this case the same as any breaking change to an interface)
            • any further extension relies on changing parents or using interfaces and composition.

            • by contrast interfaces and composition can achieve the same behaviour without relying on some external pattern.

            That doesn’t make it a useless pattern; it just makes it useful in a few limited cases.

            • With exceptions, it allows us to create ‘families’ of exceptions which are useful for debugging. This would be pretty tricky to pull off with composition or interfaces, since the programmer would have to implicitly encode that hierarchy and ensure they’ve extended correctly.

            • In Ruby, it (for better or for worse) allows us to add behaviours to all objects or all objects of a given class.

            It would be pretty difficult to implement an OO language without single inheritance, and for code that, once stable, is rarely modified, the maintenance costs can be amortized over millions of person-hours.

            No tool or technique is universally good, but I think avoiding a tool at all costs also prevents us from learning its strengths and weaknesses.

            1. 7

              I’d argue that Go is a successful OO (or maybe OO-lite) language without inheritance. It has struct embedding, which forwards method calls from a type to one of its embedded types, but no concept of an abstract base class or a super keyword.

              1. [Comment removed by author]

                1. 1

                  In Haskell, virtually anything you can express with subtyping can be expressed by a typeclass hierarchy

                  Now, give me a list of different types that all implement typeclass A. You probably have to resort to existential types or scrap your type classes [1].

                  [1] https://lukepalmer.wordpress.com/2010/01/24/haskell-antipattern-existential-typeclass/

                2. 2

                  Regarding inheritance in OO, it is good to note that there is also prototypical inheritance, which, oddly enough, is the base model for JavaScript. Io is a notable modern language built on prototypical inheritance; Self is probably the oldest/most interesting one. Prototypical inheritance is quite interesting because it is simple and versatile, and you can build single inheritance on top of it.

                  1. 2

                    I think that, rather than talking about whether X is terrible or not, the discussion should be about whether the feature is important enough to be included as a concrete concept in a language. For example, you can implement inheritance yourself with a record and some functions, so the question is whether it is important enough to have a language construct that, more or less, creates that struct for you.
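
                    For example, a bare-bones version of that record-plus-functions encoding in C might look like this (all names are illustrative):

                    #include <stdio.h>

                    struct animal {
                        const char *name;
                        void (*speak)(const struct animal *self);   /* "virtual" method */
                    };

                    struct dog {
                        struct animal base;   /* base record embedded first, so upcasts are safe */
                        int good_boy;
                    };

                    static void dog_speak(const struct animal *self) {
                        const struct dog *d = (const struct dog *)self;   /* downcast */
                        printf("%s says woof (good boy: %d)\n", self->name, d->good_boy);
                    }

                    int main(void) {
                        struct dog rex = { { "Rex", dog_speak }, 1 };
                        struct animal *a = (struct animal *)&rex;   /* upcast */
                        a->speak(a);                                /* dynamic dispatch by hand */
                        return 0;
                    }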

                    1. 1

                      I’m a big fan of having less; for most languages under debate, though, that ship has long since sailed.

                      1. 3

                        Sure, but these discussions are never about whether it will be removed from a language or whether developers will stop using it; they’re just about thinking through the value of things. Every once in a while a language like Go gets momentum and people have a chance to reset things a little bit (regardless of whether one likes Go or not, it’s a popular language that dials back some existing popular features).