  2. 12

    Not that I disagree entirely, but I think this post misses the mark slightly. It seems to be advocating for no hidden details. However, like @teiresias says in a sibling comment, abstraction (which is a vital part of programming) in and of itself is all about hiding details. Hell, any function call or macro can hide details you may care about when debugging.

    No matter what language one chooses, the user of said language should be aware of which details could be hidden and where to look for them. C may not hide certain aspects compared to other languages, but then C arguably hides plenty of other details that more modern languages make explicit. If the author is comfortable with the details that C hides, great.

    1. 3

      Agreed.

      From my perspective, having special operators with special rules in the language that cannot be replicated by user code is a disadvantage, because reducing the amount of magic in the language is more important to me than letting people guess implementation details.

      If users have to worry about implementation details, the implementer has done a poor job, which leads us to …

      Some languages do this better, by giving macros an explicit syntax like name!(args…) […]

      … which is another example of things I consider a bad idea: If the user has to know whether something is a method call or a macro invocation, the macro author has failed.

      As a user, I shouldn’t have to worry about implementation details, and therefore Rust’s ! requirement is wrong and broken.

      1. 1

        Totally agreed. As an example of a tool for abstraction, let’s take polymorphism:

        Claim: Polymorphic code can make it easier to understand what something is doing.

        For example, if you’re using Semigroup operators, you don’t need the “mydatatype_concat” name. So the abstraction actually frees up mental space, and the reader gains insight because they can work with abstractions they already know when reading the code. They already know the Semigroup laws; if they’re lucky, they don’t even need to look up the definition.

        I am sure Drew wouldn’t like it because he thinks he has to do the instantiation manually, but with Haskell IDE Engine that is not necessary: you can have the method call link to the implementation used, even if it’s just for this particular call site’s instantiation.

      2. 10

        I’ve made exactly this argument before about C++ (with exactly that example), but over time my opinion has changed somewhat. Sure, in C, I know more about what a+b does, but I don’t know more about what add(a, b) does. In a complex project, you’re likely to see a lot of things like the latter that, when you actually go and look, turn out to be some hand-rolled C implementation of a vtable, without any type safety, doing some form of dynamic dispatch; and those things look exactly like a normal C function call.
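
        To make that concrete, here is a minimal sketch (entirely invented names, not from any particular codebase) of the kind of hand-rolled dispatch I mean; the call at the bottom reads like any other C function call:

        #include <stdio.h>

        struct shape;
        struct shape_ops { double (*area)(const struct shape *); };
        struct shape { const struct shape_ops *ops; /* hand-rolled vtable pointer */ };
        struct square { struct shape base; double side; };

        static double square_area(const struct shape *s) {
            /* unchecked downcast: no type safety whatsoever */
            const struct square *sq = (const struct square *)s;
            return sq->side * sq->side;
        }
        static const struct shape_ops square_ops = { square_area };

        /* Reads like a plain function call; actually dynamic dispatch. */
        static double area(const struct shape *s) { return s->ops->area(s); }

        int main(void) {
            struct square sq = { { &square_ops }, 3.0 };
            printf("%f\n", area(&sq.base)); /* prints 9.000000 */
            return 0;
        }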

        In short, I think a good rule of thumb is that the fewer lines of code I need to read to understand what one line of code does, the better; but that’s a property of the codebase far more than it is of the language. In C, you typically need a bunch of macro indirections to build the kinds of things that C++ gives you. Finding where certain things in the Linux kernel, for example, were implemented involved reading five different files and untangling multiple levels of macro instantiation. In C++, equivalent functionality in other codebases that I’ve worked on required finding a using directive to see what a type alias was defined to be, and then going and looking at that concrete type. For extra fun, a modern code editor often has functionality to jump through that middle level of indirection in C++, whereas the equivalent in C is impossible, because it’s bespoke macro magic and not a C language construct.

        1. 9

          If this were written in C, without knowing anything other than the fact that this code compiles correctly, I can tell you that x and y are numeric types, and the result is their sum. I can even make an educated guess about the CPU instructions which will be generated to perform this task.

          x could be a pointer and y could be an int, and a pointer isn’t a numeric type. (Though on most architectures it kind of is: it stores an address, which is just a numeric index into memory. FWIW I never fully grokked pointers until I learned assembly.)
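
          A quick illustration (my own snippet, not the article’s): this x + y compiles fine, but it is pointer arithmetic rather than a numeric sum:

          #include <stdio.h>

          int main(void) {
              int arr[8] = {0};
              int *x = arr;
              int y = 3;
              /* x + y advances by y * sizeof(int) bytes - scaled, not numeric. */
              printf("%p vs %p\n", (void *)x, (void *)(x + y));
              return 0;
          }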

          C hides a lot of details too. That’s what programming languages at a higher level than assembly are all about: abstracting over details.

          Regardless of what language you write in, it’s good to be aware of the performance characteristics of your code. And it’s totally possible to cultivate that sort of awareness, even in the face of things like operator overloading. Start by learning some assembly.

          1. 7

            Not to mention that even if they’re numeric types, then “the result is their sum” can still fail to be true - IIRC:

            • if the types are unsigned, this line could overflow, and the result would then be the sum modulo 2^N rather than the true sum;
            • even worse, if the types are signed, this line could invoke undefined behavior, and thus a spooky eldritch Cthulhu action crawling slowly throughout your app and devouring any logic (a minimal sketch of both cases follows just below).
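
            A minimal sketch of both cases (my own snippet; the wraparound comment assumes a 32-bit unsigned):

            #include <limits.h>
            #include <stdio.h>

            int main(void) {
                unsigned a = UINT_MAX, b = 1;
                printf("%u\n", a + b); /* well-defined: wraps to 0, i.e. the sum modulo 2^32 */

                int x = INT_MAX, y = 1;
                printf("%d\n", x + y); /* undefined behavior: signed overflow */
                return 0;
            }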

            Also, the malloc line does not check the returned value, so:

            • in case you run close to an out-of-memory condition, the strcpy and strcat could overwrite memory, also doing spooky eldritch Cthulhu corruption action;
            • in case x or y come from your users, they could be maliciously crafted such that strlen could overflow int, leading again to spooky eldritch Cthulhu corruption action, security breach, etc. (I think; I’m not 100% sure some other effect doesn’t accidentally neutralize this - I don’t care enough about C anymore to track precisely what would happen in such a case.) A hedged reconstruction of the pattern in question follows below.
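
            The article’s code isn’t quoted in this thread, but the pattern being criticized is presumably something like this reconstruction (hypothetical; concat and its details are my guess, not the author’s actual snippet):

            #include <stdlib.h>
            #include <string.h>

            char *concat(const char *x, const char *y) {
                /* The length arithmetic can wrap for adversarial inputs (worst on 32-bit). */
                char *s = malloc(strlen(x) + strlen(y) + 1); /* NULL never checked */
                strcpy(s, x); /* if s == NULL: crash, or worse */
                strcat(s, y);
                return s;
            }
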
            1. 1

              in case you run close to an out-of-memory condition, the strcpy and strcat could overwrite memory, also doing spooky eldritch Cthulhu corruption action;

              If there is no memory, it returns NULL - this is probably just gonna segfault, not corrupt anything.

            2. 5

              I think this comment, and @akavel’s follow-up, are both missing the point. Yes, it’s more complex than my brief summary affords. However, the action is well-defined and constrained by the specification, and you can learn what it entails, then apply that knowledge to your understanding of all C programs. On the contrary, for languages which have operator overloading, you can never fully understand the behavior of the + operator without potentially consulting project-specific code.

              1. 3

                I totally get the point of “spooky action at a distance”, and totally agree with it; I learnt of this idea in programming quite some time ago and have looked at code that way ever since. I also honestly appreciate the look at #define in C as a feature that can be said to exhibit a similar mechanism; I never thought of it like this, and I find it a really cool and interesting take.

                Yet I still find the claims this article makes about C’s simplicity highly misleading, especially in this context of subtle hidden difficulties. If it at least provided some disclaimer footnote (or whatever other form works, given Gemini), I could swallow this. But as shown in the comments, some of the sentences are currently just plain wrong.

                Let me phrase this differently: I believe this could be a great article, one I could recommend to others. Currently, however, taken as a whole, I see it as an okay-ish article: some good points, some false claims. I don’t plan to advertise it to others, given the dangerous and unfortunately seductive ideas it conveys “piggybacked” on the main argument (e.g. that “C is easy and simple”), especially to non-expert or would-be C programmers. I lament that the article is not better, and I’m painfully aware that I now need to actually warn some people against naively succumbing to the oversimplifications it presents - because I’ve seen so many unaware and sincerely gullible people, and I myself was unaware for ohhhh so long.

                1. 1

                  I don’t think the omission is consequential to the point of the article. In fact, it criticizes C for its role in this “spooky action at a distance”. The point is not to explain how C works.

                  1. 2

                    Given that this reply basically rehashes parts of what both of us already wrote above, and that I apparently read some parts of the article differently than you do, I assume that neither of us, nor anyone else, is likely to gain anything from further discussion of this matter.

                2. 1

                  What C code that looks free of function calls actually does depends on the platform as well; for instance, see:

                  https://godbolt.org/z/jsWqzM

                  Here binary operations on 64-bit operands compiled as 32-bit code may result in function calls. The other example is initializing a structure of a certain size, which may also result in a function call.
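
                  As a minimal example of my own: compiled for a 32-bit target (e.g. cc -m32), this division typically becomes a call to a runtime helper such as libgcc’s __udivdi3 - a function call with no call syntax anywhere in the source:

                  #include <stdint.h>

                  /* Try cc -m32 -S and look for "call __udivdi3" in the output. */
                  uint64_t div64(uint64_t a, uint64_t b) {
                      return a / b; /* may compile to a library call on 32-bit targets */
                  }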

                3. 1

                  Right. Try this for fun,

                  cat <<'.' | cc -x c - && ./a.out && rm a.out
                  #include <stdio.h>
                  /* For a null T* x and int y = 1, x + y prints sizeof(T), not 1:
                     same '+', different arithmetic. P(void) relies on the GNU
                     extension that treats sizeof(void) as 1. */
                  #define P(T) do { T *x = 0; printf("%lx\n", (unsigned long)(x + y)); } while (0)
                  int main(void) { int y = 1; P(char); P(short); P(int); P(long); P(void); return 0; }
                  .
                  
                4. 7

                  That sounds suspiciously like one of the main motivations for Zig; from the Zig main page:

                  Focus on debugging your application rather than debugging your programming language knowledge.
                  
                      No hidden control flow.
                      No hidden memory allocations.
                      No preprocessor, no macros.
                  

                  @ddevault, any thoughts there?

                  1. 2

                    Zig does well in this regard, but comptime - one of my biggest complaints about Zig - betrays this somewhat.

                    1. 3

                      Really honestly curious: what aspect of comptime are you thinking of here? I haven’t dived deep enough into Zig yet to have my own solid opinions, so I’m really interested, esp. in potential pitfalls to watch out for.

                      1. 2

                        It’s hard to reason about when a line of code will run - at runtime? At build time? What consequences will that have on the program’s behavior? What if one of those consequences leads to, for example, a timing attack?

                        1. 7

                          Why do you need to know whether a line of code will run at runtime or compile-time? Even with C and similar languages the optimizer may turn a runtime-known value into a compile-time known value. If you’re concerned about a timing attack, it sounds like you would be interested in this issue, which, again, plagues all C-like languages.

                          1. 2

                            I wonder whether Zig could support E-style auditors, which are objects that can prove arbitrary properties about other objects. During an audition, an auditor receives an AST for an object, and the binding guards for incoming values (but not the values themselves). This sounds abstract and fancy, but for Zig, the binding guards would ultimately just be type annotations; Zig-compatible auditors would simply receive a typed AST for a routine.

                            1. 2

                              I don’t want to get into the weeds with you about Zig today, but I think it’s fair to state that the comptime semantics of Zig far exceed those of e.g. C, otherwise they would not be a compelling value-add over C when pitching Zig. And I’ll mention that adding a language-level feature specifically to address constant-time operations is an unpleasant solution to my sensibilities.

                              1. 3

                                adding a language-level feature specifically to address constant time operations is an unpleasant solution to my sensibilities

                                For any optimizing / code-transforming compiler, you need some feature to opt out in some way in order to guarantee constant time operation. Otherwise your code as written can be changed in ways that break your constant time guarantees.

                                Inline assembly solves the problem, albeit in an architecture-dependent way. Arguably this could be a linker feature rather than a language-level one. Regardless, you need something.
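
                                For example, one common trick (sketched here with a GCC/Clang-style empty asm statement as an optimization barrier; the exact incantation varies by compiler) is to launder the accumulator so the compiler cannot short-circuit the loop:

                                #include <stddef.h>

                                /* Constant-time comparison: no early exit, no data-dependent branches. */
                                int ct_memcmp(const unsigned char *a, const unsigned char *b, size_t n) {
                                    unsigned char diff = 0;
                                    for (size_t i = 0; i < n; i++) {
                                        diff |= a[i] ^ b[i];
                                        /* Barrier: the compiler must assume diff was changed here. */
                                        __asm__ volatile("" : "+r"(diff));
                                    }
                                    return diff != 0;
                                }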

                        2. 1

                          Would you say it’s not worth the tradeoff?

                          1. 1

                            Yes.

                            1. 1

                              Isn’t that up to you to decide within the context of your specific constraints?

                        3. 7

                          And what about mathematics? You see an equation like “C = A + B” and you don’t know whether those are integers, real numbers, complex numbers, vectors, matrices, or some entity defined in the same text. They could be members of some arbitrary group, in which case you don’t even know if A+B is the same as B+A! Clearly mathematics like this has no place in science.

                          As kbknapp said in an earlier comment: programming is all about abstraction. The only way we can make ever more complex things understandable is to wrap the underlying entities in abstractions.

                          A language like C gives us a canned set of abstractions like numbers and pointers. It allows the most expressive and familiar operators, like “+” and “==”, to be used only with those types. Custom types have to use named function calls with prefix syntax. The effect is to make higher-level constructs less intuitive and much less compact, and to add a lot of custom vocabulary to them.

                          Note that later on C broke down and gave us one new abstraction, letting us use arithmetic operators on a new type: complex numbers. That’s a weird choice IMHO — I’ve never heard of C being a big language for scientific or mathematical programming, unlike FORTRAN or Python. Why didn’t they give us vectors and matrices, which are used heavily in domains like graphics and AI? Why is this decision locked into the language spec, instead of me being able to say “let us define an operation ➿, such that A➿B is …” the way a mathematician or scientist can?
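
                          For concreteness, here is that one carve-out in action (a small C99 example of my own); the operators work for _Complex only because the spec says so, and no user-defined type can ever join in:

                          #include <complex.h>
                          #include <stdio.h>

                          int main(void) {
                              double complex a = 1.0 + 2.0 * I;
                              double complex b = 3.0 - 1.0 * I;
                              double complex c = a + b; /* blessed by the standard, unlike any custom type */
                              printf("%f + %fi\n", creal(c), cimag(c)); /* 4.000000 + 1.000000i */
                              return 0;
                          }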

                          1. 1

                            programming is all about abstraction

                            Not really. This is especially false for any kind of programming that has to deal with a particularly gnarly aspect of reality, like systems programming or enterprise software development. All kinds of compromises are necessary to make things happen, and in both these (very different) examples abstraction is more often a problem than a solution.

                            When it comes to systems programming, as the article stated, you need to maintain a sense of what operations your computer ends up doing as a consequence of your programming, in a way that is not equally important in other fields. Abstractions that don’t prioritize this aspect will produce more harm than good.

                            1. 5

                              Let’s be specific about “abstraction” which is only interesting within context.

                              Assembly abstracts over the specific binary instruction format so that you can speak to the CPU in terms of what the instructions mean as opposed to what they specifically are.

                              High-level functional languages abstract over memory management, CPU instructions, and the general Turing model of computation, to allow the programmer to write business-logic rules that would otherwise be complex to encode in a language built on different abstractions.

                              Abstraction doesn’t mean “high level programming language with garbage collector”. It just means allowing the programmer to express computation in the specific level of detail that matters to their work.

                              1. 4

                                I disagree. It’s all abstractions, even embedded-device firmware and OS kernels. My day job is writing performance-sensitive bit-twiddling database code in C++, and I create lots of abstractions to keep from drowning. But I haven’t lost touch with what operations the computer ends up doing, especially when I have tools like dtrace, Instruments and a good disassembler.

                            2. 7

                              I like that this is an OCaml advocacy post in disguise :-).

                              1. 3

                                When I think of the phrase “spooky action at a distance” with respect to programming, the thing that always comes to my mind is mutable state. I know of no better analogy within programming to quantum mechanics’ “spooky action at a distance” than mutable state, though admittedly I know next to nothing about quantum mechanics. Mutable objects in programming seem a lot like objects that have been “quantum entangled”.

                                I think about this a lot when I have reason to share a piece of mutable state between two objects, such as when writing a unification-based type system (which one of my current projects includes). I find it interesting to think about how the type of some expression, as a type variable, can propagate down different branches of a program tree. Then at some point the type checker unifies the type on one branch to something (more) fully specified, and suddenly, spookily, the types of expressions in a distant branch are similarly specified.

                                Not that I think mutable state is inherently bad – it’s a great engineering tool that is often overly maligned by functional programming purists. But I do think, just like good design of countless other things in programming, exactly how to use mutable state with clarity rather than confusion requires good taste and judgment. (This is not a claim that I necessarily have the best taste and judgment.)

                                Now I’d like to nitpick a statement from the OP, even though others are commenting on the same statement:

                                x + y

                                If this were written in C, without knowing anything other than the fact that this code compiles correctly, I can tell you that x and y are numeric types, and the result is their sum.

                                In C, without using a particular compiler that specifies particular semantics beyond the standard, you cannot know the result (or even the behavior of the surrounding code or entire program!) of x + y without knowing the dynamic state of the running program, because x + y can result in undefined behavior. There is no programming language more spooky than C.

                                1. 2

                                  I don’t think it’s mutable state by itself that’s the problem, it’s aliased mutable state. In C, I can write code like this:

                                  int foo[64];
                                  int *a = foo;
                                  int *b = a + 42;
                                  b[2] = 12;
                                  

                                  And this changes the value of a[44], even though I never used the name a in my mutation. That’s action at a distance and, to me, a good language should provide some static type information to tell you when things might be aliased (neither C nor C++ does a good job here).

                                  Aliasing between concurrent execution contexts is the worst case of this. In C, there’s no protection against this at all and (worse) the language says it’s undefined behaviour if two threads concurrently update a variable and it isn’t _Atomic qualified. Shared mutable state is the worst possible kind of spooky action at a distance: you can step through a thread one instruction at a time and still see a result that you couldn’t explain from looking at that thread’s code. This is why Verona makes it impossible to express concurrently mutable state.
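
                                  To spell out that last rule, a minimal C11 sketch (my own; threads.h support varies by libc):

                                  #include <stdatomic.h>
                                  #include <threads.h>

                                  static _Atomic int counter; /* concurrent updates are well-defined */
                                  static int plain;           /* concurrent updates are a data race: UB */

                                  static int worker(void *arg) {
                                      (void)arg;
                                      for (int i = 0; i < 100000; i++) {
                                          atomic_fetch_add(&counter, 1); /* defined behaviour */
                                          plain++;                       /* undefined behaviour */
                                      }
                                      return 0;
                                  }

                                  int main(void) {
                                      thrd_t t1, t2;
                                      thrd_create(&t1, worker, NULL);
                                      thrd_create(&t2, worker, NULL);
                                      thrd_join(t1, NULL);
                                      thrd_join(t2, NULL);
                                      return 0;
                                  }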

                                  1. 1

                                    I agree, without mutable state a function can be efficient, or inefficient, but never really spooky.

                                    1. 1

                                      Regarding your mention of unification state propagating through different branches of the program tree: you may be interested in this paper, which defines a type system that’s strictly compositional (the type of an expression is based entirely on the type of its subexpressions).

                                    2. 2

                                      Let me know if there’s already a name for this phenomenon, but I’m noticing that there is an inherent trade-off between a language that is (for lack of a better term) “easy to write” and one that is “easy to read”:

                                      If you go all the way and make everything locally explicit, it’s going to be tedious to write. It will feel stupid to write boilerplate and redundant information that the compiler could figure out itself. At the same time, such code will be easier for readers to understand, because everything is spelled out and nothing is hidden out of view.

                                      And conversely, “clever” code with macros, generics, inference, sigils, and other kinds of syntax sugar may be pleasant to write, and perform a lot of good work with a few lines of code, but if overdone it becomes write-only.

                                      This tension between “too magical!” and “not magical enough!” keeps coming back.

                                      1. 2

                                        How much of this is tooling-related? If we were using editors that made it trivial to answer “what executes here? where does this go? where is this defined?”, would we still think this is “spooky”?

                                        1. 2

                                          I think tooling helps a lot, yes. Haskell IDE Engine, for instance, lets you see how a polymorphic function is instantiated.

                                          1. 1

                                            An excellent point. A great example is Causeway, a debugger for E which could trace execution across multiple distributed regions of computation, and understood asynchronous execution.