This is a really timely post. Yesterday after reading @garybernhardt’s revision of Linus’ email, I went down the rabbit hole of type punning, strict aliasing, and undefined behavior. Here are some more helpful articles:
John Regehr’s articles are always informative, as are Lattner’s. Cardelli’s is really helpful as far as understanding the value of type systems, how to think about them, and what they can and cannot do for you.
The article @GeoffWozniak posted here has helpful links in the footnotes as well.
Cardelli also worked on the Modula-3 systems language. It has one of the best balances I’ve seen of features vs. simplicity, along with fast compiles, safety by default, and concurrency support. It also had the first standard library with partial formal verification, using the precursor to Why3.
All it needed was macros to be a nearly perfect alternative to C++. Well, maybe C syntax and some compatibility too, given how important that turned out to be. I consider that mandatory these days if aiming for the C or C++ crowds.
This is the best write up on this topic that I’ve ever seen. It’s worth the read.
This is a typical illustration of the problem with the Standard. The rules are complex, totally unclear, easy to violate, and yet don’t provide the greater precision one would want. For example, it appears that you cannot write malloc/free in conforming C code, since the type of memory that is freed may then change as it is reallocated. Even worse, nobody is really sure of this because the definition of effective type is so opaque. The exception for character types was added in a hurry when the committee figured out it was impossible to write memcpy in C - it is a terrible hack used to escape from a problem that the standard writers made for themselves. The idea that programmers should use memcpy and rely on the optimizer to figure out what they really meant seems to me to be both a mess and a violation of the basic language design. A Haskell or Java compiler should be able to figure out how to do an in-place update without a copy - a C programmer should be able to do that kind of computation directly. The memcpy hack appears to me to be the worst of both worlds: we have to allocate a buffer to copy the data into (and maybe back from), which is the kind of tedious low-level bookkeeping we don’t have in higher-level languages, but we only do so in the hope that the optimizer will throw it away because it’s doing the kind of smart stuff that higher-level languages do routinely. And, of course, there is no improvement in type safety because we can write anything over anything anyway.
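For anyone who has not seen the memcpy idiom being criticized here, this is roughly what it looks like. It is only a minimal sketch: the function name `float_bits` is made up for illustration, and it assumes a 32-bit `float` (checked at compile time).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the bit pattern of a float as a uint32_t.
   Assumes float is 32 bits wide, which the static assertion checks. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    _Static_assert(sizeof(uint32_t) == sizeof(float), "float must be 32 bits");
    /* Copy the bytes into a separate object instead of reading f
       through a uint32_t pointer. */
    memcpy(&u, &f, sizeof u);
    return u;
}

/* The direct form that the aliasing rules do not allow, shown only
   for comparison:
       uint32_t u = *(uint32_t *)&f;
*/
```

Whether the copy is actually removed is up to the compiler; the language itself makes no promise either way, which is the complaint above.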
The importance of type punning in C is that it allows the programmer to access the underlying binary representation of a data item. There’s a great example of this in DJB’s post-quantum encryption algorithm, where he uses the Intel vector intrinsics to get super efficient sorting, but also manages to keep execution time constant so there is no escape of information. That is the kind of precise control over computation C is good for and that the Standard has muddied.
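To make the constant-time sorting idea a bit more concrete, here is a rough sketch of a branch-free compare-exchange and a four-element sorting network built from it. This is not DJB's code and uses no vector intrinsics; the names `minmax_u32` and `sort4_u32` are invented for the example. Note that C gives no guarantee the compiler keeps it branch-free, so real constant-time code has to be checked at the assembly level.

```c
#include <stdint.h>

/* Hypothetical helper: after the call, *a <= *b.
   Uses arithmetic and masking instead of a data-dependent branch. */
static void minmax_u32(uint32_t *a, uint32_t *b)
{
    uint32_t x = *a;
    uint32_t y = *b;
    /* The 64-bit subtraction wraps exactly when x > y, setting bit 63. */
    uint32_t swap = (uint32_t)(((uint64_t)y - (uint64_t)x) >> 63);
    /* mask is all ones when x > y, all zeros otherwise. */
    uint32_t mask = (uint32_t)0 - swap;
    uint32_t diff = (x ^ y) & mask;
    *a = x ^ diff;
    *b = y ^ diff;
}

/* Sort four values with a fixed sequence of compare-exchanges.
   The sequence of operations does not depend on the data. */
static void sort4_u32(uint32_t v[4])
{
    minmax_u32(&v[0], &v[1]);
    minmax_u32(&v[2], &v[3]);
    minmax_u32(&v[0], &v[2]);
    minmax_u32(&v[1], &v[3]);
    minmax_u32(&v[1], &v[2]);
}
```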
What would be useful in C would be an ability to avoid pathological or inadvertent type punning, and a way of telling the compiler that, e.g., these particular pointers do not refer to intersecting sets of memory cells, to enable optimization. A stricter typedef would also be interesting, as would the ability to turn off automatic promotion. I’d like to be able to typedef metric and English types, for example, on top of int and get compiler errors or warnings when they were mixed (someone here posted a clever method of using structs to get a similar effect). To me, one of the major missed opportunities with the direction of the C standard is that, with some work, static analysis could be made much more powerful. That’s a lot more interesting to me than tweaks to make it possible to get better numbers on standard microbenchmarks.
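Two of these can be approximated in C today, at least partially. The sketch below is my own guess at the struct trick mentioned (not the poster's actual code), plus C99 `restrict` as the existing way to promise non-overlapping pointers. All type and function names are made up for illustration.

```c
#include <stddef.h>

/* Wrapping an int in distinct struct types makes them incompatible,
   so passing feet where meters is expected is a compile error. */
typedef struct { int value; } meters;
typedef struct { int value; } feet;

static meters meters_add(meters a, meters b)
{
    return (meters){ a.value + b.value };
}

/* restrict is a promise from the programmer that dst and src do not
   overlap; the compiler does not verify it, and breaking the promise
   is undefined behavior. */
static void scale_into(int *restrict dst, const int *restrict src,
                       size_t n, int k)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}
```

With this, `meters_add(some_meters, some_feet)` is rejected at compile time, while a plain `typedef int meters;` would accept it silently. The cost is unwrapping `.value` everywhere, and nothing comparable exists for turning off integer promotion.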
Now, this is a useful submission compared to the n-th rant about Linus swearing. :)
Both are important. On a typical team project, effective communication is probably more important than understanding the minutiae of aliasing and type punning. Mr. Bernhardt’s post is also more than the nth rant about Linus swearing, because it includes a complete revision of the email.