C is called portable assembly language for a reason, and I like it because of that reason.
People do keep calling C “portable assembly language”, and it continues to be wrong. Thinking that you understand exactly (or even roughly) what assembly will be generated for a particular piece of C code is a trap, one that leads to the nastiest and subtlest of bugs.
C is not a portable assembly language.
I agree with the criticisms directed at C++, but the arguments made in favor of C are weak at best IMO. It basically boils down to C code being shorter than equivalent code in competing languages (such as Rust), and C being more powerful and giving more tools to the programmer. I disagree strongly with the second point: unless you’re trying to write obfuscated code in C (which admittedly is quite fun to do), the features C has that make it supposedly more “powerful” are effectively foot-guns; they get in the way of writing reliable code, not the other way around. C’s largest design flaw is that it not only allows for “clever” unsafe code to be written, but it actively encourages it. By design, it’s easier to write unsafe C code than it is to properly handle all edge-cases, and C’s design also makes it incredibly difficult to spot these abuses. In plenty of cases, C seemingly encourages the abuse of undefined behavior, because it looks cleaner than the alternative of writing actually correct code. C is a language I still honestly quite like, but it is a deeply flawed language, and we need to acknowledge the language’s shortcomings, rather than just pretend they don’t exist or try to defend what isn’t defensible.
C is called portable assembly language for a reason, and I like it because of that reason.
C is a higher-level version of the PDP-11’s assembly language. C still thinks that every computer works just like the PDP-11 did, and the result is that the language really isn’t as low level as some believe it is.
In plenty of cases, C seemingly encourages the abuse of undefined behavior, because it looks cleaner than the alternative of writing actually correct code.
In some of those cases, this is because the standard botched its priorities. Specifically, the undefined behaviour of signed integer overflow: the clean way to check for signed overflow is to perform the addition or whatever, then check whether the result is negative or something. In C, that’s also the incorrect way, because overflow is undefined, despite the entire planet being 2’s complement.
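To make that concrete, here is a rough sketch of the two styles (the function names are mine, and this is illustrative rather than authoritative):

    #include <climits>

    // The "clean" way: add first, then inspect the result. The overflow has
    // already happened by the time we test, so this is undefined behaviour in
    // C and C++, and the compiler may fold the test away entirely.
    int add_naive(int a, int b) {
        int sum = a + b;                              // UB if this overflows
        if ((b > 0 && sum < a) || (b < 0 && sum > a))
            return 0;                                 // "overflow detected"
        return sum;
    }

    // The standard-conforming version has to check before adding.
    int add_checked(int a, int b) {
        if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
            return 0;                                 // overflow would occur
        return a + b;
    }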
Would that make optimisations harder? I doubt it matters for many real world programs though. And even if it does: perhaps we should have a better for loop, with, say, an immutable index?
In my opinion, Zig handles overflow/underflow in a much better way. In Zig, overflow is normally undefined (though it’s caught in debug/safe builds), but the programmer can explicitly use +%, -%, or *% to do operations with defined behavior on overflow, or use a built-in function like @addWithOverflow to perform addition and get a value returned back indicating whether or not overflow occurred. This allows for clean and correct checking for overflow, while also keeping the optimizations currently in place that rely on undefined behavior on overflow. All that being said, I would be curious to know how much of a performance impact said optimizations actually have on real code.
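For comparison, GCC and Clang already expose something in the same spirit as a non-standard builtin usable from C and C++; a minimal sketch (the wrapper function is mine):

    #include <climits>
    #include <cstdio>

    // __builtin_add_overflow is a GCC/Clang extension, not standard C or C++.
    // It adds with wrapping semantics and reports whether overflow occurred,
    // much like Zig's @addWithOverflow.
    bool add_reported(int a, int b, int *out) {
        return __builtin_add_overflow(a, b, out);     // true means it overflowed
    }

    int main() {
        int r;
        if (add_reported(INT_MAX, 1, &r))
            std::puts("overflowed");
        else
            std::printf("%d\n", r);
    }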
Having such a simple alternative would work well indeed.
I’m still sceptical about the optimisations to be honest. One example that was given to me was code that iterates with int, but compares with size_t, and the difference in width generated special cases that slow everything down. To which I thought “wait a minute, why is the loop index a signed integer to begin with?”. To be checked.
Huh. I guess compilers really are written to compensate for us dumb programmers. What a world!
Perhaps your planet is not Earth, but the Univac 1100 / Clearpath Dorado series is 1’s complement, can still be purchased, and has a C compiler.
And in my mind, it can stick with C89. Can you name one other 1’s complement machine still in active use with a C compiler? Or any sign-magnitude machines? I think specifying 2’s complement and no trap representation will bring C compilers more into alignment with reality [1].
[1] I can’t prove it, but I suspect way over 99.9% of existing C code assumes a byte-addressable, 2’s complement machine with ASCII/UTF-8 character encoding [2].
[2] C does not mandate the use of ASCII or UTF-8. That means that all existing C source code is not actually portable across every compiler because the character set is “implementation defined.” Hope you have your EBCDIC tables ready …
Don’t forget that the execution character set can differ from the translation character set, so it’s perfectly fine to target an EBCDIC execution environment with ASCII (or even ISO646) sources.
Well, I guess we’ll still have to deal with legacy code in some narrow niches. Banks will be banks.
Outside of legacy though, let’s be honest: when was the last ISA designed that didn’t use 2’s complement? My bet would be no later than 1980. Likely even earlier. Heck, the fight was already over when the 4-bit 74181 ALU came out in the late sixties.
Oh yeah, I definitely keep these examples on file for when people tell me that all negative numbers are 2’s complement/all floating points are IEEE 754/all bytes are 8-bit etc., but they point to a fundamental truth: C solves a lot of problems that “C replacements” don’t even attempt to address. C replacements are often “if we ignore a lot of things you can do in C, then my language is better”.
Except I’m not even sure C does solve those problems. Its approach has always been to skip the problems, and it does that with implementation-defined and undefined behaviour. There’s simply no way to be portable and take advantage of the peculiarities of the machines. If you want to get close to the metal, well, you need a compiler and programs for that particular metal.
In the meantime, the most common metal (almost to the point of hegemony) has 8-bit bytes, 2’s complement integers, and IEEE floating point numbers. Let’s address that first, and think about more exotic architectures later. Even if those exotic architectures do have their place, they’re probably exotic enough that they can’t really use your usual C code, and instead need custom code, perhaps even a custom compiler.
I’ve always felt people over-react to the implementation defined behavior in the C standard.
It’s undefined in the language spec, but in most cases (like 2’s complement overflow) it is defined by the platform and compiler. Clearly it’s better to have it defined by the standard, but it’s not necessarily a bad thing to delegate some behavior to the compiler and platform, and it’s almost never the completely arbitrary, impossible to predict behavior people make it out to be.
It’s a pain for people trying to write code portable to every conceivable machine ever created, but let’s be realistic: most people aren’t doing that.
Signed overflow is not implementation defined; it is undefined. Implementation-defined behaviour is fine. It requires that the implementer document the behaviour and deterministically do the same thing every time. Undefined behaviour allows the compiler to implement optimisations that are sound if they assume as an axiom that the behaviour cannot exist in any valid program. Some of these are completely insane: it is UB in C99 (I think they fixed this in C11) for a source file to not end with a newline character. This is because of limitations in early versions of Lex/YACC.
It’s undefined in the language spec, but in most cases (like 2’s complement overflow) it is defined by the platform and compiler
It’s defined by the platform only. Compilers do treat that as “we are allowed to summon the nasal demons”. I’m not even kidding: serious vulnerabilities in the past have been caused by security checks being removed by the compilers, because their interpretation of undefined behaviour meant the security check was dead code.
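The textbook illustration of that, as a toy example rather than any particular real-world vulnerability:

    // The check is written in the "obvious" way. Because signed overflow is
    // undefined, the compiler may assume x + 100 cannot wrap, conclude the
    // condition is always false, and silently drop the branch at -O2.
    int process(int x) {
        if (x + 100 < x)          // intended overflow check; may be optimised away
            return -1;            // the "security check" that quietly disappears
        return x + 100;
    }
    // Rewriting the test as x > INT_MAX - 100, or compiling with -fwrapv
    // (mentioned below), keeps the check alive.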
In the specific case of signed integer overflow, Clang’s -fsanitize=undefined does warn you about the overflow being undefined. I have tested it. Signed integer overflow is not defined by the compiler; it just doesn’t notice most of the time. Optimisers are getting better and better, though. Which is why you cannot, in 2021, confidently write C code that overflows signed integers, even on bog-standard 2’s complement platforms. Even on freaking Intel x86-64 processors. The CPU can do it, but C will not let it.
If the standard actually moved signed overflow to “implementation defined behaviour”, or even “implementation defined if the platform can do it, undefined on platforms that trap or otherwise go bananas”, I would be very happy. Except that’s not what the standard says. It just says “undefined”. While the intent was most probably “behave sensibly if the platform allows it, go bananas otherwise”, that’s not what the standard actually says. And compiler writers, in the name of optimisation, interpreted “undefined” in the broadest way possible: if something is undefined because one platform can’t handle it, it’s undefined for all platforms. And you can pry the affected optimisations from their cold dead hands.
Or you can use -fwrapv. It’s not standard. It’s not quite C. It may not be available everywhere. There’s no guarantee, if you write a library, that your users will remember to use that option when they compile it. But at least it’s there.
It’s a pain for people trying to write code portable to every conceivable machine ever created, but let’s be realistic: most people aren’t doing that.
I am. You won’t find a single instance of undefined behaviour there. There is one instance of implementation defined behaviour (right shift of negative integers), but I don’t believe we can find a single platform in active use that does not propagate the sign bit in this case.
Thou shalt foreswear, renounce, and abjure the vile heresy which claimeth that “All the world’s a VAX”, and have no commerce with the benighted heathens who cling to this barbarous belief, that the days of thy program may be long even though the days of thy current machine be short.
https://www.electronicsweekly.com/open-source-engineering/linux/the-ten-commandments-for-c-programmers-2009-04/
Whilst the world is not a VAX any more, neither is it an x86. Consider that your code may run on PowerPC, RISC-V, ARM, MIPS or any of the many other architectures supported by Linux. Some processors are big endian, others little. Some are 32-bit and others 64. Most are single core, but increasingly they are multi-core.
if you want to retrieve an array value with two offsets, one of which can be negative, in C you write arr[off1 + off2] while in Rust it would be arr[((off1 as isize) + off2) as usize].
In other words, Rust is making you consider signed vs. unsigned arithmetic and the possibility of overflow. Those are good things in my book. Ignoring them is simpler, but debugging the possible after-effects is not, especially when array access is just syntactic sugar for pointer arithmetic. How many real-world vulnerabilities have stemmed from this sort of thing?
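A toy example of how quietly that goes wrong in C or C++ (illustrative only):

    #include <cstdio>

    int main() {
        int arr[4] = {1, 2, 3, 4};
        int off1 = 1, off2 = -3;                  // the sum is -2
        std::printf("%d\n", arr[off1 + off2]);    // compiles without complaint;
                                                  // out-of-bounds read, undefined behaviour
        return 0;
    }
    // The Rust spelling forces the isize/usize conversion into view and panics
    // on an out-of-range index instead of reading whatever happens to be there.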
Similarly memset() and memmove() are powerful tools.
They are chainsaws, the kind without finger-guards. And while it’s possible to make sculpture using chainsaws, it usually works better to use finer chisels.
IMHO if you value simplicity that highly, best to work in a safer higher level language like JavaScript or Python, where mistakes are either impossible or at least less catastrophic and easier to debug. Or if you want performance, use a language with more compile-time safeguards like C++ or Rust, even if some of those safeguards require you to be a bit more explicit about what you’re doing. Or look at Nim or Go, which are somewhere in between.
If I sound a bit judgey, it’s because so many software bugs and vulnerabilities come from this style of coding.
I mostly agree with you, but now and then a chainsaw is appropriate :)
That’s why Rust has unsafe.
I really hate C++; the one thing it has going for it is that it isn’t as bad as other languages in the same space. A few examples:
Templated classes are really useful (especially with concepts) for defining closed-world specialisations. For example, if you define a platform- and architecture-abstraction layer as a class, you can instantiate the rest of your code with this and get compile-time checking if you’ve mismatched an interface somewhere. You can then also instantiate parts of your code with different versions of this for testing. At compile time, all of this is erased and you get static single dispatch for your fast paths.
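A minimal sketch of that pattern, with all the names invented for illustration:

    #include <cstddef>

    struct PosixPlatform {                 // the real platform layer
        static std::size_t page_size() { return 4096; }
    };

    struct MockPlatform {                  // swapped in for tests
        static std::size_t page_size() { return 128; }
    };

    // The rest of the code is a template over the platform layer. If the
    // platform class does not provide page_size(), this fails to compile, and
    // the call is statically dispatched (and usually inlined) in release builds.
    template <typename Platform>
    struct Allocator {
        static std::size_t reserve(std::size_t bytes) {
            std::size_t page = Platform::page_size();
            return ((bytes + page - 1) / page) * page;
        }
    };

    // Allocator<PosixPlatform> in production, Allocator<MockPlatform> in tests.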
C++11, unlike C11, gets atomics right. It’s possible to implement C++11’s atomics entirely in the library on any compiler that supports inline assembly, with template specialisations for all types that fit in a register and a guard lock for larger types. C11 atomics are a mess.
Lambdas (especially with auto parameters) are great for avoiding those little places where you do the same thing a bunch of times in a function. They avoid copy and paste and will generally be inlined and erased. You can do the same thing with the C preprocessor, but then you need to remember to #undef the macro at the end of the scope and the code becomes a lot less readable.
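Something like this, as a small sketch (the helper and its callers are made up):

    #include <cstdio>

    void report(int a, long b, double c) {
        // A generic lambda as a scoped, inlinable helper; the preprocessor
        // version would need a #define here and a matching #undef at the end.
        auto emit = [](const char *name, auto value) {
            std::printf("%s = %g\n", name, static_cast<double>(value));
        };
        emit("a", a);
        emit("b", b);
        emit("c", c);
    }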
Interfaces are useful for when you have an open-world abstraction and need to support plugins. Most kernels define ad-hoc vtable structures for this; in C++ the language knows about the dispatch and can do useful optimisation things (for example, it can assume vtable pointers do not change during an object’s lifetime and that vtables are immutable, which enables devirtualization when analysis can determine the type at compile time).
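Roughly this shape, as a toy example rather than real kernel code:

    // In C this would be a hand-rolled struct of function pointers; here the
    // compiler knows about the dispatch and can devirtualise the call when it
    // can prove the dynamic type.
    struct Plugin {
        virtual ~Plugin() = default;
        virtual const char *name() const = 0;
        virtual int run() = 0;
    };

    struct NullPlugin final : Plugin {
        const char *name() const override { return "null"; }
        int run() override { return 0; }
    };

    int invoke(Plugin &p) { return p.run(); }   // virtual call, devirtualisable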
C++ makes it easy to implement unit types and type-safe enums, whereas C makes it easy to introduce bugs because most of these things are implicitly int.
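For instance (types invented for the example):

    // Neither of these converts implicitly to int, so mixing them up is a
    // compile-time error rather than a silent bug.
    enum class Colour { Red, Green, Blue };

    struct Metres  { double value; };
    struct Seconds { double value; };

    double speed(Metres d, Seconds t) { return d.value / t.value; }

    // speed(Seconds{3.0}, Metres{10.0});   // does not compile
    // int c = Colour::Red;                 // does not compile either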
C++ makes it easy to write generic data structures. Compare using something like uthash, which is probably the cleanest generic hash table I’ve seen in C, to std::unordered_map. Now imagine that you’ve written code using one of these and you discover that you want to change it to something else, such as a Robin Hood hash-table-based map. In C++, it involves changing one using or typedef line. In C, it requires some significant rewrites. Data-structure agility in C++ means I can prototype things using the standard-library collections and then replace them with something more optimised for my use case after profiling tells me that they’re a bottleneck.
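The whole “data-structure agility” point fits in a few lines; the third-party header named below is just a stand-in for whatever replacement gets chosen after profiling:

    #include <string>
    #include <unordered_map>
    // #include "robin_hood.h"              // hypothetical replacement header

    // Every call site goes through this alias, so switching containers later
    // touches only this one declaration.
    using SymbolTable = std::unordered_map<std::string, int>;
    // using SymbolTable = robin_hood::unordered_map<std::string, int>;

    int lookup(const SymbolTable &table, const std::string &key) {
        auto it = table.find(key);
        return it == table.end() ? -1 : it->second;
    }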
C++ has fairly limited compile-time reflection, but it does far more than C. This composes well with things like std::forward_as_tuple and std::apply to let me write generic code that does something different depending on the arguments.
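A small sketch of that combination (the wrapper is mine, and it assumes C++17):

    #include <cstdio>
    #include <tuple>

    // Pack the arguments, do something per argument, then forward them on to
    // the real function unchanged.
    template <typename... Args>
    void traced_call(void (*fn)(Args...), Args... args) {
        auto packed = std::forward_as_tuple(args...);
        std::apply([](const auto &...a) {
            ((std::printf("arg of %zu bytes\n", sizeof(a))), ...);
        }, packed);
        std::apply(fn, packed);              // calls fn with the original arguments
    }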
I’ve pointed at this before, but I have some C++ code that highlights a bunch of these things. It exists to handle the signal delivered when a static OS sandboxing policy is violated by a system call attempt and to try to do an RPC to a more-privileged process that may have a dynamic policy that allows the blocked behaviour. This uses a generic lambda to dispatch to the right function, inferring the type of the function from its symbol, then uses compile-time type information to extract the argument values from the signal frame. The actual decomposition of the system call frame depends on the target and is implemented by a self-contained class in the platform abstraction layer, with one for FreeBSD and one for Linux so far. These are not yet using concepts, but will as soon as compiler support is a bit more mature.
Between templates and constexpr, C++ has quite rich compile-time support. This makes it quite easy to do things like generate look-up tables at compile time, where C programmers typically end up with a C or Python program that generates a C source file that they have to integrate into their build systems.
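A minimal sketch of the compile-time table idea (C++17, names invented):

    #include <array>
    #include <cstdint>

    // Built entirely at compile time: no separate generator program, no
    // generated source file to wire into the build system.
    constexpr std::array<std::uint16_t, 256> make_square_table() {
        std::array<std::uint16_t, 256> table{};
        for (int i = 0; i < 256; ++i)
            table[i] = static_cast<std::uint16_t>(i * i);
        return table;
    }

    constexpr auto kSquares = make_square_table();
    static_assert(kSquares[12] == 144);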
Any time I use C to do anything nontrivial, I end up writing a bad implementation of a load of C++ in the language. The same is true of every large C codebase I’ve ever worked on.
I do some C and C++ for my day job. If you forego inheritance, modern C++ isn’t so bad. Just stick to std:: types and functions wherever possible and you’ll be reasonably safe, expressive, and performant. The standard library is improving, but simple things like parsing integers and tokenizing strings still really suck and are very complicated with string views.
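For what it’s worth, the least-painful standard-library spelling I know of for the integer case is C++17’s std::from_chars; a sketch (the helper is mine), and arguably it proves the point about ergonomics:

    #include <charconv>
    #include <optional>
    #include <string_view>

    // Parse an int out of a string_view; returns nullopt on garbage,
    // overflow, or trailing junk.
    std::optional<int> parse_int(std::string_view sv) {
        int value = 0;
        auto [ptr, ec] = std::from_chars(sv.data(), sv.data() + sv.size(), value);
        if (ec != std::errc() || ptr != sv.data() + sv.size())
            return std::nullopt;
        return value;
    }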
I think the criticisms of C++ are valid, but they don’t bother me as much as they bother the author. And for me, RAII brings enough to the table that I don’t care very much about these other criticisms. If I need to start a new project in one of these two languages for some reason, it’ll be C++ every time for that reason alone, unless there is something preventing it like lack of a reasonable C++ compiler for my target platform.