This is a great argument for never using C for any serious application.
Except, C is used for many serious applications! :-)
Everyone makes mistakes sometimes.
In my opinion, it would have been a mistake had there been better choices available. C was one of the very few languages that made life easier for programmers (recall most were using assembly back then), and offered the desired performance benefits of being close to the machine. And it was a small language, allowing quick learning and unparalleled flexibility. Of course, that flexibility came with a tradeoff: it was much, much easier to shoot yourself in the foot (and we continue to see these footguns even today!)
Even today I don’t see any formidable opponent to C when it comes to implementing the OS kernel, networking stack, filesystems, and similar serious applications. Rust may be the C replacement, but we have to wait to see where it goes.
On that note, I recall an argument on the OpenBSD mailing list about using a newer memory-safe language as a replacement for C, and this email from Theo de Raadt is worth pondering over: https://marc.info/?l=openbsd-misc&m=151233345723889&w=2. If someone cares, the entire thread is worth reading.
I was (begrudgingly) agreeing until here.
Full quote is
Maybe this has changed since the email in 2017.
The part I find objectionable is the idea that not supporting i386 is a terrible problem for a new language.
You can program Rust just fine on OpenBSD/i386. What you can’t do is include it in the base system.
As long as at least one supported architecture cannot build Rust from base, it’s not going to get into the OpenBSD base system.
I fully agree with this article. One of the most annoying things about the C standard library, apart from all the things mentioned here, is that the atoi function returns 0 for invalid strings. So there is no way to distinguish the input string horse from 0 and there is no way to properly verify input with that function. The entire C standard library is fraught with such really quite obvious and terrible API mistakes, and it sucks the fun out of writing correct software in C. I usually just reimplement the standard library functions with correct and sane replacements… But you really don’t want to reimplement the standard library as your first task when you write a C program. The C standard library didn’t just not age well, it was terrible from the get-go.
What I’m really curious about is whether the author has some kind of standard library alternative that he is using, and if so, which one and can we use it too?
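To make the atoi complaint concrete: atoi("horse") and atoi("0") both yield 0, so the caller cannot tell them apart. A minimal sketch of the usual workaround, built on strtol, follows. The name parse_int and its bool-plus-out-parameter shape are assumptions made for illustration, not an API from the article or from any library.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper (not a standard function): parse a base-10 int.
 * Returns true and stores the value in *out only if the entire string
 * is a valid number that fits in an int. */
static bool parse_int(const char *s, int *out)
{
    assert(s != NULL && out != NULL);

    char *end;
    errno = 0;                       /* strtol only sets errno on failure */
    long v = strtol(s, &end, 10);

    if (end == s)                    /* no digits consumed, e.g. "horse" or "" */
        return false;
    if (*end != '\0')                /* trailing junk, e.g. "12abc" */
        return false;
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return false;                /* does not fit in long, or in int */

    *out = (int)v;
    return true;
}
```

Like strtol itself, this sketch still accepts leading whitespace and an optional sign; a stricter parser would have to reject those explicitly.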
I agree with your frustration at much of the standard library, but the solution to atoi’s crap is to use strtol and friends (along with errno checking) - POSIX even requires atoi to be equivalent to
(int) strtol(str, (char **)NULL, 10)
except for error handling. The only reason atoi remains in POSIX is “because it is used extensively in existing code”.
I know, but atoi just strikes me as a prime example of what is wrong with the C standard library design. A lot has to go wrong for such a function to find its way into a language standard library. And as you said, it is used extensively in existing code - probably all of it buggy and full of parsing issues.
C was around for over 15 years, maybe close to 20, before the first standard was released, and all the C compiler vendors at the time wanted to do the least amount of work to conform to the standard, so a lot of compromises were made. Also, a lot of C libraries prior to standardization were implemented right from K&R, and guess what? atoi() is right there in the book (first edition was 1978 to put things into perspective).
Mostly agreed. The locales stuff is a mess, though the _l versions in POSIX are nicer (at least the locale is explicit).
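Without libc you don’t have to use this global, hopefully thread-local, pseudo-variable. Good riddance. Return your errors, and use a struct if necessary.
Most of libc doesn’t use errno, it’s primarily used as the return from system calls. Without libc, you need some other way of doing system calls. The FreeBSD calling convention for system calls can’t actually be represented directly in C because it uses the carry flag to differentiate between valid and error returns.
Types are not atomic, loads and stores are atomic
C screwed this up trying to localise a C++ API into C. The intent of the standard was to allow std::atomic<T> (translated into C as _Atomic(T)) and T to have different representations. For example, _Atomic(struct SomeBigStruct) may have a lock word at the start (or end). Instead, atomics have some interesting pitfalls where you can put them in shared memory and they’re not actually atomic.
However, I don’t think the atomic functions require _Atomic-qualified arguments
They do, though the wording of the standard is somewhat impenetrable. They also require the arguments to be volatile, which is just bizarre.
Introduced in C11, but never gained significant traction. Anywhere you can use C threads you can use pthreads, which are better anyway.
Don’t use pthreads on Windows if you can possibly avoid it, the wrappers are bad. The C APIs are there as the result of a horrible compromise:
The standards committee felt that they couldn’t introduce atomics without adding threads in the standard.
No one wanted a threading API that was an exact match for another platform.
They ended up standardising a terrible set of APIs. PHK has some good rants about this.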
What’s really funny is that a lot of non-Unix OSes other than Windows adopted pthreads (VMS and IBM i).
I can’t really argue with the article, although it would have been nice to know it’s more of a reaction to the standard C library on Microsoft Windows (which is only visible if you follow the links) where most of the issues stand out.
Sigh, MSVC’s libc is so bad. Even fopen is unusable due to lacking Unicode support. mingw is nice enough to support UTF-8 paths out of the box, but with MSVC that needs ifdef one way or another.
For those who haven’t seen it, CCAN is a great collection of reusable C code, much of which specifically exists to work around the kinds of issues mentioned in OP. It turns “oh crap I should probably roll my own” into “wait I bet someone has already re-rolled this”.
Edit: See below re the ccodearchive.net link belonging to a squatter now! I’ve updated the link to CCAN’s GitHub repo, which is still active and not spammy. h/t @taal for pointing this out!
Please note that it seems this is NOT the right link anymore. Giveaway: the link to the online casino at the bottom. See: https://lists.ozlabs.org/pipermail/ccan/2022-September/001411.html
However, this is also a good warning not to include other people’s code found on the internet without looking at the actual code in detail - a lot of nasty things can be done by a squatter… especially with this kind of code, which outsources magic and goes into low-level parts people might not be monitoring.
This took me back to the comp.lang.c days when the top posters all had their own alternative or complementary library to (parts of) libc.
One thing that I do not understand in the post is avoiding libc linking. I think that is impossible for any real-world program.
It’s definitely impossible on Apple platforms, where there is no separate libc; it’s part of the libSystem dylib, which you have to link with (unless you don’t want to make any syscalls…)
It’s fine on Linux at least.
Pretty right-on, although I still use the standard library since I don’t want to rewrite it and make it work on all platforms. Why don’t we have a better alternative? C++’s library does provide a lot of replacements but they have their own issues (e.g. iostreams has pretty poor performance.)
I wish the OP had said what they use for strings. A different implementation of the same data structure? IMHO using nul-terminated char strings is itself bad design; strings should (a) have an explicit length and (b) use unsigned chars (interpreted as UTF8.) C++’s string_view handles (a) pretty well — it’s a struct with a pointer to the beginning and the end.
If I had to guess what the author was using it’d either be a ranged pointer with a pointer to the beginning and ending of a string or a pointer and length bundled into a struct. Those are the most reasonable implementations of strings imo.
I sometimes wonder which of those is more optimal. (In terms of performance; the API can be the same, of course.) I’ve used both on different projects. They both seem to involve similar amounts of arithmetic — in the first form size() requires subtracting, while in the other one end() requires adding. It probably comes down to gory details of ISA addressing modes, and may vary by CPU.
I honestly expect that the performance impact is extremely small. An extra add is hardly going to show up in profiling.
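For reference, the two string layouts being compared can be sketched as below. All type and function names here are hypothetical, chosen only to show where the subtraction and the addition land; nothing is claimed about what the article's author actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Form 1: begin/end pointer pair (the "ranged pointer"). */
struct str_range {
    const unsigned char *begin;
    const unsigned char *end;    /* one past the last byte */
};

/* Form 2: pointer plus explicit length. */
struct str_slice {
    const unsigned char *ptr;
    size_t len;
};

/* Form 1: size() costs a subtraction, end() is free. */
static size_t range_size(struct str_range s) { return (size_t)(s.end - s.begin); }
static const unsigned char *range_end(struct str_range s) { return s.end; }

/* Form 2: size() is free, end() costs an addition. */
static size_t slice_size(struct str_slice s) { return s.len; }
static const unsigned char *slice_end(struct str_slice s) { return s.ptr + s.len; }

/* Build either form from a nul-terminated string; the terminator is not counted. */
static struct str_range range_from_cstr(const char *c)
{
    assert(c != NULL);
    const unsigned char *p = (const unsigned char *)c;
    return (struct str_range){ p, p + strlen(c) };
}

static struct str_slice slice_from_cstr(const char *c)
{
    assert(c != NULL);
    return (struct str_slice){ (const unsigned char *)c, strlen(c) };
}
```

Both structs are two words wide on typical platforms, so the choice mostly decides which of size() and end() needs the extra arithmetic.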
Then you run into “which C++ library are you referring to?”
Aside from iostreams having poor performance, I haven’t seen it demonstrated that it even covers all of the same use cases, and what it does do can be needlessly obscure. For example, printing an integer as a hexadecimal value of a given width is not obvious, and that kind of thing fits on a cheat-sheet for printf style functions.
I don’t think it’s any less obvious than it is for printf, probably more so since the names are meaningful (and could easily also fit on a cheat sheet). Granted it’s verbose (especially if you need to prepend std:: to each of those identifiers) but I wouldn’t call it “needlessly obscure”.
edit: for C++20, the following would also work:
Unfortunately, on top of being very verbose and burying what is going on below half a line of stuff, the cout version leaves the stream in hex mode. The next number printed will be in hex as well, which you may or may not realize when looking at the output.
std::format is clearly much nicer for basically doing the same as printf, just type safe.
I’m not saying it’s a great API at all, but it’s hardly obscure.
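For comparison, the printf-family version of "hexadecimal value of a given width" can be sketched in C as follows. The helper name format_pair is made up for this example. It also shows that the hex conversion applies to one argument only, so the next number printed is decimal again without any reset.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes v twice into buf, first as at least four
 * zero-padded lowercase hex digits, then as plain decimal. Returns the
 * number of characters snprintf reports, excluding the terminator. */
static int format_pair(char *buf, size_t cap, unsigned v)
{
    assert(buf != NULL && cap > 0);
    /* %04x: hex, minimum width 4, padded with zeros.
     * %u:   decimal; no formatting state carries over from %04x. */
    return snprintf(buf, cap, "%04x %u", v, v);
}
```

Calling format_pair with v = 255 produces the text 00ff 255.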
If I use std::format I might as well use printf because it’s syntactically equivalent.
If so, and if printf isn’t needlessly obscure, how is std::format needlessly obscure?
If it was the first example that you are claiming is needlessly obscure, what is obscure about it? Doesn’t spelling out the operations make it less obscure than a printf-style format string?
The new C++20 fmt library is much better in that regard (it’s inspired by Python.) My only complaint is that making a custom type format table is an exercise in ugly template grunge.
It’s better in that it’s equivalent to printf meaning there’s no advantage there.
This article is missing a few functions I’m curious about like snprintf but yeah. The Windows libc implementation means that for code that’s meant to be portable to it, libc is much much much less useful because the modern APIs are missing. The one place I seriously disagree is the atomics. I feel like about 99% of the time when you want to operate on an atomic, you want to only interact with it as an atomic. But they shouldn’t have overloaded the operators, I’ve run into situations where people get extremely confused that a += b; is not equivalent to a = a + b;
Types are not atomic, loads and stores are atomic.
This is technically true, but types are actually a pretty nice model for atomics. Apart from unsynchronized one-time initialization (which atomic_init does), you almost always want to access them atomically. Having them as an atomic type is self-documenting and prevents “cheating” with unsynchronized access by mistake. Data races are so painful to debug, that it’s better to have one synchronized access too many than one too few.
The only problem with atomics as types is that they’re just not that well implemented in C. Mainly because * just works on atomics as if they were regular pointers, and it looks innocent despite being mostly a wrong thing to do (you don’t specify ordering and it’s too easy to write racy read-modify-update). If atomic types required a function like atomic_fetch_* every time, then it’d be more robust (but that’s an argument for types enforcing use of atomic functions, rather than merely having these functions with no type safety).
This design works even better in Rust where &mut allows better optimizations for non-atomics, and on atomics it can give non-synchronized zero-cost access, because it can statically prove when they’re not shared with another thread.
Isn’t it used by POSIX system calls? Can’t escape it entirely in that case. :-/
Oh, /u/spc476 says this is mainly a reaction to Windows programming. Don’t need POSIX there, then.
The article also suggests rolling your own I/O instead of using <stdio.h> (even on POSIX systems), which is one of the main reasons you’d need to be reading errno.
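One way to reconcile "return your errors" with POSIX calls that report failure through errno is to capture errno at the call site and hand it back in a struct. The sketch below assumes a POSIX system; struct io_result and write_once are hypothetical names, not something the article defines.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result type: the error travels with the return value. */
struct io_result {
    size_t written;   /* bytes actually written */
    int    err;       /* 0 on success, otherwise an errno value */
};

/* Thin wrapper around POSIX write(): errno is read exactly once, right
 * after the failing call, so callers never touch the global themselves. */
static struct io_result write_once(int fd, const void *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    struct io_result r = { 0, 0 };
    ssize_t n = write(fd, buf, len);
    if (n < 0)
        r.err = errno;
    else
        r.written = (size_t)n;
    return r;
}
```

A short write shows up as written < len with err == 0, so a caller that needs the whole buffer written has to loop.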