1. 48
  1.  

  2. 16

    I’m impressed that 90% of this is shit that C should have fixed or just really low hanging fruit that should have been fixed years ago. Better late than never; if they implement even half of this, C23 (whoops, I said 21 in the original comment) will become the new C99-like baseline.

    1. 10

      Are they finally going to fix the abomination that is C11 atomics? As far as I can tell, WG14 copied atomics from WG21 without understanding them and ended up with a mess that causes problems for both C and C++.

      In C++11 atomics, std::atomic&lt;T&gt; is a new, distinct type. An implementation is required to provide a hardware-enforced (or, in the worst case, OS-enforced) atomic boolean. If the hardware supports a richer set of atomics, then it can be used directly, but a std::atomic&lt;T&gt; implementation can always fall back to using std::atomic_flag to implement a spinlock that guards access to larger types. This means that std::atomic&lt;T&gt; can be defined for all types and be reasonably efficient (if you have a futex-like primitive then, in the uncontended case, it’s almost as fast as a plain T, and in the contended case it doesn’t consume much CPU time or power spinning).
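
      A rough sketch, in C, of the fallback described above — a spinlock built from the always-lock-free flag type guarding a type too large for hardware atomics; names and layout are illustrative, not any real implementation’s:

      #include <stdatomic.h>
      #include <string.h>

      typedef struct { double values[4]; } big_t; /* too big for hardware CAS */

      static atomic_flag big_lock = ATOMIC_FLAG_INIT;

      static void big_store(big_t *obj, const big_t *desired) {
          while (atomic_flag_test_and_set_explicit(&big_lock, memory_order_acquire))
              ; /* spin until we own the lock */
          memcpy(obj, desired, sizeof *obj);
          atomic_flag_clear_explicit(&big_lock, memory_order_release);
      }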

      Then WG14 came along and wanted to define _Atomic(T) to be compatible with std::atomic&lt;T&gt;. That would require the C compiler and C++ standard library to agree on data layout and locking policy for things larger than the hardware-supported atomic size, but it’s still feasible. Then they completely screwed up by making all of the arguments to the functions declared in stdatomic.h take a volatile T* instead of an _Atomic(T)*. For historical reasons, the representations of volatile T and T have to be the same, which means that _Atomic(T) and T must have the same representation and there is nowhere that you can stash a lock. The desire to make _Atomic(T) and std::atomic&lt;T&gt; interchangeable means that C++ implementers are stuck with this.

      Large atomics are now implemented by calls to a library, but there is no way to implement this in a way that is both fast and correct, so everyone picks fast. The atomics library provides a pool of locks and acquires one keyed on the address. That’s fine, except that most modern operating systems allow virtual addresses to be aliased, and so there are situations (particularly in multi-process scenarios, but also when you have a GC or similar doing exciting virtual memory tricks) where simple operations on _Atomic(T) are not atomic. Fixing that would require asking the OS whether a particular page is aliased before performing an operation (and preventing it from becoming aliased during the operation), at which point you may as well just move atomic operations into the kernel, because you’re paying for a system call on each one.
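
      The lock-pool scheme amounts to something like this sketch (pool size and hash are made up; real implementations differ). Because the key is the virtual address, two aliased mappings of the same physical memory can select different locks:

      #include <stdint.h>
      #include <threads.h>

      #define POOL_SIZE 64
      static mtx_t lock_pool[POOL_SIZE]; /* assume each is mtx_init()ed at startup */

      static mtx_t *lock_for(const volatile void *addr) {
          uintptr_t p = (uintptr_t)addr; /* keyed purely on the virtual address */
          return &lock_pool[(p >> 4) % POOL_SIZE];
      }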

      C++20 has worked around this by defining std::atomic_ref, which provides the option of storing the lock out-of-line with the object, at the expense of punting the determination of the sharing set for an object to the programmer.

      Oh, and let’s not forget the mtx_timedlock fiasco. Ignoring decades of experience in API design, WG14 decided to make the timeout for a mutex the wall-clock time, not the monotonic clock. As a result, it is impossible to write correct code using C11’s mutexes, because the wall-clock time may move arbitrarily. You can wait on a mutex with a 1ms timeout and discover that, because the clock was reset in the middle of your ‘get time, add 1ms, timedwait’ sequence, you’re now waiting a year (more likely, you’re waiting multiple seconds, and now the tail latency of your distributed system has weird spikes). The C++ version of this API gets it right and allows you to specify the clock to use; pthread_mutex_timedlock got it wrong and ended up with platform-specific work-arounds. Even pthreads got it right for condition variables; C11 predictably got it wrong.
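
      The race is easy to see in code — a sketch using C11’s threads.h, where TIME_UTC (wall-clock) is the only clock mtx_timedlock accepts:

      #include <stdbool.h>
      #include <threads.h>
      #include <time.h>

      static bool lock_with_1ms_timeout(mtx_t *m) {
          struct timespec deadline;
          timespec_get(&deadline, TIME_UTC); /* wall-clock 'now' */
          deadline.tv_nsec += 1000000;       /* + 1ms */
          if (deadline.tv_nsec >= 1000000000L) {
              deadline.tv_sec += 1;
              deadline.tv_nsec -= 1000000000L;
          }
          /* if the wall clock is reset here, the absolute deadline silently
             moves by the size of the adjustment */
          return mtx_timedlock(m, &deadline) == thrd_success;
      }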

      C is completely inappropriate as a systems programming language for modern hardware. All of these tweaks are nice cleanups but they’re missing the fundamental issues.

      1. 3

        Then they completely screwed up by making all of the arguments to the functions declared in stdatomic.h take a volatile T* instead of an _Atomic(T)*. For historical reasons, the representations of volatile T and T have to be the same, which means that _Atomic(T) and T must have the same representation and there is nowhere that you can stash a lock.

        I’m not too familiar with atomics and their implementation details, but my reading of the standard is that the functions in stdatomic.h take a volatile _Atomic(T) * (i.e. a pointer to volatile-qualified atomic type).

        They are described with the syntax volatile A *object, and earlier on in the stdatomic.h introduction it says “In the following synopses: An A refers to one of the atomic types”.

        Maybe I’m missing something?

        1. 2

          Huh, it looks as if you’re right. That’s how I read the standard in 2011 when I added the atomics builtins to clang, but I reread it later and thought that I’d initially misunderstood. It looks as if I get to blame GCC for the current mess then (their atomic builtins don’t require _Atomic-qualified types and their stdatomic.h doesn’t check it).

          Sorry WG14, you didn’t get atomics wrong, you just got mutexes and condition variables wrong.

          That said, I’ve no idea why they felt the need to make the arguments to these functions volatile and _Atomic. I am not sure what a volatile _Atomic(T)* actually means. Presumably the compiler is not allowed to elide the load or store even if it can prove that no other thread can see it?

          1. 1

            I’ve no idea why they felt the need to make the arguments to these functions volatile and _Atomic

            I’ve no idea, but a guess: they want to preserve the volatility of arguments to atomic_*. That is, it should be possible to perform operations on variables of volatile type without losing the ‘volatile’. I will note that the C++ atomics contain one overload with volatile and one without. But if that’s the case, why the committee felt they could get away with being polymorphic wrt type, but not wrt volatility, is beyond me.

            There is this stackoverflow answer from a committee member, but I did not find it at all illuminating.

            not allowed to elide the load or store even if it can prove that no other thread can see it?

            That would be silly; a big part of the impetus for atomics was to allow the compiler to optimize in ways that it couldn’t using just volatile + intrinsics. Dead loads should definitely be discarded, even if atomic!
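
            A sketch of the distinction (volatile accesses are observable behaviour and must be emitted; a dead atomic load is fair game under the as-if rule):

            #include <stdatomic.h>

            int observe(volatile int *v, _Atomic int *a) {
                (void)*v;             /* must stay: volatile reads are side effects */
                (void)atomic_load(a); /* dead: a compiler is allowed to discard it */
                return 0;
            }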


            One thing that is clear from this exchange: there is a massive rift between specifiers, implementors, and users. Thankfully the current spec editor (JeanHeyd Meneide, also the author of the linked post) seems to be aware of this and to be acting to improve the situation; so we will see what (if anything) changes.

            1. 3

              One thing that is clear from this exchange: there is a massive rift between specifiers, implementors, and users. Thankfully the current spec editor (JeanHeyd Meneide, also the author of the linked post) seems to be aware of this and to be acting to improve the situation; so we will see what (if anything) changes.

              It’s not really clear to me how many implementers are left that care:

              • MSVC is a C++ compiler that has a C mode. The authors write in C++ and care a lot about C++.
              • Clang is a C++ compiler that has C and Objective-C[++] modes. The authors write in C++ and care a lot about C++.
              • GCC includes C and C++ compilers with separate front ends; it’s primarily written in C, so historically the authors have cared a lot about C, but new code is moving to C++, and so the authors increasingly care about C++.

              That leaves things like PCC, TCC, and so on, plus a few surviving 16-bit microcontroller toolchains, as the only C implementations that are not C++ compilers with C as an afterthought.

              I honestly have no idea why someone would choose to write C rather than C++ these days. You end up writing more code, you have a higher cognitive load just to get things like ownership right (even if you use nothing from C++ other than smart pointers, your life is significantly better than that of a C programmer), you don’t get generic data structures, and you don’t even get more efficient code, because the compilers are all written in C++ and so care about C++ optimisation, since it directly affects the compiler writers.

              C++ is not seeing its market eroded by C but by things like Rust and Zig (and, increasingly, Python and JavaScript, since computers are fast now). C fits in a niche that doesn’t really exist anymore.

              1. 2

                I honestly have no idea why someone would choose to write C rather than C++ these days.

                For applications, perhaps, but for libraries and support code, ABI stability and ease of integration with the outside world are big ones. It’s also a much less volatile language in ways that start to really matter if you are deploying code across a wide range of systems, especially if old and/or embedded ones are included.

                Avoiding C++ (and especially bleeding-edge revisions of it) avoids a lot of real-life problems, risks, and hassles. You lose out on a lot of power, of course, but for some projects the kind of power that C++ offers isn’t terribly important, while the ability to easily run on systems 20 years old or 20 years into the future might be. There’s definitely a sort of irony in C being the real “write once, run anywhere” victor, but… in many ways it is.

                C fits in a niche that doesn’t really exist anymore.

                It might not exist in the realm of trendy programming language debates on the Internet, but we’re having this conversation on systems largely implemented in it (UNIX won after all), so I think it’s safe to say that it very much exists, and will continue to for a long time. That niche is just mostly occupied by people who don’t tend to participate in programming language debates. One of the niche’s best features is being largely insulated from all of that noise, after all.

                It’s a very conservative niche in a way, but sometimes that’s appropriate. Hell, in the absolute worst case scenario, you could write your own compiler if you really needed to. That’s of course nuts, but it is possible, which is reassuring compared to languages like C++ and Rust where it isn’t. More realistically, diversity of implementation is just a good indicator of the “security” of a language “investment”. Those implementations you mention might be nichey, but they exist, and you could pretty easily use them (or adapt them) if you wanted to. This is a good thing. Frankly I don’t imagine any new language will ever manage to actually replace C unless it pulls the same thing off. Simplicity matters in the end, just in very indirect ways…

                1. 4

                  For applications, perhaps, but for libraries and support code, ABI stability and ease of integration with the outside world are big ones. It’s also a much less volatile language in ways that start to really matter if you are deploying code across a wide range of systems, especially if old and/or embedded ones are included.

                  I’d definitely have agreed with you 10 years ago, but the C++ ABI has been stable and backwards compatible on all *NIX systems, and fairly stable on Windows, for over 15 years. C++ provides you with some tools that allow you to make unstable ABIs for your libraries, but it also provides tools for avoiding these problems. The same problems exist in C: you can’t add a field to a C structure without breaking the ABI, just as you can’t add a field to a C++ class without breaking the ABI.
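
                  A minimal illustration with hypothetical types — the break is identical in both languages because it comes from layout, not from the language:

                  /* rev 1 of a header: callers bake sizeof and offsets into their binaries */
                  struct point { int x, y; };

                  /* rev 2 (renamed here so both compile): the size and offsets change,
                     so anything that allocates a point or takes one by value must be
                     recompiled — in C and in C++ alike */
                  struct point_v2 { int x, y, z; };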

                  I should point out that most of the things that I work on these days are low-level libraries and C++17 is the default tool for all of these.

                  You lose out on a lot of power, of course, but for some projects the kind of power that C++ offers isn’t terribly important, but the ability to easily run on systems 20 years old or 20 years into the future might be.

                  Neither C nor C++ guarantees this. In my experience, old C code needs just as much updating as C++ code, and it’s often harder to do because C code does not encourage clean abstractions. This is particularly true when talking about running on new platforms. From personal experience: we and another group have recently written memory allocators, ours in C++, theirs in C. Our platform and architecture abstractions are clean, small, and self-contained. Theirs? Not so much. We’ve ported ours to CHERI, where the hardware enforces strict bounds on pointers, with quite a small set of changes. That was made possible (and maintainable, given that most of our targets don’t have CHERI support) by the fact that C++ lets us define pointer wrapper types that describe the high-level semantics of the associated pointer, and a state machine for which transitions are permitted. Porting theirs would require invasive changes.

                  It might not exist in the realm of trendy programming language debates on the Internet, but we’re having this conversation on systems largely implemented in it (UNIX won after all), so I think it’s safe to say that it very much exists, and will continue to for a long time.

                  I’m writing this on a Windows system, where much of the kernel and most of the userland is C++. I also post from my Mac, where the kernel is a mix of C and C++, with more C++ being added over time, and the userland is C for the old bits, C++ for the low-level new bits, and Objective-C / Swift for the high-level new bits. The only places either of these systems chose C were parts that were written before C++11 was standardised.

                  Hell, in the absolute worst case scenario, you could write your own compiler if you really needed to.

                  This is true for ISO C. In my experience (based in part on building a new architecture designed to run C code in a memory-safe environment and working on defining a formal model of the de-facto C standard), there is almost no C code that is actually ISO C. The language is so limited that anything nontrivial ends up using vendor extensions. ‘Portable’ C code uses a load of #ifdefs so that it can use two or more different vendor extensions. There’s a lot of GNU C in the world, for example.

                  Reimplementing GNU C is definitely possible (clang, ICC, and XLC all did it, with varying levels of success) but it’s hard, to the extent that of these three none actually achieve 100% compatibility to the degree that they can compile, for example, all of the C code in the FreeBSD ports tree out of the box. They actually have better compatibility with C++ codebases, especially post-C++11 codebases (most of the C++ codebases that don’t work are ones that are doing things so far outside the standard that they have things like ‘works with G++ 4.3 but not 4.2 or 4.4’ in their build instructions).

                  More realistically, diversity of implementation is just a good indicator of the “security” of a language “investment”. Those implementations you mention might be nichey, but they exist, and you could pretty easily use them (or adapt them) if you wanted to.

                  There are a few niche C compilers (e.g. PCC / TCC), but almost all of the mainstream C compilers (MSVC, GCC, Clang, XLC, ICC) are C++ compilers that also have a C mode. Most of them are either written in C++ or are being gradually rewritten in C++. Most of the effort in ‘C’ compilers is focused on improving C++ support and performance.

                  By 2018, C++17 was pretty much universally supported by C++ compilers. We waited until 2019 to move to C++17 because of a few stragglers; we’re now pretty confident about being able to move to C++20. The days when a new standard took 5+ years to support are long gone for C++. Even a decade ago, C++11 got full support across the board before C11 did.

                  If you want to guarantee good long-term support, look at what the people who maintain your compiler are investing in. For C compilers, the folks that maintain them are investing heavily in C++ and in C as an afterthought.

                  1. 3

                    I’d definitely have agreed with you 10 years ago, but the C++ ABI has been stable and backwards compatible on all *NIX systems, and fairly stable on Windows, for over 15 years. C++ provides you with some tools that allow you to make unstable ABIs for your libraries, but it also provides tools for avoiding these problems. The same problems exist in C: you can’t add a field to a C structure without breaking the ABI, just as you can’t add a field to a C++ class without breaking the ABI.

                    The C++ ABI is stable now, but the problem is binding it from other languages (i.e. try binding a mangled symbol), because C is the lowest common denominator on Unix. Of course, with C++, you can just define a C-level ABI and just use C++ for everything.
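
                    Sketched with a hypothetical widget library: the header is plain C, so anything that can call C can bind it, while everything behind it can be C++:

                    /* widget.h — the C-level ABI; the implementation may be C++ */
                    #ifdef __cplusplus
                    extern "C" {
                    #endif

                    typedef struct widget widget; /* opaque handle */
                    widget *widget_create(void);
                    void    widget_destroy(widget *w);

                    #ifdef __cplusplus
                    }
                    #endif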

                    edit

                    Reimplementing GNU C is definitely possible (clang, ICC, and XLC all did it, with varying levels of success) but it’s hard, to the extent that of these three none actually achieve 100% compatibility to the degree that they can compile, for example, all of the C code in the FreeBSD ports tree out of the box. They actually have better compatibility with C++ codebases, especially post-C++11 codebases (most of the C++ codebases that don’t work are ones that are doing things so far outside the standard that they have things like ‘works with G++ 4.3 but not 4.2 or 4.4’ in their build instructions).

                    It’s funny that no one ever complains about GNU’s extensions to C being so prevalent that they make implementing other C compilers hard, yet people lose their minds over, say, a Microsoft extension.

                    1. 2

                      The C++ ABI is stable now, but the problem is binding it from other languages (i.e. try binding a mangled symbol), because C is the lowest common denominator on Unix. Of course, with C++, you can just define a C-level ABI and just use C++ for everything.

                      That depends a lot on what you’re binding. If you’re using SWIG or similar, then having a C++ API can be better because it can wrap C++ types and get things like memory management for free if you’ve used smart pointers at the boundaries. The binding generator doesn’t care about name mangling because it’s just producing a C++ file.

                      If you’re binding to Lua, then you can use Sol2 and directly surface C++ types into Lua without any external support. With something like Sol2 in C++, you write C++ classes and then just expose them directly from within C++ code, using compile-time reflection. There are similar things for other languages.

                      If you’re trying to import C code into a vaguely object-oriented scripting language, then you need to implement an object model in C and then write code that translates from your ad-hoc object model into the scripting language’s one. You have to explicitly write all of the memory-management things in the bindings, because they’re API contracts in C but part of the type system in C++.
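
                      For instance, a hand-written Lua bridge for a hypothetical widget library (Lua 5.2+ C API): the ownership rule — destroy what you create — has to be restated by hand in the __gc metamethod, because in C it is only an API contract:

                      #include <lua.h>
                      #include <lauxlib.h>

                      typedef struct widget widget; /* hypothetical C library being bound */
                      extern widget *widget_create(void);
                      extern void    widget_destroy(widget *w);

                      static int l_widget_gc(lua_State *L) {
                          widget **w = luaL_checkudata(L, 1, "widget");
                          widget_destroy(*w); /* memory management, restated by hand */
                          return 0;
                      }

                      static int l_widget_new(lua_State *L) {
                          widget **w = lua_newuserdata(L, sizeof *w);
                          *w = widget_create();
                          luaL_setmetatable(L, "widget"); /* assumes a "widget" metatable
                                                             with __gc = l_widget_gc */
                          return 1;
                      }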

                      From my personal experience, binding modern C++ to a high-level language is fairly easy (though not quite free) if you have a well-designed API, binding Objective-C (which has rich run-time reflection) is trivial to the extent that you can write completely generic bridges, and binding C is possible but requires writing bridge code that is specific to the API for anything non-trivial.

                      1. 1

                        Right; I suspect it’s actually better with a binding generator or in environments where you have to write native binding code (e.g. JNI/PHP). It’s just annoying for the ad-hoc cases (e.g. .NET P/Invoke).

                        1. 2

                          On the other hand, if you’re targeting .NET on Windows then you can expose COM objects directly to .NET code without any bridging code and you can generate COM objects directly from C++ classes with a little bit of template goo.

        2. 2

          Looks like Hans Boehm is working on it, as mentioned in the bottom of the article. They are apparently “bringing it back up to parity with C++” which should fix the problems you mentioned.

          1. 4

            That link is just Hans adding a &lt;cstdatomic&gt; header to C++ that provides a #define _Atomic(T) std::atomic&lt;T&gt;. This ‘fixes’ the problem by letting you build C code as C++; it doesn’t fix the fact that C is fundamentally broken and can’t be fixed without breaking backwards source and binary compatibility.

        3. 4

          There had been talks of adding a Zig-style defer to C at one point, and I’d pay real American money to have a *? type to distinguish nullable pointers.

          This isn’t to belittle the work of the Committee, more just “C23 is great, here’s something for C25.”

          1. 3
            1. 2

              I mean, that works. But god help you if you ever accidentally use return instead of Return.

              Also, having to put Deferral at the start of every single scope you want to be able to use Defer in is a bit stupid.

              I’d prefer an actual defer statement in the language. It’d be much cheaper at runtime too, because the compiler would just know what code to run on return, rather than having to iterate through a list of (non-standard) label pointers.

              1. 2

                god help you if you ever accidentally use return instead of Return

                Perhaps a good idea to #define return Return, if using it extensively.

                having to put Deferral in the start of every single scope you want to be able to use Defer in is a bit stupid

                Indeed. Though note it is per-function, not per-scope.

                much cheaper at runtime too, because the compiler would just know what code to run on return, rather than having to iterate through a list of (non-standard) label pointers

                If control flow is complex, the compiler will have to do exactly the same thing. If control flow is simple, I expect it to be folded down to the same thing—conditional constant propagation is whack!

                I’d prefer an actual defer statement in the language

                Me too!

              2. 1

                __attribute__((cleanup)) and no custom Return required :)

                1. 1

                  Yes, but that is nonstandard. Also: death to gobject!

                  1. 1

                    __attribute__((cleanup)) works well when the only thing you want your deferred functions to do is to somehow free an object when it goes out of scope. It’s useless for other use cases like unlocking a mutex at the end of the scope or anything else which doesn’t fit the extremely narrow scope of __attribute__((cleanup)).

                    Also, it requires defining a function out-of-line, with a name and everything, which is both more work for the programmer and, in many cases, harder to read than a proper defer.

                    1. 2

                      I wrote some macros that wrapped __attribute__((cleanup)) for handling locks in the Objective-C runtime many years ago and have not had to modify them. This code needs to be exception safe, because there are places where Objective-C or C++ will throw exceptions through C stack frames. The out-of-line function is implemented once and used everywhere where you’d want to use a lock, you just use the same kind of pattern that you’d use with RAII in C++.
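
                      A minimal sketch of that kind of macro (not the runtime’s actual code), built on pthreads and the non-standard cleanup attribute:

                      #include <pthread.h>

                      static inline void unlock_cleanup(pthread_mutex_t **m) {
                          pthread_mutex_unlock(*m);
                      }

                      /* unlock runs on every exit from the enclosing scope,
                         including early returns and (with -fexceptions) unwinding */
                      #define LOCK_FOR_SCOPE(mutex)                               \
                          pthread_mutex_t *scope_lock_                            \
                              __attribute__((cleanup(unlock_cleanup))) = (mutex); \
                          pthread_mutex_lock(scope_lock_)

                      void update(pthread_mutex_t *m) {
                          LOCK_FOR_SCOPE(m);
                          /* ... critical section ... */
                      }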

                      If you want to run something that’s not just a function, then your compiler is going to end up doing closure construction in the middle, at which point you probably want to just expose closures in your language. And at that point, you may as well use C++ where it’s trivial to write a trivial Defer template class that takes a lambda in its constructor and invokes it in the destructor, something like this:

                      template<typename T> struct Defer
                      {
                              T fn;                              // the deferred callable
                              Defer(T &&f) : fn(std::move(f)) {} // capture the callable by move
                              ~Defer() { fn(); }                 // invoke it on scope exit
                      };
                      

                      In general, I’d consider something like this to be an antipattern though. Running arbitrary deferred code makes it very hard to reason about control flow. If you encapsulate cleanup code, either in destructors if you have them or in macros that wrap __attribute__((cleanup)) if that’s all that you have, then the set of things that can run on return / unwind is small and well defined.

                      1. 1

                        at which point you probably want to just expose closures in your language

                        Well, funny that you mention ObjC — coincidentally, clang brought Objective-C closures to C with -fblocks

                        1. 2

                          I am very aware of this, having written a blocks runtime before Apple open sourced theirs.

                          Blocks are a mess for C, because C doesn’t have any notion of a destructor or any equivalent. A block that captures an Objective-C object pointer will release it when the closure is destroyed, dropping its refcount and allowing it to be deallocated. A block that captures a C++ object will run its destructor when the block is destroyed. This can be used in C++ with smart pointers so that an object either has ownership transferred to the block (with a unique pointer) or has a reference owned by the block (with a shared pointer). When the block is destroyed, all captured resources are destroyed.

                          In contrast, when a block captures a C pointer, there is no place to deallocate it. The block itself is reference counted, so can happily capture heap objects and reuse them across multiple invocations, but it cannot then destroy them implicitly on deallocation. This means that blocks are fine in C for downward funargs (i.e. you can pass a block down the stack as long as it isn’t captured) but you can’t pass a block up the stack or have it captured by a function that it is passed down to.
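
                          A sketch of the leak (clang’s -fblocks; <Block.h> provides Block_copy/Block_release):

                          #include <Block.h>
                          #include <stdlib.h>

                          typedef int (^reader_t)(void);

                          reader_t make_reader(void) {
                              int *state = malloc(sizeof *state);
                              *state = 42;
                              /* copy the block to the heap so it can escape upwards */
                              reader_t r = Block_copy(^{ return *state; });
                              return r; /* Block_release(r) will free the block itself,
                                           but nothing will ever free 'state' */
                          }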

                          You can’t fix that without introducing something like a destructor into C and, as with so many other things in C, why would you bother when you can just use C++ and have mature support for all of these things in existing compilers?

                          1. 1

                            <sarcasm>Clearly, blocks in C should instead take an IUnknown* and decrement the reference count.</sarcasm>

                2. 2

                  pay real American money to have a *? type to distinguish nullable pointers

                  Also: @ for non-nullable pointers, # for length-accompanied slices. A man can dream.

                  1. 2

                    @ for non-nullable pointers

                    I suppose they could borrow (no pun intended) C++’s reference syntax…

                    1. 8

                      Please don’t. Call-by-reference is a horrible idea. It should be obvious at the call site if the callee can mutate the passed variable.

                      And in C++ I cannot have, for instance, a reference to a reference. And it’s easy to accidentally turn a reference into a value, and then back into a reference but one which no longer refers to the original object. What’s wanted is an honest-to-god pointer, but one with minimal type-level assurance that it always points to one object.

                  2. 2

                    Did the talks on defer stop? There was a presentation on the proposal by Jens Gustedt and Robert Seacord on the 20th of March, 2021: https://www.youtube.com/watch?v=Y74i_1khQX8 I guess they’re still going for it.

                    For others who would be interested in alternatives, the following quote is from the “Related work” section of the documentation for my cedro C pre-processor, which has, among other features, an unrestricted defer-style construct with no limit on the number or complexity of deferred actions. I presented it a month ago here on Lobsters: https://lobste.rs/s/18axic/c_programming_language_extension_cedro

                    Apart from the already mentioned «A defer mechanism for C», there are macros that use a for loop as for (allocation and initialization; condition; release) { actions } [a] or other techniques [b].

                    [a] “P99 Scope-bound resource management with for-statements” from the same author (2010), “Would it be possible to create a scoped_lock implementation in C?” (2016), ”C compatible scoped locks“ (2021), “Modern C and What We Can Learn From It - Luca Sas [ ACCU 2021 ] 00:17:18”, 2021

                    [b] “Would it be possible to create a scoped_lock implementation in C?” (2016), “libdefer: Go-style defer for C” (2016), “A Defer statement for C” (2020), “Go-like defer for C that works with most optimization flag combinations under GCC/Clang” (2021)

                    Compilers like GCC and clang have non-standard features to do this like the __cleanup__ variable attribute.
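
                    For reference, the for-loop idiom from [a] reduces to something like this sketch (with the usual caveat that break skips the release clause):

                    #include <stdio.h>

                    #define WITH_FILE(f, path, mode) \
                        for (FILE *f = fopen(path, mode); f; fclose(f), f = NULL)

                    int main(void) {
                        WITH_FILE(f, "example.txt", "r") {
                            /* runs once if fopen succeeded; fclose runs on normal exit */
                        }
                        return 0;
                    }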

                    1. 4

                      Robert on twitter:

                      Definitely not going to make C23. We need to publish a TR/TS first.

                  3. 4

                    getting rid of footguns like parameter-less/takes-any-argument functions

                    Wow, finally. A few times I’ve been told to “make my functions ANSI” by reviewers, which always got me like “WHAT? I don’t use the weird K&R style decls before the opening {, this is ANSI??” and a minute later “oh, the stupid (void) parameter, argh” >_<
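
                    For anyone who hasn’t hit it, a minimal illustration of the footgun (hypothetical functions):

                    void f() {}     /* pre-C23: an *unspecified* parameter list, not an empty one */
                    void g(void) {} /* exactly zero parameters */

                    int main(void) {
                        f(1, 2, 3); /* accepted by pre-C23 compilers (and undefined behaviour) */
                        /* g(1); */ /* constraint violation: diagnosed */
                        return 0;
                    }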

                    1. 1

                      There’s a Warning For That™

                    2. 1

                      Guess who uses apostrophes for digit separators! The SWISS! And who is in Switzerland? ISO! I see WG14 is no less corrupt than WG21.

                      I reject this modern form of cultural imperialism. I implore the committee to use the period . as a digit separator, as any good Norwegian would.

                      (/s)

                      But seriously, both digit separators and binary literals have been things I’ve wanted for ages. Maybe this will make me move from -std=c99 to -std=c23.
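
                      Both features, as adopted for C23 (the apostrophe separator mirrors C++14):

                      unsigned mask = 0b1010'0110;    /* binary literal with a digit separator */
                      long billion  = 1'000'000'000L; /* decimal with separators */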