1. 37
  1.  

    1. 10

      If you are interested in seeing how you can change large code bases to adopt -fbounds-safety, you can take a look at the xnu networking stack which adopted it last year.

      You can search for the new compiler keywords in bsd/net*.

      Example: https://github.com/apple-oss-distributions/xnu/blob/main/bsd/net/bpf.c#L230

      1. 3

        It would be really cool if there was an experience guide of someone writing how they ported a large codebase to this over time. :) Asking for, ehm, a friend.

        1. 5

          We had the best engineers in the world (i.e., the architects and the implementers of -fbounds-safety) holding our hands so our experience doesn’t translate well to the rest of the world.

          However, some kind souls documented the process and it’s open source:

      2. 7

        Since the performance impact is often critical to the decision to adopt such functionality, here is the info from their slides.

        • LOC changes: 2.7% (0.2% used unsafe constructs)
          • Much lower than prior approaches
        • Compile-time overhead: 11%
        • Code-size (text section) overhead: 9.1% (ranged -1.4% to 38%)
        • Run-time overhead: 5.1% (ranged -1% to 29%)
          • Tend to rely more on run-time checks with benefit of lower adoption cost
          • Can be improved with optimization improvements

        Measurements on iOS

        • 0-8% binary size increase per project
        • No measurable performance or power impact on boot, app launch
        • Minor overall performance impact on audio decoding/encoding (1%)
          1. [Comment removed by author]

            1. 8

              Please file a bug report. This compiler is building large amounts of code in production code bases, so it really should not be trivially crashing. Of course, it’s possible your code just happens to hit a case it has never seen before, but it could also be that the fork is currently based on a borked baseline (entirely possible, especially if it’s a debug/assertion-enabled build).

              To be super clear:

              • Crashing: obviously bad, irrespective of cause. Clang should produce a crash log, and I think it also dumps a “repro” case in many build modes.
              • Not compiling: what does this mean? It could be incorrect use of annotations (implying unclear or incorrect documentation), some implicit platform assumption (e.g. assuming some Darwin platform property), or an implementation bug.

              Again, if you have any time to spare, any information you could provide would be really appreciated (you can file reports on the git repo or dm me - I can get the info to the right people)

              1. 1

                Try to compile the dte editor on ppc64le on musl - fatality. Alternatively, DM me if I should try this on x64.

                1. 1

                  This project? https://github.com/craigbarnes/dte

                  There’s definitely going to have been more testing on x86_64 and arm64, so it’s possible it’s a backend problem. Do you recall what build config you used?

                  1. 1

                    doing x64 tests too.

            2. 1

              No snark: what took so long?

              I thought such bounds-checking was fundamentally incompatible with the language. I thought that UB here was required in order for compiler authors to be able to do their job, which I take to be to optimise the binary as much as possible.

              We have known since the beginning of time (epoch date 0) that bounds checking in C was the responsibility of the programmer. There have been multiple proposals to add bounds-checking to C since then. None have gained significant traction.

              So what’s different this time?

              Skimming the RFC, it seems like this extension requires using a new reserved identifier all over the place where you want to enforce bounds checking. Perhaps this is a placeholder for a future keyword or set of keywords. It seems a bit awkward to use, but it also seems like an obvious solution.

              So I ask again, seeing how this solution seems obvious, what took so long?

              1. 13

                I thought such bounds-checking was fundamentally incompatible with the language. I thought that UB here was required in order for compiler authors to be able to do their job, which I take to be to optimise the binary as much as possible.

                UB is not required for optimization, as demonstrated by numerous other languages.

                The problem is that there are a lot of benchmarks where turning certain cases of UB into non-UB (either defined, unspecified, or implementation defined - these are different things) hurts those benchmarks. Part of the problem of course is that there are decades of optimizations based on exploiting that UB, and nowhere near as much effort into achieving performance without those “optimizations”.

                Other languages are also designed in ways that mitigate the need for these UB-style optimizations (range-based enumeration vs. index-based, for example, can have significantly different perf characteristics depending on how you do code gen).
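
                As a rough C-level illustration (my sketch, not from the parent comment) of why the iteration style can matter for codegen in a bounds-checked setting:

                #include <stddef.h>

                /* index-based: each access is conceptually base + i, and a checked build
                   has to prove or test 0 <= i < n for every iteration */
                int sum_indexed(const int *a, size_t n) {
                    int sum = 0;
                    for (size_t i = 0; i < n; i++)
                        sum += a[i];
                    return sum;
                }

                /* "range"-style: walk a [begin, end) span directly; the loop condition is
                   effectively the bounds check, so there is nothing extra to eliminate */
                int sum_span(const int *begin, const int *end) {
                    int sum = 0;
                    for (const int *p = begin; p != end; ++p)
                        sum += *p;
                    return sum;
                }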

                So what’s different this time?

                A company with the money and resources is willing to commit the money and resources to the problem.

                Skimming the RFC, it seems like this extension requires using a new reserved identifier all over the place where you want to enforce bounds checking. Perhaps this is a placeholder for a future keyword or set of keywords. It seems a bit awkward to use, but it also seems like an obvious solution.

                As a compiler vendor you can’t just blindly add keywords, etc. to the language, because that can cause problems later on if your syntax conflicts with a future version of the standard, hence the underscores everywhere. But even if this exact proposal were standardized you’d still have reserved identifiers/keywords everywhere, just without leading underscores.

                So I ask again, seeing how this solution seems obvious, what took so long?

                Because “obvious” and “easy” are not the same thing :D

                There are a bunch of issues - most historical attempts at adding bounds checking to C basically involve adopting wide pointers, but that is not something that works in any case where you have ABI compatibility constraints.

                So if you want to maintain ABI compatibility you need to develop and implement a syntax that allows you to specify what the bounds of a pointer are, in a way that can be reasoned about locally.
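
                (A minimal sketch of that difference, mine rather than the RFC’s wording: wide pointers change layout, annotations describe the layout you already have.)

                #include <stddef.h>

                /* "wide"/fat pointer: carries its own bounds, but it is a different size
                   and layout from a plain int*, so every struct and function signature
                   that uses it changes (an ABI break) */
                struct fat_ptr { int *ptr; size_t count; };

                /* annotation style: the in-memory layout stays { int; int* }; the bounds
                   are expressed in terms of a value that is already part of the ABI */
                struct counted_buf {
                    int n;
                    int *p;   /* with -fbounds-safety: int * __counted_by(n) p; */
                };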

                In some cases it is very easy - and lots of proposals over the years have said “lets solve bounds safety” by suggesting handling for these easy cases.

                void f(int N, int array[N]);
                struct S {
                  int N;
                  int tail[]; // I think gcc may allow [N]?
                };
                

                (which does not have any actual impact on ABI or codegen, or even produce warnings - int array[N] is literally identical to int*). I believe in this proposal this syntax is recognized and becomes equivalent to

                void f(int N, int * __counted_by(N) array);
                

                So lots of existing trivially checkable code doesn’t need new annotations.

                But even trivial cases like this easily break down with minor changes, e.g.

                void f(int array[...?], int N)
                

                Or non-VLA types

                struct S {
                  int * __counted_by(N) elements;
                  int N;
                };
                

                Off the bat you can see that you need to support forward references to both struct fields and parameters. While C++ supports forward references to fields C does not, and neither supports forward references to parameters.

                Beyond that, there are many many ways that pointer bounds are managed. Historical attempts to add bounds safety only attempted to manage cases that were essentially type buffer[N] for some trivial N. To actually support real world code, it turns out you need to support things that are logically type buffer[arbitrary expression using parameters and struct fields, etc], you also need to handle “span” types where what you have is pointers to the beginning and end of a buffer (and maybe the middle), buffers that are terminated by some special value (null terminated strings being the classic), in some places a buffer may be sized in bytes, in other cases it might be the count of elements, and on and on.
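
                To make that variety concrete, here is a sketch (mine) of what those real-world conventions look like under this style of annotation; the spellings below (__counted_by, __sized_by, __ended_by, __null_terminated) are the ones I remember from the -fbounds-safety docs, so check the current documentation for the exact forms:

                #include <stddef.h>

                /* no-op fallbacks so the sketch parses without -fbounds-safety */
                #ifndef __counted_by
                #define __counted_by(N)
                #define __sized_by(N)
                #define __ended_by(E)
                #define __null_terminated
                #endif

                struct conventions {
                    size_t nelems;
                    int    * __counted_by(nelems) elems;   /* bounds as a count of elements */

                    size_t nbytes;
                    void   * __sized_by(nbytes) blob;      /* bounds as a size in bytes */

                    char   * __null_terminated name;       /* bounds via a sentinel value */

                    int    * __ended_by(end) begin;        /* "span": [begin, end) */
                    int    *end;
                };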

                So you have to design the system so that you can handle not just every case, but also manage transitions from one model to another, e.g say

                void f0(int *start, int *end) { … }
                void f1(int count, int buffer[count]) { f0(buffer, buffer + count); }
                void f2(int x, int y, int buffer[x * y]) { f1(x * y, buffer); }

                You also want to minimize the amount of manual annotations for things you already know, e.g:

                void* __sized_by(sz) malloc(size_t sz);
                …
                int * buffer = malloc(100); // we know the bounds of the return here

                All of this is fine, but I’m not aware of old attempts even considering how you deal with something like the following, which you nonetheless have to handle correctly:

                struct S {
                   int sz;
                   int *__counted_by(sz) buffer;
                };
                
                void f(struct S *s) {
                   s->sz *= 2;
                   // or
                   s->buffer = malloc(....);
                }
                

                and similar - I believe the way this proposal handles this issue is that updates like this must always correct everything at the same time, which is an easy enough rule, but one you have to identify and ensure is done correctly, and everything remains coherent.
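
                A sketch (mine; the exact grouping rules are spelled out in the RFC) of what keeping that struct S coherent looks like:

                #include <stdlib.h>

                #ifndef __counted_by
                #define __counted_by(N)   /* no-op fallback without -fbounds-safety */
                #endif

                struct S {                       /* same shape as the struct S above */
                    int sz;
                    int * __counted_by(sz) buffer;
                };

                void grow(struct S *s, int new_sz) {
                    /* the pointer and the count that describes it are updated together,
                       so the checker never sees a buffer whose recorded size is stale
                       (error handling omitted for brevity) */
                    s->buffer = malloc(new_sz * sizeof *s->buffer);
                    s->sz = new_sz;
                }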

                After all of this, you still don’t want people to have to put these annotations everywhere, so this proposal - in addition to the various sizing annotations - also introduces a set of wide pointer types, which is what gets used implicitly for locals (that’s a lot of how the intermingling of bounds can be managed), but they can also just be used as types everywhere else (going all the way back to the early ABI-breaking wide pointer “solutions”).

                Then after all of that you have to make it go fast enough - historically C/C++ compilers were not super great at bounds-check elimination (because C/C++ programmers don’t do bounds checks :-/), and years of effort went into optimizing bounds checking for this - happily, that benefits all code that does bounds checking.

                So the TLDR is that while it might seem “obvious”, the full scope of the “obvious” portion is quite a lot larger than you might initially think, and the complexity of actually implementing it is extremely high, which is why it has taken so long.

                1. 2

                  UB is not required for optimization, as demonstrated by numerous other languages.

                  If you want to have absolutely zero runtime cost, not a single CPU instruction wasted, you need UB for out-of-bounds indexing, don’t you?

                  1. 4

                    [Header added after I finished writing the wall of text reply] Sigh, I’m super sorry, this has become a super long reply because I was having fun describing the path from a single instance UB to losing all bounds checking, even the non-UB related ones.

                    I’ve tried to keep the basic answer to your question “concise” and at the head of the reply, and I’ve moved the UB v security misery to the bottom, with the heading ### Oliver goes into the weeds ###. I think it’s interesting, and I was enjoying writing it/procrastinating, but it’s somewhat tangential to what you were asking about :D [End of the authors introduction to a wall of text :D ]

                    Sorry, that was unclear on my part and I made an assumption. First off, you’re obviously correct: no bounds checks at all - and the obvious potential OoB (obvs UB) - is free.

                    When people complain about UB-based optimizations in C/C++ they’re generally not talking about “there is no memory safety”; they’re talking about the completely definable operations that are nonetheless UB, so that is what I assumed you were referring to - apologies for that.

                    The classic case for UB-derived optimizations in C/C++ is integer overflow, which leads to security bugs in “obviously correct” (but actually incorrect according to the C/C++ abstract machine) code where overflow checks like this:

                    if (a + b < a)

                    are removed (the incorrect belief that C/C++ is “high level assembler” causes many problems like this).
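
                    For contrast, a couple of ways to write that check that stay valid under the standard’s rules (my examples):

                    #include <limits.h>
                    #include <stdbool.h>

                    /* rearranged so the check itself can never overflow */
                    bool add_would_overflow(int a, int b) {
                        return (b > 0 && a > INT_MAX - b) ||
                               (b < 0 && a < INT_MIN - b);
                    }

                    /* or use the GCC/Clang builtin, which reports overflow without UB */
                    bool add_checked(int a, int b, int *out) {
                        return __builtin_add_overflow(a, b, out);   /* true on overflow */
                    }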

                    There are “real” examples of “overflow is UB” helping performance, but I am completely blanking on them; here is an example of what I believe is a correct interpretation of how a C++ compiler can use this UB. Say you have a loop:

                    for (int i = 0; some condition; i += some_thing()) { ... }
                    ...
                    

                    Assuming int is a 32bit int, and you’re on a 64 bit machine, the compiler ends up doing (the most pseudo of pseudo asm)

                    r1 = 0            ; r1 will contain i, i = 0
                    .start:
                      ... evaluate termination condition
                      branch if false .end
                      call some_thing
                      add r1, r1, r0
                      ... do some stuff
                      branch .start
                    .end:
                    

                    The thing to note here is that we’re just adding a value to the register. But the register is 64-bit, which means that i can become greater than INT_MAX.

                    If overflow is not UB, and the compiler cannot prove that overflow cannot happen, the above code needs to change, so

                      add r1, r1, r0
                    

                    becomes

                      add r1, r1, r0
                      and r1, 0x7fffffff
                    

                    You can imagine some case where this could go wrong from a security PoV, but this is just intended as a super basic example of how “overflow is UB” can help perf.

                    I know there are much “better” examples of more significant perf improvements, but my mind is completely blanking on them, but this just illustrates a way the UB-ness can help perf. It’s possible that it helps autovectorisation or something? My mind’s just drawing a complete blank.

                    Now, in C++ at least - I don’t know about C - unsigned arithmetic is now defined explicitly as 2s complement, and therefore under and overflow have completely defined behavior.

                    You might ask why they haven’t done the same for signed arithmetic (or at least made it have one of the non-UB options: unspecified, implementation defined), and my super cynical view (conceivably incorrect, but I’m really cynical here) is that the major C/C++ compiler benchmarks are all quite old and have a lot of int based loops, that have obviously had decades of super focused optimizations that both may not help real world code, and also would also break if signed overflow had any kind of defined behavior :D

                    Oliver goes into the weeds

                    Now, the earlier-mentioned overflow check removal only became possible because in the early 2000s compiler authors went from a definition of “UB means we don’t guarantee the outcome” to “if we can prove UB occurs, then that code is incorrect, therefore that code path cannot be reached, and so we can use the discovery of UB to ‘prove’ properties of variables”. E.g. if (a) some_ub() went from “no guarantees of what will happen” to “this proves that a is false”. This could happen even if the code was bool a = true; if (a) some_ub(); <compiler now believes a is false, and could even assume it has always been false>. For example, bool a = true; if (a) printf("1"); if (!a) printf("2"); if (a) { printf("3"); some_ub(); } might print “13”, “2”, “12”, “”, … depending on how and when the compiler noticed the UB vs. constant folding, etc.

                    Anyway, it goes without saying that this change in semantics resulted in many examples of “safe” code becoming exploitable, and yet somehow that definition has been allowed to stick. Another example of this interpretation causing real-world security bugs is even more nefarious; imagine a bounds check:

                    struct Array {
                       int size;
                       int *buffer;
                    };
                    Array_load(Array* a, int idx) {
                       if (idx >= a->size) { terminate in some way }
                       return a->buffer[idx];
                    }
                    

                    Bounds checking is super hot, and if you’re doing it on every access the code size alone can impact performance, so you need the simplest possible termination mechanism; a function call (exit()) is not that, nor is throwing an exception (in C++), and inserting an asm() block simply destroys perf. So what many projects had was:

                    Array_load(Array* a, int idx) {
                       if (idx >= a->size) { *(void**)NULL; }
                       return a->buffer[idx];
                    }
                    

                    which is a single instruction on pretty much all hardware - definitely all hardware that is expected to have any kind of “performance”. Except dereferencing null is UB, so this new definition that said “if the compiler can prove UB in a code path, that code path cannot be reached” let them do this sequence

                    Array_load(Array* a, int idx) {
                       if (idx >= a->size) { <ub> }
                       return a->buffer[idx];
                    }
                    
                    Array_load(Array* a, int idx) {
                       if (idx >= a->size <must be false>) { <ub> }
                       return a->buffer[idx];
                    }
                    
                    Array_load(Array* a, int idx) {
                       return a->buffer[idx];
                    }
                    

                    This is obviously bad, right? But it gets “better”, say you wrote code

                    int sum_first_n_elements(Array* a, int n) {
                      if (n > a->size) abort(); // no undefined behavior here
                      int result = 0;
                      for (int i = 0; i < n; ++i) {
                         int value = Array_load(a, i);
                         result += value;
                      }
                    }
                    

                    For obvious perf reasons you ensure Array_load is inlineable, so the compiler gets to work

                    int sum_first_n_elements(Array* a, int n) {
                      if (n > a->size) abort(); // no undefined behavior here
                      int result = 0;
                      for (int i = 0; i < n; ++i) {
                         // Using $Array_load_#### as "variables in the inlined function"
                         $Array_load_a = a;
                         $Array_load_idx = i;
                         if ($Array_load_idx >= $Array_load_a->size) { *(void**)NULL; }
                         int value = $Array_load_a->buffer[$Array_load_idx];
                         result += value;
                      }
                    }
                    

                    elide the copies:

                    int sum_first_n_elements(Array* a, int n) {
                      if (n > a->size) abort(); // no undefined behavior here
                      int result = 0;
                      for (int i = 0; i < n; ++i) {
                         if (i >= a->size) { *(void**)NULL; }
                         int value = a->buffer[i];
                         result += value;
                      }
                    }
                    

                    The UB happens

                    int sum_first_n_elements(Array* a, int n) {
                      if (n > a->size) abort(); // no undefined behavior here
                      int result = 0;
                      for (int i = 0; i < n; ++i) {
                         // bounds check removed as (i >= a->size) "always false"
                         int value = a->buffer[i];
                         result += value;
                      }
                    }
                    

                    Now what we get is the pinnacle of what is called “time travel UB”. Recall, the compiler has “proved” that i is always less than a->size, right? So here is what the compiler now “knows”:

                    1. i is always less than a->size
                    2. the maximum value of i is n-1
                    3. therefore n-1 is also less than a->size
                    4. therefore the maximum value of n, is a->size
                    5. therefore n > a->size is false

                    And now the function becomes:

                    int sum_first_n_elements(Array* a, int n) {
                      int result = 0;
                      for (int i = 0; i < n; ++i) {
                         int value = a->buffer[i];
                         result += value;
                      }
                    }
                    

                    And note that even the very first bounds check, which contained no undefined behavior, is now gone. Woo!

                    This is in fact the exact opposite of what most people expect, which would be

                    1. The entry check means that when we hit the loop, we have proved that n <= a->size
                    2. Therefore i is always less than a->size
                    3. Therefore if (i >= a->size) must always be false (so the UB would not be hit)
                    4. Therefore the if (i >= a->size) statement can be removed

                    And this could be what happens. The thing that makes this interpretation of UB so bad is that all kinds of different things impact when certain optimizations occur, which means code order, or even minor compiler version changes, can change when the compiler hits which code, and therefore which assumptions it ends up making. Imagine sum_first_n_elements is inlined: you could literally end up with some uses of it having bounds checks, and some uses of it not having bounds checks, in the same file. It’s also absurd because a “conforming compiler” could see this

                    if (a) {
                       some_ub;
                    } else {
                       printf("a is false\n");
                    }
                    
                    if (!a) {
                       some_ub;
                    } else {
                       printf("a is true\n");
                    }
                    

                    and turn it into

                    printf("a is false\n");
                    printf("a is true\n");

                    1. 3

                      Now, in C++ at least - I don’t know about C - unsigned arithmetic is now defined explicitly as 2s complement, and therefore under and overflow have completely defined behavior.

                      Unsigned arithmetic in C has always been mod 2^n and overflow has always been well defined. (Unsigned numbers can’t be two’s complement because that’s a representation of signed numbers.)

                      What changed recently in both C and C++ is that the representation of signed integers is now always two’s complement (no more one’s complement or sign-magnitude). But signed integer overflow is still undefined behaviour.
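
                      A tiny illustration of the distinction (as a sketch):

                      #include <limits.h>

                      void demo(void) {
                          unsigned int u = UINT_MAX;
                          int s = INT_MAX;

                          /* well defined in both C and C++: unsigned arithmetic is mod 2^N,
                             so this is 0 */
                          unsigned int wrapped = u + 1u;
                          (void)wrapped;

                          /* still undefined behaviour, even though the *representation* of
                             int is now required to be two's complement */
                          /* int boom = s + 1; */
                          (void)s;
                      }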

                      1. 1

                        Sorry, I misspoke. I’ll try to be exact

                        1. Storage of an n-bit integer type is n-bits (duh! :D)
                        2. Storage of an n-bit signed integer is in 2s complement
                        3. Until recently in C++, unsigned arithmetic was not specified as being restricted to the bit-width of the type, which meant over- and underflow were UB. This has now been fixed.
                        4. Arithmetic on signed integers is not restricted to the bit-width of the type (which would require either trapping or, to match unsigned behavior, specifying that the arithmetic is also 2s complement in order to match the storage - 1s complement or sign-magnitude cannot represent the values supported by the storage type, so the arithmetic would have to be specified as 2s complement).
                        5. Because of <4> signed under and overflow is undefined.

                        The “easy” spec fix would be to say signed under/overflow is unspecified, or is erroneous behavior in C++; then the spec wouldn’t have to argue about any of this, and developers would get to stop worrying about a compiler literally pretending that arithmetic does not behave in the way it objectively does.

                        The better fix would be to just say “signed integer arithmetic is performed using 2s complement of the width of the type”.
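
                        For what it’s worth, GCC and Clang already ship something close to that as a non-standard mode: -fwrapv defines signed overflow as wrapping two’s-complement arithmetic. A quick sketch of the effect:

                        #include <limits.h>

                        /* build with:  cc -O2 -fwrapv wrap.c
                           under -fwrapv the check below is meaningful again and cannot be
                           optimized away, because a + b is defined to wrap */
                        int add_or_clamp(int a, int b) {
                            if (b > 0 && a + b < a)   /* wrapped around (positive direction only, for brevity) */
                                return INT_MAX;
                            return a + b;
                        }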

                        1. 2

                          Until recently in C++, unsigned arithmetic was not specified as being restricted to the bit-width of the type, which meant over- and underflow were UB.

                          I checked a draft of C++17 which was before signed integers were restricted to two’s complement, and it says (like C) that unsigned integers are mod 2^n and there’s no UB on overflow.

                    2. 2

                      You don’t need UB to eliminate bounds checks if you have a better type system that can guarantee statically that an index is in bounds.

                      1. 2

                        I still don’t get it. Can’t indices come from runtime? You can’t statically check every index value.

                        Are you just disallowing runtime index values? Or shoving them into an unsafe block?

                        1. 2

                          Can’t indices come from runtime? You can’t statically check every index value.

                          Such a language can require that an integer be bounds-checked before being converted into an index (e.g., immediately before, in whatever function constructs the index type). Thus the bounds-checks for runtime indices are moved around, to a point where they may be easier to optimize (or not).
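
                          A sketch of that idea in C (mine; a language with a richer type system can actually enforce that the index and the bound belong together, which plain C cannot):

                          #include <stdbool.h>
                          #include <stddef.h>

                          /* an index that has already been checked against some bound; the only
                             way to construct one is via make_index() */
                          typedef struct { size_t i; } index_t;

                          bool make_index(size_t i, size_t bound, index_t *out) {
                              if (i >= bound)
                                  return false;      /* the single runtime check, at construction */
                              out->i = i;
                              return true;
                          }

                          int get(const int *a, index_t idx) {
                              return a[idx.i];       /* no check needed here (by construction) */
                          }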

                          1. 2

                            Right, but now that adds a runtime cost, which is what allegedly static type checking could avoid.

                            1. 1

                              @fanf, were you alleging that using sufficiently expressive types can completely avoid runtime bounds-checks even for statically unknown indices?

                          2. 1

                            What the -fbounds-safety proposal does is make the bounds of a pointer visible to the compiler. It does that with a mix of wide pointers and annotations. The wide pointers are essentially what you would call a slice in Rust, or similar. The annotations are for ABI-constrained cases, where you have a pointer and the size of the region is governed by some arbitrary expression.

                            We’ll ignore the wide pointers/slices, because semantically they’re easy to understand; instead let’s look at a function like this:

                            int sum(int* __counted_by(n) p, int n) {
                              int result = 0;
                              for (int i = 0; i < n; i++)
                                result += p[i];
                              return result;
                            }
                            

                            The compiler now knows p is a pointer to (at least) n values.

                            In the for loop i iterates from 0 to n-1, so the compiler is now statically (i.e. at compile time) aware that p[i] is always in bounds and does not need to emit any runtime bounds checks. If we changed the condition to i <= n that property would no longer hold and it would perform bounds checks at runtime - exactly how many would depend on optimizations, etc., and you could imagine in a case this trivial the compiler might even warn or error at compile time.

                            When you index into a buffer, the need for a bounds check is determined by whether the compiler can determine statically that the index is within the bounds.

                          3. 2

                            But that wouldn’t be C anymore:

                            I thought such bounds-checking was fundamentally incompatible with the language.

                      2. 6

                        It’s not that obvious. A lot of different solutions were tried over the decades with somewhat different properties, e.g. Annex K.

                        But also, there simply wasn’t a ton of effort behind these kinds of safety improvements for the longest time. The alternative approach of “just get better at programming” was tried and tried again until it became absolutely undeniable that it wasn’t going to work - which I think finally became apparent when the US government had to step in to name memory safety as a serious issue.

                        Along with that, there is a culture of absolutely “zero overhead”, which this violates. An even more obviously good change, namely that variables that aren’t explicitly initialized get filled with 0-bytes instead of uninitialized memory was also rejected for a long time due to this principle.

                        The twist: it turns out this can in many cases improve performance, because an extra xor eax, eax instruction breaks the dependency between the previous and the next value in eax, giving the CPU more freedom for register renaming and similar tricks. We hadn’t bothered to measure this; it was simply assumed that more code means slower code.
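
                        Clang (and recent GCC) now expose this as an opt-in mode, which also makes the cost easy to measure on your own code; the flag name, from memory, is -ftrivial-auto-var-init=zero:

                        /* cc -O2 -ftrivial-auto-var-init=zero init.c */
                        int parse_flags(int have_config) {
                            int flags;                /* no explicit initializer */
                            if (have_config)
                                flags = 0x7;
                            /* without the flag, the !have_config path returns garbage
                               (formally undefined); with it, flags has been zero-filled */
                            return flags;
                        }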

                        1. 4

                          C programmers are very superstitious when it comes to performance. It has been widely believed for a long time that bounds checks represent a quite costly overhead, even though there’s no evidence for this.

                          1. 6

                            And in fact we now have evidence from Google that the cost of bounds checks is tiny https://lobste.rs/s/rgcgev/retrofitting_spatial_safety_hundreds

                          2. 3

                            what took so long? … So what’s different this time?

                            I don’t know, but my guess would be: computers are way faster, with more memory to do better analysis.

                            I’m not a fan of the RFC as it requires a lot of user annotation. My preference would be for a function signature like int foo(blah *x) to be treated as if the user typed int foo(blah *__single x). If you want to pass in an array, do it like int foo(blah x[n]), where n becomes a size_t variable passed immediately after x, so equivalent to int foo(blah x[], size_t n). C89 can already parse int foo(blah x[5]), so this wouldn’t be much of a change - certainly less than the one presented.

                            1. 7

                              The RFC is going into a lot of the technical elements only.

                              You do not have to use the annotations unless the bounds are non-trivial, so all your examples act as you would expect/want

                              int foo(blah *x) // implicitly int foo(blah * __single x)
                              int foo(int n, blah x[n]) // implicitly int foo(int n, blah *__counted_by(n) x)
                              int foo(blah x[5]) // implicitly int foo(blah *__counted_by(5) x)
                              int thing[5]; // thing is __counted_by(5)
                              

                              The design of -fbounds-safety is to be a usable solution for existing code, which means it’s designed to reduce as much as possible the need for annotations unless they are actually needed, so you don’t need to do

                              int * __sized_by(100) buffer = (int*)malloc(100);
                              

                              just

                              int * buffer = (int*) malloc(100);
                              

                              This results in buffer having the correct bounds information. Note this is not specific to malloc; under this proposal malloc is simply specified as

                              void* __sized_by(sz) malloc(size_t sz);
                              

                              You only need to add annotations for cases where there is no information to determine the bounds.
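
                              For example (my sketch), a signature where nothing ties the pointer to its length is exactly the kind of place where an explicit annotation is still needed:

                              #include <stddef.h>
                              #include <stdint.h>

                              #ifndef __sized_by
                              #define __sized_by(N)   /* no-op fallback without -fbounds-safety */
                              #endif

                              /* the compiler cannot guess that len describes buf, so we say so */
                              int parse_packet(const uint8_t * __sized_by(len) buf, size_t len);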

                              Returning to the above malloc case, you will get correct bounds checking if you do

                              int * __sized_by(100) buffer = (int*)malloc(100);
                              int *why = buffer + 50;
                              why[1] = 0;
                              why[-1] = 0;
                              why[50] = 0; // would trap
                              why[-51] = 0; // would also trap
                              

                              A huge part of the problem in prior attempts to “solve” bounds checking in C is that people basically tried to go all in on no annotations (can’t work without at least breaking ABI, if it can work even then), or “annotations everywhere”.

                              A huge amount of the design work in this proposal went into minimizing the extent to which new/additional annotations must be added, because every additional annotation increases the difficulty of adopting the model in the mountains of existing C code, so as much as possible is inferred or automatic. Similarly, the scope of the design and the number of different counting and bounding mechanisms are the result of needing to apply to as much existing code as possible.

                              It’s also important to understand that the reason there are so many of these annotations, is because huge amounts of existing code is subject to ABI constraints, which means that there is limited, if any, ability to change function signatures, struct fields or layout, etc.

                              For code that is not subject to such constraints, it’s likely easier to just use the wide pointers, e.g.

                              int f(int* __indexable ptr) {
                                 for (int* p = __ptr_lower_bound(ptr); p != __ptr_upper_bound(ptr); ++p) { ... }
                              }
                              

                              Or similar - though those macros could possibly be more concise :D

                            2. 3

                              I thought such bounds-checking was fundamentally incompatible with the language.

                              I think your assumptions are all incorrect. There have been bounds-checked variants of C forever. Hell, I wrote one myself.

                              On the other hand, doing it without losing the low-level memory access capabilities of C is a harder thing. That requires fundamental changes to the language, which is what this is.

                              what took so long?

                              Lack of interest, really. None of the memory safe variants of C have been popular (including mine).

                              1. 3

                                There have been many attempts, but the reality is it’s an immense amount of work to retrofit a major semantic change to a core part of the language (for any language).

                                All the paths that are generally “achievable” in C without immense investment either can’t handle the wide gamut of mechanisms by which real-world code “specifies” buffer sizes, or aren’t able to be ABI compatible - the way every safe language handles memory safety is by having the size of the memory referenced by a pointer be either part of the type, or by having “pointers” include their bounds. But, because it’s awful, C/C++ doesn’t distinguish single-object pointers from arrays, arrays don’t include their bounds, some arrays don’t even have the concept of “size” but use a sentinel, etc.

                                On top of that, because historically people writing C/C++ have not performed bounds checks, the compilers were not particularly good at optimizing said checks. Florian gave a talk about the work he did to make constraint optimization - especially bounds checks - much more effective in 2021: https://www.youtube.com/watch?v=1hm5ZVmBEvo

                                1. 2

                                  but the reality is it’s an immense amount of work

                                  As mentioned in my post, I’ve literally implemented a memory safe C. It wasn’t especially complex from a language design point of view, however the memory safety did come with some overhead. The main issue was that a lot of C programs depend on unfettered memory access, so my users requested that I remove the memory safety feature, which I ended up doing.

                                  1. 2

                                    You implemented a memory safe C that apparently could not be used, which is the point.

                                    1. 1

                                      That’s not the case. It was used by a lot of people prior to the change. It’s just that the users preferred C’s loose access to memory so they could do direct device access.