1. 25
  1. 34

    Hate to be the one to say it but didn’t enjoy the clickbait. Undefined behavior has been like this for at least 20 years.

    If you are going to be doing arithmetic with untrusted user input it’s your responsibility to check ahead of time that it satisfies your program’s preconditions and doesn’t invoke undefined behavior. If you need help you can use the integer overflow builtins. __builtin_mul_overflow() would have helped you in this case.
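
    For example, a sketch of that kind of pre-validation (hypothetical helper name, using the GCC/Clang builtin):

    #include <stdlib.h>
    /* Reject untrusted sizes up front instead of letting the multiply overflow. */
    char *alloc_table(size_t rows, size_t cols)
    {
        size_t bytes;
        if (__builtin_mul_overflow(rows, cols, &bytes)) /* true if rows * cols overflows */
            return NULL;                                /* refuse the input */
        return malloc(bytes);
    }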

    1. 16

      The only reason everyone is so aware of integer overflow being UB is that clang and gcc decided, a decade or so ago, that UB allowed them to remove overflow-related security checks from a bunch of critical software, and then responded to “you removed the overflow checks” with “well, that’s UB and you were wrong to rely on the way all arithmetic works on all computers”. Understand that many of these security checks were based on the basics of how integers work on all computers, and had been present for decades before gcc and clang decided that removing them was acceptable.

      There is no justification that ever warranted such overflow being UB, and the only reason it remains UB is so compilers can game compiler benchmarks. Today there is no hardware where it can’t be defined as 2’s complement, and even before modern hardware (say, the last 30-40 years) the behavior was not random: overflow could have been made unspecified or implementation defined. There is not now, and more importantly there never has been, a time when the result of overflow or underflow could not be defined by a compiler vendor.

      1. 6

        Compilers started doing this because people use signed variables as loop induction variables, with addition on every loop iteration, and a less-than check for termination. If you assume that signed overflow can’t happen, then you know that the loop must terminate. It is then amenable to scalar evolution, which helps drive vectorisation, which then gives a 4-8x speed up on some important workloads. Without that assumption, the sequence of loop iterations is not amenable to scalar evolution and you lose all of these benefits.
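
        A hypothetical sketch of the shape being described (not from any particular codebase):

        /* With signed overflow treated as UB, the compiler may assume `i` never
           wraps, so the loop provably terminates with an exact trip count, and
           `i` can be widened for addressing and the body vectorised. With
           -fwrapv it must also honour n == INT_MAX, where `i` wraps and the
           loop never terminates. */
        void scale(float *a, float s, int n)
        {
            for (int i = 0; i <= n; i++)
                a[i] = a[i] * s;
        }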

        1. 5

          I know that those for loops are the reason that C/C++ compilers have adopted this contorted view of what UB means. I believe that it is solely for benchmark gaming, because the supposed benefit seems extraordinarily hard to actually observe.

          For the UB to be beneficial I would have to have a function that compiles meaningfully differently with clang or gcc under -fwrapv vs -fno-wrapv. The only example I have found in existing posts justifying the “integer overflow should be UB” position that shows any real difference in codegen is, iirc, a bzip comparison function, which I butchered thusly:

          template <typename IntType>  __attribute__((noinline))  int
          cmp(IntType i1, IntType i2, unsigned char *buf)
          {
              for (;;) {
                  int c1 = buf[i1];
                  int c2 = buf[i2];
                  if (c1 != c2)
                      return c1 - c2;
                  i1++;
                  i2++;
              }
          }
          

          I can compile this with -fwrapv or -fno-wrapv, and while there is some restructuring of the codegen (using indexed loads, etc.), there is no performance difference.[*]

          I firmly believe that keeping the “signed overflow is UB” nonsense is purely a benchmark game that does not offer actual performance wins in real world code, but has massive costs - both in terms of performance and security.

          [*] There is a 2x difference in perf under Rosetta 2, but assuming the same benchmarks are being compiled, Rosetta 2 may just have less work to do on the uncommon code you get from gcc and clang compiling x86_64 with -fwrapv.

          1. 3

            The places where this shows up are usually after inlining and some local optimisations. They generally don’t show up in microbenchmarks because code that is simple enough to fit in a microbenchmark is amenable to more analysis, but that example is definitely not the shape of one that I would expect to see. The loops where this is usually a win are typically nested loops of the shape for (int i=x ; i<y ; i++). For each loop in the loop nest, you can reduce this to a range and then you can model this as an arithmetic, geometric, (and so on depending on the nesting depth) sequence. This analysis exposes vectorisation opportunities that are not present if one of the loops may be infinite (or just follow a weird pattern) as a result of signed overflow resulting in a negative number.
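
            A sketch of the kind of loop nest meant here (illustrative only, not taken from any benchmark):

            /* If `i`, `j`, and `i * cols + j` are assumed never to overflow,
               scalar evolution can describe the inner accesses as a simple
               affine sequence, widen the indexing to 64 bits, and hand the
               loop to the vectoriser; with wrapping semantics the compiler
               must also account for the indices wrapping. */
            void add_bias(float *m, int rows, int cols, float bias)
            {
                for (int i = 0; i < rows; i++)
                    for (int j = 0; j < cols; j++)
                        m[i * cols + j] += bias;
            }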

            As a rule of thumb, about 90% of compute time is spent in loop nests in a typical program, so you don’t need a big speedup on loop nests to see some real-world improvements.

            I firmly believe that keeping the “signed overflow is UB” nonsense is purely a benchmark game that does not offer actual performance wins in real world code

            For this to be true, the following would have to be true:

            • It would have to make a measurable difference in one or more benchmarks.
            • Those benchmarks would have to be the only instances of that code pattern in CPU-bound code.

            Much as I dislike SPEC CPU, I think it’s very unlikely that this is the case. In my experience, these have been pushed by big companies that have run the modified compiler on large codebases and seen measurable improvements in performance-critical loops.

            1. 1

              I’ve also seen the argument of “optimizing” this

              for (int i=0; i < inputNumber; ++i) …
              

              into correctness (on 64-bit machines):

              for (size_t i=0; i < inputNumber; ++i) …
              

              Arguably, size_t was the correct type for natural numbers all along, and using a signed integer type for a natural-number quantity is just asking for trouble.

            2. 4

              Is there any situation where the user couldn’t recover the optimisation by rewriting their loops a little bit?

              1. 2

                Probably not, but given the 10+ billion lines of C/C++ code in the world, who is going to pay for auditing it all, finding the bottlenecks introduced by these problems, and fixing them all?

                1. 3

                  Sure. On the other hand, I’m thinking the same thing about all the integer overflow UB we haven’t yet discovered, for which -fwrapv hasn’t yet been enabled.

              2. 1

                …Which is a good example of how C/C++ style signedness for integers is kinda bullshit. XD C-style for loops are a sneaky-good way to expose the problems with them; iterators or ranges are one possible way of making some of the problems go away. Use the right iterator and you know it will terminate.

              3. 2

                Unsigned integer overflow is well-defined in C. Only signed integer overflow is undefined. Whether signed integer overflow should be undefined or implementation defined behavior seems like it was a judgment call, though I’m sure the original C standards committee had a valid rationale at the time for making it undefined behavior. The hardware landscape in the 1970s and 1980s was a lot different from what it is now.
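
                To make the divergence concrete (a minimal sketch):

                #include <limits.h>
                void overflow_examples(void)
                {
                    unsigned int u = UINT_MAX;
                    int s = INT_MAX;
                    unsigned int a = u + 1u; /* well defined: wraps around to 0 */
                    int b = s + 1;           /* undefined behaviour in standard C */
                    (void)a; (void)b;
                }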

                For what it’s worth, C/C++ have always embraced the idea of UB. On the other hand, most mainstream contemporary languages fully define invalid constructs. If you’re coming from such a language, where you are used to the paradigm that everything is well-defined, I can imagine that seeing code deleted during optimization is probably a surreal experience. Ideally, people who are not familiar with C/C++’s expectation (that the programmer ensures undefined things never happen) absolutely should not casually write C/C++. In practice, due to a confluence of factors, that ideal will never be met. Fortunately Rust is emerging as an alternative lowish-level language that has fully defined semantics compatible with what the average programmer coming from a language like Java/JavaScript expects.

                1. 8

                  Unsigned integer overflow is well-defined in C. Only signed integer overflow is undefined.

                  As I’ve said elsewhere, the only reason for this divergence is compiler benchmark games.

                  Moreover, even in the 70s and 80s integer overflow did not produce random output on any hardware, and so it should have been either unspecified or implementation defined. Neither of those would allow the compiler to simply assume overflow can’t happen and reason backwards from there.

                  That said, there has not been any hardware not using 2’s complement now in a few decades, so any call to older hardware is kind of stupid, and the behavior should just be specified.

                  For what it’s worth, C/C++ have always embraced the idea of UB.

                  No, there is a vast difference between acknowledging that there are things the language allows you to do that have unpredictable outcomes, and blindly saying “undefined behavior means the compiler can do whatever it wants”. The latter is a fairly contorted view of what the specification says, one that compiler devs started taking a few decades ago, and it immediately started introducing security vulnerabilities - specifically by removing overflow checks that had worked for decades, on all systems.

                  The problem is only partially that C/C++ erroneously uses “UB” for behaviors that could be unspecified or implementation defined; a bigger part of the problem is that compiler writers have taken “what happens when you do an operation X that triggers UB places no requirements on the compiler” as meaning “pretend X never happens, and remove code paths in which it does”.

                  The only people with this expansive view of “Undefined Behavior” are the compiler authors. To get an idea of the absurdity:

                  int i = f();
                  if (i == 0) {
                    logError("Whoops, we're dividing by zero\n");
                  }
                  int x = g() / i;
                  

                  By the rules compiler vendors invented, you will never get a message logged, because the only program that can possibly get that message has i==0 and that makes g()/i UB, which is invalid so can’t happen. Therefore i can never be 0 so that if() statement can be dropped.

                  If you’re coming from such a language, where you are used to the paradigm that everything is well-defined, I can imagine that seeing code deleted during optimization is probably a surreal experience.

                  Alternatively if you’re used to writing code in C and C++, and know how computers work you may be a bit miffed at compilers actively breaking decades old code based on a highly questionable definition of what they are permitted to do, which is an interpretation that clang and gcc introduced and have pushed only over the last 10-15 years.

                  We can acknowledge that the c and c++ compilers have taken this view, and write our code in the knowledge that c and c++ compilers are fundamentally an adversary, while continuing to be angry about said compilers making decisions that they know are against anything a developer has written.

                  1. 6

                    By the rules compiler vendors invented

                    We can acknowledge that the c and c++ compilers have taken this view

                    while continuing to be angry about said compilers making decisions that they know are against anything a developer has written

                    With respect, I think you’ve got entirely the wrong end of the stick. No compiler developer has ever sat down, scowled at their monitor, and decided to make their compiler break people’s code. What to you looks like malice is actually entirely incidental to the process of writing optimising compilers.

                    The ‘highly questionable’ behaviour you’re seeing is emergent: it appears out of the process of applying logical rules of inference to a system where the priors are reasonable and desirable, in much the same way that an artificial general intelligence might turn the world into paperclips if its goals are insufficiently constrained, despite that never being the objective of its programmer.

                    There is no way to ‘solve’ this other than to either forego desirable & sensible optimisations, or to make the C standard define more behaviours. The hard logic of a compiler doesn’t compromise when held against the fuzzy intuitions humans have about what counts as ‘correct’, and there’s no way to solve this problem: the compiler cannot know what code you intended to write, it doesn’t have enough information! So each incremental optimisation pass will do something reasonable in isolation, and then suddenly - whoops! Your code is broken.

                    Ralf Jung’s article about this emergent phenomenon in relation to pointer provenance is very good: https://www.ralfj.de/blog/2020/12/14/provenance.html

                    1. 7

                      No compiler developer has ever sat down, scowled at their monitor, and decided to make their compiler break people’s code.

                      But they did disregard the consequences of their actions. They wanted to enable some optimisations, saw that the standard let them get away with it, then proceeded to ignore bug reports warning them that they were introducing vulnerabilities in old, reasonable-looking code.

                      There is no way to ‘solve’ this other than to either forego desirable & sensible optimisations, or to make the C standard define more behaviours.

                      Obviously C should define more behaviours. It doesn’t. I suspect one reason is the presence on the standards committee of compiler writers who just won’t let go of their pet optimisations. I mean, yes, if we made -fwrapv the default, some existing programs would indeed be slower. But then, is there any such program that couldn’t recover its optimisations by modifying its loops a little bit? Right now I don’t think there is.

                      Compiler writers favour speed over correctness. When they introduced those new optimisations that broke old code because of UB, they had a simple choice: either assume -fwrapv and renounce some speed up on old programs, or do the UB thing and introduce critical vulnerabilities in generated code that weren’t there before. They chose the latter.

                      I mean, it’s probably not explicit malice, but there is such a thing as criminal negligence.

                      1. 2

                        What’s the point of this reply? It feels like arbitrary finger-pointing to no end. If you don’t like how C is managed and what it’s designed to do, then… don’t use it? Rust is just there and solves the vast majority of these issues in practice, you know?

                        1. 2

                          You’re being too kind to the standards committee. How do I put it… Call me entitled, but when you have such a global impact you have a moral obligation to deliver world-class results. And as difficult and gruelling as their job may be, the standards committee is falling short.

                          Rust is just there

                          Not quite. First class support is limited to Windows, Linux, and MacOS. Tier 2 support looks better, but doesn’t even include OpenBSD. As promising as it is for application development on a limited number of very mainstream platforms, it stops working as soon as we aim for broader portability.

                          If you don’t like how C is managed and what it’s designed to do, then… don’t use it?

                          I can’t avoid C.

                          I do wish there was a better way, but at this point the best I can hope for would be a language that compiles to C, in a way that avoids as many pitfalls as possible. In fact, I’m seriously considering the possibility of a “C2”, similar to Herb Sutter’s Cpp2.

                          1. 1

                            Zig is going to be getting a backend that transpiles to pure C very soon. Given its already brilliant interop with C and its at least mostly sane semantics, perhaps that’ll suit your needs?

                            1. 1

                              It actually may.

                    2. 1

                      Unsigned integer overflow is well-defined in C. Only signed integer overflow is undefined.

                      As I’ve said elsewhere, the only reason for this divergence is compiler benchmark games.

                      Maybe I’m misunderstanding you but I don’t think that’s the case. At the time that C was first standardized, there were a few different signed integer representations used in mainstream architectures. Different unsigned integer representations didn’t really exist in mainstream architectures.

                      That said, there has not been any hardware not using 2’s complement now in a few decades, so any call to older hardware is kind of stupid

                      Not sure if this was targeted at me but if it was then it’s based on a misreading of my comment. I didn’t justify the existence of UB today on the basis of old hardware. My point was that in the 70s and 80s, when C was in the process of being standardized, deciding on UB for signed integer overflow may have been a sensible thing to do given the hardware landscape at that time.

                      “what happens when you do an operation X that triggers UB places no requirements on the compiler” as meaning “pretend X never happens, and remove code paths in which it does”.

                      Logically the first statement clearly permits the second.

                      In general I think your comments seem a bit angry and lacking an assumption of good faith on the actors who have shaped C from the past to the current day. There’s no reason to assume that the compiler writers or standards committees had or have malicious or antagonistic intentions. Everyone wants a secure software stack. There are just many different factors to consider and use cases to accommodate.

                      1. 1

                        Maybe I’m misunderstanding you but I don’t think that’s the case. At the time that C was first standardized, there were a few different signed integer representations used in mainstream architectures. Different unsigned integer representations didn’t really exist in mainstream architectures.

                        C and C++ have the concept of implementation-defined behavior, which is precisely the mechanism for describing “different forms of integer”.

                        Not sure if this was targeted at me but if it was then it’s based on a misreading of my comment. I didn’t justify the existence of UB today on the basis of old hardware.

                        No, it wasn’t directed at you and I apologize that it came off as such.

                        It was a response to the general claim that “the lack of standardization in the industry at the time warranted this being UB”, which is simply false. C and C++ both have the concept of “implementation defined”, which would clearly cover the wide variety of integer behaviors that existed. At the time, classifying things as UB wasn’t a huge problem, but once compilers changed their interpretation of what UB allowed them to do, the difference between ID and UB became a source of security vulnerabilities.

                        Logically the first statement clearly permits the second.

                        This is the language lawyering that people dislike - the language does not say “the compiler can replace any instance of UB with system("rm -rf /")”, but we all understand that if a compiler did do that, the compiler would be wrong. The problem is the decision to take “no constraints” as “can do anything on any code path that invokes UB”, which is an interpretation that only works if you ignore the context and focus solely on a dictionary reading of each word. That is flawed because it isn’t how human language actually works.

                        In general I think your comments seem a bit angry and lacking an assumption of good faith on the actors who have shaped C from the past to the current day.

                        I am fairly angry about this interpretation of UB, because it is wrong and has resulted in far more security vulnerabilities than meaningful performance improvements. I understand why compiler devs have taken this approach: compilers are compared almost exclusively on the basis of codegen performance, and this interpretation helps benchmarks. They are not compared on the basis of the security of their generated code (see the slew of security regressions created when they first adopted this definition of UB) - and a non-zero part of that is because who the heck knows how you would even measure it?

                        A lot of UB remains that has never had any real reason to be “undefined”: the spec has had the concept of implementation-defined behavior for decades (arguably forever), yet behavior that could perfectly well be defined (even if only as “implementation defined”) has been classified as UB, adding totally unnecessary footguns to a language that is already a footgun-prone horror.

                        I don’t think these decisions are intended to be malicious, but the result of those decisions is that if you are writing or maintaining C/C++ code you have to assume that the compiler is an adversary working against you whenever you are writing code that is security sensitive.

                      2. 1

                        This is a C/C++-specific problem. Reread the entire thread and pay attention to which languages are mentioned.

                        1. 2

                          Rust also potentially suffers from this in some cases, the main one being pointer provenance. It’s just less notable because such cases are only really triggerable in unsafe code.

                      3. 2

                        Whether signed integer overflow should be undefined or implementation defined behavior seems like it was a judgment call, though I’m sure the original C standards committee had a valid rationale at the time for making it undefined behavior.

                        I’ve heard that at the time, some platforms went bananas upon signed integer overflow, so they made it undefined, perhaps without realising that compiler writers would eventually take that as a license to interpret that literally for all platforms, including the 2’s complement ones that comprise >99.999% of all computers.

                        I think there’s a disconnect between the spirit and the letter of “undefined” here. And now compiler writers won’t give up their pet optimisations.

                    3. 12

                      Completely agreed. The entire post boils down to “I invoked undefined behavior and was surprised that something happened”. Undefined is in the name - you can’t expect anything!

                      We could debate whether UB should continue this way or if we should handle it differently, but that’s an entirely separate discussion from what’s going on in the post.

                    4. 16

                      By default, GCC optimizes for performance at the expense of security, safety and the ability to audit code.

                      This is not the right choice for everybody, and I don’t think it should be the default. Nevertheless, there is a compiler flag: -fwrapv:

                      This option instructs the compiler to assume that signed arithmetic overflow of addition, subtraction and multiplication wraps around using twos-complement representation. This flag enables some optimizations and disables others.

                      The Linux kernel has used -fwrapv since 2009. There are open bugs in Mozilla and Chrome to enable the flag.

                      1. 2

                        By default, GCC doesn’t optimise. You have to pass a -O flag to make GCC optimise.

                      2. 13

                        This is standard and well-known UB - though it clearly and objectively should not be - you cannot rely on triggering UB to detect UB, and I recall a few years back (a decade or more, maybe?) a slew of security bugs when GCC and Clang decided to elide if (a + b < a) error(); and similar error checks because the overflow was “UB”.

                        The result is that overflow checks had to be added that were both harder to read and slower - compilers only added the required builtins (or fixed the terrible codegen of the existing builtins) for correct behavior after they had introduced the aforementioned security bugs.
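
                        Roughly the pattern in question (my sketch, not the original code): under the “overflow is UB” assumption a compiler may rewrite a + b < a as b < 0, which silently drops exactly the overflow case the guard was written for.

                        #include <stdio.h>
                        /* Hypothetical example of the old-style guard: the compiler is allowed
                           to simplify `a + b < a` to `b < 0`, so the overflow this check was
                           written to catch is never reported. */
                        int add_checked(int a, int b)
                        {
                            if (a + b < a) {
                                fprintf(stderr, "overflow\n");
                                return 0;
                            }
                            return a + b;
                        }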

                        The unrelenting stupidity of any form of integer overflow being UB was, is, and always will be a source of security bugs, and is fundamentally inexcusable. There is no platform on which any of these operations are undefined, and the only reason that unsigned overflow is defined while signed overflow is not is that compilers have benchmarks that use ‘int’ rather than unsigned. There is no justification, and no remotely sane alternative reason, for the bifurcation of overflow UB between signed and unsigned integer arithmetic.

                        1. 5

                          It’s baffling that C won’t even add functions for safe signed arithmetic. People have been getting burned on the overflow checks for decades, and it will continue to be a problem for as long as C lives.

                          1. 1

                            right?

                            Here’s the standard compliant and UB-free way to safely compute a*b:

                            // needs <concepts>, <limits>, <optional>
                            template <std::integral IType> constexpr std::optional<IType> safe_overflow(IType a, IType b) {
                              if (a == 0 || b == 0)
                                return IType{0};
                              constexpr IType max = std::numeric_limits<IType>::max();
                              constexpr IType min = std::numeric_limits<IType>::min();
                              // each sign combination needs its own division-based bound check
                              if (a > 0 && b > 0 && a > max / b) return std::nullopt;
                              if (a > 0 && b < 0 && b < min / a) return std::nullopt;
                              if (a < 0 && b > 0 && a < min / b) return std::nullopt;
                              if (a < 0 && b < 0 && b < max / a) return std::nullopt;
                              return a * b;
                            }
                            

                            a non-standard version is

                            template <std::integral IType> constexpr std::optional<IType> safe_overflow_ns(IType a, IType b) {
                              IType result = 0;
                              if (__builtin_mul_overflow(a, b, &result))
                                return std::nullopt;
                              return result;
                            } 
                            

                            You will be shocked to hear that the latter is vastly, vastly faster than the divisions and compares that the standard-compliant version requires.

                            1. 1

                              It’s been added in C23.
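
                              Something like this, if I’m remembering the C23 interface right (the checked-arithmetic macros live in <stdckdint.h>):

                              #include <stdckdint.h>
                              #include <stdio.h>
                              int main(void)
                              {
                                  int product;
                                  /* ckd_mul returns true when the mathematical result does not fit. */
                                  if (ckd_mul(&product, 1000000, 1000000))
                                      puts("overflow");
                                  else
                                      printf("%d\n", product);
                                  return 0;
                              }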

                          2. 7

                            I noticed a weird trend where the check for whether something is undefined happens after the undefined behavior has already occurred. Intuitively it makes sense to me that this is insufficient, so I’m wondering why this failure is so common? For example, here the overflow check happens after the overflow has already happened.

                            1. 4

                              I don’t usually do that, but in my case there were 2 reasons:

                              • the initial intention when writing it wasn’t protecting against overflow/UB but simply protecting against the computation going outbound for the dereference
                              • the assignment needed to be done early because I was targeting a codebase with some ancient C convention on variable declaration required to be before any code; and since I had the form where I wanted to bail out early instead of opening a scope, I had to declare it early:
                              int x = ...
                              if (...)
                                  return -1
                              // x can not be declared here
                              return f(x)
                              
                              1. 9

                                Not trying to be critical, but it shouldn’t be news that you can’t check for a UB condition after it’s happened. I’ve seen so many cases where people run into similar problems, or problems related to the type-punning rules. Usually the thought process is something along the lines of:

                                ok, so I know this is supposed to have undefined behavior. But the variable will still have some value, so I can check if the value is in the correct range and so avoid issues that way…

                                No, you can’t. This is what undefined behaviour means. All bets are off; if it’s hit, it is a bug in the code, full-stop, and no checking afterwards can fix it because the behaviour from that point (*note1) on is totally undefined. Maybe it seems to work in some cases. It doesn’t matter; use a different compiler, or a later version of the same compiler, and all of a sudden it could stop “working”.

                                Don’t think of the C compiler as some sort of glorified high-level assembler. It’s way more sophisticated than that. There are rules you have to follow, if you are using any of the modern compilers. You don’t have to like it (and there are even switches available that will give behaviour that you want, but that’s not standard C any more) but it is the case. You must not ever invoke undefined behaviour.

                                Note 1: Actually, I believe the whole program behaviour is undefined if the program exhibits undefined behaviour at any point. So technically, even things that were supposed to happen before the UB might not happen or might happen differently.

                                1. 5

                                  You are technically correct, but I’m sure you understand that the consequence of such a policy, pushed to the extreme, is a situation where a 100k LoC codebase has a single little bug deep down somewhere, and crashing or executing random code straight at startup is considered acceptable behavior.

                                  The cost of a single mistake is very high, that’s the main point I’m trying to make.

                                  1. 9

                                    What’s the alternative? If the compiler can’t optimise around code that hasn’t been executed yet having UB, then the opportunities for useful optimisation become near non-existent.

                                    The compiler is not doing anything unreasonable here: every step, in isolation, is desirable and valid. If the end result feels unreasonable, then that’s either (a) a problem with the language spec being insufficiently relaxed about what it considers to be UB or (b) a problem with insufficient tooling (or, in the case of a language like Rust, built-in compiler checks) to catch issues like this.

                                    To point the finger at the compiler is a very hard sell indeed because there’s no specific thing to point at that it’s doing wrong.

                                    1. 9

                                      It might be reasonable not to do the optimization. The alternative in Rust is to actually define the behavior as wrapping, which would be equivalent to using -fwrapv in C. Sure, we lose some optimisations, but is it worth it? I’m starting to believe so.

                                      1. 10

                                        Yes, I agree: but that’s a problem with the language spec, not the compiler. The language spec should just say ‘overflow has wrapping semantics’. You’ll lose some opportunities for optimisation and compatibility with a lot of older or obscure platforms (some platforms have arithmetic instructions that don’t wrap on overflow, and this is one of the big reasons that the C spec leaves overflow undefined!), but this is enough of a footgun that I think it’s a worthwhile tradeoff in the year of our lord 2022.

                                        But again, this isn’t GCC’s fault: this is the fault of the language spec and the compromises that went into its creation. Don’t like it? Time to get a new language (this isn’t me trying to be gatekeepy: horrid footgun shit like this is a big reason I moved to Rust and never looked back).

                                        1. 6

                                          Not saying it’s GCC’s fault, but just because a spec made a mistake doesn’t mean GCC should be braindead about it: it holds a responsibility for all the software in C out there. Nothing forces GCC to do dangerous optimizations; it can still conform to the spec while declining to exploit this particular liberty. GCC serves the user, not the spec; the question becomes: do users want this kind of optimization, and do they accept its consequences, by default?

                                          1. 3

                                            Where’s the mistake? Integer overflow being undefined is a feature, not a bug. There are platforms where the behaviour of overflow is implementation defined, entirely unpredictable, or just straight up UB at a hardware level, leaving the machine in a totally invalid state. C is designed to target bizarre and unusual architectures like these, and so having integer overflow be undefined is a huge boon to the language’s portability without sacrificing (and even improving, in many cases) performance.

                                            If you’re just going to do language spec revisionism and claim that ‘the spec is wrong’ or something, then I think it’s clear that C’s not the language for you. Heck, it’s definitely not the language for me: I aggressively avoid touching the thing nowadays.

                                            1. 3

                                              I am sure there is, so please name one.

                                              1. 3

                                                Many GPUs have saturating semantics on overflow. Other architectures emulate small integers with large integers, meaning that overflow results in unobservable ‘extra bits’. Changing the standard to make integer overflow always wrap would make writing C for these architectures extremely difficult without significant performance ramifications.

                                                If reducing portability is fine with you, then so be it: but that’s not what C is for: it’s supposed to be the lowest common denominator of a vast array of architectures, and it does this quite effectively in no small part because it leaves things like integer overflow undefined.

                                              2. 3

                                                There are platforms where the behaviour of overflow is implementation defined, entirely unpredictable, or just straight up UB at a hardware level, leaving the machine in a totally invalid state.

                                                Can you name one such platform? That is still used after Y2K?

                                                Also note that the spirit of UB in 1989 was almost certainly a compatibility thing. I doubt the standard committee anticipated anything other than -fwrapv on regular 2’s complement processors. And it’s only later that compiler writers realised that they could interpret “UB” in a way that in this particular case was most probably against the spirit of it.

                                            2. 2

                                              Yes, I agree: but that’s a problem with the language spec, not the compiler.

                                              Compiler writers are on the standard committee…

                                              1. 1

                                                I don’t think defining the behaviour of overflow is desirable: programmers want overflow to happen only in very rare cases, and defining its behaviour now means tools cannot distinguish between overflow the programmer wanted/expected and accidental overflow (the vast majority of cases, in my experience).

                                                We can currently write sanitizers around overflow because it’s undefined; if we had defined it as wrapping, the sanitizers could only say “well, it’s wrapping, but I guess you wanted that, right?”

                                                AFAIU Rust traps on overflow in debug and defines it as wrapping in release. I believe this is mostly because they decided undefined behaviour in safe code was unacceptable, so they went with defined-but-very-likely-wrong in release.

                                              2. 4

                                                You lose far fewer optimisations in a language that is not C. Unfortunately, in C it is a very common idiom to use int as the type for a loop induction variable. Having to reason about wrapping breaks a lot of the analyses that feed vectorisation. In C++ or Rust, you typically use iterators rather than raw indexes, and these iterators will use an unsigned type by default. Operating over the domain of positive integers with wrap to zero is much simpler than operating over the domain of signed integers with overflow wrapping to a large negative number, and so the C++ and Rust versions of these loops are easier to vectorise. In C, using something like size_t as the type of the induction variable will often generate better code.

                                                1. 2

                                                  Then… how about renouncing these optimisations, and tell everyone to update their code to use size_t so it is fast again? Because I sure resent compiler writers for having introduced critical vulnerabilities, and tell everyone to fix their programs so they are safe again…

                                                  I mean, sometimes the hoops I have to jump through… libsodium and Monocypher for instance can’t use arithmetic left shifts on signed integers at all. Instead of x << 26 we need to write x * (1<<26), and hope the compiler will be smart enough to generate a simple shift (thankfully it is). Reason being, left shifting negative integers is straight up undefined. No ifs, no buts, it’s undefined even when the result would stay within range.
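
                                                  To make that concrete, a sketch of the workaround (illustrative, not the actual library code):

                                                  #include <stdint.h>
                                                  /* `x << 26` is undefined whenever x is negative, but `x * (1 << 26)`
                                                     is well defined as long as the product fits, and compilers emit a
                                                     plain shift for it anyway. */
                                                  static int64_t scale_limb(int64_t x)
                                                  {
                                                      return x * ((int64_t)1 << 26);
                                                  }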

                                                  1. 3

                                                    Then… how about renouncing these optimisations

                                                    That’s what PCC does. It renounces all of these optimisations and, as a result, generates much slower code. OpenBSD tried to use it for a while, but even they couldn’t put up with the poor performance (and OpenBSD generally favours security over performance). The market has shown time and time again that a C compiler that generates fast code will always be chosen over one that generates more predictable code for undefined behaviour.

                                                    It’s not like there aren’t alternative compilers that do what you claim to want, it’s just that no one (including you) actually wants to pay the performance cost of using them.

                                                    1. 3

                                                      The market has shown time and time again that a C compiler that generates fast code will always be chosen over one that generates more predictable code for undefined behaviour.

                                                      Gosh, I think our mutual employer provides a strong counter-example. The incentives of a standalone compiler vendor are very different to a vendor that uses the compiler to compile billions of lines of its own production code. Our compiler adds new security features at the expense of performance continually, and internal code is required to use them. IMHO these end up at the opposite absurd end of the spectrum, like default-initializing stack variables to ensure the undefined behavior on access becomes implementation defined, stack overflow buffer checks, etc. In addition to performance vs. security, there’s also a stronger emphasis on compatibility vs. performance; updating the compiler in a way that would defeat large numbers of existing security checks would come under a lot of scrutiny.

                                                      1. 2

                                                        I thought about MSVC’s interpretation of volatile as a counter-example here (it treats it as the equivalent of a sequentially consistent atomic, because that’s what a lot of internal legacy code assumed). But then I thought of all of the third-party projects switching to using clang on Windows, including things like Chrome and, by extension, all Electron apps, and realised that it wasn’t such a counter-example after all. For a long time, MSVC was the only compiler that could fully parse the Windows headers, which gave it a big advantage in the market; now that clang can do the same, that’s eroding (I believe clang can now parse all of the Windows source code, but it can’t correctly codegen some large chunks and doesn’t generate code that is the shape expected by some auditing tools).

                                                        Alias analysis is another place where MSVC avoids taking advantage of undefined behaviour. Apple pushed for making -fstrict-aliasing the default and fixed (or encouraged others to fix) a huge amount of open source and third-party code, giving around 5% better performance across the entire system. MSVC does not take advantage of type-based alias analysis because doing so would break a lot of existing code that relies on UB. This is also pushing people who have code that does not depend on illegal type punning to use clang and get more performance.

                                                        Note that I am talking specifically about interpreting the standard with respect to UB to enable optimisations here. I see security flags such as /GUARD, stack canaries, InitAll, and so on as a different category, for three reasons:

                                                        • They are opt in, so you can ignore them for places where you know you’re sandboxing your program or where it isn’t operating on untrusted data.
                                                        • They make certain behaviour well defined, which makes it easier to reason about your source code. Not taking advantage of UB does not have this property: your program still depends on UB and may still behave in unexpected ways and your performance is now harder to reason about because it will vary hugely depending on whether, post inlining, you have enough hints in your source for the compiler to prove that the condition will not trigger.
                                                        • They, in general, don’t impede other optimisations. For example, InitAll combines with dead store elimination and typically can be elided by this optimisation (and Shayne did some great work to improve this). /GUARD is applied very late in the pipeline (I worked on the LLVM implementation of this so that we could have CFG for Rust and Objective-C), and so inlining and devirtualisation can significantly reduce the number of places where you need the check (MSVC has some very impressive profile-guided devirtualisation support, which helps a lot here). In contrast, things like not assuming that integer addition results in a larger number have a big knock-on effect on other optimisations.
                                                      2. 1

                                                        Well, there is renouncing a class of optimisations, and there is defining a class of behaviours. I don’t think those are the same. Which one was PCC doing? Did it define integer overflow, pointer aliasing, etc., or did it disable dangerous-looking optimisations altogether?

                                                        it’s just that no one (including you) actually wants to pay the performance cost of using them.

                                                        I put myself in a situation where I can actually cop out of that one: I tend to write libraries, not applications, and I ship source code. This means I have no control over the compilation flags, and I’m forced to assume the worst case and stick to strictly conforming code. Otherwise I would try some of them (most notably -fwrapv) and measure the impact on performance. I believe I would accept any overhead below 5%. But you’re right, there is a threshold beyond which I’d just try to be more careful. I don’t know for sure which threshold this is though.

                                                        1. 1

                                                          I tend to write libraries, not applications, and I ship source code. This means I have no control over the compilation flags

                                                          How’s that? Libraries would still come shipped with a build system to produce (shared) objects, right?

                                                          1. 1

                                                            Libraries would still come shipped with a build system to produce (shared) objects, right?

                                                            Not when this library is literally one C source file with its header, with zero dependency, and used on obscure embedded targets that don’t even have a standard library and I don’t know of anyway.

                                                            I do ship with a Makefile, but many people don’t even use it. And even if they did, they control $CFLAGS.

                                                            1. 1

                                                              Ouch, that’s not an enviable situation to be in :S

                                                              Too bad you can’t enforce some of those semantics using #pragma or something.

                                                              1. 1

                                                                Well, then again, I did it on purpose: sticking to standard C99 with zero dependencies is how people ended up using it in those contexts. My work is used in a previously underserved niche; that’s a win.

                                                                And in practice, I made only one error of any consequence, and it was a logic bug, not anything to do with C’s UB. I did have a couple of UBs, but none of them ended up amounting to anything. (Then again, it helps that my library does zero heap allocation.)

                                              3. 6

                                                Yes, that is exactly the by-design consequence of C UB. A single UB anywhere deep in your code could convert your computer into a giant whale or a potted plant.

                                                1. 4

                                                  Yes. Writing code in C is a minefield, and I think people who write code in this language need to be aware of that.

                                              4. 3

                                                the assignment needed to be done early because I was targeting a codebase with some ancient C convention on variable declaration required to be before any code

                                                If this is referring to C89 style, then you can declare a variable without assigning it:

                                                int x;
                                                if (...) { return -1; }
                                                x = 123;
                                                
                                                1. 3

                                                  Yeah but I don’t like that for 2 reasons:

                                                  • 2 lines instead of 1
                                                  • I can’t do const int x = … anymore (and I like to use const everywhere because it helps the developer mental model about non-mutability expectations)
                                              5. 4

                                                Good observation. In C/C++ you are intended to check for valid preconditions before you perform an operation that relies on them. In Python and many others, there is a pervasive “look before you leap” idiom because there is no undefined behavior, either it behaves correctly or throws an exception, i.e. every operation is checked beforehand. Could be from an influx of folks into C/C++ from those languages.

                                                For those who don’t understand, C/C++ does it this way because specifying “undefined behavior” allows you to assume that preconditions are valid without having to recheck them on every call, allowing the programmer to be more efficient with the CPU.

                                                1. 3

                                                  In Python and many others, there is a pervasive “look before you leap” idiom because there is no undefined behavior, either it behaves correctly or throws an exception, i.e. every operation is checked beforehand.

                                                  I think “look before you leap” (LBYL) is the opposite of what you’re trying to describe. I’ve usually heard that described as “easier to ask forgiveness than permission” (EAFP).

                                                  1. 2

                                                    My mistake, I meant “leap before you look”

                                                2. 1

                                                  Note that the order of operations doesn’t matter for UB. UB is not an event that happens. Instead, “UB can’t happen” is an assumption that the compiler is free to make, and then move or delete code under that assumption. Mere existence of any UB anywhere in your program, even in dead code that is never executed, is a license to kill for a C compiler.

                                                  1. 1

                                                    even in dead code that is never executed, is a license to kill for a C compiler.

                                                    No, unless you mean that it’s a license to remove the dead code (which the compiler can do anyway).

                                                    If code that would have undefined behaviour when executed is never executed, then it does not trigger the undefined behaviour (by definition).

                                                    1. 1

                                                      Whole-function analysis can have an effect that seems like UB going back in time. For example, the compiler may analyze range of possible values of a variable by checking its every use and spotting 2 / x somewhere. Division by 0 is UB, so it can assume x != 0 and change or delete code earlier in the function based on this assumption, even if the code doesn’t have a chance to reach the 2 / x expression.

                                                      1. 2

                                                        For example, the compiler may analyze range of possible values of a variable by checking its every use and spotting 1 / x somewhere, and then assume x != 0 and change or delete code based on that earlier in the function, even before execution has a chance to reach the 1 / x.

                                                        Yep, but if that 1 / x is in dead code it can’t affect assumptions that the compiler will make for live code. And if the 1 / x is in a particular execution path then the compiler can’t use it to make assumptions about a different path.

                                                        As an example, for:

                                                        if (x == 0) {
                                                            printf("x is zero!\n");    
                                                        }
                                                        
                                                        if (x == 1) {
                                                            printf("1/x = %d\n", 1 / x);
                                                        }
                                                        

                                                        … the compiler will not remove the x == 0 check based on division that occurs in the x == 1 branch. Similarly, if such a division appears in dead code, it can’t possibly affect a live execution path.

                                                        So:

                                                        even in dead code that is never executed, is a license to kill for a C compiler.

                                                        No.

                                                        (Edit, based on your edits): In addition:

                                                        Division by 0 is UB, so it can assume x != 0 and change or delete code earlier in the function based on this assumption,

                                                        Yes, if it is guaranteed that from the earlier code the 2 / x division must be subsequently reached, otherwise no.

                                                        even if the code doesn’t have a chance to reach the 2 / x expression.

                                                        No. As per above example, the compiler cannot assume that because something is true on some particular execution path it is true on all paths.

                                                        If what you were claiming was true, it would be impossible/useless to perform null checks in code. Consider:

                                                        if (p != NULL) {
                                                            *p = 0;
                                                        }
                                                        

                                                        If the compiler can assume that p is not NULL based on the fact that a store to *p exists, it can remove the NULL check, converting the above to just:

                                                        *p = 0;
                                                        

                                                        This is clearly different and will (for example) crash if p happens to be NULL. But a compiler can’t and won’t make that change: https://godbolt.org/z/hzbhqdW1h

                                                        On the other hand if there is a store that appears unconditionally on the same execution path it can and will remove the check, eg.

                                                        *p = 0;
                                                        if (p != NULL) {
                                                            printf("p is not null!");
                                                        }
                                                        

                                                        … for which both gcc and clang will remove the check (making the call to printf unconditional): https://godbolt.org/z/zr9hc7315

                                                        As it happens, neither compiler will remove the check in the case where the store (*p = 0) is moved after the if block, but it would be valid for them to do so.

                                                  2. 1

                                                    I think this is the core of the issue and why people are getting so fired up.

                                                    If you assume that integer operations are sent to the CPU intact, and the CPU was made in the last 30 years, then checking for overflow after the fact is a single compare.

                                                    If you have to check for the potential for overflow beforehand, the comparison is much more involved. I was curious what it actually looks like and stumbled onto this, which implements it in four compares (and three boolean evaluations).
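
                                                    For reference (my sketch, not the code I found): the UB-free pre-check for signed addition looks something like this, and it is exactly four compares and three boolean evaluations, all before any addition happens:

                                                    #include <limits.h>
                                                    /* Does a + b overflow an int? Nothing is added until we know it is safe. */
                                                    int add_would_overflow(int a, int b)
                                                    {
                                                        return (b > 0 && a > INT_MAX - b) ||
                                                               (b < 0 && a < INT_MIN - b);
                                                    }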

                                                    At some level, this whole conversation becomes a disagreement about the nature of bounds checking. If you assume bounds checking does not exist (or can be compiled away!) then you can exploit UB to optimize signed arithmetic to improve performance. If you assume bounds checking needs to exist, that UB exploit is a huge negative because it forces much more pessimism to put the bounds check back, making performance worse.

                                                    Then we end up with compiler builtins to perform signed arithmetic deterministically. This is odd: the UB optimization assumes that if the language spec doesn’t require something, it isn’t needed in an implementation, but the existence of the builtin suggests otherwise. The UB optimization assumes there’s no value in having a predictable, implementation-defined behavior, yet the builtin is exactly such a predictable, implementation-defined behavior.
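
                                                    For instance, with the GCC/Clang builtin the wrapped result and the overflow flag come out of a single call (a minimal sketch; the operands are arbitrary):

                                                    #include <limits.h>
                                                    #include <stdio.h>

                                                    int main(void)
                                                    {
                                                        int res;
                                                        /* __builtin_mul_overflow computes the product, stores the
                                                           (wrapped) result in res, and returns true if it overflowed. */
                                                        if (__builtin_mul_overflow(INT_MAX, 0x1ff, &res))
                                                            printf("overflowed, wrapped result = %d\n", res);
                                                        return 0;
                                                    }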

                                                    1. 1

                                                      It’s the observation that most of the time overflow is a bug. If you wanted overflow semantics, you should have asked for it specifically. This is how e.g. Zig works.

                                                      1. 1

                                                        Avoiding the overflow requires a bounds check. I interpreted your earlier question as being about why these bounds checks often create an overflow in order to perform the check (which is not a bug; it’s integral to the check). There’s no standards-compliant way to request overflow semantics specifically, so that option doesn’t really exist, and doing the check without an overflow is gross.

                                                        If the standard had provided a mechanism to request overflow semantics via different syntax, we probably wouldn’t be having such an intense discussion.

                                                        1. 1

                                                          I agree; not having a checked add or a two’s-complement add is definitely a hole in the standard and should be fixed.
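
                                                          Both can be written in today’s C, but only by working around the signed types. A minimal sketch (the helper names are mine, and the conversion back to int in the wrapping version is technically implementation-defined):

                                                          #include <limits.h>
                                                          #include <stdbool.h>

                                                          /* A "checked add": detect overflow up front, never execute a
                                                             signed overflow. */
                                                          static bool checked_add(int a, int b, int *out)
                                                          {
                                                              if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
                                                                  return false;             /* would overflow */
                                                              *out = a + b;
                                                              return true;
                                                          }

                                                          /* A "two's complement add": detour through unsigned, whose
                                                             wraparound is defined; converting back to int is
                                                             implementation-defined but does the expected thing on every
                                                             mainstream compiler. */
                                                          static int wrapping_add(int a, int b)
                                                          {
                                                              return (int)((unsigned)a + (unsigned)b);
                                                          }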

                                                  3. 7

                                                    Funny to see another post about someone getting burned by UB just after helping a friend with their own UB problem.

                                                    It got me to write a blog post about the various falsehoods I’ve heard folks state about how UB can affect your programs: https://predr.ag/blog/falsehoods-programmers-believe-about-undefined-behavior/

                                                    1. 5

                                                      This is a great post, you should submit it as standalone.

                                                      1. 3

                                                        Thank you! It’s late in the evening over here, and I’m about to go to bed. Perhaps in the morning :)

                                                        Feel free to submit it earlier, if you’d like!

                                                    2. 6

                                                      Okay, I wasted some time and godbolted this C code and the equivalent Rust code: https://godbolt.org/z/8aMhdhvx8

                                                      Without optimization the C code works and the Rust code crashes. With optimization the C code crashes and the Rust code works.

                                                      At least if by “works” you’re expecting only mostly portable machine semantics.

                                                      1. 5

                                                        There’s a good chance you already know this, but the standard arithmetic operators in Rust are defined to “either panic or wrap” on overflow, so your Rust code “works” regardless of machine semantics, up to the fact that it might crash in a well-defined manner. While the current compiler doesn’t take advantage of this, it’s allowed to panic even in release mode; it isn’t allowed to wrap in debug mode.

                                                        You can get equivalent behaviour to C (i.e. undefined behaviour on overflow) by using unsafe{ x.unchecked_mul(0x1ff) }

                                                        1. 5

                                                          Rust’s take on this is definitely better than C’s. They benefit both from hindsight and from the fact that all architectures that matter use two’s-complement ints now.

                                                          I think it’s a problem for both languages that optimization level affects behavior. IMO Rust should either always wrap or always panic.

                                                          1. 3

                                                            I agree, but eliding the overflow checks can make a big difference for performance. My experience with Rust is that compiling anything that involves a lot of number-crunching in debug mode results in it being painfully slow. Though it’s been a while since I’ve done this seriously, so it might be worth revisiting.

                                                            What’s really needed is better overflow checks, imo. Some architectures like Itanium and PowerPC have instructions for math operations that saturate the overflow/carry flag, rather than setting or clearing it. That way you can do a big long chain of operations, then check whether it overflowed anywhere inside the chain with one instruction. As far as I know x86_64 and RISC-V don’t have these; not sure about Arm/Aarch64.

                                                            1. 3

                                                              For what it’s worth, you can enable overflow checks and optimization in Rust simultaneously. I haven’t tried it in number-crunching code specifically, but in general it is nearly as fast as Rust without overflow checks.

                                                              I believe Android enables them by default even in release builds.

                                                              1. 3

                                                                Yeah might be an interesting thing to explore semi-seriously sometime. I spent 10 minutes fiddling around with old code I thought I’d seen this behavior on, then went “oh wait this is all floating point math anyway, integer overflows probably aren’t the bottleneck”.

                                                              2. 1

                                                                ARM apparently has saturating instructions: https://developer.arm.com/documentation/dui0068/b/ARM-Instruction-Reference/ARM-saturating-arithmetic-instructions

                                                                Not sure if subsequent instructions would clear the Q flag, but I bet some trickery with conditional instructions (not branches) could be used…

                                                        2. 6

                                                          I compile all my C code with -fno-strict-overflow and -fno-strict-aliasing, and I recommend you do so as well. The C standard committee and GCC and Clang are being stupid, but that does not mean you should suffer their stupidity.
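
                                                          Concretely, the sort of thing the overflow flag changes (a sketch; exact codegen varies by compiler and version):

                                                          /* With -fwrapv / -fno-strict-overflow this is a genuine overflow
                                                             test; with the default flags the compiler may assume x + 1
                                                             never wraps and fold the whole function to "return 0;". */
                                                          int overflowed(int x)
                                                          {
                                                              return x + 1 < x;
                                                          }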

                                                          1. 4

                                                            And I wrote a library, and I have zero control over how it will be compiled… I have to assume users will use the most unsafe flags allowed by the standard.

                                                            1. 3

                                                              I truly pity you.

                                                          2. 3

                                                            This seems like a case of the use of UB in optimisation being taken a bit too far: “just because you can doesn’t mean you should”

                                                            If an optimisation is likely to cause bugs and confusion maybe it’s better not to do that optimisation, even if it’s theoretically allowed.

                                                            1. 14

                                                              This is… an understandable but poorly reasoned view.

                                                              The compiler isn’t performing any insane leaps of logic here, or deliberately trying to break anything. What you’re seeing is just the cumulative effect of many relatively simple optimisation passes applied in succession. Even very simple optimisations, when applied in combination, can produce unintuitive results, and that’s not a bug: it’s a feature! It’s why your programs run fast.

                                                              If we were to draw a line in the sand and try to rigorously define which optimisations count as being ‘too clever’, then we’d need to come up with a specification: what things are okay for the compiler to exploit, and what aren’t?

                                                              Guess what: we already have this specification, and it’s called the C standard! Your complaint seems less about the compiler and more about what the standard permits the compiler to do.

                                                              1. 10

                                                                Your complaint seems less about the compiler and more about what the standard permits the compiler to do.

                                                                I will complain about the compilers! Or, rather, I will say that I think this article is entirely correct when it says that the ideas which have been built up over the years, by compiler authors, about the definition of UB are not supported by the text of the standard and are at best based on a misreading of the standard.

                                                                The specific quotation that matters here is:

                                                                Undefined behavior — behavior, upon use of those particular nonportable or erroneous program constructs, or of erroneous data, or of indeterminately-valued objects, for which the Standard imposes no requirements.

                                                                Compiler authors treat this as if it reads:

                                                                BEGIN DEFINITION OF TERM “UNDEFINED BEHAVIOR”

                                                                behavior, upon use of those particular nonportable or erroneous program constructs, or of erroneous data, or of indeterminately-valued objects

                                                                END DEFINITION OF TERM “UNDEFINED BEHAVIOR”

                                                                BEGIN DEFINITION OF HANDLING OF UNDEFINED BEHAVIOR

                                                                the Standard imposes no requirements.

                                                                END DEFINITION OF HANDLING OF UNDEFINED BEHAVIOR

                                                                But getting to that interpretation requires torturing the plain English text. It’s clear that the correct, intended interpretation is:

                                                                BEGIN DEFINITION OF TERM “UNDEFINED BEHAVIOR”

                                                                behavior, upon use of those particular nonportable or erroneous program constructs, or of erroneous data, or of indeterminately-valued objects, for which the Standard imposes no requirements.

                                                                END DEFINITION OF TERM “UNDEFINED BEHAVIOR”

                                                                In other words, “for which the Standard imposes no requirements” is part of the definition of UB, not of the handling of UB. Or: rather than “UB means the Standard doesn’t impose requirements on handling”, it’s “a thing is only UB if the Standard hasn’t imposed requirements for how to handle that thing”.

                                                                It becomes even more clear, as noted in the linked article, when you notice that the Standard immediately follows this up with a paragraph giving permissible ways to handle undefined behavior. If the intent was that the Standard “imposes no requirements” on handling UB, why does it have a paragraph listing permissible ways to handle UB? The fact that the Standard was hackily edited for C99 (again, refer to the article) to try to make that follow-up paragraph non-normative is even more evidence.

                                                                But the edit to that paragraph also drives home the fact that the text of the Standard doesn’t matter and never did. Compiler authors effectively invented their own language that wasn’t C as defined by the Standard, told everyone it was good to have this break from the Standard because they could use it to “optimize” programs, and then ad-hoc changed the Standard years later to make the compiler authors’ approach no longer a violation.

                                                                And this gets back to something I sort of joke about but don’t find funny, which is that the expansive, Standard-violating definition of UB favored by compiler authors has ensured that it is effectively impossible to write a non-trivial C program, or even many trivial ones, without UB, which means that a “compiler” which simply emits a no-op executable for every input is arguably compliant.

                                                                1. 8

                                                                  In other words, “for which the Standard imposes no requirements” is part of the definition of UB, not of the handling of UB. Or: rather than “UB means the Standard doesn’t impose requirements on handling”, it’s “a thing is only UB if the Standard hasn’t imposed requirements for how to handle that thing”.

                                                                  There are things that the standard specifically says are undefined behaviour. E.g. the example from that very definition:

                                                                  EXAMPLE An example of undefined behavior is the behavior on integer overflow.

                                                                  The article you’ve linked uses that line of reasoning to somehow say that there is some behaviour mandated for integer overflow, despite it being right there in the example that it has undefined behaviour.

                                                                  The behaviour mandated, according to Yodaiken’s article, is:

                                                                  Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message)

                                                                  (The article argues that the original “permissible” had different semantics than the “possible” which replaced it, and that the “original” permissible was more correct, even though it was intentionally changed. I hope it’s clear why that’s a tenuous position to take.) It’s right there in that text:

                                                                  ignoring the situation completely with unpredictable results

                                                                  That’s exactly what we’re seeing when integer overflow causes what the OP’s article calls “wild” behaviour - the compiler ignores the situation (that overflow happened) completely and optimises assuming it didn’t happen. Victor’s post twists this to mean that the compiler should “ignore” that the undefined behaviour was triggered and produce a result for the operation. He ignores the “unpredictable results” clause.

                                                                  In other words, “for which the Standard imposes no requirements” is part of the definition of UB, not of the handling of UB

                                                                  You can’t have one without the other. If it imposes requirements on the handling, then it imposes requirements. And then it would not be the case that it “imposes no requirements”.

                                                                  In any case, there is a more detailed explanation of what constitutes undefined behavior elsewhere in the text (chapter 4):

                                                                  If a ‘‘shall’’ or ‘‘shall not’’ requirement that appears outside of a constraint is violated, the behavior is undefined. Undefined behavior is otherwise indicated in this International Standard by the words ‘‘undefined behavior’’ or by the omission of any explicit definition of behavior. There is no difference in emphasis among these three; they all describe ‘‘behavior that is undefined’’

                                                                  So “undefined behaviour” also means “the behaviour is undefined”. It’s incredibly tenuous to claim that there are in fact requirements on the behaviour imposed, and that furthermore these requirements are stated only in the “definitions” section of the document (which in general does not specify requirements at all).

                                                                  It’s also tenuous to claim that the “undefined behaviour” of integer overflow should just result in an implementation-defined value, despite the fact that there is also a defined term “implementation-defined value” which could have been used to describe that case.

                                                                  I.e. your argument is that integer overflow is an example of undefined behaviour and so should have an implementation-defined value, despite the fact that the standard is quite explicit in other cases when a value is implementation-defined, but does not say so for integer overflow.

                                                                  getting to that interpretation requires torturing the plain English text.

                                                                  I strongly disagree and would say the same about the interpretation you’re arguing for, with the evidence that I’ve stated.

                                                                  1. 2

                                                                    You can’t have one without the other. If it imposes requirements on the handling, then it imposes requirements. And then it would not be the case that it “imposes no requirements”.

                                                                    The “imposes no requirements” should logically be read as meaning no other part of the Standard tells you how to handle this thing. Other parts of the Standard might sometimes have instructions for how to handle, say, an indeterminately-valued object, which would mean that for the situations where the Standard does provide instructions, it’s not UB.

                                                                    Hence this must be read as part of the definition of UB, not part of the handling of UB. The following paragraph tells you what to do with it, and this is not a contradiction.

                                                                    I strongly disagree and would say the same about the interpretation you’re arguing for, with the evidence that I’ve stated.

                                                                    Once again, the sentence is:

                                                                    Undefined behavior — behavior, upon use of those particular nonportable or erroneous program constructs, or of erroneous data, or of indeterminately-valued objects, for which the Standard imposes no requirements.

                                                                    The “official” reading says to split this into everything before the final comma and everything after the final comma, and to treat the former as the definition of UB and the latter as the handling of it. There is absolutely no world in which that is the natural plain-English reading of that sentence. And, again, if the intent was to have that be the meaning, then there would be no reason to ever have the following paragraph, let alone to originally have that paragraph be normative.

                                                                    I admit that the expansive definition of UB and how to handle it is a fait accompli at this point, but I wish its proponents would be honest and admit that getting there required willful violation of the standard.

                                                                    1. 2

                                                                      The “imposes no requirements” should logically be read as meaning no other part of the Standard tells you how to handle this thing.

                                                                      But it doesn’t say that at all. You’re inserting “other part”; it’s not there, and was never there, in the text. The text says, “for which this standard imposes no requirements”. If the following part were in fact imposing requirements on the handling or any other aspect, then it would certainly be contradicting itself.

                                                                      And, again, if the standard was indeed intending to impose requirements on how undefined behaviour should be “handled”, the correct place to do that would certainly not be in the definitions section.

                                                                      And, again, finally, one of the “permissible” (or “possible”) behaviours is “ignoring the situation completely with unpredictable results”, which can easily be read to mean exactly what compilers are currently doing. Even if you think the intention was to place requirements on the handling of UB, this allowance seems by itself to remove any requirements. And certainly, “unpredictable results” doesn’t match with the notion that overflow should wrap (or otherwise match the behaviour of arithmetic instructions on the underlying hardware platform).

                                                                      There is absolutely no world in which that is the natural plain-English reading of that sentence

                                                                      I strongly disagree (and so do many others).

                                                                      if the intent was to have that be the meaning, then there would be no reason to ever have the following paragraph

                                                                      To elaborate: admittedly, it seems like it could’ve been a NOTE rather than normative text, but it also hardly seems to matter. It wouldn’t be the only place where not-strictly-needed normative text was present.

                                                                      I wish its proponents would be honest and admit that getting there required willful violation of the standard.

                                                                      I feel like that’s designed to be inflammatory. Can you give a constructive response to the above criticisms of your standpoint? You have ignored these points in your reply; this feels more like dishonesty to me than not “admitting” that there was a “willful violation of the standard”.

                                                                      1. 3

                                                                        It’s not intended to be inflammatory. It just is what it is. The linked article made the argument pretty thoroughly, and I can’t think of any other situation in English where a similar sentence construction is naturally read in the way the definition of UB is allegedly meant to be read. The simple truth is that compiler authors did what they believed was most convenient for their purposes, it’s at odds with what the standard originally said, and now we’re stuck with it as a deeply-embedded part of C.

                                                                        I just wish there were more honesty about it, and less condescension (which is inflammatory) in these threads toward people who don’t like the status quo. Lots of things that compilers rely on being UB could instead have been implementation-defined or otherwise had clear semantics. It wouldn’t be the end of all performance forever. There’s no logically-necessary reason why C compilers behave the way they do – it’s just historical inertia that could have been avoided, and in more recent and better-designed languages is avoided.

                                                                        1. 2

                                                                          Similar to how law cannot be read as plain English but must instead be interpreted through decades and centuries of clarification, precedent, and interpretation, the same is true of a standards document. Standard English just isn’t precise enough to describe things unambiguously enough to have an incontestable meaning. The standard means what people say the standard means, and if the creators of the original standard want to clarify otherwise, they should speak out and clarify their intentions.

                                                                          1. 1

                                                                            The linked article made the argument pretty thoroughly

                                                                            It didn’t, though. It doesn’t address the points I raised (and nor have you).

                                                                            I can’t think of any other situation in English where a similar sentence construction is naturally read in the way the definition of UB is allegedly meant to be read

                                                                            It doesn’t require the strange reading that you use to arrive at your conclusion about the definition of UB. The error in reading is yours; you think that “handling” and “behaviour” are two separate things, but they aren’t. Furthermore, if the definition specifies (as it does) that part of what makes behaviour undefined is that the standard imposes no requirements on it, then the standard can impose no requirements on it - nor on any aspect of it, such as “handling” (although as I’ve said, this makes no sense anyway) - since that would be a contradiction.

                                                                            Once more, with emphasis added:

                                                                            Undefined behavior — behavior, upon use of a nonportable or erroneous program construct, or of erroneous data, or of indeterminately-valued objects, for which the Standard imposes no requirements.

                                                                            Undefined behaviour is the behaviour when the erroneous construct (etc.) is used; it is not the use itself. If “the standard imposes no requirements” on the behaviour, then it cannot, in the next paragraph (which is part of the standard), impose requirements on that behaviour. This notion that “it is now going to separately talk about the handling of such behaviour” is wrong because (a) the behaviour is the handling and (b) it’s already clearly stated that there are no requirements on such. But also, perhaps most importantly, (c) the “permissions” (one of them in particular) for “handling” are so general as to impose no requirements anyway, and very specifically contradict the integer overflow behaviour that you argue for (via the “unpredictable results” phrasing). You consistently fail to address this. The article by Yodaiken fails to address it also.

                                                                            The simple truth is that […]

                                                                            No, it’s just not. (We can both play that game - just stating something with an air of authority doesn’t make it true.)

                                                                            1. 1

                                                                              You and I did not get to consensus when we did this a year ago, so I have low hopes for it now.

                                                                              wrong because (a) the behaviour is the handling and (b) it’s already clearly stated that there are no requirements on such

                                                                              And I strongly disagree with (b), and last time we went round and round on this you seemed to understand why. The plain English reading of the Standard’s text does not lead to the idea that the Standard “imposes no requirements” on UB, and even if it did, then it makes no sense whatsoever for the very next paragraph to be a normative list of permissible ways to handle UB. I don’t know any plainer way to put this, and if you can’t see it, we should just stop talking past each other.

                                                                              1. 2

                                                                                I don’t know any plainer way to put this, and if you can’t see it, we should just stop talking past each other.

                                                                                And to this I somewhat agree, but it’s hard not to respond to you when you claim that your favoured interpretation is a simple truth and further that anyone who takes the other viewpoint is somehow being dishonest - claims that you actually made.

                                                                                1. 1

                                                                                  And I strongly disagree with (b), and last time we went round and round on this you seemed to understand why.

                                                                                  No, you need to read the post you’ve linked again. I understood what you believed, but reiterated that I thought it wasn’t right:

                                                                                  While that makes the whole argument make a little more sense, I don’t think the text supports this; the actual text does not use the “other part” wording at all (it’s unorthodox to put normative requirements in the “terms” section anyway), and I think my other points still hold: the “permissible undefined behaviour” from C89 specifically includes an option which allows for “unpredictable results”, i.e. the “required behaviour” is not requiring any specific behaviour anyway.

                                                                                  You have still never addressed that.

                                                                                  it makes no sense whatsoever for the very next paragraph to be a normative list of permissible ways to handle UB

                                                                                  … which is why it only makes sense to interpret that next paragraph as a potentially non-exhaustive list of behaviours which, in any case, imposes no real requirements, since, once more, it allows the compiler to “ignore the situation completely, with unpredictable results”.

                                                                                  That last point you continue to overlook. I’ve brought it up repeatedly in this thread, and you never address it.