  2. 10

    I’m a bit puzzled why the author seems to think that integer wrap on overflow behaviour has something to do with C and undefined behaviour. The same thing happens with nearly all languages which use the processor’s integer arithmetic, because those semantics are provided by the processor itself. Java, C#, etc. all wrap on overflow. There are some exceptions though - Ada provides the “exception on overflow” semantics the author prefers, but it does come with a significant performance penalty because checking for overflow requires additional instructions after every arithmetic operation.

    The point here is that if you want performant arithmetic it’s all about what the processor is designed to do, not anything to do with the languages. Java defines integer wrap as the language’s standard behaviour, but as a result it incurs a performance penalty for integer arithmetic on processors which don’t behave this way. C doesn’t incur this penalty because it basically accepts that overflow works however the processor implements it. And let’s face it, if your program is reliant on the exact semantics of overflowing numbers, you’re probably doing it wrong anyway.

    There are some processors which provide interrupts on integer overflow. This eliminates the performance penalty associated with overflow checks if your language is Ada and you therefore want to trap on overflow. There are other semantics around too - DSP processors often have “clamp on overflow” instead, since that suits the use case better, and old Unisys computers use ones’ complement rather than two’s complement, so their overflow behaves slightly differently.

    1. 4

      The performance penalty of “trap on overflow” can be reduced by clever modeling, for example by allowing a delayed trap instead of an immediate trap. As-if Infinitely Ranged is one such model. An immediate trap disallows optimizing a+b-b to a, because if a+b overflows the former traps and the latter doesn’t. A delayed trap allows such optimization.
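
      A minimal sketch of the difference (illustrative function names, not taken from any particular compiler):

      ```c
      /* With immediate trap-on-overflow semantics, the intermediate a + b must be
         kept, because it can overflow (and therefore trap) even though the
         mathematical result of a + b - b always fits in an int. */
      int immediate(int a, int b) {
          return a + b - b;   /* may trap on the intermediate a + b */
      }

      /* Under a delayed-trap model such as As-if Infinitely Ranged (AIR), the
         intermediate may behave as if computed in an unbounded range, so the
         whole expression can legitimately be folded to just a. */
      int delayed(int a, int b) {
          (void)b;
          return a;           /* a valid optimization of a + b - b under AIR */
      }
      ```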

      1. 3

        I’m a bit puzzled why the author seems to think that integer wrap on overflow behaviour has something to do with C and undefined behaviour.

        You are mixing up the underlying behaviour of the processor with the defined (or undefined) behaviour of the language. Wrap on integer overflow is indeed the natural behaviour of most common processors, but C doesn’t specify it. The post is saying that some people have argued that wrap-on-overflow should be the defined behaviour of the C language, or at least the implementation-defined behaviour implemented by compilers, and then goes on to provide arguments against that. There is a clear example in the post of where the behaviour of a C program doesn’t match that of two’s complement arithmetic (wrapping).

        The same thing happens with nearly all languages which use the processor’s integer arithmetic, because those semantics are provided by the processor itself.

        That’s the point - in C, it doesn’t happen.

        1. 1

          I don’t get the point. The advantage of using integer wrap for C on processors that implement integer wrap is that it is high performance, simplifies compilation, has clear semantics, and matches the semantics programmers expect. If you want to argue that it should be e.g. trap on overflow, you need to provide a reason more substantive than theoretical compiler optimizations that are shown by hand waving. The argument that it should be “generate code that overflows but pretend you don’t” needs a stronger justification because the resulting semantics are muddy as hell. I’m actually in favor of a debug-mode overflow trap for C but an optimized mode that uses processor semantics.

          1. 1

            you need to provide a reason more substantive than theoretical compiler optimizations that are shown by hand waving

            Read the post, then; there are substantive reasons in it. I’m not engaging with you if you’re going to start by misrepresenting reasoned arguments as “hand waving”.

            1. 0

              “However, while in many cases there is no benefit for C, the code generation engines and optimisers in compilers are commonly general and could be used for other languages where the same might not be so generally true;”

              Ok! You think that’s a substantive argument.

              1. 1

                You’re making a straw man. What you quoted is part of a much larger post.

                1. 1

                  That’s not what “straw man” means.

                  1. 1

                    It means that you’re misrepresenting the argument, which you are. I said that the post contained substantive reasons; you picked a particular part and insinuated that I had claimed that that particular part on its own constituted a substantive reason, which I didn’t. And: you said “If you want to argue that it should be e.g. trap on overflow, you need to provide a reason more substantive than theoretical compiler optimizations that are shown by hand waving”, but optimisations have very little to do with trapping being a better behaviour than wrapping, and I never claimed they did, other than to the limited extent that trapping potentially allows some optimisations which wrapping does not. But that was not the only reason given for trapping being a preferable behaviour; again, you misrepresented the argument.

        2. 2

          I’m a bit puzzled why the author seems to think that integer wrap on overflow behaviour has something to do with C and undefined behaviour.

          They are related, yes. E.g. whilst signed integer overflow is well defined in most individual hardware architectures (usually as a two’s complement wrap), it could vary between architectures, and thus C leaves signed integer overflow undefined.

          1. 1

            The whole argument is odd.

          2. 9

            wrap comes naturally from hardware implementation of numbers.

            but certainly a programming language can throw an exception when the carry bit comes on.

            1. 1

              wrap comes naturally from hardware implementation of numbers.

              Correct for most platforms, but different platforms may have different integer implementations (for example some DSPs have saturating integers which do not wrap at all). As I understand it (and OP can verify), the C spec leaves signed integer overflow undefined to allow for the whole range of signed integer wrap/saturation behaviours.

              It’s true that on the popular architectures signed integers do indeed wrap. So why does the C spec not make that the norm for x86_64 and ARM? Well, because then your C programs wouldn’t be portable.

              It’s also interesting that unsigned integer overflow is defined to wrap in C, yet unsigned integers can saturate on some platforms! Madness.

              but certainly a programming language can throw an exception when the carry bit comes on.

              Well, OP is talking about C…

            2. 3

              Wrap on integer overflow is a good idea, because it would get rid of one source of undefined behavior in C.

              Undefined behavior is evil. One evil it causes is that it makes code optimization-unstable. That is, something can work in a debug build but not work in a release build, which is very undesirable. The article does not address this point at all.
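
              A small illustration of the kind of instability being described (a hypothetical example; actual behaviour depends on the compiler and optimisation level):

              ```c
              #include <limits.h>
              #include <stdio.h>

              /* A post-overflow check that assumes wrapping. Because signed overflow is
                 undefined behaviour, an optimising compiler may assume a + 100 cannot
                 wrap and remove the check, so a debug build and a release build can
                 behave differently. */
              int add_100_checked(int a) {
                  int sum = a + 100;      /* undefined behaviour if this overflows */
                  if (sum < a) {          /* "did it wrap?" - may be optimised away */
                      return INT_MAX;     /* saturate instead of returning garbage */
                  }
                  return sum;
              }

              int main(void) {
                  printf("%d\n", add_100_checked(INT_MAX - 10));
                  return 0;
              }
              ```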

              1. 1

                The article does not address this point at all.

                To remove all undefined behaviour in C would severely impact the performance of C programs.

                The post does suggest that trap-on-overflow is a superior alternative to wrap-on-overflow, and of course you could apply it to both debug and release builds if this optimisation instability concerns you. Even if you apply it just to your debug build, you’ll at least avoid the possibility that something works in the debug build but not in the release.

                1. 2

                  Note that Rust wraps on overflow and as far as I can tell this does not impact performance of Rust programs.

                  1. 1

                    This is essentially the same as one of the arguments I addressed in the post. In certain types of programs, particularly in lower-level languages (like C and Rust) where the code is written directly by a human programmer, there probably are not going to be many cases where the optimisations enabled by having overflow be undefined will kick in. However, if the program makes heavy use of templates or generated code, or is produced by transpiling a higher-level language with a naive transpiler, then it could do (I’ll concede this is theoretical in that I can’t really give a concrete example). The mechanism by which the optimisation works is well understood, and it isn’t too difficult to produce an artificial example where the optimisation grants a significant speed-up in the generated code.

                    Also, in the case of Rust programs, you can’t really reliably assess the impact of wrapping on overflow unless there is an option to make overflow undefined behaviour. Is there such an option in Rust?

                    1. 2

                      No, there is no such option, and there never will be. Rust abhors undefined behaviors. The performance impact assessment I had in mind was a comparison with C++ code.

                      On the other hand, rustc is built on LLVM so it is rather trivial to implement: rustc calls LLVMBuildAdd in exactly one place. One can replace it with LLVMBuildNSWAdd (NSW stands for No Signed Wrap).
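
                      A rough sketch with the LLVM-C API of what that swap amounts to (not actual rustc code; builder and operand setup omitted):

                      ```c
                      #include <llvm-c/Core.h>

                      /* Plain add: two's complement wrapping semantics.
                         NSW add ("no signed wrap"): signed overflow produces a poison
                         value, which is what lets later passes optimise as if overflow
                         cannot happen. */
                      LLVMValueRef build_add(LLVMBuilderRef b, LLVMValueRef lhs,
                                             LLVMValueRef rhs, int overflow_is_ub) {
                          return overflow_is_ub ? LLVMBuildNSWAdd(b, lhs, rhs, "sum")
                                                : LLVMBuildAdd(b, lhs, rhs, "sum");
                      }
                      ```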

                  2. 0

                    To remove all undefined behaviour in C would severely impact the performance of C programs.

                    This cannot be entirely true. As a reductio ad absurdum, it would be possible in principle to laboriously define all the things that compilers currently do with undefined behaviour and make that the new definition of the behaviour, and there would then be zero performance impact.

                    C compiler writers might argue that removing all undefined behaviour without compromising performance would be prohibitively expensive, but I’m not entirely convinced; there are carefully-optimized microbenchmarks on which the naive way of defining currently-undefined behaviour produces a noticeable performance degradation, but I don’t think it’s been shown that that generalises to realistic programs or to a compiler that was willing to put a bit more effort in.

                    1. 2

                      This cannot be entirely true. As a reductio ad absurdum, it would be possible in principle to laboriously define all the things that compilers currently do with undefined behaviour and make that the new definition of the behaviour, and there would then be zero performance impact

                      Clearly that would be absurd, and it’s certainly not what I meant by “remove all undefined behaviour”. Your “possible in principle” suggestion is practically speaking completely impossible, and what I said was true if you don’t take such a ridiculously liberal interpretation of it. Let’s not play word games here.

                      C compiler writers might argue that removing all undefined behaviour without compromising performance would be prohibitively expensive

                      They might, but that’s not what I argued in the post.

                      I don’t think it’s been shown that that generalises to realistic programs or to a compiler that was willing to put a bit more effort in.

                      There’s no strong evidence that it does, nor that it wouldn’t ever do so.

                      1. 1

                        Well then, what did you mean? You said “To remove all undefined behaviour in C would severely impact the performance of C programs.” I don’t think that’s been demonstrated. I’m not trying to play word games, I’m trying to understand what you meant.

                        1. 1

                          I meant “remove all undefined behaviour” in the sense and context of this discussion, in particular in relation to what @sanxiyn says above:

                          Undefined behavior is evil. One evil it causes is that it makes code optimization-unstable. That is, something can work in a debug build but not work in a release build, which is very undesirable

                          To avoid that problem, you need to define specific behaviours for cases that currently have undefined behaviour (not ranges of possible behaviour that could change depending on optimisation level). To do so would significantly affect performance, as I said.

                          1. 1

                            I could believe that removing all sources of differing behaviour between debug and release builds would significantly affect performance (though even then, I’d want to see the claim demonstrated). But even defining undefined behaviour to have the current range of behaviour would be a significant improvement, as it would “stop the bleeding”: one of the insidious aspects of undefined behaviour is that the range of possible impacts keeps expanding with new compiler versions.

                            1. 2

                              I could believe that removing all sources of differing behaviour between debug and release builds would significantly affect performance

                              It’s not just about removing the sources of differing behaviour, but about doing so with sensibly-defined semantics.

                              though even then, I’d want to see the claim demonstrated

                              A demonstration can only show the result of applying one set of chosen semantics to some particular finite set of programs. What I can do is point out that C has pointer arithmetic and this is one source of undefined behaviour; what happens if I take a pointer to some variable and add some arbitrary amount, then store a value through it? What if doing so happens to overwrite part of the machine code that makes up the program? Do you really suppose it is possible to practically define what the behaviour should be in this case, such that the observable behaviour will always be the same when the program is compiled with slightly different optimisation options - which might result in the problematic store being to a different part of code? To fully constrain the behaviour, you’d need pointer bounds checking or similar - and that would certainly have a performance cost.
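
                              For instance, a sketch of the kind of wild store just described (obviously not something anyone writes deliberately, but the language syntax permits it):

                              ```c
                              int x = 0;

                              /* Storing through an out-of-range pointer is undefined behaviour.
                                 Pinning down what "should" happen here, identically at every
                                 optimisation level, would effectively require bounds checking. */
                              void clobber(long offset) {
                                  int *p = &x;
                                  *(p + offset) = 42;   /* may hit unrelated data, code, or fault */
                              }
                              ```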

                              But even defining undefined behaviour to have the current range of behaviour would be a significant improvement, as it would “stop the bleeding”

                              As I’ve tried to point out with the example above, the current range of undefined behaviour is already unconstrained. But for some particular cases of undefined behaviour, I agree that it would be better to have more restricted semantics. For integer overflow, in particular, it could reasonably be specified that the result becomes unstable (eg. behaves inconsistently in comparisons) but that the behaviour is otherwise defined, for example. Note that even this would impede some potential optimisations. (And note that I still advocate trap on overflow as the preferred implementation.)

                      2. 2

                        I suspect that one issue is that compilers may manifest different runtime behaviour for undefined behaviour, depending on what specific code the compiler decided to generate for a particular source sequence. In theory you could document this with sufficient effort, but the documentation would not necessarily be useful; it would wind up saying ‘the generated code may do any of the following depending on factors beyond your useful control’.

                        (A canonical case of ‘your results depend on how the compiler generates code’, although I don’t know if it depends on undefined behaviour, is x86 floating point calculations, where your calculations may be performed with extra precision depending on whether the compiler kept everything in 80-bit FPU registers, spilled some to memory (clipping to 64 bits or less), or used SSE (which is always 64-bit max).)
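
                        A toy illustration of that floating point effect (how it behaves depends on the target, compiler and flags; this sketch assumes a build where intermediates can be kept at x87 extended precision):

                        ```c
                        #include <stdio.h>

                        int main(void) {
                            double a = 1.0 / 3.0;
                            double b = 3.0;
                            double c = a * b;   /* stored: rounded to a 64-bit double (1.0) */
                            /* If a * b below stays in an 80-bit x87 register it holds
                               1 - 2^-54 and the comparison is false; if it is rounded to
                               double (SSE code generation, or spilled to memory), it is true. */
                            printf("%s\n", (a * b == c) ? "equal" : "not equal");
                            return 0;
                        }
                        ```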

                        1. 1

                          It’s not only possible: I’ve seen formal semantics that define various undefined behaviors just like you said. People writing C compilers could definitely do it if they wanted to.

                    2. 2

                      For the trivial case where wrapping behaviour does allow simply detecting overflow after it occurs, it is also straightforward to determine whether overflow would occur, before it actually does so. The example above can be rewritten as follows:

                      If it’s really so trivial then why not have the compiler do that rewrite? Given that the “wrong” version is valid language syntax, programmers will write it and compilers will have to compile it; no amount of encouraging programmers to rewrite gets us away from having to decide what the compiler should do when fed the “wrong” code.
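
                      For concreteness, the kind of rewrite being discussed looks roughly like this (an illustrative sketch, not the post’s exact example):

                      ```c
                      #include <limits.h>

                      /* Post-overflow check: relies on wrapping, which is undefined
                         behaviour in standard C, so a compiler may remove the check. */
                      int add_checked_after(int a, int b) {
                          int sum = a + b;
                          if ((b > 0 && sum < a) || (b < 0 && sum > a))
                              return -1;      /* overflow */
                          return sum;
                      }

                      /* Pre-overflow check: tests whether the addition would overflow
                         before performing it, so no undefined behaviour occurs. */
                      int add_checked_before(int a, int b) {
                          if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
                              return -1;      /* would overflow */
                          return a + b;
                      }
                      ```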

                      An obvious mitigation for the problem of programmers expecting this particular behaviour is for the compiler to issue a warning when it optimises based on the alternative undefined-behaviour-is-assumed-not-to-occur semantics.

                      Building for maximum performance and warning about a correctness violation seems like the wrong priority. Why not build the code to behave in the way that you’re sure matches the programmer’s intent and warn about the missed optimisation opportunity?

                      Also, even without overflow check elimination, it is not necessarily correct to assume that wrapping integers has minimal direct cost even on machines which use two’s complement representation. The MIPS architecture, for example, can perform arithmetic operations only in registers, which are a fixed size (32 bits). A “short int” is generally 16 bits and a “char” is 8 bits; if assigned to a register, the underlying width of a variable with one of these types will expand, and forcing it to wrap according to the limit of the declared type would require at least one additional operation and possibly the use of an additional register (to contain an appropriate bitmask). I have to admit that it’s been a while since I’ve had exposure to any MIPS code and so I’m a little fuzzy on the precise cost involved, but I’m certain it is non-zero, and other RISC architectures may well have similar issues.

                      It would be good to permit trapping. It would be good to permit whatever the native MIPS behaviour is. But it’s obviously absurd to permit optimizing out the programmer’s overflow check.

                      How about: “An expression in which signed integer overflow occurs shall evaluate to an implementation-defined value and may also cause the program to receive a signal”? In conjunction with the rules in 5.1.2.3.5 this still permits the compiler to trap, still permits the compiler to use the machine behaviour (twos-complement wrapping, some other form of wrapping, saturating or what-have-you), and still permits the compiler to reorder arithmetic operations (provided they don’t cross a sequence point), but rules out craziness like the elimination of overflow checks.

                      1. 3

                        If it’s really so trivial then why not have the compiler do that rewrite?

                        Because the compiler doesn’t know what was intended. And I sure as heck don’t want the compiler re-writing any of my code to what it “thinks” I intended. And automatically “fixing” it in some cases but not others (less trivial) will likely lead to confusion.

                        Why not build the code to behave in the way that you’re sure matches the programmer’s intent and warn about the missed optimisation opportunity?

                        The compiler can’t be sure what the programmer intended.

                        You can’t be sure that the code will match the programmer’s intent. Maybe that “overflow check” really is redundant. Maybe the code is generated. Maybe the “overflow check” is only obviously redundant after several other optimisation passes have taken effect.

                        Giving a warning lets the programmer decide: Is this really ok or did I make a mistake? “Fixing” it for them pessimises without any way to undo that pessimisation, except in the very trivial cases, which, again, might be in generated source code.

                        It would be good to permit trapping. It would be good to permit whatever the native MIPS behaviour is. But it’s obviously absurd to permit optimizing out the programmer’s overflow check.

                        Both trapping and the MIPS behaviour (where a value in an integer of some type can be stored in a register wider than that type, which can incidentally also be done on just about any architecture) would make leaving the overflow check in place absurd - because it wouldn’t achieve what it was intended to achieve anyway (even if the compiler could determine the intent).

                        1. 2

                          In Ada, you can tell the compiler what behavior you want for integer overflow (or checks to catch it). Are there compiler hints in C for that or other undefined behavior that make the result predictable?

                          1. 2

                            -fwrapv
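
                            (That’s for forcing wrap semantics; GCC and Clang also have -ftrapv, which makes signed overflow trap at run time, and -fsanitize=signed-integer-overflow for catching it during testing.)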

                            1. 1

                              Thanks! I’ll look it up.

                          2. 2

                            Because the compiler doesn’t know what was intended. And I sure as heck don’t want the compiler re-writing any of my code to what it “thinks” I intended. And automatically “fixing” it in some cases but not others (less trivial) will likely lead to confusion.

                            It would be hard to do worse than deleting overflow checks that the programmer wrote into the code, but only sometimes, which is the current behaviour. Undefined behaviour practically guarantees all the things that you’ve just said you don’t want. (Indeed, the compiler is already permitted to behave in the way I’ve suggested it should - it’s just also permitted to do other, less helpful and more confusing, things).

                            (I do agree that it’s not actually trivial for the compiler to fix these cases - my point was if it’s not so trivial for the compiler, it’s not so trivial for the programmer either. So “the programmer should just rewrite their code so the problem doesn’t happen” isn’t a good answer.)

                            You can’t be sure that the code will match the programmer’s intent. Maybe that “overflow check” really is redundant. Maybe the code is generated. Maybe the “overflow check” is only obviously redundant after several other optimisation passes have taken effect.

                            Indeed you can’t be sure, which is why a responsible compiler should fail-safe rather than fail-dangerous. A missed optimization opportunity is much better than a security bug.

                            Both trapping and the MIPS behaviour (where a value in an integer of some type can be stored in a register wider than that type, which can incidentally also be done on just about any architecture) would make leaving the overflow check in absurd - because it wouldn’t achieve what it was intended to achieve anyway (even if the compiler could determine the intent).

                            Trapping achieves what the programmer usually intends - the programmer usually wants the function to abort/error when overflow occurs. Certainly it avoids the worst possible outcome, the one thing we can be certain that the programmer didn’t intend - blindly continuing to execute after overflow.

                            On second thoughts you’re right about MIPS. The programmer almost certainly never intended for an integer to be operated on at a certain width and then truncated at some mysterious later point in the execution of their program. That is so rarely an intended behaviour that I don’t think any responsible compiler should implement it without being very explicitly instructed to.

                            1. 3

                              Undefined behaviour practically guarantees all the things that you’ve just said you don’t want.

                              No, it doesn’t. In making this claim, you are implying that code which invokes undefined behaviour still has defined semantics. Certainly, the compiler won’t guess what was intended and then try to use undefined behaviour to implement that; it assumes that what it was told (i.e. what was expressed by the code, according to the semantics of the language) is what was intended. That’s a reasonable assumption to make given the nature and purpose of a compiler.

                              It would be hard to do worse than deleting overflow checks that the programmer wrote into the code, but only sometimes, which is the current behaviour.

                              I agree that, if the compiler could divine that some check (which it would otherwise optimise away) is meant to be a wrapping overflow check for some particular preceding operation, it would be nice if the compiler could issue a strong warning. I do not believe it should ever silently “fix” the problem by applying wrapping semantics to the relevant preceding operations, because that is going to lead to C programmers increasingly believing that overflow does have wrapping behaviour, and I don’t think that is a good idea even if wrapping were the ideal overflow behaviour.

                              One question is, would assuming that most/all code fitting that general pattern indicates that wrapping behaviour was required of the preceding operations, and compiling accordingly (with a warning), have a negative impact on optimisations and performance? I think the answer is “probably not for most existing programs”, but I also think it could potentially have a significant impact. This could be extended generally to a question about whether wrapping semantics generally could impact optimisation (which I think has the same answer).

                              Another question is, is wrapping behaviour generally useful, other than for the purpose of these overflow checks? I’ve addressed that in the post; I think the answer is clearly no.

                              However, is wrapping better behaviour (disregarding performance) than undefined behaviour? From a safety perspective it may be slightly better, since the effect of overflow bugs is more constrained, but the bugs can still happen and can still be exploited. This is anecdotal, but most of the overflow-related security holes that I’ve seen, other than the small number due to compiler-removed post-overflow checks, have been directly caused by the wrapping and not by any other associated undefined behaviour.

                              So:

                              • I’m not convinced that wrapping semantics will never significantly impact performance in real programs
                              • I don’t see wrapping as a useful behaviour, other than for implementing erroneous post-overflow checks, which can and should anyway be rewritten as correct pre-overflow checks
                              • I’m not convinced that wrapping on overflow is much better than UB on overflow, in practice (yes, in theory UB can do very nasty things. But for this particular case of UB, practically speaking, the effects are usually constrained). But let’s assume that we want to avoid arbitrary UB. In that case:
                              • Trap-on-overflow is a superior alternative because it’s safer, except in cases where performance is critical (and in those cases, wrapping might not be the right choice either).

                              Getting back to what you said, the only way to not delete any overflow checks is to enforce wrapping semantics everywhere, and I disagree with that due to the points above.

                              Indeed you can’t be sure, which is why a responsible compiler should fail-safe rather than fail-dangerous. A missed optimization opportunity is much better than a security bug.

                              Agreed - that’s why trapping and not wrapping is the right behaviour.

                              Thanks for bringing up some reasonable discussion. I don’t think there’s a totally objective right/wrong answer at this point - but I hope you see some validity in the points I’ve raised above.

                              (some small edits made after posting to improve readability).