Regardless of whether the intended effect is reasonable or not, I don’t think this one-word change would really help. There’s still way too much ambiguity:
is the list of “possible(/permissible)” behaviours supposed to be exhaustive? The words “ranges from” suggest not.
is “ignoring the situation completely with unpredictable results” any different from what optimising compilers currently do in the presence of undefined behaviour? I’d argue it’s not - they ignore the possibility that the undefined behaviour can be triggered, with unpredictable results when it is.
There’ve been a number of posts recently trying to argue that C should essentially do away with undefined behaviour; I think it’s time for people to move on and accept that undefined behaviour has been inherent in the standard for some time, has been made use of for optimisation by compilers for some (slightly lesser) time, and is here to stay. Code which relied on particular integer overflow behaviour, or aliasing pointers with incompatible types, or so on, was never really correct C - it’s just that the compiler once (or at least usually) generated code which did what the code author intended. Now people are getting upset that they can’t use certain techniques they once did. In some cases this isn’t ideal - I’ll grant that there needs to be a simple way in standard C to detect overflow before it happens, and there currently isn’t - but it’s time to accept and move on. Other languages provide the semantics you want, and compiler switches allow for non-standard C with those semantics too; use them, and stop these endless complaints.
As for making the overflow behaviour “sane”, the notion that you could add two positive integers and then meaningfully check whether the result was smaller than either was bat-shit crazy to begin with.
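For what it’s worth, the “simple way in standard C to detect overflow before it happens” mentioned above can be sketched as a pre-condition check. This is a minimal sketch; the helper’s name is mine, but every comparison it performs uses only in-range values, so it triggers no undefined behaviour on any input:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper (name is mine): returns true and stores a + b in
 * *sum only when the addition cannot overflow. Every comparison uses
 * in-range values, so no undefined behaviour occurs for any input. */
static bool add_would_be_safe(int a, int b, int *sum)
{
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b))
        return false;            /* a + b would overflow */
    *sum = a + b;
    return true;
}
```

The point being that the check has to happen before the addition; checking the result afterwards is exactly what the thread below argues about.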
As for making the overflow behaviour “sane”, the notion that you could add two positive integers and then meaningfully check whether the result was smaller than either was bat-shit crazy to begin with.
Wow, so all that work in finite field theory is bat-shit crazy?
The C standard defines “int” as a fixed-length binary string representing a signed integer, and even has a defined constant for the maximum value. C ints are not bignums, and C does not ask the compiler to detect or prevent overflows, traps, or whatever else the architecture does. As a consequence of the definition of ints, x+y > x cannot be a theorem. If it were a theorem, it would follow that ints can represent infinite sets of numbers, which would be a great trick with a finite number of bits.
Can people stop “explaining” that making C into Java would be hard and would lose performance or that C ints are not really integers or other trivia as attempted justifications of these undefined program transformations?
As for making the overflow behaviour “sane”, the notion that you could add two positive integers and then meaningfully check whether the result was smaller than either was bat-shit crazy to begin with.
Wow, so all that work in finite field theory is bat-shit crazy?
That’s… not what I said.
Can people stop “explaining” that making C into Java
I’m afraid you’ve crossed your wires again. Nobody was talking about making C into Java.
so from the C standard I can both conclude that sizeof(int) == 4 or 8 and for int i, i+1 > i is a theorem so a test if(i+1 <= i) panic(); is “bat-shit crazy”? Think about it. Testing to see if addition of fixed-length ints overflows is not only mathematically sound, but it matches the operation of all the dominant processors - that’s how fixed-point two’s complement math works, which is why almost all processors incorporate an overflow bit or similar. Ints are not integers.
so from the C standard I can both conclude that sizeof(int) == 4 or 8 and for int i, i+1 > i is a theorem so a test if(i+1 <= i) panic(); is “bat-shit crazy”?
The test “if (i + 1 <= i)” doesn’t make sense mathematically because it is always false. If the range of usable values of (i + 1) is limited, then it is always either false or undefined.
Testing to see if addition of fixed length ints overflows is not only mathematically sound
It’s very definitely not mathematically sound. Limited range ints only have mathematically sound operation within their limited range.
Ints are not mathematical integers. They are not even bignums. Try again.
Here is a useful theorem for you: using n bytes of data, it is impossible to represent more than 2^{8*n} distinct values.
In mathematics whether i+1 > i is a theorem depends on the mathematical system. For example in the group Z_n, it is definitely not true. Optimization rules that are based on false propositions will generate garbage.
“Limited range ints only have mathematically sound operation within their limited range.” - based on what? That’s absolutely not C practice and certainly not required by the C standard. It doesn’t follow mathematical practice and it’s way off as a model of how processors implement arithmetic.
Right, they have a limited range. Within that range, they behave exactly as mathematical integers.
what do you base that on? And you know they don’t behave like the mathematical integers mod 2^n because? Even though that’s how the processors usually implement them?
There is nothing in the C standard that supports such an approach. In fact, if it were correct, then x << 1 would not be meaningful in C.
I base that on how the C language defines operations on them; for +, for example, “The result of the binary + operator is the sum of the operands”. It does not say “… the sum of the operands modulo 2^n”.
And you know they don’t behave like the mathematical integers mod 2^n because?
For unsigned types, the text says: “A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type” (C99 6.2.5). Therefore, the unsigned integers do behave like mathematical integers mod 2^n. However, there is no equivalent text for signed types, and C99 3.4.3 says: “An example of undefined behavior is the behavior on integer overflow”. Specifically, 6.5 says: “If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), the behavior is undefined.” (emphasis added).
I’m sure you will be able to find the corresponding sections in C11 if you wish.
There is nothing in the C standard that supports such an approach.
Not except for the text which describes it as such, as reproduced above.
In fact, if it were correct, then x << 1 would not be meaningful in C.
I could only guess how you came to that conclusion, but I don’t care to. This discussion has become too ridiculous for me. Good day.
However, there is no equivalent text for signed types, and C99 3.4.3 says: “An example of undefined behavior is the behavior on integer overflow”.
Correct. So it’s possible, if you are a bad engineer and a standards lawyer, to claim that the standard gives permission for the implementation to run Daffy Duck cartoons on overflow. However, nothing in the standard forbids good engineering - for example - it is totally permissible to use the native arithmetic operations of the underlying architecture and I am 100% sure that was the original intention. There is certainly no requirement for your “mathematics with holes in it” model and since there is no good engineering excuse for it, QED.
Since compilers already provide options for wrapping integer overflow, I think it’s reasonable to propose to make those options default. After all, people who want undefined integer overflow for optimization or otherwise can use options to do so after the default is changed. (If this sounds inconvenient, the exact same applies to “use options and stop complaints”.) Note that this change is backward compatible. (Although going back won’t be.)
The same applies to strict aliasing. I am much more uncertain about other undefined behaviors, for example null dereference, because where there are no pre-existing options, such a standard change would require (in my opinion quite substantial) additional work for implementations.
Since compilers already provide options for wrapping integer overflow, I think it’s reasonable to propose to make those options default.
Just because compilers offer an option to do something, doesn’t mean that it’s reasonable to make that something a default. (But sure, if the standard gets changed - I doubt it will - so that integer overflow is defined as wrapping, everyone can use compiler flags to get the old behaviour back, and that would be perfectly acceptable).
I’d personally much rather have integer overflow trap than wrap. As far as I can see all that wrapping gives you is an easier way to check for overflow; there are very few cases where it’s useful in its own right. The problem is, people will still forget to check, and then wrapping still gives the wrong result. But there’s no need to change the standard for this: I can already get it with a compiler switch. (edit: note also that trapping on overflow still allows some of the optimisations that defining it as wrapping wouldn’t).
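Trap-style checking is in fact available today without waiting for the standard, via compiler extensions. A sketch using __builtin_add_overflow, which is a GCC/Clang extension and not standard C (the function name here is mine):

```c
#include <stdio.h>
#include <stdlib.h>

/* Sketch of explicit trap-on-overflow using the GCC/Clang extension
 * __builtin_add_overflow (not standard C): the sum is computed as if
 * in infinite precision, and the builtin returns true if the result
 * did not fit in the destination type. */
static int add_or_die(int a, int b)
{
    int result;
    if (__builtin_add_overflow(a, b, &result)) {
        fprintf(stderr, "integer overflow\n");
        abort();             /* "trap" rather than silently wrap */
    }
    return result;
}
```

gcc’s -ftrapv switch gives roughly this behaviour globally for signed arithmetic, which is presumably the compiler switch meant above.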
I am much more uncertain about other undefined behaviors, for example null dereference
It would be easy enough to define that as causing immediate termination; the real question is whether this would be worth doing.
Edit: you may also have missed the main point of my comment, which was that this proposed (one-word) change would not actually cause the behaviour to become defined.
I am much more uncertain about other undefined behaviors, for example null dereference
It would be easy enough to define that as causing immediate termination
Easy enough to define it that way, sure, but I don’t think it would be a popular move in the embedded world – on MMU-less systems where the hardware might not trap it, seems like that would force the compiler to insert runtime checks before every pointer dereference.
Right, hence the note about considering whether it would be worth doing. (I suspect that what a lot of complaints about the standard are missing, is just how significant these little optimisations from exploiting the nature of undefined behaviour are, when the code potentially runs on some small embedded device. Really, most of the complaints about the language should be re-directed to the compiler vendors: why do they not choose safer defaults? But then, to be fair, they largely do. I don’t think gcc for example enables strict overflow by default: you have to enable optimisation).
It would be easy enough to define that as causing immediate termination; the real question is whether this would be worth doing.
Nobody is asking for C implementations to force traps on null dereference. Nobody. So why are you trying to explain it would be hard or have negative consequences?
The statement you quoted had nothing to do with traps on overflow, it was about null pointer dereference. (In fact, I specifically argued for trap-on-overflow. I think you’ve got your wires seriously crossed).
Trap on null dereference is also something that is not necessary. What most people would prefer is that, when reasonable, the action be whatever is characteristic of the environment. So if the OS causes a trap or the architecture explodes on null dereference, or the OS (like some versions of UNIX and many embedded systems) has valid memory at 0, the dereference fetches the data. This is not something that compilers have any useful information on and they should move on.
My point is while -fwrapv gives wrapping semantics, there are no similar flags for null dereference to compile to “whatever is characteristic of the environment”. This will need additional implementation work.
-fno-delete-null-pointer-checks
-fno-delete-null-pointer-checks is not implemented in Clang.
Looks like it is on the way. This “optimization” is already a major source of error, but with LTO it’s going to be unspeakable. Consider a parsing library with extensive null checks linked with a buggy front end. Boom.
I’d appreciate the link to Clang work in progress.
middle of the discussion http://lists.llvm.org/pipermail/llvm-dev/2018-April/122717.html
Thanks for the link!
Reading the whole thread (including continuation in May) reinforces my impression that this is a substantial amount of work. Searching the archive for June and July, it seems the patch author is missing in action and no actual patch was posted.
The comments are great too:
“That second paragraph is labeled as a “Note” which means that it is non-normative (informational) and do not contain requirements so they are not binding on the implementation.”
“Interesting it looks like C89 did not have notes, but a lot of content was moved to notes in C99.”
I’d like to think I’ve learned a fair bit about UB in C (both recently, and over the years I’ve worked with the language), but I’m not familiar with the oldest version of the standard (C89) and was unaware of this difference from that version and its modern descendants.
I must say, many of the proposals I’ve heard online lately about fixing UB in C I’ve found either ill-informed, over-zealous or in some other way objectionable; but this seems to not only offer a much friendlier environment for the programmer, but also seems completely reasonable. It does not curtail a compiler’s ability to avoid having to care about the difficult-to-catch situations (as they can just ignore the case), which is one of the only general benefits of UB (to implementors).
I’m not a member of the C Committee (if only wishing made it so), but if I were, I’d love to hear this proposal.
This is essentially the proposal I have made. https://docs.google.com/document/d/1xouelPcphQ-o7DmdSwz5UcL42M6bdA3t93Nm_5Hbomc/edit?usp=sharing
Pretty click-baity title