1. 28

Hopefully, my last word on this topic for some time.

  1.  

  2. [Comment removed by author]

    1. 2

      Coming from another angle, the animosity seems to be between people who see C as a high level assembler and people who recognize the abstract machine and its semantics. This isn’t just a compiler authors vs C programmers thing.

      The cool thing about UB is that it allows an implementation to be a high level assembler or not. It allows very simple, naive implementations as well as complicated optimizing ones. You as a user should get to choose. The choice isn’t made for you by the standards committee.

      As soon as you ask the committee to take out UB and dictate specific behaviors, you’re trying to use their hand to force C into becoming one thing above the other.

      1. 1

        See limits.h. Anyway, I’m amazed at how controversial it is to object to terrible software design. Surprises are bad. Demands that users scrutinize details of a complex, illogical standard and then guess which common idioms may fail are symptomatic of a dysfunctional dev team.

        1. 1

          As soon as you ask the committee to take out UB and dictate specific behaviors, you’re trying to use their hand to force C into becoming one thing above the other.

          Couldn’t you get rid of UB but make it all “implementation defined”?

          1. 2

            Think about it.

            That would not help as much as you might think. It’s either a terrible documentation burden for the implementation (which entails more than just the compiler!), or they’re going to write something like “anything might happen”, which is as good as UB. Whiners who whined about UB would now just whine about terrible vague implementation defined behavior they can’t rely on.

            For a concrete example, try to define and document the behavior for array out of bounds access. What might happen? A segfault, perhaps. Or you overwrite some variable or pointer that changes your program behavior unpredictably. Or your code is running on some system with memory mapped io and that io write sets off the fire alarm and sprinklers. Or it launches a nuke.
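
            To make that concrete, here is a minimal sketch (hypothetical function) of the construct in question:

              void oops(void)
              {
                  int buf[4];
                  buf[4] = 42;   /* one past the end: the standard imposes no requirements here */
              }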

            Just about anything could happen, and it is impossible for the compiler writer to write a non-vague description unless their implementation always does some kind of array bounds checking to ensure something predictable always happens. (Good luck with arbitrary pointer arithmetic.)

            How would you like an implementation that documents that null pointer checks following a dereference of said pointer may be optimized out? We get the same problem, and the same whining about it. Changing the standard like this doesn’t fix it.
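
            A minimal sketch of that pattern (hypothetical function, assuming a compiler that performs this inference):

              int first(int *p)
              {
                  int x = *p;      /* dereference first: the compiler may infer p != NULL... */
                  if (p == NULL)   /* ...and may then delete this check as unreachable */
                      return -1;
                  return x;
              }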

            Now if you want that array bounds checking and other implementation specific stuff, you can have that already without rewriting the standard. UB doesn’t mean “the implementation may not do something sensible and predictable and documented.”

            Again, I consider this a feature. An implementation may be really simple, and you don’t have to use it, you can go find (or implement) an extreme implementation that has all the checks and guarantees (with documentation) that you want.

            1. 2

              You don’t belong to the Internet, we don’t like sensible people or their sensible arguments around here.

              1. 2
                1. Code containing UB is “non-conforming” according to the standard. UB is treated as different from “indeterminate” or even “machine dependent”.
                2. The “anything might happen” condition is being interpreted in a really peculiar way which, I am sure, is very far from the original intended meaning. I doubt that the C89 standard or even C90 envisioned that a compiler might format the hard drive on a UB arithmetic shift.
                1. 3
                  1. How UB is treated is entirely up to the implementation. The standard doesn’t impose any requirements, but an implementation is free to provide any documented guarantees it wants.

                  2. So you have a beef with some implementation(s). And to force their hand, you would prefer to change the standard. That’s a disappointingly aggressive way to get where you want to be. Instead of whining about the standard, you could exercise the freedom it gives you to find or make an implementation that gives you all the guarantees you want. (You could start by using -fwrapv and by not abusing -O3; the manual of your friendly compiler probably has a lot more in store for you). Meanwhile the rest of us may continue to disagree with your opinion of the interpretation.

                  1. 0

                    You are attempting to excuse poor engineering design. Try: “My default query optimizer for SQL will format the disk if the query tries to join incorrectly and the SQL standard says that query has undefined results.” Or: “My default memory map system for the OS will replace your program with echo rm -rf * if you have a memory fault because POSIX does not mandate any particular behavior on memory faults.” To me, and this is just my opinion which you are of course free to reject, the purpose of software is to run applications. If your software breaks applications as an “optimization” that does not have a super compelling justification, then you should be fired.

          2. 2

            Go is C’s successor :)

          3. 8

            There’s a lot of complaining in that post, but ultimately, the compilers aren’t doing anything “wrong” or even technically unexpected. “Undefined behavior” is just that, undefined.

            Early versions of GCC took that in classic hacker humor fashion, even. The C89 standard defines the “#pragma” directive as having “undefined behavior”. Early GCC versions would, upon encountering an unknown “#pragma”, launch NetHack or Rogue.

            1. 13

              I’m sure we all agree that this compiler behaviour conforms to the letter of the C standard. That doesn’t make it productive or even, I would say, “right”.

              1. 5

                In fact, they are doing something wrong. As Jones noted - the standard does not prohibit error flagging or expected machine dependent behavior on UB. The compiler developers made a design choice to assume 0=1 on UB when they could perform “optimizations”. But that’s a stupid design decision. I think the attitude that someone like DJB should just suck it up and waste his time trying to decipher the obscure “nuances” of the standard is ridiculous.

                1. 12

                  You’re not prohibited from compiling with -fwrapv either. Telling people about this option might actually help them.
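
                  For example (a sketch): with -fwrapv, signed overflow is defined to wrap in two’s complement, so a test like the one below keeps its naive meaning; without the flag, the compiler may assume x + 1 cannot wrap and fold the comparison to 0.

                    #include <limits.h>

                    int wraps(int x)
                    {
                        return x + 1 < x;   /* under -fwrapv: 1 exactly when x == INT_MAX */
                    }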

                  1. 4

                    Yeah, it’s the sort of thing where the compiler developers have put in the substantial amount of time necessary to really grok all the details of the C standard, and have little care for the idea that other people haven’t and won’t put in that same amount of time, or, more importantly, for the idea that the compiler developers should have some degree of sympathy for those other people and try to avoid making these sorts of assumptions and optimizations that will surprise anyone without the same advanced level of expertise.

                    To make a small analogy to role playing games, compiler developers are the rules-lawyer min-maxers, murderhoboing all over your code. And you’re looking for a good story and a fun experience, and the whole thing is off-putting when they make these sorts of aggressive optimizations, and their only defense for being like that is that you should read the standard. The only option becomes to be just like the murderhobo, and that’s not fun for a lot of people.

                    1. 1

                      It’s really worth trying to read the C11 standard’s explanation of, for example, which type casts are permitted. Look at page 77 of http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf for an example.

                    2. 2

                      I don’t think the compiler developers ever made an actual choice to “assume 0=1” or anything similarly pugnacious. Rather they make the choice to assume that constraints are not violated by the program and make use of this (together with the “undefined behaviour” allowance) to make optimisations that wouldn’t otherwise be possible. The problem is people expecting a certain behaviour from, for instance, signed integer overflow which they may have seen previously - because the compilers didn’t have such sophisticated (or “aggressive” if you insist) optimisation as they do today, but never because that behaviour was actually mandated.
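
                      A sketch of the kind of optimisation this enables (illustrative function): because the counter wrapping would be UB, the compiler may assume the loop below runs exactly n + 1 times and vectorise it; if wraparound were defined, n == INT_MAX would make it an infinite loop and the transformation would be invalid.

                        void scale(int n, int *a, const int *b)
                        {
                            for (int i = 0; i <= n; i++)
                                a[i] = 2 * b[i];
                        }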

                      Actually, if you don’t rely on signed overflow and you don’t do type punning, C is still a perfectly usable language for encryption or anything else (though there are often much better choices).

                      I understand what you are saying - and I feel like sometimes the compiler vendors do go too far - but I feel like complaints such as these are often either misdirected (is it the compiler or the language standard to blame?) or come from some kind of self-righteous indignation (dammit! why won’t this compiler do what I want it to? Who do these compiler developers think they are, telling me the bug is in my code?!!).

                      Yes, there are obscure nuances which e.g. allow type punning in some cases. But if you avoid type punning altogether - which many languages force you to do - you don’t need to understand those nuances. And if you don’t think of C as a kind of high-level assembly language, then you shouldn’t expect signed integer overflow to work - and if you do think of it as a high-level assembly, then you’re wrong.
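
                      For what it’s worth, the blessed way to reinterpret bytes without pointer-based punning is memcpy; compilers recognise the pattern and compile it to a plain move. A sketch (assuming a 32-bit float):

                        #include <stdint.h>
                        #include <string.h>

                        uint32_t float_bits(float f)
                        {
                            uint32_t u;
                            memcpy(&u, &f, sizeof u);   /* defined behaviour, unlike *(uint32_t *)&f */
                            return u;
                        }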

                      1. 3

                        I don’t think many C programmers were even aware of the process of adding additional UB. And obviously even the standard committee itself didn’t understand the implications of what they were doing - or else the char* hack would not have been needed.

                        Who do these compiler developers think they are, telling me the bug is in my code?!!

                        The problem is that they do not tell you. Note that using signed integers for for-loop counters is about as idiomatic as you can go in C.

                        1. 2

                          I don’t think many C programmers were even aware of the process of adding additional UB

                          I don’t think there is such a process. What is UB now was always UB. The difference is that in the past, the behaviour tended to more closely mirror behaviour expected due to an understanding of the underlying hardware and an incorrect assumption about how the compiler operated.

                          Note that using signed integers for for-loop counters is about as idiomatic as you can go in C

                          Sure, but there’s nothing wrong with using signed integers for loop counters. The problem only comes when you write code that expects that if it keeps incrementing such a counter it will eventually jump to a negative value. That’s not idiomatic.
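
                          A sketch of that non-idiomatic pattern (illustrative function): the loop below “terminates” only by wrapping to a negative value, so the compiler may assume i > 0 stays true and emit an infinite loop.

                            int doublings(void)
                            {
                                int n = 0;
                                for (int i = 1; i > 0; i *= 2)   /* relies on signed overflow to exit: UB */
                                    n++;
                                return n;
                            }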

                          1. 7

                            Also, fun fact, loop counters being idiomatically plain int is actually what motivates the compiler to rely on signed overflow not occurring. Otherwise it generates uglier code. For an example: https://gist.github.com/rygorous/e0f055bfb74e3d5f0af20690759de5a7

                            1. 1

                              My experiments with gcc and clang show that using an unsigned int counter in a loop improves performance a bit.

                              1. 1

                                Can you show these experiments?

                                1. 1

                                  They are totally simple:

                                   void floop(int n, float P[], float v)
                                   {
                                       for (int i = 0; i < n; i++) P[i] = v;
                                   }

                                   void ufloop(int n, float P[], float v)
                                   {
                                       for (unsigned int i = 0; i < (unsigned)n; i++) P[i] = v;
                                   }
                                  

                                  Run a bunch of times and count.

                                  1. 2

                                    https://godbolt.org/g/fFpBuk

                                    On gcc 7.1 with -O2 these two differ by only one opcode. Only at -O3 do they start to differ significantly.

                                    1. 2

                                      On gcc 7.1 with -O2 these two differ by only one opcode.

                                      And it isn’t even in the loop.

                                      Only at -O3 do they start to differ significantly.

                                      Yet these differences are rather insubstantial. Differently named labels, swapped operands for cmp, and correspondingly swapped jump opcodes. The most substantial difference is that the code using ints does a few sign extensions. Worth noting is that all those differences take place at the “tail” of the loop, where the few remaining floats that weren’t done by the xmm-wide copy are handled. The hot part of the loop is identical, just as in the -O2 version.

                            2. 1

                              Not correct. The standards have moved things into the UB list from “indeterminate” and similar categories.

                              1. 1

                                If you’re talking about the proposals of David Keaton, you need to be aware that:

                                Unfortunately, there is no consensus in the committee or broader community concerning uninitialized reads

                                In fact, there is a formal proposal here (which comes out of the Cerberus survey) which specifically suggests clarifying that the read of an uninitialized variable does not give UB:

                                http://www.cl.cam.ac.uk/~pes20/cerberus/n2089.html

                                If you’re talking about something else, other than minor clarifications of behaviour which was obviously dubious in the first place, I’m curious to hear what it is.

                                1. 0

                                  Note that the original ANSI C standard included the following:

                                  Undefined behavior — behavior, upon use of a nonportable or erroneous program construct, of erroneous data, or of indeterminately-valued objects, for which the Standard imposes no requirements. Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

                                  This would forbid what we are now told is intrinsic to C compilation. Just as a matter of common sense, “optimizing” a = b + c; assert(a > b && a > c); to remove the assert because it is convenient to assume undefined behavior never happens is absurd. Or consider:

                                  The value of a pointer that referred to an object with automatic storage duration that is no longer guaranteed to be reserved is indeterminate.

                                  I believe that is now UB. Or consider:

                                  A pointer to an object or incomplete type may be converted to a pointer to a different object type or a different incomplete type. The resulting pointer might not be valid if it is improperly aligned for the type pointed to. It is guaranteed, however, that a pointer to an object of a given alignment may be converted to a pointer to an object of the same alignment or a less strict alignment and back again; the result shall compare equal to the original pointer.

                                  Compare that to the total hash in C11.

                                  As for the proposal you link to, it’s indicative of a disregard of C practice. For example, the proposal is that if you compute a checksum on a partly initialized data structure, that checksum could be made unstable and unreliable. I don’t get the advantage of ruling the BSD and Linux network stacks to be subject to this sort of nonsense.

                                  1. 2

                                    Just as a matter of common sense, “optimizing” a = b + c; assert(a > b && a > c); to remove the assert because it is convenient to assume undefined behavior never happens is absurd

                                    I disagree. The reason signed overflow is specified as undefined (and not, say, some specific set of possible behaviours including e.g. wraparound and termination) is precisely to allow such an optimisation. For idiomatic use of loop counters, for instance, this can allow for generation of significantly better code. Even allowing those specific behaviours would at least make the behaviour platform dependent.

                                    In any case the assert could only be optimised away if the compiler was certain that b > 0 and c > 0. For the programmer to have written the above and not merely be trying to assert that (b > 0 && c > 0) implies that they were familiar with bitwise representation but ignorant of C’s operator semantics.
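
                                    A sketch of when the removal is actually justified (hypothetical function): once b and c are known positive, and overflow is assumed impossible, the asserted condition is provably true.

                                      #include <assert.h>

                                      void demo(int b, int c)
                                      {
                                          if (b > 0 && c > 0) {
                                              int a = b + c;
                                              assert(a > b && a > c);   /* provably true absent overflow: may be dropped */
                                          }
                                      }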

                                    However:

                                    For example, the proposal is that if you compute a checksum on a partly initialized data structure, that checksum could be made unstable and unreliable

                                    I agree that this needs a solution, and I agree that the best solution is to mandate that reading an uninitialised variable (including a member of a partly initialised structure) and then reading it again should yield the same value both times. I don’t see much value in that proposal; I linked to it merely to point out that it’s not the case that reading an uninitialised value is necessarily going to become undefined behaviour.

                                    1. 1

                                      This would forbid what we are now told is intrinsic to C compilation.

                                      I don’t read it that way. E.g. I think it’s fair to characterize the compiler’s behaviour in your example as ignoring the situation where b + c overflows completely, which is something the standard explicitly permits.

                                      1. 0

                                        It’s not ignoring it - it is making the assumption that it cannot happen. Ignoring it would involve compiling the code as is, not rewriting it. Consider the case DJB notes - where a conforming null check was removed because the compiler assumed that if the pointer was null the prior dereference would be UB!

                                        1. 3

                                          Removing a redundant assert is entirely normal and reasonable behaviour for an optimizing compiler, and that particular assert is flagrantly redundant in all situations except the one in which an explicitly permissible behaviour is “ignoring the situation completely with unpredictable results”.

                                          1. 1

                                            It’s redundant in all false cases, yes.

                      2. [Comment removed by author]

                        1. 4

                          Some DACs and ADCs have “saturating” math: INT_MAX + 1 == INT_MAX. This is because rollover would cause glitching in the audio stream (switching from near INT_MAX to near -INT_MAX and back), whereas sticking at INT_MAX produces a much less noticeable glitch and does not run a risk of harming analog hardware being fed the signal.
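
                          In software, saturating addition looks something like this sketch (hypothetical helper):

                            #include <limits.h>

                            int sat_add(int a, int b)
                            {
                                if (a > 0 && b > INT_MAX - a) return INT_MAX;   /* clamp instead of wrapping up   */
                                if (a < 0 && b < INT_MIN - a) return INT_MIN;   /* clamp instead of wrapping down */
                                return a + b;
                            }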

                          1. 4

                            As vyodaiken said, some architectures can trap on signed overflow. I know the VAX can do that (there’s a flag in the CPU you can set to enable that) and the MIPS (by using the add instruction, but every C compiler I ever used on the MIPS used the addu version which doesn’t trap).

                            The problem is that most programmers (and by “most” I mean “over 99 44/100% of programmers”, though I could be off by as much as a percent or two) have never encountered and will never encounter such a system (much like most programmers will never encounter a 1’s complement integer CPU, and no programmer will ever encounter a signed-magnitude integer CPU [1], yet the C standard still makes allowances for those).

                            The other problem is that C doesn’t mandate a set size for integers. On a 16-bit CPU, yes, overflow is probably going to happen. On a 32-bit system, less likely (unless you are working with large data sets) and almost not at all on a 64-bit system (unless it’s a real bug and yes, this is a gut feeling I have).

                            Having read up on this mess, I’m slowly coming to the conclusion that the following routine:

                            int add(int a, int b)
                            {
                              return a + b;
                            }
                            

                            Invokes undefined behavior because it could overflow—to be really sure, you need to do:

                            #include <limits.h>   /* INT_MAX, INT_MIN */
                            #include <stdlib.h>   /* abort */

                            int add(int a, int b)
                            {
                              if ((a > 0) && (b > 0))
                              {
                                if (INT_MAX - a < b) abort();   /* a + b would exceed INT_MAX */
                              }
                              else if ((a < 0) && (b < 0))
                              {
                                if (INT_MIN - a > b) abort();   /* a + b would fall below INT_MIN */
                              }
                              return a + b;
                            }
                            

                            And here, I’m only half-joking. Sometimes, I really miss the carry bit from assembly.

                            [1] I’m certain on the signed-magnitude bit because as far as I can tell, only one machine was ever commercially available, and that was back in the 50s!

                            1. 2

                              I think you are right. If you have

                              int a = INT_MAX, b = 1;   /* or some code that produces equivalent values */
                              /* ... */
                              c = myfunc(a, b);
                              /* ... */

                              int myfunc(int a, int b) { return a + b; }
                              

                              I believe the compiler is permitted to replace your program with ransomware according to the current standard and interpretation.

                            2. 3

                              In some architectures signed overflow can be configured to trap, or always traps. In those cases you’d want some signal. It’s an actual error to overflow. There may be other methods. It would be nice if the overflow flag was visible to C code so you could do for(int i=0; nooverflow() && i < N; i++) ….
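
                              GCC and Clang actually expose something close to this via checked-arithmetic builtins such as __builtin_add_overflow, which reports overflow instead of invoking UB. A sketch:

                                #include <stdio.h>

                                int main(void)
                                {
                                    int i = 1, next;
                                    while (!__builtin_add_overflow(i, i, &next))   /* i + i, overflow-checked */
                                        i = next;
                                    printf("last doubling before overflow: %d\n", i);
                                    return 0;
                                }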

                              1. 2

                                Some platforms trap overflows in hardware, resulting in program termination/signals/exceptions/whatever.

                              2. [Comment removed by author]

                                1. 2

                                  Unsigned ints form the commutative group of integers modulo 2^numberofbits. Signed ints seem like they could be treated as a commutative ring of some sort. It would be nice to see actual mathematical analysis in place of this ad-hoc stuff.
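
                                  For the unsigned case the standard really does guarantee that modular structure: reduction modulo 2^N is part of the definition of unsigned arithmetic, as this sketch relies on.

                                    #include <assert.h>
                                    #include <limits.h>

                                    int main(void)
                                    {
                                        unsigned int x = UINT_MAX;
                                        assert(x + 1u == 0u);   /* defined: unsigned arithmetic is modulo 2^N */
                                        return 0;
                                    }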

                                2. 1

                                  It shouldn’t be news to anybody that C is “dangerous” and filled with undefined behavior.

                                  There are plenty of other languages to use.

                                  1. 2

                                    For new projects, sure. Use them. Meanwhile, C programmers will have to continue maintenance of infrastructure you rely on. If all C programs vanished from the universe tomorrow, the internet would be dead. Solve that problem before telling everybody to stop using C.

                                    1. 1

                                      The point I was trying to make is that this rant doesn’t add any value. Everybody knows C has a ton of undefined behavior. If a person has chosen to use C, despite all of the alternatives, it’s silly for them to complain about it. Integer overflow is one of the more common and well known instances of undefined behavior in C, so this really shouldn’t have been a surprise.

                                      Nobody’s forcing this person to maintain this software in C. He could start a new project that does the same thing (presumably just as fast, but safer) in a better language. But he’s not doing that, he’s choosing to use C, so C is what he gets.

                                      1. 1

                                        The internet routes around damage, I’m pretty sure it would keep running - certainly it would not take long to get it up and running without C as soon as there was an actual incentive to do so. Unfortunately at the moment the costs of insecure software fall elsewhere, so insecure software (and make no mistake, most C connected to the Internet is remotely exploitable) is routinely treated as “ain’t broke, don’t fix it”.

                                        1. 2

                                          It wouldn’t keep running because there would not be any device drivers for all the machines wired up on the net, nor any mature kernels with decent platform support for all of these machines.

                                          1. 1

                                            Many machines would have limited, immature support for their hardware, sure. But the internet can work with that.

                                        2. 1

                                          Your analysis ignores the ability of alternatives to integrate with or compile to C, with more analysis or safety checks done automatically. That allows one to use safer languages in place of C in a large C project. The project is gradually rewritten in the new language over a long period of time, with all new code having a chance to benefit immediately. There have also been developments like the Cyclone language that would keep the new one close to C with extra safety. In such a situation, the developer is mostly keeping the code the same, with the rewrites stating their intent (i.e. constraints) to the compiler for easier analysis.

                                          Most of these have either disappeared or languished in obscurity since C developers just don’t use them. What it comes down to is not operational necessity, as you indicate, but social factors such as group preferences. There’s an economic argument, but Cyclone-like languages and compiler-assisted safety, which mostly eliminate it, didn’t get any extra uptake over the others. So we’re back to developers simply preferring to use something unsafe with plenty of undefined behavior. I am talking about desktop, server, and mobile here rather than people working on embedded MCUs.

                                          1. 2

                                            Yes, I agree. For software such as Subversion, using C makes no sense today. It could be gradually rewritten in Rust or some similar language, and probably should (if anyone wants to do that, ask me how to get involved – I am serious but rather doubt anybody will take me up on that). I was told the original reasons for choosing C for the SVN project were that, in 1999/2000, most people in the open source community were familiar with C and they needed to grow the developer base, there were workable ways to integrate with languages other than C (see the bindings, which unfortunately never got used to their full potential), and that open source C++ compilers weren’t considered mature enough at the time. But C++ was seriously considered. Just a few years later, Greg Stein (SVN and Python dev) found himself rewriting the entire working copy support code and told me he wished all/most of SVN had been written in Python (SVN’s test suite is mostly written in Python).

                                            As for OpenBSD, well… there’s the obvious matter of heritage, a huge amount of expertise in C within the community, and the lack of support for older platforms in many fancy newer languages.

                                            I have first hand experience with the above projects. I cannot speak for any others. But there must be other projects with valid reasons for still using C. Some people just don’t seem to get that…

                                            1. 1

                                              “As for OpenBSD, well… there’s the obvious matter of heritage, a huge amount of expertise in C within the community, and the lack of support for older platforms in many fancy newer languages.”

                                              I was talking about legacy software like that in my post. I pointed out that people made safer, easier-to-analyze languages that kept as close to C as possible. If you give them easy FFI and have them output C, then everything you just said is addressed with minimal effort. If they’re not good enough now (academic prototypes), then C developers concerned with software robustness should put in the effort to get them into shape. It will pay off over time in bugs prevented. A simple example is just giving such a language affine types and safe concurrency, to give it Rust’s main benefits without leaving C. Eventually, after enough rewrites, your code is provably free of (long list here, which includes temporal errors that slip past code review a lot). It would happen for self-contained modules even more quickly.

                                              https://en.wikipedia.org/wiki/Cyclone_(programming_language)

                                              https://www.eg.bucknell.edu/~lwittie/research/Clay.pdf

                                        3. 2

                                          I think it is interesting that many people who defend the over-use of UB in C are advocates of not using C.