Wrap on integer overflow is a good idea, because it gets rid of one source of undefined behavior in C.
Undefined behavior is evil. One evil it causes is that it makes code optimization-unstable. That is, something can work in a debug build but not in a release build, which is very undesirable. The article does not address this point at all.
The article does not address this point at all.
To remove all undefined behaviour in C would severely impact the performance of C programs.
The post does suggest that trap-on-overflow is a superior alternative to wrap-on-overflow, and of course you could apply it to both debug and release builds if this optimisation instability concerns you. Even if you apply it just to your debug build, you’ll at least avoid the possibility that something works in the debug build but not in the release.
Note that Rust wraps on overflow and as far as I can tell this does not impact performance of Rust programs.
This is essentially the same as one of the arguments I addressed in the post. In certain types of programs, particularly those in lower-level languages (like C and Rust) where the code is written directly by a human programmer, there probably are not going to be many cases where the optimisations enabled by having overflow be undefined will kick in. However, if the program makes heavy use of templates or generated code, or is produced by transpiling a higher-level language with a naive transpiler, then it could (I’ll concede this is theoretical in that I can’t really give a concrete example). The mechanism by which the optimisation works is well understood, and it isn’t too difficult to produce an artificial example where the optimisation grants a significant speed-up in the generated code.
Also, in the case of Rust programs, you can’t really reliably assess the impact of wrapping on overflow unless there is an option to make overflow undefined behaviour. Is there such an option in Rust?
No, there is no such option, and there never will be. Rust abhors undefined behaviors. Performance impact assessment I had in mind was comparison with C++ code.
On the other hand, rustc is built on LLVM so it is rather trivial to implement: rustc calls LLVMBuildAdd in exactly one place. One can replace it with LLVMBuildNSWAdd (NSW stands for No Signed Wrap).
To remove all undefined behaviour in C would severely impact the performance of C programs.
This cannot be entirely true. As a reductio ad absurdum, it would be possible in principle to laboriously define all the things that compilers currently do with undefined behaviour and make that the new definition of the behaviour, and there would then be zero performance impact.
C compiler writers might argue that removing all undefined behaviour without compromising performance would be prohibitively expensive, but I’m not entirely convinced; there are carefully-optimized microbenchmarks on which the naive way of defining currently-undefined behaviour produces a noticeable performance degradation, but I don’t think it’s been shown that that generalises to realistic programs or to a compiler that was willing to put a bit more effort in.
This cannot be entirely true. As a reductio ad absurdum, it would be possible in principle to laboriously define all the things that compilers currently do with undefined behaviour and make that the new definition of the behaviour, and there would then be zero performance impact.
Clearly that would be absurd, and it’s certainly not what I meant by “remove all undefined behaviour”. Your “possible in principle” suggestion is practically speaking completely impossible, and what I said was true if you don’t take such a ridiculously liberal interpretation of it. Let’s not play word games here.
C compiler writers might argue that removing all undefined behaviour without compromising performance would be prohibitively expensive
They might, but that’s not what I argued in the post.
I don’t think it’s been shown that that generalises to realistic programs or to a compiler that was willing to put a bit more effort in.
There’s no strong evidence that it does, nor that it wouldn’t ever do so.
Well then, what did you mean? You said “To remove all undefined behaviour in C would severely impact the performance of C programs.” I don’t think that’s been demonstrated. I’m not trying to play word games, I’m trying to understand what you meant.
I meant “remove all undefined behaviour” in the sense and context of this discussion, in particular related to what @sanxiyn above says:
Undefined behavior is evil. One evil it causes is that it makes code optimization-unstable. That is, something can work in a debug build but not in a release build, which is very undesirable
To avoid that problem, you need to define specific behaviours for cases that currently have undefined behaviour (not ranges of possible behaviour that could change depending on optimisation level). To do so would significantly affect performance, as I said.
I could believe that removing all sources of differing behaviour between debug and release builds would significantly affect performance (though even then, I’d want to see the claim demonstrated). But even defining undefined behaviour to have the current range of behaviour would be a significant improvement, as it would “stop the bleeding”: one of the insidious aspects of undefined behaviour is that the range of possible impacts keeps expanding with new compiler versions.
I could believe that removing all sources of differing behaviour between debug and release builds would significantly affect performance
It’s not just about removing the sources of differing behaviour - but doing so with sensibly-defined semantics.
though even then, I’d want to see the claim demonstrated
A demonstration can only show the result of applying one set of chosen semantics to some particular finite set of programs. What I can do is point out that C has pointer arithmetic and this is one source of undefined behaviour; what happens if I take a pointer to some variable and add some arbitrary amount, then store a value through it? What if doing so happens to overwrite part of the machine code that makes up the program? Do you really suppose it is possible to practically define what the behaviour should be in this case, such that the observable behaviour will always be the same when the program is compiled with slightly different optimisation options - which might result in the problematic store being to a different part of code? To fully constrain the behaviour, you’d need pointer bounds checking or similar - and that would certainly have a performance cost.
But even defining undefined behaviour to have the current range of behaviour would be a significant improvement, as it would “stop the bleeding”
As I’ve tried to point out with the example above, the current range of undefined behaviour is already unconstrained. But for some particular cases of undefined behaviour, I agree that it would be better to have more restricted semantics. For integer overflow in particular, I think it could reasonably be specified, for example, that the result becomes unstable (eg. behaves inconsistently in comparisons) but the behaviour is otherwise defined. Note that even this would impede some potential optimisations. (And note that I still advocate trap on overflow as the preferred implementation).
I suspect that one issue is that compilers may manifest different runtime behaviour for undefined behaviour, depending on what specific code the compiler decided to generate for a particular source sequence. In theory you could document this with sufficient effort, but the documentation would not necessarily be useful; it would wind up saying ‘the generated code may do any of the following depending on factors beyond your useful control’.
(A canonical case of ‘your results depend on how the compiler generates code’, although I don’t know if it depends on undefined behaviour, is x86 floating point calculations, where your calculations may be performed with extra precision depending on whether the compiler kept everything in 80-bit FPU registers, spilled some to memory (clipping to 64 bits or less), or used SSE (which is always 64-bit max).)
It’s not only possible: I’ve seen formal semantics that define various undefined behaviors just as you said. People writing C compilers could definitely do it if they wanted to.
For the trivial case where wrapping behaviour does allow simply detecting overflow after it occurs, it is also straightforward to determine whether overflow would occur, before it actually does so. The example above can be rewritten as follows:
If it’s really so trivial then why not have the compiler do that rewrite? Given that the “wrong” version is valid language syntax, programmers will write it and compilers will have to compile it; no amount of encouraging programmers to rewrite gets us away from having to decide what the compiler should do when fed the “wrong” code.
An obvious mitigation for the problem of programmers expecting this particular behaviour is for the compiler to issue a warning when it optimises based on the alternative undefined-behaviour-is-assumed-not-to-occur semantics.
Building for maximum performance and warning about a correctness violation seems like the wrong priority. Why not build the code to behave in the way that you’re sure matches the programmer’s intent and warn about the missed optimisation opportunity?
Also, even without overflow check elimination, it is not necessarily correct to assume that wrapping integers has minimal direct cost even on machines which use 2’s complement representation. The MIPS architecture, for example, can perform arithmetic operations only in registers, which are fixed size (32 bit). A “short int” is generally 16 bits and a “char” is 8 bits; if assigned to a register, the underlying width of a variable with one of these types will expand, and forcing it to wrap according to the limit of the declared type would require at least one additional operation and possibly the use of an additional register (to contain an appropriate bitmask). I have to admit that it’s been a while since I’ve had exposure to any MIPS code and so I’m a little fuzzy on the precise cost involved, but I’m certain it is non-zero and other RISC architectures may well have similar issues.
It would be good to permit trapping. It would be good to permit whatever the native MIPS behaviour is. But it’s obviously absurd to permit optimizing out the programmer’s overflow check.
How about: “An expression in which signed integer overflow occurs shall evaluate to an implementation-defined value and may also cause the program to receive a signal”? In conjunction with the rules in 5.1.2.3.5 this still permits the compiler to trap, still permits the compiler to use the machine behaviour (twos-complement wrapping, some other form of wrapping, saturating or what-have-you), and still permits the compiler to reorder arithmetic operations (provided they don’t cross a sequence point), but rules out craziness like the elimination of overflow checks.
If it’s really so trivial then why not have the compiler do that rewrite?
Because the compiler doesn’t know what was intended. And I sure as heck don’t want the compiler re-writing any of my code to what it “thinks” I intended. And automatically “fixing” it in some cases but not others (less trivial) will likely lead to confusion.
Why not build the code to behave in the way that you’re sure matches the programmer’s intent and warn about the missed optimisation opportunity?
The compiler can’t be sure what the programmer intended.
You can’t be sure that the code will match the programmer’s intent. Maybe that “overflow check” really is redundant. Maybe the code is generated. Maybe the “overflow check” is only obviously redundant after several other optimisation passes have taken effect.
Giving a warning lets the programmer decide: Is this really ok or did I make a mistake? “Fixing” it for them pessimises without any way to undo that pessimisation, except in the very trivial cases, which, again, might be in generated source code.
It would be good to permit trapping. It would be good to permit whatever the native MIPS behaviour is. But it’s obviously absurd to permit optimizing out the programmer’s overflow check.
Both trapping and the MIPS behaviour (where a value in an integer of some type can be stored in a register wider than that type, which can incidentally also be done on just about any architecture) would make leaving the overflow check in absurd - because it wouldn’t achieve what it was intended to achieve anyway (even if the compiler could determine the intent).
In Ada, you can tell the compiler what behavior you want for integer overflow (or checks to catch it). Are there compiler hints in C for that or other undefined behavior that make the result predictable?
Because the compiler doesn’t know what was intended. And I sure as heck don’t want the compiler re-writing any of my code to what it “thinks” I intended. And automatically “fixing” it in some cases but not others (less trivial) will likely lead to confusion.
It would be hard to do worse than deleting overflow checks that the programmer wrote into the code, but only sometimes, which is the current behaviour. Undefined behaviour practically guarantees all the things that you’ve just said you don’t want. (Indeed, the compiler is already permitted to behave in the way I’ve suggested it should - it’s just also permitted to do other, less helpful and more confusing, things).
(I do agree that it’s not actually trivial for the compiler to fix these cases - my point was if it’s not so trivial for the compiler, it’s not so trivial for the programmer either. So “the programmer should just rewrite their code so the problem doesn’t happen” isn’t a good answer.)
You can’t be sure that the code will match the programmer’s intent. Maybe that “overflow check” really is redundant. Maybe the code is generated. Maybe the “overflow check” is only obviously redundant after several other optimisation passes have taken effect.
Indeed you can’t be sure, which is why a responsible compiler should fail-safe rather than fail-dangerous. A missed optimization opportunity is much better than a security bug.
Both trapping and the MIPS behaviour (where a value in an integer of some type can be stored in a register wider than that type, which can incidentally also be done on just about any architecture) would make leaving the overflow check in absurd - because it wouldn’t achieve what it was intended to achieve anyway (even if the compiler could determine the intent).
Trapping achieves what the programmer usually intends - the programmer usually wants the function to abort/error when overflow occurs. Certainly it avoids the worst possible outcome, the one thing we can be certain that the programmer didn’t intend - blindly continuing to execute after overflow.
On second thoughts you’re right about MIPS. The programmer almost certainly never intended for an integer to be operated on at a certain width and then truncated at some mysterious later point in the execution of their program. That is so rarely an intended behaviour that I don’t think any responsible compiler should implement it without being very explicitly instructed to.
Undefined behaviour practically guarantees all the things that you’ve just said you don’t want.
No, it doesn’t. In making this claim, you are implying that code which invokes undefined behaviour still has defined semantics. Certainly, the compiler won’t guess what was intended and then use undefined behaviour to implement that; it assumes that what it was told was intended (i.e. what was expressed by the code, according to the semantics of the language) is what was intended. That’s a reasonable assumption to make given the nature and purpose of a compiler.
It would be hard to do worse than deleting overflow checks that the programmer wrote into the code, but only sometimes, which is the current behaviour.
I agree that, if the compiler could divine that some check (which it would otherwise optimise away) is meant to be a wrapping overflow check for some particular preceding operation, it would be nice if the compiler could issue a strong warning. I do not believe it should ever silently “fix” the problem by applying wrapping semantics to the relevant preceding operations, because that is going to lead to C programmers increasingly believing that overflow does have wrapping behaviour, and I don’t think that is a good idea even if wrapping was the ideal overflow behavior.
One question is: would treating most/all code that fits that general pattern as an indication that wrapping behaviour was required by the previous operations, and compiling accordingly (with a warning), have a negative impact on optimisations and performance? I think the answer is “probably not for most existing programs”, but it could potentially have a significant impact. This extends to the more general question of whether wrapping semantics as such could impact optimisation (which I think has the same answer).
Another question is: is wrapping behaviour generally useful, other than for the purpose of these overflow checks? I’ve addressed that in the post; I think the answer is clearly no.
However, is wrapping better behaviour (disregarding performance) than undefined behaviour? From a safety perspective it may be slightly better, since the effect of overflow bugs is more constrained, but the bugs can still happen and can still be exploited. This is anecdotal, but most of the overflow-related security holes that I’ve seen, other than the small number due to compiler-removed post-overflow checks, have been directly caused by the wrapping and not by any other associated undefined behaviour.
So, getting back to what you said: the only way to not delete any overflow checks is to enforce wrapping semantics everywhere, and I disagree with that due to the points above.
Indeed you can’t be sure, which is why a responsible compiler should fail-safe rather than fail-dangerous. A missed optimization opportunity is much better than a security bug.
Agreed - that’s why trapping and not wrapping is the right behaviour.
Thanks for bringing up some reasonable discussion. I don’t think there’s a totally objective right/wrong answer at this point - but I hope you see some validity in the points I’ve raised above.
(some small edits made after posting to improve readability).
I’m still working on Dinit, my init system / service manager. I’m trying to keep up momentum and get to the point where I’d be happy releasing a “1.0” version, but there’s quite a bit to do yet - and then there’s the problem of filling in the functionality gaps compared to e.g. systemd (which I plan mainly to do with external programs rather than pushing more and more functionality into Dinit itself). I added a bunch of tests recently and even did some fuzzing of the control protocol, which I was surprised to find did not reveal any bugs (this was almost disappointing - it was the first time I’d tried fuzz-testing anything, and I was hoping for a useful result! But it’s hard to be unhappy about it).
Currently I’m trying to add some distribution-friendly functionality: the ability to enable or disable services simply via the command line (prior to now it requires hand-editing the service descriptions), so that’s what I’ll be working on this week.
Regardless of whether the intended effect is reasonable or not, I don’t think this one-word change would really help. There’s still way too much ambiguity:
There’ve been a number of posts recently trying to argue that C should essentially do away with undefined behaviour; I think it’s time for people to move on and accept that the undefined behaviour has been inherent in the standard for some time, and made use of for optimisation by compilers for some (slightly lesser) time, and it’s here to stay. Code which relied on particular integer overflow behaviour, or aliasing pointers with incompatible types, or so on, was never really correct C - it’s just that the compiler once (or at least usually) generated code which did what the code author intended. Now people are getting upset that they can’t use certain techniques they once did. In some cases this isn’t ideal - I’ll grant that there needs to be a simple way in standard C to detect overflow before it happens, and there currently isn’t - but it’s time to accept and move on. Other languages provide the semantics you want, and compiler switches allow for non-standard C with those semantics too; use them, and stop these endless complaints.
As for making the overflow behaviour “sane”, the notion that you could add two positive integers and then meaningfully check whether the result was smaller than either was bat-shit crazy to begin with.
As for making the overflow behaviour “sane”, the notion that you could add two positive integers and then meaningfully check whether the result was smaller than either was bat-shit crazy to begin with.
Wow, so all that work in finite field theory is bat-shit crazy?
The C standard defines “int” as fixed length binary strings representing signed integers and even has a defined constant max value. C ints are not bignums and C does not ask the compiler to detect or prevent overflows or traps or whatever the architecture does. As a consequence of the definition of ints, x+y > x cannot be a theorem. If it was a theorem, it would follow that ints can represent infinite sets of numbers which would be a great trick with a finite number of bits.
Can people stop “explaining” that making C into Java would be hard and would lose performance or that C ints are not really integers or other trivia as attempted justifications of these undefined program transformations?
As for making the overflow behaviour “sane”, the notion that you could add two positive integers and then meaningfully check whether the result was smaller than either was bat-shit crazy to begin with.
Wow, so all that work in finite field theory is bat-shit crazy?
That’s… not what I said.
Can people stop “explaining” that making C into Java
I’m afraid you’ve crossed your wires again. Nobody was talking about making C into Java.
so from the C standard I can both conclude that sizeof(int) == 4 or 8 and that for int i, i+1 > i is a theorem, so a test if(i+1 <= i) panic(); is “bat-shit crazy”? Think about it. Testing to see if addition of fixed-length ints overflows is not only mathematically sound, but it matches the operation of all the dominant processors - that’s how fixed-width 2’s complement math works, which is why almost all processors incorporate an overflow bit or similar. Ints are not integers.
so from the C standard I can both conclude that sizeof(int) == 4 or 8 and for int i, i+1 > i is a theorem so a test if(i+1 <= i) panic(); is “bat-shit crazy”?
The test “if (i + 1 <= i)” doesn’t make sense mathematically because it is always false. If the range of usable values of (i + 1) is limited, then it is always either false or undefined.
Testing to see if addition of fixed length ints overflows is not only mathematically sound
It’s very definitely not mathematically sound. Limited range ints only have mathematically sound operation within their limited range.
Ints are not mathematical integers. They are not even bignums. Try again.
Here is a useful theorem for you: using n bytes of data, it is impossible to represent more than 2^{8*n} distinct values.
In mathematics whether i+1 > i is a theorem depends on the mathematical system. For example in the group Z_n, it is definitely not true. Optimization rules that are based on false propositions will generate garbage.
“Limited range ints only have mathematically sound operation within their limited range.” - based on what? That’s absolutely not C practice and certainly not required by the C standard. It doesn’t follow mathematical practice and it’s way off as a model of how processors implement arithmetic.
Ints are not mathematical integers.
Right, they have a limited range. Within that range, they behave exactly as mathematical integers.
Here is a useful theorem for you: using n bytes of data, it is impossible to represent more than 2^{8*n} distinct values.
Irrelevant.
Right, they have a limited range. Within that range, they behave exactly as mathematical integers.
what do you base that on? And you know they don’t behave like the mathematical integers mod 2^n because? Even though that’s how the processors usually implement them?
There is nothing in the C standard that supports such an approach. In fact, if it were correct, then x << 1 would not be meaningful in C.
what do you base that on?
I base that on how the C language defines operations on them; for +, for example, “The result of the binary + operator is the sum of the operands”. It does not say “… the sum of the operands modulo 2^n”.
And you know they don’t behave like the mathematical integers mod 2^n because?
For unsigned types, the text says: “A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type” (C99 6.2.5). Therefore, the unsigned integers do behave like mathematical integers mod 2^n. However, there is no equivalent text for signed types, and C99 3.4.3 says: “An example of undefined behavior is the behavior on integer overflow”. Specifically, 6.5 says: “If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), the behavior is undefined.” (emphasis added).
I’m sure you will be able to find the corresponding sections in C11 if you wish.
There is nothing in the C standard that supports such an approach.
Nothing except the text which describes it as such, as reproduced above.
In fact, if it were correct, then x << 1 would not be meaningful in C.
I could only guess how you came to that conclusion, but I don’t care to. This discussion has become too ridiculous for me. Good day.
However, there is no equivalent text for signed types, and C99 3.4.3 says: “An example of undefined behavior is the behavior on integer overflow”.
Correct. So it’s possible, if you are a bad engineer and a standards lawyer, to claim that the standard gives permission for the implementation to run Daffy Duck cartoons on overflow. However, nothing in the standard forbids good engineering - for example, it is totally permissible to use the native arithmetic operations of the underlying architecture, and I am 100% sure that was the original intention. There is certainly no requirement for your “mathematics with holes in it” model and, since there is no good engineering excuse for it, QED.
Since compilers already provide options for wrapping integer overflow, I think it is reasonable to propose making those options the default. After all, people who want undefined integer overflow for optimization or otherwise can use options to do so after the default is changed. (If this sounds inconvenient, the exact same applies to “use options and stop complaints”.) Note that this change is backward compatible. (Although going back won’t be.)
The same applies for strict aliasing. I am much more uncertain about other undefined behaviors, for example null dereference, because when there are no pre-existing options such a standard change would require (in my opinion quite substantial) additional work for implementations.
Since compilers already provide options for wrapping integer overflow, I think it is reasonable to propose making those options the default.
Just because compilers offer an option to do something, doesn’t mean that it’s reasonable to make that something a default. (But sure, if the standard gets changed - I doubt it will - so that integer overflow is defined as wrapping, everyone can use compiler flags to get the old behaviour back, and that would be perfectly acceptable).
I’d personally much rather have integer overflow trap than wrap. As far as I can see all that wrapping gives you is an easier way to check for overflow; there’s very few cases where it’s useful in its own right. The problem is, people will still forget to check, and then wrapping still gives the wrong result. But there’s no need to change the standard for this: I can already get it with a compiler switch. (edit: note also that trapping on overflow still allows some of the optimisations that defining it as wrapping wouldn’t).
I am much more uncertain about other undefined behaviors, for example null dereference
It would be easy enough to define that as causing immediate termination; the real question is whether this would be worth doing.
Edit: you may also have missed the main point of my comment, which was that this proposed (one-word) change would not actually cause the behaviour to become defined.
I am much more uncertain about other undefined behaviors, for example null dereference
It would be easy enough to define that as causing immediate termination
Easy enough to define it that way, sure, but I don’t think it would be a popular move in the embedded world – on MMU-less systems where the hardware might not trap it, seems like that would force the compiler to insert runtime checks before every pointer dereference.
Right, hence the note about considering whether it would be worth doing. (I suspect that what a lot of complaints about the standard are missing is just how significant these little optimisations from exploiting the nature of undefined behaviour are when the code potentially runs on some small embedded device. Really, most of the complaints about the language should be re-directed to the compiler vendors: why do they not choose safer defaults? But then, to be fair, they largely do. I don’t think gcc, for example, enables strict overflow by default: you have to enable optimisation).
It would be easy enough to define that as causing immediate termination; the real question is whether this would be worth doing.
Nobody is asking for C implementations to force traps on null dereference. Nobody. So why are you trying to explain it would be hard or have negative consequences?
The statement you quoted had nothing to do with traps on overflow, it was about null pointer dereference. (In fact, I specifically argued for trap-on-overflow. I think you’ve got your wires seriously crossed).
Trap on null dereference is also something that is not necessary. What most people would prefer is that, when reasonable, the action be whatever is characteristic of the environment. So if the OS causes a trap or the architecture explodes on null dereference, or the OS (like some versions of UNIX and many embedded systems) has valid memory at 0, the dereference fetches the data. This is not something that compilers have any useful information on and they should move on.
My point is that while -fwrapv gives wrapping semantics, there are no similar flags to compile null dereference to “whatever is characteristic of the environment”. This will need additional implementation work.
Looks like it is on the way. This “optimization” is already a major source of error, but with LTO it’s going to be unspeakable. Consider a parsing library with extensive null checks linked with a buggy front end. Boom.
middle of the discussion http://lists.llvm.org/pipermail/llvm-dev/2018-April/122717.html
Thanks for the link!
Reading the whole thread (including continuation in May) reinforces my impression that this is substantial amount of work. Searching the archive for June and July, it seems the patch author is missing in action and no actual patch was posted.
I’ve been guilty of trash-talking other projects myself in the past
Well, the blog is titled “Software is Crap” :)
[Rust’s] designers made the unfortunate choice of having memory allocation failure cause termination – which is perhaps ok for some applications, but not in general for system programs, and certainly not for init
Rust can help with not allocating at all (e.g. heapless), and try_reserve is in nightly already.
Zig though is a language oriented exactly at this: it forces you to manually pick an allocator and handle allocation failure. But it is much younger than Rust, so if you’re worried about Rust “mutating” (FYI, Rust 1.x is stable, as in backwards compatible), it’s way too early to consider Zig (0.x).
non-Linux OSes are always going to be Rust’s second-class citizens
Yeah, related to that: Rust’s designers made the unfortunate assumption that OSes don’t break userspace from one release to another, just like Linux. The target extension RFC would solve this.
Other than that… while the core team is indeed focused on the “big three” (Linux/Windows/Mac), Rust does support many “unusual” targets, including Haiku, Fuchsia, Redox, CloudABI.
Back to inits and service managers/supervisors:
There are so many of them, many of them are interesting (I’ve been looking at immortal recently), but they all have one big problem: existing service files/scripts on your system are not written for them. So I usually end up just using FreeBSD’s rc for basic pre-packaged daemons + runit for my own custom stuff.
The Ideal Service Manager™ should prevent forking. (How? Passing -f into $SERVICE_flags? Horrible and evil hacks like LD_PRELOADing a library that overrides libc’s daemon() with a no-op? lol) I guess instead of preventing forking it could support tracking forking services with cgroups on Linux, and… with path=/ ip4=inherit ip6=inherit sysvmsg=inherit ... jails on FreeBSD? I wish there was a 100% reliable way to make sure any service runs in the foreground.
Well, the blog is titled “Software is Crap” :)
Yeah, there is that. I had originally wanted to emulate a humorous style I’d seen elsewhere (the long defunct “bileblog”) which badmouthed things in such an over-the-top fashion that you knew it was humorous; I could never quite get that right and it always seemed like I was just being nasty. Now I just try to provide objective criticism; it’s probably not as entertaining to read, but it’s also less likely to upset people. And of course, I also write about Dinit and occasionally write (hopefully) helpful articles on other topics.
Rust can help with not allocating at all (e.g. heapless), and try_reserve is in nightly already. Zig though is a language oriented exactly at this:
heapless probably wouldn’t serve my needs, but things like try_reserve are what are sorely needed for Rust to be a serious systems language, so I’m glad that’s happening. There are other reasons (perhaps more subjective) that I don’t like Rust - particular aspects of its syntax and semantics bother me - but in general I think the concept of ownership and lifetime as part of type are worthwhile. I have no doubt that good things will come from Rust.
As for Zig, I need to look at it again. It certainly also has promise; but you’re right that I’d be worried about its stability and future.
I guess instead of preventing forking it can support tracking forking services with cgroups on Linux, and… with path=/ ip4=inherit ip6=inherit sysvmsg=inherit … jails on FreeBSD? I wish there was a 100% reliable way to make sure any service runs in the foreground.
Yeah, that’s a fundamental problem. Linux and DragonFlyBSD both have a simple means to prevent re-parenting past a particular process, which is one potential way to solve it (if you are ok with inserting an intermediate process, and really I don’t think that’s a big deal); cgroups/jails as you mention are another; any other option starts to feel pretty hacky (upstart apparently used ptrace to track forks, but that really feels like abuse of the mechanism to me).
Thanks for your comments.
I had originally wanted to emulate a humorous style I’d seen elsewhere (the long defunct “bileblog”) which badmouthed things in such an over-the-top fashion that you knew it was humorous; I could never quite get that right and it always seemed like I was just being nasty.
The problem with it is: that style of humor is so common in the programming world that even a good example of it is not at all novel. Also, as you say, it’s very hard to get right, even for seasoned comedians, which - no offense - most programmers aren’t.
heapless probably wouldn’t serve my needs, but things like try_reserve are what are sorely needed for Rust to be a serious systems language, so I’m glad that’s happening.
Everyone attaches their own meaning to “systems language”, and adding “serious” feels a bit like moving goalposts. “Ah, yeah, you got the systems part down, but how about serious”. It might not be convenient at all places and I agree that some things are undone, but we’re up against literally decades old languages. We’re definitely serious about getting that issue solved in a foreseeable timeframe.
Heapless helps in the sense that you can provide your own stuff on top. Even the basic Box type in Rust is not part of libcore, but libstd.
Servo takes a middle ground of extending Vec with fallible push. (https://github.com/servo/servo/blob/master/components/fallible/lib.rs)
The thing here is mostly that the stdlib’s collections consider allocation failure an unrecoverable error. For ergonomic reasons, that’s a good pick for a standard library.
So it’s perfectly feasible to write your own collection library (or, for example, an extension) even now.
Also, here’s a list of notes about what’s needed to make fallible stuff in the language proper cool. I can assure you after attending the All Hands that this is definitely a hot topic, but also a hard one.
This just as a little bit of context, I’m not trying to convince you.
I’d be very interested in what your semantic issues with Rust are.
To add to that, I’m happy that you took a look at the language, even if you came away wanting.
As for Zig, I need to look at it again. It certainly also has promise; but you’re right that I’d be worried about its stability and future.
I’m definitely hoping for more “new generation” systems programming languages. I think there is quite some space around and I hope that some of these make it.
I’d be very interested in what your semantic issues with Rust are.
A proper answer to that would need me to sit down for an hour (or more) and go through again the material on Rust to remember the issues I had. Some of them aren’t very significant, some of them are definitely subjective. I should qualify: I’ve barely actually used Rust, just looked at it a number of times and had second-hand exposure via friends who’ve been using it extensively. The main thing I can remember off the top of my head that I didn’t like is that you get move semantics by default when passing objects to functions, except when the type implements the Copyable trait (in which case you get a copy), so the presence or absence of a trait changes the semantics of an operation. This is subtle and, potentially, confusing (though the error message is pretty direct). I’d rather have a syntactic distinction in the function call syntax to specify “I want this parameter moved” vs copied.
Other things that bother me are lack of exceptions (I realise this was most likely a design decision, just not one that I agree with) and limited metaprogramming (the “hygienic macro” facility, when I looked at it, appeared a bit half-baked; but then, I’m comparing to C++ which has very extensive metaprogramming facilities, even if they have awful syntax).
I can assure you after attending the All Hands that this is definitely a hot topic, but also a hard one.
Yep, understood.
I’m happy that you took a look at the language, even if you came away wanting.
I’ll be continuing to watch closely. I’m very interested in Rust. I honestly think that some of the ideas it’s brought to the table will change the way future languages are designed.
…you get move semantics by default when passing objects to functions, except when the type implements the Copyable trait (in which case you get a copy), so the presence or absence of a trait changes the semantics of an operation.
I can definitely understand how that would feel worrying, but in practice it’s not so bad: Rust doesn’t have copy constructors, so the Copy trait means “this type can be safely memcpy()d”. For types that can be cheaply and infinitely duplicated without (heap) allocation, like u32, copy vs. move isn’t that much of a semantic difference.
The closest thing to C++’s copy constructor is the Clone trait, whose .clone() method will make a separately-allocated copy of the thing. Clone is never automatically invoked by the compiler, so the difference between moving a String versus copying a String is somefunc(my_string) versus somefunc(my_string.clone()).
lack of exceptions
As a Python programmer, I’m pretty happy with Rust’s error-handling, especially post-1.0 when the ? early-return operator was added. I feel it’s a very nice balance between C- and Go-style error handling, which is explicit to the point of yelling, and Java- and Python-style error handling, which is minimal to the point where it’s hard to say what errors might occur where.
limited metaprogramming
It depends how much you care about getting your hands dirty. Rust doesn’t have full-scale template metaprogramming like C++, but the hygienic macro system (while limited) is a good start. If you want to go further, Rust’s build system includes a standard and cross-compilation-friendly system for running tasks before your code is compiled, so you can run your code through cpp or xsltproc or m4 or a custom Python script or whatever before the Rust compiler sees it. Lastly, “nightly” builds of the compiler will load arbitrary plugins (“procedural macros”) which will let you do all the crazy metaprogramming you like. Since this involves tight integration with the compiler’s internals, this is not a stable, supported feature, but nevertheless some high-profile Rust libraries like the Rocket web framework are built on it.
Linux and DragonFlyBSD both have a simple means to prevent re-parenting past a particular process
Hmm?? This sounds very interesting! Please tell me more about it.
upstart apparently used ptrace to track forks
Oh, this made me realize that I can actually use DTrace to track forks!
Hmm?? This sounds very interesting! Please tell me more about it.
In linux:
prctl(PR_SET_CHILD_SUBREAPER, 1);
In DragonFlyBSD (and apparently FreeBSD too, I see):
procctl(P_PID, getpid(), PROC_REAP_ACQUIRE, NULL);
In both cases this marks the current process as a “reaper” - any child/grandchild process which double-forks or otherwise becomes orphaned will be reparented to this process rather than to init. Dinit uses this already to be able to supervise forking processes, but it still needs to be able to determine the pid (by reading it from a pid file). There’s the possibility though of inserting a per-service supervisor process which can then be used to keep track of all the processes that a particular service generates - although it still doesn’t provide a clean way to terminate them; I think you really do need cgroups or jails for that.
[Rust’s] designers made the unfortunate choice of having memory allocation failure cause termination – which is perhaps ok for some applications, but not in general for system programs, and certainly not for init
Or just run under Linux and have random processes killed by the OOM killer at random times, because that’s so much better than letting a program know the allocation didn’t really succeed twenty minutes ago, when it could have done something about it.
Agreed, the OOM killer is totally bonkers, but its existence doesn’t justify stopping a program due to a failed allocation.
its existence doesn’t justify stopping a program due to a failed allocation.
Yes, especially since overcommit can be turned off, which should largely (if not always - I’m not sure) prevent the OOM killer from acting.
Right. I was saying just let malloc return NULL and let the program deal with it instead of basically lying about whether the allocation succeeded or not. I disable memory overcommit on most of my systems.
For C I totally agree.
The Rust equivalent would be:
let b = Box::new(...);
But Box::new doesn’t return a Result. If allocation fails, the program is terminated.
And so far we have only really talked about the heap. As far as I can tell you never know if stack allocation succeeded until you get a crash! Even in C. But I suppose once the stack is hosed, so is your program, which may not be true for the heap.
This blog post: a case study in being a jerk to someone who is being a jerk, only since Linus is a “jerk” you get off scott-free. Unsurprisingly, this is written by someone who has never contributed to the Linux kernel and who was uninvolved in the discussion he’s picking apart.
The revised email at the end does lose information. Contrary to what hipsters write blog posts complaining about, 99% of Linus’s emails are cordial. The information that’s lost is the conveyance that this is more important to Linus than most subjects.
This comment: a case study in being a jerk to someone who is being a jerk to a jerk.
In all seriousness, I don’t believe that Gary Bernhardt is being a jerk at all. There’s a line between being critical of a piece of work and calling someone brain damaged, and hopefully, we all can see the difference.
Aside: I love when people use the word “hipster” to invalidate other viewpoints. Apparently, there are two modes of being: Being Right and Being A Hipster.
To the unserious comment, I don’t think I was being a jerk. I called him a jerk, which I guess you could argue is a jerk move under any circumstances, but if I’m being a jerk then so is Gary.
To the serious comment, I just want to note that “brain damaged” is a meme among old school hackers which isn’t as strong of a word as you think.
To the aside, I don’t use hipster as an insult or to imply wrongness, but I do use it to invalidate his point. Gary is a Ruby developer. Linus is a kernel developer. The worlds are far removed from each other.
I’ve put tens of thousands of lines of C into production, including multiple Linux kernel drivers. In one case, those kernel drivers were critical-path code on a device used in strain testing the wings of an airplane that you might’ve flown in by now.
I’m not a stranger to the kernel; I just left that world. Behavior like Linus’ in that email was part of the reason, though far from the only reason.
With all of that said: having written a bunch of systems software shouldn’t be a prerequisite for suggesting that we avoid attacking people personally when they make programming mistakes, or what we suspect are programming mistakes.
Exactly. I’ve also met many people that do high-performance, embedded, and/or safety-critical code in C that are more polite in these situations. Linus’ attitude is a separate issue from what’s necessary to evaluate and constructively criticize code.
“brain damaged” is a meme among old school hackers which isn’t as strong of a word as you think.
Yikes. That “meme” is a whole other thing I don’t even care to unpack right now.
I don’t use hipster as an insult or to imply wrongness, but I do use it to invalidate his point. Gary is a Ruby developer. Linus is a kernel developer. The worlds are far removed from each other.
Gotcha. Kernel developer == real old-school hacker. Ruby developer == script kiddie hipster. Are we really still having this argument in 2018?
Yikes. That “meme” is a whole other thing I don’t even care to unpack right now.
“Brain damaged” is a term from back in the Multics days, Linus didn’t make that one up for the occasion. If you’re unfamiliar with the “jargon file” aka hacker dictionary, you can see the history of this particular term here: http://www.catb.org/jargon/html/B/brain-damaged.html
Yikes. That “meme” is a whole other thing I don’t even care to unpack right now.
Listen, cultures are different and culture shock is a thing. I’m in a thread full of foreigners shocked that customs are different elsewhere. You better just take my word for it on “brain damaged” because you clearly aren’t a member of this culture and don’t know what you’re talking about.
Gotcha. Kernel developer == real old-school hacker. Ruby developer == script kiddie hipster. Are we really still having this argument in 2018?
How about you quit putting words in my mouth? Do you really need me to explain the world of difference between Ruby development and kernel hacking? In 2018? It’s not a matter of skill. Gary is great at what he does, but it has almost nothing to do with what Linus does. The people who surround Gary and the people who surround Linus are mutually exclusive groups with different cultural norms.
You can’t use “it’s our culture” as a panacea; calling someone an idiot, moron etc. is a deliberate attempt to hurt them. I guess if what you’re saying is, “it’s our culture to intentionally hurt the feelings of people who have bad ideas,” well, then we might be at an impasse.
The kind of toxic exclusivity and “old school hacker culture” elitism that you’re spouting in this thread is not what I expect to see on Lobsters. It makes me genuinely sad to see somebody saying these things and it also makes me apprehensive of ever being involved in the same project or community as you. Software development today is not what it was 20 –or even 5– years ago. Today it is far more about people than it is about software or technology. You may not like this, but it is the reality.
Lobste.rs always had a few vocal people like this in threads. But note that they’re in the minority and generally are not upvoted as much as the people who aren’t elitist, racist, or just generally being a jerk.
“old school hacker culture” elitism
Near 40, I can agree to be called old. But not elitist.
And I cannot accept to be associated with racist.
Not all software developers are hackers. Not all hackers are software developers.
Is stating this “elitism”? Is it “racism”? Is it being a “jerk”?
Or is just using terms properly?
The information that’s lost is the conveyance that this is more important to Linus than most subjects.
So add “I want to stress that this issue is really important to me” at the end of the revised email.
I think that making an issue out of this particular information being lost is missing the point - that it would be possible to say the same thing as Linus did without being abusive.
Contrary to what hipsters write blog posts complaining about
You’re falling into the same trap that the post discusses. This derision isn’t necessary to make your point, and doesn’t make it any stronger - it just adds an unnecessary insult.
Contrary to what hipsters write blog posts complaining about, 99% of Linus’s emails are cordial.
That may well be true, but do we need that last 1% in a professional setting?
(I am not defending Linus’ behaviour here, please don’t put those words in my mouth.)
I strongly take issue with American ideas of “professionalism”, and even more so with the idea that we get to decide whether this project is “a professional setting” or not. What exactly makes this a “professional setting”? What is a “professional setting”? Why do we hold some interactions to higher standards than others?
I suspect “money changing hands” is the thing that makes this “a professional setting”, and that grinds my gears even further. Why are we supposed to hold ourselves to different standards just because some people are getting paid for doing it?
Right, “professionalism” implies that you only need to be nice to somebody when you want them to do something for you or want their money. This should actually be about “respect”, whether or not you want a Linux contributor to do something for you or want their money.
The Linux kernel is not a professional setting. Besides, I argue that the 1% is useful, even in a professional setting - sometimes strong words are called for. I’ll be That Guy and say that people should grow a thicker skin, especially people who weren’t even the subject of the email and have never been involved in kernel development.
If I look at who the contributors to the Linux kernel are, it would certainly appear to be a professional endeavor.
A large chunk of contributions to the kernel are made by people who are getting paid by the companies they work for to contribute. Sounds like a professional setting to me.
Linux development is only “a professional endeavour” (which is a phrase I have strong issues with, see above) because some people decided to build their businesses in Linus’ craft room. We can like or dislike Linus’ behaviour, but we don’t get to ascribe “professionalism” or lack thereof (if there even is such a thing) to Linus’ work or behaviour, or that of any of the contributors.
Even if “professionalism” is an actual thing (it’s not; it’s just a tool used by people in power to keep others down) it’s between the people doing the paying, and the people getting the pay, and has nothing to do with any of us.
This idea that people should behave differently when there’s money involved is completely offensive to me.
But it’s not. It’s a collaboration between everyone, including professionals and hobbyists. The largest group of kernel contributors are volunteers. On top of that, Linus doesn’t have to answer to anyone.
So, having a hobbyist involved means that you can be dickhead? Is that the conclusion that should be drawn from your statements?
No. I’m saying that Linus is not a dickhead, Linux is not a professional endeavour, and neither should be held to contrived professional standards.
“I’m saying that Linus is not a dickhead”
His comments are proving otherwise, given the main article shows the same information could’ve been conveyed without all the profanity, personal insults, and so on. He must be adding that fluff because he enjoys it or has self-control issues. He’s intentionally or accidentally a dick. I say that as a satirist who’s a dick to people that give me headaches in real life. Although it doesn’t take one to know one, being someone who’s always countering dicks and assholes with some dickish habits of his own makes what Linus is doing more evident. If no mental illness, there’s little excuse past him not giving a shit.
“doesn’t behave according to my cultural norms” == “mental illness”
Seriously?
I would really appreciate it if you could stop expecting that your cultural norms have to apply to everyone on the planet.
I’m identifying the cultural norm of being an asshole, saying it applies to him at times, and saying the project would benefit if he knocked it off. I’m not forcing my norms on anyone.
Your comment is more amusing given someone with Linus’s norms might just reply with profanity and personal insults. Then, you might be complaining about that. ;)
Then, you might be complaining about that. ;)
No, I’d just accept that people from different cultures behave differently.
Let’s face it, most people hate getting told they are wrong, regardless of the tone. That’s just how we are as humans.
Taking offense about the tone just seems very US-specific, as they are accustomed to receiving some special superpowers in a discussion by uttering “I’m offended”.
Some of the best feedback I received in my life wouldn’t be considered acceptable by US standards and I simply don’t care – I just appreciate the fact that someone took his time to spell out the technical problems.
Here is a recent example: https://github.com/rust-lang/cargo/pull/5183#issuecomment-381449546
Here is a recent example: https://github.com/rust-lang/cargo/pull/5183#issuecomment-381449546
I’m not familiar with Rust, so maybe I’m missing crucial context, but I read this feedback as firm but unproblematic overall. Compared to Linus’ email:
It could be nicer, sure. But it seemed respectful, in the “you can do what you’re doing but consider these things:” kind of way…? The author even went out of their way to acknowledge being unconstructive.
To my reading it seemed closer to Gary’s email than Linus’.
To put it another way: if Linus wrote emails like this (only shorter, probably) then I don’t think Gary would have written a blog post about it.
(For the record: I’m not American, but I do fall on the gee-it’d-be-great-if-Linus-stopped-abusing-his-colleagues side of this debate.)
I didn’t intend to imply that this was comparable to Linus’ mail, but that people who would be offended by Linus’ writing would also be offended by that comment.
It’s a slippery slope where every honest-to-god comment that expresses real feelings starts getting replaced by “this is an interesting idea, but did you consider …” corporate lingo, even if the code is horribly wrong.
I didn’t intend to imply that this was comparable to Linus’ mail, but that people who would be offended by Linus’ writing would also be offended by that comment.
I understand this is your point, but I think there is no evidence for this. The people complaining about Linus’ conduct are complaining about specific things, and these things are not present in the comment you linked.
Did anyone in the Rust community (generally considered a “nicer” community than kernel development) raise concerns about this comment?
There is a difference between “not overtly nice” and “openly abusive”, even accounting for cultural context.
Then you and I aren’t that different in how we look at stuff. I’ve just layered on top of it a push for project owners to do what’s most effective on the social side.
I believe it’s intentional. He does not want to be bothered by nurturing the newbs, so he deters them from going to him directly and forces them to do their learning elsewhere.
These numbers suggest it is a professional endeavor:
Those numbers just break down the professionals involved, and don’t consider the volunteers. If you sum the percentages in that article you get around 40%. Even accommodating for smaller companies that didn’t make the top N companies, that’s a pretty big discrepancy.
Linus himself is working in a professional capacity. He’s employed by the Linux Foundation to work on Linux. The fact he is employed to work on an open source project that he founded doesn’t make that situation non-professional.
Unfortunately, some implementations are interpreting undefined behavior to license arbitrary transformations
I don’t think this is true. The transformations which make use of undefined behaviour for optimisation transformations are not arbitrary, but limited; they cannot change the semantics of a conforming program.
As an example, some implementations will roll-over on arithmetic overflow, but may delete programmer checks for roll-over as an “optimization”. The wording of the Standard does not support this interpretation: “possible undefined behavior ranges from”
- ignoring the situation completely with unpredictable results
The post argues that wrapping integer arithmetic (a natural consequence of 2’s complement representation) while optimising out checks for that roll-over (since if it occurred, it had undefined behaviour) is not the same as “ignoring the situation completely with unpredictable results”. It seems to me that “ignoring the situation completely with unpredictable results” exactly describes this common compiler behaviour, though. That is, it removes the wrapping check as an optimisation based on the assumption that the condition it checks can’t legally be true, and then ignores the case when the wrapping does occur, with the unpredictable result that the subsequent check apparently ceases to evaluate correctly.
I recently discovered how horribly complicated traditional init scripts are whilst using Alpine Linux. OpenRC might be modern, but it’s still complicated.
Runit seems to be the nicest I’ve come across. It asks the question “why do we need to do all of this anyway? What’s the point?”
It rejects the idea of forking and instead requires everything to run in the foreground:
/etc/sv/nginx/run:
#!/bin/sh
exec nginx -g 'daemon off;'
/etc/sv/smbd/run
#!/bin/sh
mkdir -p /run/samba
exec smbd -F -S
/etc/sv/murmur/run
#!/bin/sh
exec murmurd -ini /etc/murmur.ini -fg 2>&1
Waiting for other services to load first does not require special features in the init system itself. Instead you can write the dependency directly into the service file in the form of a “start this service” request:
/etc/sv/cron/run
#!/bin/sh
sv start socklog-unix || exit 1
exec cron -f
Where my implementation of runit (Void Linux) seems to fall flat on its face is logging. I hoped it would do something nice like redirect stdout and stderr of these supervised processes by default. Instead you manually have to create a new file and folder for each service that explicitly runs its own copy of the logger. Annoying. I hope I’ve been missing something.
The only other feature I can think of is “reloading” a service, which Aker does in the article via this line:
ExecReload=kill -HUP $MAINPID
I’d make the argument that in all circumstances where you need this you could probably run the command yourself. Thoughts?
Where my implementation of runit (Void Linux) seems to fall flat on its face is logging. I hoped it would do something nice like redirect stdout and stderr of these supervised processes by default. Instead you manually have to create a new file and folder for each service that explicitly runs its own copy of the logger. Annoying. I hope I’ve been missing something.
The logging mechanism works like this to be stable and to only lose logs if both runsv and the log service die.
Another thing about separate logging services is that stdout/stderr are not necessarily tagged; adding all this stuff to runsv would just bloat it.
There is definitely room for improvement, as logger(1) has been broken for some time in the way Void uses it at the moment (you can blame systemd for that).
My idea to simplify logging services to centralize the way how logging is done can be found here https://github.com/voidlinux/void-runit/pull/65.
For me the ability to exec svlogd(8) from vlogger(8) to have a more lossless logging mechanism is more important than the main functionality of replacing logger(1).
Instead you can write the dependency directly into the service file in the form of a “start this service” request
But that neither solves starting daemons in parallel, nor starting them at all if they are run in the ‘wrong’ order. Depending on the network being set up, for example, brings complexity to each of those shell scripts.
I’m of the opinion that a DSL of whitelisted items (systemd) is much nicer to handle than writing shell scripts, along with standardized commands instead of having to know which services accept ‘reload’ vs ‘restart’ or some other variation in commands - those kinds of niceties are gone when the shell scripts are each individually an interface.
The runit/daemontools philosophy is to just keep trying until something finally runs. So if the order is wrong, presumably the service dies if a dependency is not running, in which case it’ll just get restarted. Eventually things progress towards a functioning state. IMO, given that a service needs to handle the services it depends on crashing at any time anyway to ensure correct behaviour, I don’t feel there is significant value in encoding this in an init system. A dependent service could also be moved to run on another machine, where init-level dependency tracking would not work anyway.
It’s the same philosophy as network-level dependencies. A web app that depends on a mail service for some operations is not going to shutdown or wait to boot if the mail service is down. Each dependency should have a tunable retry logic, usually with an exponential backoff.
But that neither solves starting daemons in parallel, or even at all, if they are run in the ‘wrong’ order.
That was my initial thought, but it turns out the opposite is true. The services are retried until they work. Things are definitely parallelised - there is no “exit” in these scripts, so there is no physical way of running them in a linear (non-parallel) fashion.
Ignoring the theory: void’s runit provides the second fastest init boot I’ve ever had. The only thing that beats it is a custom init I wrote, but that was very hardware (ARM Chromebook) and user specific.
Dependency resolving on daemon manager level is very important so that it will kill/restart dependent services.
runit and s6 also don’t support cgroups, which can be very useful.
Dependency resolving on daemon manager level is very important so that it will kill/restart dependent services
Why? The runit/daemontools philosophy is just to try to keep something running forever, so if something dies, just restart it. If one restarts a service, then either those that depend on it will die or they will handle it fine and continue with their life.
either those that depend on it will die or they will handle it fine
If they die, and are configured to restart, they will keep bouncing up and down while the dependency is down? I think having dependency resolution is definitely better than that. Restart the dependency, then the dependent.
It’s a computer, it’s meant to do dumb things over and over again. And presumably that faulty component will be fixed pretty quickly anyways, right?
It’s a computer, it’s meant to do dumb things over and over again
I would rather have my computer do less dumb things over and over personally.
And presumably that faulty component will be fixed pretty quickly anyways, right?
Maybe; it depends on what went wrong precisely, how easy it is to fix, etc. We’re not necessarily just talking about standard daemons - plenty of places run their own custom services (web apps, microservices, whatever). The dependency tree can be complicated. Ideally once something is fixed everything that depends on it can restart immediately, rather than waiting for the next automatic attempt which could (with the exponential backoff that proponents typically propose) take quite a while. And personally I’d rather have my logs show only a single failure rather than several for one incident.
But, there are merits to having a super-simple system too, I can see that. It depends on your needs and preferences. I think both ways of handling things are valid; I prefer dependency management, but I’m not a fan of Systemd.
I would rather have my computer do less dumb things over and over personally.
Why, though? What’s the technical argument? daemontools (and I assume runit) sleeps for 1 second between retries, which for a computer is basically equivalent to it being entirely idle. It seems to me that a lot of people just get a bad feeling about running something that will immediately crash.
Maybe; it depends on what went wrong precisely, how easy it is to fix, etc. We’re not necessarily just talking about standard daemons - plenty of places run their own custom services (web apps, microservices, whatever).
What’s the distinction here? Also, with microservices the dependency graph in the init system almost certainly doesn’t represent the dependency graph of the microservice as it’s likely talking to services on other machines.
I think both ways of handling things are valid
Yeah, I cannot provide an objective argument as to why one should prefer one to the other. I do think this is a nice little example of the slow creep of complexity in systems. Adding a pinch of dependency management here because it feels right, and a teaspoon of plugin system there because we want things to be extensible, and a deciliter of proxies everywhere because of microservices. I think it’s worth taking a moment every now and again and stepping back and considering where we want to spend our complexity budget. I, personally, don’t want to spend it on the init system so I like the simple approach here (especially since with microservices the init dependency graph doesn’t reflect the reality of the service anymore). But as you point out, positions may vary.
Why, though? What’s the technical argument
Unnecessary wakeup, power use (especially for a laptop), noise in the logs from restarts that were always bound to fail, unnecessary delay before restart when restart actually does become possible. None of these arguments are particularly strong, but they’re not completely invalid either.
We’re not necessarily just talking about standard daemons …
What’s the distinction here?
I was trying to point out that we shouldn’t make too many generalisations about how services might behave when they have a dependency missing, nor assume that it is always ok just to let them fail (edit:) or that they will be easy to fix. There could be exceptions.
Perhaps wandering off topic, but this is a good way to trigger even worse cascade failures.
eg, an RSS reader that falls back to polling every second if it gets something other than 200. I retire a URL, and now a million clients start pounding my server with a flood of traffic.
There are a number of local services (time, dns) which probably make some noise upon startup. It may not annoy you to have one computer misbehave, but the recipient of that noise may disagree.
In short, dumb systems are irresponsible.
But what is someone supposed to do? I cannot force a million people using my RSS tool not to retry every second on failure. This is just the reality of running services. Not to mention all the other issues that come up with not being in a controlled environment and running something loose on the internet such as being DDoS’d.
I think you are responsible if you are the one who puts the dumb loop in your code. If end users do something dumb, then that’s on them, but especially, especially, for failure cases where the user may not know or observe what happens until it’s too late, do not ship dangerous defaults. Most users will not change them.
In this case we’re talking about init systems like daemontools and runit. I’m having trouble connecting what you’re saying to that.
N.B. bouncing up and down ~= polling. Polling always intrinsically seems inferior to event based systems, but in practice much of your computer runs on polling perfectly fine and doesn’t eat your CPU. Example: USB keyboards and mice.
USB keyboard/mouse polling doesn’t eat CPU because it isn’t done by the CPU. IIUC the USB controller generates an interrupt when data is received. I feel like this analogy isn’t a good one (regardless). Checking a USB device for a few bytes of data is nothing like (for example) starting a Java VM to host a web service which takes some time to read its config and load its caches only to then fall over because some dependency isn’t running.
Sleep 1 second and restart is the default. It is possible to have another behavior by adding a ./finish script alongside the ./run script.
I really like runit on void. I do like the simplicity of SystemD target files from a package manager perspective, but I don’t like how systemd tries to do everything (consolekit/logind, mounting, xinet, etc.)
I wish it just did services and dependencies. Then it’d be easier to write other systemd implementations, with better tooling (I’m not a fan of systemctl or journalctl’s interfaces).
You might like my own dinit (https://github.com/davmac314/dinit). It somewhat aims for that - handle services and dependencies, leave everything else to the pre-existing toolchain. It’s not quite finished but it’s becoming quite usable and I’ve been booting my system with it for some time now.
I’d make the argument that in all circumstances where you need this you could probably run the command yourself. Thoughts?
It’s nice to be able to reload a well-written service without having to look up what mechanism it offers, if any.
Runit’s sv(8) has the reload command, which sends SIGHUP by default.
The default behavior (for each control command) can be changed in runit by creating a small script under $service_name/control/$control_code.
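As a sketch of that mechanism: a hypothetical `$service_name/control/h` script lets `sv reload` run a daemon’s own reload command instead of sending the default SIGHUP (nginx here is just an example of a daemon that has one):

```shell
#!/bin/sh
# Hypothetical $service_name/control/h script. When `sv reload` is
# invoked, runsv executes this instead of sending SIGHUP; if it
# exits 0, the default signal is suppressed.
exec nginx -s reload
```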
I was thinking of the difference between ‘restart’ and ‘reload’.
Reload is only useful when:
I have not been in environments where this is necessary, restart has always done me well. I assume that the primary use cases are high-uptime webservers and databases.
My thoughts were along the lines of: if you’re running a high-uptime service, you probably don’t care about the extra effort of writing ‘killall -HUP nginx’ rather than ‘systemctl reload nginx’. In fact I’d prefer to do that than take the risk of the init system re-interpreting a reload to be something else, like reloading other services too, and bringing down my uptime.
I hoped it would do something nice like redirect stdout and stderr of these supervised processes by default. Instead you manually have to create a new file and folder for each service that explicitly runs its own copy of the logger. Annoying. I hope I’ve been missing something.
I used to use something like logexec for that, to “wrap” the program inside the runit script, and send output to syslog. I agree it would be nice if it were builtin.
No, you don’t need C aliasing to obtain vector optimization for this sort of code. You can do it with standards-conforming code via memcpy(): https://godbolt.org/g/55pxUS
Wow, it’s actually completely optimizing out the memcpy()? While awesome, that’s the kind of optimization I hate to depend on. One little seemingly inconsequential nudge and the optimizer might not be able to prove that’s safe, and suddenly there’s an additional O(n) copy silently going on.
memset/memcpy get optimized out a lot, hence libraries making things like this: https://monocypher.org/manual/wipe
Actually it’s not optimizing it out, it’s simply allocating the auto array into SIMD registers. You always must copy data into SIMD registers first before performing SIMD operations. The memcpy() code resembles a SIMD implementation more than the aliasing version.
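For reference, the memcpy() pattern under discussion looks roughly like the sketch below. The struct name and layout are hypothetical; the point is that copying into a local array is well-defined regardless of the source pointer’s declared type, and compilers routinely lower the copy to plain register loads:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte struct; assumes no padding. */
struct pkt { uint32_t a, b, c, d; };

/* Sum the 32-bit words of the struct without violating strict
   aliasing: memcpy() into a local array, then read the words.
   The copy itself is typically optimized away. */
static uint32_t checksum(const struct pkt *p) {
    uint32_t w[4];
    memcpy(w, p, sizeof w);
    return w[0] + w[1] + w[2] + w[3];
}
```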
You can - and thanks for the illustration - but the memcpy is antithetical to the C design paradigm, in my always humble opinion. And my point was not that you needed aliasing to get the vector optimization, but that aliasing does not interfere with the vector optimization.
I’m sorry but the justifications for your opinion no longer hold. memcpy() is the only unambiguous and well-defined way to do this. It also works across all architectures and input pointer values without having to worry about crashes due to misaligned accesses, while your code doesn’t. Both gcc and clang are now able to optimize away memcpy() and auto vars. An opinion here is simply not relevant, invoking undefined behavior when it increases risk for no benefit is irrational.
Au contraire. As I showed, the C standard does not need to graft on a clumsy and painful anti-alias mechanism, and programmers don’t need to go through stupid contortions with allocation of buffers that disappear under optimization, because the compiler does not need it. My code doesn’t have alignment problems. The justification for pointer alias rules is false. The end.
There are plenty of structs that only contain shorts and char, and in those cases employing aliasing as a rule would have alignment problems while the well-defined version wouldn’t. It’s not the end, you’re just in denial.
In those cases, you need to use an alignment modifier or sizeof. No magic needed. There is a reason that both gcc and clang have been forced to support -fno-strict-aliasing, and now both support may_alias. The memcpy trick is a stupid hack that can easily go wrong - e.g. one is not guaranteed that the compiler will optimize away the buffer, and a large buffer could overflow the stack. You’re solving a non-problem by introducing complexity and opacity.
In what world is memcpy() magic and alignment modifiers aren’t? memcpy() is an old standard library function, alignment modifiers are compiler-specific syntax extensions.
memcpy() isn’t a hack, it’s always well-defined while aliasing can never be well-defined in all cases. Promoting aliasing as a rule is like promoting using the equality operator between floats – it can never work in all cases, though it may be possible to define meaningful behavior in specific cases. Promoting aliasing as a rule is promoting the false idea that C is a thin layer above contemporary architectures, it isn’t. Struct memory is not necessarily the same as array memory, not every machine that C supports can dereference an int32 inside of an int64, not every machine can dereference an int32 at any offset. Do you want C to die with x86_64 or do you want C to live?
Optimizations don’t need to be guaranteed when the code isn’t even correct in the first place. First make sure your code is correct, then worry about optimizing. You talk about alignment modifiers but they are rarely used, and usually they are used after a bug has already occurred. Code should be correct first, and memcpy() is the rule we should be promoting since it is always correct. Optimizers can meticulously add aliasing for specific cases once a bottleneck has been demonstrated. You’re solving a non-problem by indulging in premature optimization.
Do you want C to die with x86_64 or do you want C to live?
Heh I bet you’d get quite varied answers to this one here
The memcpy hack is a hack because the programmer is supposed to write a copy of A to B and then back to A and rely on the optimizer to skip the copy and delete the buffer. So unoptimized, the code may fault on stack overflows for data structures that exist only to make the compiler writers happier. And with a novel architecture, if the programmer wants to take advantage of a new capability - say 512-bit SIMD instructions - she can wait until the compiler has added it to its toolset and be happy with how it is used.
As for this not working in all cases: Big deal. C is not supposed to hide those things. In fact, the compiler has no idea if the memory is device memory with restrictions on how it can be addressed or memory with a copy on write semantics or …. You want C to be Pascal or Java and then announce that making C look like Pascal or Java can only be solved at the expense of making C unusable for low level programming. Which programming communities are asking for such insulation? None. C works fine on many architectures. C programmers know the difference between portable and non-portable constructs. C compilers can take advantage of SIMD instructions without requiring C programmers to give up low level memory access - one of the key advantages of programming in C. Basically, people who don’t like C are trying to turn C into something else and are offended that few are grateful.
You aren’t writing a copy of a buffer back and forth. In your example, you are reducing an encoding of a buffer into a checksum. You are only copying one way, and that is for the sake of normalization. All SIMD code works that way, you always must copy into SIMD registers first before doing SIMD operations. In your example, the aliasing code doesn’t resemble SIMD code, both syntactically and semantically, as much as the memcpy() code does, and in fact requires a smarter compiler to transform.
The chance of overflowing the stack is remote, since stacks now automatically grow and structs tend to be < 512 bytes, but if that is a legitimate concern you can do what you already do to avoid that situation, either use a static buffer (jeopardizing reentrancy) or use malloc().
By liberally using aliasing, you are assuming a specific implementation or underlying architecture. My point is that in general you cannot assume arbitrary internal addresses of a struct can always be dereferenced as int32s, so in general that should not be practiced. In specific cases you can alias, but those are the exceptions not the rule.
All copies on some architectures reduce to: load into register, store from register. So what? That is why we have a high level language which can translate *x = *y efficiently. The pointer alias code directly shows programmer intent. The memcpy code does not. The “sake of normalization” is just another way of saying “in order to cooperate with the fiction that the inconsistency in the standard produces”.
In many contexts, stacks do NOT automatically grow. Again, C is not Java. OS code, drivers, embedded code, even many applications for large systems - all need control over stack size. Triggering stack growth may even turn out to be a security failure for encryption, which is almost universally written in C because in C you can assure time invariance (or you could, until the language lawyers decided to improve it). Your proposal that programmers not only use a buffer, but use a malloced buffer, in order to allow the optimizer (they hope) not to use it, is ridiculous and is a direct violation of the C model.
“3. C code can be non-portable. Although it strove to give programmers the opportunity to write truly portable programs, the Committee did not want to force programmers into writing portably, to preclude the use of C as a “high-level assembler;” the ability to write machine-specific code is one of the strengths of C. It is this principle which largely motivates drawing the distinction between strictly conforming program and conforming program.” ( http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2021.htm)
Give me an example of an architecture where a properly aligned structure with sizeof(struct x) % sizeof(int32) == 0 cannot be accessed via int32s? Maybe the Itanium, but I doubt it. Again: every major OS turns off strict aliasing in its compilers and they seem to work. Furthermore, the standard itself permits aliasing via char* (as another hack). In practice, more architectures have trouble addressing individual bytes than addressing int32s.
I’d really like to see more alias analysis optimization in C code (and more optimization from static analysis) but this poorly designed, badly thought through approach we have currently is not going to get us there. To solve any software engineering problem, you have to first understand the use cases instead of imposing some synthetic design.
Anyways off the airport. Later. vy
I’m willing to agree with you that the aliasing version more clearly shows intent in this specific case but then I ask, what do you do when the code aliases a struct that isn’t properly aligned? There are a lot of solutions but in the spirit of C, I think the right answer is that it is undefined.
So I think what you want is the standard to define one specific instance of previously undefined behavior. I think in this specific case, it’s fair to ask for locally aliasing an int32-aligned struct pointer to an int32 pointer to be explicitly defined by the standards committee. What I think you’re ignoring, however, is all the work the standards committee has already done to weigh the implications of defining behavior like that. At the very least, it’s not unlikely that there will be machines in the future where implementing the behavior you want will be non-trivial. Couple that with the burden of a more complex standard. So maybe the right answer to maximize global utility is to leave it undefined and to let optimization-focused coders use implementation-defined behavior when it matters but, as I’m arguing, use memcpy() by default. I tend to defer to the standards committees because I have read many of their feature proposals and accompanying rationales and they are usually pretty thorough and rarely miss things that I don’t miss.
Everybody arguing here loves C. You shouldn’t assume the standards committee is dumb or that anyone here wants C to be something it’s not. As much as you may think otherwise, I think C is good as it is and I don’t want it to be like other languages. I want C to be a maximally portable implementation language. We are all arguing in good faith and want the best for C, we just have different ideas about how that should happen.
what do you do when the code aliases a struct that isn’t properly aligned? There are a lot of solutions but in the spirit of C, I think the right answer is that it is undefined.
Implementation dependent.
Couple that with the burden of a more complex standard.
The current standard on when an lvalue works is complex and murky. WG14 discussion on how it applies shows that it’s not even clear to them. The exception for char pointers was hurriedly added when they realized they had made memcpy impossible to implement. It seems as if malloc can’t be implemented in conforming C (there is no method of changing storage type to reallocate it).
C would benefit from more clarity on many issues. I am very sympathetic to making pointer validity more transparent and well defined. I just think the current approach has failed and the C89 error has not been fixed but made worse. Also, restrict has been fumbled away.
The chance of overflowing the stack is remote, since stacks now automatically grow and structs tend to be < 512 bytes, but if that is a legitimate concern you can
… just copy the ints out one at a time :) https://godbolt.org/g/g8s1vQ
The compiler largely sees this as a (legal) version of the OP’s code, so there’s basically zero chance it won’t be optimised in exactly the same way.
You don’t need a large buffer. You can memcpy the integers used for the calculation out one at a time, rather than memcpy’ing the entire struct at once.
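A rough sketch of that word-at-a-time variant (function name and the byte-buffer framing are my own): each memcpy() copies a single uint32_t, so there is no struct-sized buffer to worry about, and each copy compiles to an ordinary load on targets that allow it:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sum 32-bit words from a byte buffer, one word per iteration.
   Each 4-byte memcpy() is well-defined for any alignment. */
static uint32_t sum_words(const unsigned char *buf, size_t n) {
    uint32_t sum = 0, w;
    for (size_t i = 0; i + sizeof w <= n; i += sizeof w) {
        memcpy(&w, buf + i, sizeof w);
        sum += w;
    }
    return sum;
}
```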
Your designation of using memcpy as a “stupid hack” is pretty biased. The code you posted can go wrong, legitimately, because of course it invokes undefined behaviour, and is more of a hack than using memcpy is. You’ve made it clear that you think the aliasing rules should be changed (or shouldn’t exist) but this “evidence” you’ve given has clearly been debunked.
Funny use of “debunked”. You are using circular logic. My point was that this aliasing method is clearly amenable to optimization and vectorization - as seen. Therefore the argument for strict alias in the standard seems even weaker than it might. Your point seems to be that the standard makes aliasing undefined so aliasing is bad. Ok. I like your hack around the hack. The question is: why should C programmers have to jump through hoops to avoid triggering dangerous “optimizations”? The answer: because it’s in the standard, is not an answer.
Funny use of “debunked”. You are using circular logic. My point was that this aliasing method is clearly amenable to optimization and vectorization - as seen
You have shown a case where, if the strict aliasing rule did not exist, some code could [edit] still [/edit] be optimised and vectorised. That I agree with, though nobody claimed that the existence of the strict aliasing rule was necessary for all optimisation and vectorisation, so it’s not clear what you do think this proves. Your title says that the optimisation is BECAUSE of aliasing, which is demonstrably false. Hence, debunked. Why is that “funny”? And how is your logic any less circular than mine?
The question is: why should C programmers have to jump through hoops to avoid triggering dangerous “optimizations”?
Characterising optimisations as “dangerous” already implies that the code was correct before the optimisation was applied and that the optimisation can somehow make it incorrect. The logic you are using relies on the code (such as what you’ve posted) being correct - which it isn’t, according to the rules of the language (which, yes, are written in a standard). But why is using memcpy “jumping through hoops” whereas casting a pointer to a different type of pointer and then de-referencing it not? The answer is, as far as I can see, because you like doing the latter but you don’t like doing the former.
Note this post was from 2015. It’s somewhat wrong: signalfd isn’t useless at all, and the criticisms boil down to:
So you have to be very careful to reset any masked signals before starting a child process, and unfortunately you need to do this yourself
… which, sure, is potentially annoying, but does not render the call useless. Also:
There’s another problem with masking signals, which is that standard UNIX signals are permitted to coalesce when they queue
This isn’t a problem with masking signals, it’s a general problem (and “problem” might be too strong a word regardless). Signals can coalesce even if not masked. There are cases where signals aren’t supposed to coalesce (according to POSIX, by my interpretation) and which Linux doesn’t handle correctly, but I’m not sure if other kernels do any better. But in general, you aren’t supposed to care if you receive some signal once or twice or more.
But in general, you aren’t supposed to care if you receive some signal once or twice or more.
The OP talks about why they believe this is important, because some signals carry additional information (in the case of SIGCHLD). Can you say how one should not care about that?
(I only have a surface level understanding of signals and I’m not knowledgeable about all of the dark corners. I’m mostly just trying to understand your critique here.)
Well, I’m not sure, but the case of SIGCHLD may be one that POSIX technically implies cannot be coalesced. However, even if it can be coalesced, you can generally retrieve the same information as was sent with the signal via other means - in this case, the wait() family of system calls. You can use the waitpid() function with the WNOHANG option to keep checking for terminated children, and you can (and should) do this each time you get a SIGCHLD, until it fails with ECHILD.
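That reaping loop can be sketched as follows (function name is my own; real code would record each pid/status rather than just counting):

```c
#include <sys/types.h>
#include <sys/wait.h>

/* On receiving any number of (possibly coalesced) SIGCHLDs, call
   waitpid() with WNOHANG until every terminated child has been
   collected. Returns the number of children reaped.
   waitpid() returns 0 if children remain but none have exited,
   and -1 with errno == ECHILD once there are no children left. */
int reap_children(void) {
    int reaped = 0, status;
    while (waitpid(-1, &status, WNOHANG) > 0)
        reaped++; /* handle the reaped child's status here */
    return reaped;
}
```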
Incidentally, what POSIX does say (from 2.4.1 Signal Generation and Delivery):
The determination of which action is taken in response to a signal is made at the time the signal is delivered, allowing for any changes since the time of generation. This determination is independent of the means by which the signal was originally generated. If a subsequent occurrence of a pending signal is generated, it is implementation-defined as to whether the signal is delivered or accepted more than once in circumstances other than those in which queuing is required
Specific cases which require queuing, which I could find, are:
Per-process timers may be created that notify the process of timer expirations by queuing a realtime extended signal.
In 2.4.2:
When a signal is generated by the sigqueue() function or any signal-generating function that supports the specification of an application-defined value, the signal shall be marked pending and, if the SA_SIGINFO flag is set for that signal, the signal shall be queued to the process along with the application-specified signal value. Multiple occurrences of signals so generated are queued in FIFO order. It is unspecified whether signals so generated are queued when the SA_SIGINFO flag is not set for that signal.
The rationale section for the timer_create() states:
The specified timer facilities may deliver realtime signals (that is, queued signals) on implementations that support this option. Since realtime applications cannot afford to lose notifications of asynchronous events, like timer expirations or asynchronous I/O completions, it must be possible to ensure that sufficient resources exist to deliver the signal when the event occurs.
So, this implies that asynchronous I/O completion as well as timer expiration must be able to queue a signal without failing. That’s about the only specific mentions of this I can find. Note that asynchronous I/O uses a “struct sigevent” argument just as timer_create() does. Note also the implication that non-realtime signals (of which SIGCHLD is one) do not need to support queuing.
About SIGCHLD, in 2.4.3 Signal Actions:
When a process stops, a SIGCHLD signal shall be generated for its parent process, unless the parent process has set the SA_NOCLDSTOP flag.
Here it says “generated” rather than “queued”, implying that SIGCHLD can be coalesced.
Interesting. I had a similar thought (with respect to wait) when reading the OP, but wasn’t sure. It would be a nice follow up question for the author. Maybe if you know which child was terminated, you can be more precise with which child you attempt to reap? Otherwise, if you have to try many of them, that might have performance implications?
Maybe if you know which child was terminated, you can be more precise with which child you attempt to reap? Otherwise, if you have to try many of them, that might have performance implications?
You should just set the pid to -1 and let the system tell you which child terminated. I don’t think there are any serious performance implications; the main one is probably due to the extra system call (because you do one waitpid call for each terminated child, and another one to find out that there are no more terminated children). In any case, this wouldn’t be a problem that’s unique to using signalfd - as I mentioned (and as OP article admits in an edit at the end) signals can be coalesced even if they’re not masked.
What I don’t get about UB in these discussions: isn’t it possible to get the tooling to bail on optimizations that cross a UB barrier?
Like OK you might dereference a null pointer here, this is UB, but don’t take that as a “well let’s just infer that anything’s doable here”. Instead take a “the result is unknowable, so don’t bother optimizing”.
Is there some sort of reason UB-ness can’t be taken into account in these optimization paths? Is there some… fundamental thing about C optimization that requires this tradeoff?
I think the problem with this idea is that many optimisations rely on something being undefined behaviour.
In any case, the result is not just “unknowable” - there is no result. If I have two signed integer values for which the sum exceeds the largest possible signed integer value, and I add them together, there is not some unknown value that is the result. There is in fact a known value which is the result but which cannot be represented by the type which the result of the operation needs to be expressed in. What does it mean to “not bother optimising” here? Yes, the compiler could decide that the result of such an operation should be an indeterminate value, rather than undefined behaviour, but that will inhibit certain optimisations that could be possible even in cases where the compiler cannot be sure that the indeterminate value will actually ever result.
People keep saying this, but I still have not seen a well worked out example of how UB produces real optimizations let alone a respectable study.
But do “optimizations” actually produce speedups for benchmarks? Despite frequent claims by compiler maintainers that they do, they rarely present numbers to support these claims. E.g., Chris Lattner (from Clang) wrote a three-part blog posting about undefined behaviour, with most of the first part devoted to “optimizations”, yet does not provide any speedup numbers. On the GCC side, when asked for numbers, one developer presented numbers he had from some unnamed source from IBM’s XLC compiler, not GCC; these numbers show a speedup factor 1.24 for SPECint from assuming that signed overflow does not happen (i.e., corresponding to the difference between -fwrapv and the default on GCC and Clang). Fortunately, Wang et al. [WCC+12] performed their own experiments compiling SPECint 2006 for AMD64 with both gcc-4.7 and clang-3.1 with default “optimizations” and with those “optimizations” disabled that they could identify, and running the results on a Core i7-980. They found speed differences on two out of the twelve SPECint benchmarks: 456.hmmer exhibits a speedup by 1.072 (GCC) or 1.09 (Clang) from assuming that signed overflow does not happen. For 462.libquantum there is a speedup by 1.063 (GCC) or 1.118 (Clang) from assuming that pointers to different types don’t alias. If the other benchmarks don’t show a speed difference, this is an overall SPECint improvement by a factor 1.011 (GCC) or 1.017 (Clang) from “optimizations”. http://www.complang.tuwien.ac.at/kps2015/proceedings/KPS_2015_submission_29.pdf
I still have not seen a well worked out example of how UB produces real optimizations
Some examples here (found by a search just now): https://kristerw.blogspot.com/2016/02/how-undefined-signed-overflow-enables.html
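One of the classic cases such writeups cover is loop-counter widening (this is my own minimal sketch, not taken from the linked post): because signed overflow is undefined, the compiler may assume `i` never wraps, so on a 64-bit target it can keep `i` in a 64-bit register and avoid redoing sign extension for the `a[i]` address on every iteration. With a wrapping index type that assumption is unavailable.

```c
/* The compiler may treat `i` as a 64-bit induction variable,
   since a wrap of `int i` would be undefined behaviour anyway. */
void scale(float *a, int n, float k) {
    for (int i = 0; i < n; i++)
        a[i] *= k;
}
```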
SPECint isn’t necessarily a good way to assess the effect of these particular optimisations.
OMG! You find those compelling?
Compelling for what? They are examples of how UB produces real optimisations, something that you said you hadn’t seen.
They don’t show any real optimization at all. They are micro-optimizations that produce incorrectly operating code.
When the compiler assumes multiplication is commutative and at the same time produces code that has non-commutative multiplication, it’s just terrible engineering. There’s no excuse for that.
As steveklabnik says above “UB has a cost, in that it’s a footgun. If you don’t get much optimization benefit, then you’re introducing a footgun for no benefit.”
They don’t show any real optimization at all. They are micro-optimizations
This is self-contradictory, unless by “real optimization” you mean “not-micro-optimization”, in which case you are just moving the goal posts.
that produce incorrectly operating code.
This is plainly false, except if you mean that the code doesn’t behave as you personally think it should behave in certain cases, even though the language standard clearly says that the behaviour is undefined in those cases. In which case, sure, though I’m not sure why you think your own opinion of what the language semantics should be somehow trumps that of the committee responsible for actually deciding them.
Real optimization means substantive. Micro-optimizations like “we take this arithmetic calculation and replace it with a faster one that produces the wrong answer” are ridiculous.
I’m totally uninterested in this legalist chopping or this absurd deference to a committee which is constantly debating and revisiting its own decisions. It is just absurd for people to argue that the C Standard wording can’t be criticized.
“we take this arithmetic calculation and replace it with a faster one that produces the wrong answer”
And there you have the core of the problem.
You say “wrong answer”, implying there is A Right One; as defined by the current C standard, sometimes there are no Right Answers to certain operations, only “undefined”.
So Define your Right One, proclaim your New Standard, implement a Conforming compiler and everybody is happy.
I’m entirely on board with you saying the C committee lacks guts to abandon ancient marginal CPU’s and just define things…..
So I look forward to programming in VyoDaiken C.
Alas… I warn you though… you will either end up with something that just isn’t C as we know it, or you will still have large areas “undefined”.
Machine integers are not a field. Anything we call “arithmetic” on them is defined arbitrarily (usually to be kind of like arithmetic on the mathematical integers, if you ignore the fact that they’re finite), so in fact there’s not a right answer—rather, there are several reasonable ones.
You could define them to behave however the machine behaves; but this is obviously not consistent from machine to machine. You could define them in a particular way (two’s-complement with signaling overflow, perhaps), but if this definition doesn’t match up to what the target machine actually does, you have potentially expensive checks and special-casing to shore up the mismatch, and you picked it arbitrarily anyway. (Case in point: did you agree with my suggestion? Do you think you could convince all of a room full of C programmers to?)
There’s a reasonable argument to be made that C should have said integer overflow is implementation-defined rather than undefined, but it’s hard to claim there’s a single obviously correct implementation-independent definition it should have adopted.
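For what it’s worth, a program that wants wrapping semantics for a particular operation can already express them in standard C, because unsigned arithmetic is defined to wrap; a sketch (the function name is my own):

```c
#include <assert.h>
#include <stdint.h>

/* Two's-complement wrapping addition without signed overflow:
 * unsigned arithmetic wraps by definition, and converting the
 * out-of-range result back to int32_t is implementation-defined
 * (not undefined) - on mainstream compilers it wraps as expected. */
int32_t wrapping_add(int32_t a, int32_t b)
{
    return (int32_t)((uint32_t)a + (uint32_t)b);
}
```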
My suggestion is that when you feel like you should tell me something I obviously already know, you should think about what point you are trying to make.
C has a great deal of room for machine and implementation dependent behavior - necessarily. Implementation defined would have prevented surprises.
arithmetic.
Sounds like you’ll love Scheme and Ruby then ;-)
They have this Numeric Tower concept that does The Right Thing by arithmetic.
ps: Have a look at this, the ultimate Good News, Bad News story and cry…
https://lobste.rs/s/azj8ka/proposal_signed_integers_are_two_s#c_xo9ebu
…if we ever meet in person, we can drown our sorrows together and weep.
It is just absurd for people to argue that the C Standard wording can’t be criticized.
Just to be clear, I’m not arguing that (and I don’t think anybody here is doing so). However if you continue to dismiss logical arguments as “legalist chopping”, and suggest that we all defer to you instead of the language committee, I think the discussion will have to end here.
This is plainly false, except if you you mean that the code doesn’t behave as you personally think it should behave in certain cases, even though the language standard clearly says that the behaviour is undefined in those cases
I have no idea how to interpret that except as an argument that one is not permitted to question the wisdom of either the standard or the interpretation chosen by compiler writers. As several people on the standards bodies have pointed out, there is certainly no requirement in the standard that compiler developers pick some atrocious meaning. “Optimizing” code that produces a correct result to produce code that does not, produces incorrectly operating code. You can claim that the unoptimized code was in violation of the standard, but it worked.
Specifically, the example you pointed to starts off by “optimizing” C code that calculates according to the mathematical rules and replaces it with code that computes the wrong answer. 2s complement fixed length multiplication is not commutative. Pretending that it is commutative is wrong.
I have no idea how to interpret that except as an argument that one is not permitted to question the wisdom of either the standard or the interpretation chosen by compiler writers.
I do not see how pointing out what the language standard does say is the same as saying that you are not permitted to question the wording of the standard. Good day.
No. There is nothing about C that requires compilers behave irrationally. The problem is that (1) the standard as written provides a loophole that can be interpreted as permitting irrational compilation and (2) the dominant free software compilers are badly managed ( as someone pointed out, it’s not as if they have paying customers who will select another product). There’s a great example: a GCC UB “optimization” was introduced that broke a lot of code by assuming an out of bound access to an array could not happen. It also broke a benchmark - creating an infinite loop. The GCC developers specifically disabled the optimization for the benchmark but not for other programs. The standard does not require this kind of horrible engineering, but it doesn’t forbid it.
There’s a great example: a GCC UB “optimization” was introduced that broke a lot of code by assuming an out of bound access to an array could not happen
You know, I’m quite happy with that.
Every new gcc release comes out with a lot of new optimizations and a lot of new warnings.
Every time I go through our code base cleaning up the warnings, often fixing bugs as I go, I’m ok with that.
Sometimes we only find the broken code in the unit tests or test racks. I’m OK with that. That code was fragile and was going to bite us in the bum sooner or later.
Old working code maybe working, but as soon as you’re into undefined behaviour it’s fragile, and changes in optimization are only one of many ways which can make it break.
In my view, the sooner I find it and fix it the better.
I deliberately wiggle the optimization settings and run tests. If working code breaks… I fix.
I run stuff on CPU’s with different number of cores… that’s a really good one for knocking loose undefined behaviour bugs.
Sure my code “Works for Me” on my desk.
But I don’t control the systems where other people run my code. Thus I want the behaviour of my code to always be defined.
I run stuff on CPU’s with different number of cores… that’s a really good one for knocking loose undefined behaviour bugs.
What do you mean by that? Just something like 2 vs 4 cores, or something as different as a 3- or 6-core part that’s not a power of 2? Now that you mention it, I wonder if other interaction bugs could show up in SoC’s with a mix of high-power and low-energy cores running things interacting. Maybe running on them could expose more bugs, too.
The C standard has a bunch of fine-grained wording around volatile and atomic and sequence points, with undefined behaviour if you understand them wrongly.
Threading is dead easy…. any fool can (and many fools do) knock up a multithreaded program in a few minutes.
Getting threading perfectly right seems to be extraordinarily hard; I haven’t met anybody who can, nor seen anybody’s code that is perfect.
So any fool’s code will “Work For Them” on their desk… now deploy it on a different CPU with a different number of cores, with a different load ….
….and the bugs come crawling out in all directions.
Actually gcc has been remarkably good about this… The last few releases I have dealt with, there has been a pairing of new warnings with new optimization passes.
Which makes sense, because a new optimization pass tends to mean the compiler has learnt more about the structure of your code.
Where they have been gravely deficient is with function attributes…. there be dragons, and you’re unlikely to get a warning if you screw up.
Curiously enough they will suggest attributes if you haven’t got one… but won’t warn you if you have the wrong one. :-(
Bottom line. Beware of gcc function attributes. They are tricksy and easy to screw up.
Are you reading any of the critiques of UB or even the defenses? The core issue is silent transformations of code - e.g. a silent deletion of a null pointer check because UB “can’t happen”.
There is no fundamental reason. It’s just a lot of work and no optimizing compiler is implemented that way right now.
It looks like part of the problem is an engineering flaw in the compilers where there are multiple optimization passes that can produce errors in combination.
This whole area has been exercising my brain recently.
As much as I hate the C standard committee’s lack of courage in defining behaviour: often a simple decision, even a controversial one, would resolve it.
However, here is one that is sort of unresolvable.
What behaviour should a program have that indexes beyond the bounds of an array?
There is no way a standard can prescribe what the result will be.
It must be undefined.
So the obvious thing to do, as Pascal did, is do bounds checking and have a defined behaviour.
That imposes substantial runtime costs in CPU and memory, so users do switch it off…..
So what should the behaviour be?
One reasonable assumption a compiler writer can make is that there is no way the programmer can intend to index out of bounds, so I can assume that the index is less than the bound and generate machine code accordingly.
You might say, these newfangled optimizations are the problem… no they aren’t.
Compilers have been relaying out data in memory according to what they think best for decades.
Where this whole thing is driving me nuts is around asserts. See this thread I started here… https://readlist.com/lists/gcc.gnu.org/gcc-help/7/39051.html
Asserts, if they are compiled in, tell the compiler (without getting involved in UB optimizations) that if the expression is false, then everything downstream is not reachable…. so it analyzes under the assumption the expression is true.
However it doesn’t attempt to warn you if it finds a code path where the expression is false, and completely redoes its optimization without that assumption if you compile the assert out.
What behaviour should a program have that indexes beyond the bounds of an array?
There is no way a standard can prescribe what the result will be.
It must be undefined.
This gets to the heart of the matter, I think. Part of the issue is people confuse “the language standard” with “what compilers do”. The language says it is undefined behaviour for an out-of-bounds array access to occur, or for signed integers to have their value range exceeded, but there’s no reason why compilers can’t generate code which will explicitly detect these situations and throw out an error message (and terminate).
So why don’t compilers generally do that by default? Because C is used in performance critical code where these checks have a cost which is considered significant. And, despite the claims in this article, there are cases where trivial optimisations such as assuming that signed integer arithmetic operations won’t overflow can lead to significant speedups (it’s just that these cases are not something as trivial as a single isolated loop).
If you do value deterministic behaviour on program error and are willing to sacrifice some performance to get it, the obvious solution is to use a language which provides that, i.e. don’t use C. But that’s not a justification to criticise the whole concept of undefined behaviour in C.
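To illustrate what “detect these situations and terminate” could look like if written out in source (the helper and its name are invented for this example):

```c
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

/* A hand-rolled bounds check of the kind a compiler could in
 * principle emit for every access: out of range means a diagnostic
 * and deterministic termination, never silent corruption. */
static long checked_get(const long *a, size_t len, size_t i)
{
    if (i >= len) {
        fprintf(stderr, "index %zu out of bounds (len %zu)\n", i, len);
        abort();
    }
    return a[i];
}
```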
There is a false choice between inefficient code with run time bounds checking and compiler “optimizations” that break working code. I love the example in http://www.complang.tuwien.ac.at/kps2015/proceedings/KPS_2015_submission_29.pdf where the GCC developers introduce a really stupid UB based “optimization” that broke working code and then found, to their horror, that it broke a benchmark. So they disabled it for the benchmark.
And, despite the claims in this article, there are cases where trivial optimisations such as assuming that signed integer arithmetic operations won’t overflow can lead to significant speedups (it’s just that these cases are not something as trivial as a single isolated loop).
Great. Let’s see an example.
But that’s not a justification to criticise the whole concept of undefined behaviour in C.
I think this attitude comes from a fundamental antipathy to the design of C or a basic misunderstanding of how it is used. C is not Java or Swift - and not because its designers were stupid or mired in archaic technology.
There is a false choice between inefficient code with run time bounds checking and compiler “optimizations” that break working code
Optimisations don’t break working code. They cause broken code to have different observable behaviour.
And, despite the claims in this article, there are cases where trivial optimisations such as assuming that signed integer arithmetic operations won’t overflow can lead to significant speedups (it’s just that these cases are not something as trivial as a single isolated loop).
Great. Let’s see an example.
I don’t have a code example to hand, and as I said they’re not trivial, but that doesn’t mean it’s not true. Since it can eliminate whole code paths, it can affect the efficacy of for example value range propagation, affect inlining decisions, and have other flow-on effects.
I think this attitude comes from a fundamental antipathy to the design of C or a basic misunderstanding of how it is used
I disagree.
Optimisations don’t break working code. They cause broken code to have different observable behaviour.
That’s a legalistic answer. The code worked as expected and produced the correct result. The “optimization” caused it to misfunction.
I don’t have a code example to hand
Apparently nobody does. So the claimed benefit is just hand waving.
I disagree.
The thinking of Lattner is indicative. He agrees that compiler behavior using the UB loophole makes C a minefield. His solution is to advocate Swift. People who are hostile to the use of C should not be making these decisions.
That’s a legalistic answer.
Alas, in absence of “legalistic answers”, the only definition of C is either…
An implementation of C is a valid implementation iff every program in a Blessed Set of programs compile and runs successfully and outputs exactly the same values.
or
An implementation of C is a valid implementation iff, every program that compiles and runs successfully on The One True Blessed C Compiler, compiles and runs and outputs exactly the same values AND every program that fails to compile on The One True Blessed Compiler, fails to compile on the candidate compiler.
What sort of C are you envisioning?
Those may be appropriate ways to progress, but that is a different language and probably should be called something other than C.
You can disagree all you want, but you also seem to be unable to produce any evidence.
I have high confidence that I could produce, given some time, an example of code which compiled to say 20 instructions if integer overflow were defined and just 1 or 2 otherwise, and probably more by abusing the same technique repeatedly, but you might then claim it wasn’t representative of “real code”. And then if I really wanted to satisfy you I would have to find some way to trawl through repositories to identify some piece of code that exhibited similar properties. It’s more work than I care to undertake to prove my point here, and so I suppose you have a right to remain skeptical.
On the other hand, I have at least explained (even if only very briefly) how small optimisations such as assuming that integer arithmetic operations won’t overflow could lead to significant differences in code generation, beyond simple exchanging of instructions. You’ve given no argument as to why this couldn’t be the case. So, I don’t think there’s any clearly stronger argument on either side.
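A toy illustration of the mechanism being claimed (not evidence for the 20-instruction figure): the kind of folding that can then have flow-on effects on branches and inlining.

```c
#include <assert.h>

/* With signed overflow undefined, a compiler may fold this whole
 * function to `return x;`.  With wrapping semantics it cannot:
 * wrapping gives e.g. (0x40000000 * 2) / 2 == -0x40000000. */
int halve_double(int x)
{
    return (x * 2) / 2;
}

/* Likewise `x + 1 > x` may be folded to the constant 1, after which
 * any branch on it - and everything it guards - can disappear. */
int always_bigger(int x)
{
    return x + 1 > x;
}
```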
I have high confidence that I could produce, given some time, an example of code which compiled to say 20 instructions if integer overflow were defined and just 1 or 2 otherwise
I have no confidence of this, and it would be a completely uninteresting optimization in any case.
On the other hand, I have at least explained (even if only very briefly) how small optimisations such as assuming that integer arithmetic operations won’t overflow could lead to significant differences in code generation, beyond simple exchanging of instructions.
Not really. You are omitting a single instruction that almost certainly costs no cycles at all in a modern pipelined processor. Balance that against putting minefields into the code - and note there is no way in C to check for this condition. The tradeoff is super unappealing.
Not really. You are omitting a single instruction
No, I was not talking about omitting a single instruction.
With assert(), you are telling the compiler that at this point, this is true. The compiler is trusting your assertion of the truth at that point.
Also, if you compile with -DNDEBUG -O3 you will get the warning:
[spc]saltmine:/tmp>gcc -std=c99 -Wall -Wextra -pedantic -DNDEBUG -O3 c.c
c.c: In function ‘main’:
c.c:7:20: warning: ‘*((void *)&a+10)’ is used uninitialized in this function [-Wuninitialized]
c.c:13:8: note: ‘a’ was declared here
[spc]saltmine:/tmp>gcc -std=c99 -Wall -Wextra -pedantic -O3 c.c
[spc]saltmine:/tmp>
No, that is a meaningless statement.
The compiler doesn’t even see an assert statement, let alone “trust it”.
It is a macro that gets expanded to “plain old code” at preprocessor time, so depending on NDEBUG settings it expands either to something like if(!(exp))abort() or nothing.
What the compiler does trust is the __attribute__((__noreturn__)) on the abort() function.
My file:
#include <assert.h>
int foo(int x)
{
assert(x >= 0);
return x + 5;
}
My file after running it through the C preprocessor:
# 1 "x.c"
# 1 "/usr/include/assert.h"
# 12
# 15
#ident "@(#)assert.h 1.10 04/05/18 SMI"
# 21
# 26
extern void __assert(const char *, const char *, int);
# 31
# 35
# 37
# 44
# 46
# 52
# 63
# 2 "x.c"
int foo(int x)
{
( void ) ( ( x >= 0 ) || ( __assert ( "x >= 0" , "x.c" , 5 ) , 0 ) );
return x + 5;
}
#ident "acomp: Sun C 5.12 SunOS_sparc 2011/11/16"
Not an __attribute__ to be found. This C compiler can now generate code as if x is never a negative value.
#include <assert.h> int foo(int x) { assert(x >= 0); return x + 5; }
Can you copy paste the assembly output? (I read sparc asm as part of my day job….)
I’d be interested to see is it is treating __assert() as anything other than a common or garden function.
I’m not sure what it’ll prove, but okay:
cmp %i0,0
bge .L18
nop
.L19:
sethi %hi(.L20),%o0
or %o0,%lo(.L20),%o0
add %o0,8,%o1
call __assert
mov 6,%o2
ba .L17
nop
! block 3
.L18:
ba .L22
mov 1,%i5
! block 4
.L17:
mov %g0,%i5
! block 5
.L22:
! 7 return x + 5;
add %i0,5,%l0
st %l0,[%fp-4]
mov %l0,%i0
jmp %i7+8
restore
! block 6
.L12:
mov %l0,%i0
jmp %i7+8
restore
I did not specify any optimizations, and from what I can tell, it calls a function called __assert().
TL;DR: The optimiser for this compiler is crap. And it isn’t treating __assert() as special / noreturn.
int foo(int x)
{
( void ) ( ( x >= 0 ) || ( __assert ( "x >= 0" , "x.c" , 5 ) , 0 ) );
return x + 5;
}
;; x is register %i0
cmp %i0,0 ; Compare x with 0
bge .L18 ; If x >= 0, branch to .L18
nop ; Delay slot. Sigh, sparc pipelining makes debugging hard.
;;; This is the "call assert" branch. gcc has function __attribute__((cold)) or
;;; __builtin_expect() to mark this as the unlikely path.
.L19:
sethi %hi(.L20),%o0
or %o0,%lo(.L20),%o0
add %o0,8,%o1
call __assert
mov 6,%o2 ;Delay slot again
ba .L17 ; Branch absolute to .L17
nop ;Delay slot
;; Really? Is this optimized at all?
! block 3
.L18:
ba .L22 ; Branch absolute to .L22!
mov 1,%i5 ; put 1 in %i5
;;; Seriously? Is this thing trying to do it the hard way?
;;; The assert branch sets %i5 to zero.
! block 4
.L17:
;; Fun fact. %g0 is the sparc "bit bucket" reads as zero, ignores anything written to it.
mov %g0,%i5
! block 5
;; Falls through. ie. Expected to come
;; out of __assert() *hasn't treated __assert as noreturn!*
;; Joins with the x>=0 branch
.L22:
! 7 return x + 5;
;; Local register %l0 is x + 5
add %i0,5,%l0
st %l0,[%fp-4] ;WTF? Has this been inlined into a larger block of code?
mov %l0,%i0 ;WTF? as above?
jmp %i7+8 ;Return to calling address.
restore ;Unwind sparc register windowing.
;; WTF? No reference to label .L12
! block 6
.L12:
mov %l0,%i0
jmp %i7+8
restore
Actually, that isn’t quite what happens….
Actually, it’s “Just Another Macro” which, very approximately, expands to …
if( !(exp)) abort();
…where abort() is marked __attribute__((noreturn));
Which is almost, but not quite what one would want….
As the compiler uses the noreturn attribute to infer that if !exp, then rest of code is unreachable, therefore for rest of code exp is true.
Alas, I have found that it doesn’t, if it finds a path for which exp is false, warn you that you will abort!
I certainly feel there is room for compiler and optimizer writers to work with design-by-contract style programmers, having a “mutually beneficial” two-way conversation with the programmers when they write asserts.
Which is almost, but not quite what one would want….
I’m not sure I understand you. assert() will abort if the expression given is false. That’s what it does. It also prints where the expression was (it’s part of the standard). If you don’t want to abort, don’t call assert(). If you expect that assert() is a compile-time check, well, it’s not.
I certainly feel there is room for compiler and optimizer writers to work with design-by-contract style programmers, having a “mutually beneficial” two-way conversation with the programmers when they write asserts.
There’s only so far that can go though. Put your foo() function in another file, and no C compiler can warn you.
assert() is also part of the standard C library, which means the compiler can have built-in knowledge of its semantics (much like a C compiler can replace a call to memmove() with inline assembly). The fact that GCC uses its __attribute__ extension for this doesn’t apply to all other compilers.
That’s the other bit of Joy about C.
There are two entirely separate things….
The compiler.
And the standard library.
gcc works quite happily with several entirely different libc’s.
assert.h is part of libc, not the compiler.
How assert() is implemented is the domain of the libc implementer not the compiler writer.
I have poked at quite a few different compilers and even more libc’s…. and what I have summarised is how all the ones I have looked at do things. (Although some don’t have a concept of “noreturn”, so can’t optimize based on that)
Which compiler / libc are you looking at?
The compiler that comes with Solaris right now.
You can’t have a standard C compiler without the standard C library. I can get a compiler that understands, say, C99 syntax, but unless it comes with the standard C library, it can’t be called a compliant C99 compiler. The standard covers both the language and the library. I’m reading the C99 standard right now, and here’s an interesting bit:
Each library function is declared, with a type that includes a prototype, in a header, (182) whose contents are made available by the
#include preprocessing directive.
And footnote 182 states:
(182) A header is not necessarily a source file, nor are the < and > delimited sequences in header names necessarily valid source file names.
To me, that says the compiler can have knowledge of the standard functions. Furthermore:
Any function declared in a header may be additionally implemented as a function-like macro defined in the header … Likewise, those function-like macros described in the following subclauses may be invoked in an expression anywhere a function with a compatible return type could be called.(187)
(187) Because external identifiers and some macro names beginning with an underscore are reserved, implementations may provide special semantics for such names. For example, the identifier _BUILTIN_abs could be used to indicate generation of in-line code for the abs function. Thus, the appropriate header could specify
#define abs(x) _BUILTIN_abs(x)
for a compiler whose code generator will accept it. In this manner, a user desiring to guarantee that a given library function such as abs will be a genuine function may write
#undef abs
whether the implementation’s header provides a macro implementation of abs or a built-in implementation. The prototype for the function, which precedes and is hidden by any macro definition, is thereby revealed also.
So the compiler can absolutely understand the semantics of standard C calls and treat them specially. Whether a C compiler does so is implementation defined. And good luck writing offsetof() or setjmp()/longjmp() portably (spoiler: you can’t—they’re tied to both the compiler and architecture).
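A tiny demonstration of the `#undef` escape hatch footnote 187 describes (assuming a hosted implementation):

```c
#include <stdlib.h>
#include <assert.h>

/* If the implementation defines abs() as a macro over a builtin,
 * #undef reveals the genuine function declared in <stdlib.h>. */
#undef abs

/* Taking the address requires a real function, not a macro. */
int (*abs_ptr)(int) = abs;
```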
So, getting back to assert() and your issues with it. Like I said, the compiler knows (whether it’s via GCC’s __attribute__(__noreturn__) or because the compiler has built-in knowledge of the semantics of assert()) that the expression used must be true and can thus optimize based on that information, much like it can remove the if statement and related code:
const int debug = 0;
{
int x = debug;
if (x)
{
fprintf(stderr,"here we are!\n");
exit(33);
}
// ...
}
even through x, because debug is constant, x is loaded with a constant, and not modified prior to the if statement. Your wanting a warning about an invalid index to an array whose index is used in assert() is laudable, but to the compiler, you are telling it “yes, this is fine. No complaints please.” Compile the same code with NDEBUG defined, the assert() goes away (from the point of view of the compiler phase) and the diagnostic can be issued.
Yes, it sucks. But that’s the rationale.
The intent is you run the code, you get the assert, you fix the code (otherwise, why use assert() in the first place?) or remove the assert() because the assumption made is no longer valid (this has happened to me but not often and usually after code has changed, which is something you want, no?).
You can do #define assert(p) if (!(p)) __builtin_unreachable() to keep the optimisation benefits! And MSVC has __assume which behaves similarly.
Hmm… Interesting….
Does it then elide the expression !(p)?
Or does it impose the run time cost of evaluating !(p) and not the benefit of invoking abort()?
Since __builtin_unreachable only exists to guide the compiler, p has to be an expression with no side effects, and then the compiler can optimise it out because you don’t use its result.
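Putting those pieces together, a release-mode assert that keeps the optimisation hint might look like this sketch (GCC/Clang-specific, via `__builtin_unreachable`; the macro name is my own, not a standard facility):

```c
#include <assert.h>

/* Debug builds: a normal assert.  Release builds (-DNDEBUG): tell
 * the compiler the condition still holds, so downstream code can be
 * optimised accordingly.  p must be side-effect free, and if it is
 * ever false in a release build this is undefined behaviour. */
#ifdef NDEBUG
#  define ASSUME(p) do { if (!(p)) __builtin_unreachable(); } while (0)
#else
#  define ASSUME(p) assert(p)
#endif

int add_five(int x)
{
    ASSUME(x >= 0);  /* compiler may treat x as non-negative below */
    return x + 5;
}
```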
I think this is an incorrectly framed question. C says that it’s not the compiler’s problem. You index past an array bound, perhaps you know what you are doing or perhaps not. The compiler is just supposed to do what you said. If you have indexed into another data structure by mistake or past the bound of allocated memory - that’s on the programmer ( BTW: I think opt-in bounds checked arrays would be great). It is unreasonable for the compiler to assume things that may be false. For example, if the programmer cautiously adds a check for overflow, I don’t want the compiler to assume that the index must be in bounds so the check can be discarded.
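As an aside, the defensive pattern that no conforming compiler may discard is to test before the operation, so that no execution path ever overflows; sketched here:

```c
#include <assert.h>
#include <limits.h>

/* `if (a + b < a)` tests *after* the overflow and may be deleted as
 * impossible.  Comparing against INT_MAX/INT_MIN *before* adding
 * never overflows, so the check cannot legally be elided. */
int checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b))
        return 0;          /* would overflow: report failure */
    *out = a + b;
    return 1;
}
```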
C says that it’s not the compiler’s problem
Actually the C standard says it’s not the compiler’s problem; it’s undefined behaviour and completely your problem.
If you want it to have some weird arsed, but well defined behaviour, you need a different language standard.
In C standardese, things that are “the compiler’s problem” are labelled clearly as “implementation defined”; things that are your problem are labelled “undefined behaviour”.
perhaps you know what you are doing or perhaps not.
Well, actually, you provably don’t know what you’re doing…. as the compiler and linker lay out the data structures in ram pretty much as they damn well feel like.
Part of that, for example, like the struct padding and alignment, is part of the ABI for that particular system, which is not part of the C standard, and most of that will change as you add or remove other data items and/or change their types. If you need to rely on such things, there are other (some non-standard) mechanisms, e.g. union types and packing pragmas.
BTW: I think opt-in bounds checked arrays would be great
gcc and clang now have sanitizers to check that.
However the C standard is sufficiently wishy-washy on a number of fronts, there are several corner cases that are uncheckable, and valgrind is then your best hope. Valgrind won’t help you, for example, if you index into another valid memory region or alignment padding.
For example, if the programmer cautiously adds a check for overflow, I don’t want the compiler to assume that the index must be in bounds so the check can be discarded.
However, if the compiler can prove that the check always succeeds, then the check is useless; the programmer has written useless code and the compiler rightly elides it.
Modern versions of gcc (if you have the warnings dialled up high enough, and have annotated function attributes correctly) will warn you about tautologies and unreachable code.
The C standard is not the C language. It is a committee report attempting to codify the language. It is not made up of laws of physics - it can be wrong and can change. My argument is that the standard is wrong. Feel free to disagree, but please don’t treat the current instance of the standard as if they were beyond discussion.
In fact, I do want a different standard: one that is closer to my idea what the rules of the language should be in order to make the language useful, beautiful, and closer to the spirit of the design.
The compiler and linker don’t have total freedom to change layouts even in the current standard - otherwise, for example, memcpy would not work. Note: “Except for bit-fields, objects are composed of contiguous sequences of one or more bytes, the number, order, and encoding of which are either explicitly specified or implementation-defined.”
struct a { int x[100]; };
char *b = malloc(sizeof(int)*101);
struct a *y = (struct a *)b;
if (sizeof(struct a) != sizeof(int)*100) panic("this implementation of C won't work for us\n");
…. do stuff ….
y->x[100] = checksum(y);
But worse,
message = readsocket();
for (i = 0; i < message->numberbytes; i++)
    if (i < MAX) use(message->payload[i]);
if the compiler can assume the index is never greater than the array size and MAX is greater than array size, according to you it should be able to “optimize” away the check.
However, if the compiler can prove that the check always succeeds, then the check is useless and the programmer has written useless code and the compiler rightly elides it.
This is one of the key problems with UB. The compiler can assume there is no UB. Therefore the check is assumed unnecessary. Compilers don’t do this right now, but that’s the interpretation that is claimed to be correct. In fact, in many cases the compiler assumes that the code will not behave the way that the generated code does behave. This is nutty.
The C standard is not the C language.
Hmm. So what is The C Language?
In the absence of the standard, there is no “C Language”, merely a collection of competing implementations of different languages, confusingly all named “C”.
I don’t think calling a standard “Wrong” is very helpful, as that would imply there exists some definition of Right.
I rather call it “differing from all known implementations” or “unimplementable” or “undesirable” or “just plain bloody silly”.
There is no One True Pure Abstract Universal C out there like an ancient Greek concept of Numbers.
There are only the standard(s) and the implementations.
In fact, I do want a different standard: one that is closer to my idea what the rules of the language should be in order to make the language useful, beautiful, and closer to the spirit of the design.
Ah, the Joys of Standards! They are so useful, everybody wants their own one! ;-)
Except for bit-fields, objects are composed of contiguous sequences of one or more bytes, the number, order, and encoding of which are either explicitly specified or implementation-defined.”
Umm. So, yes, an array is a contiguous sequence, but we’re talking about indexing out of bounds of an array. So what is contiguous beyond that array?
Answer 1 : Possibly empty padding to align the next object at the appropriate alignment boundary.
Answer 2: Which is the next object? That is determined by the field order within a struct…. (with the alignment padding determined by the ABI), but if the array is not in a struct…. it’s all bets off as to which object the compiler linker chooses to place next.
Hmm. Your example didn’t format nicely (nor was it valid syntax - missing parenthesis) so let me see if I can unravel that to see what you mean…
struct a {
int x[100];
};
char *b = malloc(sizeof(int)*101);
struct a *y = (struct a *)b;
if( sizeof(struct a) != sizeof(int)*100 )
panic("this implementation of C won't work for us\n");
…. do stuff …
y->x[100] = checksum(y);
Hmm. Not sure what you’re trying to say, but try this one….
#include <stdio.h>
#include <stdint.h>
struct {
char c[5];
uint32_t i;
} s;
uint64_t l;
int main(void)
{
printf( "sizeof(s)=%lu\n sizeof(c)=%lu\n sizeof(i)=%lu\n", sizeof(s),sizeof(s.c),sizeof(s.i));
printf( "address of s=%08lx\n address of l=%08lx\n diff = %ld\n", (uintptr_t)&s, (uintptr_t)&l, ((intptr_t)&s-(intptr_t)&l));
return 0;
}
Outputs…
sizeof(s)=12
sizeof(c)=5
sizeof(i)=4
address of s=00601050
address of l=00601048
diff = 8
In the absence of the standard, there is no “C Language”, merely a collection of competing implementations of different languages, confusingly all named “C”.
What a weird idea. So much of the core code of the internet and infrastructure was written in an impossible language prior to the first ANSI standard. And it even worked!
Ah, the Joys of Standards! They are so useful, everybody wants their own one! ;-)
There is a standards process. It involves people presenting proposals for modifications and discussing their merits. There are changes ! That’s totally normal and expected.
The moment C compilers began padding, every compiler added a “packed” attribute. The reason is that many C applications require that capability. Imagine ethernet packets with artisanal compiler innovative ordering. And those attributes are not in the standard - yet they exist all the same.
Your example is not my example.
So much of the core code of the internet and infrastructure was written in an impossible language prior to the first ANSI standard. And it even worked!
Were actually written in whatever dialect was available on the day and worked for that machine, that compiler on that day.
And porting to a different machine, different compiler, different version of the compiler, was a huge pain in the ass.
I know.
I’ve done a hell of a lot of that over the decades.
Roll on tighter standards please.
Yup, and most of them added a subtly different packed attribute, and since it was not terribly well documented and defined, I’ve had a fair amount of pain from libraries written (LWIP comes to mind), where at various points in their history got the packed attribute wrong, so it wasn’t portable from 32 to 64 bit.
Your example is not my example.
Your example wasn’t formatted, and I didn’t quite understand the point of it. Can you format it properly and expand a bit on what you were saying with that example?
Yes, strict aliasing was a mistake. While the Rust memory model is in development, it is generally agreed that Rust will have no strict aliasing.
I didn’t think Rust had any way of creating aliasing pointers/references that weren’t of the same type (except perhaps via “unsafe” code) which makes the “has strict aliasing / has no strict aliasing” decision somewhat redundant. Do you mean that pointers generated via unsafe code are always assumed to be able to alias any other pointer, even of another type?
Yes. Unsafe Rust is part of Rust. More contentious is whether pointers are integers. Tentatively they are not, just as in C.
I’m not questioning whether unsafe Rust is part of Rust. I’m questioning whether aliasing pointers with different types and dereferencing them will always give defined behaviour. (And maybe it will; but it seems an odd decision to make, seeing as Rust is generally ok with unsafe code being able to produce undefined behaviour, and this doesn’t seem like a worthwhile behaviour to define).
Everyone that I know of is against TBAA. I don’t know enough about the details to tell you why, but /u/sanxiyn is correct.
And maybe it will; but it seems an odd decision to make, seeing as Rust is generally ok with unsafe code being able to produce undefined behaviour,
This is true and not true, that is, UB has a cost, in that it’s a footgun. If you don’t get much optimization benefit, then you’re introducing a footgun for no benefit. One of our goals for the unsafe code guidelines is that they will not require large swaths of already existing unsafe code to be unwritten.
Everyone that I know of is against TBAA.
You can consider that untrue as of now. :)
I don’t know enough about the details to tell you why, but /u/sanxiyn is correct
Ok, thanks. Interesting to know.
The offhand ‘even perl’ in there struck me as unfair. It reminds me that perl is actually pretty fast (specifically at startup, but my recollection was also that it runs quickly):
$ time for i in `seq 1 1000`; do perl < /dev/null; done
real 0m2.786s
user 0m1.337s
sys 0m0.686s
$ time for i in `seq 1 1000`; do python < /dev/null; done
real 0m19.245s
user 0m9.329s
sys 0m4.860s
$ time for i in `seq 1 1000`; do python3 < /dev/null; done
real 0m48.840s
user 0m30.672s
sys 0m7.130s
I can’t comment on how fast Perl is, but you are measuring the time taken to tear down here too.
The correct way would be to take the raw monotonic time immediately before invoking the VM, then inside the guest language immediately print it again and take the difference.
P.S. Wow Python3 is slower.
but you are measuring the time taken to tear down here too.
I guess so? I’m not sure that’s a useful distinction.
The people wanting “faster startup” are also wanting “fast teardown”, because otherwise you’re running in some kind of daemon-mode and both times are moot.
The people wanting “faster startup” are also wanting “fast teardown”
Yeah, I guess I agree that they should both be fast, but if we were measuring for real, I’d measure them separately.
I’m not sure that’s a useful distinction.
If latency matters then it could be. If you’re spawning a process to handle network requests for example then the startup time affects latency but the teardown time doesn’t, unless the load gets too high.
Hah before I read the comments I did the same thing! My results on a 2015 MBP - with only startup and teardown on an empty script, and I included node and ruby also:
~/temp:$ time python2 empty.txt
real 0m0.028s
user 0m0.016s
sys 0m0.008s
~/temp:$ time python3 empty.txt
real 0m0.042s
user 0m0.030s
sys 0m0.009s
~/temp:$ time node empty.txt
real 0m0.079s
user 0m0.059s
sys 0m0.018s
~/temp:$ time perl empty.txt
real 0m0.011s
user 0m0.004s
sys 0m0.002s
~/temp:$ time ruby empty.txt
real 0m0.096s
user 0m0.027s
sys 0m0.044s
Ruby can do a bit better if you don’t need gems (and it’s Python 3 here):
$ time for i in $(seq 1 1000); do ruby </dev/null; done
real 0m31.612s
user 0m27.910s
sys 0m3.622s
$ time for i in $(seq 1 1000); do ruby --disable-gems </dev/null; done
real 0m4.117s
user 0m2.848s
sys 0m1.271s
$ time for i in $(seq 1 1000); do perl </dev/null; done
real 0m1.225s
user 0m0.920s
sys 0m0.294s
$ time for i in $(seq 1 1000); do python </dev/null; done
real 0m13.216s
user 0m10.916s
sys 0m2.275s
I was calm enough to go through a few iterations of editing the tweet to end up with something only mildly sarcastic.
But it’s not only mildly sarcastic, by any reasonable stretch. And:
I avoid name-calling, personal attacks, and profanity
It’s not an ad hominem attack, but it is a personal attack - because it was responding to a post made by an individual, and it implied that that person was being vain and short-sighted (“self-congratulatory”); further, it posed a rhetorical question - a form of passive-aggressiveness, very much related to the “RTFM” response that is being (rightly) derided.
(And to be clear, I don’t think the complaint being made is against informing people that the answer to their question can be found in the documentation - it’s about the phrasing, etc, the way that this information is given. But I do not think this tweet that is now being defended is anything but an example of exactly the wrong way to make a comment online).
I’m not a processor design expert, but I’ve read a few articles about caches and cache coherency similar to this. If you were to make the mistake in an online conversation of implying that caches of different processor cores could be inconsistent or that, for example, volatile variables bypass cache, you’d probably get someone pointing you at such an article. They’re not wrong, but they’re also not necessarily helpful. This one is better than most, I think, because it’s not being elitist.
In concurrency, correctness is hard enough without worrying about performance of the underlying hardware. While ‘volatile’ in Java might not truly need to write data through to core memory every time the variable is assigned, this mental model is very useful for reasoning about when volatile is necessary. If we think of each processor core having incoherent caches, and various primitives (such as volatile) providing a way to force read/writes to go via the shared core memory, we have a much more straightforward way of understanding the whole issue, even if it’s not accurate at the architectural level.
This article, in fact, glosses over store buffers, which in fact have a similar function to a cache but on some architectures (including even x86 in certain usually-avoided conditions, IIUC) can be incoherent; the conclusion - that from the perspective of the software application, the memory subsystem appears to be a single, consistent, monolith - is arguably not entirely correct; however, the point it’s making is really about the performance side - if we use the mental model I’ve described (which many do, I suspect because it is far more straight-forward to understand than the truth) we might believe that these synchronisation primitives are very costly, when in fact they may not be so. That’s a valid point.
One potential issue with your program: it pushes all input to the program before reading any of its output. In a case where the program does a significant amount of output, it may block - and in fact, deadlock, since it will stop consuming input, which may cause your driver program to block as it attempts to write to the pipe.
This might not be a real concern, though, if the amount of input/output is always limited.
Edit: a more comprehensive solution might be to loop while both pushing some input to the student program using non-blocking I/O and read some of its output (also using non-blocking I/O). Once you’ve pushed all input, you could leave the loop and assume the pipe buffer contains all significant output (as you do now), or perhaps better, flush the buffer and then sleep for a little before checking for more output (using non-blocking I/O).
Hm, interesting. The good news is there is not generally a significant amount of output – mostly user-interactive output, ~500B at a time. The bad news is you’re right. I didn’t realize non-blocking I/O in C was possible. Will investigate…
While it may be true that Clang generally produces better error messages,
Through this simple test, clang seems generating more user-friendly error message than gcc
… is not a good conclusion to draw from a single example.
What is JavaFX? Every time I try to figure out what it is, I find only lots of nonsensical marketing terms like “Rich Internet Applications”, “Rich Client Platform”, as well as ancient dotcom-crash-era buzzwords like “multimedia”. It looks like a GUI toolkit that had been planned to replace Swing, but it completely lacks native look and everything is based on skins (hello from 2001). Is it a usable toolkit, or just an enterprise gimmick to display interactive sales charts?
I used it on a large-ish desktop project. It’s quite a nice library. It’s easy to use in common cases, has a very clean and regular UI, and is really, really fast. We use the CSS styling just a bit (to offer a dark theme) but not much of the XML-based scripting stuff.
It’s much easier to use than Swing, thanks in part to backing away from the L&F split.
Oh, JavaFX also makes it dead easy to get an OpenGL context in a panel.
So, it’s more similar to Qt Quick and Clutter than Silverlight or Adobe Air?
And is “traditional” GUI with input boxes, lots of buttons, resizable layouts doable in it? Can I just instantiate trees of these elements without dealing with skinning details or OpenGL? Do standard interactions (selecting text in text boxes with mouse or keys, clipboard, cursor movement, scrolling) work as expected, unlike, for example, some GUIs in game engines?
I don’t know Qt Quick and Clutter, so I can’t comment. Definitely in the first release, JavaFX was positioned as a competitor to Silverlight and Air. JavaFX 1 was all about the XML to make it declarative and script-driven. With JavaFX 2 they seemed to back away from that positioning a bit.
As far as the expected complement of widgets, yep. They’re all there. You don’t have to deal with CSS or skinning if you don’t want to. Just instantiate the components and go. All the interactions you’d expect are there. It’s not like an OpenGL rebuild of a GUI toolkit. It’s a GUI toolkit that also makes it easy to show OpenGL.
(The docs can be a little misleading here again, because they talk about a “stage” and a “scene”, which makes it sound like a 3d engine. It’s not.)
Another thing I liked is that they extended the JavaBeans getter/setter model with active property objects. Instead of creating a custom listener interface for everything like in AWT, you can just bind to specific properties and get notified about changes to those. That gives you a narrower interface that’s used more broadly… which makes it super-easy to use from Clojure. :-)
One thing I dislike is that you’re expected to have your main class be a subclass of javafx.application.Application. They want to take control of initial class loading and application startup. There’s a goofy workaround needed if you want (like me) to have main be a Clojure function. That’s not of general interest though so I won’t clutter this post up with it.
JavaFX was, if I recall properly, the first widespread attempt at doing reactive programming in the UI. I think it grew from the community (Romain Guy was the main developer), as it is too innovative to be something that was created out of a top-down approach. I was happily surprised when it got integrated into the JDK, for obvious strategic reasons at the time, as mentioned in other comments.
Oracle’s marketing is strange. I only got interested in trying it after the comments here; the official webpages for JavaFX only repelled me. It looked like a top-down thing to resurrect applets - I didn’t even know it’s a GUI toolkit and not a chart plotting or animation lib.
It has confusingly referred to several different things over the past, but in this context, it’s really just a UI toolkit - it could be seen as a replacement for Swing.
I find that it’s nicer to use in some ways than Swing (I don’t particularly like the concept of CSS for regular UIs, but it does make altering style and positioning quick and easy), but it’s less complete, lacking features (such as an extensible text editor kit) that the more mature Swing does have. It’s also more buggy and sometimes slower than Swing, and has some ill-conceived design choices (weak references in property bindings, urgh).
The beginners’ examples in “Getting started with JavaFX” don’t use any builder: widgets are first instantiated directly and then defined in XML. An interface builder probably exists too, but it’s definitely not a killer feature of JavaFX, and UI builders exist for Swing too.
According to other comments here and some introductory docs, it’s just another GUI toolkit for Java, alternative to Swing, but more “modern” and without native look-and-feels. XML is just one of its features. It was initially marketed as an alternative to Silverlight, but in reality it’s another GUI toolkit.
I’m a bit puzzled why the author seems to think that integer wrap on overflow behaviour has something to do with C and undefined behaviour. The same thing happens with nearly all languages which use the processor’s integer arithmetic, because those semantics are provided by the processor itself. Java, C#, etc. all wrap on overflow. There are some exceptions though - Ada provides the “exception on overflow” semantics the author prefers, but it does come with a significant performance penalty because checking for overflow requires additional instructions after every arithmetic operation.
The point here is that if you want performant arithmetic it’s all about what the processor is designed to do, not anything to do with the languages. Java defines integer wrap as the language’s standard behaviour but as a result it incurs a performance penalty for integer arithmetic on processors which don’t behave this way. C doesn’t incur this penalty because it basically accepts that overflow works however the processor implements it. And let’s face it if your program is reliant on the exact semantics of overflowing numbers you’re probably doing it wrong anyway.
There are some processors which provide interrupts on integer overflow. This eliminates the performance penalty associated with overflow checks if your language is Ada and so you want to trap on overflow. There are other semantics around too - DSP processors often have “clamp on overflow” instead since that suits the use case better and old Unisys computers use ones complement rather than twos complement so their overflow behaves slightly differently.
The performance penalty of “trap on overflow” can be reduced by clever modeling, for example by allowing a delayed trap instead of an immediate trap. As-if Infinitely Ranged is one such model. An immediate trap disallows optimizing a+b-b to a, because if a+b overflows the former traps and the latter doesn’t. A delayed trap allows such optimization.
You are mixing up the underlying behaviour of the processor with the defined (or undefined) behaviour of the language. Wrap on integer overflow is indeed the natural behaviour of most common processors, but C doesn’t specify it. The post is saying that some people have argued that wrap-on-overflow should be the defined behaviour of the C language, or at least the implementation-defined behaviour implemented by compilers, and then goes on to provide arguments against that. There is a clear example in the post of where the behaviour of a C program doesn’t match that of 2’s complement arithmetic (wrapping).
That’s the point - in C, it doesn’t happen.
I don’t get the point. The advantage of using integer wrap for C on processors that implement integer wrap is that it is high performance, simplifies compilation, has clear semantics, and is the semantics programmers expect. If you want to argue that it should be e.g. trap on overflow, you need to provide a reason more substantive than theoretical compiler optimizations that are shown by hand waving. The argument that it should be “generate code that overflows but pretend you don’t” needs a stronger justification because the resulting semantics are muddy as hell. I’m actually in favor of a debug-mode overflow trap for C but an optimized mode that uses processor semantics.
Read the post, then; there are substantive reasons in it. I’m not engaging with you if you’re going to start by misrepresenting reasoned arguments as “hand waving”.
“However, while in many cases there is no benefit for C, the code generation engines and optimisers in compilers are commonly general and could be used for other languages where the same might not be so generally true; “
Ok! You think that’s a substantive argument.
You’re making a straw man. What you quoted is part of a much larger post.
That’s not what “straw man” means.
It means that you’re misrepresenting the argument, which you are. I said that the post contained substantive reasons, you picked a particular part and insinuated that I had claimed that that particular part on its own constituted a substantive reason, which I didn’t. And: you said “If you want to argue that it should be e.g. trap on overflow, you need to provide a reason more substantive than theoretical compiler optimizations that are shown by hand waving” but optimisations have very little to do with trapping being a better behaviour than wrapping, and I never claimed they did, other than to the limited extent that trapping potentially allows some optimisations which wrapping does not. But that was not the only reason given for trapping being a preferable behaviour; again, you misrepresented the argument.
They are related, yes. E.g. whilst signed integer overflow is well defined in most individual hardware architectures (usually as a two’s complement wrap), it could vary between architectures, and thus C leaves signed integer overflow undefined.
The whole argument is odd.