This paper attempts to reframe unconstructive narratives and argue that the committee’s real opportunity is to improve people’s lives.
The best way to actually improve people’s lives as a C++ committee, I think, is to define more behaviour.
Both in scope and in reach. The original intention of undefined behaviour was quite clearly to support different platforms when they behave differently, without sacrificing performance. It has since expanded to allow (sometimes explicitly so) some performance optimisations. The problem is that now compilers are behaving like a sentient adversary, inserting vulnerabilities at the slightest UB even when the underlying platform could have behaved sensibly.
The most infuriating instance of course would be signed integer overflow. 2’s complement has won; no platform in current use is using anything else. And yet, -fwrapv still isn’t the default. Why? Because it would hurt the performance of some badly written code. There’s also strict aliasing, which managed to decrease the memory safety of C and C++ (if that was even possible), introducing bugs and vulnerabilities into otherwise perfectly valid code. At least this one allows even more potent optimisations. Still, wouldn’t restrict have been enough?
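To make the strict aliasing complaint concrete, here is the classic type-punning pattern it outlaws; the function names are mine, for illustration:

```c
#include <stdint.h>
#include <string.h>

/* UB under strict aliasing: reading a float through an incompatible
 * pointer type. The optimiser is allowed to assume the two never alias. */
static uint32_t float_bits_ub(float f) {
    return *(uint32_t *)&f;   /* undefined behaviour */
}

/* The well-defined alternative: memcpy compiles down to the same single
 * register move on mainstream compilers. */
static uint32_t float_bits_ok(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

Under -fno-strict-aliasing both versions behave as most people expect; under a default -O2 build only the memcpy one is guaranteed to.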
Now I do see an obstacle to defining more behaviour (or at least making it unspecified, or making traps part of defined behaviour):
[…] we can substantially improve the performance, and environmental impact, of newly written code
I believe this is “we” as “we the committee”. They feel responsible for the environmental impact of people’s code, and they feel obligated to serve us with the maximum possible performance in so many cases. I think this is misguided. Instead they should help programmers write more performant code when they need to.
Take signed integer overflow for instance: sure it hurts the performance of some loops, but any programmer who cares will correct their loops and use correctly signed and correctly sized integers to recover the speed. The benefits will be fewer undefined behaviours of course, but also the ability to check for overflow after the fact, which is sometimes faster than the alternative.
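Unsigned arithmetic already has the wrapping guarantee, so it can demonstrate the “check after the fact” pattern today; with -fwrapv (or defined signed overflow) the same check would work for signed types. The helper name is mine:

```c
#include <stdint.h>
#include <stdbool.h>

/* With well-defined wraparound, overflow can be detected *after* the
 * addition: if the sum wrapped, it is smaller than either operand.
 * No widening, no pre-division checks. */
static bool add_would_overflow(uint32_t a, uint32_t b, uint32_t *out) {
    uint32_t sum = a + b;   /* wraps modulo 2^32, always defined */
    *out = sum;
    return sum < a;         /* true iff the addition wrapped */
}
```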
For better or worse (in my opinion mostly for worse), C++ is here to stay. I believe the best way to help people is to reduce the harm it causes. Thus, define more behaviour.
GCC and Clang don’t try to optimise within the rules of the language standard because they are some kind of malevolent agent; they do it because that’s what we ask for when we pass -O2. We tend to assume maximum performance is the default and then blame our tools when it’s not what we wanted.
GCC and Clang don’t try to optimise within the rules of the language standard because they are some kind of malevolent agent
I don’t care if they’re malevolent. They still act malevolent.
they do it because that’s what we ask for when we pass -O2.
The malevolence I speak of sometimes also happens under -O0.
We tend to assume maximum performance is the default and then blame our tools when it’s not what we wanted.
Optimise for the wrong thing, then blame the victim. I’m aware that’s the consensus among compiler writers.
If you want -fwrapv, use -fwrapv.
I can’t.
I’m writing a library, and do not have full control over compilation options, or even the exact compiler being used. If I require -fwrapv the critical CVEs will start piling up pretty quickly.
I don’t care if they’re malevolent. They still act malevolent.
This is such nonsense. I’ve seen this sort of claim made before and it always irks me. “Malevolent” (or “malice”) ascribes intent. I’m sure you recognise that the compiler doesn’t and can’t actually have any ill-will to you or anyone else, so the claim boils down to “it acts the same as it would as if it were malevolent” (or in your words above, “they still act malevolent”).
Which is also completely wrong, and obviously so since a great deal of software manages to work perfectly fine despite being compiled with “malevolent” compilers. A truly malevolent compiler would of course break everything or, for example, silently insert security vulnerabilities into all programs regardless of whether their code is correct or not.
At this point the more ardent “malevolent compiler” club members will probably start frothing about how “that’s exactly what the compiler DOES!”, even though that’s demonstrably wrong, and then the argument usually devolves into a back-and-forth about what the C (or C++) language actually requires and so on, which is an interesting discussion the first couple of times but is also never really fruitful, perhaps because the same people arguing that the language standard means what they personally think it should mean are the same people who tend to make fallacious logical leaps (like “the compiler made my code do something I didn’t want it to and it’s therefore malevolent, because clearly it could never be my own fault.”).
So what we’re left with if we strip all the ridiculous exaggeration away is that “compilers sometimes behave in a way that is indistinguishable from malice”. Which is perhaps true in a limited sense, if you don’t understand how compilers work, and you define “malice” as “makes my program behave in a way that is different to what I wanted, even though its behaviour is technically undefined”. It is not true otherwise.
If you want the compiler to interpret your undefined-behaviour-filled code in some particular way there are a plethora of options it provides to do so. Just use them.
And argue, as you reasonably did at first, that some of that behaviour should really be defined by the language. There are plenty of good arguments you can make there. Alluding to malice or malevolence in the behaviour of compilers, though, is plain wrong.
To be more precise, my claim is that a sufficiently advanced optimising compiler is indistinguishable from a sentient adversary, that may insert vulnerabilities whenever your program exhibits some kind of UB. It won’t do so every time of course, but it happens often enough, and unpredictably enough, that our only reasonable choice is to hunt down every single UB and correct it.
Now in reality I don’t mind such an adversary, but I do want the odds stacked in my favour: I need a language where UB is easy enough to avoid in the first place.
That’s like saying that when I stub my toe on a random rock that it’s indistinguishable from stubbing my toe on a rock that was placed there by a sentient adversary (so that I would stub my toe on it). It’s trivially true, but also a pointless observation; that you make the “compilers act as sentient adversaries” claim then trivialise it in this way when challenged just looks like moving the goal posts.
our only reasonable choice is to hunt down every single UB and correct it.
The practical reasonable choice is that we do our best to avoid UB, mitigate its effects where possible, and otherwise address UB bugs when they become an issue. Most UB bugs probably don’t introduce vulnerabilities. I’d even surmise (admittedly without data to prove it) that most UB bugs that introduce vulnerabilities are buffer overflows, which you can’t trivially update the language requirements to fix, and that of the other UB-related bugs most are integer-overflow vulnerabilities that are actually exploiting 2’s complement wraparound (so that simply defining the overflow behavior as wrapping wouldn’t resolve the vulnerability anyway). Then there is of course the “programmer didn’t understand how to correctly check for overflow, so the compiler removed the check” variety; that’s about the only kind that tightening the language semantics really helps with.
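To illustrate that last variety, a minimal sketch (function names are mine):

```c
#include <limits.h>
#include <stdbool.h>

/* The naive post-hoc check: because signed overflow is UB, the compiler
 * may assume a + b never wraps and simplify or delete this test. */
static bool overflows_naive(int a, int b) {
    return a + b < a;   /* may be folded to `b < 0` under -O2 */
}

/* A correct pre-condition check that never performs the overflowing add. */
static bool overflows_safe(int a, int b) {
    if (b > 0)
        return a > INT_MAX - b;
    return a < INT_MIN - b;
}
```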
“Our only reasonable choice is to hunt down every single UB and correct it” is nonsense. For all that UB “can do anything”, the reality is that most UB-involving vulnerabilities boil down to a small selection of different bug types, some of which can be mitigated by compiler options, and nearly all can be mitigated by more general techniques if we have the desire (OpenBSD for example employs a raft of defenses which make even a buffer overflow difficult to exploit). These are reasonable choices. And of course we also have the perfectly reasonable choice of using a different, memory-safe, language. C++ might be made “safer” but it’s never going to be completely free of UB, so if you really believe that “zero UB” is the only acceptable level then C++ isn’t the language for you even if they do remove some of the more common tripwires.
Most UB bugs probably don’t introduce vulnerabilities.
Key words being “most”, and “probably”. I can’t rely on that, especially if I expect the compiler to be updated.
most UB-involving vulnerabilities boil down to a small selection of different bug types, some of which can be mitigated by compiler options
As a library author I don’t control compiler options.
and nearly all can be mitigated by more general techniques if we have the desire (OpenBSD for example employs a raft of defenses which make even a buffer overflow difficult to exploit)
As a library author I don’t control the execution environment.
As a library author I don’t control compiler options.
So what? You not being able to control the compiler options, or the execution environment, doesn’t mean the compiler is “acting malevolent”, nor does the fact that a newer version of a compiler might theoretically turn a UB-based bug into a vulnerability. Nor does it mean you are solely responsible for what someone else chooses to do with your library, including how they choose to compile it, or that they chose to use it in the first place, or that they chose to use (your) library written in a language which is known to have undefined behaviour.
You not being able to control the compiler options, or the execution environment, doesn’t mean the compiler is “acting malevolent”
Nothing to do with malevolence. I’m just saying your proposed mitigations are not available to me. People will update their compiler, and blame me if it causes problems. People will use flags I recommend against, and blame me if it causes problems.
In principle, if I say my code is only meant for compiler X, version Y, kernel Z, with flags F, many are not going to read the manual, and try to use it elsewhere anyway. There’s little point blaming them. My only choice is to either cater to them, or sacrifice them. In this case I would try to put more substantial controls, like some non-portable #ifdef and #error.
Those controls though aren’t available to me either, because I’m writing a portable library (I know, I’m making my life difficult. But it paid off, I now have quite a few users).
The mitigations I proposed are available to you - you can insist that your library is compiled with certain options, for example, and then, yes, it is definitely the case that people who don’t use those options shouldn’t blame you if a bug in your library turns into a vulnerability (when it otherwise wouldn’t have); the responsibility does rest at least partially on the end user themselves for choosing to compile without safe options (even if you don’t insist that they do), and also partly on the language and - yes - the compiler. But that still doesn’t mean the compiler is malevolent. And if people do blame you anyway, it doesn’t make any difference to that.
“Our only reasonable choice is to hunt down every single UB and correct it” is still wrong. There are mitigations. You can insist they be used. Your library consumers can choose to use them. Maybe you think it’s your only choice to hunt down every single UB bug and correct it, but that’s not what you said, and I don’t agree anyway. It is, as I said, pretty much impossible to do that even with significant tweaks to make C++ safer, or equivalent tweaks in compiler defaults, because of buffer overruns (and dangling references, and similar bugs which have no such simple remedy as “just defining” the behaviour).
I have two problems: user stupidity, and portability goals.
I can to some extent punt on user stupidity, write the proper stern warnings everywhere appropriate, and blame users when something wrong happens. And if I’m being honest I already do to some extent: almost all my functions accept pointers, and I never check for NULL. Heck, most of my functions don’t even return an error code, instead they rely on the user to provide correct inputs. It’s an explicit design choice to reduce the amount of error handling, not only on my part, but on the user’s as well.
I could, if I primarily shipped binaries, rely on -fwrapv and -fno-strict-aliasing. To a lesser extent I could also ask my users to activate those flags, and provide a default makefile that does exactly that. I do think however that this is less socially acceptable than failing to check user input.
Portability however is a bear. I’m targeting the intersection of C99 and C++, on a wide range of targets, from 16-bit microcontrollers to 64-bit server powerhouses. I expect users will compile my code with GCC, Clang, MSVC, ICC, TCC, CompCert, and stuff I haven’t even heard of. I have actual users on various platforms. I have no idea what the effects of UB might actually be, so my only choice is to systematically avoid it, and stick to strictly conforming code.
It is, as I said, pretty much impossible to do that even with significant tweaks to make C++ safer, or equivalent tweaks in compiler defaults, because of buffer overruns (and dangling references, and similar bugs which have no such simple remedy as “just defining” the behaviour).
I agree: case in point, the TIS interpreter, which is guaranteed to catch most classes of UB (if not all of them), runs orders of magnitude slower than native code. Tests that take 10 seconds to run require 35 hours with it. Even sanitisers, which don’t catch all issues, are significantly slower than normal code (they remain fast enough to be bloody useful on debug builds though).
So I won’t advocate that we define all behaviour in C++, or even most of it. Just some, wherever reasonable.
So I won’t advocate that we define all behaviour in C++, or even most of it. Just some, wherever reasonable.
I agreed with this from the start. Whatever your perceived need for eliminating all UB in your software (it’s not an ignoble goal, I also aspire to this), it doesn’t justify calling compilers “malevolent” or saying that they “act malevolent”.
IDK, if you keep stubbing your toe on rocks maybe it makes sense to move them away from the path. It doesn’t really matter if they were put there maliciously, if they are causing trouble then it makes sense to move them.
Under which situations are compilers acting “malevolent” at -O0?
Anyway, the way I see it, when I pass -O2, my deal with the compiler is: I write correct C++ code, it produces the fastest machine code it can on the assumption that the C++ code is correct. If the compiler tries to protect me from myself by producing needlessly slow code at -O2 based on the assumption that my C++ code is probably incorrect, I get annoyed.
There are parts of the language which I think should be made into IB or defined behavior rather than being UB, but that’s a problem with the language and not the compiler IMO.
Optimise for the wrong thing, then blame the victim. I’m aware that’s the consensus among compiler writers.
It’s revealed preference. People who are OK with a bit lower perf already use safe languages for the most part, while many many C and C++ people care greatly about every last percentage point. Not all, certainly, but these advanced optimization passes wouldn’t be getting written unless there was a lot of demand for them.
It’s revealed preference. People who are OK with a bit lower perf already use safe languages for the most part
Within an application, not all code paths are equally fast and it’s generally not straightforward to use one language for the performance-critical code and another language elsewhere. Changing languages doesn’t feel like a reasonable solution IMHO.
I should not have to memorize the 100’s of different flags and the way they interoperate and which forms of UB they risk me causing in order to write my code. That way lies madness. It’s a developer ergonomics disaster and is the cause of who knows how many bugs, security flaws, and performance issues. It is almost impossible to “hold the C++ compiler” correctly right now. Making that better would do wonders for C++ usage, performance, and safety.
should not have to memorize the 100’s of different flags
The fact that you weren’t able to make this argument without resorting to a ridiculous level of exaggeration is telling. There are not 100’s of different flags required to relax the language rules that people typically get upset about; there’s a handful. You do not have to memorize them. They do not risk you causing UB, they make certain UB become implementation-defined. Nothing you said is correct except perhaps, partially, the very last sentence.
It may be some exaggeration, but I don’t think it’s ridiculous levels: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html. There are more than a handful of options there and how they interact with UB is not clearly understood by most people.
The post you replied to suggested using exactly one option. The suggested option, -fwrapv, is a straightforward well-documented option which can only ever turn UB into an implementation-defined behaviour regardless of what other options it is used together with. There is no need to understand all the options that the compiler provides and how they interact with UB in order to use a select few well-known options that reduce the amount of UB for the cases people care most about.
Using -fwrapv -fno-strict-aliasing together is enough to eliminate the two biggest “this shouldn’t be UB” complaints. Neither of them can introduce UB.
I feel like you are sort of making my point for me. C and C++ have terrible defaults. I shouldn’t have to opt-in to those safety features. If I talk to 10 different C++ devs they will have 10 different recommendations for what flags should and shouldn’t be set. I stand by my claim that both languages are quite hostile to the developer.
Your reaction to someone suggesting a single commonly-used flag, which can only remove UB, was “Absolutely not” and decrying that you shouldn’t have to memorize “100’s of different flags” with an implication that they might introduce UB. If the language is hostile, then belligerently refusing any simple step to mitigate that hostility is bonkers.
If I talk to 10 different C++ devs they will have 10 different recommendations for what flags should and shouldn’t be set.
I doubt it, but if you put the question that broadly who cares? Ask 10 good C++ developers instead which flags you should use to reduce the risk of UB in your code, and 9/10 will say “-fwrapv -fno-strict-aliasing”. (I made up the numbers just like you did).
My reaction was not to the single flag. It was the larger context that making safety the default and better defining behaviour should be resisted. The flag was just a single example of what I consider the problem to be.
But only one flag was suggested in the post you replied to. The reason you gave for not using the flag was that “I should not have to memorize the 100’s of different flags and the way they interoperate and which forms of UB they risk me causing in order to write my code”. Again: nobody suggested you should have to memorize 100’s of different flags, and nobody said anything at all about flags that introduce risk of UB. If you had a point to make, it was lost in the hyperbole.
For what it’s worth, I agree that the existence of (one or several) compiler options to curtail UB is not by itself a reason not to reduce UB the same way in the language definition. But I think you missed the point in the post you replied to: the default is in fact quite safe, but when we use “-O2” or “-O3”, which specifically tell the compiler to perform certain optimisations, we shouldn’t complain when it does so. You’re already changing the defaults, so adding a couple of “-fwrapv”-type options is hardly worth any drama.
Defining previously undefined behaviour is always backwards compatible.
(At least in theory. Of course in practice compilers are going to exhibit some behaviour, that people may rely on even though everyone is yelling at them not to.)
I would love to see a -Wundefined-behavior option to warn me of undefined behavior in my code. It should be possible because compiler writers are already detecting undefined behavior to take advantage of it. It should warn me of NULL checks being removed because of an earlier dereference (actual issue in the Linux kernel some years back), or a 32-bit unsigned integer in a for loop where 64-bit integer would be preferable, etc.
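A minimal sketch of that kernel-style pattern (the struct and function names are hypothetical):

```c
#include <stddef.h>

struct dev { int status; };

/* The pattern behind the Linux kernel incident mentioned above: the
 * dereference happens before the NULL check, so the compiler may infer
 * that `d` cannot be NULL and silently delete the check. */
static int read_status(struct dev *d) {
    int s = d->status;   /* UB if d == NULL ... */
    if (d == NULL)       /* ... so this branch may be removed */
        return -1;
    return s;
}

/* Safe ordering: check before dereferencing. */
static int read_status_safe(struct dev *d) {
    if (d == NULL)
        return -1;
    return d->status;
}
```

A -Wundefined-behavior style warning would ideally flag the first version at the point where the check becomes dead.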
It should be possible because compiler writers are already detecting undefined behavior to take advantage of it
That’s not how compilers work. There are no specific checks for “this is dereferencing this pointer, so it can’t be null, remove the null check”. Most compilers perform an iterative process, lowering the intermediate representation step by step and applying optimizations that are correct in the given context. The compiler has no knowledge of the whole; UB at one point may inadvertently make another optimization possible in a much later step, simply because compilers have to make assumptions that are always correct to be worth anything.
If “the original intention of undefined behaviour was quite clearly to support different platforms when they behave differently, without sacrificing performance”, then what was the difference between undefined behaviour and implementation-defined behaviour?
Things like supporting both two’s complement machines and one’s complement machines and all sorts of weird integer overflow behaviour could’ve been done by making signed integer overflow implementation-defined behaviour. Yet it was made undefined. In your model, why would that be?
“More behaviour should be made implementation-defined or completely defined instead of undefined” is a perfectly valid opinion to have, but I don’t buy the “UB was originally just meant to let compilers select appropriate instructions for the target hardware” narrative.
And if signed overflow was IB, an implementation could define that signed integer overflow traps, whatever trapping means on the particular hardware/OS/runtime.
Unfortunately the C and C++ standards do not define trapping, so in practice all traps happen under UB. I would very much like to see something like “frobnicate(0) returns an implementation-defined value or traps”.
If the standard text is written as “returns an implementation defined value” then yeah, trapping falls outside of the scope of that. But if the standard is written as “the behavior is implementation defined when frobnicate is called with 0 as its argument”, surely trapping would be a valid behavior?
It’s not like this kind of wording is unheard of in the standard. For example, C17 5.1.2.1 says:
The effect of program termination in a freestanding environment is implementation-defined.
7.21.4.1 says:
If the file is open, the behavior of the remove function is implementation-defined.
(This is almost identical to the “the behavior of frobnicate(0) is implementation-defined” example)
And 6.10.6 point 1 says:
A preprocessing directive of the form
# pragma pp-tokens(opt) new-line

where the preprocessing token STDC does not immediately follow pragma in the directive (prior to any macro replacement) causes the implementation to behave in an implementation-defined manner.
And there’s a few other cases of this completely unconstrained “if …, the behavior is implementation defined” language.
You are right to point out though that the vast majority of uses of “implementation-defined” in the standard talks about how some value is implementation defined or otherwise highly constrains the room the implementation actually has to define wild behavior. That wouldn’t make language like “the behavior is implementation defined when frobnicate is called with 0 as its argument” inappropriate for the standard, but it might suggest something about the intention behind the “implementation-defined” mechanism, I don’t know. At the very least I learned something new, I thought much more IB was just the unconstrained “the behavior is implementation defined” style, so thanks for making me go through the standard to find examples.
That might impose an undue burden on the implementation to do things in a nice manner. Which is the secret of UB: a lot of it is there because there are a lot of really really really terrible compilers out there, whose vendors have the power of veto on language changes they think would be too annoying.
The only thing IB means is that the compiler must somehow define and document what happens when the situation is encountered. There is nothing about signed integer overflow on any hardware which would make it an undue burden for compilers to document what happens. Even if the behaviour is absolutely bonkers; there is no requirement that the behaviour defined by the implementation is “nice”, or stays consistent across versions.
So nothing about this changes the fact that if UB was just there to be able to support shitty compilers and weird hardware, as @Loup-Vaillant alleges, it would serve the same purpose as IB, so there would not have been a distinction between the two.
The only thing IB means is that the compiler must somehow define and document what happens when the situation is encountered.
It’s a bit more restrictive than that. In most cases IB means “such and such operation returns an implementation defined value”. Right shifting of negative integers for instance is implementation defined in the sense that it returns an implementation defined value. This disallows stuff like trapping.
Left shifting a negative integer however is plain UB. Which is why we sometimes write n * (1<<26) instead of just n<<26 in bignum arithmetic code. Yes, this is bonkers, but thankfully most compilers reduce the former to the latter.
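Concretely (the helper names are mine):

```c
/* Left-shifting a negative value is UB in C, so bignum code sometimes
 * writes the multiply instead; compilers typically emit the same shift
 * instruction for both when the value is known non-negative. */
static long scale_ub(long n) { return n << 26; }          /* UB for n < 0 */
static long scale_ok(long n) { return n * (1L << 26); }   /* defined, same result */
```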
While I think there’s a good amount of low hanging fruit UB in C++, I also think that it isn’t really possible to have a language like C or C++ without lots of horrible UB. The only way to not have horrible UB is to restrict the programmer a lot, or having some method of making sure many evil memory problems deterministically trap, without too much runtime overhead. If we all had CHERI, this would be easy. But on most hardware, I don’t think most people would accept the performance impact, unfortunately.
That is an extremely reductive take for -fwrapv. Signed integer overflow is undefined because it’s the correct answer. Unsigned integer overflow wraps because of backwards compatibility. There’s a very good reason Zig and Carbon make both undefined, but you seem to say code that benefits from commutative arithmetic optimizations is “badly written” without elaborating why.
The biggest issue here is that we don’t have a way to enable optimizations on unsigned integers in any major compiler. Anyways, the real low hanging fruit is type punning implicit-lifetime types in a union. Compilers let you do it, but because it’s undefined they forbid it in an immediate function context.
That is an extremely reductive take for -fwrapv. Signed integer is undefined because it’s the correct answer.
Is it? It goes contrary to the instinct of most programmers that their x86 machine is 2’s complement, and arithmetic should wrap around. Few programmers are even aware that signed integer overflow is UB, and those that are tend to think it will just wrap around. They don’t know it can in some cases be as bad as introducing critical remote code execution vulnerabilities.
Nobody expects arithmetic to overflow at all except in niche scenarios, and best practice is enabling compile time and run time warnings for that. For those niche cases, you opt in to wrapping in Zig on a per-operation basis.
If I write a loop that always adds to the loop induction variable and is conditioned on a less-than operation on that counter, I expect it to terminate (so does the compiler - that’s about the only place where compilers take advantage of this). Wrapping behaviour can make this untrue, which is very confusing.
Though this only happens when the loop induction variable is narrower than the other side of the less-than operation (either fewer bits, or signed when the other side is unsigned), or the increment is bigger than 1 and may overflow before we catch the end condition.
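A minimal illustration of the narrow-counter hazard, with a safety cap standing in for the actual infinite loop (the function name is mine):

```c
#include <stdint.h>
#include <stddef.h>

/* An 8-bit counter compared against a wider bound: after i == 255,
 * ++i wraps to 0 and `i < bound` holds again, so any bound above 255
 * makes this loop run forever. The `cap` parameter exists only so the
 * demonstration terminates. */
static unsigned iterations(uint8_t start, size_t bound, unsigned cap) {
    unsigned n = 0;
    for (uint8_t i = start; i < bound && n < cap; ++i)
        ++n;
    return n;
}
```

With a bound of 200 the loop terminates normally; with a bound of 300 it only stops because of the cap.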
I personally try to make sure my loop counter is at least as big as what it’s being compared to. Compilers could also issue a warning (as they already do with signed/unsigned comparisons). I don’t have a good solution for bigger increments, though I believe they’re rare enough that this is much less of an issue.
Finally, UB doesn’t look any better than wrapping around here. I guess there’s a better third alternative, though right now I’m not sure what it should look like.
Finally, UB doesn’t look any better than wrapping around here.
UB means that the optimiser can assume that the overflow doesn’t take place. It can then use scalar evolution to model loop nests. With overflow, it can’t.
My bad for suggesting wrapping dominates UB. There’s indeed a trade-off between the risks of UB and the performance loss of wrapping.
I still believe the compiler can see the absence of overflow if:
The only changing parameter in the end condition is the loop counter: i < const_size
The stuff it is being compared to is small enough to be represented in the same type as the loop counter.
The loop counter is incremented by a power of 2 (smaller than its own size). (Common case is 1.)
The loop counter starts at a multiple of this power of 2 (Common case is zero.)
This won’t cover all uses, but I believe it does cover most. It’s not perfect, but it’s probably good enough that I’d choose wrapping over UB. Actually I already do, my loop counters are all unsigned when they can be (generally size_t).
That might be possible, but I’m not sure how feasible it is to integrate with the scalar evolution infrastructure of most compilers. Those kinds of analyses are typically the results of scalar evolution, not the inputs.
I misread the title as “C++ Should Be ++C”, a position I agree with because as an expression C++ evaluates to C, which sends the wrong message. (I imagine this occurred to Bjarne, but either it was too late to change it, or he just thought “C++” looked better.)
Memory safety
Official documents discussing legislation against C++ due to its “memory unsafety” have caused community uproar.
…
Where memory safety is a serious concern, we see the adoption of Rust for critical components. Yet we see little demand from even these developers for C++ safety features. Their problem is already solved.
The direction group states “no language can be everything for everybody”[20] and I cannot agree more. Rust and other languages are successfully filling engineering needs for memory safety guarantees in critical components. This is not a space our users are demanding us to go into and doing so risks both failure and, yes, even more complexity.
Have a question for the folks that may be more familiar with this topic,
In terms of the C++ committee members, is that the prevailing view:
“This is not a space our users are demanding us to go into and doing so risks both failure and, yes, even more complexity.”
I think that section is a weak point of the paper. Specifically:
When most C++ developers haven’t adopted tools like Coverity and C++ core guidelines checkers, it is hard to claim that memory safety features substantially improve their lives at least from their point of view.
An alternate interpretation is that the cost/benefit of those tools isn’t right, or there are implementation quality or other issues.
I would very much agree with that suggestion, and also that the ergonomics of Rust’s unique ownership model for everything safe fits in the same space (which is why we’re still seeing a lot more new C++ than Rust being written). What developers want is to not think about memory safety and to not have any performance overhead from correct memory safety.
Coverity has far too many false positives. I’ve only used it a few times and I gave up each time because it generated hundreds of warnings in older codebases and every single one of the ones I took the time to work through was a false positive. Maybe ten of them were real bugs, but the effort in finding them was too high. It’s much easier to adopt these things in new projects (we use the clang static analyser in CI on most new projects I work on) because you get maybe one or two false positives per commit and it’s easy to fix them at that point.
The C++ core guidelines have a bunch of things that are now just normal practice (e.g. no bare pointers) but other things that are hard to retrofit to existing code and which constrain data structures. You get far more performance from good data structure design than you do from micro optimisation, and so adopting the core guidelines hurts a lot more at the macro scale.
Rust requires a complete rewrite (so even worse for incremental adoption) and constrains data structure design. It’s still very niche, though in places where the ideal data structures are representable it can be a huge win.
I’ve had very positive feedback from folks that have tried CHERIoT C++. It looks and feels like normal C++. You can write tiny code for resource-constrained systems. You get all of the power of C++’s rich metaprogramming functionality. Out of bounds accesses and use after free deterministically trap, so you can find them in testing and they’re not exploitable in production. If you could give me that at compile time, I’d be ecstatic.
Does it? I can think of a few projects that have incrementally moved to Rust. Though it seems easier for C projects than for C++ projects.
You can rewrite a subsystem in Rust, but incremental rewriting is hard. You typically want to expose Rust data structures at interface boundaries, along with Rust lifetimes. If you are exposing C interfaces then there’s a lot of scope for getting things wrong. Rust defines all FFI as unsafe for a good reason: the Rust compiler will assume that all code on the other side of the FFI boundary is following the rules. This makes it very easy to introduce security vulnerabilities if you’re not careful when rewriting something smaller than a self-contained subsystem.
This is much worse if you’re moving from C++ than C, because C++ objects have their own requirements around lifetimes, deterministic destruction, and so on, which are not quite the same as the Rust ones. The state of Rust <-> C++ interop is a lot better than it was a couple of years ago (you no longer have to just go via C) but it’s still not ideal. C++ templates don’t quite map to Rust generics (or vice versa).
You can write Rust staticlibs or even, if using llvm, more directly compile them together through the magic of IR. A bigger issue is it gets tricky to match idioms on both sides of the language boundary. For example, template metaprogramming is very very powerful but also very specific to C++. You can’t easily replicate it in other langs. It’s not a blocker necessarily but it does mean you may need some shim code on both sides unless you’re comfortable with a C-like interface rather than using more idiomatic C++.
I wonder if the C++ committee could do something similar to what was done with constexpr, which was extended progressively. It would be more difficult with safety, but I can imagine this feature could be implemented with two new keywords (let’s call them “safe” and “assumed_safe”) for functions/variables and a type annotation for lifetime (maybe available only in “safe” parts of the code).
Non-annotated code would be able to call all code; “safe” code would be allowed to call only “safe”/“assumed_safe” code. “assumed_safe” would be the way to bypass the checker for a limited amount of code.
I’m not sure that would work. Safety isn’t something that you add to a language, it’s a property that you get by removing things. All of the things that safe Rust permits are also permitted in C++, the problem is that a lot of other things are also permitted that lead to problems.
I guess you could incrementally remove things in functions not marked as unsafe (which is how Rust spells ‘assume safe’), but that makes each step a potentially breaking change.
Yeah memory safety in C++ is definitely not “already solved” by writing some components in Rust nor by existing tooling. As mentioned by others, static analysis can be so noisy it’s not useful in practice. We have nightly Valgrind runs and automatic crash reporting and take their results very seriously. It’s not sufficient but at least something.
I find it interesting that the three concrete examples at the end are in effect (standard) library level changes. To me as a reader unfamiliar with the author and the general C++ community, it seems that they’re pretty happy with the core language as is. It’s often hard to move from that point onwards because the value gained from more drastic changes doesn’t stack up against the cost.
They’re also places where there’s a lot of good ecosystem support. There are a lot of different C++ container libraries. For JSON, there are several alternatives but most people seem to converge around nlohmann JSON, which is very flexible. If anything, the thing that this really needs is better compile-time reflection support in the language, which is something that WG21 is working on. Similarly, CLI11 is the de-facto standard for command-line option parsing. And, again, it would benefit from better introspection (for example, with something like magic_enum in the standard, we’d be able to define flags as enums, rather than needing to provide the names of the flags separately for code and the CLI). I don’t really see any benefit in adding either of these to the standard library.
What would make all these more useful is a package manager. It wouldn’t matter so much whether JSON parsing were in std, if you could slurp in a 3rd party library with an “import” statement like you can in, say, Go.
Unfortunately for that you first need modules, and the C++20 module feature looks like a real mess to me — I wanted to use it a few months ago, but reading through docs and examples was too confusing. (But that seems to be largely a tools problem, so maybe it’ll be fixed without requiring language changes?)
All of these are in vcpkg and Conan, and can also easily be used as submodules. They’re also packaged in most distributions and BSDs (including Homebrew on macOS) if you prefer to link dynamically.
The best way to actually improve people’s lives as a C++ committee, I think, is to define more behaviour.
Both in scope and in reach. The original intention of undefined behaviour was quite clearly to support different platforms when they behave differently, without sacrificing performance. It has since expanded to allow (sometimes explicitly so) some performance optimisations. The problem is that now compilers are behaving like a sentient adversary, inserting vulnerabilities at the slightest UB even when the underlying platform could have behaved sensibly.
The most infuriating instance of course would be signed integer overflow. 2’s complement has won, no platform in current use is using anything else. And yet, -fwrapv still isn’t the default. Why? Because it would hurt the performance of some badly written code. There’s also strict aliasing, which managed to decrease the memory safety of C and C++ (if that was even possible), introducing bugs and vulnerabilities to otherwise perfectly valid code. At least this one allows even more potent optimisations. Still, wouldn’t restrict have been enough?
Now I do see an obstacle to defining more behaviour (or at least moving it to unspecified, or making traps part of defined behaviour):
[…] we can substantially improve the performance, and environmental impact, of newly written code
I believe this is “we” as “we the committee”. They feel responsible for the environmental impact of people’s code, and they feel obligated to serve us with the maximum possible performance in so many cases. I think this is misguided. Instead they should help programmers write more performant code when they need to.
Take signed integer overflow for instance: sure it hurts the performance of some loops, but any programmer who cares will correct their loops and use correctly signed and correctly sized integers to recover the speed. The benefits will be fewer undefined behaviours of course, but also the ability to check for overflow after the fact, which is sometimes faster than the alternative.
For better or worse (in my opinion mostly for worse), C++ is here to stay. I believe the best way to help people is to reduce the harm it causes. Thus, define more behaviour.
I will push back against this take every time.
GCC and Clang don’t try to optimise within the rules of the language standard because they are some kind of malevolent agent; they do it because that’s what we ask for when we pass -O2. We tend to assume maximum performance is the default and then blame our tools when it’s not what we wanted. If you want -fwrapv, use -fwrapv.
I don’t care if they’re malevolent. They still act malevolent. The malevolence I speak of sometimes also happens under -O0.
Optimise for the wrong thing, then blame the victim. I’m aware that’s the consensus among compiler writers.
I can’t. I’m writing a library, and do not have full control over compilation options, or even the exact compiler being used. If I require -fwrapv the critical CVEs will start piling up pretty quickly.
This is such nonsense. I’ve seen this sort of claim made before and it always irks me. “Malevolent” (or “malice”) ascribes intent. I’m sure you recognise that the compiler doesn’t and can’t actually have any ill-will to you or anyone else, so the claim boils down to “it acts the same as it would as if it were malevolent” (or in your words above, “they still act malevolent”).
Which is also completely wrong, and obviously so since a great deal of software manages to work perfectly fine despite being compiled with “malevolent” compilers. A truly malevolent compiler would of course break everything or, for example, silently insert security vulnerabilities into all programs regardless of whether their code is correct or not.
At this point the more ardent “malevolent compiler” club members will probably start frothing about how “that’s exactly what the compiler DOES!”, even though that’s demonstrably wrong, and then the argument usually devolves into a back-and-forth about what the C (or C++) language actually requires and so on, which is an interesting discussion the first couple of times but is also never really fruitful, perhaps because the same people arguing that the language standard means what they personally think it should mean are the same people who tend to make fallacious logical leaps (like “the compiler made my code do something I didn’t want it to and it’s therefore malevolent, because clearly it could never be my own fault.”).
So what we’re left with if we strip all the ridiculous exaggeration away is that “compilers sometimes behave in a way that is indistinguishable from malice”. Which is perhaps true in a limited sense, if you don’t understand how compilers work, and you define “malice” as “makes my program behave in a way that is different to what I wanted, even though its behaviour is technically undefined”. It is not true otherwise.
If you want the compiler to interpret your undefined-behaviour-filled code in some particular way there are a plethora of options it provides to do so. Just use them.
And argue, as you reasonably did at first, that some of that behaviour should really be defined by the language. There are plenty of good arguments you can make there. Alluding to malice or malevolence in the behaviour of compilers, though, is plain wrong.
To be more precise, my claim is that a sufficiently advanced optimising compiler is indistinguishable from a sentient adversary, which may insert vulnerabilities whenever your program exhibits some kind of UB. It won’t do so every time of course, but it happens often enough, and unpredictably enough, that our only reasonable choice is to hunt down every single UB and correct it.
Now in reality I don’t mind such an adversary, but I do want the odds stacked in my favour: I need a language where UB is easy enough to avoid in the first place.
That’s like saying that when I stub my toe on a random rock that it’s indistinguishable from stubbing my toe on a rock that was placed there by a sentient adversary (so that I would stub my toe on it). It’s trivially true, but also a pointless observation; that you make the “compilers act as sentient adversaries” claim then trivialise it in this way when challenged just looks like moving the goal posts.
The practical reasonable choice is that we do our best to avoid UB, mitigate its effects where possible, and otherwise address UB bugs when they become an issue. Most UB bugs probably don’t introduce vulnerabilities. I’d even surmise (admittedly without data to prove it) that most UB bugs that introduce vulnerabilities are buffer overflows, which you can’t trivially update the language requirements to fix, and that of the other UB-related bugs most are integer-overflow vulnerabilities that are actually exploiting 2’s complement wraparound (so that simply defining the overflow behavior as wrapping wouldn’t resolve the vulnerability anyway). Then there is of course the “programmer didn’t understand how to correctly check for overflow, so the compiler removed the check” variety; that’s about the only kind that tightening the language semantics really helps with.
“Our only reasonable choice is to hunt down every single UB and correct it” is nonsense. For all that UB “can do anything”, the reality is that most UB-involving vulnerabilities boil down to a small selection of different bug types, some of which can be mitigated by compiler options, and nearly all of which can be mitigated by more general techniques if we have the desire (OpenBSD for example employs a raft of defenses which make even a buffer overflow difficult to exploit). These are reasonable choices. And of course we also have the perfectly reasonable choice of using a different, memory-safe, language. C++ might be made “safer” but it’s never going to be completely free of UB, so if you really believe that “zero UB” is the only acceptable level then C++ isn’t the language for you even if they do remove some of the more common tripwires.
Key words being “most”, and “probably”. I can’t rely on that, especially if I expect the compiler to be updated.
As a library author I don’t control compiler options.
As a library author I don’t control the execution environment.
So what? You not being able to control the compiler options, or the execution environment, doesn’t mean the compiler is “acting malevolent”, nor does the fact that a newer version of a compiler might theoretically turn a UB-based bug into a vulnerability. Nor does it mean you are solely responsible for what someone else chooses to do with your library, including how they choose to compile it, or that they chose to use it in the first place, or that they chose to use (your) library written in a language which is known to have undefined behaviour.
Nothing to do with malevolence. I’m just saying your proposed mitigations are not available to me. People will update their compiler, and blame me if it causes problems. People will use flags I recommend against, and blame me if it causes problems.
In my eyes people blaming you when they shouldn’t is largely the same as you incorrectly ascribing malevolence to the compiler.
Shouldn’t they, really?
In principle, if I say my code is only meant for compiler X, version Y, kernel Z, with flags F, many are not going to read the manual, and will try to use it elsewhere anyway. There’s little point blaming them. My only choice is to either cater to them, or sacrifice them. In this case I would try to put in more substantial controls, like some non-portable #ifdef and #error.
Those controls though aren’t available to me either, because I’m writing a portable library (I know, I’m making my life difficult. But it paid off, I now have quite a few users).
The mitigations I proposed are available to you - you can insist that your library is compiled with certain options, for example, and then, yes, it is definitely the case that people who don’t use those options shouldn’t blame you if a bug in your library turns into a vulnerability (when it otherwise wouldn’t have); the responsibility does rest at least partially on the end user themselves for choosing to compile without safe options (even if you don’t insist that they do), and also partly on the language and - yes - the compiler. But that still doesn’t mean the compiler is malevolent. And if people do blame you anyway, it doesn’t make any difference to that.
“Our only reasonable choice is to hunt down every single UB and correct it” is still wrong. There are mitigations. You can insist they be used. Your library consumers can choose to use them. Maybe you think it’s your only choice to hunt down every single UB bug and correct it, but that’s not what you said, and I don’t agree anyway. It is, as I said, pretty much impossible to do that even with significant tweaks to make C++ safer, or equivalent tweaks in compiler defaults, because of buffer overruns (and dangling references, and similar bugs which have no such simple remedy as “just defining” the behaviour).
I have two problems: user stupidity, and portability goals.
I can to some extent punt on user stupidity, write the proper stern warnings everywhere appropriate, and blame users when something wrong happens. And if I’m being honest I already do to some extent: almost all my functions accept pointers, and I never check for NULL. Heck, most of my functions don’t even return an error code, instead they rely on the user to provide correct inputs. It’s an explicit design choice to reduce the amount of error handling, not only on my part, but on the user’s as well.
I could, if I primarily shipped binaries, rely on -fwrapv and -fno-strict-aliasing. To a lesser extent I could also ask my users to activate those flags, and provide a default makefile that does exactly that. I do think however that this is less socially acceptable than failing to check user input.
Portability however is a bear. I’m targeting the intersection between C99 and C++, on a wide range of targets, from 16-bit microcontrollers to 64-bit server powerhouses. I expect users will compile my code with GCC, Clang, MSVC, ICC, TCC, CompCert, and stuff I haven’t even heard of. I have actual users on various platforms. I have no idea what the effects of UB might actually be, so my only choice is to systematically avoid it, and stick to strictly conforming code.
I agree: case in point, the TIS interpreter, which is guaranteed to catch most classes of UB (if not all of them), runs orders of magnitude slower than native code. Tests that take 10 seconds to run require 35 hours with it. Even sanitisers, which don’t catch all issues, are significantly slower than normal code (they remain fast enough to be bloody useful on debug builds though).
So I won’t advocate that we define all behaviour in C++, or even most of it. Just some, wherever reasonable.
I agreed with this from the start. Whatever your perceived need for eliminating all UB in your software (it’s not an ignoble goal, I also aspire to this), it doesn’t justify calling compilers “malevolent” or saying that they “act malevolent”.
IDK, if you keep stubbing your toe on rocks maybe it makes sense to move them away from the path. It doesn’t really matter if they were put there maliciously, if they are causing trouble then it makes sense to move them.
This is exactly my point. I don’t need to invent a malevolent adversary in order to avoid the rocks.
Under which situations are compilers acting “malevolent” at -O0?
Anyway, the way I see it, when I pass -O2, my deal with the compiler is: I write correct C++ code, it produces the fastest machine code it can on the assumption that the C++ code is correct. If the compiler tries to protect me from myself by producing needlessly slow code at -O2 based on the assumption that my C++ code is probably incorrect, I get annoyed.
There are parts of the language which I think should be made into IB or defined behavior rather than being UB, but that’s a problem with the language and not the compiler IMO.
It’s revealed preference. People who are OK with a bit lower perf already use safe languages for the most part, while many many C and C++ people care greatly about every last percentage point. Not all, certainly, but these advanced optimization passes wouldn’t be getting written unless there was a lot of demand for them.
Within an application, not all code paths are equally fast and it’s generally not straightforward to use one language for the performance-critical code and another language elsewhere. Changing languages doesn’t feel like a reasonable solution IMHO.
No. Absolutely not.
I should not have to memorize the 100’s of different flags and the way they interoperate and which forms of UB they risk me causing in order to write my code. That way lies madness. It’s a developer ergonomics disaster and is the cause of who knows how many bugs, security flaws, and performance issues. It is almost impossible to “hold the C++ compiler” correctly right now. Making that better would do wonders for C++ usage, performance, and safety.
The fact that you weren’t able to make this argument without resorting to a ridiculous level of exaggeration is telling. There are not 100’s of different flags required to relax the language rules that people typically get upset about; there’s a handful. You do not have to memorize them. They do not risk you causing UB, they make certain UB become implementation-defined. Nothing you said is correct except perhaps, partially, the very last sentence.
While it may be some exaggeration, I don’t think it’s ridiculous levels: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html. There are more than a handful of options there and how they interact with UB is not clearly understood by most people.
The post you replied to suggested using exactly one option. The suggested option, -fwrapv, is a straightforward, well-documented option which can only ever turn UB into implementation-defined behaviour, regardless of what other options it is used together with. There is no need to understand all the options the compiler provides and how they interact with UB in order to use a select few well-known options that reduce the amount of UB for the cases people care most about. Using -fwrapv -fno-strict-aliasing together is enough to eliminate the two biggest “this shouldn’t be UB” complaints. Neither of them can introduce UB.
I feel like you are sort of making my point for me. C and C++ have terrible defaults. I shouldn’t have to opt in to those safety features. If I talk to 10 different C++ devs they will have 10 different recommendations for what flags should and shouldn’t be set. I stand by my claim that both languages are quite hostile to the developer.
Your reaction to someone suggesting a single commonly-used flag, which can only remove UB, was “Absolutely not” and decrying that you shouldn’t have to memorize “100’s of different flags” with an implication that they might introduce UB. If the language is hostile, then belligerently refusing any simple step to mitigate that hostility is bonkers.
I doubt it, but if you put the question that broadly, who cares? Ask 10 good C++ developers instead which flags you should use to reduce the risk of UB in your code, and 9/10 will say “-fwrapv -fno-strict-aliasing”. (I made up the numbers just like you did.)
My reaction was not to the single flag. It was the larger context that making safety the default and better defining behaviour should be resisted. The flag was just a single example of what I consider the problem to be.
But only one flag was suggested in the post you replied to. The reason you gave for not using the flag was that “I should not have to memorize the 100’s of different flags and the way they interoperate and which forms of UB they risk me causing in order to write my code”. Again: nobody suggested you should have to memorize 100’s of different flags, and nobody said anything at all about flags that introduce risk of UB. If you had a point to make, it was lost in the hyperbole.
For what it’s worth, I agree that the existence of (one or several) compiler options to curtail UB is not by itself a reason not to reduce UB in the same way in the language definition. But I think you missed the point in the post you replied to: the default is in fact quite safe, but when we use “-O2” or “-O3”, which specifically tell the compiler to perform certain optimisations, we shouldn’t complain when it does so. You’re already changing the defaults, so adding a couple of “-fwrapv”-type options is hardly worth any drama.
All of that is on the language, and due to backwards compatibility not much of it can be resolved. Blaming the compiler is just stupid.
Defining previously undefined behaviour is always backwards compatible.
(At least in theory. Of course in practice compilers are going to exhibit some behaviour, that people may rely on even though everyone is yelling at them not to.)
How can you do this and still have a library ecosystem?
If people don’t even agree on what basic semantics are defined or undefined, isn’t sharing code just inherently super dangerous?
Occam’s razor: No.
-fmerge-constants is another example of this.
Also, in C++ you have very fine-grained control over the semantics of optimizations in specific parts of the program using pragmas and attributes.
I would love to see a -Wundefined-behavior option to warn me of undefined behavior in my code. It should be possible because compiler writers are already detecting undefined behavior to take advantage of it. It should warn me of NULL checks being removed because of an earlier dereference (an actual issue in the Linux kernel some years back), or a 32-bit unsigned integer in a for loop where a 64-bit integer would be preferable, etc.
That’s not how compilers work. There are no specific checks along the lines of “this dereferences the pointer, so it can’t be null, remove the null check”. Most compilers run an iterative process, lowering the intermediate representation step by step and applying optimizations that are correct in the given context. The compiler has no knowledge of the whole: UB at one point may inadvertently make another optimization possible in a much later step, simply because compilers have to make assumptions that are always correct to be worth anything.
If “the original intention of undefined behaviour was quite clearly to support different platforms when they behave differently, without sacrificing performance”, then what was the difference between undefined behaviour and implementation-defined behaviour?
Things like supporting both two’s complement machines and one’s complement machines and all sorts of weird integer overflow behaviour could’ve been done by making signed integer implementation-defined behaviour. Yet it was made undefined. In your model, why would that be?
“More behaviour should be made implementation-defined or completely defined instead of undefined” is a perfectly valid opinion to have, but I don’t buy the “UB was originally just meant to let compilers select appropriate instructions for the target hardware” narrative.
On some machines, overflow always traps, which is a bit beyond overflow merely resulting in a different number.
And if signed overflow was IB, an implementation could define that signed integer overflow traps, whatever trapping means on the particular hardware/OS/runtime.
Unfortunately the C and C++ standards do not define trapping, so in practice all traps happen under UB. I would very much like to see something like “frobnicate(0) returns an implementation-defined value or traps”.
If the standard text is written as “returns an implementation defined value” then yeah, trapping falls outside of the scope of that. But if the standard is written as “the behavior is implementation defined when frobnicate is called with 0 as its argument”, surely trapping would be a valid behavior?
It’s not like this kind of wording is unheard of in the standard. For example, C17 5.1.2.1 says:
7.21.4.1 says:
(This is almost identical to the “the behavior of frobnicate(0) is implementation-defined” example)
And 6.10.6 point 1 says:
And there’s a few other cases of this completely unconstrained “if …, the behavior is implementation defined” language.
You are right to point out though that the vast majority of uses of “implementation-defined” in the standard talks about how some value is implementation defined or otherwise highly constrains the room the implementation actually has to define wild behavior. That wouldn’t make language like “the behavior is implementation defined when frobnicate is called with 0 as its argument” inappropriate for the standard, but it might suggest something about the intention behind the “implementation-defined” mechanism, I don’t know. At the very least I learned something new, I thought much more IB was just the unconstrained “the behavior is implementation defined” style, so thanks for making me go through the standard to find examples.
That might impose an undue burden on the implementation to do things in a nice manner. Which is the secret of UB: a lot of it is there because there are a lot of really, really terrible compilers out there, which have the power of veto on language changes they think would be too annoying.
The only thing IB means is that the compiler must somehow define and document what happens when the situation is encountered. There is nothing about signed integer overflow on any hardware which would make it an undue burden for compilers to document what happens. Even if the behaviour is absolutely bonkers; there is no requirement that the behaviour defined by the implementation is “nice”, or stays consistent across versions.
So nothing about this changes the fact that if UB was just there to be able to support shitty compilers and weird hardware, as @Loup-Vaillant alleges, it would serve the same purpose as IB, so there would not have been a distinction between the two.
It’s a bit more restrictive than that. In most cases IB means “such and such operation returns an implementation defined value”. Right shifting of negative integers for instance is implementation defined in the sense that it returns an implementation defined value. This disallows stuff like trapping.
Left shifting a negative integer however is plain UB. Which is why we sometimes write n * (1<<26) instead of just n<<26 in bignum arithmetic code. Yes, this is bonkers, but thankfully most compilers reduce the first into the second.
While I think there’s a good amount of low-hanging-fruit UB in C++, I also think that it isn’t really possible to have a language like C or C++ without lots of horrible UB. The only way to not have horrible UB is to restrict the programmer a lot, or to have some method of making sure many evil memory problems deterministically trap, without too much runtime overhead. If we all had CHERI, this would be easy. But on most hardware, I don’t think most people would accept the performance impact, unfortunately.
That is an extremely reductive take for -fwrapv. Signed integer overflow is undefined because it’s the correct answer. Unsigned integer overflow wraps because of backwards compatibility. There’s a very good reason Zig and Carbon make both undefined, but you seem to say code that benefits from commutative arithmetic optimizations is “badly written” without elaborating why. The biggest issue here is that we don’t have a way to enable those optimizations on unsigned integers in any major compiler. Anyway, the real low-hanging fruit is type punning implicit-lifetime types in a union: compilers let you do it, but because it’s undefined they forbid it in an immediate function context.

Is it? It goes contrary to the instinct of most programmers that their x86 machine is 2’s complement, and arithmetic should wrap around. Few programmers are even aware that signed integer overflow is UB, and those who are tend to think it will just wrap around. They don’t know it can in some cases be as bad as introducing critical remote code execution vulnerabilities.
What you call “correct answer”, I call astonishing user interface.
Nobody expects arithmetic to overflow at all except in niche scenarios, and best practice is enabling compile time and run time warnings for that. For those niche cases, you opt in to wrapping in Zig on a per-operation basis.
If I write a loop that always adds to the loop induction variable and is conditioned on a less-than operation on that counter, I expect it to terminate (so does the compiler - that’s about the only place where compilers take advantage of this). Wrapping behaviour can make this untrue, which is very confusing.
Though this only happens when the loop induction variable is narrower than the other side of the less-than operation (either fewer bits, or signed when the other side is unsigned), or the increment is bigger than 1 and may overflow before we catch the end condition.
I personally try to make sure my loop counter is at least as big as what it’s being compared to. Compilers could also issue a warning (as they already do with signed/unsigned comparisons). I don’t have a good solution for bigger increments, though I believe they’re rare enough that this is much less of an issue.
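As a minimal sketch of that rule (identifiers are illustrative, not from any real codebase): keep the counter at least as wide as the bound it is compared against. With a std::uint8_t counter and a container of more than 255 elements, wrapping would make the condition true forever; std::size_t matches the width of size() and cannot wrap here:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counter is std::size_t, the same width as v.size(), so it can
// always reach the bound and the loop provably terminates even
// under wrapping semantics. A narrower counter (e.g. uint8_t) with
// v.size() > 255 would wrap and loop forever.
long sum_all(const std::vector<int>& v) {
    long total = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        total += v[i];
    return total;
}
```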
Finally, UB doesn’t look any better than wrapping around here. I guess there’s a better third alternative, though right now I’m not sure what it should look like.
UB means that the optimiser can assume that the overflow doesn’t take place. It can then use scalar evolution to model loop nests. With overflow, it can’t.
My bad for suggesting wrapping dominates UB. There’s indeed a trade-off between the risks of UB and the performance loss of wrapping.
I still believe the compiler can see the absence of overflow when the loop condition is i < const_size. This won’t cover all uses, but I believe it does cover most. It’s not perfect, but it’s probably good enough that I’d choose wrapping over UB. Actually I already do: my loop counters are all unsigned when they can be (generally size_t).

That might be possible, but I’m not sure how feasible it is to integrate with the scalar evolution infrastructure of most compilers. Those kinds of analyses are typically the results of scalar evolution, not the inputs.
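The i < const_size case above can be sketched as follows (kN stands in for const_size and is purely illustrative): with a compile-time constant bound below the counter’s maximum, the counter provably never wraps even under wrapping unsigned semantics, so a trip-count analysis in the style of scalar evolution still has what it needs:

```cpp
#include <cassert>
#include <cstdint>

// Bound is a compile-time constant well below UINT32_MAX, so the
// compiler can see that i stays in [0, kN) and never wraps, even
// though unsigned arithmetic is defined to wrap in general.
constexpr std::uint32_t kN = 1000;

std::uint64_t triangular_sum() {
    std::uint64_t sum = 0;
    for (std::uint32_t i = 0; i < kN; ++i)
        sum += i;
    return sum;
}
```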
I misread the title as “C++ Should Be ++C”, a position I agree with because as an expression C++ evaluates to C, which sends the wrong message. (I imagine this occurred to Bjarne, but either it was too late to change it, or he just thought “C++” looked better.)
With regards to:

I have a question for the folks who may be more familiar with this topic: in terms of the C++ committee members, is that the prevailing view?
I think that section is a weak point of the paper. Specifically:
An alternate interpretation is that the cost/benefit of those tools isn’t right, or there are implementation quality or other issues.
I would very much agree with that suggestion, and also that the ergonomics of Rust’s unique-ownership-model-for-everything-safe fits in the same space (which is why we’re still seeing a lot more new C++ than Rust being written). What developers want is to not think about memory safety and to not have any performance overhead from correct memory safety.
Coverity has far too many false positives. I’ve only used it a few times and I gave up each time because it generated hundreds of warnings in older codebases, and nearly every one of the ones I took the time to work through was a false positive. Maybe ten of them were real bugs, but the effort in finding them was too high. It’s much easier to adopt these things in new projects (we use the clang static analyser in CI on most new projects I work on) because you get maybe one or two false positives per commit and it’s easy to fix them at that point.
The C++ core guidelines have a bunch of things that are now just normal practice (e.g. no bare pointers) but other things that are hard to retrofit to existing code and which constrain data structures. You get far more performance from good data structure design than you do from micro-optimisation, and so adopting the core guidelines hurts a lot more at the macro scale.
Rust requires a complete rewrite (so even worse for incremental adoption) and constrains data structure design. It’s still very niche, though in places where the ideal data structures are representable it can be a huge win.
I’ve had very positive feedback from folks that have tried CHERIoT C++. It looks and feels like normal C++. You can write tiny code for resource-constrained systems. You get all of the power of C++’s rich metaprogramming functionality. Out of bounds accesses and use after free deterministically trap, so you can find them in testing and they’re not exploitable in production. If you could give me that at compile time, I’d be ecstatic.
Does it? I can think of a few projects that have incrementally moved to Rust. Though it seems easier for C projects than for C++ projects.
You can rewrite a subsystem in rust, but incremental rewriting is hard. You typically want to expose Rust data structures at interface boundaries, along with Rust lifetimes. If you are exposing C interfaces then there’s a lot of scope for getting things wrong. Rust defines all FFI as unsafe for a good reason: the Rust compiler will assume that all code on the other side of the FFI boundary is following the rules. This makes it very easy to introduce security vulnerabilities if you’re not careful when rewriting something smaller than a self-contained subsystem.
This is much worse if you’re moving from C++ than C, because C++ objects have their own requirements around lifetimes, deterministic destruction, and so on, which are not quite the same as the Rust ones. The state of Rust <-> C++ interop is a lot better than it was a couple of years ago (you no longer have to just go via C) but it’s still not ideal. C++ templates don’t quite map to Rust generics (or vice versa).
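A minimal sketch of the kind of C ABI shim discussed above (the names and layout are hypothetical, not from any real project): the C++ side exposes a flat, C-compatible interface that a Rust extern "C" block could bind to, keeping templates, destructors, and lifetimes out of the boundary:

```cpp
#include <cstddef>
#include <cstdint>

// Only plain C types cross the boundary: no templates, no RAII types,
// no implicit lifetimes the other language would have to respect.
extern "C" {
    struct Buffer {
        const std::uint8_t* data;
        std::size_t len;
    };

    // A free function with C linkage; on the Rust side this would be
    // declared in an extern "C" block and every call would be unsafe.
    std::size_t buffer_checksum(Buffer b) {
        std::size_t sum = 0;
        for (std::size_t i = 0; i < b.len; ++i)
            sum += b.data[i];
        return sum;
    }
}
```

Everything idiomatic (templates, smart pointers, lifetimes) then lives behind the shim on each side, which is exactly the duplication cost the comment above describes.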
You can write Rust staticlibs or even, if using LLVM, more directly compile them together through the magic of IR. A bigger issue is that it gets tricky to match idioms on both sides of the language boundary. For example, template metaprogramming is very, very powerful but also very specific to C++. You can’t easily replicate it in other languages. It’s not necessarily a blocker, but it does mean you may need some shim code on both sides unless you’re comfortable with a C-like interface rather than more idiomatic C++.
I wonder if the C++ committee could do something similar to what was done with constexpr, which was extended progressively. It would be more difficult with safety, but I can imagine this feature being implemented with two new keywords (let’s call them “safe” and “assumed_safe”) for functions/variables, and a type annotation for lifetimes (maybe available only in “safe” parts of the code).
Non-annotated code would be able to call all code, “safe” code would be allowed to call only “safe”/“assumed_safe” code. “assumed_safe” would be the way to bypass the checker for limited amount of code.
I’m not sure that would work. Safety isn’t something that you add to a language, it’s a property that you get by removing things. All of the things that safe Rust permits are also permitted in C++, the problem is that a lot of other things are also permitted that lead to problems.
I guess you could incrementally remove things in functions not marked as unsafe (which is how Rust spells ‘assume safe’), but that makes each step a potentially breaking change.
Yeah memory safety in C++ is definitely not “already solved” by writing some components in Rust nor by existing tooling. As mentioned by others, static analysis can be so noisy it’s not useful in practice. We have nightly Valgrind runs and automatic crash reporting and take their results very seriously. It’s not sufficient but at least something.
I find it interesting that the three concrete examples at the end are in effect (standard) library level changes. To me as a reader unfamiliar with the author and the general C++ community, it seems that they’re pretty happy with the core language as is. It’s often hard to move from that point onwards because the value gained from more drastic changes doesn’t stack up against the cost.
They’re also places where there’s a lot of good ecosystem support. There are a lot of different C++ container libraries. For JSON, there are several alternatives but most people seem to converge around nlohmann JSON, which is very flexible. If anything, the thing that this really needs is better compile-time reflection support in the language, which is something that WG21 is working on. Similarly, CLI11 is the de-facto standard for command-line option parsing. And, again, it would benefit from better introspection (for example, with something like magic_enum in the standard, we’d be able to define flags as enums, rather than needing to provide the names of the flags separately for code and the CLI). I don’t really see any benefit in adding either of these to the standard library.
What would make all these more useful is a package manager. It wouldn’t matter so much whether JSON parsing were in std, if you could slurp in a 3rd party library with an “import” statement like you can in, say, Go.
Unfortunately for that you first need modules, and the C++20 module feature looks like a real mess to me — I wanted to use it a few months ago, but reading through docs and examples was too confusing. (But that seems to be largely a tools problem, so maybe it’ll be fixed without requiring language changes?)
All of these are in vcpkg and Conan, and can also easily be used as submodules. They’re also packaged in most distributions and BSDs (including Homebrew on macOS) if you prefer to dynamic link.