Threads for davmac

  1.  

    The title needs “…on the Itanium architecture” appended to it. Does anyone even care about Itanium anymore?

    1.  

      Itanium is perhaps the coolest architecture of all time.

      1.  

        It’s talking about the Itanium ABI, which is used (instantiated, with modifications) on a heap of architectures, including x86 and x86-64.

        1.  

          Interesting, I didn’t realize the Itanium ABI was supported on other processors.

          1.  

            To be clear, this is specifically the Itanium C++ ABI - it’s concerned with details such as when values are passed on the stack vs. in registers, how class objects are passed/returned, how vtables are structured, and also a bunch of details about what runtime support looks like. So you can take the non-processor-specific parts out and re-purpose it for another architecture pretty easily, and that’s exactly what has been done.

      1.  

        I’m generally an advocate of C++ but rather than “cute”, I found this implementation (of casting memcpy to a different type of function, which “works” based on knowledge of underlying ABI) kind of horrific. Surely this is the sort of thing that should be done with a compiler builtin, rather than relying on specific semantics for code that definitely invokes undefined behaviour.

        1. 1

          I was getting troubled by the slicing problem example with Complex and Quaternion and then it hit me: his inheritance hierarchy is upside-down, which is why this is a problem. A Quaternion is not a Complex and so the Quaternion class shouldn’t inherit publicly from Complex (if anything, it should be the other way around; and then you wouldn’t have the slicing problem anyway since you’re not adding fields in the subclass).

          This is not to say slicing is never a real problem, but I feel like the particular example is bad.

          1. 4

            What if there was a way to opt in to RAII? i.e., just like you can tell C++ to infer a variable’s type from the value assigned, you could tell Zig to “infer” the deferred cleanup to be run. The cleanup would then be written elsewhere in a destructor.

            What do people think? Would this be convenient? Or would it produce clashing coding styles—namely, to a greater extent than systems programming languages already allow and/or encourage?

            1. 16

              For me, the point of the article was that if it’s opt-in, it’s “wrong by default”. Leaving that aside, I feel like it’s a bad idea generally to have different “modes” for programming languages. If I’m looking at some piece of code, I have to remember what mode it works in to know if it’s correct; I have to be much more careful in copying chunks of code around, as well (sure, “shouldn’t do that”, but it happens).

              1. 1

                I have to be much more careful in copying chunks of code around…

                I mean, whether you’re using RAII or defer, any relevant block of code that isn’t purposefully obtuse will be contiguous. So I don’t think refactoring or moving code around should be an issue.

                For me, the point of the article was that if it’s opt-in, it’s “wrong by default”.

                As far as I can see, the whole point of system programming languages is that almost everything is opt-in. Rust is basically the sole exception. So I’m not sure I buy that this principle is generally applicable, at least not to the extent you’re agreeing with the article that, say, C++ does more than Zig to protect programmers from themselves.

                What do people think? Would this be convenient? Or would it produce clashing coding styles—namely, to a greater extent than systems programming languages already allow and/or encourage?

                I feel like it’s a bad idea generally to have different “modes” for programming languages.

                I personally try to avoid sweeping statements about whether a feature is subjectively good or bad. I’m in more of a utilitarian camp. Does a feature generally help or hurt in practice? Perhaps I should have stated that more explicitly in my initial question.

                1. 1

                  I mean, whether you’re using RAII or defer, any relevant block of code that isn’t purposefully obtuse will be contiguous. So I don’t think refactoring or moving code around should be an issue.

                  What I meant was, you need to be careful about moving code that is written for one mode into a context where the other mode is active.

                  … at least not to the extent you’re agreeing with the article that, say, C++ does more than Zig to protect programmers from themselves.

                  I didn’t say that I agreed with the article (but I also don’t think the article makes that claim).

            1. 15

              There are a bunch of good reasons for this, in no particular order:

              When you’re writing a program, unless you have a 100% accurate specification and formally verify your code, you will have bugs. You also have a finite amount of cognitive load that your brain can devote to avoiding bugs, so it’s a good idea to prioritise certain categories. Generally, bugs impact one or more of three categories, in (for most use cases) descending order of importance:

              1. Integrity
              2. Confidentiality
              3. Availability

              A bug that affects integrity is the worst kind because its effects can be felt back in time (corrupting state that you thought was safe). This is why Raskin’s first law (a program may not harm a user’s data or, through inaction, allow a human’s data to come to harm) is his first law. Whatever you do, you should avoid things that can cause data loss. This is why memory safety bugs are so bad: they place the entire program in an undefined state where any subsequent instruction that the CPU executes may corrupt the user’s data. Things like SQL injection fall into a similar category: they allow malicious or buggy inputs to corrupt state.

              Confidentiality may be almost as important in a lot of cases, but often the data that a program is operating on is of value only to the user and so leaking it doesn’t matter nearly as much as damaging it. In some defence applications the converse is true and it’s better to destroy the data than allow it to be leaked.

              Availability generally comes in last. The only exceptions tend to be safety-critical systems (if your car’s brakes fail to respond for 5 seconds, that’s much worse than your engine management system corrupting the mileage logs or leaking your position via a mobile channel, for example). For most desktop software, it’s a very distant third. If a program crashes without losing any data, and restarts quickly, I lose a few seconds of time but nothing else. macOS is designed so that the entire OS can crash without annoying the user too much. Almost every application supports sudden termination: it persists data to disk in the background and so the kernel can kill it if it runs out of memory. If the kernel panics then it typically takes a minute or two to reboot and come back to the original state.

              All of this means that a bug from not properly handling out-of-memory conditions is likely to have very low impact on the user. In contrast, it requires a huge amount of effort to get right. Everything that transitively allocates an object must handle failure. This is a huge burden on the programmer and if you get it wrong in one path then you may still see crashes from memory exhaustion.

              Next, there’s the question of what you do if memory is exhausted. As programs become more complicated, the subset of their behaviour that doesn’t require allocation becomes proportionally smaller. C++, for example, can throw an exception if operator new fails[1], but what do you do in those catch blocks? Any subsequent memory allocation is likely to fail, and so even communicating with the user in a GUI application may not be possible. The best you can do is write unsaved data to disk, but if you’re respecting Raskin’s first law then you did that as soon as possible and so doing it on memory exhaustion is not a great idea. Most embedded / kernel code works around this by pre-allocating things at the start of some operation so that it has a clear failure point and can abort the operation if allocation fails. That’s much harder to do in general-purpose code.
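
              To make the pre-allocation pattern concrete, here’s a rough C sketch (the operation and the names are invented, not from any particular codebase):

              #include <stdlib.h>

              /* Acquire everything up front so there is one clear failure point... */
              int do_operation(size_t n)
              {
                  char *in = malloc(n);
                  char *out = malloc(n);
                  if (in == NULL || out == NULL) {
                      free(in);            /* free(NULL) is a no-op */
                      free(out);
                      return -1;           /* abort the whole operation cleanly */
                  }
                  /* ...then do the actual work, which performs no further allocation. */
                  free(out);
                  free(in);
                  return 0;
              }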

              Closely related, the failure is (on modern operating systems that are not Windows) not tied to the point of allocation. Overcommit is a very important tactic for maximising the use of memory (memory that you’ve paid for but are not using is wasted). This means that malloc / new / whatever is not the point where you receive the out-of-memory notification. You receive it when you try to write to the memory, the OS takes a copy-on-write fault, and cannot allocate physical memory. This means that any store instruction may be the thing to trigger memory exhaustion (it often isn’t that bad, but on systems that do deduplication, it is exactly that bad). If you thought getting exception handling right for anything that calls new was hard, imagine how much harder it is if any store to memory needs correct exception handling.
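
              As a rough illustration of that (whether the malloc itself succeeds here depends entirely on the kernel’s overcommit policy, so treat this as a sketch):

              #include <stdio.h>
              #include <stdlib.h>
              #include <string.h>

              int main(void)
              {
                  size_t huge = (size_t)1 << 40;       /* 1 TiB */
                  char *p = malloc(huge);              /* may "succeed" under overcommit */
                  if (p == NULL) {
                      puts("malloc reported failure"); /* the easy, checkable case */
                      return 1;
                  }
                  /* The real failure point: each store can take a fault, and if the
                     OS cannot find a physical page you get the OOM killer rather
                     than an error return. */
                  memset(p, 1, huge);
                  puts("touched every page");
                  free(p);
                  return 0;
              }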

              Finally, and perhaps most importantly, there’s the question of where to build in reliability in a system. I think that the most important lesson from Erlang is that failure should be handled at the largest scale possible. Running out of memory is one possible cause of a program crashing. If you correctly handle it in every possible case, you probably still have other things that can cause the program to crash. In the best case, with formally verified code from a correct and complete specification, hardware failures can cause crashing. If you really want reliable systems then you should work on the assumption that the program can crash. Again, macOS does this well and provides very fast recovery paths. If a background app crashes on macOS, the window server keeps a copy of the window contents, the kernel restarts the app, which reconnects and reclaims the windows and draws back into them. The user probably doesn’t notice. In a server system, if you have multiple fault-tolerant replicas then you handle memory exhaustion (as long as it’s not triggered by allowing an attacker to allocate unbounded amounts of memory) in the same way that you handle any other failure: kill a replica and restart. The same mechanism protects you against large numbers of bug categories, including a blown fuse in the datacenter.

              All other things being equal, I would like programs to handle out of memory conditions gracefully but all other things are not equal and I’d much rather that they provided me with strong data integrity, data confidentiality, and could recover quickly from crashes.

              [1] Which, on every non-Windows platform, requires heap allocation. The Itanium ABI spec requires that the C++ runtime maintain a small pool of buffers that can be used but this has two additional problems. First, on a system that does overcommit, there’s no guarantee that the first use of those buffers won’t cause CoW faults and a SIGSEGV anyway. Second, there’s a finite pool of them and so in a multithreaded program some of the threads may be blocked waiting for others to complete error handling, and this may cause deadlock.

              1. 2

                C++, for example, can throw an exception if operator new fails[1], but what do you do in those catch blocks? Any subsequent memory allocation is likely to fail, and so even communicating with the user in a GUI application may not be possible

                This may or may not be the case depending on what you were doing inside the try. In the example of a particularly large allocation for a single operation, it’d be pretty straightforward to inform the user and abort the operation. For the case of the GUI needing (but not being able) to allocate, I’d suggest that good design would have all allocation needed for user interaction being done early (during application startup) so this doesn’t present as a problem, even if it’s only for critical interactions.
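
                In C terms that case is really just the following shape (a sketch; export_image and the sizes are made up):

                #include <stdio.h>
                #include <stdlib.h>

                /* One big allocation with a localised failure point: easy to report
                   and back out of without touching the rest of the program. */
                int export_image(size_t width, size_t height)
                {
                    size_t bytes = width * height * 4;   /* overflow check omitted */
                    unsigned char *pixels = malloc(bytes);
                    if (pixels == NULL) {
                        fprintf(stderr, "Not enough memory to export a %zux%zu image\n",
                                width, height);
                        return -1;   /* abort just this operation, keep running */
                    }
                    /* ... render and write the image ... */
                    free(pixels);
                    return 0;
                }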

                All other things being equal, I would like programs to handle out of memory conditions gracefully but all other things are not equal and I’d much rather that they provided me with strong data integrity, data confidentiality, and could recover quickly from crashes.

                Agreed, but it bothers me that the OS itself (and certain libraries, and certain languages) put blocks in the way to ever handling the conditions gracefully.

                Thanks for your comments.

                1. 1

                  This may or may not be the case depending on what you were doing inside the try. In the example of a particularly large allocation for a single operation, it’d be pretty straightforward to inform the user and abort the operation.

                  That’s definitely true but most code outside of embedded systems has a lot of small allocations. If one of these fails then you need to backtrack a lot. This is really hard to do.

                  Agreed, but it bothers me that the OS itself (and certain libraries, and certain languages) put blocks in the way to ever handling the conditions gracefully.

                  Apparently there’s been a lot of discussion about this topic in WG21. In the embedded space (including kernels), gracefully handling allocation failure is critical, but these environments typically disable exceptions and so can’t use the C++ standard interfaces anyway. Outside of the embedded space, there are no non-trivial C++ applications that handle allocation failure correctly in all cases, in spite of the fact that the standard was explicitly designed to make it possible.

                  Note that Windows was designed from the NT kernel on up to enable precisely this. NT has a policy of not making promises it can’t keep. When you ask the kernel for committed memory, it increments a count for your process representing ‘commit charge’. The total commit charge of all processes (and bits of the kernel) must add up to less than the available memory + swap. Requests to commit memory will fail if this limit is exceeded. Even stack allocations will probe and will throw exceptions on stack overrun. SEH doesn’t require any heap allocations and so can report out-of-memory conditions (it does require stack allocations, so I’m not quite sure what it does for those - I think there’s always one spare page for each stack) and all of the higher-level Windows APIs support graceful handling of allocation errors.
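
                  For what it’s worth, a minimal sketch of the commit-charge behaviour from user space (assuming a 64-bit Windows build; the 1 TiB figure is arbitrary):

                  #include <windows.h>
                  #include <stdio.h>

                  int main(void)
                  {
                      /* Ask NT to commit 1 TiB up front. Commit charge is accounted at
                         this point, so on a machine without that much RAM + pagefile the
                         call fails here with NULL instead of overcommitting. */
                      SIZE_T size = (SIZE_T)1 << 40;
                      void *p = VirtualAlloc(NULL, size, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
                      if (p == NULL) {
                          printf("commit refused, error %lu\n", (unsigned long)GetLastError());
                          return 1;
                      }
                      VirtualFree(p, 0, MEM_RELEASE);
                      return 0;
                  }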

                  With all of that in mind, have you seen evidence that Windows applications are more reliable or less likely to lose user data than their macOS counterparts?

                  1. 1

                    Outside of the embedded space, there are no non-trivial C++ applications that handle allocation failure correctly in all cases

                    I’ve written at least one that is supposed to do so, though it depends on your definition of “trivial” I guess. But anyway, “applications don’t do it” was one of the laments.

                    With all of that in mind, have you seen evidence that Windows applications are more reliable or less likely to lose user data than their macOS counterparts?

                    That’s a bit of a straw-man, though, isn’t it? Nobody’s claimed that properly handling allocation failure at the OS level will by itself make applications more reliable.

                    I understand that people don’t think the problem is worth solving (that was somewhat the point of the article) - I think it’s subjective though. Arguments that availability is less important than integrity for example aren’t news, and aren’t enough to change my mind (I’ll point out that the importance of availability doesn’t diminish to zero just because there are higher-priority concerns). Other things that are being brought up are just echoing things already expressed by the article itself - the growing complexity of applications, the difficulty of handling allocation failure correctly; I agree the problem is hard, but I lament that OS behaviour, library design choices and language design choices only serve to make it harder, and for instance that programming languages aren’t trying to tackle the problem better.

                    But, if you disagree, I’m not trying to convince you.

                    1. 1

                      I’ve written at least one that is supposed to do so

                      I took a very quick (less than one minute) skim of the code and I found this line, where you use a throwing variant of operator new, in a way that is not exception safe. On at least one of the call chains that reach it, you will hit an exception-handling block that doesn’t handle that failure and so will propagate it outwards.

                      It might be that you correctly handle allocation failure but a quick skim of the code suggests that you don’t. The only code that I’ve ever seen that does handle it correctly outside of the embedded space was written in Ada.

                      With all of that in mind, have you seen evidence that Windows applications are more reliable or less likely to lose user data than their macOS counterparts?

                      That’s a bit of a straw-man, though, isn’t it? Nobody’s claimed that properly handling allocation failure at the OS level will by itself make applications more reliable.

                      No, I’m claiming the exact opposite: that making people think about and handle allocation failure increases cognitive load and makes them more likely to introduce other bugs.

                      1. 1

                        I took a very quick (less than one minute) skim of the code and I found this line,

                        That’s in a utility that was just added to the code base, is still a work in progress, and the “new” happens on the setup path where termination on failure is appropriate (though, yes, it would be better to output an appropriate response rather than let it propagate right through and terminate via “unhandled exception”). The daemon itself - the main program in the repository - is, as I said, supposed to be resilient to allocation failure; if you want to skim anything to check what I’ve said, you should skim that.

                        No, I’m claiming the exact opposite: that making people think about and handle allocation failure increases cognitive load and makes them more likely to introduce other bugs.

                        Well, if you are making a claim, you should provide the evidence yourself, rather than asking whether I’ve seen any. I don’t think, though, that you can draw such a conclusion, even if there is evidence that Windows programs are generally more buggy than macOS equivalents (and that might be the case). There may be explanations other than “the windows developers are trying to handle allocation failure and introducing bugs as a result”. In any case, I still feel that this is missing the point.

                        (Sorry, that’s more inflammatory than I intended: what I meant was, you’re missing the thrust of the article. I’m really not interested in an argument about whether handling allocation failures is harder than not doing so; that is undeniably true. Does it lead to more bugs? With all other things being equal, it quite possibly does, but “how much so” is unanswered, and I still think there is a potential benefit; I also believe that the cost could be reduced if language design tried to address the problem).

                        1. 2

                          No, I’m claiming the exact opposite: that making people think about and handle allocation failure increases cognitive load and makes them more likely to introduce other bugs.

                          Well, if you are making a claim, you should provide the evidence yourself, rather than asking whether I’ve seen any.

                          The evidence that I see is that every platform that has designed APIs to require handling of OOM conditions (Symbian, Windows, classic MacOS, Win16) has had a worse user experience than ones that have tried to handle this at a system level (macOS, iOS, Android) and systems such as Erlang that don’t try to handle it locally are the ones that have the best uptime for large-scale systems.

                          You are making a claim that handling memory failures gracefully will improve something. Given that the experience of the last 30 years is that not doing so improves usability, system resilience, and data integrity, you need to provide some very strong evidence to back up that claim.

                          1. 1

                            You are making a claim that handling memory failures gracefully will improve something

                            Of course it will improve something - it will improve the behaviour of applications that encounter memory allocation failures. I feel like that’s a worthwhile goal. That’s the extent of my “claim”. It doesn’t need proving because it’s not really a claim. It’s a subjective opinion.

                            If all you want to do is say “you’re wrong”, you’ve done that. In all politeness, I don’t care what you think. You made some good points (as well as some that I flat-out disagree with, and some that are anecdotal or at least subjective) but that’s not changing my opinion. If you don’t want to discuss the ideas I was actually trying to raise, let’s leave it.

                2. 2

                  Yes, it is better that programs crash rather than continue to run in a degraded state, but a program crashing is still a bad thing. This reads like an argument that quality is low because of all the quality that is being delivered, or that memory leaks aren’t worth fixing.

                  1. 2

                    That’s an argument that you can’t have programs that don’t crash just by correctness. I.e., you can’t just be really, really careful and write code that won’t crash. It’s basically impossible. What you can do is handle what is doable and architect for redundancy, fast recovery, and minimization of damage.

                    1. 2

                      Data corruption, wrong results, and other Undefined Behavior are usually worse than crashing.

                      And I’m sorry to go into Grandpa Mode, but it’s easy to complain about quality when you haven’t had to try to handle and test every conceivable allocation failure (see my very long comment here for details.)

                  1. 8

                    I suspect few people reading this have had to deliver software that runs in a memory-constrained environment, one where the code must handle allocation failures because they are likely to occur in real use cases.

                    Nowadays the only environments like this (aside from niche retro stuff) are embedded systems. I get the impression that people often design embedded software by eliminating dynamic allocation, or restricting it enough that there aren’t too many failure cases to handle.

                    Outside the embedded domain, you have to go back to “classic” pre-X MacOS or Windows 95. There is no virtual memory to speak of (MacOS 7.5+ had VM but it just let you raise the apparent RAM limit by 2x or so.) Computers ship with too little RAM because it’s expensive, especially when politicians add tariffs because “the Japanese are taking over.” Users try to do too much with their PCs. On MacOS every app has to predeclare how much RAM it needs, and that’s how big its heap is. There’s a limited way to request “temporary memory” outside that, but it’s problematic. (I hear Windows 95 was slightly better with memory but I have no knowledge of it.)

                    This was absolutely hellish. You can get pretty far in development without running into memory problems because your dev machine has a honkin’ 16MB of RAM, you configure the app to request a healthy size heap, and you’re mostly just running it briefly with small-to-moderate data sizes. But around beta time Marketing reminds you that most users only have 4MB and the app needs to run in a 2MB heap. And the testers start creating some big documents for it to open. And the really good testers come up with more and more creative scenarios to trigger OOM. (The best ones involve OOM during a save operation. If they’re sadistic they’ll make it happen in a low-disk-space situation too, because your target hardware only has a 160MB hard drive.)

                    So I remember the last six months or so of the development cycle involving tracking down so many crashes caused by OOM. Or worse, the bugs that aren’t crashes, but an ignored malloc error or faulty cleanup code corrupted memory and caused misbehavior or a later crash or data corruption.

                    [I just remembered a great story from 1995-ish. I worked on OpenDoc, which had a clever, complicated file format called Bento. The schmuck who implemented it, from Jed Harris’s elegant design, punted on all the error handling by simply calling a “fail” callback passed in by the caller. The callback wasn’t expected to return. For some reason the implications weren’t realized until the 1.0 beta cycle. I think the guy who implemented the document-storage subsystem passed in a callback that called longjmp and did some cleanup and returned an error. Unfortunately this (a) leaked memory allocated by Bento, and (b) tended to leave the file — the user’s document — in a corrupt state. I guess nobody had heard of ACID. This had to be fixed ASAP before beta. I can’t remember whether the fix was to implement a safe-save (slow and requires more disk space) or to fix Bento by putting in real error handling.]

                    Oh, and ignoring a NULL result from malloc doesn’t immediately crash. MacOS had this wonderful feature that there was actual memory mapped at location 00000000. You could read or write it without crashing. But if you wrote more than a few hundred bytes (IIRC) there you overwrote interrupt vectors and crashed the whole computer, hard. I have no idea why this was done, it probably let Bruce Horn or Bill Atkinson shave some cycles from something in 1983 and afterwards it could never be changed.

                    Anyway, TMI, but my point is that trying to correctly handle all memory allocation errors in large programs is extremely difficult because there are so many new code paths involved. (Recovering from OOM without incurring more failures was a black art, too.) I firmly believe it isn’t worth it, in general. Design the OS so it happens rarely, and make the program crash immediately so at least it doesn’t have time to corrupt user data. Oh, and put a good auto-save feature in the GUI framework’s Document class, so even if this happens the user only loses a minute of work.

                    1. 1

                      MacOS had this wonderful feature that there was actual memory mapped at location 00000000

                      This actually sounds interesting if it wasn’t implemented in a dumb way? Like, handle writes to 00000000, so that failed allocations don’t immediately crash, but track how many there are, and if it’s over a threshold, reboot the system safely.

                      1. 1

                        It stems from the MC68000, which didn’t have virtual memory. The CPU expects a table at location 0 with various pointers (the first two indicate the starting PC and SP values; others are for various exceptions and IRQ handlers). It was most likely in ROM (since it contains start up data) so writes wouldn’t affect it, but mileage may vary (some systems might map RAM into place at a certain point in time).

                      2. 1

                        my point is that trying to correctly handle all memory allocation errors in large programs is extremely difficult because there are so many new code paths involved. (Recovering from OOM without incurring more failures was a black art, too.) I firmly believe it isn’t worth it, in general.

                        Here your sentiment closely echoes something that is alluded to in the article:

                        Apart from the increased availability of memory, I assume that the other reason for ignoring the possibility of allocation failure is just because it is easier. Proper error handling has traditionally been tedious, and memory allocation operations tend to be prolific; handling allocation failure can mean having to incorporate error paths, and propagate errors, through parts of a program that could otherwise be much simpler. As software gets larger, and more complex, being able to ignore this particular type of failure becomes more attractive.

                        While I agree that for a lot of software termination is the only suitable response to an out-of-memory condition, I also think there is a range of software where sudden termination is quite undesirable. Something I didn’t say is that in some languages we do in fact have reasonable tools for dealing with allocation failure - I’m thinking of exceptions and RAII / “defer” idioms - without having to add thousands of “if (p == null) …” checks through the code. (I wonder if this last is going to be even more contentious than the article).

                        And while I can see that for some applications in nearly all circumstances the sensible option really is to just terminate, what about those applications - or those circumstances - where it’s not? This was meant to be the main point: the frameworks that applications are built on - the OS, the libraries - are preventing an application from cleanly handling OOM even if they otherwise could. I think that’s unfortunate. We may disagree, and I acknowledge there are strong arguments on the other side.

                        Thanks for your comments. I found the anecdotes about Win95 / MacOS really interesting.

                      1. 11

                        As someone who is rather new to languages like C (I only recently got into it by making a game with it), I have a few newbie questions:

                        • Why do people want to replace C? Security reasons, or just old and outdated?

                        • What does Hare offer over C? They say that Hare is simpler than C, but I don’t understand exactly how. Same with Zig. Do they compile to C in the end, and do these languages just make it easier for the user to write code?

                        That being said, I find it cool to see these languages popping up.

                        1. 33

                          Why do people want to replace C? Security reasons, or just old and outdated?

                          • #include <foo.h> includes all functions/constants into the current namespace, so you have no idea what module a function came from
                          • C’s macro system is very, very error prone and very easily abused, since it’s basically a glorified search-and-replace system that has no way to warn you of mistakes.
                          • There are no methods for structs, you basically create struct Foo and then have to name all the methods of that struct foo_do_stuff (instead of doing foo_var.do_stuff() like in other languages)
                          • C has no generics, you have to do ugly hacks with either void* (which means no type checking) or with the macro system (which is a pain in the ass).
                          • C’s standard library is really tiny, so you end up creating your own in the process, which you end up carrying around from project to project.
                          • C’s standard library isn’t really standard, a lot of stuff isn’t consistent across OS’s. (I have agreeable memories of that time I tried to get a simple 3kloc project from Linux running on Windows. The amount of hoops you have to jump through, tearing out functions that are Linux-only and replacing them with an ifdef mess to call Windows-only functions if you’re on compiling on Windows and the Linux versions otherwise…)
                          • C’s error handling is completely nonexistent. “Errors” are returned as integer codes, so you need to define an enum/constants for each function (for each possible returned error), but if you do that, you need to have the actual return value as a pointer argument.
                          • C has no anonymous functions. (Whether this matters really depends on your coding style.)
                          • Manual memory management without defer is a PITA and error-prone.
                          • Weird integer type system. long long, int, short, etc which have different bit widths on different arches/platforms. (Most C projects I know import stdint.h to get uint32_t and friends, or just have a typedef mess to use usize, u32, u16, etc.)

                          EDIT: As Forty-Bot noted, one of the biggest issues are null-terminated strings.

                          I could go on and on forever.

                          What does Hare offer over C?

                          It fixes a lot of the issues I mentioned earlier, as well as reducing footguns and implementation-defined behavior in general. See my blog post for a list.

                          They say that Hare is simpler than C, but I don’t understand exactly how.

                          It’s simpler than C because it comes without all the cruft and compromises that C has built up over the past 50 years. Additionally, it’s easier to code in Hare because, well, the language isn’t trying to screw you up every 10 lines. :^)

                          Same with Zig. Do they compile to C in the end, and these languages just make it easier for user to write code?

                          Zig and Hare both occupy the same niche as C (i.e., low-level manual memory managed systems language); they both compile to machine code. And yes, they make it a lot easier to write code.

                          1. 15

                            Thanks for the great reply, learned a lot! Gotta say I am way more interested in Hare and Zig now than I was before.

                            Hopefully they gain traction. :)

                            1. 15

                              #include <foo.h> includes all functions/constants into the current namespace, so you have no idea what module a function came from

                              This and your later point about not being able to associate methods with struct definitions are variations on the same point but it’s worth repeating: C has no mechanism for isolating namespaces. A C function is either static (confined to a single compilation unit) or completely global. Most shared library systems also give you a package-local form but anything that you’re exporting goes in a single flat namespace. This is also true of type and macro definitions. This is terrible for software engineering. Two libraries can easily define different macros with the same name and break compilation units that want to use both.

                              C++, at least, gives you namespaces for everything except macros.

                              C has no generics, you have to do ugly hacks with either void* (which means no type checking) or with the macro system (which is a pain in the ass).

                              The lack of type checking is really important here. A systems programming language is used to implement the most critical bits of the system. Type checks are incredibly important here, casting everything via void* has been the source of vast numbers of security vulnerabilities in C codebases. C++ templates avoid this.
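
                              A contrived sketch of the problem, using qsort as a stand-in for any void*-based “generic” API; this compiles without a single warning and is silently wrong (and technically undefined):

                              #include <stdlib.h>

                              /* Comparator written for ints... */
                              static int cmp_int(const void *a, const void *b)
                              {
                                  int x = *(const int *)a, y = *(const int *)b;
                                  return (x > y) - (x < y);
                              }

                              int main(void)
                              {
                                  double values[4] = { 3.0, 1.0, 2.0, 4.0 };
                                  /* ...accidentally applied to doubles. Everything is void *,
                                     so the compiler has nothing to check against. */
                                  qsort(values, 4, sizeof values[0], cmp_int);
                                  return 0;
                              }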

                              C’s standard library is really tiny, so you end up creating your own in the process, which you end up carrying around from project to project.

                              This is less of an issue for systems programming, where a large standard library is also a problem because it implies dependencies on large features in the environment. In an embedded system or a kernel, I don’t want a standard library with file I/O. Actually, for most cloud programming I’d like a standard library that doesn’t assume the existence of a local filesystem as well. A bigger problem is that the library is not modular and layered. Rust’s no_std is a good step in the right direction here.

                              C’s error handling is completely nonexistant. “Errors” are returned as integer codes, so you need to define an enum/constants for each function (for each possible returned error), but if you do that, you need to have the actual return value as a pointer argument.

                              From libc, most errors are not returned directly: the return value just signals that something went wrong, and the actual error code is stored in a global (now a thread-local) variable called errno. Yay. Option types for returns are really important for maintainable systems programming. C++ now has std::optional and std::variant in the standard library; other languages have union types as first-class citizens.
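
                              For anyone who hasn’t had the pleasure, this is roughly what “correct” errno handling looks like for something as small as parsing a number (standard C only, nothing invented):

                              #include <errno.h>
                              #include <stdio.h>
                              #include <stdlib.h>

                              int main(void)
                              {
                                  const char *text = "123456789012345678901234567890";
                                  char *end;

                                  errno = 0;                    /* must be cleared before the call */
                                  long value = strtol(text, &end, 10);
                                  if (end == text) {
                                      fprintf(stderr, "not a number\n");
                                  } else if (errno == ERANGE) { /* failure is split across the
                                                                   return value *and* a global */
                                      fprintf(stderr, "out of range (clamped to %ld)\n", value);
                                  } else {
                                      printf("parsed %ld\n", value);
                                  }
                                  return 0;
                              }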

                              Manual memory management without defer is a PITA and error-prone.

                              defer isn’t great either because it doesn’t allow ownership transfer. You really need smart pointer types and then you hit the limitations of the C type system again (see: no generics, above). C++ and Rust both have a type system that can express smart pointers.

                              C has no anonymous functions. (Whether this matters really depends on your coding style.)

                              Anonymous functions are only really useful if they can capture things from the surrounding environment. That is only really useful in a language without GC if you have a notion of owning pointers that can manage the capture. A language with smart pointers allows you to implement this, C does not.

                              1. 6

                                defer isn’t great either because it doesn’t allow ownership transfer. You really need smart pointer types and then you hit the limitations of the C type system again (see: no generics, above). C++ and Rust both have a type system that can express smart pointers.

                                True. I’m more saying that defer is the baseline here; without it you need cleanup: labels, gotos, and synchronized function returns. It can get ugly fast.
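
                                For the record, the shape I mean is the usual goto-cleanup pattern (a sketch; copy_file and the error convention are made up):

                                #include <stdio.h>
                                #include <stdlib.h>

                                int copy_file(const char *from, const char *to)
                                {
                                    int ret = -1;
                                    FILE *in = NULL, *out = NULL;
                                    char *buf = NULL;

                                    in = fopen(from, "rb");
                                    if (in == NULL)
                                        goto cleanup;
                                    out = fopen(to, "wb");
                                    if (out == NULL)
                                        goto cleanup;
                                    buf = malloc(64 * 1024);
                                    if (buf == NULL)
                                        goto cleanup;

                                    /* ... copy loop elided ... */
                                    ret = 0;

                                cleanup:                     /* every exit path funnels through here */
                                    free(buf);               /* free(NULL) is a no-op */
                                    if (out != NULL) fclose(out);
                                    if (in != NULL) fclose(in);
                                    return ret;
                                }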

                                Anonymous functions are only really useful if they can capture things from the surrounding environment. That is only really useful in a language without GC if you have a notion of owning pointers that can manage the capture. A language with smart pointers allows you to implement this, C does not.

                                I disagree, depends on what you’re doing. I’m doing a roguelike in Zig right now, and I use anonymous functions quite extensively for item/weapon/armor/etc triggers, i.e., where each game object has some unique anonymous functions tied to the object’s fields and can be called on certain events. Having closures would be nice, but honestly in this use-case I didn’t really feel much of a need for it.

                              2. 3

                                Note that C does have “standard” answers to a lot of these.

                                C’s macro system is very, very error prone and very easily abused, since it’s basically a glorified search-and-replace system that has no way to warn you of mistakes.

                                The macro system is the #1 thing keeping C alive :)

                                There are no methods for structs, you basically create struct Foo and then have to name all the methods of that struct foo_do_stuff (instead of doing foo_var.do_stuff() like in other languages)

                                Aside from macro stuff, the typical way to address this is to use a struct of function pointers. So you’d create a wrapper like

                                struct foo {
                                    void (*do_stuff)(struct foo *);
                                };

                                void do_stuff(struct foo *f)
                                {
                                    f->do_stuff(f);   /* dispatch through the function pointer member */
                                }

                                C has no generics, you have to do ugly hacks with either void* (which means no type checking) or with the macro system (which is a pain in the ass).

                                Note that typically there is a “base class” which either all “subclasses” include as a member (and use offsetof to recover the subclass) or have a void * private data pointer. This doesn’t really escape the problem, however in practice I’ve never run into a bug where the wrong struct/method gets combined. This is because the above pattern ensures that the correct method gets called.
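
                                For completeness, the offsetof trick usually looks something like this container_of-style idiom (a sketch; the struct names are invented):

                                #include <stddef.h>

                                /* "Base class" embedded in the "subclass". */
                                struct base {
                                    void (*do_stuff)(struct base *);
                                };

                                struct widget {
                                    int extra_state;
                                    struct base base;    /* need not be the first member */
                                };

                                /* Recover the containing struct from a pointer to the embedded base. */
                                #define container_of(ptr, type, member) \
                                    ((type *)((char *)(ptr) - offsetof(type, member)))

                                static void widget_do_stuff(struct base *b)
                                {
                                    struct widget *w = container_of(b, struct widget, base);
                                    w->extra_state++;
                                }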

                                C’s error handling is completely nonexistant. “Errors” are returned as integer codes, so you need to define an enum/constants for each function (for each possible returned error), but if you do that, you need to have the actual return value as a pointer argument.

                                Well, there’s always errno… And if you control the address space you can always use the upper few addresses for error codes. That said, better syntax for multiple return values would probably go a long way.
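
                                The “upper few addresses” trick is essentially what the Linux kernel does with ERR_PTR/IS_ERR; a user-space-flavoured sketch (the helper names are invented, not a real API):

                                #include <errno.h>
                                #include <stdint.h>
                                #include <stdlib.h>

                                /* Reserve the top 4095 addresses for encoded negative error codes,
                                   on the assumption that no real object ever lives up there. */
                                #define MAX_ERR 4095

                                static inline void *err_ptr(long err)      { return (void *)err; }
                                static inline int   is_err(const void *p)  { return (uintptr_t)p >= (uintptr_t)-MAX_ERR; }
                                static inline long  ptr_err(const void *p) { return (long)(intptr_t)p; }

                                /* Usage: return either a real buffer or an encoded error. */
                                static void *get_buffer(size_t n)
                                {
                                    void *p = malloc(n);
                                    return p != NULL ? p : err_ptr(-ENOMEM);
                                }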

                                C has no anonymous functions. (Whether this matters really depends on your coding style.)

                                IIRC gcc has them, but they require executable stacks :)

                                Manual memory management without defer is a PITA and error-prone.

                                Agree. I think you can do this with GCC extensions, but some sugar here would be nice.

                                Weird integer type system. long long, int, short, etc which have different bit widths on different arches/platforms. (Most C projects I know import stdint.h to get uint32_t and friends, or just have a typedef mess to use usize, u32, u16, etc.)

                                Arguably there should be fixed width types, size_t, intptr_t, and regsize_t. Unfortunately, C lacks the last one, which is typically assumed to be long. Rust, for example, gets this even more wrong and lacks the last two (c.f. the recent post on 129-bit pointers).


                                IMO you missed the most important part, which is that C strings are (by-and-large) nul-terminated. Having better syntax for carrying a length around with a pointer would go a long way to making string support better.
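
                                Even a minimal pointer+length pair would help; something like this sketch (“str” and the helpers are invented names):

                                #include <stddef.h>
                                #include <string.h>

                                /* Pointer + length instead of relying on a terminating NUL. */
                                struct str {
                                    const char *ptr;
                                    size_t len;
                                };

                                static struct str str_from_c(const char *c)
                                {
                                    struct str s = { c, strlen(c) };
                                    return s;
                                }

                                /* Sub-slices need no copy and no terminator to be written. */
                                static struct str str_slice(struct str s, size_t start, size_t len)
                                {
                                    struct str out = { s.ptr + start, len };
                                    return out;
                                }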

                              3. 9

                                Even in C’s domain, where C lacks nothing and is fine for what it is, I would criticize C for maybe 5 things, which I would consider the real criticism:

                                1. It has undefined behaviour, of the kind that has come to mean that the compiler may disobey the source code. It turns working code into broken code just by switching compiler or inlining some code that wasn’t inlined before. You can’t necessarily point at a piece of code and say it was always broken, because UB is a runtime phenomenon. Not reassuring for a supposedly lowlevel language.
                                2. Its operator precedence is wrong.
                                3. Integer promotion. Just why.
                                4. Signedness propagates the wrong way: Instead of the default type being signed (int) and comparison between signed and unsigned yielding unsigned, it should be opposite: There should be a nat type (for natural number, effectively size_t), and comparison between signed and unsigned should yield signed.
                                5. char is signed. Nobody likes negative code points.
                                1. 6

                                  the kind that has come to mean that the compiler may disobey the source code. It turns working code into broken code

                                  I’m wary of this same tired argument cropping up again, so I’ll just state it this way: I disagree. Code that invokes undefined behavior is already broken; changing compiler can’t (except perhaps in very particular circumstances, which I don’t think you were referring to) introduce undefined behaviour; it can change the observable behaviour when UB is invoked.

                                  A compiler can’t “disobey the source code” whilst conforming to the language standard. If the source code does something that doesn’t have defined semantics, that’s on the source code, not the compiler.

                                  “It’s easy to accidentally invoke undefined behaviour in C” is a valid criticism, but “C compilers breaks code” is not.

                                  You can’t necessarily point at a piece of code and say it was always broken

                                  You certainly can in some instances. But sure, for example, if some piece of code dereferences a pointer and the value is set somewhere else, it could be undefined or not depending on whether the pointer is valid at the point it is dereferenced. So code might be “not broken” given certain constraints (eg that the pointer is valid), but not work properly if those constraints are violated, just like code in any language (although in C there’s a good chance the end result is UB, which is potentially more catastrophic).

                                  I’m not saying C is a good language, just that I think this particular criticism is unfair. (Also I think your point 5 is wrong, char can be unsigned, it’s up to the implementation).

                                  1. 7

                                    Thing is, it certainly feels like the compiler is disobeying the source code. Signed integer overflow? No problem pal, this is x86, that platform will wrap around just fine! Right? Riiight? Oops, nope, and since the compiler pretends UB does not exist, it just deleted a security check that it deemed “dead code”, and now my hard drive has been encrypted by a ransomware that just exploited my vulnerability.
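
                                    The canonical shape of that, for anyone who hasn’t been bitten by it yet (a sketch, not taken from any real codebase):

                                    /* Intended as an overflow check on a signed value... */
                                    int will_overflow(int x)
                                    {
                                        /* ...but signed overflow is UB, so the compiler may assume
                                           x + 1 > x always holds and fold this to "return 0",
                                           deleting the very check the programmer wrote. */
                                        return x + 1 < x;
                                    }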

                                    Though I agree with all the facts you laid out, and with the interpretation that UB means the program is already broken even if the generated binary didn’t propagate the error. But Chandler Carruth pretending that UB does not invoke the nasal demons is not fair. Let’s not forget that UB means the compiler is allowed to cause your entire hard drive to be formatted, as ridiculous as it may sound. And sometimes it actually happens (as it did so many times with buffer overflow exploits).

                                    Sure, it’s not like the compiler is actually disobeying your source code. But since UB means “all bets are off”, and UB is not always easy to catch, the result is pretty close.

                                    1. 3

                                      Sure, it’s not like the compiler is actually disobeying your source code. But since UB means “all bets are off”, and UB is not always easy to catch, the result is pretty close.

                                      I feel like “disobeying the code” and “not doing what I intended it to do due to the code being wrong” are still two sufficiently different things that it’s worth distinguishing.

                                      1. 4

                                        Okay, it is worth distinguishing.

                                        But it is also worth noting that C is quite special. This UB business repeatedly violates the principle of least astonishment. Especially the modern interpretation, where compilers systematically assume UB does not exist and any code path that hits UB is considered “dead code”.

                                        The original intent of UB was much closer to implementation-defined behaviour. Signed integer overflow was originally UB because some platforms crashed or otherwise went bananas when it occurred. But the expectation was that on platforms that behave reasonably (like x86, that wraps around), we’d get the reasonable behaviour. But then compiler writers (or should I say their lawyers) noticed that strictly speaking, the standard didn’t make that expectation explicit, and in the name of optimisation started to invoke nasal demons even on platforms that could have done the right thing.

                                        Sure the code is wrong. In many cases though, the standard is also wrong.

                                        1. 4

                                          I agree with some things but not others that you say, but these arguments have been hashed out many times before.

                                          Sure the code is wrong

                                          That’s the point I was making. Since we agree on that, and we agree that there are valid criticisms of C as a language (though we may differ on the specifics of those), let’s leave the rest. Peace.

                                    2. 4

                                      But why not have the compiler reject the code instead of silently compiling it wrong?

                                      1. 2

                                        It doesn’t compile it wrong. Code with no semantics can’t be compiled incorrectly. You’re making the exact same misrepresentation as in the post above that I responded to originally.

                                        1. 3

                                          Code with no semantics shouldn’t be able to be compiled at all.

                                          1. 1

                                            I’d almost agree, though I can think of some cases where such code could exist for a reason (and I’ll bet that such code exists in real code bases). In particular, hairy macro expansions etc which produce code that isn’t even executed (or won’t be executed in the case where it would be UB, at least) in order to make compile-time type-safety checks. IIRC there are a few such things used in the Linux kernel. There are probably plenty of other cases; there’s a lot of C code out there.

                                            In practice though, a lot of code that potentially exhibits UB only does so if certain constraints are violated (eg if a pointer is invalid, or if an integer is too large and will result in overflow at some operation), and the compiler can’t always tell that the constraints necessarily will be violated, so it generates code with the assumption that if the code is executed, then the constraints do hold. So if the larger body of code is wrong - the constraints are violated, that is - the behaviour is undefined.

                                            1. 1

                                              In particular, hairy macro expansions etc which produce code that isn’t even executed (or won’t be executed in the case where it would be UB

                                              That’s why it’s good to have a proper macro system that isn’t literally just find and replace.

                                              In practice though, a lot of code that potentially exhibits UB only does so if certain constraints are violated

                                              True, and I’m mostly talking about UB that can be detected at compile time, such as f(++x, ++x).

                                  2. 6

                                    Contrary to what people are saying, C is just fine for what it is.

                                    People complain about the std library being tiny, but you basically have the operating system at your fingertips, where C is a first-class citizen.

                                    Then people complain C is not safe; yes, that’s true, but with a set of best practices you can keep things under control.

                                    People complain you don’t have generics; you don’t need them most of the time.

                                    Projects like nginx, SQLite and redis, not to speak of the Nix world, prove that C is a perfectly fine language. Also, most of the popular Python libraries nowadays are written in C.

                                    1. 25

                                      Hi! I’d like to introduce you to Fish in a Barrel, a bot which publishes information about security vulnerabilities to Twitter, including statistics on how many of those vulnerabilities are due to memory unsafety. In general, memory unsafety is easy to avoid in languages which do not permit memory-unsafe operations, and nearly impossible to avoid in other languages. Because C is in the latter set, C is a regular and reliable source of security vulnerabilities.

                                      I understand your position; you believe that people are morally obligated to choose “a set of best practices” which limits usage of languages like C to supposedly-safe subsets. However, there are not many interesting subsets of C; at best, avoiding pointer arithmetic and casts is good, but little can be done about the inherent dangers of malloc() and free() (and free() and free() and …) Moreover, why not consider the act of choosing a language to be a practice? Then the choice of C can itself be critiqued as contrary to best practices.

                                      nginx is well-written, but Redis is not. SQLite is not written just in C, but also in several other languages combined, including SQL and TH1 (“test harness one”); this latter language is specifically for testing that SQLite behaves properly. All three have had memory-unsafety bugs. This suggests that even well-written C, or C in combination with other languages, is unsafe.

                                      Additionally, Nix is written in C++ and package definitions are written in shell. I prefer PyPy to CPython; both are written in a combination of C and Python, with CPython using more C and PyPy using more Python. I’m not sure where you were headed here; this sounds like a popularity-contest argument, but those are not meaningful in discussions about technical issues. Nonetheless, if it’s the only thing that motivates you, then consider this quote from the Google Chrome security team:

                                      Since “memory safety” bugs account for 70% of the exploitable security bugs, we aim to write new parts of Chrome in memory-safe languages.

                                      1. 2

                                        I am curious about your claim that Redis is not well-written? I’ve seen other folks online hold it up as an example of a well-written C codebase, at least in terms of readability.

                                        I understand that readable is not the same as secure, but would like to understand where you are coming from on this.

                                        1. 1

                                          It’s 100% personal opinion.

                                      2. 9

                                        Projects like nginx, SQLite and redis, not to speak of the Nix world, prove that C is a perfectly fine language.

                                        Ah yes, you can see the safety of high-quality C in practice:

                                        https://nginx.org/en/security_advisories.html https://www.cvedetails.com/vulnerability-list/vendor_id-18560/product_id-47087/Redislabs-Redis.html

                                        Including some fun RCEs, like CVE-2014-0133 or CVE-2016-8339.

                                        1. 2

                                          I also believe C will still have a place for a long time. I know I’m a newbie with it, but making a game with C (using Raylib) has been pretty fun. It’s simple and to the point… And I don’t mind making mistakes really, that’s how I learn the best.

                                          But again it’s cool to see people creating new languages as alternatives.

                                        2. 4

                                          What does Hare offer over C?

                                          Here’s a list of ways that Drew says Hare improves over C:

                                          Hare makes a number of conservative improvements on C’s ideas, the biggest bet of which is the use of tagged unions. Here are a few other improvements:

                                          • A context-free grammar
                                          • Less weird type syntax
                                          • Language tooling in the stdlib
                                          • Built-in and semantically meaningful static and runtime assertions
                                          • A lightweight system for dependency resolution
                                          • defer for cleanup and error handling
                                          • An optional build system which you can replace with make and standard tools

                                          Even with these improvements, Hare manages to be a smaller, more conservative language than C, with our specification clocking in at less than 1/10th the size of C11, without sacrificing anything that you need to get things done in the systems programming world.

                                          It’s worth reading the whole piece. I only pasted his summary.

                                        1. 1

TLDR version is “don’t compare a realloc result to the original pointer to determine if the object moved, it’s not guaranteed to work”. Why? Because realloc might move the allocation, rendering the old pointer invalid, and since you don’t know whether the allocation moved at the point you’re doing the test, the old pointer’s value is indeterminate and the test is meaningless.

                                          I’ve come across this myself a little while back, because of the way Clang/LLVM handles this code (thanks to John Regehr, https://www.cs.utah.edu/~regehr/ub-2017-qualcomm.pdf):

                                          #include <stdlib.h>
                                          #include <stdio.h>
                                          #include <stdint.h>
                                          
                                          int main() {
                                              int *p = malloc(sizeof(int));
                                              int *q = realloc(p, sizeof(int));
                                          
                                              if (p == q)  {
                                                  *p = 1;
                                                  *q = 2;
                                                  if (*p != *q) abort();
                                              }
                                          }
                                          

As can be seen via godbolt (https://godbolt.org/z/qbGWrM3MP), Clang compiles this into an unconditional call to abort(). Seems surprising - if (p == q), how can (*p != *q)? - but that is certainly allowed by a strict reading of the standard, since an “indeterminate” value doesn’t need to behave consistently. Also, the store (*p = 1) is undefined behaviour, since the pointer value might not be valid.

                                          You should be able to fix this by casting the pointer to uintptr_t before the call to realloc and only using the resulting integer value for the comparison (i.e. it should be ok to check that up == (uintptr_t)q if up is the value of (uintptr_t)p as evaluated prior to the call to realloc), but it’s rare to see code that does this (and it doesn’t change the result from Clang anyway, though I think this qualifies as a Clang bug at that point).
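For illustration, here is a minimal sketch of that uintptr_t approach (my own example, not from the linked paper): the old address is captured as an integer while the pointer is still valid, and only the integer is compared afterwards, so the stale pointer value itself is never used after the realloc.

    #include <stdlib.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        int *p = malloc(sizeof(int));
        if (!p) return 1;
        uintptr_t old_addr = (uintptr_t)p;      /* capture while p is still valid */
        int *q = realloc(p, 2 * sizeof(int));
        if (!q) { free(p); return 1; }          /* realloc failed: p is still live */
        if ((uintptr_t)q == old_addr)
            puts("allocation did not move");    /* only q is ever dereferenced */
        free(q);
        return 0;
    }

Whether a given compiler honours even this is, as noted, somewhat murky; it merely avoids comparing or dereferencing the stale pointer itself.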

                                          1. 1

                                            I was quite surprised by this because this particular idiom was sufficiently common that we made some explicit decisions in how CHERI C handles pointer comparison to be able to support it properly. Our experience at the time was that a lot of real-world code broke if you didn’t handle this.

                                            Your example is a bit different because it’s dereferencing the pointer, though I’m a bit surprised that LLVM notes that p and q are equal (via an icmp eq IR instruction) but also thinks that they can’t alias. I think the problem is that storing through p is unambiguously UB. If you modify this to hoist that store above the point where realloc is called then it works:

                                            
                                            int main() {
                                                int *p = malloc(sizeof(int));
                                                *p = 1;
                                                int *q = realloc(p, sizeof(int));
                                            
                                                if (p == q)  {
                                                    *q = 2;
                                                    if (*p != *q) abort();
                                                }
                                            }
                                            

                                            LLVM optimises this to:

                                            define dso_local i32 @main() local_unnamed_addr #0 !dbg !8 {
                                              %1 = tail call noalias align 16 dereferenceable_or_null(4) i8* @malloc(i64 4) #4, !dbg !17
                                              %2 = bitcast i8* %1 to i32*, !dbg !17
                                              call void @llvm.dbg.value(metadata i32* %2, metadata !14, metadata !DIExpression()), !dbg !18
                                              store i32 1, i32* %2, align 16, !dbg !19, !tbaa !20
                                              %3 = tail call align 16 dereferenceable_or_null(4) i8* @realloc(i8* %1, i64 4) #4, !dbg !24
                                              call void @llvm.dbg.value(metadata i8* %3, metadata !16, metadata !DIExpression()), !dbg !18
                                              %4 = icmp eq i8* %1, %3, !dbg !25
                                              br i1 %4, label %5, label %6, !dbg !27
                                            
                                            5:                                                ; preds = %0
                                              call void @llvm.dbg.value(metadata i32* %2, metadata !16, metadata !DIExpression()), !dbg !18
                                              store i32 2, i32* %2, align 16, !dbg !28, !tbaa !20
                                              br label %6, !dbg !30
                                            
                                            6:                                                ; preds = %5, %0
                                              ret i32 0, !dbg !31
                                            }
                                            

                                            Note that, in this case, the abort call is completely gone because the fact that %1 and %3 are the same means that non-atomic loads from the two addresses will trivially compare equal and so the entire condition is optimised away to always true and the basic block is deleted.

                                          1. 10

                                            More generally: any libc function may call malloc. If this matters to you, then you should look at the libc internals and audit any function that you care about. Folks that ship a libc need to think about this in a few places. For example, FreeBSD libc uses jemalloc, which uses locks to protect some data structures, but the locks call malloc and so have their own bootstrapping path.

                                            1. 6

                                              The specification of a lot of functions doesn’t have a suitable failure mode, so no, they can’t really call malloc (and require it to succeed) without being non-conformant.

                                              1. 5

                                                That’s not true, async-signal-safety is a thing.

                                              1. 2

                                                A while ago I discovered a similar bug on MacOS with signal delivery. That is, I had code that would listen for signals via kqueue, and once one was received, use sigtimedwait() (with a zero timeout) to retrieve the signal data. The problem was that kqueue would indicate the signal before it was available for sigtimedwait(). Demo and more complete explanation here.

                                                1. 3

                                                  Traditional C++ exceptions have two main problems:

                                                  1. the exceptions are allocated in dynamic memory

I think this is an implementation choice. I’m pretty sure the MS ABI allocates exceptions on the stack. This comes with its own problems (your handler runs with an extended stack, and can’t reduce it until the exception goes out of scope), but carefully-written code that is aware of the mechanism shouldn’t run into trouble.

                                                  In fact, in my opinion, that’s how exceptions should be allocated. It’s a shame that Linux uses heap allocation instead.

2. exception unwinding is effectively single-threaded, because the table driven unwinder logic used by modern C++ compilers

                                                  Again, not a facet of the language but of the implementations. This proposal seems to be about suggesting language changes for implementation problems…

                                                  I do agree though that std::current_exception() is problematic (and I think it probably should never have existed). But even this is really only a problem for optimisation - of a mechanism that is already supposed to be used only for exceptional situations.

                                                  1. 7

                                                    I think this is an implementation choice. I’m pretty sure the MS ABI allocates exceptions on the stack.

                                                    Yes and no. It depends on what you throw, if I remember correctly. If you’re throwing a value then it will be allocated on the stack and then copied to the called stack frame. All of the unwinder state lives on the stack, including the state required for std::current_exception. If you throw by pointer then the object needs to be allocated on the heap because the address needs to be stable. One of the problems with the current C++ unwind mechanism is that it needs to handle both cases.

                                                    I think the article is mostly talking about the unwind state (specifically, the __cxa_exception structure, in the Itanium ABI). C++11 introduced some very painful things here to allow you to partially catch an exception and then re-raise it in another thread. This requires heap allocation but, in theory at least, could be deferred until someone actually used this functionality (I presume there are users in the wild - I’ve only ever used it in a test when I implemented it).

                                                    In fact, in my opinion, that’s how exceptions should be allocated. It’s a shame that Linux uses heap allocation instead.

                                                    Linux adopted the Itanium ABI, which exists mostly because Itanium made it very difficult to implement a conventional setjmp / longjmp. A lot of the design decisions there were influenced by the fact that Borland had a load of patents on the Windows SEH model. I think the last of these expired ten years ago, so it should be completely safe to implement SEH-like unwinding on other platforms now.

                                                    The Windows model is a bit more painful for the compiler. In the Itanium model, the catch blocks are just regions in the function and they run after the unwinder has transferred control back to the function containing the code, which then runs on the top of the stack. In the Windows model, these must be outlined as funclets, functions that run with a pointer to the original stack frame, on top of the stack but with access to a frame somewhere else.

                                                    LLVM already has logic for generating funclets (I think GCC does too?) for the Windows ABI, so it might be quite easy to add a funclet-based ABI for *NIX. Statically linked things could adopt it with a compile flag, dynamically linked things would need a feature flag unless you added a fallback mechanism that used the Itanium unwinder when it found a frame with Itanium ABI things.

                                                    Even then, that wouldn’t help with the second problem. Both Windows and Itanium unwind ABIs now use tables and so need to have some map from return address to the table for the function that you’re trying to unwind through. This needs to be safe with respect to loading and unloading libraries. The dl_iterate_phdr API on *NIX doesn’t need to acquire a single lock but it does need to guarantee that it iterates over all loaded ELF objects even if others are concurrently loaded.

                                                    Again, not a facet of the language but of the implementations. This proposal seems to be about suggesting language changes for implementation problems…

                                                    The language defines the space of possible implementations. One of the things I’ve been playing with recently is using the FreeBSD system call calling convention for exceptions. This uses the carry flag to differentiate between error and non-error returns. You can follow each call with a branch-on-carry, which will be statically predicted as not taken by most systems and have this fall to the error handling path. This works only if exceptions fit in the return register[s] on all platforms. Returning a single word is fine here, so if you require exceptions to be globally allocated objects or error codes, then it’s fine. You can’t do this for C++ in the general case but you could in a language that was more willing to restrict what it permits to be thrown. This is what we’re planning on doing for Verona. In LLVM IR we can model every call returning an extra i1 that tells you if the return value is the real return or an exception, use normal control flow and inlining, and then lower it to one extra instruction in the back end.
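Not the actual Verona or LLVM mechanism, but a C-level sketch of the shape being described: every fallible call hands back its result together with a one-bit error flag (standing in for the carry flag / the extra i1), and the “handler” at the call site is just a conditional branch. All names here are made up for illustration.

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical convention: each fallible call returns its result plus a
     * one-bit error flag - the analogue of the carry flag / extra i1. */
    struct result {
        intptr_t value;   /* real return value, or an error code */
        _Bool    is_err;  /* "carry flag": set on the error path */
    };

    static struct result parse_digit(char c) {
        if (c >= '0' && c <= '9')
            return (struct result){ .value = c - '0', .is_err = 0 };
        return (struct result){ .value = -1 /* error code */, .is_err = 1 };
    }

    int main(void) {
        struct result r = parse_digit('7');
        if (r.is_err) {                     /* the "branch-on-carry" after the call */
            puts("error path");
            return 1;
        }
        printf("parsed %ld\n", (long)r.value);
        return 0;
    }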

                                                    1. 2

                                                      If you throw by pointer then the object needs to be allocated on the heap because the address needs to be stable

                                                      I’m not sure if I understand what you mean by “throw by pointer”. If you are referring to using std::rethrow_exception to throw via std::exception_ptr, then the implementation is allowed to throw a copy of the object instead; there’s no need to preserve the address. Perhaps I’m not understanding you correctly.

                                                      C++11 introduced some very painful things here to allow you to partially catch an exception and then re-raise it in another thread. This requires heap allocation but

                                                      I assume again you mean std::current_exception, std::rethrow_exception. current_exception potentially requires a heap allocation if the original exception object is stack-allocated (because if it makes a copy of the exception, it needs to store the copy somewhere), but that by itself shouldn’t make it impossible to allocate thrown exception objects on the stack. Since rethrow_exception can also make a copy, it can even move [edit: not literally move, but copy] the heap-allocated copy “back to the stack”, unless I’m missing something.

                                                      Linux adopted the Itanium ABI

                                                      That is, indeed, what I am complaining about :)

                                                      I’m sure it seemed like a pragmatic choice, but in my view, it’s an unfortunate one. The fact that throwing an exception requires heap allocation (which can of course fail) is awful. Current implementations generally have a fixed-size “emergency pool” per thread which is used in case regular heap allocation fails; but of course the emergency pool can be insufficient, or can become exhausted in the presence of nested exceptions.

                                                      The language defines the space of possible implementations

                                                      But the complaints in this proposal don’t seem to be about the space of possible implementations, but the particulars of certain implementations. Again, unless I’m missing something, it should be possible to allocate exception objects on the stack, and the proposal itself discusses solutions to other problems (inefficient parallelism etc) which don’t require language changes.

                                                      1. 1

                                                        I’m not sure if I understand what you mean by “throw by pointer”.

                                                        throw new Foo();
                                                        

The heap allocation isn’t part of the throw mechanism, but it still exists. In C++, catch (T) and catch (T*) are completely unrelated things. If you throw a T then it may be copied; if you throw a T* then the pointee may not be copied.

                                                        I assume again you mean std::current_exception, std::rethrow_exception. current_exception potentially requires a heap allocation if the original exception object is stack-allocated (because if it makes a copy of the exception, it needs to store the copy somewhere), but that by itself shouldn’t make it impossible to allocate thrown exception objects on the stack. Since rethrow_exception can also make a copy, it can even move [edit: not literally move, but copy] the heap-allocated copy “back to the stack”, unless I’m missing something.

Kind of. Thrown objects in C++ don’t need to be copyable, but they do need to be movable. This leaks into the language though in two ways. First, move constructors can have side effects and so the number of moves is observable. Second, with the Itanium ABI, there is a single copy (which is typically elided) and code is often written on the assumption that the copy is elided. The object is copied (or directly allocated) into the space returned when the ABI library is asked to allocate space for the exception (the thrown object is stored directly after the exception header). The begin-catch function returns a pointer to this, so no copy is needed for the catch. The Windows ABI works in a similar way: the thrown object is allocated on the stack and remains there as the funclets that implement the catch access it directly.

                                                        I’m sure it seemed like a pragmatic choice, but in my view, it’s an unfortunate one. The fact that throwing an exception requires heap allocation (which can of course fail) is awful. Current implementations generally have a fixed-size “emergency pool” per thread which is used in case regular heap allocation fails; but of course the emergency pool can be insufficient, or can become exhausted in the presence of nested exceptions.

                                                        The emergency pool is mandated by the spec. It’s a complete waste of time because if malloc fails then you may discover that the emergency buffers are overcommitted and fail. I recently added an option to libcxxrt to disable them.

                                                        But the complaints in this proposal don’t seem to be about the space of possible implementations, but the particulars of certain implementations. Again, unless I’m missing something, it should be possible to allocate exception objects on the stack, and the proposal itself discusses solutions to other problems (inefficient parallelism etc) which don’t require language changes.

I think part of the problem is that the C++ standard is in denial about dynamic code [un]loading. This was apparent with thread-local variables, where there’s no good way of implementing their destructors. Exceptions require walking the stack and finding the associated cleanup. There are two ways of doing this:

                                                        • Maintain a stack of cleanup functions. This is what the Win32 ABI did. It incurs a (small) performance penalty on entry and exit of every try block and so means that it isn’t a ‘zero-cost’ abstraction.
                                                        • Walk a linker data structure to find the table (or cleanup function) corresponding to the function on the stack. This requires global synchronisation with respect to library loading and unloading.

                                                        You can somewhat mitigate the latter case with different locking policies or with lock-free data structures for loaded objects but it’s not clear whether this just reduces the problem.
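As a toy illustration of the first option above (a per-thread stack of cleanup records, in the spirit of the old Win32 / setjmp-longjmp style unwinders, and nothing like a real C++ runtime): every protected region pays a push and a pop even when nothing is ever thrown, but the “unwind” needs no tables and no global synchronisation.

    #include <stdio.h>

    /* Toy version of "maintain a stack of cleanup functions": each protected
     * region pushes a record on entry and pops it on exit, so raising an
     * error just walks a thread-local list - no tables, no global locks. */
    struct cleanup_frame {
        struct cleanup_frame *prev;
        void (*cleanup)(void *);
        void *arg;
    };

    static _Thread_local struct cleanup_frame *cleanup_top;

    static void push_cleanup(struct cleanup_frame *f, void (*fn)(void *), void *arg) {
        f->cleanup = fn;
        f->arg = arg;
        f->prev = cleanup_top;
        cleanup_top = f;                 /* cost paid on every 'try' entry */
    }

    static void pop_cleanup(void) {
        cleanup_top = cleanup_top->prev; /* ...and on every normal exit */
    }

    static void run_cleanups(void) {     /* the 'unwind' */
        while (cleanup_top) {
            cleanup_top->cleanup(cleanup_top->arg);
            cleanup_top = cleanup_top->prev;
        }
    }

    static void close_file(void *p) { fclose(p); }

    int main(void) {
        FILE *f = fopen("/etc/hostname", "r");
        if (!f) return 1;
        struct cleanup_frame frame;
        push_cleanup(&frame, close_file, f);
        /* ... work that would call run_cleanups() on failure ... */
        pop_cleanup();
        fclose(f);
        return 0;
    }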

                                                        Note that extrapolating from increasing core counts is probably not a good idea. Existing cache coherency protocols struggle a lot above about 128 cores, so I’d expect cache-coherent systems with >128 cores to be rare for quite a while.

                                                        Avoiding exceptions, or limiting what exceptions can do, would allow exception throwing to become a local problem: no stack walking and unwind, just a lightweight conditional branch on a flag.

                                                        1. 1

                                                          The heap allocation isn’t part of the throw mechanism, but it still exists.

                                                          Well, ok, yes, if you explicitly perform a heap allocation then there will be a heap allocation, that’s self-evident. The thrown object in this case is really the pointer, however, and doesn’t need to be on the heap. I guess I should have said that the MS ABI allocates thrown objects on the stack rather than exceptions.

                                                          Kind of. […]

                                                          What you said at this point doesn’t make clear to me what part of what I wrote was only “kind of” correct, other than the move-vs-copy thing which is incidental. (Speaking of incidentals, GCC does appear to allow throwing a non-moveable-but-copyable object, although Clang doesn’t).

All the detail about the Itanium C++ ABI: I know all this, but it has no bearing. My point is that the ABI is a bad ABI because it mandates heap allocation for thrown objects. The alternative that I’m suggesting would have been a better choice, and it doesn’t require more or fewer copies (or moves) to be performed, except in the case where std::current_exception gets used. If code exists which expects that std::current_exception and/or std::rethrow_exception don’t make copies then I’d personally be happy to just call it bad code and be done with it. And anyway, if we’re concerned about preserving existing code that cares about whether copies are made by current_exception / rethrow_exception, then we probably can’t make the sort of changes to what exceptions can do that are being suggested in the article.

                                                          The emergency pool is mandated by the spec. It’s a complete waste of time because if malloc fails then you may discover that the emergency buffers are overcommitted and fail.

                                                          This is more-or-less what I had already said (though I don’t agree with your following assertion that the emergency pool is completely useless, because I’d much rather it be possible to throw exceptions in an out-of-memory situation than not. I guess bad_alloc could theoretically be handled specially so as not to require allocation but I’d still want other exception types to be able to be thrown).

                                                          Exceptions require walking the stack and finding the associated cleanup

                                                          […] Walk a linker data structure to find the table (or cleanup function) corresponding to the function on the stack. This requires global synchronisation with respect to library loading and unloading.

                                                          You can somewhat mitigate the latter case with different locking policies or with lock-free data structures for loaded objects but it’s not clear whether this just reduces the problem.

                                                          […]

                                                          Avoiding exceptions, or limiting what exceptions can do, would allow exception throwing to become a local problem: no stack walking and unwind, just a lightweight conditional branch on a flag.

                                                          I’m going to just sum up my thoughts:

                                                          • “not clear whether this just reduces the problem” implies that this could be investigated, and I’d suggest it would be worth doing so before making language changes to accommodate an imagined problem
                                                          • “the problem” already suggests there is a problem, whereas I’m inclined to feel that lock contention in the presence of exceptions being thrown just implies that exceptions are being used too heavily in the application code
                                                          • limiting what exceptions can do in order to make it possible to optimise exception throwing/handling in the way you suggest is all well and good if you are willing to throw away a lot of existing code, though it will introduce a small run-time overhead, and probably a code-size overhead (even counting unwinding tables), for a case that current code is written to avoid (because exceptions are known to have overhead, and are generally understood to be intended for use only in exceptional circumstances). Overall, I doubt it’s worth it. I’d certainly rather see the implementations fixed first.

                                                          What I suspect is driving the desire to optimise exception throw/catch is some people wanting to use exceptions for general control flow. I’d rather just not see them used for that.

                                                  1. 10

                                                    This waffled on a lot so I skipped to the end for the “mathematic explanation for why OOP sucks a big time”:

                                                    You’ll see the Java and C# […] make it inconvenient to write abstract variables and convenient to throw in few ints and floats, although these are quite close to the machine implementation

                                                    […]

                                                    In functional programming you just say “a” if it’s something that goes through your system intact. That’s as abstract as it can be

                                                    This is nothing to do with functional-vs-OOP; it’s about type genericity. A functional language can still require you to specify parameter types (sure, many don’t and have generic functions, but that’s not the point unless you’re comparing specific languages rather than programming paradigms).

                                                    I feel like the whole piece is fluff.

                                                    1. 7

Anyone who knows some PLT has long moved on from this debate; what we are seeing is new people joining old discussions that haven’t been cleaned up properly (so the new people are being “wrong on the internet”, and hopefully they will receive the support needed to move on to being wrong about something new).

                                                    1. 2

There are so many issues that come from the need to optimise that I wonder if C could solve a few problems by introducing “don’t touch this” blocks. Basically “volatile”, but for lines of code where no optimisation takes place: no dereference is skipped, no overflows are analysed, etc. So you’d write:

                                                      volatile { *foo = bar[a+b]; }
                                                      

                                                      and whatever else is happening around that block, you’d do the addition, deref bar, load the foo address and write there - no changes allowed.

                                                      Given how much analysis and workarounds we’re already stacking here, wouldn’t handing the control back to the dev be simpler at this point? (This would probably need to disable LTO though)

                                                      1. 5

                                                        The root problem is that people want C to be two things:

                                                        • A portable assembler.
                                                        • A language with compiler optimisations.

                                                        You can’t have both in a single language. If you want a trivial-to-understand lowering from your source syntax to the sequence of executed instructions then you can’t do any non-trivial optimisations. You can do constant propagation. You might be able to do common-subexpression elimination (though not if you have shared-memory parallelism). That’s about it.

                                                        If you want optimisation then those optimisations will be correct according to some abstract machine. You need to understand what that abstract machine is and accept that your code describes a space of possible behaviours and that any of the behaviours allowed by the abstract machine may occur. The more flexibility the abstract machine allows, the larger the set of possible optimisations. If you want things like autovectorisation of loops then you need to have a memory model that allows the compiler to make assumptions about non-aliasing and happens-before relationships: if partial results to four loop iterations are visible in memory then this would violate the semantics of a very close-to-the-ISA abstract machine, but is fine in C/C++ because the memory model tells you that no atomic has happened that established a happens-before relationship and so the exact order that these things appear in memory is undefined.

                                                        Personally, I’d love to work on a good language for writing operating systems and language runtimes. Something that had a memory model that let you reason about behaviour of lockless data structures and that had a mechanism for me to define my own type-punning rules in the language (such that I could implement a memory allocator and expose the explicit point at which the type associated with underlying memory changed). There are probably a dozen or so projects that would adopt such a language, so it’s hard to justify spending time on it.

                                                        1. 1

This would be a bitch to specify, but yes, I would like to see a serious attempt at this. Can a and b be in registers, or should the compiler be required to load them from the stack, for example? You basically need to specify the compilation algorithm, which amounts to re-implementing a C compiler. By the way, the HTML5 parsing specification does work that way, so such a standard can be valuable. It’s just a lot of work and a very different style of standardization.

                                                          1. 1

                                                            I’m not super familiar with the C standard - why do you think the whole compiler would have to be redefined rather than adding qualifiers like “this transformation may be done here - unless it’s a volatile block”, “this is undefined - unless it’s a volatile block where …”, etc. ?

                                                            1. 1

                                                              The C standard doesn’t directly specify transforms that can be applied, at all (maybe one or two very minor exceptions). The extent to which permissible optimisations are specified is mainly via two concepts:

The “as-if” rule, which says (more or less) that as long as the observable behavior of a program is correct then the compiler has done its job (i.e. it doesn’t matter what code is generated, as long as it produces the output that it should have, according to the semantics of the “abstract machine”). Quote from the standard:

                                                              The semantic descriptions in this International Standard describe the behavior of an abstract machine in which issues of optimization are irrelevant.

                                                              Then, there’s the “undefined behaviour” concept, which says (again - roughly) that if a program violates certain constraints, the semantics of that abstract machine are not specified at all. This notion is particularly useful for compilers to exploit in order to enable optimisations. But the standard doesn’t generally talk about actual transformations to the program.
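A standard illustration (not from this thread) of how that licence works in practice: because signed overflow is undefined behaviour, a compiler may assume it never happens and fold away an after-the-fact overflow check.

    #include <limits.h>
    #include <stdio.h>

    /* Signed overflow is UB, so the compiler may assume x + 1 never wraps
     * and is allowed to treat this test as always false. */
    static int wraps(int x) {
        return x + 1 < x;   /* typically optimised to: return 0; */
    }

    int main(void) {
        printf("%d\n", wraps(INT_MAX));  /* may print 0 at -O2 */
        return 0;
    }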

                                                              That leaves your second point:

                                                              “this is undefined - unless it’s a volatile block where …”

                                                              That could be done, to some extent; but then the behaviour (inside such a block) would have to be specified. It’s hard to explain why this is difficult without going into a lot of detail, but suffice to say, the standard is already sufficiently vague in enough areas that it’s already difficult to tell in some cases whether certain things have defined behaviour (and if they do, what it is). Getting the details right for such a change would be very finicky. However, ultimately, what you suggest could probably be done - it would just need a lot of work. I don’t think it would in fact require specifying the whole compilation algorithm.

                                                          2. 1

                                                            How would it work with, lets say, function call boundaries? In particular inline functions.

                                                            inline void write_byte(uint8_t *p, uint8_t v) { *p = v; *p = v; }
                                                            
                                                            volatile {
                                                                write_byte(p, 42);
                                                                write_byte(p, 64);
                                                            }
                                                            

Should the above write to *p once, twice, or four times? I think twice seems the most reasonable, but there are arguments to be made for four writes as well, depending on whether or not write_byte is static inline.

                                                            1. 1

                                                              work with, lets say, function call boundaries?

                                                              They don’t have to be allowed inside. I imagine using the volatile block for just a few lines like the inside of write_byte + preventing reordering around that block. Basically a high-level asm block.
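The closest existing approximation, for what it’s worth, is a mix of volatile accesses and an empty asm statement with a “memory” clobber (GCC/Clang syntax). It’s weaker than the proposed block, but it forces both stores to be emitted in order and stops surrounding memory operations being moved across it. A sketch:

    #include <stdint.h>

    /* Compiler barrier: tells GCC/Clang that memory may have changed, so
     * loads/stores cannot be reordered or cached across this point. */
    #define compiler_barrier() __asm__ __volatile__("" ::: "memory")

    void store_exact(uint8_t *p, uint8_t a, uint8_t b) {
        compiler_barrier();
        *(volatile uint8_t *)p = a;   /* volatile access: cannot be merged or elided */
        *(volatile uint8_t *)p = b;   /* both stores are emitted, in this order */
        compiler_barrier();
    }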

                                                          1. 3

                                                            I support standardizing -fno-strict-overflow and -fno-strict-aliasing in ISO C. This is basically status quo, and standardizing existing practice is usually a good idea. On the other hand, I am pretty skeptical about proposals along the line of platform-specific UB. I am yet to see any good proposal, and most proposals can be described as compiler user wishlist with no connection to compiler implementer reality.

                                                            1. 1

                                                              I disagree with standardizing the no-overflow/no-strict-aliasing flags. Using these options is not standard practice (it may be reasonably common practice, but that’s not the same thing). Supporting these options (or equivalents) is pretty standard in compilers, but the standard already allows for that (it doesn’t mandate it).

                                                              The point of standardising the language is so that it is clear what the semantics are, and what is and what is not allowed, so you can have some assurance about the behaviour of code even when compiled using different compilers. That assurance is significantly reduced if you now need to know the specific variant of the language the code is written in. I can foresee problems arising where “no-strict-overflow, no-strict-aliasing” code would be unwittingly used in the wrong (strict-overflow, strict-aliasing) context and be broken as a result. It would arguably be better to not have these options at all, since their presence leads to their use, and their use allows what is fundamentally incorrect code to be written and used. I would much rather see standardised consistent solutions that would be embodied within the source: special “no strict overflow” integer types (or type attributes), “may alias all” pointer types, and so on. And we sorely need simple, consistent and standard ways to safely check for overflow before it happens (such as what GCC provides, but which of course is not standard: https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html).

                                                              1. 1

For implementation strategy, I think a standardized pragma would be best because, as you pointed out, a flag risks use in the wrong context.

                                                                what is fundamentally incorrect code to be written and used

                                                                This is… non sequitur? Incorrect according to whom? It is only incorrect according to the current standard, which is immaterial since the proposal is precisely to update the standard. This is like replying to “software should not be patentable” by “infringing software patent is illegal”, both true and useless. This is not about being correct or incorrect. -fno-strict-overflow and -fno-strict-aliasing are useful. Linux kernel uses them!

                                                                “no strict overflow” integer types, “no strict aliasing” pointer types, overflow checking builtins

                                                                All good ideas, but this is also perfect embodiment of “perfect is the enemy of good”. These are not existing practices (okay, except GCC overflow checking builtins, I support standardizing them yesterday), unlike -fno-strict-overflow and -fno-strict-aliasing flags. Considerable design work needs to be done, and prototype needs to be field tested. We should start that today, but standardization is far off.

                                                                1. 1

                                                                  This is… non sequitur?

                                                                  No, I don’t think so.

                                                                  Incorrect according to whom? It is only incorrect according to the current standard

                                                                  Exactly.

                                                                  which is immaterial since the proposal is precisely to update the standard

That’s the proposal you are making, but not one I agreed with, for the specific reason that I don’t want “standard C” to actually be multiple different languages. The existence of these options (outside the standard), and the fact that they lead to multiple variants of the language (standardised or not), are very relevant, not immaterial at all.

                                                                  This is like replying to “software should not be patentable” by “infringing software patent is illegal”

                                                                  The argument is “these options should not be standardised, because they cause problems”.

                                                                  1. 1

                                                                    I understand concerns about fragmenting the language, but my view is that it is already lost. Linux kernel exists, and C is already fragmented in practice.

I am not proposing this, but one way to solve fragmentation is to standardize the -fno-strict-overflow and -fno-strict-aliasing behavior, without any option. If you want the lost optimizations back, you can add flags yourself, exactly as you can add -ffast-math now.

                                                                    1. 1

                                                                      I understand concerns about fragmenting the language, but my view is that it is already lost. Linux kernel exists, and C is already fragmented in practice.

                                                                      I agree it’s already a problem. I think the underlying causes for this should be addressed, but not by standardising the language variants that have already emerged (nor by removing strict aliasing / strict overflow altogether).

                                                                  2. 1

                                                                    All good ideas, but this is also perfect embodiment of “perfect is the enemy of good”. These are not existing practices

                                                                    I also disagree with this characterisation, regardless of whether they are existing practices.

                                                                    But in fact: https://gcc.gnu.org/onlinedocs/gcc-4.0.2/gcc/Type-Attributes.html

                                                                    • “may_alias” attribute (applies to pointee types, not pointers themselves, so not exactly what I suggested, but close enough).

                                                                    There’s no similar attribute for wrap-on-overflow, unfortunately. But I don’t think “it hasn’t been done, therefore it should not be done” is an argument that really holds water. And characterising it as “the perfect” because it hasn’t been done seems a stretch. (edit: and characterising your own proposal as “the good” is begging the question).
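For reference, using the attribute looks roughly like this (a sketch along the lines of the example in the GCC manual; accesses through the may_alias-qualified type are exempt from strict-aliasing assumptions):

    #include <stdio.h>

    /* Loads/stores through this typedef are not subject to strict-aliasing rules. */
    typedef unsigned int __attribute__((may_alias)) word_alias;

    int main(void) {
        _Static_assert(sizeof(word_alias) == sizeof(float), "assumes 32-bit float/int");
        float f = 1.0f;
        word_alias *w = (word_alias *)&f;   /* would be UB through a plain unsigned int * */
        printf("bits of 1.0f: %08x\n", *w);
        return 0;
    }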

                                                                    1. 1

                                                                      it hasn’t been done, therefore it should not be done

                                                                      What the fuck. I said “We should start that today”. I am just pointing out that “put it on types” has had less field testing than “put it on flags”.

                                                                      1. 1

                                                                        Sorry, I missed your “we should start that today” comment and I didn’t intend to anger you.

                                                                        this is also perfect embodiment of “perfect is the enemy of good”

I interpreted the above as “we shouldn’t do what you have proposed (the perfect) because there is a solution that I have proposed (the good) that is easier because it has already been done”. Now, I think what you meant was actually “we should do what I have proposed now rather than delaying indefinitely until we can do something better”. I’m afraid I still disagree; I don’t want to see these language fragments standardised. On a practical level, I also think it’s unlikely either change would be standardised in any short time frame. The ISO C committee is not known for, err, actually correcting significant problems in the language and its specification.

                                                                  3. 1

                                                                    not standard

                                                                    It has been proposed.

                                                                1. 3

At work, finalising support for static analysis of Python code in the analysis suite I work on (that’s been a long road, but I have finally seen some results in the last few weeks).

                                                                  Outside work, trying to find some time to work more on dinit (https://github.com/davmac314/dinit). It’s seeing a lot more attention now that it’s used by chimera (https://chimera-linux.org/) and also supported as a primary init by Artix linux.

                                                                  1. 1

                                                                    Why would “fully asynchronous I/O” be a good idea?

                                                                    (Assuming the usual meaning of async = “programming w/out control flow”.)

                                                                    1. 6

                                                                      In general, it’s easy to implement synchronous API on top of asynchronous API, but not vice versa. Managarm implements POSIX synchronous API on top of its asynchronous API, for example.

                                                                      1. 1

                                                                        It is impossible to implement synchronous API on top of asynchronous API in the most widely used programming language, JavaScript.

                                                                        If you have threads then yes, it might be possible, but why not use threads to begin with?

                                                                        1. 4

                                                                          The difference is that asynchronous I/O in Javascript works only via callback. For an OS kernel it is trivial to provide a single synchronous completion-wait syscall and thus all asynchronous I/O can be made synchronous by turning it into two steps: schedule asynchronous I/O, then wait for that I/O to complete. This doesn’t require the application to be multi-threaded.
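A minimal sketch of that two-step pattern using POSIX AIO (any async interface - io_uring, kqueue and so on - has the same shape; error handling kept to a minimum, and on glibc this needs -lrt):

    #include <aio.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("/etc/hostname", O_RDONLY);
        if (fd < 0) return 1;

        char buf[256];
        struct aiocb cb;
        memset(&cb, 0, sizeof cb);
        cb.aio_fildes = fd;
        cb.aio_buf    = buf;
        cb.aio_nbytes = sizeof buf;

        if (aio_read(&cb) != 0) return 1;        /* step 1: schedule the I/O */

        /* ... the single thread is free to do other work here ... */

        const struct aiocb *list[1] = { &cb };
        aio_suspend(list, 1, NULL);              /* step 2: block until completion */

        if (aio_error(&cb) == 0) {
            ssize_t n = aio_return(&cb);         /* collect the result */
            if (n > 0) fwrite(buf, 1, (size_t)n, stdout);
        }
        close(fd);
        return 0;
    }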

                                                                          1. 2

                                                                            It is impossible to implement synchronous API on top of asynchronous API in the most widely used programming language, JavaScript.

                                                                            I’m not sure I entirely understand what you mean. If you want to block on a fetch in JavaScript, you can simply await it. That makes it synchronous, does it not?

There’s of course an event loop / scheduler that decides when to schedule your function’s execution, but the same is true of processes/threads on Linux.

                                                                            1. 1

                                                                              await is only possible within special contexts (at the top-level or within async functions). Now say for example you want to use an API that requires a non-async function as parameter. Can’t use await in there.

                                                                              1. 1

                                                                                But isn’t that like saying “Now say for example you want to use an API that doesn’t do any context switches. You can’t make blocking IO calls in there.”?

                                                                                1. 0

                                                                                  I am just saying that you can’t - in general - program async as if it was sync. Not in JS.

                                                                                  You can do it in a language with threads (because a thread can be blocked anywhere, whereas async/await can only block in particular contexts).

                                                                                  P.S. I don’t think my example is frivolous. Let’s say the API in question does some sophisticated compute work and you can’t replace or modify it easily. But your requirements also force you to make an async IO call from the callback. Well, you can’t with async/await.

                                                                                  P.P.S. Context-switching behavior is usually not under the control of app programmers so I don’t really get your comparison.

                                                                                  1. 1

                                                                                    I’m just thinking out loud, essentially. I’m still on the fence about the whole function colors debate.

                                                                                    I think it’s interesting, though, that while the syntax of async/await is different, the semantics is essentially the same as traditional processes/threads and context switching. Until you introduce parallel execution primitives such as Promise.all, at which point async/await becomes strictly more expressive.

                                                                                    From this perspective, it seems like async IO is indeed a better foundation on which to build an OS.

                                                                            2. 1

                                                                              how are threads implemented? microkernels are just on top of hardware, I don’t know anything about this but from reading a bit on the hurd website the issue is that the synchronous microkernels block a lot whereas the async ones can get more done >.> idk

                                                                          2. 6

                                                                            You seem to be thinking in terms of language-level abstractions, not OS abstractions. Your definition is definitely not ‘the usual meaning of async’ in the context of systems programming. When you do synchronous I/O in an OS, the following sequence happens:

                                                                            1. The OS deschedules the calling thread.
                                                                            2. The OS notifies the relevant subsystem (e.g. storage, network) to begin processing the I/O.
                                                                            3. The relevant subsystem may return immediately if it has some cached value (e.g. disk I/O in the buffer cache, incoming network packets) but typically it issues some DMA commands to tell the hardware to asynchronously deliver the result.
                                                                            4. The scheduler runs some other threads.
                                                                            5. The I/O completes.
                                                                            6. The kernel wakes up the calling thread.

                                                                            The flow with asynchronous I/O is very similar:

                                                                            1. The OS allows the calling thread to remain scheduled after processing the request.
                                                                            2. The OS notifies the relevant subsystem (e.g. storage, network) to begin processing the I/O.
                                                                            3. The relevant subsystem may return immediately if it has some cached value (e.g. disk I/O in the buffer cache, incoming network packets) but typically it issues some DMA commands to tell the hardware to asynchronously deliver the result.
                                                                            4. The scheduler runs some other threads, including the calling thread.
                                                                            5. The I/O completes.
                                                                            6. The kernel either asynchronously notifies the calling thread (e.g. via a signal or writing an I/O-completed bit into a userspace data structure) or waits for an explicit (blocking or non-blocking) call to query completion state.

                                                                            Given the latter and a blocking wait-for-completion call, you can trivially simulate the former by implementing a synchronous I/O call as an asynchronous request followed by a blocking wait-for-completion. The converse is not true and requires userspace to maintain a pool of threads that exist solely for the purpose of blocking on I/O and waiting for completion.

                                                                            If your program wants to take advantage of the asynchronous nature of I/O then it can perform other work while waiting for the I/O.

                                                                            Most OS interfaces are synchronous for two reasons:

                                                                            • They were designed before DMA was mainstream.
                                                                            • They originated on single-core systems.

                                                                            On DOS or early ‘80s UNIX, for example, if you wanted to read a file then you’d do a read system call. The kernel would synchronously call through the FS stack to find the right block to read, then would write the block request to the device’s I/O control registers and then sit doing a spinning read of the control registers to read each word that the device returned. There was no point making it async because there was no way of doing anything on the CPU other than polling the device. Even back then, this model didn’t work particularly well for things like networks and keyboards, where you may have no input for a while.

                                                                            With vaguely modern (late ‘90s onwards) hardware neither of these is really true. The kernel may synchronously call through the FS stack to get a block, but then it writes a DMA request to the device. The device eventually writes the result directly into memory and notifies the kernel (either via an interrupt or via a control register that the kernel periodically polls). The kernel can schedule other work in the middle. On a multicore system, all of the kernel’s work can happen on a different core to the userspace thread and so all of the FS stack work can happen in parallel with the userspace application’s work.

                                                                            There’s one additional dimension, which is the motivation for POSIX APIs such as lio_listio and Linux APIs such as io_uring: system calls can be expensive. In the simple async model outlined above, you potentially double the number of system calls because each call becomes a dispatch + block (or, worse, dispatch + poll multiple times) sequence. You can amortise this if you allow the dispatch to start many I/O operations (you generally don’t want to do this with sync I/O because if you had to, for example, wait until a network packet was received before seeing the result of a disk read then you’d introduce a lot of latency. APIs such as readv and writev do this for the case where it is useful: multiple I/Os to the same descriptor). You can make the poll fast by making the kernel just write a completion flag into userspace memory, rather than keeping state in the kernel that you need to query.
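                                                                            For a concrete flavour of the batched-dispatch idea, here's a hedged sketch using POSIX lio_listio (io_uring gets the same effect with shared submission/completion rings; error handling omitted):

                                                                            #include <aio.h>
                                                                            #include <stddef.h>

                                                                            int read_two(int fd1, void *buf1, int fd2, void *buf2, size_t len)
                                                                            {
                                                                                struct aiocb a = { .aio_fildes = fd1, .aio_buf = buf1,
                                                                                                   .aio_nbytes = len, .aio_lio_opcode = LIO_READ };
                                                                                struct aiocb b = { .aio_fildes = fd2, .aio_buf = buf2,
                                                                                                   .aio_nbytes = len, .aio_lio_opcode = LIO_READ };
                                                                                struct aiocb *ops[2] = { &a, &b };

                                                                                /* LIO_NOWAIT: a single call dispatches both requests; completion is
                                                                                 * checked later (aio_error/aio_suspend, or a sigevent notification). */
                                                                                return lio_listio(LIO_NOWAIT, ops, 2, NULL);
                                                                            }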

                                                                            Don’t conflate this with a language’s async keyword, especially not JavaScript’s. JavaScript has a run loop and event model tied into the language. It handles a single event to completion and then processes the next one. This is already asynchronous because if you had synchronous event polling then you’d block handling of any other event (you can already mess this up quite easily by spending too long servicing one event). The JavaScript async keyword does CPS construction to generate a handler for an event that captures all of the state of the things that happen after an await.

                                                                          1. 2

                                                                            I feel like the bug is in repeatedly trying to identify if a device you’ve already previously checked should be added. I would think SDL would cache the list of devices already checked (invalidated when the list of devices changes) and only opens/tests/closes devices the first time they are encountered. At least, that’s how I would have written it.

                                                                            1. 1

                                                                              I suspect one issue is that seeing no new nodes when scanning /dev/input periodically doesn’t necessarily mean the devices haven’t changed: an existing device node may now refer to a different device. I.e. you pull out a mouse that was /dev/input/event5, plug in a joystick and it gets that same /dev/input/event5. So just comparing the list of device nodes isn’t enough to tell whether the devices have actually changed.

                                                                              Probably the right way to do it is watch the hotplug events (via libudev for example), though doing that properly in a library like SDL would also be a bit complicated, I think.
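                                                                              Something roughly like this, if anyone’s curious (a hedged libudev sketch, not what SDL actually does; error handling omitted):

                                                                              #include <libudev.h>
                                                                              #include <poll.h>
                                                                              #include <stdio.h>

                                                                              int main(void)
                                                                              {
                                                                                  struct udev *udev = udev_new();
                                                                                  struct udev_monitor *mon = udev_monitor_new_from_netlink(udev, "udev");
                                                                                  udev_monitor_filter_add_match_subsystem_devtype(mon, "input", NULL);
                                                                                  udev_monitor_enable_receiving(mon);

                                                                                  struct pollfd pfd = { .fd = udev_monitor_get_fd(mon), .events = POLLIN };

                                                                                  for (;;) {
                                                                                      if (poll(&pfd, 1, -1) <= 0)          /* block until a hotplug event arrives */
                                                                                          continue;
                                                                                      struct udev_device *dev = udev_monitor_receive_device(mon);
                                                                                      if (!dev)
                                                                                          continue;
                                                                                      /* action is "add"/"remove"/etc.; devnode may be NULL for non-node events */
                                                                                      printf("%s %s\n", udev_device_get_action(dev),
                                                                                             udev_device_get_devnode(dev) ? udev_device_get_devnode(dev) : "(no node)");
                                                                                      udev_device_unref(dev);
                                                                                  }
                                                                              }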

                                                                              1. 1

                                                                                In Ubuntu there is also a directory with the devices by name (which are symlinks to the actual devices). So you can open a device by name, then when you get an error (as happens when the device is unplugged), close the device and reopen it rather than scanning the directory. (I don’t know if it’s udev or the kernel or something else that creates the directory, so I’m not sure if it’s portable to other distros, but it’s quite useful).

                                                                                This works great for performance but if you want to handle the case where a new controller can be plugged in while the game is running (a second player, perhaps), then you have to scan the directory or do something like you suggested.
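                                                                                Roughly what I have in mind (just a sketch: the device name below is made up, and real code wants better error handling than this):

                                                                                #include <fcntl.h>
                                                                                #include <unistd.h>
                                                                                #include <linux/input.h>

                                                                                /* Hypothetical name - substitute whatever shows up under /dev/input/by-id/ */
                                                                                #define PAD "/dev/input/by-id/usb-ExampleVendor_Gamepad-event-joystick"

                                                                                void pump_events(void)
                                                                                {
                                                                                    int fd = open(PAD, O_RDONLY);
                                                                                    struct input_event ev;

                                                                                    for (;;) {
                                                                                        if (fd < 0 || read(fd, &ev, sizeof ev) != (ssize_t)sizeof ev) {
                                                                                            /* unplugged (or not present yet): close and re-open the same name */
                                                                                            if (fd >= 0)
                                                                                                close(fd);
                                                                                            sleep(1);
                                                                                            fd = open(PAD, O_RDONLY);
                                                                                            continue;
                                                                                        }
                                                                                        /* ... handle ev.type / ev.code / ev.value ... */
                                                                                    }
                                                                                }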

                                                                                EDIT: I realized after posting that I’m not sure which process creates the directory and the symlinks so I updated my reply to reflect that.

                                                                                1. 1

                                                                                  I wonder if a scheme will be developed for input devices that is similar to networking devices. Meaning that they each get some unique-ish name.

                                                                                  1. 4

                                                                                    I’m not sure how SDL handles that, but you can (technically) get as unique an ID as is available for a given /dev/input path: evdev exposes the device type and physical topology for each entry. Querying that probably takes some extra time, but it’s nothing SDL couldn’t handle for you (whether it actually does, I can’t say - I haven’t touched SDL since the 1.2 days…).
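                                                                                    For reference, this is roughly the identity information evdev exposes per node (a sketch; ioctls from <linux/input.h>, error checking omitted):

                                                                                    #include <fcntl.h>
                                                                                    #include <stdio.h>
                                                                                    #include <unistd.h>
                                                                                    #include <sys/ioctl.h>
                                                                                    #include <linux/input.h>

                                                                                    void describe(const char *path)
                                                                                    {
                                                                                        char name[256] = "", phys[256] = "", uniq[256] = "";
                                                                                        struct input_id id = { 0 };              /* bus type, vendor, product, version */

                                                                                        int fd = open(path, O_RDONLY);
                                                                                        if (fd < 0)
                                                                                            return;

                                                                                        ioctl(fd, EVIOCGNAME(sizeof name), name);   /* human-readable name */
                                                                                        ioctl(fd, EVIOCGPHYS(sizeof phys), phys);   /* physical topology, e.g. a USB port path */
                                                                                        ioctl(fd, EVIOCGUNIQ(sizeof uniq), uniq);   /* "unique" id; often empty or duplicated in practice */
                                                                                        ioctl(fd, EVIOCGID, &id);

                                                                                        printf("%s: %s (%04x:%04x) phys=%s uniq=%s\n",
                                                                                               path, name, id.vendor, id.product, phys, uniq);
                                                                                        close(fd);
                                                                                    }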

                                                                                    Providing unique names for hotpluggable devices is not a very easy problem so I don’t know how much it would help if that information were exposed straight in the device name. The “predictable” network device naming scheme doesn’t really work for all of those, either. The predictable part relies on information provided by the motherboard firmware/BIOS so it only works as long as the enumerated devices are on-chip, soldered on the mainboard, or fitted in a slot that you can’t reach without getting a screwdriver. Once you get to cheap USB adapter land you’re back to unpredictable names, except they’re all called enpXsY because something something modern hotplugging whatever mumbles in BSD. Unpredictable as in, just because a device has the same name as one you’ve already seen doesn’t guarantee it’s the same one, and just because you got an event for a new device with a new name doesn’t mean it’s not the same device in a different physical location.

                                                                                    It’s certainly true that most of the breakage (especially of the “just because it’s got the same name doesn’t mean it’s the same device” kind) is caused by bad hardware. Unfortunately that doesn’t stop people from buying bad hardware and they tend to blame software they get for free as opposed to hardware they paid money for, so ¯\_(ツ)_/¯.

                                                                                    1. 5

                                                                                      EVIOCGUNIQ often fails outright, returns empty values, or provides the same value for multiple devices of the same kind when plugged in. The coping mechanism is basically hashing various device features, plus caching plug/unplug actions and hoping for a “proximate onset/proximate cause” kind of relationship to retain the logical mapping.

                                                                                      Nothing in evdev is reliable if your test set of devices is big enough, and every low-level consumer of the thing ends up with workaround databases to cope - and those are in a similarly terrible shape. Step into the bowels of udev and trace how it goes from discovering a new input device via NETLINK onwards to figuring out what it is. There are some grand clintonesque assumptions about what constitutes “a keyboard”.

                                                                                      With current game devices it is practically easier to forget that the thing even exists and go for a userspace USB/Bluetooth implementation of your own. It’s no less painful than trying to stitch together the quilt of rusty kernel interfaces, with decades of workarounds, that nobody wants to touch - in reality you get to do both or walk away.

                                                                                      Recent console-class game “controllers” are an army of environment sensors, e.g. a camera or three, a handful of analog sticks, and buttons that are both analog and digital in various stages of their exciting journey from depressed to pressed. They also have more expressive LED outputs than the “designed for numlock and capslock” model evdev was ever “designed” for, as well as a low-fi speaker (force feedback) and a hi-fi one (for local sound effects) that may or may not appear as a sound device, often split across multiple device nodes that appear unrelated from the evdev consumer’s point of view. Then come assistive devices and VR …

                                                                                      The kind of performance bugs mentioned in the article are everywhere in the stack. It makes sense to outsource opening/closing to another process so you can forget about a few of them and just not accept tickets on the matter - it’s probably some powersave PMIC dance gone horribly wrong. I have this fine USB hub here that, depending on the order devices are plugged into it, will either introduce ~20ms of extra latency on its own, jittery as all hell, or add stalls every n seconds when it suddenly and silently forces a soft reset that the kernel tries to hide from the evdev side…

                                                                                      1. 2

                                                                                        buttons that are both analog and digital in various stages of their exciting journey from depressed to pressed

                                                                                        This is the kind of thing that makes me wonder why prisons aren’t full of programmers who went insane, showed up at the office with a chainsaw one day, and did a different kind of hacking. God!

                                                                                        Do you know if things are any better in, erm, other lands, like macOS/iOS or Windows? I first started hearing horror stories about evdev about 15 years ago and the design was never particularly forward-looking so I imagine that fifteen years of industrial evolution (!?) made things even worse. Did anyone get it any “righter”?

                                                                                        1. 5

                                                                                          This is the kind of thing that makes me wonder why prisons aren’t full of programmers who went insane, showed up at the office with a chainsaw one day, and did a different kind of hacking. God!

                                                                                          If only the culprits weren’t so geographically spread ..

                                                                                          Do you know if things are any better in, erm, other lands, like macOS/iOS or Windows? I first started hearing horror stories about evdev about 15 years ago and the design was never particularly forward-looking so I imagine that fifteen years of industrial evolution (!?) made things even worse. Did anyone get it any “righter”?

                                                                                          There are pockets of “righter” in Windows/iOS/Android, but they also have the benefit of being able to act more authoritatively and set a “minimal viable type and data model” to work from. OpenXR does a fair job on some of the consumer API side of things by shifting the responsibility around: applications define the abstract inputs they want, and the compositor side actually provides the translation and mapping.

                                                                                          Android has a very well-thought-out basic data model, and past the InputManager things look quite clean (about the same stage as the compositor would be at in the Linux stack) - but its data routing is punished both by being layered on top of evdev and by having a Java GC in nearly every process.

                                                                                          The procedure I usually apply is walking the path from the input device to the consumer, noting each ‘sample/pack/unpack/filter/translation’ stage and cost along the way, and for each indirection ask “what happens on backpressure?”, “is source/identity retained?”, “can filter/translation/queueing be configured?”, “can sampling parameters be changed at runtime?”.

                                                                                          For the really neglected input devices though, hands and eyes - things are really grim. Enjoy this little nugget: https://github.com/leapmotion/leapuvc/blob/master/LeapUVC-Manual.pdf - and that’s not even all of the abuse needed for an input sample, there is something about “authentication” there at the end. Then comes the actual computer vision part ..

                                                                                          The eye tracker I use had a Linux input driver at one time. It was quickly pulled from public support. It pegged a CPU core for its own use, bundled Electron to be able to provide a configuration step - “Look at these four reference points” - depended on systemd for .. things, and required certificates for providing a configuration of your own (or you could get much better precision, which made it harder to justify a commercial option), … Now I run the driver in a Windows VM with the USB node forwarded, extract the samples and send them over the network. That is against the driver TOS.

                                                                                          1. 3

                                                                                            For the really neglected input devices though, hands and eyes - things are really grim. Enjoy this little nugget: https://github.com/leapmotion/leapuvc/blob/master/LeapUVC-Manual.pdf - and that’s not even all of the abuse needed for an input sample, there is something about “authentication” there at the end. Then comes the actual computer vision part ..

                                                                                            Oh wow. That whole thing is… one gem after another. I like that bit about gain control, too. Gotta hand it to them, though: at least they didn’t just throw their hands in the air and say, well, they’re multiplexed, so reading the gain value will return either the gain control, the FPS ratio, or the dark frame interval, and you’ll figure out which one’s which eventually.

                                                                                            I wish I had something constructive to say but right now I mostly want to stay away from computers for a while…

                                                                                    2. 3

                                                                                      /dev/input/by-id/

                                                                                1. 2

                                                                                  I don’t think I agree with the premise of this article. Starting from this code:

                                                                                  bench_input = 42;
                                                                                  start_time = time();
                                                                                  bench_output = run_benchmark(bench_input);
                                                                                  result = time() - start_time;
                                                                                  

                                                                                  The compiler may not move the benchmark call before the first time() call unless it can prove that this move is not observable within the language semantics. If time() is a system call, it’s game over: it may modify any global that run_benchmark reads. If the compiler has complete visibility into the benchmark and the benchmark doesn’t read any globals then that may be fine.

                                                                                  The last transform, completely eliding the benchmark run because it can be shown not to have side effects and its result is unused, is far more plausible but that’s also generally an indication of a misleading benchmark. Especially in this example where the input is a compile-time constant: even if you do use the result, the compiler is free to evaluate the whole thing at compile time. Even if it doesn’t, it may generate code that assumes more knowledge of the input than is true in the general case.

                                                                                  The DoNotOptimize function is doing two things:

                                                                                  • It is a compiler barrier or, in C++11 atomics terminology, a signal fence. It prevents the compiler reordering things across the boundary.
                                                                                  • It causes a value to escape from the compiler’s ability to analyse it. This is slightly scary because LLVM actually does look inside inline assembly blocks in back-end optimisations and there’s no guarantee that it won’t in the future look there in mid-level optimisers. These would be free to observe that the instruction sequence there (no instructions) has well-defined semantics and neither captures nor modifies the value, and so optimise this away.

                                                                                  You can do both of these without inline assembly. The signal fence is in <atomic> (C++) or <stdatomic.h> (C). The second is a bit more tricky but you generally need to either call a function that the compiler can’t see (difficult with LTO) or store to, and read back from, an atomic variable.
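                                                                                  Something along these lines, as a rough C11 sketch (not how DoNotOptimize is actually implemented, just the shape of the alternative; run_benchmark and its signature are assumed):

                                                                                  #include <stdatomic.h>

                                                                                  extern int run_benchmark(int input);   /* the code under test (assumed signature) */

                                                                                  static _Atomic int sink;               /* results escape through this variable */

                                                                                  int bench_once(int input)
                                                                                  {
                                                                                      atomic_signal_fence(memory_order_seq_cst);        /* compiler barrier */
                                                                                      int result = run_benchmark(input);
                                                                                      /* store and read back so the compiler can't prove the result unused */
                                                                                      atomic_store_explicit(&sink, result, memory_order_relaxed);
                                                                                      result = atomic_load_explicit(&sink, memory_order_relaxed);
                                                                                      atomic_signal_fence(memory_order_seq_cst);        /* compiler barrier */
                                                                                      return result;
                                                                                  }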

                                                                                  1. 1

                                                                                    it may modify any global that run_benchmark reads.

                                                                                    I think the concern is that run_benchmark might not read any globals.

                                                                                    It’s also not completely out of the question that the compiler has a special understanding of time() and knows that it doesn’t touch program globals. As far as I know that’s not the case at the moment, however (and if it was you might need further steps to guarantee that nothing got moved out from between the two calls to time()).

                                                                                    If it has complete visibility into the benchmark and the benchmark doesn’t read any globals then that may be fine.

                                                                                    Exactly :)

                                                                                  1. 5

                                                                                    Dragonfly has come a long way since; now they’re trading blows with Linux on the performance front despite the tiny team, particularly when you contrast it with Linux’s huge developer base and massive corporate funding.

                                                                                    This is no coincidence; it has to do with SMP leveraged through concurrent lockfree/lockless servers instead of filling the kernel with locks.

                                                                                    1. 3

                                                                                      This comparison, which seems pretty reasonable, makes it look like it’s still lagging behind.

                                                                                      1. 7

                                                                                        What I don’t like about Phoronix benchmark results generally is that they lack depth. It’s all very well to report an MP3 encoding test running for 32 seconds on FreeBSD/DragonflyBSD and only 7 seconds on Ubuntu, but that raises a heck of a question: why is there such a huge difference for a CPU-bound test?

                                                                                        Seems quite possible that the Ubuntu build is using specialised assembly, or something like that, which the *BSD builds don’t activate for some reason (possibly even because there’s an overly restrictive #ifdef in the source code). Without looking into the reason for these results, it’s not really a fair comparison, in my view.

                                                                                        1. 3

                                                                                          Yes. This is well worth a read.

                                                                                          Phoronix has no rigour; it’s a popular website. A benchmark is useless if it is not explained and defended. I have no doubt that the benchmarks run in TFA were slower under FreeBSD and DragonflyBSD, but it is impossible to make anything of that if we do not know:

                                                                                          1. Why

                                                                                          2. What is the broader significance

                                                                                          1. 4

                                                                                            The previous two comments are fair, but at the end of the day it doesn’t really change that LAME will run a lot slower on your DragonflyBSD installation than it does on your Linux installation.

                                                                                            I don’t think these benchmarks are useless, but they are limited: they show what you can roughly expect in the standard stock installation, which is what the overwhelming majority of people – including technical people – use. This is not a “full” benchmark, but it’s not a useless benchmark either, not for users of these systems anyway. Maybe there is a way to squeeze more performance out of LAME and such, but who is going to look at that unless they’re running some specialised service? I wouldn’t.

                                                                                        2. 1

                                                                                          This comparison, newer and from the same website, makes it look like the system that’s ahead (see geometric mean on the last page).

                                                                                          Not that I’m a fan of that site’s benchmarks.

                                                                                          1. 2

                                                                                            I haven’t done the math, but it seems like most of DragonFlyBSD’s results come from the 3 “Stress-NG” benchmarks, which incidentally measure “Bogo Ops/s”.

                                                                                            Here’s the benchmark page: https://openbenchmarking.org/test/pts/stress-ng

                                                                                            I don’t know why Phoronix uses a version called 0.11.07 when the latest on the page seems to be 1.4.0, but maybe that’s just a display issue.

                                                                                            1. 1

                                                                                              Christ @ benchmarking with Bogo anything.

                                                                                      1. 4

                                                                                        This is actually not a bad rundown, though I feel like the discussion of UB lacks the correct nuance. When referring to integer overflow:

                                                                                        The GNU C compiler (gcc) generates code for this function which can return a negative integer

                                                                                        No, it doesn’t “return a negative integer”, it has already hit undefined-behaviour-land by that point. The program might appear to behave as if a negative integer was returned, but may not do so consistently, and that is different from having a negative integer actually returned, especially since the program might even exhibit odd behaviours that don’t correspond to the value being negative or the arithmetically correct value, or which don’t even appear to involve the value at all. (Of course, at the machine level, it might do a calculation which stores a negative result into a register or memory location; but, that’s the wrong level to look at it, because the presence of the addition operation has effects on compiler state that can affect code generation well beyond that one operation. Despite the claim being made often, C is not a “portable assembler”. I’m glad this particular article doesn’t make that mistake).

                                                                                        1. 3

                                                                                          What? The code in question:

                                                                                          int f(int n)
                                                                                          {
                                                                                              if (n < 0)
                                                                                                  return 0;
                                                                                              n = n + 100;
                                                                                              if (n < 0)
                                                                                                  return 0;
                                                                                              return n;
                                                                                          }
                                                                                          

                                                                                          What the article is saying is that on modern C compilers, the check for n < 0 indicates to the compiler that the programmer is rejecting negative numbers, and because programmers never invoke undefined behavior (cough cough yeah, right) the second check when n < 0 can be removed because of course that can’t happen!

                                                                                          So what can actually happen in that case? An aborted program? Reformatted hard drive? Or a negative number returned from f() (which is what I suspect would happen in most cases)? Show generated assembly code to prove or disprove me please … (yes, I’m tired of C language lawyers pedantically warning about possible UB behavior).

                                                                                          1. 3

                                                                                            because programmers never invoke undefined behavior

                                                                                            They shouldn’t, but they often do. That’s why articles such as the one in title should be super clear about the repercussions.

                                                                                            So what can actually happen in that case?

                                                                                            Anything - that’s the point. That’s what the “undefined” in “undefined behaviour” means.

                                                                                            (yes, I’m tired of C language lawyers pedantically warning about possible UB behavior).

                                                                                            The issue is that a lot of this “possible UB behaviour” is actual compiler behaviour, but it’s impossible to predict which exact behaviour you’ll get.

                                                                                            You might be “tired of C language lawyers pedantically warning about possible UB behaviour”, but I’m personally tired of programmers invoking UB and thinking that it’s ok.

                                                                                            1. 1

                                                                                              They shouldn’t, but they often do.

                                                                                              Yes they do, but only because there’s a lot of undefined behaviors in C. The C standard lists them all (along with unspecified, implementation and locale-specific behaviors). You want to know why they often do? Because C89 defined about 100 undefined behaviors, C99 about 200 and C11 300. It’s a bit scary to think that C code that is fine today could cause undefined behavior in the future—I guess C is a bit like California; in California everything causes cancer, and in C, everything is undefined.

                                                                                              A lot historically came about because of different ways CPUs handle certain conditions—the 80386 will trap any attempt to divide by 0 [1] but the MIPS chip doesn’t. Some have nothing to do with the CPU—it’s undefined behavior if a C file doesn’t end with a new line character. Some have to do with incorrect library usage (calling va_arg() without calling va_start()).

                                                                                              I’m personally tired of programmers invoking UB and thinking that it’s ok.

                                                                                              Undefined behavior is just that—undefined. Most of the undefined behavior in C is pretty straightforward (like calling va_arg() incorrectly); it’s really only signed-integer math and pointers where most of the problems with undefined behavior lie. Signed-integer math is bad only in that it might generate invalid indices for arrays or for pointer arithmetic (I mean, incorrect answers are still bad, but I’m more thinking of security here). Outside of that, I don’t know of any system in general use today that will trap on signed overflow [2]. So I come back to my original “What?” question. The x86 and ARM architectures have well defined signed integer semantics (they wrap! I’ve yet to come across a system where that doesn’t happen, again [2]) so is it any wonder that programmers will invoke UB and think it’s okay?

                                                                                              And for pointers, I would hazard a guess that most programmers today don’t have experience with segmented architectures which is where a lot of the weirder pointer rules probably stem from. Pointers by themselves aren’t the problem per se, it’s C’s semantics with pointers and arrays that lead to most, if not all, problems with undefined behavior with pointers (in my opinion). Saying “Oh! Undefined behavior has been invoked! Abandon all hope!” doesn’t actually help.

                                                                                              [1] IEEE-754 floating point doesn’t trap on division by 0.

                                                                                              [2] I would love to know of a system where signed overflow is trapped. Heck, I would like to know of a system where trap representations exist! Better yet, name the general purpose systems I can buy new, today, that use sign magnitude or 1s-complement for integer math.
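                                                                                              (Aside: when wrapping or checked arithmetic is what you actually want, you can ask for it explicitly rather than leaning on UB - gcc and clang have -fwrapv, which gives signed overflow wrapping semantics, and a checked-add builtin. A sketch of the latter:)

                                                                                              #include <limits.h>

                                                                                              /* Saturating add built on the gcc/clang checked-arithmetic builtin
                                                                                               * (__builtin_add_overflow, available in gcc 5+ and recent clang). */
                                                                                              int add_saturating(int a, int b)
                                                                                              {
                                                                                                  int sum;
                                                                                                  if (__builtin_add_overflow(a, b, &sum))      /* true if the result overflowed */
                                                                                                      return (a > 0) ? INT_MAX : INT_MIN;      /* pick whatever policy suits you */
                                                                                                  return sum;
                                                                                              }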

                                                                                              1. 2

                                                                                                Because C89 defined about 100 undefined behaviors, C99 about 200 and C11 300

                                                                                                It didn’t define them; it listed circumstances which have undefined behaviour. This may seem nit-picky, but the necessity of correctly understanding what “undefined behaviour” is was the premise of my original post.

                                                                                                A draft of C17 that I have lists 211 undefined behaviours. An article on UB - https://www.cs.utah.edu/~regehr/ub-2017-qualcomm.pdf - claims 199 for C11. I don’t think your figure of 300 is correct.

                                                                                                A bunch of the C11 circumstances for UB are to do with the multi-threading support which didn’t exist in C99. In general I don’t think there’s any strong reason to believe that code with clearly well-specified behaviour now will have UB in the future.

                                                                                                So I come back to my original “What?” question

                                                                                                It’s not clear to me what your “what?” question is about. I elaborated in the first post on what I meant by “No, it doesn’t “return a negative integer””.

                                                                                                Compilers will, for example, remove checks for impossible (in the absence of UB) conditions and do other things that may be even harder to predict; C programmers should be aware of that.

                                                                                                Now, if you want to argue “compilers shouldn’t do that”, I wouldn’t necessarily disagree. The problem is: they do it, and the language specification makes it clear that they are allowed to do it.

                                                                                                The x86 and ARM architectures have well defined signed integer semantics

                                                                                                so is it any wonder that programmers will invoke UB and think it’s okay?

                                                                                                This illustrates my point: if we allow the view of C as a “portable assembly language” to be propagated, and especially the view of “UB is just the semantics of the underlying architecture”, we’ll get code being produced which doesn’t work (and worse, is in some cases exploitable) when compiled by today’s compilers.

                                                                                                1. 1

                                                                                                  I don’t think your figure of 300 is correct.

                                                                                                  You are right. I recounted, and there are around 215 or so for C11. But there’s still that doubling from C89 to C99.

                                                                                                  No, it doesn’t “return a negative integer”, it has already hit undefined-behaviour-land by that point.

                                                                                                  It’s not clear to me what your “what?” question is about.

                                                                                                  Unless the machine in question traps on signed overflow, the code in question returns something when it runs. Just saying “it’s undefined behavior! Anything can happen!” doesn’t help. The CPU will either trap, or it won’t. There is no third thing that can happen. An argument can be made that CPUs should trap, but the reality is nearly every machine being programmed today is a byte-oriented, 2’s complement machine with defined signed overflow semantics.

                                                                                                  1. 1

                                                                                                    Just saying “it’s undefined behavior! Anything can happen!” doesn’t help

                                                                                                    It makes it clear that you should have no expectations on behaviour in the circumstance - which you shouldn’t.

                                                                                                    Unless the machine in question traps on signed overflow, the code in question returns something when it runs.

                                                                                                    No, as already evidenced, the “result” can be something that doesn’t pass the ‘n < 0’ check yet displays as a negative when printed, for example. It’s not a real value.

                                                                                                    The CPU will either trap, or it won’t

                                                                                                    C’s addition doesn’t map directly to the underlying “add” instruction of the target architecture; it has different semantics. It doesn’t matter what the CPU will or won’t do when it executes an “add” instruction.
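                                                                                                    To make that less abstract, here’s the function from upthread in a tiny test program. With an optimising compiler that drops the second check (as the article describes), this can print a negative number even though f() apparently never returns one; whether and when that happens depends on compiler, version and flags - which is exactly the point:

                                                                                                    #include <limits.h>
                                                                                                    #include <stdio.h>

                                                                                                    int f(int n)
                                                                                                    {
                                                                                                        if (n < 0)
                                                                                                            return 0;
                                                                                                        n = n + 100;        /* overflows for n > INT_MAX - 100: undefined behaviour */
                                                                                                        if (n < 0)
                                                                                                            return 0;       /* the check a compiler may remove */
                                                                                                        return n;
                                                                                                    }

                                                                                                    int main(void)
                                                                                                    {
                                                                                                        printf("%d\n", f(INT_MAX));
                                                                                                        return 0;
                                                                                                    }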

                                                                                          2. 1

                                                                                            Yes, the code generated does in fact return a negative integer. You shouldn’t rely on it, another compiler may do something different. But once compiled undefined behaviour isn’t relevant anymore. The generated x86 does in fact contain a function that may return a negative integer.

                                                                                            Again, it would be completely legal for the compiler to generate code that corrupted memory or ejected your CD drive. But this statement is talking about the code that happened to be generated by a particular run of a particular compiler. In this case it did in fact emit a function that may return a negative number.

                                                                                            1. 1

                                                                                              When we talk about undefined behaviour, we’re talking about the semantics at the level of the C language, not the generated code. (As you alluded, that wouldn’t make much sense.)

                                                                                              At some point you have to map semantics between source and generated code. My point was, you can’t map the “generates a negative value” of the generated code back to the source semantics. We only say it’s a negative value on the basis that its representation (bit pattern) is that of a negative value, as typically represented in the architecture, and even then we’re assuming that, for instance, the register that is typically used to return values does in fact hold the return value of the function …

                                                                                              … which it doesn’t, if we’re talking about the source function. Because that function doesn’t return once undefined behaviour is invoked; it ceases to have any defined behaviour at all.

                                                                                              I know this is highly conceptual and abstract, but that’s at the heart of the message - C semantics are at a higher level than the underlying machine; it’s not useful to think in terms of “undefined behaviour makes the function return a negative value” because then we’re imposing artificial constraints on undefined behaviour and what it is; from there, we’ll start to believe we can predict it, or worse, that the language semantics and machine semantics are in fact one-to-one.

                                                                                              I’ll refer again to the same example as was in the original piece: the signed integer overflow occurs and is followed by a negative check, which fails (“is optimised away by the compiler”, but remember that optimisation preserves semantics). So, it’s not correct to say that the value is negative (otherwise it would have been picked up by the (n < 0) check); it’s not guaranteed to behave as a negative value. It’s not guaranteed to behave any way at all.

                                                                                              Sure, the generated code does something and it has much stricter semantics than C. But saying that the generated function “returns a negative value” is lacking the correct nuance. Even if it’s true that in some similar case, the observable result - from some particular version of some particular compiler for some particular architecture - is that the number always appears to be negative, this is not something we should in any way suggest is the actual semantics of C.

                                                                                            2. 0

                                                                                              Of course, at the machine level, it might do a calculation which stores a negative result into a register or memory location; but, that’s the wrong level to look at it, because the presence of the addition operation has effects on compiler state that can affect code generation well beyond that one operation.

                                                                                              Compilers specifically have ways of ensuring that there is no interference between operations, so no. This is incorrect. Unless you want to point to the part of the GCC and Clang source code that decides unexpectedly to stop doing that?

                                                                                              1. 1

                                                                                                In the original example, the presence of the addition causes the following negative check (n < 0) to be omitted from the generated code.

                                                                                                Unless you want to point to the part of the GCC and Clang source code that decides unexpectedly to stop doing that?

                                                                                                If that’s at all a practical suggestion, perhaps you can go find the part that ensures “that there is no interference between operations” and point that out?

                                                                                                1. 1

                                                                                                  In the original example, the presence of the addition causes the following negative check (n < 0) to be omitted from the generated code.

                                                                                                  Right, because register allocation relies upon UB for performance optimization. It’s the same in both GCC and Clang (Clang is actually worse with regard to its relentless use of UB to optimize opcode generation, presumably this is also why they have more tooling around catching errors and sanitizing code). This is a design feature from the perspective of compiler designers. There is absolutely nothing in the literature to back up your point that register allocation suddenly faceplants on UB – I’d be more than happy to read it if you can find it, though.

                                                                                                  If that’s at all a practical suggestion, perhaps you can go find the part that ensures “that there is no interference between operations” and point that out?

                                                                                                  *points at the entire register allocation subsystem*

                                                                                                  But no, the burden of proof is on you, as you made the claim that the register allocator and interference graph fails on UB. It is up to you to prove that claim. I personally cannot find anything that backs your claim up, and it is common knowledge (backed up by many, many messages about this on the mailing list) that the compiler relies on Undefined Behaviour.

                                                                                                  Seriously, I want to believe you. I would be happy to see another reason of why having the compiler rely on UB is a negative point. For this reason I also accept a code example where you can use the above example of UB to cause the compiler to clobber registers and return an incorrect result. The presence of a negative number alone is not sufficient as that does not demonstrate register overwriting.

                                                                                                  1. 2

                                                                                                    There is absolutely nothing in the literature to back up your point that register allocation suddenly faceplants on UB

                                                                                                    What point? I think you’ve misinterpreted something.

                                                                                                    you made the claim that the register allocator and interference graph fails on UB

                                                                                                    No, I didn’t.

                                                                                                  2. 1

                                                                                                    It isn’t the addition; the second check is omitted because n is known to be greater than 0. Here’s the example with value range annotations for n.

                                                                                                    int f(int n)
                                                                                                    {
                                                                                                        // [INT_MIN, INT_MAX]
                                                                                                        if (n < 0)
                                                                                                        {
                                                                                                            // [INT_MIN, -1]
                                                                                                            return 0;
                                                                                                        }
                                                                                                        // [0, INT_MAX]
                                                                                                        n = n + 100;
                                                                                                        // [100, INT_MAX] - overflow is undefined so n must be >= 100 
                                                                                                        if (n < 0)
                                                                                                        {
                                                                                                            return 0;
                                                                                                        }
                                                                                                        return n;
                                                                                                    }
                                                                                                    
                                                                                                    1. 2

                                                                                                      You’re correct that I oversimplified it. The tone of the person I responded to was combative and I couldn’t really be bothered going into detail again on something that I’ve now gone over several times in different posts right here in this discussion.

                                                                                                      As you point out, it’s the combination of “already compared to 0” and “added a positive integer” that makes the final comparison to 0 redundant. The original point stands, though: the semantics of C, and in particular the possibility of UB, mean that a simple operation can affect later code generation.

                                                                                                      Here’s an example that works without interval analysis: (edit: or rather, that requires slightly more sophisticated analysis):

                                                                                                      int f(int n)
                                                                                                      {
                                                                                                          int orig_n = n;
                                                                                                          n = n + 100;
                                                                                                          if (n < orig_n)
                                                                                                          {
                                                                                                              return 0;
                                                                                                          }
                                                                                                          return n;
                                                                                                      }