Threads for david_chisnall

  1.  

    I can’t reproduce this. I’m running the exact same commands locally, and it doesn’t appear to run the code in collections.py. Is this Windows-specific behaviour or something?

    In general I don’t find the idea that generating docs involves running arbitrary code surprising: build systems often involve running arbitrary code, and generating docs often involves running build systems. It being Python, I wouldn’t even be that surprised if the whole thing was implemented via reflection. I was wondering if I could get the same behaviour with something like python3 -m http.server though, because that would be approaching surprising behaviour.

    1. 7

      In general I don’t find the idea that generating docs involves running arbitrary code surprising,

      I misunderstood this originally. The problem is not that it runs arbitrary code to generate the docs (if you’re importing a package, you’re letting the author of that package [and, by extension, the docs for that package] run arbitrary code anyway, because Python does not have a capability model). The problem is that the docs tool reads files matching a specific name in the current directory and executes them. If someone manages a drive-by download exploit on your browser (I think Safari is still the only mainstream browser that requires you to confirm per site that you want to allow it to download files) then they can drop a file like this in your downloads directory, and if you run the docs command in your downloads directory then you’re owned. Fortunately, it only checks the current directory and not arbitrary parents (as the git vulnerability a month or two ago did), so dropping a file like this in ~/Downloads doesn’t exploit you if you look at docs in ~/Downloads/SomePythonPackage-1.2.3.4/.

      1.  

        Yeah, what this really boils down to is “if there’s a foo.py in the working directory and you run python foo.py or anything else that imports foo, you will get the working directory’s foo.py”.

        Which is one of those deep tensions between making a thing discoverable/learnable (working directory being on the import path is huge for that) versus trying to lock it down as much as possible. And is getting into an area where it’s hard to really have the language stop you – Python could maybe refuse to run if it detects it’s being invoked in a directory matching common download/home dir names, or change import behavior silently, but now you get confusing inconsistency in how it works, and no amount of “are you sure you want to trust this directory?” popups will actually help the people who are most likely to need the help, since they’ll probably just click through those.

        As some folks have noted, Python has a command-line flag that lets you explicitly decide to minimize the import path, which maybe is the way forward for some tutorials and other beginner/first-time materials. Or maybe it’s a thing that needs to be solved at the operating system level.
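
        As a concrete sketch of those flags (to the best of my knowledge, -I, isolated mode, has existed since Python 3.4, and the narrower -P was added in 3.11; whether the unflagged invocation is exploitable varies by version, as other replies in this thread discover):

        echo 'print("P0wned")' > collections.py
        python3 -m http.server       # on affected versions, imports ./collections.py
        python3 -P -m http.server    # 3.11+: don't prepend the cwd to sys.path
        python3 -I -m http.server    # 3.4+: isolated mode; also ignores PYTHON* env vars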

        Unrelated: this is also why I and several other people strongly advocate for a code repository layout with a src/ directory top-level, and any modules/packages inside that directory. If the modules are top-level, it’s very easy to trick yourself into thinking your packaging process works because you’re likely running it from the root directory, which implicitly puts all that stuff on the import path. Using a src/ (or similar name) directory means you actually have to get the packaging right in order to successfully install/test.
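
        A minimal sketch of such a layout (all names here are placeholders):

        myproject/
            pyproject.toml
            src/
                mypackage/
                    __init__.py
            tests/
                test_basic.py

        With this structure, running tools from myproject/ no longer puts mypackage on the import path implicitly; you have to actually install the package (e.g. pip install -e .) before imports resolve, which is exactly the check you want.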

        1.  

          The problem is that the docs tool reads files matching a specific name in the current directory and executes them.

          I thought the issue was that running python3 -m foo would run foo.py (or similar, don’t know specifics off the top of my head) – that is, this is nothing to do with the docs tool itself. Am I mistaken?

          1.  

            Running python -m foo will run whatever foo module is found first, starting with the current working directory. The same is true of running python -m pydoc foo.

            The specific “exploit” shown here is more like

            1. Module foo imports standard library module collections
            2. I manage to get a malicious file named collections.py into your current directory and convince you to run python -m pydoc foo
            3. The import collections inside foo gets resolved to the current directory’s collections.py, so that’s the file that gets imported. If it has any import-time side effects, they execute.

            It’s a bit convoluted to actually pull off, because generally you need to convince someone to run python from their downloads directory or something like that.
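
            A quick diagnostic sketch, if you want to see which file a given interpreter would actually use (note that importing the module also runs it, which is the whole point of the exploit, so try this with a benign collections.py):

            python3 -c 'import sys, collections; print(sys.path[0], collections.__file__)'

            On an affected setup, this prints the local file’s path rather than the standard library’s.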

        2.  

          It works for me on Debian 11 and Ubuntu 22. And it also works for “python3 -m http.server”.

          Which OS do you use? Can you try running it in a Debian container?

          When I do, it also works for me:

          docker run --rm -it debian:11-slim
          apt update -y && apt upgrade -y
          apt install -y python3
          echo 'print("P0wned");exit()' > collections.py
          python3 -m http.server
          P0wned
          
          1.  

            I tried it on Void Linux and macOS, with Python 3.11 on both systems. EDIT: Oops, Python 3.10.8 on the Mac; I checked the version number in a terminal with an SSH session open (but did fail to reproduce on the actual Mac).

            Your docker repro works for me, and installs python3.9. Maybe the behaviour has changed in more recent versions of python?

            1.  

              Continuing weirdness: I cannot reproduce with Debian’s python 3.9.2, but if I install 3.10.6 (using pyenv) I can finally reproduce what no_gravity is seeing. But I cannot reproduce it using Debian’s system python (which he seems able to do on Ubuntu).

              I’m going to stop messing around with this now, but there seem to be other factors at work here.

              # I installed python 3.10.6 using pyenv and made that the local python
              telemachus(digitalocean) wtf$ python3 --version
              Python 3.10.6
              telemachus(digitalocean) wtf$ python3 -m http.server
              P0wned
              Could not import runpy module
              
              # Now I've gone back to the system python3
              telemachus(digitalocean) wtf$ rm .python-version
              telemachus(digitalocean) wtf$ python3 --version
              Python 3.9.2
              telemachus(digitalocean) wtf$ python3 -m http.server
              Serving HTTP on 0.0.0.0 port 8000 (http://0.0.0.0:8000/) ...
              ^C
              Keyboard interrupt received, exiting.
              
          2.  

            I also cannot reproduce—not on macOS 12.6.1 with python 3.10.8 and not on Debian 11 with python 3.10.5. (My shell is bash on both systems, though I doubt that matters.)

            I wonder what other variables are at play.

            1.  

              It works for me with Python 3.10.6:

              docker run --rm -it ubuntu:22.04
              apt update -y && apt upgrade -y
              apt install -y python3
              python3 --version
              => Python 3.10.6
              echo 'print("P0wned");exit()' > collections.py
              python3 -m http.server
              => P0wned
              
              1.  

                I can finally reproduce this, but only with some pythons and in some cases. I don’t understand this at all. In any case, I’m glad to learn about the larger issue.

          1. 13

            Nice article!

            To C programmers, this can sound like a downside of Rust - there are many places in real-world code where you do in fact statically know that your index is in-bounds

            It’s worth mentioning that Rust is able to elide bounds checks in a lot of cases, but this is a product of the optimiser, not a static guarantee.

            1. 12

              This isn’t specific to Rust. LLVM is very good at this kind of elision; it’s part of the scalar evolution framework, which is able to compute the range of possible values for a loop induction variable, which in turn lets the compiler prove that the check will pass for all possible inputs if the upper bound can be shown to be less than or equal to the length. The same benefit is visible with C++ bounds-checked types and with other safer front ends.

              That said, the folks at King’s College London who are working on CHERI Rust reported about a 10% speed-up (geomean across the benchmarks that they ran) from a compiler mode that disables all dynamic memory-safety checks, so I’m not sure I believe this except for codebases where the bounds are all very static. That’s probably the case for encryption libraries, where basically everything except the input and output is fixed length, but encryption libraries are not even slightly representative of other code.

              That’s not to say that you shouldn’t use Rust. If you write correct C code then you will need to manually insert bounds checks in almost all of the same places and the performance delta will be much less (and probably favour Rust because the type system can communicate more information to optimisers).

              1. 3

                It’s a huge part of a small but important problem I have with Rust: it relies extensively on LLVM to make what it does possible. Without LLVM’s “magic” there is no way that Rust could compete with any other language, let alone C and C++ (in terms of performance). The compiler emits so much unoptimized, redundant IR that’s chock-full of branches and relies on LLVM to figure out what can be stripped altogether. Unfortunately, every time Rust updates its LLVM backend, there are regressions (and wins!) in terms of codegen. There are no guarantees in the way that LLVM processes IR.

                1. 4

                  I think that’s unavoidable for any nontrivial language. Clang has the same problem; the difference is that LLVM developers are more likely to test Clang, and there are some vaguely decent C/C++ benchmark suites. Rust could significantly improve this by having a nightly build bot that built and ran a load of Rust code with LLVM trunk and reported regressions.

                  So far, I’ve seen very little engagement from the Rust compiler folks with LLVM upstream. There have been several threads on subjects like pointer provenance that will directly impact Rust, with no one saying ‘I work on the Rust compiler and this is what would work for us’.

                  1.  

                    Without llvm’s “magic” there is no way that rust could compete with any other language, let alone C and C++ (in terms of performance).

                    AIUI without LLVM’s “magic” there is no way C and C++ would compete in terms of performance.

                2. 1

                  When you say “a lot of cases”, which cases do you have in mind? I read about Rust’s ability to elide bounds checks but so far I’ve never actually seen it happen. I don’t often program in Rust though.

                  1. 8

                    One case is when you have two array indexing operations in the same function where one index is known to be at least as large as the other; for example: https://godbolt.org/z/Y3sbhe9YG (note that the cmpq $4, %rsi followed by the jbe to the panicking branch only appears once despite two indexing ops). At the end of the day this is “just” a pretty simple local optimization as far as LLVM is concerned.

                    1. 3

                      Thank you. A Compiler Explorer example too - perfect!

                      Update: I wonder how often that situation comes up in practice. Do you need to have both indices be hard-coded? Rust defines integer overflow in release mode as two’s-complement wrapping, so you can’t be sure that x+1 is at least as large as x.

                      This has two bounds checks: https://godbolt.org/z/rbGWhMrEY

                    2. 1

                      I have not confirmed this, but I am pretty confident that if you iterate over an array you don’t have bounds checks, because the iteration is already bounded.

                      EDIT: here’s how this happens for vectors. It’s very hard to grok, but basically, when iterating over a vector the code doesn’t do bounds checking (because we have already confirmed that the vector has N elements). So a lot of code that is “do thing over X collection” just completely removes these checks!

                      I do think that a lot of this code is… like maybe Rust is expecting the compiler to do a lot of work to simplify a lot of this.

                      1. 1

                        Yes, using iterators is the only situation I’ve ever come across before in my own Rust programming where there are no bounds checks. However, I thought that was because Rust didn’t inject bounds checks in that situation, rather than because they were added but later elided.

                  1. 1

                    I don’t really agree with the idea that mmap memory is heap memory, in spite of the long argument. On a modern *NIX system, mmap is the only way of getting memory in a userspace process and so is the thing used to get the heap, the stack, and the mappings of the binaries that provide globals and functions. If all of these are heap memory then everything in a process is heap memory and you may as well just say ‘memory’.

                    It’s also worth noting that you often want to JIT units smaller than a page and so can’t do W^X the way that this article proposes. The canonical way of doing it is to create an anonymous shared memory object and map it read-write (or write-only) in one location and read-execute (or execute-only) in another location. This makes it difficult to automatically find the writeable address from a function pointer. Apple’s iOS JIT goes one step further and JITs a memcpy that copies into the writeable location and then scrubs all other references to the writeable address, leaving only the execute-only mapping. To be able to write, you need to either leak the writeable address via a side channel (unfortunately, this is quite easy) or find the address of the special memcpy (unfortunately, this is also easy because it’s at the start of the region and so easy to guess given a function pointer and easy to check guesses with speculative side channels).

                    1. 1

                      Hm. An idea that comes to mind:

                      • Make a file and unlink it
                      • Create a subprocess, share the file with it
                      • mmap the anonymous file X in the main process and W in the subprocess
                      • When you want to update code, send an IPC message to the subprocess asking it to update the file on your behalf
                      • The subprocess can authenticate you with a passcode (64 or 128 bit random number) or a HMAC

                      Higher overhead because of the IPC and context switching, but probably easier to hide the authentication key.

                      1. 1

                        This is more or less what the pre-Chromium Edge did on Windows, except that Windows has much nicer APIs for it. The memory mapping APIs take the HANDLE to the target process, so one process would JIT into a shared memory object and then map that object executable in the renderer process.

                        I’m not sure what the HMAC is for here. You already have control over who can send messages because only the process that has the other end of the IPC channel (socket or pipe) can send messages to the JIT process. If your goal is to privilege separate within the process that’s running the code, you’re solving an impossible (without CHERI) problem because an attacker who can run JIT’d code can insert a gadget that lets them probe the secret with speculative side channels and so they can trivially extract it in well under a second and then forge it. And if you do have CHERI, then you can put the JIT in a separate compartment in the same address space, map the memory RWX, give the consumer an X / RX capability and the JIT compartment a W / RW capability and get the same security guarantees with a fraction of the overhead.

                    1. 10

                      The ping process runs in a capability-mode sandbox on all affected versions of FreeBSD and is thus very constrained in how it can interact with the rest of the system at the point where the bug can occur.

                      Score one for capsicum!

                      1. 7

                        Meh. My FreeBSD desktop is unaffected.

                        1. 2

                          Not to imply that CHERI isn’t cool, but memory safety is table stakes. Cap safety is more interesting (what about logic bugs?). (Not that Unix is set up to let you do a good job of anything, but…)

                          1. 5

                            CHERI is also a capability system. You can use it in combination with Capsicum in some interesting ways, but also use it to sandbox parts of code, such as the packet-parsing bits of ping.

                      1. 17

                        The native Haiku GUI support is especially cool because, for once, it was not developed by one of the “usual suspects” among the Haiku community and then later upstreamed (if at all – many projects are reluctant or even hostile to accepting Haiku support, and so it’s unfortunately often our habit to not even try), as basically all other ports of major applications have been so far, including Qt, LibreOffice, gVim, etc., but was written by an Emacs developer and was upstream from the start.

                        1. 6

                          if at all – many projects are reluctant or even hostile to accepting Haiku support, and so it’s unfortunately often our habit to not even try

                          We took Haiku patches in snmalloc. The Haiku VM subsystem is sufficiently different from everything else that we support that we’ve caught at least one bug because someone on Haiku could reproduce it reliably but for everyone else it was a rare and intermittent crash.

                          I doubt I’ll ever use Haiku (in spite of my fondness for BeOS 5), but I see Haiku support as evidence of code quality in a project. It means that you have clean enough platform abstractions that the cost of supporting a niche platform is small (our Haiku-specific code is tiny), which tends to be a good proxy for whether you have well-thought-out abstractions in your code. The number of projects that I’ve seen recently that have Mac and Linux support, but struggle with something like FreeBSD (which is basically like Linux or like macOS in every category that userspace software would care about), let alone something a bit more different like Haiku, is depressing. When I started working on F/OSS things, running on multiple platforms was a source of pride for projects, the more obscure the better. That made it very easy to adapt code to new and interesting scenarios (want to run in the browser with asm.js? On a cloud service with WASI? On a microcontroller? On a mainframe? It’s just another platform and you already have the right abstractions), now we’re increasingly locked in to a Linux monoculture.

                          1. 4

                            If any piece of software would have upstream support for an obscure system, it would be Emacs, which in the year 2022 has an upstream-maintained port for DOS: https://www.gnu.org/software/emacs/manual/html_node/emacs/MS_002dDOS.html

                            1. 2

                              This is making it VERY HARD for me not to grab an old computer and put Haiku on it, dammit.

                              1. 2

                                I just got beta4 up and running on my “weird OS” machine (a gently-upgraded Thinkpad T440p) and…it’s actually really good. WiFi, the Nvidia graphics chip, sound, all of it just worked out of the box. (Now, getting a bootable root disk took a fair bit of back-and-forth with the installer, partition tool, etc.; the install app doesn’t really seem to be able to tell what configurations of disk layout, boot loader, etc. are actually going to be bootable, so unless you get it right off the bat for your particular hardware there’s some potential fiddling involved.)

                                If you have a previous beta up on a spare PC/VM/whatever, it’s pretty easy to upgrade in-place: https://www.haiku-os.org/guides/daily-tasks/updating-system

                                Just swap out the release name from r1beta3 to r1beta4, sync, and let ’er rip.

                              2. 1

                                I wonder which version will end up in HaikuDepot, the one with native GUI support or the terminal one.

                                1. 2

                                  Both, and it’s already been there for some months now.

                              1. 8

                                The two most ‘fun’ bugs that I’ve encountered (I wasn’t responsible for fixing either of them):

                                  For a while, there was an issue on FreeBSD (I can’t remember if this was merged code or a branch) where, on multicore systems, data would be randomly corrupted. Investigation showed that the objects being corrupted were always accessed protected by a lock. The root cause of the bug was that someone had accidentally removed the annotation that caused some locks to be strongly aligned and, occasionally, they would span cache-line boundaries. Until fairly recently, LOCK-prefixed instructions on x86 chips that spanned such a boundary would not trap; they’d just silently not be atomic. This let two threads simultaneously acquire the same lock, but only in very rare cases.

                                The more fun one was actually a bug in the prototype CHERI CPU. The observable behaviour was that the network was really, really slow. Further investigation showed that we were seeing a lot of packets fail checksums (which caused TCP to retransmit, so no failure was reported to userspace). This was eventually tracked down to the CPU performing speculative loads even of uncached memory. The network interface was a memory-mapped FIFO, so if you read from the MMIO address you popped a word from the network device’s buffer. If you did this in speculation, you silently discarded a word from the packet, so the network stack would see a packet with a mismatched checksum. Amusingly, the Xbox 360 had a similar bug: there is an instruction to perform uncached reads and this can be executed in speculation, which can have observable side effects.

                                We also had a bug where the SD-card controller on the FPGA platform couldn’t keep up with the CPU’s write rate and so would write alternate blocks. We had 1 GiB of RAM and 512 MiB of SD card, so while the system was working we never actually read back anything that we’d written to disk: it stayed in the buffer cache. It was only after a reboot that we’d find that the root FS was completely broken. This was easier to find because we could look at the SD card, see interleaved zeroes, try a simple boot-and-write, then try a bare-metal write from the CPU without the OS and see that the hardware was at fault.

                                The most fun bug that I’ve had to find was in the Linux kernel (and, at some point, I really should upstream the fix). In one of the modes for paravirtualised console support (used only by the project I was working on and PV MIPS), it added the virtio console as console 0. It then iterated over consoles and failed to find any. It turns out that the collection that it uses to hold the list of available consoles uses a 0 value of a key as a not-present marker. Registering the console as console 1 fixed the bug. I particularly like this bug, because it took a couple of days to find, but the fix was flipping a single bit (the ASCII code for ‘1’ is one bit different from ‘0’, so it was one bit in both the source and the compiled binary). This is probably the highest ratio of debugging effort to fix size of any bug that I’ve fixed.

                                1. 2

                                  At 1,875 words, “Cramming” is concise

                                    … unlike this article, which expands one paragraph of fluff into several pages, like an essay by an English student who hasn’t read the book. tl;dr: “Moore’s law was influential”, ok and?

                                  1. 1

                                    Probably the expectation (or desire) for a more technical article about the subject matter of the title made it harder to read the article with clear eyes?

                                    The point of the article, in my mind, was more about the social influence of Moore on the tech industry. I am certainly familiar with the phenomenon of people using the quote-unquote promise of Moore’s Law as a crutch (to excuse bloated software, excessive UI, or just to avoid thinking about efficiency in general), but it never occurred to me that the seeds of that attitude were present in Moore’s original paper. I always imagined Moore’s observation was little more than a footnote, extracted from a more purposeful technical context – not the central conceit of a Silicon Valley manifesto.

                                    1. 1

                                      I got to the end of the very long introduction and then the article ended. Were we accidentally linked to an extended abstract by mistake?

                                        I was expecting an analysis of the end of Dennard Scaling around 2007 and the relationship between it and Moore’s Law (we’re still getting more transistors per dollar; we’re not getting nearly as many more transistors per Watt). Or an analysis of how different market segments have spent their Moore’s Law dividend (almost 100% on performance for desktop and workstation processors, around 60% on reducing price for microcontrollers). Or an analysis of the approaches that people have used to improve performance when they don’t get a load of free transistors to play with (specialised, heterogeneous cores that can be powered down when not in use). Instead I got… nothing.

                                      1. 1

                                        I had the same thought when I got to that part. On the other hand, once I figured out where Wired hid the byline (it’s very small and hard to read!), I realized the piece is by Virginia Heffernan, who always writes in this very excited style. I really like her work, but I can see how it doesn’t totally fit with a Lobsters-style dry technical discussion.

                                        1. 1

                                          It almost reads like the intro page to a series of articles that dig deeper, and I’m going to charitably assume that’s what it was (without checking). Maybe they just forgot the links to the deeper articles? Yeah, that’s the ticket!

                                      1. 16

                                      To summarise as fairly as I can: the author sees people trying to write correct code via unit testing. They are puzzled by this because Erlang isn’t supposed to be correct; it’s supposed to have bugs and then do some work to isolate those bugs, to be able to crash while the whole system keeps mostly running in the face of that crash.

                                        But like… wouldn’t you like to do both? Not have the bugs, and also safely crash when you have the bugs? I can’t tell if this is meant to be ironic or not.

                                        1. 11

                                          The goal for the ‘let it crash’ mindset isn’t to ship buggy code, it’s to acknowledge that even formally verified code is not bug free (your specification can never be complete) and your system should aim to be correct in the presence of failures. Testing is important in Erlang, but testing should be of the form of killing processes at surprising times and ensuring that the system recovers and does not lose data. This gives you a far more reliable system than a quest to eliminate bugs via unit tests. You have a limited budget for testing, you should spend it where it will deliver the most impact.

                                          1. 18

                                            Randomly crashing processes isn’t going to help me if I’m testing my deflate implementation or whatever. Regular unit tests are still very useful in Erlang for checking that functions do what they ought to.

                                            1. 4

                                              I’m familiar with Erlang and with let-it-crash, but that’s not the claim the article is making. Here it is blow-by-blow:

                                              1. I see people writing a lot about unit testing in Erlang
                                              2. Do we really need all that?
                                              3. Erlang has some testing stuff
                                              4. But there are cases that it can’t catch, and neither would a type system*
                                              5. Erlang’s history includes environments that needed very high uptimes
                                              6. To do that, Erlang has a way to continue in the face of crashes
                                              7. If you write “clear code with modestly-sized routines” you don’t need tests
                                              8. Some specific classes of bugs related to mutability aren’t possible
                                              9. Erlang has supervisors that can restart and log buggy code
                                              10. ∴ “some Erlangutans [are] missing the point of using Erlang”

                                              The only substantive thing there that could possibly mitigate the need for individual functions to be correct is the “clear code with modestly-sized routines”. Okay, yeah, if you never write wrong code then you’ll never have wrong code, that’s not unique to Erlang. But nothing here obviates the need for correctness. Let-it-crash allows your system that does 10 things to keep doing 9 of them while it’s broken. But it doesn’t make the other one not broken. It doesn’t make that function correct. I’m not even a TDD weenie myself but the idea that let-it-crash is in any way related to correctness, whatever form that might take, strikes me as absurd.

                                            Let-it-crash is very situational. It’s good to allow a complex system to limp along in the face of temporary external problems or problems only involving parts of the system. But “the point” of Erlang isn’t that it absolves you from writing correct code with a magical “On Error Resume Next” that still builds your widgets even if you close the exit hatch before you eject them from the chute. Let-it-crash lets one of your two widget-builders keep running after the other trips the breaker, which is great and is actually the “point of using Erlang”. But if you don’t fix that bug you’ll still never build a single widget.

                                              *: that particular case can be caught by some type systems like affine or linear types. I don’t claim that Erlang would be improved by those, just some fun trivia

                                              1. 1

                                                How can you be correct with unforeseen failures? Seems like the general opinion around this is if an unplanned error happens, almost all bets are off and you can’t trust anything later to be correct so you should just crash and get a person to review the problem.

                                                1. 1

                                                  How should we distinguish between foreseeable and unforeseeable failures? If I write a program that tries to read a file whose path is provided by user input, is it not foreseeable that this file may not exist? Under what conditions is it appropriate for the entire program to crash in response to this error? Under what conditions is it my responsibility as a programmer to explicitly deal with this error? What about errors during file reads caused by device errors? Or errors during file descriptor close or flush or sync syscalls? Or errors when connecting to a remote API? Or during network transfers? Or when communicating with a database?

                                                  If you want to define crashworthy errors as whatever the programmer says are crashworthy errors, okay! But then what does crashing mean? If you’re serving a request in an Erlang actor and you hit a syscall error then it’s pretty OK to “crash” that actor because actors are request-scoped and the crash just terminates that single request. But if you’re serving a request in a concurrent e.g. Rust program and that request hits a syscall error then you definitely shouldn’t “crash” the entire process, right? You can terminate the request, sure, but that’s not really what anybody understands by “crash” terminology.

                                                  1. 3

                                                  At my (former) job, I used Lua to process incoming SIP messages, with Lua coroutines [1] handling each transaction. If some unexpected thing happens (like a nil reference missed in testing due to unexpected input, for instance), only that coroutine “crashes”—that is, ceases to run. This is caught by the framework I’m using and logged. The “crash” does not affect the rest of the program. In my experience, most of the crashes were due to mishandling of input, or rather, a miscommunication about what, exactly, we could expect from the Oligarchic Cell Phone Company that was our customer [2]. We don’t expect crashes, but sometimes we do overlook some conditions (as I’m fond of saying, “bugs are caused by an inattention to detail”).

                                                    While Lua isn’t Erlang, I think the overall effect is the same—a crashed thread does not impinge on the rest of the program. Perhaps a better distinction is “let it soft crash [3].”

                                                  [1] Think threads, but not pthreads threads: Lua-specific ones, cooperatively multitasked.

                                                    [2] Our service is only for the North American Numbering Plan. We did not expect to receive international phone numbers at all (and weren’t for the non-SIP legacy interface).

                                                    [3] Where a “soft crash” just stops the processing and a diagnostic log is issued.
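
                                                  A rough asyncio analogue of that “soft crash” pattern in Python (an illustrative sketch, not the actual Lua framework): each transaction runs as its own task, and the orchestrating code catches and logs a failed task instead of letting it take the whole program down.

                                                  import asyncio

                                                  async def handle_transaction(msg):
                                                      # A nil-reference-style bug: malformed input raises KeyError here.
                                                      return f"routed {msg['number']}"

                                                  async def main():
                                                      messages = [{"number": "+1-555-0100"}, {}, {"number": "+1-555-0199"}]
                                                      results = await asyncio.gather(
                                                          *(handle_transaction(m) for m in messages),
                                                          return_exceptions=True,  # a crashed task yields its exception
                                                      )
                                                      for msg, result in zip(messages, results):
                                                          if isinstance(result, Exception):
                                                              print(f"soft crash on {msg!r}: {result!r}")  # diagnostic log
                                                          else:
                                                              print(result)

                                                  asyncio.run(main())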

                                                    1. 1

                                                      Lua coroutine . . . [crashes are] caught by the framework I’m using

                                                      Great! Good! My point here is not to assert an absolute, or to deny your specific experience in any way. My point here is to say that the general notion of “crashing” usually does not describe the act of terminating entities under the management of an eg Erlang OTP, or your Lua framework, but instead usually describes the act of terminating entire OS processes.

                                                      a crashed thread

                                                      I’m trying to communicate that most people do not generally understand threads as things that can “crash”. Threads die or terminate, processes crash. The difference in terminology is my entire point.

                                                      1. 1

                                                        Then what would you call it?

                                                        I have a Lua script. I try running it, Lua spits out “attempt to index a nil value (global ‘x’)” and ends the program. That is, if I understand your nomenclature, a “crash.” Yet if I wrap that same code in a coroutine, the coroutines fails to finish, yet the script continues. What did the coroutine do? Just terminate? I personally consider it a “crash.”

                                                        And what does it mean for a “process” to crash? On Unix, the system continues running. Yet on MS-DOS, such a “crash” will usually “crash” the computer. Yet both are crashes. Why can a “process” running under a protected domain (Unix) crash, and yet, threads cannot? I think the participants here are using a broader definition of “crash” than you personally like.

                                                        1.  

                                                          I have a Lua script. I try running it, Lua spits out “attempt to index a nil value (global ‘x’)” and ends the program. That is, if I understand your nomenclature, a “crash.” Yet if I wrap that same code in a coroutine, the coroutines fails to finish, yet the script continues.

                                                          If you can write a bit of code that runs (more or less) equivalently as its own OS process (scenario 1) or as a coroutine among other coroutines in a single shared OS process without modification (scenario 2), then whatever is orchestrating coroutines in the second scenario is necessarily enforcing isolation boundaries that are functionally equivalent to the OS process boundaries in the first scenario.

                                                          What did the coroutine do? Just terminate? I personally consider it a “crash.”

                                                          If that code fails in scenario 1, is that a crash? IMO yes.

                                                          If that code fails in scenario 2, is that a crash? IMO no.

                                                          Why can a “process” running under a protected domain (Unix) crash, and yet, threads cannot? I think the participants here are using a broader definition of “crash” than you personally like.

                                                          My understanding of “crash”, or yours, isn’t important to me, what I care about is what the average developer understands when they see that term, absent any other qualifications. I don’t think most developers consider a thread termination to be a crash, and I think if you want to say “crash” to mean something like that you need to qualify it. That’s my only point. Could be wrong.

                                                    2. 2

                                                      It’s worth pointing out that in the Erlang world, it’s not the entire program that crashes. Like, ever. Instead, whatever individual process that ran into the error is the piece that fails. In the context of a program loading a file, it’s likely that just the process responsible for loading the file would die. Which makes perfect sense, since it’s no longer a meaningful operation to continue loading a file that we can’t open.

                                                      The elegance of the “let it crash” mentality is that it lets you handle foreseeable, unforeseeable, and foreseeable-but-too-unlikely-to-care-about failures in the exact same way. Like sure, you could check to make sure a file exists, but what if it’s corrupted? Or someone takes a sledgehammer to the hard drive while you’re in the midst of reading the file? There’s an infinite number of failure states for a given operation, and for a lot of them there isn’t a meaningful resolution within the operation itself.

                                                      1. 1

                                                        It’s well-understood that “crashing” in Erlang doesn’t mean crashing the entire program. The problem is that “crashing” in basically any context except Erlang does mean crashing the entire program.

                                              1. 5

                                                Please do not advise using macros for constants or functions.

                                                For constants,

                                                Unlike variables, you can use macros in contexts where you need a “constant expression,” like array lengths or switch statement cases.

                                                For these cases, please use enum.

                                                Having constants actually be “constant expressions” is very useful and hence they should usually be defined as macros.

                                                Please don’t. Your compiler is smarter than you and a simple static const is enough.

                                                For functions,

                                                The code is pasted right in the surrounding code, instead of compiling function call instructions. This can make code faster, as function calls have some overhead.

                                                No. The compiler is smarter than you.

                                                They can be type-generic. For example, x + y is valid syntax for any numeric type. If we made that a function, we’d have to declare them as arguments and choose their type, i.e. size and signedness, in advance, which would make it only usable in some contexts.

                                                Please define functions with specific types, and use _Generic to make a generic macro that uses actually defined functions.

                                                1. 4

                                                  Lots of this advice has aged poorly, because it was passed down without context. Now people read it the way they read advice on Rust or Python, unaware of two major caveats that frequently degrade its quality:

                                                  • That there isn’t a single authoritative implementation of (generally) reasonable quality that works across the architectures that the single authoritative upstream considers important and is able to support.
                                                   • That part of C’s success (its cancerous spread, if you prefer) lay in its ability to usefully leak quirks of the underlying hardware in a way that enabled you to (mostly) use it as a portable assembler for the boring parts and a non-portable assembler for the interesting parts.

                                                  So lots of C optimization advice floating around the Internet, including e.g. the one that the author quotes about macros vs. inline functions and constants, is mostly about handholding ancient or (even contemporary, sadly) proprietary compilers.

                                                  There’s surprisingly little C optimization advice that’s correct everywhere. Every generation discovers this. Lots of my “grasping C” process consisted of unlearning optimization advice (including “no, the compiler is smarter than you”) that I’d absorbed without realizing it was mostly specific to Watcom and x86, and Watcom was not a bad compiler.

                                                  If anyone’s still learning C today, my advice would be to focus more on understanding common compiler idioms and platform-specific optimizations that compilers use (or not!) and less on “optimization tips”. After the initial adjusting period, reading assembly outputs is usually more productive than reading language tips.

                                                  1. 3

                                                    Well, if the programmer suggests using macros for optimization purposes, the answer is always “no, the compiler is smarter than you.” If the programmer suggests looking at assembly for optimization purposes, the answer is always “the compiler makes dumb decisions.”

                                                  2. 1

                                                    The advice around const is also misleading. You can cast from const T to T, but doing so must be explicit. Some standard library functions must do this (e.g. memchr). It’s also worth noting that, for pointer and array types, it doesn’t mean that the object is immutable, it just means that you promise not to mutate it through this pointer. In the presence of aliasing, the object may still be mutated between two reads through a const pointer.

                                                    1. 1

                                                      Please don’t. Your compiler is smarter than you and a simple static const is enough.

                                                      Not here it isn’t:

                                                      static const int foo = 4;
                                                      // ...
                                                       const unsigned int *arr[foo]; // foo is not a constant
                                                      

                                                      That would work in C++, but not in C. Not in C99 at least.

                                                      1. 2

                                                        I think the way /u/jxy would write that is like this (“For these cases, please use enum.”):

                                                        enum { foo = 4 };
                                                        const int *arr[foo];
                                                        

                                                        https://godbolt.org/z/MT1Eobxfh

                                                        1. 2

                                                          The Apple system headers use this idiom a lot. I had never come across the idea of an unnamed enumeration in C until I read some of them, but they seem to be a good way of getting constants into headers. They also play a lot nicer with FFI tooling than using defines if they’re an expression. Consider the following two lines:

                                                          #define BIT4 (1<<4)
                                                          enum { Bit4 = 1<<4 };
                                                          

                                                          If your FFI uses libclang to parse the header then it can extract the value of Bit4 with a single call, independent of how complex the expression is, because clang can evaluate things that the C specification says are integer constant expressions and exposes this with the clang_getEnumConstantDeclValue function. In contrast, the first requires that you identify that this is a macro that doesn’t take any arguments, then parse it to determine that it doesn’t refer to any external variables, and then evaluate it. In this case, it’s quite simple, but consider if it’s some dependent value:

                                                          #define BIT4 (1<<4)
                                                          #define NEXT_BIT (1<<((sizeof(int)*CHAR_BIT) - __builtin_clz(BIT4)))
                                                          enum
                                                          {
                                                                  Bit4 = 1<<4,
                                                                  NextBit = (1<<((sizeof(int)*CHAR_BIT) - __builtin_clz(Bit4)))
                                                          };
                                                          

                                                          The second one will be parsed by libclang and exposed in the AST. Indeed, if you compile this with -Xclang -ast-dump, then you will see that clang has evaluated the enum constants while creating the AST. This has one additional benefit: when I first wrote this, I missed some brackets and so got a warning from the enum version, whereas the warning would appear on each use with the macro version.

                                                          Now consider trying to evaluate NEXT_BIT for FFI. It depends on the sizeof operator, fortunately on a type and not an expression, but it also depends on a compiler builtin and on two symbols CHAR_BIT and BIT4 that come from the other macros. These are also preprocessor macros and so you first need to evaluate them (fortunately, CHAR_BIT, at least, is a single integer). This introduces a lot of work into the FFI binding generator and a lot of things will correctly handle simple cases and then suddenly fail on more complex ones. The __builtin_clz case is particularly interesting because most will not handle it correctly, yet C code that uses it will compile correctly with clang or gcc.

                                                      2. 1

                                                        For constants,

                                                        Unlike variables, you can use macros in contexts where you need a “constant expression,” like array lengths or switch statement cases.

                                                        For these cases, please use enum.

                                                         I read the same argument in Kernighan and Pike’s The Practice of Programming, and I get the elegance of it, besides what @david_chisnall already told us.

                                                         But besides it being a C construct rather than a pre-processor one, and besides it showing up in the results of ctags(1) (the latter already being an advantage, in my view), what is the real technical explanation for this advice? Is it solely these two, or does it also improve optimisation passes?

                                                        1. 2

                                                          It’s mostly the same to any C compiler, except that the compiler should error if you define an enum twice, but would only warn if you define a macro twice. For programmers, it saves you from people laughing at your SIX*NINE.

                                                      1. 2

                                                        In a different take on the title, I recently contacted a company to ask what hash algorithm they used to “store” some user data. After a lot of back-and-forth, I was told that

                                                        We’re so sorry to say that this information is confidential. We can not reveal it due to the rules of our secret [sic] business.

                                                        1. 6

                                                          Well, of course! If you contacted an NSA subcontractor which is developing a new highly-secure quantum hashing algorithm to hash government top-secret data, what did you expect?

                                                          Oh, it was just some random company? Oh well…

                                                          Now seriously, I totally understand you, heh… I’ve had and heard of so many of these types of exchanges!

                                                          That reminds me of something that really happened to me just today (believe it or not!):

                                                I was told by my lawyer (don’t ask) that since you couldn’t digitally sign every page of a PDF document separately (I don’t even know what that means), she was going to separate all the pages of the document into separate PDF documents, and she requested that I digitally sign all the PDFs (one page each). Here’s how the telephone conversation went:

                                                          • Me: but you know, if you separate all the pages into separate documents and I sign them separately, then you could just replace some pages by other pages signed by me from another document and fool the judge into thinking I signed them as a whole.

                                                          • Lawyer: No, but I wouldn’t do that, I will only send these pages!

                                                          • Me: Yes but I mean, even if you do that, when the judge receives the pages he will not be able to tell whether you sent all the pages of the document or whether some pages are conveniently missing. Are they at least numbered as in “1 of 10”, “2 of 10”, etc?

                                                          • Lawyer: Well, no, we don’t number them like that in our legal documents, we only put the first number! But I will send them all! The judge asked us to do it this way!

                                                          • Me: Uhm, yes, but you should really tell the judge this is a bad idea! You should always digitally sign the entire document as a whole, as that’s what the digital signatures are designed to do, otherwise the digital signatures can be misinterpreted to mean something they don’t.

                                                          • Lawyer: OK, I will speak with the other lawyer and then I’ll send you an email.

                                                          • Lawyer’s email: Here’s the document. I sent you the document as separate pages and also as a whole. Please sign them all. When you digitally sign the PDF that contains the document as a whole, please digitally sign only the last page!

                                                          …………… Oh for the love of God!!!

                                                          1. 8

                                                            Back in 2007 (I think), I had the contract for my first book. I was emailed the PDF and was asked to digitally sign it. This was very exciting, I’d never been asked to do that before! So I dutifully created a key signed by my CACert identity, hashed the document, and sent them back a copy of the hash signed by my certificate and a copy of the certificate. They had no idea what to do with this. It turns out, they wanted me to scribble a signature on the PDF and send it back. Now, the fun thing about PDF is that the format is designed for non-destructive editing: the signature is added as a separate object and is rendered on the top. Given this document, it’s trivial to extract that object and replace any of the layers underneath. Total security value of this dance: zero.

                                                            More recently, I’ve been asked to sign something for LLVM’s relicensing using DocuSign. This required me to type my name into a text field. At no point in the process did it do anything that verified my identity. If I knew someone’s email address (public, in the git logs) then I could easily sign on their behalf (I didn’t need to be able to receive emails there, I just needed to enter it).

                                                            Legally, in English common-law countries, the requirement for a contract to be legally binding is that a ‘meeting of minds’ has occurred: i.e. both parties have a reasonable chance of understanding what they’ve agreed to (they don’t have to exercise this chance - if I give you a contract and you sign it without reading it then that’s your fault, if I give you a contract and don’t let you read it until you’ve signed then that’s not binding) and have indicated agreement. Last time I checked, there was absolutely nothing in statute law in the UK or USA about what constitutes a valid signature, but there was a lot of precedent that a hand-written signature (especially a witnessed one) is a strong indication of a meeting of minds. I would love to see what happens when someone challenges DocuSign in court: we may find that a lot of ‘signed’ documents are not, in fact, legally binding.

                                                            1. 2

                                                              The US and UK both have statutes specifically to the effect that typing your name counts as a “digital signature”.

                                                    Of course this is much more easily forged and repudiated. In practice this makes them suitable only for what American lawyers call contracts of adhesion. In general those limit the liability of one party (usually a company), impose an obligation to pay fees on the other party, and often impose an arbitration clause. If the identity of the consumer matters to the company, it can be verified separately from the signature.

                                                              Accordingly, the main remedy of the company is to cease supplying services, and they only really care about identity when there is a significant debt to be repaid, i.e. the main problem is identity theft. I haven’t followed what actually happens in identity theft cases, either in or out of court. In general the party suing on the contract has the burden of proving that the other party entered into the contract. So, interesting litigation on proof of identity around digital signatures is probably not looming, but if it does come around, the background will be wildly and foreseeably inappropriate use of DocuSign-type signing.

                                                              1. 2

                                                                I forgot to mention that just before that part of our conversation, we had clearly established that by “digitally sign”, we meant to sign using the (cryptographic) client certificate issued by the government authority, which is used by everyone in the country who wishes to authenticate themselves to all of the government’s websites that need authentication (e.g. to pay taxes, to change your legal tax residence, to get your birth certificate, etc).

                                                                In fact, a few days before, I had already signed that document by doing what you mentioned: adding a scribble to the PDF, which I had hand-drawn with my finger on my smartphone. This was rejected by the judge, although I’d really be interested in knowing why, because I don’t think it was due to mismatched signatures (otherwise they either could’ve just asked me to repeat my signature, knowing perfectly well that I’m me and not someone else trying to forge my signature, or if they suspected I’m not me, they wouldn’t offer to send me the document by snail mail to an address of my choosing). My lawyer clearly said that alternatively, we could send the judge an original copy of the document with hand-drawn signatures by all parties (although this would be very inconvenient because I’m currently in a different country).

                                                                I understand what you mean by what is considered a legally binding contract, and I’m sure there are relatively similar laws in the countries I’ve lived in (although perhaps the specifics can be different).

                                                                But what is also interesting to me is that hand-drawn signatures don’t actually seem to mean anything in almost any context, except as a symbolic act or a mere formality. Why? Because nobody ever checks or rejects my signature, even though I very often sign in quite different ways. Nobody ever checks whether the signature matches my government ID, including lawyers and government clerks! Actually, that’s a lie: I’ve had my signature rejected a couple of times many years ago, in very absurd situations where they forced me to redraw my signature several times until the new signature matched my old one. I was actually allowed to look at my old signature and try to copy it as closely as possible (which is what I was consciously trying to do, unsuccessfully according to them). They were basically forcing me to forge my own signature.

                                                                There’s also the issue that my hand-drawn signature, including a copy of my government ID, has been sent to hundreds of parties over the years, including many private individuals and private companies, which they could just digitally copy or manually recreate to forge my signature if they wanted to (although of course, that’s illegal and they’d suffer severe penalties if caught in some other way).

                                                                Finally, because of the absurd/weird requests they made, it’s obvious to me that neither my lawyer nor the judge understands what guarantees a cryptographic signature gives you, and I think, almost surely because of the similar name, that they just assume it’s like a hand-drawn signature, which you should apply page by page. I mean, I don’t blame them, as they’re not technical people and these things can be hard to understand (even for many, if not most, technical people).

                                                                But I sure think the government should have stricter standards regarding how digital signatures are used and accepted: although they don’t 100% guarantee it was you who signed the document, they can (and usually do) give you stronger guarantees than hand-drawn signatures, while making fewer assumptions and simultaneously being more convenient (especially if you’re physically far away).

                                                                Edit: Just a few weeks ago, someone also requested me to do something interesting: instead of adding a hand-drawn scribble of my signature to the PDF document in black (where the document’s text was in black), they told me the scribble needed to be in blue (?!).

                                                          1. 7

                                                            There exists a seemingly innocent piece of data - an image, an MP3, a text file - which when fed to MD5 produces these 128 bits: … Decoded into ASCII, that spells I hate the queen.

                                                            This is pretty unlikely to be noticed because a digest/hash is binary data and won’t be displayed as ASCII unless someone’s scanning it for strings. And in that case, it’s just as likely for that string to occur in some other binary data like JPEG or ZIP.

                                                            What’s more problematic is that digests are commonly displayed in hex or base64 (or base58 or base32 or…) With hex you have to get pretty creative to construct an illegal message — maybe DEADBEEF would get you in trouble in a hypothetical Hindu theocracy? — but the others include the full English alphabet. And as a bonus, they result in longer strings than an ASCII dump (⅓ longer for base64), so more room to hang yourself.

                                                            I’ve actually thought about this issue for reals…

                                                            So, in the P2P social network I’m building, a user’s secure global identity is an Ed25519 public key. This is for most purposes a random 256-bit number. It isn’t the ID users normally see, but it is visible because it’s the source of truth of identity. My software displays it as a 43-character base64-encoded string (usually only the first 8 chars but you can point at it to show the rest.)

                                                            On first launch, the program randomly generates a key-pair that will be your public ID. The question is, should the program’s setup UI show you its visible base64 form and allow you to reroll the dice? Because there’s a nonzero chance that your key contains something offensive/embarrassing/illegal like “ieaTbo0gerS”, maybe even in the first 8 characters. Is that likely enough for me to accommodate it in the UI?
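
                                                            One cheap option is a denylist check that silently regenerates the key pair instead of surfacing a reroll button. A rough sketch (the entries and the function name are made up):

                                                            #include <string.h>
                                                            
                                                            static const char *denylist[] = { "hate", "nsfw", "ieaT" };
                                                            
                                                            /* Returns nonzero if the visible base64 ID contains any denylisted
                                                               substring; the setup UI can then quietly generate a fresh key pair. */
                                                            int id_is_awkward(const char *b64_id) {
                                                                for (size_t i = 0; i < sizeof denylist / sizeof denylist[0]; i++)
                                                                    if (strstr(b64_id, denylist[i]) != NULL)
                                                                        return 1;
                                                                return 0;
                                                            }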

                                                            1. 3

                                                              As I understand it, one of the benefits of Ed25519 over RSA is that the keys can be random numbers. Depending on how many of the bits you actually need for security, you could allow the user to choose a few characters. I’ve never seen this done with crypto keys, but Facebook puts faceb00c in most of their IPv6 addresses, which makes them very recognisable as hex strings.

                                                              1. 1

                                                                The private key is a random number. Almost any 256-bit number will work; you just give the key-generator 32 bytes from /dev/random and it tweaks a few bits and that’s the private key.

                                                                The public key gets derived from that by a bunch of math. So none of the bits of the public key can be chosen; it’s for all intents and purposes random.

                                                                I do have some experimental code that sets all CPU cores to generating key pairs until they find one where the public key’s base64 encoding has a desired prefix. It’s pretty slow — even on my M1 MacBook Pro it takes 10-20 minutes* if you give it a 4-char prefix. You could argue that this has some value as a proof-of-work, i.e. someone with a “vanity public key” is less likely to be a spam/bot account. Unfortunately the last thing I want to do is inject a 20-minute delay into onboarding — it’s terrible for user adoption. And by the time you’ve used the network long enough for a cool ID to be valuable, you’ve tied yourself socially to your existing non-cool ID.

                                                                * I’m sure this could be sped up a lot if the keygen function ran on the GPU, but I guess typically keygen isn’t considered worth optimizing that much; I only found one such implementation and IIRC it looked like it would only run on some GPUs and not on Macs.
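
                                                                The core of that experiment, minus the multi-threading, looks roughly like this (a sketch assuming libsodium; the “abcd” prefix and the URL-safe base64 variant are illustrative choices):

                                                                #include <sodium.h>
                                                                #include <stdio.h>
                                                                #include <string.h>
                                                                
                                                                int main(void) {
                                                                    const char *want = "abcd";   /* illustrative vanity prefix */
                                                                    unsigned char pk[crypto_sign_PUBLICKEYBYTES];
                                                                    unsigned char sk[crypto_sign_SECRETKEYBYTES];
                                                                    char b64[sodium_base64_ENCODED_LEN(crypto_sign_PUBLICKEYBYTES,
                                                                                                       sodium_base64_VARIANT_URLSAFE_NO_PADDING)];
                                                                    if (sodium_init() < 0)
                                                                        return 1;
                                                                    /* Keep generating Ed25519 key pairs until the public key's
                                                                       base64 encoding starts with the desired prefix. */
                                                                    do {
                                                                        crypto_sign_keypair(pk, sk);
                                                                        sodium_bin2base64(b64, sizeof b64, pk, sizeof pk,
                                                                                          sodium_base64_VARIANT_URLSAFE_NO_PADDING);
                                                                    } while (strncmp(b64, want, strlen(want)) != 0);
                                                                    printf("vanity key: %s\n", b64);
                                                                    return 0;
                                                                }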

                                                              2. 1

                                                                  Why not? I’m sure some people will find it fun to try to get a funny base64 ID.

                                                                1. 3

                                                                  There are already vanity crypto wallet ID generators for this purpose.

                                                              1. 7

                                                                  I noticed a weird trend where the check for whether something is undefined happens after the undefined behaviour. Intuitively it makes sense to me that this is insufficient, so I’m wondering why this failure is so common? For example, here the overflow check happens after the overflow has already happened.
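
                                                                  To make the pattern concrete, here is a hypothetical sketch (not the article’s actual code): the first version tests only after the signed multiply may already have overflowed, so the check is meaningless; the second establishes the precondition first.

                                                                  #include <limits.h>
                                                                  
                                                                  /* Broken: the signed multiply may overflow (UB) before the test runs,
                                                                     so the compiler is entitled to assume it didn't and delete the check. */
                                                                  int total_size_bad(int n, int elem) {
                                                                      int bytes = n * elem;
                                                                      if (bytes < 0)           /* "overflow check" after the fact */
                                                                          return -1;
                                                                      return bytes;
                                                                  }
                                                                  
                                                                  /* Correct: check the precondition before the arithmetic
                                                                     (assumes both arguments are non-negative). */
                                                                  int total_size_good(int n, int elem) {
                                                                      if (elem != 0 && n > INT_MAX / elem)
                                                                          return -1;           /* the multiply would overflow */
                                                                      return n * elem;
                                                                  }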

                                                                1. 4

                                                                  I don’t usually do that, but in my case there were 2 reasons:

                                                                    • the initial intention when writing it wasn’t protecting against overflow/UB but simply protecting against the computation going out of bounds for the dereference
                                                                    • the assignment needed to be done early because I was targeting a codebase with the ancient C convention that variable declarations must come before any code; and since I wanted the form where I bail out early instead of opening a scope, I had to declare it early:
                                                                    int x = ...;
                                                                    if (...)
                                                                        return -1;
                                                                    // x cannot be declared here
                                                                    return f(x);
                                                                  
                                                                  1. 9

                                                                    Not trying to be critical, but it shouldn’t be news that you can’t check for a UB condition after it’s happened. I’ve seen so many cases where people run into similar problems, or problems related to the type-punning rules. Usually the thought process is something along the lines of:

                                                                    ok, so I know this is supposed to have undefined behavior. But the variable will still have some value, so I can check if the value is in the correct range and so avoid issues that way…

                                                                      No, you can’t. This is what undefined behaviour means. All bets are off; if it’s hit, it is a bug in the code, full stop, and no checking afterwards can fix it because the behaviour from that point on (see note 1) is totally undefined. Maybe it seems to work in some cases. It doesn’t matter; use a different compiler, or a later version of the same compiler, and all of a sudden it could stop “working”.

                                                                    Don’t think of the C compiler as some sort of glorified high-level assembler. It’s way more sophisticated than that. There are rules you have to follow, if you are using any of the modern compilers. You don’t have to like it (and there are even switches available that will give behaviour that you want, but that’s not standard C any more) but it is the case. You must not ever invoke undefined behaviour.

                                                                    Note 1: Actually, I believe the whole program behaviour is undefined if the program exhibits undefined behaviour at any point. So technically, even things that were supposed to happen before the UB might not happen or might happen differently.

                                                                    1. 5

                                                                        You are technically correct, but I’m sure you understand the consequence of such a policy: pushed to the extreme, we could have a situation where a 100k LoC codebase has a single little bug deep down somewhere, and crashing or executing random code straight at startup is an acceptable behavior.

                                                                      The cost of a single mistake is very high, that’s the main point I’m trying to make.

                                                                      1. 9

                                                                          What’s the alternative? If the compiler can’t optimise on the assumption that code it hasn’t executed yet is free of UB, then the opportunities for useful optimisation become near non-existent.

                                                                        The compiler is not doing anything unreasonable here: every step, in isolation, is desirable and valid. If the end result feels unreasonable, then that’s either (a) a problem with the language spec being insufficiently relaxed about what it considers to be UB or (b) a problem with insufficient tooling (or, in the case of a language like Rust, built-in compiler checks) to catch issues like this.

                                                                        To point the finger at the compiler is a very hard sell indeed because there’s no specific thing to point at that it’s doing wrong.

                                                                        1. 9

                                                                            It might be reasonable not to do the optimization. The alternative, as in Rust, is to actually define the behavior as wrapping, which would be equivalent to using -fwrapv in C. Sure, we lose some optimisations, but is it worth it? I’m starting to believe so.
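
                                                                            A tiny demo of what -fwrapv changes (illustrative):

                                                                            #include <limits.h>
                                                                            #include <stdio.h>
                                                                            
                                                                            int main(void) {
                                                                                int x = INT_MAX;
                                                                                /* With -fwrapv this is defined to wrap and reliably prints
                                                                                   INT_MIN; in standard C the increment is undefined behaviour. */
                                                                                x = x + 1;
                                                                                printf("%d\n", x);
                                                                                return 0;
                                                                            }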

                                                                          1. 10

                                                                              Yes, I agree: but that’s a problem with the language spec, not the compiler. The language spec should just say ‘overflow has wrapping semantics’. You’ll lose some opportunities for optimisation and compatibility with a lot of older or obscure platforms (some platforms have arithmetic instructions that don’t wrap on overflow, and this is one of the big reasons that the C spec leaves overflow undefined!), but this is enough of a footgun that I think it’s a worthwhile tradeoff in the year of our lord 2022.

                                                                            But again, this isn’t GCC’s fault: this is the fault of the language spec and the compromises that went into its creation. Don’t like it? Time to get a new language (this isn’t me trying to be gatekeepy: horrid footgun shit like this is a big reason I moved to Rust and never looked back).

                                                                            1. 6

                                                                                Not saying it’s GCC’s fault, but just because the spec made a mistake doesn’t mean GCC should be braindead about it: it holds a responsibility for all the C software out there. Nothing forces GCC to do dangerous optimizations; it can still follow the spec by not honoring this part. GCC serves the user, not the spec; the question becomes: do users want this kind of optimization by default, and to assume its consequences?

                                                                              1. 3

                                                                                Where’s the mistake? Integer overflow being undefined is a feature, not a bug. There are platforms where the behaviour of overflow is implementation defined, entirely unpredictable, or just straight up UB at a hardware level, leaving the machine in a totally invalid state. C is designed to target bizarre and unusual architectures like these, and so having integer overflow be undefined is a huge boon to the language’s portability without sacrificing (and even improving, in many cases) performance.

                                                                                If you’re just going to do language spec revisionism and claim that ‘the spec is wrong’ or something, then I think it’s clear that C’s not the language for you. Heck, it’s definitely not the language for me: I aggressively avoid touching the thing nowadays.

                                                                                1. 3

                                                                                  I am sure there is, so please name one.

                                                                                  1. 3

                                                                                    Many GPUs have saturating semantics on overflow. Other architectures emulate small integers with large integers, meaning that overflow results in unobservable ‘extra bits’. Changing the standard to make integer overflow always wrap would make writing C for these architectures extremely difficult without significant performance ramifications.

                                                                                    If reducing portability is fine with you, then so be it: but that’s not what C is for: it’s supposed to be the lowest common denominator of a vast array of architectures, and it does this quite effectively in no small part because it leaves things like integer overflow undefined.

                                                                                  2. 3

                                                                                    There are platforms where the behaviour of overflow is implementation defined, entirely unpredictable, or just straight up UB at a hardware level, leaving the machine in a totally invalid state.

                                                                                    Can you name one such platform? That is still used after Y2K?

                                                                                    Also note that the spirit of UB in 1989 was almost certainly a compatibility thing. I doubt the standard committee anticipated anything other than -fwrapv on regular 2’s complement processors. And it’s only later that compiler writers realised that they could interpret “UB” in a way that in this particular case was most probably against the spirit of it.

                                                                                2. 2

                                                                                  Yes, I agree: but that’s a problem with the language spec, not the compiler.

                                                                                  Compiler writers are on the standard committee…

                                                                                  1. 1

                                                                                      I don’t think defining the behaviour of overflow is desirable: programmers want overflow to happen in very rare cases, and defining its behaviour now means tools cannot distinguish between overflow the programmer wanted/expected and accidental overflow (the vast majority of cases, in my experience).

                                                                                      We can currently write sanitizers around overflow because it’s undefined; if we had defined it as wrapping, the sanitizers could only say “well, it’s wrapping, but I guess you wanted that, right?”

                                                                                      AFAIU Rust traps on overflow in debug and defines it as wrapping in release. I believe this is mostly because they decided undefined behaviour in safe code was unacceptable, so they went with defined-but-very-likely-wrong in release.

                                                                                  2. 4

                                                                                        You lose far fewer optimisations in a language that is not C. Unfortunately, in C, it is a very common idiom to use int as the type for a loop induction variable. Having to reason about wrapping breaks a lot of the analyses that feed vectorisation. In C++ or Rust, you typically use iterators rather than raw indexes, and these iterators will use an unsigned type by default. Operating over the domain of positive integers with wrap to zero is much simpler than operating over the domain of signed integers with overflow wrapping to a large negative number, and so the C++ and Rust versions of these loops are easier to vectorise. In C, using something like size_t as the type of the induction variable will often generate better code, as in the sketch below.
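
                                                                                        A sketch of the two idioms (function names are mine):

                                                                                        #include <stddef.h>
                                                                                        
                                                                                        /* int index: the compiler uses "signed overflow is UB" to prove
                                                                                           the trip count and to widen i to pointer width for free. */
                                                                                        void zero_int(char *p, int n) {
                                                                                            for (int i = 0; i < n; i++)
                                                                                                p[i] = 0;
                                                                                        }
                                                                                        
                                                                                        /* size_t index: already pointer-width, and wrap-to-zero over
                                                                                           non-negative values is simple to reason about, so this often
                                                                                           generates better code in C. */
                                                                                        void zero_size(char *p, size_t n) {
                                                                                            for (size_t i = 0; i < n; i++)
                                                                                                p[i] = 0;
                                                                                        }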

                                                                                    1. 2

                                                                                          Then… how about renouncing these optimisations and telling everyone to update their code to use size_t so it is fast again? Because I sure resent compiler writers for having introduced critical vulnerabilities and then telling everyone to fix their programs so they are safe again…

                                                                                      I mean, sometimes the hoops I have to jump through… libsodium and Monocypher for instance can’t use arithmetic left shifts on signed integers at all. Instead of x << 26 we need to write x * (1<<26), and hope the compiler will be smart enough to generate a simple shift (thankfully it is). Reason being, left shifting negative integers is straight up undefined. No ifs, no buts, it’s undefined even when the result would stay within range.
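
                                                                                          In code, the dance looks like this (a sketch; the function name is mine, the 26-bit shift is the one from the example above):

                                                                                          #include <stdint.h>
                                                                                          
                                                                                          /* Left-shifting a negative signed value is undefined, even when
                                                                                             the result would fit, so portable code multiplies by the power
                                                                                             of two and trusts the compiler to emit a shift. */
                                                                                          int64_t shift26(int64_t x) {
                                                                                              return x * ((int64_t)1 << 26);   /* instead of x << 26 */
                                                                                          }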

                                                                                      1. 3

                                                                                        Then… how about renouncing these optimisations

                                                                                        That’s what PCC does. It renounces all of these optimisations and, as a result, generates much slower code. OpenBSD tried to use it for a while, but even they couldn’t put up with the poor performance (and OpenBSD generally favours security over performance). The market has shown time and time again that a C compiler that generates fast code will always be chosen over one that generates more predictable code for undefined behaviour.

                                                                                        It’s not like there aren’t alternative compilers that do what you claim to want, it’s just that no one (including you) actually wants to pay the performance cost of using them.

                                                                                        1. 3

                                                                                          The market has shown time and time again that a C compiler that generates fast code will always be chosen over one that generates more predictable code for undefined behaviour.

                                                                                          Gosh, I think our mutual employer provides a strong counter-example. The incentives of a standalone compiler vendor are very different to a vendor that uses the compiler to compile billions of lines of its own production code. Our compiler adds new security features at the expense of performance continually, and internal code is required to use them. IMHO these end up at the opposite absurd end of the spectrum, like default-initializing stack variables to ensure the undefined behavior on access becomes implementation defined, stack overflow buffer checks, etc. In addition to performance vs. security, there’s also a stronger emphasis on compatibility vs. performance; updating the compiler in a way that would defeat large numbers of existing security checks would come under a lot of scrutiny.

                                                                                          1. 2

                                                                                            I thought about MSVC’s interpretation of volatile as a counter example here (it treats it as the equivalent of sequentially consistent atomic, because that’s what a lot of internal legacy code assumed). But then I thought of all of the third-party projects switching to using clang on Windows, including things like Chrome and, by extension, all Electron apps, and realised that it wasn’t such a counter example after all. For a long time, MSVC was the only compiler that could fully parse the Windows headers, which gave it a big advantage in the market; now that clang can do the same, that’s eroding (I believe clang can now parse all of the Windows source code, but it can’t correctly codegen some large chunks and doesn’t generate code that is the shape expected by some auditing tools).

                                                                                            Alias analysis is another place where MSVC avoids taking advantage of undefined behaviour. Apple pushed for making -fstrict-aliasing the default and fixed (or encouraged others to fix) a huge amount of open source and third-party code, giving around 5% better performance across the entire system. MSVC does not take advantage of type-based alias analysis because doing so would break a lot of existing code that relies on UB. This is also pushing people who have code that does not depend on illegal type punning to use clang and get more performance.

                                                                                            Note that I am talking specifically about interpreting the standard with respect to UB to enable optimisations here. I see security flags such as /GUARD, stack canaries, InitAll, and so on as a different category, for three reasons:

                                                                                            • They are opt in, so you can ignore them for places where you know you’re sandboxing your program or where it isn’t operating on untrusted data.
                                                                                            • They make certain behaviour well defined, which makes it easier to reason about your source code. Not taking advantage of UB does not have this property: your program still depends on UB and may still behave in unexpected ways and your performance is now harder to reason about because it will vary hugely depending on whether, post inlining, you have enough hints in your source for the compiler to prove that the condition will not trigger.
                                                                                            • They, in general, don’t impede other optimisations. For example, InitAll combines with dead store elimination and typically can be elided by this optimisation (and Shayne did some great work to improve this). /GUARD is applied very late in the pipeline (I worked on the LLVM implementation of this so that we could have CFG for Rust and Objective-C), and so inlining and devirtualisation can significantly reduce the number of places where you need the check (MSVC has some very impressive profile-guided devirtualisation support, which helps a lot here). In contrast, things like not assuming that integer addition results in a larger number have a big knock-on effect on other optimisations.
                                                                                          2. 1

                                                                                            Well, there is renouncing a class of optimisations, and there is defining a class of behaviours. I don’t think those are the same. Which one was PCC trying? Did it define integer overflow and pointer aliasing etc., or did it disable dangerous-looking optimisations altogether?

                                                                                            it’s just that no one (including you) actually wants to pay the performance cost of using them.

                                                                                            I put myself in a situation where I can actually cop out of that one: I tend to write libraries, not applications, and I ship source code. This means I have no control over the compilation flags, and I’m forced to assume the worst case and stick to strictly conforming code. Otherwise I would try some of them (most notably -fwrapv) and measure the impact on performance. I believe I would accept any overhead below 5%. But you’re right, there is a threshold beyond which I’d just try to be more careful. I don’t know for sure which threshold this is though.

                                                                                            1. 1

                                                                                              I tend to write libraries, not applications, and I ship source code. This means I have no control over the compilation flags

                                                                                              How’s that? Libraries would still come shipped with a build system to produce (shared) objects, right?

                                                                                              1. 1

                                                                                                Libraries would still come shipped with a build system to produce (shared) objects, right?

                                                                                                Not when this library is literally one C source file with its header, with zero dependencies, and used on obscure embedded targets that don’t even have a standard library, some of which I don’t even know about.

                                                                                                I do ship with a Makefile, but many people don’t even use it. And even if they did, they control $CFLAGS.

                                                                                                1. 1

                                                                                                  Ouch, that’s not an enviable situation to be in :S

                                                                                                  Too bad you can’t enforce some of those semantics using #pragma or something.

                                                                                                  1. 1

                                                                                                    Well, then again, I did it on purpose: sticking to standard C99 with zero dependencies is how people ended up using it in those contexts. My work is used in a previously underserved niche; that’s a win.

                                                                                                    And in practice, I made only one error of any consequence, and it was a logic bug, not anything to do with C’s UB. I did have a couple of UBs, but none of them ended up amounting to anything. (There again, it helps that my library does zero heap allocation.)

                                                                                  3. 6

                                                                                    Yes, that is exactly the by-design consequence of C UB. A single UB anywhere deep in your code could convert your computer into a giant whale or a potted plant.

                                                                                    1. 4

                                                                                      Yes. Writing code in C is a minefield, and I think people who write code in this language need to be aware of that.

                                                                                  4. 3

                                                                                    the assignment needed to be done early because I was targeting a codebase with some ancient C convention on variable declaration required to be before any code

                                                                                    If this is referring to C89 style, then you can declare a variable without assigning it:

                                                                                    int x;
                                                                                    if (...) { return -1; }
                                                                                    x = 123;
                                                                                    
                                                                                    1. 3

                                                                                      Yeah but I don’t like that for 2 reasons:

                                                                                      • 2 lines instead of 1
                                                                                      • I can’t do const int x = … anymore (and I like to use const everywhere because it helps the developer’s mental model of non-mutability expectations)
                                                                                  5. 4

                                                                                    Good observation. In C/C++ you are intended to check for valid preconditions before you perform an operation that relies on them. In Python and many others, there is a pervasive “look before you leap” idiom because there is no undefined behavior, either it behaves correctly or throws an exception, i.e. every operation is checked beforehand. Could be from an influx of folks into C/C++ from those languages.

                                                                                    For those who don’t understand, C/C++ does it this way because specifying “undefined behavior” allows you to assume that preconditions are valid without having to recheck them on every call, allowing the programmer to be more efficient with the CPU.

                                                                                    1. 3

                                                                                      In Python and many others, there is a pervasive “look before you leap” idiom because there is no undefined behavior, either it behaves correctly or throws an exception, i.e. every operation is checked beforehand.

                                                                                      I think “look before you leap” (LBYL) is the opposite of what you’re trying to describe. I’ve usually heard that described as “easier to ask forgiveness than permission” (EAFP).

                                                                                      1. 1

                                                                                        My mistake, I meant “leap before you look”

                                                                                    2. 1

                                                                                      Note that the order of operations doesn’t matter for UB. UB is not an event that happens. Instead, “UB can’t happen” is an assumption that the compiler is free to make, and then move or delete code under that assumption. Mere existence of any UB anywhere in your program, even in dead code that is never executed, is a license to kill for a C compiler.

                                                                                      1. 1

                                                                                        even in dead code that is never executed, is a license to kill for a C compiler.

                                                                                        No, unless you mean that it’s a license to remove the dead code (which the compiler can do anyway).

                                                                                        If code that would have undefined behaviour when executed is never executed, then it does not trigger the undefined behaviour (by definition).

                                                                                        1. 1

                                                                                          Whole-function analysis can have an effect that seems like UB going back in time. For example, the compiler may analyze range of possible values of a variable by checking its every use and spotting 2 / x somewhere. Division by 0 is UB, so it can assume x != 0 and change or delete code earlier in the function based on this assumption, even if the code doesn’t have a chance to reach the 2 / x expression.

                                                                                          1. 2

                                                                                            For example, the compiler may analyze range of possible values of a variable by checking its every use and spotting 1 / x somewhere, and then assume x != 0 and change or delete code based on that earlier in the function, even before execution has a chance to reach the 1 / x.

                                                                                            Yep, but if that 1 / x is in dead code it can’t affect assumptions that the compiler will make for live code. And if the 1 / x is in a particular execution path then the compiler can’t use it to make assumptions about a different path.

                                                                                            As an example, for:

                                                                                            if (x == 0) {
                                                                                                printf("x is zero!\n");    
                                                                                            }
                                                                                            
                                                                                            if (x == 1) {
                                                                                                printf("1/x = %d\n", 1 / x);
                                                                                            }
                                                                                            

                                                                                            … the compiler will not remove the x == 0 check based on division that occurs in the x == 1 branch. Similarly, if such a division appears in dead code, it can’t possibly affect a live execution path.

                                                                                            So:

                                                                                            even in dead code that is never executed, is a license to kill for a C compiler.

                                                                                            No.

                                                                                            (Edit, based on your edits): In addition:

                                                                                            Division by 0 is UB, so it can assume x != 0 and change or delete code earlier in the function based on this assumption,

                                                                                            Yes, but only if the earlier code guarantees that the 2 / x division must subsequently be reached; otherwise no.

                                                                                            even if the code doesn’t have a chance to reach the 2 / x expression.

                                                                                            No. As per above example, the compiler cannot assume that because something is true on some particular execution path it is true on all paths.

                                                                                            If what you were claiming was true, it would be impossible/useless to perform null checks in code. Consider:

                                                                                            if (p != NULL) {
                                                                                                *p = 0;
                                                                                            }
                                                                                            

                                                                                            If the compiler can assume that p is not NULL based on the fact that a store to *p exists, it can remove the NULL check, converting the above to just:

                                                                                            *p = 0;
                                                                                            

                                                                                            This is clearly different and will (for example) crash if p happens to be NULL. But a compiler can’t and won’t make that change: https://godbolt.org/z/hzbhqdW1h

                                                                                            On the other hand if there is a store that appears unconditionally on the same execution path it can and will remove the check, eg.

                                                                                            *p = 0;
                                                                                            if (p != NULL) {
                                                                                                printf("p is not null!");
                                                                                            }
                                                                                            

                                                                                            … for which both gcc and clang will remove the check (making the call to printf unconditional): https://godbolt.org/z/zr9hc7315

                                                                                            As it happens, neither compiler will remove the check in the case where the store (*p = 0) is moved after the if block, but it would be valid for them to do so.

                                                                                      2. 1

                                                                                        I think this is the core of the issue and why people are getting so fired up.

                                                                                        If you assume that integer operations are sent to the CPU intact, and the CPU was made in the last 30 years, then checking for overflow after the fact is a single compare.

                                                                                        If you have to check for the potential for overflow beforehand, the comparison is much more involved. I was curious what it actually looks like and stumbled onto this, which implements it in four compares (and three boolean evaluations).
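
                                                                                        For signed addition the usual pre-check has exactly that shape. A sketch (the helper name is mine):

                                                                                        #include <limits.h>
                                                                                        
                                                                                        /* Rejects the addition before it can overflow: four compares and
                                                                                           three boolean evaluations, versus one compare if you could just
                                                                                           add first and test afterwards. */
                                                                                        int safe_add(int a, int b, int *sum) {
                                                                                            if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
                                                                                                return 0;            /* would overflow */
                                                                                            *sum = a + b;
                                                                                            return 1;
                                                                                        }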

                                                                                        At some level, this whole conversation becomes a disagreement about the nature of bounds checking. If you assume bounds checking does not exist (or can be compiled away!) then you can exploit UB to optimize signed arithmetic to improve performance. If you assume bounds checking needs to exist, that UB exploit is a huge negative because it forces much more pessimism to put the bounds check back, making performance worse.

                                                                                        Then we end up with compiler builtins to perform signed arithmetic deterministically. This is odd because the UB optimization assumes that if the language spec doesn’t require something it’s not needed in an implementation, but the existence of the builtin suggests otherwise. The UB optimization assumes that there’s no value in having a predictable implementation defined behavior, but the builtin is a predictable implementation defined behavior.

                                                                                        1. 1

                                                                                          It’s the observation that most of the time overflow is a bug. If you wanted overflow semantics, you should have asked for it specifically. This is how e.g. Zig works.

                                                                                          1. 1

                                                                                            Avoiding the overflow requires a bounds check. I interpreted your earlier question as being about why these bounds checks often create an overflow in order to perform the check (which is not a bug; it’s integral to the check). There’s no standards-compliant way to request overflow semantics specifically, so that option doesn’t really exist, and doing the check without an overflow is gross.

                                                                                            If the standard had provided a mechanism to request overflow semantics via different syntax, we probably wouldn’t have such intense discussion.

                                                                                            1. 1

                                                                                              I agree, not having a checked add or a two’s complement add is definitely a hole in the standard and should be fixed.

                                                                                      1. 4

                                                                                        It also sucks when open source maintainers get called things like “uncooperative” when they refuse to add features/fix bugs. Often those with the worst reputation use overly harsh language, but a lot of my favorite open source software is maintained by people who know how to say “no” (the Kitty terminal is one example).

                                                                                        1. 8

                                                                                          There are two superficially similar behaviours:

                                                                                          • Refusing to add feature X.
                                                                                          • Refusing to make it possible for people to use feature X.

                                                                                          There are lots of valid reasons for the first but good projects make it easy to maintain out-of-tree extensions that can add features that folks miss. SQLite is probably the best example of this. The author simultaneously refuses a large number of feature requests, while maintaining an extension interface that means that most of those features are easy to keep as plugins that you can add if you want.
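
                                                                                          For a flavour of what that looks like, here is a minimal loadable extension adding a half(x) SQL function (a sketch against SQLite’s documented extension interface; for a file named half.c the expected entry point is sqlite3_half_init):

                                                                                          #include <sqlite3ext.h>
                                                                                          SQLITE_EXTENSION_INIT1
                                                                                          
                                                                                          /* half(x): returns x / 2. Features that core SQLite declines to
                                                                                             ship can live out of tree behind this stable interface. */
                                                                                          static void half_func(sqlite3_context *ctx, int argc, sqlite3_value **argv) {
                                                                                              (void)argc;
                                                                                              sqlite3_result_double(ctx, sqlite3_value_double(argv[0]) / 2.0);
                                                                                          }
                                                                                          
                                                                                          int sqlite3_half_init(sqlite3 *db, char **errmsg,
                                                                                                                const sqlite3_api_routines *api) {
                                                                                              (void)errmsg;
                                                                                              SQLITE_EXTENSION_INIT2(api);
                                                                                              return sqlite3_create_function(db, "half", 1, SQLITE_UTF8, NULL,
                                                                                                                             half_func, NULL, NULL);
                                                                                          }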

                                                                                          LLVM is somewhat at the opposite extreme, changing internal APIs in ways that repeatedly break out-of-tree extensions. Some of these changes are important for evolving the codebase; others are due to what I think of as the Google engineering mindset: all important code is in the monorepo, and if code is not in the monorepo then, by definition, it is not important and it’s fine to break it.

                                                                                          1. 3

                                                                                            LLVM is the middle. They don’t care if they break stuff that goes through the back doors. GCC is the opposite of SQLite, they make it difficult to integrate with on purpose for ideological reasons.

                                                                                            1. 5

                                                                                              LLVM is the middle. They don’t care if they break stuff that goes through the back doors

                                                                                              The problem is, with the exception of libclang, everything is ‘the back door’. We don’t have stable (even the kind of vaguely stable ‘we promise to deprecate for one release before we remove’) APIs that you can use for writing an out-of-tree front end, pass, back end, and so on (the C bindings are almost adequate for writing a front end). Even where we have plug-in interfaces, we make precisely zero guarantees that your code will work in six months.

                                                                                              This ends up being very bad for the project overall, because there’s a lot of out-of-tree downstream code. Most of this doesn’t bother tracking main, it waits for a release and then gets updated, sometimes skipping releases. As a result, we don’t get any benefit from the huge test coverage that this code would provide and instead get bug reports saying ‘something broke in the last year’. If we had more stable APIs, most of these downstream projects could do nightly CI runs with the latest LLVM and we’d get bug reports immediately after something broke.

                                                                                        1. 33

                                                                                          Hate to be the one to say it but didn’t enjoy the clickbait. Undefined behavior has been like this for at least 20 years.

                                                                                          If you are going to be doing arithmetic with untrusted user input it’s your responsibility to check ahead of time that it satisfies your program’s preconditions and doesn’t invoke undefined behavior. If you need help you can use the integer overflow builtins. __builtin_mul_overflow() would have helped you in this case.
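
                                                                                          For example (values are illustrative):

                                                                                          #include <stdio.h>
                                                                                          
                                                                                          int main(void) {
                                                                                              int n = 1 << 20, elem = 1 << 12, bytes;
                                                                                              /* GCC/clang builtin: computes n * elem, stores the wrapped
                                                                                                 result in bytes, and returns true if the mathematical result
                                                                                                 didn't fit. No undefined behaviour is ever invoked. */
                                                                                              if (__builtin_mul_overflow(n, elem, &bytes))
                                                                                                  fprintf(stderr, "size would overflow\n");
                                                                                              else
                                                                                                  printf("%d\n", bytes);
                                                                                              return 0;
                                                                                          }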

                                                                                          1. 16

                                                                                            The only reason everyone is so aware of integer overflow being UB is clang and gcc deciding, a decade or so ago, that UB allowed them to remove overflow-related security checks from a bunch of critical software, and then responding to “you removed the overflow checks” with “well, that’s UB and you were wrong to rely on the way all arithmetic works on all computers”. Understand that many of these security checks were based on the basics of how integers work on all computers, and had been present for decades before gcc and clang decided that removing them was acceptable.

                                                                                            There is no justification that ever warranted such overflow being UB, and the only reason it remains UB is for compilers to game compiler benchmarks. Today there is no hardware in existence where it can’t be defined as 2’s complement, and even before that, modern (i.e. last 30-40 years) hardware was not random: the overflow could have been left unspecified or implementation defined. There is not now, and more importantly there never has been, any time when the result of overflow or underflow could not be defined by a compiler vendor.

                                                                                            1. 6

                                                                                              Compilers started doing this because people use signed variables as loop induction variables, with addition on every loop iteration, and a less-than check for termination. If you assume that signed overflow can’t happen, then you know that the loop must terminate. It is then amenable to scalar evolution, which helps drive vectorisation, which then gives a 4-8x speed up on some important workloads. Without that assumption, the sequence of loop iterations is not amenable to scalar evolution and you lose all of these benefits.
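
                                                                                              A sketch of the kind of loop in question (illustrative): assuming i += 2 can never wrap, the trip count is a simple closed form of n, which is what scalar evolution needs; under -fwrapv the compiler must also plan for i wrapping negative when n is near INT_MAX.

                                                                                              /* With signed overflow as UB, this provably runs about n/2
                                                                                                 iterations and is a vectorisation candidate. */
                                                                                              void scale_even(float *a, int n) {
                                                                                                  for (int i = 0; i < n; i += 2)
                                                                                                      a[i] *= 2.0f;
                                                                                              }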

                                                                                              1. 5

                                                                                                I know that those for loops are the reason that C/C++ compilers have adopted this contorted view of what UB means. I believe that that is solely for benchmark gaming, because the supposed benefit seems extraordinarily hard to actually trigger.

                                                                                                For the UB to be beneficial I have to have a function that will compile meaningfully differently with clang or gcc with -fwrapv vs -fno-wrapv. The only example I have found from existing posts justifying the “integer overflow should be UB” that has any real difference in codegen is a bzip comparison function iirc which I butchered thusly:

// Comparison loop in the spirit of bzip2's block compare, reduced for testing.
// noinline keeps the optimiser from specialising it away at the call site.
template <typename IntType>
__attribute__((noinline)) int
cmp(IntType i1, IntType i2, unsigned char *buf)
{
    for (;;) {
        int c1 = buf[i1];
        int c2 = buf[i2];
        if (c1 != c2)
            return c1 - c2;
        i1++;
        i2++;
    }
}

I can compile this with -fwrapv or -fno-wrapv, and while there is some restructuring (using indexed loads, etc.) there is no performance difference.[*]

I firmly believe that keeping the “signed overflow is UB” nonsense is purely a benchmark game that does not offer actual performance wins in real-world code, but has massive costs, both in performance and in security.

[*] There is a 2x difference in perf under Rosetta 2, but assuming the same benchmarks are being compiled, R2 may simply have put less work into the uncommon code patterns you get from gcc and clang compiling x86_64 with -fwrapv.

                                                                                                1. 3

                                                                                                  The places where this shows up are usually after inlining and some local optimisations. They generally don’t show up in microbenchmarks because code that is simple enough to fit in a microbenchmark is amenable to more analysis, but that example is definitely not the shape of one that I would expect to see. The loops where this is usually a win are typically nested loops of the shape for (int i=x ; i<y ; i++). For each loop in the loop nest, you can reduce this to a range and then you can model this as an arithmetic, geometric, (and so on depending on the nesting depth) sequence. This analysis exposes vectorisation opportunities that are not present if one of the loops may be infinite (or just follow a weird pattern) as a result of signed overflow resulting in a negative number.

As a rule of thumb, about 90% of compute time is spent in loop nests in a typical program, so you don't need a big speedup on loop nests to see some real-world improvements.
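A sketch of the loop-nest shape in question (the kernel below is illustrative, not taken from any benchmark):

/* Both induction variables are affine sequences; scalar evolution can
 * derive exact trip counts for the whole nest and vectorise the inner
 * loop, provided neither i nor j can wrap around. */
void blur(float dst[][1024], const float src[][1024], int w, int h) {
    for (int i = 1; i < h - 1; i++)
        for (int j = 1; j < w - 1; j++)
            dst[i][j] = (src[i-1][j] + src[i+1][j] +
                         src[i][j-1] + src[i][j+1]) * 0.25f;
}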

                                                                                                  I firmly believe that keeping the “signed overflow is UB” nonsense is purely a benchmark game that does not offer actual performance wins in real world code

                                                                                                  For this to be true, the following would have to be true:

                                                                                                  • It would have to make a measurable difference in one or more benchmarks.
                                                                                                  • Those benchmarks would have to be the only instances of that code pattern in CPU-bound code.

                                                                                                  Much as I dislike SPEC CPU, I think it’s very unlikely that this is the case. In my experience, these have been pushed by big companies that have run the modified compiler on large codebases and seen measurable improvements in performance-critical loops.

                                                                                                  1. 1

                                                                                                    I’ve also seen the argument of “optimizing” this

                                                                                                    for (int i=0; i < inputNumber; ++i) …
                                                                                                    

                                                                                                    into correctness (on 64-bit machines):

                                                                                                    for (size_t i=0; i < inputNumber; ++i) …
                                                                                                    

Arguably, size_t was the correct type for natural-number quantities all along, and using a signed integer type for a quantity that can never be negative is just asking for trouble.
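A sketch of what the size_t version looks like in context (the function itself is hypothetical):

#include <stddef.h>

/* size_t matches the pointer width on LP64 targets, so the index needs
 * no per-iteration sign extension and no overflow-UB assumption to stay
 * analysable; it also cannot go negative by construction. */
void zero(char *buf, size_t inputNumber) {
    for (size_t i = 0; i < inputNumber; ++i)
        buf[i] = 0;
}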

                                                                                                  2. 4

                                                                                                    Is there any situation where the user couldn’t recover the optimisation by rewriting their loops a little bit?

                                                                                                    1. 2

                                                                                                      Probably not, but given the 10+ billion lines of C/C++ code in the world, who is going to pay for auditing it all, finding the bottlenecks introduced by these problems, and fixing them all?

                                                                                                      1. 3

Sure. On the other hand, I'm thinking the same thing about all the integer-overflow UB we haven't yet discovered, for which -fwrapv hasn't yet been enabled.

                                                                                                    2. 1

…Which is a good example of how C/C++-style signedness for integers is kinda bullshit. XD C-style for loops are a sneaky-good way to expose the problems with it; iterators or ranges are one possible way of making some of the problems go away. Use the right iterator and you know the loop will terminate.

                                                                                                    3. 2

                                                                                                      Unsigned integer overflow is well-defined in C. Only signed integer overflow is undefined. Whether signed integer overflow should be undefined or implementation defined behavior seems like it was a judgment call, though I’m sure the original C standards committee had a valid rationale at the time for making it undefined behavior. The hardware landscape in the 1970s and 1980s was a lot different from what it is now.

For what it's worth, C/C++ have always embraced the idea of UB. Most mainstream contemporary languages, on the other hand, fully define invalid constructs. If you're coming from such a language, where you're used to the paradigm that everything is well-defined, I can imagine that seeing code deleted during optimization is a surreal experience. Ideally, people who are not familiar with C/C++'s expectation that the programmer ensures undefined things don't happen should not casually write C/C++. In practice, due to a confluence of factors, that ideal will never be met. Fortunately, Rust is emerging as a lowish-level alternative with fully defined semantics, compatible with what the average programmer coming from a language like Java/JavaScript expects.

                                                                                                      1. 8

                                                                                                        Unsigned integer overflow is well-defined in C. Only signed integer overflow is undefined.

                                                                                                        As I’ve said elsewhere, the only reason for this divergence is compiler benchmark games.

Moreover, even in the 70s and 80s integer overflow did not produce random output on any hardware, so it should have been either unspecified or implementation defined. Neither of those lets the compiler assume overflow can't happen and reason backwards from there.

That said, no hardware has used anything other than 2's complement for a few decades now, so any appeal to older hardware is kind of stupid, and the behavior should just be specified.

                                                                                                        For what it’s worth, C/C++ have always embraced the idea of UB.

No, there is a vast difference between acknowledging that the language lets you do things with unpredictable outcomes, and blindly declaring that “undefined behavior means the compiler can do whatever it wants”. The latter is a fairly contorted reading of the specification that compiler devs adopted a few decades ago, and it immediately started introducing security vulnerabilities, specifically by removing overflow checks that had worked for decades, on all systems.

                                                                                                        The problem is only partially that C/C++ erroneously uses “UB” for behaviors that can be unspecified or implementation defined, a bigger part of the problem is that compiler writers have taken “what happens when you do an operation X that triggers UB places no requirements on the compiler” as meaning “pretend X never happens, and remove code paths in which it does”.

                                                                                                        The only people with this expansive view of “Undefined Behavior” are the compiler authors. To get an idea of the absurdity:

int i = f();
if (i == 0) {
  // Reached only when i == 0, i.e. exactly when the division below is UB.
  logError("Whoops, we're dividing by zero\n");
}
int x = g() / i;  // If i == 0 this is UB, so the compiler assumes i != 0.

By the rules compiler vendors invented, you will never get that message logged: the only execution that can reach it has i == 0, which makes g()/i UB, which is invalid and so “can't happen”. Therefore i can never be 0, and the if() statement can be dropped.

                                                                                                        If you’re coming from a such a language, where you are used to the paradigm that everything is well-defined, I can imagine that seeing code deleted during optimization is probably a surreal experience.

                                                                                                        Alternatively if you’re used to writing code in C and C++, and know how computers work you may be a bit miffed at compilers actively breaking decades old code based on a highly questionable definition of what they are permitted to do, which is an interpretation that clang and gcc introduced and have pushed only over the last 10-15 years.

We can acknowledge that the C and C++ compilers have taken this view, and write our code in the knowledge that C and C++ compilers are fundamentally an adversary, while continuing to be angry about said compilers making decisions that they know run counter to what the developer actually wrote.

                                                                                                        1. 6

                                                                                                          By the rules compiler vendors invented

                                                                                                          We can acknowledge that the c and c++ compilers have taken this view

                                                                                                          while continuing to be angry about said compilers making decisions that they know are against anything a developer has written

                                                                                                          With respect, I think you’ve got entirely the wrong end of the stick. No compiler developer has ever sat down, scowled at their monitor, and decided to make their compiler break people’s code. What to you looks like malice is actually entirely incidental to the process of writing optimising compilers.

The ‘highly questionable’ behaviour you're seeing is emergent: it appears out of the process of applying logical rules of inference to a system where the priors are reasonable and desirable, in much the same way that an artificial general intelligence might turn the world into paperclips if its goals are insufficiently constrained, despite that never being its programmer's objective.

There is no way to ‘solve’ this other than to either forego desirable and sensible optimisations, or to make the C standard define more behaviours. The hard logic of a compiler doesn't compromise when held against the fuzzy intuitions humans have about what counts as ‘correct’, and there's no way around that: the compiler cannot know what code you intended to write; it doesn't have enough information! So each incremental optimisation pass does something reasonable in isolation, and then suddenly - whoops! Your code is broken.

Ralf Jung's article about this emergent phenomenon in relation to pointer provenance is very good: https://www.ralfj.de/blog/2020/12/14/provenance.html

                                                                                                          1. 7

                                                                                                            No compiler developer has ever sat down, scowled at their monitor, and decided to make their compiler break people’s code.

But they did disregard the consequences of their actions. They wanted to enable some optimisations, saw that the standard let them get away with it, then proceeded to ignore bug reports warning them that they were introducing vulnerabilities into old, reasonable-looking code.

                                                                                                            There is no way to ‘solve’ this other than to either forego desirable & sensible optimisations, or to make the C standard define more behaviours.

Obviously C should define more behaviours. It doesn't. I suspect one reason is the presence on the standards committee of compiler writers who just won't let go of their pet optimisations. I mean, yes, if we made -fwrapv the default, some existing programs would indeed be slower. But then, is there any such program that couldn't recover its optimisations by modifying its loops a little bit? Right now I don't think there is.

                                                                                                            Compiler writers favour speed over correctness. When they introduced those new optimisations that broke old code because of UB, they had a simple choice: either assume -fwrapv and renounce some speed up on old programs, or do the UB thing and introduce critical vulnerabilities in generated code that weren’t there before. They chose the latter.

                                                                                                            I mean, it’s probably not explicit malice, but there is such a thing as criminal negligence.

                                                                                                            1. 2

                                                                                                              What’s the point of this reply? It feels like arbitrary finger-pointing to no end. If you don’t like how C is managed and what it’s designed to do, then… don’t use it? Rust is just there and solves the vast majority of these issues in practice, you know?

                                                                                                              1. 2

You're being too kind to the standards committee. How do I put it… Call me entitled, but when you have such global impact you have a moral obligation to deliver world-class results. And however difficult and gruelling their job may be, the standards committee is falling short.

                                                                                                                Rust is just there

Not quite. First-class support is limited to Windows, Linux, and macOS. Tier 2 support looks better, but doesn't even include OpenBSD. As promising as Rust is for application development on a limited number of very mainstream platforms, it stops working as soon as we aim for broader portability.

                                                                                                                If you don’t like how C is managed and what it’s designed to do, then… don’t use it?

                                                                                                                I can’t avoid C.

I do wish there were a better way, but at this point the best I can hope for is a language that compiles to C in a way that avoids as many pitfalls as possible. In fact, I'm seriously considering the possibility of a “C2”, similar to Herb Sutter's Cpp2.

                                                                                                                1. 1

                                                                                                                  Zig is going to be getting a backend that transpiles to pure C very soon. Given its already brilliant interop with C and its at least mostly sane semantics, perhaps that’ll suit your needs?

                                                                                                                  1. 1

                                                                                                                    It actually may.

                                                                                                          2. 1

                                                                                                            Unsigned integer overflow is well-defined in C. Only signed integer overflow is undefined.

                                                                                                            As I’ve said elsewhere, the only reason for this divergence is compiler benchmark games.

                                                                                                            Maybe I’m misunderstanding you but I don’t think that’s the case. At the time that C was first standardized, there were a few different signed integer representations used in mainstream architectures. Different unsigned integer representations didn’t really exist in mainstream architectures.

                                                                                                            That said, there has not been any hardware not using 2’s complement now in a few decades, so any call to older hardware is kind of stupid

Not sure if this was targeted at me, but if it was, it's based on a misreading of my comment. I didn't justify the existence of UB today on the basis of old hardware. My point was that in the 70s and 80s, when C was in the process of being standardized, deciding on UB for signed integer overflow may have been a sensible thing to do given the hardware landscape at the time.

                                                                                                            “what happens when you do an operation X that triggers UB places no requirements on the compiler” as meaning “pretend X never happens, and remove code paths in which it does”.

                                                                                                            Logically the first statement clearly permits the second.

                                                                                                            In general I think your comments seem a bit angry and lacking an assumption of good faith on the actors who have shaped C from the past to the current day. There’s no reason to assume that the compiler writers or standards committees had or have malicious or antagonistic intentions. Everyone wants a secure software stack. There are just many different factors to consider and use cases to accommodate.

                                                                                                            1. 1

                                                                                                              Maybe I’m misunderstanding you but I don’t think that’s the case. At the time that C was first standardized, there were a few different signed integer representations used in mainstream architectures. Different unsigned integer representations didn’t really exist in mainstream architectures.

C and C++ have the concept of implementation-defined behavior, which is precisely the mechanism for describing “different forms of integer”.

                                                                                                              Not sure if this was targeted at me but if it was then it’s based on a misreading of my comment. I didn’t justify the existence of UB today on the basis of old hardware.

                                                                                                              No, it wasn’t directed at you and I apologize that it came off as such.

It was a response to the general claim that the lack of standardization in the industry at the time warranted this being UB, which is simply false. C and C++ both have the concept of “implementation defined”, which clearly covers the wide variety of integer behaviors that existed. At the time, classifying things as UB wasn't a huge problem, but once compilers changed their interpretation of what UB allows them to do, the difference between ID and UB became a source of security vulnerabilities.

                                                                                                              Logically the first statement clearly permits the second.

This is the language lawyering that people dislike. The language does not say “the compiler can replace any instance of UB with system("rm -rf /")”, but we all understand that if a compiler did do that, the compiler would be wrong. The problem is the decision to read “no constraints” as “can do anything on any code path that invokes UB”, an interpretation that only works if you ignore context and focus on a dictionary reading of each word. That is flawed, because it isn't how human language actually works.

                                                                                                              In general I think your comments seem a bit angry and lacking an assumption of good faith on the actors who have shaped C from the past to the current day.

I am fairly angry about this interpretation of UB, because it is wrong and has produced far more security vulnerabilities than it has meaningful performance improvements. I understand why compiler devs have taken this approach: compilers are compared almost exclusively on codegen performance, and this interpretation helps benchmarks. They are not compared on the security of their generated code (see the slew of security regressions created when they first adopted this definition of UB), and a non-zero part of that is because who the heck knows how you would measure it?

A lot of UB remains that never had any real reason to be “undefined”: the spec has had the concept of implementation-defined behavior for decades (forever?), yet behavior that is perfectly definable (even as “implementation defined”) has been classified as UB, adding totally unnecessary footguns to a language that is already a footgun-prone horror.

                                                                                                              I don’t think these decisions are intended to be malicious, but the result of those decisions is that if you are writing or maintaining C/C++ code you have to assume that the compiler is an adversary working against you whenever you are writing code that is security sensitive.

                                                                                                            2. 1

                                                                                                              This is a C/C++-specific problem. Reread the entire thread and pay attention to which languages are mentioned.

                                                                                                              1. 2

Rust also potentially suffers from this in some cases, the main one being pointer provenance. It's just less notable because such cases are only really triggerable in unsafe code.

                                                                                                            3. 2

                                                                                                              Whether signed integer overflow should be undefined or implementation defined behavior seems like it was a judgment call, though I’m sure the original C standards committee had a valid rationale at the time for making it undefined behavior.

                                                                                                              I’ve heard that at the time, some platforms went bananas upon signed integer overflow, so they made it undefined, perhaps without realising that compiler writers would eventually take that as a license to interpret that literally for all platforms, including the 2’s complement ones that comprise >99.999% of all computers.

                                                                                                              I think there’s a disconnect between the spirit and the letter of “undefined” here. And now compiler writers won’t give up their pet optimisations.

                                                                                                          3. 12

                                                                                                            Completely agreed. The entire post boils down to “I invoked undefined behavior and was surprised that something happened”. Undefined is in the name - you can’t expect anything!

                                                                                                            We could debate whether UB should continue this way or if we should handle it differently, but that’s an entirely separate discussion from what’s going on in the post.

                                                                                                          1. 11

                                                                                                            Many people have this impression, seemingly. I do not. I find that, for c, compared with clang, gcc compiles faster, gives better diagnostics, and generates code of comparable quality. IMO, it is the only good gnu/redhat product.

                                                                                                            (I am given to understand that clang has better diagnostics and standard support for c++. I avoid c++ like the plague, so I cannot speak to that.)

                                                                                                            1. 1

As I understand it, GCC generated better code than Clang for C++ for a very long time. Inlining is more important for C++ than C, and GCC had a better inliner, not just due to better tuning, but due to better architecture. LLVM fixed its architecture (see The New Pass Manager; GCC basically always had this) and I think it is now competitive in the 2020s, but it wasn't in the 2010s.

                                                                                                              1. 1

GCC's inliner was not necessarily better; it simply worked in the opposite direction. One worked top down, the other bottom up (I forget which is which). This led to different failure modes and, importantly, meant LLVM did much worse on code that had been tweaked to perform well with GCC. LLVM dogfoods a lot of these optimisations, and the LLVM codebase itself is a large C++ project.

                                                                                                                The new pass manager does not change the inlining policy.

                                                                                                                1. 1

                                                                                                                  Doesn’t gcc have a partial inliner?

                                                                                                                  LLVM dog foods a lot of these optimisations

                                                                                                                  I do recall hearing that gcc and llvm are both fastest when self-compiled, a fact which will eternally tickle my fancy.

                                                                                                                  1. 1

                                                                                                                    Doesn’t gcc have a partial inliner?

                                                                                                                    I believe LLVM has at least one. I remember reviewing some of the patches 6-7 years ago, not sure when they landed. I don’t remember if either of the outliners landed in tree.

                                                                                                            1. 32

To give some background on why that is: UB is not an event that happens in the program. Instead, “UB is impossible” is an assumption that is baked into the compilers.

The compilers don't have if is_UB() { screw_you() } anywhere. The no-UB assumptions can be implicit and pervasive, like the assumption that the opposite of true is false, so that if (x) {do} and if (!x) {} else {do} are equivalent. If you magically made a 3-valued bool with filenotfound, it wouldn't merely trigger some else { lol() } code path in the optimizer. Instead, it would make all the assumptions of boolean algebra invalid and full of impossible-to-reason-about paradoxes. And it's hard to warn about this, because UB is not an event or a code path. The compiler would have to warn about every non-issue, like “here I've assumed that the opposite of false is true”.

The second reason is the complexity of optimizations. It's too difficult to optimize code in the high-level form you write it in: there are too many tree structures to work with, too many scopes, too many constraints to preserve. Instead, the code is “lowered” to a simpler, assembler-like representation that no longer resembles the source code. In this form the programmer's intention is hard to see, so it's no longer obvious what should and shouldn't be optimized out.

To manage the complexity, optimizations are applied to the low-level code in multiple independent passes. This way, instead of having to consider all of the C spec for any change, each pass can focus on one thing at a time, e.g. “can I change !!x to x?”. There's a pass that simplifies arithmetic, a pass that determines which expressions are always true, and a pass that removes unreachable code. Together they can end up removing your null or overflow checks, but they don't mean to: each pass does its own thing, which in isolation seems sensible, spec-compliant, and useful for most code.
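A classic sketch of how such passes compose (my example, purely illustrative):

int deref(int *p) {
    int v = *p;     /* one pass infers from this dereference that p != NULL */
    if (p == NULL)  /* another pass then folds this comparison to false...  */
        return -1;  /* ...and dead-code elimination deletes the "check"     */
    return v;
}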

And finally, the majority of the optimizers in C compilers are shared across all architectures. They evaluate code as if it ran on the hypothetical portable machine that the C spec describes, not your particular CPU. So it doesn't matter that your CPU knows how to overflow a signed int or has caches that are coherent across threads: the low-level virtual machine the optimizer targets doesn't, and optimized code will “run” the C way instead.

                                                                                                              1. 16

                                                                                                                Instead, it makes all the assumptions about boolean algebra invalid and full of impossible-to-reason about paradoxes

This is a great explanation, but substitute ‘integer’ for ‘boolean’ and you get why C compilers really like that signed integer overflow is undefined. If you write a loop performing some addition on an integer, you have an induction variable that can be modelled as an arithmetic sequence. If you have a nested loop, the induction variable can be modelled as a geometric sequence. This model is completely wrong in every regard if adding two positive values together can produce a negative number, and having to handle that case prevents any transform from making use of the model. This precludes a large number of loop optimisations.
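To make the failure mode concrete, a minimal sketch of my own:

/* With wrapping semantics this loop need not terminate: if n == INT_MAX,
 * then i can never exceed n, and i++ eventually overflows and wraps
 * negative. Treating the overflow as UB lets the compiler model i as the
 * arithmetic sequence 0, 1, ..., n with an exact trip count of n + 1. */
void fill(float *a, int n) {
    for (int i = 0; i <= n; i++)
        a[i] = 0.0f;
}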

                                                                                                                1. 4

                                                                                                                  This model is completely wrong in every regard if adding two positive values together can result in a negative number

                                                                                                                  But isn’t that just as true of unsigned integer overflow? You don’t unexpectedly get a negative sum, but you do get one that’s completely wrong.

                                                                                                                  1. 7

                                                                                                                    Signed overflow is UB so compilers get to pretend it can’t happen. The lack of UB on unsigned integer overflow means that many compilers don’t optimize unsigned arithmetic nearly as aggressively, for precisely the reason you point out. The difference is that compilers aren’t allowed to ignore the possibility of unsigned overflow since it’s well-defined and allowed to happen.
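A tiny sketch of the asymmetry (whether a particular compiler actually folds the signed case is, of course, up to the implementation):

int u_cmp(unsigned x) { return x + 1 > x; }  /* false when x == UINT_MAX: the wrap
                                                case must be preserved            */
int s_cmp(int x)      { return x + 1 > x; }  /* overflow is UB, so this may be
                                                folded to the constant 1          */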

                                                                                                                2. 9

My favorite example of undefined behavior: creating a pointer into the stack and messing around with its contents. In C, or unsafe Rust, or something like that, the compiler has literally no way of preventing you from doing this, and no way of detecting that it has occurred. It's not even “this is on you to make it safe” territory, because the compiler is allowed to assume the contents of the stack don't change out from underneath it. It may juggle local variables in and out of registers, and within the bounds of the calling convention you get to make zero assumptions about how that works. Just 'cause one compiler puts a particular local var at [rsp+24] or whatever doesn't mean it will be there the next time you compile the program.

                                                                                                                  Yes, you can do it. You can even make it work, if you know exactly how the compiler works and the compiler doesn’t change. But you’re breaking the invariants that the compiler fundamentally uses to function, and it is incapable of noticing that you are doing it.

                                                                                                                  1. 1

                                                                                                                    My favorite example of undefined behavior: creating a pointer into the stack and messing around with its contents.

                                                                                                                    Wait, what? How do you create such a pointer, exactly?

                                                                                                                    You’d have to use assembly code I assume? Or some library that uses assembly code?

                                                                                                                    1. 3

                                                                                                                      Wait, what? How do you create such a pointer, exactly?

                                                                                                                      You’d have to use assembly code I assume? Or some library that uses assembly code?

                                                                                                                      The alloca function returns a pointer to memory allocated on the stack! I’m surprised it hasn’t been mentioned in the thread yet.

                                                                                                                      https://www.man7.org/linux/man-pages/man3/alloca.3.html
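A minimal sketch of using it (non-standard; glibc declares alloca() in <alloca.h>):

#include <alloca.h>
#include <stdio.h>
#include <string.h>

/* The buffer lives in dup_and_print's stack frame and is implicitly
 * freed when the function returns; letting the pointer escape would
 * leave it dangling. */
void dup_and_print(const char *s) {
    char *copy = alloca(strlen(s) + 1);  /* pointer into the current stack frame */
    strcpy(copy, s);
    printf("%s\n", copy);
}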

                                                                                                                      1. 2

                                                                                                                        The alloca function returns a pointer to memory allocated on the stack! I’m surprised it hasn’t been mentioned in the thread yet.

                                                                                                                        D’oh! Of course, how didn’t I remember that? Thanks!

                                                                                                                        Edit: goes to show how often that function is used, hah!

                                                                                                                      2. 3

                                                                                                                        …You mean playing around with the stack isn’t the first thing everyone does when learning how C works?

int y_hello_thar() {
    int a;
    int *b = &a;   // fine: pointer to a local variable
    b += 24;       // UB: pointer arithmetic past the bounds of 'a'
    return *b;     // reads whatever happens to be on the stack, if anything
}

                                                                                                                        https://godbolt.org/z/GYzq87G4s

                                                                                                                        (Update: Notably, if you pass -O3 to most versions of gcc in godbolt it will generate the code you expect, but clang will almost invariably nope out and just return 1 or 0 or nothing at all. Fun!)

                                                                                                                        1. 3

                                                                                                                          That’s not what I imagined a pointer into the stack would be from your description.

                                                                                                                          That’s a pointer into a local variable. The compiler wouldn’t even allocate stack space if you compile with optimizations.

                                                                                                                          Pointers to local variables are not UB.

                                                                                                                          Edit: What is UB in this case is that you incremented “b” outside the array bounds (a pointer to a single object is considered to be a pointer into an array of 1 element).

                                                                                                                          Edit 2: You can freely use such pointers while the compiler passes variables into and out of registers and your program will still work just fine. But your pointer is not allowed to go outside the bounds of the array (except to point one-past the array, but then you can’t dereference it).
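A small sketch of those rules (mine, purely for illustration):

#include <stddef.h>

/* a points to 4 ints, so a+0 .. a+4 are valid pointer values, but only
 * a+0 .. a+3 may be dereferenced; a+4 is the one-past-the-end pointer. */
size_t count(void) {
    int a[4] = {1, 2, 3, 4};
    size_t n = 0;
    for (int *p = a; p != a + 4; p++)  /* a+4 is formed and compared,
                                          never dereferenced */
        n++;
    return n;  /* 4 */
}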

                                                                                                                          1. 2

                                                                                                                            You are correct, the UB in this case is incrementing the pointer out of its bounds. The thing is that a local variable (edit: usually) has to exist on the stack if you have a pointer to it that does arithmetic. So the easy way to get a pointer to a random place on the stack is to start from a local variable. You can also have a function that returns a pointer to a local var, you can start in the data segment or heap, or you can just make a good guess of where your stack is and hope ASLR doesn’t ruin your fun.

                                                                                                                            Or if you really want to go hard, you can ask a user to type in a number and use that as an address. Who knows what they’ll type in? The compiler sure doesn’t.

                                                                                                                            1. 3

                                                                                                                              The thing is that a local variable has to exist on the stack if you have a pointer to it that does arithmetic.

                                                                                                                              What? No, that’s not true. See for yourself: https://godbolt.org/z/PvfEoGTnc

                                                                                                                              So the easy way to get a pointer to a random place on the stack is to start from a local variable.

                                                                                                                              No, you don’t necessarily get a pointer to a random place on the stack if you do that. The local variable or array may not even exist on the stack.

                                                                                                                              If you go outside the object/array bounds with pointer arithmetic, you get UB which does not have to (and many times won’t) translate into a random pointer, it will just mean that your code will break (and it can break in many different ways).

                                                                                                                              You should really, really, carefully read @kornel’s top post to understand how UB works.

                                                                                                                              You can also have a function that returns a pointer to a local var

                                                                                                                              That’s UB but again, you won’t necessarily get a random pointer to the stack, as the variable may not even exist on the stack. And even if it does exist, your code can break in completely different ways than getting a random pointer to the stack.

                                                                                                                              you can start in the data segment or heap

You'd get UB if your pointer went outside of a malloc()-allocated area, but again, you wouldn't necessarily get a random pointer value. You're making the exact same false assumptions that @kornel talks about.

                                                                                                                              Your code could simply be completely optimized out, or it may cause other very strange effects that have nothing to do with pointers.

                                                                                                                              or you can just make a good guess of where your stack is and hope ASLR doesn’t ruin your fun.

                                                                                                                              Even if you knew exactly where your stack is, I don’t think there is a way to deterministically create such a pointer without implementation-specific behavior (e.g. assembly code).

                                                                                                                              Edit: updated Godbolt link due to a typo in the code.

                                                                                                                              1. 1

                                                                                                                                Or if you really want to go hard, you can ask a user to type in a number and use that as an address. Who knows what they’ll type in? The compiler sure doesn’t.

                                                                                                                                I believe converting an arbitrary integer into a pointer is also UB and you don’t necessarily get a pointer like you think you will, your code can just completely break in lots of different ways (including simply being optimized out).

                                                                                                                                You’d have to enable some kind of special implementation-specific compiler option to do something like that without UB.

                                                                                                                                What is not UB, I believe, is to e.g. pass a pointer value into an intptr_t variable and then back into a pointer without changing the value.
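A sketch of that round trip (assuming the implementation provides intptr_t, which is optional):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    int x = 42;
    intptr_t bits = (intptr_t)(void *)&x;  /* implementation-defined integer value */
    int *p = (int *)(void *)bits;          /* converting back yields a pointer that
                                              compares equal to the original */
    printf("%d\n", *p);                    /* prints 42 */
    return 0;
}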

                                                                                                                                1. 2

                                                                                                                                  I believe converting an arbitrary integer into a pointer is also UB…

                                                                                                                                  And yet this is also how just about every device driver under the sun works.
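For instance, memory-mapped I/O code conjures pointers from integer addresses all the time; a sketch (the address is hypothetical, and everything about this is implementation-specific):

#include <stdint.h>

/* Hypothetical UART data register at a fixed device address. */
#define UART0_DR ((volatile uint32_t *)(uintptr_t)0x101f1000u)

void uart_putc(char c) {
    *UART0_DR = (uint32_t)c;  /* store through a pointer made from an integer */
}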

                                                                                                                                  1. 1

                                                                                                                                    And yet this is also how just about every device driver under the sun works.

                                                                                                                                    Because of this:

                                                                                                                                    You’d have to enable some kind of special implementation-specific compiler option to do something like that without UB.

                                                                                                                                    Edit:

                                                                                                                                    Correction: technically speaking, it seems that the integer-to-pointer conversion itself is not UB if you use an explicit cast, but the C standard says the result is implementation-defined, so the code can behave differently depending on which compiler (or compiler version, or compiler flags) you’re using.

                                                                                                                                    In fact, the C standard says that the resulting value can be a trap representation, which means that it could trigger UB as soon as you tried to do anything with that pointer.

                                                                                                                                    Even if your implementation allows doing that and even if the integer is a valid memory location, I would imagine it would still be possible to trigger UB by violating other constraints, like creating a duplicate copy of another pointer of a different type (which could violate strict aliasing constraints), or creating a pointer with the wrong alignment, or calling free(ptr) if the memory location hadn’t been allocated with malloc() before, etc.

                                                                                                                                  2. 1

                                                                                                                                    I believe converting an arbitrary integer into a pointer is also UB

The int-to-ptr operation is actually fundamentally implementation-defined, so you won't find the UB rules for ptr-to-int and int-to-ptr in the standard; you need to look at the LLVM LangRef instead.

                                                                                                                                    1. 1

                                                                                                                                      The int-to-ptr operation is actually fundamentally implementation-defined.

                                                                                                                                      Yes, I mentioned that in my other reply.

But note: it is undefined behavior if you don't use an explicit cast; e.g. int *a = 4; is UB.

                                                                                                                                      1. 2

                                                                                                                                        Why is it not a compile error?

                                                                                                                                        1. 2

                                                                                                                                          It is a compile error:

                                                                                                                                          https://godbolt.org/z/Gq8qbW559

                                                                                                                                          But that’s on GCC (probably clang does the same). There might be C compilers that are C-standard-compliant that might not generate an error.

                                                                                                                                          If you are asking why doesn’t the C standard specify that such conversions are invalid rather than UB? No idea!

                                                                                                                                          Edit: oh wait, I made a mistake! The above error is when compiling in C++ mode (the Compiler Explorer’s default, annoyingly). In C mode, it’s not an error, it’s a warning by default:

                                                                                                                                          https://godbolt.org/z/rP3KoGbx3

                                                                                                                                          That said, you can transform that into an error by adding -Werror to the compiler flags, of course.

                                                                                                                                          1. 1

                                                                                                                                            I don’t think this is the kind of error you’re expecting to see - if you change the code to int *a = (int *)4 it should compile without warning. I don’t know if this is even supposed to be undefined behaviour - when talking to devices you need to know the exact memory address “as a number” and read from or write to it (think VGA video memory addressing and such), which is or was traditionally often done in C under DOS in real mode, for example.

                                                                                                                                            Although it might be that this has sort-of been retconned into “always having been wrong” like relying on hardware behaviour on integer overflow has been.
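
                                                                                                                                            For what it’s worth, here’s a hypothetical sketch of that DOS-era idiom (0xB8000 is the classic VGA text-mode buffer; real 16-bit compilers used far segment:offset pointers, so treat this as the shape of the idiom rather than a working recipe):

                                                                                                                                                #include <stdint.h>
                                                                                                                                                
                                                                                                                                                /* Hypothetical: write a character to the top-left corner of the VGA
                                                                                                                                                   text screen. Only meaningful where the implementation defines
                                                                                                                                                   int-to-pointer conversion as "this physical/linear address". */
                                                                                                                                                void putchar_topleft(char c) {
                                                                                                                                                    volatile uint8_t *vga = (volatile uint8_t *)0xB8000;
                                                                                                                                                    vga[0] = (uint8_t)c; /* character byte */
                                                                                                                                                    vga[1] = 0x07;       /* attribute byte: light grey on black */
                                                                                                                                                }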

                                                                                                                                2. 1

                                                                                                                                  Local variables are generally allocated on the stack. In some cases, the compiler may be able to avoid allocating memory at all and keep a local variable purely in a register, but any time you mess around with pointers to local variables – especially when those pointers are passed across non-inlined function boundaries – that pointer is to the stack-allocated local variable.

                                                                                                                                  With optimizations disabled, you can look at the assembly code and stack frame and verify that ‘a’ is allocated on the stack and ‘b’ is a pointer to that stack-allocated variable in icefox’s example. Here’s a slightly modified example where the pointer is passed across a function boundary to be dereferenced: https://godbolt.org/z/x8zEKEr1E

                                                                                                                                  1. 1

                                                                                                                                    Yes, I know that, but @icefox was talking about “creating a pointer to the stack”, and as far as I can see there is no way to create a pointer to the stack in the C language, or at least no portable way to do it (i.e. without relying on implementation-defined behavior).

                                                                                                                                    The C18 standard doesn’t even contain the word “stack” anywhere in its 535 pages!

                                                                                                                                    Someone else mentioned alloca() but even that is not part of any standard.

                                                                                                                                    Then he said:

                                                                                                                                    In C (…) [the] compiler has literally no way of preventing you from doing this,

                                                                                                                                    Which seems to make little sense because if you’re relying on implementation-defined behavior (which has to be documented) to create a pointer to the stack, then that means that the compiler has already promised to give you a pointer to the stack. So of course it can’t prevent you from messing around with it.

                                                                                                                                    It’s not like you “tricked” the compiler into giving you such a pointer; usually it’s actually the opposite: the compiler is the one who tricks you into thinking that you have a pointer to the stack, when in fact the local variables (and even arrays) that the pointer points to may only exist in registers or, if all of the variable’s values can be determined at compile time, may not exist at all!

                                                                                                                                    Edit: That said, yes, I understand that if you rely on implementation-defined behavior then it is possible to deterministically create such a pointer (e.g. with alloca()) and mess around with it, such as corrupting the stack and causing UB. Which is what I meant when I said that you’d have to call some (platform-specific) library function to do that, it’s not like there’s a construct in the C language that says: “this is how you create a pointer to the stack”.

                                                                                                                                    Edit 2: clarified the comment a bit.

                                                                                                                                    1. 1

                                                                                                                                      You can’t find the word “stack” in the C standard because the C standard indeed has no concept of a “stack”. As far as the standard is concerned, local variables have automatic storage and are alive until the function returns, variables allocated with malloc and the like have allocated storage and are alive until they are freed with free.

                                                                                                                                      However, the common implementations have a chunk of memory they call “the stack”, where each function gets a sub-chunk called a “stack frame” which is where the compiler puts, among other things, the variables with automatic storage (except for those variables which it optimizes out or keeps in registers only). Therefore, when you take the address of a variable with automatic storage, the implementation will give you an address to the place in the function’s stack frame where that variable is kept; you have a pointer to somewhere in the stack.

                                                                                                                                      There is some mixing of abstraction levels here. You never have a “pointer to a variable allocated on the stack” in terms of the C abstract machine, you have a “pointer to a variable with automatic storage”. But when we lower that into the level of concrete machine code for some operating system, we can see that in the right circumstances, our “pointer to a variable with automatic storage” gets lowered into a “pointer to a chunk of memory in the function’s stack frame”. So we can create an example program which proves @icefox’s point (namely that the “compiler has literally no way of preventing you from doing this, and no way of detecting that it has occurred”):

                                                                                                                                      a.c:

                                                                                                                                      // This function assumes it gets a pointer to a chunk of memory with at least 5 ints, and sets the 5th to 0.
                                                                                                                                      // There is nothing at all wrong with this function, assuming the pointer does indeed point to 5+ mutable ints.
                                                                                                                                      // There is no way for the compiler to detect that anything wrong is going on here.
                                                                                                                                      void clear_fifth_int(int *ints) {
                                                                                                                                          ints[4] = 0;
                                                                                                                                      }
                                                                                                                                      

                                                                                                                                      b.c:

                                                                                                                                      void clear_fifth_int(int *ints);
                                                                                                                                      
                                                                                                                                      // This function creates a local variable 'x', which the implementation will put on the stack,
                                                                                                                                      // then calls 'clear_fifth_int' with a pointer to 'x'.
                                                                                                                                      // There is nothing at all wrong with this function, *if* the 'clear_fifth_int' function only dereferences the
                                                                                                                                      // pointed-to int, so there's no way for the compiler to detect that anything wrong is going on here.
                                                                                                                                      // However, we know that clear_fifth_int will touch whatever happens to be 4*sizeof(int) bytes after 'x'
                                                                                                                                      // on the stack.
                                                                                                                                      void stupid() {
                                                                                                                                          int x = 10; // x ends up allocated on the stack
                                                                                                                                          clear_fifth_int(&x); // &x gets a pointer to somewhere in the stack
                                                                                                                                      }
                                                                                                                                      

                                                                                                                                      No individual function in this example does anything wrong, but in combination, they mess around with the stack in ways which the implementation can’t anticipate.

                                                                                                                                      1. 1

                                                                                                                                        No individual function in this example does anything wrong, but in combination, they mess around with the stack in ways which the implementation can’t anticipate.

                                                                                                                                        You don’t actually know if they mess around with the stack; they may very well not. With link-time optimization all of that code could be inlined together and no pointers to the stack would even be created.

                                                                                                                                        What you are saying is actually a result of the following:

                                                                                                                                        1. The pointer points to a single object, rather than a sufficiently large array of objects, which results in UB when you do pointer arithmetic past the bounds of that object.

                                                                                                                                        2. Or, in other situations, it could result in UB due to dereferencing a pointer past the lifetime of the object (which could happen both for automatic storage and for heap-allocated storage).

                                                                                                                                        It has nothing to do with “the stack” or a “pointer to the stack”. It is simply the result of having “a pointer” (to some non-permanent storage); both failure modes are sketched below.
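
                                                                                                                                        A minimal sketch of those two failure modes (the function names are mine):

                                                                                                                                            int past_the_bounds(void) {
                                                                                                                                                int x = 10;
                                                                                                                                                int *p = &x;
                                                                                                                                                return p[4]; /* UB: pointer arithmetic past the single object 'x' */
                                                                                                                                            }
                                                                                                                                            
                                                                                                                                            int *dangling(void) {
                                                                                                                                                int y = 20;
                                                                                                                                                return &y; /* returning the address is fine... */
                                                                                                                                            }
                                                                                                                                            /* ...but dereferencing dangling()'s result is UB: the automatic
                                                                                                                                               object's lifetime ended when the function returned. Note that
                                                                                                                                               neither failure mode needs to mention "the stack". */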

                                                                                                                                        You are assuming that a specific implementation detail will happen. How do you know that’s what will happen? Does your C compiler or some other standard guarantee that? I assume the compiler documentation or a calling convention such as the System V ABI may be able to guarantee that. But if it does guarantee it, how is that “the implementation not anticipating”?

                                                                                                                                        Edit: API -> ABI

                                                                                                                                        1. 1

                                                                                                                                          I don’t understand what you are disagreeing with me about. Clearly, we have some C code, where all translation units are correct in isolation, but in common implementations, their combination might (and likely will, but is not guaranteed by any standard to) result in machine code which messes with values on the stack in a way the compiler can’t account for. That much is inarguable, and I don’t see you claiming otherwise; so where is the contention?

                                                                                                                                          1. 1

                                                                                                                                            Clearly, we have some C code, where all translation units are correct in isolation but where their combination might (and likely will) result in machine code which messes up values on the stack in a way the compiler doesn’t expect. That much is inarguable, and I don’t see you disagreeing with it

                                                                                                                                            Indeed, that is correct.

                                                                                                                                            so where is the contention?

                                                                                                                                            The contention is in the following two points:

                                                                                                                                            • In suggesting that “In C (…) the compiler has literally no way of preventing you from doing this, and no way of detecting that it has occurred”, where “this” is creating a pointer to the stack and messing around with the stack.

                                                                                                                                            I think that phrase is a bit misleading, because:

                                                                                                                                            1. The C language has no construct to create a pointer to the stack, which means that either:

                                                                                                                                            • you’re relying on implementation-defined behavior, which means that the compiler is actually giving you such a pointer on purpose,

                                                                                                                                            • or there’s no way to deterministically create such a pointer, because the pointer may not even exist, i.e. the compiler may optimize it out.

                                                                                                                                            Which means that the compiler can prevent you from doing that. And if it can’t, it’s probably because you’re relying on the calling convention, which is implementation-defined behavior, therefore documented, and the compiler would be doing it on purpose.

                                                                                                                                            2. Or it means you’re relying on some library function or assembly code to create such a pointer, which is what I thought he meant and why I asked him to clarify, as it’s not very common to do this!

                                                                                                                                            The other contention point, which I admit is (even more?) pedantic, is:

                                                                                                                                            • In saying that all of this is specific to “pointers to the stack” when it is actually just a result of having pointers and going outside their bounds or the lifetime of the storage they point to, which applies to all types of pointers.

                                                                                                                                            That said, yes, I understand that going outside the bounds of a pointer to (for example) heap storage is unlikely to corrupt the stack (even though it’s possible), and that stack corruption is a very particular type of corruption which usually has very… interesting side effects!

                                                                                                                                            So in conclusion, I think we are in agreement with regards to the underlying issues; I just disagree with the phrasing, and particularly with how it appears to suggest that the compiler is powerless to prevent it, when in fact you’re either breaking the invariants of the language itself, or the compiler is actually going out of its way to help you do that, on purpose (just in case you like to shoot yourself in the foot)!

                                                                                                                                            Edit: “external library” -> “library”

                                                                                                                                            Edit 2: in other words, you haven’t really forced the compiler to give you a pointer to the stack, it was the compiler who decided to give that to you, and it could just as easily have decided to do something else if it wanted (proof: the existence of conforming C compilers for architectures that don’t even have a stack).

                                                                                                                                            1. 1

                                                                                                                                              So you would be completely happy with:

                                                                                                                                              If the compiler has a concept of what we would conventionally call a “stack”, and it has put a variable on the stack, the compiler has no way to prevent you from taking a pointer to that variable and messing around with other stuff that’s also on the stack and no way of detecting that it has occurred, assuming the compiler implements pointer arithmetic in the way all the common implementations do.

                                                                                                                                              Personally, I think the original comment was clear enough, adding all those caveats just muddies the meaning IMO. But I won’t disagree that it’s assuming a few things which aren’t strictly guaranteed by the C standard.

                                                                                                                                              […] I just disagree with the phrasing and particularly, how it appears to be suggesting that the compiler is powerless to prevent it, when in fact you’re either breaking the invariants of the language itself, or the compiler is actually going out of its way to help you do that, on purpose […]

                                                                                                                                              Well, you are breaking the invariants of the language, but I don’t understand how that means the compiler can prevent it. The compiler gives you a pointer to a variable which the compiler has decided to put on what it calls “the stack”. The language has features which make compile-time tracking of where pointers come from impossible. The language has features which let you do arithmetic on pointers. Given those facts, if a compiler is implemented in such a way that there exists a stack which contains stuff like the return address and local variables, the compiler is powerless to prevent you from messing with stuff the compiler has to assume is left unchanged.

                                                                                                                                              I agree that a compiler might be implemented differently, it might store the return address stack separately from the variable stack, it might allocate every local variable in a separate heap allocation and automatically free them when the function returns, it might do runtime checking of all pointer dereferences, or any number of other things. But under the assumption that the implementation works roughly like all the major C implementations currently do, everything icefox said is correct.

                                                                                                                                              1. 1

                                                                                                                                                If the compiler has a concept of what we would conventionally call a “stack”, and it has put a variable on the stack, the compiler has no way to prevent you from taking a pointer to that variable and messing around with other stuff that’s also on the stack and no way of detecting that it has occurred, assuming the compiler implements pointer arithmetic in the way all the common implementations do.

                                                                                                                                                This phrasing is also wrong. The compiler can do all of those things and yet the pointer is not a pointer to the stack; it’s a pointer to the variable (which may currently be stored in a register, or may even exist only as a constant at that exact moment, even though there’s an allocation for the variable on the stack).

                                                                                                                                                For example if you have this code:

                                                                                                                                                #include <stdbool.h>
                                                                                                                                                #include <stdio.h>
                                                                                                                                                
                                                                                                                                                void some_function(int *y, bool condition) {
                                                                                                                                                    *y = *y + 5;
                                                                                                                                                
                                                                                                                                                    if (condition) {
                                                                                                                                                        y += 100000; // far outside the bounds of whatever y points into
                                                                                                                                                        *y = 20;
                                                                                                                                                    }
                                                                                                                                                }
                                                                                                                                                
                                                                                                                                                void foo() {
                                                                                                                                                    int x[100] = {0};
                                                                                                                                                
                                                                                                                                                    int *y = &x[50];
                                                                                                                                                
                                                                                                                                                    some_function(y, false);
                                                                                                                                                
                                                                                                                                                    printf("%d", *y);
                                                                                                                                                
                                                                                                                                                    (...)
                                                                                                                                                }
                                                                                                                                                

                                                                                                                                                … it may not do what you think it does. Why? Because the compiler is allowed to transform that code into this equivalent code:

                                                                                                                                                #include <stdio.h>
                                                                                                                                                
                                                                                                                                                int some_specialized_function() {
                                                                                                                                                    return 5;
                                                                                                                                                }
                                                                                                                                                
                                                                                                                                                void foo() {
                                                                                                                                                    int x[100] = {0};
                                                                                                                                                
                                                                                                                                                    int z = some_specialized_function();
                                                                                                                                                
                                                                                                                                                    printf("%d", z);
                                                                                                                                                
                                                                                                                                                    x[50] = z; // this line may or may not exist, e.g. if x[50] is never used again
                                                                                                                                                
                                                                                                                                                    (...)
                                                                                                                                                }
                                                                                                                                                

                                                                                                                                                So even if there’s an allocation for x[] in the stack frame for some reason (let’s say some further code manipulates the array), you think that y is a pointer to the stack, but it may very well not even exist as such, even if the code for some_function hasn’t been inlined!

                                                                                                                                                And even if you call some_function(y, true) instead of false, you think you tricked the compiler into corrupting the stack, but the compiler can clearly see that y points to &x[50], so increasing y by 100000 is absurd and the entire if block of the specialized some_function() implementation can be removed.

                                                                                                                                                So yes, the compiler can prevent you from messing around with the stack even in that case.

                                                                                                                                                The compiler gives you a pointer to a variable which the compiler has decided to put on what it calls “the stack”.

                                                                                                                                                Yes, you said it very well here, even though that doesn’t always happen. The compiler has decided. You haven’t forced it to do that. The compiler could have decided something else. It was an implementation choice, which means the compiler could have prevented it.

                                                                                                                                                I agree that a compiler might be implemented differently, it might store the return address stack separately from the variable stack, it might allocate every local variable in a separate heap allocation and automatically free them when the function returns, it might do runtime checking of all pointer dereferences, or any number of other things.

                                                                                                                                                Exactly! Which means that the compiler can actually prevent you from doing the things icefox said.

                                                                                                                                                But under the assumption that the implementation works roughly like all the major C implementations currently do, everything icefox said is correct.

                                                                                                                                                Well, it’s not possible for me to prove this, but to be honest, I think icefox had a misconception, as evidenced by the fact that (before he edited it) his reply to my comment said that just doing pointer arithmetic on a pointer to a local variable means that the compiler is forced to give you a pointer to the stack, when this is not true: compilers often optimize such pointers out of the code completely.

                                                                                                                                                And you know, this entire debate could have been avoided if he just said “it’s often possible to corrupt the stack [perhaps by doing such and such], which has all these interesting effects” rather than suggesting that C compilers have some underlying limitation and are powerless to prevent you from messing with what they do.

                                                                                                                                                But you know, apart from that, I think we are in agreement!

                                                                                                                                                Edit: renamed function name to clarify the code.

                                                                                                                          2. 5

                                                                                                                            So it doesn’t matter that your CPU knows how to overflow a signed int.

                                                                                                                            I would say, it doesn’t matter that virtually all CPUs in active use since, say, the start of the 21st century know how to overflow a signed int. Some platforms that existed somewhere at some point in time don’t know how to overflow signed integers, and so the standards committee thought it was a good idea to mark it “UB”, probably for portability reasons; compiler writers later thought it was a good idea to interpret that literally, for optimisation reasons.

                                                                                                                            This is where the standard committee should really have said something along the lines of:

                                                                                                                            • Signed integer overflow is implementation defined…
                                                                                                                            • except on platforms that produce non-deterministic results, in which case it’s unspecified…
                                                                                                                            • except on platforms that fault, in which case it aborts the program, whatever that should mean…
                                                                                                                            • except on platforms that fall into an inconsistent state, in which case it’s Undefined Behaviour™.

                                                                                                                            Sure, the specs are wordier this way, but that would have gotten rid of a huge swath of security vulnerabilities. And while some code would be slower, compiler writers would have told everyone they should use unsigned size_t indices for their loops.

                                                                                                                            But no, they didn’t define behaviour depending on the platform. If something is undefined in one supported platform, no matter how niche, then it’s undefined in all supported platforms. Just so it could be a little bit faster without requiring users to touch their source code.
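
                                                                                                                            To make the optimisation angle concrete, here’s the classic example (my sketch, not from the thread): because signed overflow is UB, the compiler may assume it never happens and fold the check away entirely:

                                                                                                                                int always_true(int x) {
                                                                                                                                    /* If x + 1 could wrap, this would be false for x == INT_MAX.
                                                                                                                                       Since signed overflow is UB, gcc and clang at -O2 typically
                                                                                                                                       compile this function to "return 1". */
                                                                                                                                    return x + 1 > x;
                                                                                                                                }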


                                                                                                                            My understanding of undefined behaviour is now as follows: compilers will, in the face of undefined behaviour, generate code that under the “right” conditions causes your hard drive to be encrypted and print a ransom message. The most useful threat model of a compiler is that of a sentient adversary: if there’s any UB in your code, it will eventually find it, and cause as much harm as it can up to and including causing people to die.

                                                                                                                            A plea to compiler writers: please rein that in?

                                                                                                                            1. 8

                                                                                                                              (post author here)

                                                                                                                              Unfortunately, it’s not that easy.

                                                                                                                              The compiler isn’t out to get you, and generally won’t include disk-encrypting code into your program if it wasn’t there before. The line about playing DOOM on UB is meant as humor and to demonstrate that it’s still not a compiler bug if this happens. But in reality, it isn’t actually going to happen in GCC, or Clang, or rustc, or any other compiler that wasn’t purpose-built to do that. It’s not guaranteed to not happen, but compilers are in practice not written to maliciously manufacture instructions out of whole cloth just to mess with you on purpose.

                                                                                                                              But at the same time, providing guarantees about undefined behavior … makes it not-undefined anymore. There can certainly be less undefined behavior (e.g. mandating the equivalent of NullPointerException, or Rust’s “safe Rust is UB-free Rust, or it’s a bug in the compiler” guarantee), but any real compiler will still have some UB. UB just means “this is an invariant that I expect and internally continue to uphold, and I don’t know what happens if something breaks it.” Even my toy compiler for the Advent of Code day 23 problem from last year contains a notion of UB, which I plan to discuss in an upcoming episode of my Compiler Adventures blog series – and that’s for a programming language that doesn’t even have if statements or loops!

                                                                                                                              I highly recommend this talk on UB if you have 40min: https://www.youtube.com/watch?v=yG1OZ69H_-o

                                                                                                                              1. 5

                                                                                                                                I guess the joke about playing DOOM on UB was written with an MMU-using operating system in mind. I witnessed IRL a situation where a simple read-from-stdin, do-some-calculations, write-to-stdout C program quite reliably opened the CD-ROM tray… That was on Windows 3.11, I think.

                                                                                                                                1. 1

                                                                                                                                  That’s amazing :)

                                                                                                                                  My dad once traced a client-reported bug in his company’s software down to a bad CPU in the client’s machine. When adding two very specific 3-digit integers together, the sum was incorrect. It wasn’t UB that time, just bad hardware, but it doesn’t sound like it was fun to track down. And the accountants using the software certainly didn’t appreciate the fact that the computer was adding up their numbers wrong…

                                                                                                                                  1. 1

                                                                                                                                    Since it’s the accountants’ computer that is broken, they have a catch 22. They need to ask for their money back, but the computer they do all their sums on is broken, so how much refund should they be asking for!? They can’t possibly calculate it. ;)

                                                                                                                                2. 5

                                                                                                                                  The compiler isn’t out to get you, and generally won’t include disk-encrypting code into your program if it wasn’t there before.

                                                                                                                                  Correct, but ransomware authors are out to get me, and they’ll exploit any vulnerability they can. And UB-driven compilers are a source of vulnerabilities that wouldn’t be there if more behaviour were defined. That’s where my threat model comes from. Blaming the compiler as if it were the enemy isn’t accurate, but it is simpler and more vivid.

                                                                                                                                  Oh, and that talk from Chandler Carruth? I’ve watched it several times; it’s misleading and harmful. All the more because he’s such a good speaker. He’s trying to reassure people about the consequences of their bugs. Sure, the compiler itself won’t actively summon the nasal demons, but outside attackers finding vulnerabilities will. And the result is the same: UB does enable the summoning of nasal demons.

                                                                                                                                  But he’s not saying that out loud, because that would drive people away from C and C++.

                                                                                                                                  But at the same time, providing guarantees about undefined behavior … makes it not-undefined anymore.

                                                                                                                                  Correct, which is why I advocate for defining more behaviours in C.

                                                                                                                                  1. 1

                                                                                                                                    Correct, which is why I advocate for defining more behaviours in C.

                                                                                                                                    I think that’s a lost cause. Instead, it would be nice to have more options like -fwrapv that at least impose sane behaviour for things that are UB, strictly speaking.

                                                                                                                                    1. 1

                                                                                                                                      Of course, your position is valid: it sounds like you are making an even more conservative set of assumptions than the minimum necessary set.

                                                                                                                                      I wish more people took more precautions than strictly necessary, rather than fewer than necessary as is frequently the case :)

                                                                                                                                      1. 5

                                                                                                                                        I think the larger point is that, effectively, people who write C have to treat the compiler as a deadly hyperintelligent enemy deliberately attempting to cause as much harm as possible via any means available. Otherwise they just end up being hurt by the compiler, and then when they write it up they get mocked and belittled by people who tell them it’s their fault that this happened.

                                                                                                                                        And since it’s effectively impossible to write useful/non-trivial C programs without UB, the takeaway for me is “don’t ever write C”.

                                                                                                                                        1. 3

                                                                                                                                          FWIW I feel similarly. I’ve switched to Rust because I don’t trust myself to not write UB and to not commit memory safety errors. I still haven’t needed unsafe in Rust anywhere thus far, and “if it compiles then it’s memory-safe and UB-free” is worth a lot to me.

                                                                                                                                  2. 2

                                                                                                                                    C users as a whole are hostile to any overheads and speed regressions in compilers. Compilers get benchmarked against each other, and any “unnecessary” instructions are filed as issues to fix.

                                                                                                                                    Look how long it is taking to zero-init all variables in C and C++, even though the overhead is rare and minimal, and the change is good for security and reliability.

                                                                                                                                    I can’t imagine users accepting speed regressions when indexing by int, especially when it’s such a common type.


                                                                                                                                    The most useful threat model of a compiler is that of a sentient adversary:

                                                                                                                                    I just wrote a whole post trying to explain this couldn’t be further from the truth, so I don’t get what you’re trying to say here.

                                                                                                                                    1. 4

                                                                                                                                      The most useful threat model of a compiler is that of a sentient adversary:

                                                                                                                                      I just wrote a whole post trying to explain this couldn’t be further from the truth, so I don’t get what you’re trying to say here.

                                                                                                                                      It’s an oversimplification. There are sentient adversaries out there looking to exploit whatever vulnerabilities may arise from your code. And Mr. Compiler here, despite being neither sentient nor actively hostile, does tend to magnify the consequences of many bugs, or to turn idioms that many long-time practitioners thought were perfectly reasonable into bugs.

                                                                                                                                      Thus, I like to pretend the compiler itself is hostile. It makes my threat model simpler and more vivid. Which I pretty much need when I’m writing a cryptographic library in C, which I ship in source form with no control over which compilation flags may be used.

                                                                                                                                    2. 1

                                                                                                                                      This is where the standard committee should really have said something along the lines of: (…)

                                                                                                                                      Strictly speaking, undefined behavior allows implementation-specific behavior ;)

                                                                                                                                      For example, you can compile an entire Linux distro with the -fwrapv gcc and clang flags and you’ll get the behavior you want (i.e. well-defined signed overflow with wrapping behavior).

                                                                                                                                      All code that was previously C-standard-conformant will still remain C-standard-conformant. Additionally, code which previously triggered UB on signed overflow now also becomes well-defined.

                                                                                                                                      Although I do sympathize with your point that this should probably be the default even if it has a performance cost, also note that in many cases such well-defined signed overflow can lead straight into UB or a crash anyway, because these integers are often used as array indexes or in array-index calculations.

                                                                                                                                      Edit: might I suggest deterministically aborting on signed overflow/underflow? You can use -ftrapv for that.
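
                                                                                                                                      A minimal sketch of the difference (the file is mine; -fwrapv and -ftrapv are real gcc/clang flags):

                                                                                                                                          /* overflow.c
                                                                                                                                             gcc -O2 overflow.c         -> the addition below is UB
                                                                                                                                             gcc -O2 -fwrapv overflow.c -> well-defined: prints -2147483648
                                                                                                                                             gcc -O2 -ftrapv overflow.c -> should abort at runtime instead */
                                                                                                                                          #include <limits.h>
                                                                                                                                          #include <stdio.h>
                                                                                                                                          
                                                                                                                                          int main(void) {
                                                                                                                                              volatile int x = INT_MAX; /* volatile so the overflow isn't folded away at compile time */
                                                                                                                                              int y = x + 1;
                                                                                                                                              printf("%d\n", y);
                                                                                                                                              return 0;
                                                                                                                                          }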

                                                                                                                                    3. 2

                                                                                                                                      I do wonder if one could extend LLVM with a visualization that tells us what optimizations used which assumptions, somehow summarizing on a top-level view while allowing us to drill down to each block. That would be a great teaching and debugging tool indeed.

                                                                                                                                      1. 2

                                                                                                                                        Apparently you can now see optimization passes in LLVM on the Compiler Explorer, but I don’t think it has assumptions or things like that.

                                                                                                                                        1. 1

                                                                                                                                          That would be cool! But it sounds quite difficult to be honest. I’ve been thinking about ways to describe and represent information flow in my Compiler Adventures blog series on how compiler optimizations are implemented, and even for the massively-simplified language used there (no loops, no branches, no functions!) it’s still quite difficult to track “who did what where because of whose info.”

                                                                                                                                          Of course this doesn’t say it can’t be done, and I really hope someone takes up that challenge. I’ll be excited to see the results.

                                                                                                                                          1. 1

                                                                                                                                            I think that e-graph (program equivalence class) edges could be annotated with the optimization that produced each candidate equivalent program

                                                                                                                                            1. 1

                                                                                                                                              If your compiler has 10 individual optimizations which get performed iteratively in a total of 30 optimization passes, you could imagine recording the output of every one of the 30 optimization passes and then showing that to the user, allowing him to go to the previous/next pass and see a diff of what changed.

                                                                                                                                              You wouldn’t know exactly why an optimization happened if you’re not familiar with the optimization, but I think it would still be immensely instructive.

                                                                                                                                              Although you’d need different visualizations and diff algorithms for different intermediate compiler languages (C AST, LLVM IR, assembly, etc).
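
                                                                                                                                              If you want to poke at something like this today, here’s a tiny sketch (the C file is mine; the flags are real clang options, though the output format varies by version):

                                                                                                                                                  /* dump_passes.c
                                                                                                                                                     clang -O2 -mllvm -print-after-all dump_passes.c -c
                                                                                                                                                         dumps the LLVM IR after every pass, so you can diff adjacent dumps
                                                                                                                                                     clang -O2 -fsave-optimization-record dump_passes.c -c
                                                                                                                                                         emits a YAML file of optimization remarks (which optimizations fired, and where) */
                                                                                                                                                  int square(int x) {
                                                                                                                                                      return x * x;
                                                                                                                                                  }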

                                                                                                                                              1. 2

                                                                                                                                                This is a very good way to understand the behaviour of the compiler you’re using.

                                                                                                                                                Fwiw, you don’t need to store all the intermediate results if the compiler is deterministic: you can regenerate the one you want to look at by running all the passes up to that point.

                                                                                                                                                I think you can do this right now with clang, by changing the list of passes and dumping the IR.

                                                                                                                                                You can do something like this with ghc-core for Haskell. :)
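
                                                                                                                                                Something like this sketch, for example (opt’s -S flag and the new pass manager’s -passes= syntax are real; the pipeline and input.ll are made up):

                                                                                                                                                    import subprocess

                                                                                                                                                    pipeline = ["mem2reg", "instcombine", "gvn", "simplifycfg"]

                                                                                                                                                    def ir_after(n):
                                                                                                                                                        # Re-run only the first n passes over the original input; determinism
                                                                                                                                                        # means this reproduces the intermediate result instead of storing it.
                                                                                                                                                        result = subprocess.run(
                                                                                                                                                            ["opt", "-S", "-passes=" + ",".join(pipeline[:n]), "input.ll"],
                                                                                                                                                            capture_output=True, text=True, check=True,
                                                                                                                                                        )
                                                                                                                                                        return result.stdout

                                                                                                                                                    # Diffing ir_after(n) against ir_after(n + 1) then gives the per-pass view.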

                                                                                                                                                1. 1

                                                                                                                                                  if the compiler is deterministic

                                                                                                                                                  That is a load-bearing phrase right there :) Making compilers behave deterministically is possible but quite difficult. Folks working on reproducible builds have gone down this road, so we can learn from their experiences.

                                                                                                                                                  It might be a topic for a future “falsehoods programmers believe about compilers” post :)

                                                                                                                                                  1. 1

                                                                                                                                                    I think in practice they mostly are.

                                                                                                                                            2. 1

                                                                                                                                              It’d be quite an ambitious project. Consider how incomplete debug information is in optimized programs: optimizers already struggle with even the most basic tracking of where the code came from, so tracking all the changes, their causes, and their assumptions would be even harder.

                                                                                                                                              There are a lot of passes (that’s where the majority of compilation time is spent). They are a mix of analysis and transformation passes that aren’t explicitly connected. Some passes even generate new unoptimized code that is meant to be cleaned up by other passes, so it’s hard to track meaningful causality throughout all of that.

                                                                                                                                              Optimizers work on a low-level code representation where your variables don’t exist any more (SSA form); there are no for, while, or if constructs, only gotos between chunks of assembly-like code (the CFG). So even if you collected all the data without running out of memory and filtered it down to only the relevant bits, it would still be difficult to present it in a way that is relevant to C programmers.

                                                                                                                                            3. 1

                                                                                                                                              shared across all architectures

                                                                                                                                              There may also be asm optimization passes.

                                                                                                                                              1. 2

                                                                                                                                                Compilers have these in what they call back-ends, which are usually separate from their main optimiser, and are language-independent, so the concept of C’s UB doesn’t exist there any more.

                                                                                                                                            1. 5

                                                                                                                                              The .NET team had a really neat project that I think was released. They created a declarative command-line argument parser which, in addition to generating the parser, embedded the grammar in a special section of the binary. PowerShell could then read that section and provide rich completions (including help text and so on) that was always in sync with the binary.

                                                                                                                                              I’d love to see *NIX platforms adopt something like this: a special ELF section that embeds a grammar for the command line, plus tooling to generate it from getopt_long argument lists and richer interfaces. Shells could then parse it on the first invocation of a command and cache it until the binary changed.
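
                                                                                                                                              The shell side could be quite small. A sketch (assuming the pyelftools package, a hypothetical .cli_grammar section name, and a JSON encoding; none of this is an existing spec):

                                                                                                                                                  import json
                                                                                                                                                  from elftools.elf.elffile import ELFFile

                                                                                                                                                  def load_cli_grammar(binary_path):
                                                                                                                                                      with open(binary_path, "rb") as f:
                                                                                                                                                          section = ELFFile(f).get_section_by_name(".cli_grammar")
                                                                                                                                                          if section is None:
                                                                                                                                                              return None  # binary doesn't embed a grammar
                                                                                                                                                          return json.loads(section.data().decode("utf-8"))

                                                                                                                                                  grammar = load_cli_grammar("/usr/local/bin/sometool")  # made-up path
                                                                                                                                                  if grammar is not None:
                                                                                                                                                      for option in grammar["options"]:
                                                                                                                                                          print(option["name"], "-", option["help"])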

                                                                                                                                              1. 1

                                                                                                                                                That sounds like a really cool idea. Any idea what the declarative grammar looked like? Was it something akin to what you see in a manpage?

                                                                                                                                              1. 24

                                                                                                                                                I see a lot of negativity in the answers here and I want to say that if you are looking to understand Rice’s theorem, this is an excellent, and succinct, tutorial.

                                                                                                                                                @wizeman’s post is actually an interesting tangent, but it takes nothing away from the quality of this tutorial – it is raising a much larger question and the critique would apply to any tutorial, book, or video that treats these subjects of theoretical computer science.

                                                                                                                                                1. 5

                                                                                                                                                  I also really enjoyed it, except some of the last bits (around step 26) asked me about things from earlier and I was too lazy to scroll up and remind myself which of the 20+ program snippets had that particular name and so just button mashed to the end.

                                                                                                                                                  The reason that I find this kind of theorem unsatisfying is the universal quantification. Yes, it is impossible to determine whether a program halts, for all programs over all inputs. It is possible to show whether some large subset of all finite programs halt over unbounded inputs (@wizeman is somewhat missing the fact that we do have a practically infinite tape connected to our computers; we call it a network). The really interesting question, to me, is: what are the limits within which that computation is not just possible, but feasible? The same applies to Rice’s theorem. In practice, for a lot of problems, a proof assistant that returns true, false, or don’t know is useful, especially if it can provide some hints about how to modify the input to make the proof possible.
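
                                                                                                                                                  To make the true/false/don’t-know idea concrete, here is a toy checker for an invented loop shape, where (init, delta, bound) means i = init; while i < bound: i += delta:

                                                                                                                                                      from enum import Enum

                                                                                                                                                      class Verdict(Enum):
                                                                                                                                                          TERMINATES = "true"
                                                                                                                                                          DIVERGES = "false"
                                                                                                                                                          UNKNOWN = "don't know"

                                                                                                                                                      def check(init, delta, bound):
                                                                                                                                                          if init >= bound:
                                                                                                                                                              return Verdict.TERMINATES  # body never runs
                                                                                                                                                          if delta is None:
                                                                                                                                                              return Verdict.UNKNOWN     # step is data-dependent: give up honestly
                                                                                                                                                          if delta > 0:
                                                                                                                                                              return Verdict.TERMINATES  # counter strictly approaches the bound
                                                                                                                                                          return Verdict.DIVERGES        # counter can never reach the bound

                                                                                                                                                      print(check(0, 1, 10))     # Verdict.TERMINATES
                                                                                                                                                      print(check(0, -1, 10))    # Verdict.DIVERGES
                                                                                                                                                      print(check(0, None, 10))  # Verdict.UNKNOWN -- the "modify your input" hint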

                                                                                                                                                  1. 2

                                                                                                                                                    I agree totally on the computation question. The classic example is primitive recursion: if each case leads to “smaller” input then you have simple induction to prove that your program terminates.

                                                                                                                                                    I would be pretty interested in working through those kinds of formalisms, and investigating the bounds of interesting programs within there (after all, lots of programs we write in practice do terminate, so why?)
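
                                                                                                                                                    The classic shape of that argument, as a sketch (assuming n is a natural number):

                                                                                                                                                        def factorial(n: int) -> int:
                                                                                                                                                            # Primitive-recursion shape: the recursive call receives n - 1 < n,
                                                                                                                                                            # so induction on n proves termination without running the program.
                                                                                                                                                            if n == 0:
                                                                                                                                                                return 1
                                                                                                                                                            return n * factorial(n - 1)

                                                                                                                                                        # A termination checker only has to verify the syntactic property
                                                                                                                                                        # "every recursive call strictly decreases the argument".
                                                                                                                                                        print(factorial(5))  # 120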

                                                                                                                                                    1. 1

                                                                                                                                                      @wizeman is somewhat missing the fact that we do have a practically infinite tape connected to our computers; we call it a network

                                                                                                                                                      That’s a good point!

                                                                                                                                                      Strictly speaking, I don’t think a network is a truly infinite tape. But yes, for practical purposes, currently we’d have to model it as such, especially if programs are allowed to run indefinitely.

                                                                                                                                                      So I guess, now that I think about it, perhaps the bounded Halting problem is a more interesting Halting problem to me than the Halting problem itself, since it ensures by construction that the program can only use a finite amount of memory, regardless of whether it uses things like a network or not.

                                                                                                                                                      I wonder if we can always answer any questions about programs that only run for a bounded amount of time.

                                                                                                                                                      The really interesting question, to me, is: what are the limits over which that computation is not just possible, but feasible. The same applies to Rice’s theorem.

                                                                                                                                                      Yes, indeed! I completely agree that is the more interesting question. I just get bothered when people say it’s not even possible.

                                                                                                                                                      Edit: sorry, I substantially changed my comment as more thoughts occurred to me.

                                                                                                                                                      1. 2

                                                                                                                                                        Strictly speaking, I don’t think a network is a truly infinite tape.

                                                                                                                                                        A Turing machine doesn’t actually require an infinite tape, because it can only move the tape by one unit per time step. This is the basis for all of the proofs about space vs time complexity on Turing machines. The ‘infinite’ tape needs to be one entry longer than the number of time steps that the program runs for. This leads to an infinite tape only for programs that run for infinite time. One of the simplest non-terminating programs on a Turing machine advances the tape one step and repeats. This requires an infinite tape, but never revisits any entry, and so is equivalent to a program that reads one byte from a socket, discards it, and loops.

                                                                                                                                                        Given the rate of growth in cloud storage versus the speed of a computer, the ‘tape’ length can be increased faster than it can be consumed by a single machine, which makes it equivalent to a Turing machine tape.

                                                                                                                                                        This then leads to more interesting economic questions than the pure theoretic computer science termination questions: can this computation be completed before I run out of money (or, more often, before I exhaust my users’ attention span).
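
                                                                                                                                                        The space-never-exceeds-time observation is easy to demonstrate with a toy simulator (the machine encoding is invented for the example):

                                                                                                                                                            from collections import defaultdict

                                                                                                                                                            def run(transitions, steps):
                                                                                                                                                                tape = defaultdict(int)  # lazily materialised "infinite" tape
                                                                                                                                                                state, head = "start", 0
                                                                                                                                                                touched = set()
                                                                                                                                                                for _ in range(steps):
                                                                                                                                                                    touched.add(head)
                                                                                                                                                                    state, symbol, move = transitions[(state, tape[head])]
                                                                                                                                                                    tape[head] = symbol
                                                                                                                                                                    head += move  # the head moves at most one cell per step
                                                                                                                                                                return len(touched)

                                                                                                                                                            # "Advance one step and repeat": never revisits a cell, like reading
                                                                                                                                                            # and discarding bytes from a socket.
                                                                                                                                                            skip_right = {("start", 0): ("start", 0, +1)}
                                                                                                                                                            print(run(skip_right, 1000))  # 1000 cells touched in 1000 steps, never more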

                                                                                                                                                        1. 1

                                                                                                                                                          A Turing machine doesn’t actually require an infinite tape because it can only move the tape by one unit per time step.

                                                                                                                                                          I don’t understand this argument. If the Turing machine had an unbounded but finite tape, rather than a truly mathematically infinite tape, then many proofs would not work; instead, we would believe the opposite of what we currently believe: proofs such as the halting problem being undecidable, Rice’s theorem, the busy beaver function being uncomputable, etc…

                                                                                                                                                          If a Turing machine was what you said, it’s very likely that I would absolutely have no problem with it.

                                                                                                                                                          This leads to an infinite tape only for programs that run for infinite time.

                                                                                                                                                          Yes, but this is crucially important for all the proofs that I mentioned, otherwise it would be possible to prove the opposite of what they prove.

                                                                                                                                                          One of the simplest non-terminating programs on a Turing machine advances the tape one step and repeats. This requires an infinite tape, but never revisits any entry and so is equivalent to a program that reads one byte from a socket, discards it, and loops.

                                                                                                                                                          Right, but this is only one program out of an infinite number of programs that actually can use the entire length of the infinite tape.

                                                                                                                                                          For example, on a Turing machine, I can write a non-terminating program that advances the tape one step, then goes back to the beginning of the tape, reads or writes something for the entire existing length of the tape, then advances the tape one more step, and loops forever doing this. This program requires truly infinite storage (in the mathematical sense) and it would not be possible for this program to succeed executing forever unless the tape length is truly infinite.

                                                                                                                                                          Of course, in the real world nothing would execute forever (actually, it could if the program runs on multiple computers), but when we prove things about programs we are only concerned with modeling the behavior of a program in an ideal (and hopefully, functionally representative) model.

                                                                                                                                                          The problem I have is that the Turing machine is not functionally representative of what computers can do. A finite-sized model such as a deterministic finite-state machine or a deterministic linear bounded automaton, on the other hand, is functionally representative of what computers can do, even if it has an incredibly huge but finite tape length.

                                                                                                                                                          This is why Turing equivalent computation is considered more powerful than that which finite-sized models can do (although ironically, it has more limitations) and it’s why I argue that Turing equivalent computation or anything more powerful should be considered hypercomputation.

                                                                                                                                                          Note that this does not prevent us from creating Turing-complete programming languages and creating programs written in those languages as a simplification, which allows us to express programs with Turing-equivalent computations. But the model we use to analyze the execution of those programs should represent what computers can do.

                                                                                                                                                          If you want, you can still use a Turing machine to analyze the execution of programs, but if you do this, you should be aware that the conclusions you extract may be absurd, so don’t mislead everyone into thinking that these conclusions apply to realistic computation devices (i.e. those with a finite size).

                                                                                                                                                          Given the rate of growth in cloud storage versus the speed of a computer, the ‘tape’ length can be increased faster than it can be consumed by a single machine, which makes it equivalent to a Turing machine tape.

                                                                                                                                                          No, it does not, because there is one very crucially important difference: cloud storage cannot keep increasing forever (and I mean forever in its true sense). That is the difference between “something grows so big that I cannot even imagine its size, but it will reach a point where it doesn’t grow anymore so it is still finite” and “truly mathematically infinite”.

                                                                                                                                                          If we actually used a machine model with a hugely, hugely big tape (say, with a tape length as large as the number of atoms in the universe), but it was still finite, rather than a Turing machine where the tape is mathematically infinite, then all of the problems I mentioned would disappear! The halting problem would be considered decidable, Rice’s theorem would be considered false, we could always compute the busy beaver function, etc.

                                                                                                                                                          Note, however, that there is a difference between computable functions and efficiently computable functions (i.e. whether they complete in a reasonable amount of time), and this is true whether a model has an infinite tape or a finite one. There is also a difference between efficiently computable functions and their time complexity, which can be (but are not always) very different. So strictly speaking these are all different issues and they would have to be analyzed separately.

                                                                                                                                                          This then leads to more interesting economic questions than the pure theoretic computer science termination questions: can this computation be completed before I run out of money (or, more often, before I exhaust my users’ attention span).

                                                                                                                                                          Sure, but this also assumes a model with finite size (even if it can be very large), which leads to the answer being computable.

                                                                                                                                                          It seems like you are arguing for the same thing I am: that we should use a model with finite bounds, even if we use a bound so large that it is hard to comprehend. Unfortunately, this is not what we are doing currently, which leads to all sorts of paradoxes and mistaken conclusions about the kind of computation that is possible to do in our universe.
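
                                                                                                                                                          For what it’s worth, the decidability claim for finite machines has a one-screen sketch (the encoding is invented again): a finite tape means finitely many configurations, so a deterministic run either halts or revisits a configuration, and a revisit means it loops forever.

                                                                                                                                                              def halts(transitions, tape_len, start_state="start"):
                                                                                                                                                                  tape = [0] * tape_len
                                                                                                                                                                  state, head = start_state, 0
                                                                                                                                                                  seen = set()
                                                                                                                                                                  while True:
                                                                                                                                                                      config = (state, head, tuple(tape))
                                                                                                                                                                      if config in seen:
                                                                                                                                                                          return False  # repeated configuration: loops forever
                                                                                                                                                                      seen.add(config)
                                                                                                                                                                      key = (state, tape[head])
                                                                                                                                                                      if key not in transitions:
                                                                                                                                                                          return True   # no transition defined: the machine halts
                                                                                                                                                                      state, symbol, move = transitions[key]
                                                                                                                                                                      tape[head] = symbol
                                                                                                                                                                      head = max(0, min(tape_len - 1, head + move))  # the tape is finite

                                                                                                                                                              # Flips cell 0 forever: the decider detects the loop.
                                                                                                                                                              print(halts({("start", 0): ("start", 1, 0),
                                                                                                                                                                           ("start", 1): ("start", 0, 0)}, tape_len=4))  # False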

                                                                                                                                                          1. 1

                                                                                                                                                            You have confused an endomorphism (a function which has the same type for domain and codomain, like a function from the natural numbers to the natural numbers) with the repeated application of that endomorphism to a starting value. The endomorphism is like a description of how a Turing machine takes a single step, while repeated application is like evaluation of that Turing machine on a given input. The Halting Problem, Rice’s theorem, etc. prove properties of the endomorphism, not properties of the repeated application.
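
                                                                                                                                                            A minimal way to see that distinction in code (the types here are hypothetical stand-ins):

                                                                                                                                                                from typing import Callable, Optional

                                                                                                                                                                Config = tuple  # stand-in for (state, head position, tape contents)

                                                                                                                                                                def run(step: Callable[[Config], Optional[Config]], config: Config) -> Config:
                                                                                                                                                                    # `step` is the endomorphism; `run` is its repeated application.
                                                                                                                                                                    while (next_config := step(config)) is not None:  # None signals "halted"
                                                                                                                                                                        config = next_config
                                                                                                                                                                    return config

                                                                                                                                                                def countdown_step(config):
                                                                                                                                                                    (n,) = config
                                                                                                                                                                    return None if n == 0 else (n - 1,)

                                                                                                                                                                print(run(countdown_step, (3,)))  # (0,)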

                                                                                                                                                            1. 1

                                                                                                                                                              You can argue that is true (although I look at it from another perspective), but that’s a bit of a pedantic argument: you are missing the forest for the trees.

                                                                                                                                                              Even if you see it from that perspective, you are missing the fact that the Halting Problem, Rice’s theorem, etc. can only be proved if the domain and codomain are infinite, which is what leads to those absurd conclusions.

                                                                                                                                                              There are an infinite number of functions that have infinite domains and codomains and which a real computer can never implement.

                                                                                                                                                              If the domains and codomains were finite, you could prove the opposite of those theorems.

                                                                                                                                                              Edit: I’d also like to say that your perspective is actually a very interesting way to think about the problem, I hadn’t really thought about it like that before. Thanks!

                                                                                                                                                    2. 5

                                                                                                                                                      I already knew Rice’s theorem going in and thought it did a great job of explaining it.

                                                                                                                                                      1. 3

                                                                                                                                                        the critique would apply to any tutorial, book, or video that treats these subjects of theoretical computer science.

                                                                                                                                                        Can you prove this statement? 😉

                                                                                                                                                      1. 2

                                                                                                                                                        They don’t say a word of this, but this describes LLVM finally catching up to GCC, having been behind on this for a very long time.

                                                                                                                                                        1. 2

                                                                                                                                                          Can you expand on that? The new pass manager provides a different way of communicating analysis results to transform passes and has a few features that make it easier to generate a single IR module that can target a heterogeneous architecture, but it’s largely an internal refactoring with very little direct impact on end users (some of the things it enables will impact users). What do you see as the GCC equivalent that it has caught up with?

                                                                                                                                                        1. 1

                                                                                                                                                          I’m curious about moving clone and exec into userspace. Does anyone have a link to the underlying system call interfaces that make this possible?

                                                                                                                                                          1. 1

                                                                                                                                                            As far as I can tell, this is the new implementation of clone in their libc, which does things like opening cur_pid_fd and new_pid_fd, duplicating the address space of cur_pid_fd (Redox’s dup has an optional second argument), and writing to the current-addrspace of new_pid_fd.

                                                                                                                                                            1. 2

                                                                                                                                                              If I understand correctly, thisproc:current/open_via_dup is a magic path that opens a process descriptor to the current process? It then has the ability to duplicate mappings from one process descriptor to another? That’s very Mach-like. It’s a shame that they’ve built access to the current process descriptor on top of a global namespace, which makes it harder to move to a clean capability model, but the rest of it looks very elegant.