Threads for 0x2ba22e11

  1. 2

    I don’t really agree with the idea that mmap memory is heap memory, in spite of the long argument. On a modern *NIX system, mmap is the only way of getting memory in a userspace process and so is the thing used to get the heap, the stack, and the mappings of the binaries that provide globals and functions. If all of these are heap memory then everything in a process is heap memory and you may as well just say ‘memory’.

    It’s also worth noting that you often want to JIT units smaller than a page and so can’t do W^X the way that this article proposes. The canonical way of doing it is to create an anonymous shared memory object and map it read-write (or write-only) in one location and read-execute (or execute-only) in another location. This makes it difficult to automatically find the writeable address from a function pointer. Apple’s iOS JIT goes one step further and JITs a memcpy that copies into the writeable location and then scrubs all other references to the writeable address, leaving only the execute-only mapping. To be able to write, you need to either leak the writeable address via a side channel (unfortunately, this is quite easy) or find the address of the special memcpy (unfortunately, this is also easy because it’s at the start of the region and so easy to guess given a function pointer and easy to check guesses with speculative side channels).
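
    Roughly, the dual-mapping version looks like this (a sketch assuming Linux’s memfd_create; shm_open works similarly elsewhere, error handling is omitted, and the “JIT’d” bytes are just a stub that returns 42):

    #define _GNU_SOURCE
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void) {
        size_t sz = 4096;
        int fd = memfd_create("jit", 0);   /* anonymous shared memory object */
        ftruncate(fd, sz);

        /* Two views of the same pages: one writable, one executable. */
        unsigned char *w = mmap(NULL, sz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        unsigned char *x = mmap(NULL, sz, PROT_READ | PROT_EXEC,  MAP_SHARED, fd, 0);

        /* x86-64 for "mov eax, 42; ret" */
        static const unsigned char code[] = { 0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3 };
        memcpy(w, code, sizeof code);

        int (*fn)(void) = (int (*)(void))x;  /* only the RX view is ever called */
        return fn();                         /* exits with status 42 */
    }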

    1. 1

      Hm. An idea that comes to mind:

      • Make a file and unlink it
      • Create a subprocess, share the file with it
      • mmap the anonymous file X in the main process and W in the subprocess
      • When you want to update code, send an IPC message to the subprocess asking it to update the file on your behalf
      • The subprocess can authenticate you with a passcode (64 or 128 bit random number) or a HMAC

      Higher overhead because of the IPC and context switching, but probably easier to hide the authentication key.
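
      Roughly the shape of it, as a sketch assuming Linux (memfd_create standing in for the unlinked file) and fork() for the subprocess; the struct layout and the trivial passcode check are made up for illustration, and error handling is omitted:

      #define _GNU_SOURCE
      #include <stdint.h>
      #include <string.h>
      #include <sys/mman.h>
      #include <unistd.h>

      struct update { uint64_t passcode; size_t off, len; unsigned char bytes[64]; };

      int main(void) {
          size_t sz = 4096;
          int fd = memfd_create("jit", 0);           /* the "file" that was never linked */
          ftruncate(fd, sz);
          uint64_t passcode = 0x0123456789abcdefULL; /* would come from getrandom() */

          int pipefd[2];
          pipe(pipefd);

          if (fork() == 0) {                         /* subprocess: the only writer */
              close(pipefd[1]);
              unsigned char *w = mmap(NULL, sz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
              struct update u;
              while (read(pipefd[0], &u, sizeof u) == sizeof u) {
                  if (u.passcode != passcode || u.len > sizeof u.bytes || u.off > sz - u.len)
                      continue;                      /* authenticate and bounds-check */
                  memcpy(w + u.off, u.bytes, u.len);
              }
              _exit(0);
          }

          /* main process: executable view only; all writes go through the subprocess */
          close(pipefd[0]);
          unsigned char *x = mmap(NULL, sz, PROT_READ | PROT_EXEC, MAP_SHARED, fd, 0);
          struct update u = { .passcode = passcode, .off = 0, .len = 1, .bytes = { 0xc3 } };
          write(pipefd[1], &u, sizeof u);            /* IPC: "please write this for me" */
          (void)x;
          return 0;
      }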

      1. 1

        This is more or less what the pre-Chromium Edge did on Windows, except that Windows has much nicer APIs for it. The memory mapping APIs take the HANDLE to the target process, so one process would JIT into a shared memory object and then map that object executable in the renderer process.

        I’m not sure what the HMAC is for here. You already have control over who can send messages because only the process that has the other end of the IPC channel (socket or pipe) can send messages to the JIT process. If your goal is to privilege separate within the process that’s running the code, you’re solving an impossible (without CHERI) problem because an attacker who can run JIT’d code can insert a gadget that lets them probe the secret with speculative side channels and so they can trivially extract it in well under a second and then forge it. And if you do have CHERI, then you can put the JIT in a separate compartment in the same address space, map the memory RWX, give the consumer an X / RX capability and the JIT compartment a W / RW capability and get the same security guarantees with a fraction of the overhead.

    1. 32

      To give some background on why that is: UB is not an event that happens in the program. Instead, “UB is impossible” is an assumption that is baked into the compilers.

      The compilers don’t have if is_UB() { screw_you() } anywhere. The no-UB assumptions can be implicit and pervasive. Like an assumption that opposite of true is false, so if (x) {do} and if (!x) {} else {do} are equivalent. If you magically made a 3-valued bool with filenotfound, then it’s not merely triggering some else { lol() } code path in the optimizer. Instead, it makes all the assumptions about boolean algebra invalid and full of impossible-to-reason about paradoxes. And it’s hard to warn about this, because UB is not an event or a code path. The compiler would have to warn about every non-issue like “here I’ve assumed that the opposite of false is true”.

      The second reason is the complexity of optimizations. It’s too difficult to optimize code in the high-level form you write it in. There are too many tree structures to work with, too many scopes, too many constraints to preserve. Instead, the code is “lowered” to a simpler assembler-like representation that no longer resembles the source code. In this form the programmer’s intention is hard to see, so it’s no longer obvious what should and shouldn’t be optimized out.

      To manage the complexity, optimizations are applied to low-level code in multiple independent passes. This way instead of having to consider all of the C spec for any change, each pass can focus on one thing at a time, e.g. “can I change !!x to x”. There’s a pass that simplifies arithmetic, there’s a pass that checks which expressions are always true, and there’s a pass that removes unreachable code. They can end up removing your null or overflow checks, but they don’t mean to — each pass does its own thing that in isolation seems sensible, spec-compliant, and is useful for most of the code.
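
      A small illustration of how those independent passes can compose into a deleted check (not a claim about any particular compiler version):

      int first_plus_one(int *p) {
          int v = *p;    /* one pass notes: p was dereferenced, so assume p != NULL   */
          if (!p)        /* another pass folds the now-"always false" condition       */
              return -1; /* a third removes the unreachable branch: the check is gone */
          return v + 1;
      }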

      And finally, the majority of the optimizers in C compilers are shared across all architectures. They evaluate code as if it ran on the hypothetical portable machine that the C spec describes, not your particular CPU. So it doesn’t matter that your CPU knows how to overflow a signed int or has caches coherent across threads. The low-level virtual machine the optimizer optimizes for doesn’t have that, and code under the optimizer will “run” in the C way instead.

      1. 16

        Instead, it makes all the assumptions about boolean algebra invalid and full of impossible-to-reason about paradoxes

        This is a great explanation, but substitute ‘integer’ for ‘boolean’ and you get why C compilers really like that signed integer overflow is undefined. If you write something that loops performing some addition on an integer, then you have an induction variable that can be modelled as an arithmetic sequence. If you have a nested loop, then the induction variable can be modelled as a geometric sequence. This model is completely wrong in every regard if adding two positive values together can result in a negative number. Having to account for this case prevents transforms from making use of that model, which rules out a large number of loop optimisations.
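
        A sketch of the kind of loop this is about; whether a given compiler actually does the rewrite is up to it:

        /* Because i and 2*i are assumed never to overflow, i is exactly the
         * arithmetic sequence 0, 1, 2, ..., n-1, and the whole loop may be
         * rewritten as the closed form n*(n-1) (or vectorized, or re-indexed). */
        long sum_of_evens(int n) {
            long total = 0;
            for (int i = 0; i < n; ++i)
                total += 2 * i;
            return total;
        }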

        1. 4

          This model is completely wrong in every regard if adding two positive values together can result in a negative number

          But isn’t that just as true of unsigned integer overflow? You don’t unexpectedly get a negative sum, but you do get one that’s completely wrong.

          1. 7

            Signed overflow is UB so compilers get to pretend it can’t happen. The lack of UB on unsigned integer overflow means that many compilers don’t optimize unsigned arithmetic nearly as aggressively, for precisely the reason you point out. The difference is that compilers aren’t allowed to ignore the possibility of unsigned overflow since it’s well-defined and allowed to happen.
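
            The classic illustration of that asymmetry (whether a given compiler folds the first one is up to it, but it is allowed to):

            int overflowed_signed(int x)         { return x + 1 < x; } /* may be folded to "return 0" */
            int overflowed_unsigned(unsigned x)  { return x + 1 < x; } /* must stay: true when x == UINT_MAX */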

        2. 9

          My favorite example of undefined behavior: creating a pointer into the stack and messing around with its contents. In C or unsafe Rust or something like that, the compiler has literally no way of preventing you from doing this, and no way of detecting that it has occurred. It’s not even “this is on you to make it safe” territory, because the compiler is allowed to assume the contents of the stack don’t change out from underneath it… it may juggle local variables in and out of registers or something and, within the bounds of the calling convention, you get to make zero assumptions about how it works. Just ‘cause one compiler puts a particular local var at [rsp+24] or whatever doesn’t mean it will be there the next time you compile the program.

          Yes, you can do it. You can even make it work, if you know exactly how the compiler works and the compiler doesn’t change. But you’re breaking the invariants that the compiler fundamentally uses to function, and it is incapable of noticing that you are doing it.

          1. 1

            My favorite example of undefined behavior: creating a pointer into the stack and messing around with its contents.

            Wait, what? How do you create such a pointer, exactly?

            You’d have to use assembly code I assume? Or some library that uses assembly code?

            1. 3

              Wait, what? How do you create such a pointer, exactly?

              You’d have to use assembly code I assume? Or some library that uses assembly code?

              The alloca function returns a pointer to memory allocated on the stack! I’m surprised it hasn’t been mentioned in the thread yet.

              https://www.man7.org/linux/man-pages/man3/alloca.3.html
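
              A tiny example of what that looks like (alloca is non-standard; this assumes glibc’s <alloca.h>):

              #include <alloca.h>
              #include <stdio.h>
              #include <string.h>

              void greet(const char *name) {
                  /* buf lives in the current stack frame and vanishes when we return */
                  char *buf = alloca(strlen(name) + 8);
                  sprintf(buf, "hello, %s", name);
                  puts(buf);
              }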

              1. 2

                The alloca function returns a pointer to memory allocated on the stack! I’m surprised it hasn’t been mentioned in the thread yet.

                D’oh! Of course, how didn’t I remember that? Thanks!

                Edit: goes to show how often that function is used, hah!

              2. 3

                …You mean playing around with the stack isn’t the first thing everyone does when learning how C works?

                int y_hello_thar() {
                    int a;
                    int *b = &a;   // pointer to the local variable 'a'
                    b += 24;       // UB: 'b' now points well outside the one-element "array" 'a'
                    return *b;     // reads... whatever the compiler decides that means
                }
                

                https://godbolt.org/z/GYzq87G4s

                (Update: Notably, if you pass -O3 to most versions of gcc in godbolt it will generate the code you expect, but clang will almost invariably nope out and just return 1 or 0 or nothing at all. Fun!)

                1. 3

                  That’s not what I imagined a pointer into the stack would be from your description.

                  That’s a pointer into a local variable. The compiler wouldn’t even allocate stack space if you compile with optimizations.

                  Pointers to local variables are not UB.

                  Edit: What is UB in this case is that you incremented “b” outside the array bounds (a pointer to a single object is considered to be a pointer into an array of 1 element).

                  Edit 2: You can freely use such pointers while the compiler passes variables into and out of registers and your program will still work just fine. But your pointer is not allowed to go outside the bounds of the array (except to point one-past the array, but then you can’t dereference it).
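
                  A minimal sketch of that rule:

                  void pointer_bounds(void) {
                      int a;
                      int *p = &a;     /* behaves like a pointer into an array of one int     */
                      int *q = p + 1;  /* OK: one-past-the-end may be formed and compared...  */
                      (void)q;         /* ...but not dereferenced                             */
                      int *r = p + 2;  /* UB: even just forming this pointer is out of bounds */
                      (void)r;
                  }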

                  1. 2

                    You are correct, the UB in this case is incrementing the pointer out of its bounds. The thing is that a local variable (edit: usually) has to exist on the stack if you have a pointer to it that does arithmetic. So the easy way to get a pointer to a random place on the stack is to start from a local variable. You can also have a function that returns a pointer to a local var, you can start in the data segment or heap, or you can just make a good guess of where your stack is and hope ASLR doesn’t ruin your fun.

                    Or if you really want to go hard, you can ask a user to type in a number and use that as an address. Who knows what they’ll type in? The compiler sure doesn’t.

                    1. 3

                      The thing is that a local variable has to exist on the stack if you have a pointer to it that does arithmetic.

                      What? No, that’s not true. See for yourself: https://godbolt.org/z/PvfEoGTnc

                      So the easy way to get a pointer to a random place on the stack is to start from a local variable.

                      No, you don’t necessarily get a pointer to a random place on the stack if you do that. The local variable or array may not even exist on the stack.

                      If you go outside the object/array bounds with pointer arithmetic, you get UB which does not have to (and many times won’t) translate into a random pointer, it will just mean that your code will break (and it can break in many different ways).

                      You should really, really, carefully read @kornel’s top post to understand how UB works.

                      You can also have a function that returns a pointer to a local var

                      That’s UB but again, you won’t necessarily get a random pointer to the stack, as the variable may not even exist on the stack. And even if it does exist, your code can break in completely different ways than getting a random pointer to the stack.

                      you can start in the data segment or heap

                      You’d get UB if your pointer goes outside of a malloc()-allocated area, but again, you wouldn’t necessarily end up with a pointer into the stack. You’re making the exact same false assumptions that @kornel talks about.

                      Your code could simply be completely optimized out, or it may cause other very strange effects that have nothing to do with pointers.

                      or you can just make a good guess of where your stack is and hope ASLR doesn’t ruin your fun.

                      Even if you knew exactly where your stack is, I don’t think there is a way to deterministically create such a pointer without implementation-specific behavior (e.g. assembly code).

                      Edit: updated Godbolt link due to a typo in the code.

                      1. 1

                        Or if you really want to go hard, you can ask a user to type in a number and use that as an address. Who knows what they’ll type in? The compiler sure doesn’t.

                        I believe converting an arbitrary integer into a pointer is also UB and you don’t necessarily get a pointer like you think you will, your code can just completely break in lots of different ways (including simply being optimized out).

                        You’d have to enable some kind of special implementation-specific compiler option to do something like that without UB.

                        What is not UB, I believe, is to e.g. pass a pointer value into an intptr_t variable and then back into a pointer without changing the value.
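
                        A minimal sketch of that round trip (C11 7.20.1.4 guarantees the converted-back pointer compares equal to the original):

                        #include <stdint.h>

                        int roundtrip(void) {
                            int x = 42;
                            void *orig = &x;
                            intptr_t bits = (intptr_t)orig;  /* pointer -> integer               */
                            void *back = (void *)bits;       /* integer -> pointer, value intact */
                            return *(int *)back;             /* fine: back == orig               */
                        }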

                        1. 2

                          I believe converting an arbitrary integer into a pointer is also UB…

                          And yet this is also how just about every device driver under the sun works.
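
                          For instance, the usual memory-mapped I/O pattern looks roughly like this; the address comes from a datasheet rather than from any C object, and UART_TX_ADDR here is made up:

                          #include <stdint.h>

                          #define UART_TX_ADDR 0x10000000u   /* hypothetical device register */

                          static void uart_putc(char c) {
                              volatile uint8_t *tx = (volatile uint8_t *)(uintptr_t)UART_TX_ADDR;
                              *tx = (uint8_t)c;   /* meaningful on the right hardware, not guaranteed by ISO C */
                          }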

                          1. 1

                            And yet this is also how just about every device driver under the sun works.

                            Because of this:

                            You’d have to enable some kind of special implementation-specific compiler option to do something like that without UB.

                            Edit:

                            Correction: technically speaking, it seems that the integer-to-pointer conversion itself is not UB if you use an explicit cast, but the C standard says the result is implementation-defined, so the code can behave differently depending on which compiler (or compiler version, or compiler flags) you’re using.

                            In fact, the C standard says that the resulting value can be a trap representation, which means that it could trigger UB as soon as you tried to do anything with that pointer.

                            Even if your implementation allows doing that and even if the integer is a valid memory location, I would imagine it would still be possible to trigger UB by violating other constraints, like creating a duplicate copy of another pointer of a different type (which could violate strict aliasing constraints), or creating a pointer with the wrong alignment, or calling free(ptr) if the memory location hadn’t been allocated with malloc() before, etc.

                          2. 1

                            I believe converting an arbitrary integer into a pointer is also UB

                            The int-to-ptr operation is actually fundamentally implementation-defined. So you won’t find the UB rules for ptr-to-int and int-to-ptr in the standard, you need to look at the LLVM LangRef instead.

                            1. 1

                              The int-to-ptr operation is actually fundamentally implementation-defined.

                              Yes, I mentioned that in my other reply.

                              But note: it is undefined behavior if you don’t use an explicit cast, e.g. if you do this: int *a = 4, then it is UB.

                              1. 2

                                Why is it not a compile error?

                                1. 2

                                  It is a compile error:

                                  https://godbolt.org/z/Gq8qbW559

                                  But that’s on GCC (probably clang does the same). There might be C compilers that are C-standard-compliant that might not generate an error.

                                  If you are asking why doesn’t the C standard specify that such conversions are invalid rather than UB? No idea!

                                  Edit: oh wait, I made a mistake! The above error is when compiling in C++ mode (the Compiler Explorer’s default, annoyingly). In C mode, it’s not an error, it’s a warning by default:

                                  https://godbolt.org/z/rP3KoGbx3

                                  That said, you can transform that into an error by adding -Werror to the compiler flags, of course.

                                  1. 1

                                    I don’t think this is the kind of error you’re expecting to see - if you change the code to int *a = (int *)4 it should compile without warning. I don’t know if this is even supposed to be undefined behaviour - when talking to devices you need to know the exact memory address “as a number” and write to it or read to it (think VGA video memory addressing and such), which is or was traditionally often done in C under DOS in real mode, for example.

                                    Although it might be that this has sort-of been retconned into “always having been wrong” like relying on hardware behaviour on integer overflow has been.

                        2. 1

                          Local variables are generally allocated on the stack. In some cases, the compiler may be able to avoid allocating memory at all and keep the local variable purely in a register, but any time you mess around with pointers to local variables – especially when those pointers are passed across non-inlined function boundaries – that pointer is to the stack-allocated local variable.

                          With optimizations disabled, you can look at the assembly code and stack frame and verify that ‘a’ is allocated on the stack and ‘b’ is a pointer to that stack-allocated variable in icefox’s example. Here’s a slightly modified example where the pointer is passed across a function boundary to be dereferenced: https://godbolt.org/z/x8zEKEr1E

                          1. 1

                            Yes, I know that, but @icefox was talking about “creating a pointer to the stack”, but as far as I can see there is no way to create a pointer to the stack in the C language, or at least no portable way to do it (i.e. without relying on implementation-defined behavior).

                            The C18 standard, as far as I can see, doesn’t even have the word “stack” in the 535-page document!

                            Someone else mentioned alloca() but even that is not part of any standard.

                            Then he said:

                            In C (…) [the] compiler has literally no way of preventing you from doing this,

                            Which seems to make little sense because if you’re relying on implementation-defined behavior (which has to be documented) to create a pointer to the stack, then that means that the compiler has already promised to give you a pointer to the stack. So of course it can’t prevent you from messing around with it.

                            It’s not like you “tricked” the compiler into giving you such a pointer, usually it’s actually the opposite: the compiler is the one who tricks you into thinking that you have a pointer to the stack when in fact the local variables (and even arrays) that the pointer points to, may only exist in registers or, since all the values of the variable might be possible to determine at compile time, may not exist at all!

                            Edit: That said, yes, I understand that if you rely on implementation-defined behavior then it is possible to deterministically create such a pointer (e.g. with alloca()) and mess around with it, such as corrupting the stack and causing UB. Which is what I meant when I said that you’d have to call some (platform-specific) library function to do that, it’s not like there’s a construct in the C language that says: “this is how you create a pointer to the stack”.

                            Edit 2: clarified the comment a bit.

                            1. 1

                              You can’t find the word “stack” in the C standard because the C standard indeed has no concept of a “stack”. As far as the standard is concerned, local variables have automatic storage and are alive until the function returns, variables allocated with malloc and the like have allocated storage and are alive until they are freed with free.

                              However, the common implementations have a chunk of memory they call “the stack”, where each function gets a sub-chunk called a “stack frame” which is where the compiler puts, among other things, the variables with automatic storage (except for those variables which it optimizes out or keeps in registers only). Therefore, when you take the address of a variable with automatic storage, the implementation will give you an address to the place in the function’s stack frame where that variable is kept; you have a pointer to somewhere in the stack.

                              There is some mixing of abstraction levels here. You never have a “pointer to a variable allocated on the stack” in terms of the C abstract machine, you have a “pointer to a variable with automatic storage”. But when we lower that into the level of concrete machine code for some operating system, we can see that in the right circumstances, our “pointer to a variable with automatic storage” gets lowered into a “pointer to a chunk of memory in the function’s stack frame”. So we can create an example program which proves @icefox’s point (namely that the “compiler has literally no way of preventing you from doing this, and no way of detecting that it has occurred”):

                              a.c:

                              // This function assumes it gets a pointer to a chunk of memory with at least 5 ints, and sets the 5th to 0.
                              // There is nothing at all wrong with this function, assuming the pointer does indeed point to 5+ mutable ints.
                              // There is no way for the compiler to detect that anything wrong is going on here.
                              void clear_fifth_int(int *ints) {
                                  ints[4] = 0;
                              }
                              

                              b.c:

                              void clear_fifth_int(int *ints);
                              
                              // This function creates a local variable 'x', which the implementation will put on the stack,
                              // then calls 'clear_fifth_int' with a pointer to 'x'.
                              // There is nothing at all wrong with this function, *if* the 'clear_fifth_int' function only dereferences the
                              // pointed-to int, so there's no way for the compiler to detect that anything wrong is going on here.
                              // However, we know that clear_fifth_int will touch whatever happens to be 4*sizeof(int) bytes after 'x'
                              // on the stack.
                              void stupid() {
                                  int x = 10; // x ends up allocated on the stack
                                  clear_fifth_int(&x); // &x gets a pointer to somewhere in the stack
                              }
                              

                              No individual function in this example does anything wrong, but in combination, they mess around with the stack in ways which the implementation can’t anticipate.

                              1. 1

                                No individual function in this example does anything wrong, but in combination, they mess around with the stack in ways which the implementation can’t anticipate.

                                You don’t actually know if they mess around with the stack, they may very well not. With link-time optimization all of that code could be inlined together and no pointers to the stack would even be created.

                                What you are saying is actually a result of the following:

                                1. The pointer points to a single object, rather than a sufficiently-large array of objects. Which results in UB when you do pointer arithmetic past the bounds of the array.

                                2. Or, in other situations, it could result in UB due to dereferencing a pointer past the lifetime of the object (which could happen both for automatic storage and for heap-allocated storage).

                                It has nothing to do with “the stack” or a “pointer to the stack”. It is simply the result of having “a pointer” (to some non-permanent storage).

                                You are assuming that a specific implementation detail will happen. How do you know that’s what will happen? Does your C compiler or some other standard guarantee that? I assume the compiler documentation or a calling convention such as the System V ABI may be able to guarantee that. But if it does guarantee it, how is that “the implementation not anticipating”?

                                Edit: API -> ABI

                                1. 1

                                  I don’t understand what you are disagreeing with me about. Clearly, we have some C code, where all translation units are correct in isolation, but in common implementations, their combination might (and likely will, but is not guaranteed by any standard to) result in machine code which messes with values on the stack in a way the compiler can’t account for. That much is inarguable, and I don’t see you claiming otherwise; so where is the contention?

                                  1. 1

                                    Clearly, we have some C code, where all translation units are correct in isolation but where their combination might (and likely will) result in machine code which messes up values on the stack in a way the compiler doesn’t expect. That much is inarguable, and I don’t see you disagreeing with it

                                    Indeed, that is correct.

                                    so where is the contention?

                                    The contention is in the following two points:

                                    • In suggesting that “In C (…) the compiler has literally no way of preventing you from doing this, and no way of detecting that it has occurred”, where “this” is creating a pointer to the stack and messing around with the stack.

                                    I think that phrase is a bit misleading, because the C language has no construct to create a pointer to the stack, which means you’re either:

                                    1. Relying on implementation-defined behavior, which means that the compiler is actually giving you such a pointer on purpose,
                                    2. Or there’s no way to deterministically create such a pointer, because the pointer may not even exist, i.e. the compiler may optimize it out,
                                    3. Or you’re relying on some library function or assembly code to create such a pointer, which is what I thought he meant and why I asked him to clarify as it’s not very common to do this!

                                    Which means that the compiler can prevent you from doing that. And if it can’t, it’s probably because you’re relying on the calling convention, which is implementation-defined behavior, therefore documented, and the compiler would be doing it on purpose.

                                    The other contention point, which I admit is (even more?) pedantic, is:

                                    • In saying that all of this is specific to “pointers to the stack” when it is actually just a result of having pointers and going outside their bounds or the lifetime of the storage they point to, which applies to all types of pointers.

                                    That said, yes, I understand that going outside the bounds of a pointer to (for example) heap storage is unlikely to corrupt the stack (even though it’s possible), and that stack corruption is a very particular type of corruption which usually has very… interesting side-effects!

                                    So in conclusion, I think we are in agreement with regards to the underlying issues, I just disagree with the phrasing and particularly, how it appears to be suggesting that the compiler is powerless to prevent it, when in fact you’re either breaking the invariants of the language itself, or the compiler is actually going out of its way to help you do that, on purpose (just in case you like to shoot yourself in the foot)!

                                    Edit: “external library” -> “library”

                                    Edit 2: in other words, you haven’t really forced the compiler to give you a pointer to the stack, it was the compiler who decided to give that to you, and it could just as easily have decided to do something else if it wanted (proof: the existence of conforming C compilers for architectures that don’t even have a stack).

                                    1. 1

                                      So you would be completely happy with:

                                      If the compiler has a concept of what we would conventionally call a “stack”, and it has put a variable on the stack, the compiler has no way to prevent you from taking a pointer to that variable and messing around with other stuff that’s also on the stack and no way of detecting that it has occurred, assuming the compiler implements pointer arithmetic in the way all the common implementations do.

                                      Personally, I think the original comment was clear enough, adding all those caveats just muddies the meaning IMO. But I won’t disagree that it’s assuming a few things which aren’t strictly guaranteed by the C standard.

                                      […] I just disagree with the phrasing and particularly, how it appears to be suggesting that the compiler is powerless to prevent it, when in fact you’re either breaking the invariants of the language itself, or the compiler is actually going out of its way to help you do that, on purpose […]

                                      Well, you are breaking the invariants of the language, but I don’t understand how that means the compiler can prevent it. The compiler gives you a pointer to a variable which the compiler has decided to put on what it calls “the stack”. The language has features which make compile-time tracking of where pointers come from impossible. The language has features which let you do arithmetic on pointers. Given those facts, if a compiler is implemented in such a way that there exists a stack which contains stuff like the return address and local variables, the compiler is powerless to prevent you from messing with stuff the compiler has to assume is left unchanged.

                                      I agree that a compiler might be implemented differently, it might store the return address stack separately from the variable stack, it might allocate every local variable in a separate heap allocation and automatically free them when the function returns, it might do runtime checking of all pointer dereferences, or any number of other things. But under the assumption that the implementation works roughly like all the major C implementations currently do, everything icefox said is correct.

                                      1. 1

                                        If the compiler has a concept of what we would conventionally call a “stack”, and it has put a variable on the stack, the compiler has no way to prevent you from taking a pointer to that variable and messing around with other stuff that’s also on the stack and no way of detecting that it has occurred, assuming the compiler implements pointer arithmetic in the way all the common implementations do.

                                        This phrasing is also wrong. The compiler can do all of those things and yet the pointer is not a pointer to the stack, it’s a pointer to the variable (which may currently be stored in a register or even may only exist as a constant in this exact moment, even though there’s an allocation for the variable in the stack).

                                        For example if you have this code:

                                        void some_function(int *y, bool condition) {
                                            *y = *y + 5;
                                         
                                             if (condition) {
                                                 y += 100000;
                                                 *y = 20;
                                             }
                                        }
                                        
                                        void foo() {
                                            int x[100] = {0};
                                        
                                            int *y = &x[50];
                                        
                                            some_function(y, false);
                                        
                                            printf("%d", *y);
                                        
                                            (...)
                                        }
                                        

                                        … it may not do what you think it does. Why? Because the compiler is allowed to transform that code into this equivalent code:

                                        int some_specialized_function() {
                                            return 5;
                                        }
                                        
                                        void foo() {
                                            int x[100] = {0};
                                        
                                            int z = some_specialized_function();
                                        
                                            printf("%d", z);
                                        
                                            x[50] = z; // this line may or may not exist, e.g. if x[50] is never used again
                                        
                                            (...)
                                        }
                                        

                                        So even if there’s an allocation for x[] in the stack frame for some reason (let’s say some further code manipulates the array), you think that y is a pointer to the stack, but it may very well not even exist as such, even if the code for some_function hasn’t been inlined!

                                        And even if you call some_function(y, true) instead of false, you think you tricked the compiler into corrupting the stack, but the compiler can clearly see that y points to &x[50] so increasing y by 100000 is absurd and the entire if block of the specialized some_function() implementation can be removed.

                                        So yes, the compiler can prevent you from messing around with the stack even in that case.

                                        The compiler gives you a pointer to a variable which the compiler has decided to put on what it calls “the stack”.

                                        Yes, you said it very well here, even though that doesn’t always happen. The compiler has decided. You haven’t forced it to do that. The compiler could have decided something else. It was an implementation choice, which means the compiler could have prevented it.

                                        I agree that a compiler might be implemented differently, it might store the return address stack separately from the variable stack, it might allocate every local variable in a separate heap allocation and automatically free them when the function returns, it might do runtime checking of all pointer dereferences, or any number of other things.

                                        Exactly! Which means that the compiler can actually prevent you from doing the things icefox said.

                                        But under the assumption that the implementation works roughly like all the major C implementations currently do, everything icefox said is correct.

                                        Well, it’s not possible for me to prove this, but to be honest, I think icefox had a misconception, as evidenced by the fact that (before he edited it) his reply to my comment said that just doing pointer arithmetic on a pointer to a local variable means that the compiler is forced to give you a pointer to the stack, when this is not true: compilers often optimize such pointers out of the code completely.

                                        And you know, this entire debate could have been avoided if he just said “it’s often possible to corrupt the stack [perhaps by doing such and such], which has all these interesting effects” rather than suggesting that C compilers have some underlying limitation and are powerless to prevent you from messing with what they do.

                                        But you know, apart from that, I think we are in agreement!

                                        Edit: renamed function name to clarify the code.

                  2. 5

                    So it doesn’t matter that your CPU knows how to overflow a signed int.

                    I would say, it doesn’t matter that virtually all CPUs in active use since… say, the start of the 21st century, know how to overflow a signed int. Some platforms somewhere that existed at some point in time don’t know how to overflow signed integers, and so the standard committee thought it was a good idea to mark it “UB”, probably for portability reasons, and the compiler writers later thought it was a good idea to interpret it literally, for optimisation reasons.

                    This is where the standard committee should really have said something along the lines of:

                    • Signed integer overflow is implementation defined…
                    • except on platforms that produce non-deterministic results, in which case it’s unspecified…
                    • except on platforms that fault, in which case it aborts the program, whatever that should mean…
                    • except on platforms that fall into an inconsistent state, in which case it’s Undefined Behaviour™.

                    Sure, the specs are wordier this way, but that would have gotten rid of a huge swath of security vulnerabilities. And while some code would be slower, compiler writers would have told everyone they should use unsigned size_t indices for their loops.

                    But no, they didn’t define behaviour depending on the platform. If something is undefined in one supported platform, no matter how niche, then it’s undefined in all supported platforms. Just so it could be a little bit faster without requiring users to touch their source code.


                    My understanding of undefined behaviour is now as follows: compilers will, in the face of undefined behaviour, generate code that, under the “right” conditions, encrypts your hard drive and prints a ransom message. The most useful threat model of a compiler is that of a sentient adversary: if there’s any UB in your code, it will eventually find it and cause as much harm as it can, up to and including causing people to die.

                    A plea to compiler writers: please rein that in?

                    1. 8

                      (post author here)

                      Unfortunately, it’s not that easy.

                      The compiler isn’t out to get you, and generally won’t include disk-encrypting code into your program if it wasn’t there before. The line about playing DOOM on UB is meant as humor and to demonstrate that it’s still not a compiler bug if this happens. But in reality, it isn’t actually going to happen in GCC, or Clang, or rustc, or any other compiler that wasn’t purpose-built to do that. It’s not guaranteed to not happen, but compilers are in practice not written to maliciously manufacture instructions out of whole cloth just to mess with you on purpose.

                      But at the same time, providing guarantees about undefined behavior … makes it not-undefined anymore. There can certainly be less undefined behavior (e.g. mandating the equivalent of NullPointerException, or Rust’s “safe Rust is UB-free Rust, or it’s a bug in the compiler” guarantee), but any real compiler will still have some UB. UB just means “this is an invariant that I expect and internally continue to uphold, and I don’t know what happens if something breaks it.” Even my toy compiler for the Advent of Code day 23 problem from last year contains a notion of UB, which I plan to discuss in an upcoming episode of my Compiler Adventures blog series – and that’s for a programming language that doesn’t even have if statements or loops!

                      I highly recommend this talk on UB if you have 40min: https://www.youtube.com/watch?v=yG1OZ69H_-o

                      1. 5

                        I guess the bit about playing DOOM on UB being humorous was written with an MMU-using operating system in mind. I witnessed IRL a situation where a simple read-from-stdin, do-some-calculations, write-to-stdout C program was quite reliably opening the CD-ROM tray… That was on Windows 3.11 I think.

                        1. 1

                          That’s amazing :)

                          My dad once traced a client-reported bug in his company’s software down to a bad CPU in the client’s machine. When adding two very specific 3-digit integers together, the sum was incorrect. It wasn’t UB that time, just bad hardware, but it doesn’t sound like it was fun to track down. And the accountants using the software certainly didn’t appreciate the fact that the computer was adding up their numbers wrong…

                          1. 1

                            Since it’s the accountants’ computer that is broken, they have a catch 22. They need to ask for their money back, but the computer they do all their sums on is broken, so how much refund should they be asking for!? They can’t possibly calculate it. ;)

                        2. 5

                          The compiler isn’t out to get you, and generally won’t include disk-encrypting code into your program if it wasn’t there before.

                          Correct, but ransomware authors are out to get me, and they’ll exploit any vulnerability they can. And UB driven compilers are a source of vulnerabilities that wouldn’t be there if more behaviour was defined. That’s where my threat model comes from. Blaming the compiler as if it was the enemy isn’t accurate, but it is simpler and more vivid.

                          Oh, and that talk from Chandler Carruth? I’ve watched it several times, and it’s misleading and harmful. All the more because he’s such a good speaker. He’s trying to reassure people about the consequences of their bugs. Sure the compiler itself won’t actively summon the nasal demons, but outside attackers finding vulnerabilities will. And the result is the same: UB does enable the summoning of nasal demons.

                          But he’s not saying that out loud, because that would drive people away from C and C++.

                          But at the same time, providing guarantees about undefined behavior … makes it not-undefined anymore.

                          Correct, which is why I advocate for defining more behaviours in C.

                          1. 1

                            Correct, which is why I advocate for defining more behaviours in C.

                            I think that’s a lost cause. Instead, it would be nice to have more options like -fwrapv that at least impose sane behaviour for things that are UB, strictly speaking.

                            1. 1

                              Of course, your position is valid: it sounds like you are making an even more conservative set of assumptions than the minimum necessary set.

                              I wish more people took more precautions than strictly necessary, rather than fewer than necessary as is frequently the case :)

                              1. 5

                                I think the larger point is that, effectively, people who write C have to treat the compiler as a deadly hyperintelligent enemy deliberately attempting to cause as much harm as possible via any means available. Otherwise they just end up being hurt by the compiler, and then when they write it up they get mocked and belittled by people who tell them it’s their fault that this happened.

                                And since it’s effectively impossible to write useful/non-trivial C programs without UB, the takeaway for me is “don’t ever write C”.

                                1. 3

                                  FWIW I feel similarly. I’ve switched to Rust because I don’t trust myself to not write UB and to not commit memory safety errors. I still haven’t needed unsafe in Rust anywhere thus far, and “if it compiles then it’s memory-safe and UB-free” is worth a lot to me.

                          2. 2

                            C users as a whole are hostile to any overheads and speed regressions in compilers. Compilers get benchmarked against each other, and any “unnecessary” instructions are filed as issues to fix.

                            Look how long it is taking to zero-init all variables in C and C++, even though overhead is rare and minimal, and the change is good for security and reliability.

                            I can’t imagine users accepting speed regressions when indexing by int, especially when it’s such a common type.


                            The most useful threat model of a compiler is that of a sentient adversary:

                            I just wrote a whole post trying to explain this couldn’t be further from the truth, so I don’t get what you’re trying to say here.

                            1. 4

                              The most useful threat model of a compiler is that of a sentient adversary:

                              I just wrote a whole post trying to explain this couldn’t be further from the truth, so I don’t get what you’re trying to say here.

                              It’s an oversimplification. There are sentient adversaries out there, out to exploit whatever vulnerabilities may arise from your code. And mr compiler here, despite being neither sentient nor actively hostile, does tend to magnify the consequences of many bugs, or to turn into bugs idioms that many long-time practitioners thought were perfectly reasonable.

                              Thus, I like to pretend the compiler itself is hostile. It makes my threat model simpler and more vivid. Which I pretty much need when I’m writing a cryptographic library in C, which I ship in source form with no control over which compilation flags may be used.

                            2. 1

                              This is where the standard committee should really have said something along the lines of: (…)

                              Strictly speaking, undefined behavior allows implementation-specific behavior ;)

                              For example, you can compile an entire Linux distro with the -fwrapv gcc and clang flags and you’ll get the behavior you want (i.e. well-defined signed overflow with wrapping behavior).

                              All code that was previously C-standard-conformant will still remain C-standard-conformant. Additionally, code which previously triggered UB on signed overflow now also becomes well-defined.

                              Although I do sympathize with your point that this should probably be the default even if it has a performance cost, also note that in many cases such well-defined signed overflow can lead straight into UB or a crash anyway, because these integers in many cases are used as array indexes or on array index calculations.

                              Edit: might I suggest deterministically aborting on signed overflow/underflow? You can use -ftrapv for that.
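
                              A small example of how those flags change the meaning of an intuitive overflow check:

                              #include <stdio.h>

                              void report_overflow(int x) {
                                  /* default:  x + 1 is assumed not to overflow, so the branch may be folded away
                                   * -fwrapv:  overflow wraps, so the check works as written
                                   * -ftrapv:  overflow aborts the program before the comparison happens        */
                                  if (x + 1 < x)
                                      puts("x + 1 overflowed");
                              }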

                            3. 2

                              I do wonder if one could extend LLVM with a visualization that tells us what optimizations used which assumptions, somehow summarizing on a top-level view while allowing us to drill down to each block. That would be a great teaching and debugging tool indeed.

                              1. 2

                                Apparently you can now see optimization passes in LLVM on the Compiler Explorer, but I don’t think it has assumptions or things like that.

                                1. 1

                                  That would be cool! But it sounds quite difficult to be honest. I’ve been thinking about ways to describe and represent information flow in my Compiler Adventures blog series on how compiler optimizations are implemented, and even for the massively-simplified language used there (no loops, no branches, no functions!) it’s still quite difficult to track “who did what where because of whose info.”

                                  Of course this doesn’t say it can’t be done, and I really hope someone takes up that challenge. I’ll be excited to see the results.

                                  1. 1

                                    I think that e-class (program equivalence classes) graph edges could be annotated with which optimization produced the candidate equivalent program

                                    1. 1

                                      If your compiler has 10 individual optimizations which get performed iteratively in a total of 30 optimization passes, you could imagine recording the output of every one of the 30 optimization passes and then showing that to the user, allowing him to go to the previous/next pass and see a diff of what changed.

                                      You wouldn’t know exactly why an optimization happened if you’re not familiar with the optimization, but I think it would still be immensely instructive.

                                      Although you’d need different visualizations and diff algorithms for different intermediate compiler languages (C AST, LLVM IR, assembly, etc).

                                      1. 2

                                        This is a very good way to understand the behaviour of the compiler you’re using.

                                        Fwiw you don’t need to store all the intermediate results if the compiler is deterministic, you can generate the one you want to look at by running all the passes up to that point.

                                        I think you can do this right now with clang? By changing the list of passes and dumping IL.

                                        You can do something like this with ghc-core for Haskell. :)

                                        1. 1

                                          if the compiler is deterministic

                                          That is a load-bearing phrase right there :) Making compilers behave deterministically is possible but quite difficult. Folks working on reproducible builds have gone down this road, so we can learn from their experiences.

                                          It might be a topic for a future “falsehoods programmers believe about compilers” post :)

                                          1. 1

                                            I think in practice they mostly are.

                                    2. 1

                                       It’d be quite an ambitious project. Consider how incomplete debug information is in optimized programs. Optimizers already struggle with even the most basic tracking of where the code came from. So tracking all the changes, their causes, and their assumptions would be even harder.

                                       There’s a lot of passes (that’s where the majority of compilation time is spent). They are a mix of analysis and transformation passes that aren’t explicitly connected. Some passes even generate new unoptimized code that is meant to be cleaned up by other passes, so it’s hard to track meaningful causality throughout all of that.

                                       Optimizers work on a low-level code representation where your variables don’t exist any more (SSA form), there are no for, while or if constructs, only goto between chunks of assembly-like code (CFG). So even if you collect all the data without running out of memory and filter it down to only the relevant bits, it would still be difficult to present it in a way that is relevant to C programmers.

                                    3. 1

                                      shared across all architectures

                                      There may also be asm optimization passes

                                      1. 2

                                        Compilers have these in what they call back-ends, which are usually separate from their main optimiser, and are language-independent, so the concept of C’s UB doesn’t exist there any more.

                                    1. 1

                                      The composition trick to find and then delete the background is really neat. :)

                                      1. 2

                                        This is definitely an interesting point to keep in mind. I remember often thinking things that “can’t matter relative to I/O” (things like Django models getting built up rather than using .values_list) end up being significant contributors to slowness.

                                        As usual, of course, the answer to all of this is measure, measure, measure.

                                        1. 9

                                          This also impacts operating system design. FreeBSD’s GEOM stack was originally built with some fairly expensive abstractions that were assumed to be fine given the 1ms+ latency of even the fastest storage devices. SSDs and, especially, NVMe changed that and a lot of the work over the last decade has been carefully punching through some of those layers for places where the CPU had suddenly become the bottleneck.

                                          Disk encryption is a particularly interesting case where (not specific to FreeBSD) the calculation has changed a few times. With spinning rust, decompression added negligible latency. With SSDs, it suddenly became noticeable. Then newer compression algorithms (lz4 and zstd, in particular) came along and a single core was able to do stream decompression at the disk’s line rate. This was further improved by the fact that modern hardware can DMA directly to the last-level cache, so you read compressed data from disk to cache, decompress in cache, and then write decompressed data out to DRAM. The DRAM bandwidth is far wider than the disk bandwidth and so the disk remains the bottleneck here and the compression improves throughput with a negligible latency hit again, just as it did 20 years ago.

                                          1. 3

                                            Disk encryption is a particularly interesting case… …Then newer compression algorithms

                                            The topic of this paragraph suddenly changed halfway through from encryption to compression. Was that deliberate?

                                            At about the same time, AES acceleration became pretty much ubiquitous. Offhand, a 2016 Intel Mac with AES-NI gets about 6GB/s doing AES-GCM on one CPU core.

                                            1. 1

                                               No, my brain is still a bit fuzzy from the cold I’ve been suffering from. Encryption was meant to read compression. Encryption is also interesting though, with AES hardware on modern CPUs. Before that, PCI crypto accelerators had interesting performance characteristics, with both high latency and high throughput, making them useful in some situations but a source of overhead in others.

                                          2. 2

                                            I have also been surprised by how slow Django JSON serialization can be.

                                          1. 16

                                             I am working on a project to generate 1 billion rows in SQLite in under a minute, and so far I can insert 100M rows in 33 seconds. First, I generate the rows and insert them into an in-memory database, then flush them to disk at the end. Flushing to disk takes only 2 seconds, so 99% of the time is spent generating and adding rows to the in-memory B-tree.
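                                             For illustration, a minimal sketch of that in-memory-then-flush pattern with Python’s sqlite3 module (the table shape, row count, and file name are made up, not the project’s actual code):

                                                 import sqlite3

                                                 # Build everything in an in-memory database first; no disk I/O happens here.
                                                 mem = sqlite3.connect(":memory:")
                                                 mem.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, area TEXT, age INTEGER)")
                                                 mem.executemany(
                                                     "INSERT INTO people (area, age) VALUES (?, ?)",
                                                     ((f"area{i % 50}", i % 100) for i in range(100_000_000)),  # illustrative volume
                                                 )
                                                 mem.commit()

                                                 # Flush the finished database to disk in one pass at the end.
                                                 disk = sqlite3.connect("people.db")
                                                 mem.backup(disk)
                                                 disk.close()
                                                 mem.close()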

                                             For Python optimisation, have you tried PyPy? I ran the same code (zero changes) using PyPy, and I got 3.5x better speed.

                                            I published my findings here.

                                            1. 4

                                              An easy way to do this might be to just create the DB in /dev/shm (on Linux anyway) and later copy it to a persistent device (if you have enough DIMM space).

                                              1. 5

                                                Sending PRAGMA synchronous=off to SQLite or using libeatmydata might be easier; either way your goal is to skip fsync / fdatasync calls (since this data could just be re-generated anyway).
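                                                 For example (a sketch; the file name is arbitrary):

                                                     import sqlite3

                                                     conn = sqlite3.connect("generated.db")
                                                     # Skip fsync/fdatasync on commit; acceptable here because the data can simply be regenerated.
                                                     conn.execute("PRAGMA synchronous = OFF")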

                                                1. 4

                                                  Sending PRAGMA synchronous=off to SQLite

                                                  This is what I am doing!

                                            1. 3

                                               This seems wrong. From the 1990s, dpkg and apt come to mind, changing software packaging and distribution. From the 2000s, udev and systemd, implementing hotplug and changing system initialization. How are these not significant changes to Unix? Yes, dpkg/apt/udev/systemd are Linux specific, but they came into being due to changing needs, so other Unix systems also adopted something similar, like brew and launchd for macOS.

                                              1. 4

                                                Solaris 10 had SMF in 2005 and, I think, OS X 10.0 had launchd in 2001. These are just about out of the ‘90s, but only just. I don’t really see udev as a big change. Originally UNIX had userspace code that created dev nodes, then Linux moved it into the kernel, then they moved it out and created an event notification model. FreeBSD 5.0 added something similar around 2003, but it also didn’t fundamentally change anything.

                                                 FUSE and CUSE are probably bigger changes, but they’re really applying ’80s microkernel concepts to UNIX.

                                                1. 3

                                                  I strongly agree that package managers like dpkg have been one of the biggest changes to how you use the OS day to day.

                                                1. 13

                                                  Maybe this is a hot take, but I suspect that unless we start using radically different physical hardware, UNIX is going to stay a pretty good API and I’m fine with it looking pretty similar in the year 2100.

                                                  Maybe this comment will age very poorly and people will dunk on me in the future. If so, sorry for climate change :(

                                                  1. 27

                                                    Maybe this is a hot take, but I suspect that unless we start using radically different physical hardware, UNIX is going to stay a pretty good API and I’m fine with it looking pretty similar in the year 2100.

                                                    The hardware has changed quite a bit from the systems where UNIX was developed:

                                                    • Multicore is the norm, increasingly with asymmetric multiprocessing (big.LITTLE and so on).
                                                    • There are multiple heterogeneous compute units (GPUs, NPUs, and so on).
                                                    • There is almost always at least one fast network.
                                                    • Local storage latency is very low and seek times are also very low.
                                                    • Remote storage capacity is effectively unbounded.

                                                    Some changes were present in the ’90s:

                                                    • There’s usually a display capable of graphics.
                                                    • There’s usually a pointing device.
                                                     • RAM is a lot slower than the CPU(s); you can do a lot of compute per memory access.

                                                    At the same time, user needs have changed a lot:

                                                    • Most computers have a single user.
                                                    • Most users have multiple computers and need to synchronise data between them.
                                                    • Software comes from untrusted sources.
                                                    • Security models need to protect the user’s data from malicious or compromised applications, not from other users.
                                                    • Users perform a very large number of different tasks on a computer.

                                                     I think UNIX can adapt to the changes to the hardware. I’m far less confident that it will adapt well to the changes to uses. In particular, the UNIX security model is a very poor fit for modern computers (though things like Capsicum can paper over some of this). Fuchsia provides a more Mach-like abstraction without a single global namespace (as did Plan 9), which makes it easier to run applications in isolated environments.

                                                    1. 2

                                                      Thanks for writing this out a lot better than I could have! I really like your distinction between adapting to hardware vs. uses. Re: the security model, one observation from the OP that I liked was:

                                                      I think that if you took a Unix user from the early 1990s and dropped them into a 2022 Unix system via SSH, they wouldn’t find much that was majorly different in the experience. Admittedly, a system administrator would have a different experience; practices and tools have shifted drastically (for the better).

                                                      Do you think that even the API exposed to UNIX users/programs might have to change to accommodate new security models?

                                                       I could believe that. ACLs (setfacl) are already quite different from traditional UNIX permissions, and apparently they never made it into POSIX despite being widespread. Although maybe that’s also a case of a change to UNIX that system administrators do, and normal users don’t, have to care about.

                                                      EDIT: To be clear, I don’t want to move the goalposts–I definitely think something like setfacl that fails to be standardized as part of POSIX is an example of its limits, and a counterpoint to my claim that we’ll be using UNIX in 2100.

                                                       And to try and answer my own question, it looks like Fuchsia might not plan on being POSIX compliant? So that’d also be a counterpoint:

                                                      On Fuchsia the story is a bit different from Posix systems. First, the Zircon kernel (Fuchsia‘s microkernel) does not provide a typical Posix system call interface. So a Posix function like open can’t call a Zircon open syscall. Secondly, Fuchsia implements some parts of Posix, but omits large parts of the Posix model. Most conspicuously absent are signals, fork, and exec.

                                                      From: https://fuchsia.googlesource.com/docs/+/refs/heads/sandbox/jschein/default/libc.md

                                                      1. 7

                                                        Things like POSIX ACLs don’t really change the model, they just add detail. They’re still about protecting files (where, in UNIX, ‘file’ just means ‘something that exists in some namespace outside of the program’s memory’) from users. In practice, there is a single real user and the security goals should be to protect that user’s files from programs, or even parts of programs.

                                                        Capsicum is a lot better in this regard. Once a program enters capability mode, it lacks all access to any global namespaces and requires some more privileged entity to pass it new file descriptors to be able to access anything new. This works well with the power box model, where things like open and save dialog boxes are separate programs that return file descriptors to the invoking program. It’s still difficult to compartmentalise an application with Capsicum though, because you really want a secure synchronous RPC mechanism, like Spring Doors (also on Solaris).

                                                         Server things are increasingly moving towards PaaS models where you write code against some high-level API with things like HTTP endpoints as the IPC model. Currently, these are built on Linux, but I don’t see that continuing for the next 10-20 years because 90% of the Linux kernel is irrelevant in such a context. There’s no need for most IPC primitives, because the only communication allowed between components is over the network (which helps a lot with scalability: if everything is an async network request and you write your code like this, then it’s easy to deploy each component on a separate machine). There’s no need for a local file system. There’s very little need for a complex scheduler. There’s no need for multiple users even; you just want a simple way of isolating a network stack, a TLS/QUIC stack, a protocol parser, and some customer code, ideally in a single address space with different components having views of different subsets of it. In addition, you want strong attestation over the whole thing so that the user has an audit trail that ensures that you are not injecting malicious code into their system. You probably don’t even want threading, because you want customers to scale up by deploying more components rather than by making individual components do more.

                                                    2. 6

                                                      Yeah Unix was supposed to be a minimal portable layer on top of hardware, which you can build other systems on top of — like the web or JVM or every language ecosystem

                                                      So stability is a feature, not a bug

                                                      Ironically the minimal portable layer is also better than some of the higher layers for getting work done in a variety of domains

                                                      1. 2

                                                        How would you rate io_uring as a change to the Unix model? Big deal or not a big deal?

                                                        1. 3

                                                          Medium deal? Didn’t change the model but improved the economics a lot.

                                                      1. 7

                                                         I don’t completely agree with the “Separate I/O from processing” section. Using a functional style does not necessarily mean you need to have all the data you’re gonna work with in memory when you start working with it; you just need to separate the work into tasks that perform I/O and tasks that transform data. There are many FP tools to help you deal with streaming data, such as lazy sequences/transducers and parser combinators.
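                                                         A small Python sketch of that streaming style (the file name and record format are made up): the transformation stays pure and lazy, and the I/O sits at the edge.

                                                             def parse_records(lines):
                                                                 # Pure transformation: no I/O, works on any iterable of lines.
                                                                 for line in lines:
                                                                     name, value = line.rstrip("\n").split(",")
                                                                     yield name, int(value)

                                                             def totals(records):
                                                                 # Pure aggregation over a lazy stream of (name, value) pairs.
                                                                 out = {}
                                                                 for name, value in records:
                                                                     out[name] = out.get(name, 0) + value
                                                                 return out

                                                             # I/O lives at the edge; the file is consumed lazily, never loaded whole into memory.
                                                             with open("data.csv") as f:
                                                                 print(totals(parse_records(f)))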

                                                        1. 4

                                                          Right, you want the code that does the IO to be separated from the code that does the computation, but only in terms of where you put its source code. They can still overlap in time at runtime.

                                                          You’re agreeing with the author here about this:

                                                          have the processing code call a (possibly configurable) function for input and output

                                                          1. 6

                                                            I was more ticked off by “get more memory”, which I don’t think should be an acceptable tradeoff for adopting a functional style.

                                                            1. 2

                                                              Good point.

                                                        1. 8

                                                           “To quote publicly available data, by 2020, we had around 2000 engineers with 20M lines of hand-written code (10x more including the generated code) in the monorepo alone…”

                                                          Every time I read stats like this I think - surely there must be a better way to write software!

                                                          1. 7

                                                             I agree! I hear a lot of good things about Twitter’s culture (before Musk took over, that is). A kernel team, a culture of excellence, etc. But honestly, the actual service they offer is hosting a bunch of tweets, pictures, and videos. Their site fails to work very regularly.

                                                            Surely there must be some challenges with scaling, but from the outside it just seems very crappy.

                                                            1. 21

                                                              I haven’t experienced a significant Twitter outage in years.

                                                              I think you underestimate the challenges of running a near real-time many-to-many messaging service, to be honest.

                                                              1. 12

                                                                I think you underestimate the challenges of running a near real-time many-to-many messaging service, to be honest.

                                                                And an analytics platform, a (very) high-volume data API platform for data customers, multiple search systems (some are paid-access only), an advertising platform, a content moderation system (across many countries, in compliance with local laws), probably a copyright infringement system, anti-abuse systems, internal monitoring and reporting, mobile applications, internationalization and localization systems, …

                                                                People have this incredibly reductive view of what Twitter, in particular, actually does, that I find really frustrating. It’s not simple.

                                                                1. 4

                                                                  People have this incredibly reductive view of what Twitter, in particular, actually does

                                                                  I have a vague memory of, possibly, a JWZ thing where he points out that browsers are huge now because, whilst you only use 10% of the functionality, B uses a different 10%, C uses a different 10% again, etc., and that leads to a complex system which is, by necessity, big.

                                                                  (But I cannot currently find the article / rant itself.)

                                                                2. 4

                                                                  Nothing should require 20 million lines of code to accomplish.

                                                                  1. 8

                                                                    Why not?

                                                                    1. 2

                                                                      Because small things are exponentially easier to work with and understand than large things.

                                                                  2. 3

                                                                    Not an outage but we’ve all experienced breakage.

                                                                    1. 1

                                                                     I’ve been having weird issues on my “Latest Tweets” timeline for the past few days (on mobile).

                                                                      1. 1

                                                                        I am excluding issues after the Musk takeover.

                                                                  3. 5

                                                                    So happy you said this. It seems like FAANGs get praised for their scale, when really it’s a completely pathological case of inefficiency that requires that many engineers to begin with. There is a better way, we can’t give up on that.

                                                                 What’s interesting is that it works out to 10,000 LOC per engineer, which doesn’t sound like much, but realistically how much code can a single human actually comprehend? LOC is not useful in many ways, but there is certainly an upper bound on how many LOC a human brain can reasonably work with.

                                                                    1. 5

                                                                      You can definitely write something that provides similar functionality in much fewer lines of code. I guarantee you won’t enjoy the on-call rotations, though.

                                                                      1. 1

                                                                        This is confusing - are you saying that more lines of code implies better maintainability and reliability? That would go against any study where bugs are found to be very directly related to lines of code.

                                                                     What I’m saying is that there’s an upper bound on how much code a human being can physically handle, and knowing what that limit is would be a good thing. I’m not suggesting that we play code golf, but we should learn how to use groups of people more efficiently.

                                                                        1. 1

                                                                          It’s nothing to do with bugs per line of code. It’s how automated your procedures are for interfacing with the inherent complexity of hosting things in the real world. I’ve spent some years inside Azure dealing with this - the amount of effort it took to turn building & bringing online a new datacenter from a highly manual process to an even partially automated process was staggering.

                                                                          1. 1

                                                                         I see. Sure, if you want to solve problems, that comes with added logic. My criticism of large companies is that they can afford to have hordes (by which I mean ~tens of thousands) of humans add this logic at the abstraction level of our current tools, which hides the fundamental issue: I wish we’d be able to do equal or more with way less effort.

                                                                            I understand that sounds pie-in-the-sky, but I’ve been at least experimenting with model-driven code generation a lot, and it feels slightly promising. Essential complexity can’t be avoided, but how much of the Azure datacenter code is essential?

                                                                      2. 2

                                                                     We just had a post from someone who has a game with 58,000 LOC, so 10,000 is likely too small: https://lobste.rs/s/lsspr7/porting_58000_lines_d_c_jai_part_0_why_how

                                                                        1. 1

                                                                          Sure, but it’s very interesting that it’s still in the same order of magnitude. It’s also very likely that individual productivity goes down on a multi-person team, because of the coordination necessary between people, and also because you have to work within parameters that you didn’t create.

                                                                      3. 4

                                                                        I think the first step is to understand why it’s 20M lines in the first place. Is it lots of test code? Sad-path handling? Boilerplate? Features? Regulatory compliance? Maybe most of it actually is necessary!

                                                                        1. 1

                                                                     They had a kernel team and their own Linux fork. I would bet on them having multiple MLoC of other forked / vendored deps too.

                                                                        2. 4

                                                                          It seems so, but I think it’s largely an illusion. Of course at 20M there’s probably a few M lines of code that could be cut, but I don’t think you could easily reduce it by an order of magnitude without losing something important.

                                                                          Just like institutions grow bureaucratic processes out of past pains and failures, they grow code out of past pains and reaching breaking points with simple or out-of-the-box solutions.

                                                                          For example, the original Twitter may have used off-the-shelf memcached, but grew to the point that its limitations around cache fragmentation and eviction strategies that don’t matter for most users, did matter to Twitter. Suddenly it’s a big step: from “just install memcached” it becomes “develop and maintain a custom cache cluster tuned to your workload”. Repeat that a few times, and you have millions lines of code that seem like they could be replaced with “simple” solutions, but actually everything would fall over if you tried.

                                                                          Apart from scalability/performance, also resilience is a code multiplier. You not only develop feature X, but also error handling, lots of error handling, automated tests, healthchecks, fallbacks for every failure scenario, error handling for the fallbacks, rate limiting and circuit breakers for errors to avoid cascading failures, monitoring and logging, aggregation and alerting for the monitoring, and supporting infra for all of the extra code and tooling.

                                                                        1. 1

                                                                          Seems like a good idea to me because there’s already a Twitter field and a GitHub field.

                                                                          1. 11

                                                                       it looks interesting but the crypto bro part immediately turned me off

                                                                            1. 9

                                                                              Our company has nothing to do with cryptocurrencies! We’re building HVM, a massively parallel functional runtime developed in Rust, based on a new model of computation (Interaction Nets), that is outperforming Haskell’s compiler in some cases by an order of magnitude. I believe it is a groundbreaking technology that has the potential to change the tech industry by making massive parallelism accessible. It is the first “automatic parallelization” project with remarkable real-world success and numbers to back it up.

                                                                         Yes, we’re building a p2p computer too, but it is just a secondary product we made mostly to show off the performance of the HVM. Specifically, we replaced Ethereum’s EVM with the HVM, and managed to hit ~100x increased throughput for many opcodes. But that’s just it, a p2p virtual machine. It isn’t a cryptocurrency. It doesn’t even have a currency! I share the hate, but not every p2p project is a cryptocurrency. Torrent and DNS existed way before Bitcoin, and are fundamental to the internet as we know it!

                                                                         That said, this webpage is a draft and we’re due for a massive rework of it, because it is clearly not showing our intent properly, so that’s not your fault. We need to communicate better.

                                                                              1. 2

                                                                           I would be very interested in a p2p computer. I think having a shared computer with state is a key building block for a lot of services. I’m also interested in replicating the experience/vibe of having a shared machine that multiple (trusted) people can SSH into. No interest in having money involved here.

                                                                                1. 2

                                                                             Exactly! That’s the spirit/point of our chain: you do NOT need a cryptocurrency to have a worldwide shared computer, and such a thing would be so useful as a technological primitive that projects can rely on, just like internet protocols and whatnot. But I’m almost regretting developing it because people immediately jump to the conclusion that we’re a crypto project, even though the chain isn’t nearly as important as HVM and doesn’t even have a currency!

                                                                                  1. 4

                                                                                    Don’t call it a chain IMO. That nearly lost my interest when I saw it.

                                                                                    1. 2

                                                                                      To be clear, does the chain use proof-of-work or other energy-wasting mechanisms?

                                                                                  2. 2

                                                                                    If you don’t want people to think it’s a cryptocurrency thing, you badly need to redesign the marketing. The very first thing I see when I look at this is a picture of a chain. You’ve already turned off everyone who dislikes cryptocurrency at this point.

                                                                                    The fact that there’s text on that next page that says it’s not a cryptocurrency thing doesn’t help you much because you already created a bad first impression and some readers have already left by this point.

                                                                                    A second problem I have is this:

                                                                                    It is PoW-based, forever. In fact, PoS isn’t even possible, since there is no built-in currency.

                                                                                    This is one of the specific aspects of cryptocurrencies that has made people dislike them.

                                                                                    Could you not cut out the PoW waste entirely by having some TTPs sign blocks, acting like notaries?

                                                                                  3. 5

                                                                                    Looks like a less banana-pants crazy variation on the Urbit idea.

                                                                                    1. 4

                                                                                      to be clear the parts that look interesting to me are Kind2 and HVM

                                                                                      1. 1

                                                                                        Me too. I am also interested in the ‘Interaction Combinators’ mentioned in the manifesto. It is unclear to me how it relates to HVM. Any hint?

                                                                                        1. 1

                                                                                          HVM is the first real-world efficient implementation of Interaction Nets! We took this new model of computation, which was invented in 1997, developed a memory-efficient representation for it, called Symmetric Interaction Calculus, and, based on that, we implemented the High-order Virtual Machine (HVM), a massively parallel runtime. Right now it only targets functional languages, and it is already outperforming Haskell’s GHC by 9.2x in some cases (!), but it isn’t limited to FP.

                                                                                    1. 5

                                                                                      I have to admit, I would not have guessed that y2038 could have caused such obscure behavior. (We really need to finish migrating away from 32-bit software over the next decade.)

                                                                                      1. 10

                                                                                   There is a lot of proprietary software in a state of abandonware; games are a good example of that, and I believe Steam is going to become some sort of mad games archive (if it isn’t already). The original developers might not have access to the code anymore, it might even be lost forever, or maybe they wouldn’t be allowed to modify it, don’t care about it, etc.

                                                                                   I believe old video games (and actually all proprietary software, for that matter) are in the same position as old video files: once they’re spread in the wild, we have to support them forever. That’s why the number of hacks to support old, broken encoded files that you can find in projects like FFmpeg is likely never going to shrink, ever. You need to be able to play back an old .avi or .wmv file from your hard drive from 20 years ago, even if the dumb encoder that produced it doesn’t exist anymore, and even if it generated half-broken files violating whatever standard existed at the time. It’s terrible and depressing, but that’s the way it is.

                                                                                   Instead I’m expecting to see some sort of virtualization/container mechanism that puts these games in a time loop by isolating a fake clock and hooking all stat FS accesses so that they never reach y2038.

                                                                                        1. 2

                                                                                          But even with virtualization, etc, how do you pick the time it is? What if the software uses functions depending on time, day of week, etc.? Especially video games tend to have Easter eggs.

                                                                                     You’d have to decide whether you want it to be close (so January 2038), whether you need the right weekdays, and how long the application runs so it doesn’t have to go past 2038; software might even do worse if time goes backwards, especially in bigger steps.

                                                                                     Side note: I don’t think containers, as they are mostly used today, vary in time, because they share the host OS, which keeps the time.

                                                                                          I’d think that today’s software will likely run in an emulator. Does anyone know how emulators for very old systems deal with Y2K and other time related issues, for example in games that use something like Release Year + X or something to shave off some bits?

                                                                                          1. 3

                                                                                            But even with virtualization, etc, how do you pick the time it is? What if the software uses functions depending on time, day of week, etc.? Especially video games tend to have Easter eggs.

                                                                                            An arbitrary user offset from the current clock?

                                                                                            Side note: Containers as they are mostly used today I don’t think are varying in time, because they share the OS which keeps the time.

                                                                                            On the technical side, a naive implementation would use LD_PRELOAD to hook gettimeofday, stat, etc

                                                                                            1. 2

                                                                                              An arbitrary user offset from the current clock?

                                                                                         Yes, that’s what I mean. Not trivial to choose. It might even be hard to decide what the “best offset” is on a per-software basis. And you might very much not be aware of all the things it does, so I am expecting a lot of interesting investigations and “archeological” findings in that area in the time after 2038.

                                                                                              On the technical side, a naive implementation would use LD_PRELOAD to hook gettimeofday, stat, etc

                                                                                              Again, that’s what I mean. Not really a container. But then that term is loosely defined and only nitpicking. ;-)

                                                                                              1. 1

                                                                                                Might even be hard to decide on what’s the “best offset” on a per-software basis.

                                                                                                What’s wrong with “release date plus one year”? I’m not seeing what you think will be hard about picking which time to emulate. (As opposed to emulating the desired time, which is a technical challenge.)

                                                                                              2. 2

                                                                                                On the technical side, a naive implementation would use LD_PRELOAD to hook gettimeofday, stat, etc

                                                                                                There is already the rather sophisticated libfaketime: https://github.com/wolfcw/libfaketime
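                                                                                           Usage is roughly like this (the library path and the target binary are assumptions; FAKETIME also accepts relative offsets such as "-15d"):

                                                                                               import os, subprocess

                                                                                               env = dict(os.environ)
                                                                                               env["LD_PRELOAD"] = "/usr/lib/x86_64-linux-gnu/faketime/libfaketime.so.1"  # path varies by distro
                                                                                               env["FAKETIME"] = "@2035-01-01 00:00:00"  # pin the process's idea of "now" to an absolute date
                                                                                               subprocess.run(["./legacy-game"], env=env)  # hypothetical old binary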

                                                                                              3. 2

                                                                                                There’s a time namespace since around Linux 5.6 but it only lets you mess with the boot time and monotonic clocks, not the real time clock. https://man7.org/linux/man-pages/man7/time_namespaces.7.html

                                                                                                The sensible way to fake the date for just one process is to run it in a virtual machine of course. :)

                                                                                          1. 4

                                                                                            “under thirty (virtual) servers, a very few large PostgreSQL database servers and a single file-server/storage”

                                                                                            I’d love to see the back of the envelope math behind the estimation that you could probably fit 6 million users into this. I’m not saying this claim is wrong or anything, just that the capacity planning would be interesting.

                                                                                            1. 4

                                                                                              Not directly an answer to your question, but years ago I found this post about SO infrastructure quite interesting. Certainly, a Relational DBMS can get you quite far.

                                                                                              1. 1

                                                                                                I should re-read that now that very fast SSDs are cheap and commonplace instead of being an unusual expensive investment.

                                                                                                Edit: having reread it, oh gosh remember when a TB of SSD was tremendously expensive? Dang. The detail that stuck out to me was their having to have overkill LAN bandwidth to support backups and other similar bulk data transfers.

                                                                                            1. 2

                                                                                              Annoyingly, blocking these requests did not surface any error – it seems that whatever that Stripe is scraping is non-essential

                                                                                              Seems like a lot of the value add for Stripe is their magic sauce differentiating fraud from non-fraud. And, much like people, they try to widen their “line-of-sight” on their counter party. I assume they do this both to check for red flags, and probably also to check for “green” flags too. Stripe scraping the website might just be part of their fraud differentiation, and that seems to me to be consistent with the blocked scraping not surfacing an error.

                                                                                              1. 1

                                                                                             Extremely advanced anti-fraud systems are fairly standard across the American payment card industry, from what I’ve heard. From a developer’s perspective, Stripe’s SDK and API, especially around subscription management and their comprehensive story for suspended payments, are the primary reason to choose them.

                                                                                                1. 1

                                                                                               They really don’t care as much about fraud as would be optimal for society, because their customers, the ones receiving the fraudulent payments, are the ones that bear the risk. And those customers don’t have much choice due to the credit card duopoly.

                                                                                                2. 1

                                                                                               It being anti-fraud would also explain why it claims to be Chrome rather than explicitly saying “Stripe” in the user agent string.

                                                                                                  It’s slightly worrying that it says outdated Chrome. In the unlikely but not totally implausible event that they really are using headless Chrome to scrape pages, they could have all sorts of unpatched old vulnerabilities. ;)

                                                                                                  1. 5

                                                                                                    I tinkered a bit more with it after posting this, specifically because I was wondering the same thing, and yeah… it really looks like they are using a very old version of headless Chrome.

                                                                                                    1. 2

                                                                                                   So we’re finally at the point where you don’t target normal visitors, but scraper bots.

                                                                                                      1. 1

                                                                                                        Oh eek

                                                                                                        I hope they sandbox that well

                                                                                                  1. 7

                                                                                                 This reminds me of a thing I read a few years ago about why health services were trying to consolidate elective surgery wards. Instead of having a small number of surgeons each covering a wide range of procedures, the bigger consolidated wards could have each surgeon specialise in very few procedures, or even just one, and get really, really good at them through sheer volume of practice. Apparently the improvement to patient health outcomes is significant.

                                                                                                    1. 20

                                                                                                      This seems incredibly hard for some people to understand, but I actually like JavaScript and do understand types. In fact, I use strongly typed backend languages all the time! I also see TS as unnecessary overhead for what basically amounts to “I feel better because a class of bugs that almost never happens won’t happen.”

                                                                                                      I always see the runtime error excuse yet I’ve never seen anyone actually have issues with “I passed the wrong type into a function and broke the site.”

                                                                                                      The only one that has merit is the IDE support, but unless you’re making a library (which is the time I do find TS worth using), you’re not going to notice this part much if at all.

                                                                                                      I’ve been coding in JS since the late 90s. I’ve seen a lot of really bad code. I’ve also seen some really awesome code. I’ve also had to use TS everywhere for work and saw Enterprise Java infect it. You can make some terrible code with TS.

                                                                                                      If you like TS, go for it. But don’t assume I don’t understand it because I don’t prefer it.

                                                                                                      And for what it’s worth, optional typing is very likely coming to vanilla JS in the near future. When it’s built in, I’ll be happier to see it. I don’t want a whole ecosystem of dependencies for it.

                                                                                                      1. 19

                                                                                                        I’ve never seen anyone actually have issues with “I passed the wrong type into a function and broke the site.”

                                                                                                        I am so jealous. Either you work alone or you only work with competent programmers. That must be nice.

                                                                                                        1. 22

                                                                                                          I’ve worked on projects with brilliant programmers, and this still was a problem. Once the codebase is larger than what a single person can fit in their head, and old enough that it needs refactorings, types save a lot of pain.

                                                                                                          1. 3

                                                                                                            I’ve worked with absolute beginners and really experienced people over the years. If you have people breaking production because they passed e.g. a string instead of a number, your tests aren’t good enough.

                                                                                                            And to be clear, you don’t test for type. You test behavior and functionality.
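                                                                                                       For example (a made-up function and test): a behavioural test like this also blows up if someone passes a string where a number belongs, without ever asserting on types.

                                                                                                           def apply_discount(price, percent):
                                                                                                               return round(price * (1 - percent / 100), 2)

                                                                                                           def test_apply_discount():
                                                                                                               # Exercises behaviour with realistic inputs; a str passed as price
                                                                                                               # would fail here anyway, no isinstance checks needed.
                                                                                                               assert apply_discount(100.0, 25) == 75.0
                                                                                                               assert apply_discount(19.99, 0) == 19.99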

                                                                                                            1. 2

                                                                                                         A type checker is the equivalent of an automatic, concise behavioural test at every call site that involves a string or a number, and it’s a test you can’t forget to write.

                                                                                                          2. 13

                                                                                                            I always see the runtime error excuse yet I’ve never seen anyone actually have issues with “I passed the wrong type into a function and broke the site.”

                                                                                                     Not only have I seen this happen with JavaScript, but with Ruby and Python as well. However, I don’t believe strong typing is primarily about preventing such things. Less directly, it’s about enabling the powerful type-aware tooling that prevents such things during development and thereby improves developer efficiency.

                                                                                                            If you like TS, go for it. But don’t assume I don’t understand it because I don’t prefer it.

                                                                                                            I think this is a valid point that strong typing advocates should hear.

                                                                                                            1. 4

                                                                                                              The only one that has merit is the IDE support, but unless you’re making a library (which is the time I do find TS worth using), you’re not going to notice this part much if at all.

                                                                                                       Maybe another way to consider it is accident-proofing. I tend to find that shifting some of the load to the computer frees me up to focus more on the logic I’m trying to implement or change.

                                                                                                              You might not need that support, but other folks in your team might, and might even be uncomfortable admitting that.

                                                                                                              1. 4

                                                                                                                I can see an argument for TypeScript simply because of how many things you have to more or less memorize to understand what JavaScript will do in a given situation. Introducing a layer of abstraction which doesn’t require that memorization, which lets you write simpler code without worrying about it, and which then compiles to JS that will do what you intended, has utility for me. Though a lot of modern JavaScript has cleaned up the worst bits anyway, so it’s not always a necessity.

                                                                                                                As to other benefits of TypeScript… well, I’m not sure I see them, and to be honest I’m not sure I see them in Python (my main day-to-day language), either.

                                                                                                                Some of this is just that dynamically-typed languages are different. I think that’s the single biggest hurdle – people who are used to static typing have a certain mindset about how they write code, and often have difficulty adjusting it. I’ve seen people write the equivalent of:

                                                                                                                def some_function(param_1, param_2, param_3, param_4):
                                                                                                                    if not isinstance(param_1, str):
                                                                                                                        raise TypeError("param_1 must be string!")
                                                                                                                    if not isinstance(param_2, int):
                                                                                                                        raise TypeError("param_2 must be int!")
                                                                                                                    # ... etc.
                                                                                                                

                                                                                                                And then they write unit tests where they call the function passing in a bunch of random things just to assert that it raises the expected TypeError.

                                                                                                                Which is not how I write code. And not how people who are actually used to dynamic typing write code. There’s a level of what I can only call trust: that if you do by some chance pass, say, a str to something that expected an int, it will be caught by the test suite. Not because the test suite contains exhaustive checks for all possible types being passed to every function, but because the test suite will exercise the codebase as a whole and the bad path will execute somewhere in there and blow up.

                                                                                                                And for the most part people don’t go around doing things like passing completely wrong types into functions. After all, dynamically-typed languages have documentation! Including easily-accessible documentation that your editor or IDE can display inline for you! This is actually the only real use case I currently have for type annotations in Python – they’re much less cumbersome than the alternatives, and most tooling now picks them up automatically, so they streamline the documentation process. Which means that instead of writing Sphinx param declarations to get that part of my docs, I just put in type annotations for most things. I don’t actually run mypy or any other checker, though (for a variety of reasons).
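                                                                                                           For instance, instead of Sphinx :param: lines, a (made-up) signature like this carries the same information and shows up in editor tooltips:

                                                                                                               def resize_image(path: str, width: int, height: int, *, keep_aspect: bool = True) -> bytes:
                                                                                                                   """Read the image at path and return the resized image as raw bytes."""
                                                                                                                   ...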

                                                                                                           But I do think most of it comes down to that lack of trust. I’ve worked on lots of codebases in dynamically-typed languages. Including huge codebases with large teams and even teams of teams involved. And I’ve never seen dynamic typing fail to “scale up” the way people always insist it will. Which isn’t to say I’ve never seen failures, just that I’ve never seen failures that were attributable solely to choosing a dynamically-typed language. Scaling up a team/codebase is difficult, and often fails even in projects that use statically-typed languages. People just like to selectively blame languages/type-systems/etc. for issues that were not technical in nature.

                                                                                                                I have once or twice seen people who came in with a chip on their shoulder about type systems and tried their best to “prove”, through committing deliberate mischief, that dynamically-typed code can never be trusted. But that’s a people problem, not a programming-language problem. And I’ve seen people who never really were able to let go of their fear give up and just went back to static typing. Where, amusingly, they rarely actually need the type system to yell at them because they do exactly the same things that would have worked in the dynamically-typed languages (like referring to documentation before calling into unfamiliar functions/methods).

                                                                                                                Plus, everybody who’s ever tried to rigorously prove that static typing produces better code has failed. Dan Luu’s literature review on this is old, but still very relevant for pointing out the common flaws in such attempts.

                                                                                                                1. 3

                                                                                                                  What size JavaScript codebase did you work on and with how many other developers?

                                                                                                                  1. 5

                                                                                                                    Pretty much all sizes you can imagine. This goes back pre-jQuery to my current role, from little startups to giant corporate behemoths. Some of the worst code I’ve seen is Java devs trying to write TypeScript at Capital One.

                                                                                                                    Some companies I’ve worked at that you may have heard of: Capital One, USA Today, ESPN, Nickelodeon

                                                                                                                  2. 2

                                                                                                                    I also see TS as unnecessary overhead for what basically amounts to “I feel better because a class of bugs that almost never happens won’t happen.”

                                                                                                                    In my experience, removing certain classes of bugs is only half of it. A typed language is also an excellent form of documentation that can’t be matched by manual approaches such as JSDoc.

Approaching a large codebase without any types is scary and difficult, especially if the functions use JS tricks such as arguments and dynamic function signature overloading.

For small systems that one person can keep in their head, a typed language can sometimes slow you down or feel like it gets in your way. When you have a large system worked on by 100+ engineers, you are going to pay a heavy price for not using a typed front-end language.

                                                                                                                    1. 1

                                                                                                                      And for what it’s worth, optional typing is very likely coming to vanilla JS in the near future. When it’s built in, I’ll be happier to see it. I don’t want a whole ecosystem of dependencies for it.

                                                                                                                      Are you referring to this?:

                                                                                                                      https://tc39.es/proposal-type-annotations/

                                                                                                                      1. 1

                                                                                                                        Yeah. Imagine instead of adding in a bunch of transpiling libraries you would just have the ability to add a plugin to your IDE to give you the same info. That’s the big benefit there.

                                                                                                                        1. 1

                                                                                                                          FWIW:

                                                                                                                          …the language in this specification favours TypeScript as it is the most popular type-system…

                                                                                                                          A few years ago I, like you I think, dismissed TypeScript as another CoffeeScript. It would, at best, influence JavaScript. Eventually the languages would diverge and TypeScript codebases would be marooned with an antiquated version of JavaScript as its compile target—hip today, embarrassing tomorrow.

Instead, a couple of things happened that surprised me. First, TypeScript has maintained (with one exception) a commitment to being a superset of JavaScript and has thus far done a good job of staying up to date with its parent language. Second, it has become immensely popular, the result of which is that the parent language is deferring to its child in this proposal, the intent of which appears to be that we will be able to run TypeScript from source in JS engines without any compilation. TypeScript would presumably then itself be run much as you envision: as a static analysis plugin for IDEs.

                                                                                                                          The elephant in the room is TypeScript’s enum, which breaks the “fully erasable” in “fully erasable type system.” This proposal explicitly will not support it. I would like to see TypeScript kill it in its next major version.

                                                                                                                          1. 1

enum is such a small feature that ECMAScript could just add it.

                                                                                                                    1. 2

                                                                                                                      Implementing this sort of thing on Linux might prove difficult in practice if someone ends up wanting to do that in the future, since glibc uses mmap in malloc for large allocations (https://www.gnu.org/software/libc/manual/html_node/The-GNU-Allocator.html)

                                                                                                                      On a system like OpenBSD where libc and the kernel live in the same project, it’s a neat sounding feature!

                                                                                                                      1. 5

It doesn’t make all mmap regions immutable, only specified ones. The heap would, obviously, not set this flag. It also shouldn’t be incompatible with dlclose as long as dlopen doesn’t set these flags and only normal library mappings do. But I’d be quite happy if they killed dlclose: there’s no real need for it on 64-bit systems, and the security cost significantly outweighs the small reduction in address-space consumption.

Personally, I wish they would step back and stop adding hacks on top of mmap. The API is seriously flawed because it conflates a load of things: allocating virtual address space, permissions on that address space, and backing store. It does all of these things with no permission model. OpenBSD would be in a good position to design and ship a deconflation of these. Brooks Davis proposed a ‘cookie mmap’ a few years ago that started this: the call to allocate address space returned a capability that then gave you the rights to change the mapping, and you could only do mprotect or MAP_FIXED-like things if you had that capability. Once you have such a model, it becomes quite easy to add a richer set of permissions, including dropping the right to subsequently modify the mapping.
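For concreteness, here is a minimal sketch (using only the standard POSIX calls, nothing OpenBSD-specific) of the conflation I mean: one mmap call simultaneously reserves address space, picks the backing store, and sets permissions, and nothing stops other code in the process from changing any of that later.

#include <sys/mman.h>

int main()
{
  // A single call reserves address space, chooses the backing store
  // (anonymous memory here), and sets the page permissions.
  void *p = mmap(nullptr, 4096, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANON, -1, 0);
  if (p == MAP_FAILED)
    return 1;

  // Any code in the process that learns the address can later change the
  // permissions, or remap over the region with MAP_FIXED. There is no
  // capability guarding the mapping.
  mprotect(p, 4096, PROT_READ);

  munmap(p, 4096);
  return 0;
}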

                                                                                                                        1. 2

                                                                                                                          I’d be quite happy if they killed dlclose: there’s no real need for it on 64-bit systems, the security cost significantly outweighs the small reduction in address-space consumption

FWIW there’s one edge case: an application that repeatedly creates, loads (dlopen/dlsym) and unloads different .so files, where dlclose() not actually unloading them would lead to a resource leak.

                                                                                                                          In practice though, just for stability alone, I’d recommend never doing that and spawning subprocesses to do the linking in of dynamically generated .so files instead.

                                                                                                                          1. 2

You might still have bugs if you’re repeatedly opening and closing things. If your library calls any function in a global constructor that triggers thread-local objects with non-trivial destructors to be accessed, then your library will either be locked in memory, or you’ll leak things. Library unloading is difficult to do reliably and needs language support to do well; the way that it was shoehorned into POSIX is a mess.

                                                                                                                      1. 3

                                                                                                                        I found this interesting because a speedup to something that was previously taking 50% of the total runtime made the whole process about 3x faster, instead of the “up to 2x” that you’d normally expect.

                                                                                                                        1. 2

                                                                                                                          I for one am very interested.

                                                                                                                          Anything which could provide a faster alternative to eslint would be extremely welcome.

I wouldn’t have said that I’m unhappy with the performance of Prettier at the moment; maybe it could be a lot faster, but I have it running on save in my IDE and I just never see it taking time.

                                                                                                                          1. 4

                                                                                                                            Looks as if the link is working now.

Please don’t use std::unreachable in switch statements like this; it breaks the compiler warning that you’ve missed a case. It’s also probably a good idea never to call it directly: instead, wrap it in a helper that asserts on the supposedly-impossible condition in debug builds (and builds that you’re fuzzing) and only allows the compiler to optimise based on it in release builds. Also, be aware that the compiler can’t dead-code-eliminate functions leading to an unreachable call unless it can prove that they don’t have side effects.
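Something along these lines (a sketch; assume_unreachable is just a name I’ve made up here):

#include <cassert>
#include <utility>

// Aborts loudly in debug / fuzzing builds; in release builds (NDEBUG defined)
// the assert disappears and std::unreachable() lets the optimiser assume
// this point is never reached.
[[noreturn]] inline void assume_unreachable()
{
  assert(!"reached code that was assumed to be unreachable");
  std::unreachable();
}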

std::to_underlying looks useful for simplifying code that uses enum classes as unit types. It’s weirdly asymmetric, though. I guess a std::from_underlying would be harder to use correctly, since it may produce an invalid enum value and there’s no standard way of enumerating valid values to check.
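For example (Millimetres is just an illustrative unit type):

#include <cstdint>
#include <utility>

// A strongly-typed unit: values don't implicitly convert to integers.
enum class Millimetres : std::uint32_t {};

std::uint32_t raw(Millimetres m)
{
  // C++23 std::to_underlying replaces the noisier
  // static_cast<std::underlying_type_t<Millimetres>>(m).
  return std::to_underlying(m);
}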

std::byteswap has been a long time coming, but it is a weird API. The <bit> header contains a std::endian enumeration, and I’d expect a less ambiguous API to be of the form to_byte_order<endian> and from_byte_order<endian>. You can implement these as:

#include <bit>        // std::endian, std::byteswap (C++23)
#include <concepts>   // std::integral

// Endianness comes first so the value type can be deduced at the call site,
// e.g. to_byte_order<std::endian::big>(x).
template<std::endian E, typename T>
constexpr T to_byte_order(T value) requires std::integral<T>
{
  if constexpr (E == std::endian::native)
  {
    return value;
  }
  else
  {
    return std::byteswap(value);
  }
}

                                                                                                                            But I’m not sure why this wasn’t the default API.
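For what it’s worth, a typical (hypothetical) use would look something like this, converting a native value into a big-endian wire field:

#include <cstdint>

// Produce a length field in network (big-endian) byte order.
std::uint32_t encode_length(std::uint32_t native_value)
{
  return to_byte_order<std::endian::big>(native_value);
}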

                                                                                                                            1. 4

                                                                                                                              Also, be aware that the compiler can’t dead-code-eliminate functions leading to an unreachable call unless it can prove that they don’t have side effects.

                                                                                                                              Isn’t the point that reaching std::unreachable is undefined behaviour, so the compiler is allowed to assume that the branch which executes it doesn’t have any side effects?

                                                                                                                              Edit: I would advocate abort() rather than unreachable() though. It costs very little code size and is much easier to debug.

                                                                                                                              1. 3

                                                                                                                                Sorry, I should have been clearer:

                                                                                                                                Anything on the path to unreachable and nowhere else is subject to DCE, but I’ve seen a fairly common misuse of __builtin_unreachable that looks something like this:

                                                                                                                                if (someFn())
                                                                                                                                  __builtin_unreachable();
                                                                                                                                

If you write this, the compiler doesn’t just assume that the function will return false and skip the call; it executes the function and eliminates the conditional. This kind of use is fine if someFn can be inlined and is side-effect free. If it has side effects (or can’t be proven not to have side effects), then the call will still be generated.

                                                                                                                                This kind of thing is why __builtin_assume is generally better than __builtin_unreachable as a hint to the compiler. Unfortunately, there isn’t yet a standard way of expressing that in C++.
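As a rough sketch of the difference (Clang-specific; __builtin_assume is a Clang builtin):

// Clang's __builtin_assume never evaluates its argument: if the expression
// has side effects, the assumption is simply dropped (with a warning),
// rather than the call being emitted as in the __builtin_unreachable pattern.
void fill(unsigned char *buf, unsigned len)
{
  __builtin_assume(len % 16 == 0);   // hint only: no code is generated for this
  for (unsigned i = 0; i < len; i++)
    buf[i] = 0;                      // the vectoriser can assume no scalar tail
}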

                                                                                                                              2. 1

                                                                                                                                std::from_underlying<Enum>(i) would be the same as Enum{i}, wouldn’t it? Which is shorter.

                                                                                                                                1. 2

                                                                                                                                  Huh, I thought you needed an explicit cast for that. I am constantly surprised by things that the C++ type system lets me do without warnings or errors.

                                                                                                                                  1. 1

                                                                                                                                    I think this got added in C++17. The nice thing is that it’s a bit safer than Enum(i) because it requires that the type of i conforms to the enum’s underlying type; so you’d get a compile error if Enum is based on int but i is an int64_t, for example.

                                                                                                                                    1. 1

I guess it’s fine as long as it isn’t an implicit cast. In C, enum is basically useless because you can implicitly convert from int to enum foo and so you must assume that any enum variable may hold invalid values and defensively check. In C++, enum class is a lot more robust against that (and, unlike Rust, the compiler will not ‘helpfully’ delete any defensive checks that you do add if you are in code reading an enum from a memory-mapped I/O device or similar).
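To make that concrete (Colour and the register-read scenario are illustrative):

#include <cstdint>

enum class Colour : int { Red, Green, Blue };

Colour from_int(int v)
{
  return Colour{v};   // OK since C++17: v exactly matches the underlying type
}

// Colour from_wide(std::int64_t v) { return Colour{v}; }  // error: narrowing

// Defensive check for values that arrive from outside the type system,
// e.g. a hardware register or untrusted input.
bool is_valid(Colour c)
{
  switch (c)
  {
  case Colour::Red:
  case Colour::Green:
  case Colour::Blue:
    return true;
  default:
    return false;
  }
}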