1. 2

    I don’t think I agree with the premise of this article. Starting from this code:

    bench_input = 42;
    start_time = time();
    bench_output = run_benchmark(bench_input);
    result = time() - start_time;
    

    The compiler may not move the benchmark call before the first time call unless it can prove that this move is not observable within the language semantics. If time is a system call, it’s game over: it may modify any global that run_benchmark reads. If it has complete visibility into the benchmark and the benchmark doesn’t read any globals then that may be fine.

    The last transform, completely eliding the benchmark run because it can be shown not to have side effects and its result is unused, is far more plausible but that’s also generally an indication of a misleading benchmark. Especially in this example where the input is a compile-time constant: even if you do use the result, the compiler is free to evaluate the whole thing at compile time. Even if it doesn’t, it may generate code that assumes more knowledge of the input than is true in the general case.

    The DoNotOptimize function is doing two things:

    • It is a compiler barrier or, in C++11 atomics terminology, a signal fence. It prevents the compiler from reordering things across the boundary.
    • It causes a value to escape from the compiler’s ability to analyse. This is slightly scary because LLVM actually does look inside inline assembly blocks in back-end optimisations, and there’s no guarantee that it won’t in the future look there in mid-level optimisers too. Those would be free to observe that the instruction sequence there (no instructions) has well-defined semantics and does not capture or perturb the value, and so optimise the whole thing away.

    You can do both of these without inline assembly. The signal fence is in <atomic> (C++) or <stdatomic.h> (C). The second is a bit more tricky but you generally need to either call a function that the compiler can’t see (difficult with LTO) or store to, and read back from, an atomic variable.
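
    Roughly, both pieces can be had in standard C11. A minimal sketch (the names sink, escape and barrier are mine, not from any library):

```c
#include <stdatomic.h>

/* An atomic object the optimiser must assume other code can observe. */
static atomic_int sink;

/* Escape hatch: store the value to an atomic and read it back, so the
 * compiler can no longer assume the result is unused or fully known. */
static int escape(int v)
{
    atomic_store_explicit(&sink, v, memory_order_relaxed);
    return atomic_load_explicit(&sink, memory_order_relaxed);
}

/* Compiler barrier: atomic_signal_fence compiles to no instructions but
 * stops the compiler from reordering memory operations across it. */
static void barrier(void)
{
    atomic_signal_fence(memory_order_seq_cst);
}
```

    You would then call barrier() around the timed region and pass the benchmark result through escape() instead of using inline assembly.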

    1. 1

      it may modify any global that run_benchmark reads.

      I think the concern is that run_benchmark might not read any globals.

      It’s also not completely out of the question that the compiler has a special understanding of time() and knows that it doesn’t touch program globals. As far as I know that’s not the case at the moment, however (and if it was you might need further steps to guarantee that nothing got moved out from between the two calls to time()).

      If it has complete visibility into the benchmark and the benchmark doesn’t read any globals then that may be fine.

      Exactly :)

    1. 5

      DragonFly has come a long way since then; it’s now trading blows with Linux on the performance front, which is remarkable for such a tiny team when contrasted with Linux’s huge developer base and massive corporate funding.

      This is no coincidence; it has to do with SMP leveraged through concurrent lockfree/lockless servers instead of filling the kernel with locks.

      1. 3

        This comparison, which seems pretty reasonable, makes it look like it’s still lagging behind.

        1. 7

          What I don’t like about Phoronix benchmark results generally is that they lack depth. It’s all very well to report an MP3 encoding test running for 32 seconds on FreeBSD/DragonflyBSD and only 7 seconds on Ubuntu, but that raises a heck of a question: why is there such a huge difference for a CPU-bound test?

          Seems quite possible that the Ubuntu build is using specialised assembly, or something like that, which the *BSD builds don’t activate for some reason (possibly even because there’s an overly restrictive #ifdef in the source code). Without looking into the reason for these results, it’s not really a fair comparison, in my view.

          1. 3

            Yes. This is well worth a read.

            Phoronix has no rigour; it’s a popular website. A benchmark is useless if it is not explained and defended. I have no doubt that the benchmarks run in TLA were slower under freebsd and dragonflybsd, but it is impossible to make anything of that if we do not know:

            1. Why

            2. What is the broader significance

            1. 4

              The previous two comments are fair, but at the end of the day it doesn’t really change that LAME will run a lot slower on your DragonflyBSD installation than it does on your Linux installation.

              I don’t think these benchmarks are useless, but they are limited: they show what you can roughly expect in the standard stock installation, which is what the overwhelming majority of people – including technical people – use. This is not a “full” benchmark, but it’s not a useless benchmark either, not for users of these systems anyway. Maybe there is a way to squeeze more performance out of LAME and such, but who is going to look at that unless they’re running some specialised service? I wouldn’t.

          2. 1

            This comparison, newer and from the same website, makes it look like it’s the system that’s ahead (see the geometric mean on the last page).

            Not that I’m a fan of that site’s benchmarks.

            1. 2

              I haven’t done the math, but it seems like most of DragonFlyBSD’s results come from the three “Stress-NG” benchmarks, which incidentally measure “Bogo Ops/s”.

              Here’s the benchmark page: https://openbenchmarking.org/test/pts/stress-ng

              I don’t know why Phoronix uses a version called 0.11.07 when the latest on the page seems to be 1.4.0, but maybe that’s just a display issue.

              1. 1

                Christ @ benchmarking with Bogo anything.

        1. 4

          This is actually not a bad rundown, though I feel like the discussion of UB lacks the correct nuance. When referring to integer overflow:

          The GNU C compiler (gcc) generates code for this function which can return a negative integer

          No, it doesn’t “return a negative integer”; it has already hit undefined-behaviour-land by that point. The program might appear to behave as if a negative integer was returned, but may not do so consistently, and that is different from having a negative integer actually returned, especially since the program might even exhibit odd behaviours that don’t correspond to the value being negative or the arithmetically correct value, or which don’t even appear to involve the value at all.

          (Of course, at the machine level, it might do a calculation which stores a negative result into a register or memory location; but that’s the wrong level to look at it, because the presence of the addition operation has effects on compiler state that can affect code generation well beyond that one operation. Despite the claim often being made, C is not a “portable assembler”. I’m glad this particular article doesn’t make that mistake.)

          1. 3

            What? The code in question:

            int f(int n)
            {
                if (n < 0)
                    return 0;
                n = n + 100;
                if (n < 0)
                    return 0;
                return n;
            }
            

            What the article is saying is that on modern C compilers, the check for n < 0 indicates to the compiler that the programmer is rejecting negative numbers, and because programmers never invoke undefined behavior (cough cough, yeah, right), the second n < 0 check can be removed because of course that can’t happen!
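
            In other words, the compiler may effectively transform f into something like this sketch (f_opt is my name for it; the actual output depends on compiler and flags):

```c
/* Sketch of what an optimising compiler may effectively emit for f:
 * after the first check n >= 0 holds, and since signed overflow is
 * undefined the compiler assumes n + 100 >= 100, so the second
 * n < 0 check is simply dropped. */
int f_opt(int n)
{
    if (n < 0)
        return 0;
    return n + 100;
}
```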

            So what can actually happen in that case? An aborted program? Reformatted hard drive? Or a negative number returned from f() (which is what I suspect would happen in most cases)? Show generated assembly code to prove or disprove me please … (yes, I’m tired of C language lawyers pedantically warning about possible UB behavior).

            1. 3

              because programmers never invoke undefined behavior

              They shouldn’t, but they often do. That’s why articles such as the one in title should be super clear about the repercussions.

              So what can actually happen in that case?

              Anything - that’s the point. That’s what the “undefined” in “undefined behaviour” means.

              (yes, I’m tired of C language lawyers pedantically warning about possible UB behavior).

              The issue is that a lot of this “possible UB behaviour” is actual compiler behaviour, but it’s impossible to predict which exact behaviour you’ll get.

              You might be “tired of C language lawyers pedantically warning about possible UB behaviour”, but I’m personally tired of programmers invoking UB and thinking that it’s ok.

              1. 1

                They shouldn’t, but they often do.

                Yes they do, but only because there’s a lot of undefined behaviors in C. The C standard lists them all (along with unspecified, implementation and locale-specific behaviors). You want to know why they often do? Because C89 defined about 100 undefined behaviors, C99 about 200 and C11 300. It’s a bit scary to think that C code that is fine today could cause undefined behavior in the future—I guess C is a bit like California; in California everything causes cancer, and in C, everything is undefined.

                A lot historically came about because of different ways CPUs handle certain conditions—the 80386 will trap any attempt to divide by 0 [1] but the MIPS chip doesn’t. Some have nothing to do with the CPU—it’s undefined behavior if a C file doesn’t end with a new line character. Some have to do with incorrect library usage (calling va_arg() without calling va_start()).

                I’m personally tired of programmers invoking UB and thinking that it’s ok.

                Undefined behavior is just that—undefined. Most of the undefined behavior in C is pretty straightforward (like calling va_arg() incorrectly); it’s really only signed-integer math and pointers where undefined behavior causes most of the real problems. Signed-integer math is bad mainly in that it might generate invalid indices for arrays or for pointer arithmetic (incorrect answers are still bad too, but I’m thinking of security here). Outside of that, I don’t know of any system in general use today that will trap on signed overflow [2]. So I come back to my original “What?” question. The x86 and ARM architectures have well-defined signed integer semantics (they wrap! I’ve yet to come across a system where that doesn’t happen, again [2]), so is it any wonder that programmers will invoke UB and think it’s okay?
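
                (If wrapping is what you actually want, you can get it without invoking UB by doing the arithmetic in unsigned, where wraparound is defined. A sketch, with add_wrap being my name for it:)

```c
#include <limits.h>

/* Wrapping signed addition without UB: unsigned arithmetic is defined
 * to wrap modulo 2^N. The final conversion back to int is
 * implementation-defined for out-of-range values (it wraps on GCC,
 * Clang and MSVC, and two's complement is mandated from C23 on). */
int add_wrap(int a, int b)
{
    return (int)((unsigned)a + (unsigned)b);
}
```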

                And for pointers, I would hazard a guess that most programmers today don’t have experience with segmented architectures which is where a lot of the weirder pointer rules probably stem from. Pointers by themselves aren’t the problem per se, it’s C’s semantics with pointers and arrays that lead to most, if not all, problems with undefined behavior with pointers (in my opinion). Saying “Oh! Undefined behavior has been invoked! Abandon all hope!” doesn’t actually help.

                [1] IEEE-754 floating point doesn’t trap on division by 0.

                [2] I would love to know of a system where signed overflow is trapped. Heck, I would like to know of a system where trap representations exist! Better yet, name the general purpose systems I can buy new, today, that use sign magnitude or 1s-complement for integer math.

                1. 2

                  Because C89 defined about 100 undefined behaviors, C99 about 200 and C11 300

                  It didn’t define them; it listed circumstances which have undefined behaviour. This may seem nit-picky, but correctly understanding what “undefined behaviour” is forms the premise of my original post.

                  A draft of C17 that I have lists 211 undefined behaviours. An article on UB - https://www.cs.utah.edu/~regehr/ub-2017-qualcomm.pdf - claims 199 for C11. I don’t think your figure of 300 is correct.

                  A bunch of the C11 circumstances for UB are to do with the multi-threading support which didn’t exist in C99. In general I don’t think there’s any strong reason to believe that code with clearly well-specified behaviour now will have UB in the future.

                  So I come back to my original “What?” question

                  It’s not clear to me what your “what?” question is about. I elaborated in the first post on what I meant by “No, it doesn’t “return a negative integer””.

                  Compilers will for eg. remove checks for impossible (in the absence of UB) conditions and other things that may be even harder to predict; C programmers should be aware of that.

                  Now, if you want to argue “compilers shouldn’t do that”, I wouldn’t necessarily disagree. The problem is: they do it, and the language specification makes it clear that they are allowed to do it.

                  The x86 and ARM architectures have well defined signed integer semantics

                  so is it any wonder that programmers will invoke UB and think it’s okay?

                  This illustrates my point: if we allow the view of C as a “portable assembly language” to be propagated, and especially the view of “UB is just the semantics of the underlying architecture”, we’ll get code being produced which doesn’t work (and worse, is in some cases exploitable) when compiled by today’s compilers.

                  1. 1

                    I don’t think your figure of 300 is correct.

                    You are right. I recounted, and there are around 215 for C11. But there’s still that doubling from C89 to C99.

                    No, it doesn’t “return a negative integer”, it has already hit undefined-behaviour-land by that point.

                    It’s not clear to me what your “what?” question is about.

                    Unless the machine in question traps on signed overflow, the code in question returns something when it runs. Just saying “it’s undefined behavior! Anything can happen!” doesn’t help. The CPU will either trap, or it won’t. There is no third thing that can happen. An argument can be made that CPUs should trap, but the reality is nearly every machine being programmed today is a byte-oriented, 2’s complement machine with defined signed overflow semantics.

                    1. 1

                      Just saying “it’s undefined behavior! Anything can happen!” doesn’t help

                      It makes it clear that you should have no expectations on behaviour in the circumstance - which you shouldn’t.

                      Unless the machine in question traps on signed overflow, the code in question returns something when it runs.

                      No, as already evidenced, the “result” can be something that doesn’t pass the ‘x < 0’ check yet displays as a negative when printed, for example. It’s not a real value.

                      The CPU will either trap, or it won’t

                      C’s addition doesn’t map directly to the underlying “add” instruction of the target architecture; it has different semantics. It doesn’t matter what the CPU will or won’t do when it executes an “add” instruction.

            2. 1

              Yes, the code generated does in fact return a negative integer. You shouldn’t rely on it, another compiler may do something different. But once compiled undefined behaviour isn’t relevant anymore. The generated x86 does in fact contain a function that may return a negative integer.

              Again, it would be completely legal for the compiler to generate code that corrupted memory or ejected your CD drive. But this statement is talking about the code that happened to be generated by a particular run of a particular compiler. In this case it did in fact emit a function that may return a negative number.

              1. 1

                When we talk about undefined behaviour, we’re talking about the semantics at the level of the C language, not the generated code. (As you alluded, that wouldn’t make much sense.)

                At some point you have to map semantics between source and generated code. My point was, you can’t map the “generates a negative value” of the generated code back to the source semantics. We only say it’s a negative value on the basis that its representation (bit pattern) is that of a negative value, as typically represented in the architecture, and even then we’re assuming that some register that is typically used to return values does in fact hold the return value of the function …

                … which it doesn’t, if we’re talking about the source function. Because that function doesn’t return once undefined behaviour is invoked; it ceases to have any defined behaviour at all.

                I know this is highly conceptual and abstract, but that’s at the heart of the message - C semantics are at a higher level than the underlying machine; it’s not useful to think in terms of “undefined behaviour makes the function return a negative value” because then we’re imposing artificial constraints on undefined behaviour and what it is; from there, we’ll start to believe we can predict it, or worse, that the language semantics and machine semantics are in fact one-to-one.

                I’ll refer again to the same example as was in the original piece: the signed integer overflow occurs and is followed by a negative check, which fails (“is optimised away by the compiler”, but remember that optimisation preserves semantics). So, it’s not correct to say that the value is negative (otherwise it would have been picked up by the (n < 0) check); it’s not guaranteed to behave as a negative value. It’s not guaranteed to behave any way at all.

                Sure, the generated code does something and it has much stricter semantics than C. But saying that the generated function “returns a negative value” is lacking the correct nuance. Even if it’s true that in some similar case, the observable result - from some particular version of some particular compiler for some particular architecture - is that the number always appears to be negative, this is not something we should in any way suggest is the actual semantics of C.

              2. 0

                Of course, at the machine level, it might do a calculation which stores a negative result into a register or memory location; but, that’s the wrong level to look at it, because the presence of the addition operation has effects on compiler state that can affect code generation well beyond that one operation.

                Compilers specifically have ways of ensuring that there is no interference between operations, so no. This is incorrect. Unless you want to point to the part of the GCC and Clang source code that decides unexpectedly to stop doing that?

                1. 1

                  In the original example, the presence of the addition causes the following negative check (n < 0) to be omitted from the generated code.

                  Unless you want to point to the part of the GCC and Clang source code that decides unexpectedly to stop doing that?

                  If that’s at all a practical suggestion, perhaps you can go find the part that ensures “that there is no interference between operations” and point that out?

                  1. 1

                    In the original example, the presence of the addition causes the following negative check (n < 0) to be omitted from the generated code.

                    Right, because register allocation relies upon UB for performance optimization. It’s the same in both GCC and Clang (Clang is actually worse with regard to its relentless use of UB to optimize opcode generation; presumably this is also why they have more tooling around catching errors and sanitizing code). This is a design feature from the perspective of compiler designers. There is absolutely nothing in the literature to back up your point that register allocation suddenly faceplants on UB – I’d be more than happy to read it if you can find it, though.

                    If that’s at all a practical suggestion, perhaps you can go find the part that ensures “that there is no interference between operations” and point that out?

                    *points at the entire register allocation subsystem*

                    But no, the burden of proof is on you, as you made the claim that the register allocator and interference graph fails on UB. It is up to you to prove that claim. I personally cannot find anything that backs your claim up, and it is common knowledge (backed up by many, many messages about this on the mailing list) that the compiler relies on Undefined Behaviour.

                    Seriously, I want to believe you. I would be happy to see another reason why having the compiler rely on UB is a negative point. For this reason I also accept a code example where you can use the above example of UB to cause the compiler to clobber registers and return an incorrect result. The presence of a negative number alone is not sufficient, as that does not demonstrate register overwriting.

                    1. 2

                      There is absolutely nothing in the literature to back up your point that register allocation suddenly faceplants on UB

                      What point? I think you’ve misinterpreted something.

                      you made the claim that the register allocator and interference graph fails on UB

                      No, I didn’t.

                    2. 1

                      It isn’t the addition; the second check is omitted because n is known to be greater than 0. Here’s the example with value range annotations for n.

                      int f(int n)
                      {
                          // [INT_MIN, INT_MAX]
                          if (n < 0)
                          {
                              // [INT_MIN, -1]
                              return 0;
                          }
                          // [0, INT_MAX]
                          n = n + 100;
                          // [100, INT_MAX] - overflow is undefined so n must be >= 100 
                          if (n < 0)
                          {
                              return 0;
                          }
                          return n;
                      }
                      
                      1. 2

                        You’re correct that I oversimplified it. The tone of the person I responded to was combative and I couldn’t really be bothered going into detail again on something that I’ve now gone over several times in different posts right here in this discussion.

                        As you point out, it’s the combination of “already compared to 0” and “added a positive integer” that makes the final comparison to 0 redundant. The original point stands: the semantics of C, and in particular the possibility of UB, mean that a simple operation can affect later code generation.

                        Here’s an example that works without interval analysis: (edit: or rather, that requires slightly more sophisticated analysis):

                        int f(int n)
                        {
                            int orig_n = n;
                            n = n + 100;
                            if (n < orig_n)
                            {
                                return 0;
                            }
                            return n;
                        }
                        
                1. 3

                  Hi, I’ve just published the first release of this program. It is open source released under the Apache 2.0 license.

                  From the linked document:

                  Cedro is a C language extension that works as a pre-processor with four features:

                  • The backstitch @ operator.
                  • Deferred resource release.
                  • Block macros.
                  • Binary inclusion.

                  The source code archive and GitHub link can be found here: https://sentido-labs.com/en/library/

                  Edit: on some machines, the miniz library does not compile with -Wsign-conversion: you can get it to work by removing that option from CFLAGS in the Makefile. This affects only cedro-new: both cedro and cedrocc compile before that error.

                  1. 4

                    This looks neat, I have a few comments:

                    Why does the deferred resource release not use __attribute__((cleanup))? It can generate code that is correct in the presence of exceptions, whereas the output code here still leaks resources in the presence of forced stack unwinding. Is it just that you’re doing token substitution? (__attribute__((cleanup)) takes a pointer to the cleanup object, so you can have a single cedro_cleanup function that takes a pointer to a structure that contains explicit captures and a function to invoke.) The choice of auto for this also means that it cannot be used in headers that need to interoperate with C++ - was that an explicit design choice?
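
                    For readers who haven’t used it, a minimal sketch of the attribute (GCC/Clang extension; free_ptr and demo are illustrative names):

```c
#include <stdio.h>
#include <stdlib.h>

/* Cleanup handler: called with a pointer to the annotated variable
 * when it goes out of scope, on every exit path. */
static void free_ptr(char **p)
{
    free(*p);
}

static int demo(void)
{
    /* buf is released automatically at scope exit; with unwind info
     * present, the same happens when an exception unwinds the frame. */
    __attribute__((cleanup(free_ptr))) char *buf = malloc(64);
    if (buf == NULL)
        return 1;
    snprintf(buf, 64, "hello");
    return 0;
}
```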

                    Similarly on the lowering: binary includes are a much-missed feature in C/C++, but the way that you’ve lowered them is absolutely the worst case for compiler performance. The compiler needs to create a literal constant object for every byte and an array object wrapping them. If you lower them as C string literals with escapes, you will generate code that compiles much faster. For example, the cedro-32x32.png example lowered as "\x89\x50\x4E\x47\x0D\x0A\x1A..." will be faster and use less memory in the C compiler. I’m not sure I understand this comment though:

                    The file name is relative to the current directory, not the C file, because usually binary files do not go next to the source code.

                    What is ‘the current directory’? The directory from which the tool is invoked? If so, that makes using your tool from any build-system generator annoying, because they tend to prefer to expand paths provided to tool invocations to absolute paths to avoid any reliance on the current working directory. I don’t actually agree with the assertion here. On every codebase I’ve worked on where I’ve wanted to embed binaries into the final build product, those binaries have been part of the source tree so that they can be versioned along with the source.

                    1. 3

                      It can generate code that is correct in the presence of exceptions

                      It’s a C preprocessor and C does not have exceptions

                      1. 2

                        C has setjmp(3) and friends as a very raw, low-level exception mechanism. It’s basically the same underpinnings, with a much less developer-friendly interface. Still, real C code does make use of it!

                        1. 1

                          And __attribute__((cleanup(..))) doesn’t work with longjmp. Not even C++ destructors run if you longjmp out of a scope. (Both destructors and __attribute__((cleanup(..))) run when unwinding due to an exception, though.)

                          C’s longjmp doesn’t do any stack unwinding; it essentially just sets the instruction pointer and stack pointer in a way which breaks lots of stuff. That’s not to say it’s not used, though; I’ve encountered it myself with the likes of libjpeg.
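
                          A small sketch of that pitfall (GCC/Clang; the names mark, jumper and cleanup_ran are mine):

```c
#include <setjmp.h>

static jmp_buf env;
static int cleanup_ran = 0;

/* Would record that the cleanup handler executed. */
static void mark(int *p)
{
    (void)p;
    cleanup_ran = 1;
}

static void jumper(void)
{
    __attribute__((cleanup(mark))) int x = 0;
    longjmp(env, 1);  /* leaves the scope without running the cleanup */
}

/* Returns cleanup_ran after longjmp-ing out of jumper(): it stays 0,
 * because longjmp does no unwinding. */
static int run_demo(void)
{
    if (setjmp(env) == 0)
        jumper();
    return cleanup_ran;
}
```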

                        2. 2

                          C does not have exceptions, but C often lives in a world where exceptions can be thrown. If your C code invokes callbacks from non-C code, it often has to handle exceptions being thrown through C stack frames, even if the C code itself doesn’t handle them. Writing exception-safe code is one of the main motivations for __attribute__((cleanup)) existing.

                          1. 2

                            Is it well defined behavior to throw C++ exceptions across C stack frames? For some reason I thought that was UB.

                            1. 2

                              It’s certainly not defined behaviour, since the C++ standard doesn’t (naturally enough) concern itself with specifying what happens when an exception passes into code written in another language.

                              In practice, functions in C may not have unwind information and an exception that propagates into them will be treated the same as an exception that (would) propagate out of a noexcept-attributed C++ function. (Off the top of my head, I think the result is that std::terminate() is called).

                              However, it is possible to compile C with unwind info (eg gcc has -fexceptions to enable this), and in that case implementations will allow propagating an exception through the C functions. At some point the exception still needs to be caught, which you can only do in C++ (or at least non-C) code, so this is only something that’s needed when you have a call pattern like:

                              C++: main() {
                                       -> foo() [call into C code]
                              C:   foo() {
                                       -> bar() [call into C++ code]
                              C++: bar() {
                                       (throws exception)
                              
                              1. 2

                                Well-defined in what context? In the context of the C specifications, any interaction with code not written in C is not well defined. In C++, functions exposed as extern "C" impose some constraints on the ABI, but no strong definitions. All (well, almost all) functions that are part of the C standard are also part of the C++ standard. This includes things like qsort, and qsort in C++ explicitly permits the callback function to throw exceptions.

                                More relevant, perhaps, are platform ABI standards. On Windows, throwing SEH and C++ exceptions through any language’s stack frames is well-defined behaviour and the C compiler provides explicit support for them via non-standard extensions.

                                Most *NIX platforms use the unwind standard from the Itanium C++ ABI, which defines a Base ABI that is language-agnostic and layers other things on top. This defines DWARF metadata for each frame that tells the unwinder how to unwind. You can write these by hand in assembly with the .cfi_ family of directives. If a frame doesn’t contain any code that needs to run during stack unwind, this information just describes how to restore the stack pointer, what the return address is, and where any callee-save registers that the function used were stashed. The unwinder can then walk the stack, restore the previous frame’s expected register set, and recurse.

                                With GCC and Clang, __attribute__((cleanup)) emits the same metadata that a C++ destructor would in these tables, so the code in the cleanup function is run during stack unwinding. You can use this to do things like release locks and deallocate memory in C if a callback function that you invoke from any language that has exceptions throws an exception through your stack frame. Note that this doesn’t let you catch the exception. There’s no major reason why you shouldn’t be allowed (from C) to block exception propagation, though if you need this then invoking a callback via a tiny C++ shim that does try { callback() } catch (...) {} will prevent any exceptions (even non-C++ ones that use the same base ABI) from propagating into the C code.

                          2. 2

                            Why does the deferred resource release not use __attribute__((cleanup))?

                            Because that is compiler-dependent, at least for now. I want something that would work where a newer compiler is not available. Also, the current mechanism in cedro allows code blocks with or without conditionals, which are more flexible than the single-argument function required by __attribute__((cleanup)), unless I misunderstand that feature. I’ve actually never used variable attributes in my own programs, only read about them, so I might be missing something.

                            The choice of auto for this also means that it cannot be used in headers that need to interoperate with C++ - was that an explicit design choice?

                            It wasn’t; the reason was to avoid adding more keywords that would be either prone to collisions or cumbersome to type. For use with C++, you could write the output of cedro to an intermediate file, which would be standard C. I’ll have to think about it in more detail to see how much of a problem it is in practice.

                            the way that you’ve lowered them is absolutely the worst case for compiler performance. The compiler needs to create a literal constant object for every byte and an array object wrapping them. If you lower them as C string literals with escapes, you will generate code that compiles much faster. For example, the cedro-32x32.png example lowered as “\x89\x50\x4E\x47\x0D\x0A\x1A…” will be faster and use less memory in the C compiler.

                            I did not realize that, you are right of course! I know there are limits to the size of string literals, but maybe that does not apply if you split them. I’ll have to check that out.

                            EDIT: I’ve just found out that bin2c (which I knew existed but haven’t used) does work in the way you describe, with strings instead of byte arrays: https://github.com/adobe/bin2c#comparison-to-other-tools It does mention the string literal size limit. I suspect you know, but for others reading this: the C standard defines some sizes that all compilers must support as a minimum, and one of them is the string literal maximum size. Compilers are free to allow bigger tokens when parsing.

                            I’m concerned that it would be a problem, because as I hinted above my use case includes compiling on old platforms with outdated C compilers (sometimes for fun, others because my customers need that) so it is important that cedro does not fail any more than strictly necessary when running on unusual machines.

                            Thinking about it, I could use strings when under the length limit, but those would be the cases where the performance difference would be small. I’ll keep things like this for now, but thanks to you I’ll take these aspects into account. EDIT END.

                            What is ‘the current directory’? The directory from which the tool is invoked? If so, that makes using your tool from any build-system generator annoying because they tend to prefer to expand paths provided to tool invocations to absolute paths to avoid any reliance on the current working directory. I don’t actually agree with the assertion here. On every codebase I’ve worked where I’ve wanted to embed binaries into the final build product, those binaries have been part of the source tree so that they can be versioned along with the source.

                            I see, that’s again something I’ll have to consider more carefully. I keep the binaries separated from the source code, but it would make sense to put things like vertex/fragment shaders next to the C source.

                            Thank you very much for your detailed review.

                            1. 1

                              Also, the current mechanism in cedro allows code blocks with or without conditionals, which are more flexible than the single-argument function required by __attribute__((cleanup)), unless I misunderstand that feature. I’ve actually never used variable attributes in my own programs, only read about them, so I might be missing something.

                              The attribute takes a function, the function takes a pointer. That’s sufficient to implement a closure. For example, you could transform:

                              int a;
                              int b;
                              auto a += b;
                              

                              Into something like this:

                              // These can be in a header somewhere
                              struct cedro_cleanup_capture
                              {
                                void (*destructor)(struct cedro_cleanup_capture *);
                                void *captures[0];
                              };
                              void cedro_cleanup(struct cedro_cleanup_capture **c)
                              {
                                (*c)->destructor(*c);
                              }
                              
                              // Generated at the top-level scope by Cedro
                              static void __cedro_destructor_1(struct cedro_cleanup_capture *c)
                              {
                                // Expanded from a += b
                                *((int*)c->captures[0]) += *((int*)c->captures[1]);
                              }
                              
                              ...
                              
                              int a;
                              int b;
                              // Generated at the site of the `auto` bit by cedro:
                              struct { void (*destructor)(struct cedro_cleanup_capture *); void *ptrs[2]; }
                                __cedro_capture_1 = { __cedro_destructor_1, { &a, &b } };
                              __attribute__((cleanup(cedro_cleanup)))
                              struct cedro_cleanup_capture *__cedro_capture_1_cleanup =
                                (struct cedro_cleanup_capture *)&__cedro_capture_1;
                              

                              Now you’ve got your arbitrary blocks in the cleanups. If your compiler supports the Apple blocks extension then this can be much simpler because the compiler can do this transform already.

                              It wasn’t; the reason was to avoid adding more keywords that would be either prone to collisions or cumbersome to type. For use with C++, you could write the output of cedro to an intermediate file, which would be standard C. I’ll have to think about it in more detail to see how much of a problem it is in practice.

                              The best way of doing this is to follow the example of Objective-C and use a character that isn’t allowed in the source language to guard your new keywords. A future version of C may reclaim auto in the same way that C++11 did, and some existing C code uses it already, so there’s a compatibility issue here. If you used $auto then it would not conflict with any future keyword or identifier.

                            2. 1

                              What is ‘the current directory’? The directory from which the tool is invoked? If so, that makes using your tool from any build-system generator annoying because they tend to prefer to expand paths provided to tool invocations to absolute paths to avoid any reliance on the current working directory. I don’t actually agree with the assertion here.

                              After thinking about it, my conclusion is that you are right, so I have changed the program: now the binary file is loaded relative to the including C source file.

                              1. 1

                                Thanks! What are you currently using this for? The place I would imagine it being most useful is for tiny embedded systems that have a C compiler but no C++ compiler. Firmware blobs that want embedding in the final binary are pretty common there.

                                1. 1

                                  Well, today I’m continuing work on source code manipulation tools using tree-sitter, which is a parser generator that outputs C parsers. I started with the Rust wrapper but some of the machines where I would like to run it do not have a Rust compiler, some because of the OS, others because of the CPU ISA.

                                  What I’m doing is exploring how much I can cut out dependencies and remain productive. Dependency hell is manageable for a full-time job, but for anything else, which I revisit only occasionally, it is not acceptable to get derailed because something that used to work was changed upstream and no longer does, and you cannot get back to a previous version because of a tangle of up-/downstream dependencies.

                                  The use case of resource-limited machines like microcontrollers and retrocomputers is also a goal: I hope to resume work in that respect soon; like many people, I have a bunch of such machines lying around waiting for me to find some time for them. The intention is that a simpler build chain should make that easier to do as a spare-time job.

                                  And then, binary includes are very useful for cutting down on dependencies even on modern machines. One example is simple GUI applications: by using nanovg and embedding the fonts and images, I can get an executable that does not require installation and depends only on glibc and the various libGL* and libglfw libraries, which works well in practice for me. I find this much easier to keep portable than using big, complex GUI frameworks, which I admit provide lots of difficult-to-implement features. For some programs, though, I find that my choice is not between minimal dependencies with spartan features and more dependencies with complete features, but between minimal dependencies and a non-compiling/non-running program.

                                  1. 1

                                    For pretty much anything I was using C for 10 years ago (including a Cortex-M0 with 8 KiB of RAM) I’m using C++. C++17 is available on any system with GCC or Clang support. The C++ standard library is sufficiently ubiquitous that it counts as a dependency in the same way that the C standard library does: it’s there on anything except tiny embedded things. It can easily consume C APIs and with modern C++ there are a lot of useful things for memory management and so on.

                                    1. 1

                                      I do see your point, and I’ve used C++ for decades and expect to keep using it in the future: the improvements in the last years after the stagnation period have made it much more comfortable to use.

                            3. 1

                              This is really great! I love it!

                              1. 3

                                Thanks, I would like to hear about your experience, positive or negative, once you get to try it out.

                            1. 2

                              The code snippet was entirely synthesized? Ouch…

                              It has undefined behavior, by the way. Dereferencing a void** is UB because pointers can be different sizes on the same architecture, depending on which type of pointer is underneath.

                              Doesn’t bode well for the rest of the article.

                              1. 3

                                It has undefined behavior, by the way. Dereferencing a void** is UB because pointers can be different sizes on the same architecture, depending on which type of pointer is underneath.

                                That doesn’t matter. void* has a single size (though this is implementation defined), and so dereferencing a void** that is the address of a void* is fine. Taking the address of a T* and casting it to void** then dereferencing it is unspecified and may be invalid if the representation of T* and void* are different. In particular, void* and char* must be able to store any kind of data pointer (to handle word-addressable machines where a char* may include a mask to indicate which offset within the word it refers to) and so these types may be wider than pointers to other types.

                                Their code is taking the address of a void* and then dereferencing the resulting void**, which is well defined.

                                1. 2

                                  The function is recursive, and r is created for those recursive calls by taking the address of an int*. So it’s still undefined.

                                  1. 2

                                    Their code is very difficult to follow, but the only things I see that look like int* appear to be the payload. The next pointers are void*. They do have an assumption that sizeof(void*) >= sizeof(int*) in the offset calculation, but that’s correct according to the standard.

                                    1. 1

                                      If I’m reading it correctly, the assumption is sizeof(void*) >= sizeof(int) (not sizeof(int*)). The list nodes store an int, followed by a void* which points to the next node in the list. In:

                                      *((int *)y2) = vx22;
                                      *((void **)y2 + 1) = y12;
                                      

                                      The first assignment (from vx22) is an int (the stored value in the node, copied from the original node), the second is void * (the copy of the list tail).

                                      I can’t recall off the top of my head whether the standard requires that sizeof(void*) >= sizeof(int) but it does typically hold.

                                      (edit: Incidentally, C doesn’t really allow for storing separate objects into a malloc’d area in this way. The language used in the standard re malloc is: The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object with a fundamental alignment requirement and then used to access such an object or an array of such objects in the space allocated (until the space is explicitly deallocated), Here what is being stored is two objects of distinct type, which is not an array nor a single non-array object. I consider this an oversight in the standard though, not a deliberate restriction).

                                    2. 1

                                      It’s convoluted and hard to follow, but the code doesn’t at any point take the address of an int *. I think you’re incorrect.

                                      1. 1

                                        It stores an int* into a void**. No, it doesn’t take the address, but an int* is not guaranteed to be the same size as a void**.

                                        1. 1

                                          It stores an int* into a void**

                                          I don’t think you’ve read it correctly. At no point does it store an int *.

                                          The closest it comes to what you’re saying is this line:

                                          *((int *)y2) = vx22;
                                          

                                          That’s storing an int, not an int *, and it’s storing it by casting a void * which is the result of a call to malloc into an int *, then storing via that pointer. There’s nothing wrong with that.

                                  2. 1

                                    The code snippet was entirely synthesized? Ouch…

                                    It wasn’t. That’s an example he provided of “by hand” code.

                                    1. 4

                                      It was.

                                      In fact, the code written above was actually entirely automatically synthesised by our tool, along with a proof certificate, formally guaranteeing its correctness.

                                      From the “Certified Program Synthesis to the rescue!” section, right under the void listcopy() picture.

                                      Also, the fact that a certificate exists for that code makes me doubt their tool.

                                      Also, big fan. I follow your blog as of recently.

                                      1. 2

                                        Mea culpa!

                                        1. 3

                                          No problem! I understand that it is a little buried, and like you, I want things like this to work.

                                          Also like you, I would love to see formal methods used more in the industry. I am currently learning TLA+, helped by your intro to it, and I am also learning Alloy. I would like to write a version control system that “doesn’t randomly delete chapters of a user’s book.” :)

                                  1. 15

                                    Modifying the microbenchmark by simply commenting out the 2nd memcpy shows that it’s not the 2 memcpys that make the “slow” path slow (compiled with gcc -O3):

                                    $ ./a.out 
                                    slow: 0.003500 microseconds per write
                                    fast: 0.000456 microseconds per write
                                    $
                                    

                                    Somewhat fantastically, that means that this line is the one responsible for the massive difference in time:

                                            const size_t part1 = min(MESSAGE_SIZE, BUFFER_SIZE - offset);
                                    

                                    I then tried compiling with clang (again with -O3), and:

                                    $ ./a.out 
                                    slow: 0.000000 microseconds per write
                                    fast: 0.000001 microseconds per write
                                    

                                    I suspect the loop gets optimised away entirely in this case (the buffer isn’t used afterwards, so why bother copying anything into it at all…).

                                    Playing around a little more shows some other interesting results: if I (with gcc) use the correct -march=haswell for my desktop and make the 2nd memcpy call conditional on whether it has anything to copy (if (part2)), the difference between fast and slow is significantly reduced:

                                    $ ./a.out 
                                    slow: 0.000809 microseconds per write
                                    fast: 0.000465 microseconds per write
                                    

                                    I think this goes to show the benchmark is bad; I think the fixed 32-byte message size is one part of the problem (compilers can convert memcpy into an unrolled operation using MMX or SSE instructions, for example). E.g. godbolt shows the memcpy as simply:

                                            movaps  %xmm1, 64(%rsp,%rax)
                                            movaps  %xmm0, 80(%rsp,%rax)
                                    

                                    … but only if it doesn’t have to do the part1/part2 check. That explains why that case is so much faster. If the message was variable-sized, this wouldn’t happen.

                                    1. 23

                                      C is called portable assembly language for a reason, and I like it because of that reason.

                                      People do keep calling C “portable assembly language”, and it continues to be wrong. Thinking that you understand exactly (or even roughly) what assembly will be generated for a particular piece of C code is a trap, one that leads to the nastiest and subtlest of bugs.

                                      C is not a portable assembly language.

                                      1. 2

                                        I feel like this is quoting a specific part of the text, which specifies that the “standard imposes no requirements” on how undefined behaviour is implemented, and then complaining that people interpreted this to mean that the standard imposes no requirements on how undefined behaviour is implemented.

                                        Compiler vendors are free to have, for example, signed integer overflow calculations actually wrap around as per 2’s complement arithmetic; some of them do; others have a flag which enables this.

                                        But if you store a value through an invalid pointer, potentially overwriting the program code itself (I know - it’s usually protected in modern operating systems, but the standard is meant to be more general than that), what is the “reasonable behaviour” that you could hope for? (The result of doing this also falls under “undefined behaviour”).

                                        Complaining about compiler vendor choices makes sense; complaining about the actual text of the standard makes sense; complaining about people interpreting the above text to mean exactly what it states (and even denigrating those people as “self-appointed experts”) is ridiculous.

                                        From the article:

                                        Parsing it out, the prevailing interpretation reads the passage as follows (and note that we have to delete the comma after “behavior” to make it work):

                                        behavior upon use of a nonportable or erroneous program construct, or of erroneous data, or of indeterminately-valued objects: The Standard imposes no requirements on this behavior

                                        No, this has deleted the “for which” part. The behaviour on use of a nonportable or erroneous program construct is “undefined behaviour” only if the standard imposes no requirements on it; that is the “prevailing interpretation”. Nobody is assuming that the standard both does and doesn’t proscribe behaviour on use of objects with indeterminate value for example.

                                        Under the prevailing interpretation “imposes no requirements” is the only operative phrase: the WG14 Committee ignores “use of a nonportable or …” from the text, and has built out a long, ad-hoc list of uses that produce “undefined behavior”

                                        The point is that that list describes program constructs which are nonportable, or erroneous, and for which the standard imposes no requirements. Not just that the standard imposes no requirements that all nonportable/erroneous constructs cause erroneous behaviour. Which is, if I understand correctly, what the author also believes, but they are saying that the prevailing interpretation is the latter, which I just don’t see. Eg of undefined behaviour (from the C99 standard):

                                        An object is referred to outside of its lifetime (6.2.4).

                                        That this is listed in an appendix titled “Portability issues” should be telling.

                                        The more constructive interpretation is that the intention of the first sentence was specify that “undefined behavior” was what happened when the programmer used certain constructs and data, not otherwise defined by the Standard.

                                        That is exactly the interpretation that is being taken, with the results that the article is decrying.

                                        Returning a pointer to indeterminate value data, surely a “use”, is not undefined behavior because the standard mandates that malloc will do that

                                        It is not in fact a “use”. This is “using”, if anything, the pointer value, not the value to which it points; the pointer value is not indeterminate (unspecified or a trap) since its value is specified by the specification for malloc.

                                        Further:

                                        Consider the shift example from Linux that Regehr mentioned. Under the constructive interpretation, compilers would have to choose some “in range” option in place of an “optimization” that doesn’t optimize anything:

                                        That would be imposing requirements, when the text specifically says that no requirements are imposed.

                                        ignore the situation, generate a “shl” instruction for x86 or the appropriate instruction for other architectures and let the machine architecture determine the semantics.

                                        “Ignoring the situation” means ignoring the situation where the shift count is greater than what is allowed, and therefore assuming that it is within the allowed range.

                                        1. 3

                                          I feel like this is quoting a specific part of the text, which specifies that the “standard imposes no requirements” on how undefined behaviour is implemented

                                          The argument being made here is that the spec as written should have been read more like (yes, this is not the wording it used, it is a rewrite to indicate the meaning):

                                          Undefined behavior — Undefined behavior occurs when both of the following are true:

                                          1. The program makes use of a nonportable or erroneous program construct, or of erroneous data, or of indeterminately-valued objects
                                          2. No other part of this Standard imposes requirements for the handling of the particular construct(s), data, or object(s) at issue.

                                          When only condition (1) above is met, but condition (2) is not, the situation is not undefined behavior and conformant implementations MUST NOT invoke their handling for undefined behavior.

                                          Conformant implementations MAY handle undefined behavior in any of the following ways, but MUST NOT handle undefined behavior in any other way:

                                          • Ignore the situation completely with unpredictable results
                                          • Behave during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message)
                                          • Terminate a translation or execution (with the issuance of a diagnostic message)

                                          This is wordier, but is the “plain English” meaning of the original standard as quoted. The entire argument here is that the rest of your comment, and of compiler implementations, is based on a willful misreading which redefined UB to be only item (1) above and separated out item (2) as the handling of it, despite the text explicitly not supporting and in fact contradicting this reading.

                                          The fact that it is contradicted is clear from the fact that the Standard does in fact impose requirements for how to handle undefined behavior, immediately afterward. Thus, the contention that the Standard intended to impose no requirements is unsupported by text.

                                          1. 1

                                            The entire argument here is that the rest of your comment, and of compiler implementations, is based on a willful misreading which redefined UB to be only item (1) above and separated out item (2) as the handling of it, despite the text explicitly not supporting and in fact contradicting this reading.

                                            But the rest of my comment doesn’t separate out item (2). I’m in full agreement that both (1) and (2) must apply for something to have “undefined behaviour”. If some behaviour is required, it is not undefined behaviour. That seems fundamentally obvious, and I don’t think anyone has really misunderstood that, despite OP’s claims.

                                            The problem I have is this part (of your re-wording):

                                            Conformant implementations MAY handle undefined behavior in any of the following ways, but MUST NOT handle undefined behavior in any other way

                                            I think it’s clear that is not the case in the C99 text, and I don’t think it was strictly the case from the C89 text either, even if you do ignore the contradiction with the previous paragraph (“imposes no requirements”). The C89 text:

                                            Permissible undefined behavior ranges from …

                                            … implies that there is a range of permissible behaviours, and that edges of this range are specified;

                                            Furthermore, one of the “permissible” behaviours:

                                            … ignoring the situation completely with unpredictable results, …

                                            … pretty much accurately describes the current behaviour of compilers in the presence of UB that the author is complaining about.

                                            So I feel like there are a bunch of things in the post that are wrong or at least contentious:

                                            1. The supposition that compiler vendors who implement UB in a way that author disagrees with have separated out items (1) and (2) from the definition of UB, as you break it down above – which I think is wrong;
                                            2. That the C89 text was not trying to say what the C99 text now does say – contentious;
                                            3. That “ignoring the situation completely” is a behaviour aligned with what the post author wants, rather than what compilers often now do – contentious.

                                            In particular, from the OP:

                                            What is conspicuously missing from this list of permissible undefined behaviors is any hint that undefined behavior renders the entire program meaningless

                                            … yet “with unpredictable results” captures this quite nicely, I think.

                                            1. 1

                                              No other part of this Standard imposes requirements

                                              I suddenly twigged to the significance of “other part” here. You are saying that no other part of the standard imposes requirements, but that this part does.

                                              While that makes the whole argument make a little more sense, I don’t think the text supports this; the actual text does not use the “other part” wording at all (it’s unorthodox to put normative requirements in the “terms” section anyway), and I think my other points still hold: the “permissible undefined behaviour” from C89 specifically includes an option which allows for “unpredictable results”, i.e. the “required behaviour” is not requiring any specific behaviour anyway.

                                              1. 2

                                                I suddenly twigged to the significance of “other part” here.

                                                Yes, that’s the intended reading. The very next bit of text imposes requirements for handling UB, so any attempt to read it as a declaration that the Standard intends to impose no requirements for handling UB is nonsensical and requires disregarding the text itself. The only possible logically-consistent interpretation of the “imposes no requirements” is thus as part of the definition of UB – that UB is behavior which meets both prongs (use of erroneous construct/data/etc. and one for which the Standard doesn’t otherwise impose requirements), not as a declaration of the rules for handling of UB (which is explicitly defined in the following section).

                                                Compiler implementers, as the OP explains in detail, have rammed their preferred reading all the way through to later versions of the Standard (which have had to do some grammatical gymnastics documented in the OP). That doesn’t mean it was the right reading. That doesn’t mean it should be the right reading.

                                                1. 1

                                                  The very next bit of text imposes requirements for handling UB

I don’t think that it does, though. “Ignoring the situation completely with unpredictable results” is one of the explicitly listed “permissible” behaviours, but is hardly prescriptive. And, the text describes a range of permissible behaviour, implying that the examples given may not be a complete set.

                                                  any attempt to read it as a declaration that the Standard intends to impose no requirements for handling UB

                                                  The reading is actually that if the standard imposes behavioural requirements on something, then it is not UB. It is not “if it is UB, the standard imposes no requirements”, it is “it is not UB unless the standard imposes no requirements”. That “2nd prong” isn’t being ignored as part of the definition of UB at all. That “the standard intends to impose no requirements” is a logical consequence of, but is not the actual full interpreted meaning of, that 2nd part.

                                                  The only possible logically-consistent interpretation of the “imposes no requirements” is thus as part of the definition of UB – that UB is behavior which meets both prongs (use of erroneous construct/data/etc. and one for which the Standard doesn’t otherwise impose requirements)

                                                  The problem is you’ve inserted the “otherwise”. The text actually says “for which this International Standard imposes no requirements”, and that necessarily includes the subsequent paragraph. I.e. for this interpretation to be logically consistent, you need to modify the text slightly.

Another logically-consistent interpretation is that the paragraph which “imposes requirements” is not actually intended to do so, which, given that it allows a range of behaviour (i.e. perhaps not fully specified) and that one of those “behaviours” is effectively anything at all (“unpredictable results”), doesn’t seem too unlikely.

(edit: and again, I note that this text is all in the “terms, definitions and symbols” section, which typically is definitive rather than prescriptive).

                                          1. 2

                                            How do other languages treat undefined behaviour? Is there any hope in formalizing a dialect of C (and C++) that requires compilers to fail when the program is invalid, instead of taking it as license to optimize the whole program away?

                                            My attempt at a summary (I’m no expert):

• Ada SPARK: If you disable the static verification and runtime checks that normally prevent this, there is a concept of bounded errors, where a value may be allowed to become invalid yet the program otherwise stays intact – defined error behaviour.
• Rust: The program won’t compile if it has memory errors. Integer overflow generates a panic in debug mode and is defined (as two’s-complement wrapping) in release mode.
                                            • Go: Nil dereference panics; integer overflow does not.
                                            • Zig:

                                              If undefined behavior is detected at compile-time, Zig emits a compile error and refuses to continue. Most undefined behavior that cannot be detected at compile-time can be detected at runtime. In these cases, Zig has safety checks. Safety checks can be disabled on a per-block basis

                                            1. 4

                                              Is there any hope in formalizing a dialect of C (and C++) that requires compilers to fail when the program is invalid, instead of taking it as license to optimize the whole program away?

No, this is impossible. Even CompCert, which is formally verified, does not make any guarantees when your input program contains undefined behaviour. The entire point of UB is to encode things that the implementation cannot guarantee that it can catch. There are basically three things you can do for this kind of behaviour:

• Restrict your language to disallow things that make analysis hard. For example, Rust does not allow two pointers to the same object except in very restricted situations (one is the canonical owning pointer and the others must have shorter lifetimes than it), and Java doesn’t allow pointer arithmetic at all and so disallows any kind of type or bounds errors that could result from it.
                                              • Require run-time checks, for example Java or C# require whole-program garbage collection.
                                              • Require complex static analysis.

The C spec says, for example, that using a pointer to an object after a pointer to the same object has been passed to free is UB. This means that the compiler may assume that any pointer that you dereference has not been passed to free and so is still valid. If you wanted C implementations to fail in this case then you’d need to either restrict what C is allowed to do with pointers (to the extent that the language is no longer very C-like), require a global garbage collector, or require whole-program data-flow analysis that gets sufficiently close to symbolic execution that it’s infeasible for any nontrivial C program.

                                              Or consider a simpler example, such as division by zero. This is UB in C because some architectures trap on division by 0, others give an invalid result. The compiler can potentially warn if it discovers that the value is guaranteed to be zero, but a compiler typically only learns this after inlining and multiple other optimisations, at which point it typically doesn’t have the information lying around to explain why the value is guaranteed to be zero (requiring compilers to record the sequence of transforms that led to a particular endpoint would increase memory usage by at least a factor of 10 and would probably still be incomprehensible to 99% of users). If you want the compiler to dynamically protect against this, it would need to insert a conditional move to replace the divisor with some other value to give a predictable outcome. Pony does this and defines N / 0 = 1, so just replaces the denominator by the numerator in a conditional move. That’s about the only fast thing you can do, but still adds some overhead unless the compiler can statically prove that the value won’t be 0.
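A Pony-style guarded division can be sketched in C. This is only an illustration of the idea described above (the name `safe_div` is mine, not Pony’s API); defining N / 0 = 1 directly lets the compiler emit a branchless conditional move on most targets:

```c
#include <assert.h>

/* Sketch of Pony-style total division: division by zero is defined
 * to yield 1 instead of being UB, so the operation never traps.
 * Typically compiled to a compare + conditional move, adding a small
 * fixed overhead unless the compiler can prove d is nonzero. */
static int safe_div(int n, int d) {
    return d == 0 ? 1 : n / d;
}
```

The cost is exactly the overhead the comment mentions: one extra compare/select on every division that the compiler cannot prove safe.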

                                              Or how about Linus’ favourite one: signed integer overflow. This one exists because some older machines didn’t have deterministic (and definitely not portable) behaviour on overflow but it turns out to be really useful for optimisation. If you are using int as the type of a loop induction variable and a < or > comparison for loop termination then the compiler needs to prove that overflow and underflow won’t happen to be able to usefully reason about this. This, in turn, feeds into autovectorisation (which isn’t enabled for the kernel) and increases the number of loops that can be vectorised.
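For code that wants overflow to be an explicit, checkable condition rather than UB, GCC and Clang provide overflow-checking builtins (this sketch assumes one of those compilers; `checked_add` is an illustrative wrapper name):

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* __builtin_add_overflow computes a + b with wrapping semantics,
 * stores the (possibly wrapped) result in *out, and returns true if
 * signed overflow occurred -- turning UB into a checked condition. */
static bool checked_add(int a, int b, int *out) {
    return __builtin_add_overflow(a, b, out);
}
```

Using such checks in a loop induction variable, of course, forfeits exactly the "overflow cannot happen" assumption that enables the optimisations described above.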

                                              1. 3

Rust does have a notion of undefined behavior; std::mem::transmute-ing something is UB outside of a few situations, and transmuting an immutable reference to a mutable reference is immediate UB. They even had to deprecate the old “give me some uninitialized memory” API in favor of MaybeUninit because let x: bool = std::mem::uninitialized() was UB.

                                                All of these require unsafe to do, but they’re still undefined behavior.

                                                1. 3

                                                  Is there any hope in formalizing a dialect of C (and C++) that requires compilers to fail when the program is invalid

                                                  Not really; consider that C is intended to be applicable to a wide range of architectures and operating systems, and is used when speed/efficiency are important (more so than safety in the presence of errors), and finally that for example a write through an invalid pointer is tricky to detect without a significant performance penalty (but can have an effectively unbounded range of possible behaviours).

That said, I believe Regehr was working on a “safer C” at some point. I think the effort foundered.

                                                  Edit: https://blog.regehr.org/archives/1180 and https://blog.regehr.org/archives/1287 are relevant links to Regehr’s work.

                                                1. 1

                                                  In my opinion:

                                                  The best way of handling POSIX signals, if you want to handle them at all, is to do so synchronously (even though the signals themselves are technically asynchronous). That means using an event loop that also supports detecting signal delivery.

                                                  (I have written such an event loop - http://davmac.org/projects/dasynq/ - in C++)

Catching signals, retrieving the siginfo_t data, and not discarding any signals is quite difficult to manage properly. I probably should do a blog post some time, but the key points are:

                                                  • most APIs which allow early termination (interruption) by signals actually execute the signal handler -
                                                  • which means you can’t use sigwaitinfo to extract the siginfo_t data synchronously; it’s already gone once the signal handler returns
                                                  • so, you have to store the data (from the signal handler) and retrieve it afterwards
• but, as this post also notes, there are a lot of things you can’t do from a signal handler, including allocation. So the storage has to be pre-allocated;
                                                  • which means you must make sure only one signal is handled, otherwise you may lose the data as one call to a signal handler overwrites the data stored by the previous call

This all turns out to be difficult to do in practice and really difficult to do portably. To the extent that I ended up actually jumping out of the signal handler (via siglongjmp, which restores the signal mask and thus avoids another signal handler being executed) to make sure I could get the signal data.

                                                  1. 2

                                                    even though the signals themselves are technically asynchronous

                                                    This is half true. Some signals, such as SIGIO, SIGINT, SIGUSR1, and so on are asynchronous. They are unordered with respect to anything else in the system and it’s fine to mask them and check them in your run loop (kqueue has explicit events for this, I think Linux’s signalfd can be used in the same way).

                                                    Others, such as SIGPIPE, SIGILL, SIGFPE, SIGSEGV, are synchronous. They are triggered by a specific instruction or system call. Of these, most need to be handled synchronously because the thread that triggered them cannot make progress without handling the condition. For example, SIGILL requires you to skip the offending instruction, emulate it, or kill the thread. SIGSEGV may require you to update page mappings to continue. You cannot defer these to your next run-loop iteration because you cannot get to the run loop without fixing the faulting instruction. SIGPIPE is something of an outlier here: you can make progress, but you probably want to handle the error at the point of the system call that triggered it (and not in the signal handler, which in most cases is just an annoying hold-over from ancient UNIX).

                                                    If you do defer handling of the signal, then you will not have access to the ucontext_t that is delivered with the signal (there is no even vaguely portable way of capturing this and on x86 the size of the signal frame can vary quite significantly between microarchitectures and even if you do capture it then it is full of values that refer to things that no longer exist). This means that you can’t do any of the interesting things that signals allow. Even capturing the siginfo_t can be dangerous because it can refer to file descriptors or POSIX IPC identifiers that are guaranteed to be live for the duration of the signal handler but may not exist by the start of the next run-loop iteration (or, worse, exist as identifiers but refer to different objects).

                                                    It’s quite unfortunate that signals are a single mechanism used for both synchronous and asynchronous events. Spilling the ucontext_t is quite expensive for the kernel and it’s almost never used for asynchronous events (it’s useful only when you want to do something like userspace threading). Similarly, the various mechanisms for polling for signals are completely useless for synchronous signals.

                                                    1. 1

                                                      SIGPIPE, SIGILL, SIGFPE, SIGSEGV, are synchronous. They are triggered by a specific instruction or system call. Of these, most need to be handled synchronously

                                                      Sure, but that’s not what the original article was discussing - capturing signal info and feeding it back to the main thread. These synchronous signals are the exception rather than the rule; most programs don’t handle them at all, though of course there are reasons for doing so.

                                                      1. 2

                                                        most programs don’t handle them at all, though of course there are reasons for doing so.

                                                        Most programs don’t handle any signals. I don’t have a representative sample set of the ones that do, but from the ones that I’ve seen (which tend to be language runtimes, emulators, or security features, so a fairly skewed sample set) it’s almost always the synchronous ones that people care about. The asynchronous ones were important 15 years ago, but there are now usually other mechanisms that are less painful to use for getting the same information. Any high-performance I/O-heavy system I’ve worked on has started by disabling signals.

                                                        1. 1

                                                          Most programs don’t handle any signals

But, again, there are some that do, and that’s what the article was discussing. I’m aware of at least a few that handle SIGINT to do a clean shutdown, and a number of daemons (smbd and sshd, for example) are receptive to SIGHUP (often as a “reload configuration” message). The dd utility responds to SIGINFO. Those are just examples I can think of off the top of my head. I’d be surprised if there weren’t a fair few programs that use SIGCHLD to detect child process termination as well, though it’s not always necessary.

                                                          I understand there are programs that care about the synchronous signals, but I think it’s mostly particular types of program that do (as you say: skewed sample set), and I don’t think the article was talking about these synchronous signals (it’s using SIGTERM as the example, and the technique it’s talking about isn’t useful for the synchronous signals, it loses the ucontext_t information).

                                                      2. 1

                                                        Oh and:

                                                        It’s quite unfortunate that signals are a single mechanism used for both synchronous and asynchronous events. Spilling the ucontext_t is quite expensive for the kernel

Linux’s signalfd does solve that; you don’t need to let a signal handler run to dequeue the signal. On BSDs with kqueue you can detect the signal and then use sigwaitinfo or sigtimedwait (the latter is actually necessary, with a zero timeout, to avoid waiting spuriously for a signal which was delivered but not queued, e.g. when a signal is collapsed into an already-pending signal). Unfortunately OpenBSD doesn’t have sigtimedwait, and the Mac OS implementation of kqueue has (or at least had) a bug where kqueue sometimes reported a signal some time before it was queued. It’d certainly be nice if there were a standardised mechanism.

                                                    1. 4

                                                      This is, I guess, a personal plug, but I want to point out that my own init/service manager, “dinit” (https://davmac.org/projects/dinit/), builds and runs fine on Alpine and I even have a VM image that boots Alpine with dinit. It doesn’t yet support external control of service state as this post discusses (to allow integration with network state or device presence for example) but that has been on the TODO list for a while and is high priority (when I do get time to work on it again - paid work is taking all my energy right now).

                                                      That said, S6(-RC) is a really well-engineered piece of software and I can see why Alpine are interested in using it.

                                                      1. 12

                                                        I have to wonder whether wide-spread adoption of Java applets might have led to an outcome qualitatively better than the modern web. I mean, the Java runtime was intended to be an application platform, whereas the web is a document delivery system abused and contorted to make do as an application platform.

                                                        1. 12

                                                          Except we had widespread adoption of java applets and the web platform turned out to be a better application platform. On the desktop we’re running VS Code (the web platform) rather than Eclipse (Java).

I wrote Java applets professionally in the 90s and then web apps. Even back in the pre-dynamic-HTML days, native web apps were better for most interesting stuff.

                                                          1. 4

                                                            we had widespread adoption of java applets

                                                            We did?

                                                            My memory isn’t what it used to be but I can’t remember a single instance of seeing this in the wild.

                                                            1. 4

                                                              I recall Yahoo using these for apps/games and what not.

                                                              1. 4

                                                                Not widespread like today where a large fraction of websites run JS on load. But I did run across pages here and there that would present an applet in a frame on the page, and you’d wait for it to load separately.

                                                                1. 4

                                                                  They were supported in all popular browsers. Java was widely taught and learned. There definitely were lots and lots of applets deployed but compared to the web they were bad for actually building the applications people wanted to use.

                                                                  1. 4

                                                                    I remember quite a few. Maybe you didn’t really notice them? Even today I occasionally run across a site with an applet that won’t load, especially older sites for demonstrating math/physics/electronics concepts. It also used to be a popular way to do remote-access tools in the browser, back when you couldn’t really do any kind of realtime two-way communication using browser APIs, but you could stick a vnc viewer in an applet.

                                                                    1. 1

                                                                      Aha; now that you mention it I do remember using a VNC viewer that was done as an applet, and also an SSH client. So I don’t think I ever used a Java applet on my own computer, but I did use a couple in university when I was stuck on a Windows machine and didn’t have anything better around.

                                                                    2. 3

                                                                      Runescape Classic xD

                                                                  2. 9

                                                                    I have to agree with ianloic. Applets just didn’t work very well. They weren’t part of the web, they were a (poorly designed) GUI platform shoehorned into a web page with very limited interconnection. And they were annoyingly slow at the time.

                                                                    Flash was a much more capable GUI but still badly integrated and not web-like.

                                                                    With HTML5 we finally got it right, absorbing the lessons learned.

                                                                    1. 8

                                                                      With HTML5 we finally got it right, absorbing the lessons learned.

                                                                      Now, instead of a modular design with optional complexity (user installs/loads given module only when needed), we have bloated web browsers consisting of 20+ millions lines of code with complexity that is mandatory often even for simple tasks (e.g. submit a form with some data, read an article or place order in an e-shop).

                                                                      1. 6

                                                                        Very strongly agree.

Back when Flash was widespread, it didn’t seem that well-liked - it was a delivery mechanism for overzealous advertising that jumped all over content. People were happy to embrace the demise of Flash because Flash was a liability for users.

                                                                        What we have today are synchronous dialog boxes that jump all over content which are very difficult to remove because they’re indistinguishable from the content itself. The “integration” has meant it can no longer be sandboxed or limited in scope. The things people hated about Flash have become endemic.

                                                                        The web ecosystem is not doing a good job of serving users today. I don’t know the mechanism, but it is ripe for disruption.

                                                                        1. 3

                                                                          Flash was also a delivery mechanism for games and videos that entertained millions, and educational software that probably taught more than a few people. If you think games, videos, and education beyond what flat HTML can provide are not “valid” that’s fine, but Flash filled a role and it served users.

                                                                          1. 3

                                                                            I didn’t mean to suggest that all uses of flash are not “valid”; if there was no valid use, nobody would intentionally install it. I am suggesting that it became misused over time, which is why Steve Jobs didn’t encounter too much resistance in dropping it.

But the real point from franta, which I strongly agree with, is that with a plugin model it was relatively easy for users to enable it when the content really needed it, and leave it disabled in other cases. Personally I had two browser installs, one with flash and one without. That type of compartmentalization isn’t possible with HTML5.

                                                                        2. 3

                                                                          Optional complexity is not the right choice in this context. Nobody wants to design an experience where most users are just met with complex plug-in installation instructions. One of the best parts of the HTML5 ecosystem is that it’s largely possible to make websites which work on most of the browsers your users are actually going to use.

                                                                          I agree that the complexity of “HTML5” is a problem. Maybe it would be nice to have two standards, one “simplified” standard which is basically Google’s AMP but good and standardized, and one heavy-weight standard. Simpler websites like news websites and blogs could aim to conform to the simplified standard, and simple document viewer browsers could implement only the simplified standard. But it definitely 100% wasn’t better when the “web” relied on dozens of random proprietary closed-source non-standard plug-ins controlled by single entities with a profit motive.

                                                                        3. 2

                                                                          I think that’s an overstatement. We haven’t gotten it right yet. Browser APIs are getting decent, but HTML+CSS is not a felicitous way to represent a UI. It’s a hack. Most everything to do with JavaScript is also a hack, although on that front we’ve finally started to break the “well, you have to write JS, or transpile to JS, because JS is the thing browsers have” deadlock with WASM, which finally offers what Java and Flash had a quarter century ago: compact bytecode for a fairly sensible VM.

                                                                        4. 4

                                                                          The biggest problem was that Java wasn’t integrated with the DOM. The applet interface was too impoverished.

                                                                          jQuery had a nice tight integration that was eventually folded into the browser itself (document.querySelector). And if you look at modern frameworks and languages like React/preact, Elm, etc. you’ll see why that would continue to be a problem.

                                                                          They use the DOM extensively. Although interestingly maybe the development of the virtual DOM would have been a shim or level of indirection for Java to become more capable in the browser.

                                                                          The recent Brendan Eich interview has a bunch of history on this, i.e. relationship between Java, JavaScript, and the browser:

                                                                          https://lobste.rs/s/j82tce/brendan_eich_javascript_firefox_mozilla

                                                                          1. 3

                                                                            It was in fact perfectly possible to manipulate the DOM from an applet (although at some level you did still need to have the applet visible as a box somewhere; I don’t think it was possible or at least frictionless to have “invisible applets”).

I would instead say the biggest problem was the loading/startup time; the JVM was always too heavy-weight; there was a noticeable lag while applets started up; early on it would even freeze the whole browser. There were also a lot of security issues; the Java security model wasn’t great (it was fine in principle, but very difficult to get right in practice).

                                                                            Now, funnily enough, the JVM can be much more light-weight (the “modules” effort helps, along with a raft of other improvements that have been made in recent JDKs) and the startup time is much improved, but it’s too late: applets are gone.

                                                                            1. 2

                                                                              I don’t think it was possible or at least frictionless to have “invisible applets”

It totally was. Make them 1x1 pixel and use CSS to position them off-screen. I have used that multiple times to give the webpage access to additional functionality via scripting (applets could be made accessible to JS).

                                                                              Worse: the applets could be signed with a code signing cert which gave them full system access, including JNA to FFI call into OS libraries.

                                                                              Here is an old blog post of mine to scare you: https://blog.pilif.me/2011/12/22/grave-digging/

                                                                              1. 1

                                                                                It was in fact perfectly possible to manipulate the DOM from an applet

                                                                                How? I don’t recall any such thing. All the applets I used started their own windows and drew in them.

                                                                                  1. 1

                                                                                    OK interesting. It looks like this work was done in the early 2000’s. I think it must have lagged behind the JS implementations but I’m not sure. In any case jQuery looks a lot nicer than that code! :)

                                                                              2. 2

                                                                                In that interview, Brendan noted that JavaScript was async, which helped adoption in a UI world. It’s true, it made it nearly impossible to block a UI on a web request.

                                                                                1. 3

                                                                                  Yes good point. IIRC he talks about how JavaScript was embedded directly in Netscape’s event loop. But you can’t do that with Java – at least not easily, and not with idiomatic Java, which uses threads. As far as I remember Java didn’t get async I/O until after the 2000’s, long after Javascript was embedded in the browser (and long after Python).

                                                                                  So yeah I would say those are two absolutely huge architectural differences between JavaScript and Java: integration with the DOM and the concurrency model.


                                                                                  This reminds me of this subthread with @gpm

                                                                                  https://lobste.rs/s/bl7sla/what_are_you_doing_this_weekend#c_f62nl3

                                                                                  which led to this cool experiment:

                                                                                  https://github.com/gmorenz/async-transpiled-xv6-shell

                                                                                    The question is “who has the main loop?” Who is allowed to block? A traditional Unix shell wants to block, because wait()-ing for any process is a blocking operation. But that conflicts with GUIs, which want to own the main loop.

                                                                                  Likewise Java wants the main loop, but so does the browser. JavaScript cooperates better by allowing callbacks.

                                                                                  When you have multiple threads or processes you can have 2 main loops. But then you have the problem of state synchronization too.

                                                                            1. 6

                                                                              Although the URL is indeed “wayland-on-wine-”, the title of the article is (or has been corrected to) “Wine on Wayland”, which actually makes sense.

                                                                              1. 6

                                                                                  The architecture.md file mentioned is quite long imo. I wonder whether documenting the code like a library would work (easier in some languages than others), and providing a list of entry points/interactions in a short architecture.md (I often struggle with finding where the work actually starts). In-source docs have much less chance of going stale.

                                                                                  Also, for those not bothered to check what exa -TD does: it prints a tree of just the directories, starting from the current one. Funny the author didn’t just say tree

                                                                                1. 5

                                                                                    The architecture.md file mentioned is quite long imo.

                                                                                    Mea culpa :) In my defense, rust-analyzer is a deep and complex project which sits closer to the 200k limit, so there’s a lot of ground to cover. And yes, pointing out entry points explicitly is a good idea; I’ve added a separate section for them, thank you!

                                                                                    I don’t think that in-source docs are a substitute for architecture.md though. They are good for explaining what you are looking at, but bad at helping you find where to look in the first place. In my experience, the central document is also, counter-intuitively, easier to keep up to date. I still forget to update the docs even if they are next to the code, but with a single document I at least have a workflow to check whether the docs are up to date. Finally, with inline docs you probably end up with an “atlas of fine-grained maps” rather than a “coarse-grained map”. An interesting idea to bridge the two approaches is to auto-generate the codemap out of specially-marked doc comments.

                                                                                  That being said, rust-analyzer also enforces “each module has a docstring” property, and you write about providing a list of entry points to find where the work actually starts, so I expect that we are more in agreement than not :)

                                                                                  1. 4

                                                                                    I agree with this!

                                                                                      In my Dinit project, it’s called DESIGN. It’s pretty brief and only succinctly covers a few things, but importantly it points out where to look in the source for various things; then there are large chunks of comments within various source files which give the details relevant to that particular source. The DESIGN file also spells out the design considerations and philosophy, and gives very high-level information as to how the software is put together.

                                                                                      It’s really hard to build up a picture of how a project is put together. An overview like this can save hours of having to slowly piece it together by trawling the source more-or-less at random. But a lot of information can still live in the form of comments within the source code. (Of course, they need to exist, and to be kept up to date.)

                                                                                  1. 2

                                                                                    There’s no garbage collector (although C++ allows for one, it is optional, and I’m not aware of any implementations that provide one).

                                                                                    TIL. Is there anywhere theorizing on what this might look like?

                                                                                    1. 1

                                                                                      I believe the idea is that it would work automatically, so code written for garbage-collected C++ would be normal C++ code but without the need to explicitly perform heap deallocations (via delete). A few requirements on pointer manipulation were added to support this and a few library functions were added to handle some special cases, but these would generally not be needed. As I said (I’m the author of the post) I’m not aware of any actual implementations of this. There is no requirement for implementations (compilers & runtime libraries) to support garbage collection.

                                                                                      This stackoverflow answer has a few more details if you’re still curious: https://stackoverflow.com/a/15157689/388661

                                                                                    1. 19

                                                                                      Author here! The title says 2.3x faster, but most runs showed much higher multiples (usually 10-14x faster than RC). I decided to use the least impressive result (2.3x faster than RC) until we can figure out how to benchmark more consistently.

                                                                                      We found this by wondering what really makes Rust code fast. It wasn’t the borrow checker, since the borrow checker is just a restriction, not a particular code pattern. But Rust’s idioms and the borrow checker influence us into patterns that happen to make code fast, such as:

                                                                                      • Using better allocation strategies (most often Vec)
                                                                                      • Isolation of memory between threads (except message passing and mutexes)
                                                                                      • Using generational indices instead of RC (and instead of GC, of course)

                                                                                      So we made a language that does these things directly, instead of forcing the user to do them via a borrow checker. Through some leaps of madness, we found a way to integrate generational indices directly into malloc (for now, a layer above malloc), as mentioned in the article. The result was quite surprising.

                                                                                      We’re pretty excited about what doors this opens. The next design (hybrid-generational-memory) adds static analysis like an “automatic borrow checker”, and uses regions to enable better allocation strategies. If all goes well, maybe it’ll be as fast as Rust itself, without the borrow checker’s complexity cost. We’ll see!

                                                                                      1. 4

                                                                                        Great results! Do you see any limitations with this approach?

                                                                                        1. 4

                                                                                          Hey soc! No user-visible limitations per se, but off the top of my head, there are two things that could cause some overhead in the final design:

                                                                                            • Releasing memory back to the OS is doable with mmap, but could be tricky and have unforeseen consequences; I’m sure there are mmap caveats that I don’t know of yet.
                                                                                            • Some objects that would have been on the normal stack are now pushed onto “side stacks” (which have generation numbers) which grow and shrink like the normal stack. The minor problem here is that the main stack then needs pointers to the locals in the side stack, instead of just being able to subtract a constant from %rbp. That load will probably be fast, but those pointers on the main stack use up some space and therefore some cache.

                                                                                          Other than those potential concerns, the approach seems pretty solid so far!

                                                                                          1. 6

                                                                                            I can see lots of potential issues, though it’s difficult to say how they’d impact real applications.

                                                                                            • you say “releasing memory back to the OS is doable with mmap”, but I don’t see how that could be right. The whole scheme (of checking the generation counter to ensure it’s correct, on each dereference) seems to rely on the allocation (and generation count) persisting indefinitely.
                                                                                            • while it may be faster than reference counting, one obvious drawback is that it doesn’t actually persist an object to which a reference is still held. That is, reference counting keeps the object alive while it is referenced; this scheme instead allows detecting when the reference is no longer valid. They are slightly different use cases, but that is important. Real reference counting allows objects to be kept alive solely by reference-counting links, which allows for dynamically allocated circular data structures, for example.
                                                                                            • the cost of checking on dereference is unlikely to always be irrelevant.
                                                                                              • the inability to ever truly free an allocated object block (i.e. to re-use it for an allocation of a different size) could lead to memory fragmentation, which will be problematic for some applications. Breaking allocations into fixed-size buckets will also lead to internal fragmentation, which is not without cost either. And further, how does this work for very large allocations?

                                                                                            I’d be wary of writing off those concerns when comparing to reference counting.

                                                                                            1. 3

                                                                                              Thanks for the reply! Great questions.

                                                                                              • Let’s say we’ve deallocated all the objects on 5 particular pages. We would re-map all of these pages’ virtual address space to a central page containing 0xFF which will (correctly) fail all generation checks, and we would never use that address space again (well actually, we can, with a certain other trick).
                                                                                                • If one requires shared ownership (keeping the object alive) then this approach would be a drawback. However, we rarely actually require shared ownership. I believe most C++ developers would agree that it’s very rare that unique_ptr is not sufficient for something. In my experience using C++ in industry and Vale at home, I’ve never come across a need for shared ownership (though we can contrive a rare case where it would be convenient, if we try hard). You can read more about this phenomenon at https://vale.dev/blog/raii-next-steps#emerging-patterns if you’re curious.
                                                                                              • You’re right, though it’s only a minor concern of mine because the stack is very likely to be hot in cache. I could be wrong about this impact, so I’m keeping an eye on it. EDIT: I think I actually misunderstood your remark, did I imply checking on dereference is irrelevant somewhere?
                                                                                              • Mimalloc and jemalloc segregate by size class already, so we’re not too worried about this. There are also two mitigations:
                                                                                                • Vale makes very good use of regions, which are wonderful for avoiding fragmentation, you can read more about this at https://vale.dev/blog/zero-cost-refs-regions.
                                                                                                • We could merge two partially-filled pages, using mmap, effectively compacting memory. We’ll wait until fragmentation proves to be a problem before we implement this though.
                                                                                              1. 4

                                                                                                Thanks for responding.

                                                                                                We could merge two partially-filled pages, using mmap, effectively compacting memory. We’ll wait until fragmentation proves to be a problem before we implement this though.

                                                                                                I am dubious about this. As well as adding complexity in the allocator, you could only safely merge two pages if the generation count of all live objects in each page was larger than the generation count of the corresponding dead object in the other page. And even if that weren’t the case, finding merge-able pages would already be non-trivial to do efficiently (plus, messing with page tables has a performance impact especially on multi-threaded applications, and you can’t atomically copy-and-remap, so you’d need some sort of stop-the-world strategy or a write barrier which again is not free).

                                                                                                  I believe most C++ developers would agree that it’s very rare that unique_ptr is not sufficient for something

                                                                                                I’m primarily a C++ developer and I do agree with that. But if you are comparing against reference-counting, it’s a different story, because the ability to manage cyclic data structures is one of the key benefits of RC.

                                                                                                I.e. I agree that it’s an uncommon use case, but when you’re directly comparing against reference counting I think you should be clear that reference counting can do something that generational references can’t.

                                                                                                Mimalloc and jemalloc segregate by size class already, so we’re not too worried about this

                                                                                                Ok, but I think you’ve overlooked what was probably the major concern I raised: how do you deal with very large objects, eg. large arrays? using buckets for such allocations will cause major internal fragmentation. If I remember correctly jemalloc (at least) stops using segregated-by-size buckets beyond a certain size.

                                                                                                I guess, again, it’s something of a corner case; but it’s something to consider regardless.

                                                                                                But - hey, it’s a neat enough idea, and maybe it will prove to be a pretty good strategy overall. I’ll be interested to see how it pans out in the long run!

                                                                                                1. 3

                                                                                                  Thanks for the great comment! What do you mean by “manage cyclic data structures”? Also, if one wants to implement reference counting in Vale, we actually can; the programmer would give an object a counter and an owning reference to itself, and make an Rc-like handle. (Yes, this possibility means that cyclical references are technically possible, in Vale, but they’re basically impossible to do accidentally, which is a nice balance I think.) It’s more expensive, but I’m fine with a very niche need (IMO) taking a bit more memory.

                                                                                                  Thanks for pointing out the large arrays concern, I forgot to address that.

                                                                                                  For arrays that are made of non-inline objects or inline immutables (chars, ints, or inline immutable structs), they are a fat pointer which contains:

                                                                                                  • A pointer into an allocation that came from regular malloc, with no generations.
                                                                                                  • A pointer to an 8-byte allocation that holds only the generation. This generation is separate, but serves as the sole generation for the entire array.
                                                                                                  • (of course) The length.

                                                                                                  The other case, arrays that are inline mutables, could either:

                                                                                                  • Look for a contiguous chunk of elements of the desired length, in the gaps of the existing allocations, similar to some malloc implementations today.
                                                                                                  • Allocate a new set of pages for this array.
                                                                                                    • This would seem to explode our use of virtual memory (which is a finite resource here), but we can use a trick I hinted at before (but didn’t explain, apologies). When we release a page to the OS, we can look for the “maximum generation” in all the entries for the page, and remember that on the side. Later, we can re-use that address space by going through and initializing all the generations to that previous “maximum generation” number.

                                                                                                    Regarding your first point, about being able to merge the pages only if all of one page’s generations are greater than the other’s, you’re totally right. This conundrum felt familiar, so I had to look back at my notes to figure out what mitigation I had for this, and I have written down some insane, feverish ramblings:

                                                                                                  • We can designate every page as only using either “even” or “odd” or “any” generations. Pages start as even or odd, and we can only merge an even- and an odd-generationed page together, to form an “any generation” page. We would also set a “minimum” generation (which is the maximum of both pages’ generations at merge-time), and any new generations would be at least that.
                                                                                                    • We could extend this logic to more than just two (even and odd) partitions. Maybe 10? Maybe with some cleverness with prime numbers, any amount?

                                                                                                  Regarding finding mergeable pages, yeah, that’s pretty difficult. We were exploring this the other day in the discord, and came up with an interesting heuristic: if A is the number of allocations in the first page, and B in the second, and there are 256 slots in each, they will most likely be able to merge if A * B < 44. It seems pretty low, but maybe it’s not. In a given page, most objects are probably short-lived. But even if it doesn’t pan out, that’s okay, since merging partial pages was just a nice-to-have; most allocators don’t do that today anyway.

                                                                                                  Hope that helps, and let me know if there’s any other shortcomings you can find! (and let me know if you see any improvements, too)

                                                                                                  1. 1

                                                                                                      You’ve obviously thought about this very thoroughly, which is great. I’ve gotten a lot of interesting details out of reading your responses, thanks.

                                                                                                    What do you mean by “manage cyclic data structures”?

                                                                                                    I meant, maintain a heap-allocated cyclic data structure, a structure whose nodes (potentially) contain reference cycles, such as a graph. Each node would be part of the graph and wouldn’t necessarily have an owner outside the graph, but it’s not possible (or at least, it’s difficult) to structure it so that each node is owned by another node. Generational references alone won’t keep graph nodes alive, and you don’t have owning references for all nodes, so what do you do? (there are solutions, but I don’t think there’s any as nice as just having reference counting references, though even those come with the problem that you risk memory leaks if you do have reference cycles and don’t take steps to break them).

                                                                                                    Re multithreading: IIRC, each page is managed by one thread (if an allocation travels to another thread, dropping the owning ref will send it back to its original thread for freeing)

                                                                                                      My main point about multithreading was that, if you have references (of any kind) to an object from multiple threads, and at least one of those references allows mutation of the object, and you want to relocate that object to another physical page (eg as part of the page merging you mentioned), you have to make sure that no thread with a mutation-capable reference writes to the object just after you copied it to the new page but just before you updated the page tables (via mmap). So you’d need a write barrier of some kind, or you need to suspend all threads (that might have a mutation-capable reference).

                                                                                                    Now, if only the owning reference can do mutation, maybe that’s not such a problem. Apologies, I’m not really familiar with the Vale language, so I don’t know if this is the case.

                                                                                                    And of course if multiple threads can mutate an object you may well have a mutex built into the object anyway, in which case the solution could be just to lock it before copying it to the new page and unlocking after (at the risk of allowing another thread to stall your memory management by holding the mutex).

                                                                                                    Regarding your first point, about being able to merge the entries only all of one page’s generations are greater than the other, you’re totally right

                                                                                                      Actually I may be missing something, but that’s not quite what I meant. As I see it, it’s only necessary that the live object in a particular “slot” in one page has a greater generation number than the dead object in the same slot in the other page (this must be true for every slot). If both slots contained dead objects you would preserve the highest generation number of the two, but I don’t see why that would need to be from the same page every time.

                                                                                                    Eg merging two pages like:

                                                                                                    | page #1    | page #2    |
                                                                                                    |------------+------------|
                                                                                                    | live (3)   | dead (2)   |
                                                                                                    | dead (4)   | live (7)   |
                                                                                                    | dead (4)   | dead (7)   |
                                                                                                    | dead (7)   | dead (4)   |
                                                                                                    

                                                                                                    Would produce:

                                                                                                    | Merged     |
                                                                                                    |------------|
                                                                                                    | live (3)   |
                                                                                                    | live (7)   |
                                                                                                    | dead (7)   |
                                                                                                    | dead (7)   |
                                                                                                    

                                                                                                    In each case, live references still refer to the same live object with the same generation count, and dead references now refer to a dead slot with a same-or-higher generation count or to a live slot with a higher generation count (and so they can be identified as dead references).

                                                                                                    But even if it doesn’t pan out, that’s okay, since merging partial pages was just a nice-to-have;

                                                                                                    Yes, fair enough. I think it would probably be do-able but at a cost to performance. I guess you could delay at least some of that cost until merge-time, but overall I suspect you’d incur a massive increase in allocator complexity for a fairly minimal benefit.

                                                                                                    Anyway, thanks, this has been enlightening! You should consider writing up the whole thing in detail once the implementation is complete (I’m assuming it’s not quite there yet). A lot of the details in the discussion we’ve had are really interesting.

                                                                                                  2. 3

                                                                                                    Seems I missed some parts of your post!

                                                                                                    Re multithreading: IIRC, each page is managed by one thread (if an allocation travels to another thread, dropping the owning ref will send it back to its original thread for freeing), which lets us do a lot more operations without locks or atomics, and we can batch them to reduce write barrier waits. I haven’t done much more thinking in this area, so any input you have here is welcome!

                                                                                                    Re mmap expense: I think we do the same amount of page manipulations as existing strategies, so it should be the same. Unless a remapping is a more expensive call than plain releasing to the OS, at which point we’ll have to adjust that optimization.

                                                                                                    Re large objects: for example if an object is larger than the largest size class (128B) then the struct will use multiple entries in the 128B* bucket, leaving bytes 128-136, 256-264, etc untouched.

                                                                                                    *We can actually put it in whichever bucket will minimize wasted space.

                                                                                                    I appreciate your comments, you’re exploring this much further than anyone else has!

                                                                                              2. 2

                                                                                                  I’ve seen this behaviour in our own workloads: at some point there is a very rare spike in usage (GBs), and then the work is finished and all the memory is freed. However, the process continues to own that memory, since free() does not hand it back to the OS. IMO, this is a kind of a leak.

                                                                                                1. 1

                                                                                                  Nevermind, we just solved that second drawback =) we can group all a block’s locals into a single allocation, as if they were members of a struct. Now we only have pointers to blocks. (Also, this may accidentally enable really easy green threads.)

                                                                                                  1. 1

                                                                                                    Isn’t this how some languages implement closures and nested functions?

                                                                                                    1. 2

                                                                                                      Yes! That’s right. That’s a much better way of thinking about it.

                                                                                                      (Though Vale is a bit weird, itself implements closures by individually “boxing” fields, and then the closure structs will point at the boxes. This lets us compile to JS and JVM losslessly, and with some hints to the LLVM backend, it can be zero cost.)

                                                                                            1. 1

                                                                                              That temporary is used to construct the parameter value s in the call to set_s. The argument to the constructor of this s is a temporary – so it’s of type string &&.

                                                                                              Is it true that all temporaries constructed by the compiler are available as r-value references (or are they just r-values)? Is there a part of the standard in C++11 that guarantees this?

                                                                                              1. 3

                                                                                                Yes, temporaries are rvalues by definition, since they don’t have a “name” — they’re not held by or pointed to by a variable. That means a temporary can safely be destroyed (moved out of) as part of the expression, since there’s no way to observe its value afterwards.

                                                                                                (I am not a licensed C++ guru. But I use rvalues a lot.)

                                                                                                1. 2

                                                                                                  as r-value references (or are they just r-values)?

                                                                                                  If you have an rvalue, you have an rvalue reference (or at least, you can get one for free). I’m not great with the terminology but I think of the temporary itself as being the rvalue. An rvalue reference can bind to such a temporary. For example, int &&r = 5 + 6; compiles just fine (and does pretty much what you’d expect; as a bonus, the lifetime of the temporary is extended to the scope of the reference, so you don’t immediately get a dangling reference).

                                                                                                  1. 1

                                                                                                    For example, int &&r = 5 + 6; compiles just fine (and does pretty much what you’d expect;

                                                                                                    That’s a really weird line of code. R-value references are usually for function arguments, where temporaries are natural. Honestly, I had no expectations whatsoever what would happen.

                                                                                                    However, it seems to only work for local variables. In this code b is an l-value reference:

                                                                                                    #include <stdio.h>
                                                                                                    
                                                                                                    struct a {
                                                                                                        int&& b;
                                                                                                    
                                                                                                        a(int&& c) : b(c) {}
                                                                                                    };
                                                                                                    
                                                                                                    int main() {
                                                                                                        a a(5+6);
                                                                                                        printf("%d\n", a.b);
                                                                                                    }
                                                                                                    

                                                                                                    And so it doesn’t compile (assigning an r-value to an l-value reference).

                                                                                                    1. 1

                                                                                                      I had no expectations whatsoever what would happen.

                                                                                                      I meant that it assigns r a reference to a temporary object (which is created via the expression 5 + 6), just as it reads.

                                                                                                      However, it seems to only work for local variables. In this code b is an l-value reference:

                                                                                                      No, b is declared as an rvalue reference: int&& b; - just as c.

                                                                                                      However, referring to c (or to b for that matter) still gives an lvalue (otherwise any use would perform a move). You need to explicitly std::move a value to get an rvalue reference:

                                                                                                    #include <utility>   // for std::move
                                                                                                      ...
                                                                                                          a(int&& c) : b(std::move(c)) {}
                                                                                                      

                                                                                                      The same would be true if c was a local variable; there is nothing special about local variables in this regard.

                                                                                                The difference between something declared an rvalue reference (&&) vs an lvalue reference (&) is what it can bind to: an rvalue reference can bind to an rvalue, while a non-const lvalue reference can’t.

                                                                                                1. 2

                                                                                                  I have the week off work so I’m taking some time to push my init/service manager, Dinit (http://davmac.org/projects/dinit/), closer to completion. I made a release earlier today and while it’s still pre-1.0, I’m finally happy calling it “alpha” - a pretty big milestone, since I’ve worked on it for many years (“real life” keeps getting in the way).

                                                                                                  1. 14

                                                                                            Rust seems like a cult at this point. It seems that in every post there is always someone commenting “you know, in Rust we do it like this”, or “how funny, this reminds me of that one time, at Rust camp, when…”

                                                                                            I’m glad so many people have found their passion, but it is funny nonetheless to see how “intense” Rust fans can be.

                                                                                                    1. 11

                                                                                                      Rust does an excellent job alienating a decent number of C++ programmers by constantly hijacking their posts.

                                                                                              I wish people would also stop wrapping C++ up in discussions about C. They’re not the same language, and each has its own good and bad points. That table’s wrong: C++ is type safe, C++ has implicit typing with auto, and you can do “dynamic” type checking (either with templates or std::any).

                                                                                                      1. 7

                                                                                                        C++ is type safe

                                                                                                        What meaning of “type safe” do you have in mind here? In C++ the type system is more of a suggestion than a strict enforcer. Let’s see what Wikipedia has to say about it:

                                                                                                        First definition:

                                                                                                        In computer science, type safety is the extent to which a programming language discourages or prevents type errors. A type error is erroneous or undesirable program behaviour caused by a discrepancy between differing data types for the program’s constants, variables, and methods (functions), e.g., treating an integer (int) as a floating-point number (float).

                                                                                                        Second definition:

                                                                                                        Vijay Saraswat provides the following definition: “A language is type-safe if the only operations that can be performed on data in the language are those sanctioned by the type of the data.”

                                                                                                Both are demonstrably not true for C++ with its const_cast, reinterpret_cast, normal casts between different pointer types, lack of array bounds checking, and lack of memory safety, which results in values being overwritten by other data and basically any violation of the type system you can think of.

                                                                                                        Now, Rust also isn’t type-safe. Safe Rust is type-safe. Unsafe Rust isn’t, but even unsafe Rust is more type-safe than C++.

                                                                                                        1. 4

                                                                                                          In C++ the type system is more of a suggestion than a strict enforcer

                                                                                                          I can’t comprehend how that could be true, other than that it allows explicit casting, which many languages (that are generally considered type-safe) do. One typical difference is that C++ is not memory safe, and such a cast may have or result in undefined behaviour rather than a run-time error as a result. But memory safe and type safe are not the same thing.

                                                                                                          Let’s see what Wikipedia has to say about it:

                                                                                                          The first definition is bogus, and the second is met by C++, excluding casts; it won’t let you call a member function that doesn’t exist, it won’t let you add two things that have no suitable operator overload defined, etc. That’s what’s usually meant by type-safety: if the code compiles and contains no type casts, you can’t get a runtime type error. Casts do break this, of course, but if you use a cast you’re explicitly overriding the type system.

                                                                                                          It may do a few implicit conversions if necessary, which is an example of weak typing, but not of lack of type safety.

                                                                                                          (One problem, admittedly, is that there is no very good agreement on what “type safety” really means, as can be seen in the wikipedia article you refer to. But it irks me that people point at C++ and say it lacks type safety because of XYZ, and they are really talking about memory safety or implicit conversion).

                                                                                                          no array bounds checking and lack of memory safety

                                                                                                          That’s nothing to do with type safety.

                                                                                                          1. 4

                                                                                            Without memory-safety there is no type-safety, as you can modify any object representation to anything you want. You don’t need casts in C++ to violate type-safety in this way; use-after-free, out-of-bounds array access, or use-after-referenced-object-went-out-of-scope will do.

                                                                                                            I can’t comprehend how that could be true, other than that it allows explicit casting, which many languages (that are generally considered type-safe) do.

                                                                                                            Which languages are you talking about and in what way does casting in them violate type-safety? Note that type-casting alone isn’t a problem in C++, only casting of pointers.

                                                                                                            1. 2

                                                                                                              in what way does casting in them violate type-safety

                                                                                                              It doesn’t, that was the point I was making.

                                                                                                              You have certain types of cast available in C++ for which using them breaks universal type safety (as you define it at least). But, they’re clear markers in the code, and they indicate a choice of the programmer to override the type system. Saying “the type system is more of a suggestion than a strict enforcer” is true for C++ only in the same way as it is for Rust, which has an “unsafe” keyword and allows casts; that is, it’s not really true at all. Type-checking in C++ is generally quite stringent.

                                                                                                              even unsafe Rust is more type-safe than C++

                                                                                                              How would you even measure it? If “without memory-safety there is no type-safety” – your own words – then both unsafe Rust and C++ have no type-safety, and unsafe Rust can’t be “more” type-safe than C++.

                                                                                                              I think Rust is pushing language development in a good direction, but this kind of “Rust good, C++ bad!” clamour is not constructive.

                                                                                                              1. 2

                                                                                                                You have certain types of cast available in C++ for which using them breaks universal type safety (as you define it at least). But, they’re clear markers in the code, and they indicate a choice of the programmer to override the type system.

                                                                                                That’s only true for casts though. I’ve already pointed out many other aspects of the language which don’t require any casts, and yet violate type-safety. You can’t grep for out-of-bounds array access, use-after-free, or use-after-referenced-variable-went-out-of-scope the way you can for unsafe.

                                                                                                                How would you even measure it? If “without memory-safety there is no type-safety” – your own words – then both unsafe Rust and C++ have no type-safety, and unsafe Rust can’t be “more” type-safe than C++.

                                                                                                                I need to do more in unsafe Rust to e.g. go against the borrow-checker. It’s not enough to just write unsafe, I also need to convert some references into pointers. In C++ there is no equivalent thing that would stop me from violating the ownership model.

                                                                                                                I think Rust is pushing language development in a good direction, but this kind of “Rust good, C++ bad!” clamour is not constructive.

                                                                                                There are good and bad language design decisions. C++ had plenty of the latter. Is pointing out the bad decisions inconstructive? Maybe in the sense that it doesn’t advance the state of C++. But I think it’s constructive in the sense that people should be aware that it’s not the only option in the niche that C++ fills, and there are alternatives you can switch to. C++ continually disappoints me both in ergonomics and safety, and its committee never disappoints on the front of making it even worse than it already is. So yes, it’s not constructive for C++. I don’t have a problem with that. Pointing out those flaws also serves another constructive purpose: a warning not to repeat C++’s faults in other languages. Let’s learn from the past.

                                                                                                                1. 1

                                                                                                                  C++ had plenty of the latter. Is pointing out the bad decisions inconstructive?

                                                                                                                  No. But bold assertions like “In C++ the type system is more of a suggestion than a strict enforcer” are unconstructive. Assertions like “unsafe Rust is more type-safe than C++” are unconstructive if you don’t clarify what you mean by “more type-safe”.

                                                                                                                  How would you even measure it?

                                                                                                                  I need to do more in unsafe Rust to e.g. go against the borrow-checker

                                                                                                                  You’re missing my point. You vacillate between treating type-safety as a binary quantity (“C++ is not type-safe”) and as a spectrum (“unsafe Rust is more type-safe than C++”). Pick one!

                                                                                                                  1. 1

                                                                                                                    Agreed. I was being ridiculous. Too much time spent on the internet.

                                                                                                            2. 1

                                                                                                              it allows explicit casting, which many languages (that are generally considered type-safe) do.

                                                                                                              Please name one language that allows unsafe typecasts but is considered type-safe.

                                                                                                              That’s what’s usually meant by type-safety: if the code compiles and contains no type casts, you can’t get a runtime type error.

                                                                                                              Most people’s definition of type safety is not the same as yours; runtime type errors are an example of type safety as the language doesn’t allow values of a type to be used incorrectly. Compile-time vs runtime type-checking has nothing to do with type safety. You seem to be confusing static typing with type safety.

                                                                                                              1. 1

                                                                                                                Please name one language that allows unsafe typecasts but is considered type-safe.

                                                                                                                I didn’t say “unsafe typecasts”.

                                                                                                                If you want to make the argument that it’s the unsafe casts that make C++ not typesafe, I’m somewhat ok with that. But do casts in general make it not typesafe? No. Implicit conversions? No.

                                                                                                                Most people’s definition of type safety is not the same as yours; runtime type errors are an example of type safety as the language doesn’t allow values of a type to be used incorrectly

                                                                                                                Definitions are problematic here, as I already noted. By your definition, I don’t think you can have a memory-unsafe language which is type-safe at all (depending on what exactly you mean by “used incorrectly”, I suppose). The wikipedia article also essentially defines “type safe” and “memory safe” in a way that you can’t have one without also having the other. That seems pointless to me, but ok, maybe it’s the more accepted usage.

                                                                                                            3. 1

                                                                                              C++ does implicit conversions of primitives; that doesn’t mean that it’s not type safe. You can see this with the restrictions on narrowing conversions, where something like std::vector<int> v = { 1, 2, 3.0 }; gives an error, and also with how function overloads with various primitives work. This is also visible at the function level because C doesn’t allow function overloading, whereas C++ does, because the types (integer, float, etc.) are part of the symbol put in the binary.

                                                                                                              Rust even allows normal casts between pointer types using as. This is actually a difference between C and C++ in that C++ requires the cast.

                                                                                                              reinterpret_cast in C++ is used for the same purposes as std::mem::transmute in Rust, for most of the same reasons, like reinterpreting binary data from a file or network as a struct.

                                                                                                              I think I’ve only ever seen one actual use of const_cast in 5 years and about 20 million lines of code.

                                                                                                              1. 4

                                                                                                                C++ does implicit conversions of primitives, that doesn’t mean that it’s not type safe.

                                                                                                                It doesn’t matter whether they are implicit.

                                                                                                                Here is a perfectly valid C++14 program demonstrating problems with casts:

                                                                                                                #include <iostream>
                                                                                                                #include <string>
                                                                                                                
                                                                                                                int main() {
                                                                                                                    const std::string hello = "hello";
                                                                                                                
                                                                                                                    float* fello = (float*)(void*)const_cast<std::string*>(&hello);
                                                                                                                
                                                                                                                    *fello /= 0.0;
                                                                                                                
                                                                                                                    std::cout << hello;
                                                                                                                
                                                                                                                    int x = 5;
                                                                                                                
                                                                                                                    double* y = (double*)&x;
                                                                                                                
                                                                                                                    std::cout << *y;
                                                                                                                }
                                                                                                                

                                                                                                                Another one, no casts, no bounds-checking:

                                                                                                                #include <iostream>
                                                                                                                #include <string>
                                                                                                                
                                                                                                                int main() {
                                                                                                                    const std::string hello = "hello";
                                                                                                                    volatile int a[] = {1, 2, 3};
                                                                                                                
                                                                                                                    for (int i = 0; i <= 3; i++) {
                                                                                                                        a[i] = 0xff;
                                                                                                                    }
                                                                                                                
                                                                                                                    std::cout << hello;
                                                                                                                }
                                                                                                                

                                                                                                                What do they have in common? They demonstrate that the language is not type-safe (they violate the semantics of the std::string type). Similar effects can be achieved with use-after-free, referring to variables whose references were captured in e.g. a lambda, but whose scope already ended.

                                                                                                                Rust even allows normal casts between pointer types using as.

                                                                                                                reinterpret_cast in C++ is used for the same purposes as std::mem::transmute in Rust, for most of the same reasons, like reinterpreting binary data from a file or network as a struct.

                                                                                                You can do neither in safe Rust. All of C++’s footguns, on the other hand, are at your disposal at all times. And you can’t just grep for them in an inherited codebase.

                                                                                                The Wikipedia page provides a clear, objective definition of a type-safe language. It isn’t a language that merely allows you to write programs that don’t violate type safety. It’s a language that prevents you and your collaborators from violating type safety, so you may sleep well, without the paranoia that maybe a string is not a string.

                                                                                                                1. 2

                                                                                                                  Type-casts are explicitly there to subvert type-safety. Rust has equivalent constructs, and I can write equivalent programs in Rust, if you allow me the “unsafe” keyword.

                                                                                                                  Native arrays in C++ are of course flagrantly unsafe. So is pointer arithmetic. Both are heavily discouraged in modern C++.

                                                                                                                  So I will disagree with you that Rust is type-safe; only the subset without “unsafe” is. I agree C++ is unsafe, but the unsafety can be avoided by avoiding certain language features, which can be enforced as part of a style guide. (Unlike C, where no style guide can save you.)

                                                                                                                  (I do think “unsafe” is a brilliant feature, and I wish other languages had it; I’m looking at you, Nim.)

                                                                                                                  1. 2

                                                                                                                    Native arrays in C++ are of course flagrantly unsafe. So is pointer arithmetic. Both are heavily discouraged in modern C++.

                                                                                                                    They are discouraged in favour of std::vector with operator[] which has the same problem.
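
To make the parallel concrete: operator[] on a vector is as unchecked as a native array, while at() is the checked alternative that throws instead of invoking undefined behaviour.

```cpp
#include <vector>
#include <stdexcept>

bool out_of_range_is_caught() {
    std::vector<int> v{1, 2, 3};
    // v[3] would be undefined behaviour, silently, exactly like
    // indexing past the end of a native array.
    // v.at(3) performs the bounds check and throws instead.
    try {
        (void)v.at(3);
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```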

                                                                                                                    So I will disagree with you that Rust is type-safe; only the subset without “unsafe” is.

                                                                                                                    If you check my comment again, you will see that I explicitly said that Rust isn’t type-safe, only safe Rust (as in without unsafe code) is.

                                                                                                                    1. 1

                                                                                                                      Hm, I’d forgotten that vector isn’t range-checked by default. I use the libc++ feature that turns on range checking, plus the Clang address and UB sanitizers in test builds.

                                                                                                                      1. 1

                                                                                                                        At that point you can opt into bounds-checking in the compiler and then classic arrays are just fine. You’ll gain lighter syntax and faster compilation as well. Unless you want growable arrays, but that’s a different scenario.

                                                                                                                        1. 1

                                                                                                                          I’m not aware of a compiler flag for bounds-checking C arrays … ? It seems of limited use because, if you pass an array as a function parameter, it decays into a pointer with no length information.
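
The decay can be shown directly: inside the callee, sizeof sees only the pointer, so the length information visible at the call site is gone.

```cpp
#include <cstddef>

// Only a pointer arrives here; sizeof(p) is the size of the pointer
// itself, not of the array it decayed from.
std::size_t size_seen_by_callee(const int* p) {
    return sizeof(p);
}

// At the definition site the length is still part of the array's type.
std::size_t size_seen_by_caller() {
    int arr[16] = {};
    return sizeof(arr); // 16 * sizeof(int)
}
```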

                                                                                                                          1. 2

                                                                                                                            g++ -fcheck-pointer-bounds -mmpx

                                                                                                                            It appears to work with arrays converted to pointers as well, but there may be cases where it does not; I would have to check the details.

                                                                                                                            UPDATE: The above works only on x86 Linux, though. But after a bit of searching I found this piece on Wikipedia, which says that you can enable bounds-checking in the STL with the _GLIBCXX_DEBUG=1 or _LIBCPP_DEBUG=1 preprocessor constants. That will be more portable. I’m glad I found out about it. Less stress debugging C++ memory issues in the future.

                                                                                                                2. 1

                                                                                                                  Implicit conversions have nothing to do with type safety. JavaScript will implicitly convert a float to a string but is a type-safe language. You can work around C++’s type system which means it isn’t type-safe. Rust’s std::mem::transmute is also not type-safe and can’t be used in safe Rust code.

                                                                                                            4. 2

                                                                                                              Maybe explore why this is the case?

                                                                                                              1. 6

                                                                                                                If I had to guess, it’s desperation to validate the time they’ve spent climbing the absurd learning curve. I see so many Rust programmers who only have personal projects to show, and bash away with Python/Go/Whatever at their jobs.

                                                                                                                I’d never even consider Rust for anything serious. It’s way too difficult to learn, and there’s way too many low-quality user conversations that I’ve read. It’s hard to take it seriously.

                                                                                                                I think Zig will eventually find a place in low-level systems programming, and thank God for that. It’s by no means perfect, imo, but it’s way easier to learn than Rust.

                                                                                                                1. 12

                                                                                                                  Your comment reads like your issues with Rust are that (i) it’s not mainstream, and (ii) you’re not familiar with it.

                                                                                                                  I’ll just note that C++ is probably even harder to learn than Rust. Except most don’t even realise it, because on average, they’re much more familiar with C++ already.

                                                                                                                  1. 5

                                                                                                                    I’ll just note that C++ is probably even harder to learn than Rust.

                                                                                                                    I agree. I think there’s a misconception that Rust is harder to learn than it is, and that comes from the strong shift it forces you to make in your mental model when you’re at the early-intermediate stage of your learning. It’s an uncomfortable thing to have to do, because it makes you feel like a beginner for a bit, but ultimately I don’t think it makes Rust that much harder to learn. I think if you were learning Rust and you actually were a beginner to programming, that barrier wouldn’t really exist.

                                                                                                                    C++ teaches you the same lessons that Rust’s borrow checker does, but it teaches you those things over the span of 40 years through hard-won experience. Rust compresses that into a few months.

                                                                                                                    1. 5

                                                                                                                      I’ve always been curious about this. Having written C++ and C in the past (though not a lot), I found Rust’s borrow checker to be, in most cases, fairly easy to learn. I understand the pain if you’re coming from a managed memory language, but if you’re already used to manual memory management, the borrow checker often just enforces best practices. Don’t return dangling pointers unless they have an explicitly tracked lifetime, make sure that any function consuming a struct uses a read-only pointer to the struct, etc, etc. The borrow checker certainly makes some things that a programmer knows is safe much harder to express, but that sounds more like a sharp edge than a learning curve. And if you’re coming from a managed language, manual memory management will always be a learning curve.

                                                                                                                      1. 2

                                                                                                                        Rust compresses that into a few months.

                                                                                                                        That’s exactly what I mean by “hard to learn”; the learning is front-loaded. The payoff is great but the cost is very real.

                                                                                                                      2. 1

                                                                                                                        The low-quality internet chatter certainly doesn’t help, case in point TFA. Most of it just reads like marketing blog posts and exhortations on the glory of the fearless borrow checker.

                                                                                                                      3. 6

                                                                                                                        If I had to guess, it’s desperation to validate the time they’ve spent climbing the absurd learning curve.

                                                                                                                        I can only speak from my own experience, but for me this is the other way around. My outward expression of enthusiasm is a result of already feeling the time I spent learning Rust has been validated by how much I enjoy working with the language.

                                                                                                                        I see so many Rust programmers who only have personal projects to show, and bash away with Python/Go/Whatever at their jobs.

                                                                                                                        I’m not sure if you intended it as such, but I don’t think this really works as a supporting argument that it’s too difficult to learn. Rust jobs are still really scarce, for a variety of debatable reasons - foremost among them, imo, that it’s trying to displace some very well-established/entrenched languages (C and C++). That and the fact that it’s still quite a young language in the grand scheme of things. I think that scarcity is probably the main reason you see this happening.

                                                                                                                        I’d never even consider Rust for anything serious. It’s way too difficult to learn

                                                                                                                        Again, I can only speak anecdotally, but I work full-time in Rust, and it’s in a very serious use-case. And I work with people who didn’t know Rust before they worked here, and they’re getting by more than fine.

                                                                                                                        And yes, I realise the more I type the more I validate this:

                                                                                                                        to see how “intense” Rust fans can be

                                                                                                                  1. 5

                                                                                                                    The discussion so far assumed that NULL is always an invalid pointer. But is it? The C standard defines a NULL pointer as a pointer with the value 0

                                                                                                                    The C standard also defines that pointer value to be an invalid value for dereferencing.

                                                                                                                    The following is footnote 102 from the C11 standard. Being a footnote, it’s not normative, but it’s consistent with the body of the standard; the null pointer can’t be valid, because it doesn’t refer to any object:

                                                                                                                    Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer, an address inappropriately aligned for the type of object pointed to, and the address of an object after the end of its lifetime.

                                                                                                                    (It doesn’t matter that the address 0 may be a legitimate address in the underlying architecture; in C, 0 is not a valid dereferenceable pointer value).

                                                                                                                    1. 1

                                                                                                                      I came here to say that, thanks :)

                                                                                                                      1. 1

                                                                                                                        Interesting, how do I read/write the memory at 0x00000000 then? On embedded systems, zero can be the address for the reset vector or maybe some memory-mapped peripheral, but apparently the C standard cannot represent it.

                                                                                                                        1. 3

                                                                                                                          The quoted section from the article is actually wrong. C defines that the cast from an integer constant expression that evaluates to 0 to a pointer is NULL. Casting any other integer (including a run-time integer value that happens to be 0) to a pointer is, at best, implementation defined (it’s implementation defined if you do it via intptr_t, it’s unspecified if you do it any other way). There’s no requirement that NULL have a bit pattern of all zeroes and on some architectures the all-ones bit pattern is used because that’s more convenient and allows any address from zero to the top of memory to be used as a valid address.
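
The distinction can be sketched like this (function names are made up): a constant-expression 0 converted to a pointer type is the null pointer by definition, whereas converting a run-time integer is only implementation-defined, even when its value happens to be 0.

```cpp
#include <cstdint>

// An integer *constant expression* 0 converted to a pointer type is
// the null pointer, by definition, regardless of null's bit pattern.
bool constant_zero_is_null() {
    int* p = (int*)0;
    return p == nullptr;
}

// Converting a run-time integer via intptr_t is implementation-defined.
// On mainstream platforms a zero value also yields null, but the
// standard does not promise it; on an all-ones-null architecture it
// would not.
int* runtime_zero_to_pointer(std::intptr_t v) {
    return reinterpret_cast<int*>(v);
}
```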

                                                                                                                          1. 1

                                                                                                                            There’s no requirement that NULL have a bit pattern of all zeroes and on some architectures the all-ones bit pattern is used because that’s more convenient and allows any address from zero to the top of memory to be used as a valid address.

                                                                                                                            Are any of these architectures currently manufactured? I looked into this a couple of years ago and all of the examples I found were at least 30 years old.

                                                                                                                            1. 1

                                                                                                                              Yes, at least one of the IBM mainframe architectures (Z series, I believe?). This came up when porting LLVM to that architecture. Interestingly, on AMD GPUs, address 0 is a valid (and frequently used) stack address. They would do much better by making an int-to-pointer cast subtract one and use the all-ones bit pattern to represent null, so casting (int)0 to a pointer would give an all-ones pointer value.