Threads for FrancisStokes

    1. 10

      That it is not common practice to ship debug symbols with production binaries is, in my opinion, quite sad and insensible. One argument is obfuscation. That never stopped any reverse engineer. Another is binary sizes. My bin directories are a few piddling gigabytes; they can bear to grow a bit in exchange for a better user experience.

      1. 9

        That it is not common practice to ship debug symbols with production binaries is, in my opinion, quite sad

        Agreed.

        Another is binary sizes […] they can bear to grow a bit in exchange for a better user experience.

Unfortunately this doesn’t apply to everyone. At $WORK, stripping all of the binaries contained in the package we ship to customers results in saving almost a gigabyte. Our customers sometimes try to download our product from behind awful corporate firewalls/antivirus software that will randomly interrupt downloads, so shipping the tiniest package we can is very important. A couple of my remote colleagues are also on terribly slow networks where that extra gigabyte would be a real pain when attempting to bisect the whole package itself.

        Another example was Debian’s firefox-dbgsym package, which IIRC was also a multi-gigabyte monstrosity. Clearly not something that should be shipped to everyone.

        1. 3

          Was coming here to say basically this. Storage is cheap, but transfer can be expensive, and time can be very expensive. We made our life much better at $WORK last year when we split our software into data and code and turned a 2.5 GB download into a 2 GB download that seldom changes and a 0.5 GB download that changes all the time.

        2. 3

          You may want to experiment with stripping debug sections and debug symbols but keeping other symbol names. In my experience, that results in a binary that’s almost as small as a stripped binary, but you get function names rather than byte offsets in stack traces. It’s not as nice to work with in a debugger as a binary with full debug info, but it’s much nicer than a completely stripped binary.

          With the strip tool, use --strip-debug to strip debug symbols and debug sections but keep other symbols.

      2. 8

        Another is binary sizes. My bin directories are a few piddling gigabytes; they can bear to grow a bit in exchange for a better user experience.

In what way is the user experience better? At best, users get better stack traces when things crash, but if you’re recording those stack traces for debugging and know the package version then you can symbolise them later. Unless you’re a developer, you are unlikely to have any use for debug symbols. If you are a developer then you can install the debug symbols yourself.

Debug symbols are usually 2-10 times larger than the whole of the rest of the binaries. This doesn’t just impact disk size (where it has a huge impact on container start times); it also impacts downloads. You may have FTTP, but for a lot of users, downloading an extra 10-100 MiB of debug symbols for each package when they update is a big impact.

I just had a look at a FreeBSD system (FreeBSD ships with separate debug info). The /usr/lib directory on this system contains 9.9 MiB of shared libraries and 27 MiB of debug info. That’s nearly three times as much disk space and network bandwidth consumed to distribute the debug symbols as the libraries themselves. I’m very happy that it’s an optional install so I can install it on dev systems but not production ones.

      3. 7

        I don’t really care about debug symbols, and I wouldn’t want to need +700% of additional disk space used for something I will never use, and without which everything works perfectly fine:

-rwxrwxr-x   1 x x   25M Nov 24 13:41 app.without-symbols*
-rwxrwxr-x   1 x x  184M Nov 24 13:41 app.with-symbols*
        

        In rare cases where something crashes, I can download debug symbols later (if they’re available), in case I need them.

        One argument is obfuscation. That never stopped any reverse engineer.

This is a protection based on a cost-to-reward ratio. It weeds out less determined reverse engineers, but doesn’t affect more determined ones. So it does stop some reverse engineers, just not everyone. It raises the cost of reversing to a point where some reversers bail out, because the cost is higher than the reward.

      4. 4

        How often do users need the debug symbols? My educated guess is “almost never”. We complain about bloat all the time, why bloat things further with unnecessary things, then?

Making debug symbols easily and readily available as a separate download, which benefits both the end-user (who neither needs nor wants them) and the developer (who does, and can install them trivially), is, in my experience, a much nicer practice, with almost the same benefits, at a fraction of the cost.

        1. 2

          We complain about bloat all the time

I don’t. And I don’t find that it is ‘trivial’ to install debug symbols (case in point, else-thread: ‘it seems that packed is currently unusable on Linux’). Something I have noticed other people say they value is open-source contributions; wouldn’t it be a great help if it were as easy as possible to debug, modify, and contribute to open-source software?

          If it were a simple matter of changing a setting in my system package manager’s configuration file, to get debug symbols, source code, and dirty build artifacts—with the default being as it currently is—that would be one thing. But this is not a feature anyone implements, as far as I know.

          1. 7

            And I don’t find that it is ‘trivial’ to install debug symbols

            Then that should be fixed, rather than shipping debug symbols by default, which would be useless 99% of the time.

            If it were a simple matter of changing a setting in my system package manager’s configuration file, to get debug symbols, source code, and dirty build artifacts—with the default being as it currently is—that would be one thing. But this is not a feature anyone implements, as far as I know.

            Debian has a debuginfod server, which gives you trivial access to debug symbols: all you have to do is either have elfutils installed, or export a single environment variable (more information). While making source code similarly easy to access isn’t a thing yet, that’s also simple to automate, and it’s usually an apt-get source away.
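Concretely, the single-variable setup looks like this (assuming Debian’s public debuginfod server and an elfutils-enabled gdb):

```shell
# With debuginfod client support in gdb/elfutils, this one variable
# makes debuggers fetch debug info and sources on demand over HTTP.
export DEBUGINFOD_URLS="https://debuginfod.debian.net"
```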

            It’s a bit more work, yes, but you won’t get gigabytes of near useless stuff by default, and that’s a big win in my book.

            1. 2

              Don’t forget about apt-get install somepackage-dbg! (which might require enabling the debug symbols repo? can’t recall.)

Debugging software from Debian repos is a pretty painless experience. I’ve been much more willing to debug minor bugs in random software since switching to it.

      5. 2

        I agree with the sentiment. It would probably make more sense to have the common practice be that package maintainers release software exactly as is described in the article: the stripped binary, and an addon containing all of the debug symbols. Best of both worlds!

    2. 37

      Highly recommend this talk from Software You Can Love, Vancouver 2023, which dives deeper into this topic: ATTACK of the KILLER FEATURES - Martin Wickham.

      This is something I was hoping Carbon would have addressed in a satisfactory way, but, alas, they do the same thing Zig does, so Zig is once again the language which will have to innovate to solve this problem.

      1. 19

        The video explains that “Parameter Reference Optimization” is a serious design flaw in Zig. It’s a footgun that leads to subtle bugs that can’t be seen by grug-brained developers. It’s unsound. Other system languages like C, C++, Rust do not have this footgun, so people who come to Zig from those other languages will fall into the trap.

        It seems like the obvious fix (but not mentioned in the video?) is to use C semantics by default for passing large parameters (copy the value before making the function call, then pass a reference to the copy). Only use the “parameter reference optimization” when the optimizing compiler can prove that no aliasing will result if the copy is elided. This optimization strategy is sometimes called “copy elision”.

        At the 8 minute mark, a bug in List.add() from the Zig library is shown. The bug is caused by the “Parameter Reference Optimization” feature. To fix the bug requires the developer to add an extra copy of ‘item’ to the function body, which causes a performance hit. If, instead, Zig used copy elision as a sound compiler optimization, then you wouldn’t have this subtle bug, you wouldn’t take a performance hit to fix the bug, and the extra copy of the ‘item’ parameter would only happen in cases where it was needed (the compiler adds the copy at the caller site to prevent parameter aliasing).
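To make the aliasing hazard concrete, here is a hypothetical sketch in Rust terms (`push_twice` and the values are invented): if `item` were a reference into the vector’s own storage, the reallocation inside `push` would invalidate it; a copy made at the call site is what keeps it sound.

```rust
// A routine that reads its argument after possibly reallocating the
// buffer it came from. Passing `item` by value (a copy) is safe;
// passing a hidden pointer into `v`'s storage would not be.
fn push_twice(v: &mut Vec<u64>, item: u64) {
    v.push(item); // may reallocate v's backing buffer
    v.push(item); // still safe: `item` is a copy, not a pointer into v
}

fn main() {
    let mut v = vec![1, 2, 3];
    let first = v[0]; // explicit copy; passing &v[0] alongside &mut v is rejected
    push_twice(&mut v, first);
    assert_eq!(v, [1, 2, 3, 1, 1]);
}
```

This is the caller-site copy described above: the compiler would only elide it when it can prove no aliasing results.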

        For the other buggy feature in the video, “Result Location Semantics”, I make the same comments. Fixing the bug in rotate (without changing the function signature) requires the developer to make an explicit copy of the parameter before using it, which again is a performance hit. Instead of making people write code like this, you should generate compiler temporaries by default, and elide the copies as an optimization only when it is sound to do so.

You guys understand these issues better than I do, so what am I missing? Is it that alias analysis is hard, so the standard approach I describe would have worse performance than what the Zig compiler does today (merely on par with C/C++), and you want better performance than C?

        1. 10

          I wouldn’t say that the issue is solved in other languages:

• in C++, you can accidentally copy large heap objects. The C++ semantics of implicit deep copy are not useful.
• Rust solves correctness problems around parameter passing and aliasing via the borrow checker, but that is complicated (at least as a part of the whole package that is Rust), and doesn’t solve the perf problems: Rust does a lot of stack-to-stack copies. It also lacks placement.
• C doesn’t protect you from aliasing when passing arguments, and has useless copies where the caller has to defensively copy the parameter.

          Then, there’s this annoying question of “should you pass small things by reference or by value? And what is small, by the way?”.

          And then there’s: https://outerproduct.net/boring/2021-05-07_abi-wrong.html

          My personal take here is that “fn f(param: Foo) passes parameter param by immutable reference” is the right high-level semantics:

          • not mutating is the common case, it shouldn’t require any sigils
          • some things can’t be copied (e.g., their address is meaningful), but you still want to be able to pass them as parameters without extra ceremony
          • whether “foo” is passed by value in registers or by pointer is a detail that should depend on target CPU. Language semantics should allow either.

          The question is rather how to implement that semantics without running afoul of aliasing, which is what Carbon, Zig, and Hylo (formerly Val) are trying to do.

          1. 3

            Rust has added fixes/optimizations to LLVM to eliminate redundant stack-to-stack copies.

      2. 7

Thank you for the video suggestion. I am not a Zig programmer and the blog post was a bit too terse for me; I could not really follow it. The video is super clear and got me very excited about Zig, but folks, if you expect a solution beyond a clear exposition of the problems, you will be disappointed :D We’ll have to wait a while for that, I guess.

      3. 3

        I’ve been getting in to Zig in the past few weeks and I really have to say: It’s a joy. Would forcing the compiler to always pass a struct by value with a flag ever be considered? Or does that go harder against the grain of other language principles?

        1. 11

          That kind of solution doesn’t really work because it doesn’t compose … Now libraries would have to add test coverage for being compiled with both flags.

          And if the libraries have dependencies, then they can’t rely on the semantics of those either. You end up with a 2^N test matrix for N libraries.

          The flags also don’t appear in the source code, which is a problem for readability.

          Compiler flags can be used for optimizations, which by definition are semantics-preserving. Pass by value vs. by reference isn’t a semantics-preserving change – it’s basically the opposite semantics!

        2. 1

          Adding a compiler flag to decide semantics as fundamental as “are variables passed by pointer or by copy” sounds like a terrible idea.

    3. 1

Came across this post https://j3s.sh/thought/recover-lost-text-by-coredumping-firefox.html by @j3s a few days ago and fell down a little rabbit hole of learning all about the /proc file system and ptrace. I’m afraid it’s now more or less inevitable that I’ll end up writing my own debugger.

    4. 21

I think that the point about making things serializable is interesting and valid. But I think it’s important to acknowledge that the Rust type system does in fact involve trade-offs. Self-referential structures aren’t just “object soup”. They are a natural and fundamental way of modeling information, and when a language makes it awkward to express those relationships it has given up something. Rust gave it up in return for memory safety without GC, which is a valid tradeoff.

      1. 8

        They are a natural and fundamental way of modeling information and when a language makes it awkward to express those relationships it has given up something.

        One could say the same thing about goto. Just s/information/code/.

        The progress in programming comes from increasing global composability by eliminating local expressiveness that interferes with it.

In my experience, forcing developers to express information relationships within a larger framework of a tree of ownership leads to substantially better software.

        1. 7

          I would argue that the comparison to goto is not completely apt — the two tradeoffs offer very different benefits. In the goto case you barely lose anything (almost every usage can be converted to structured control-flow), while your reasoning abilities are greatly increased. It is no accident Dijkstra’s paper became so widespread.

Not having a GC is a valid tradeoff in some (niche) use cases, as mentioned, but it is very far from a ubiquitous one. There is nothing fundamentally bad about self-referential data; in fact, I think the vast majority of existing software would be vastly more complicated without it.

          1. 3

            almost every usage can be converted to structured control-flow

            Almost every use of self-referencing can be expressed as indexes, with increased safety, ability to reason about and composability.

            Not having a GC is a valid tradeoff in some (niche) use cases, as mentioned, but it is very far from a ubiquitous one.

Not having a GC makes the code have no runtime, which increases its composability by making it potentially usable in everything. No one will use Java code from Python, or JS, etc., while almost every language can use Rust because it doesn’t have a runtime.

            It also allows practical aliasing model which enables efficient memory safety and concurrency.

            Small concession, big wins.

            There is nothing fundamentally bad about self-referential data

            “There’s nothing fundamentally bad about goto.” Quite a few elegant goto patterns in C, that I don’t miss, but used to employ.

            Dropping it too was a relatively minor concession for ability to reason and compose larger software.

            1. 14

              Almost every use of self-referencing can be expressed as indexes, with increased safety

Moving references out of the type system does not increase safety. If you have a pointer to an object of type T, and the type system knows that it is a T*, then static and dynamic checkers (and run-time enforcement techniques) can ensure that your T* does not outlive the underlying T, or that it is explicitly invalidated if it does. In contrast, if you refer to objects via an index then you are hiding this relationship from the type system. You are doing manual memory management again. It’s not quite as bad as C, because a dangling index will point to a bounds-checked object and so will either be detectably invalid or point to some T, but there is nothing that guarantees that it points to the T that you think it does.

              If you delete an object from a graph, you now have a free slot in your index table. Do you reuse that? Now you have a mechanism for introducing use-after-free bugs. A dangling index to that slot will refer to the wrong object. Alternatively, do you avoid reusing the slot? Now you have a memory leak and have increasingly sparse data structures that become progressively more inefficient to traverse.

              This kind of model works fairly well for things like the CFG in the Rust compiler, because those are short-lived structures and so it’s fine to simply burn slots for deallocated basic blocks and free the whole thing at the end (though it’s still more effort than it would be to use explicit traced references or weak references in a reference-counted environment). They work very poorly for data structures that change shape over time.
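The reuse hazard and its usual mitigation (a per-slot generation counter, as in slotmap-style libraries) can be sketched in a few lines; the names here are invented, not any particular crate’s API:

```rust
// Generational-slot sketch: each slot carries a generation counter, so a
// stale handle to a reused slot is detected instead of silently aliasing
// the new occupant.
#[derive(Clone, Copy)]
struct Handle { slot: usize, generation: u32 }

struct Arena<T> {
    slots: Vec<(u32, Option<T>)>, // (generation, value)
    free: Vec<usize>,             // reusable slot indices
}

impl<T> Arena<T> {
    fn new() -> Self {
        Arena { slots: Vec::new(), free: Vec::new() }
    }
    fn insert(&mut self, value: T) -> Handle {
        if let Some(slot) = self.free.pop() {
            self.slots[slot].1 = Some(value); // reuse the slot...
            Handle { slot, generation: self.slots[slot].0 } // ...under its new generation
        } else {
            self.slots.push((0, Some(value)));
            Handle { slot: self.slots.len() - 1, generation: 0 }
        }
    }
    fn remove(&mut self, h: Handle) {
        if self.slots[h.slot].0 == h.generation {
            self.slots[h.slot] = (h.generation + 1, None); // bump generation on free
            self.free.push(h.slot);
        }
    }
    fn get(&self, h: Handle) -> Option<&T> {
        let (generation, value) = &self.slots[h.slot];
        if *generation == h.generation { value.as_ref() } else { None }
    }
}

fn main() {
    let mut arena = Arena::new();
    let old = arena.insert("old");
    arena.remove(old);
    let new = arena.insert("new");        // reuses the freed slot
    assert_eq!(arena.get(old), None);     // dangling handle is detected...
    assert_eq!(arena.get(new), Some(&"new")); // ...instead of aliasing "new"
}
```

Note this only turns silent aliasing into a detectable lookup failure; it does not restore the temporal safety a tracked reference would give.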

Not having a GC makes the code have no runtime, which increases its composability by making it potentially usable in everything. No one will use Java code from Python, or JS, etc., while almost every language can use Rust because it doesn’t have a runtime.

              People happily used Objective-C from Ruby, Python, and Java (and even Haskell and Ocaml) via mechanically generated FFI, and from C/C++ via hand-written FFI. Having a runtime didn’t prevent this. Having reference counting in the runtime made it easy to build bridges that automatically surfaced Objective-C objects in these other languages. Objective-C’s perception as an Apple language was the largest barrier to this.

              It also allows practical aliasing model which enables efficient memory safety and concurrency.

              Except that now you’re advocating for design patterns that throw away temporal safety. Type-pooled allocators in C/C++ have the same level of temporal safety as your proposal. They’re much better than nothing (which is why Apple now uses them in the XNU kernel) but they’re a long way away from ideal.

              “There’s nothing fundamentally bad about goto.” Quite a few elegant goto patterns in C, that I don’t miss, but used to employ.

              C’s goto is absolutely nothing like the goto that existed prior to structured programming. It has well-defined behaviour with respect to lexically scoped objects (you can’t branch over a local variable’s declaration, for example) and it is not permitted to branch to labels in a different function.

              1. 3

                C’s goto is absolutely nothing like the goto that existed prior to structured programming. It has well-defined behaviour with respect to lexically scoped objects (you can’t branch over a local variable’s declaration, for example) and it is not permitted to branch to labels in a different function.

                You can branch over a local variable’s declaration. The following is legal:

                    goto L1; // legal
                    int x = 42;
                L1:
                
                    goto L2; // legal
                    {
                        double y = 3.1419;
                L2:
                    }
                

The only restrictions that the C standard places on goto are that (1) the jump must be within the same function, and that (2) you cannot jump into a scope containing a variable-length array (C23 6.8.6.1 “The goto statement”, draft N3096), so the following would be invalid:

                    goto L3; // illegal
                    {
                        char buf[n];
                L3:
                    }
                
                    goto L4; // illegal
                    {
                L4:
                        (void) 0; // a label cannot directly precede a declaration
                        char buf[n];
                    }
                

                EDIT: fixed formatting.

            2. 4

              with increased safety

s/increased/decreased. Even naive C has better tools to actually debug “dangling pointer” problems: valgrind will cry about it, but there is nothing to catch the case where you store an index and the associated value is replaced in the meantime. Let alone a managed-memory language.

And come on, increased ability to reason about? What is more clear: that my Node has a parent: Node, or that it has a parent: u32? In a managed language the former has only a single invalid value (or, if the language is null-aware, none), while the latter has potentially 2^32, let alone the case of dangling indices.

              Not having a GC makes the code have no runtime, which increases it’s composability

Yes, but that is a niche use case, which might be worth it depending on context, say, when writing a specialized library. Most people want to write applications, though, where this concern is negligible and can be circumvented by IPC.

              It also allows practical aliasing model which enables efficient memory safety and concurrency

              So you think a checked array access will have better optimizations by the compiler than pointers themselves? Especially that one of rust’s strong points is that it can actually “use the restrict keyword” properly, thanks to the borrow checker? This is an unfounded statement on your part.

There is definitely value in restricting the “degree of freedom” of code (see types), but as with most things, it is always a tradeoff. We don’t generally write programs in regular (non-Turing-complete) languages, even though they are much easier to reason about than Turing-complete ones.

              1. 4

                And come on, increased ability to reason about? What is more clear, that my Node has a parent: Node, or if it has a parent: u32?

I think you misunderstood this. The idea is not to just replace the pointer with an index in the object, but to keep the relation in another vector of data in the main struct:

                parents: Vec<(PersonId, PersonId)>
                

                This works if your data is immutable, so you can sort it and use ranges and binary search. If you mutate your data during runtime, using a map works better:

                parents: HashMap<PersonId, PersonId>
                

Although if the data is kept in memory for a long time and mutates during its lifetime, I’d think for a bit before jumping into writing the code with indices like this.

                Again, you have all your data in flat structures and you do not have any nesting of data anywhere. It makes a lot of sense e.g. when modeling databases to an ORM, or when writing a compiler.
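A small sketch of that flat layout (the `PersonId` newtype and the data are illustrative):

```rust
use std::collections::HashMap;

// People live in one flat Vec; the parent relation is a separate
// map keyed by ID rather than a pointer stored in each node.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct PersonId(u32);

struct People {
    names: Vec<String>,                   // PersonId(i) -> names[i]
    parents: HashMap<PersonId, PersonId>, // child -> parent
}

impl People {
    fn parent_name(&self, child: PersonId) -> Option<&str> {
        let parent = self.parents.get(&child)?;
        self.names.get(parent.0 as usize).map(String::as_str)
    }
}

fn main() {
    let people = People {
        names: vec!["Ada".into(), "Byron".into()],
        parents: HashMap::from([(PersonId(0), PersonId(1))]),
    };
    assert_eq!(people.parent_name(PersonId(0)), Some("Byron"));
    assert_eq!(people.parent_name(PersonId(1)), None); // no parent recorded
}
```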

                1. 3

                  Again, you have all your data in flat structures and you do not have any nesting of data anywhere. It makes a lot of sense e.g. when modeling databases to an ORM, or when writing a compiler.

That’s an important point. If one is working with graph data, chances are high that it will need to be persisted anyway, in which case there’s barely any technology that will allow persisting the “object soup” natively (there were attempts at “object-oriented databases”, but obviously it’s a terrible idea so it didn’t work out).

So now the “ease” of the “object soup” comes with the additional difficulty of converting back and forth to the form I’d advocate for in the first place as clearly better.

              2. 3

                tools to actually debug “dangling pointer” problems

It’s not a pointer. It can’t be accidentally dereferenced while forgetting to check whether it’s still valid. Sure, the logical coherence of the larger data structure needs to be taken care of, but that’s not unlike GCed languages: in a managed language, if you point a child at a new parent but forget to update the old parent, are you going to get great results? If anything, all the “invalid values” (often including ‘generations’ in a slotmap) can actually crash the code in an orderly way and help spot the issue, instead of letting it “mostly work”.

                or if it has a parent: u32

                It’s parent: NodeId.

                Yes, which is a niche use case, which might be worth it depending on context, say, writing a specialized library.

That’s not a niche use case. The ubiquity of C and C++, despite their huge risks, is a result of being able to reuse them.

                So you think a checked array access will have better optimizations by the compiler than pointers themselves?

Who cares. Rust does have references, including shared ones. You only really need to use indexes when you’re already working with a graph, where children need to point at parents, etc., the whole thing is non-trivial already, and the performance overhead of “non-native references” is akin to the cost of a JOIN in a database.

Normal relationships that fit a DAG (“tree of ownership”) are all one needs 99% of the time anyway.

Worry instead about all the memory barriers, extra memory usage, and cache invalidation caused by a GCed runtime, which slows down the normal GCed code at all times to make the GC work, rather than about a 1% case in Rust that is often fast enough.

          2. 1

There is nothing fundamentally bad about self-referential data; in fact, I think the vast majority of existing software would be vastly more complicated without it.

            It wasn’t obvious to me, but the reasons for Rust not supporting self-referential structs can be said to be historical, not strictly technical (see 1 and 2).

I find that restriction quite limiting in the kind of APIs I’m able to design. I often want to encapsulate two objects in a single struct, where one field references the other (e.g. rusqlite::Connection and rusqlite::Transaction). You can’t do that in current Rust because:

            1. Fields of structs are dropped in declaration order.
            2. Transferring ownership (i.e. moving) of a struct needs to be equivalent to memcpy().

            In [2], withoutboats lays out a proposal for a Move trait that would invert the status quo: structs would be immovable by default and movable as an opt-in. Immovable types could be naturally self-referential and even structs that implement Move could manually instruct the compiler how to copy themselves, thereby solving (2). As for (1), it’s a simple convention that would need to be relaxed.

            The biggest reason in my understanding why this is not happening is backward compatibility.

          3. 1

            almost every usage can be converted to structured control-flow

            In fact, every program with goto can be converted into one with only loops and conditionals https://en.wikipedia.org/wiki/Structured_program_theorem

            1. 1

Yes, though arguably there are a few usage patterns where goto, or its tamer little brother break, is much cleaner.

        2. 1

          You’re falling into the trap of believing that there is a single path of progress towards some kind of ideal, which every language is moving towards or away from. Ideas in languages evolve the same way organisms do: In response to localised pressures, towards whatever gives them an advantage in that domain.

So to your point: goto has indeed mostly “evolved” out of most programmers’ workflows and languages. But it is still alive and well in certain contexts, for example kernel development.

      2. 6

        Totally agreed, there are major tradeoffs here. One that’s gotten a lot of discussion recently: The designers of async/await had to go through an enormous amount of trouble to fit futures into the language, since they’re fundamentally self-referential types. They settled on the Pin API, which is a leading contender for the trickiest module in the whole standard library.

        Another interesting one: The small-string-optimization layout in GCC’s std::string is self-referential. Rust’s String doesn’t do SSO in general, but if it did, it would have to do something like what Clang does rather than what GCC does.

        1. 1

I don’t think SSO is typically self-referential in the sense that is not allowed in Rust. Typically SSO looks something like this (though real implementations are more optimized, using every available byte rather than being hardcoded to 64-bit platforms):

enum String {
  Inline([u8; 24]),
  Allocated {
    ptr: *mut u8,
    len: usize,
    capacity: usize,
  },
}
          

There is no explicit reference to the inline buffer (as that would waste a word of SSO space). Rust can handle this perfectly fine, since a method like as_u8 can borrow a reference to the buffer.

          1. 2

            You’re right about what’s typical, and for example that’s what Clang’s implementation does, but GCC’s std::string is different. Reading C++ stdlib headers is the worst, but the most direct evidence of this is that Clang’s data pointer is only present in the long representation but GCC’s data pointer is always present. Also, Clang has a fast path for bitwise copying short strings, but GCC does not, because it can’t copy the self-referential data pointer. Since this steals 8 bytes of capacity from the small representation, GCC pads std::string to 32 bytes.

            1. 1

              Oops, I skipped over what’s probably the most important implementation detail: Clang’s string::data() has a branch to handle the small string case, but GCC’s string::data() does not.

      3. 2

        Well said. I’m as big a member of the Rust Evangelism Strike Force as anyone, but it’s very frustrating to see so many people try to spin things that are clearly flaws/shortcomings/whatever as somehow actually good. It’s not just Rust fans, of course–I see it in all of the various programming language forums/subreddits/etc. It’s okay to admit that something sucks about the thing you like!

Using arenas in Rust is sometimes the best approach for performance reasons, but frequently arenas are just awkward, bug-prone (you must keep your indexes in sync with arena updates) workarounds for Rust’s borrow rules.

        1. 2

In the past seven years I’ve been working with Rust, the approach with indices described in the article has made for an absolutely awesome API twice, in quite large projects. It’s a very nice abstraction when you need to parse a file of code once and then refer to it later. Eventually you’ll have one allocated string sitting at the bottom, tokenized in the next layer, and then a database of typed token IDs built either in vectors or maps on top of the AST. The public API is built on top of the flat vectors of IDs, which can then take a reference directly from the string which represents the code.

It is very efficient and an extremely nice API to use. It has caused almost zero crashes and has accelerated all development in the corresponding code bases. It’s not that I’m trying to evangelize Rust, but this way of modeling the data works very nicely for a few hard problems I’ve needed to solve in my career.

          1. 1

            Sure. And I guess I shouldn’t have specified “because of performance reasons”. Rather, my point was that sometimes the arena approach is actually the best approach to the problem, but sometimes it’s used because it’s the only option.

            Phrased a different way: if you were writing your code in a GC’d ref-happy language and you’d still choose the arena approach, great. But if you wouldn’t choose the arena approach in the hypothetical language, then it seems hard to argue that it’s better than “object soup” or whatever. It’s like the infamous tree/graph data structures conundrum. Using trees and graphs to model data relationships is very often exactly what we want. The fact that I see so many (frankly, newbie) Rust fans try to tell people that they really don’t want/need trees is crazy.

    5. 15

      This article seems locked into desktop/server computer architectures. Lookup tables are common in embedded systems for good reasons.

      Another advantage of a lookup table is the ability to use the results of actual tuning experiments for complicated control systems (think altitude, temperature, humidity, operating modes of a system, etc.). This data may not be convertible to a mathematical function.

      In general, one programming community’s heresy is another community’s bread and butter.

      1. 7

        Even for desktop / servers, the L1 cache sizes and latencies look a bit wrong. Not a huge amount, but enough to skew the results (caches that small tend to be faster, caches that slow tend to be bigger). It also doesn’t account for locality. Quite often (though very workload-dependent), a look-up table will have similar values looked up close together. We used a lookup table for sizeclass calculation in snmalloc and found that we very often hit in L1 because programs tend to allocate a load of similar-sized objects together. Quite often, they’ll allocate multiple objects of the same type (and therefore size) in a tight loop, and then every lookup after the first one is basically free.

        It’s also somewhat ignoring the secondary effects from speculation. A look-up table will push something into a load queue and stall operations that depend on the result until it gets the result from cache / memory. Big CPUs are designed around the assumption that memory is slow and are really good at hiding memory latency by running other independent instructions while they wait (this is even more true on SMT cores, if you care about whole-system throughput more than microbenchmark latency). In contrast, if you do a load of arithmetic then you’re using arithmetic pipelines and they can’t simultaneously be used for anything else. That’s even before you think about rename register pressure and the cost of having larger numbers of live values tracked in the core.

        For (small) embedded systems, lookup tables often have very different tradeoffs:

        • There often isn’t a cache at all.
        • Memory is often very fast (single-cycle-access SRAM).
        • Multiply and divide hardware might not even be available, so the function-call overhead alone for a software divide routine can be more than a lookup table.
        • Even when hardware is available, multiply and divide are often multi-cycle operations, optimised for area rather than performance (e.g. divide implemented as repeated shift-and-subtract in a loop). This is not good for anything that needs deterministic latency (crypto, control systems, and so on).
        • On some of the slightly more clever cores, multiply and divide can be variable-latency instructions.
        • Hardware is typically single-issue (all except the most expensive M-profile Arm cores, for example) and so two instructions will take longer than one.

        On our Ibex, for example, a lookup table will require 1-2 cycles for address calculation, then one for a data load (pointer loads are a bit slower, with an extra cycle of load-to-use delay that can be hidden with software pipelining), so you can typically do it in 3-4 cycles. 32-bit integer divide is 15 cycles. Reciprocal multiply (to divide by a constant) can be 6-8 cycles. If your calculation contains a single division, it is faster to do with a look-up table.

        Of course, a div instruction is smaller than a lookup table plus the three or so instructions to do the lookup, so if this is not a hot path then you’re better off using arithmetic and having smaller, slower code rather than bigger, faster code.
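
As a concrete (if toy) illustration of that tradeoff, here’s a compile-time table that replaces a divide with a single load, in the spirit of the sizeclass trick (though not anyone’s actual allocator code):

```rust
// Precompute x / 10 for every u8 value at compile time. On a small core
// with multi-cycle (or software) division, indexing a 256-byte table in
// SRAM can be both faster and constant-latency.
const DIV10: [u8; 256] = {
    let mut t = [0u8; 256];
    let mut i = 0;
    while i < 256 {
        t[i] = (i / 10) as u8; // the divide happens at compile time
        i += 1;
    }
    t
};

// One load, deterministic latency; no div instruction in the hot path.
fn div10(x: u8) -> u8 {
    DIV10[x as usize]
}
```

On a big out-of-order core the div (or a reciprocal multiply) may well win; on something like an Ibex, the load is both faster and deterministic.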

      2. 2

        While you’re completely right that LUTs are way more common (and necessary) on embedded systems, understanding cache access patterns absolutely has a place in this domain. Modern microcontrollers have cache hierarchies with well-documented access times. You can use exactly the same kinds of calculations shown in the article to figure out whether a table lookup in a hot codepath (e.g. an ADC interrupt handler) will perform worse than doing the calculation inline. Just as important is cache locality.

      3. 1

        I tried removing a lookup table from a toy program of mine, but it got so much slower on desktop.

        It’s “using math to do things to chars” (and floats). Granted, it does say “(probably)” in the headline.

        I should try it on the much faster work desktop.

    6. 3

      Looking forward to testing this out. Not really getting the “closed source until we ship” idea, though.

      1. 4

        I mean, it’s a personal project. Especially when you’re somewhat well known, opening up a project that you don’t intend to take any practical feedback on can just generate a lot of noise.

    7. 1

      This ended up being more in-depth than I was expecting, given the title. Whenever discussing eval possibilities and dangers in any language, I tend to try and emphasise that it’s usually just not the right tool for the job. 95% of the time, people want to evaluate a mathematical expression, and need to be pointed towards a parsing technique instead.
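
For the “point them at parsing instead” case, the safe alternative really can be tiny. Here’s a hedged sketch of a recursive-descent evaluator for `+`, `*` and parentheses: no eval, no code execution, just data in and a number out:

```rust
// Grammar: expr := term ('+' term)*
//          term := factor ('*' factor)*
//          factor := number | '(' expr ')'
struct Parser<'a> {
    s: &'a [u8],
    pos: usize,
}

impl<'a> Parser<'a> {
    fn new(s: &'a str) -> Self {
        Parser { s: s.as_bytes(), pos: 0 }
    }

    fn peek(&self) -> Option<u8> {
        self.s.get(self.pos).copied()
    }

    fn skip_ws(&mut self) {
        while self.peek() == Some(b' ') {
            self.pos += 1;
        }
    }

    fn expr(&mut self) -> Result<i64, String> {
        let mut acc = self.term()?;
        loop {
            self.skip_ws();
            if self.peek() == Some(b'+') {
                self.pos += 1;
                acc += self.term()?;
            } else {
                return Ok(acc);
            }
        }
    }

    fn term(&mut self) -> Result<i64, String> {
        let mut acc = self.factor()?;
        loop {
            self.skip_ws();
            if self.peek() == Some(b'*') {
                self.pos += 1;
                acc *= self.factor()?;
            } else {
                return Ok(acc);
            }
        }
    }

    fn factor(&mut self) -> Result<i64, String> {
        self.skip_ws();
        match self.peek() {
            Some(b'(') => {
                self.pos += 1;
                let v = self.expr()?;
                self.skip_ws();
                if self.peek() == Some(b')') {
                    self.pos += 1;
                    Ok(v)
                } else {
                    Err("expected ')'".to_string())
                }
            }
            Some(c) if c.is_ascii_digit() => {
                let mut n = 0i64;
                while let Some(c) = self.peek().filter(|c| c.is_ascii_digit()) {
                    n = n * 10 + (c - b'0') as i64;
                    self.pos += 1;
                }
                Ok(n)
            }
            _ => Err("expected number or '('".to_string()),
        }
    }
}

fn eval(src: &str) -> Result<i64, String> {
    Parser::new(src).expr()
}
```

`eval("2 + 3 * 4")` gives `Ok(14)`, and malformed input comes back as an `Err` instead of executing anything. (Trailing garbage is silently ignored here; a real version would check that the whole input was consumed.)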

    8. 2

      Great list! Though the idea of building a statically typed programming language in a week gives me imposter syndrome.

      1. 5

        The trick is to reduce scope! Any of the example programs simplify to very small cores. Once you have a core, you can spend however many weeks you like extending it.

        For a statically-typed programming language, the first simplification would be to get rid of the parser. You can start with just

        struct Expr<T = ()> {
            ty: T,
            kind: ExprKind<T>,
        }

        // Recursive variants are boxed so the enum has a known size.
        enum ExprKind<T> {
            Int(i32),
            Bool(bool),
            Binary { lhs: Box<Expr<T>>, rhs: Box<Expr<T>>, op: BinaryOp },
            If { cond: Box<Expr<T>>, then_branch: Box<Expr<T>>, else_branch: Box<Expr<T>> },
        }

        enum BinaryOp {
            Add, Sub, Lt, Gt, Eq,
        }

        enum Type { Int, Bool }

        fn type_infer(e: Expr<()>) -> Result<Expr<Type>, String>
        fn compile_to_wasm(e: Expr<Type>) -> String
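
To show how small the core really is, here’s one possible `type_infer`, simplified to return the type directly rather than re-annotating the tree (and with the types repeated so the sketch is self-contained):

```rust
// A possible type checker for the Int/Bool core above. A sketch,
// not code from any real project.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Type { Int, Bool }

enum BinaryOp { Add, Sub, Lt, Gt, Eq }

enum Expr {
    Int(i32),
    Bool(bool),
    Binary { lhs: Box<Expr>, rhs: Box<Expr>, op: BinaryOp },
    If { cond: Box<Expr>, then_branch: Box<Expr>, else_branch: Box<Expr> },
}

fn type_infer(e: &Expr) -> Result<Type, String> {
    match e {
        Expr::Int(_) => Ok(Type::Int),
        Expr::Bool(_) => Ok(Type::Bool),
        Expr::Binary { lhs, rhs, op } => {
            // Arithmetic and comparisons both take Int operands in this core.
            if type_infer(lhs)? != Type::Int || type_infer(rhs)? != Type::Int {
                return Err("binary operators expect Int operands".to_string());
            }
            Ok(match op {
                BinaryOp::Add | BinaryOp::Sub => Type::Int,
                BinaryOp::Lt | BinaryOp::Gt | BinaryOp::Eq => Type::Bool,
            })
        }
        Expr::If { cond, then_branch, else_branch } => {
            if type_infer(cond)? != Type::Bool {
                return Err("if condition must be Bool".to_string());
            }
            let then_ty = type_infer(then_branch)?;
            if then_ty != type_infer(else_branch)? {
                return Err("if branches must have the same type".to_string());
            }
            Ok(then_ty)
        }
    }
}
```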
        
    9. 3

      I watch this video about once a year, each time thinking this will be the time I fully understand it. I’m making progress, but I’ve definitely still got a while to go.

    10. 4

      Fantastic post. It combines two of my favourite things:

      • Building your own tools

      • Fearlessly approaching areas of the stack that many people shy away from

    11. 20

      It took me a second to figure out what the video was recording: the person holds down the . key.

      1. 5

        This is the context I was missing!

      2. 3

        Looking back, I probably should have highlighted that more to avoid any confusion.

    12. 3

      The title totally buries the lede! The real value here is the technique for automatically keeping an implicit history with every build. Definitely something I’ll be adding to my own workflow.

    13. 2

      Great explanation! I’m a huge fan of parser combinators; I wrote and maintain https://github.com/francisrstokes/arcsecond, and even the first series of videos I made was about building up a PC implementation from scratch (https://www.youtube.com/playlist?list=PLP29wDx6QmW5yfO1LAgO8kU3aQEj8SIrU).

      They’re really appealing on multiple levels: they break down the complexity normally associated with parsing into just some functions, but they’re also flexible enough to be thrown at both text and binary data. Combine that with higher-order operations, and techniques like using coroutines for async/await-style parsing, and you’ve got one very powerful tool. Years later, they’re still my go-to for anything beyond the simplest of regular expressions.
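
For anyone curious what “just some functions” means in practice, here’s a minimal sketch of the idea in Rust (nothing to do with arcsecond’s actual API): a parser is a function from input to a Result of (remaining input, value), and combinators are higher-order functions over those:

```rust
// A parse result: either (rest of input, parsed value) or an error message.
type PResult<'a, T> = Result<(&'a str, T), String>;

// Match a literal string.
fn literal<'a>(expected: &'static str) -> impl Fn(&'a str) -> PResult<'a, &'a str> {
    move |input: &'a str| match input.strip_prefix(expected) {
        Some(rest) => Ok((rest, expected)),
        None => Err(format!("expected {:?}", expected)),
    }
}

// Match one or more ASCII digits and parse them as a number.
fn digits(input: &str) -> PResult<'_, u32> {
    let end = input.find(|c: char| !c.is_ascii_digit()).unwrap_or(input.len());
    if end == 0 {
        return Err("expected digits".to_string());
    }
    Ok((&input[end..], input[..end].parse().unwrap()))
}

// Higher-order combinator: run two parsers in sequence.
fn pair<'a, A, B>(
    p1: impl Fn(&'a str) -> PResult<'a, A>,
    p2: impl Fn(&'a str) -> PResult<'a, B>,
) -> impl Fn(&'a str) -> PResult<'a, (A, B)> {
    move |input: &'a str| {
        let (rest, a) = p1(input)?;
        let (rest, b) = p2(rest)?;
        Ok((rest, (a, b)))
    }
}
```

`pair(literal("v"), digits)` parses `"v42"` into `("v", 42)` with nothing left over, and every bigger parser is built the same way.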

    14. 4

      This is totally incredible work. She said at the end, “I spent thousands of hours looking through the firmware for gadgets”. It’s hard to imagine putting in so much work on something when you don’t even know if it’s possible. Clearly it was worth it, though!

      1. 3

        Yeah, it’s a phenomenal feat of work! I imagine that it takes a certain persistence of will to push through, but also every little vulnerability she found probably gave her a new boost of energy to keep going. Getting incremental wins can be really powerful.

    15. 1

      This video is part of an ongoing series, where the over-arching goal is to build up, from scratch, a complete firmware solution with a custom bootloader capable of performing signed, authenticated firmware updates. It uses a Cortex-M4 STM32 microcontroller and the open source libopencm3 project as the primary C library for interfacing with the microcontroller. I’m trying to cover a lot of topics - everything from writing custom linker scripts, to reading datasheets and reference manuals, to designing and building a packet protocol for reliable data transmission, to using AES for firmware signing and validation.

    16. 3

      Makes me really happy to see node come out in front of the other non-compiled languages on the higher-end benchmarks. There’s so much trash talk out there about it as a runtime and a language, but in the async do-a-ton-of-things domain, it totally holds its own.

    17. 64

      Whoa boy, that is a car crashing into a derailed train falling off the side of a cliff.

      1. 5

        Seriously… it just keeps going.

    18. 7

      This video is part of an ongoing series, where the over-arching goal is to build up, from scratch, a complete firmware solution with a custom bootloader capable of performing signed, authenticated firmware updates. It uses a Cortex-M4 STM32 microcontroller and the open source libopencm3 project as the primary C library for interfacing with the microcontroller. I’m trying to cover a lot of topics - everything from writing custom linker scripts, to reading datasheets and reference manuals, to designing and building a packet protocol for reliable data transmission, to using AES for firmware signing and validation.

    19. 2

      Really good article. This is something I’m going to keep my eye on in the future while writing docs. The title seems to draw an extra conclusion that I don’t feel the article is necessarily making, however.