Threads for evmar

    1. 2

      Semi-relatedly: despite not having any explicit support for tail calls, I recently discovered that Rust (because of LLVM) will generate tail calls if the call is in tail position. Note the “jmp” here: https://godbolt.org/z/f6P41qqv5 . I have been experimenting with making use of this in my emulator, for the same performance reasons Python wants to use it.

      Unfortunately, these tail calls do not carry over to Rust generating wasm. C++ to wasm appears to generate tail calls if you pass -mtail-call, but I couldn’t figure out if there was a Rust way to trigger it. Here’s at least C++ wasm tail calls: https://godbolt.org/z/Ebc81vnGa (the “return_call_indirect” op there).

      (PS: Rust does have some ideas/plans around a “become” keyword eventually but there’s nothing usable yet, bummer.)
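      For the curious, a minimal Rust sketch of the shape that gets this treatment (hypothetical function, not code from the emulator):

```rust
// A hypothetical helper written with the recursive call in tail
// position: its result is returned directly, with no pending work
// after the call. With optimizations on, LLVM typically compiles
// this into a loop ending in a plain `jmp` rather than a `call`,
// so the stack stays flat no matter how deep the "recursion" goes.
fn count_down(n: u64, acc: u64) -> u64 {
    if n == 0 {
        return acc;
    }
    // Tail position: nothing happens to the result before returning.
    count_down(n - 1, acc + 1)
}
```

      Whether the `jmp` actually appears depends on the optimization level and target, so it’s worth checking the generated assembly (as in the godbolt link) before relying on it.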

      1. 5

        C/C++ don’t explicitly support it either, but it’s a ubiquitous optimization. Clang did add a nonstandard “must tail-call” attribute to force a tail-call regardless of optimization level, which is useful in development of CPS code so your debug builds don’t blow the stack!

      2. 5

        My favorite cool application of Tailscale recently: I needed to test some software on Linux within WSL on Windows, from my Mac in a cafe. A quick install of Tailscale on the inner Linux VM made ssh immediately work, through unfathomable layers of firewalls/configuration of Linux, Windows, my home router, my ISP…

        1. 1

          Yes, I use Tailscale inside WSL too. Very handy because the Windows machine is more beefy than my other native Linux desktop.

          It doesn’t work so great for builds, because I keep getting clock skew errors.

        2. 2

          I’m in good faith trying to follow the author’s larger point, so do not take this as a refutation but rather as a question to make sure I followed the post:

          In a more mutate-y language, is their solution the same as having a state accumulator object passed around

          interface Output {
            putNode(Node)
            putMetadata(Node, Metadata)
            putGraph(Graph)
          }
          

          where e.g. putNode is this.nodeSet.insert(node)?

          If that is the case, what insight is the monoid+Writer framing beyond that?

          Is it that the traversals of the Trees are separable and having this state accumulation interface is sufficient, without needing any read operations? Or is it something simpler like how they get the implementations of the accumulations for free via the ‘deriving’s?

          (Rereading, I guess it is maybe the first one? From the post, “The insight here is that everything we needed to collect was monoidal”?)

          1. 4

            Both of the things you said are true, but neither is the real point.

            The author, like most programmers, is interested in replacing lots of specialized, explicit code with a smaller, easier-to-reason-about slice of specialized code that implements an interface accepted by a more general-purpose, battle-tested chunk of code that is someone else’s job to make work right.

            The real value is not the Monoid itself, but instead how it plugs into other standard interfaces and language constructs. By implementing Monoid and using the Writer monad, you can eliminate all the explicit code involved in threading the result state in and out of functions and combining partial results.

            So I read this article the same way I would read an article detailing how one would model a solution by implementing their language’s Iterable interface, or how their code could be improved by leaning on their Dependency Injection container and implementing a Java Bean.
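            To make that concrete, here’s a rough Rust sketch (hypothetical types; Rust has no Writer monad, but the monoid half carries over): each traversal just returns its piece, and a generic combine merges partial results, so no mutable accumulator gets threaded through anything.

```rust
use std::collections::HashSet;

// Hypothetical accumulated output. `Default` gives the monoid's empty
// value and `combine` is its append, so partial results from subtrees
// can be merged without passing a mutable accumulator around.
#[derive(Default)]
struct Output {
    nodes: HashSet<String>,
    edges: Vec<(String, String)>,
}

impl Output {
    fn combine(mut self, other: Output) -> Output {
        self.nodes.extend(other.nodes);
        self.edges.extend(other.edges);
        self
    }
}

enum Tree {
    Leaf(String),
    Branch(Box<Tree>, Box<Tree>),
}

// Each traversal *returns* its contribution; merging is generic.
fn collect(t: &Tree) -> Output {
    match t {
        Tree::Leaf(name) => Output {
            nodes: HashSet::from([name.clone()]),
            edges: Vec::new(),
        },
        Tree::Branch(l, r) => collect(l).combine(collect(r)),
    }
}
```

            In Haskell the Writer monad then does even the `combine` calls for you; the sketch above still writes them by hand at each branch.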

          2. 14

            This page seems to be missing any reference at all to prior art, e.g. dozens of projects like this:

            https://domterm.org/index.html

            https://github.com/unconed/TermKit - 14 years old, but looks better than the state of the art !!!

            https://hyper.is/

            https://github.com/ericfreese/rat

            etc. etc.

            https://github.com/oils-for-unix/oils/wiki/Interactive-Shell

            Mentioned here:

            https://lobste.rs/s/qwcov5/deja_vu_ghostly_cves_my_terminal_title#c_imqrsg

            https://lobste.rs/s/xhcdg3/majjit_lsp#c_4agnik


            As I mentioned there, I wonder if terminals/shells are stuck in the past because the efforts are so diffuse. That is kind of natural in open source, but also a problem


            On static typing, there is almost certainly something that can be improved about the shell, but this page doesn’t address a key problem – static typing doesn’t scale to the software on even a single machine (on Linux or Windows), let alone multiple machines:

            https://lobste.rs/s/sqtnxf/shells_are_two_things#c_pa4wqo

            1. 5

              Hey thanks for all the links! Some great stuff I hadn’t seen before

              As I mentioned there, I wonder if terminals/shells are stuck in the past because the efforts are so diffuse. That is kind of natural in open source, but also a problem

              On static typing, there is almost certainly something that can be improved about the shell, but this page doesn’t address a key problem – static typing doesn’t scale to the software on even a single machine (on Linux or Windows), let alone multiple machines:

              scrapscript is specifically built so that all machines share the same types/data/code :)

              This is one major reason I think scrapscript specifically has something new to offer to shell design! You don’t have to install libs/apps or copy/paste to access the scrapyard. You can imagine something like this:

              silent (cat file.csv)
                |> andyc/customers
              
              -- ok [ andyc::customer { id = 123, name = "Joe" } ]
              

              But I don’t know. This might just be an advanced case of NIH syndrome.

              Really appreciate the feedback!

              1. 4

                For a while I was collecting “related work for new terminals/shells” here, so here are a bunch more links:

                https://github.com/evmar/smash/blob/master/docs/related.md

                1. 4

                  Yup, that link is near the top of my wiki :-) https://github.com/oils-for-unix/oils/wiki/Interactive-Shell

                  It would be cool if there was some survey of all those projects, to see what they did “wrong” and right. And what they have in common and where they differed

                  I linked this survey from 2014, but I haven’t seen many others - http://waywardmonkeys.org/2014/10/10/rich-command-shells


                  I think it’s a hard design / bootstrapping problem because it’s IPC, not something you can ever break (e.g. why the terminal itself is so crufty, why CSV is, etc.)

                  I also remember this page about hyperlinks in terminals, which seemed to inspire a lot of argument at the bottom - https://gist.github.com/egmontkob/eb114294efbcd5adb1944c9f3cb5feda (adding this to the wiki)

                  I should probably rename / refactor the wiki a bit to make it clear it’s an editable survey, and make it clear which ones are apps and which ones are protocols, etc.

                  1. 1

                    @evmar @andyc @ilyash @xiaq

                    This thread gave me the thought that there should be a CANONICAL site that hosts descriptions of all the main shells and is a jumping point to more info.

                    And then I thought “what a stupid idea - very few topics have canonical web sites” …

                    And THEN I thought “but wait actually most of the shells that are being actively developed at the moment are being developed by lobsters!” So you could actually do this.

                    (There’s someone from the fish team here too but I’ve forgotten their username.)

                    1. 3

                      There are a bunch of detailed and comprehensive surveys here, and apparently people other than me actively edit them :-)

                      https://github.com/oils-for-unix/oils/wiki/Alternative-Shells

                      https://github.com/oils-for-unix/oils/wiki/Internal-DSLs-for-Shell

                      https://github.com/oils-for-unix/oils/wiki/Alternative-Regex-Syntax

                      https://github.com/oils-for-unix/oils/wiki/Survey-of-Config-Languages

                      The “Interactive Shell” one should be renamed / improved, as I mentioned … help is welcome

                      1. 1

                        I somehow assumed that the first link here is the “canonical” de facto place. Thanks to Andy who already did the job!

                      2. 1

                        And TIL that Steve Bourne is still with us, as in still living - I don’t know whether he’s on lobste.rs.

                2. 5

                  One angle on this that I now appreciate is a reason most programs don’t use more than 4gb anyway: actually traversing >4gb of data to load it into memory takes a long time, and similarly you’re unlikely to want to read it all back out.

                  The exceptions tend to be things like databases, which might slowly load a large dataset and an index and only read specific subsets of it back out for any given query, or programs specifically designed for manipulating large quantities of data like image editors, which eventually need specialized code to work with GPUs. Another exception is enormous matrices for AI purposes, but for those you end up writing specialized code to make them work with the GPU; you don’t tend to need read-write operations on every arbitrary byte of the data. Otherwise, even programs that might need to process >4gb of input are far more likely to stream through it using less than 4gb of memory at any given time.

                  You could almost imagine it would work out to have an explicit separate “do X with a >4gb buffer” API that you use for the exceptional cases while running the rest of your program within 4gb. In fact, I worked on a codebase that did this; see the discussion headed “Indirect buffers” here: https://neugierig.org/software/blog/2022/06/wasm-c++.html . It is the image editor case described above, and for exactly the reason described here, the “big” data is held in buffers outside of the program’s memory where the GPU can manage it.
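                  As a toy illustration of that “indirect buffer” idea (hypothetical API, not the one from the linked post), here is a Rust sketch where the program holds small opaque handles and the host owns the big storage:

```rust
// Hypothetical "indirect buffer" API: large data lives outside the
// program's directly addressable memory; the program holds an opaque
// handle and copies small windows in and out as needed.
struct BufferHandle(usize);

struct Host {
    // Stand-in for storage the program can't address directly
    // (e.g. GPU memory, or host memory beyond the 4gb the
    // program itself runs in).
    buffers: Vec<Vec<u8>>,
}

impl Host {
    fn alloc(&mut self, len: usize) -> BufferHandle {
        self.buffers.push(vec![0; len]);
        BufferHandle(self.buffers.len() - 1)
    }

    // Copy a small window of the big buffer into program memory.
    fn read(&self, h: &BufferHandle, offset: usize, dst: &mut [u8]) {
        dst.copy_from_slice(&self.buffers[h.0][offset..offset + dst.len()]);
    }

    // Copy a small window of program memory into the big buffer.
    fn write(&mut self, h: &BufferHandle, offset: usize, src: &[u8]) {
        self.buffers[h.0][offset..offset + src.len()].copy_from_slice(src);
    }
}
```

                  The rest of the program only ever touches small slices, so it stays comfortably under the 4gb line while the exceptional “big data” cases go through the handle API.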

                  1. 3

                    With enough operating system support and page table tricks, you should be able to reduce the bounds checks to a single bit test per pointer access on x86, and possibly eliminate them entirely on an architecture that allows execute-only memory. You’d put the wasm into its own address space, with the code at the top. Something like having the wasm module in its own process with shared memory, or a syscall to temporarily swizzle the memory pages around, or a syscall to temporarily mark all the non-wasm memory as inaccessible.

                    There would of course be some cost to entering a wasm function like this, but for some functions it may be worth it. You could make it optional too, with some expensive functions entering this mode, and other functions expected to return quickly simply using bounds checks. A single function could even use different modes in different branches.

                    I wonder how feasible it is to give x86 a Harvard mode, with instruction and data lookups going to different page tables.

                    1. 1

                      Thinking about it a bit more, I don’t think the execute-only memory trick would work, because WebAssembly needs a call stack (the one outside of linear memory).

                      Also, this scheme has problems with multiple memories. You could still have one blessed memory mapped at 0 that needs only a comparison, but the others would need an add too.

                      1. 1

                        I think you only get instruction/data separation if you obey segment registers, which only happens in 32-bit processor modes, which means you also lose the additional registers of 64-bit mode. (I think? Maybe fs/gs are still obeyed?)

                        1. 1

                          FS and GS base are still obeyed, but limits are ignored. This is just to preserve use of those segment registers for cheap TLS.

                        2. 1

                          Not every platform has segmented memory. The overwhelming majority of devices running wasm don’t have cpus that support memory segmentation at all.

                          1. 1

                            You don’t need it. All you need is paged virtual memory, which the overwhelming majority of devices running wasm do have. Our machines all have Memory Management Units in them; we can use them to make the virtual address space equivalent to WebAssembly’s linear memory.

                            You usually still need to map the code into that address space somewhere, so on x64 you’d map those pages to the top half of the address space, where the most significant bit is 1. Then you check that bit before every access to wasm linear memory, and trap if it’s 1. Yes, this means you can only allocate half of the 64 bit address space, but 8,192 PiB ought to be enough for anyone. You could swap it out for a comparison if you need to anyway.

                            However, as I understand it, some architectures like RISC-V and AArch64 allow execute-only memory. Then, you wouldn’t even need a bounds check at all. Just collect the memory fault interrupt if WebAssembly tries to access the code pages, and turn it into a wasm trap.
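                            For illustration, the proposed fast path could be sketched like this (hypothetical helper; it assumes the runtime really can reserve the whole bottom half of the address space for wasm linear memory and keep everything else in the top half):

```rust
// Hypothetical fast-path check for the scheme above: if wasm linear
// memory is mapped starting at virtual address 0 and everything wasm
// must not touch lives at or above 1 << 63, then an out-of-bounds
// access can be detected with a single bit test instead of a full
// base/length comparison per access.
fn is_wasm_address(addr: u64) -> bool {
    // True iff the top bit is clear, i.e. the address falls in the
    // bottom half of the 64-bit address space reserved for wasm.
    (addr & (1u64 << 63)) == 0
}
```

                            On x86 this compiles down to roughly one test of the sign bit plus a conditional branch to the trap path.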

                            1. 4

                              No. Paged virtual memory is not sufficient.

                              What you have to protect against is arbitrary access of arbitrary memory addresses.

                              Paged memory protection does not prevent a wasm binary from constructing an arbitrary 64bit address and then trying to read or write that address.

                              Executability of memory is a different issue entirely, and is merely a secondary defense against failure to prevent OoB memory access.

                              1. 1

                                I think you need to read my comment again a bit more carefully.

                                1. 2

                                  Ok, I do not see how you achieve the wasm memory access restrictions, just by using paging.

                                  Please describe your proposed scheme, and how it prevents wasm code from constructing a pointer outside of the wasm allocation region?

                                  1. 1

                                    My point is, if you edit the Page Table so that the wasm memory pages are mapped starting at address 0, and have everything else that wasm code shouldn’t be able to read mapped to addresses starting at 1 << 63, you can reduce the bounds check to a single bit test.

                                    If you really are performance bound by bounds checks, then this scheme may be faster than getting the buffer length and base pointer, comparing, and adding, which you would need to do otherwise.

                                    1. 1

                                      Oh I see, you aren’t suggesting page tables to remove bounds checks, you’re just trying to optimize them. The problem is you’re making an incorrect assumption about pointers visible in wasm. All wasm pointers (those that are visible to wasm code, not runtime internals that cannot be accessed) are relative to the base pointer. Pointers in wasm are not machine addresses, so every memory access is (in principle) (base + “pointer”), so this kind of page table hackery does not gain you anything.

                                      There’s a few other assumptions you’re making though, that also impact what you’re suggesting: there can be multiple wasm heaps in a single webpage (let alone a single process), depending on hardware and security model the top bits may not actually be available (pointer authentication, memory tagging, etc), mapping memory at specific locations is not necessarily permitted by the OS, etc.

                                      That said, it also sounds like you think I believe there are performance problems due to bounds checks, and I’m not really convinced of that myself - I suspect there’s just a lot of room for improvement in whatever the SM implementation is doing - I was responding to your claim that page table hackery could remove/reduce the overhead, which I don’t believe to be the case given the constraints.

                        3. 8

                          After what feels like a lifetime of running desktop Linux – I think I switched to Linux as my primary desktop in maybe early 2000, and ran it on servers, desktops, and laptops for the ensuing years – I got an M1 from work because their software only worked on a Mac.

                          After slogging a bit through some of the key commands being different and the normal sort of new user problems, I am now totally converted. I have the usual gripes with the software but the hardware is so good it doesn’t matter, especially when the software is fine enough. Typing this comment on my own personal M2.

                          1. 73

                            i wonder if i’ll live to see the day where we can talk about a language without putting a different language down

                            1. 38

                              The YouTube channel here seems to be a person who needs to be dramatic for view reasons. I think the actual content, and the position of the Ghostty author here on this topic, is pretty mild.

                              An actual bit from the video:

                              Guest: “…I don’t know, I’m questioning everything about Go’s place in the stack because […reasonable remarks about design tradeoffs…]”

                              Host: “I love that you not only did you just wreck Go […]”

                              1. 27

                                Aside… In the new year I’ve started reflexively marking videos from channels I follow as “not interested” when the title is clickbait, versus a succinct synopsis of what the video is about. I feel like clickbait and sensationalism on YouTube is out of control, even among my somewhat curated list of subscribed channels.

                                1. 42

                                  This is why I can’t stand almost any developer content on YouTube and similar platforms. They’re way too surface-level, weirdly obsessed with the inane horse race of finding the “best” developer tooling, and clickbait-y to a laughable degree. I have >20 years of experience; I’m not interested in watching someone blather on about why Go sucks when you could spend that time talking about the actual craft of building things.

                                  But, no, instead we get an avalanche of beginner-level content that lacks any sort of seriousness.

                                  1. 14

                                    This is why I really like the “Developer Voices” channel. Great host, calm and knowledgeable. Interesting guests and topics. Check it out if you don’t know it yet.

                                    1. 1

                                      Very nice channel indeed. Found it accidentally via this interview about Smalltalk and enjoyed it very much.

                                      Do you have other channel recommendations?

                                      1. 1

                                        I found Software Unscripted to pretty good too. Not quite as calm as Developer Voices, but the energy is positive.

                                        1. 2

                                          Thanks! Didn’t know Richard Feldman hosted a podcast, he’s a good communicator.

                                      2. 1

                                        Signals and Threads is another great podcast, albeit doesn’t seem to have a scheduled release

                                        1. 1

                                          Thanks for the suggestion. I will check it out!

                                      3. 2

                                        I’m in a similar boat. Have you found any decent channels that aren’t noob splooge? Sometimes I’ll watch Asahi Lina, but I haven’t found anything else that’s about getting stuff done. Also, non-OS topics would be nice additions as well.

                                        1. 6

                                          As someone else said, Developer Voices is excellent, and on the opposite end of the spectrum from OP.

                                          1. 4

                                            Two more:

                                            • Oxide and Friends
                                            • Two’s Complement
                                            1. 4

                                              The Software Unscripted podcast publishes on YouTube too, and I enjoy it a fair bit at least in the audio only format.

                                              1. 1

                                                Book Overflow, which focuses on reading a software book about once every two weeks and talking about it in depth.

                                            2. 14

                                              7 (7!) years ago LTT made a video about why their thumbnails are so… off-putting, and it essentially boiled down to “don’t hate the player; hate the game”. YouTube rewards that kind of content. There’s a reason why nearly every popular video these days is some variant of “I spent 50 HOURS writing C++” with the thumbnail having a guy throwing up. If your livelihood depends on YouTube, you’re leaving money on the table by not doing that.

                                              1. 1

                                                It’s not just “Youtube rewards it”, it’s that viewers support it. It’s a tiny, vocal minority of people who reject those thumbnails. The vaaaaast majority of viewers see them and click.

                                                1. 3

                                                  I don’t think you can make a definitive statement either way because YouTube has its thumb on the scales. Their algorithm boosts videos on factors other than just viewer click through or retention rates (this has also been a source of many superstitions held by YouTubers in the past) and the way the thumbnail, title and content metas have evolved make me skeptical that viewers as a whole support it.

                                                  1. 1

                                                    What is the alternative? That they look at the image and go “does this person make a dumb face” ? Or like “there’s lots of colors” ? I think the simplest explanation is that people click on the videos a lot.

                                                    1. 1

                                                      …or it’s just that both negative and positive are tiny slices compared to neutrals but the negative is slightly smaller than the positive.

                                                      (I use thumbnails and titles to evaluate whether to block a channel for being too clickbait-y or I’d use DeArrow to get rid of the annoyance on the “necessary evil”-level ones.)

                                                2. 0

                                                  If your livelihood depends on YouTube

                                                  then you have chosen poorly.

                                                  1. 5

                                                    No, I think it’s okay for people to make great content for a living.

                                                    1. 2

                                                      I am quite happy to differ in opinion to someone who says ‘great content’ unironically. Anyway your response is obviously a straw man, I’m not telling Chopin to stop composing for a living.

                                                      1. 3

                                                        Your personal distaste for modern culture does not make it any higher or lower than Chopin, nor does it invalidate the fact that the people who make it have every right to make a living off of it.

                                                        1. 0

                                                          They literally don’t have a right to make a living from Youtube, this is exactly the problem. Youtube can pull the plug and demonetise them at any second and on the slightest whim, and they have absolutely no recourse. This is why relying on it to make a living is a poor choice. You couldn’t be more diametrically wrong if you tried. You have also once again made a straw man with the nonsense you invented about what I think about modern culture.

                                                          1. 2

                                                            How’s that any different from the state of the media industry at any point in history? People have lost their careers for any reason in the past. Even if you consider tech or any other field, you’re always building a career on top of something else. YouTube has done more to let anyone make a living off content than any other stage in history, saying you’re choosing poorly to make videos for YouTube is stupid.

                                                            You have also once again made a straw man with the nonsense you invented about what I think about modern culture

                                                            You’re the one who brought it up:

                                                            I am quite happy to differ in opinion to someone who says ‘great content’ unironically

                                                    2. 6

                                                      Isn’t this kind of a rigid take? Why is depending on youtube a poor choice? For a lot of people, I would assume it’s that or working at a fast-food restaurant.

                                                      Whether that’s a good long-term strategy, or a benefit to humanity is a different discussion, but it doesn’t have to necessarily be a poor choice.

                                                      1. 0

                                                        Not really?
                                                        I mean sure if you’ve got like 1000 views a video then maybe your livelihood depending on YouTube is a poor choice.
                                                        There’s other factors that come into this, but if you’ve got millions of views and you’ve got sponsors you do ad-reads for money/affiliate links then maybe you’ll be making enough to actually “choose” YouTube as your main source of income without it being a poor choice (and it takes a lot of effort to reach that point in the first place).

                                                        We’ve been seeing this more and more. You can, and people definitely do, make careers out of YouTube and “playing the game” is essential to that.

                                                  2. 10

                                                    Heh - I had guessed who the host would be based on your comment before I even checked. He’s very much a Content Creator (with all the pandering and engagement-hacking that implies). Best avoided.

                                                    1. 11

                                                       Your “ghostty author” literally built a multibillion dollar company writing Go for over a decade, so I’m pretty sure his opinion is not a random internet hot take.

                                                      1. 5

                                                        Yup. He was generally complimentary of Go in the interview. He just doesn’t want to use it or look at it at this point in his life. Since the Lobsters community has quite an anomalous Go skew, I’m not surprised that this lack of positivity about Go would be automatically unpopular here.

                                                         And of course the title was click-baity – but what can we expect from an ad-revenue-driven talk show?

                                                        1. 19

                                                          My experience is that Lobste.rs is way more Rust leaning than Go leaning, if anything.

                                                          1. 1

                                                            We have more time to comment on Lobsters because our tools are better ;)

                                                            1. 8

                                                              Waiting for compile to finish, eh?

                                                              1. 3

                                                                Hahahahaha. Good riposte!

                                                                I was able to get the incremental re-builds down to 3-5 seconds on a 20kloc project with a fat stack of dependencies which has been good enough given most of that is link time for a native binary and a wasm payload. cargo check via rust-analyzer in my editor is faster and does enough for my interactive workflow most of the time.

                                                              2. 2

                                                                 Yeah, Haskell is so superior to Rust it’s not even fun at this point.

                                                            2. 2

                                                              It’s funny you say that because recently it seems we get a huge debate on any Go-related post :D

                                                            3. 4

                                                              The YouTube channel here seems to be a person who needs to be dramatic for view reasons.

                                                              First thought was “I bet it’s The Primeagen.” Was not disappointed when I clicked to find out.

                                                            4. 6

                                                              Don’t be a drama queen ;-) You can, all you want. That’s what most people do.

                                                              The host actually really likes Go, and so does the guest. He built an entire company where Go was the primary (only?) language used. It is only natural to ask him why he picked Zig over Go for creating Ghostty, and it is only natural that the answer will contrast the two.

                                                              1. 2

                                                                i can’t upvote this enough

                                                              2. 5

                                                                 After having used Ninja in my last job, I might adjust the plan in the blog post in the following ways:

                                                                • Use an mtime-like sequence number rather than raw mtimes or content-hashes.
                                                                  • I was surprised at work when we found that (custom) content-hash computation was a non-trivial time expenditure; however, it’s possible that we used “deep” traces where we could have used “shallow” traces and not had as much of a performance impact (for the reason of concurrent hash computation as mentioned in the article; it would also enable early cut-off).
                                                                  • I suspect you can resolve the listed problematic scenarios with a kind of lexicographical ordering involving mixing in a unique monotonically-increasing build sequence number, but I haven’t verified.
                                                                  • Alternatively, there may be some clever data structure for building and querying a partial ordering efficiently that doesn’t involve literal numbers. In some sense, you’re trying to order unrelated events in a “distributed” system (concurrent build tasks), and using a single global clock may have the fundamental issue of ascribing “happens-before” relationships (and inferences from them) where no such relationship actually exists.
                                                                • Lean much more into the article’s “manifests” and symbolic representations of nodes in the build graph, rather than “files” as the inherent type of node.
                                                                  • It’s good for explainability if you store more data, as mentioned in the article, especially if you do early-cutoff.
                                                                  • It seems useful at the fringes of the build graph where you want to represent side effects rather than real files; I recall a lot of weird behavior/bugs with Ninja’s “phony” rules in particular, which can be used to represent those.
                                                                  • I recall having to contort the build graph uncomfortably to make it fit into the file-per-node model, and that just doing I/O on all the files representing build nodes was a non-trivial cost, even when they didn’t have meaningful content.
                                                                  • In principle, a build rule may be able to optimize incremental builds in a domain-specific way if you give it the old and new manifests for comparison, although that’s an involved optimization.
                                                                1. 4

                                                                  Lean much more into the article’s “manifests” and symbolic representations of nodes in the build graph, rather than “files” as the inherent type of node.

                                                                  FWIW, in build2 we have added the notion of a target type. So instead of (using make syntax):

                                                                  libfoo.a: foo.c
                                                                  

                                                                  We write:

                                                                  liba{foo}: c{foo}
                                                                  

                                                                  Besides solving other problems (like varying extensions for same file types across platforms), this allowed us to have non-mtime-based or even non-file-based targets. Specifically, we have alias (which is similar to the phony targets) and group (which represents multiple targets). You can see the beginning of the target type hierarchy in Targets and Target Types.

                                                                  While we received (and keep receiving) a lot of flak for the “strange” syntax, looking back, it was one of the best design decisions that we’ve made, IMO.

                                                                  1. 1

                                                                    Representing and operating on the structured data for the build graph nodes feels a lot more natural and extensible than operating primarily on files like make/ninja. I missed having access to Bazel/buck2-style “providers” for arbitrary structured metadata, and being able to process the metadata directly in the build language, rather than having to shell out.

                                                                    • [tangential] An interesting difference between build2 targets and Bazel/buck2-style providers seems to be that the former are inheritance-/subtyping-based, while the latter are more interface-/protocol-based.

                                                                    Of course, one question is: when does Ninja cease to be an “assembler” rather than a “compiler”, as per its design description?

                                                                    • On one hand, the build metadata doesn’t really ‘need’ to live in the Ninja build graph, rather than the generator.
                                                                    • On the other hand, there are meaningful performance concessions when you have to represent every node as a file (and possibly correctness implications too, when you consider mtime issues, filesystem issues, etc.).
                                                                    • I wonder if there’s a minimally-invasive design change that could lift n2 to operate on more abstract nodes?
                                                                      • Of course, it may turn out that existing Ninja files wouldn’t be able to take meaningful advantage of it without a rewrite.
                                                                      • But maybe it would address certain issues around mtimes or phony rules?

                                                                    Some additional context for those unfamiliar with Ninja:

                                                                    • Ninja has rules which can be parametrized, but perhaps the main difference is that the data flows in only one direction?
                                                                      • The instantiation of a rule can pass in information (limited to string key-value pairs), but a rule can’t easily ask for information about its inputs.
                                                                      • Since Ninja is intended to be generated, it doesn’t matter from an expressiveness standpoint, because you can arrange for rules to be instantiated with all the information they would need.
                                                                    • IIRC I still found the lack of structure inconvenient when writing/maintaining the generator program.
                                                                      • There was a lot of dumping and parsing important build metadata into files as inputs to build rules, and then the rule implementation would have to call back into some script at runtime to interpret the input blob.
                                                                      • You might need to forward metadata across many different kinds of rules. I think this means that all rules potentially need to know about all other rules, or you somehow otherwise need to account for this in your generator.
                                                                      • It was somewhat painful to implement and reason about the information flow at different evaluation times, although this is just a general compiler problem.
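                                                                    To make that one-directional flow concrete, here is a hypothetical build.ninja fragment (rule and file names invented for illustration). The build statement can push string variables down into a rule instantiation, but a downstream rule has no way to query how its inputs were produced:

                                                                    ```ninja
                                                                    # Rules are templates over string variables like $cflags, $in, $out.
                                                                    rule cc
                                                                      command = gcc $cflags -c $in -o $out

                                                                    rule link
                                                                      command = gcc $in -o $out

                                                                    # The build statement passes information *into* the rule...
                                                                    build foo.o: cc foo.c
                                                                      cflags = -O2 -Wall

                                                                    # ...but the link edge cannot ask foo.o how it was built; the
                                                                    # generator must arrange to spell out any metadata again here.
                                                                    build foo: link foo.o
                                                                    ```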
                                                                    1. 3

                                                                      An interesting difference between build2 targets and Bazel/buck2-style providers seems to be that the former are inheritance-/subtyping-based, while the latter are more interface-/protocol-based.

                                                                      We actually have both: target type inheritance (which is handy for representing fundamental “types” of targets) and less rigid metadata that can be associated with targets using target-specific variables. We use the latter all over the place: rules can use it (e.g., the compile and link rules for C/C++ communicate some information like this), users can use it (for example, to communicate information about threads in CHERIoT builds: declaration and use), etc.

                                                                      Of course, one question is: when does Ninja cease to be an “assembler” rather than a “compiler”, as per its design description?

                                                                      Perhaps a more interesting question is whether the “assembler” approach, and the meta-build system/project generator that it implies, is still the right design. Meta-build systems have a fundamental flaw (in short, they partition the build graph into two disconnected sub-graphs which leads to the same problems as “why recursive make is considered harmful”).

                                                                  2. 1

                                                                    Can you say more about how you used ninja with content hashes? I have been thinking about adding them to n2 and I think I have a plan where they will be relatively cheap, but I am curious to read real world experience about it.

                                                                    1. 3

                                                                      I think this portion in mtime comparison considered harmful (linked from the original article) held true in practice:

                                                                      Checksumming every input file before building is very slow. If you’re considering whether to rebuild foo.a, and foo.a depends on *.o, and each .o depends on each .c, then you have to checksum the full content of every .c file every time you consider making an incremental build of foo. In large projects, this could be thousands, or tens of thousands of files, each of which we have to open(), read(), checksum, and close(), possibly over a network filesystem. For small projects this is fine, but for large projects, this sucks a lot.

                                                                      blaze/bazel come from a world where source files are stored in a virtual filesystem, which happens to have the ability to tell you a precalculated checksum for every source file (except the ones you’ve changed locally). If you only have to checksum your locally-changed files on each build, that’ll be very fast. But you need filesystem support to make this possible, and we can’t assume that everywhere.

                                                                      You would expect hash computation/comparison to be strictly slower than mtime computation/comparison even in the best case:

                                                                      • In the case of multi-GB artifacts, hash computation could be noticeably slow even on an individual basis.
                                                                        • On some ontological basis, you could argue that big artifacts are most likely to be on a critical path, under the hypothesis that smaller artifacts earlier in the build generally combine to become bigger artifacts later in the build.
                                                                        • If true, doing hash computation feels painful under the above circumstances because 1) it’s on the critical path and 2) it’s for a bigger artifact and linear in said artifact’s size.
                                                                      • It’s extra expensive if you have to use a networked filesystem to read the file contents.
                                                                      • For a large enough project, you might even consider implementing a cache from file metadata to content hash, like mentioned in the “redo: mtime dependencies done right” section of the linked article.
                                                                        • It probably gets messy if you want to cache networked file hashes in the same way. I don’t know what kind of metadata is reliable in networked contexts.
                                                                        • Also, I see now that they mention a “sequence number” for targets, which is perhaps similar to my original comment of adding a sequence number instead of using mtimes directly.
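                                                                      A minimal sketch of that metadata-to-hash cache, assuming (mtime, size) is an acceptable “unchanged” proxy (the linked article discusses when it isn’t), and using std’s non-cryptographic DefaultHasher as a stand-in for a real content hash:

                                                                      ```rust
                                                                      use std::collections::HashMap;
                                                                      use std::fs;
                                                                      use std::hash::{Hash, Hasher};
                                                                      use std::io;
                                                                      use std::path::Path;
                                                                      use std::time::SystemTime;

                                                                      /// Cache key: if (mtime, size) is unchanged, reuse the stored hash.
                                                                      type Key = (SystemTime, u64);

                                                                      #[derive(Default)]
                                                                      struct HashCache {
                                                                          entries: HashMap<std::path::PathBuf, (Key, u64)>,
                                                                      }

                                                                      impl HashCache {
                                                                          /// Returns (hash, was_cache_hit).
                                                                          fn content_hash(&mut self, path: &Path) -> io::Result<(u64, bool)> {
                                                                              let meta = fs::metadata(path)?;
                                                                              let key = (meta.modified()?, meta.len());
                                                                              if let Some((cached_key, hash)) = self.entries.get(path) {
                                                                                  if *cached_key == key {
                                                                                      return Ok((*hash, true)); // metadata unchanged: skip the read
                                                                                  }
                                                                              }
                                                                              // Stand-in for a real content hash (use a cryptographic hash in practice).
                                                                              let mut hasher = std::collections::hash_map::DefaultHasher::new();
                                                                              fs::read(path)?.hash(&mut hasher);
                                                                              let hash = hasher.finish();
                                                                              self.entries.insert(path.to_path_buf(), (key, hash));
                                                                              Ok((hash, false))
                                                                          }
                                                                      }

                                                                      fn main() -> io::Result<()> {
                                                                          let path = std::env::temp_dir().join("hash_cache_demo.txt");
                                                                          fs::write(&path, b"hello")?;
                                                                          let mut cache = HashCache::default();
                                                                          let (h1, hit1) = cache.content_hash(&path)?; // first call reads the file
                                                                          let (h2, hit2) = cache.content_hash(&path)?; // second call hits the cache
                                                                          assert!(!hit1 && hit2 && h1 == h2);
                                                                          Ok(())
                                                                      }
                                                                      ```

                                                                      A real build system would persist the cache to disk and worry about mtime granularity; this only shows the shape of the idea.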

                                                                      I suspect that, due to the Ninja architecture, the build and hash computation happened via deep constructive traces, rather than shallow ones:

                                                                      • Therefore, you couldn’t start the build before computing all the hashes. This was already discussed under “Single pass” in the original article.
                                                                      • But, even if you had shallow traces, you would still have to hash large artifacts and many artifacts as per above.
                                                                      • Hash comparison is also fundamentally more expensive than numeric mtime comparison, which eventually starts to add up.
                                                                        • I expect this is more of a data locality issue than a computation issue. Do you store the hashes directly in the core “node” data structure, or somewhere else? If you store them directly, then you can have fewer nodes in the cache; if you store them indirectly, then you have to traverse the indirection when invalidating the graph.
                                                                        • It’s also just higher peak memory usage, which may or may not be substantial in practice.
                                                                      1. 1

                                                                        I missed this response, sorry for the late reply!

                                                                        My small ideas in this space are:

                                                                        • You can view “hash of file contents” as a kind of output that depends on file mtime/inode/size/etc. and then treat its computation much like other build steps, where it is computed incrementally, parallelized, and cached.
                                                                        • For large files, you could even imagine skipping the actual hashing and falling back on just assuming they never return to a previous state (which only causes overbuilding, not incorrectness). This is why in the n2 post I was thinking about how the hashing approach used there can be pluggable per target.
                                                                        • One last trick is you could borrow the content hashes from git’s index, which maintains its own cache of content hashes and platform-specific tricks to keep that up to date. That only covers files Git tracks though.

                                                                        Re hash comparison cost itself, note that all you really care about is minimizing the chance that hash(t1) == hash(t2) for each individual file – you’re not comparing hashes of unrelated files – so the birthday-paradox concern you normally have with content hashing doesn’t really apply. I think this means you can get away with using some subset of a standard hash. A full sha1 is 20 bytes, while the int64 timestamps we use are 8 bytes already, so the data size might be comparable. (Now that I’m typing this, I realize: if you’re not doing full content-addressed storage of file content, you typically only have the last build hanging around, so you are drawing exactly two entries for your possible “birthday” collision set.)
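                                                                        For illustration, a hypothetical sketch of that truncation (not n2’s actual scheme): keep only the first 8 bytes of a 20-byte digest so the stored fingerprint is the same size as an int64 mtime, and compare each file only against its own previous fingerprint:

                                                                        ```rust
                                                                        /// Keep the first 8 bytes of a full 20-byte digest, so the stored
                                                                        /// fingerprint is the same size as an int64 timestamp.
                                                                        fn truncated(digest: &[u8; 20]) -> u64 {
                                                                            let mut first8 = [0u8; 8];
                                                                            first8.copy_from_slice(&digest[..8]);
                                                                            u64::from_le_bytes(first8)
                                                                        }

                                                                        /// Up-to-date check: the new digest is compared only against the
                                                                        /// fingerprint recorded for this same file in the previous build,
                                                                        /// so only accidental self-collision matters.
                                                                        fn unchanged(prev_fingerprint: u64, new_digest: &[u8; 20]) -> bool {
                                                                            truncated(new_digest) == prev_fingerprint
                                                                        }

                                                                        fn main() {
                                                                            let digest = [1u8; 20];
                                                                            let stored = truncated(&digest); // what the build log would record
                                                                            assert!(unchanged(stored, &digest));
                                                                            assert!(!unchanged(stored, &[2u8; 20]));
                                                                        }
                                                                        ```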

                                                                        1. 1

                                                                          I don’t know if this thread will auto-close eventually; I’m happy to talk more by email (me@waleedkhan.name) or GitHub (@arxanas).

                                                                          I think conceptualizing the metadata -> hash mapping as an incremental build task is the right way of thinking about it. But there end up being complications like bloated dependency graphs, dynamic dependencies, remote execution, etc. that can justify specializing it, rather than literally using the build system itself.

                                                                          the hashing approach used there can be pluggable per target.

                                                                          This sounds technically feasible, but strikes me as wrong for a reason I’m having difficulty articulating. It’s something like: if you trust the metadata as an input to produce the content hash, then you should be able to use the metadata directly in all cases, so why should you ever hash the file contents? But I haven’t worked through it. My reasoning might be predicated on an invalid assumption anyways, like that all nodes should be cacheable in an ideal world.

                                                                          you could borrow the content hashes from git’s index

                                                                          I specifically looked at this in the context of incrementalizing certain builds. It didn’t seem too useful in practice:

                                                                          • You’re now serialized on recomputing git status, which can take hundreds of milliseconds in large repos.
                                                                          • The computed hash isn’t raw SHA1, but rather prefixed with the blob header, so you can’t directly interoperate with systems that might use SHA1 as the content hash directly.
                                                                          • Potential race conditions. Modified files are not usually stored in the object database, so it’s possible to read an inconsistent file from disk, and you have to prepare for this eventuality.
                                                                          • Like you mentioned, you can’t hash ignored and/or untracked files in this way.
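                                                                          To illustrate the blob-header point above: git hashes the header "blob <decimal length>\0" followed by the content, not the raw file bytes, so its stored hashes can’t be compared byte-for-byte with a plain SHA-1 of the file. A sketch that builds the exact preimage (computing the digest itself would need a SHA-1 implementation such as the sha1 crate, which is an assumption here, not part of std):

                                                                          ```rust
                                                                          /// Build the byte sequence git actually hashes for a blob:
                                                                          /// "blob <decimal length>\0" followed by the raw contents.
                                                                          fn git_blob_preimage(content: &[u8]) -> Vec<u8> {
                                                                              let mut preimage = format!("blob {}\0", content.len()).into_bytes();
                                                                              preimage.extend_from_slice(content);
                                                                              preimage
                                                                          }

                                                                          fn main() {
                                                                              // The header prefix is why git's stored hash != sha1(file contents).
                                                                              assert_eq!(git_blob_preimage(b"hello"), b"blob 5\0hello".to_vec());
                                                                          }
                                                                          ```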

                                                                          However, deeper integration with source control to enable builds seems useful overall. You might be interested in my talk, which involves incrementalizing builds without daemonizing the build itself, similarly to Ninja. There’s also discussion of keying certain build artifacts from source control hash near the end (“Distributing persistent state”), which I did while at Meta.

                                                                        2. 1

                                                                          Also, I see now that they mention a “sequence number” for targets, which is perhaps similar to my original comment of adding a sequence number instead of using mtimes directly.

                                                                          A bit late in the discussion, but do you think you could elaborate on the sequence number idea? Specifically, which problems with mtime you think adding it will help address?

                                                                          To me it seems a persistent sequence number would only be useful if the build system cannot otherwise remember which targets it has updated during any given run (which would be helpful to know if the resulting mtime hasn’t changed due to, for example, low mtime resolution).

                                                                          But maybe I am missing something?

                                                                          1. 2

                                                                            Specifically, I was referring to the four problematic example scenarios listed in the original article.

                                                                            • Indeed, they’re mostly situations where the build system doesn’t “know about” or “remember” a certain rebuild.
                                                                            • In the case of non-file artifacts or side-effects, a literal mtime isn’t available at all.
                                                                            • In the case of early cut-off, you also need to support something like the opposite of the typical scenario: marking a node as logically modified even when the underlying file physically hasn’t been touched.

                                                                            The point of keeping a weird sequence number at all is to enable a very efficient check for whether a node is “up to date”:

                                                                            • That is, even if that information could be determined by processing a historical build log, querying a database, etc.
                                                                            • Specifically, I was remarking that content-hashing is reliable, but not free in comparison to mtime (integer) comparison.
                                                                            • You could consider it a hypothetical optimization for efficiently querying and updating the underlying partial order structure.
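                                                                            A hypothetical sketch of that check: every completed step is stamped from one global, persisted counter, and dirtiness becomes a pure integer comparison, independent of mtime resolution, clock skew, or whether the node is backed by a file at all:

                                                                            ```rust
                                                                            /// One global, monotonically increasing counter, persisted with the
                                                                            /// build state.
                                                                            struct Build {
                                                                                next_seq: u64,
                                                                            }

                                                                            #[derive(Clone, Copy)]
                                                                            struct Node {
                                                                                /// Sequence number at which this node was last (logically) modified.
                                                                                changed: u64,
                                                                            }

                                                                            impl Build {
                                                                                /// Stamp a node after rebuilding it (or after marking it logically
                                                                                /// modified, e.g. for early cut-off or non-file side effects).
                                                                                fn record_rebuild(&mut self, node: &mut Node) {
                                                                                    node.changed = self.next_seq;
                                                                                    self.next_seq += 1;
                                                                                }

                                                                                /// The up-to-date check is a pure integer comparison: a target is
                                                                                /// dirty iff some input changed after the target was last produced.
                                                                                fn dirty(target: Node, inputs: &[Node]) -> bool {
                                                                                    inputs.iter().any(|input| input.changed > target.changed)
                                                                                }
                                                                            }

                                                                            fn main() {
                                                                                let mut build = Build { next_seq: 1 };
                                                                                let mut src = Node { changed: 0 };
                                                                                let mut obj = Node { changed: 0 };
                                                                                build.record_rebuild(&mut src); // source edited
                                                                                assert!(Build::dirty(obj, &[src]));
                                                                                build.record_rebuild(&mut obj); // target rebuilt afterwards
                                                                                assert!(!Build::dirty(obj, &[src]));
                                                                            }
                                                                            ```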
                                                                    2. 9

                                                                      I just found this madness in https://github.com/evmar/n2/blob/main/doc/development.md :

                                                                      Path handling and Unicode safety

                                                                      See the longer discussion of Unicode in general in the design notes.

                                                                      Concretely, we currently use Rust String for all paths and file contents, but internally interpret them as bytes (not UTF-8), including using unsafe sometimes to convert.

                                                                      Based on my superficial understanding of how safety relates to UTF-8 in Rust strings, it’s probably harmless given that we never treat strings as Unicode, but it’s also possible some code outside of our control relies on this. But it does mean there’s a bunch of kind of needless unsafes in the code, and some of them are possibly actually doing something bad.

                                                                      We could fix this by switching to using a bag of bytes type, like https://crates.io/crates/bstr. But it is pretty invasive. We would need to use that not only for paths but also console output, error messages, etc. And it’s not clear (again, see above design notes discussion) that using bags of bytes is the desired end state, so it’s probably not worth doing.

                                                                      I think this is totally unacceptable. The project should be considered not-production-ready until they switch to bstr or something similar.

                                                                      1. 6

                                                                        What is the actual problem you’re concerned about?

                                                                        1. 13

                                                                          The immediate effects are:

                                                                          • Some part of Rust community will feel strongly and vocally about misusing unsafe (this is a serious misuse of unsafe according to the current culture).
                                                                          • There are probably some bugs where invalid input makes n2 crash rather than emit an error. Eyeballing, I think this truncate could cause a crash? It asserts that the truncation point is a utf-8 char boundary, which is not something n2 enforces.
                                                                          • It is unlikely, but possible that there’s some sort of really bad bug where untrusted input would cause an RCE or something.

                                                                          But, more generally, it feels like this whole thing is really just fighting the language? There’s a very specific, intentional dichotomy between utf-8 – String, everything else – Vec<u8>. It’s not about “why don’t you do something different”, but rather about “why do you do this”. The default action here would be to use Vec<u8> / [u8] for everything. I don’t think you need bstr. It will make certain things nicer, but a vector of bytes is quite a capable and convenient type in its own right.
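                                                                          A standalone demo (not n2’s actual code) of the crash class mentioned above: byte-indexed operations on String assert UTF-8 char boundaries, while the same operation on a vector of bytes is always well-defined:

                                                                          ```rust
                                                                          use std::panic;

                                                                          /// Returns true if `String::truncate(new_len)` panics, i.e.
                                                                          /// `new_len` is not a UTF-8 char boundary.
                                                                          fn truncate_panics(s: &str, new_len: usize) -> bool {
                                                                              let owned = s.to_string();
                                                                              panic::catch_unwind(move || {
                                                                                  let mut owned = owned;
                                                                                  owned.truncate(new_len);
                                                                              })
                                                                              .is_err()
                                                                          }

                                                                          fn main() {
                                                                              let s = "héllo"; // 'é' occupies bytes 1..3
                                                                              assert!(truncate_panics(s, 2)); // mid-'é': String asserts the boundary
                                                                              assert!(!truncate_panics(s, 1)); // after 'h': fine

                                                                              // The same operation on a bag of bytes is always well-defined.
                                                                              let mut bytes = s.as_bytes().to_vec();
                                                                              bytes.truncate(2);
                                                                              assert_eq!(bytes, vec![b'h', 0xC3]); // half of 'é' survives, no crash
                                                                          }
                                                                          ```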

                                                                          That is if you want to use the raw-bytes model. There’s an argument that you can’t escape encoding (because, on Windows, the build.ninja will be in utf-8/ascii, and the paths are always UTF-16, so something will need to encode even in the ascii case). From this perspective, it makes sense to use String for build.ninja, OsString or Vec<u8> for parts that represent paths/arguments (I think OsString is the morally correct choice here, but its internal representation is insufficiently transparent to do things like path simplification easily, so you might want to roll your own on top of Vec<u8> here), and Vec<u8> for everything that might be encoded (like compiler output a la /showIncludes), with a clean layer that decodes Vec<u8> into OsString.

                                                                          (for completeness, there’s also a school of thought that says that, instead of going out of the way to represent non-utf8 inputs, build tooling should just mandate that everything is utf-8, and error early for everything that’s not, forcing everyone to make their builds non-weird. That is, you’ll still parse WTF-16 you get from msvc, but you error if it can’t be decoded into valid utf-8. https://docs.rs/camino/latest/camino/ is a library for that)

                                                                          1. 9

                                                                            on Windows […] the paths are always UTF-16

                                                                            (I know you know this but) when getting into these weeds it’s important to note that UTF-16 doesn’t exist: every system that uses 16-bit code units is actually WTF-16 because they all allow unpaired surrogates.
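                                                                            A tiny Rust demonstration of the consequence: a lone surrogate is a perfectly legal 16-bit code unit (e.g., in a Windows filename) but not a Unicode scalar value, so it cannot round-trip through a Unicode string — which is why WTF-8 / OsString exist:

                                                                            ```rust
                                                                            fn main() {
                                                                                // 0xD800 is a high surrogate: a valid u16 code unit, but not
                                                                                // a Unicode scalar value, so it can never become a char.
                                                                                assert!(char::from_u32(0xD800).is_none());

                                                                                // So a lone surrogate cannot be decoded into a Rust String...
                                                                                let lone_surrogate = [0xD800u16];
                                                                                assert!(String::from_utf16(&lone_surrogate).is_err());

                                                                                // ...only lossily replaced with U+FFFD.
                                                                                assert_eq!(String::from_utf16_lossy(&lone_surrogate), "\u{FFFD}");
                                                                            }
                                                                            ```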

                                                                            1. 2

                                                                              that there’s some sort of really bad bug where untrusted input would cause an RCE or something

                                                                              There are a lot of untrusted-input problems where the build file says things like command = rm -rf / and deletes all your files, which is why I was curious about the specific worries here. All of the responses so far have been of the form “it makes people upset” without any of the underlying reasons.

                                                                              In all this thread feels very silly to me, especially given that it’s mostly tedious to change and doing so would not provide any practical benefit. (Meanwhile had it been written in pretty much other language it could be making the same mistake and nobody would even notice.)

                                                                              1. 8

                                                                                In all this thread feels very silly to me,

                                                                                While it is silly in the context of your particular use-case, I think it is a rational reaction (in direction, if not necessarily in degree) in the context of the broader Rust ecosystem.

                                                                                Your context is that you don’t care too much about UB: while .truncate and .parse are technically UB the way they are used in n2, and can even cause a minor bug in practice (a crash rather than a clean exit with an error), it’s no big deal, as n2 assumes trusted input.

                                                                                The broader Rust context is that untrusted input is the norm, though. This sort of code in some random HTTP middleware would have been a huge problem. The practical safety of Rust hinges not only on the statically-checkable rules of the safe subset, but also on the community culture of using unsafe correctly. Keeping the language constant, the actual safety of the ecosystem would be much, much lower if the equilibrium point were everyone using unsafe willy-nilly, without thinking about and upholding safety invariants.

                                                                                As this is a culture question, it is really hard to make one-off exceptions for special cases, and there’s usually push-back both from senior members of the community, who try to deliberately steer the culture, and from less senior members, who I think mostly fall for the us-vs-them thing, with memory safety as the dividing line.

                                                                            2. 12

                                                                              But it does mean there’s a bunch of kind of needless unsafes in the code, and some of them are possibly actually doing something bad.

                                                                              There are Path and PathBuf types in the standard library for this and it’s not clear why they’ve chosen something more complicated and error prone.

                                                                              1. 5

                                                                                People don’t write code like this. It will scare away potential contributors. If I were looking for a Rust job and I knew that in my potential employer’s codebase &str is not necessarily UTF-8, I would probably reject that job.

                                                                                Also, you lose the benefits of Rust’s strong type system. Ideally, types should represent sets of allowed values as accurately as possible. If I were writing something like n2, I would use a different type for each kind of string: one for proper UTF-8, one for arbitrary bytestrings, one for WTF-8, etc. This would help me write correct code. See also https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/ .

                                                                                (Fun fact: when writing this comment, I thought that putting invalid UTF-8 into a str is undefined behavior, and thus the compiler is allowed to do anything. I dug into the Rust reference in an attempt to find proof. And… it turned out that this is not undefined behavior. :) So, yes, today I became slightly smarter. :) Thank you. :) But I still think that nobody should store arbitrary bytestrings in str/String. It is not immediate undefined behavior, but the standard library docs still warn ( https://doc.rust-lang.org/std/primitive.str.html#invariant ) that a lot of code, both in the standard library and outside of it, assumes that str is valid UTF-8.)

                                                                                1. 3

                                                                                  Also, such fundamental tools as n2 (sitting very deep in the dependency chain) should not be written in Rust. I love Rust. Today I write everything in Rust. But if I were writing something like n2, I would choose C. (And if I did choose Rust, then I would explicitly scare people away from using the tool for core Linux packages.)

                                                                                  The reason is simple: Rust’s compiler is big and complex. And it is written in Rust itself. This complicates building Rust programs for unusual architectures, even if the Rust compiler has already been ported to that architecture.

                                                                                  Here is a post that supports my point: https://drewdevault.com/2022/01/15/2022-01-15-The-RISC-V-experience.html . The author built Alpine Linux for his personal riscv computer, and he had to exclude Firefox because parts of it are written in Rust:

                                                                                  I was able to successfully set up almost all of my standard workstation loadout on it [RISC-V box], with some notable exceptions. Firefox is the most painful omission — bootstrapping Rust is an utter nightmare and no one has managed to do it for Alpine Linux riscv64 yet (despite many attempts and lots of hours wasted), so anything which depends on it does not work. librsvg is problematic for the same reason; I had to patch a number of things to be configured without it.

                                                                                  Then in footnote:

                                                                                  Incidentally, my new language [Hare] can be fully bootstrapped on this machine in 272 seconds, including building and running the test suite. For comparison, it takes about 10 seconds on my x86_64 workstation. Building LLVM on this machine, let alone Rust, takes upwards of 12+ hours. You can cross-compile it, but this is difficult and it still takes ages, and it’s so complicated and brittle that you’re going to waste a huge amount of time troubleshooting between every attempt

                                                                                  As far as I understand, the Rust compiler had already been ported to riscv at that point, but building rustc (and everything dependent on it, such as Firefox) still turned out to be too difficult for Drew.

                                                                                  If (let’s imagine this) n2 becomes a widespread tool and many core projects (sitting deep in the dependency chain) start to use it, then building Alpine with Firefox for riscv, etc., becomes even more difficult.

                                                                                  Here is another piece by Drew: https://drewdevault.com/2019/03/25/Rust-is-not-a-good-C-replacement.html . One notable fragment:

                                                                                  C is the most portable programming language. Rust actually has a pretty admirable selection of supported targets for a new language (thanks mostly to LLVM), but it pales in comparison to C, which runs on almost everything. A new CPU architecture or operating system can barely be considered to exist until it has a C compiler. And once it does, it unlocks access to a vast repository of software written in C. Many other programming languages, such as Ruby and Python, are implemented in C and you get those for free too.

                                                                                  Please, make no mistake: I love Rust! I write everything in it. And I disagree with many things in this article of Drew’s (“Rust is not a good C replacement”). I think that Rust is a good C replacement. But I still agree with Drew on one thing: Rust currently is not as portable as C. And writing core tools, such as build systems, in Rust is unacceptable, because it creates a lot of problems for system creators, system programmers, etc.

                                                                                  Look at this list: https://bootstrap.debian.net/essential.html . This is the list of transitive build dependencies of the Debian package “bash” (or “coreutils”, or “libc”; the list will be the same no matter where you start). If you port Debian to a new architecture, you eventually have to port every package on the list, otherwise the port cannot be considered “complete”. Every tool on that list transitively depends on every other! This means that the author of any tool on that list can easily perform a supply chain attack (remember https://en.wikipedia.org/wiki/XZ_Utils_backdoor ) on all Debian packages.

                                                                                  I think this list should be kept as small as possible, both to ease porting Debian to new architectures and to prevent supply chain attacks. Alas, the list grows beyond any control and becomes bigger, not smaller. (Other Linux distros, of course, have similar lists, though they don’t always actually generate them and publish them on a web site.)

                                                                                  Unfortunately, the list contains rust, too. This is very unfortunate. I strongly believe that rustc should be removed from that list. And yes, this is possible (if someone makes some heroic efforts). But if your n2 becomes widespread and people start to use it for core Linux/Debian packages, then removing rustc from that list will become harder.

                                                                                  So, again: I think it is totally OK to write n2 in Rust, as long as you add a warning: “Please, don’t use this build system for core Linux packages sitting deep in the dependency chain”.

                                                                                  (And of course, all of this is just advice; I don’t order you to do anything.)

                                                                                  And yes, I think ninja and n2 are both small and nice tools. Thanks for creating them!

                                                                                  1. 11

                                                                                    I’m not very convinced by this argument. The complicated parts of Rust are all in the frontend and don’t need any changes to port them. If you can port GCC and Clang to your niche architecture you can port Rust too.

                                                                                    The total complexity of the compiler is only important if you’re doing a bootstrap from assembly, which is a rare thing to do. Most source distros will start with a binary or a cross compiler for C; why not have a Rust one as well? There’s mrustc for the case where you truly want to do a native bootstrap anyway.

                                                                                    I wouldn’t necessarily go to Drew for unbiased takes on Rust.

                                                                                    1. 7

                                                                                      If the build system uses the compatible subset of n2 and ninja then you can substantially reduce bootstrapping worries by using the C implementation, samurai.

                                                                                2. 1

                                                                                  Re “Modular linting”, I did a deeper dive on the underlying ideas here: https://neugierig.org/software/blog/2022/01/rethinking-errors.html

                                                                                  For the reasons described there, it seems obviously correct to me to instead fold linting into the language tooling, much in the same way providing a standard library helps every other library agree on what a String is.

                                                                                  1. 2

                                                                                    That’s an interesting post. I think it’s missing a discussion of what to do when shipping source code to others. In that situation you lose control over which version of the language is used: it might be a different implementation with a different collection of warnings, or it might be a future or past version that you haven’t adapted your warning configuration for. That’s one of the big reasons for having a separate dev build or lint rule that’s much stricter than the normal build for end users.

                                                                                  2. 2

                                                                                    All timings run on my (midrange?) Ryzen 9 24 core 3900X desktop on Windows 11.

                                                                                    I wonder in what world a 24-core Ryzen 9 desktop is midrange. Not any company I ever worked at (even if it’s a 12-core, 24-thread one).

                                                                                    1. 1

                                                                                      I don’t know much about this, but skimming newegg I think I see desktops with such CPUs for around $2k.

                                                                                      That’s about the price of a MacBook Pro, which is what my last job was distributing to engineers.

                                                                                    2. 18

                                                                                      In an ideal world, Servo would become a competitive engine for browsers, but in the shorter term I think there’s a lot of value in it for non-browser uses. Electron uses embedded Chrome, but most Electron apps have 100% control over the HTML and CSS that they generate. This means that they can work around missing features in the browser engine, whereas a real browser has to handle whatever it gets back from an HTTP request to any third-party server.

                                                                                      There are lots of places where people use HTML+CSS+JavaScript because they also work on the web, but where you don’t need to support everything that the web does. Servo is a great fit there. And that path makes them able to ship something that’s useful now and add missing features gradually, rather than having to get to 100% immediately for it to be useful. And, at some point, enough will be working that it can function as a drop-in web view for the general case.

                                                                                      1. 3

                                                                                        I’m ignorant - what is the pitch for servo over chrome there? Won’t they both be about the same size, eventually anyway, if the goal for servo to be a complete engine?

                                                                                        1. 4

                                                                                          The original releases of Chrome were something like 10mb, and it was able to render most of the web. Today it is well over 100mb. Not sure what it’s doing that is >10x more than before.

                                                                                          1. 2

                                                                                            A complete engine could still potentially be stripped down for distribution in an embedded case where you know you don’t need certain things. I would hope they design it with these cases in mind. Generally, if you use Chrome you get something like a gigabyte blob you’re expected to distribute, with all kinds of web features supported that you’ll probably never use. Fine for embedding a browser you point at arbitrary websites; overkill for something where you know what you want.

                                                                                        2. 9

                                                                                          One aspect that bothered me was some people still treating this like a theoretical “maybe in the future” problem.

                                                                                          You’d think 30+ years of people pointing out these issues, getting told “that’s just theoretical”, then it becomes an issue, would have taught those involved? I mean SHA1 for commits is a famous example of this so it’s particularly ironic.

                                                                                          1. 8

                                                                                            This framing of the problem is really good: https://valerieaurora.org/hash.html

                                                                                            I think the lesson is that “in the future” is continuously approaching, and you need to design your system with evolution in mind.

                                                                                            1. 14

                                                                                              There are about three issues here, and it’s important to be sure what you are worried about.

                                                                                              Firstly, Kees is pointing out problems with Linux kernel development tooling that does not preserve a full git hash as a reference to a commit. If a tool truncates a hash then it’s easy to construct a collision. This isn’t a SHA1 problem: this is a truncation length problem. Kees’s collision is AIUI about 48 bits, which is a million times easier than the state of the art in collision attacks. (Or a trillion? Dunno how the birthday paradox applies, but it doesn’t matter much. Let’s assume the worst.)
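                                                                                              To make the truncation point concrete, here is a small illustrative sketch (it uses Rust’s non-cryptographic DefaultHasher purely for demonstration, and is in no way a git attack). By the birthday paradox, an n-bit digest is expected to collide after roughly 2^(n/2) random inputs, so truncating to 16 bits yields a collision within a few hundred attempts:

```rust
use std::collections::HashMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Truncate a 64-bit hash to 16 bits, mimicking a tool that keeps
// only a short prefix of a commit hash.
fn truncated_hash(input: u64) -> u16 {
    let mut h = DefaultHasher::new();
    input.hash(&mut h);
    h.finish() as u16 // keep only the low 16 bits
}

// Birthday search: remember every digest seen so far and stop at the
// first repeat. Expected to finish after about sqrt(2^16), i.e. a few
// hundred iterations, not 2^16.
fn find_collision() -> (u64, u64) {
    let mut seen: HashMap<u16, u64> = HashMap::new();
    for i in 0u64.. {
        let digest = truncated_hash(i);
        if let Some(&earlier) = seen.get(&digest) {
            return (earlier, i);
        }
        seen.insert(digest, i);
    }
    unreachable!()
}
```

                                                                                              The same square-root scaling is why a 48-bit truncation is astronomically cheaper to collide than a full-length SHA-1 with collision-detection defences.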

                                                                                              Secondly, Valerie was working on hash collisions in zfs 20+ years ago, before there was solid cryptographic understanding of how to make a hash function. Her chart of hash function lifetimes was based on the best knowledge of how cryptanalysis improved through the 1990s up to the big breaks in MD5 and SHA-1 around 2005. Val was very concerned that straightforward hash-based deduplication was unsafe. She was right, because Subversion chose exactly the design Val was worried about, and svn was fucked when SHA-1 was broken a decade later.

                                                                                              Thirdly, there has been progress. Cryptographers now have a much better understanding of how MD5 failed, how SHA-1 is weak, and why SHA-2 is OK. The current state of the art is that SHA-1 is very hard but not impossible to break, and there is a counter-cryptanalysis defence used by git that we hope can be re-used to turn any future novel very hard attack into a corresponding defence.

                                                                                              Kees is saying that hash collisions are easy, even if they require hundreds of trillions of attempts. But Kees’s observation says nothing about the security of SHA-1 or git (which, with the defences, require several orders of magnitude more attempts). And Valerie’s analysis 20 years ago is not a good guide to the reliability of cryptographic hash functions today.

                                                                                              Nowadays I would say, yes you need to design any protocol with evolution in mind, but cryptographic failure is not continuously approaching in the way Valerie illustrated. (Nowadays cryptographers are more worried about quantum attacks on asymmetric cryptography; much less worried about symmetric primitives.)

                                                                                            2. 3

                                                                                              You’d think 30+ years of people pointing out these issues, getting told “that’s just theoretical”, then it becomes an issue, would have taught those involved?

                                                                                              After probably 3000+ years of people not taking possible future problems seriously, I wouldn’t expect people to do otherwise. I guess you’re not as cynical as I am.

                                                                                            3. 8

                                                                                              In min-ver, if you get foo=1.2.3 in your build graph, that means that someone somewhere made an explicit decision to write foo=1.2.3 in the manifest for their library.

                                                                                              What would such an explicit decision be typically based on, in practice?

                                                                                              Say, I’ve decided to use some functionality provided by foo so I need to add the dependency to the manifest. How do I pick a version to specify there? Do I go through the version history and try to find the minimum version of foo that provides the functionality I need but also has all the relevant bugs fixed? I don’t think so. I just take whatever is currently the latest stable version and use that. Which means even with MVS you can easily end up with a version that was released 5 minutes ago and had seen minimal testing.
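                                                                                              For context, the core of Go-style minimal version selection fits in a few lines: every dependent states the minimum version it needs, and the build picks, per package, the maximum of those stated minimums, never anything newer that nobody asked for. A sketch, with versions simplified to integers:

```rust
use std::collections::HashMap;

// Minimal version selection, simplified: `requirements` is the list of
// (package, minimum version) constraints gathered from the whole
// dependency graph; the result picks, for each package, the largest
// stated minimum.
fn select(requirements: &[(&str, u32)]) -> HashMap<String, u32> {
    let mut chosen: HashMap<String, u32> = HashMap::new();
    for (pkg, min_version) in requirements {
        let entry = chosen.entry(pkg.to_string()).or_insert(*min_version);
        if *min_version > *entry {
            *entry = *min_version;
        }
    }
    chosen
}
```

                                                                                              Note that even if the version written down is just “whatever was latest at the time”, MVS keeps later builds from silently moving past the versions that were actually written down; whether that guarantee is worth much is exactly the question above.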

                                                                                              1. 7

                                                                                                Even if you update all your deps to their current latest without caring about changes, you’re still going to test your own project a bit afterward. That’s a set of deps guaranteed to have received at least some testing from a developer, as opposed to the innumerable sets your users will encounter when building that release tomorrow, next month, or next year (if using max-versions resolution).

                                                                                                1. 3

                                                                                                  Exactly this. I had a Rust project that was broken by a point release in the core “libc” library. It trivially failed testing, and it was only impacting users who picked up a version later than the one I had tested.

                                                                                                  https://github.com/rust-lang/libc/issues/3677

                                                                                                  1. 1

                                                                                                    But you will only test the subset of foo’s functionality needed by your project, not all the functionality used by all other dependents of foo in your dependency graph (or in your project’s dependents’ dependency graphs). With MVS you choose a version for your own project, but it may end up forcing that selection in every dependency graph involving your project.

                                                                                                    In fact, foo’s own tests, which were presumably run before the release, are likely to offer much better coverage, no?

                                                                                                    1. 1

                                                                                                      Maybe you can look at it as an “integration tests vs unit tests” question. Some bugs will only show up during integration, for example because of threading patterns. The bigger your dep graph, the more careful you need to be. If a new dep version fixes a bug that doesn’t affect you, upgrading is a small but pointless risk. But we also expect new versions to be better overall, so we can’t be too conservative.

                                                                                                      No approach is intrinsically better than the other. We humans are biased toward risks we can control (min-ver) but also toward exciting new stuff (max-ver). Thankfully those are just ecosystem defaults, and if it makes sense for your project you can advise builders to use another strategy.

                                                                                                2. 31

                                                                                                  Strong disagree by someone who lived in a world before lockfiles.

                                                                                                  Are there any implementations that would meet the author’s suggested criteria? The author doesn’t list any.

                                                                                                  1. 6

                                                                                                    The (implicit?) proposal is not a regression to the world before lockfiles (which I also remember with zero fondness), but a world after lockfiles. I really empathise with this part:

                                                                                                    Lock files are a crutch for non-deterministic processes; they make downstream steps reproducible, at the cost of an extra dependency (the contents of the lock file). At best they are unverifiable, unreproducible build artifacts with no corresponding source; at worst they are plausibly-deniable attack vectors. In this sense, they embody all the same anti-patterns, foot-guns and time-bombs as other toxic practices like e.g. Docker images.

                                                                                                    Just yesterday I was trying to manoeuvre a combination of a Gemfile.lock and gemset.nix (via bundix) for my partner’s Ruby project into alignment. I already had the exact same setup done on a project of mine — there’s an annoying bug wherein bundix doesn’t understand Bundler platforms very well, and on my project I had something going well which forced Ruby/non-native gems — but I simply couldn’t recall or find my way to the exact sequence of commands necessary to produce the correct lockfile. In the end I had to just hack up her Gemfile.lock with a text editor, but her resolved versions differed from mine and so there was even more grief.

                                                                                                    And Gemfile.lock is one of the easier kinds of lockfile to deal with as a human!

                                                                                                    1. 2

                                                                                                      Working with Ruby-related software on Nix is a pain, but a counterexample could be the Rust-Nix ecosystem mix: here you simply specify your Cargo.lock file and that is sufficient for Nix to build the package deterministically. The cargo2nix tool works fairly reliably. Both will build projects just fine.

                                                                                                      The author of the post is proposing to do away with lock files by storing a content hash of dependencies… which is just a lock file with a new name. The only difference from what e.g. Rust currently does is that they want Cargo.lock with Minimal Version Selection from Go.

                                                                                                      1. 1

                                                                                                        The author of the post is proposing to do away with lock files by storing a content hash of dependencies… which is just a lock file with a new name.

                                                                                                        Lock files are unnecessary, since all that information is already stored. For example, Cargo doesn’t need a lock file since all that information is already available in the crates.io index; we just need to say what revision we’re using (current HEAD is https://github.com/rust-lang/crates.io-index/commit/10e6ccc602255b181d5187b9926dc0abeaebfd9e )

                                                                                                        The only difference from what e.g. Rust currently does is that they want Cargo.lock with Minimal Version Selection from Go.

                                                                                                        Nope, no need for Cargo.lock, and version selection is mostly orthogonal (so long as it gives the same output for the same input). I’ve never used Go, so can’t really comment on how well its version selection works.

                                                                                                        1. 6

                                                                                                          I think your article would have been easier to understand if you had been more explicit about how the package name, version -> hash lookup occurs and how it is made deterministic and reproducible. This one example gets your point across far more effectively.

                                                                                                          But it still isn’t clear to me why you think it’s better to lock the package index rather than the individual dependencies. What if I want to get the dependencies from somewhere other than a centralized repository?

                                                                                                          1. 3

                                                                                                            What if I want to get the dependencies from somewhere other than a centralized repository?

                                                                                                            I’m not tied to any particular implementation; indeed, I’m not even advocating the use of centralised repositories, that’s just the default behaviour of many existing package managers.

                                                                                                            My guiding principle is that it’s better for repos to contain “source code”, which is roughly “instructions written by a person”. Lock files are not source code, since they’re auto-generated, not the sort of thing a person would write, and (crucially) were not chosen or decided-on by a human. Their contents were calculated automatically, so why keep them in git? The only reason to do so is that the calculation was not reproducible; if we fix that, then we don’t need them.

                                                                                                            On the other hand, something like , or <hackage.haskell.org at 2024-01-01T00:00:00Z>, or <revision ABC of git://foo> are the sorts of thing a programmer might decide to write down.

                                                                                                            Hence, the answer to your question would be: if you choose to use some specific thing, then write that down that choice, in a way that’s precise-enough to be reproduced elsewhere. That would probably look like a lock file, but the crucial difference is that it’s “source code”, in the sense that you wrote it to capture a specific decision you made.

                                                                                                          2. 3

                                                                                                            Storing the revision of the cargo registry is just a lock file, though it would solve the version selection issue at the cost of being unable to deterministically upgrade a dependency version without possibly dragging along other versions.

                                                                                                            1. 1

                                                                                                              being unable to deterministically upgrade a dependency version without possibly dragging along other versions.

                                                                                                              There’s no reason a package manager has to stick with one registry, or one version of one registry. It’s also useful to allow manually-specified versions/hashes (which look like a lock file, but are hand-written instructions; not an auto-generated cache of everything). Overrides, jail-breaking, etc. can be useful too. There’s a large design space for that; but none of that needs lock files.

                                                                                                              1. 3

                                                                                                                Having to maintain that, especially if I use tools to automatically create PRs for security updates, starts sounding like the file will end up being automatically generated anyway because few people will bother to go into crates.io and copy-paste the latest revision from there.

                                                                                                                Cargo would support the entire workflow if you really wanted, using a local repository mirror. It’d just be really painful for near-zero gain over having an autogenerated lock file.

                                                                                                                The points mentioned are great and would be achievable with lock files, I see no reason to abandon them.

                                                                                                            2. 2

                                                                                                              For example, Cargo doesn’t need a lock file since all that information is already available in the crates.io index; we just need to say what revision we’re using (current HEAD is https://github.com/rust-lang/crates.io-index/commit/10e6ccc602255b181d5187b9926dc0abeaebfd9e )

                                                                                                              Not quite, no. cargo update --precise exists.

                                                                                                              1. 1

                                                                                                                I’m not super familiar with Cargo, but from reading the manual it looks like --precise is just a way to explicitly specify a subset of choices? Writing down those choices somewhere in our git repo makes perfect sense (either by adding cargo update --precise to our build scripts, or doing something fancier).

                                                                                                                Running that command manually then adding its result to git doesn’t make sense to me; the same way that it doesn’t make sense to put .o files in git, when we could instead write down the GCC command that creates them.

                                                                                                                1. 2

                                                                                                                  I think it makes sense to me ¯\_(ツ)_/¯

                                                                                                        2. 5

Golang uses minimal version selection, which I have mixed feelings about in general but does meet the criteria as far as I know.

                                                                                                          But simply checking in the lock file also adds determinism.
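The core of minimal version selection is small enough to sketch: every requirement is a minimum, and the build selects, for each module, the highest minimum requested anywhere in the graph. A toy model (module names and versions are invented for illustration):

```rust
use std::collections::HashMap;

// Toy model of minimal version selection (MVS): each requirement names a
// minimum version, and the build picks, per module, the highest minimum
// requested anywhere in the dependency graph. With no version ranges, the
// outcome is fully determined by the requirements themselves.
fn select_versions(requirements: &[(&str, u32)]) -> HashMap<String, u32> {
    let mut selected: HashMap<String, u32> = HashMap::new();
    for &(module, min_version) in requirements {
        let entry = selected.entry(module.to_string()).or_insert(0);
        if min_version > *entry {
            *entry = min_version;
        }
    }
    selected
}

fn main() {
    // Hypothetical requirements gathered from the whole module graph.
    let reqs = [("a", 1), ("b", 2), ("a", 3), ("b", 1)];
    let selected = select_versions(&reqs);
    assert_eq!(selected["a"], 3); // highest minimum wins
    assert_eq!(selected["b"], 2);
    println!("{:?}", selected);
}
```

Because the maximum of minimums never depends on what happens to be published at resolution time, the result is deterministic without a lock file; what it does not do is verify that a given version's bytes haven't changed, which is why Go keeps go.sum alongside it.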

                                                                                                          1. 14

                                                                                                            I think even with minimal version selection it isn’t resilient to an upstream silently replacing the code in a version in place (without a new version number). Go has its .sum file with hashes for (I assume) this reason.

                                                                                                            1. 1

                                                                                                              Right, so you still need that. But I think that is different in kind since it is purely derived, unlike Cargo.lock which is both derived and contains information not found anywhere else.

                                                                                                          2. 2

                                                                                                            Are there any implementations that would meet the author’s suggested criteria? The author doesn’t list any.

                                                                                                            Maven is very close to this: in the absence of version ranges it will do the right thing and be 100% reproducible without any need for lock files. Version ranges fuck this up but luckily they are very rare in the Maven world. Leiningen uses Maven dependency resolution but it watches for ranges and defaults to aborting when they are detected.

                                                                                                            Strong disagree by someone who lived in a world before lockfiles

You’re talking about pre-bundler Ruby, right? The problem with pre-bundler Ruby was not that it didn’t have lockfiles, the problem was that basically every gem in existence used version ranges, which are inherently nondeterministic. In that situation you can’t solve the root problem (bad dependency specifications in a large library ecosystem) so the best you can do is work around it by layering lockfiles on top of it.

                                                                                                            But using lockfiles in a brand new ecosystem is an unforced error. At that stage you can simply omit support for version ranges and get the determinism you want the right way.

                                                                                                            1. 3

                                                                                                              Maven is very close to this: in the absence of version ranges it will do the right thing and be 100% reproducible without any need for lock files.

                                                                                                              I consider version numbers to be documentation. They shouldn’t be relied upon for anything security-critical, like which files to fetch and run off the Internet. Besides which, I like semver and find dependency solvers very useful.

                                                                                                              Maven itself doesn’t actually have much security: everything is addressed using a name and version, but those are both arbitrary strings that have no relation to the package contents. Maven repos keep content hashes alongside artifacts (e.g. .jar and .pom files), so anyone compromising the artifacts can also compromise the hashes.

The good thing about Maven is that it’s easy enough to run in an offline sandbox, using a local directory for package metadata. That, in turn, makes it easily reproducible. Unfortunately, I’ve not found an authoritative, reproducible source of Maven metadata (akin to all-cabal-hashes, or crates.io-index). Hence my Maven uses tend to generate something like a lockfile (but cached in /nix/store, not version-controlled, and with only its fixed-output hash kept in git) which relies on the repository-provided hashes via trust-on-first-use. Still, that setup was able to spot some repository corruption e.g. https://discuss.gradle.org/t/plugins-gradle-org-serving-incorrect-pom/44161

                                                                                                              1. 2

                                                                                                                Yes, I should have said “in the absence of version ranges and given trustworthy repositories” because it doesn’t offer much protection against a compromised repository. Definitely the Correct Solution is a content-addressed system, but Maven comes the closest of the “first generation” package systems to doing the right thing.

                                                                                                              2. 1

                                                                                                                But using lockfiles in a brand new ecosystem is an unforced error. At that stage you can simply omit support for version ranges and get the determinism you want the right way.

Golang, I believe, does this by having you specify a version. When you pull, your dependency list has the version, and then you have a go.sum file that has the checksums of every single version you’ve downloaded.

                                                                                                                So I’m curious what you would improve about the Golang situation to not have the two files that essentially are lock files.

                                                                                                                1. 2

                                                                                                                  I’ve never used golang so I could be wrong here, but my understanding is that the checksums it stores do not affect the dependency resolution algorithm; they are used after dependency resolution has occurred as a way to detect mismatches that could be caused by attackers or mistakes.

Superficially they look similar to lockfiles, but if your main dependency declaration always resolves by giving you exactly what you asked for, then they don’t actually work the same way at all.

                                                                                                                  I don’t believe there’s any downside to having a separate safety pass layered on top of a dependency resolution algorithm that is already built on a deterministic foundation.
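The shape of that safety pass can be sketched: resolution has already fixed exact versions, and verification only compares recorded checksums against the fetched bytes, so it cannot influence which versions get selected. A toy sketch in Rust (DefaultHasher stands in for the SHA-256 that go.sum actually uses; module names are invented):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Stand-in for a cryptographic digest; real tools (e.g. go.sum) use SHA-256.
fn digest(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    bytes.hash(&mut h);
    h.finish()
}

// Post-resolution safety pass: it only checks that fetched bytes match the
// recorded checksum for that exact module version; it never picks versions.
fn verify(
    recorded: &HashMap<&str, u64>,
    fetched: &HashMap<&str, Vec<u8>>,
) -> Result<(), String> {
    for (module, bytes) in fetched {
        match recorded.get(module) {
            Some(&sum) if sum == digest(bytes) => {}
            Some(_) => return Err(format!("checksum mismatch for {module}")),
            None => return Err(format!("no recorded checksum for {module}")),
        }
    }
    Ok(())
}

fn main() {
    let mut recorded = HashMap::new();
    recorded.insert("example-module v1.2.3", digest(b"original contents"));

    let mut fetched = HashMap::new();
    fetched.insert("example-module v1.2.3", b"original contents".to_vec());
    assert!(verify(&recorded, &fetched).is_ok());

    // An upstream silently replacing the same version in place is caught.
    fetched.insert("example-module v1.2.3", b"tampered contents".to_vec());
    assert!(verify(&recorded, &fetched).is_err());
}
```

Since the recorded hashes are purely derived from content you already fetched once, regenerating the file never changes resolution results, which is the distinction being drawn here with Cargo.lock.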

                                                                                                              3. 2

                                                                                                                Are there any implementations that would meet the author’s suggested criteria? The author doesn’t list any.

                                                                                                                I tend to use Nix for everything, and this post was originally written as part of http://www.chriswarbo.net/projects/nixos/nix_dependencies.html but I made it separate since it’s largely orthogonal. That link gives some concrete advice for those writing package managers, gives a worked example of getting Haskell’s Cabal package manager to resolve dependencies from a local copy of Hackage metadata inside a Nix derivation, and the problems I encountered along the way (e.g. the use of bespoke binary formats, the semantics of local repos not offering all the same features of remote repos, etc.).

                                                                                                                1. 1

                                                                                                                  Haskell Stack will suggest extra-deps with commitment hashes. You can leave out the hash and you might get broken when a revision is published on Hackage. So it’s not perfect. But it seems like it should be realistic to verify whether a stack.yaml is sufficiently pinned down.

You could also consider Nix package expressions a conforming implementation, since I believe you always attach a hash commitment every time you put a URL in a Nix expression.

                                                                                                                  1. 1

                                                                                                                    Nixpkgs’s Haskell infrastructure does a decent job, e.g. it pins a default Hackage state for lookups, and that state can be overridden if needed.

                                                                                                                    One downside with the current way Cabal and Hackage work (I’m not overly familiar with Stack) is that it’s hard to make them resolve using a local folder of Hackage state, e.g. see www.chriswarbo.net/projects/nixos/nix_dependencies.html#solving-dependencies-in-nix-haskell-example

                                                                                                                2. 3

                                                                                                                  The Rust snippet for the non-zero iterator doesn’t work.

                                                                                                                  https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=570cd3b67a9c3c8452059999113d4309

                                                                                                                  The other snippets looked sketchy to me too (I think “struct NonZero” in C++ doesn’t work either) but I didn’t look at them too hard.

                                                                                                                  1. 1

                                                                                                                    Doesn’t look like the second C++ snippet works (haven’t checked the first). Here’s a godbolt for a working version.

                                                                                                                    1. 1

                                                                                                                      it’s pretty simple to make it work though

                                                                                                                      fn main() {
                                                                                                                          let mut ints: &[i32] = &[0, 1];
                                                                                                                          let it = std::iter::from_fn(move || {
                                                                                                                              while ints.get(0) == Some(&0) {
                                                                                                                                  ints = &ints[1..];
                                                                                                                              }
                                                                                                                              if let Some((item, next)) = ints.split_first() {
                                                                                                                                  ints = next;
                                                                                                                                  Some(item)
                                                                                                                              } else {
                                                                                                                                  None
                                                                                                                              }
                                                                                                                          });
                                                                                                                          for e in it {
                                                                                                                              println!("{}", e);
                                                                                                                          }
                                                                                                                      }
                                                                                                                      
                                                                                                                      1. 1

                                                                                                                        I golfed it a bit:

                                                                                                                                loop {
                                                                                                                                    let (&hd, rest) = ints.split_first()?;
                                                                                                                                    ints = rest;
                                                                                                                                    if hd != 0 { return Some(hd); }
                                                                                                                                }
                                                                                                                        

                                                                                                                        Looks like there’s an upcoming take_first API that would make it even shorter.
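On stable Rust you can get the same shape today with a hand-rolled take_first-style helper (a sketch; the real slice API is generic over the element type, this version is monomorphic for brevity):

```rust
// Stand-in for the then-unstable `take_first` slice API: pop the first
// element off a slice reference, advancing the slice in place.
fn take_first<'a>(ints: &mut &'a [i32]) -> Option<&'a i32> {
    let (first, rest) = ints.split_first()?;
    *ints = rest;
    Some(first)
}

fn main() {
    let mut ints: &[i32] = &[0, 0, 1, 0, 2];
    // Same non-zero iterator as the golfed version above, one call shorter
    // per step since the skip-and-advance is folded into the helper.
    let it = std::iter::from_fn(move || loop {
        let &hd = take_first(&mut ints)?;
        if hd != 0 {
            return Some(hd);
        }
    });
    assert_eq!(it.collect::<Vec<_>>(), vec![1, 2]);
}
```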

                                                                                                                    2. 2

                                                                                                                      FWIW, it is legal to initialize the lookup table at compile time in the same way as the lazy-static codepath in a ‘const’ block: https://godbolt.org/z/n8Tb5qGnK

                                                                                                                      …ah, I see in the original PR that motivated the post they also eventually made this change too.

                                                                                                                      1. 14

                                                                                                                        SQLite comes with its own issues: concurrency and multi-tenancy. Since I/O is synchronous, thus blocking, this makes applications running on the machine compete for resources. This also increases latency.

                                                                                                                        This makes little sense to me. It’s as though they’re ignoring threads. Just run each task on a separate thread and they no longer “compete”.

                                                                                                                        The benefits become noticeable only at p999 onwards; for p90 and p99, the performance is almost the same as SQLite.

                                                                                                                        If I read this correctly, 99.9% of the time there’s no speedup, but for some reason 0.1% of queries are much slower in regular SQLite. That’s interesting, and I’d love to know why — something to do with thread scheduling? — but to me, avoiding a one-in-a-thousand slowdown isn’t a big win that makes it worth completely rewriting the DB engine.

                                                                                                                        1. 14

                                                                                                                          I had the same question and looked at the paper.

                                                                                                                          Their test setup for SQLite:

                                                                                                                          To simulate a multi-tenant serverless runtime, the microbenchmark creates a thread per tenant to represent a resident and executable function in the runtime. In each thread, we open a connection to a separate 1 MiB SQLite database file, and we execute the SQL query.

                                                                                                                          So they were using threads, and the threads weren’t even contending over the same files.

                                                                                                                          [With the above setup] we observe jumps in the latency percentiles when the number of threads is close to a multiple of the number of cores, which highlights that synchronous I/O limits multi-tenancy.

                                                                                                                          It is disappointing that they don’t have a good theory for why this happens. (Also it says “We plan to investigate the reason for the fall in latency [with SQLite]” that happened from 70->80 workers on a 12 core machine, which is also pretty strange.)

                                                                                                                          I looked at their benchmark setup and noticed that they include the database “open” time in the measurement. It would be pretty disappointing if the difference was explained by something dumb like SQLite using a different system API for file-level locking.

                                                                                                                          I think to truly measure what they’re trying to measure they ought to use the same code (maybe their new Rust one) with both sync and async system calls.

                                                                                                                          1. 17

                                                                                                                            SQLite using a different system API for file-level locking.

                                                                                                                            It’s worse than just using different syscalls. Opening a SQLite DB with the default Unix VFS takes time linear in the number of DB files open because the VFS doesn’t use OFD file locks and thus mustn’t close the same file (inode) while a lock is held. The files are tracked in a synchronised doubly linked list https://github.com/sqlite/sqlite/blob/master/src/os_unix.c#L1183-L1207, and each open performs a linear search to find collisions (https://github.com/sqlite/sqlite/blob/master/src/os_unix.c#L6091-L6094).

                                                                                                                            None of that is needed on recent Linux or OS X (or BSD, but untested theory), since they support OFD locks (https://lwn.net/Articles/586904/): https://github.com/pkhuong/verneuil/blob/main/c/linuxvfs.c#L1590-L1634, for example.
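The cost being described can be modeled in a few lines (invented names, not SQLite’s actual code): a global list of inode entries is scanned linearly on every open, so opening N databases does O(N²) comparisons in total.

```rust
// Toy model of the os_unix.c behavior: because closing any fd would drop
// the process's POSIX locks on that inode, the VFS tracks every open inode
// in a global list and linearly scans it on each open to find collisions.
struct InodeEntry {
    dev: u64,
    ino: u64,
    refs: usize,
}

struct VfsModel {
    open_inodes: Vec<InodeEntry>, // stands in for the doubly linked list
    scans: usize,                 // comparison counter, to expose the O(n) walk
}

impl VfsModel {
    fn open(&mut self, dev: u64, ino: u64) {
        // Linear search over every already-open inode.
        for i in 0..self.open_inodes.len() {
            self.scans += 1;
            let e = &mut self.open_inodes[i];
            if e.dev == dev && e.ino == ino {
                e.refs += 1;
                return;
            }
        }
        self.open_inodes.push(InodeEntry { dev, ino, refs: 1 });
    }
}

fn main() {
    let mut vfs = VfsModel { open_inodes: Vec::new(), scans: 0 };
    // Opening n distinct databases performs 0 + 1 + ... + (n-1) comparisons.
    let n: u64 = 100;
    for ino in 0..n {
        vfs.open(1, ino);
    }
    assert_eq!(vfs.scans, (n as usize) * (n as usize - 1) / 2);
}
```

With OFD locks, lock ownership follows the open file description rather than the process, so no such global bookkeeping is needed and opens are O(1).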

                                                                                                                            But, looking at https://github.com/tursodatabase/limbo/blob/9aaf0869aed54e6d90ffc377ce2c8a1660b6f7a4/perf/latency/rusqlite/src/main.rs#L23-L34, I think they only measure queries? It’s a bit weird to report p999 with 1000 iterations per tenant though?

                                                                                                                            1. 2

                                                                                                                              Thinking about this more: even with the benchmark only measuring queries, it will still need to grab a lock per each iteration’s query execution, right? So my worry above about a difference in locking mechanisms is still plausible?

                                                                                                                              1. 1

                                                                                                                                I think they only measure queries

                                                                                                                                You are right, I misread the code! I would update my comment if I could, sorry!

                                                                                                                            2. 4

                                                                                                                              If I read this correctly, 99.9% of the time there’s no speedup, but for some reason 0.1% of queries are much slower in regular SQLite.

Issues to do with scheduling concurrency, table/row locks, the write-ahead log, reader/writer priority starvation, and other matters of this nature, i.e. issues inherent to the design and architecture of the DBMS at large.

                                                                                                                              For example, look at my (old but still relevant) benchmark of SQLite to get a baseline performance reading of how it holds up executing the exact same query but with m readers and n writers, playing with knobs that SQLite itself already exposes for adjusting how these simultaneous events affect one another, then extrapolate to new algorithms or mechanics that would become possible with a different backend altogether: https://github.com/mqudsi/sqlite-readers-writers

                                                                                                                              I report p95, p99, and p99.9 – perhaps it would be of interest to also include p90, though I personally view a metric that discounts a full 10% of your workload to be virtually useless.
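For reference, the tail percentiles in question can be computed from a latency sample with the nearest-rank method; a sketch (real harnesses may interpolate or use HDR histograms instead, and the percentile is given in permille here to avoid floating-point edge cases):

```rust
// Nearest-rank percentile over a sorted sample, with the percentile in
// permille (p95 = 950, p99.9 = 999): rank = ceil(permille/1000 * n),
// computed with integer arithmetic.
fn percentile_permille(sorted: &[u64], permille: usize) -> u64 {
    assert!(!sorted.is_empty() && permille <= 1000);
    let rank = (permille * sorted.len() + 999) / 1000;
    sorted[rank.saturating_sub(1)]
}

fn main() {
    // Hypothetical latencies of 1..=1000 microseconds, already sorted.
    let latencies: Vec<u64> = (1..=1000).collect();
    assert_eq!(percentile_permille(&latencies, 950), 950); // p95
    assert_eq!(percentile_permille(&latencies, 990), 990); // p99
    assert_eq!(percentile_permille(&latencies, 999), 999); // p99.9
}
```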

                                                                                                                              1. 2

                                                                                                                                avoiding a one-in-a-thousand slowdown isn’t a big win

If we want to be nitpicky, this is really relative. The environment and the scale all these things are measured at and prepared for indicate something way beyond regular SQLite scenarios. If you have significant scale, you’re making a thousand people pay that latency cost for every million requests. For most use cases it’s probably not even that bad of a cost; they wait a second more for their sort to run on a table of football scores.

But to Amazon, this would probably be a big, big deal: a thousand things not sold.