Threads for indygreg

  1. 2

    Congratulations on this achievement. Is there already something similar for code-signing Windows executables?

    1. 3

      I’m not sure: I haven’t seriously looked into the problem space on Windows. But I do believe the technical implementation on Windows is vastly simpler than on macOS. I’ve long wanted to implement this but I just haven’t had time to look into it.

      1. 3

        I’d say “vastly simpler” is fair, though I might be tempted to add that the signatures are less useful as well. JSign is where I’d start if I needed this today.

      2. 3

        Assuming you just mean Authenticode, it is close enough to the PKCS#7 CMS that it’s not hard to whip up something to do that. I did more than once a little over a decade back for internal build systems, but never won the arguments I needed to win to get them released.

        JSign is what I’d use if I needed that today.

      1. 4

        I wrote software to perform a similar analysis a few months ago.

        Great post. Loved the analysis of the microarchitecture levels.

        1. 1

          Linux Package Analyzer looks fantastic, I might try to run it myself at some point but the statistics you provide in the blog post are already super interesting on their own.

          I’m surprised you don’t seem to have submitted it here before.

        1. 4

          Fully statically linked Linux binaries are possible. PyOxidizer supports it. But you can’t load compiled extension modules (.so files) from static binaries. Since I was initially going for a drop-in replacement for python, I didn’t want to sacrifice ecosystem compatibility.

          The glibc based builds I believe conform to Python’s manylinux2014 platform level and are about as portable as you can get before going fully static.

          1. 1

            You could perhaps provide two kinds of builds, at least on Linux: one dynamic, like the current one, and one static for those who need maximum portability but are OK with the limitation of running only pure Python code.

            All in all, great work!

          1. 4

            I built something similar to this a few weeks ago. It has been a trove of insight into how distro packages and ELF binaries work in the wild.


            1. 6

              I recently delved into the world of Apple code signing and notarisation in order to automate the process in our CI build. My understanding is that you need to notarise on macOS Catalina or newer for binaries to run. Does this tool without notarisation solve a problem for you or are you still needing to do a notarisation step afterwards?

              1. 8

                You still need to notarize certain apps, yes. This tool allows you to do everything up to notarization without macOS, though. So it’s closer to not requiring macOS, but not quite there.

                1. 2

                  Cool, a great step nonetheless.

              1. 31

                If the author is here and open to changing the post title, Rust is for everyone, and good for professionals too! The title as-is sounds like it may be exclusionary toward people who don’t identify as professionals, or feel intimidated by Rust, which is contrary to Rust’s core goal of opening up systems programming to everyone!

                1. 39

                  I, the author, am here.

                  I appreciate the constructive feedback and can see your point. Choosing titles is hard. I too was concerned about the exclusionary potential for the title. However, I thought I had it sufficiently mitigated by the introduction paragraphs. Apparently I could have done better.

                  I’m not going to change the title of this post because I don’t want to deal with the hassle of rewriting URLs. However, I will try to consider your feedback for future posts I author.

                  1. 6

                    I appreciated the way you took the feedback. Just wanted to say thanks for listening and considering it.

                    1. 3

                      I’ve programmed in the following languages: C, C++ (only until C++11), C#, Erlang, Go, JavaScript, Java, Lua, Perl, PHP, Python, Ruby, Rust, shell, SQL, and Verilog.

                      To me, Rust introduced a number of new concepts, like match for control flow, enums as algebraic types, the borrow checker, the Option and Result types/enums and more.

                      How did you use Erlang without pattern matching?

                      1. 2

                        I think the order is alphabetical, so it’s possible that the author used Erlang after he learned Rust.

                        Have a nice day!

                        1. 1

                          Doesn’t add up, he says that Rust introduced pattern matching to him :-)

                      2. 3

                        My usual way of framing it is “Rust is an industrial programming language”. That leaves the door open for practitioners of all kinds. It’s more like how cars are (in general) a utility and not a plaything or research endeavor, but are open to enthusiasts and amateur practitioners as well.

                        1. 2

                          Thanks! I liked the disclaimer, and think predicting how people will respond can be tough. Appreciate you putting in the time to write this, and agree that Rust can be used in professional / production environments successfully, and that it can feel like a breath of fresh air for many!

                        2. 22

                          Or, alternately, when someone writes 13,000 words about how nice Rust is, how ergonomic and human-centric and welcoming and humane the language and tooling and community are, they can keep the title without someone telling them they’re maybe being exclusionary.

                          1. 16

                            It’s constructive criticism. If you post something on the public web, it’s fair game to criticize it- especially if done in a polite manner.

                            1. 23

                              It’s criticism, but just because it’s delivered politely does not mean it is constructive. There is a cost to spending the effort to write a thirteen thousand word love letter to a language, just to have someone nipping at your ankles about how the first four words might exclude someone. Objectively speaking, this is a blogger that Rust would benefit from sticking around – but what is the signal they’ve received here?

                              1. 11

                                Agree 💯.

                                Recently a friend submitted his static site generator which is optimised for math heavy sites on HN.

                                He made the fatal mistake of calling it “beautiful” in the description, and people had a remarkable number of hang-ups over that. It was done in a mostly polite manner, but the amount of time and energy wasted in discussing just the first sentence with barely any technical discussion of the internals, was a form of pathological bikeshedding that I was surprised to see. It was just that “nipping at his ankles” as you’ve described, which can take its toll.

                                Vowed to never take that site seriously again.

                                1. 1

                                  It was done in a mostly polite manner, but the amount of time and energy wasted in discussing just the first sentence with barely any technical discussion of the internals, was a form of pathological bikeshedding that I was surprised to see. It was just that “nipping at his ankles” as you’ve described, which can take its toll.

                                  Agreed. Yet here we are- arguing about arguing. I need a life… xD

                                  1. 1

                                    Right, in my experience, HN often attracts uninteresting, unhelpful, or even mean-spirited commentary. Sometimes I am surprised by the opposite, though. In any case, understanding the dynamic really helps. A lot of commenters, it seems to me, are upset at some level in some way and take it out in their commentary.

                                  2. 3

                                    The author thanked them and described the feedback as constructive. So if they feel like they’ve taken something away from it, who are we to decide otherwise? My own hopefully constructive feedback here is that this subthread strikes me as projecting conflict into an interaction between two parties who themselves seem to have had a positive experience with it.

                                    1. 2

                                      Yeah, I get where you’re coming from and I don’t really disagree.

                                      But I also think that the person who offered the criticism is just trying to help the author get more views.

                                      It’s fair to suggest that we all should consider our criticisms carefully, though. Because you’re right- if the criticism isn’t that important, we may just be disincentivizing someone from future contributions.

                                      I also wonder if you’re maybe reading a little too hard into the choice of the word “exclude” used by the criticizer, though… I think that term has a little bit of extra charge and connotation in today’s social environment and I don’t believe the criticizer meant it in that way. I think they simply meant that a person who isn’t writing enterprise software might pass over the article because they believe they aren’t the target audience. I agree with that conclusion as well, because I do write enterprisey software and I thought the article was going to be about that specific point of view- I thought I would see arguments about productivity and developer costs and cross-platform support and other less-sexy stuff. But, it was really- as you said, just a love letter to everything about Rust.

                                  3. 8

                                    I felt this lead-in made it a friendly suggestion and not harsh:

                                    If the author is here and open to changing the post title

                                    1. 4

                                      FWIW the first third of those thousands of words is irrelevant chitchat. The first actual claim on why Rust is good is that Rust makes me giddy. I’m not sure “giddy” describes professionals. It’s not a no, but for me definitely not a yes either - giddy is just… giddy. Then all the technical points, borrow checkers, race conditions and whatnot - none of that tells me why it is a thing for professionals.

                                      What I mean is, I don’t know if Rust is for pros or not, I just think that the title is a bit click-baity, and “Rust is good for everyone, even pros” like alilleybrinker suggested would work much better.

                                      1. 7

                                        The first actual claim on why Rust is good is that Rust makes me giddy. I’m not sure “giddy” describes professionals.

                                        There’s a layer of extra irony here in that the word “amateur” comes from a root meaning “love”; in other words, amateurs are people who do something because they love it. So taken literally, this is very unprofessional!

                                    2. 18

                                      It seems relevant to note that the author does address this in the post itself:

                                      The statement Rust is for Professionals does not imply any logical variant thereof. e.g. I am not implying Rust is not for non-professionals. Rather, the subject/thesis merely defines the audience I want to speak to: people who spend a lot of time authoring, maintaining, and supporting software and are invested in its longer-term outcomes.

                                      1. 10

                                        Yes, except the title doesn’t have a disclaimer, and will be taken in the way they say they don’t want. Rather than leaving it ambiguous, they could change the title to remove the ambiguity.

                                        1. 28

                                          This side-steps the issue “people don’t read past the title” - which is the real problem. It’s bad, and unrealistic, to expect every person who writes on the internet to tailor each individual sub-part of an article (including the title) to an audience that is refusing to read the (whole) article itself before making judgements or assumptions. That’s purely the fault of the audience, not the writer, and changing the article title is merely addressing the symptoms, instead of the root problem, which will allow it to just become worse.

                                          We have an expression “judging a book by its cover” specifically for this scenario, and in addition to that, any reasonably-thoughtful reader is aware that, once a sufficient amount of context has been stripped away, any statement loses its meaning - which logically implies that full context must be absorbed before the reader has any chance of understanding the author’s intended meaning.

                                          People should not have to write hyper-defensively on the internet, or anywhere, because civil discussion collapses when your audience isn’t acting honestly.

                                          Reading the title and judging before reading the content itself is not acting honestly.

                                          1. 4

                                            Thank you for writing this. I’m sick of how much culture is warped by toxic social media sites.

                                          2. 7

                                            Agreed. It’s a poor choice for a title, even with a disclaimer. In fact, that he knew he needed a disclaimer should have been a big clue that it wasn’t a good title.

                                            A better title would be: “Professional devs should love Rust”, or “Rust isn’t just for enthusiasts and hipsters”

                                            1. 0

                                              I don’t disagree.

                                          3. 18

                                            Responses like this make me hesitant to try Rust, because they make me feel that I’d have to tiptoe around every word I put into a chat room, issue tracker, or mailing list.

                                            1. 10

                                              I’m not sure this is a Rust issue as much as a general issue of recent years: be delicate and excessively sensitive about everything because everyone’s an eggshell. It’s contrary to anti-fragility, which I would suggest is a far better approach, and the opposite one. “Rust is for professionals” would have had me targeting Rust so that I could level up, not scared me away. Show me a challenge.

                                            2. 5

                                              I used Rust for three years outside of work, but I’m afraid I stopped last year because I find it too big for a hobbyist. If I used it all day every day - no problem. It’s a great language, but I agree with the title.

                                              1. 2

                                                This was my first thought, as well, before even reading the article. Obviously I should read before judging, but the author may want to consider that the title could turn away potential readers.

                                                I made the same mistake in my own My staff engineering reading list. There was no reason for me to needlessly exclude people earlier in their careers from my list.

                                                And it’s good feedback that the author of this piece may not want to unintentionally exclude people from their article.

                                              1. 5

                                                Great post overall. I have a few comments on the section about compression.

                                                The comparison of the ratio between network bandwidth and CPU speed is too simplistic since it doesn’t account for IPC and multi-threading (as stated by the author, but their effects are dramatic), RAM available, and costs (which is probably where the difference matters most). This doesn’t invalidate the idea behind it, however.

                                                I have a couple of remarks about speeds, especially LZMA’s decompression speed, which the author calls extremely slow. While it is slower than zstd, it’s still not that slow. On a given benchmark [1] the ratio is around 5, which is less than an order of magnitude, and it’s only twice as slow as deflate. The difference is not enough to always rule out one in favor of the other. This wouldn’t be the case for bzip2, whose decompression is roughly as slow as its compression (only about twice as fast), while LZMA’s decompression is much faster than its compression (roughly ten times faster).
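
                                                The relative speeds above are easy to sanity-check with Python’s standard library. A rough sketch, not a rigorous benchmark: the exact ratios depend heavily on the corpus, compression level, and hardware, and zstd isn’t in the stdlib so only deflate (zlib), LZMA, and bzip2 are compared here:

```python
# Compare decompression throughput of the stdlib codecs on synthetic,
# highly-compressible data. Illustrative only: real corpora behave differently.
import bz2
import lzma
import time
import zlib

data = b"the quick brown fox jumps over the lazy dog " * 20000

for name, compress, decompress in [
    ("zlib", zlib.compress, zlib.decompress),
    ("lzma", lzma.compress, lzma.decompress),
    ("bz2", bz2.compress, bz2.decompress),
]:
    blob = compress(data)
    start = time.perf_counter()
    out = decompress(blob)
    elapsed = time.perf_counter() - start
    assert out == data  # round-trip sanity check
    print(f"{name}: ratio {len(data) / len(blob):.1f}x, "
          f"decompress {len(data) / elapsed / 1e6:.0f} MB/s")
```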

                                                By the way, there’s a “zlib-ng” project which advertises much faster compression and decompression (see its performance benchmarks).


                                                edit: and a couple words about ISAs: it’s possible to have run-time detection of CPU features; this has even been rolled into glibc.

                                                1. 3

                                                  Your call out on the cycles per byte is fair: I cherry picked the numbers and methodology to reinforce my point, taking care to qualify it because I knew it wasn’t that fair. If I ever write a dedicated follow-up post on zstd performance, my methodology will be much more rigorous :)

                                                  You are correct that lzma is often less than an order of magnitude slower. However, the relative slowness is often enough to make it the bottleneck. My general belief is that if you are CPU constrained by decompression, there are better options. And with LZMA’s decompression output speed peaking at ~100 MB/s, you often are CPU constrained by LZMA. Not always, but more often than zstd with its typical 1,000 MB/s decompression output. Depending on the line speed of the software you are feeding decompressed data to, this can make a big difference! Keep in mind that on single threaded pipelines, every cycle spent in the compression library is a cycle not spent in your application logic. My general belief is unless you are optimizing for low memory use or minimizing the number of compressed bytes, why take a chance. That’s why I think zstd is a better default for most cases.

                                                1. 27

                                                  Great post! The Python/Ruby/JS interpreter startup problem has been bugging me for more than a decade… To add some detail, there are a few separate problems there:

                                                  1. CPU for interpreter initialization. This can be dozens of milliseconds as you say, but I think it’s really the smallest cost.
                                                  2. I/O for imports. There is an algorithmic problem here: if you have M entries on PYTHONPATH and N imports, then the Python startup algorithm does O(M*N) * constant stat() calls.
                                                  • stat() itself can be surprisingly slow because it can be random access I/O on spinning media.
                                                  • there has historically been a mess of Python packaging tools, and many of them made PYTHONPATH very long.
                                                  • most Python programs have a lot of imports.
                                                  • The constant is even too large: Python does many stat() calls for stuff like foo.pyc and even foo.pyo. IMO .pyo should just be removed because the cost just to find them exceeds any benefit they ever had (or maybe they already have been, I haven’t kept up).
                                                  • Startup time is one of the only performance dimensions that is STILL worse in Python 3 than Python 2 (last I checked).
                                                  3. Ignoring the time to just find the file, “import somepythonlib” often does a surprising amount of computation, and this happens before you get to your own main(), even if you don’t use the module at all in that program run. For example if you import the stdlib it will compile regexes and namedtuples you don’t use.

                                                  If you import a lot of libraries in Python you can easily get to 300ms or 500 ms startup time. It feels like 50-100 ms is almost the minimum, which is kind of sad.

                                                  You can fork a process in a millisecond (or maybe 10x less if it’s statically linked), so a Python process is 100x-1000x too slow to start up, while it’s more like 10x-100x slower than C at runtime (usually closer to 10x).
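
                                                  That gap is easy to measure directly. A minimal sketch, assuming a python3 on PATH (numbers are machine- and build-dependent):

```python
# Compare bare interpreter startup with startup plus a few stdlib imports.
# Each subprocess pays the full initialization + import cost described above.
import subprocess
import time

def avg_startup(argv, runs=5):
    """Average wall-clock time to run a command to completion."""
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run(argv, check=True)
    return (time.perf_counter() - start) / runs

bare = avg_startup(["python3", "-c", "pass"])
with_imports = avg_startup(["python3", "-c", "import json, re, argparse"])
print(f"bare startup:  {bare * 1000:.1f} ms")
print(f"with imports:  {with_imports * 1000:.1f} ms")
```

CPython also ships `python3 -X importtime`, which prints a per-module breakdown of where import time goes.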

                                                  I am thinking of reviving Oil’s “coprocess protocol” (post from 2 years ago) to solve this problem. I think I found a good way to do it on Unix only – with descriptor passing. Basically this is a way to keep processes warm WITH the constraint that you should not have to change your command line tool very much to use it. Random tools in any language should be patched easily to be (single threaded) coprocesses.
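
                                                  The descriptor-passing mechanism mentioned above can be sketched in a few lines with SCM_RIGHTS over a Unix socket. The function names here are my own for illustration, not part of Oil’s protocol: the idea is that a warm coprocess receives the caller’s stdin/stdout/stderr descriptors per invocation instead of multiplexing streams.

```python
# Pass open file descriptors between processes over a Unix-domain socket
# using SCM_RIGHTS ancillary data.
import array
import socket

def send_fds(sock, fds, msg=b"run"):
    """Send file descriptors alongside a regular message."""
    ancdata = [(socket.SOL_SOCKET, socket.SCM_RIGHTS,
                array.array("i", fds).tobytes())]
    sock.sendmsg([msg], ancdata)

def recv_fds(sock, maxfds=3):
    """Receive up to maxfds descriptors; returns (message, fd_list)."""
    fds = array.array("i")
    msg, ancdata, _flags, _addr = sock.recvmsg(
        1024, socket.CMSG_LEN(maxfds * fds.itemsize))
    for cmsg_level, cmsg_type, cmsg_data in ancdata:
        if cmsg_level == socket.SOL_SOCKET and cmsg_type == socket.SCM_RIGHTS:
            # Truncate to a whole number of descriptors.
            fds.frombytes(cmsg_data[:len(cmsg_data)
                                    - (len(cmsg_data) % fds.itemsize)])
    return msg, list(fds)
```

The receiving process gets duplicates of the senders’ descriptors, so a coprocess can write straight to the caller’s terminal or pipe with no relaying.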

                                                  I had some discussions with people around multiplexing stdout/stderr over a single channel, and now I think that’s the wrong way to do it.

                                                  Great info about Zstandard too! I’d be interested in the follow-ups – there is a lot of great information here, and it easily could have been 5 posts.

                                                  1. 9

                                                    I’ve thought a lot about this problem space from the Python side. I (the author of the post) am also the maintainer of PyOxidizer, which has dabbled in improving the efficiency of Python imports and startup using alternate module loading techniques, and which has collected some ideas for improving various aspects of Python performance.

                                                    1. 5

                                                      Yes, I remember chatting about that last year or a few years ago! I was in the middle of patching CPython for my own purposes, and really saw how bad the issue is. Oil is currently delivered as a Python app bundle, but that’s going away in favor of native code (for performance reasons that are NOT startup time).

                                                      I think you would have to break Python semantics to get more than a mild speedup in Python. And the best time to do that was Python 3, which has long passed :-/

                                                      But I’m still interested in the startup time issue from the shell side now. Python is far from the worst offender… All the JITs, and Ruby, are considerably worse. Perl/Awk are better. It would be nice to solve them “all at once” with coprocesses.

                                                    2. 3

                                                      The worst import time I’ve come across:

                                                      pip3 install --user colour_demosaicing
                                                      env -i time python3 -c "from colour_demosaicing import demosaicing_CFA_Bayer_bilinear"

                                                      Takes ~2s (warm) on a modern laptop.

                                                    1. 6

                                                      I am in strong agreement with one of the theses of this post, which is that “application distribution” is the thing that matters to end-users. I’ve long thought myself that distribution packaging focuses too much on packaging the component parts instead of whole applications.

                                                      I believe I understand many of the [historical] reasons for why this is. But I can’t shake the feeling that many of the policies/decisions are rooted in the state of technology from 20+ years ago and if we were starting from first principles today we would devise a different set of policies. We’ve even seen this experiment play out with technologies like Docker, snap, and Flatpak (as the author notes in the post), so there is some precedent to evaluate and understand the viability of alternative distribution mechanisms/policies. The thing that hasn’t yet changed is the mindset of distribution packagers, which is still focused on the component parts instead of higher-level applications.

                                                      On one hand, I admire their commitment to the established distribution packaging policies because toeing that line is difficult. On the other, I believe that other technologies have clearly demonstrated end-user value in application distribution, and the world would be a better, more standardized place if distribution packaging updated its policies and allowed us to converge on fewer, more officially supported solutions. Of course, this is an insanely complex problem space with many 2nd and 3rd order effects to any policy change. So it is understandable why things have remained the way they have for so long.

                                                      1. 2

                                                        many of the policies/decisions are rooted in the state of technology from 20+ years ago

                                                        Maybe that’s just indicative of how little has changed. If you look at ecosystems like npm, gems, cargo, docker, Flatpak and snap, it feels like they regressed more than 20 years instead of bringing anything new to the table.

                                                        It feels like kids reinventing the very old ways, completely unaware how bad it was and that the current state is the result of a lot of effort into migrating way from it.

                                                      1. 23

                                                        Former Mozilla employee here.

                                                        This wiki page is ancient. There was a strong anti SQLite phase several years ago. I’m not sure where things are today, but by the time I left, people had warmed back up to SQLite a bit. Although many points in the wiki page are still accurate.

                                                        I think a large part of the anti-SQLite mindset in Firefox was driven by a combination of a lack of understanding of how to optimally use SQLite and the difficulty of actually achieving an optimal configuration across all consumers of it. In its default configuration, SQLite will fsync() after every transaction. And SQLite implicitly creates a transaction for every statement performing a mutation. This can suck performance like no other. You have to issue a bunch of PRAGMA statements when opening a database to get things to behave a bit better. This sacrifices durability. But many database writes aren’t critical and can be safely lost. e.g. do you care if your browser history’s count of the number of times you visited a page is off by 1?
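
                                                        For illustration, that kind of PRAGMA tuning looks roughly like this with Python’s sqlite3 module. The pragma values are illustrative, not Firefox’s actual configuration, and the schema is invented:

```python
# Trade durability for throughput: avoid an fsync() per implicit transaction
# and batch mutations into one explicit transaction.
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "places.db")
conn = sqlite3.connect(db_path)

# WAL batches syncs and lets readers proceed while a writer is active.
conn.execute("PRAGMA journal_mode = WAL")
# NORMAL skips the fsync() after every transaction (OFF goes further still).
conn.execute("PRAGMA synchronous = NORMAL")

conn.execute("CREATE TABLE visits (url TEXT, visit_count INTEGER DEFAULT 1)")
# One explicit transaction instead of 1000 implicit per-statement ones.
with conn:
    conn.executemany(
        "INSERT INTO visits (url) VALUES (?)",
        [(f"https://example.com/{i}",) for i in range(1000)],
    )
```

If the process crashes, the tail of unsynced writes may be lost; for visit counters that is usually an acceptable trade.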

                                                        The performance impact of continuous SQLite writes with high durability guarantees is demonstrated in my favorite commit, which I authored while at Mozilla. tl;dr: a test harness loading thousands of pages and triggering writes to the “places” database (which holds site visit information) was incurring ~50 GB of I/O during a single test run.

                                                        Compounding matters was Firefox’s code historically performed I/O on the main thread. So if there was a read()/write() performed on the main thread (potentially via SQLite APIs), the UI would freeze until it was serviced. Hopefully I don’t have to explain the problem with this model.

                                                        One of my contributions to Firefox was SQLite.jsm, a JavaScript module providing access to SQLite APIs with batteries included so consumers would get reasonable behavior by default. The introduction of this module and its subsequent adoption helped fix a lot of the performance issues attributed to SQLite.

                                                        IMO the biggest issue with Firefox’s data storage was there were literally dozens of components each rolling their own storage format and persistence model. This ranged from SQLite databases, JSON files, plain text files, bespoke file formats, etc. This represented 10+ years of accumulated technical debt. There have been various efforts to unify storage. But I’m unsure where they are these days.

                                                        1. 13

                                                          FWIW GitLab uses a gRPC based solution (Gitaly) for all Git repository interactions, including Git wire protocol traffic (the Git protocol data is treated as a raw stream). See the protocol definition in the Gitaly repository. This allows GitLab to abstract the storage of Git repositories behind a gRPC interface.

                                                          Fun fact: this is how Heptapod, a fork of GitLab that supports Mercurial, works: they’ve taught Mercurial to answer the gRPC queries that Gitaly defines. GitLab issues gRPC requests and they are answered by Mercurial instead of Git. Most functionality “just works” and doesn’t care that Mercurial, not Git, is providing data. Abstractions and interfaces can be very powerful…

                                                          1. 2

                                                            That’s interesting! I’ve looked at doing remote Git operations by creating an abstraction over the object store. The benefit is that the interface is much smaller than what you linked to. I guess the downside is higher latency for operations that need several round-trips. Do you know if that has been explored?
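
                                                            As a sketch of what that smaller interface could look like (all names here are invented for illustration, and Git’s real object store also covers trees, commits, and packfiles, not just blobs):

```python
# A minimal content-addressed object store in the style of Git's loose
# object storage: objects are keyed by the SHA-1 of "blob <len>\0" + data.
import abc
import hashlib
import zlib

class ObjectStore(abc.ABC):
    """Abstract interface a remote Git operation could be built against."""

    @abc.abstractmethod
    def get(self, oid: str) -> bytes:
        """Return the blob contents for an object id."""

    @abc.abstractmethod
    def put(self, data: bytes) -> str:
        """Store blob contents, returning their object id."""

class MemoryStore(ObjectStore):
    def __init__(self):
        self._objects = {}

    def get(self, oid):
        # Strip the "blob <len>\0" header before returning the contents.
        _header, _sep, body = zlib.decompress(self._objects[oid]).partition(b"\x00")
        return body

    def put(self, data):
        obj = b"blob %d\x00" % len(data) + data
        oid = hashlib.sha1(obj).hexdigest()
        self._objects[oid] = zlib.compress(obj)
        return oid
```

A network-backed implementation of the same two methods would be a very small protocol, at the cost of one round trip per object walked.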

                                                            1. 6

                                                              GitLab’s/Gitaly’s RPC protocol is massive. My understanding is they pretty much invent a specialized RPC method for every use case they have so they can avoid fragmented transactions, excessive round trips, etc. The RPC is completely internal to GitLab and doesn’t need to be backwards or forwards compatible over a long time horizon. So they can get away with high amounts of churn and experimentation. It’s a terrific solution for an internal RPC. That approach to protocol design won’t work for Git itself, however.

                                                          1. 10

                                                            Wow I really sympathize with all of this. I can see how poor a fit Python 3 is for Mercurial, having ported Oil from Python 2 to 3, and then back to 2 again, mainly for the reason of strings. (That was early in the project’s life and it didn’t have users, so it was easy.)

                                                            I agree with all this:

                                                            the approach of assuming the world is Unicode is flat out wrong and has significant implications for systems level applications

                                                            We effectively sludged through mud for several years only to wind up in a state that feels strictly worse than where we started

                                                            I think we talked about this before, and you mentioned PyOxidizer, which I again saw in the post.

                                                            But after reading all this exhausting effort, I’m still left thinking that it would have been less effort and you would have a better result if Mercurial had bundled a Python interpreter.

                                                            I feel like that’s 10x less work than I read in the post, and it would have taken 10x less time, and you would have the better model of UTF-8 strings.

                                                            It doesn’t matter if distros don’t package Python 2 – because it can be in the Mercurial tarball. (People keep invoking “security” but I think that’s a naive view of security. If someone asks I’ll dig up my previous comment on that. Also I don’t think the Python 2.7 codebase is that hard to maintain, and you can get rid of > 50% of it.)

                                                            I guess it doesn’t matter now, but honestly reading the post confirmed all the feelings I had and I personally would have abandoned such an effort years in advance. Despite my love for Python, and using it for decades, it’s not a stable enough abstraction for certain applications, including a shell and a version control system (e.g. ask me about my EINTR backports). To be fair, very few languages are suitable for writing a POSIX-compatible shell – e.g. I claim Go can’t and won’t be used for this task because of its threaded runtime.

                                                            In fact I used to be a Mercurial user and I remember getting ImportError because of some “fighting” over the PYTHONPATH that distros and the numerous layers of package managers do. I still use distutils and tarballs for important applications because it’s stable. I use virtualenv reluctantly, and try avoiding pip / setuptools (and I roll my eyes whenever I hear about a Python package manager that adds layers rather than rethinking them). The stack is really tremendously bad and unstable, and it makes software on top of it unstable.

                                                            It’s not something I want my VCS touching. So avoiding all of that and the resulting increase in stability is a huge reason to embed the Python interpreter IMO. (BTW Oil is getting rid of the Python interpreter altogether, but Python was hugely helpful in figuring out the algorithms, data structures, and architecture. That’s what I like Python for.)

                                                            1. 7

                                                              Thank you for the thoughtful comments.

                                                              The subject of bundling a Python interpreter is complex. I’m a proponent of bundling the Python interpreter with Mercurial (or most Python applications for that matter) because it reduces the surface area of variability. If that were the exclusive mode of distribution, we could distribute the latest, greatest Python interpreter and drop support for older versions quickly after new versions are released. Wouldn’t that be nice!

                                                              Of course, distributing your own interpreter means now you are responsible for shipping security updates in Python (or potentially any of its dependencies), effectively meaning you need to be prepared to release at any time.

                                                              Then there’s the pesky problem of actually executing on application distribution. It’s a hard problem space that requires resources. It has historically not been prioritized outside of Windows in the Mercurial project because of lack of time/resources/expertise. (Windows installers exist likely because the alternative is nobody uses your software otherwise since they can’t install it!)

                                                              There’s also the problem of Linux distributions, which treat Python distribution fundamentally differently from how it treats compiled languages. Distributions would insist on unbundling the Python interpreter from Mercurial as well as 3rd party libraries which they have alternate means of distributing. This creates a myriad of problems and slows everyone down. I recommend reading and generally agree with its premise that packaging should revolve more around applications, as it would be more user friendly for application developers and end-users alike.

                                                              1. 2

                                                                I agree you might get some eyebrows raised from distro maintainers, but you wouldn’t be alone. I thought Blender also embedded Python but maybe I’m wrong.

                                                                Based on my limited experience with Oil, I think it would be a minor issue but not a blocking one. CPython is plain C code without dependencies, so it doesn’t cause many problems for distros.


                                                                To me, the Windows example just shows that it’s possible and a known amount of work. It sounds like much less work than the years-long migration that you described.

                                                                The only reasons I see for migrating are “memes”, misplaced social pressure, and vague fears about security. They don’t seem particularly solid, especially when compared with the downsides of the alternative. Like the possibility for data corruption in a VCS. I understand that it was the “accepted” thing to do, but I’m explicitly questioning the accepted wisdom.

                                                                I agree there’s room for a CPython post-mortem. But what you described in your post is nothing less than a disaster that’s not over yet! So that might warrant a Mercurial post-mortem as well! I hope that doesn’t come off as rude because it’s not meant to be. I really appreciate you writing this up and I think it got a lot of deserved attention, including from the CPython core team.

                                                                1. 2

                                                                  Another problem you are likely to encounter is the sheer size of your distributable artifact. As far as I know, there’s no good way to eliminate dead code in Python, so you’ll need to ship all of the code that is imported (including transitively), even in conditional imports. Additionally, the interpreter and many libraries (including standard libraries) depend on shared object libraries, so do you also include those in the bundle? I wouldn’t be surprised at all if any nontrivial application bundle was gigabytes in size, even compressed.

                                                                  1. 3

                                                                    No, that’s not a problem. Oil ships 1.1 MB of code from the CPython binary (under GCC, 1.0 MB under Clang).


                                                                    I’m pretty sure that’s less than the size of a “hello world” HTTP server for languages like Rust or Go.

                                                                    I removed dead code with the process described in Dev Log #7: Hollowing Out the Python Interpreter, but it was never more than 1.5 MB done naively, so you don’t even need to do that.

                                                                    I imagine mercurial is pretty much like a shell – it reads and writes the file system, and does tons of byte string manipulation.

                                                                    In Python you can generate a module_init table to statically link the libraries you care about. It’s never going to reach gigabytes in any case, or if it was then the equivalent C program would be gigabytes.

                                                                    There are some downsides to what I did, for sure. But what I’m comparing them to is the multiple person-years of work described in the post, and the pages full of downsides.

                                                                    It’s not only doing all that migration work – it’s that the end result is actually worse. He says he anticipates “a long tail of bugs for years” and I would suspect the same. The source code would be in much better shape due to needing to support just one Python version. The effort from maintainers could have gone elsewhere, e.g. to improving the program’s functionality and fixing other bugs.

                                                                    I’m sorry they were in this situation. It sounds like nothing less than a disaster, and when you’re faced with a disaster that should motivate unconventional solutions (which this really isn’t because plenty of apps embed the Python interpreter.)

                                                                    1. 2

                                                                      Sorry, my dead code elimination comment was directed at Python libraries, not the Python interpreter. I believe our 1-year-old Python application bundle (using pex) is on the order of 500 MB compressed, and that’s not including the CPython interpreter, standard libraries, etc; just the application code and third-party dependencies. Most of that is certainly dead code in third-party dependencies. I’m assuming more mature applications are quite a lot larger.

                                                                      I completely agree with your analysis of the unfortunate situation Mercurial found itself in.

                                                                2. 2

                                                                  I found the previous thread where I commented on security:


                                                                  The tl;dr is that I’m wondering why bundling/embedding wasn’t considered as the FIRST solution, or at least after 2 of the 10 years of struggles with Python 3.

                                                                  My suspicion is that it’s because it feels “wrong” somehow, and because there was some social pressure to abandon Python 2.

                                                                  But as far as Mercurial is concerned, I think that solution is better in literally every dimension of engineering – less short term effort, less long term effort, more stable result, etc.

                                                                  1. 5

                                                                    The tl;dr is that I’m wondering why bundling/embedding wasn’t considered as the FIRST solution, or at least after 2 of the 10 years of struggles with Python 3.

                                                                    Wouldn’t desire to support third-party extensions (written in Python) make this problematic? You’d essentially be creating a Mercurial dialect of Python that drifts from the mainline Python everyone knows over time. Oil’s case is different since Python isn’t part of the exposed API surface.

                                                                    1. 2

                                                                      Yes that’s a good point, I forgot Mercurial had Python plugins.

                                                                      But I would say that you’re breaking the plugins anyway by moving from Python 2 to 3. So that would be an opportunity to make it more language agnostic – and that even has the benefit that you could keep plugins in Python 2 while Mercurial uses Python 3!

                                                                      I’m not sure exactly what the plugins do, but IPC and interchange formats are more robust and less prone to breakage. For example I looked at pandoc recently and it gives you a big JSON structure to manipulate in any language rather than a Haskell API (which would have been a lot easier for them to code). I never used it but I’ve seen a lot of systems like this. Git hooks also use textual formats.
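                                                                      To sketch what that interchange style buys you: a pandoc-style JSON filter is just a program that reads the document AST as JSON on stdin and writes modified JSON to stdout, so it can be written in any language. The AST node shape below is modeled on pandoc’s (`{"t": ..., "c": ...}` tagged nodes), but treat the details as illustrative rather than an exact reproduction of pandoc’s format:

```python
import json
import sys


def upcase_strings(node):
    """Recursively upper-case every Str inline in a pandoc-style AST.

    Returns a new structure; the input AST is left untouched.
    """
    if isinstance(node, dict):
        if node.get("t") == "Str":
            return {"t": "Str", "c": node["c"].upper()}
        return {key: upcase_strings(value) for key, value in node.items()}
    if isinstance(node, list):
        return [upcase_strings(item) for item in node]
    return node  # plain strings, numbers, None, etc. pass through


def run_filter(stdin=sys.stdin, stdout=sys.stdout):
    """Wire the transform to stdin/stdout, filter style, e.g.:
    pandoc -t json doc.md | python3 filter.py | pandoc -f json -o out.md
    """
    json.dump(upcase_strings(json.load(stdin)), stdout)
```

                                                                      The point is that the contract is a data format, not a language API: the filter above could be rewritten in Go or Rust tomorrow without pandoc (or the plugin host) noticing.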

                                                                      I have a lot of experience with Python with a plugin language, and while I’d say it’s better than some alternatives, it’s not really a great fit and often ends up getting replaced with something else. The Python version is an issue, even though Python is more stable than many languages.

                                                                1. 2

                                                                  PyOxidizer is a Rust application and requires Rust 1.33+ to be installed in order to build binaries.

                                                                  Hmm, it would have been nicer if PyOxidizer had been meta, i.e. it itself had a version that was a self-contained single-file executable, so that those of us who are not interested in installing yet another language toolchain on our computers could grumble less.

                                                                  1. 7

                                                                    I acknowledged this in the post:

                                                                    It is a bit unfortunate that I force users to install Rust before using PyOxidizer, but in my defense the target audience is technically savvy developers, bootstrapping Rust is easy, and PyOxidizer is young, so I think it is acceptable for now.

                                                                    I will almost certainly provide pre-built executables once the project is more mature. Thank you for the feedback.

                                                                      1. 1

                                                                        That’s great! Always on the lookout for good ways to distribute Python code to the end user. I generally deal with CLI programs, but I’ve created PySide based programs and programs using other toolkits. The other tools I’ve used (PyInstaller, cx_Freeze type things) tend to not do well with some frameworks. Hope this will deal with those too!

                                                                        1. 1

                                                                          I applaud you for your strategy and tactics! Wonderfully done. I was thinking of similar for a different language. I will really have to deconstruct what you have done here.

                                                                          What was your inspiration? Are there similar systems? What is your long term goal?

                                                                          What would it take to support PyPy?

                                                                          1. 11

                                                                            Inspiration was a few things.

                                                                            First, I’m a core contributor to Mercurial, which is a massive, systems-level (mostly) Python application. From a packaging and distribution standpoint, my personal belief is that Python hinders Mercurial. We can’t aggressively adopt modern Python versions, there’s a lot of variance in Python execution environments that create a long tail of bugs, etc. On the performance front, Python’s startup overhead is pretty poor. This prevents things like hg status from being perceived as instantaneous. It also slows down the test suite by literally minutes on some machines. And packaging Mercurial for multiple platforms takes a lot of effort because this isn’t a solved problem for the Python ecosystem. While there are a lot of good things about Mercurial being implemented in Python (and I don’t want to be perceived as advocating for porting Mercurial away from Python - because I don’t), it feels like Mercurial is constantly testing the limits of Python on a few fronts. This isn’t good for Mercurial. It isn’t a good sign for the longevity of Python if they can’t “support” large, mature projects like Mercurial.

                                                                            So a big source of inspiration was… frustration, specifically around how it felt that Python was limiting Mercurial’s potential.

                                                                            Another source of inspiration was my general attitude of not accepting the status quo. I’m always thinking about why things have to be the way they are. A large part of PyOxidizer was me questioning “why can’t I have a single file executable that runs Python?” I helped maintain Firefox’s build system for several years and knew enough about the low-level build system bits to understand what the general problems with binary distribution were. I knew enough about CPython’s internals (from developing Python C extensions) that I had confidence to dive really deep to be able to implement PyOxidizer. I felt like I knew enough about some relatively esoteric systems (notably build systems and CPython internals) to realize that others who had ventured into the Python application packaging space were attempting to solve this problem within constraints given to them by how CPython is commonly compiled. I realized I possessed the knowledge to change the underlying system and to coerce it to do what I wanted (namely produce self-contained executables containing Python). In other words, I changed the rules and created a new opportunity for PyOxidizer to do something that nobody had ever done in the public domain (Google has produced self-contained Python executables for years using similar but sufficiently different techniques).

                                                                            If you want to learn more about the technical journey, I suggest reading

                                                                            As for similar systems, other than WASM, I’m not aware of other “interpreted/scripting languages” (like Python) that have solutions that do what PyOxidizer does. I’m sure they exist. But Python is the only language in this language space that I’ve used significantly in the past ~8 years and I’m not sure what other languages have implemented. Obviously you can do these single executable tricks in compiled languages like Go and Rust.

                                                                            My long term goal is to make Python application distribution painless. A secondary goal is to make Python applications developed with PyOxidizer “better” than normal Python applications. This can be done through features like an integrated command server and providing Rust and Python access to each other’s capabilities. I want Python application maintainers to focus on building great applications, not worry about packaging/distribution.

                                                                            PyPy could theoretically be supported if someone produces a Python distribution conforming to the format documented by python-build-standalone. In theory, PyOxidizer is agnostic about the flavor of the Python distribution it uses as long as that Python distribution provides object files that can be relinked and implements aspects of Python’s C API for interpreter control. A few pieces of PyOxidizer would likely need to change, such as how the extension modules C array is defined. There’s probably a way to express this in python-build-standalone’s distribution descriptor JSON document such that it can be abstracted across distributions. I would very much like to support PyPy and I envision I will cross this bridge eventually. I think there are more important features to work on first, such as compiling C extensions and actually making distribution easy.

                                                                            1. 1

                                                                              Thank you for such a detailed response.

                                                                              I love that you have standardized an interface contract for Python runtimes.

                                                                              This looks like it could give organizations confidence in their deployment runtime, while no longer being tied to specific distros and can start using more niche and esoteric libraries that might be difficult to install. This is a form of containerization for Python applications.

                                                                              What I am really interested in, because you own both sides of the system, is streamlining the bidirectional call boundary. Having control over the shell that runs on the host, the Rust wrapper, and the VM, there is an opportunity to short-circuit some of the expense of calling into C or how data is laid out in memory. In a quick ripgrep through the code, I couldn’t find any reference to cffi. Do you plan on supporting cffi or is it already handled? I am really curious to learn about what your integration plans look like. Great work.

                                                                        2. 1

                                                                          That’s an awful lot of heavy lifting you’re asking from a tool maintainer.

                                                                          And, I mean, ‘brew/apt/yum install rust’ isn’t generally a particularly big ask of you, the end user :)

                                                                          1. 4

                                                                            But, but, this is solving pip install x … don’t you think at least the irony should be acknowledged?

                                                                        1. 11

                                                                          As the developer of a version control tool (Mercurial) and a (former) maintainer of a large build system (Firefox), I too have often asked myself how - not if - version control and build systems will merge - or at least become much more tightly integrated. And I also throw filesystems and distributed execution / CI into the mix for good measure because version control is a specialized filesystem and CI tends to evolve into a distributed build system. There’s a lot of inefficiency at scale due to the strong barriers we tend to erect between these components. I think there are compelling opportunities for novel advances in this space. How things will actually materialize, I’m not sure.

                                                                          1. 1

                                                                            I also agree; there is quite a bit of opportunity for innovation around this. I am thinking at a slightly different angle.

                                                                            There is an opportunity for creating a temporally aware file system, revision control, emulation environment, and build system, all linked by the same temporal timeline. A snapshot, yes, but across all of these things.

                                                                            take a look at

                                                                            Imagining a bit, but it could serve as a ‘file system’ for the emulation environment. It could also enhance a version control system, where the versioning/snapshotting happens at that level

                                                                            While I am not working with 100s of developers these days, I am noticing that environment/build control is much easier in our Android development world – because we control a) the emulator b) the build environment

                                                                            So it is very reproducible: same OS image, same emulator (KVM or Hyper-V), same build through gradle (we also do not allow wildcard for package versions, only exact versions).

                                                                            Working on a backend with, say, C++ (or other languages that rely on OS-provided includes/libs) is a very different story: very difficult to replicate without introducing an ‘emulator’ (where we can control a standardized OS image for the build/test cycle).

                                                                          1. 14

                                                                            The post mortem for this should be a good read. But first, let’s hope there’s a resolution soon, because this is a highly disruptive issue for Firefox users and could lead to users abandoning Firefox over it. It’s also a rough situation for the unfortunate Mozilla employees who have to deal with this going into the weekend.

                                                                            FWIW one of the Firefox security team members who would be on my short list for “person in charge of renewing this certificate” is currently in the middle of a multi-week vacation. This is pure speculation on my part, but I wouldn’t be surprised if a contributing cause to this incident were that the renewal reminder emails for this certificate were going to the inbox of someone not checking their email while on vacation. But I suspect there wasn’t a single point of failure here because the people who manage these certificates at Mozilla are typically very on top of their game and are some of the best security people I’ve interacted with. I’m quite surprised this occurred and suspect there are multiple contributing causes. We’ll just have to wait for the post mortem to see.

                                                                            1. 7

                                                                              Also, it takes Mozilla 18-24 hours to push an emergency release (a “chemspill” in Mozilla parlance). So if a new binary needs to be pushed out to users, I wouldn’t expect one until around 00:00 UTC.

                                                                              1. 1

                                                                                I don’t think this needs a new release. “just” a new cert, no? Keys aren’t hard-coded afair

                                                                            1. 7

                                                                              I agree with the premise of the post that Git doesn’t do a good job supporting monorepos. Assuming the scaling problem of large repositories will go away with time, there is still the issue of how clients should interact with a monorepo. e.g. clients often don’t need every file at a particular commit or want the full history of the repo or the files being accessed. The feature support and UI for facilitating partial repo access is still horribly lacking.

                                                                              Git has the concept of a “sparse checkout” where only a subset of files in a commit are manifested in the working directory. This is a powerful feature for monorepos, as it allows clients to only interact with files relevant to the given operation. Unfortunately, the UI for sparse checkouts in Git is horrible: it requires writing out file patterns to the .git/info/sparse-checkout file and running a sequence of commands in just the right order for it to work. Practically nobody knows how to do this off the top of their head and anyone using sparse checkouts probably has the process abstracted away via a script. In contrast, I will point out that Mercurial allows you to store a file in the repository containing the patterns that constitute the “sparse profile” and when you do a clone or update, you can specify the path to the file containing the “sparse profile” and Mercurial takes care of fetching the file with sparse file patterns and expanding it to rules to populate the repository history and working directory. This is vastly more user intuitive than what Git provides for managing sparse checkouts. Not perfect, but much, much better. I encourage Git to steal this feature.
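                                                                              To make the awkwardness concrete, here is roughly what the manual sequence looks like on a toy repository (repo layout and pattern are hypothetical; newer Git has since grown a dedicated `git sparse-checkout` command, but this is the underlying mechanism being described):

```shell
# Build a toy monorepo to demonstrate the manual sparse checkout dance.
git init -q monorepo
cd monorepo
mkdir -p project-a project-b
echo a > project-a/a.txt
echo b > project-b/b.txt
git add .
git -c user.name=demo -c user.email=demo@example.com commit -qm init

# Step 1: flip the config knob.
git config core.sparseCheckout true
# Step 2: hand-write .gitignore-style patterns into a magic file.
echo 'project-a/' > .git/info/sparse-checkout
# Step 3: re-read the tree so the patterns take effect.
git read-tree -mu HEAD

ls   # only project-a is materialized; project-b is gone from the working dir
```

                                                                              Run the steps out of order (say, checking out before writing the patterns) and you get either a full checkout or a confusing error, which is exactly the UI problem described above.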

                                                                              Another monorepo feature that is yet unexplored in both Git and Mercurial is partial repository branches and tags. Branches and tags are global to the entire repository. But for monorepos comprised of multiple projects, global branches and tags may not be appropriate. People may want branches and tags that only apply to a subset of the repo. If nothing else this can cut down on “symbol pollution.” This isn’t a radical idea, as per-project branches and tags are supported by version control systems like Subversion and CVS.

                                                                              1. 5

                                                                                I agree with you: Git famously was not designed for monorepos.

                                                                                Also agreed, sub-tree checkouts and sub-tree history would be essential for monorepos. Nobody wants to see every file from every obscure project in their repo clones, it would eat up your attention.

                                                                                I would also like storing giant asset files in the repo (without the git-lfs hack), more consistent commands, some sort of API where compilers and build systems can integrate with revision control, etc. Right now, it seems we have more and more tooling on top of Git to make it work in all these conditions, while Git was designed to manage a single text-file-based repo, namely the Linux kernel.

                                                                              1. 3

                                                                                It’s worth reminding everyone that PGP keys have expiration times and can be revoked. So if you put PGP signatures into Git, it is possible that signature verification works today but not tomorrow. (GPG and other tools will refuse to verify signatures if they belong to expired or revoked keys.) goes into more detail on the problem and is always a terrific read. In my opinion, this is a very nasty limitation and therefore using PGP for signatures in a VCS is extremely brittle and should be done with extreme care.

                                                                                In order to solve this general problem of not being able to validate signatures in the future, the VCS needs to manage keys for you (so you always have access to the key). And you probably don’t want to use PGP because tools enforce expiration and revocation. Key management is of course a hard problem and increases the complexity of the VCS. For what it’s worth, the Monotone VCS has built-in support for managing certificates (which are backed by RSA keys). See captures a lot of context about this general problem.

                                                                                1. 30

                                                                                  A generic solution that doesn’t require Docker is a tool/library called eatmydata:

                                                                                  Using LD_PRELOAD or a wrapper executable, libeatmydata essentially turns fsync() and other APIs that try to ensure durability into no-ops. If you don’t care about data durability, you can aggressively enable eatmydata to get a substantial speedup for workloads that call into these expensive APIs.
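                                                                                  As a rough illustration of why this matters, here is a small timing sketch (not libeatmydata itself; the record size and count are arbitrary choices for illustration) comparing the same write workload with and without fsync():

```python
# A hedged sketch: time the same write workload with fsync() on every
# record vs. with fsync() skipped entirely -- the call eatmydata no-ops.
import os
import tempfile
import time

def write_records(path, n_records, record=b"x" * 4096, durable=True):
    """Write n_records fixed-size records, optionally fsync'ing each one."""
    with open(path, "wb") as f:
        for _ in range(n_records):
            f.write(record)
            f.flush()
            if durable:
                os.fsync(f.fileno())  # force the data to stable storage
    return n_records * len(record)

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "scratch.dat")
    t0 = time.perf_counter()
    write_records(path, 200, durable=True)
    t1 = time.perf_counter()
    write_records(path, 200, durable=False)
    t2 = time.perf_counter()
    print(f"with fsync:    {t1 - t0:.3f}s")
    print(f"without fsync: {t2 - t1:.3f}s")
```

                                                                                  On most persistent filesystems the durable variant is dramatically slower; that gap is exactly the cost libeatmydata eliminates by turning the fsync() into a no-op.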

                                                                                  1. 8

                                                                                    eatmydata is also useful when testing other applications that [ab]use fsync, including build systems.

                                                                                    fsync() is also the reason why people believe that ramfs is just/always faster than drives. Very often the kernel does a good job of caching data in memory, and drives perform as well as ramfs… once you disable fsync.

                                                                                    1. 1

                                                                                      At least on Linux (where I’ve measured it), tmpfs is in fact significantly faster than persistent filesystems even for cached (purely in-memory) operations.

                                                                                      Whether your application is filesystem-intensive enough for it to matter is another question.

                                                                                      1. 1


                                                                                        …did you disable fsync() in your benchmarks?

                                                                                        1. 1

                                                                                          My measurements were taken with purpose-built, hand-written microbenchmarks. There was no fsync to “disable”.

                                                                                  1. 3

                                                                                    This is a fantastic interview. Pablo Santos articulates some features/advantages that Plastic SCM has compared to other tools. His vision for the role and future of version control is eerily similar to what is floating around in my head (I’m a contributor to Mercurial). He even talks a bit about the need for (version control) tools to be fast, get out of the way, and provide better features in order to avoid cognitive overhead, which impacts productivity and velocity. This is a must-listen for anyone who works on version control tools or cares about the larger developer productivity space.

                                                                                    1. 2

                                                                                      How realistic do you think it is for Git to evolve to support big files?

                                                                                      As I understand it the problem boils down to three issues:

                                                                                      1. Every file is hashed completely before being stored as a Blob
                                                                                      2. A git checkout checks out the whole tree, it’s not possible to checkout a subset
                                                                                      3. There is no lazy-fetching of the Git objects. It’s possible to do a shallow fetch but then Git operations are limited.

                                                                                      (1) means that even a 1-byte change will create a whole new Blob. I think this could be improved by introducing a new “BlobList” type of object that contains a list of Blobs. An update would then only cost on the order of a single Blob’s size. Blob chunking heuristics could then be developed and applied at insertion time.

                                                                                      (2) means re-thinking a lot of the CLI operations to work on a subset.

                                                                                      (3) would require re-designing the database to lazily fetch objects from upstream when they are missing.
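                                                                                      To illustrate the chunking idea in (1): here is a sketch in Python, using a hypothetical fixed chunk size (“BlobList” above is the hypothetical object name, and real systems often prefer content-defined rolling-hash boundaries), showing how a 1-byte change only invalidates one chunk:

```python
# A hedged sketch of blob chunking: split a large file into chunks so a
# small edit only produces one new chunk id, instead of a whole new Blob.
import hashlib

CHUNK_SIZE = 64 * 1024  # fixed-size chunking, chosen for illustration only

def chunk_blob(data: bytes):
    """Split a blob into chunks and return their content hashes."""
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    return [hashlib.sha1(c).hexdigest() for c in chunks]

a = bytes(200_000)        # a ~200 KB blob of zeros
b = bytearray(a)
b[150_000] = 1            # a 1-byte change late in the file
ids_a, ids_b = chunk_blob(a), chunk_blob(bytes(b))
changed = sum(1 for x, y in zip(ids_a, ids_b) if x != y)
print(f"{len(ids_a)} chunks, {changed} changed")  # prints "4 chunks, 1 changed"
```

                                                                                      The unchanged chunk ids can be shared between both versions, so storage (and transfer) scales with the size of the edit rather than the size of the file.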

                                                                                      1. 1

                                                                                        I couldn’t agree more. I always scratched my head wondering why version control, being the “operating system” of software development, did not lead the Agile, DevOps… you name it, modern software development “movements”. In a way I see that the dominance of Git raised the bar so high that innovation was not required for a long time. On the other hand, being so generalist makes it difficult for the Git roadmap to cover all the most innovative edge cases, right?

                                                                                        1. 1

                                                                                          Hey, Greg, you should get a hat!

                                                                                        1. 6

                                                                                          Nice article!

                                                                                          • Regarding startup time, I heard about the Mercurial command server a long time ago, so this was a good reminder. The idea of the “coprocess protocol” in Dev Log #8: Shell Protocol Designs is to make it easy for EVERY Unix binary to be a command server, no matter what language it’s written in.

                                                                                            I have a cool demo that uses some file descriptor tricks to accomplish this with minimal modifications to the code. It won’t work on Windows though, which may be an issue for some tools like Mercurial.

                                                                                          • I also embed the Python interpreter and ship it with Oil, which reduces startup time. sys.path has a single entry for Python modules, and every C module is statically linked. I thought about making this reusable, but it’s a pretty messy process.

                                                                                            Rewriting Python’s Build System From Scratch

                                                                                            Dev Log #7: Hollowing Out the Python Interpreter

                                                                                          • Regarding function call overhead, attribute access, and object creation, the idea behind “OPy” is to address those things for Oil, although I haven’t gotten very far along with that work :-)


                                                                                          I guess the bottom line is that we’re both stretching Python beyond its limits :-/ It’s a nice and productive language, so that tends to happen.
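                                                                                          For readers unfamiliar with the command-server idea discussed above, here is a minimal sketch: a resident process answers requests over a pipe, so clients pay interpreter startup cost once. The line-oriented framing and command set here are illustrative assumptions, not Mercurial’s or Oil’s actual protocol:

```python
# A hedged sketch of a command server: read one request line per command,
# dispatch it, and write one reply line, staying resident between requests.
import io

COMMANDS = {
    "version": lambda args: "demo-server 0.1",
    "add": lambda args: str(sum(int(a) for a in args)),
}

def serve(in_stream, out_stream):
    """Read 'cmd arg1 arg2 ...' lines until EOF; write one reply line each."""
    for line in in_stream:
        parts = line.split()
        if not parts:
            continue
        cmd, args = parts[0], parts[1:]
        handler = COMMANDS.get(cmd)
        reply = handler(args) if handler else f"unknown command: {cmd}"
        out_stream.write(reply + "\n")
        out_stream.flush()

# Drive the server in-process with StringIO standing in for the pipes.
out = io.StringIO()
serve(io.StringIO("version\nadd 1 2 3\n"), out)
print(out.getvalue(), end="")  # prints "demo-server 0.1" then "6"
```

                                                                                          In a real deployment the streams would be a Unix socket or inherited file descriptors rather than StringIO, but the amortization argument is the same.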

                                                                                          1. 1

                                                                                            Thank you for the context! It is… eerie that we seem to have gone down similar rabbit holes with embedding/distributing Python!

                                                                                            You may be interested in python-build-standalone, which is the sister project to PyOxidizer and aims to produce highly portable Python distributions. There’s still a ways to go. But you may find it useful as a mechanism to produce CPython build artifacts (and their dependencies) in such a way that they can easily be recombined into a larger binary, such as Oil.

                                                                                            1. 1

                                                                                              The general coprocess protocol looks really interesting. It might be nice to surface it somewhere more visible and trackable… GitHub wiki pages are notoriously bad for keeping up with changes to docs like this.