1. 27
  1. 17

    Losers always whine about how their bloat is a calculated tradeoff. The important thing to consider is that there is no tradeoff.

    I have a finite amount of time in this world. There is definitely a tradeoff.

    1. 8
      1:	stosb
      loop	1b

      So wasteful. rep stosb would save you a byte.
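
To make the byte count in this exchange concrete, here is a small Python sketch. It hard-codes the standard encodings (`stosb` is `AA`, `loop rel8` is `E2 xx`, the `rep` prefix is `F3`) and models a 16-bit CX counter. The helper names `run_loop` and `run_rep` are made up for illustration, and the model ignores flags, segment registers, and DI wraparound.

```python
# Standard x86 encodings (16-bit real mode assumed).
STOSB_LOOP = bytes([0xAA, 0xE2, 0xFD])  # 1: stosb ; loop 1b  (rel8 = -3)
REP_STOSB = bytes([0xF3, 0xAA])         # rep stosb


def run_loop(mem, di, al, cx):
    """Model `1: stosb; loop 1b`: store first, then decrement CX and test."""
    while True:
        mem[di] = al
        di += 1
        cx = (cx - 1) & 0xFFFF
        if cx == 0:
            return di


def run_rep(mem, di, al, cx):
    """Model `rep stosb`: test CX first, so CX == 0 stores nothing."""
    while cx != 0:
        mem[di] = al
        di += 1
        cx -= 1
    return di


a, b = bytearray(16), bytearray(16)
run_loop(a, 4, 0x90, 8)
run_rep(b, 4, 0x90, 8)
print(len(STOSB_LOOP) - len(REP_STOSB), a == b)  # 1 True
```

The two forms agree for any nonzero count. They differ only when CX is 0 on entry: the `loop` form stores 65536 bytes, while `rep stosb` stores none.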

      1. 7

        Oh my gosh you’re right! Wow great eye. Blog post updated. That’s also going to save us four bytes in the actual Cosmopolitan codebase, since one of the things I didn’t mention is the LOOP instruction has a pretty big legacy pipeline slowdown so it’s macro’d out into add/jnz there. But your solution is so much faster and simpler.

      2. 3

        Similar to how a Chess game may unfold very differently if a piece is moved to an unintended adjacent square, an x86 program can take on an entirely different meaning if the instruction pointer becomes off by one. We were able to use this to our advantage, since that lets us code functions in such a way that they overlap with one another.

        This is bonkers. Now I want to try this on m68000 too.

        1. 2

          Cosmopolitan is a scrappy operation. The repo right now only has about 1.5 million lines of code.

          Hm I was surprised by this? I thought this was about Lisps that fit in 512 bytes :)

          It looks like most of it is third party code, perhaps in the build toolchain, like a copy of GCC and Python:


          I was also surprised that Python is in the repo.

          But now that I look at it, it’s a much bigger and more ambitious project than I thought!

          I guess I’m not entirely sold on the idea of giving up source portability for running on multiple OSes. As far as I understand the project compiles to native x86-64 code which can be run on all the common OSes.

          But I think source portability like compiling to ARM (Apple M1) or RISC V is a great feature! And I also wonder about portability to new OSes.

          1. 4

            Author here. I agree! CPU architecture portability is on the horizon. We’re just focusing heavily on where we have the resources to overdeliver right now. So if ARM is a blocker for you, please keep following the project.

            1. 1

              OK interesting … I look forward to seeing this :)

              This is not my area, but I feel like there is going to be some extremely clever QEMU-like thing … I remember reading this paper many years ago and being surprised about the way that QEMU used GCC to generate code snippets. (It probably doesn’t work like that anymore?)


              Also another tangent … I believe binary size is an underappreciated issue for distributed systems, as the post points out.

              But I also think a pretty practical answer is that static executables should retain some structure for differential compression. For example, they can be a Python app bundle, and then we need some kind of git / OSTree / casync like thing that can quickly find that two executables both contain Python.

              And it should also be able to detect the differences between Python 3.6 and 3.7 quickly, and transfer diffs.

              I think this is well within the state of the art, but it’s mostly a matter of naming / conventions / tooling … there is something of a network effect, i.e. it makes sense to use differential compression only if other people are using it :) So you have something to base off of.

              Or it could be a matter of making some kind of Dockerfile DSL that exposes structure rather than obscuring it … i.e. it should be designed for “maximum” sharing, and make layers explicit, not implicit.
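
The idea of finding that two executables both contain the same runtime can be sketched with content-defined chunking. The Python below is a generic gear-hash illustration, not the algorithm of any tool named in this comment. The names `GEAR`, `chunks`, `digests`, and `bytes_to_send`, the 10-bit cut mask, and the random stand-in for a shared runtime are all assumptions made for the example.

```python
import hashlib
import random

_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]  # arbitrary per-byte constants


def chunks(data, bits=10):
    """Cut where a rolling hash of the last 32 bytes has `bits` zero top bits."""
    out, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Shifting left by one drops a byte's influence after 32 steps.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        if h >> (32 - bits) == 0:
            out.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        out.append(data[start:])
    return out


def digests(data):
    """Map each distinct chunk's SHA-256 digest to its length."""
    return {hashlib.sha256(c).digest(): len(c) for c in chunks(data)}


def bytes_to_send(old, new):
    """Bytes of `new` in chunks the receiver does not already hold from `old`."""
    have = digests(old)
    return sum(n for d, n in digests(new).items() if d not in have)


# Stand-in for a runtime embedded in two otherwise different executables.
runtime = bytes(_rng.getrandbits(8) for _ in range(64 * 1024))
app_a = b"app A code " * 50 + runtime
app_b = b"a different app B " * 40 + runtime
print(bytes_to_send(app_a, app_b), "of", len(app_b), "bytes need transfer")
```

Because cut points depend only on nearby bytes, the chunks inside the shared runtime line up even though the two files place it at different offsets, so only the chunks around the differing prefix need to be sent.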

              1. 2

                One way we could do it is by linking the binary on-host using ld.bfd rather than transferring it over the network. In order for that to work, the build system would need to shard all the .o files across the network. Since all the test hosts are likely contributing to building, half the binary is already there. They just need to request the other .o files from the other build farm hosts. This way if you make a small change to one .o file, you don’t need to retransfer the whole .com file over the network. Just the .o file.

                That’d be a smart tool that could make even bloated binaries scale for testing. It’s something I want to have some day. Right now our testing setup is so super simple and primitive. It basically just boils down to scp and ssh run. But what’s great about small binaries is they’re so small that I don’t need the best build system in the world quite yet! I’m happy and things will be going fast for some time.
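
The per-object transfer described in this comment could be driven by a hash manifest. The Python below is a minimal sketch under that assumption: `manifest` and `objects_to_fetch` are hypothetical helpers, the file names and contents are invented, and the relink step on the host is left out.

```python
import hashlib


def manifest(objects):
    """Map each object file name to the SHA-256 of its contents."""
    return {name: hashlib.sha256(blob).hexdigest() for name, blob in objects.items()}


def objects_to_fetch(local, wanted):
    """Names whose hash is missing or stale on this host, sorted for stable output."""
    return sorted(n for n, h in wanted.items() if local.get(n) != h)


# Hypothetical contents; a real host would hash the .o files on disk.
on_host = manifest({"main.o": b"v1", "libc.o": b"big libc", "lisp.o": b"eval"})
new_build = manifest({"main.o": b"v2", "libc.o": b"big libc", "lisp.o": b"eval"})
print(objects_to_fetch(on_host, new_build))  # ['main.o']
```

Only the changed object crosses the network; the host then relinks from the objects it already holds.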

          2. 1

            @jart What do you think about using 32-bit x86 as your universal ISA as opposed to x86-64? I suppose that would rule out modern Mac? It would halve pointer sizes, though.

            I’m thinking about this in light of this recently published critique of the uxn virtual machine. It seems to me that the best way to achieve uxn’s stated goal of salvage computing, as opposed to implementing a retro-style VM (though that’s certainly a fun project), is to target x86, since there are so many unused x86 boxes out there. And many of those are 32-bit.

            1. 3

              I think older computers deserve a home where they can be happy, healthy, and useful. My focus is less towards salvage and more towards working my way up to being able to host things like AVX scientific computing workloads. That’s one of the reasons why we have Python. It’s about solving the toil problem.

            2. 0

              For the record, here’s the same post on the orange site: https://news.ycombinator.com/item?id=31693323
