1. 61

    1. 23

      Was this just something I thought was neat (a “performance romantic”, as one person called it)

      This just makes me sad. A compression savings like this at NPM’s scale would lead to an automatic promotion at most companies I know. Obviously you need to factor in the tradeoffs before making a decision like this, but questioning the benefit itself seems bonkers. Could be there’s something I’m missing, but either way thanks @EvanHahn for the work and the writeup.

      1. 17

        At a technical level, while I understand the appeal of sticking to DEFLATE compression, the more appealing long-term approach is probably to switch to zstd: it offers much better compression without slowdowns. It’s a bigger shift, but it’s a much clearer win if you can make it happen.

        I admit to being a bit disappointed by the “no one will notice” line of thinking. It’s probably true for the vast majority of users, but this would rule out a lot of useful performance improvements. The overall bandwidth used by CI servers and package managers is really tremendous.

        1. 14

          Node already ships Brotli, and Brotli works quite well on JS; it was basically designed for it.
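
          For anyone curious, the comparison is easy to try with nothing but Node built-ins. A quick sketch (the file path is just a placeholder):

          ```js
          // Compare gzip and Brotli output sizes for a JS file,
          // using only Node's bundled zlib bindings.
          const fs = require("node:fs");
          const zlib = require("node:zlib");

          const buf = fs.readFileSync("sample.js"); // placeholder path

          const gzipped = zlib.gzipSync(buf, { level: 9 });
          const brotlied = zlib.brotliCompressSync(buf, {
            params: {
              [zlib.constants.BROTLI_PARAM_QUALITY]: 11, // max quality, slowest
              [zlib.constants.BROTLI_PARAM_SIZE_HINT]: buf.length,
            },
          });

          console.log(`original:     ${buf.length} bytes`);
          console.log(`gzip level 9: ${gzipped.length} bytes`);
          console.log(`brotli:       ${brotlied.length} bytes`);
          ```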

          1. 1

            Took me a minute to realize that by “on JS” you meant on the contents of .js/.mjs files. At first I thought you meant implemented in JS. Very confusing :D

          2. 5

            the more appealing long term approach is probably to switch to zstd–it offers much better compression without slowdowns.

            Yes, especially since the change can’t recompress older versions anyway because of the checksum issue. Having a modern compression algorithm could result in smaller packages AND faster/equivalent performance (compression/decompression).
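
            Recent Node releases have started adding experimental zstd bindings to node:zlib. Assuming your build has zlib.zstdCompressSync (treat that as an assumption and check your Node version, since the API is new and experimental), a rough size comparison looks something like this:

            ```js
            // Rough gzip vs zstd size comparison on an uncompressed tarball.
            // Assumes a Node build whose node:zlib includes the experimental
            // zstd functions; older builds won't have zstdCompressSync.
            const fs = require("node:fs");
            const zlib = require("node:zlib");

            const tar = fs.readFileSync("package.tar"); // placeholder path

            const gz = zlib.gzipSync(tar, { level: 9 });
            const zst = zlib.zstdCompressSync(tar); // default compression level

            console.log(`gzip level 9: ${gz.length} bytes`);
            console.log(`zstd:         ${zst.length} bytes`);
            ```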

            1. 3

              I agree. Gzip is just about as old as it gets. Surely npm can push for progress (I’m a gzip hater, I guess). That said,

              Dictionaries can have a large impact on the compression ratio of small files, so Zstandard can use a user-provided compression dictionary.

              I do wonder if npm could/would come up with a custom dictionary that would be optimized for, well, anything at all, be it the long tail of small packages or a few really big cornerstones.

              [1] https://en.wikipedia.org/wiki/Zstd
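
              The dictionary idea is easy to demo with what Node already ships: DEFLATE supports preset dictionaries via node:zlib’s dictionary option. A toy sketch of the concept (the dictionary string here is made up; a real setup would train a zstd dictionary over many small packages):

              ```js
              // Toy demo of how a preset dictionary helps tiny payloads.
              const zlib = require("node:zlib");

              // Made-up "shared" boilerplate common to many package.json files.
              const dictionary = Buffer.from(
                '{"name":"","version":"","main":"index.js","license":"MIT"}'
              );

              const payload = Buffer.from(
                '{"name":"left-pad","version":"1.3.0","main":"index.js","license":"MIT"}'
              );

              const plain = zlib.deflateRawSync(payload);
              const withDict = zlib.deflateRawSync(payload, { dictionary });

              console.log(`no dictionary:   ${plain.length} bytes`);
              console.log(`with dictionary: ${withDict.length} bytes`);

              // Decompression needs the same dictionary on the other side:
              const back = zlib.inflateRawSync(withDict, { dictionary });
              console.log(back.equals(payload)); // true
              ```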

              1. 3

                HTTP is adding shared dictionary support, including a Brotli archive that is like zip + Brotli + custom dictionaries:

                https://datatracker.ietf.org/doc/draft-vandevenne-shared-brotli-format/13/

              2. 2

                I agree a better compression algorithm is always nice, but back-compat is really important here, given how many tools and users there are.
                It’s a whole other level of pain to add support for a format existing tools can’t read, and it’s not even clear the npm protocol was built with that in mind. A non-backwards-compatible compression change might even make things worse in the grand scheme of things: you need two versions of each package, so more storage space, and if you can’t add metadata listing the available formats, you get clients trying more than one URL, increasing server load.

              3. 11

                Publishing would be slower—in some cases, much slower

                Is this really that much of a downside? I can’t imagine publishing to be something that is done often enough to warrant concern about this.

                1. 4

                  It’s probably an issue for companies that publish private packages many times a day from CI.

                  1. 2

                    Is a marginal slowdown really that important in CI, as opposed to something being run on a laptop where the developer is interactively waiting for npm publish to complete?

                    (To be fair, I’m kind of just stirring the pot - an obvious retort to this question might be, actually, private packages tend to be huge and this would balloon the time on the order of tens of minutes. I don’t know whether that’s true or not.)

                    1. 3

                      You could also make it opt-out. Default to slow, strong compression, but if it causes a major performance regression on your deployment, toggle a flag and you’re back on the old behaviour.

                2. 8

                  Fedor Indutny pointed out that npm’s lockfile, package-lock.json, contains a checksum of the package file. The npm people couldn’t easily re-compress existing packages without breaking things.

                  Is… is it just me or does it seem like it’d be better for npm to have the checksum of the uncompressed file? Or maybe even better: a checksum of a deterministic archive format instead of a general tarfile?

                  If the checksum covered the uncompressed contents, that would let npm swap in a better compressor seamlessly, switch to a better compression algorithm like zstd for newer npm builds, or even recompress packages to gzip “on the fly” while silently using better compression behind the scenes.

                  Is there a reason that npm’s lockfile uses a hash of the .tar.gz file? Is there some benefit, e.g. maybe for preventing “zip bomb” style attacks?
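
                  To make the constraint concrete: the lockfile’s integrity value is an SRI hash over the compressed .tgz bytes, so recompressing the identical tar changes it even though nothing inside changed. A sketch (the tarball path is a placeholder):

                  ```js
                  // Why recompression breaks lockfile checksums: the hash covers
                  // the compressed .tgz bytes, not the tar contents inside them.
                  const crypto = require("node:crypto");
                  const fs = require("node:fs");
                  const zlib = require("node:zlib");

                  const sri = (buf) =>
                    "sha512-" + crypto.createHash("sha512").update(buf).digest("base64");

                  const tgz = fs.readFileSync("some-package-1.0.0.tgz"); // placeholder
                  const tar = zlib.gunzipSync(tgz);

                  // The same tar, recompressed with different gzip settings:
                  const recompressed = zlib.gzipSync(tar, { level: 9 });

                  console.log(sri(tgz) === sri(recompressed));            // almost certainly false
                  console.log(tar.equals(zlib.gunzipSync(recompressed))); // true: contents unchanged
                  ```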

                  1. 6

                    I agree. Users will almost always be unpacking the archive anyway, so you’d still need to protect against “zip bombs” and potentially other issues.

                    Maybe there’s something I’m missing, though.

                    In any case, such a change would be backwards incompatible.

                  2. 8

                    It’s a shame; I think this would be a win overall, however small. While I understand that old packages won’t benefit from this, common sense tells me that the most-downloaded packages are usually the most recently updated ones, which would make it significant.

                    1. 2

                      At the end of the day, you’ll probably have more impact by convincing the biggest players to reduce their size (like React). I’m not sure I understood what was difficult about integrating the tool into the npm cli, or why slower publishing is a problem (it’s not a common workflow compared to the time spent developing a package).

                      1. 3

                        Integrating Zopfli into the CLI would require WebAssembly, which the npm maintainers were reluctant to use. And slower publishing makes a big difference for some packages—for example, when I tested it on the typescript package, it took 2.5 minutes.

                        1. 4

                          when I tested it on the typescript package, it took 2.5 minutes.

                          https://www.npmjs.com/package/typescript?activeTab=versions

                          2.5 minutes once a day is.. nothing? Especially when the savings are on the scale of terabytes of data (and that’s just the public registry). I imagine they’re waiting on CI builds for significantly longer than that.

                          Integrating Zopfli into the CLI would require WebAssembly

                          Why was wasm necessary over a native (via bindings) or JS implementation?

                          1. 7

                            2.5 minutes once a day is.. nothing? Especially when the savings are on the scale of terabytes of data (and that’s just the public registry). I imagine they’re waiting on CI builds for significantly longer than that.

                            I think doing this has too many edge cases to be worth it, at least as a default. I imagine a package author on a slow machine, wondering why it takes half an hour to publish. I imagine someone wondering why there’s been a huge slowdown, and emailing npm support or complaining online. I imagine a large module where Zopfli takes a very long time to run.

                            There are ways around this. You could imagine a command line flag like npm publish --super-compression, or a timeout on Zopfli with a “regular” gzip fallback. But that complexity has to work for hundreds of thousands of package authors.
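
                            The timeout-plus-fallback version could be pretty small in principle. A rough sketch, where slowGzip stands in for whatever Zopfli binding you’d actually plug in (not a real API); since Zopfli’s output is itself valid gzip, the fallback stays format-compatible:

                            ```js
                            // Sketch: try the slow compressor, fall back to regular gzip
                            // after a deadline. `slowGzip` is a hypothetical stand-in for a
                            // Zopfli binding that resolves to a gzip-compatible Buffer.
                            const zlib = require("node:zlib");
                            const { setTimeout: sleep } = require("node:timers/promises");

                            async function compressTarball(tar, slowGzip, { deadlineMs = 30_000 } = {}) {
                              const timedOut = sleep(deadlineMs, null); // resolves to null at the deadline
                              const best = Promise.resolve()
                                .then(() => slowGzip(tar))
                                .catch(() => null); // any Zopfli failure also falls back

                              const winner = await Promise.race([best, timedOut]);
                              return winner ?? zlib.gzipSync(tar, { level: 9 });
                            }
                            ```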

                            Why was wasm necessary over a native (via bindings) or JS implementation?

                            That’s true; wasm isn’t the only solution. Because the compression is CPU-bound, I’d be concerned that a JS solution would be too slow. And a native solution could work, but I recall the npm CLI team trying to keep native dependencies to a minimum to improve compatibility.

                            1. 1

                              but I recall the npm CLI team trying to keep native dependencies to a minimum to improve compatibility.

                              I’m impressed to find out the cli is entirely implemented in JS. I didn’t do any additional digging to look at the first-party package dependencies, however.

                              https://github.com/npm/cli

                    2. 5

                      Great write-up! Appreciate you trying to make everyone’s lives a little better.