  2. 35

    Why is “a complete platform for building web and mobile apps in pure JavaScript” 600MB…

    For reference, gcc + glibc (a somewhat complete platform for building web and mobile apps and games and kernels and databases in pure C/C++) is 140MB.

    1. 19

      I’m glad somebody said this so I didn’t have to. :) If the goal is to save bits, switching compression provides only modest, incremental savings.

      1. 2

        gcc + glibc is very incomplete. Does it include any kind of HTTP support? Any kind of RPC? Any kind of serialization? Any GUI support, even just the API? Compression format support? Any config format support? Heck, any kind of parsing library? Support for any kind of structured datastore (beyond the filesystem)?

        600MB does sound like too much, but the comparison is off; it’s likely doing a lot more than gcc + glibc.

        1. 2

          From a quick look at the tarball, it doesn’t seem to include the runtime (Node.js), so it’s incomplete in the other direction.

      2. 12

        I’m not caremad about people using gzip for things.

        But! If you want to see the fruits of some of the work done on compression over time, the Squash benchmark is super cool. This guy’s analysis of it approximately agrees with mine.

        (The initial file type and machine for the benchmark results are randomly picked, so it might load showing compression results for Protocol Buffers on a Raspberry Pi. You might want to choose options more like your use case.)

        • Brotli at its lowest settings competes with gzip on compression speed but with a higher ratio. At its highest settings it overlaps with some xz settings on ratio, generally trading a slightly worse ratio for faster decompression.
        • Zstd (lz4 author Yann Collet’s work in progress) provides a good point in the region “to the right” of Brotli and gzip (more speed, less compression).
        • lz4 itself provides another interesting point to the right of Zstd.
        • Density, a compressor I’d not previously heard of, turns in some creditable results for extremely high speed compression.
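
        If you want a rough feel for those tradeoffs on your own data rather than the Squash corpora, a quick loop like the one below works. It assumes the gzip, xz, brotli, and zstd command-line tools are all installed, and the levels are just plausible mid-range picks, not tuned recommendations; it also only measures compression, so decompression speed needs a separate pass.

        f=some_representative_file   # placeholder: substitute a file typical of your workload
        for cmd in "gzip -6" "gzip -9" "xz -6" "brotli -q 5" "zstd -3"; do
            echo "== $cmd =="
            # -c writes to stdout for all four tools; wc -c reports the compressed size
            time $cmd -c "$f" | wc -c
        done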

        If there’s anything I’m oddly obsessed with, it’s where we might be able to use compression but don’t, or could make cheap compression better. OS X, Android, and Chrome OS compress RAM before swapping to disk. Are there advancements to make there, in terms of hardware assists, better algorithms, or clever tricks to decide what to compress (or choose among algorithms, e.g. tighter packing for data likely to stay packed for longer)? Samsung apparently put a simple hardware memory compressor in their Exynos chips, and Intel gzip QuickAssist and the planned compression coprocessor in the AMD ARM A1100 server chip (delayed but still kicking) are other (different) examples. And sometimes transparent compression while writing to disk would be a net speed win (i.e., we’re I/O bound) but we don’t do it.

        I think folks are requesting the .br extension for Brotli from IANA, and Chrome and Firefox support it (or will) as an HTTP Content-Encoding. It’s part of the WOFF2 font standard already. There are also zlib patches improving speed but maintaining backcompat. Zstd, though not finalized yet, is interesting. I would love all those to get handy command-line utilities so they’re easy to deploy when, you know, I have a dozens-of-GB backup to archive at work.
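
        Once that lands, checking whether a given server actually negotiates Brotli is a one-liner; the URL here is a placeholder, and whether you see br depends entirely on the server:

        # request Brotli (falling back to gzip) and show which encoding the server picked
        curl -s -o /dev/null -D - -H 'Accept-Encoding: br, gzip' https://example.com/ | grep -i '^content-encoding'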

        1. 4

          Came here to mention Squash, too. You might also be interested in my heatshrink project, which is a C library for doing data compression and decompression in severely memory constrained and/or hard real-time systems. (It’s LZSS-based.)

        2. 9

          I decided to do some tests on my backups. I generally dump a postgres instance with the following command:

          time pg_dump -U postgres head | (gzip > /home/database/koparo_head_$(/bin/date +%Y-%m-%d_%H-%M-%S).gz)

          Here are the timings with gzip, xz and gzip -9:

          Doing a dump and compressing it with gzip:

          real    1m6.960s
          user    1m3.220s
          sys     0m3.320s
          

          Dump performed and compressed with xz:

          real    17m57.054s
          user    17m37.457s
          sys     0m8.777s
          

          gzip -9 the db dump:

          real    1m21.947s
          user    1m16.890s
          sys     0m4.373s
          

          Resulting size:

          -rw-r--r-- 1 database database 643M Dec 12 19:49 koparo_head_2015-12-12_19-48-00.gz
          -rw-r--r-- 1 database database 477M Dec 12 20:07 koparo_head_2015-12-12_19-50-00.xz
          -rw-r--r-- 1 database database 641M Dec 12 20:11 koparo_head_2015-12-12_20-09-40.gz9
          

          In my use case, gzip seems like the sanest approach. I doubt waiting 17m57s for a DB backup is a viable option :)
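
          For a fairer comparison I could also dump once to an uncompressed file and compress that same file with each tool, so the database and the dump itself aren’t part of the timings; roughly (untested sketch, filename is illustrative and reuses the paths above):

          # dump once to a plain file, then time each compressor on the same input
          pg_dump -U postgres head > /home/database/koparo_head.sql
          # wc -c shows the compressed size for each tool
          time gzip -c /home/database/koparo_head.sql | wc -c
          time gzip -9 -c /home/database/koparo_head.sql | wc -c
          time xz -c /home/database/koparo_head.sql | wc -c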

          1. 6

            I happen to have an actual potential use case I tried throwing this at. I’m regularly taking snapshots of a bit of server filesystem with tar (it happens to be a Minecraft server, but I doubt that massively skews the compression performance characteristics of the data). I have snapshots easily to hand, so I grabbed one and checked it. These are all done with a warmed cache and are profoundly unscientific, but here we go:

            $ time cat snapshot.tar | cat | wc -c
            1721825280
            
            real    0m3.135s
            user    0m0.032s
            sys     0m4.504s
            $ time cat snapshot.tar | gzip | wc -c
            1044644775
            
            real    1m34.558s
            user    1m33.436s
            sys     0m5.436s
            $ time cat snapshot.tar | bzip2 | wc -c
            1037459851
            
            real    6m34.127s
            user    6m28.952s
            sys     0m8.828s
            $ time cat snapshot.tar | xz | wc -c
            1030032944
            
            real    14m14.562s
            user    14m7.672s
            sys     0m14.856s
            

            So using gzip saves about 40% on my original most-of-2GB. Not bad, especially in only a minute and a half. bzip2 saves an additional half a percent, at the cost of another five minutes of processing; and xz saves 0.8% over gzip, at the cost of almost thirteen minutes of additional processing. There’s no way that’s worth it, especially since the server needs to be doing things other than compressing snapshots while this is going on. So I guess I’ll keep using gzip.

            1. 1

              How many cores do you have available, and did you try xz with -T 0? For an apples-to-apples comparison, you could try pigz as well.
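
              Concretely, something like this, assuming an xz new enough to have -T (5.2+) and pigz installed:

              # multi-threaded xz (-T 0 uses all available cores) and parallel gzip
              time cat snapshot.tar | xz -T 0 | wc -c
              time cat snapshot.tar | pigz | wc -c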

            2. 2

              I regularly run personal backups and disk images through xz; they both end up about half the size of the gzip equivalents. Also, having threaded compression built into the primary executable (-T 0) is nice compared to having to download a separate pigz package.

              1. 1

                I guess it’s less of an issue nowadays, but what is the CPU overhead for xz vs gzip compared to the compression ratio?

                1. 7

                  From the numbers in the post, roughly 2x compression for 5x CPU.