1. 67
  1. 8

    Seriously, I’m dumbfounded. BMP and TIFF have run-length-encoding and then GIF comes around with LZW. But there’s nothing in between. Why? I found the space between RLE and LZW to be large enough to spend many days on. And there’s a lot more to explore.
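    For a sense of the RLE end of that spectrum: a minimal byte-wise run-length coder fits in a dozen lines (a hypothetical sketch, not any particular format's RLE):

```python
def rle_encode(data: bytes) -> bytes:
    # Emit (count, value) pairs; runs are capped at 255 so the
    # count fits in a single byte.
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes([run, data[i]])
        i += run
    return bytes(out)

def rle_decode(data: bytes) -> bytes:
    # Expand each (count, value) pair back into a run.
    out = bytearray()
    for i in range(0, len(data), 2):
        out += bytes([data[i + 1]]) * data[i]
    return bytes(out)
```

    Everything between this and LZW is a trade of slightly more state (caches, dictionaries, deltas) for better ratios, which is exactly the space QOI sits in.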

    If somebody invents a way of building bridges that’s ten times more efficient but requires complicated and error-prone setup, there’s going to be building projects where the added complexity is a bargain, projects where the added complexity is exorbitant, and projects where it’s not quite worthwhile but a simpler system that’s only five times more efficient would be welcome.

    Software is different because it’s vastly easier to re-use inventions. If somebody invents an algorithm that’s ten times more efficient but requires complicated and error-prone setup, somebody can do that setup once and then copy/paste it into all the other projects where it would be useful, at zero marginal cost. A simpler but unusual algorithm can end up being more work if it has to be implemented from scratch instead of re-using the standard implementation that you almost certainly already have somewhere in your dependency tree.

    1. 5

      This is great! I would suggest one way to go even simpler: Use some raw file format for the images, and use a generic file compression afterwards. There isn’t really any reason why compression and image file format need to be coupled.

      1. 15

        Many image formats are close to some variation of that theme (header + prefilter + generic compressor). From a quick look at qoi, it seems like a twist on the LZW family: the prefilter stage updates the dictionary based on complete pixels rather than per byte (the index XORs the channels together). I’d speculate that LZ4 would perform similarly, compression-ratio-wise, on planar image layouts. Both suffer when images start to get wide. Two easy improvements while staying O(n) would be a space-filling curve (Hilbert) over a quadtree or a simple tilemap.

        There are plenty of reasons why compression and image formats should be coupled, even when staying lossless. Several generic compressors get greatly improved performance from pre-trained dictionaries that are representative of the data to be sent. These dictionaries need to be agreed upon beforehand, and thus become part of the image “format”.
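        The recently-seen-pixel cache mentioned above can be sketched in a few lines. This is a hypothetical illustration of the idea, not qoi's actual scheme; the XOR hash follows the description in this thread (qoi's real index uses a weighted sum of channels mod 64):

```python
def cache_prefilter(pixels):
    """Replace repeated pixels with small cache indices.

    pixels: list of (r, g, b) tuples. Emits ('idx', i) when the pixel
    is already in slot i of a 64-entry cache, else ('rgb', pixel).
    A generic compressor downstream then sees many small, repeating
    index tokens instead of full pixel values.
    """
    cache = [None] * 64
    out = []
    for p in pixels:
        r, g, b = p
        i = (r ^ g ^ b) % 64   # illustrative hash: XOR the channels
        if cache[i] == p:
            out.append(('idx', i))
        else:
            cache[i] = p
            out.append(('rgb', p))
    return out
```

        Because the cache contents depend on the update rule, both sides must agree on it in advance, which is the sense in which the prefilter becomes part of the format.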

        1. 7

          Generic compression can only go so far; to achieve good compression ratios you need some real insight into what structure the data has, and how to take advantage of it. I found this post about compressing SPIR-V shaders to be a good example of what those structures can look like.

          1. 5

            You may be interested in farbfeld from the Suckless folks, which is pretty much exactly what you are suggesting.

            1. 5

              It depends on the data. Generic compression would certainly do a good job with run-length encoding, as well as with efficient representation of recently-seen values. But method #3, storing colors as deltas, seems like the sort of thing that most generic compression algorithms won’t do. If you were trying to losslessly compress a photo, I would expect QOI to do a better job.

              On the other hand, generic compression will also recognize patterns that QOI can’t! If you have lots of dithering, generic compression is probably better.

              Either way, it’s remarkable how far simple algorithms can take you.
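              That delta intuition is easy to test: delta-filter a smooth gradient before handing it to a generic compressor, and the compressor sees a near-constant stream (a quick sketch; zlib stands in for "generic compression"):

```python
import zlib

# A smooth 8-bit gradient: every raw byte differs from its neighbour,
# but consecutive differences are constant, so the delta stream is
# trivially compressible.
raw = bytes((i * 3) % 256 for i in range(4096))
deltas = bytes([raw[0]]) + bytes((raw[i] - raw[i - 1]) % 256
                                 for i in range(1, len(raw)))

plain = len(zlib.compress(raw))        # compressor sees varying bytes
filtered = len(zlib.compress(deltas))  # compressor sees mostly 0x03
```

              On data like this the filtered stream compresses dramatically better, which is why format-specific prefilters like QOI's delta ops pay off over a generic compressor alone.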

              1. 2

                > But method #3, storing colors as deltas, seems like the sort of thing that most generic compression algorithms won’t do.

                This depends a bit on the image format. For ‘raw’ images, you generally have two possible representations: you store each pixel as a sequence of values, one per colour plane (array of structures), or you store each colour plane separately (structure of arrays). Most GPUs support a variety of minor variations on these two themes.

                A lot of general-purpose compression algorithms handle small deltas quite well, because sequences of values where the entropy is in the low bits are very common. This works well in the structure-of-arrays representation, less well in the array-of-structures representation.
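                The layout difference is easy to demonstrate with synthetic channels (a sketch; zlib stands in for a generic compressor, and the gradients are made up for illustration):

```python
import zlib

def byte_deltas(buf: bytes) -> bytes:
    # Byte-wise delta filter: each byte minus its predecessor, mod 256.
    return bytes([buf[0]]) + bytes((buf[i] - buf[i - 1]) % 256
                                   for i in range(1, len(buf)))

n = 4096
r = bytes((i * 5) % 256 for i in range(n))   # three slowly varying
g = bytes((i * 3) % 256 for i in range(n))   # channels with different
b = bytes((i * 7) % 256 for i in range(n))   # gradients

aos = bytes(v for px in zip(r, g, b) for v in px)   # RGBRGB... interleaved
soa = r + g + b                                     # RRR...GGG...BBB... planar

# Planar: each byte's predecessor belongs to the same channel, so the
# deltas are tiny and constant. Interleaved: the predecessor is a
# different channel, so the deltas are large and varying.
soa_size = len(zlib.compress(byte_deltas(soa)))
aos_size = len(zlib.compress(byte_deltas(aos)))
```

                On this kind of data the delta-filtered planar layout compresses far better, matching the structure-of-arrays point above.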

                1. 2

                  PNG does store colors as deltas.
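                  Concretely, PNG runs a per-scanline filter before the DEFLATE stage; filter type 1 (“Sub”) stores each byte as the difference from the corresponding byte of the pixel to its left. A sketch of just that one filter:

```python
def png_sub_filter(scanline: bytes, bpp: int) -> bytes:
    # PNG filter type 1 ("Sub"): each byte minus the corresponding byte
    # of the previous pixel (bpp bytes to the left), modulo 256.
    # Bytes in the first pixel have no left neighbour and pass through.
    return bytes((scanline[i] - (scanline[i - bpp] if i >= bpp else 0)) % 256
                 for i in range(len(scanline)))
```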

                2. 2

                  There is one big reason: applications want to open and edit files. If all you care about is archiving, then it’s fine to separate these, but you generally want to constrain the set of things that every application that opens images needs to handle. This is one of the problems with TIFF: it’s a family of different compression and encoding schemes, and a lot of things implement different subsets of them.

                  This approach works well with tar because tar is a tape archiver: the file format is designed for streaming access, not random access, and so you can always use an external decompressor and pipe the output to tar. It doesn’t work when I want to open a file in my photo editor and the editor doesn’t know how to handle the particular compression scheme. This is why most successful file formats specify a small set of CODECs.

                3. 4

                  > Why? I found the space between RLE and LZW to be large enough to spend many days on. And there’s a lot more to explore.

                  Another mostly untapped niche for “image” compression is lossless modes for remote desktop protocols. There is an unusual property that doesn’t really get leveraged (AFAIR recent x2go versions have something of a tile-cache engine, but it’s proprietary sauce, so…): regions reoccur or are repeated verbatim, but with unpredictable delay in between. The frame distance is large enough that even generous GOP sequences (key-frame, progressive-delta, bidirectional-delta) will miss it.

                  That’s on my drawing table for a12 but preserving intermediate representations without degrading into pixels until the very end is a higher priority.
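                  One way to sketch that “reoccurs with unpredictable delay” property: hash each screen tile and keep a long-lived, content-addressed cache, so a match works at any frame distance. This is a hypothetical illustration, not x2go's or any real protocol's scheme:

```python
import hashlib

class TileCache:
    """Content-addressed tile cache (hypothetical sketch): a tile seen
    in any earlier frame, however long ago, is sent as a short
    reference instead of pixels, sidestepping fixed GOP key-frame
    distances."""

    def __init__(self):
        self.seen = {}  # tile digest -> small integer id

    def encode_tile(self, tile: bytes):
        key = hashlib.sha256(tile).digest()
        if key in self.seen:
            return ('ref', self.seen[key])   # cheap back-reference
        tid = len(self.seen)
        self.seen[key] = tid
        return ('raw', tid, tile)            # send the pixels once
```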

                  1. 2

                    Doesn’t Windows RDP use this?

                    I believe VNC (in)famously doesn’t, or was that plain X?

                    1. 2

                      Windows RDP has a whole lot of encoding strategies, but I don’t recall seeing a caching one based on window UUIDs and type information in FreeRDP. There is some patented craziness that sends raw reference frames for server-side classification and returns a set of codecs for the client to pick from. Maybe one is in there.

                      VNC (RFB) is nuts/undefined; every part of it is made for a world that is long gone, and people keep abusing the hell out of it. There are a bunch of proprietary “extensions” that add pretty much every encoding under the sun, then get locked down in some niche product you’d find in kiosks. It’s one of the first targets I go for in pen tests/red-teaming.

                      Plain X does nothing, no; buffers are sent raw, and probably all of five clients actually use draw commands (see also the X transparency myth).

                  2. 4

                    This is a terrible headline…

                    O(n) is so abstract. The most important factors in compression are size and decompression time (so I don’t have to wait to look at the image). One paragraph in, these are all given, so why didn’t they use “Lossless Image Compression with 50x speed up” as the headline?

                    1. 1

                      Agreed on the headline. Most CODECs make a fixed number of passes over the input image, so their complexity is already some constant times the image size, i.e. O(n). The thing that’s interesting here is that it’s a single-pass encoder with a small amount of state, and so is very fast.