Oh no, another one.
Cute benchmark pic, but:
I agree, the benchmark is weird. Besides the fact that some important competitors are missing, I understand that compression implementations make trade-offs between time/memory and compression efficiency, so it’s hard to compare them fairly by testing just one of their configurations. Instead you should plot, over varying configurations, the curve of time/memory consumption versus compression ratio. (If the curve drawn by one algorithm is always “above” another’s, then one can say it is always better, but often they will cross in some places, telling us interesting facts about the strengths and weaknesses of both.)
Good point. And there should be a tool for running benchmarks and plotting the charts automatically.
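A rough sketch of what such a tool could look like, in plain Python, shelling out to whichever compressor CLIs happen to be installed (the tools and level ranges below are just illustrative; the points it prints can be fed into any plotting tool to draw the ratio-vs-time curves):

```python
import os
import subprocess
import time

# Illustrative selection of tools and level sweeps; extend as needed.
TOOLS = [
    ("gzip", "gzip -{level} -c {src} > {dst}", range(1, 10)),
    ("xz",   "xz -{level} -c {src} > {dst}",   range(0, 10)),
    ("zstd", "zstd -{level} -c {src} > {dst}", range(1, 20)),
]

def sweep(src):
    """Run every tool at every level and yield (tool, level, seconds, ratio)."""
    original = os.path.getsize(src)
    for name, template, levels in TOOLS:
        for level in levels:
            dst = f"{src}.{name}.{level}.tmp"
            start = time.perf_counter()
            subprocess.run(template.format(level=level, src=src, dst=dst),
                           shell=True, check=True)
            elapsed = time.perf_counter() - start
            yield name, level, elapsed, original / os.path.getsize(dst)
            os.remove(dst)

if __name__ == "__main__":
    for point in sweep("enwik8"):  # any test corpus will do
        print(*point)
```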
First commit on May 1st this year. Hopefully they’ll get to it!
They do compare to lz4 and zstd for some of the test workloads, not sure why not for everything. They’re not the comparisons I’d like, though; I don’t see an xz comparison anywhere. For me, xz has completely replaced bzip2 for the things where compression ratio is the most important factor. lz4 and zstd are replacing gzip for things where I need more of a balance between compression ratio and throughput/memory consumption.
They added that after my comment… and yeah, it’s odd they didn’t do it on all workloads.
xz is based on lzma, but not exactly the same. Maybe they thought including lzma was enough.
Tangentially, I tried using brotli for something at work recently, and at compression level 4 it beats deflate hands down, at about the same compression speed. I was impressed.
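If anyone wants to reproduce that kind of comparison quickly, here is a minimal sketch, assuming the `brotli` package from PyPI (zlib is the stdlib deflate implementation; the input file name is a placeholder):

```python
import time
import zlib

import brotli  # pip install brotli (not in the standard library)

def measure(label, compress, data):
    start = time.perf_counter()
    size = len(compress(data))
    print(f"{label}: {size} bytes in {time.perf_counter() - start:.3f}s")

with open("sample.bin", "rb") as f:  # placeholder input
    data = f.read()

measure("deflate, zlib level 6", lambda d: zlib.compress(d, 6), data)
measure("brotli, quality 4", lambda d: brotli.compress(d, quality=4), data)
```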
is there any real reason to adopt new or custom compression schemes in 2022? of course there are many formats used in existing applications, protocols, and file formats (e.g. zlib/gzip, bzip2, xz, snappy, …), and they are here to stay.
but nowadays the near-omnipresent zstd (https://en.wikipedia.org/wiki/Zstd) and brotli (https://en.wikipedia.org/wiki/Brotli), both of which are extremely widely and successfully used in many different scenarios, seem to be the right choice for most new applications?
zstd and brotli are both optimized for speed over compressed size. IMO there is a valid niche for algorithms that are slower but make smaller archives, especially if they still decompress fast.
Which is why decompression speed and memory usage would be nice to have in the benchmarks.
I feel like I’ve seen lz4 far more widely used than brotli, and I’m surprised you didn’t mention it when talking about zstd.
brotli is rather big (and gaining traction) on the web: in http, web fonts, etc.
anyway, yes, lz4 is also widely used, and belongs to the same family (lz77) as brotli. the lz4 author is also the original author of zstd, btw.
No parallel benchmarks? What good is being fast on a single core when most users have multi-core systems? FWIW, I’ve had very good experience with plzip (repology).
For an algorithm, it’s essentially irrelevant: Compression is inherently a serial problem, but any algorithm can be parallelized by splitting the problem first. Unless I’ve missed something, that’s what they all do.
The nature of compression is that the more you divide the problem, the less compression you can get. I wouldn’t even count that as a parallel algorithm. An actually parallel compression algorithm would require another route to parallelism. The most viable ones I can think of:

Parallel search makes sense on SIMD and FPGAs, but probably not with CPU threads: CPU threads aren’t suited to synchronizing that tightly. Maybe it would make sense at a compression level that is already slow enough, or if the threads busy-wait for each other (very energy wasteful).

Pipelining doesn’t scale, because you can only cut the dataflow in so many suitable places, usually zero. Maybe a two-threaded compression algorithm, one thread for prediction and one for entropy coding, might be viable. Someone needs to try that.
Edit: I would agree 100% if I had to use the tool in its current form, but that’s fixable.
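To make the block-splitting trade-off concrete, here is a small sketch using the standard-library bz2 module as a stand-in for any block compressor: it compresses the same data in one piece and as independent chunks, so you can see how much ratio the split costs on your own data:

```python
import bz2

def whole_vs_chunked(data: bytes, chunk_size: int) -> tuple[int, int]:
    """Compressed size of `data` in one piece vs. as independent chunks."""
    whole = len(bz2.compress(data))
    chunked = sum(
        len(bz2.compress(data[i:i + chunk_size]))
        for i in range(0, len(data), chunk_size)
    )
    return whole, chunked

if __name__ == "__main__":
    # Redundant input makes the effect visible: each chunk is compressed with
    # its own independent state, so redundancy shared across chunks is paid
    # for again in every chunk.
    sample = b"the quick brown fox jumps over the lazy dog\n" * 200_000
    whole, chunked = whole_vs_chunked(sample, chunk_size=256 * 1024)
    print(f"whole: {whole} bytes, chunked: {chunked} bytes")
```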
Actually the zstd implementors did a lot of work to get good benefit from single-core parallelism through careful design and implementation choices. See for example this blog post on the blog of the author of zstd. (That blog post is more general, but I believe the knowledge gathered from those experiments was integrated into “fse”, which is a building block of zstd.)
I probably did not make my point very clear. Sorry about that. Users don’t use the algorithm, but the application. And they often want to compress large directory trees. Then why should I use bzip3, which does not appear to support parallel processing, when there are tools available, like plzip or pbzip2, that perform the task in a fraction of the time? If your compression tool is only single-threaded, then algorithmic superiority does not really matter.
If you’re compressing a bunch of independent blobs, you just call the compression API on multiple threads.
If you’re compressing blobs that are related, where you want the commonalities between blobs to improve compression — say, a bunch of JSON files with the same schema — you don’t have a well-parallelizable task, as @anordal pointed out. Each blob being compressed affects the state used for the next one.
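A sketch of the first case (independent blobs), using a process pool so there is nothing shared between workers at all; the input glob is a placeholder:

```python
import bz2
import glob
from concurrent.futures import ProcessPoolExecutor

def compress_file(path: str) -> str:
    """Compress one blob on its own; no state is shared between blobs."""
    with open(path, "rb") as f:
        data = f.read()
    out = path + ".bz2"
    with open(out, "wb") as f:
        f.write(bz2.compress(data))
    return out

if __name__ == "__main__":
    # Each file is an independent job, so this scales across cores trivially.
    # The flip side, as noted above: identical JSON keys in different files
    # are compressed again in every file instead of being shared.
    files = glob.glob("logs/*.json")  # placeholder input set
    with ProcessPoolExecutor() as pool:
        for out in pool.map(compress_file, files):
            print("wrote", out)
```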
Because someone could write the parallel tool for bzip3 in about 37 seconds, given that the algorithm exists.
I also assume that a naive parallelisation would be trivial to write. I think that makes it worse that no performance numbers for a parallel run are shown.
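For what that naive parallelisation could look like, here is a rough sketch; the standard-library bz2 stands in for bzip3 (which has no stdlib binding), and the length-prefixed framing is an illustrative assumption, not bzip3’s actual container format:

```python
import bz2
import struct
import sys
from concurrent.futures import ProcessPoolExecutor

CHUNK = 8 * 1024 * 1024  # bigger chunks: better ratio, coarser parallelism

def compress_stream(src, dst):
    """Compress fixed-size chunks in parallel, writing length-prefixed frames."""
    def chunks():
        while block := src.read(CHUNK):
            yield block
    with ProcessPoolExecutor() as pool:
        # Note: Executor.map consumes the whole input eagerly; a real tool
        # would bound the number of in-flight chunks to limit memory use.
        for frame in pool.map(bz2.compress, chunks()):
            dst.write(struct.pack("<Q", len(frame)))
            dst.write(frame)

if __name__ == "__main__":
    compress_stream(sys.stdin.buffer, sys.stdout.buffer)
```

Decompression is the mirror image: read a length prefix, read that many bytes, decompress, repeat.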