1. 13
    1. 3

      Secondly, I think that compression works more efficiently when given larger blocks to compress.

      It can. In some database situations it’s suggested to use a very large block size, like 1 MB, to maximize compression.
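
      As a very rough illustration of that effect (just zlib on synthetic log-like data, nothing ZFS-specific), compressing the same data as independent blocks of different sizes shows how much per-block compression gives up at small block sizes:

      ```python
      # Compress the same data as independent blocks of various sizes and
      # compare the totals. Not ZFS, just an illustration of why bigger
      # blocks tend to compress better.
      import random
      import zlib

      random.seed(0)
      # ~1 MB of repetitive, log-like text
      data = b"".join(
          b"2024-01-02 12:%02d:%02d INFO request id=%d ok\n"
          % (random.randrange(60), random.randrange(60), random.randrange(10**6))
          for _ in range(25000)
      )

      def compressed_size(blob, block_size):
          # Each block is compressed on its own, as a per-block compressor must.
          return sum(
              len(zlib.compress(blob[i:i + block_size]))
              for i in range(0, len(blob), block_size)
          )

      for bs in (4 * 1024, 8 * 1024, 128 * 1024, 1024 * 1024):
          size = compressed_size(data, bs)
          print(f"{bs // 1024:>5} KB blocks -> {size:,} bytes ({size / len(data):.0%} of original)")
      ```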

      I also wonder if the author’s problem is some sort of alignment issue? Their 4k image was not aligned to 4k blocks, and that caused extra blocks to be written? I have no idea whether that is possible in this situation.
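
      On the alignment guess, the arithmetic at least is easy to sketch (hypothetical offsets, not the author’s actual layout): a 4k write that doesn’t start on a 4k boundary has to touch two device blocks.

      ```python
      # How many 4 KB device blocks an I/O touches, depending on where it
      # starts. Hypothetical illustration of the alignment guess above.
      BLOCK = 4096

      def blocks_touched(offset, length, block=BLOCK):
          first = offset // block
          last = (offset + length - 1) // block
          return last - first + 1

      print(blocks_touched(0, 4096))     # aligned 4 KB write    -> 1 block
      print(blocks_touched(512, 4096))   # misaligned 4 KB write -> 2 blocks
      ```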

      1. 3

        My understanding is that ZFS only compresses blocks separately, since it has to be able to do random IO to them. With small block sizes on some drives, this can yield surprisingly low space savings from compression, because ZFS can only write whole disk-level blocks. If your disks use 4K physical blocks (i.e. they’re ‘Advanced Format’ drives) and you’re using, say, 8K logical blocks, then to save any space from compression you need at least a 50% compression ratio, so that your original 8K block can be written in a single 4K physical block; if it doesn’t compress that much, it has to be written using two 4K blocks and you save no space. If you’re using 128 KB logical blocks you can win from much smaller compression ratios, because you don’t have to shrink as much in order to need fewer physical blocks.

        (SSDs are somewhat inconsistent in whether they report having 512 byte physical blocks or 4K physical blocks. It’s all smoke and mirrors on SSDs anyway, so the whole mess is inconvenient for ZFS.)
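
        To put rough numbers on that (a sketch of the model above, assuming 4K physical sectors and allocation rounded up to whole sectors): compression only saves space once it frees up entire sectors, which is much easier to do with 128 KB logical blocks than with 8K ones.

        ```python
        # Sectors needed for a compressed logical block, assuming 4 KB
        # physical sectors and whole-sector allocation. Hypothetical ratios.
        SECTOR = 4096

        def sectors_needed(logical_size, compression_ratio, sector=SECTOR):
            compressed = int(logical_size / compression_ratio)
            return -(-compressed // sector)   # round up to whole sectors

        for logical in (8 * 1024, 128 * 1024):
            for ratio in (1.3, 1.6, 2.0):
                used = sectors_needed(logical, ratio)
                raw = logical // SECTOR
                print(f"{logical // 1024:>3} KB logical block at {ratio}x: "
                      f"{used} of {raw} sectors written")
        ```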

        1. 1

          Like anything: it depends. You probably won’t notice much difference between 4k blocks and 8k blocks for storing video files. But you might notice large differences when storing logs. I think what you’re saying is right, I’m just drawing more attention to the “it depends on what you’re storing” aspect of it.

          EDIT: Another place this matters is Compressed ARC: the better the compression you can get there, the more data you can jam into RAM.
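
          Rough arithmetic on that (hypothetical numbers): the amount of logical data a compressed ARC can hold scales directly with the compression ratio you actually get.

          ```python
          # Effective Compressed ARC capacity for a few ratios; hypothetical
          # numbers, just to show the scaling.
          arc_gib = 32                     # RAM handed to the ARC, in GiB
          for ratio in (1.0, 1.5, 2.5):
              print(f"{ratio}x compression -> ~{arc_gib * ratio:.0f} GiB of "
                    f"logical data in a {arc_gib} GiB ARC")
          ```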

    2. 1

      If you modify one byte in a file with the default 128KB record size, it causes the whole 128KB to be read in, one byte to be changed, and a new 128KB block to be written out.

      Recordsize “enforces the size of the largest block written”1 (emphasis mine), not that all blocks are that size.

      1. 2

        ZFS recordsize is extremely confusing. That description is correct at a filesystem level, but not at a file level; for a single file (or zvol), there is only one logical block size, and if the file has multiple blocks, that logical block size will be the filesystem recordsize. So on a normal filesystem, if you have a file larger than 128 KB, all its logical blocks will be 128 KB and if you modify one byte, you do indeed rewrite 128 KB. I had to go through an entire process of writing test files under various circumstances and dumping them with zdb to understand what was going on.

        (Compression may create different physical block sizes. One potentially surprising form of this is compressing the implicit zero bytes at the end of files that are not exact multiples of 128 KB. So with compression, your 160 KB incompressible file will come out close to 160 KB allocated on disk, instead of 256 KB.)
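
        If it helps, here is the model from the last two comments as a small sketch (assuming a 128 KB recordsize and 4 KB sectors; hypothetical arithmetic, not zdb output):

        ```python
        # Logical blocks per file and rough on-disk allocation, per the model
        # described above. Hypothetical arithmetic, not zdb output.
        RECORDSIZE = 128 * 1024
        SECTOR = 4096

        def logical_blocks(file_size, recordsize=RECORDSIZE):
            # A file that fits in one record gets a single (possibly smaller)
            # block; anything bigger is made entirely of recordsize-sized blocks.
            if file_size <= recordsize:
                return 1
            return -(-file_size // recordsize)

        def allocated_for_incompressible(file_size, sector=SECTOR):
            # With compression on, only the implicit zero tail of the last
            # block compresses away, so allocation is roughly the file size
            # rounded up to whole sectors.
            return -(-file_size // sector) * sector

        size = 160 * 1024
        print(logical_blocks(size))                        # 2 blocks of 128 KB each
        print(allocated_for_incompressible(size) // 1024)  # ~160 KB, not 256 KB
        ```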