    We need the hash of the uncompressed block because now that we have added a layer of optimization in compression we have also exposed ourselves to a durability issue. Corruption is now possible while compressing the block on the client as our clients rarely have ECC memory. We see a constant rate of memory corruption in the wild and end-to-end integrity verification always pays off.

    Fascinating: non-ECC memory corruption is something I’ve heard about, but I’ve rarely come across someone saying they’ve had to mitigate it in their code.
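The end-to-end check in the quoted passage can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the quoted system's code: zlib stands in for whatever codec the client actually uses, SHA-256 for whatever hash it uses, and `compress_block_verified`, `verify_received_block`, and `IntegrityError` are hypothetical names.

```python
import hashlib
import zlib


class IntegrityError(Exception):
    """Raised when a block's content does not match the hash taken before compression."""


def compress_block_verified(block: bytes) -> tuple[bytes, str]:
    """Client side (hypothetical helper): compress a block and prove it round-trips."""
    # Hash the uncompressed bytes before the compressor ever touches them.
    expected = hashlib.sha256(block).hexdigest()
    compressed = zlib.compress(block, 6)  # zlib is an assumed stand-in codec
    # Decompress and re-hash: corruption during compression (e.g. a flip in
    # non-ECC RAM) surfaces here as a mismatch instead of being uploaded.
    if hashlib.sha256(zlib.decompress(compressed)).hexdigest() != expected:
        raise IntegrityError("compressed block does not round-trip")
    # Ship the hash of the uncompressed block alongside the compressed bytes.
    return compressed, expected


def verify_received_block(compressed: bytes, expected: str) -> bytes:
    """Server side (hypothetical helper): repeat the check against the client's hash."""
    # zlib.decompress raises zlib.error if the stream or its Adler-32 trailer is damaged.
    block = zlib.decompress(compressed)
    if hashlib.sha256(block).hexdigest() != expected:
        raise IntegrityError("block content does not match the uploaded hash")
    return block
```

This only catches corruption introduced during or after compression: a flip in the input buffer before it is hashed gets hashed faithfully and goes unnoticed, which is why a check like this is one layer of end-to-end verification rather than the whole of it.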

      It is rare indeed. I have seen this concern raised before by ZFS developers, which makes sense.

        dropbox sync engineer here :)

        we regularly see bitflips on consumer machines. for example, the desktop client’s sync engine has a metadata consistency checker that compares the client’s view of the filesystem to the remote filesystem as of a snapshot, and we report any discrepancies up to the server. any report should indicate a bug in our protocol that needs fixing.

        but… we do have to bucket out metadata differences that are only one or two bitflips apart. it’s not a huge number, but it shows up when you’re trying to have zero inconsistencies across many tens of millions of machines.
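Bucketing out discrepancies that are "one or two bitflips apart" amounts to measuring the Hamming distance between the two versions. A minimal sketch follows, assuming each metadata entry can be compared as a serialized byte string. The real checker compares structured filesystem metadata, and `bit_distance`, `classify_discrepancy`, and the bucket labels are hypothetical.

```python
from typing import Optional


def bit_distance(a: bytes, b: bytes) -> Optional[int]:
    """Count differing bits between two equal-length byte strings.

    Returns None when the lengths differ, since in-place bitflips cannot change length.
    """
    if len(a) != len(b):
        return None
    # XOR each byte pair and count the set bits that remain.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def classify_discrepancy(local: bytes, remote: bytes, max_flips: int = 2) -> str:
    """Hypothetical triage of a client/server metadata mismatch."""
    distance = bit_distance(local, remote)
    if distance == 0:
        return "consistent"
    if distance is not None and distance <= max_flips:
        # A tiny bit-level difference is more likely hardware than protocol.
        return "suspected_bitflip"
    return "protocol_bug"
```

Length changes fall into the protocol-bug bucket here because a bitflip alters bits in place. Setting the cutoff at two flips mirrors the comment; where exactly to draw that line is a judgment call.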

          I love seeing examples of the law of large numbers like this one. See also: “it’s never the network”, “this random string will never appear in customer input”, (ab)using JavaScript numbers as monotonic counters, etc.

            Cool! Is it usually a small subset of clients that have most of the bitflips or are they pretty evenly distributed?

              you know, I had never actually looked at the distribution. I just ran the numbers, and indeed a handful of hosts account for the majority of bitflip inconsistencies.

                Thanks for running it! My (strawman) theory is that most of these bitflips are coming from bad RAM rather than external factors like EM radiation.
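One way to quantify "a handful of hosts make up the majority" is to count how few hosts it takes to cover more than half of all bitflip reports. If flips came from random external events spread across machines, that number would be a large share of the affected hosts. If a few machines with bad RAM dominate, it would be tiny. This is a sketch that assumes the input is a flat list with one host ID per report. The input format and `hosts_covering_majority` are hypothetical.

```python
from collections import Counter


def hosts_covering_majority(reports: list[str]) -> list[str]:
    """Return the fewest hosts that together account for a strict majority of reports.

    `reports` holds one host ID per bitflip inconsistency (hypothetical format).
    """
    counts = Counter(reports)
    total = sum(counts.values())
    covered = 0
    top_hosts: list[str] = []
    # Walk hosts from most to fewest reports until they cover more than half.
    for host, n in counts.most_common():
        top_hosts.append(host)
        covered += n
        if covered * 2 > total:
            break
    return top_hosts
```

For example, with five reports from host `a`, three from `b`, and one each from `c`, `d`, and `e`, the function returns `["a", "b"]`: two of five hosts account for eight of eleven reports.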