1. 4
    1. 3

      This article does not answer the question it poses at the beginning, namely:

      …after an fsync() call recently modified data can still sit in the drive’s volatile write cache and, thus, it may be lost in case of a power failure. If you want any meaningful durability, you should go for enterprise-grade drives that have a battery/capacitor so that they can flush the data to persistent storage on power loss. Is it really so?

      The eventual conclusion:

      the REQ_PREFLUSH flag (as well as the REQ_FUA flag) tells the block device that it should flush its volatile cache to the persistent storage. Drivers for any well-behaved disk with a volatile write cache should handle this flag properly.

      So Linux sends a flush command to the disk controller and waits for it to complete. But the assertion I have heard for many years is that some/many consumer drives either ignore this flush command or send the response immediately without waiting for it to complete … because it improves their benchmark scores.

      (I first heard it back in 2005 or so from filesystem engineers at Apple, and was told that Apple tested drives and didn’t use any in its own products that took this shortcut.)
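
      For concreteness, here is a minimal sketch of the application side of this, in Python. The helper name `durable_write` is hypothetical, made up for illustration. It only shows the program asking the kernel to flush; whether the drive really persists the data before acknowledging the flush is exactly the part this code cannot observe.

      ```python
      import os


      def durable_write(path: str, data: bytes) -> int:
          """Write data to path, then ask the kernel to flush it to the device.

          Hypothetical helper for illustration. A successful return means
          fsync() reported success; it does not prove the drive emptied its
          volatile write cache.
          """
          fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
          try:
              written = 0
              # os.write may write fewer bytes than requested, so loop.
              while written < len(data):
                  written += os.write(fd, data[written:])
              # Blocks until the kernel reports the file's data as flushed.
              os.fsync(fd)
          finally:
              os.close(fd)
          return written
      ```

      If the drive acknowledges the flush early, `os.fsync` still returns normally, so nothing at this level can tell an honest drive from one that takes the shortcut.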

    2. 2

      There’s also some history here. Before 2008, Linux did not require a disk to flush its volatile cache before acknowledging the write: https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg272085.html

      This was fixed in https://lkml.iu.edu/hypermail/linux/kernel/0805.2/0393.html

      Also note that the story of macOS making it hard to require durability is broader than what the linked HN thread discusses. I have some notes about this.

      1. 1

        You have some typos in your blog around the word “physically” in the big blockquote.