I think the requirement to fsync the directory in addition to the file is a combination of overblown urban myth and stupid bug. I know it gets repeated a lot, but the claims are rarely specific. Yes, filesystem developers say they don’t need to sync metadata. But which developers of which filesystems? The semiannual posting of “fsync is hard” would be more informative with some concrete examples and discussion of status quo in various real world systems.
That said, if you use SQLite as “modern fopen” then hopefully somebody who cares much more than you do has already gone to the trouble of getting it right.
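To make the "modern fopen" point concrete, here is a minimal sketch using Python's stdlib sqlite3 module (the filename and table are made up for illustration). SQLite handles journaling, syncing, and crash recovery internally; committing a transaction is the durability point, so you never touch fsync yourself:

```python
import sqlite3

# SQLite's own atomic-commit machinery does the fsync dance for us;
# with its default settings, a committed transaction is the durable unit.
conn = sqlite3.connect("notes.db")
conn.execute("CREATE TABLE IF NOT EXISTS notes (body TEXT)")
with conn:  # the connection context manager commits on clean exit
    conn.execute("INSERT INTO notes (body) VALUES (?)", ("hello",))
conn.close()
```

Exactly how durable a commit is still depends on SQLite's journal_mode and synchronous pragmas, but the defaults are chosen by people who have thought hard about this.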
The classical example of needing to fsync the directory appears to be at least some versions of ext4 on Linux (it’s possible that this applies to ext3 as well, but I’ve only seen it cited for ext4). However, this points to the broader issue of what FS developers feel they can get away with. A new FS developer who wants to avoid expensive syncs can now point to historical ext4 behaviour and this general advice and say ‘see, forcing people to fsync directories for durability is totally within the bounds of legality and so my FS is totally within its rights here’ (this is what you could call the C compiler optimization excuse).
More broadly, if you’re writing even future-proof code (never mind portable code), you need to make the most conservative assumptions possible. If FS developers can get away with something, you have to assume that someday your program will run on such a FS; otherwise you wind up in the position of people who wrote for ext2 or traditional old Unix semantics, then found their program running on some version of ext4 with some settings, and kaboom.
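Under those maximally conservative assumptions, the usual belt-and-suspenders sequence for durably replacing a file is: write to a temporary file, fsync it, rename it into place, then fsync the containing directory. A sketch in Python (the function name is mine; opening a directory read-only to fsync it works on Linux but is not guaranteed everywhere):

```python
import os

def durable_write(path: str, data: bytes) -> None:
    """Conservatively write data to path: sync the file contents, then
    sync the directory so the rename (the directory entry) is on disk too."""
    dirname = os.path.dirname(path) or "."
    tmp = path + ".tmp"
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)               # flush the file's contents and metadata
    finally:
        os.close(fd)
    os.rename(tmp, path)           # atomically replace any old version
    dfd = os.open(dirname, os.O_RDONLY)  # Linux allows fsync on a directory fd
    try:
        os.fsync(dfd)              # flush the directory entry created by rename
    finally:
        os.close(dfd)
```

Note that the final directory fsync is exactly the step this thread is arguing about: on some filesystems it is redundant, on (at least some configurations of) ext4 it is what makes the rename itself durable.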
(I’m the author of the linked-to blog entry.)
Personally, I’m pretty disappointed in this line of reasoning (though I know you didn’t invent it). Somebody found a way to parse POSIX to permit this behavior. And so we have it, because fast. But if approximately every program ever written is now incorrect in this new, faster world and requires 37 fsyncs, is that an improvement? Things will only be faster until programs are fixed, and then they will be just as slow, if not slower, because fsyncing the directory will sync other entries too, not just the one we care about.
The comparison to C optimization is apt (and I have thoughts aplenty on that), but at least the C standard was clear about undefined behavior even if people were careless about it in practice.
I’m not sure if it was the case here, but as far as I recall, at least some of POSIX’s oddities and undefined behavior are there for the same reason as C’s, to allow fairly different systems to each do things their “native” way without imposing a requirement for expensive emulation of a strictly defined semantics. In the case of POSIX filesystem semantics, a bunch of the stranger bits are probably there to accommodate network file systems.
I guess it’s a similar problem to generalist developers writing their own crypto code.
It’s a subtler problem than expected, the failure case isn’t exposed to the programmer and can’t really be tested, but a proxy exists: “I can see the file and its contents with ‘ls’ and ‘cat’, so I created it correctly” / “I can’t see the plaintext of my message, so I encrypted it correctly”.
In both cases, the job is marked as done and people move on.
It would be an interesting stress test (like Jepsen, but for single-system durability) to run a system in a VM, express various assertions about data durability, and instrument the kernel to check those assertions at various points. It should be possible to find a number of durability problems that way.
I.e., use code inspection and then ptrace() to find fds which should be durable, smuggle the pid+fd into some of your kernel code, which then keeps track of all data blocks written to that fd and checks at close() time that they have been synced.
The author posted a follow-up today, “Why Unix needs a standard way to deal with the file durability problem”.