1. 19

  2. 10

    Linus’s point that nobody checks errors for close is strong. For example, ignoring errors on close is explicitly documented behavior of the Rust standard library: https://doc.rust-lang.org/std/fs/struct.File.html

    “Files are automatically closed when they go out of scope. Errors detected on closing are ignored.”

    1. 2

      I never handle errors on close() because usually I’m in the destructor of an object (in C++ with RAII) and there is just no way to handle it properly since you aren’t supposed to throw exceptions in destructors and the destructor cannot return a value. Also, how should I handle a close() error? Should I retry opening the file and redo all the operations? My program might have had this handle open for hours if not days, it cannot cache all the things it has done to the file. It is extremely unclear what could cause close() to return an error and how to handle it. If nobody knows how to handle an error, it is likely that everyone will ignore it. It is also unclear what breakage a close error can cause. Unless the file is of critical importance, I usually just log it and hope that someone reads the logs.

      1. 2

        A decent pattern could be, in my opinion, using a temporary file as a bridge.

        Create a new file and fill it with content. Then fsync it and close it. If both the fsync and the close succeeded, replace the old file with the new one by a rename.

        Upon failure you tell the user and don’t discard the old file. No data loss, just a lost update.

        I admit this approach does not scale well if the file is huge. But again, every case is different.
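
        A minimal Python sketch of that pattern, in case it helps. The function name replace_atomically and the ".tmp-" prefix are made up for illustration; tempfile.mkstemp, os.fsync and os.replace are standard library calls. It does not fsync the containing directory, which a stricter version would probably want.

```python
import os
import tempfile


def replace_atomically(path: str, data: bytes) -> None:
    """Write data to a temp file, sync it, then rename it over path.

    Raises on any failure; in that case the old file is left untouched.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives next to the target so that the rename stays on
    # one filesystem (a cross-filesystem rename is not atomic).
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # push Python's buffer to the kernel
            os.fsync(tmp.fileno())  # ask the kernel to push it to disk
        # Leaving the "with" block closed the file; a failed close would
        # have raised OSError before reaching the rename below.
        os.replace(tmp_path, path)
    except BaseException:
        # Any failure: drop the temp file and keep the old file as it was.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

        If anything goes wrong the exception propagates and the old file is still there, so the caller can report the lost update. Note that the new file gets the temp file's mode and owner, not the old file's.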

        1. 1

          “Also, how should I handle a close() error? Should I retry opening the file and redo all the operations?”

          No, you report it to the user with the file path. At the bare minimum.

          Everything else depends on the error code returned and the semantics of your program.

          1. 1

            What would the user do about it?

            1. 1

              In the more general case of why I should tell my user about something they cannot do anything about:

              • so they can be aware of it
              • so they can take steps to verify the results
              • so they can file a bug

              This doesn’t mean you have to pop an alert on the end user’s screen, but letting someone (user, admin) know somehow (log file, event log, etc.) is often a good idea.

            2. 1

              “No, you report it to the user with the file path.”

              The vast majority of software cannot directly interact with the user, and even if it could, it would be useless. Imagine you own an Epson printer and somehow the printer told you /var/epson/printer_conf.ini: error EIO on close().

              To the user, that is completely useless. Maybe you understand it because you are familiar with Linux systems, but the average user seeing this message would be completely puzzled.

              “Everything else depends on the error code returned and the semantics of your program.”

              Okay, so the POSIX standard says close() can fail with EIO, EBADF and EINTR. How would you handle these errors in your software, other than logging them or reporting them to the user? Do you know any resources or existing code bases where developers can learn how to treat close() errors?
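
              For what it's worth, here is a rough Python sketch of what "depends on the error code" could look like for those three values. The helper name describe_close_error and the message wording are hypothetical, and it only goes as far as producing a log line, which arguably supports the point that there is not much else to do. As far as I know, Python's own os.close ignores EINTR (PEP 475), so that branch is mostly there to mirror the POSIX list.

```python
import errno
import os


def describe_close_error(err: OSError, path: str) -> str:
    """Turn an OSError raised while closing `path` into a log line."""
    if err.errno == errno.EBADF:
        # The descriptor was not open: almost always a bug in our own
        # code (double close, wrong fd), not a problem with the data.
        return f"{path}: bug: closed an invalid descriptor (EBADF)"
    if err.errno == errno.EINTR:
        # Interrupted by a signal. On Linux the descriptor is released
        # anyway, so retrying close() there is the wrong move.
        return f"{path}: close interrupted (EINTR), not retrying"
    if err.errno == errno.EIO:
        # An earlier write may not have reached the disk.
        return f"{path}: I/O error on close (EIO), contents may be incomplete"
    return f"{path}: close failed: {err}"


# A double close is an easy way to provoke a real EBADF.
fd = os.open(os.devnull, os.O_RDONLY)
os.close(fd)
try:
    os.close(fd)  # second close: the descriptor is no longer valid
except OSError as e:
    print(describe_close_error(e, os.devnull))
```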

        2. 6

          Check this one out to see how hard it is to actually save data without corruption or data loss.

          1. 3

            “The number of applications that do even the minimal safety-net of ‘create new file, rename it atomically over an old one’ is basically zero”

            Vim (and Nvim, and probably Emacs) does that at least :)

            “And if we have a Linux-specific magic system call or sync action, it’s going to be even more rarely used than fsync().”

            Not to mention the BSDs all have their own completely different APIs. This is why libraries like libuv are really valuable. Yet many C projects blithely assume that POSIX and ISO and vt100 happy-land is all they need to think about.

            “Do you think anybody really uses the OS X FSYNC_FULL ioctl?”

            libuv does: https://github.com/libuv/libuv/commit/5b0e1d75a207bb1c662f4495c0d8ba1e81a5bb7d
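
            For anyone curious what using it looks like: on macOS it is exposed as the fcntl command F_FULLFSYNC. Below is a small Python sketch of the general "full sync where available, plain fsync otherwise" idea. This is not libuv's code; the function name sync_to_disk is made up, and the fallback policy is my own assumption.

```python
import os

try:
    import fcntl
except ImportError:  # platforms without the fcntl module
    fcntl = None


def sync_to_disk(fd: int) -> str:
    """Flush fd as durably as the platform allows; return the method used."""
    # F_FULLFSYNC only exists in the fcntl module on macOS.
    full_fsync = getattr(fcntl, "F_FULLFSYNC", None)
    if full_fsync is not None:
        try:
            fcntl.fcntl(fd, full_fsync)
            return "F_FULLFSYNC"
        except OSError:
            pass  # not supported for this file: fall back to fsync below
    os.fsync(fd)
    return "fsync"
```

            This is exactly the kind of per-platform branch that tends to live in a library rather than in application code.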

            But anyways, I agree with the general sentiment:

            “So rather than come up with new barriers that nobody will use, filesystem people should aim to make ‘badly written’ code ‘just work’”

            Databases have done a good job of hiding the unreliability of the underlying systems. But having worked on embedded systems and a text editor, my impression is firmly that OSes aren’t the cathedrals I had once assumed they were. Incidentally that’s (one reason) why I’m pretty certain that urbit or something like it makes sense.

            1. 2

              This was before ZFS was a thing on Linux. I wonder what Linus’s opinion is on how ZFS behaves in this area.

              1. 1

                Since ZFS isn’t in mainline, I could conceivably see him just straight up not caring and going “we have enough filesystems in the kernel that are varying degrees of broken, I don’t have time to look at another one”.

              2. 2

                “Same goes for any complaints that ‘people should write a temp-file, fsync it, and rename it over the original’. You may wish that was what they did, but reality is that ‘open(filename, O_TRUNC | O_CREAT, 0666)’ thing.”

                Of course they do! SELinux labels? Extended attributes? ACLs? Just the plain old mode? Owner and group? Hardlinks? Softlinks?

                I mean… Seriously? Who in their right mind would not just overwrite the file if all they care about are the contents?

                And don’t even get me started on moving files with all the different file systems, mount options, bind mounts…
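
                To make the hardlink and mode points concrete, here is a small Python sketch that updates a hard-linked file both ways and reports what the second name sees afterwards. The function name compare_strategies and the file names are made up for illustration; it only covers hard links and the plain mode, not SELinux labels, extended attributes or ACLs.

```python
import os
import stat
import tempfile


def _read(path: str) -> str:
    with open(path) as f:
        return f.read()


def compare_strategies() -> dict:
    """Update a hard-linked file two ways and report what the link sees."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "notes.txt")
        alias = os.path.join(d, "alias.txt")
        with open(original, "w") as f:
            f.write("v1")
        os.chmod(original, 0o644)
        os.link(original, alias)  # second name for the same inode

        # Strategy A: overwrite in place (the O_TRUNC | O_CREAT approach).
        with open(original, "w") as f:
            f.write("v2")
        alias_after_overwrite = _read(alias)

        # Strategy B: write a temp file, then rename it over the original.
        tmp = original + ".tmp"
        fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        with os.fdopen(fd, "w") as f:
            f.write("v3")
        os.replace(tmp, original)

        return {
            "alias_after_overwrite": alias_after_overwrite,
            "alias_after_rename": _read(alias),
            "original_after_rename": _read(original),
            "still_same_file": os.path.samefile(original, alias),
            "mode_after_rename": stat.S_IMODE(os.stat(original).st_mode),
        }


print(compare_strategies())
```

                After the in-place overwrite the alias follows along; after the rename the alias still holds the old contents, the two names no longer refer to the same file, and the 0644 mode set on the original is gone unless the program copies it over itself.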