1. 57
  1. 5

    FYI, some of the architecture you describe isn’t universal; it’s just part of the evolutionary tree of Unix filesystems, not shared by others. For example, Apple’s HFS doesn’t use inodes, and has a rather different disk organization based on a B-tree. (I don’t know if its successor, APFS, shares those features.) I suspect Microsoft filesystems are different too, but I know nothing about them.

    1. 3

      NTFS carries heavy inspiration from VMS’ ODS.

      1. 1

        APFS looks a bit closer to a Unix filesystem; e.g., it uses 64-bit inode numbers.

        1. 1

          How can HFS support hard links and inode numbers in stat_t without having a concept of inodes on disk? Maybe you mean it doesn’t have dedicated blocks for inode tables?

          1. 1

            Keep in mind HFS predates Mac OS X (it dates from about 1986) and had no Unix-isms in it at all. Even HFS+ is from the mid-90s. Some more features like hard links and journaling got added for Mac OS X.

            HFS always had persistent directory IDs, and I think file IDs were added in 1991 to support the “aliases” feature. Those are basically like inode numbers. But there aren’t inode structures in each directory’s contents the way there are in Unix-derived filesystems. More than that I can’t say — I’m not a filesystems expert.

        2. 1

          Why are the timestamps wrapped in Option<>?

          1. 6

            It’s common to disable atime and the like. I can’t count how many bugs I’ve had to deal with over time due to misunderstandings of mtime, ctime, atime, and so on. Making the ones other than creation time optional is a nice technique for reducing one class of bug: representing a value that might not actually be present with some placeholder that only those who have already been burned enough times will remember to handle properly.

            Translating many types of common gotchas into compile-time constraints through types like this is one of the biggest timesaving features of Rust as I use it.
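
            A minimal sketch of the pattern being described, not the post’s actual code: the struct, the `touch_access` and `was_modified_after` helpers, and the `track_atime` flag are hypothetical, and the field names are borrowed from the ones mentioned elsewhere in this thread.

            ```rust
            use std::time::SystemTime;

            /// Hypothetical inode metadata. Only the creation time is mandatory.
            #[derive(Debug, Clone)]
            pub struct Inode {
                pub created_at: SystemTime,
                pub accessed_at: Option<SystemTime>,
                pub modified_at: Option<SystemTime>,
                pub changed_at: Option<SystemTime>,
            }

            impl Inode {
                /// A fresh inode has been created but not yet accessed or modified.
                pub fn new(now: SystemTime) -> Self {
                    Inode {
                        created_at: now,
                        accessed_at: None,
                        modified_at: None,
                        changed_at: None,
                    }
                }

                /// Records an access only when atime tracking is enabled
                /// (assumed here to be a simple boolean, like a noatime-style option).
                pub fn touch_access(&mut self, now: SystemTime, track_atime: bool) {
                    if track_atime {
                        self.accessed_at = Some(now);
                    }
                }

                /// The compiler forces the "never modified" case to be handled explicitly.
                pub fn was_modified_after(&self, t: SystemTime) -> bool {
                    match self.modified_at {
                        Some(m) => m > t,
                        None => false,
                    }
                }
            }
            ```

            With this shape, any caller that wants `modified_at` has to say what happens when it is `None`, rather than comparing against a placeholder such as zero.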

            1. 2

              But as soon as anyone stats it you’re going to have to make up values for them anyway. I’m not aware of any “real” filesystems (*nix ones, anyway) that have any analogous notion (however encoded) of a timestamp that might be missing, and I’m having trouble thinking of a case in which a filesystem-internal operation would be doing anything with the semantic value of a timestamp other than reporting it to a running process (via an API that doesn’t support Option<>), so I still don’t really see the point.

              And just to be clear: I’m not an anti-Rust partisan or anything (in fact I’ve been introducing bits of it at my current job) – this particular pattern just seems like additional complexity for no discernible benefit.
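
              To make the objection concrete, here is a rough sketch of what the stat boundary might look like. Everything in it is an assumption for illustration: the `Timestamps` and `StatTimes` types, the `to_stat` function, and the choice to fall back to the creation time when a value is missing.

              ```rust
              use std::time::{SystemTime, UNIX_EPOCH};

              /// Hypothetical in-memory timestamps with optional fields.
              pub struct Timestamps {
                  pub created_at: SystemTime,
                  pub accessed_at: Option<SystemTime>,
                  pub modified_at: Option<SystemTime>,
                  pub changed_at: Option<SystemTime>,
              }

              /// Hypothetical flat record shaped like what a stat-style API reports:
              /// whole seconds since the epoch, with no way to express "missing".
              #[derive(Debug, PartialEq)]
              pub struct StatTimes {
                  pub atime: u64,
                  pub mtime: u64,
                  pub ctime: u64,
              }

              /// Seconds since the Unix epoch; times before the epoch are clamped to 0
              /// (an assumption made to keep the sketch short).
              fn secs(t: SystemTime) -> u64 {
                  t.duration_since(UNIX_EPOCH).map(|d| d.as_secs()).unwrap_or(0)
              }

              /// Missing values have to be filled in at this point; here they
              /// default to the creation time.
              pub fn to_stat(ts: &Timestamps) -> StatTimes {
                  StatTimes {
                      atime: secs(ts.accessed_at.unwrap_or(ts.created_at)),
                      mtime: secs(ts.modified_at.unwrap_or(ts.created_at)),
                      ctime: secs(ts.changed_at.unwrap_or(ts.created_at)),
                  }
              }
              ```

              The fallback lives in one place, which is arguably the benefit; the cost is that every internal consumer still has to carry the `Option` around.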

              1. 2

                “Making the ones other than creation time optional”

                Creation time is mostly useless. Hell, Linux didn’t even have an API to read the file creation time until recently.

                Modify time on the other hand is how things like make work.

                1. 7

                  It doesn’t matter if it’s useful to you or not. If a file has been created, then it has a creation time. It will not have a modified time until it is modified. The purpose of the Option is to signify when something does not exist, without resorting to error-prone magic values.

                  1. 5

                    Huh? Creation time is super useful for end users. I often sort Finder windows by creation time, especially of documents I’ve created.

                2. 1

                  Disclaimer: not the author.

                  created_at is not an Option, but accessed_at, modified_at, and changed_at are. That makes sense in that a file may not have been accessed or modified, or had its properties changed, since it was created. It could also be there to ease implementing features like the noatime mount option. However, the code currently initializes all of them to the same value when instantiating an Inode, so I’m not sure that argument holds water.

                  1. 1

                    Yes, that is exactly the reason. And right now I’m initialising the values to mimic the behaviour from my local machine (APFS).

                3. 1

                  Great post! Love the name too!

                  Perhaps it’s for historical reasons or even a dumb question, but why have separate “indirect” level qualifications instead of just having a block that either points to data or points to another pointer? It seems that would allow a nearly unlimited number of “levels.” The only argument against it I can think of is that you’d need some sort of tag to distinguish pointer from data, but that tag could be a single byte or even a single bit.
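
                  A rough sketch of what I mean, with the tag modelled as a Rust enum discriminant. The names (`Block`, `read_all`, `collect`) and the in-memory slice standing in for a disk are made up for illustration and are not from the post.

                  ```rust
                  /// Hypothetical tagged block: either file data or pointers to further blocks.
                  pub enum Block {
                      Data(Vec<u8>),
                      Pointers(Vec<usize>),
                  }

                  /// Collects file contents by following pointers to any depth.
                  /// `disk` stands in for a block device; indices are block numbers.
                  /// Returns None on a dangling pointer, or when nesting exceeds
                  /// `max_depth` (a guard against cycles on a corrupted disk).
                  pub fn read_all(disk: &[Block], root: usize, max_depth: usize) -> Option<Vec<u8>> {
                      let mut out = Vec::new();
                      collect(disk, root, max_depth, &mut out)?;
                      Some(out)
                  }

                  fn collect(disk: &[Block], idx: usize, depth_left: usize, out: &mut Vec<u8>) -> Option<()> {
                      match disk.get(idx)? {
                          // The tag says this block holds data: copy it out.
                          Block::Data(bytes) => {
                              out.extend_from_slice(bytes);
                              Some(())
                          }
                          // The tag says this block holds pointers: descend one level.
                          Block::Pointers(children) => {
                              if depth_left == 0 {
                                  return None;
                              }
                              for &child in children {
                                  collect(disk, child, depth_left - 1, out)?;
                              }
                              Some(())
                          }
                      }
                  }
                  ```

                  One trade-off I can see: with fixed direct/indirect slots, the path to a given file offset can be worked out arithmetically before reading anything, whereas here the depth is only known once each block’s tag has been read.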

                  1. 1

                    Thanks! I’m glad you liked it. You’re the first one that got the name reference :)

                    Re the blocks, it’s probably like you said, for historical reasons. Both the FFS paper [1] and the earlier “UNIX Implementation” paper [2] talk about triple indirect pointers without discussing a “tag”.

                    [1] https://people.eecs.berkeley.edu/~brewer/cs262/FFS.pdf

                    [2] https://users.soe.ucsc.edu/~sbrandt/221/Papers/History/thompson-bstj78.pdf