1. 8
  1.  

  2. 19

    It is perhaps worth noting that this article appears to be in complete ignorance of the history of both shared libraries and filesystems. Although I started skimming after a while, what it appears to want for library code is static linking, which is alive and well in modern language environments such as Go (and causing problems too, of course).

    I feel that there are interesting arguments to be had about getting rid of both shared libraries and filesystems, but you are unlikely to find good ones in an article that is historically ignorant. People have looked into these issues before and the current approaches exist for good reasons; recreating the mistakes and missteps of the past is not going to be progress.

    1. 18

      Agreed.

      Some lines from the article that made me frown:

      C is a programming language that has been designed with the idea of shared/dynamic libraries in mind. It was created in a time of turmoil where code ran only on a few types of machines and was a headache to get working on new machines or competitor’s machines.

      Lolwut? C didn’t have any notion of dynamic libraries for many years.

      When you “link” this program, usually at the time when you compile it, it performs a process called “relocation” where it fills in the madlibs with the locations of the routines on the specific machine you want to run it on. These are the “shared libraries.”

      That’s not how…damn it. Shared libraries refer to dynamically-loaded libraries, not normal statically-linked libraries. That’s not how this works.

      Here is my theory on filesystems, and I’m just gonna throw it out there… they were designed once and then people forgot to change them.

      This theory is bad and the author should feel bad. There was a vast amount of research and continual reinvention of the implementation and philosophy of filesystems: FAT, NTFS, HFS, ZFS, XFS, ReiserFS, etc. Claiming otherwise is ignorant.

      Oh, and Plan 9. The author is a hack.

      People don’t want hierarchical filesystems, yet we rarely protest them.

      [citation needed]

      I’ll go back and edit in more grumping later, but this is honestly just off to a terrible start.

      1. 7

        People don’t want hierarchical filesystems, yet we rarely protest them.

        You can take my hierarchical filesystem from my cold, dead hands.

        I’ve seen what user interfaces do when “flattening” the file space, like in many applications on Android and certain places in macOS. You get a coarsely filtered (by time and/or filetype) epically long list of files.

        1. 4

          Although I started skimming after a while, what it appears to want for library code is static linking, which is alive and well in modern language environments such as Go (and causing problems too, of course).

          Lolwut? C didn’t have any notion of dynamic libraries for many years.

          Wasn’t System V the first Unix that could do dynamic linking? I thought this was mostly predicated on MMU support, and that early Unixen lacked a reasonable way to do it.

          That said, it always amuses me how people think Go and Rust are unique in static linking. There have been scientific programs in Fortran/C/C++ that have used static linking EXCLUSIVELY for 30+ years. They did it because it is faster, and sharing generally wasn’t an issue when you have full use of the system you’re on with stuff like MPI.

          And as someone who has to work in filesystem code, I agree that I hate filesystems as an abstraction. But that is more about the edge cases that always make your oh-so-elegant code look like spaghetti.

          I will also admit to hating shared libraries in general, but that has more to do with GNU libc and its utterly braindead inability to behave in a backwards-compatible way. musl libc has made my life much better in this regard.

          1. 4

            Good, general shared libraries essentially require mmap(), so they arrived in Unix with it (which I believe was in SunOS 4 and then System V Release 4 and Solaris 2). However, people did earlier, more limited versions of dynamic linking in earlier systems; this was generally done to save disk space and involved hacks like fixed load addresses for the dynamic libraries involved.

            (I know that the AT&T 3B1 did this and I believe the Acorn RISC iX did as well. The Acorn was an especially impressive feat, as they managed to squeeze a full Unix with X Windows into a remarkably small amount of disk space and memory. I believe the Acorn people wrote an interesting paper on how they did it, although I don’t know if it’s available online any more; the best reference I can find is this old Usenet post.)
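
            For the curious, the modern mmap()-backed version of this is what dlopen() exposes directly to programs. A minimal sketch, assuming a glibc-style Linux where the math library is installed as libm.so.6 (older glibc also wants -ldl at link time):

            ```c
            /* Minimal runtime dynamic loading on an mmap()-capable Unix.
             * "libm.so.6" and the "cos" symbol are just convenient examples. */
            #include <dlfcn.h>
            #include <stdio.h>

            int main(void) {
                /* The dynamic loader mmap()s the shared object into our
                 * address space at whatever address happens to be free. */
                void *handle = dlopen("libm.so.6", RTLD_NOW);
                if (!handle) {
                    fprintf(stderr, "dlopen: %s\n", dlerror());
                    return 1;
                }

                /* Look up a symbol by name and call through it. */
                double (*cosine)(double) = (double (*)(double))dlsym(handle, "cos");
                if (cosine)
                    printf("cos(0.0) = %f\n", cosine(0.0));

                dlclose(handle);
                return 0;
            }
            ```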

        2. 5

          recreating the mistakes and missteps of the past is not going to be progress.

          Alas, that is what is being done all the time.

          Reading the article, Venti came to mind.

        3. 6

          Instead of data being in hierarchical directories, the data is essentially just floating blobs like that of a lava lamp.

          Hierarchical file systems are by no means a panacea, but this is even worse. I like being able to locate my files in a predictable manner, so I’m careful to put related files close to each other. When directory hierarchies don’t cut it (e.g., for managing my personal image and video collection), what I reach for is something with more structure, like a relational database.

          1. 4

            This “blob” system has been worked on for a long time anyway, with limited results. Tools like Spotlight in OS X are roughly built on this idea. It’s been tough going: for myself, I find it almost impossible to find what I want with Spotlight. Maybe that means the technology needs to go farther.

            I have written a few small applications that store data and use the “tag” approach, and what I find is that it isn’t really much easier. Finding the right tags is always hard, even if curated. And one ends up having many tags, so exploring the data is not really much easier, and that is only with a couple hundred tags. You’re presented with all the options at once rather than a rough categorization that you slowly explore. Humans seem to put all knowledge into a hierarchy; this tag approach goes against that, and I have a hard time seeing it being better.

          2. 3

            To me, this read like the plethora of ideas that go through one’s head when first learning about content-addressable storage. The devil is in the details, though. It seems so simple and elegant, but there are problems with performance, what to hash, downloading executable code from random machines, etc.
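
            If it helps to picture it, the core trick is just “the name of a blob is a hash of its bytes.” Here is a toy sketch in C; FNV-1a is used only to keep it self-contained (real systems like git or Venti use a cryptographic hash such as SHA-1 or SHA-256), and the objects/ path is made up:

            ```c
            /* Toy content-addressable store: an object's name is a hash of its
             * content, so identical blobs always land at the same name (dedup)
             * and a name can be used to verify the bytes it points at. */
            #include <stdint.h>
            #include <stdio.h>
            #include <string.h>

            /* 64-bit FNV-1a; a stand-in for a real cryptographic hash. */
            static uint64_t fnv1a(const unsigned char *data, size_t len) {
                uint64_t h = 14695981039346656037ULL;    /* FNV offset basis */
                for (size_t i = 0; i < len; i++) {
                    h ^= data[i];
                    h *= 1099511628211ULL;               /* FNV prime */
                }
                return h;
            }

            int main(void) {
                const char *blob = "hello, content-addressable world\n";

                /* "Store": name the object after its own hash. */
                char name[32];
                snprintf(name, sizeof name, "%016llx",
                         (unsigned long long)fnv1a((const unsigned char *)blob, strlen(blob)));
                printf("this blob would be stored as objects/%s\n", name);

                /* Retrieval is the inverse: given the name, read the bytes and
                 * check that they still hash to that name. */
                return 0;
            }
            ```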

            1. 1

              Getting rid of shared libraries by essentially statically linking, but having the data for that library link back to a shared file on the filesystem, is the same thing. Except now you can’t sign the whole executable and change the library; you have to choose. You can sign the “application code” at the start, which is basically what we do now, but if you tack the shared libraries on the end, then what is the point when you could just dynamically load them?

              Minimizing global applications and libraries is a good idea. I’d love it if I could run dnf/apt-get as a regular user and have my own versions of apps and libraries installed for my user only. Windows and Mac can do this if the app supports it, installing to a single user’s home folder.

              Sharing libraries is still a good idea, so that when some more crappy code in openssl is discovered, in my hypothetical system you just have to run “sudo dnf update” to update the system and “dnf update” as my regular user to update my private applications’ version of openssl. So I could have 2 applications on the “system” using openssl and my user could have 3 entirely different applications.

              I know chrooting offers something like what I want, but I would prefer it if Ubuntu and Fedora added per-user dnf/apt-get by default too.

              1. 1

                I found the part comparing his theoretical filesystem to git interesting, though I’ll admit that I don’t really understand how modern filesystems actually save data. Do they just diff it and overwrite only the changed parts?

                The part about using the hashes instead of dynamic linking seemed peculiar, and I’m guessing it would have the same limitations as a CDN does: namely, all the different libraries and versions of libraries would mean that you’re almost never winding up with a reusable cache. Static linking does seem like a better solution here, though I might be wrong.

                1. 7

                  how modern filesystems actually save data, do they just diff it and overwrite only the changed parts?

                  As with most things: it depends. On most filesystems in popular use today, the file is like a big byte buffer, and when you write a change to it, the byte buffer is modified in place. There are a bunch of layers between that abstraction and reality, but that is the rough idea.

                  This is one reason a computer crashing can leave the file system in an inconsistent state or corrupt files. If the machine goes down while the byte buffer is mid-update, the file will be corrupted. I see this a lot in log files on Linux with ext, where if the machine crashes hard the log file will have a bunch of random data at the end. This (I assume) is because the length of the file has been updated and committed, but the write of the data itself never happened and was lost in the reboot. But I could be wrong about the actual mechanism there.

                  Journaling file systems try to alleviate some of this by writing the planned operations to a journal, then executing them. So if the machine goes down, the journal can be replayed.
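
                  To make the “big byte buffer modified in place” part concrete, this is roughly what the abstraction looks like from userspace (just a sketch; the file name data.bin and the offset are made up):

                  ```c
                  /* The abstraction most common filesystems present: a file is a byte
                   * buffer you can overwrite in place.  This patches 4 bytes in the
                   * middle of an existing file without touching the rest of it. */
                  #include <fcntl.h>
                  #include <stdio.h>
                  #include <unistd.h>

                  int main(void) {
                      int fd = open("data.bin", O_WRONLY);
                      if (fd < 0) { perror("open"); return 1; }

                      /* Overwrite bytes 100..103 in place; if the machine dies partway
                       * through (or before the cache is flushed), the file can be left
                       * half old, half new. */
                      if (pwrite(fd, "ABCD", 4, 100) != 4)
                          perror("pwrite");

                      /* fsync() asks the kernel to push it to stable storage, but the
                       * ordering of data vs. metadata updates is filesystem-specific. */
                      fsync(fd);
                      close(fd);
                      return 0;
                  }
                  ```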

                  There is another class of file systems, which ZFS and Btrfs fall into, called copy-on-write. In this case, a write copies the file and makes the changes, then the pointer to the file is moved to the new location and the old one is cleaned up later. This means that a file can never be partially written. In a crash, the pointer might not be moved, but the data will always be consistent. It also makes things like snapshotting efficient: one just remembers the previous file pointers.

                  Of course, there are optimizations underneath. In ZFS (I don’t know anything about Btrfs), the file is broken into blocks, and the blocks are what is copied and updated, rather than the whole file. The analogy here is a persistent data map like in a functional language: the map is immutable, so updating it involves creating a new root node with pointers moved around to the new data, and snapshotting is simply remembering old root nodes.

                  ZFS is much more complicated than something like ext and has much higher resource requirements, but for hard-core storage those are usually acceptable.
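
                  To make the persistent-map analogy concrete, here’s a toy path-copying tree in C (my own sketch, nothing to do with ZFS’s actual on-disk structures): an update builds a new root that shares everything it didn’t touch, and old roots keep working as snapshots.

                  ```c
                  /* Persistent (copy-on-write) binary tree: updates copy only the
                   * path from the root to the changed node; everything else is shared.
                   * Old roots remain valid views of old versions, which is roughly how
                   * CoW filesystems get cheap snapshots.  Toy code: no balancing, and
                   * nothing is ever freed. */
                  #include <stdio.h>
                  #include <stdlib.h>

                  typedef struct node {
                      int key, val;
                      const struct node *left, *right;
                  } node;

                  static const node *mk(int key, int val, const node *l, const node *r) {
                      node *n = malloc(sizeof *n);
                      n->key = key; n->val = val; n->left = l; n->right = r;
                      return n;
                  }

                  /* Returns a NEW root; the tree rooted at t is left untouched. */
                  static const node *insert(const node *t, int key, int val) {
                      if (!t) return mk(key, val, NULL, NULL);
                      if (key < t->key) return mk(t->key, t->val, insert(t->left, key, val), t->right);
                      if (key > t->key) return mk(t->key, t->val, t->left, insert(t->right, key, val));
                      return mk(key, val, t->left, t->right);   /* overwrite = copy one node */
                  }

                  static int get(const node *t, int key) {
                      while (t) {
                          if (key == t->key) return t->val;
                          t = key < t->key ? t->left : t->right;
                      }
                      return -1;
                  }

                  int main(void) {
                      const node *v1 = insert(insert(NULL, 1, 10), 2, 20);
                      const node *v2 = insert(v1, 2, 99);        /* the "write" makes v2 */

                      printf("v1: key 2 = %d\n", get(v1, 2));    /* snapshot still sees 20 */
                      printf("v2: key 2 = %d\n", get(v2, 2));    /* new version sees 99 */
                      return 0;
                  }
                  ```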

                  And I’m sure there are other exotic file systems that do something different.