1. 17

  2. 12

    Discussing the future of operating systems is fine, but not while ignoring the past. There had been at least 20 years of OS research before Linux, with a lot of interesting ideas wildly different from the “everything is a file” world view.

    For instance, z/OS. It has been powering large mainframes for 40 years. Every user runs in their own VM (z/VM), where they have access to a full OS. Everything is not a file but a database. It’s the grand-daddy of cloud OSes.

    Symbolic operating systems, like those on Lisp machines, are also a lot of fun to read about.

    1. 4

      That’s VM/CMS (CMS, the Conversational Monitor System, being what runs in each user’s VM). z/OS is historically a batch-processing system, and it is still used in that role.

      1. 1

        I read somewhere that VM/CMS was one of the few hotbeds of “hacker culture” outside of UNIX.

    2. 7

      What I would like to highlight is not that we should replace all configuration files with function calls, but that configuring a system is similar to writing a program. Except that in the Linux world right now it is done with one of the worst programming languages that you can imagine: a spaghetti of files, weakly typed, and full of implicit magic. Programming languages in general have evolved over time to become easier to use and to offer more guarantees about their correctness. None of these evolutions apply here.

      Strongly agree with this characterization of the contemporary Linux programming environment.

      While I took the example of a function call, I believe that declarative programming is the most appropriate tool for configuring a system. More concretely, I believe that a system should be described by a single declarative-style program, similar to what NixOS, for example, is doing. NixOS, however, doesn’t go far enough. The validity of this system-wide configuration should be fully verifiable ahead of time. To take one of the slogans of Haskell: “if it compiles, it works”. It should simply be impossible to set up your system in a way that can malfunction.

      It’s an overstatement, in both Haskell and NixOS, to say that “if it compiles, it works”. It’s certainly possible to have a buggy compiled Haskell program, or a malfunctioning NixOS system where the config applied successfully. But these cases are the result of the type system’s limited ability to model all the behavior a user might care about; that doesn’t imply that making an OS’s system configuration more verifiable is bad by any means.

      Programs should stop relying on configuration files altogether, and more generally on the presence of some files on the file system. Relying on the shared global state that the file system constitutes goes against mathematical purity. More on that later. Instead, all the parameters and configuration that a program needs to run should be passed as input. In order to permit ahead-of-time verification, programs must provide metadata that makes it possible to verify that the input is potentially correct. Launching a program should offer the same guarantees as calling a function in a strongly-typed programming language.

      Yes.
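
      To make that concrete, here is a minimal Rust sketch of the idea (all names are hypothetical, invented for illustration; this is not redshirt’s actual interface): configuration is a typed value rather than a file, and launching a program amounts to calling a function with that value, so an invalid configuration is rejected before the program ever runs.

      ```rust
      // Hypothetical sketch: configuration as a typed value instead of a file.
      #[derive(Debug)]
      enum LogLevel { Error, Warn, Info, Debug }

      #[derive(Debug)]
      struct ServerConfig {
          listen_port: u16,                      // out-of-range ports are unrepresentable
          max_connections: std::num::NonZeroU32, // zero is unrepresentable
          log_level: LogLevel,                   // only valid variants exist
      }

      // "Launching" the program is just a strongly-typed function call.
      fn run_server(cfg: ServerConfig) {
          println!("listening on {} ({:?})", cfg.listen_port, cfg.log_level);
      }

      fn main() {
          // A system-wide configuration could be one declarative program composed
          // of values like this; a typo or a wrong type simply fails to compile.
          let cfg = ServerConfig {
              listen_port: 8080,
              max_connections: std::num::NonZeroU32::new(256).unwrap(),
              log_level: LogLevel::Info,
          };
          run_server(cfg);
      }
      ```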

      I can’t help but see the rise of Docker containers, and virtualization in general, as a consequence of the failure of existing operating systems to properly isolate programs.

      Docker lets you execute programs in an isolated environment, to ensure that said programs can’t take control of the system or interfere in a negative way with the other programs running on the same machine. The question I’d like to ask is: why isn’t this the case by default?

      I’ve had the same thought myself. And yes, I’m aware that SELinux is a thing, but for whatever reason it’s never been tractable for me, as a desktop Linux user, to learn (maybe I should have; then again, why aren’t the SELinux guarantees the default themselves?).

      Probably the biggest and most problematic area when it comes to isolating programs is the file system. While it is a very radical solution, I believe that programs should completely stop sharing access to a single global file system.

      This is definitely worth considering. On the other hand, this is more or less the way that smartphone apps work, and it is genuinely limiting. One of the actually useful things that a global filesystem provides is a paradigm where you expect that many different programs can operate on your personal data (what the article characterizes as “the content of /home: your family pictures, whether your IDE defaults to spaces or tabs, save games of your video games, and so on”). It’s handy that I can choose to store all my photos in ~/Photos and choose to interact with them using GIMP or eog or krita or blender, without the storage of those photos being tied to one and only one application.

      All the design proposals presented in this article are fully or partially implemented in my redshirt operating system project.

      Please note, however, that redshirt is a personal side-project. I am writing this article in order to present how I think an operating system should be, rather than as a way to promote redshirt.

      Well, regardless of your intentions, the article did end up promoting redshirt, insofar as it’s made me somewhat interested in checking it out.

      1. 6

        It feels like Plan 9 needs to be mentioned alongside all these quotes about the file system. It has per-process namespaces.

        Linux bind mounts came directly from Plan 9, as far as I understand (a rough sketch of a per-process mount namespace follows the links below):

        http://man.cat-v.org/plan_9/2/bind

        https://news.ycombinator.com/item?id=3075735

        https://unix.stackexchange.com/questions/8337/what-aspects-of-plan-9-have-made-their-way-into-unix
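
        For anyone curious what the Plan 9-style mechanism looks like on Linux, here is a rough Rust sketch (it assumes Linux, root or CAP_SYS_ADMIN, and the `libc` crate; error handling is minimal):

        ```rust
        // Give this process its own mount namespace, then bind-mount /tmp over
        // /mnt. The new mount is visible only inside this namespace, loosely
        // mirroring Plan 9's per-process namespaces and bind(2).
        use std::ffi::CString;

        fn main() {
            unsafe {
                // Detach from the shared mount namespace.
                if libc::unshare(libc::CLONE_NEWNS) != 0 {
                    panic!("unshare: {}", std::io::Error::last_os_error());
                }
                // Make our mounts private so they don't propagate back out.
                let root = CString::new("/").unwrap();
                libc::mount(std::ptr::null(), root.as_ptr(), std::ptr::null(),
                            libc::MS_REC | libc::MS_PRIVATE, std::ptr::null());
                // The bind mount itself, as in `mount --bind /tmp /mnt`.
                let src = CString::new("/tmp").unwrap();
                let dst = CString::new("/mnt").unwrap();
                if libc::mount(src.as_ptr(), dst.as_ptr(), std::ptr::null(),
                               libc::MS_BIND, std::ptr::null()) != 0 {
                    panic!("bind mount: {}", std::io::Error::last_os_error());
                }
            }
            // From this process, /mnt now shows /tmp's contents; every other
            // process on the system sees /mnt unchanged.
        }
        ```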

        1. 4

          Remarkably, Genode has no global filesystem namespace, and similarly handles POSIX applications.

      2. 5

        I really like Varnish (the HTTP proxy); I think VCL is great, and certainly a lot better than e.g. HAProxy’s configuration. While it’s not exactly what the author is proposing here, it’s the closest example that I know of and is actually used.

        But is it also easier? I’m not so sure that it is. For simple configurations I’d say that HAProxy is usually easier, but for more advanced ones Varnish is. And “grokking” the logic isn’t always easier either, especially not for people who are not programmers. I never struggled much with this, as I’m a programmer first and sysadmin second, but a lot of people are sysadmins first and programmers second.

        Most of the time I’ve seen sysadmins programming, the results were … less than impressive, IMO. You see the same in crypto and scientific circles: some of the worst code I’ve seen is from brilliant cryptographers or scientists. These people are much smarter than I am, but they’re also not programmers. There are also sysadmin, crypto, and science folk who are good at both their chosen field and programming, but in my experience it’s not the norm.

        I suspect that having all configuration as code will end up making things more of a mess, rather than less. Even though I personally like the idea, I suspect that in reality it will be a world full of shitty ad-hoc “wtf is this?!” configuration scripts.

        1. 2

          It’s been a few years, but when I was in charge of doing some sophisticated caching in Varnish, I had the “map” of which step belongs to which directive, and in what order they are executed, printed out on the wall next to my desk.

          Some coworkers laughed it off until they had to work with the config. I’m not saying Varnish is bad or overly complex, but as with every macro/DSL/…, you usually only gain power at the cost of complexity. If it can do everything, you probably need to grasp how it does everything.

        2. 4

          Sandboxing and locking out access to the filesystem is why I, as a user, run screaming from Android. Sacrificing usefulness on the altar of security and correctness. A terrifying future.

          1. 3

            It can be done properly. Suppose the file picker dialog is a separate process with a different set of capabilities. An application such as a web browser can then save files anywhere, without itself holding the capabilities to do so, just by using a capability to a file handed over by the file picker dialog.
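
            A rough Rust sketch of that mechanism (the process boundary and any real picker API are elided; all names here are hypothetical): the picker is the only component that can turn a path into an open handle, and the application only ever receives the handle.

            ```rust
            use std::fs::File;
            use std::io::Write;

            // Trusted side: the only component allowed to resolve paths.
            // In a real system the user would pick the path interactively,
            // and the open handle would cross a process boundary.
            fn file_picker_dialog() -> std::io::Result<File> {
                File::create("/tmp/chosen_by_user.txt")
            }

            // Untrusted side: receives a capability (an open File), never a
            // path, and holds no ambient filesystem access of its own.
            fn untrusted_app(mut grant: File) -> std::io::Result<()> {
                grant.write_all(b"saved without filesystem capabilities\n")
            }

            fn main() -> std::io::Result<()> {
                let grant = file_picker_dialog()?; // capability minted by trusted code
                untrusted_app(grant)               // the app can use it, nothing more
            }
            ```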

            1. 4

              That’s more or less exactly how it works in macOS for sandboxed (App Store) apps. The file picker runs out of process, and when a file is chosen by the user, the application is granted access to it. Prior to that, it had no access.

            2. 2

              I agree that locking things away from the user in the name of security is a terrible choice that unfortunately seems to often be the default in today’s systems (such as iOS, Android, macOS, etc). (In some cases, it’s done more to preserve app store monopolies than for actual security…)

              In the case of this article, it is not described explicitly, but my impression is the author mainly wants to isolate program state and configuration by default, not to prevent user access, but so that programs can’t easily trample on one another.

              I strongly agree there should always be a path to modify anything on the system as a user.

              1. 2

                It doesn’t need to sacrifice usability. The less a random program can do without my consent, the more likely I am to be willing to run it. That’s a big win for usability. Most programs don’t need access to every file that I own; I am completely happy with a sandboxing policy that requires an external process (e.g. a file picker or a shell) to explicitly authorise access to specific files. Even if I trust the author of a particular program to be non-malicious, I still might use it to access a file or network service that exploits a bug and compromises it. I’d much prefer that such a compromise only gives an attacker access to the files I’ve opened with the program in that invocation, rather than my entire home directory.

              2. 2

                integrating a program into a package manager involves a lot of bureaucracy.

                I’ve never experienced that with the FreeBSD ports tree. Usually it’s just a “patch accepted, thank you.”

                a binary running natively on x86 can’t be the same as a binary running natively on ARM (unless you merge the code of every single platform that exists into the same file, which I’m going to assume is not practical)

                I guess the author isn’t familiar with macOS and its Mach-O file format? There’s also the FatELF proposal.
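
                A small illustration of what “merging every platform into the same file” looks like in practice: a Rust sketch (hypothetical tool, but the constants match Apple’s <mach-o/fat.h>) that reads a file’s first bytes and reports whether it is a Mach-O universal (“fat”) binary, and how many architecture slices it carries.

                ```rust
                use std::fs::File;
                use std::io::Read;

                fn main() -> std::io::Result<()> {
                    let path = std::env::args().nth(1).expect("usage: fatcheck <file>");
                    let mut buf = [0u8; 8];
                    File::open(path)?.read_exact(&mut buf)?;
                    // fat_header: uint32 magic, uint32 nfat_arch, both big-endian.
                    let magic = u32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]]);
                    if magic == 0xCAFE_BABE {
                        // Caveat: Java class files share this magic number;
                        // real tools also sanity-check that nfat_arch is small.
                        let n = u32::from_be_bytes([buf[4], buf[5], buf[6], buf[7]]);
                        println!("universal binary with {} architecture slices", n);
                    } else {
                        println!("not a fat Mach-O (magic {:#010x})", magic);
                    }
                    Ok(())
                }
                ```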

                1. 2

                  Long article. I started by glancing at the conclusions, which turned out to be a good idea in hindsight.

                  Monolithic vs micro-kernel vs your-favourite-denomination-kernel. This is one of the ultimate nerd debates, and I don’t think that the answer to this question is actually very important.

                  The highlighted part saved me a lot of reading. I dismissed the article as easily as the writer dismissed the state of the art in OS architecture; they didn’t even bother doing some basic research on the subject, yet felt entitled enough to rudely dismiss it.

                  1. 10

                    I think it’s a mistake to dismiss this article for not dealing with the single question of monolithic vs. microkernel architecture. Most of what it dealt with was the design of operating system userspace paradigms, which I think is as worthy of attention as kernel architecture. Certainly it’s far more visible to the end user than whether the kernel is a monolith or not.

                    1. 2

                      I think it’s a mistake to dismiss this article for not dealing with the single question of monolithic vs. microkernel architecture.

                      I was otherwise fine with the article not dealing with this, but they didn’t stop at that, and had to add some disrespectful blabber.

                      Most of what it dealt with was the design of operating system userspace paradigms

                      Sure, I did end up reading more, but I was understandably pissed off at how the part I highlighted was worded. Just because the author isn’t personally interested doesn’t give them license to do this.

                      1. 4

                        Hahaha, especially since they have an issue open on GitHub titled “How to handle when a device driver panics?”.

                    2. 2

                      Given some of the grief I’ve had due to faulty drivers or kernel services in production systems, I can only imagine the article’s author has little experience with production servers. I’d give a lot for the ability to gcore and restart currently in-kernel services without rebooting the entire system: less impact for fixes and upgrades, far easier debugging, likely higher overall stability, and a smaller attack surface while we’re at it.

                      The answer to this question is very important.