1. 14
  1.  

  2. 3

    The state of reality at the OS layer is almost as bad. World-class jackass Rob Pike insists that OS research is dead. As an observation, it’s deliberately ignorant; as an assertion, it’s abominable.

    I think OS research will pick up again because of virtualization. As virtualization gets more popular, all of the hardware will fade away and become commoditized, and operating systems will just (or at least can just) consist of lightweight drivers for standardized, virtualized interfaces like the network and disk interfaces presented by VMware, QEMU, Xen, etc.

    Everything will look the same to the operating system, and vendors can keep all of their hardware-specific, terribly written voodoo code in firmware blobs or EFI drivers or whatever. This will put pressure on hardware vendors to compete by making better hardware and firmware, since it will all be commoditized and look the same at the OS level, and it will be easier for consumers to benchmark one device against another rather than relying on OS-specific hacks and workarounds inside software drivers.

    Free operating systems will get to redirect their attention to things within their realm (OS research) rather than devoting resources to writing hardware drivers for the latest and greatest network card. When many OSes can run on lots of new hardware because it all looks the same, users get to experiment with different OSes without any hardware compatibility problems.

    (I am actually just hoping this comes true so OpenBSD can always run on the latest and greatest hardware without compatibility problems or having to waste time writing, debugging, and updating hardware drivers.)

    1. 7

      I work on SmartOS, an operating system somewhat less mainstream than even OpenBSD. Until Keith (the author of the blog post) retired, he sat at the desk behind me. One of the most frustrating areas we deal with in the maintenance of our chosen operating system is debugging problems induced by poorly constructed firmware.

      Firmware is the undebuggable, unobservable nightmare cathedral that lies between a reasonable operating system (be it OpenBSD, SmartOS, or something else) and the hardware beneath. When it breaks, it can be exceptionally difficult to get a vendor to look at the problem, much less admit that anything is actually wrong. Recently, Intel changed the sector size that some of their SSD units present. When we pointed out that this is not alright, and that we would like to return to a less-broken earlier version, Intel said simply that “you cannot downgrade.” New SSDs can no longer be procured with a pre-change firmware version.

      In the realm of the entirely bizarre, some of our Supermicro X9-series systems would, after handing control to the boot loader, continue to write to areas of physical memory handed to the boot loader for use. The problem was never fully diagnosed by the vendor; they basically gave up after telling us it was probably something in the UEFI networking stack. We ended up modifying our boot loader (grub) and our OS to avoid using memory in the physical address range [0xc700000, 0xc800000).
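      The workaround described above — punching a hole in the usable physical memory map so nothing is ever allocated from the range the firmware scribbles on — can be sketched roughly like this. (A simplified, hypothetical illustration, not the actual SmartOS/grub change: a real boot loader would edit the firmware-provided memory map, such as the BIOS E820 table or the UEFI memory map, rather than a toy array.)

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <stdint.h>
      #include <stdio.h>

      /* One usable region of physical memory, [start, end). */
      struct region {
          uint64_t start;
          uint64_t end;
      };

      /* Hypothetical range the firmware keeps writing to after boot. */
      #define BAD_START 0xc700000ULL
      #define BAD_END   0xc800000ULL

      /*
       * Split any usable region that overlaps [BAD_START, BAD_END) so the
       * hole is never handed to an allocator. Returns the new region count;
       * `out` must have room for up to 2 * n entries.
       */
      static size_t punch_hole(const struct region *in, size_t n,
                               struct region *out)
      {
          size_t m = 0;
          for (size_t i = 0; i < n; i++) {
              uint64_t s = in[i].start, e = in[i].end;
              if (e <= BAD_START || s >= BAD_END) {
                  out[m++] = in[i];          /* no overlap: keep as-is */
                  continue;
              }
              if (s < BAD_START)             /* keep the part below the hole */
                  out[m++] = (struct region){ s, BAD_START };
              if (e > BAD_END)               /* keep the part above the hole */
                  out[m++] = (struct region){ BAD_END, e };
          }
          return m;
      }

      int main(void)
      {
          /* One usable region from 1 MiB to 256 MiB. */
          struct region map[] = { { 0x100000, 0x10000000 } };
          struct region fixed[2];
          size_t n = punch_hole(map, 1, fixed);

          for (size_t i = 0; i < n; i++)
              printf("[%llx, %llx)\n",
                     (unsigned long long)fixed[i].start,
                     (unsigned long long)fixed[i].end);
          return 0;
      }
      ```

      The same splitting logic applies whether the map comes from E820, UEFI `GetMemoryMap()`, or somewhere else; the point is that the reserved hole is carved out before any allocator ever sees the map.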

      Writing a driver for well-designed hardware that doesn’t include large, complicated firmware-driven behaviour is actually relatively simple. New generations of well-designed hardware often do not require large sweeping changes to drivers to work. The PCI to PCIe transition is a good case study in hardware being built to be backwards compatible. Some especially well-behaved vendors even offer BSD-licensed driver code written in a relatively OS-agnostic way, as Intel does for their Ethernet NICs.

      What we truly need is for vendors to let go of the idea that complicated, opaque firmware makes for added value; instead they should publish specifications for their hardware, or liberally licensed driver code. BIOS/EFI vendors need to let go of the idea that the way to provide fault detection and system monitoring is via mandatory, uninstrumentable system firmware; operating the system is emphatically the job of the operating system. The OS is where we are best equipped to detect and handle faults, and to make policy decisions and forward telemetry to the operator or to other remote systems.

      With all this in mind, I absolutely hope that we do not end up in a world where all of the “heavy lifting” is done by the hardware vendor in a way that we cannot possibly hope to understand, influence or replace.

      1. 4

        I think your Supermicro story demonstrates my point. Vendors are creating shitty hardware (and writing shitty firmware) expecting the OS to fix up their mistakes because it’s cheaper than redesigning a hardware component. Vendors have long relied on their Windows drivers to work around bugs in hardware, which then have to get rediscovered in Linux, and then make their way to the less popular OSes.

        If the option of driver hacks were taken away and vendors had to start doing everything properly in hardware (or their firmware), the end result would be something more reliable and less buggy (though they would probably not be able to come out with new devices as quickly).

        Some especially well-behaved vendors even offer BSD-licensed driver code written in a relatively OS-agnostic way, as Intel does for their Ethernet NICs.

        Ever looked at the Broadcom wireless drivers in Linux? I think it’s a bit unreasonable to expect anyone other than employees at Broadcom to be able to do anything with that code.

        OpenBSD has Intel’s DRM stack ported from Linux, but that required a lot of effort to adapt to a different kernel, and it now has to be periodically re-synced. Nobody outside of Intel is changing any of the code that actually talks to their hardware, so what is the point of all of that code being there other than necessity? How are developers or users benefiting from it versus it being some code running on the graphics chip at the BIOS level? If Intel’s cards gave the same performance and features through their VESA interface and all of the complexity were handled by the card itself, wouldn’t that be better? There would be much less code for OpenBSD or any other OS to implement, and all of that goop would be running on the graphics hardware instead of the host CPU: fewer security and reliability concerns to deal with, and the people who actually know and understand the hardware would be in charge of it, rather than the one person in OpenBSD who syncs some code from Linux.

        With all this in mind, I absolutely hope that we do not end up in a world where all of the “heavy lifting” is done by the hardware vendor in a way that we cannot possibly hope to understand, influence or replace.

        Ok, perhaps firmware handling more aspects of the hardware impacts your ability to understand how it works, but with a modern Wifi+Bluetooth card, how many people can possibly understand how all of that works? Why should so many more people need to understand it just to be able to use it with their OS?

        As for influencing and replacing it, commoditized hardware would actually make replacement that much easier. No longer would you have to do system compatibility checks and driver updates; you would just swap out a piece of hardware for another from a different vendor, and the OS would see it as the same type of device with the same driver.

      2. 1

        I stumbled upon this post again. It is only three months old, but in that time multiple companies have dedicated themselves to containers as the unit of deployment. The downside: it means you’re stuck on Linux. Virtualization will still play a role, but it seems to me that it’s going to advance a lot more slowly now. That is both good and bad: the bad being that Linux continues to grow, the good being that virtualization is pretty nasty anyway. I sympathize with the author. I feel frustration every time I have to work around some poor vendor decision or glue two components together for bad reasons.