1. 22

Abstract: “I/O is getting faster in servers that have fast programmable NICs and non-volatile main memory operating close to the speed of DRAM, but single-threaded CPU speeds have stagnated. Applications cannot take advantage of modern hardware capabilities when using interfaces built around abstractions that assume I/O to be slow. We therefore propose a structure for an OS called parakernel, which eliminates most OS abstractions and provides interfaces for applications to leverage the full potential of the underlying hardware. The parakernel facilitates application-level parallelism by securely partitioning the resources and multiplexing only those resources that are not partitioned.”

  2. 1

    HotOS this year is a pile of smartass takes from people who seem to have read exactly one paper from previous HotOS and other OS conferences.

    1. 3

      Do you have any specific critiques of the paper and/or alternative approaches that achieve its goals?

      1. 5

        I don’t like the hand-wavy “everything else is obsolete, here are some vaguely defined ideas that are better because they are so simple” argument.

        “Resource sharing in multicore systems requires synchronization between CPU cores. This road-block for application-level parallelism …”

        Road block? Come on. Any measurements? Any substantive discussion? Also, perhaps a reference to Galaxy VMS (which was a very smart system, routinely ignored). Maybe an indication that shared resources are often useful and not just a sign of the stupidity of other OS developers? For example, perhaps it would be bad if multiple network applications advertised the same ports or IP addresses, or good if the file system were protected from raw disk operations that made holes in it.
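
        To make the port example concrete, here is a minimal Python sketch (an illustration, not something from the paper). It assumes default socket options (no SO_REUSEADDR or SO_REUSEPORT) and a usable loopback interface; the helper name `try_bind_twice` is made up for this example. The point is only that the kernel arbitrates the shared port namespace, so the second bind should be refused.

        ```python
        import socket

        def try_bind_twice():
            """Bind two TCP sockets to the same loopback port.

            Returns (second_bind_succeeded, errno_or_None).
            Assumes default socket options, i.e. no SO_REUSEADDR / SO_REUSEPORT.
            """
            first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            try:
                # Port 0 asks the OS to pick any free port for the first socket.
                first.bind(("127.0.0.1", 0))
                first.listen(1)
                port = first.getsockname()[1]
                try:
                    # The OS owns the port namespace and rejects the duplicate.
                    second.bind(("127.0.0.1", port))
                except OSError as exc:
                    return False, exc.errno
                return True, None
            finally:
                first.close()
                second.close()

        if __name__ == "__main__":
            print(try_bind_twice())
        ```

        With fully partitioned resources, something other than the kernel would have to enforce this kind of exclusivity.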

        “Partitioning resources at the application-level require applications to discover the hardware topology, such as DRAM NUMA locality”

        And yet, somehow, Linux is able to allocate for NUMA without application discovery (e.g. with cpusets), and that is not a revolutionary idea either. In fact, Linux and other systems can optimize memory locality for both topology-aware and topology-ignorant processes. The shared-nothing policy has limitations: e.g. a very large memory footprint cannot span memory from multiple NUMA zones.
        Even Windows has a NUMA allocator that lets processes get NUMA-optimal memory and scheduling without knowing the NUMA topology, or by asking Windows for minimal or more extensive topology information. https://docs.microsoft.com/en-us/windows/desktop/procthread/numa-support There may be a great idea in this, but perhaps they could explain it?

        and then

        “For instance, sockets are too heavyweight for high-speed networks [19, 42].”

        Reference 19 says:

        We show that we can retain the original socket API without the current limitations. Specifically, our sockets almost completely avoid system calls on the “fast path”. We show that our design eliminates up to 99% of the system calls under high load. Perhaps more tellingly, we used our sockets to boost NewtOS, a microkernel-based multiserver system, so that the performance of its network I/O approaches, and sometimes surpasses, the performance of the highly-optimized Linux network stack.

        I didn’t bother to read reference 42.

        Then

        “The parakernel provides asynchronous interfaces and eliminates all blocking operations from OS interfaces. Blocking OS interfaces are detrimental because they require applications to leverage kernel threads for concurrency. This limits application-level parallelism because context switching between kernel threads is expensive, and the application must synchronize data access among the threads.”

        This would be cool in an undergrad project, but beyond that you’d hope people would understand (a) that async interfaces are not without tradeoffs and (b) that standard “obsolete” OSes offer async interfaces. But maybe they have a cool alternative, so we read on:

        “With an asynchronous model, kernel threads are unnecessary and the parakernel replaces them with processes for parallelism and application-controlled primitives, such as coroutines or fibers, for concurrency”

        That’s great, we have “processes” instead of obsolete old clunky, um, processes. And applications don’t need to synchronize data access among them because, um, we are just using a model of coroutines on a single processor, like Go and Lua and JavaScript do on obsolete operating systems. OK then.
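
        For anyone who wants to see the single-threaded coroutine model on an ordinary OS, here is a small sketch using Python’s asyncio (an illustration under my assumptions, not the paper’s design; `worker`, `main` and `run_demo` are made-up names). Coroutines on one event loop only switch at an `await`, so a read-modify-write with no `await` in the middle needs no lock.

        ```python
        import asyncio

        async def worker(counter, rounds):
            """Increment a shared counter without any lock."""
            for _ in range(rounds):
                value = counter["n"]       # read
                counter["n"] = value + 1   # write; no await between read and write
                await asyncio.sleep(0)     # explicitly yield to the other coroutines

        async def main(workers=4, rounds=1000):
            counter = {"n": 0}
            # All workers run interleaved on one thread and one event loop.
            await asyncio.gather(*(worker(counter, rounds) for _ in range(workers)))
            return counter["n"]

        def run_demo():
            return asyncio.run(main())

        if __name__ == "__main__":
            print(run_demo())
        ```

        No updates are lost because nothing preempts a coroutine between the read and the write. That is a property of cooperative scheduling on one core, and it is available today without a new kernel.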

        I like how there is no paging in this OS either (and applications have to handle memory faults themselves!). That may be a good idea, but maybe not; a discussion of the tradeoffs would help (I just saw a commodity server with 900 GB of memory, so maybe it doesn’t need paging yet).

        etc. etc.

        1. 1

          Thanks for the thorough teardown. Very interesting. I didn’t get to go deep into Galaxy. What was really smart about it?