1. 2

Abstract: “Operating systems provide a wide range of services, which are crucial for the increasingly high reliability and scalability demands of modern applications. Providing both reliability and scalability at the same time is hard. Commodity OS architectures simply lack the design abstractions to do so for demanding core OS services such as the network stack. For reliability and scalability guarantees, they rely almost exclusively on ensuring a high-quality implementation, rather than a reliable and scalable design. This results in complex error recovery paths and hard-to-maintain synchronization code.

We demonstrate that a simple and structured design that strictly adheres to two principles, isolation and partitioning, can yield reliable and scalable network stacks. We present NEaT, a system which partitions the stack across isolated process replicas handling independent requests. Our design principles intelligently partition the state to minimize the impact of failures (offering strong recovery guarantees) and to scale comparably to Linux without exposing the implementation to common pitfalls such as synchronization errors, poor locality, and false sharing.”
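To make the "isolated replicas handling independent requests" idea concrete, here is a minimal sketch of flow-based partitioning. It is not NEaT's implementation: the `Replica` and `PartitionedStack` classes, the CRC32 hash over the connection 4-tuple, and the byte-count "state" are all assumptions chosen for illustration. The point is only that when each flow is owned by exactly one replica with private state, no locks are needed between replicas and a crash loses only that replica's connections.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical stand-in for one isolated stack replica with private state."""
    replica_id: int
    connections: dict = field(default_factory=dict)

    def handle(self, flow, payload):
        # Per-flow state (here just a byte count) lives only in this replica.
        self.connections[flow] = self.connections.get(flow, 0) + len(payload)

    def crash_and_restart(self):
        # Simulate a failure: only this replica's state is lost.
        lost = len(self.connections)
        self.connections.clear()
        return lost


def pick_replica(flow, n):
    """Map a (src_ip, src_port, dst_ip, dst_port) tuple to a replica index."""
    key = "|".join(map(str, flow)).encode()
    return zlib.crc32(key) % n


class PartitionedStack:
    """Hypothetical dispatcher: every flow is owned by exactly one replica."""

    def __init__(self, n):
        self.replicas = [Replica(i) for i in range(n)]

    def dispatch(self, flow, payload):
        replica = self.replicas[pick_replica(flow, len(self.replicas))]
        replica.handle(flow, payload)
        return replica.replica_id


if __name__ == "__main__":
    stack = PartitionedStack(4)
    flows = [("10.0.0.1", 1000 + i, "10.0.0.2", 80) for i in range(32)]
    for f in flows:
        stack.dispatch(f, b"hello")
    lost = stack.replicas[0].crash_and_restart()
    survivors = sum(len(r.connections) for r in stack.replicas)
    print(f"replica 0 lost {lost} flows; {survivors} flows unaffected")
```

In a real system the replicas would be separate processes and the dispatch step would more likely happen in the NIC or a thin driver layer; the in-process objects here only model the ownership rule.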

  1. 1

    Separation kernels in the commercial sector use partitioned components for things like networking and filesystems, but I rarely see public research on that approach. This one is more reliability-focused but should get people thinking.

    The NEaT design follows this work, where they claim to preserve the BSD socket API while knocking out 99% of system calls on the fast path to boost performance. They claim their microkernel-based networking gets close to Linux's stack in performance. The new work tries to deliver reliability and scalability at once, even though the two goals tend to pull against each other.