1. 35

  2. 10

    On the downside, containers offer weaker isolation than VMs, to the point where people run containers in virtual machines to achieve proper isolation.

    That may be true of Linux containers. But in Solaris and FreeBSD, the container solutions are quite secure to the point where one wants to run the VMs in containers.

    1. 4

      I think one of the big wins for running VMs inside of containers is isolating some of the complicated user-mode device drivers from the rest of the system. There is still a lot of VM implementation that lives in the kernel and is not isolated by the container. Also, I don’t think the Solaris zone or FreeBSD jail solutions come close to hypervisors in security. I’m not sure what metrics are good for comparison, but I feel the attack surface of a container (i.e., the whole of the kernel syscall API) is going to be much larger than the attack surface of a VM, even if the VM has lots of crazy devices being emulated.

      1. 3

        I don’t have in-depth knowledge of either, but I have been led to believe a Jail/Zone is much simpler to implement than a VM. A VM involves a bunch of emulated hardware plus a supervisor kernel, whereas a Jail/Zone seems to be more like a specialized process in the kernel. But maybe @jclulow could comment.

        1. 6

          Hypervisors are actually really easy to implement these days with virtualization extensions in modern processors. At this point it’s basically just implementing a second layer of context switching.
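
          To make that concrete (this is my own sketch, not anything from the thread; `kvm_api_version` is a made-up helper name): on Linux, those virtualization extensions are exposed through a handful of ioctls on `/dev/kvm`, and this is all it takes to start talking to the hypervisor interface:

          ```c
          /* Probe the Linux KVM interface: modern hypervisors drive the CPU's
           * virtualization extensions through a few ioctls on /dev/kvm
           * (KVM_CREATE_VM, KVM_CREATE_VCPU, KVM_RUN, ...). This sketch only
           * checks the API version; a full VMM would continue from here. */
          #include <fcntl.h>
          #include <sys/ioctl.h>
          #include <unistd.h>
          #include <linux/kvm.h>

          /* Returns the KVM API version (stable at 12 since Linux 2.6.22),
           * or -1 if /dev/kvm is not available on this machine. */
          int kvm_api_version(void)
          {
              int fd = open("/dev/kvm", O_RDONLY | O_CLOEXEC);
              if (fd < 0)
                  return -1;

              int version = ioctl(fd, KVM_GET_API_VERSION, 0);
              close(fd);
              return version;
          }
          ```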

          With containers, on the other hand, you have to thread isolation logic through every part of the kernel and make sure you don’t forget anything anywhere. Eventually the kernel evolves internal abstractions that support containers as a first-class citizen, and it gets less difficult. But in the initial phase of containerizing subsystems you end up with a lot of holes or mismatched interfaces; see Linux containers.

          These days most container instability is caused by Docker wankery. All the overlay filesystems are really new and known to hang or cause other problems. Jails don’t fuck around with that; they just use a chroot. Zones have ZFS. LXC, which has traditionally tried to act more like Jails, has far fewer problems. As I understand it, Heroku dynos have been LXC for nearly a decade, although maybe things have changed there, or I am unaware of difficulties they have had with LXC.
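
          For instance, a FreeBSD jail can be described declaratively in a few lines; this hypothetical `/etc/jail.conf` entry (name, path, and address are invented for illustration) shows there is no overlay filesystem involved, just a directory tree:

          ```
          # Hypothetical minimal jail definition; see jail.conf(5).
          www {
              path = "/usr/jail/www";          # plain chroot directory
              host.hostname = "www.example.org";
              ip4.addr = "192.0.2.10";
              exec.start = "/bin/sh /etc/rc";
              exec.stop = "/bin/sh /etc/rc.shutdown";
              mount.devfs;
          }
          ```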

          1. 1

            Heroku is kind of weird: it’s containers on top of VMs (at least last time I checked). So:

            1. you get performance at least as bad as running on VMs
            2. you get worse security than both containers and VMs, since you can be attacked either via other VMs that share the same physical host or via other containers that share the same VM

            Heroku’s ease of use must make up for some of this, but I’m not sure why people don’t just use Elastic Beanstalk.

            Also, Heroku seems quite opaque about security patching of its kernels. Obviously they need to be doing it, but it’s not clear when they are doing it (or at least the information is not easy to find on their website). As a user, how do I know how long my software was exposed to ‘publicly’ known Linux kernel exploits, or whether it was vulnerable at all? To be fair, a lot of cloud providers are like this, though Amazon does post bulletins like this: https://aws.amazon.com/security/security-bulletins/AWS-2017-004/

            I suspect part of the reason Heroku is opaque about this is that it would scare customers off.

    2. 7

      The key to this work is throwing out old assumptions and requiring explicit guest support.

      Historically, VM systems “had” to be able to boot guests which didn’t need to know they were in a VM, though a guest could optionally implement dedicated “hardware” drivers to get more optimized I/O than through emulated devices. Still, you could take the install media for various OSes and install them all.

      This project requires explicit guest support for basic boot-up, which is great if your model is built around managing everything in the guest and you can make that demand. They reap major benefits from doing so, and there’s no reason everyone creating images for deployment should be held back because the target system is also trying to be compatible with stuff you’ll never deploy. But it’s very much a case of needing the guest to be compiled explicitly for the target hosting platform.

      Since the competition is structured containerization, with something like a Dockerfile defining entry points, environmental dependencies, etc., this is no different. It’s a great trade-off, but one made possible by the target audience having moved and adapted to a world of on-demand machine instances and container workloads.
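
      For comparison, the containerization side’s equivalent “explicit support” looks like this minimal, hypothetical Dockerfile (the image tag, paths, and `app` binary are invented for illustration):

      ```dockerfile
      # Hypothetical minimal image: the deployable declares its own
      # entry point and environmental dependencies up front.
      FROM alpine:3.19

      ENV APP_ENV=production

      WORKDIR /app
      COPY app /app/app

      # Explicit entry point, much like the explicit guest support above.
      ENTRYPOINT ["/app/app"]
      ```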

      1. 5

        Related site: LightVM

        But what is this TinyX (a tool that enables creating tailor-made, trimmed-down Linux virtual machines) they speak of? All I find in a search is a “tiny” X server.

        1. 5

          But what is this TinyX (a tool that enables creating tailor-made, trimmed-down Linux virtual machines) they speak of?

          https://github.com/sysml/lightvm/issues/1

          1. 2

            Ah, a good piece of information that perhaps should have been mentioned in the paper as “not ready for prime time”.

        2. 3

          Related (currently empty) repo: https://github.com/sysml/lightvm