We are making Firecracker open source because it provides a meaningfully different approach to security for running containers.
We are working to make Firecracker integrate naturally with the container ecosystem, with the goal to provide seamless integration in the future.
Our mission is to enable secure, multi-tenant, minimal-overhead execution of container and function workloads
Why would I run containers inside Firecracker micro VMs, as opposed to just deploying my software directly into the VM? Is the assumption that I’m using containers already for (eg) local development and testing?
Firecracker is solving the problem of multi-tenant container density while maintaining the security boundary of a VM. If you’re entirely running first-party trusted workloads and are satisfied with them all sharing a single kernel and using Linux security features like cgroups, SELinux, and seccomp then Firecracker may not be the best answer. If you’re running workloads from customers similar to Lambda, desire stronger isolation than those technologies provide, or want defense in depth then Firecracker makes a lot of sense. It can also make sense if you need to run a mix of different Linux kernel versions for your containers and don’t want to spend a whole bare-metal host on each one.
Thanks. I was thinking about this in the context of the node / npm vulnerabilities that were also being discussed yesterday. I was imagining using these microVMs to (eg) contain node applications for security, without having to package the application up into a container.
(disclaimer: I work for Amazon and specifically work on the integration between the Firecracker VMM and container software)
Multi-tenant is a big use-case, but so is any workload where there is at least some untrusted code running. Firecracker helps to enable workloads where some third-party, untrusted code is expected to cooperate in a larger system.
In case that’s too abstract, think of a situation where a third-party component handles some aspect of data processing, but should not have access to the rest of the resources that are present in your application. Firecracker helps you establish a hypervisor-based boundary (including a separate kernel) between the third-party component and your code.
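As far as I can tell “container” is about supporting a specific packaging format, OCI (Open Container Initiative). You can just deploy your software directly. In fact, I think there is no “container” support at the moment. To quote: “We are working to make Firecracker integrate naturally with the container ecosystem, with the goal to provide seamless integration in the future.”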
(disclaimer: I work for Amazon and specifically work on the integration between the Firecracker VMM and container software)
“Container” is about the ecosystem of container-related software, including OCI images, CNI plugins for networking, and so forth. We’ve open-sourced a prototype that integrates the Firecracker VMM with containerd here, and plan to continue to develop that prototype into something a bit more functional than it is today.
FYI: Firecracker is written in Rust.
What do they mean by “microVM”? The marketing docs are pretty short of details about this and googling returns links about programming language VMs.
A microVM is similar to a VM but does not boot firmware (UEFI or BIOS) and uses a much smaller device model (the virtualized devices available to the VM). Otherwise a microVM provides all the same containment features of a VM. In the case of Firecracker there are only 4 emulated devices: virtio-net, virtio-block, serial console, and a 1-button keyboard controller used only to stop the microVM. Also the kernel is started by executing Linux as an elf64 executable, not bootstrapped by firmware.
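To make the small device model above a bit more concrete, here is a minimal Python sketch of the API calls one might send to a Firecracker process to describe a microVM: a boot source (the kernel image started directly, with no firmware), one virtio-block drive, one virtio-net interface, and a start action. The endpoint paths and field names are written from my reading of the Firecracker API and may differ between versions, so treat them as assumptions and check the current API specification. The function name `microvm_requests`, the file paths, the boot arguments, and the `tap0` device are made up for illustration. The sketch only builds the requests; it does not open a socket or start anything.

```python
import json


def microvm_requests(kernel_path, rootfs_path, tap_device, vcpus=1, mem_mib=128):
    """Build the ordered (method, path, body) calls describing one microVM.

    Assumption: paths and field names follow the Firecracker REST API as
    the author of this sketch understands it; verify against the API spec.
    """
    return [
        # Size of the guest: vCPU count and memory in MiB.
        ("PUT", "/machine-config", {"vcpu_count": vcpus, "mem_size_mib": mem_mib}),
        # The kernel is loaded directly as an executable image, no BIOS/UEFI.
        ("PUT", "/boot-source", {
            "kernel_image_path": kernel_path,
            "boot_args": "console=ttyS0 reboot=k panic=1 pci=off",  # illustrative
        }),
        # One virtio-block device, used as the root filesystem.
        ("PUT", "/drives/rootfs", {
            "drive_id": "rootfs",
            "path_on_host": rootfs_path,
            "is_root_device": True,
            "is_read_only": False,
        }),
        # One virtio-net device backed by a host tap interface.
        ("PUT", "/network-interfaces/eth0", {
            "iface_id": "eth0",
            "host_dev_name": tap_device,
        }),
        # Boot the guest once everything above is configured.
        ("PUT", "/actions", {"action_type": "InstanceStart"}),
    ]


if __name__ == "__main__":
    # Hypothetical paths and device name, printed rather than sent.
    for method, path, body in microvm_requests("vmlinux", "rootfs.ext4", "tap0"):
        print(method, path, json.dumps(body))
```

In a real setup each tuple would be sent in order as an HTTP request over the VMM’s Unix domain API socket. The point of the sketch is how short the list is: there is nothing to configure beyond a kernel, a block device, a network device, and the machine size.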
Thank you! That helps clear things up.
Interesting that they open their technology. Usually AWS is not doing that. I wonder what they hope to gain from that. Simply attracting tech talent?
This is a common mis-perception. Try validating said perception. You’ll find it’s unfounded. AWS contributes to OSS a LOT.
This is the second announcement in two weeks of a major OSS project from us (AWS).
#include <I_do_not_speak_for-AWS.h>
They contribute but they usually do not start open source projects. That is a different thing IMO.
Disagree. This is a counter-example, as are others that can be seen here.
Admittedly some of them are OSS projects that work with or are in support of AWS products, but my point still stands.
Really love that we’re seeing more announcements like this. Despite what some would think Amazon is not the enemy of open source. We’re built on it and have always given back, albeit quietly.
There’s some discussion of this announcement in this thread.
So many hosting solutions around vms, small/light/micro-vms, containers, lambdas… and I still feel the deployment processes and tooling is still far from good enough.
“called microVMs, which provide enhanced security and workload isolation over traditional VMs”
“uses the Linux Kernel-based Virtual Machine (KVM) to create and manage microVMs”
That hasn’t been secure so far. At least they’re making smaller VM’s.
The devil’s in the details. Do a little bit of detail diving here and I think you’ll see they’re at least thinking very hard about the common problems in this space.
Ok, since I’m on my PC, I can dig up my links. Been following secure virtualization going back to IBM Kernelized VM/370. Certain practices consistently got good results in NSA pentests. Others, esp UNIX-based systems, got shredded with more problems over time. Most important was a tiny foundation enforcing security with rigorous assurance that it did exactly what it said with little to no leaks. Examples, focusing on architecture/rigor, include VAX VMM Security Kernel (see Layered Design and Assurance sections), Nizza Architecture that Genode is based on, this work which is either INTEGRITY-178B or VxWorks MILS, NOVA microhypervisor, and Muen SK. All of these were built by teams big companies could afford, with several being open source for building on and/or commercially available for license. During evaluations, this design style had lower numbers of vulnerabilities and leaks along with better containment of user-mode problems. So, it’s the lens I look through when evaluating VMMs.
This work starts out by saying it uses Linux and KVM. Unlike the above, Linux has a huge pile of code in kernel mode with a non-rigorous development process that high-assurance security predicted would lead to piles of bugs, a portion of them vulnerabilities in kernel mode (read: worst-case scenario). Here’s a recent look (slides) at the state of Linux security if you want to see how that panned out. Although Linux integration disqualifies KVM immediately, I did look for attempts by high-security folks at securing KVM or just reducing complexity (step 1). I found this, which rearchitected it to reduce the privilege of some components without negative impact on the Trusted Computing Base (TCB). So, it’s feasible.
If you looked into the details already, you can quickly assess the security by just looking to see if they broke Linux, KVM, and their additions into deprivileged pieces running on a separation kernel or VMM, then applied full-coverage testing, a covert channel analysis, static analysis by good tools, and fuzzing, on both individual components and models of interactions with careful structuring. Each of these usually finds bugs or leaks, often different ones. If you don’t see these things, then it must be assumed insecure by default until proven otherwise with rigorous methods that were themselves repeatedly proven in the field. And even then, it’s not secure so much as having a lower probability of failure or severe failures over time, maybe none if lucky.
I hope someone surprises me with evidence Firecracker could pass an EAL6+ evaluation at High Robustness. Although usually full of jargon, these slides use more English than jargon describing evaluation requirements with some examples of how NSA/labs rated different things. High Robustness adds a bunch more but I couldn’t find a non-jargon link in the time I had. Just imagine a higher bar. :)
Been thinking about your response, and I have some questions.
I am not a security expert, and I am so clueless I don’t even know what an EAL6+ IS so I am in no way challenging you, but I am wondering:
Do your comments take the intended use case for Firecracker into account? These aren’t traditional ‘heavyweight’ VMs that are intended to be long running. They’re intended to be used in serverless, where each VM spawns with a single ‘function’ or maybe application running, and the VM lasts for the lifetime of the function invocation or application run, and then evaporates.
My extremely naive understanding is that a lot of the security problems around VMs stem from using inherent vulnerabilities in virtualization architectures to get privesc in a VM and then being able to control whatever resources are running in there, but how useful could that actually be if the VM say only lives for the lifetime of a single HTTP response?
I know so little about container tech and other highly-hyped products that I have to be careful commenting on them. There are levels of analysis to do for tech security: general patterns that seem to always lead to success or failure; attributes of the specific work with a range of detail. I was using the lens of general patterns of what prevented and led to vulnerabilities when looking at details of the work. I saw two risky components immediately. The TCB principle of security means the solution is only as secure as its dependencies and how it uses them. That’s before the solution itself. I don’t know much about the use case, the new stuff on top, etc. I just know the TCB poisoned it from the start if aiming for high security.
“but how useful could that actually be if the VM say only lives for the lifetime of a single HTTP response?”
Well, the concern was the underlying primitives could be vulnerable. So, the enemy uses one or more sessions to exploit them, the exploit escapes the isolation mechanism, the malicious code now has read (side channel) or write (backdoor) access to all functions, and it can bypass their security. Their security might mean reading sensitive data, corrupting the data, or feeding malicious payloads to any clients trusting the service (it said HTTPS!). What risk there is for a given application, HTTP response, and so on varies considerably. It will be acceptable for many users as cloud use in general shows.
The number of people using these platforms means they either don’t know about the risk, don’t care, or find the current ones acceptable in their cost-benefit analysis. That last part is why I bring up alternatives that were more secure to better inform the cost-benefit analysis. An easy example is OpenBSD for firewalls or other sensitive stuff. Genua uses it if someone wants shrink-wrap. It rarely gets hacked. So, you can focus on your application security. Firecracker uses Rust plus careful design to reduce application security risks. End-to-end Signal instead of text messages is another easy one. These are examples of blocking entire classes of problems with high probability using general-purpose techniques that (a) are known to work and (b) don’t raise costs prohibitively.
For separation kernels, they used to cost five to six digits to license depending on organization but this is Amazon, right? Could probably hire experts to build one like Microsoft, IBM, and some small firms did. Or just buy the owner’s company getting a start on a secure VMM plus a bunch of other IP they can keep licensing/using. ;)
“The number of people using these platforms means they either don’t know about the risk, don’t care, or find the current ones acceptable in their cost-benefit analysis.”
Yes. I think this is the key. The risks are acceptable given most end users’ cost/benefit analysis. Security is on a sliding scale that balances out against usability/convenience. Firecracker would appear to solve a real problem people are having in that prior VM implementations were too heavyweight to be used in any kind of serverless context.
The fact that, comparatively speaking, these VMs are, if you’re correct, relatively insecure should definitely be kept in mind, but that doesn’t IMO lessen the perceived benefit of being able to have considerably more isolation than previous container technology provided without the traditional startup / shutdown time that made VMs a deal breaker in this particular context.
So, to re-iterate, I’m not arguing with you, I’m asking that you consider whether or not your reservations about the security of this technology might be more or less useful in evaluating whether it makes sense to adopt this technology given the very particular context and use case it’s trying to support.
I figured we were just having a discussion rather than arguing since your wording was kind. :)
There are two angles to that, which stem from the fact that Amazon isn’t telling its customers that other tech exists that’s way more secure, they’re ignoring it on purpose to maximize every penny of profit, they’re pushing insecure foundations for critical apps, and trying to grab markets’ worth of money out of that which, again, won’t improve the foundations. If customers hear that, would they (or those use cases):
1. Be cool with that in general and still buy Amazon happily?
2. Think that’s fucked up before:
2.1. Buying something secure based on their recommendation after telling Amazon they lost business cuz of this. Update their offerings.
2.2. Grudgingly buy Amazon due to attributes it has, esp price or a needed capability, that the more secure offerings don’t have.
I’m very sure there’s a massive pile of people that will do No 1. I’m just also pretty sure there are people that will do either 2.1 or 2.2, based on the fact that companies and FOSS projects with security-focused software still have customers/users. I can’t tell you how many. I can tell you it’s a regrettably tiny number, so much so that some high-security vendors go out of business or withdraw secure offerings each year. Some stick around with undisclosed numbers. (shrugs)
I thought it was especially important to bring up the possibility of No 2. The existence of high-security techniques that small, specialist teams can afford is something unfamiliar to most of the market. My feedback on HN and Lobsters, highly-technical forums, corroborates that. You can bet the big companies did that on purpose, too, for their own profit maximization. So, that’s where folks like me come in, letting product managers, developers, and users know there were alternative methods. Then, they get to do something they reasonably couldn’t before: make an informed choice based on the truth, not lies. An example of that happening, probably more for predictable performance, was the shift of some of the cloud market to SoftLayer and other bare metal hosting despite good VMs being cheaper.
They might make the same choice, esp if market doesn’t have better offering. At least they know, though, that maybe dedicated hosting w/ security-focused stacks makes more sense if they care about security. Many of the micro- and separation-kernels supported POSIX or Linux VM’s, too. So, they could even use the untrusted, but minimalist, stuff to reuse legacy code/apps even if it wasn’t good at stopping malicious neighbors. :)
I think they mean that their approach is more secure than containers? Or isn’t actually that the case?
That’s what they’re asserting, and from where I sit it’s very likely. All you have to do is read anything written by Dan Walsh in the last few years to understand that securing workloads in container environments is entirely possible, but decidedly non-trivial, because of the relative lack of isolation you’re actually getting from containers when they run in your garden variety Linux system.
Dan has long been a proponent of container hosts running things like SELinux to increase the isolation potential, but as anyone who’s ever tried turning it on knows, administration of SELinux systems can be challenging for the uninitiated.
That could be true.
This is very exciting and I’ll have to make some time to play with it!
My background is in cloud hosting, but my last couple of jobs have been in different spaces and involved building internal multi-tenanted platforms (read: Kubernetes, at the moment). This sort of thing is probably overkill for those situations - which is fine, because we need good tools in the hosting space and this looks very promising, just for a different problem set to my own.
Purely out of curiosity, though, does anyone have a significant use-case for this outside the hosting business? Is anyone here looking at this and thinking “hey, that’s exactly what we need”?