1. 1

    How is FreeBSD’s Linux emulation these days? Is anyone running Linux-based Docker/OCI containers on FreeBSD in production (without running Linux itself through virtualization)?

    1. 4

      It’s improved a lot in 13. It doesn’t support seccomp-bpf though, so you can’t run the Linux container management programs. It’s probably good enough to run a lot of Linux containers in jails, but the orchestration code isn’t there yet.

      1. 3

        Stupid not a FreeBSD user question: Why would you do that? Aren’t jails the moral equivalent? Or would one want to run Docker for simple software distribution convenience purposes?

        1. 4

          Docker / OCI containers in common use conflate a bunch of things:

          • A way of distributing a self-contained userspace thing.
          • A way of building a self-contained userspace thing using layers of filesystem overlay.
          • A way of orchestrating local deployment of self-contained userspace things.
          • A way of isolating self-contained userspace things.

          Jails provide the fourth of these (in a significantly lower-overhead way than the horrible mess of cgroups and seccomp-bpf on Linux), but they don’t provide any of the other bits. Between ZFS and jails, FreeBSD has great mechanisms for running container-like things, but doesn’t yet have good tooling for building and deploying them.

          The containerd port and runj program linked by @kwait are likely to end up with the right things here. That should make it possible to build layers, package them up and deploy them on FreeBSD. The bit that isn’t currently getting any investment is making runj able to deploy containers that are packaged as Linux binaries running on the FreeBSD Linux ABI layer.

          1. 2

            There is also Bastille, which looks pretty nice. IIUC it builds on FreeBSD jails and takes care of the distribution and deployment aspect (your first point).

            1. 2

              in a significantly lower-overhead way than the horrible mess of cgroups and seccomp-bpf on Linux

              As I understand it, the main Linux facility for this kind of isolation is namespaces. I’m not sure how seccomp-bpf found its way into container tools, but presumably “for extra security”.

              Namespaces should have the same kind of overhead (basically none) as jails. The main difference is that namespace API is additive (you tell it “isolate PIDs” then “isolate the networking” and so on, building the sandbox piece by piece) while the jails API is subtractive (you kinda just start with a package deal of full isolation, but you can opt out of some parts – set the FS root to /, or the networking to the host stack). Namespaces are more flexible, but much harder to use securely.
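
              To make the additive style concrete, here’s a minimal Go sketch (Linux-only, needs root; /bin/sh is just a stand-in payload): each CLONE_NEW* flag opts in to one more kind of isolation, whereas a jail receives its whole definition in a single jail_set(2) call.

                  package main

                  import (
                      "os"
                      "os/exec"
                      "syscall"
                  )

                  func main() {
                      cmd := exec.Command("/bin/sh")
                      cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
                      cmd.SysProcAttr = &syscall.SysProcAttr{
                          // Start from "no isolation" and add pieces one at a time.
                          Cloneflags: syscall.CLONE_NEWPID | // isolate PIDs
                              syscall.CLONE_NEWNS | // isolate mounts
                              syscall.CLONE_NEWNET | // isolate networking
                              syscall.CLONE_NEWUTS, // isolate hostname
                      }
                      if err := cmd.Run(); err != nil {
                          panic(err)
                      }
                  }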

            2. 3

              It would be nice to be able to run the FreeBSD kernel, to have ZFS without entering a licensing gray area if nothing else, while being able to take advantage of both all the software available for Linux and the accumulated tooling and practices around Docker (especially when it comes to building container images). Since a Docker image is basically a JSON manifest and a bunch of tarballs, maybe it wouldn’t be too hard to write a tool that could fetch and unpack a Docker image and run it in a FreeBSD jail.
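
              As a rough feasibility sketch (not a real tool): given the manifest.json and layer tarballs from a `docker save`-style archive, unpacking and running in a jail is mostly tar plus jail(8). The paths and image layout here are assumptions, and real images would also need whiteout handling and registry fetching.

                  package main

                  import (
                      "encoding/json"
                      "os"
                      "os/exec"
                  )

                  // manifest.json inside a `docker save` archive is a list of entries like this.
                  type manifestEntry struct {
                      Layers []string `json:"Layers"`
                  }

                  func main() {
                      rootfs := "/jails/demo" // assumption: an empty target directory

                      data, err := os.ReadFile("manifest.json")
                      if err != nil {
                          panic(err)
                      }
                      var entries []manifestEntry
                      if err := json.Unmarshal(data, &entries); err != nil {
                          panic(err)
                      }
                      if len(entries) == 0 {
                          panic("empty manifest")
                      }

                      // Apply each layer tarball in order; later layers overwrite earlier ones.
                      for _, layer := range entries[0].Layers {
                          if out, err := exec.Command("tar", "-xf", layer, "-C", rootfs).CombinedOutput(); err != nil {
                              panic(string(out))
                          }
                      }

                      // Start a jail on the unpacked root (requires root).
                      cmd := exec.Command("jail", "-c", "name=demo", "path="+rootfs, "command=/bin/sh")
                      cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
                      if err := cmd.Run(); err != nil {
                          panic(err)
                      }
                  }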

                1. 2

                  That is really, really rad. I need this in my life.

                  1. 4

                    runj is my project! It’s nice to see other folks excited about it. There’s some previous discussion here. No Linux support yet; I’m focusing on a FreeBSD userland first.

                2. 3

                  Given that zfs is now deployed on millions of Ubuntu installs around the world I’m not sure how much weight I’d place on said gray area.

                  YMMV however.

                  1. 2

                    I just never understood this. The CDDL is not compatible with the GPL, and this prevents ZFS from being part of the same codebase, though it can be installed as an external module, and from what I understand ZFS on Linux works fine.

                    What’s the legal grey area here? How is this different from installing an MIT (or any other GPL-incompatible) licensed project on your Ubuntu machine?

                    1. 5

                      As I understand it (I am not a lawyer, this is not legal advice), the issue comes from a bunch of different things:

                      First, the GPL says that any software derived from GPL’d software must impose the conditions of the GPL on the combined work and may not impose any additional conditions. This is usually paraphrased as saying that it must be GPL’d, but that’s not actually the case. It’s fine to ship a BSD-licensed file as part of a GPL’d project. The GPL also talks about ‘mere aggregation’. Just distributing two programs on the same medium is explicitly excluded from the GPL, but linking them may trigger the license terms.

                      Second, there’s a bit of a grey area about exactly what the GPL applies to in Linux. Linux is distributed under GPLv2, but the source tree includes a note written by Linus (which is not part of the license and not written as a legal document) that the GPL obviously doesn’t apply across the system call boundary. Some internal kernel symbols are also exposed as public non-GPL-propagating symbols, but that is not actually part of the license. To make this more fun, some bits of code in the kernel were released as GPL’d code elsewhere and then added to the Linux kernel, so it’s possible for the copyright holders of this code to assert that they don’t believe that these exemptions apply to their code. This is somewhat moot for ZFS because it uses GPL-tainted kernel symbols.

                      Third, the GPL is a distribution license. This means that there are only two things that it can prevent you from doing:

                      • Distributing a GPL’d project
                      • Distributing something that is a derived work of a GPL’d project.

                      Typically, the work-around that companies such as nVidia use is to write a driver that is developed completely independently of the Linux kernel and is therefore not a derived work of the Linux kernel, then write a shim layer that is a derived work of the Linux kernel and is able to load their non-GPL’d driver. They cannot distribute the two together (because the GPL would kick in and prevent distribution of a thing where the combined work does not grant all of the permissions found in the GPL), but they can distribute their own code (they own it) and the shim (by itself, it is GPL compliant). A customer can then acquire both and link them together (the GPL is explicitly not a user license: once you have received the code you are free to use it in any way, including linking it with things where you are not permitted to release the result).

                      So using ZFS on Linux is fine; the tricky bit is how you distribute the CDDL’d component and the Linux kernel together.

                      My general view of Linux legal questions is that the vast majority of users are doing something that could be regarded as a violation of the license but no one with standing to sue has any incentive to torpedo the ecosystem.

                      1. 1

                        Thanks for the detailed answer. This clears it up. 🙏

            1. 8

              Oh hey this is my project! I wrote up a little explanatory blog post since I didn’t really expect runj to get quite this much attention so quickly.

              1. 2

                Seems really quite polished for something that’s not quite ready for prime time :)

                1. 1

                  Thanks! I’d love feedback if you get the chance to try it out.

                  1. 1

                    I’d like to see a simpler build process. I realize that Dockerfile compatibility is a pretty lofty goal, but all in all, Dockerfile is a work of art – regardless of what you think of the entire Docker ecosystem. It’s extremely simple for the end user to Get Things Done.

                    1. 1

                      Thanks for the feedback! As an OCI runtime, runj is intentionally limited and pretty low in the stack; the idea is that higher-level tools can be built on top of runj for the kinds of use-cases you’re describing. This is similar to runc (the reference OCI implementation), which both containerd and Docker use in order to implement the container primitives on Linux.

                      I’ve been working on a containerd shim and porting containerd to FreeBSD to start building up the pieces of the puzzle. I don’t know that I’ll try to port Docker itself (it’s a very large codebase), but nerdctl is a command-line utility that exposes a very similar interface to Docker and has the ability to build images from a Dockerfile. It’d be neat to get all of these pieces working together.

                2. 2

                  The existence of this project makes me unreasonably happy. Please feel free to reach out to me if you’re confused by any FreeBSD idiosyncrasies (jails, capsicum, firewalls, packages, whatever).

                  1. 1

                    Thanks! I’m brand-new to FreeBSD and trying to learn as I go. I appreciate the offer; a few other folks have made similar offers and one has already added runj to the ports tree.

                    If you have the chance to take a look at the codebase for runj and have any feedback on how it leverages the existing components of FreeBSD (like my very basic jail.conf or which image I’m testing with), I’d love to hear your suggestions.

                    1. 2

                      I’m a bit surprised that you’re using a jail.conf at all. I would have expected this to be an alternative jail front end, rather than building on top of the existing command-line tool, which is a very thin wrapper around a small set of system calls. The jail tool is really designed around managing a small set of static jails, which is quite at odds with an OCI environment where jails are dynamically created, configured, and destroyed. This approach may work, but I suspect you’ll hit impedance mismatches.

                      If you look at the jail tool’s source code, you’ll see that a huge amount of the code is related to parsing commands and config files and a fairly small amount of it is actually managing the jails. The jexec tool actually does most of what you want and is under two hundred lines of code.

                      If you have the jail-related system calls exposed into Go, then you probably have everything you need (until you get into firewall rules, then you need to deal with the fact that FreeBSD has 3 different firewalls…).
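
                      For illustration, a rough (untested) sketch of creating a persistent jail through jail_set(2) via golang.org/x/sys/unix — parameters are passed as pairs of iovecs, and the JAIL_CREATE constant and paths below are copied/invented from my reading of jail(2), so treat it as a starting point rather than working code:

                          package main

                          import (
                              "fmt"
                              "unsafe"

                              "golang.org/x/sys/unix"
                          )

                          const jailCreate = 0x01 // JAIL_CREATE from <sys/jail.h>

                          // iov wraps a byte slice in an iovec.
                          func iov(b []byte) unix.Iovec {
                              v := unix.Iovec{Base: &b[0]}
                              v.SetLen(len(b))
                              return v
                          }

                          func main() {
                              // Each parameter is two iovecs: a NUL-terminated name, then its value.
                              iovs := []unix.Iovec{
                                  iov([]byte("name\x00")), iov([]byte("demo\x00")),
                                  iov([]byte("path\x00")), iov([]byte("/jails/demo\x00")),
                                  iov([]byte("persist\x00")), {}, // boolean parameter: empty value
                              }
                              jid, _, errno := unix.Syscall(unix.SYS_JAIL_SET,
                                  uintptr(unsafe.Pointer(&iovs[0])), uintptr(len(iovs)), jailCreate)
                              if errno != 0 {
                                  panic(errno)
                              }
                              fmt.Println("created jail id", jid)
                          }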

                      1. 1

                        I picked the jail tool to start with as I had hoped it would be a faster way to get a proof-of-concept up and running, but the feedback I’m hearing from you and from others on Twitter makes it seem like I should switch sooner rather than later.

                        go-jailsbsd is a wrapper for the jail-related syscalls in Go, though it looks like it’s linking against the libc functions to do it rather than calling the syscalls directly (which is more typical in Go).

                        1. 2

                          I’d hope using the libc wrappers isn’t a show stopper. I personally dislike Go’s aversion to doing things the recommended way for their targets and I remember every single Go program ever built for macOS breaking as a result of this not too long ago. FreeBSD maintains a backwards-compatible ABI from the kernel but explicitly excludes some control-plane interfaces from this (e.g. network configuration). These are stable only within a major release and differences between them are typically hidden in the userspace libraries for code that links them. Various other bits of the Go ecosystem mean that it’s unlikely that anything written in Go will ever be adopted into the base system, which is a shame for something like runj - I’d love to have OCI container support out of the box on FreeBSD, though given that things like K8s will still need to come from ports it isn’t a huge concern.

                          There are also a few Go implementations of the userspace interfaces to the ZFS ioctls. You might take a look at Poudriere for some inspiration. It is a lot more specialised than a container runtime, but actually does almost everything that I think a container runtime needs to do:

                          • Creates ZFS clones from base images.
                          • Layers nullfs mounts over the top (not unionfs, it was unstable when Poudriere was originally built, and because it targets a specific narrow use case, it was able to design around the need for overlays).
                          • Creates jails that run on these FS trees.
                          • Runs commands inside those jails.
                          • Captures output from the commands run in the jails.
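
                          Roughly that sequence as a hedged Go sketch (FreeBSD, requires root; the dataset names and mount points are invented for illustration):

                              package main

                              import (
                                  "fmt"
                                  "os/exec"
                              )

                              func run(name string, args ...string) {
                                  if out, err := exec.Command(name, args...).CombinedOutput(); err != nil {
                                      panic(name + ": " + string(out))
                                  }
                              }

                              func main() {
                                  // 1. Clone a base-image snapshot (a cheap copy-on-write copy).
                                  run("zfs", "clone", "zroot/jails/base@clean", "zroot/jails/build1")

                                  // 2. Overlay an extra tree with a read-only nullfs mount.
                                  run("mount", "-t", "nullfs", "-o", "ro", "/usr/ports", "/zroot/jails/build1/usr/ports")

                                  // 3. Create a jail rooted on the cloned filesystem.
                                  run("jail", "-c", "name=build1", "path=/zroot/jails/build1", "persist")

                                  // 4. Run a command inside the jail and capture its output.
                                  out, err := exec.Command("jexec", "build1", "uname", "-a").CombinedOutput()
                                  if err != nil {
                                      panic(err)
                                  }
                                  fmt.Print(string(out))

                                  // 5. Tear down.
                                  run("jail", "-r", "build1")
                                  run("umount", "/zroot/jails/build1/usr/ports")
                                  run("zfs", "destroy", "zroot/jails/build1")
                              }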

                          The main thing that it’s missing (aside from providing generic configuration over the top) is any form of network configuration. The jails that Poudriere manages have simple network config so that they can fetch packages, but they’re assumed to be non-malicious so the host isn’t protected.

                          The other thing that’s less generally important but perhaps more interesting is that Poudriere always sets up jails with a FreeBSD userland. It would be very interesting if runj could run Linux containers when the Linux compat ABI is installed; that would give FreeBSD users a much easier way of running some Linux things than trying to get the CentOS userland packages working (and failing because things need newer packages and then having to build their own libstdc++ and so on…).

                          1. 1

                            I’d hope using the libc wrappers isn’t a show stopper. I personally dislike Go’s aversion to doing things the recommended way for their targets and I remember every single Go program ever built for macOS breaking as a result of this not too long ago. FreeBSD maintains a backwards-compatible ABI from the kernel but explicitly excludes some control-plane interfaces from this (e.g. network configuration). These are stable only within a major release and differences between them are typically hidden in the userspace libraries for code that links them.

                            Definitely not a show-stopper, just something I haven’t prioritized figuring out yet since the jail tool was enough for me to get started. I remember reading that Go 1.16 will stop making direct syscalls on OpenBSD and it sounds like similar reasons apply here.

                            Various other bits of the Go ecosystem mean that it’s unlikely that anything written in Go will ever be adopted into the base system, which is a shame for something like runj - I’d love to have OCI container support out of the box on FreeBSD, though given that things like K8s will still need to come from ports it isn’t a huge concern.

                            I’d love to learn more about what’s stopping Go from being adopted into the base system; I’m still very new to FreeBSD and don’t have that context yet.

                            • Creates ZFS clones from base images.
                            • Layers nullfs mounts over the top (not unionfs, it was unstable when Poudriere was originally built, and because it targets a specific narrow use case, it was able to design around the need for overlays).

                            I’ve updated containerd to use nullfs mounts for the “native” snapshotter as the simplest implementation based on copy-ahead semantics. containerd has a ZFS snapshotter too, but I haven’t tried it yet since I already have “native” working – similar to the use of the jail command I was first looking to get the simplest thing running before going back and optimizing each component.

                            It would be very interesting if runj could run Linux containers when the Linux compat ABI is installed; that would give FreeBSD users a much easier way of running some Linux things

                            Almost everyone I’ve talked to about runj prior to me posting it on the Internet had said something similar. I’m focusing first on a regular FreeBSD userland because that’s more interesting to me, but I don’t really see any major reasons why you couldn’t use runj as part of a system to run Linux container images. It might even work today if you have a bundle with the Linux compat ABI installed.

                            1. 3

                              I’d love to learn more about what’s stopping Go from being adopted into the base system; I’m still very new to FreeBSD and don’t have that context yet.

                              The big one is toolchain stability and portability. The Plan 9-derived Go toolchain doesn’t support all of the architectures that FreeBSD supports. Even if it did support all of the architectures that FreeBSD supports today, the fact that it uses a different back end from the LLVM one used by the other base-system compilers means that it’s likely to be an obstacle to bringing up new architectures or even new ABIs or variants in existing ones. For example, FreeBSD is likely to upstream CHERI support soon, but Go is unlikely to support any CHERI architectures for a while. That means a base system tool written in Go would not be able to adopt the same hardware-security features as the rest of the system.

                              Secondly, a major release series of FreeBSD is supported for around five years. Minor releases now generally do bring in newer versions of clang / lld, because they come with new features that are useful, but a new version of clang is expected to be able to compile all of the C/C++ code that the previous version supported (if it can’t then it’s a bug, and if a new clang can’t build the FreeBSD base system then that will be a release blocker for that LLVM release until it’s fixed). Bringing in a new toolchain doesn’t cause any code churn anywhere else in the tree. As I recall, that’s not a guarantee that you get from Go; you may need to run go fix and clean things up.

                              That said, in the context of the rest of the OCI infrastructure, you probably wouldn’t want everything in the base system, because there’s a sufficiently high level of churn that providing 5-year stability guarantees in the interfaces is not (yet?) feasible. It’s more important that it’s easy to install from packages and that it’s easy to build a VM image that has an up-to-date install of everything.

                              I’ve updated containerd to use nullfs mounts for the “native” snapshotter as the simplest implementation based on copy-ahead semantics. containerd has a ZFS snapshotter too, but I haven’t tried it yet since I already have “native” working – similar to the use of the jail command I was first looking to get the simplest thing running before going back and optimizing each component.

                              Having ZFS working out-of-the-box is one of the selling features of FreeBSD and it’s over a decade since I ran a FreeBSD machine with anything else. Poudriere leans heavily into ZFS features for a lot of things. Trying to have UFS fall-back paths is one of the things that’s held back a lot of things in the FreeBSD world and prevented it from capitalising on the sorts of things that ZFS enables. For something like runj, taking a hard dependency on ZFS is probably not a problem.

                              Almost everyone I’ve talked to about runj prior to me posting it on the Internet had said something similar. I’m focusing first on a regular FreeBSD userland because that’s more interesting to me, but I don’t really see any major reasons why you couldn’t use runj as part of a system to run Linux container images. It might even work today if you have a bundle with the Linux compat ABI installed.

                              I don’t know what happens now with runj if you try to load a Linux container - does it parse the Linux options and understand them? It probably needs to do things like mount linprocfs inside the jail, rather than procfs, for example, and may need to load the linux / linux64 kernel module if it isn’t already loaded. For a lot of Python stuff, which is built against an ancient CentOS so that it works on any vaguely modern glibc-based Linux, the FreeBSD Linux compat layer should work fine and that would be really useful. Solaris’ Docker port decided to make the Linux ABI their default container ABI, but I wouldn’t suggest that FreeBSD do the same because the Linuxulator is not quite good enough to expect everything to work (plus you’d lose out on some of the benefits of FreeBSD, such as kqueue and Capsicum - given the choice, I’d write code against the FreeBSD syscall interface and not the Linux one, even if the Linuxulator were 100% compatible).
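
                              For concreteness, the host-side preparation would probably look something like this hedged sketch (untested; the paths are assumptions, requires root):

                                  package main

                                  import "os/exec"

                                  func run(name string, args ...string) {
                                      if out, err := exec.Command(name, args...).CombinedOutput(); err != nil {
                                          panic(name + ": " + string(out))
                                      }
                                  }

                                  func main() {
                                      rootfs := "/jails/linuxdemo" // assumption: a Linux userland unpacked here

                                      // Load the 64-bit Linuxulator if it isn't loaded already.
                                      run("kldload", "-n", "linux64")

                                      // Mount linprocfs (not procfs) where Linux binaries expect /proc,
                                      // plus devfs, before creating the jail on this root.
                                      run("mount", "-t", "linprocfs", "linprocfs", rootfs+"/proc")
                                      run("mount", "-t", "devfs", "devfs", rootfs+"/dev")
                                  }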

                              1. 1

                                Thanks for all that background about FreeBSD and Go, that was super interesting.

                                For something like runj, taking a hard dependency on ZFS is probably not a problem.

                                The way the OCI runtime spec is written, a runtime should expect the root filesystem for the container to be mounted at a path described in the bundle, but no specific filesystem is required. The containerd ZFS snapshotter is what I’d want to use to provide the copy-on-write layer functionality when running jails via containerd.
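
                                Concretely, the only filesystem-related thing the runtime reads is root.path from the bundle’s config.json; a minimal sketch (field names per the OCI runtime spec, error handling kept simple):

                                    package main

                                    import (
                                        "encoding/json"
                                        "fmt"
                                        "os"
                                        "path/filepath"
                                    )

                                    type ociConfig struct {
                                        Root struct {
                                            Path     string `json:"path"`
                                            Readonly bool   `json:"readonly,omitempty"`
                                        } `json:"root"`
                                    }

                                    // rootfsPath returns the container root for a bundle directory;
                                    // per the spec, a relative root.path is resolved against the bundle.
                                    func rootfsPath(bundle string) (string, error) {
                                        data, err := os.ReadFile(filepath.Join(bundle, "config.json"))
                                        if err != nil {
                                            return "", err
                                        }
                                        var cfg ociConfig
                                        if err := json.Unmarshal(data, &cfg); err != nil {
                                            return "", err
                                        }
                                        if !filepath.IsAbs(cfg.Root.Path) {
                                            return filepath.Join(bundle, cfg.Root.Path), nil
                                        }
                                        return cfg.Root.Path, nil
                                    }

                                    func main() {
                                        path, err := rootfsPath(".")
                                        if err != nil {
                                            panic(err)
                                        }
                                        fmt.Println("rootfs:", path)
                                    }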

                                I don’t know what happens now with runj if you try to load a Linux container - does it parse the Linux options and understand them?

                                No, not yet. It sounds like there’s additional work that’ll need to be done to get this to work, but that it’s not insurmountable.

                                Solaris’ Docker port decided to make the Linux ABI their default container ABI, but I wouldn’t suggest that FreeBSD do the same

                                I agree. I’m a lot more interested in FreeBSD being the default environment here, as if I want to run Linux containers I’d typically run Linux instead of emulating it on top of another operating system.

                1. 3

                  I think this is a case where the traditional packaging model falls apart: first, because the model makes it hard to programmatically generate package definitions; second, because multiple versions of a package typically conflict (unless you make the version part of the package name).

                  This is less of a problem for functional package managers, because their use of Turing-complete languages (Nix, Scheme) makes it easier to generate package definitions programmatically. Secondly, since they permit storing multiple versions of a package side by side, many versions can be installed without any conflicts.

                  See e.g. crate2nix (which generates Nix from Cargo.toml/Cargo.lock files), bundix, yarn2nix, etc.

                  1. 4

                    This is a solution. But a shallow one frankly.

                    In traditional distribution management, security issues are solved by fixing one part of the puzzle. With the hermetic ones you are patching each and every dependent package individually. And who knows which version they use? Rust publishes security advisories and provides guidance to project maintainers. But who else does this? And do maintainers follow them and actively update their project dependencies?

                    Would a project publish patch releases just to fix security issues in dependent packages?

                    Nix/Guix/Docker and other hermetic build systems push the patching down to the respective upstreams because the tools to bridge this are not available. You can’t programmatically inspect the vendored dependencies and figure this out. The tooling doesn’t exist.

                    I don’t think it solves anything.

                    1. 2

                      I don’t think it solves anything.

                      First, it solves the issue that you can manage these programs using your system package manager, rather than relying on curl | bash, Docker containers, etc. As a result, it provides better integration (e.g. management of services through systemd units, etc.).

                      Also, similar to how you can patch dependencies for security issues in a traditional package manager, you can apply patches to specific versions. Most functional package sets provide override mechanisms. These are already used, e.g., in the Rust openssl crate derivation to provide the native OpenSSL library as a dependency (since this information is not provided by the Cargo metadata). This mechanism could be extended easily to override certain packages or package versions to apply an additional patch.

                      1. 2

                        First, it solves the issue that you can manage these programs using your system package manager, rather than relying on curl | bash, Docker containers, etc. As a result, it provides better integration (e.g. management of services through systemd units, etc.).

                        That is the job for any package manager though. There is nothing inherently special here.

                        Also, similar to how you can patch dependencies for security issues in a traditional package manager, you can apply patches to specific versions. Most functional package sets provide override mechanisms. These are already used, e.g., in the Rust openssl crate derivation to provide the native OpenSSL library as a dependency. This mechanism could be extended easily to override certain packages or package versions to apply an additional patch.

                        I think you are missing the point though. The problem is devendoring the project and/or keeping track of the packages, along with patching them. Remember, there are multiple copies of the library on your system. OpenSSL is a fairly simple case since it’s fairly trivial to provide multiple versions on a system level. What is not easy is to keep track of 3 different versions of a library inside node_modules and ensure they’re patched.

                        Same applies for Go. Here is containerd depending on 4-5 different versions of runc(!)

                        https://github.com/containerd/containerd/blob/master/go.sum#L403-L408

                        1. 1

                          Same applies for Go. Here is containerd depending on 4-5 different versions of runc(!) https://github.com/containerd/containerd/blob/master/go.sum#L403-L408

                          containerd depends on a single version of runc. You can also see there’s a single copy in the vendor folder.

                          go.sum contains information other than the modules currently in-use:

                          Is ‘go.sum’ a lock file? Why does ‘go.sum’ include information for module versions I am no longer using?

                          No, go.sum is not a lock file. The go.mod files in a build provide enough information for 100% reproducible builds.

                          For validation purposes, go.sum contains the expected cryptographic checksums of the content of specific module versions. See the FAQ below for more details on go.sum (including why you typically should check in go.sum) as well as the “Module downloading and verification” section in the tip documentation.

                          In part because go.sum is not a lock file, it retains cryptographic checksums for module versions even after you stop using a module or particular module version. This allows validation of the checksums if you later resume using something, which provides additional safety.

                          In addition, your module’s go.sum records checksums for all direct and indirect dependencies used in a build (and hence your go.sum will frequently have more modules listed than your go.mod).

                          (see the Go modules FAQ)
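
                          If you’d rather not eyeball go.sum at all, here’s a small sketch using golang.org/x/mod/modfile that prints the one version go.mod actually selects (run it next to containerd’s go.mod; the module path is the only assumption):

                              package main

                              import (
                                  "fmt"
                                  "os"

                                  "golang.org/x/mod/modfile"
                              )

                              func main() {
                                  data, err := os.ReadFile("go.mod")
                                  if err != nil {
                                      panic(err)
                                  }
                                  f, err := modfile.Parse("go.mod", data, nil)
                                  if err != nil {
                                      panic(err)
                                  }
                                  // go.mod lists exactly one required version per module path.
                                  for _, r := range f.Require {
                                      if r.Mod.Path == "github.com/opencontainers/runc" {
                                          fmt.Println("selected:", r.Mod.Version)
                                      }
                                  }
                              }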

                    2. 4

                      This is just sweeping the problem under the rug IMHO. I personally believe that the root problem is bad backward compatibility and crappy semantic versioning of software libraries. To this, add application developers who don’t try to understand the library and use private features which the library developer didn’t mean to expose through the API.

                      You end up with software requiring lib.2.3.5-beta.so and which will break with anything else.

                      Some projects have an obsession with backward compatibility; you rarely hear about an application having a “maximum Linux kernel version”, for example.

                      I guess software engineering is not a craft, and never was…

                      1. 5

                        I personally believe that the root problem is bad backward compatibility and crappy semantic versioning of software libraries.

                        That is not the origin of the problem that the article states, which is that a lot of software nowadays relies on, or vendors, a large number (hundreds or thousands) of dependencies. I think this has little to do with semantic versioning, and more to do with the fact that language-specific package managers make the cost of adding dependencies so low for developers.

                        Since C and C++ did not provide any standard package managers, Linux distributions effectively became the de facto C and C++ package managers. When you develop a C/C++ application, you are constrained to a common set of libraries provided by the major Linux distributions, or you or your users end up manually managing dependencies. Consequently, Linux distributions provided a fairly strong incentive to use a smaller set of widely-used dependencies and to conservatively develop against older versions (for Debian Stable, Ubuntu LTS, and RHEL compatibility).

                        But now that there are language-specific package managers with package repositories where anyone can upload libraries (crates.io, PyPI, etc.), the traditional constraints on the number and maximum version of dependencies are lifted. As a consequence, people just add more dependencies and more recent dependencies (e.g. many Rust crates do not compile with a one-year old Rust compiler).

                        Distributions cannot keep up with packaging all these dependencies in many incompatible versions, so other mechanisms for distribution are taking over, such as a single static binary, a Docker image, Flatpak, AppImage, or Snap. I wouldn’t be surprised if the role of mainstream distributions is reduced in a few years to: 1.) a lean OS to build container images; 2.) a lean server OS to run containers; 3.) a lean desktop OS where all the applications are distributed as Flatpaks.

                        I am not convinced this is better, but it is a transition that has happened over the past decade or so, and I think that ship has sailed.

                        To this, add application developers who don’t try to understand the library and use private features which the library developer didn’t mean to expose through the API.

                        Unless the downstream developers are forking the library, this seems more like a library design issue. Do not expose APIs that you do not want downstream users to use. Even though such vendoring + patching (effectively creating a fork) happens in some projects, this is AFAIK not the norm for e.g. vendoring in Go.

                        1. 2

                          many Rust crates do not compile with a one-year old Rust compiler

                          Ironically, crates may require newer Rust versions, because they want to use functionality recently added to the standard library. So it’s the opposite case — avoiding dependencies causes churn.

                    1. 4

                      Why is FreeBSD moving from an Apache licensed project to a GPL licensed one?

                      1. 7

                        There are multiple permissively licensed git implementations, including game of trees (which is slated to replace cvs for openbsd), and git9 (by @orib; admittedly probably not easily portable outside of plan9).

                        Since the format itself is not GPL’d, and the base system doesn’t need to ship with a git client, I expect it’s not considered a huge problem; and once game of trees is released it can be the officially-sanctioned client.

                        1. 6

                          As far as I know OpenBSD has no official plans to switch away from CVS.

                          1. 1

                            Am I the only one who thinks that this GPL-aversion is incredibly childish and cringe-worthy?

                            Projects can be good community neighbors or bad community neighbors, and in terms of free software, they are certainly poor neighbors.

                            1. 2

                              One of the goals of OpenBSD is that it’s “free” as in “free to do whatever you want with it”. This is a different definition of “free” than the one used in the GPL. There’s nothing wrong with the GPL definition as such, but having GPL code in the base system removes the OpenBSD definition of “free”, and I don’t think caring about that is either “childish” or “cringe-worthy”.

                              1. -4

                                I’d love to know what the ideological overlap is between the Blue-Lives-Matter/Anti-Medicare-For-All/Anti-Women’s-Rights crowd and BSD devs. They certainly give off the same “screw you, got mine” vibe.

                                1. 1

                                  lolwut? I don’t know if I should upvote this as it’s hilarious, or downvote as troll.

                                  1. 1

                                    From my perspective as a non-US-person, these groups feel pretty close in their regressive mindset and corporate worship.

                                    That’s why I was curious.

                                    1. 2

                                      If you consider OpenBSD an act of corporate worship, you have an unusual way of thinking. I struggle to see the similarity.

                                      1. 1

                                        Yes, I do.

                          2. 2

                            There’s a good discussion of the reasoning here.

                          1. 6

                            To briefly summarize, this was an attack on Red Hat’s model of backporting patches to older versions of software (also known as “enterprise support”). Red Hat at that time shipped a version of Docker that was just slightly behind the latest cut, whereas Docker shipped their latest.

                            I have never worked for either Docker or Red Hat, but my understanding was that the conflict was related primarily to Red Hat shipping patches in their build of Docker that were rejected upstream, such as this one about overriding the default registry, this one that adds partial audit logging to syslog, this one allowing hook specification to enable systemd integration, and this one for injecting RHEL licensing secrets to containers. Many distributions (including Amazon Linux, which I work on) backport bugfixes, but fewer distros include wholesale-rejected feature patches in their builds; I can understand an upstream project being concerned about a very different experience for users of those distro-maintained packages when the set of supported features are different. This list was last updated in 2016, but gives a bunch of background on the types of patches Red Hat carried at the time.

                            1. 4

                              Yeah it was pretty clear at the time that RedHat initiated the problem by essentially attempting to strongarm the Docker project into decisions they didn’t want to make. Not to mention that RedHat’s rejection of AUFS in their kernels forced Docker to support RedHat’s implementation of devicemapper as a filesystem driver… which was a complete mess for a long time, and made everyone’s user experience worse.

                            1. 1

                              At work, I’m continuing to work on building a variant of Bottlerocket for Amazon ECS (and writing bad Rust, as I am very much a beginner in the language).

                              Outside work, I’m starting to play Animal Crossing.

                              1. 16

                                On one side, I strongly agree with this. I use GCP and DigitalOcean often to outsource what I do.

                                On the other hand, I’m watching an entire community of people put out fires because they built their IT on a managed service which Apple bought and effectively terminated yesterday, causing people to wake up to entire fleets of devices with broken policies.

                                Like everything else in tech, there’s no right answer, rather it’s a set of tradeoffs someone has to make.

                                1. 2

                                  MASSIVE DISCLAIMER I WORK ON GOOGLE CLOUD

                                  I think there is definitely a difference between using AWS/Azure/GCP/AliCloud and a startup like Fleetsmith. I feel super sad for the people that got impacted, as that sunset is really bad (I know that GCP has a 1-year sunset for GA products). If you’re using, say, GKE for your k8s clusters, you can be confident that’s not going away.

                                  Yesterday I was trialing EKS (k8s) on AWS. I did not like the experience; I ended up abandoning the AWS-native method for a third-party tool called eksctl and it still took ~30m to provision a 2-node cluster. I cannot begin to imagine how one would self-host a k8s cluster.

                                  So yes, there are trade-offs, but I think there are definitely ways to mitigate them.

                                  P.S. Given the Fleetsmith turn-off, one great service going away that would keep me up at night is PagerDuty; there really is no product that I know of that is anywhere near as good.

                                  1. 3

                                    a difference between using AWS/Azure/GCP/AliCloud and a startup like Fleetsmith

                                    Is there though? So only use a big provider (AWS/GCP/Azure) for your startup project? No Digital Ocean/Vultr? Those are both fairly large shops with a lot of startups on them. But they’re also not too big to fail. Digital Ocean is offering more managed services (databases and k8s) but if they ever declared bankruptcy, your startup would be scrambling for another service (and could find itself with a much higher bill on the big three).

                                    I’d rather see more open-source management solutions for things like simple full-redundancy management for postgres or mysql. What I’ve found is that most shops that have this kind of tooling keep it under lock, and it’s proprietary/specific to their setup.

                                    I think managed services are bad due to cost and lock-in, and they’re also having the side effect of slowing innovation in better tooling that would let people self-host those same solutions.

                                    1. 2

                                      Yes, the loss of DigitalOcean in particular would be a huge blow to the ecosystem. Their documentation in particular is fabulous.

                                      I’m not sure I agree about lock-in, as long as you are judicious when it is a concern; e.g. Google Cloud SQL is just Postgres/MySQL with Google juju underneath to make it run on our infra. There’s nothing stopping you dumping your database at any time. The same goes for a service like Cloud Run where you’re just deploying a Docker container; you can take that anywhere too. But if you go all in on GCP BigQuery, then yeah, you’re going to have a harder time finding somewhere to take a data dump of that to.

                                      1. 1

                                        Is there though?

                                        I would say that the difference isn’t big provider vs startup but infrastructure-as-a-service vs software-as-a-service. Sure the major cloud providers have some software they offer as services but they all also have VMs that you can spin up and install whatever you want on. It’s not like you can install Fleetsmith on your own machines.

                                      2. 1

                                        Disclaimer: I work on containers for AWS, but not directly on EKS

                                        Just a note here that eksctl is the official and recommended tool for interacting with EKS. You can find it in the EKS user guide here.

                                    1. 2

                                      At work, I’m learning the packaging workflow for some of the software in Amazon Linux. I’m also writing planning documents, including some of the details of how we’ll add ECS support to Bottlerocket.

                                      At home, I’m trying to relax after last week, when I spent a bunch of time working on a talk for LinuxFest Northwest.

                                      1. 4

                                        The code is not in a hot path, and micro-optimizing it is not needed. But still wanted to know what’s faster.

                                        I should have stopped reading right there :)

                                        And it resulted in a Go patch of rather dubious real-world use, as noted by the commenters on that PR.

                                        1. 4

                                          The part that I found compelling (and the reason I posted it here) was the description of the process that they went through (as a first-time contributor to the compiler) for understanding how the compiler generates optimizations and building a new one.

                                          1. 2

                                            True! That’s indeed the good part.

                                        1. 1

                                          This ALL-CAPS style makes it a terrible read.

                                          1. 2

                                            Annoying as it is, I find all these dotfiles in $HOME far more annoying.

                                            1. 2

                                              It is all monospace, not all-caps.

                                              1. 1

                                                There’s a CSS property that makes the body text appear all-caps for me:

                                                font-feature-settings: "liga","tnum","case","calt","zero","ss01","locl","calt";
                                                

                                                The “case” feature seems to be what’s controlling it. I’m using Firefox 75 on Ubuntu.

                                                1. 1

                                                  It changed since the time I looked at it. Previously it was all-caps.

                                                  1. 1

                                                    Oh, actually it’s all-caps on Firefox 76, but not on Firefox ESR.

                                                1. 2

                                                  https://samuel.karp.dev

                                                  I work on Linux containers professionally, so that makes up a good portion of what I write about. A couple of my posts appeared on the official AWS blogs first, but I syndicate those to my own site (see this one about a container escape or this one about Bottlerocket) so I can have my own copy. I set this blog up last year after the .dev domains became available and wrote a bit more about what I want the blog to become in my welcome post.

                                                    1. 12

                                                      and technical blogs: https://lobste.rs/s/l7b3iy/

                                                      1. 3

                                                        This one’s from five years ago though. It’s nice to ask again and see people who started blogging in the last five years or who weren’t on lobsters five years ago.

                                                      2. 1

                                                  Oh, this was asked before. Sorry I sparked this thread again.

                                                      1. 1

                                                        I’ve been meaning to write this up, but I have two personal static websites (one is a blog). They both use git and GitHub for source control and repository management, build with Hugo inside CodeBuild, and deploy to S3 and CloudFront with ACM certificates. (I work for AWS, which is part of why I decided to use a bunch of AWS services, but this is my own paid personal account.)

                                                        1. 1

                                                          It took me a few months to do this, but I’ve now documented how the blog works.

                                                        1. 3

                                                          A solution to avoid tracking would be for webmail to pre-download all the images and serve those to the clients. That way the tracking data would be completely bogus but clients can still see the images.

                                                          1. 1

                                                            The tracking data would not be completely useless because the URL to the image can contain a unique identifier.

                                                            1. 4

                                                        If GMail always fetches all URLs in the mail before you even open it, regardless of whether you log in to your account, then it renders the tracking feature useless, as it’s no longer the user reading your mail - in fact the user may never do that.

                                                              1. 1

                                                                I’m assuming you mean the image resources, because a lot of email I receive contains one-time-use links that expire the second they get requested. Any client prefetching those would render the mechanism broken.

                                                                1. 1

                                                                  Spilled.ink purports to do this (listed under “Server asset fetch”)

                                                                2. 1

                                                            Since the mail server would pre-fetch all the images, the sender would see somebody loading the tracking pixel from the mail server’s IP at the time the email is received.

                                                                  Gmail already does something similar where it proxies all image requests through their servers to hide the reader’s location. This would be going one step further and even hide open times.
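
                                                            A hedged Go sketch of that idea (hypothetical names; a real webmail would use an HTML parser and would populate a cache at delivery time, rather than fetching on demand as this toy does):

                                                                package main

                                                                import (
                                                                    "fmt"
                                                                    "io"
                                                                    "net/http"
                                                                    "net/url"
                                                                    "regexp"
                                                                )

                                                                var imgSrc = regexp.MustCompile(`src="(https?://[^"]+)"`)

                                                                // rewrite points every remote image in a message body at our proxy endpoint.
                                                                func rewrite(body string) string {
                                                                    return imgSrc.ReplaceAllStringFunc(body, func(m string) string {
                                                                        u := imgSrc.FindStringSubmatch(m)[1]
                                                                        return fmt.Sprintf(`src="/imgproxy?u=%s"`, url.QueryEscape(u))
                                                                    })
                                                                }

                                                                // imgproxy serves the image from the server side; in the pre-fetch design this
                                                                // would read a cache filled when the mail arrived, hiding open times entirely.
                                                                func imgproxy(w http.ResponseWriter, r *http.Request) {
                                                                    resp, err := http.Get(r.URL.Query().Get("u"))
                                                                    if err != nil {
                                                                        http.Error(w, err.Error(), http.StatusBadGateway)
                                                                        return
                                                                    }
                                                                    defer resp.Body.Close()
                                                                    w.Header().Set("Content-Type", resp.Header.Get("Content-Type"))
                                                                    io.Copy(w, resp.Body)
                                                                }

                                                                func main() {
                                                                    http.HandleFunc("/imgproxy", imgproxy)
                                                                    http.ListenAndServe(":8080", nil)
                                                                }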

                                                              1. 4

                                                                (disclosure: I work on containers, but this is my own opinion and not necessarily that of my employer)

                                                                I don’t really agree that this is “more secure”. The strongest argument here is that podman’s use of fork/exec directly from the command that you run instead of a client/server model makes things more traceable when using auditctl/the Linux audit framework. A better title would be “Podman: A more auditable way to run containers” since that’s the central argument of the article.

                                                      The last section adds a few other, otherwise unrelated, thoughts. SD_NOTIFY and socket activation are both systemd features that are unrelated to security. The portion about “running Podman and containers as a non-root user” is relevant to security, but claiming that you never use root privileges is somewhat imprecise; manipulating the primitives that make up a container (primarily namespaces and filesystem mounts) requires elevated privileges (CAP_SYS_ADMIN), and the podman binary thus needs either setuid or setcap bits.

                                                                1. 5

                                                                  I found the marketing page to be a bit more understandable, since it demonstrates how to use the package registry with various clients.

                                                                  1. 2

                                                                    AWS customers can find relevant information about the vulnerability and any customer action required for mitigation in the AWS security bulletin.

                                                                    Disclosure: I am an AWS employee working on containers

                                                                    1. 2

                                                                      Does AWS use runc/lxc/other popular containers for its compute service’s host environment?

                                                                      1. 1

                                                                        The security bulletin contains all the relevant information about this vulnerability and what customer action is required for mitigation.

                                                                        1. 2

                                                                          Oh, sorry, I was just asking out of curiosity as an aside.

                                                                          1. 6

                                                                  MOVE ALONG, HUMAN

                                                                            1. 4

                                                                              Oh, got it! So I think the challenge is that there isn’t a simple answer to your question. Amazon uses a bunch of different technologies, in different combinations, for different purposes. Some of them we’ve talked about in public, but not everything. And I only know my own little corner of it super well; I don’t feel comfortable speaking about the implementation of the things I don’t work on because I don’t want to be inaccurate or imprecise.

                                                                              I think some of the interesting ones (from a compute perspective) are the Nitro system architecture that’s used for the newer instance types in EC2 and the new Firecracker VMM used in Lambda and Fargate for function- and container-like workloads (full disclosure: I currently work on firecracker-containerd to bridge container-like workloads into microVMs without the full Fargate system).

                                                                              For the container services, Amazon ECS and Amazon EKS both use Docker (which uses runc) in the default configurations (running in EC2 instances under your control, so Docker runs inside the AWS hypervisor). The bulletin that I linked above contains information about the patched Docker RPM for Amazon Linux and the patched AMIs for ECS and EKS.

                                                                              1. 1

                                                                                Firecracker looks pretty interesting! I confess that over the last five years or so containerization/virtualization has really exploded in a way that I personally find difficult to keep up with.

                                                                                It’s great to see the innovation but it’s tough to understand what distinguishes all of these new products from various vendors. I frequently find myself asking “how is this better than X? Does it even obsolete X or is it complementary?” The fact that these new features are often introduced with those stack/layer diagrams does help a lot, though.

                                                                      1. 3

                                                                        This is good advice for troubleshooting EC2 performance in general, as disk IOPS issues don’t always jump out the same way CPU, RAM, and network I/O issues might.

                                                                        1. 1

                                                              You can explicitly add more CPU & RAM but only implicitly add IOPS.

                                                                          1. 9

                                                                            io1 volume types offer provisioned IOPS with a guarantee of “Amazon EBS delivers within 10 percent of the provisioned IOPS performance 99.9 percent of the time over a given year.” io1 volumes do have a price premium over the more commonly-used gp2 volume types, but there is no bucket-and-credit model.

                                                                            (disclosure: I work for AWS)

                                                                          1. 9

                                                                            We are making Firecracker open source because it provides a meaningfully different approach to security for running containers.

                                                                            Why would I run containers inside Firecracker micro VMs, as opposed to just deploying my software directly into the VM? Is the assumption that I’m using containers already for (eg) local development and testing?

                                                                            1. 16

                                                                Firecracker is solving the problem of multi-tenant container density while maintaining the security boundary of a VM. If you’re entirely running first-party trusted workloads and are satisfied with them all sharing a single kernel and using Linux security features like cgroups, selinux, and seccomp, then Firecracker may not be the best answer. If you’re running workloads from customers similar to Lambda, desire stronger isolation than those technologies provide, or want defense in depth, then Firecracker makes a lot of sense. It can also make sense if you need to run a mix of different Linux kernel versions for your containers and don’t want to spend a whole bare-metal host on each one.

                                                                              1. 2

                                                                                Thanks. I was thinking about this in the context of the node / npm vulnerabilities that were also being discussed yesterday. I was imagining using these microVMs to (eg) contain node applications for security, without having to package the application up into a container.

                                                                                1. 2

                                                                                  (disclaimer: I work for Amazon and specifically work on the integration between the Firecracker VMM and container software)

                                                                                  Multi-tenant is a big use-case, but so is any workload where there is at least some untrusted code running. Firecracker helps to enable workloads where some third-party, untrusted code is expected to cooperate in a larger system.

                                                                                  In case that’s too abstract, think of a situation where a third-party component handles some aspect of data processing, but should not have access to the rest of the resources that are present in your application. Firecracker helps you establish a hypervisor-based boundary (including a separate kernel) between the third-party component and your code.

                                                                                2. 4

                                                                  As far as I can tell “container” is about supporting a specific packaging format, OCI (Open Container Initiative). You can just deploy your software directly. In fact, I think there is no “container” support at the moment. To quote:

                                                                                  We are working to make Firecracker integrate naturally with the container ecosystem, with the goal to provide seamless integration in the future.

                                                                                  1. 10

                                                                                    (disclaimer: I work for Amazon and specifically work on the integration between the Firecracker VMM and container software)

                                                                    “Container” is about the ecosystem of container-related software, including OCI images, CNI plugins for networking, and so forth. We’ve open-sourced a prototype that integrates the Firecracker VMM with containerd here, and plan to continue to develop that prototype into something a bit more functional than it is today.