1. 23
  1.  

  2. 8

    Oh hey this is my project! I wrote up a little explanatory blog post since I didn’t really expect runj to get quite this much attention so quickly.

    1. 2

      Seems really quite polished for something that’s not quite ready for prime time :)

      1. 1

        Thanks! I’d love feedback if you get the chance to try it out.

        1. 1

          I’d like to see a simpler build process. I realize that Dockerfile compatibility is a pretty lofty goal, but all-in-all, Dockerfile is a work of art – regardless of what you think of the entire Docker ecosystem. It’s extremely simple for the end user to Get Things Done.

          1. 1

            Thanks for the feedback! As an OCI runtime, runj is intentionally limited and pretty low in the stack; the idea is that higher-level tools can be built on top of runj for the kinds of use-cases you’re describing. This is similar to runc (the reference OCI implementation), which both containerd and Docker use in order to implement the container primitives on Linux.

            I’ve been working on a containerd shim and porting containerd to FreeBSD to start building up the pieces of the puzzle. I don’t know that I’ll try to port Docker itself (it’s a very large codebase), but nerdctl is a command-line utility that exposes a very similar interface to Docker and has the ability to build images from a Dockerfile. It’d be neat to get all of these pieces working together.

      2. 2

        The existence of this project makes me unreasonably happy. Please feel free to reach out to me if you’re confused by any FreeBSD idiosyncrasies (jails, capsicum, firewalls, packages, whatever).

        1. 1

          Thanks! I’m brand-new to FreeBSD and trying to learn as I go. I appreciate the offer; a few other folks have made similar offers and one has already added runj to the ports tree.

          If you have the chance to take a look at the codebase for runj and have any feedback on how it leverages the existing components of FreeBSD (like my very basic jail.conf or which image I’m testing with), I’d love to hear your suggestions.

          1. 2

            I’m a bit surprised that you’re using a jail.conf at all. I would have expected this to be an alternative jail front end, rather than building on top of the existing command-line tool, which is a very thin wrapper around a small set of system calls. The jail tool is really designed around managing a small set of static jails, which is quite at odds with an OCI environment where jails are dynamically created, configured, and destroyed. This approach may work, but I suspect you’ll hit impedance mismatches.

            If you look at the jail tool’s source code, you’ll see that a huge amount of the code is related to parsing commands and config files and a fairly small amount of it is actually managing the jails. The jexec tool actually does most of what you want and is under two hundred lines of code.

            If you have the jail-related system calls exposed into Go, then you probably have everything you need (until you get into firewall rules, then you need to deal with the fact that FreeBSD has 3 different firewalls…).

            1. 1

              I picked the jail tool to start with as I had hoped it would be a faster way to get a proof-of-concept up and running, but the feedback I’m hearing from you and from others on Twitter makes it seem like I should switch sooner rather than later.

              go-jailsbsd is a wrapper for the jail-related syscalls in Go, though it looks like it’s linking against the libc functions to do it rather than calling the syscalls directly (which is more typical in Go).

              1. 2

                I’d hope using the libc wrappers isn’t a show stopper. I personally dislike Go’s aversion to doing things the recommended way for their targets and I remember every single Go program ever built for macOS breaking as a result of this not too long ago. FreeBSD maintains a backwards-compatible ABI from the kernel but explicitly excludes some control-plane interfaces from this (e.g. network configuration). These are stable only within a major release and differences between them are typically hidden in the userspace libraries for code that links them. Various other bits of the Go ecosystem mean that it’s unlikely that anything written in Go will ever be adopted into the base system, which is a shame for something like runj - I’d love to have OCI container support out of the box on FreeBSD, though given that things like K8s will still need to come from ports it isn’t a huge concern.

                There are also a few Go implementations of the userspace interfaces to the ZFS ioctls. You might take a look at Poudriere for some inspiration. It is a lot more specialised than a container runtime, but actually does almost everything that I think a container runtime needs to do:

                • Creates ZFS clones from base images.
                • Layers nullfs mounts over the top (not unionfs, it was unstable when Poudriere was originally built, and because it targets a specific narrow use case, it was able to design around the need for overlays).
                • Creates jails that run on these FS trees.
                • Runs commands inside those jails.
                • Captures output from the commands run in the jails.
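
                As a rough sketch of how the steps above could map onto host commands, the Go snippet below only builds the command lines and does not run them. The dataset names, paths, jail name, and the PlanJail helper are all hypothetical; this is not how Poudriere or runj is actually implemented.

                ```go
                package main

                import (
                	"fmt"
                	"path/filepath"
                	"strings"
                )

                // Step is one host command in the plan; nothing here is executed.
                type Step []string

                // PlanJail returns the commands for a clone -> nullfs -> jail -> exec flow.
                // Every argument is a hypothetical input chosen for illustration.
                func PlanJail(snapshot, dataset, root, sharedDir, name string, cmd []string) []Step {
                	return []Step{
                		// Create a ZFS clone of the base image snapshot, mounted at root.
                		{"zfs", "clone", "-o", "mountpoint=" + root, snapshot, dataset},
                		// Layer a read-only nullfs mount over the top.
                		{"mount", "-t", "nullfs", "-o", "ro", sharedDir, filepath.Join(root, "shared")},
                		// Create a persistent jail that runs on this filesystem tree.
                		{"jail", "-c", "name=" + name, "path=" + root, "persist"},
                		// Run the command inside the jail.
                		append(Step{"jexec", name}, cmd...),
                	}
                }

                func main() {
                	plan := PlanJail("zroot/base@clean", "zroot/jails/demo", "/jails/demo",
                		"/usr/ports", "demo", []string{"/bin/sh", "-c", "uname -a"})
                	for _, s := range plan {
                		fmt.Println(strings.Join(s, " "))
                	}
                }
                ```

                Capturing output would then be a matter of running the last step with something like exec.Command and reading its combined output.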

                The main thing that it’s missing (aside from providing generic configuration over the top) is any form of network configuration. The jails that Poudriere manages have simple network config so that they can fetch packages, but they’re assumed to be non-malicious so the host isn’t protected.

                The other thing that’s less generally important but perhaps more interesting is that Poudriere always sets up jails with a FreeBSD userland. It would be very interesting if runj could run Linux containers when the Linux compat ABI is installed; that would give FreeBSD users a much easier way of running some Linux things than trying to get the CentOS userland packages working (and failing because things need newer packages and then having to build their own libstdc++ and so on…).

                1. 1

                  I’d hope using the libc wrappers isn’t a show stopper. I personally dislike Go’s aversion to doing things the recommended way for their targets and I remember every single Go program ever built for macOS breaking as a result of this not too long ago. FreeBSD maintains a backwards-compatible ABI from the kernel but explicitly excludes some control-plane interfaces from this (e.g. network configuration). These are stable only within a major release and differences between them are typically hidden in the userspace libraries for code that links them.

                  Definitely not a show-stopper, just something I haven’t prioritized figuring out yet since the jail tool was enough for me to get started. I remember reading that Go 1.16 will stop making direct syscalls on OpenBSD and it sounds like similar reasons apply here.

                  Various other bits of the Go ecosystem mean that it’s unlikely that anything written in Go will ever be adopted into the base system, which is a shame for something like runj - I’d love to have OCI container support out of the box on FreeBSD, though given that things like K8s will still need to come from ports it isn’t a huge concern.

                  I’d love to learn more about what’s stopping Go from being adopted into the base system; I’m still very new to FreeBSD and don’t have that context yet.

                  • Creates ZFS clones from base images.
                  • Layers nullfs mounts over the top (not unionfs, it was unstable when Poudriere was originally built, and because it targets a specific narrow use case, it was able to design around the need for overlays).

                  I’ve updated containerd to use nullfs mounts for the “native” snapshotter as the simplest implementation based on copy-ahead semantics. containerd has a ZFS snapshotter too, but I haven’t tried it yet since I already have “native” working – similar to the use of the jail command I was first looking to get the simplest thing running before going back and optimizing each component.

                  It would be very interesting if runj could run Linux containers when the Linux compat ABI is installed; that would give FreeBSD users a much easier way of running some Linux things

                  Almost everyone I’ve talked to about runj prior to me posting it on the Internet had said something similar. I’m focusing first on a regular FreeBSD userland because that’s more interesting to me, but I don’t really see any major reasons why you couldn’t use runj as part of a system to run Linux container images. It might even work today, if you have a bundle with the Linux compat ABI installed?

                  1. 3

                    I’d love to learn more about what’s stopping Go from being adopted into the base system; I’m still very new to FreeBSD and don’t have that context yet.

                    The big one is toolchain stability and portability. The Plan 9 Go toolchain doesn’t support all of the architectures that FreeBSD supports. Even if it did support all of the architectures that FreeBSD supports today, the fact that it uses a different back end to the LLVM that the other base-system compilers use means that it’s likely to be an obstacle to bringing up new architectures or even new ABIs or variants in existing ones. For example, FreeBSD is likely to upstream CHERI support soon, but Go is unlikely to support any CHERI architectures for a while. That means a base system tool written in Go would not be able to adopt the same hardware-security features as the rest of the system.

                    Secondly, a major release series of FreeBSD is supported for around five years. Minor releases now generally do bring in newer versions of clang / lld, because they come with new features that are useful, but a new version of clang is expected to be able to compile all of the C/C++ code that the previous version supported (and if it can’t then it’s a bug, and if a new clang can’t build the FreeBSD base system then that will be a release blocker for that LLVM release until it’s fixed). Bringing in a new toolchain doesn’t cause any code churn anywhere else in the tree. As I recall, that’s not a guarantee that you get from Go: you may need to run go fix and clean things up.

                    That said, in the context of the rest of the OCI infrastructure, you probably wouldn’t want everything in the base system, because there’s a sufficiently high level of churn that providing 5-year stability guarantees in the interfaces is not (yet?) feasible. It’s more important that it’s easy to install from packages and that it’s easy to build a VM image that has an up-to-date install of everything.

                    I’ve updated containerd to use nullfs mounts for the “native” snapshotter as the simplest implementation based on copy-ahead semantics. containerd has a ZFS snapshotter too, but I haven’t tried it yet since I already have “native” working – similar to the use of the jail command I was first looking to get the simplest thing running before going back and optimizing each component.

                    Having ZFS working out-of-the-box is one of the selling features of FreeBSD and it’s over a decade since I ran a FreeBSD machine with anything else. Poudriere leans heavily into ZFS features for a lot of things. Trying to have UFS fall-back paths is one of the things that’s held back a lot of things in the FreeBSD world and prevented it from capitalising on the sorts of things that ZFS enables. For something like runj, taking a hard dependency on ZFS is probably not a problem.

                    Almost everyone I’ve talked to about runj prior to me posting it on the Internet had said something similar. I’m focusing first on a regular FreeBSD userland because that’s more interesting to me, but I don’t really see any major reasons why you couldn’t use runj as part of a system to run Linux container images. It might even work today, if you have a bundle with the Linux compat ABI installed?

                    I don’t know what happens now with runj if you try to load a Linux container - does it parse the Linux options and understand them? It probably needs to do things like mount linprocfs inside the jail rather than procfs, for example, and may need to load the linux / linux64 kernel module if it isn’t already loaded. For a lot of Python stuff, which is built against an ancient CentOS so that it works on any vaguely modern glibc-based Linux, the FreeBSD Linux compat layer should work fine and that would be really useful. Solaris’ Docker port decided to make the Linux ABI their default container ABI, but I wouldn’t suggest that FreeBSD do the same because the Linuxulator is not quite good enough to expect everything to work (plus you’d lose out on some of the benefits of FreeBSD, such as kqueue and Capsicum - given the choice, I’d write code against the FreeBSD syscall interface and not the Linux one, even if the Linuxulator were 100% compatible).
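
                    To make the linprocfs point concrete, here is a minimal Go sketch of choosing the /proc preparation steps by ABI. It only builds command lines and runs nothing; the ProcSteps helper and the paths are hypothetical, and a real Linux jail would likely need more than this.

                    ```go
                    package main

                    import (
                    	"fmt"
                    	"path/filepath"
                    	"strings"
                    )

                    // ProcSteps lists host commands that prepare /proc for a jail rooted at root.
                    // linuxABI selects the Linux compat path; both branches are illustrative only.
                    func ProcSteps(root string, linuxABI bool) [][]string {
                    	proc := filepath.Join(root, "proc")
                    	if !linuxABI {
                    		// A FreeBSD userland gets the native procfs.
                    		return [][]string{{"mount", "-t", "procfs", "proc", proc}}
                    	}
                    	return [][]string{
                    		// Load the 64-bit Linux compat module; -n skips it if already loaded.
                    		{"kldload", "-n", "linux64"},
                    		// A Linux userland expects a Linux-style /proc.
                    		{"mount", "-t", "linprocfs", "linprocfs", proc},
                    	}
                    }

                    func main() {
                    	for _, s := range ProcSteps("/jails/demo", true) {
                    		fmt.Println(strings.Join(s, " "))
                    	}
                    }
                    ```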

                    1. 1

                      Thanks for all that background about FreeBSD and Go, that was super interesting.

                      For something like runj, taking a hard dependency on ZFS is probably not a problem.

                      The way the OCI runtime spec is written, a runtime should expect the root filesystem for the container to be mounted at a path described in the bundle, but there’s no specific filesystem. The containerd ZFS snapshotter is what I’d want to use to provide the copy-on-write layer functionality when running jails via containerd.
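
                      For illustration, a minimal Go sketch of how a runtime might locate that root filesystem from a bundle: it reads root.path from config.json and resolves a relative value against the bundle directory. The RootFS helper and the example paths are hypothetical, and this is not runj’s actual code.

                      ```go
                      package main

                      import (
                      	"encoding/json"
                      	"fmt"
                      	"path/filepath"
                      )

                      // spec holds only the config.json field this sketch needs.
                      type spec struct {
                      	Root struct {
                      		Path string `json:"path"`
                      	} `json:"root"`
                      }

                      // RootFS returns the container root filesystem path for a bundle.
                      // A relative root.path is resolved against the bundle directory.
                      func RootFS(bundleDir string, configJSON []byte) (string, error) {
                      	var s spec
                      	if err := json.Unmarshal(configJSON, &s); err != nil {
                      		return "", err
                      	}
                      	if s.Root.Path == "" {
                      		return "", fmt.Errorf("config.json has no root.path")
                      	}
                      	if filepath.IsAbs(s.Root.Path) {
                      		return filepath.Clean(s.Root.Path), nil
                      	}
                      	return filepath.Join(bundleDir, s.Root.Path), nil
                      }

                      func main() {
                      	cfg := []byte("{\"ociVersion\":\"1.0.2\",\"root\":{\"path\":\"rootfs\"}}")
                      	root, err := RootFS("/var/run/bundles/demo", cfg)
                      	if err != nil {
                      		fmt.Println("error:", err)
                      		return
                      	}
                      	fmt.Println(root)
                      }
                      ```

                      Whatever is mounted at that path (a ZFS clone, a nullfs mount, a plain directory) is invisible to the runtime, which is why the choice of snapshotter can live in containerd.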

                      I don’t know what happens now with runj if you try to load a Linux container - does it parse the Linux options and understand them?

                      No, not yet. It sounds like there’s additional work that’ll need to be done to get this to work, but that it’s not insurmountable.

                      Solaris’ Docker port decided to make the Linux ABI their default container ABI, but I wouldn’t suggest that FreeBSD do the same

                      I agree. I’m a lot more interested in FreeBSD being the default environment here, as if I want to run Linux containers I’d typically run Linux instead of emulating it on top of another operating system.

      3. 4

        The pot framework for FreeBSD jails has also published some OCI-related bits recently: https://github.com/pizzamig/pot-images