1. 12
  1. 15

    I think the key insight here is that container images (the article confuses images and containers, a common mistake that pedants like me will rush to point out) are very similar to statically linked binaries. So why Docker/container images and why not ELF or other statically linked formats?

    I think the main answer is that container images have a native notion of a filesystem, so it’s “trivial” (relatively speaking) to put the whole user space into a single image, which means that we can package virtually the entire universe of Linux user space software with a single static format whereas that is much harder (impossible?) with ELF.
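
    A minimal sketch of what I mean (the base image, packages, and paths here are just placeholders):

    ```dockerfile
    # Hypothetical example: the whole userspace (distro, interpreter, shared
    # libraries, app) is captured as one filesystem image, no static linking needed.
    FROM debian:bookworm-slim
    RUN apt-get update \
        && apt-get install -y --no-install-recommends python3 libpq5 \
        && rm -rf /var/lib/apt/lists/*
    COPY app/ /opt/app/
    CMD ["python3", "/opt/app/main.py"]
    ```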

    1. 4

      And we were able to do that with virtualization for at least 5-10 years prior to Docker. Or do you think that also packaging the kernel is too much?

      Anyways, I do not think that a container having the notion of a filesystem is the killer feature of Docker. I think that moving the deployment code (installing a library, for example) close to the compilation of the code helped many people and organizations who did not have the right tooling prior to that. For larger companies that had systems engineers, cgroups mostly provided the security part, because packaging had been solved decades before Docker.

      1. 1

        IMO it’s not the kernel but all of the supporting software that needs to be configured for VMs yet comes for ~free with container orchestration (process management, log exfiltration, monitoring, sshd, infrastructure-as-code, etc.).

        Anyways, I do not think that a container having the notion of a filesystem is the killer feature of Docker. I think that moving the deployment code (installing a library, for example) close to the compilation of the code helped many people and organizations who did not have the right tooling prior to that.

        How do you get that property without filesystem semantics? You can do that with toolchains that produce statically linked binaries, but many toolchains don’t support that and of those that do, many important projects don’t take advantage.

        Filesystem semantics enable almost any application to be packaged relatively easily in the same format which means orchestration tools like Kubernetes become more tenable for one’s entire stack.

      2. 4

        I can fit a jvm in a container! And then not worry about installing the right jvm in prod.

        I used to be a skeptic. I’ve been sold.

        1. 2

          Slightly off topic - but JVM inside a container becomes really interesting with resource limits. Who should be in charge of limits, JVM runtime or container runtime?

          1. 8

            Gotta be the container runtime (or the kernel or hypervisor above it), because the JVM heap size limit is best-effort. Bugs in memory accounting could cause the process to use memory beyond the heap limit, and even absent bugs, native code (via JNI) can call malloc directly and allocate off-heap.

            It would still make sense for the container runtime to tell the JVM & application what the limits currently are, so they can tailor their own behaviour to try to fit inside them.
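
            A hedged sketch of that division of labour (the image name, flag value, and memory limit are illustrative):

            ```dockerfile
            # Illustrative only: the cgroup limit is the real ceiling; the JVM flag is best-effort.
            FROM eclipse-temurin:17-jre
            COPY app.jar /app.jar
            # Size the heap as a fraction of whatever container memory limit the JVM detects.
            ENTRYPOINT ["java", "-XX:MaxRAMPercentage=75.0", "-jar", "/app.jar"]
            # The hard limit is then enforced by the container runtime, e.g.:
            #   docker run --memory=512m my-jvm-app
            ```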

            1. 4

              It’s easy: the enclosing layer gets the limits. Who should set the resource limits? ext4 or the iron platter it’s on?

              1. 2

                What’s the enclosing layer? What happens when you have heterogeneous infrastructure? Legacy applications moving to the cloud? Maybe in theory it’s easy, but in practice it’s much tougher.

              2. 2

                Increasingly the JVM is setting its own constraints to match the operating environment when “inside a container”.

            2. 4

              Yes, layers as filesystem snapshots enable a more expressive packaging solution than statically linked alternatives. But it’s not just filesystems: runtime configuration (variables through ENV, invocation through CMD) also makes the format even more expressive.
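
              For instance (a made-up minimal example), the same image carries both its filesystem layers and its default runtime configuration:

              ```dockerfile
              # Hypothetical app image: the layers hold the files, ENV/CMD hold the default runtime config.
              FROM python:3.12-slim
              COPY app/ /opt/app/
              ENV APP_PORT=8080 APP_LOG_LEVEL=info
              CMD ["python", "/opt/app/server.py"]
              ```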

              p.s. I have also updated the post to say “container images”

              1. 3

                I think the abstraction on images is a bit leaky. With Docker you’re basically forced to give the image a name in a system registry so that you can then run it as a container.

                I would love to be able to say like… “build this image as this file, then spin up a container using this image” without the intermediate steps of tagging (why? because it allows for building workflows that don’t care about your current Docker state). I know you can just kinda namespace stuff but it really bugs me!

                1. 3

                  Good practice is addressing images by their digest instead of a tag, using the @ syntax. But I agree - the registry has always been a weird part of the workflow.
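
                  For example (the digest below is a placeholder, not a real one):

                  ```dockerfile
                  # Pin by immutable digest rather than a mutable tag; the same @ syntax
                  # works for `docker pull` and `docker run`.
                  FROM alpine@sha256:0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef
                  ```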

                  1. 1

                    addressing images by their digest instead of a tag, using the @ syntax.

                    Be careful about that. The digest of an image can change as you push/pull it between different registries. The problem may have settled out by now, but we were bitten by digests changing across different releases of Docker’s registry image, and between Docker’s registry and Artifactory’s.

                    I’m not sure if there’s a formal standard for how that digest is calculated, but it certainly used to be (~2 years back) very unreliable.

                    1. 1

                      Oh I wasn’t aware of that! That could let me at least get most of the way to what I want to do, thanks for the pointer!

                  2. 3

                    I noticed Go now has support for including, in its essentially static binary, a virtual filesystem instantiated from a filesystem tree specified during compilation. In that scenario, it further occurs to me that containerization perhaps isn’t necessary, which would expose read-only shared memory pages to the OS across multiple processes running the same binary.

                    I don’t know whether, in the containerization model, the underlying/orchestrating OS can identify identical read-only memory pages and exploit sharing.

                    1. 2

                      I think in the long term containers won’t be necessary, but today there’s a whole lot of software and language ecosystems that don’t support static binaries (and especially not virtual filesystems) at all and there’s a lot of value in having a common package type that all kinds of tooling can work with.

                      1. 2

                        As a packaging mechanism, embedding files in Go works OK in theory (it follows the single-process pattern). In practice, most Go binary container images are nearly empty anyway (FROM scratch + certs). But there are lots of environment-dependent files you want at runtime (secrets, environment variables, networking config) that are much easier to add declaratively to a container image than to recompile in.
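
                        A rough sketch of that pattern (multi-stage build; module paths and names are illustrative), where the final image is just the binary, the certs, and whatever environment-dependent config gets added declaratively:

                        ```dockerfile
                        # Build stage: produce a static Go binary (paths are illustrative).
                        FROM golang:1.22 AS build
                        WORKDIR /src
                        COPY . .
                        RUN CGO_ENABLED=0 go build -o /out/app ./cmd/app

                        # Final stage: essentially empty image, plus certs and runtime config
                        # (assumes the build image has CA certs at the usual Debian path).
                        FROM scratch
                        COPY --from=build /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
                        COPY --from=build /out/app /app
                        ENV APP_ENV=production
                        ENTRYPOINT ["/app"]
                        ```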

                      2. 2

                        So why Docker/container images and why not ELF or other statically linked formats?

                        There are things like gVisor and binctr that work this way, as do things like Emscripten (for JS/WASM).

                        1. 2

                          I really hope for WASI to pick up here. I used to be a big fan of CloudABI, which now links to WASI.

                          It would be nice if we could get rid of all the container (well actually mostly Docker) cruft.

                      3. 4

                        More like “docker is a packager”

                        1. 3

                          Now that docker (buildkit) does instruction caching, garbage collection, and concurrent dependency resolution, it’s looking a lot less like just a packager!

                        2. 2

                          .. and CI/CD pipelines are distributed build systems

                          1. 2

                            It definitely is in the xkcd.com/303 sense :) One would’ve thought that this is not possible with python, but technology can fix everything!

                            1. 2

                              I’ve worked on a web application written in Python that took a solid minute to start up. If you really want it, anything is possible with Python!

                              1. 2

                                I’ve worked with a web application written in Python where just downloading all the docker layers took more than that ;)

                                This is a weird competition.

                                1. 2

                                  For what it’s worth, the application I worked on took that long just by running python run.py. Loading a bunch of data into memory can take a while.

                            2. 1

                              Since BuildKit with LLB, Docker is closer to being a compiler than ever. We can now even write custom syntax for it.
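
                              For example (a hedged sketch showing just the stock frontend reference), the first line of a Dockerfile tells BuildKit which frontend image should parse the rest of the file:

                              ```dockerfile
                              # syntax=docker/dockerfile:1
                              # The directive above selects the BuildKit frontend; a custom frontend image
                              # here can define an entirely different syntax for the rest of the file.
                              FROM alpine:3.19
                              RUN echo "built via a pluggable frontend"
                              ```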

                              1. 1

                                BuildKit, which creates Docker images, is like a compiler. I actually built my own syntax for it based on INTERCAL, for fun. It’s called ickfile; feel free to use it for your next project if you’re feeling like a masochist. https://github.com/agbell/compiling-containers/tree/main/ickfile