1. 45
    1. 18

      I’m always a bit miffed at how Docker is a game of choosing between readable Dockerfiles and “tricks” like this.

      I guess most of my problems are solved by just making a copy script or something but it still surprises me how the layering system doesn’t go for something like “each Dockerfile makes one layer” (or like… have a prod build or something)

      1. 12

        If you’re using the ZFS snapshotter with containerd (and a newer version of Docker that uses containerd) then the on-disk size of each layer in the deployment will be very small, because the chmod layer will contain updated metadata but share all of the data blocks.
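
        Under the hood this is ordinary ZFS copy-on-write. A sketch of the mechanism (dataset names here are illustrative, not what the snapshotter actually uses):

        # snapshot the dataset holding the lower layer, then clone it
        zfs snapshot tank/layer1@base
        zfs clone tank/layer1@base tank/layer2
        # chmod rewrites only metadata; the file's data blocks stay shared
        chmod +x /tank/layer2/bin/<binary>
        # USED for the clone stays tiny; REFERENCED shows the shared data
        zfs list -o name,used,referenced tank/layer1 tank/layer2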

        This doesn’t actually help you because of the way that the distribution format works. Even though the two layers differ only in metadata, each layer will be separately distributed as a tarball that can be extracted over the top of the lower layer to provide the change, and so must contain all of the deltas. Worse, even using the ZFS snapshotter, no one else will see this saving, because they’ll download the first layer, extract it, snapshot the FS, clone the snapshot, and then extract the new tarball (whose copy of the file completely replaces the first one) over the top. This could be improved a lot if OCI containers allowed layers to be represented as binary diffs rather than as tarballs.
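
        You can see the duplication directly with docker history, which reports a per-layer size (a sketch; the image tag is made up):

        docker build -t chmod-demo .
        # The RUN chmod layer reports the same size as the COPY layer,
        # because its tarball carries a complete copy of the file.
        docker history chmod-demo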

        I think the root cause of this kind of surprise in a lot of Docker uses is that the tool makes snapshotting implicit. If the Docker format made snapshots explicit then the file would be written as:

        COPY --from=downloader /bin/<binary> /bin/<binary>
        
        RUN apt-get update 
        RUN apt-get install -y openssl dumb-init iproute2 ca-certificates 
        RUN rm -rf /var/lib/apt/lists/*
        RUN  chmod +x /bin/<binary>
        RUN mkdir -p <couple of empty directories>
        SNAPSHOT
        

        You don’t care about intermediate state between any of these run lines and you don’t care about intermediate state between the COPY and the RUN, but Docker implicitly snapshots at the end of any command in the same family as COPY, RUN, and so on. Without that behaviour, you could do whatever modifications you wanted in any sequence of steps, and make snapshots only at places that make sense. You might write the above as something like this instead:

        COPY --from=downloader /bin/<binary> /bin/<binary>
        RUN  chmod +x /bin/<binary>
        SNAPSHOT
        
        RUN apt-get update
        RUN apt-get install -y openssl dumb-init iproute2 ca-certificates
        RUN rm -rf /var/lib/apt/lists/*
        RUN mkdir -p <couple of empty directories>
        SNAPSHOT
        

        This would avoid the need to recreate the layer containing the external binary each time you wanted to do an apt update, on the assumption that the tools installed from apt change more frequently than the other tool (alternatively, you could reorder these).

        As I understand it, Docker doesn’t do this because it simplifies their implementation. If the layer below you hasn’t changed, and the text of a RUN line or the text and source of a COPY line haven’t changed, then they don’t do anything (which leads to surprising things, like the fact that this won’t re-run apt update and so you’ll miss security updates to OpenSSL). If snapshots were explicit then they’d need to collect multiple lines and have a proper model for when things need rerunning, but then they’d need to do some actual software engineering and that’s not the kind of thing you expect from CADT software.
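
        The standard workaround for the apt case is to force the two commands into a single cache unit, so that editing the package list invalidates the stale index as well (a sketch of the well-known pattern, not a fix for the underlying model):

        # Cached forever once built, even after upstream security updates:
        RUN apt-get update
        RUN apt-get install -y openssl

        # Coupled into one cache unit: editing the package list re-runs
        # the update too.
        RUN apt-get update && apt-get install -y openssl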

        1. 2

          This would be great! I can’t tell you how many Dockerfiles I’ve written where each statement is on its own line, so I can figure out any bugs, and then “minified” them by concatenating statements with &&, which is horrible because now I have a giant statement that’s impossible to debug.

          > If the layer below you hasn’t changed, and the text of a RUN line or the text and source of a COPY line haven’t changed, then they don’t do anything

          I also forget exactly the logic of this, as I haven’t been on a project using Docker in a while, but I remember doing a variety of tricks to get Docker to pick up changes. All this becomes an even larger issue when dealing with Windows images, which start out at 8GB; that raises the question of whether Docker is the best thing to use with Windows, but that’s another discussion.
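
          One such trick (a sketch; the arg name is arbitrary) is a build arg you vary to invalidate the cache on demand. Docker treats an ARG as an implicit environment variable for every RUN that follows its declaration, so changing the value re-runs them all:

          ARG CACHE_BUST=1
          RUN apt-get update && apt-get install -y openssl

          # build with: docker build --build-arg CACHE_BUST=$(date +%s) .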

          1. 2

            Yes this annoys me too.

            One thing I do is make them separate shell scripts, so I get syntax highlighting / error handling / debugging.

            But this has implications for caching (which appears to be incorrect in both directions in Docker: it can reuse stale layers and rebuild unchanged ones).

            e.g. here I call out to deps-apt.sh, deps-py.sh, deps-R.sh, but I think the caching implications are why this is perhaps not a best practice.

            https://github.com/oilshell/oil/tree/master/soil
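
            A minimal sketch of the pattern (deps-apt.sh is from the soil directory above). The caching caveat is that a RUN’s cache key is only its command text, so the script has to be COPY’d in for edits to it to invalidate the cache:

            # The COPY layer's cache key includes the file's checksum, so
            # editing deps-apt.sh correctly invalidates the RUN below it.
            COPY deps-apt.sh /tmp/deps-apt.sh
            RUN /tmp/deps-apt.sh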

            This is pie in the sky now but with Oil blocks I want to do something like:

            layer "Binary packages" {
              apt-get install foo bar
              do_some_cleanup
            }
            
            layer "Python" {
              pip install spam eggs
              do_some_cleanup
            }
            

            This is valid Oil syntax but the semantics aren’t implemented, e.g. allowing you to define a layer proc that takes a Ruby-like block.

            e.g. I have mentioned this feature with regard to the “YAML problem”

            http://www.oilshell.org/blog/2021/04/build-ci-comments.html#the-yaml-problem

        2. 1

          If you don’t need Dockerfile support, you can do this with buildah. See the basic usage at https://developers.redhat.com/blog/2021/01/11/getting-started-with-buildah#building_a_container

          You can use “buildah commit” in the same way you propose SNAPSHOT.
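
          A sketch (base image and output tag are arbitrary; packages taken from the thread):

          ctr=$(buildah from debian:stable)
          buildah run "$ctr" -- apt-get update
          buildah run "$ctr" -- apt-get install -y openssl dumb-init
          buildah run "$ctr" -- rm -rf /var/lib/apt/lists/*
          # All three commands land in one layer, like the proposed SNAPSHOT:
          buildah commit "$ctr" my-image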

      2. 3

        This is how Bazel creates Docker containers, though.

        The weird thing with Bazel creating Docker containers is that you start to think of them as pluggable layers after a while.
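
        For example, rules_docker lets you build a named layer once and plug it into several images (a sketch; target and file names are made up):

        load("@io_bazel_rules_docker//container:container.bzl",
             "container_image", "container_layer")

        container_layer(
            name = "apt_layer",
            debs = ["//third_party:openssl.deb"],
        )

        container_image(
            name = "app",
            base = "@ubuntu//image",
            layers = [":apt_layer"],
        )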

    2. 1

      Good to know, thanks!