I’m a first-semester CS student preparing a short intro to Docker for my classmates. As a newbie myself, I’d appreciate any advice. My audience is mostly beginners to web tech, deployment, and Linux. I’m considering a few different angles.
My aim is to make it beginner-friendly while conveying the potential of containers. How would you approach this? What key concepts, examples or analogies would you use to engage new CS students? I’m excited but a bit nervous, so any guidance on striking the right balance would be incredibly valuable. Thanks in advance!
I built a class to teach this - feel free to check our schedule and notes to see how we order the content: https://cis1912.org/lectures/
We started out by motivating portability in the context of getting a Python app that works on my laptop to work on my friend’s laptop. From there, you can show how portability is even harder at deploy-time and how Docker simplifies things in both of these contexts. For many of these students, it’ll be their first time even considering these problems, so it’s worth building strong intuition for why this is useful.
Once you’ve motivated the problem, you can show how the tech actually works. Process, network, and filesystem isolation are the easiest to teach in my experience. These give good entry points to show how namespaces & cgroups work. They also have good counterparts in the virtual machine world, which students might be more familiar with.
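If you want a live demo of the namespace part, a minimal sketch (assuming a Linux box with util-linux’s unshare; this isn’t from the course notes, just an illustration):

    # Start a shell in fresh PID and mount namespaces; it thinks it's PID 1.
    sudo unshare --pid --fork --mount-proc bash
    # Inside the new namespace:
    ps aux     # only bash and ps are visible, not the host's processes
    echo $$    # prints 1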
Tools like Kubernetes are very cool, but it takes a while to understand why they’re useful. It’s good to spend some time here, but the most important thing is to spark interest and give the students resources to learn more on their own time. The ones who really care will eat that stuff up.
Best of luck! The most important thing is to teach something that lights you up: students can tell when you’re excited, so if you can find content or a way to teach the content that gets you going, follow that path.
Agreed that for freshmen, the “why” of something like Docker is probably more important than the “how”, especially since they don’t yet have the operating systems background. I’m a current Penn PhD student and I had no idea there was a class like this. This looks really cool, nice work!
For inspiration you might take a look at Julia Evans’ zine How Containers Work, which I think approached this material at exactly the right level: https://jvns.ca/blog/2020/04/27/new-zine-how-containers-work/
I’d come at it from the other direction: the increasing ability to isolate parts of the system. Processes, virtual memory, multiple users, and filesystem permissions were all steps along the way. Unix has had chroot since long, long ago for simple filesystem isolation. At the other end of the spectrum is full virtualization, but there you have to run a whole separate copy of the OS and pay the cost of brokering hardware access. Then talk about Solaris Zones and FreeBSD jails, which provided an integrated way of doing process isolation, and finally all the various namespaces that got added to Linux, which let you do the same thing by wiring up the pieces.
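If you want to demo the chroot step live, a bare-bones sketch (assuming a statically linked busybox; the paths are just for illustration):

    # Build a tiny root filesystem and confine a shell to it.
    mkdir -p /tmp/jail/bin
    cp "$(command -v busybox)" /tmp/jail/bin/
    sudo chroot /tmp/jail /bin/busybox sh
    # Inside, / is /tmp/jail: `ls /` shows only bin; the host filesystem is gone.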
This would be my approach too – the fundamentals of isolation are crucial to understanding containers, then eventually Docker or whatever, and then orchestration like Kubernetes.
I wouldn’t focus on Docker. I would focus on the image itself being an immutable starting point, and volumes being the mutable state.
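A quick sketch of that split in action (the volume name is just illustrative):

    # Writes to the container layer vanish with the container...
    docker run --rm alpine sh -c 'echo scratch > /tmp/f'
    # ...while writes to a named volume persist across runs:
    docker run --rm -v notes:/data alpine sh -c 'echo one >> /data/log'
    docker run --rm -v notes:/data alpine sh -c 'echo two >> /data/log'
    docker run --rm -v notes:/data alpine cat /data/log   # prints both lines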
The rest of Docker around that is how to build that immutable image and how to run it. Part of that is process isolation. Part of that is pretending it’s its own system, but those things are breakable within Docker. We can opt out of those isolation mechanisms.
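For example, with standard docker flags (any small image will do):

    # Share the host's PID namespace: ps inside the container sees host processes.
    docker run --rm --pid=host alpine ps aux
    # Share the host's network stack: the container sees the host's interfaces.
    docker run --rm --network=host alpine ip addr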
Build a simple Go application, then tear apart the image it’s attached to. Then do the same for a Python application, and tear apart its image. They are just tarballs. They can just be read out. And you can see the layers, and you can see how files are resolved.
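Roughly like this (the image name is illustrative, and the exact layout inside the tarball varies by Docker version):

    docker save -o myapp.tar myapp
    tar -tf myapp.tar                 # manifest, config JSON, one tarball per layer
    mkdir myapp-image && tar -xf myapp.tar -C myapp-image
    # Each layer is itself a tarball of plain files: the image is just
    # stacked tarballs plus metadata describing how files resolve across layers.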
Remove the magic of Docker. You only have to do it one time, and then it’s something that the audience can understand as opposed to a magical thing. Their future colleagues will thank you.
If you’re teaching, maybe teach the technology (OCI containers) rather than the brand for a popular implementation (Docker).
Please don’t conflate the container abstraction with one of the implementations. If you want to understand shared-kernel virtualisation, the Jails and Zones papers are a good place to start, as is Bryan Cantrill’s excellent talk about them. The Linux implementation is by far the worst of any operating system and so starting here will do more harm than good. Even on Linux, there are multiple implementations of the isolation part (the ‘shim’ in container terminology), including gVisor, which uses ptrace, and Kata Containers, which use lightweight VMs.
The isolation model is the least interesting part of the container abstraction. Building images from immutable snapshots is far more interesting because this ties into the distribution and orchestration models, where the benefits of containers really shine. CoW (copy-on-write) sharing for the lower layers of the filesystem images, for example, is a nice efficiency gain, but the separation of state between images (built via some automated process) and volumes (per-deployment state) is one of the biggest wins for containers, and this is totally orthogonal to the isolation mechanism that you use.
Go is probably the worst example because the Go compiler produces a single statically linked binary. This has no dependencies other than the kernel and so can use the degenerate case for containers: a single layer with one file in it. Containers really shine when you have a load of dependencies that need to be versioned together and some that are shared. Python makes for better examples. You can use a common base layer for several Python programs, so the Python interpreter, C library, and so on all have their code pages existing precisely once on disk, with memory maps of the same files (you can use a debugger to show the inode numbers of the mappings while running two instances and validate that the memory is shared). You can then install the Python dependencies and package the whole thing up. You can also show two Python things that depend on incompatible versions of the same dependency running in different containers with no special handling.
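A sketch of that last demo (the package and pinned versions are just for illustration):

    cat > Dockerfile.old <<'EOF'
    FROM python:3.12-slim
    RUN pip install 'requests==2.19.1'
    EOF
    cat > Dockerfile.new <<'EOF'
    FROM python:3.12-slim
    RUN pip install 'requests==2.32.3'
    EOF
    docker build -f Dockerfile.old -t app-old .
    docker build -f Dockerfile.new -t app-new .
    # Incompatible versions of the same dependency, side by side on one host,
    # both sharing the python:3.12-slim base layers on disk:
    docker run --rm app-old python -c 'import requests; print(requests.__version__)'
    docker run --rm app-new python -c 'import requests; print(requests.__version__)'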
Ask yourself: if you couldn’t use Docker to accomplish task X, how would you do it?
I would start with a problem containers solve. The main problem is “works on my machine.” Go is typically a bad example for Docker because it’s not that hard to build a pure Go binary on any system or cross-compile it. Python is a better choice because it’s quite difficult to build a working Python system if you use a number of common libraries, such as numpy and pandas.
So, show a working Python system on one VM. Log into another VM and try to get it installed there too. Show how this sucks: it doesn’t work right on the first try because it’s missing some library. Then show how with Docker you can document what is needed for a build and distribute it. If you have time, get into how it works, the tar layers and cgroups and compute limits and whatnot, but you have to start with a problem or no one will pay attention enough to understand the rest.
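A sketch of the Dockerfile for that demo (file names are illustrative):

    cat > Dockerfile <<'EOF'
    FROM python:3.12-slim
    WORKDIR /app
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt   # numpy, pandas, etc.
    COPY app.py .
    CMD ["python", "app.py"]
    EOF
    docker build -t myapp .   # the build steps are now documented and repeatable
    docker run --rm myapp     # and run the same way on the second VM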
Just some random thoughts:
I would definitely do a simple explanation of the Dockerfile and what each of the most basic instructions does, plus how to do cleanup in Docker and how to build containers from scratch.
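For the cleanup part, the standard prune commands are enough to demo, and FROM scratch makes the from-scratch point literal (the binary name is illustrative):

    docker container prune   # remove stopped containers
    docker image prune -a    # remove unused images
    docker system prune      # containers, networks, dangling images in one pass
    # A literal from-scratch image: an empty base plus one static binary.
    cat > Dockerfile <<'EOF'
    FROM scratch
    COPY hello /hello
    CMD ["/hello"]
    EOF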
I would focus on the why of Docker and on a hands-on example that can get people working with it. I wouldn’t go into depth on more advanced topics (e.g., networking, volumes) or other technology built on Docker (e.g., Kubernetes), as it could become an unproductive bird walk.
As far as the workshop outline, I would start with a motivating example (e.g., a simple web app with a database and a Redis data store) and compare running it on bare metal and on Docker (Compose).
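A sketch of that stack in Compose (service names and images are illustrative):

    cat > docker-compose.yml <<'EOF'
    services:
      web:
        build: .
        ports: ["8000:8000"]
        depends_on: [db, cache]
      db:
        image: postgres:16
        environment:
          POSTGRES_PASSWORD: example
      cache:
        image: redis:7
    EOF
    docker compose up   # one command, vs. installing Postgres and Redis by hand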
The rest of the workshop could cover Dockerfile and Docker Compose syntax, potential uses, and future topics (here you can talk about the more advanced topics like the underlying Docker tech, Kubernetes, and others).