1. 62

Many people on Lobsters are sharing their doubts about Docker and how they prefer deploying without containers.

  • What does your deployment process look like?
  • How do you ensure that your deployments allow rollbacks?
  • Do you version your deployment files? How are you tracking compatibility with builds?
  • How are you achieving immutable architecture or at least limiting configuration drift?
  • How do you handle configuration for your services? Are they tied to deployment scripts or to releases? -…

I think that many people are writing about and experimenting with containers, and big companies all write at length about best practices. But I find far less content on going container-less, and I feel that very few companies actually describe how they do it “the boring way”.

  1. 18

    I have a few small deployments on NixOS servers. I git push and it does a nixos-rebuild to the new system state. If there is something wrong with the deploy, I do a nixos-rebuild switch --rollback. The entire system state is immutable, so mutability isn’t a concern. Each deployed app can have a different pinned version of nixpkgs, so apps on the same machine don’t interfere with each other.

    1. 4

      For personal or greenfield projects, I would definitely push for NixOS, which I use myself on my desktop. Nonetheless, it is harder to introduce in a more enterprise and regulated environment, because the knowledge needed to operate it is so different, not to mention that many big-cos want to buy commercial support for their OS.

      1. 2

        How do you approach nixpkgs pinning? I’ve thought about keeping a whole channel in our repo, but curious to hear your approach.

        1. 3

          It’s pretty simple: the derivation takes a nixpkgs argument that defaults to a known working version. Example: https://github.com/jb55/gdk_rpc/blob/nix-build/default.nix#L1 It can always be called with any other nixpkgs as well if you don’t want to use the pinned version.

          1. 2

            Neat, so you just pin your build to the hash in git. Sounds reproducible to me!

      2. 15

        A java fat .jar is uploaded to S3. A supervisor process notices the new build and swaps out the running processes (waiting for them to finish current work). A revert just means re-uploading a previous .jar file.
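        A supervisor of this kind can be sketched as a single polling step. This is only a minimal sketch, not the actual supervisor described above: `poll_once` and the three callables it takes are hypothetical names, and how a new build is detected (for example by comparing the uploaded object's key or ETag) is an assumption.

```python
from typing import Callable, Optional


def poll_once(
    running: Optional[str],
    latest_build: Callable[[], Optional[str]],
    drain: Callable[[], None],
    start: Callable[[str], None],
) -> Optional[str]:
    """Run one supervisor tick and return the build id that is now running."""
    newest = latest_build()
    if newest is None or newest == running:
        return running  # nothing uploaded yet, or already up to date
    if running is not None:
        drain()  # let the old processes finish their current work first
    start(newest)
    return newest
```

        Calling this in a loop with a sleep between ticks gives roughly the behaviour described. A revert is then just another upload that changes what `latest_build` returns.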

        1. 1

          What sort of stack are you running in your jar? Trying to figure out what the current best practice is. Jetty?

          1. 2

            Yes, for us Jetty is running as an embedded server (not a server container or WAR runner or anything like that).

        2. 12

          We build OS packages (via CI/CD) and then deploy using standard OS package tools. For us that’s either .deb (Debian) or pkg (FreeBSD).

          The OS-level package tools handle all the issues around rollbacks, versioning, etc. Configuration is done either statically at package build time, at install time (via package hooks), or at runtime via consul, vault, etc.

          It’s basically bulletproof and easy, since if you can install vim or emacs, you can install our apps as well.

          1. 4

            Where do you draw the line in regards to things like including the language runtime and library dependencies? Do you use the system Python/Ruby/Node, bring in a third-party repo or just bundle everything together in your package?

            1. 1

              I answer that with: After installation of my package it should be ready to run.

              Use system package dependencies whenever possible. Sometimes you can’t, and then you go forth and build your own, either built into the package or as a separate package dependency (depending on whether other things I build will need it as well).

            2. 1

              I’ve done this in the past and I hated that you have no real control over when your services restart. Blue-green deployments are also quite hard.

              Our meh solution for this was to version our packages, so you’d install app-1.2.3 alongside app-1.2.2, both in their own directory, e.g. /opt/app-1.2.3, and then let a script do a more atomic ln -s /opt/app-1.2.3 /opt/app once the actual artifact was there and the LB had shifted connections to drain.
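              One common way to make that symlink switch atomic is to create the new link under a temporary name and rename it over the old one. The following is a minimal sketch assuming POSIX rename semantics; `switch_release` and the example paths are hypothetical, not the script described above.

```python
import os


def switch_release(link_path: str, release_dir: str) -> None:
    """Point link_path at release_dir with no moment where the link is missing."""
    # Build the new link next to the old one so the rename stays on one filesystem.
    tmp_link = f"{link_path}.tmp-{os.getpid()}"
    if os.path.lexists(tmp_link):
        os.unlink(tmp_link)  # leftover from an earlier interrupted run
    os.symlink(release_dir, tmp_link)
    try:
        # The rename replaces the old symlink itself; it does not follow it.
        os.replace(tmp_link, link_path)
    except OSError:
        os.unlink(tmp_link)
        raise
```

              For example, `switch_release("/opt/app", "/opt/app-1.2.3")` would repoint /opt/app, and calling it again with /opt/app-1.2.2 would switch back.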

              1. 1

                Agreed, system level task managers/service runners are generally meh.

                That’s why we are now using Nomad to handle running of our services/task management, it can do all these things and more.

              2. 1

                Do you have more details (like a blog post or else) on this process? What are the advantages, what limitations do you see and maybe have accepted?

                1. 3

                  No? It’s not a complicated process. Have a script (we have a Makefile) that creates a package and shoves it into your custom repo. Every OS is different on how that goes; consult your OS documentation. Have your CI/CD run that script on check-in. Our CI/CD runs a script per OS we support (on each OS that we support).

                  The end goal is getting X files in Y places out on disk. Package managers basically excel at that. I don’t see any limitations, except the overhead of figuring out how to create your own packages, and yelling at the system to pull in your custom repo(s). But once you get over that hump, it’s cake. fpm[0] makes creating packages less insane.

                  If you integrate closely with the OS you can go nuts with service/task startup integration, etc. We generally avoid going too crazy with tying ourselves that closely to the OS. If we were tied to only one OS and were not cross-platform (we deploy to Windows, macOS, Debian and FreeBSD), then perhaps we would spend more time integrating with the OS. It also depends on the OS: with Windows all you get built in is MSIs, which are terrible compared to a .deb package.

                  0: https://fpm.readthedocs.io/en/latest/index.html

              3. 9

                This is old-school and leaves certain things to be desired, but here goes anyway.

                What does your deployment process look like?

                A github webhook is triggered on commit push. The handler writes a deployment job file into a filesystem queue. deployer then updates the .a or .b git checkout, whichever one isn’t currently in use, and runs a quick user-defined smoke test. If everything succeeds, a symlink is atomically updated to swap inactive and active checkouts (.a and .b). Finally, a SIGHUP bounces the corresponding daemontools service. Dream Deploys: Atomic, Zero-Downtime Deployments describes this in more detail.
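                The core of that flow (update the idle checkout, smoke test it, swap the symlink, then bounce the service) can be sketched as below. This is a minimal sketch under assumptions, not the actual deployer: `inactive_slot`, `deploy`, and the `update`, `smoke_test` and `restart` callables are hypothetical names standing in for the git update, the user-defined test, and the SIGHUP to the service.

```python
import os
from typing import Callable, Sequence


def inactive_slot(current_link: str, slots: Sequence[str]) -> str:
    """Return the checkout directory that the live symlink does not point to."""
    active = os.path.realpath(current_link)
    for slot in slots:
        if os.path.realpath(slot) != active:
            return slot
    raise RuntimeError("no inactive slot found")


def deploy(
    current_link: str,
    slots: Sequence[str],
    update: Callable[[str], None],
    smoke_test: Callable[[str], bool],
    restart: Callable[[], None],
) -> str:
    """Update the idle checkout, test it, then atomically make it live."""
    target = inactive_slot(current_link, slots)
    update(target)  # e.g. bring the idle checkout to the pushed commit
    if not smoke_test(target):
        # The live symlink is untouched, so the running service is unaffected.
        raise RuntimeError(f"smoke test failed in {target}")
    tmp_link = current_link + ".new"
    if os.path.lexists(tmp_link):
        os.unlink(tmp_link)
    os.symlink(target, tmp_link)
    os.replace(tmp_link, current_link)  # atomic rename over the old link
    restart()  # e.g. signal the supervised service to pick up the new code
    return target
```

                Because the failing case returns before the symlink is touched, a broken commit only ever lands in the checkout that is not serving traffic.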

                How do you ensure that your deployments allow rollbacks?

                Rollback involves deploying a prior commit. Stateful changes like database migrations may require special accommodation.

                Do you version your deployment files? How are you tracking compatibility with builds?

                Meaning, how do you deal with changes to your deployment tooling & configuration? Haven’t had to think about this in a while! The architecture described above is minimal and pretty generic. I do keep all my daemontools services in a separate repo in case ./run files or other top-level configuration bits need changing.

                How are you achieving immutable architecture or at least limiting configuration drift?

                Sadly, this isn’t an immutable infrastructure approach. But on the plus side deployments are a heck of a lot faster than other pipelines I’ve worked on (seconds vs minutes). There’s a bit of a tradeoff here where you can make deployments safer, or you can make fixing deployment problems faster. Larger orgs probably prefer safer, but on small teams I prefer faster.

                I try to store as much configuration in git as possible, and use the same deployment process described above to make config changes. I use etckeeper to track /etc in git. This lets me do sysadmin work directly in a production environment lookalike, then when I’m satisfied, commit my config changes. I don’t like guessing what action-at-a-distance an ansible playbook, puppet script, etc will have on a unix system, and I know unix well enough that those abstractions tend to get in the way.

                How do you handle configuration for your services? Are they tied to deployment scripts or to releases?

                See above: top-level config I track in a separate repo of daemontools services, and in /etc. Application defaults typically live directly in the app’s repo, and get selectively overridden by the daemontools service’s envdir.

                1. 2

                  etckeeper is underrated indeed.

                  1. 2

                    I didn’t know etckeeper! It seems that you created the tooling that you missed in the space and that it fit very well into your process. That’s great reading about this process. Thanks!

                  2. 8

                    Some of us don’t ship things that work online, so our deployment had no need of containers. In that case, once our product is built, we hand it off to someone who has access to the website who uploads it for us. In other words, our deployment process is, “put it in pre-determined folder somewhere and tell the website people to update the site”.

                    1. 2

                      Then I assume we could say that you “release” and not “deploy”. You make a release accessible somewhere to some people, but won’t handle its lifecycle directly (shutdown, startup, rollbacks, migrations, …).

                    2. 7

                      I deploy Haskell applications to NixOS machines on AWS with NixOps.

                      What does your deployment process look like?

                      To deploy production, I run this command:

                      make deploy-production

                      Under the hood, this just:

                      1. Uses Nix to build a couple of Elm projects and copy them to the right place
                      2. Notifies Rollbar that a deploy has started
                      3. Deploys with NixOps:
                        nixops modify services.nix aws.nix -d my-cool-app
                        nixops deploy -d my-cool-app --include production
                      4. Notifies Rollbar that the deploy finished

                      How do you ensure that your deployments allow rollbacks?

                      NixOps gives you atomic rollbacks out of the box.

                      Do you version your deployment files? How are you tracking compatibility with builds?

                      Yes. All server configuration is written in config files, which NixOps uses to provision, deploy, and manage services. These files are tracked in the repository with the rest of the code.

                      How are you achieving immutable architecture or at least limiting configuration drift?

                      NixOS behaves this way by default.

                      How do you handle configuration for your services? Are they tied to deployment scripts or to releases?

                      This is also written in the Nix expressions which NixOps reads.

                      1. 7

                        FWIW some of us are just skeptical of Docker Hub and not Docker/containerization. I would never in a million years build a company’s infra on random Docker Hub images but I absolutely would be willing to use containers. That said, I don’t work in devops so what do I know.

                        1. 1

                          I don’t think so. You have the same thing with fetching packages from GitHub, using custom repositories, etc… The namespacing and resource limitation can be done with systemd too. The (almost) platform-independent packaging is maybe the most interesting part that I see, but I’m not sure that this couldn’t be done using other solutions (e.g. a JAR).

                          1. 2

                            I think you meant to reply to someone else?

                        2. 7

                          I wrote a blog post on deploying Python applications back in 2012(!) and implementation details aside, it aged quite well and we’re using the same rough approach for everything that’s not running in our nomad cluster for some reason: https://hynek.me/articles/python-app-deployment-with-native-packages/

                          The core idea is to build a virtualenv with everything you need on a build server/CI and package it into a .deb/.rpm/.tar.gz. Configuration goes into ansible, done. Gives you unlimited flexibility, rollbacks, and a whole battle-tested toolchain that you can rely on.

                          The concepts should be easily transferable to other languages and ecosystems.

                          1. 2

                            Great article, thanks!

                          2. 6

                            Across many organizations “traditional” package-based deployment is still the winner. I’ve been building packages (.deb or custom format), deploying them with rsync or an HTTP mirror, and running simple configuration hooks at deploy time. Each application and host is expected to self-configure as much as possible without requiring CM tools. If needed, a host can call APIs to register itself on load balancers, message queues etc. Discovery is done by DNS. Systemd unit files provide better process isolation (seccomp, cgroups, overlayfs) than popular containers. Security updates are provided by the underlying Linux distribution and shared libraries are used to receive updates without compile/deploy cycles.

                            Newly hired ex-colleagues were often surprised at how effective and simple the whole process can be.

                            1. 2

                              Do you have any blog post or article describing your approach in more detail? Several people here seem to use system packages and it got my attention.

                            2. 4

                              We are in the process of moving our apps to containers in k8s, but the process without containers that I put in place for a Rails app in about 2011 is still working OK for our main apps. We have a build server that checks out the source code and compiles a deployment package, which goes into a Git repository. Our application servers are configured with Puppet, which maintains all the dependent libraries, configurations, etc. Deployment isn’t a separate operation, it’s just built into the Puppet config: if the tagged deployment package has changed, Puppet retrieves it and restarts the server processes. To do a rollback, you just tag a previous version and rerun Puppet. We have a single Puppet config for the entire server fleet, with a resource definition for “a Rails server running releases from this deployment repo”.

                              1. 4

                                I run Ansible on my desktop which has a playbook that installs all dependencies, checks out latest release from Git, and then builds/restarts if any code changed.

                                I don’t use anything like Terraform or even dynamic inventories yet because I only have a handful of nodes. I’ll look into those more when I need them.

                                I also just use Ansible to configure things like HAProxy upstreams and database server hosts, in the future I might look into using Consul for that.

                                1. 4

                                  Even against the hype machine that is docker/k8s I’d argue there is way more content on doing things “the boring way”: you can go a very long way with a little bit of apt-get.

                                  In the past I have used puppet/chef/terraform/shell scripts/etc. to build aws images and just launched manually whatever is needed with plain old ci (jenkins) systems doing actual deploys/rollbacks. I think the concerns about native library compatibility are typically overblown. Yes you have to change base images every now and then but you’d have to do the same in containerland. I’ve never liked the configuration management software out there yet I can see value on larger teams. I think the ci space is pretty straight-forward yet think there is plenty of room for improvement.

                                  True immutability is rather hard to do unless you severely restrict ssh access. I find people (including myself) will install all sorts of crap if given time and a problem. This can go south when certain tools (and I’ll pick on sysdig) come with kernel modules, or someone’s favorite editor decides to upgrade tls.

                                  However I truly have not had the horrors that I see with people when they choose containers/k8s. My personal opinion on containers is that if people think installing a given service is such a pain then perhaps they shouldn’t be administering that service (with anything).

                                  Also I’m heavily involved in a company that is working on unikernel tooling so there’s that. :)

                                  1. 1

                                    Well, that’s debatable. I’ve seen so many articles that discuss the deployment process of a jar file and write “deploy jar” in the middle of the blog post without any explanation of how they deploy it, where they put it, how they allow rollback, etc… There’s probably no magic, but I feel that it’s a jungle and everybody has different conventions.

                                  2. 6

                                    NixOS and NixOps. Can just toss all the nix configs in git. No scripts needed.

                                    Heard about morph too; curious if anyone’s using that to avoid statefulness on the deploy machine.

                                    1. 3

                                      In my current job, we have TFS CI create and publish a package to Octopus, which depending on the environment is automatically or manually deployed. The Octopus jobs include DB migrations etc. It’s very much a “pet” setup, and provisioning, DNS, load balancing etc is all done manually.

                                      In my previous role, we used a TFS CI pipeline to build and publish a package to Artifactory, then used TFS CD with a combination of PowerShell scripting and vRealize to spin up AWS machines and load balancers, deploy, configure DNS etc. Very much “cattle”, though we’d configured the pipeline to allow us to treat the servers in “pet” mode, which was useful for minor changes. Because we could configure DNS it also let us do blue/green deployments.

                                      Although vRealize was, how can I put this… a little rough, we ended up with (almost) single-button deployment for our Ops team. The only thing not done when I left was certificate installation. It was technically possible but the org’s security team had to be convinced that it was secure enough.

                                      1. 1

                                        What is TFS?

                                        1. 2

                                          I guess Microsoft’s Team Foundation Server.

                                          1. 1

                                            What is TFS?

                                            Microsoft’s source control/CI/CD suite of products. Though I think they’ve rebranded it somewhat recently.

                                        2. 3

                                          We use Docker for some things, but not for the main pieces of our application. We deploy to EC2 instances. Our main components run on the JVM and we use systemd to manage starting and stopping them. Our CI process generates a tarfile containing a fat jar and some support files, and uploads it to S3 with a filename that includes the git commit hash.

                                          We have a fairly straightforward deploy script written using Fabric that does the actual deployment. It does a rolling deploy so there are always instances of the service running; zero-downtime deploys are a business requirement for us.

                                          1. Download the tarfile from S3.
                                          2. Unpack the tarfile. This creates a new directory whose name is the git commit hash.
                                          3. Deregister the server from the load balancer and wait for the load balancer to finish the deregistration.
                                          4. Gracefully shut down the application using systemd.
                                          5. Update a symlink to point to the new directory.
                                          6. Start the application using systemd. The application’s working directory goes through the symlink, so this launches the new code.
                                          7. Wait for the load balancer to register that the service is up. (This actually happens in the start script systemd runs.) We need to wait for this so the rolling deploy doesn’t take more of our cluster offline than we expect.
                                          8. Delete old code versions such that the most recent N of them are still available.

                                          The same exact deploy script is used to deploy to staging and to production.
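                                          Two parts of the list above, the one-host-at-a-time loop and step 8's cleanup, can be sketched in a few lines. This is a minimal sketch and not the Fabric script itself: `rolling_deploy`, `prune_releases` and `deploy_one` are hypothetical names, and ordering releases by directory modification time is an assumption about what "most recent" means.

```python
import os
import shutil
from typing import Callable, List, Sequence


def rolling_deploy(hosts: Sequence[str], deploy_one: Callable[[str], None]) -> List[str]:
    """Deploy to hosts one at a time, stopping at the first failure."""
    finished = []
    for host in hosts:
        # deploy_one should block until the load balancer sees the host as
        # healthy again and raise on failure, so at most one host is out.
        deploy_one(host)
        finished.append(host)
    return finished


def prune_releases(releases_dir: str, current_link: str, keep: int) -> List[str]:
    """Delete release directories beyond the `keep` most recently modified."""
    live = os.path.realpath(current_link)
    dirs = [
        entry.path
        for entry in os.scandir(releases_dir)
        if entry.is_dir(follow_symlinks=False)
    ]
    dirs.sort(key=os.path.getmtime, reverse=True)  # newest first
    removed = []
    for path in dirs[keep:]:
        if os.path.realpath(path) == live:
            continue  # never delete the release the symlink points at
        shutil.rmtree(path)
        removed.append(path)
    return removed
```

                                          Stopping the loop on the first exception keeps a bad build from taking more than one host out of the load balancer.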

                                          How do you ensure that your deployments allow rollbacks?

                                          Rollback is simple because the old code is still sitting on all the servers; we just point the symlinks back to the previous releases and do a rolling restart. If we need to roll back further than the number of releases we keep sitting around, we run the deploy script and tell it to pull whatever older version we want from S3. Takes a few seconds more per host but not too bad. In practice, we’ve never had to roll back more than one release, and even that is exceedingly rare.

                                          Do you version your deployment files? How are you tracking compatibility with builds?

                                          The deployment scripts are in the same git repository as the code, so they are always versioned in lockstep.

                                          How are you achieving immutable architecture or at least limiting configuration drift?

                                          Our CI, staging, and production environments are all managed using Terraform and Ansible, and the configurations are version-controlled and go through code review. We do allow ssh access but we treat it as read-only, more or less: nobody makes manual changes to any of our production hosts except when investigating problems, and even then, we almost always make those “manual” changes by adding to the Ansible configuration. We run a host intrusion detection system on all our hosts that generates a daily email report of unexpected file changes, so manual edits by employees are flagged for review (that’s not the main reason we run the HIDS, but we get that benefit for free).
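                                          A report of unexpected file changes boils down to comparing file digests against a known-good baseline. Here is a minimal sketch of that idea, assuming SHA-256 over regular files. `snapshot` and `drift_report` are hypothetical names, and this is not how any particular HIDS is implemented.

```python
import hashlib
import os
from typing import Dict, List


def snapshot(root: str) -> Dict[str, str]:
    """Map each regular file under root (by relative path) to its SHA-256 digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks and special files
            with open(path, "rb") as fh:
                # Reads the whole file at once, which is fine for small config files.
                digest = hashlib.sha256(fh.read()).hexdigest()
            digests[os.path.relpath(path, root)] = digest
    return digests


def drift_report(baseline: Dict[str, str], current: Dict[str, str]) -> Dict[str, List[str]]:
    """List paths added, removed, or changed since the baseline snapshot."""
    shared = baseline.keys() & current.keys()
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        "changed": sorted(p for p in shared if baseline[p] != current[p]),
    }
```

                                          Taking a snapshot right after a configuration-management run and diffing against it later would surface manual edits, which is the side benefit described above.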

                                          We do have a couple of scratch hosts where we relax the “no changes using ssh” rule a bit to allow experimentation and investigation, but even there, you’re expected to update Ansible to reflect any system changes you want to keep around long term. The scratch hosts get destroyed and rebuilt periodically, which acts as a strong motivation to stick with that discipline.

                                          How do you handle configuration for your services? Are they tied to deployment scripts or to releases?

                                          Aside from a small amount of bootstrapping configuration which is Ansible-managed, all our configuration is in Consul and is organized by tier (dev, staging, production). Consul is populated from a git repository, so configuration is version-controlled.

                                          When we add new configuration settings, the practice is to commit the new values to the Consul repository before committing the code, so that the configuration is already present when the new code starts running in the staging environment.

                                          I feel that very few companies actually describe how they do it “the boring way”.

                                          I think that’s because it’s the boring way. Almost all companies use their engineering blogs and conference presentations as recruiting tools, and no company wants to say, “Come work for us! Our technology is old-school and boring!”

                                          That said, I disagree with the premise a bit. While these tools are not as actively discussed as the latest containerized hotness, I think there are plenty of resources out there for learning to build system packages, manage processes using systemd or other system-level tools, package Java applications as fat jarfiles, and so forth.

                                          1. 3

                                            This was our Linux deployment and management of hundreds of machines:

                                            1. 3

                                              Currently: cmake install (and that’s a step forward from where it was).

                                              Soon: .deb packages built from CI.

                                              1. 2

                                                Well the biggest problem is there aren’t great complete “solutions” for creating repeatable deployments that aren’t using containers.

                                                You can get a long way with a proper Salt deployment for at least machine setup and configuration. I don’t have a huge amount of experience there but folks I work with are beginning to do this with software that simply cannot be containerized.

                                                1. 6

                                                  Well the biggest problem is there aren’t great complete “solutions” for creating repeatable deployments that aren’t using containers.

                                                  There most certainly are.

                                                  1. 1

                                                    Neato. Thanks for the info.

                                                  2. 1

                                                    Interesting. Do you know of “non-great” solutions?

                                                    Background is I am currently developing a lightweight CI solution without docker (because I found that either they are big like Jenkins/Gitlab or they are built to use docker). So I am wondering if it would make sense to develop this into the direction of deployments or create a sibling lightweight program for deployments.

                                                    1. 1

                                                      Honestly, not much that isn’t tied to huge monolithic systems.

                                                      I’m not a huge fan of Docker and Kubernetes but I recognize their value in codifying deployment into code. I don’t deploy many services as a hobby so I haven’t sorted out the non-container solution.

                                                      At work we are heavily Kubernetes and Docker so I work within that framework. It has warts but it works well enough for an environment that runs ~4k containers within K8s.

                                                      1. 1

                                                        We’ve been using a combination of Terraform (initial setup) plus Nix/NixOS for our containerless deployments. There is also an autoupdater running on each machine that monitors the CI for new derivations and does an update if there is, with rollback in case of a problem.

                                                        1. 1

                                                          Ansible? I find it perfectly acceptable for small to medium sized deployments. Can do orchestration too, although of course specialized tools like Terraform are better at that.

                                                      2. 2

                                                        Before I switched to a job that deploys software using Docker, I’d previously used mixes of system-package management (RPM) and a configuration management tool (Chef or Ansible).

                                                        At a startup that managed a large number of systems and services, I worked with Chef:

                                                        • CI pipelines publish a new “cookbook” that installs the new package. Cookbooks were released automatically to integration environments, and manually released to staging and production environments.
                                                        • Rollbacks were done by downgrading to an older cookbook, or stopping Chef and downgrading the package.
                                                        • All files were managed by the package or cookbook.
                                                        • Configuration drift was limited by Chef running every half-hour, avoiding manual changes to systems persisting.
                                                        • All configuration was done by the versioned cookbook.

                                                        At a smaller company I used Ansible instead, as we only managed a very small number of systems.

                                                        • Ansible was configured in a single repository, that would upgrade to the latest published versions of packages. CI pipelines that published packages triggered the pipeline that ran Ansible.
                                                        • Rollbacks were very infrequent, and our only option was manually fixing things or downgrading packages.
                                                        • All files were managed by the package or Ansible playbook.
                                                        • Nothing limited configuration drift, but with <10 systems it wasn’t really a problem.
                                                        • All software configuration was done by the separate Ansible playbooks, so it was easy to release software that wouldn’t be compatible with the configuration or vice-versa.

                                                        System package managers like apt and dnf provide a lot of the tooling you need to deploy software the “boring” way. Unless you’re doing something unusually complex, a package repository and a small amount of configuration management (which should do little more than install packages and create config files) will get you a long way.

                                                        1. 1

                                                          Do you have any blog post or article describing your approach in more detail? Several people here seem to use system packages and it got my attention.

                                                        2. 2

                                                          A ‘pseudo’ steps overview of what we do

                                                          • cd ~/my-mono-repo

                                                          • git pull pulls head from master

                                                          • cd ~/my-mono-repo/devops/ansible

                                                          • ansible-playbook build-release-locally.yml –vault-id @prompt

                                                            creates all artifacts locally in the ~/my-mono-repo/devops/ansible/files directory (also, our artifacts are not CPU arch or OS sensitive, so the build artifacts are the same whether I deploy on Linuxes, FreeBSD, or OpenBSD). When we add OS-specific artifacts (likely built in D), we will have to cross-compile to several platforms during the local build process. This will complicate where we can do this step: today it works on all Linuxes and all the BSDs, but once we start cross-compiling it will limit where we can run Ansible quite a bit, and we will have to install more tools on each Ansible controller where these steps are done.

                                                          • ansible-playbook site.yml

                                                            This actually pushes release to the registered hosts (some hosts are just for webapp stuff, the others for DB, and others for backends.. and so on)

                                                            we can also do ansible-playbook site_faster.yml

                                                            • or ansible-playbook site.yml --tags restarthem
                                                            • or ansible-playbook site.yml --tags reinstalthem
                                                            • or ansible-playbook reboot_all.yml

                                                          The difference between site.yml and site_faster.yml is that the faster version does not update all OS packages on the target OS and does not install my essential tools (it assumes those 2 steps are already done).

                                                          • site.yml (and site_faster.yml) consists of
                                                            • site_install_frontoffice_webapp.yml
                                                            • site_install_frontoffice_backend.yml
                                                            • site_install_business_ops_webapp.yml
                                                            • site_install_business_ops_backend.yml
                                                            • site_install_logger_controller.yml
                                                            • site_install_frontoffice_database.yml
                                                            • site_apply_frontoffice_dbschema.yml
                                                            • site_install_certs.yml

                                                          The above separation lets us install or just restart (using tags) individual subcomponents, or the whole thing. All of the above, except _database and _apply_dbschema, are idempotent operations, meaning that they erase and then reinstall everything from scratch.

                                                          Our install process is tested on several Linux distros, FreeBSD, and OpenBSD. Ansible lets us define variations of commands on those when we need it. The most time-consuming parts were finding the right syslog replacement on the BSDs and some specifics about how to restart the process; there were also some troubles with the nginx role that we use for the BSDs (differences in package management). The rest was identical to Linux, so no extra work.

                                                          The release artifact files deposited by the local build process are also archived into an escrow-like vault. This lets us recreate what went into prod. The source recreation uses git tags (we do not actually pull git head for prod, but a git tag).
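A minimal sketch of the archiving idea above, assuming the artifacts sit in a local directory: record a SHA-256 digest for every file next to the git tag, so a later rebuild from that tag can be compared against what was shipped. The function names and the manifest layout are made up for illustration; this is not the poster's actual tooling.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(artifact_dir: Path, git_tag: str) -> dict:
    """Map every file under artifact_dir to its digest, keyed by relative path."""
    files = {
        p.relative_to(artifact_dir).as_posix(): sha256_of(p)
        for p in sorted(artifact_dir.rglob("*"))
        if p.is_file()
    }
    return {"git_tag": git_tag, "files": files}


def changed_files(manifest: dict, artifact_dir: Path) -> list:
    """List relative paths that differ from, or are missing versus, the manifest."""
    current = build_manifest(artifact_dir, manifest["git_tag"])["files"]
    recorded = manifest["files"]
    return sorted(
        name
        for name in set(recorded) | set(current)
        if recorded.get(name) != current.get(name)
    )
```

The manifest is a plain dict, so it can be dumped as JSON and stored alongside the archived artifacts.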

                                                          We do not use the Let’s Encrypt model of cert distribution and instead keep certs in the build artifacts, so that we push them during the release process (the Let’s Encrypt model assumes that each server in prod pulls the cert files periodically).

                                                          We also have remotebackup.yml and restorefrombackup.yml that back up prod and then restore from that backup (in addition to various automated backup/replication stuff).

                                                          1. 2

                                                            We use Hashicorp Nomad (together with Consul and Vault) and deploy java jar files on it.

                                                            For our staging environment, deployment is triggered by a successful build; for production we go into our build tool and press the deploy button. All job specifications are in their own repository for now, but the plan is to have a directory called deploy in each service/application repository holding its job specification, and let the backend developers manage that themselves.

                                                            We rotate all machines at least once a month as a way to patch and upgrade them, and only a handful of people have direct access to them over ssh. This sort of forces us to make sure all relevant configuration is committed and pushed in our packer repo.

                                                            All configuration, including secrets, is managed in Vault. To change something we either wait for a new deploy or restart the service; as we deploy to production at least once a day, this is seldom a problem.

                                                            1. 2

                                                              To answer your questions from our perspective:

                                                              1. NixOS for linux-only based deployments and FreeBSD with shell scripts and our own monitoring/management software for everything else. Sometimes we also use jails for specific security or feature requirements, but I guess that might qualify as a ‘container’ so you aren’t interested in that.

                                                              2. On NixOS you can just rollback. Very handy feature! But we almost never need it. This is more difficult on FreeBSD, but we very very rarely have a need for rollback since we exclusively use high quality stable well-maintained dependencies. If they don’t qualify, we don’t use them or create our own. Stability and high quality software is key.

                                                              3. Yup, all files are timestamped both by signature and in-line via comments with version number and date. We never have compatibility issues since we make everything compatible to begin with. If you need a specific version of a dependency to use a piece of code, we simply won’t use it.

                                                              4. NixOS is really awesome for this! On FreeBSD we don’t have the immutability features NixOS provides, but in my experience it’s also less of an issue on FreeBSD since two systems that are configured the same stay very consistent to begin with (in my experience not so much on Debian and CentOS).

                                                              5. On NixOS they are indeed tied to the NixOS files. On FreeBSD we make our own shell scripts and use a bot that manages consistency for us by auditing settings on a regular basis.
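As a rough illustration of the "bot that audits settings" idea in point 5, here is a sketch that parses a key=value file and reports every setting that differs from the expected values. The file format (rc.conf-like lines), the function names, and the example keys are assumptions; the poster's own bot is not described in any detail.

```python
def parse_settings(text: str) -> dict:
    """Parse key=value lines, ignoring blanks and # comments, stripping double quotes."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def find_drift(expected: dict, actual: dict) -> dict:
    """Return {key: (expected, actual)} for every setting that differs or is missing."""
    drift = {}
    for key, want in expected.items():
        have = actual.get(key)
        if have != want:
            drift[key] = (want, have)
    return drift
```

Run on a schedule, an empty result means the host still matches the expected settings; anything else is a candidate for an alert or an automatic fix.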

                                                              Personal opinion: NixOS is awesome in a lot of ways, but in the end I prefer BSD based operating systems in terms of consistency, lack of complexity and stability. It would be awesome if someday a FreeBSD or OpenBSD based OS would have some NixOS-like features (immutability, reproducible builds, central file with everything in it, easy true rollbacks etc.). Although, that might bite my preference for ‘simple, understandable operating systems that lack complexity’…

                                                              1. 2

                                                                It looks like an outline is taking shape:

                                                                (sometimes) a build phase

                                                                Jenkins, custom script, language or OS package manager… The question of dependencies comes up here, and compiling self-contained executables seems to lower the burden (static linking for Go/C/…, self-contained .jar).


                                                                Git, scp, mirror server… This step may take time and bandwidth. I guess as long as you are not Facebook you do not need torrents for that.

                                                                Switching the version

                                                                Blue/Green deployment looks like the silver bullet: use a symlink pointing at the production release and replace it atomically. I’ve not seen anyone doing the switch at the network level, such as with haproxy or BGP/routing tables. I guess as long as you do not have 30 production nodes, an FS-level switch fits better…
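For what it's worth, the atomic symlink replacement can be sketched in a few lines, assuming a POSIX filesystem. The trick is to create the new link under a temporary name and rename it over the old one, so there is never a moment where the link is missing. The function name and the ".tmp" suffix are illustrative choices.

```python
import os
from pathlib import Path


def switch_current(release_dir: Path, current_link: Path) -> None:
    """Point current_link at release_dir without a window where the link is missing."""
    tmp_link = current_link.with_name(current_link.name + ".tmp")
    # Clear a leftover temp link from an interrupted earlier run.
    if tmp_link.is_symlink():
        tmp_link.unlink()
    os.symlink(release_dir, tmp_link)
    # rename(2) replaces the old link atomically on POSIX filesystems.
    os.replace(tmp_link, current_link)
```

Rolling back is the same call with the previous release directory as the first argument.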

                                                                NixOS is mentioned everywhere, probably because it achieves Blue/Green deployment for the entire system.

                                                                Capistrano sounds like the industry standard solution that comes from the epoch where many deployment tools were written in Ruby (RoR glory days? Think about Puppet).

                                                                Configuration deployment

                                                                Unless it is coupled with the deployed application, configuration is managed separately. Each code release may bring a breaking change to the configuration format, so upgrading the configuration is something to consider (Debian running sed -i in /etc, OpenBSD showing a warning but not touching anything).

                                                                Git is mentioned very often, since it permits showing the history and rolling back. There is not as much pressure to change configuration atomically, as it is usually only read at application startup/SIGHUP.
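To make the "read at startup/SIGHUP" point concrete, here is a small sketch of a process that loads its configuration once and reloads it when it receives SIGHUP. It assumes a JSON config file and a Unix-like host; the class and function names are hypothetical.

```python
import json
import signal
from pathlib import Path


class ReloadableConfig:
    """Holds settings loaded from a JSON file and reloads them on request."""

    def __init__(self, path: Path):
        self.path = path
        self.values = {}
        self.reload_requested = False
        self.load()

    def load(self) -> None:
        # Parse fully before swapping, so a broken file leaves the old values in place.
        new_values = json.loads(self.path.read_text())
        self.values = new_values
        self.reload_requested = False

    def request_reload(self, signum=None, frame=None) -> None:
        # Signal handlers should do as little as possible: just set a flag.
        self.reload_requested = True

    def reload_if_requested(self) -> bool:
        """Call from the main loop; returns True when a reload happened."""
        if not self.reload_requested:
            return False
        self.load()
        return True


def install_sighup_handler(config: ReloadableConfig) -> None:
    # SIGHUP does not exist on Windows; this sketch assumes a Unix-like host.
    signal.signal(signal.SIGHUP, config.request_reload)
```

Because the file is only re-read at a well-defined point in the main loop, a deploy can replace it at any time and then send SIGHUP.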

                                                                Integration with the OS

                                                                Every project requires features from the operating system, if only the kernel. The application’s dependencies/requirements on the OS need to match the state of the installed packages.

                                                                I guess this is either stated by the developers, who tell the admins when they switch their dev environment to another version of XYZ.

                                                                Or, when OS packages are used for deployment, the dependencies can simply be stated in the package: sites.txz, .deb, .rpm, .nix…

                                                                1. 1

                                                                  Out of these, which step makes Docker a requirement? Is the reason failing at one of the above, or wanting less of its hassle?

                                                                  It does not seem that painful to me: make, scp, ln, git pull, and a service manager reload/restart sound like all it takes.

                                                                  Maybe people had different experiences, or maybe hiring 1 admin was too much for smaller teams, so the lead dev uses some container hosting service to get done with deployment.

                                                                2. 1

                                                                  Build an Erlang release, deploy it to my host with edeliver, which is a bunch of bash scripts around ssh/scp and remote commands https://github.com/edeliver/edeliver

                                                                  rollbacks: when you push a release you can select which version

                                                                  configuration: there’s a prod.secrets.exs file on my server. When I update config, I scp it, and the new release reads from it.

                                                                  granted, I’m not powering a major company but have a hobby project, it’s never given me trouble.

                                                                  1. 1

                                                                    An approach for a recent Python project uses AWS CodeBuild/CodeDeploy/CodePipeline via Github webhooks.

                                                                    The server itself is configured via terraform. That is, the basic instance type, and cloudinit stuff so when a new instance boots for the first time it configures itself to be capable of running the codebase. Basic system packages, etc… it runs AWS agents that auth via the instance IAM role so there is no need to store keys anywhere.

                                                                    Then the CodeDeploy process does the rest. The build is created and the test suite is run. Once that passes, it gets deployed to the server. This essentially involves destroying the code that is there and doing a clean checkout. Then it restarts the processes for this particular app, which all live under a single/parent systemd service. All that stuff is bootstrapped by the code deploy (it’s idempotent, so the same calls get run every deploy and if something is missing it gets added).

                                                                    It’s not a blue/green deploy so there is a blip of downtime. In this instance, that’s okay/acceptable.

                                                                    No containers. Just a virtual environment to isolate python dependencies and systemd.

                                                                    It’s fun being able to completely nuke any part of my infrastructure and run “terraform apply” to bring it right back to normal and a git push to master to trigger a code deploy.

                                                                    I’ve seen and managed it all. Bare metal. Heroku. AWS. GCP. Kubernetes the hard way (in production at FarmLogs), VPS’ etc…. there is no one right or wrong way to do it but the two biggest things I’d focus on are 1. Idempotency and 2. Infrastructure as code. The underlying stuff running your app doesn’t matter. A container is just isolation around regular Linux processes. It’s all the other stuff that is more important to sort out.

                                                                    1. 1
                                                                      • What does your deployment process look like?

                                                                        I deploy using saltstack and rsync - both application dependencies and the application itself are transferred via rsync. This allows very small amounts of data to be moved around and skipping the build step for very fast deployments (< 30s).

                                                                      • How do you ensure that your deployments allow rollbacks?

                                                                        git revert and re-deploy - note, this will not handle reverting db migrations - practically speaking we’ve almost never had to revert and we’ve never had to revert a migration. QA does a pretty good job of making sure we don’t need to do this often and we’ve done thousands of deploys, so the process is pretty stable.

                                                                      • Do you version your deployment files ? How are you tracking compatibility with builds?

                                                                        Everything is in a mono-repo - config management stuff is in saltstack configs in the repo

                                                                      • How are you achieving immutable architecture or at least limiting configuration drift?

                                                                        All the hosts are updated at the same time with the same binaries and libs - all automated in saltstack.

                                                                      • How do you handle configuration for your services? Are they tied to deployment scripts or to releases?

                                                                        All the configs are generated via salt and versioned back to git so we can track any change to the entire system back to the repo.

                                                                      1. 1

                                                                        Outside of work, all of the software I develop is “deployed” into my REPL with (ql:quickload :my-software). To rollback I “git checkout ”.

                                                                        At work some of our software is distributed as RPMs (eventually also as DEBs), so it’s deployed to our customers via our download site. A few projects are shipped as Android APK files, and they’re also downloaded from the website.

                                                                        At my previous job we created .iso images that contained a custom Linux distro with our software baked in, and we uploaded .iso images to a download site. We had a goal to make them usable by the customer’s storage admins, but when I left upgrades (deployments) were still done by our support people ssh’d into the customer’s cluster. After an upgrade completed there was no way to rollback, but an upgrade in progress could be rolled back by re-installing the previous version onto the nodes that had been upgraded already. That would’ve been something for the on-call engineers, but I never had to do it on a customer cluster.

                                                                        Seeing what web devs have to put up with to deploy makes me happy I stayed away from it :-)

                                                                        1. 1

                                                                          At $work we build a machine image with the latest version of the code and launch a bunch of VMs from that. Downside is it takes 10-15 minutes to get new code to production as you have to serialise the entire VM disk. Upside is that everything is reset to a clean state after each deployment (like docker) without needing to think about multiple extra layers.

                                                                          1. 1

                                                                            At $sidegig I have a capistrano-like setup: The (sole) production server has a bare git repo with a post-receive hook. Post-receive hook checks out the master branch to a new directory, symlinks in some shared folders (eg dependencies, logfiles, disk cache), replaces the current-release directory symlink and tells the server to reload its code. Deploys take ~15 seconds. Should be able to scale to 10x current load on the current server, and 1000x (probably much more) by just running on whatever the biggest AWS offering is.
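A sketch of the "symlinks in some shared folders" step, assuming a capistrano-style layout with a releases directory and a shared directory. The folder names and the function name are illustrative, not taken from the actual hook.

```python
import os
from pathlib import Path


def link_shared(release_dir: Path, shared_dir: Path, names: list) -> None:
    """Symlink each shared folder (logs, caches, ...) into a fresh release directory."""
    for name in names:
        target = shared_dir / name
        # Create the shared folder on first use so the link never dangles.
        target.mkdir(parents=True, exist_ok=True)
        link = release_dir / name
        # Replace a link left by an earlier run; a real directory here raises an error.
        if link.is_symlink():
            link.unlink()
        os.symlink(target, link, target_is_directory=True)
```

Every release then sees the same logs and caches, while the code itself lives in a directory that can be discarded after a few deploys.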

                                                                            Postgres runs on the same server, with a nightly pg_dump copied to s3. Takes me about 20 minutes to build a complete fresh instance and restore the database, which I’ve only had to do twice in three years (and don’t expect to do again for at least another year).

                                                                          2. 1

                                                                            Do you guys ever feel that a chroot jail would be beneficial? That’s all I use containers for.

                                                                            1. 1

                                                                              Correct me if I’m wrong, but jails are for isolation, and from what I read, people mostly use Docker (or containers really) as an artifact (like a big tgz).

                                                                              So the question is, how do you release your artifacts in this jail?

                                                                            2. 1

                                                                              Sadly very manual, but it’s how the customer wants it and so it’s

                                                                              scp release_X_Y_X.tgz server1:
                                                                              ssh server1
                                                                              stop daemon
                                                                              mv release release-old
                                                                              tar xf release_X_Y_X.tgz
                                                                              start daemon


                                                                              The good thing is that these releases coincide with a multi-hour long maintenance window so it’s not like we need zero downtime or anything fancy. Still feels like a jump 10 years to the past.
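The manual steps above could be scripted roughly like this. It is only a sketch: it assumes the tarball contains the release tree directly (no top-level directory), that the archive is trusted, and that stopping and starting the daemon are passed in as callables, since the real commands are not given here. The "release-new" staging name is made up.

```python
import shutil
import tarfile
from pathlib import Path


def install_release(tarball: Path, app_root: Path,
                    stop=lambda: None, start=lambda: None) -> None:
    """Unpack tarball into app_root/release, keeping the previous tree as release-old."""
    release = app_root / "release"
    previous = app_root / "release-old"
    staging = app_root / "release-new"

    # Unpack to a staging directory first so a bad archive never touches the live tree.
    if staging.exists():
        shutil.rmtree(staging)
    staging.mkdir(parents=True)
    with tarfile.open(tarball) as archive:
        archive.extractall(staging)

    stop()
    # Keep exactly one previous release around for a manual rollback.
    if previous.exists():
        shutil.rmtree(previous)
    if release.exists():
        release.rename(previous)
    staging.rename(release)
    start()
```

Unpacking before stopping the daemon keeps the downtime to the two renames, which matters little inside a maintenance window but costs nothing either.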

                                                                              Also our other projects have a nifty infra webapp where you configure your release from several bundles and then press a button and wait for it to be finished. Again not zero-downtime, but rollback on another button press.

                                                                              Personal story aside, deployment could be fancy or boring, bulletproof or flimsy, sophisticated or dull - even long before Docker. I’ve seen and done all of the above mentioned without Docker. Well, immutable could be debated I guess.

                                                                              1. 1

                                                                                Hey, a bit of automation and that becomes a fine n’ dandy “one button push deploy”!

                                                                                Does the customer not want that?

                                                                              2. 1

                                                                                I have inherited a mixed bag at work, with Envoyer being used to deploy some things to our server and others being a manual process of uploading the files. We also have four event servers that get taken to events. Those run Docker, but not as you might think: they are updated by remoting into each server, then into the www docker container, and running a combo of git fetch, git reset, git pull with the upstream hard-coded to be one of the developers’ laptops rather than Github, due to the nature of events being largely air gapped.

                                                                                In previous employment what I am used to is using a vagrant file per project and a shared init script that provisions a vm to be identical to what we have in deployment. Then once we are happy to deploy I had written a command line tool that would get a zip containing the project at the git hash selected, sftp it to the selected server environment (live-dev or live staging) and run either the install.sh or update.sh script in its .deploy folder depending on kind of deployment.

                                                                                To get a project on to the live-production environment we would run a different tool staging-to-live as everything had to be pushed to staging and checked off by both the client and our internal quality control team before it could go live.

                                                                                Both command line tools supported rolling back to the previous state and worked with zero issues for years; using git hooks we were able to automate things like deploy dev-{branchname}.example.com on commit to branch.

                                                                                Just to add, the init script for the vagrant vms was normally curated once every quarter with a base linux image being generated that our vagrant installs would use rather than sitting for half an hour running through build steps for the environment. The same was done for the staging/production and development servers, database was handled by aws as a separate concern because nobody liked maintaining that and that meant we could bring up a replacement application server and then deploy all applications on to it as they are in their current state on the previous server, swap ip’s and then after some tick boxes shut down the old server and continue as normal. This also allowed for horizontal scaling behind a load balancer for times when we had increased traffic load which luckily for us was predictable around global calendared events.

                                                                                1. 1

                                                                                  Hacky PHP script on top of a database which stores configuration-parameters. The script basically returns bash scripts with specific commands, prepending some global variables depending on which server called it.

                                                                                  The application itself is a name-version.tar.gz file which gets downloaded and unpacked. So the flow for production is: Pack up the application, upload it to the server, bump the version number in the database for a specific set of servers and wait a few minutes for the servers to poll for their next commands.
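The polling side of this flow boils down to comparing the version the database wants for a group of servers with the version a server has installed. A toy sketch follows; the dict stands in for the database table, and the group and app names are invented.

```python
from typing import Optional


def tarball_name(app: str, version: str) -> str:
    """Follow the name-version.tar.gz convention described above."""
    return f"{app}-{version}.tar.gz"


def next_download(desired: dict, group: str, app: str, installed: str) -> Optional[str]:
    """Return the tarball a polling server should fetch, or None when it is up to date.

    `desired` stands in for the database table mapping a server group to a version.
    """
    wanted = desired.get(group)
    if wanted is None or wanted == installed:
        return None
    return tarball_name(app, wanted)
```

Since the check is for inequality rather than "newer than", setting the version back to an older value makes the servers fetch the older tarball on their next poll.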

                                                                                  There’s also some code that reports the status of the current servers back and instructs my hosting provider to create and destroy servers based on the total load of the fleet. The only servers outside of this flow are the three database servers, but they do report their status through this mechanism. The server running the PHP scripts is also outside this flow.

                                                                                  1. 1

                                                                                    I ship mostly Magento 2 and Magento 2 module builds. There’s a main Magento 2 repository with a Release that is polled for new commits by a Jenkins server. The Jenkins server then pulls down the code, runs the Magento build tools and archives the release artifact. From there Jenkins kicks off an Ansible playbook that creates and configures EC2 instances in AWS, along with a new ELB. We enter maintenance mode, run more Magento setup commands, and swap our DNS from one ELB to another.

                                                                                    There’s also a single CRON instance that the artifact is deployed to.

                                                                                    Our rollback strategy? Take everything offline for an hour and restore a database snapshot from RDS. Manually reconcile any orders that may have slipped through.

                                                                                    Magento is hell to deploy in a sane way.

                                                                                    1. 1

                                                                                      I am and have been using various means, both container based and container-less.

                                                                                      For system configuration: Some base OS (as minimal as possible), configuration management or just an overlay (in an OpenBSD based test environment I once played with siteXXX.tgz which appears to do the job pretty well).

                                                                                      For application deployment: I prefer static binaries of Go applications with checked in vendor directory or tarballs if there’s more than a directory. This usually is run by some kind of supervisor (daemon(8) on FreeBSD, monit or supervisorctl elsewhere, rarely systemd).

                                                                                      Deployment either happens by some CI or using configuration management. Alternatively (especially for Rails) capistrano is also a nice way to deploy applications. For very small cases, where complete reproducibility isn’t really necessary, Go binaries can be scp’ed followed by a restart. Using makefiles works very well for that. But I only ever used that for small/quick use cases.

                                                                                      Rollback via capistrano or by deploying an older tar. However, to be quite honest, whether or not you’re on containers, rollbacks per se are never really the problem. The problem ends up being the non-reversible parts, like DB migrations (where you’d lose important data or break compatibility).

                                                                                      In general, cases where a full-rollback really is wanted it never was a huge problem, given you used a sane and simple enough underlying system and did not just blindly mess around with libraries provided as packages for example.

                                                                                      Sometimes rolling back the way you’d do it with containers isn’t really what you want; what you want is a new version, or a version of a deployment where only parts are rolled back (e.g. security patches should stay in, application code should be rolled back). This is why it makes sense to have an option for partial rollbacks (similar to how you might want to create a new commit reverting an old commit, without completely resetting to that version, reintroducing other problems or breaking compatibility with data).

                                                                                      Configuration drift can be handled like 20+ years ago by images and/or checked in configuration. While I used to be a proponent of declarative management, I now consider it to be more of a philosophical question, because even when using Ansible with YAML (and not using commands, for example) you can still end up in different states on different systems, or by applying the same playbook twice. How obvious that is depends on the specific case. I think it’s a common modern fallacy to assume a declarative syntax in configuration management automatically prevents that. However, I don’t want to start a war about declarative vs imperative configuration management. What I want to say is that you need to be aware that behind the scenes you’re still running commands in a usually imperative manner. So make sure you have a clue about what’s going on behind these abstractions.
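A tiny example of what "applying the same thing twice" can mean in practice: two steps that both add a line to a config file, one of which is only safe to run once. This is a generic illustration with made-up function names, not how Ansible implements anything.

```python
from pathlib import Path


def append_line(path: Path, line: str) -> None:
    """Imperative step: always appends, so running it twice duplicates the line."""
    with path.open("a") as fh:
        fh.write(line + "\n")


def ensure_line(path: Path, line: str) -> bool:
    """Idempotent step: adds the line only when absent. Returns True if the file changed."""
    text = path.read_text() if path.exists() else ""
    if line in text.splitlines():
        return False
    # Make sure the new line does not get glued onto an unterminated last line.
    if text and not text.endswith("\n"):
        text += "\n"
    path.write_text(text + line + "\n")
    return True
```

Both functions look equally declarative when wrapped in a task called "add setting", which is the point: the guarantee comes from the check inside, not from the syntax around it.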

                                                                                      One last thing: Nomad by Hashicorp also has java, exec and raw runtimes, which also aren’t containers. So you can get a lot of what people want from containers (using Kubernetes for example) without actually running them. I experimented a bit with that, but never used it in production. I would be curious about experiences.

                                                                                      1. 1

                                                                                        As a learning exercise I tried to do something like this:

                                                                                        • store the entire OS filesystem in git and chroot into it if you ever want to update anything
                                                                                        • keep your project in another repo, configure CI to build static binaries
                                                                                        • in the project CI also clone the OS repo and make a VM image containing both
                                                                                        • upload the VM image to your host and use something like nomad to get everything running

                                                                                        Conceptually making a VM image isn’t much more than making a .zip but the tools I found for it were incredibly complicated, and I figured that would just make it annoying and boring to work on so I gave up. But I’m pretty sure it would all work!