1. 14

  2. 5

    A lot of the tools designed to make configuration management more declarative fall short for me, mostly because they end up feeling pretty imperative by the time I’ve gotten the configuration where I need it to be.

    I really like NixOS’s approach because the options are a very nice declarative API, IMO. On top of that, if you have to express more complicated options, Nix gives you the power to abstract with functions. You can then offer nice clean interfaces to users without them having to understand the nitty-gritty, which I feel is the goal of being declarative.

    The Nix approach also makes rolling things back pretty painless, as it keeps every declarative configuration as a generation of the system. So if something goes wrong you can just switch to the previous generation.

    1. 1

      That sounds fantastic. Unfortunately not everyone has their choice of OS to run :)

      Most times I’ve done that class of work in my career the OS / distro has been pre-chosen and I have to work with that.

      In my current gig it’s Amazon Linux all the way.

      1. 1

        You can run Nix on any Linux (and maybe more systems besides), not just on NixOS.

    2. 3

      This definitely resonates with me.

      At $work we’ve been using ansible for 4 years. It’s a great tool, and I still recommend it for some use-cases. But you definitely end up needing more flexibility and control over the flow. Now, ansible offers you plenty of escape hatches: you can use conditionals in the list of yaml statements, and you can add parameters to your playbooks. But over time this becomes increasingly hard to parse, and exposes an unclear interface to the users (developers).

      Recently I’ve been converting our infrastructure to a kubernetes environment, and it’s the same story. When you’re working through a simple kubernetes example, the minimal deployment-spec-as-yaml is wonderfully clear. But in order to support different flows (or even just different environments as part of the same flow), you need more power. That’s why, this time round, I have written a small library+cli that is specific to our conventions and requirements. All it does is wrap existing tools (docker, kubectl, etc.), but the point is that it exposes commands to do certain things with them. And it’s written in an actual programming language (in our case, python) so that conditionals, parameters, etc. are easy to write and read.
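
      As a hedged sketch of the shape of such a library+cli (the tool name, conventions, and deployment target are invented for illustration, not anyone’s actual code): plain Python functions shell out to kubectl, and conditionals/parameters are ordinary code instead of templated YAML.

```python
# Hypothetical sketch of a thin ops wrapper around kubectl.
# All names and conventions here are invented for illustration.
import subprocess

def deploy(env: str, image: str, dry_run: bool = False) -> list:
    """Build (and optionally run) the kubectl invocation for one convention."""
    cmd = ["kubectl", "--context", f"cluster-{env}",
           "set", "image", "deployment/app", f"app={image}"]
    if env == "prod":
        cmd.append("--record")      # an extra safeguard, only in prod
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd

# dry run: show what would be executed without touching a cluster
print(" ".join(deploy("prod", "app:1.2.3", dry_run=True)))
```

      The point isn’t the specific commands; it’s that the conditional for prod is an ordinary `if`, readable and testable.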

      1. 3

        There’s this great debate in tech about the benefits of simple configs that are easy to statically analyze vs. flexible Turing-complete languages. Often we see a “simple” config language rather quickly grow conditionals and loops via recursion.

        In most cases I’d much rather just start with a real programming language with real testing frameworks and analysis tools. In my personal projects I prefer using lisp, lua, or (if needed) javascript.

        You can start with:

        :http 80
        :https 443
        :keyfile foo.key
        :certfile foo.crt

        and let that grow naturally.
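
        To illustrate, a config that simple stays a few lines to parse in a real language (a sketch in Python, just as an example language; the `:key value` lines are the ones above):

```python
# A minimal parser for the ":key value" config sketched above.
# The format starts as trivially analyzable data, and only grows
# logic (in the host language) when it actually needs to.
def parse_config(text: str) -> dict:
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or not line.startswith(":"):
            continue                     # skip blanks and non-directives
        key, _, value = line[1:].partition(" ")
        options[key] = int(value) if value.isdigit() else value
    return options

conf = parse_config(""":http 80
:https 443
:keyfile foo.key
:certfile foo.crt""")
# conf == {"http": 80, "https": 443, "keyfile": "foo.key", "certfile": "foo.crt"}
```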

        Also, the same argument applies for me to ansible and salt. I can test my chef recipes using rspec. I’ve recently moved to salt because it has other nice properties, but I find jinja-templated yaml files to be yuck, and I’m in the process of moving towards more python so things can be re-usable and unit-testable (salt supports a python “renderer” and python modules).
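
        A sketch of the direction I mean with the python renderer (package names and the grain check are illustrative; `run()` taking an argument is only a convenience for testing outside salt, which normally injects `__grains__` and calls `run()` with no arguments):

```python
#!py
# Sketch of a salt state using the pure-python ("py") renderer instead
# of jinja-templated yaml: run() returns plain highstate data, so the
# conditionals are ordinary, unit-testable python.

def run(grains=None):
    # salt injects __grains__ at render time; the parameter is only
    # here so the function can be exercised outside salt.
    grains = grains if grains is not None else globals().get("__grains__", {})
    states = {pkg: {"pkg.installed": []} for pkg in ("nginx", "curl")}
    if grains.get("os_family") == "Debian":
        # what would be {% if %} jinja noise in yaml is just python here
        states["nginx"]["pkg.installed"].append({"name": "nginx-full"})
    return states
```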

        1. 4

          The alternative dating back to LISP is a powerful language with great support for DSL’s. Then, you can use increasingly complex DSL’s and/or language primitives. I see this repeating with better safety in Haskell-land.

          EDIT: I’ve also seen logic approaches like Prolog where they just describe it. The runtime does the rest. One did it for CMake.

          1. 2

            I don’t personally favor the DSL approach. How much do they really buy you over using the syntax and tools that whatever programming language you’re working with gives you?

            1. 3

              Do you use sed, HTML, or SQL? Those are DSL’s. The main value users mention is that they’re declarative, often concise, often clear in meaning, and improve productivity. The disadvantage comes when you need something they’re not good at or just raw performance. During debates on the topic, the DSL proponents often pointed out that the alternative, flexible language + libraries, essentially devolves into the same problem on the library side, with you stuck memorizing their terms, working within their patterns/frameworks, having to call external things, and so on. Between them, DSL’s are cleaner for a lot of purposes. Aside from the above examples, state machines, GUI generation, data formats, bindings, language grammars, test engines, and so on all come to mind. Way easier to do that stuff in a DSL that autogenerates code for your language of choice.

              Pieter Hintjens has a nice write-up on the topic, given iMatix delivered robust, high-performance apps in C using a set of DSL’s. Recall, though, I advise a powerful language like Scheme that can DSL within itself for consistency. A developer named sklogic does the same for his compiler-writing tool, with the ability to use LISP, Standard ML, Prolog, XML, etc. all in one app seamlessly, depending on the best tool for the job.

              Another example is Galois using Haskell DSL’s for stuff like writing correct C code. Their Ivory language is a good example where it’s advantageous to DSL in Haskell with C extraction rather than build their own tool or rely exclusively on either language in isolation.

              1. 3

                Your response is spot on, but because I wasn’t clear enough in my original post, let me shed some light on what I was trying to express.

                I’ve been working in the new fangled sysadmin/ops/devops/whatever you want to call it space for about 5ish years now. I’ve used Chef pretty much that entire time. I’m pretty familiar with it.

                In my experience, I have found that Chef works really well when your configuration management needs are comparatively simple, but when they become complex, Chef’s DSL starts to become more of an encumbrance than it’s worth - you end up lost in a sea of detail that’s germane only to the DSL.

                As a very concrete example - Chef’s execution model is not intuitive, to say the least: is this executing at compile time or at convergence? Why does notify not work the way one would expect? Why are there 40 different syntaxes for attributes?

                So, rather than adding value in a very constrained problem space (e.g. query a database, edit streams of text), in my experience configuration management DSLs can become a morass of detail to master, ultimately requiring that you dive deep and learn the underlying code anyway.

                So what’s the point? Wouldn’t a nice straight forward Python or Ruby library with a well thought out, properly abstracted API do a better job of helping the programmer solve the problem at hand? With that approach, you only have to master a single set of semantics - those of the programming language being used.

                1. 2

                  An internal DSL (like Ivory) does reuse a lot of the core semantics of the language – and can be a nice and intuitive way to work with an underlying API. For example, Rake’s DSL is a succinct way to work with the internal Task abstraction.

                  It is external DSLs – which require laborious redefinition of every PL feature – that are the real problem in my mind.

                  A “cute” Ruby DSL for configuration management could still be importable – it doesn’t have to be like Chef where require is replaced with require_recipe. Let’s say the DSL is called bbq. You could write a config like this:

                  require "bbq"
                  require "company/infra"
                  bbq "App Server" do
                    use Company::Infra::NTPConfig
                    use Company::Infra::SSHConfig
                    use Company::Infra::Users
                    task "Update Ruby" do
                      if ENV["RUBY3"]
                        curl_pipe_sh "https://company-infra.s3.amazonaws.com/ruby"
                        bash "aptitude install -y ruby"
                      end
                    end
                  end
                  1. 1

                    I have no problem with a DSL like the one you just postulated. It is lightly layered over plain old Ruby code and as a result does not impose a huge additional cognitive load on the developer.

                  2. 1

                    Ok, I see where you’re coming from. What you’re actually experiencing are two problems with only one being common with DSL’s: pain of moments where your needs mismatch what the DSL provides (common); the DSL’s actually being a complicated piece of software without clear model of how it works (uncommon). The most prominent DSL’s of the past were BASIC-like 4GL’s, Excel, HTML, and SQL. Your experience with these should show they were fairly easy to understand at a glance. The conceptual mapping is straight-forward plus the language itself is high-level enough to save time. That’s how a good DSL should be. It seems to me Chef just isn’t well-designed or what it’s being used for has high complexity that’s seeping into the language too much. Maybe it needs increased flexibility.

                    “So what’s the point? Wouldn’t a nice straight forward Python or Ruby library with a well thought out, properly abstracted API do a better job of helping the programmer solve the problem at hand?”

                    It can. You can use either. The best DSL’s will be embedded in your language’s semantics or be similar. Moreover, they should provide a way to call custom functions for weird situations. An easy example of that is state machine compilers that let you combine a high-level description of states/transitions with a list of custom functions. It does the rest.
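
                    A toy sketch of that pattern (all names invented): a declarative transition table, a dict of custom hook functions for the weird situations, and a generic driver that does the rest.

```python
# Declarative transition table: (state, event) -> next state.
TRANSITIONS = {
    ("idle", "start"):    "running",
    ("running", "pause"): "paused",
    ("paused", "start"):  "running",
    ("running", "stop"):  "idle",
}

def make_machine(transitions, hooks=None):
    """Generic driver: look up (state, event), fire the custom hook
    for the new state if one exists, and return the new state."""
    hooks = hooks or {}
    def step(state, event):
        nxt = transitions.get((state, event))
        if nxt is None:
            raise ValueError(f"no transition for {event!r} in state {state!r}")
        hooks.get(nxt, lambda: None)()   # escape hatch for custom code
        return nxt
    return step

log = []
step = make_machine(TRANSITIONS, hooks={"running": lambda: log.append("spin up")})
state = step("idle", "start")   # moves to "running" and fires the hook
```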

                    In most cases, libraries are fine. It’s really when there’s a lot of glue, boilerplate, scaffolding, portability issues, error handling, etc that it helps to modify the language itself to do those cleaner. DSL’s are easier than doing a whole language or ecosystem, though. So, it’s really what tool suits your needs for a given situation. I’d also recommend, as with any dependency, that you have an exit strategy where you’ve chosen a DSL or tool that’s easy to move off of if it becomes a problem. A simple one in terms of syntax and execution model might even be automatically translated into something else during a move.

        2. 3

          I’ve been coming around to the idea of late that we’re moving into the post-configuration-management era.

          Tools like Ansible and Chef are awesome, but can easily become ungainly when you try to instrument them with enough intelligence to handle the dynamic cluster and fleet configurations many of us have to contend with nowadays.

          Things like Terraform are IMO at least a step in the right direction, and John Keiser et al. over at Chef have been doing some good work with Metal -> Provisioning, but I’d like to see us take this idea several steps farther.

          Ultimately I wonder if we’ll end up with systems that feel more like really rich APIs / libraries with primitives for doing most of the things CM tools do today but which will lend themselves more readily to the super dynamic world we’re currently living in.

          1. 2

            In my opinion they will continue to be needed but they will actually have to do less work because of highly dynamic clusters.

            You won’t add a node to a load balancer by updating some metadata and re-cheffing; your load balancer conf will be a static file that just says something like:

            backends {

            Or even (and not enough people do this imo)

            backends {
                web.foo.com // DNS is integrated with service discovery via multiple A or SRV records
            }

            With good service discovery and proper naming (via etcd or even DNS) and proper namespacing (via containers every web server can listen on :80 w/o collisions) you can make the role of ansible/chef so simple they almost don’t need to exist.
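
            As a sketch of that shape (hostnames from the example above; the resolver is injectable so real DNS, etcd, or a test stub can all back the same rendering code):

```python
# The "config" stays a static template naming a service; a resolver
# (DNS with multiple A records, etcd, consul...) supplies the actual
# backends at render time. Hostnames are the example's own.
import socket

def discover(name, resolver=None):
    """Return backend addresses for a service name."""
    if resolver is None:
        # default: ordinary DNS resolution, multiple A records and all
        resolver = lambda n: sorted({ai[4][0] for ai in
            socket.getaddrinfo(n, 80, type=socket.SOCK_STREAM)})
    return resolver(name)

def render_backends(name, resolver=None):
    lines = ["backends {"]
    lines += [f"    {ip}:80" for ip in discover(name, resolver)]
    lines.append("}")
    return "\n".join(lines)

# with a stub resolver standing in for multi-A-record DNS:
print(render_backends("web.foo.com", resolver=lambda n: ["10.0.0.1", "10.0.0.2"]))
```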

            1. 1

              Oh I totally agree. Things like Consul make integrated service discovery and software configuration much more sane.

              Also pairs nicely with tools like Terraform, which IMO represents one aspect of the next step for configuration management.

          2. 2

            I’m not in DevOps, I’m a developer. But in order to test our code (before it goes to QA) I do need to set up a “mini-environment” (to test one component I may need to run five or more other components). I’ve gone the “master config to configure all the config files” route and, well … it felt silly. Now, I have a script that parses the sample config file checked into version control, makes modifications based upon the local system (mostly setting up IP addresses), and writes the new versions. I’m not sure if it’s better.

            I’ve had this notion (for a few years now) that I’m performing a link stage by hand. If all you have is a simple assembler that spits out binary files (no object files), then to “link” several files together, you have to manually manage symbol addresses and specify, say, the address of routine X from file A in file B as part of the source code, and any changes to A might require a change to said address in B. Over time, assemblers got smarter and could handle such details for you (or rather, the combination of an assembler and linker). I get the feeling that configuring a distributed system is much like that old assembler—too much hand work and there has to be a better way. We have to specify service X on machine A to machine B. Now we need the linker.
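
            To make the analogy concrete, here’s a toy “service linker” sketch (the symbol table, service names, and addresses are all invented): components declare the services they reference, and a link step resolves each one, failing loudly on undefined references instead of leaving them to be hand-patched into every config.

```python
# Hypothetical environment "symbol table": service name -> address.
SERVICES = {
    "database": "10.0.1.5:5432",
    "queue":    "10.0.1.9:5672",
}

def link(component_needs, symbols):
    """Resolve symbolic service references, like a linker resolves
    symbols; an undefined reference is an error, not a silent gap."""
    unresolved = [s for s in component_needs if s not in symbols]
    if unresolved:
        raise LookupError(f"undefined service reference(s): {unresolved}")
    return {s: symbols[s] for s in component_needs}

addrs = link(["database", "queue"], SERVICES)
# addrs == {"database": "10.0.1.5:5432", "queue": "10.0.1.9:5672"}
```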

            1. 3

              Too lazy to look it up, but this isn’t the first time I’ve heard the analogy of dynamic linking and service discovery. Interesting things that come from this analogy:

              What’s the equivalent of ldd/otool -L? Why can’t we look at a “thing” and know what services it needs and what versions of those services are required?

              What’s the equivalent of the ELF file format that packages up behaviour and data and stubs for dependency resolution?

              Where’s ld.so, our ELF file loader?

              Some might say it’s Docker/OCI/K8s, maybe they are right. I think it would be awesome if we found a way to take another look at the problem, squint, and just find a pretty simple way to make apps aware of these concepts.
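
              As one hypothetical stab at the ldd question: each artifact ships a manifest of the services and version ranges it needs, and a tool lists them the way ldd lists shared objects (the manifest format and names here are invented for illustration).

```python
# Sketch of an "ldd for services": read a dependency manifest shipped
# with an artifact and report what it needs. Format is invented.
import json

MANIFEST = json.loads("""
{
  "name": "billing-api",
  "needs": [
    {"service": "postgres", "min_version": "12"},
    {"service": "auth",     "min_version": "2.3"}
  ]
}
""")

def service_ldd(manifest):
    """Mimic `ldd a.out`: one line per dependency with its constraint."""
    return [f'{d["service"]} (>= {d["min_version"]})'
            for d in manifest["needs"]]

for line in service_ldd(MANIFEST):
    print("\t" + line)
```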

              1. 1

                DNS/IP are the linker. Apps should connect to domain names like database, queue, or backend, and something below them – at the OS level – should find out which database is the local one and resolve it.

                Implicitly, an application at web.sandbox.example.com using database will go reach out to database.sandbox.example.com.
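
                That resolution rule is small enough to sketch (mirroring resolv.conf-style search domains; the domains are the example’s own):

```python
# Qualify a bare service name within the app's own environment, the
# way a resolv.conf search domain would: web.sandbox.example.com
# asking for "database" gets database.sandbox.example.com.
def qualify(name, app_fqdn):
    if "." in name:
        return name                        # already fully qualified
    env_domain = app_fqdn.split(".", 1)[1]  # drop the host label
    return f"{name}.{env_domain}"

print(qualify("database", "web.sandbox.example.com"))
# database.sandbox.example.com
```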