1. 17
  1.  

  2. 4

    I tend to try and have as much configuration as possible in a config file, included in the Docker image along with the application code, and then let that config file reference environment variables for stuff such as passwords and API tokens. That way the developers of the application can basically manage all the configuration themselves and deploy configuration changes along with code changes, without them having direct access to the passwords and tokens needed to run the service.

    They can of course always dump the environment that the application is running in and expose the secrets that way, but the only way to avoid that is probably to use temporary credentials for everything, which unfortunately isn't always an option.
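    A minimal sketch of that pattern, assuming a JSON config file and `${VAR}` placeholders resolved with Python's `string.Template`. The file contents, key names and `load_config` helper are hypothetical. Non-secret settings ship in the image as literals, and only the secrets are pulled from the environment at startup:

```python
import json
import os
import string

# Hypothetical config file baked into the image next to the code.
# Ordinary settings are literal values; secrets are ${VAR} references.
CONFIG_TEXT = """
{
  "listen_port": 8080,
  "log_level": "info",
  "db_password": "${DB_PASSWORD}",
  "api_token": "${API_TOKEN}"
}
"""


def resolve(value, env):
    """Recursively substitute ${VAR} references inside string values."""
    if isinstance(value, str):
        # substitute() raises KeyError for an unset variable, so a
        # missing secret fails loudly at startup instead of later.
        return string.Template(value).substitute(env)
    if isinstance(value, dict):
        return {k: resolve(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve(v, env) for v in value]
    return value  # numbers, booleans and null pass through unchanged


def load_config(text, env=None):
    env = os.environ if env is None else env
    return resolve(json.loads(text), env)
```

    Developers can change `listen_port` or `log_level` in the same commit as the code, while operators control only `DB_PASSWORD` and `API_TOKEN`.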

    1. 4

      This has been my main criticism of 12 factor as well. So many instances where something has gone wrong because of ENV variables where a simple config file would’ve fixed everything. But they have their place and sometimes they are the correct solution. Still not a fan.

      1. 3

        Another issue with using env vars is that any part of the program can use them. It doesn’t force the developer to make the configuration schema explicit.

        I have been in situations where different parts of the program were loading different environment variables, because they were designed by different people. It becomes a mess quite quickly.
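        One way to make the schema explicit even when the values do come from the environment is to declare every variable in a single place and load them all at once. This is a sketch, assuming a dataclass whose field names map to upper-cased variable names; `Settings`, its fields, and that naming rule are all hypothetical:

```python
import os
from dataclasses import MISSING, dataclass, fields


@dataclass(frozen=True)
class Settings:
    """The complete list of environment variables the program reads.

    Field names map to env var names by upper-casing (a hypothetical
    convention for this sketch).
    """
    database_url: str
    worker_count: int
    debug: bool = False


def _convert(raw, typ):
    if typ is bool:
        return raw.strip().lower() in ("1", "true", "yes", "on")
    return typ(raw)


def load_settings(env=None):
    """Read and type-convert every declared variable; report all missing ones."""
    env = os.environ if env is None else env
    values, missing = {}, []
    for f in fields(Settings):
        name = f.name.upper()
        if name in env:
            values[f.name] = _convert(env[name], f.type)
        elif f.default is MISSING:
            missing.append(name)
    if missing:
        raise RuntimeError("missing required environment variables: " + ", ".join(missing))
    return Settings(**values)
```

    Because nothing else in the program reads `os.environ`, the `Settings` class doubles as documentation of the whole configuration surface.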

        1. 5

          > I have been in situations where different parts of the program were loading different environment variables, because they were designed by different people. It becomes a mess quite quickly.

          Ick. Only the entrypoint of the program (func main or equivalent) has the right and responsibility to take information from the environment and provide it to the components that need it. Corollary: if a component needs a bit of config, it should take it as an explicit constructor or initialization parameter, never by implicitly reaching into the runtime environment or the global namespace.
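          A small illustration of that rule, as a sketch only: the `Cache` and `Server` components and the `PORT` / `CACHE_MAX_ENTRIES` variable names are hypothetical. The components take their settings as constructor parameters, and `main` is the only function that looks at the environment:

```python
import os


class Cache:
    """A component that receives its config explicitly.

    It never touches os.environ, so its dependencies are visible in its
    signature and it can be tested with plain arguments.
    """

    def __init__(self, max_entries: int):
        if max_entries <= 0:
            raise ValueError("max_entries must be positive")
        self.max_entries = max_entries
        self._data = {}

    def put(self, key, value):
        if key not in self._data and len(self._data) >= self.max_entries:
            # Evict the oldest inserted key (dicts preserve insertion order).
            self._data.pop(next(iter(self._data)))
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class Server:
    def __init__(self, port: int, cache: Cache):
        self.port = port
        self.cache = cache


def main(environ=os.environ):
    # The entrypoint is the single place that reads the environment
    # and wires the values into the components.
    port = int(environ.get("PORT", "8080"))
    cache = Cache(max_entries=int(environ.get("CACHE_MAX_ENTRIES", "1000")))
    return Server(port=port, cache=cache)
```

    Taking `environ` as a parameter of `main` is a convenience for tests; the components themselves stay unaware of where their values came from.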

          1. 1

            I like your point of making “configuration schema explicit” but I don’t know that any popular config file format actually does that. I have an idea that an application should always read its configuration from a database. An in-memory SQLite db would be sufficient for many purposes.

            The “config file” is just a dump of a database from a known state. Its format is portable, editable, and standard, and any arbitrary data schema can be encoded in the relational model.

            At startup, the application initializes the database from this dump "config file" and also loads command-line and environment parameters into the database. From that point on, all components obtain configuration by SQL query.
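            Here is roughly how that could look with Python's built-in `sqlite3` module. This is a sketch under assumptions: the table layout, the `APP_` environment prefix, and the helper names are all hypothetical. The dump is restored into an in-memory database, overrides are layered on top, and components query it:

```python
import sqlite3

# Hypothetical "config file": a plain SQL dump captured from a known state.
CONFIG_DUMP = """
CREATE TABLE setting (name TEXT PRIMARY KEY, value TEXT NOT NULL);
INSERT INTO setting VALUES ('listen_port', '8080');
INSERT INTO setting VALUES ('log_level', 'info');
CREATE TABLE upstream (name TEXT PRIMARY KEY, url TEXT NOT NULL, weight INTEGER NOT NULL);
INSERT INTO upstream VALUES ('primary', 'http://10.0.0.1', 3);
INSERT INTO upstream VALUES ('backup', 'http://10.0.0.2', 1);
"""


def env_overrides(environ, prefix="APP_"):
    """Hypothetical convention: APP_LOG_LEVEL=debug overrides setting 'log_level'."""
    return {k[len(prefix):].lower(): v for k, v in environ.items() if k.startswith(prefix)}


def build_config_db(dump, overrides):
    """Restore the dump into an in-memory database, then apply overrides."""
    db = sqlite3.connect(":memory:")
    db.executescript(dump)
    with db:  # commit all overrides in one transaction
        db.executemany(
            "INSERT OR REPLACE INTO setting (name, value) VALUES (?, ?)",
            overrides.items(),
        )
    return db


def get_setting(db, name):
    row = db.execute("SELECT value FROM setting WHERE name = ?", (name,)).fetchone()
    return None if row is None else row[0]
```

            At startup one would call `build_config_db(CONFIG_DUMP, env_overrides(os.environ))`. Structured config such as the `upstream` table stays relational, so a component can ask for `SELECT url FROM upstream ORDER BY weight DESC` instead of walking a nested document.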

            1. 2

              > I like your point of making "configuration schema explicit" but I don't know that any popular config file format actually does that.

              Using command-line flags as the only (or primary) way to get configuration from the environment into the program has a useful side effect: `yourprogram -h` authoritatively describes the configuration surface area.

              Self-promotion: https://github.com/peterbourgon/ff
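              ff itself is a Go package, so the following is not its API, only a rough Python analogue of the same idea using the standard `argparse` module. Every setting is declared as a flag, and environment variables (under a hypothetical `MYAPP_` prefix) merely supply flag defaults, so `myapp -h` still lists everything the program can be configured with:

```python
import argparse
import os


def parse_config(argv=None, environ=None):
    """Declare every setting as a flag; env vars only override flag defaults.

    The MYAPP_ prefix and the flag names are hypothetical.
    """
    environ = os.environ if environ is None else environ

    def env(name, fallback):
        # --listen-addr <- MYAPP_LISTEN_ADDR, and so on.
        return environ.get("MYAPP_" + name.upper().replace("-", "_"), fallback)

    p = argparse.ArgumentParser(prog="myapp")
    p.add_argument("--listen-addr", default=env("listen-addr", ":8080"),
                   help="address to listen on (env MYAPP_LISTEN_ADDR, default %(default)s)")
    # argparse applies type=int to string defaults, so "8" from the env becomes 8.
    p.add_argument("--workers", type=int, default=env("workers", 4),
                   help="worker count (env MYAPP_WORKERS, default %(default)s)")
    p.add_argument("--debug", action="store_true",
                   default=env("debug", "false").lower() in ("1", "true"),
                   help="enable debug logging (env MYAPP_DEBUG)")
    return p.parse_args(argv)
```

              Precedence is explicit flag, then environment, then built-in default; a variable that is not declared as a flag is simply never read.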

        2. 2

          > While reviewing this blog one of our developers pointed out that according to The Twelve-Factor Application Manifesto the environment is where all your application configuration belongs. I personally feel this advice is very narrow and only really applies to certain types of apps deployed in a certain way.

          As a former co-worker of the author of said manifesto, I can say without hyperbole that “only really applies to certain types of apps deployed a certain way” is the whole point of the manifesto. Not all deployments can be 12-factor for a variety of legitimate reasons, but having a term like “12-factor” you can point at and say “it works like that” simplifies a lot of things and can remove a lot of headache.

          This article raises a lot of good concerns, but at the same time, none of them applies to any of the deployments I've done professionally in the past decade.

          1. 2

            I have come to believe that secrets should always be passed by reference (usually a path in the filesystem), not by value. This holds true for configuration files as well. If you are able to enforce that consistently, it suddenly becomes a non-issue to log environment variables or dump the config file for inspection, which makes a whole set of other activities, like debugging, much easier.
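            A sketch of passing a secret by reference, assuming a config that names a `db_password_file` path (the key names, the `/run/secrets/...` path and `open_database` are hypothetical). The secret is only dereferenced at the point of use, so the config itself can be logged freely:

```python
import json
import logging
from pathlib import Path

log = logging.getLogger("app")

# Hypothetical config: the password is referenced by path, never inlined.
EXAMPLE_CONFIG = '{"db_host": "db.internal", "db_password_file": "/run/secrets/db_password"}'


def read_secret(path: str) -> str:
    """Dereference a secret at the point of use, dropping a trailing newline."""
    return Path(path).read_text().rstrip("\n")


def open_database(cfg: dict) -> dict:
    # Safe to log or dump: the config holds only a path, not the password.
    log.info("database config: %s", cfg)
    password = read_secret(cfg["db_password_file"])
    # Stand-in for a real client constructor; the returned shape is illustrative.
    return {"host": cfg["db_host"], "password": password}
```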

            1. 5

              > I have come to believe that secrets should always be passed by reference (usually a path in the filesystem), not by value.

              I like passing them as a file descriptor, because it really truly is a capability: unforgeable yet shareable.

              1. 1

                That’s a good idea. Are you able to apply this in the container world or did you create your own special scheduler?

                In Kubernetes the canonical way is to mount the secrets on disk, which makes them vulnerable to path-traversal attacks if there are any.

                1. 1

                  I haven't done it with containers, only with processes. It should be possible to inject into a container, but I don't know how well the tooling supports this. Probably not well; POSIX file descriptors are criminally underknown.
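                  For the plain-process case, here is one possible sketch in Python (POSIX only, since `pass_fds` is unsupported on Windows). The parent writes the secret into a pipe and hands only the read end to the child, so the secret never touches the filesystem and the child can read nothing beyond what it was given. `run_with_secret` and the child one-liner are hypothetical:

```python
import os
import subprocess
import sys


def run_with_secret(secret: bytes, child_code: str) -> str:
    """Run `python -c child_code <fd>` with the secret readable on <fd>."""
    r, w = os.pipe()
    try:
        proc = subprocess.Popen(
            [sys.executable, "-c", child_code, str(r)],
            # Only r is inherited; close_fds (on by default) keeps the write
            # end out of the child, so it sees EOF once the parent closes w.
            pass_fds=(r,),
            stdout=subprocess.PIPE,
            text=True,
        )
    finally:
        os.close(r)  # the parent never reads from the pipe
    with os.fdopen(w, "wb") as wf:
        wf.write(secret)  # closing wf delivers EOF to the child
    out, _ = proc.communicate()
    return out


# The child learns the descriptor number from argv and reads from it.
CHILD = "import os, sys; print(len(os.fdopen(int(sys.argv[1])).read()))"
```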

                2. 1

                  I’m guessing you mean to use something like file descriptor redirection in a shell command, e.g.:

                  ```sh
                  python my_script_needs_secrets.py 3</path/to/secret
                  ```

                  Then inside the process:

                  ```python
                  import os
                  secret = os.fdopen(3).read()  # fd 3 was opened by the shell's 3< redirection
                  ```

                  This is a great approach for security, but how does it scale with multiple secrets? Do you use a separate descriptor for each one, or cat them all into the same descriptor? How do you organize your app to know which descriptor contains the secret data?

                  1. 1

                    When I’ve used the technique, I’ve just used a different descriptor for each, but one could send a bunch of secrets down one descriptor in some format if one wished.

                    The mapping of descriptor to schema is part of the documentation, typically a README (this is all for internal software, often just for my own use).
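                    A sketch of the one-descriptor-per-secret approach, assuming the README mapping is mirrored as a constant in the program. The descriptor numbers, secret names, and the launch command in the comment are hypothetical:

```python
import os

# Mirrors the README's descriptor-to-secret mapping (hypothetical values).
# Launched e.g. as:
#   python app.py 3</run/secrets/db_password 4</run/secrets/api_token
SECRET_FDS = {
    3: "db_password",
    4: "api_token",
}


def read_fd_secrets(mapping=SECRET_FDS):
    """Read each secret from its inherited descriptor, closing it afterwards."""
    secrets = {}
    for fd, name in mapping.items():
        with os.fdopen(fd, "r") as f:  # the with-block closes the descriptor
            secrets[name] = f.read().rstrip("\n")
    return secrets
```

                    Closing each descriptor after one read means a later bug cannot re-read the secret from the same descriptor.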