1. 70
  1. 33

    My position has essentially boiled down to “YAML is the worst config file format, except for all the other ones.”

    It gets pretty bad if your documents are large or if you need to collaborate (it’s possible to have a pretty good understanding of parts of YAML but that’s not always going to line up with what your collaborators understand).

    I keep wanting to say something along the lines of “oh, YAML is fine as long as you stick to a reasonable subset of it and avoid confusing constructs,” but I strongly believe that memory-unsafe languages like C/C++ should be abandoned for the same reason.

    JSON is unusable (no comments, easy to make mistakes) as a config file format. XML is incredibly annoying to read or write. TOML is much more complex than it appears… I wonder if the situation will improve at any point.

    1. 23

      I think TOML is better than YAML. Sure, it has the complex date stuff, but that has never caused big surprises for me (just small annoyances). The article seems to focus mostly on how TOML is not Python, which it indeed is not.

      1. 14

        It’s syntactically noisy.

        Human language is also syntactically noisy. It evolved that way for a reason: you can still recover the meaning even if some of the message was lost to inattention.

        I have mixed feelings about TOML’s table syntax. I would rather have explicit delimiters like curly braces. But if the goal is to keep INI-like syntax, then it’s probably the best thing to do. The thing I find really annoying is inline tables.

        As for user-typed values, I came to the conclusion that everything that isn’t an array or a hash should just be treated as a string. If you take user input, you cannot just assume that the type is correct and need to check or convert it anyway, so why even bother having different types at the format level?

        Regardless, my experience with TOML has been better than with alternatives, despite its flaws.
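
A minimal sketch of the "everything is a string" idea, in Python. The key names and the helpers `parse_bool` and `parse_port` are hypothetical and only illustrate converting and checking values at the point of use rather than in the format.

```python
# Hypothetical example: every scalar in the config arrives as a string,
# and the application converts and validates each value where it is used.


def parse_bool(text):
    """Convert a config string to bool, rejecting anything ambiguous."""
    lowered = text.strip().lower()
    if lowered in ("true", "yes", "on", "1"):
        return True
    if lowered in ("false", "no", "off", "0"):
        return False
    raise ValueError(f"not a boolean: {text!r}")


def parse_port(text):
    """Convert a config string to a TCP port number in 1..65535."""
    port = int(text)  # raises ValueError for non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port


# The format itself only distinguishes strings, arrays, and hashes.
raw_config = {
    "port": "8080",
    "debug": "no",
    "hosts": ["a.example", "b.example"],
}

port = parse_port(raw_config["port"])
debug = parse_bool(raw_config["debug"])
```

Here the surprise of a bare `no` silently becoming a boolean cannot happen in the parser, because the application decides explicitly which strings count as booleans.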

        1. 6

          Human language is also syntactically noisy. It evolved that way for a reason: you can still recover the meaning even if some of the message was lost to inattention.

          I have mixed feelings about TOML’s table syntax. I would rather have explicit delimiters like curly braces. But if the goal is to keep INI-like syntax, then it’s probably the best thing to do. The thing I find really annoying is inline tables.

          It’s funny how the exact same ideas made me make the opposite decision. I came to the conclusion that “the pain has to be felt somewhere” and that the config files are not the worst place to feel it.

          I have mostly given up on different config formats and just default to one of the following three options:

          1. Write .ini or Java properties-file style config-files when I don’t need more.
          2. Write a DTD and XML when I need tree or dependency-like structures.
          3. Store the configuration in a few tables inside an RDBMS and drop an .ini-style config file with just connection settings and the name of the config tables when things get complex.

          As for user-typed values, I came to the conclusion that everything that isn’t an array or a hash should just be treated as a string. If you take user input, you cannot just assume that the type is correct and need to check or convert it anyway, so why even bother having different types at the format level?

          I fully agree with this as well.

        2. 24

          Dhall is looking really good! Some highlights from the website:

          • Dhall is a programmable configuration language that you can think of as: JSON + functions + types + imports
          • You can also automatically remove all indirection in any Dhall code, converting the file to a logic-free normal form for non-programmers to understand.
          • We take language security seriously so that your Dhall programs never fail, hang, crash, leak secrets, or compromise your system.
          • The language aims to support safely importing and evaluating untrusted Dhall code, even code authored by malicious users.
          • You can convert both ways between Dhall and JSON/YAML or read Dhall configuration files directly into a language that supports a native language binding.
          1. 9

            I don’t think the tooling should be underestimated either. The dhall executable includes low-level plumbing tools (individual type checking, importing, normalization), a REPL, a code formatter, a code linter to help with language upgrades, and there’s full-blown LSP integration. I enjoy writing Dhall so much that for new projects I’m taking a more traditional split between a core “engine”, and then pushing out the logic into Dhall - then compiling it at load time into something the engine can work with. The last piece of the puzzle to me is probably bidirectional type inference.

            1. 2

              That looks beautiful! Can’t wait to give it a go on some future projects.

              1. 2

                Although the feature set is extensive, is it really necessary to have such complex functionality in a configuration language?

                1. 4

                  It’s worth understanding what the complexity is. The abbreviated feature set is:

                  • Static types
                  • First class importing
                  • Function abstraction

                  Once I view it in this light, I find it easier to convince myself that these are necessary features.

                  • Static types enforce a schema on configuration files. There is almost always a schema on configuration, as something is ultimately trying to pull information out of it. Having this schema reified into types means that other tooling can make use of the schema - e.g., the VS Code LSP can give me feedback as I edit configuration files to make sure they are valid. I can also do validation in my CI to make sure my config is actually going to be accepted at runtime. This is all a win.

                  • Importing means that I’m not restricted to a single file. This gives me the advantage of being able to separate a configuration file into smaller files, which can help decompose a problem. It also means I can re-use bits of configuration without duplication - for example, maybe staging and production share a common configuration stanza - I can now factor that out into a separate file.

                  • Function abstraction gives me a way to keep my configuration DRY. For example, if I’m configuring nginx and multiple virtual hosts all need the same proxy settings, I can write that once, and abstract out my intention with a function that builds a virtual host. This avoids configuration drift, where one part is left stale and the rest of the configuration drifts away.
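
The function-abstraction point can be sketched in plain Python standing in for Dhall (this is not Dhall syntax). The setting names, hostnames, and the `virtual_host` helper are hypothetical; the point is only that the shared proxy settings are written once.

```python
# Shared proxy settings are written exactly once (hypothetical values).
PROXY_SETTINGS = {
    "proxy_read_timeout": "60s",
    "proxy_set_header": "Host $host",
}


def virtual_host(server_name, upstream):
    """Build one virtual host entry that reuses the shared proxy settings."""
    return {
        "server_name": server_name,
        "proxy_pass": upstream,
        **PROXY_SETTINGS,
    }


# Every host is produced by the same function, so none can go stale
# while the others are updated.
vhosts = [
    virtual_host("app.example.com", "http://127.0.0.1:8000"),
    virtual_host("api.example.com", "http://127.0.0.1:9000"),
]
```

Changing a value in `PROXY_SETTINGS` changes it for every generated host, which is the drift-avoidance argument made above.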

                  1. 1

                    That’s very interesting, I hadn’t thought of it like that. Do you mostly use Dhall itself as a configuration file, or do you use it to generate JSON/YAML configuration files?

                2. 1

                  I finally need to implement a Dhall evaluator in Erlang for my projects. I <3 the ideas behind Dhall.

                3. 5

                  I am not sure that there aren’t better options. I am probably biased as I work at Google, but I find Protocol Buffer syntax to be perfectly good, and the enforced schema is very handy. I work with Kubernetes as part of my job, and I regularly screw up the YAML, or I don’t really know what the YAML does, so I end up copy-pasting from tutorials without actually understanding it.

                  1. 4

                    Using protobuf for config files sounds like a really strange idea, but I can’t find any arguments against it.
                    If it’s considered normal to use a serialisation format as human-readable config (XML, JSON, S-expressions etc), surely protobuf is fair game. (The idea of “compiled vs interpreted config file” is amusing though.)

                    1. 3

                      I have experience with using protobuf to communicate configuration-like information between processes, and the schema that specifies the configurations, including (nested) structs/hashes and arrays, ended up really hacky. I forget the details, but protobuf lacks one or more essential ingredients to nicely specify what we wanted it to specify. As soon as you give up and allow more dynamic messages, you’re of course back to having to check everything using custom code on both sides. If you do that, you may as well just go back to YAML. The enforced schema and multi-language support make it very convenient, but it’s no picnic.

                      1. 2

                        One issue here is that knowing how to interpret the config file’s bytes depends on having the protobuf definition it corresponds to available. (One could argue the same is true of any config file and what interprets it, but with human-readable formats it’s generally easier to glean the intention than with a packed binary structure.)

                        1. 2

                          At Google, at least 10 years ago, the protobuf text format was widely used as a config format. The binary format less so (but still done in some circumstances when the config file wouldn’t be modified by a person).

                          1. 3

                            TIL protobuf even had a text format. It sounds like it’s not interoperable between implementations/isn’t “fully portable”, and that proto3 has a JSON format that’s preferable... but then we’re back to JSON.

                    2. 2

                      JSON can be validated with a schema (lots of tools support it, including VSCode), and it’s possible to insert comments in unused fields of the object, e.g. comment or $comment.

                      1. 17

                        and it’s possible to insert comments in unused fields of the object, e.g. comment or $comment.

                        I don’t like how this is essentially a hack, and not something designed into the spec.

                        1. 2

                          Those same tools (and often the system on the other end ingesting the configuration) often reject unknown fields, so this comment hack doesn’t really work.

                          1. 8

                            And not without good reason: if you don’t reject unknown fields it can be pretty difficult to catch misspellings of optional field names.
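
A minimal sketch of why strict validators reject unknown fields, in Python with only the standard library. The schema (`ALLOWED_KEYS`, `REQUIRED_KEYS`) and the key names are hypothetical.

```python
# Hypothetical schema: the only keys this application accepts.
ALLOWED_KEYS = {"host", "port", "timeout"}
REQUIRED_KEYS = {"host"}


def validate(config):
    """Reject unknown keys so that a misspelled optional key fails loudly."""
    unknown = set(config) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unknown keys: {sorted(unknown)}")
    missing = REQUIRED_KEYS - set(config)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return config


def check(config):
    """Return 'ok' or the validation error message."""
    try:
        validate(config)
        return "ok"
    except ValueError as error:
        return str(error)


good = check({"host": "a.example", "timeout": 5})
typo = check({"host": "a.example", "timeot": 5})  # misspelled optional key
hack = check({"host": "a.example", "$comment": "why this host"})
```

Without the unknown-key check, the `timeot` typo would be silently ignored and the default timeout used. With it, the `$comment` trick is rejected for the same reason.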

                            1. 2

                              I’ve also seen that it’s harder to add new fields if you don’t reject unknown fields: you don’t know who’s using that field name for their own use and sending it to you (intentionally or otherwise).

                          2. 1

                            Yes, JSON can be validated by schema. But in my experience, JSON schema implementations are widely diverging and it’s easy to write schemas that just work in your particular parser.

                          3. 1

                            JSON is unusable (no comments, easy to make mistakes) as a config file format.

                            JSON5 fixes this problem without falling prey to the issues in the article: https://json5.org/

                            1. 2

                              Yeah, and then you lose the main advantage of json, which is how ubiquitous it is.

                              1. 1

                                In the context of a config format, this isn’t really an advantage, because only one piece of code will ever be parsing it. But this could be true in other contexts.

                                I typically find that in the places where YAML has been chosen over JSON, it’s usually for config formats where the ability to comment is crucial.

                            1. 8

                              Nowadays I use code files for configuration pretty much all the time because:

                              • the team doesn’t have to learn an additional configuration language since it’s the same language as the rest of the project
                              • it’s powerful, nothing prevents you from using some environment variables for example
                              • comments, documentation and variables work as expected
                              • sane programmers will never put crazy amounts of complexity and business logic in configuration files even if you don’t force them to use a dumbed-down configuration language
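
A small sketch of what a code file used as configuration might look like in Python. The variable names and the `APP_DEBUG` and `APP_WORKERS` environment variables are hypothetical.

```python
import os

# Comments and variables work as in any other source file.
BASE_URL = "https://example.com"
API_URL = f"{BASE_URL}/api"  # derived value, no duplication

# Environment variables can override defaults (hypothetical names).
DEBUG = os.environ.get("APP_DEBUG", "0") == "1"
WORKERS = int(os.environ.get("APP_WORKERS", "4"))

# Keeping logic out of this file is a convention, not something the
# language enforces.
SETTINGS = {
    "api_url": API_URL,
    "debug": DEBUG,
    "workers": WORKERS,
}
```
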
                              1. 2

                                This.

                                I find Powershell hashtables great for configuration.

                              2. 8

                                The only good config file format is a SQLite database

                                1. 11

                                  With some YAML in it!

                                    1. 1

                                      How do you feel about a tree of text files, each corresponding to a config key?

                                      How many keys are in your config file? 100? 1000? 10,000?

                                      Even 10,000 text files in a tree of subdirectories is not that unmanageable.

                                      You can store them in a repo, and be able to immediately see what’s changed without even doing a diff.

                                      Also one of the most accessible formats as far as tooling.

                                      Easy to write GetConfig() and SetConfig() for, and performs well with basic caching (static hash in these functions.)

                                      Did I miss anything?

                                      Oh yeah, defaults. I have an imperfect solution to defaults that I use in my current project. There is a default/ directory, which contains all the default settings in the same format.

                                      Example: default/interface/voting/enable_checkboxes

                                      The first time GetConfig() is called on this value, if it is not present in config/interface/voting/enable_checkboxes, the value from default/ is copied over.

                                      This also allows me to have a test which checks for orphaned default settings (if they’re present in default/ but not in config/ after test script.)
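
A rough Python sketch of the scheme described above, assuming one file per key under `config/` and `default/`. The function names `get_config` and `set_config` mirror the GetConfig()/SetConfig() mentioned in the comment, but the implementation details here are assumptions; a temporary directory is used so the example is self-contained.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical layout: <root>/config/<key> and <root>/default/<key>, where
# <key> is a slash-separated path like interface/voting/enable_checkboxes.
ROOT = Path(tempfile.mkdtemp())
CONFIG_DIR = ROOT / "config"
DEFAULT_DIR = ROOT / "default"

_cache = {}  # basic in-process cache, as suggested in the comment


def set_config(key, value):
    """Write the value to config/<key> and refresh the cache."""
    path = CONFIG_DIR / key
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(value)
    _cache[key] = value


def get_config(key):
    """Read config/<key>, copying it from default/<key> on first use."""
    if key in _cache:
        return _cache[key]
    path = CONFIG_DIR / key
    if not path.exists():
        # Raises FileNotFoundError if there is no default either.
        path.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(DEFAULT_DIR / key, path)
    value = path.read_text()
    _cache[key] = value
    return value


# Seed one default, then read it through the config tree.
key = "interface/voting/enable_checkboxes"
(DEFAULT_DIR / key).parent.mkdir(parents=True, exist_ok=True)
(DEFAULT_DIR / key).write_text("1")
value = get_config(key)
```

After the first `get_config` call the default has been copied into `config/`, so a test can compare the two trees to find defaults that were never read.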

                                        1. 2

                                          Hey I first thought you were kidding but that’s pretty much how /proc works on Linux!

                                          1. 2

                                            DJB config? at least qmail does something similar

                                        2. 7

                                          maybe unpopular: i can read and write pretty much anything more easily than yaml. especially things which are braced like json or have similar open/close tags like.. apache config?

                                          even more unpopular: and i can use tabs for indentation with these formats! the character invented for indenting things! my editor from before i was born can display tabs with a width i like!

                                          back to topic: i think a small tcl would be a real good local optimum for configuration files. cf. Tcl the Misunderstood.

                                          1. 6

                                            Just to be pedantic, weren’t tabs intended for tabulation, rather than code indentation?

                                            1. 2

                                              TSV best SV.

                                              1. 4

                                                Let me introduce you to my friends in the ASCII table: 0x1c-0x1f; file, group, record and unit separator. Woefully underused.

                                                1. 3

                                                  Woefully underused.

                                                  For good reason. They are poorly supported by almost all tooling, and they don’t rigorously solve any additional problems over tabs (or any other delimiter).

                                                  1. 1

                                                    That sounds pretty cool. While not superior to TSV due to tooling, it’s still very nice to have explicit characters toward this end. It would be cool to have something like \fs \gs \rs \us as a way to type them, even if just supported by an editor extension. I will say, in response to @burntsushi, that I think they do solve certain problems over tabs, most notably the ability to specify many tables in a file, and many “files” within a file. It also means one could have tabs, whitespace, etcetera without needing to escape it. If I could open up a single document that represents many text files transparently as many text files in my editor, that would be a pretty cool feature. Similarly, I do think being able to represent many “sheets” in a CSV is also probably very useful. What would this format be called? If it doesn’t already have a name I think “.dsv” is probably not a bad one; I’m also fond of “.gru” or “.gruf”. Sounds like a fun weekend project to make an extension that handles these gracefully, and has a “save as csv/tsv/etc”.

                                                    1. 1

                                                      It also means one could have tabs, whitespace, etcetera without needing to escape it.

                                                      Right, but you then need to escape whatever delimiter you’re using, unless you ban it from being used. That’s kind of what I was getting at.

                                                      1. 1

                                                        I think the whole idea of file, group, record, unit delimiter characters is as delimiters. The common use of comma and tab as punctuation characters means that we will have to escape them regularly. It’s much easier to ban the use of characters that are unused for any language construct.
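
A minimal Python sketch of the "ban the delimiters" approach, assuming a single table per string. The function names are hypothetical; the four constants are the ASCII file, group, record, and unit separators (0x1c to 0x1f).

```python
# ASCII control characters reserved for delimiting data.
FS, GS, RS, US = "\x1c", "\x1d", "\x1e", "\x1f"
SEPARATORS = (FS, GS, RS, US)


def encode_table(rows):
    """Join fields with US and rows with RS; banned characters are rejected."""
    for row in rows:
        for field in row:
            if any(sep in field for sep in SEPARATORS):
                raise ValueError("field contains a separator character")
    return RS.join(US.join(row) for row in rows)


def decode_table(text):
    """Split on RS, then US. No escaping or quoting rules are needed."""
    return [record.split(US) for record in text.split(RS)]


# Commas, tabs, and newlines pass through untouched.
rows = [["name", "note"], ["a,b", "tabs\tand\nnewlines are fine"]]
encoded = encode_table(rows)
decoded = decode_table(encoded)
```

The parser stays trivial only because `encode_table` refuses the separators outright. Once escaping is allowed, the decoder needs the same kind of state tracking that makes CSV parsing complex.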

                                                        1. 1

                                                          Yes, I understand the concept behind them. If you can really get away with banning them completely, then sure, they can solve a problem nicely that tabs/commas probably can’t (modulo the fact that tooling sucks for them). But personally, I’d be surprised if you could get away with such a ban. If you have to implement escaping even in some cases, then it pretty much drags everything down with it. Escaping is pretty much the only reason why CSV parsing is as complex as it is, and more than that, tends to put a cap on performance (depending on your parsing architecture).

                                                          1. 1

                                                            Why would you be surprised if I could “get away with such a ban”? We can do so by edict, and if you can’t handle it then use some other format. If you are storing \fs \gs \rs \us then it’s not the format for you. If you strip out these control codes, then this is indeed the format for you.

                                                            It looks like there’s already a precedent for how to type these.

                                                            ctrl-\ File
                                                            ctrl-] Group
                                                            ctrl-^ Record
                                                            ctrl-_ Unit
                                                            
                                                            1. 1

                                                              Because we are collectively (including myself) very bad at saying “No,” especially when someone comes to you with a valid use case.

                                                              I’m not really interested in discussing this further. Bottom line is if you can get away with that ban, then great. Your point stands. There’s really no point in debating why I personally would be surprised if you could.

                                                              1. 1

                                                                The statement just sounded like you had an example case in mind. I was hoping you were holding out talking about it because it takes effort to describe. Nebulous fears are valid. Often there are unknowns, I just thought you had something concrete in mind.

                                                                1. 1

                                                                  Ah gotya. Yes, mostly nebulous at this point.

                                            2. 6

                                              Yes? This is fairly obvious to a naive reading & comparison. This is why I aggressively push for TOML for all configuration work.

                                              1. 5

                                                I feel lucky to work in the Clojure ecosystem where edn is ubiquitous. It behaves predictably and supports the features I need.

                                                In a previous job I found myself sometimes hand-editing YAML CloudFormation templates. What a nightmare that was.

                                                1. 3

                                                  I recently tried to use the libyaml C library, and that’s where I first realized how complex YAML really is. The library has to be so abstract to support everything YAML allows that using it just to parse key-value data already seems like overkill.

                                                  1. 2

                                                    https://github.com/tlsa/libcyaml might interest you. The aim of this library, as I understand it, is to provide a higher-level C interface to libyaml.

                                                  2. 2

                                                    I agree YAML isn’t great, but I also don’t think there are better alternatives in use. For example, docker-compose.yml is YAML, so we don’t have an option there.

                                                    1. 2

                                                      This excellent talk brought a YAML realisation to me: https://www.youtube.com/watch?v=O8xLxNje30M

                                                      YAML is not terminated. So when we send it over HTTP, it’s hard to make sure we actually sent the entire content and the connection didn’t get closed midflight.

                                                      1. 7

                                                        CSS and JS files aren’t terminated either. HTTP has a content length header for a reason?

                                                        1. 1

                                                          YAML has document start “---” and document end “...” markers, but a single file/stream can contain multiple documents.

                                                        2. 2

                                                          The first example doesn’t work with the current version of PyYAML - the specific use of function application was disabled some time ago, and yaml.load() logs a noisy deprecation warning telling users to use yaml.safe_load() instead.

                                                          https://github.com/yaml/pyyaml/wiki/PyYAML-yaml.load(input)-Deprecation

                                                          1. 2

                                                            As I read that, this will just print a warning and still work: “in PyYAML version 5.1, you will get a warning, but the function will still work”?

                                                            Sidenote: one of the ideas I’ve had for a long time is to analyse GitHub with Google BigQuery to try and find exploits for this; I’ll bet you’ll find a few. Unfortunately using the BigQuery UI is about as much fun as smashing your toes against the bedside table, so I never made much progress with it.

                                                            1. 1

                                                              Using !!python/object/apply:os.system with PyYAML version 5.1 will raise an error.

                                                              Using yaml.load(...) with PyYAML version 5.1 will trigger a warning (unless you specify the Loader argument).

                                                              Some examples and the output they create on my machine: https://gist.github.com/borntyping/27e4529b8ac17c1ad9a7a72369525365

                                                          2. 2

                                                            I’m quite fond of HJSON generally for configuration files - it’s more flexible and human-manageable than JSON (not to mention it supports comments!), less insane than YAML and still quite easy to marshal and unmarshal with predictable configuration structures in mind.

                                                            The only thing I don’t like about HJSON is that real JSON is still completely valid HJSON so it’s still easy to fall down the JSON sinkhole.

                                                            1. 1

                                                              Any time your API has load() and safe_load(), you’ve got a dangerous API. It needs to be unsafe_load() and load().

                                                              1. 1

                                                                I’ve been eyeing HOCON as an alternative for a while, but not having used it I don’t really have any sense of its pain points. Has anyone outside of certain Scala communities adopted it as their config format of choice?