1. 71

    1. 63

      It is an indictment of our field that this opinion is even controversial. Of course XML is better than YAML. YAML is a pile of garbage that should be abandoned completely. XML at least has a lot of decent engineering behind it for specific purposes.

      1. 69

        Meh, these kinds of absolute statements don’t really shed any light on the problem

        Seems like fodder for self-righteous feelings

        1. 28

          You’re right. The principles should be laid out: Ease of reasoning about configuration file formats is vastly more important than conveniences for writing specific values. Implicit conversion among types beyond very basic lifting of integer types is a bad idea, especially for configuration file formats. Grammars for configuration file formats should be simple enough to write a complete, correct grammar as a one day project.

          XML is kind of a weird animal because it’s playing the role equivalent to text for JSON. The principles above apply to the DTD you apply to your XML schema.

          1. 1

            Where does YAML do implicit type conversions?

            1. 6

              The Norway problem is a good example of this.

              1. 2

                There is no implicit type conversion going on on the YAML side. no is a boolean in YAML, just like false is a boolean in JSON. If a YAML parser converts it to a string, that’s the parser’s problem.

                1. 3

                  Ha. I can tell you’ve never written a parser before!

                  1. 2

                    No, @xigoi is right, strictly speaking. The parser is where this conversion is going on. Only if it cannot read it as anything else, it reads unquoted literals as if they were quoted strings. Of course, to a user that is neither here nor there: the rules need to be memorized to be able to use unquoted literals correctly.
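
                    A minimal sketch of that resolution order, assuming the unquoted boolean forms of YAML 1.1 and a hypothetical resolve_plain_scalar helper (not any real parser’s API): an unquoted scalar is tried as a boolean first and only falls back to a string when nothing else matches.

                    ```python
                    import re

                    # Unquoted forms of the YAML 1.1 bool type.
                    YAML11_BOOL = re.compile(
                        r"(?:y|Y|yes|Yes|YES|n|N|no|No|NO"
                        r"|true|True|TRUE|false|False|FALSE"
                        r"|on|On|ON|off|Off|OFF)"
                    )
                    TRUTHY = {"y", "yes", "true", "on"}


                    def resolve_plain_scalar(text):
                        """Hypothetical helper: bool if it matches, else the string."""
                        if YAML11_BOOL.fullmatch(text):
                            return text.lower() in TRUTHY
                        return text


                    countries = ["SE", "DK", "NO"]
                    resolved = [resolve_plain_scalar(code) for code in countries]
                    print(resolved)  # ['SE', 'DK', False]
                    ```

                    A quoted "NO" never goes through this step, which is why quoting is the usual workaround.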

                    1. 6

                      the rules need to be memorized to be able to use unquoted literals correctly

                      You’ll have a better time if you just use quotes by default… I don’t understand the appeal of unquoted literals in YAML

                      This, for me, is the root of it. YAML is fine as long as you are explicit. Now what it takes to be explicit is going to be driven by what types you intend to use. It seems to me that the majority of YAML use cases intend to use only a handful of scalar types and a handful of collection types. That small set of types, not coincidentally, is basically the same as what you get in JSON, and properly formed JSON is always valid YAML. So I would assert that if you use YAML and explicitly quote string values, you are effectively getting a slightly looser JSON parser which happens to allow you to write a flavor of JSON which is much easier for human concerns, i.e. less picky about trailing commas, supports comments, and is easier on the eyes with some of its constructs.

                      Of course, we’ve got a whole shitload of options these days, so I wouldn’t be surprised if some other markup/serialization format is better in any given specific domain. Different tools for different jobs…

                      One thing I will absolutely agree with is that YAML is awful when used as a basis for pseudo-DSLs, as you see in things like Ansible and a lot of CI/CD systems.

                      1. 2

                        I think we basically agree, but in my opinion one should accept that people are lazy (or forgetful) and use shortcuts, or even copy/paste bad examples. This is like saying sloppiness in PHP or JS is not a problem because one can always use ===.

                        Most people don’t have the discipline to be explicit all the time (some don’t have the discipline to ever be explicit), therefore it’s probably safer to avoid tools with overly prominent inbuilt footguns entirely.

        2. 3

          TBH it seems that way because it almost feels pointless to reiterate the absurdity of YAML.

        3. 7

          Rubbish, the list of semantic surprises in YAML is long and established. The problems with XML boil down to “my fingies are sore fwom all the typing” and fashion.

          1. 21

            One of the most talented developers I know can only work for 2-3 hours a day on a good day because of RSI. I don’t think your patronising take carries the weight you think it does.

            1. 3

              That some people have physical difficulties does not at all impact the validity of the greater population’s supposed concerns about verbosity.

              1. 3

                Let’s also make websites inaccessible because most people don’t need screen readers, shall we?

                1. 1

                  You’re making my point. We have accessibility standards and specialised tools. We don’t demand web pages don’t have videos.

          2. 10

            There are other issues with XML. Handling entities is complex as are the rules for name spacing. Writing an XML parser is complex so most people use libxml2, which is a massive library that doesn’t have a great security track record. For most YAML use cases (where the input data is trusted) this doesn’t matter too much. Parsing YAML is also incredibly hard so everyone uses the same YAML parser library.

            1. 1

              Problems in a specific parser can’t be called problems in the format itself. For what it’s worth YAML’s popular parsers have also had horrible security problems in the past.

              If you have a minute to go into detail, I’m interested in what I’ve missed that makes namespaces complicated, I found them pleasing when used correctly, and frankly used so infrequently that it hardly ever came up, outside of specific formats that used xml as a container, for example MXML. But this knowledge is old now in my case, so I probably just missed the use case that you’re referring to.

              The entity expansions should never have been a thing, that much I’m sure we can all agree on. DTDs were a mistake, but XSD cleaned most of that up; but unless you were building general XML tooling you could in most cases ignore schemas and custom entities completely.

              What’s good about XML (aside from how much support and tooling it once had) is IMO:

              • The consistency with which the tree structure is defined. I don’t know why “modern” markups are all obsessed with the idea that the end of a node should be implied by what’s around it, rather than clearly marked, but I can’t stand it.
              • A clear separation of attributes and children.
              • Consistency in results, in that there are no “clever” re-interpretations of text.
              1. 2

                Consider this made up XML:

                <?xml version="1.0" encoding="UTF-8"?>
                <something>
                  <thing xmlns="mynamespace">
                    <item>An item.</item>
                  </thing>
                </something>

                Now, let’s query element item using XPath:

                //something/*[namespace-uri()='mynamespace' and local-name()='thing']/*[namespace-uri()='mynamespace' and local-name()='item']


                And now imagine querying some element from a deeply nested XML that might contain more than one custom namespace.

                In my opinion XML namespaces just make it harder to work with the documents.

                1. 1

                  Dear XPath, please adopt Clark notation so we can do /something/{mynamespace}thing/item
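
                  For what it’s worth, some libraries already accept that form. A minimal sketch with Python’s standard xml.etree.ElementTree, whose limited path syntax takes Clark notation, assuming the made-up document above is wrapped in a <something> root and properly closed:

                  ```python
                  import xml.etree.ElementTree as ET

                  DOC = (
                      "<something>"
                      '<thing xmlns="mynamespace">'
                      "<item>An item.</item>"
                      "</thing>"
                      "</something>"
                  )

                  root = ET.fromstring(DOC)

                  # Clark notation: {namespace-uri}local-name
                  item = root.find("{mynamespace}thing/{mynamespace}item")
                  print(item.text)  # An item.

                  # Same lookup through a prefix map; the prefix "m" is arbitrary.
                  same = root.find("m:thing/m:item", {"m": "mynamespace"})
                  print(same.text)  # An item.
                  ```

                  The unqualified path thing/item finds nothing here, which is the footgun the XPath above works around.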

                2. 1

                  Yeah that’s rough as guts 🤣 I’ve never seen somebody override the current namespace in the middle of the document, I never even considered that as something you could do. Nobody should have done this, ever.

                  1. 2

                    As a real-world use case, <svg> elements within HTML documents often set the namespace.

                  2. 1

                    Probably not specifically in this way but I am sure you’ve worked with documents which use different namespaces with nested elements.

              2. 2

                It’s almost 20 years since I did anything serious with XML, but I seem to remember the namespace things let you define alternative names for tags to avoid conflicts, so you had to parse tags as their qualified name, their unqualified name in the current namespace (or the parents?) or their aliased name.

                A lot of the security issues of libxml2 were due to the inherent complexity of the format. There are a lot of JSON parsers because the format is simple. You can write a JSON parser in a couple of hundred lines of code if you have a decent string library. A compliant XML parser is at least one, probably two, orders of magnitude more complex. That significantly increases the probability that it will have bugs.

                I’m also not sure I agree on the ‘clear separation of attributes and children’ thing. XML formats that I’ve worked with have never managed to be completely consistent here. Attributes are unstructured key-value pairs, children are trees, but there are a lot of cases where it’s unclear whether you should put something in an attribute or a child. Things using XML for text markup have to follow the rule that cdata is text that is marked up by surrounding text, but things using XML as a structured data transport often end up accidentally leaking implementation details of their first implementation’s data structures into this decision.

          3. 10

            If you’re creating XML by hand, you’re doing it wrong.

      2. 21

        I have zero real world issues with YAML, honestly. I’ll take YAML over XML every day of the week for config files I have to edit manually. Do I prefer a subset like StrictYAML? Yep. Do I still prefer YAML over anything else? Also yep.

        1. 11

          The problem with YAML is that you believe you have no real world issues until you find out you do.

          1. 5

            This sounds like all the folks who have “no real issues” with MySQL, or PHP (or dare I say it, JavaScript). Somehow the issues with YAML seem more commonly accepted as “objectively” bad, whereas the others tend to get defended more fiercely. I wonder why!

          2. 1

            What is an example of a problem with YAML that syntax highlighting won’t immediately warn you about?

            1. 11

              At a previous employer we had a crazy bug that took a while to track down, and when we did it turned out the root cause was YAML parsing something as a float over a string yet the syntax highlighting parsed it as a string.

              I wasn’t the developer on the case, so I don’t remember the exact specifics, but it boiled down to something like this: we used an object ID which was a hex string, and the ID in question was something along the lines of:

              oid: 123E456

              The YAML spec allows scientific notation, so this was read as a float. Of course this could be chalked up to a bug in the syntax highlighting or a failure on our part to use quotation marks, but the results were the same: a difficult-to-track-down bug downstream.

        2. 6

          real-world problems i’ve had with yaml:

          • multiline strings (used the wrong kind)
          • the norway problem
          • no block end delimiter (chop a file in half arbitrarily and it still parses without error)
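
          For contrast on that last point, formats with explicit closing delimiters reject a file that has been cut in half. A minimal sketch using only Python’s standard json and xml.etree parsers (the YAML side is not shown, since that needs a third-party parser); the config values and the parses helper are made up for illustration:

          ```python
          import json
          import xml.etree.ElementTree as ET


          def parses(loader, text):
              """Return True if loader accepts text, False on a parse error."""
              try:
                  loader(text)
                  return True
              except (json.JSONDecodeError, ET.ParseError):
                  return False


          full_json = '{"server": {"port": 8080, "debug": false}}'
          full_xml = "<server><port>8080</port><debug>false</debug></server>"

          # Chop each document in half, as if the file had been truncated.
          cut_json = full_json[: len(full_json) // 2]
          cut_xml = full_xml[: len(full_xml) // 2]

          print(parses(json.loads, full_json), parses(json.loads, cut_json))
          print(parses(ET.fromstring, full_xml), parses(ET.fromstring, cut_xml))
          ```

          Both truncated inputs are rejected because a closing brace or closing tag is missing.
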
    2. 26

      Not wrong per se, but the time I spend with five bugs or annoyances in YAML is caught up by working with an XML config file just once.

      XML as a machine-writable+readable (and human-semi-parseable) file format? Sure, no problems. As a config file format? I guess I’ll stick to YAML although I hate it.

      1. 9

        it almost looks like you didn’t read the article, because you kinda badly repeated it in one line and ended up with the wrong conclusion, which is:

        • markup languages (like xml) are:
          • good to annotate corpus of text
          • bad for config files
        • yaml is:
          • bad for config files (norway, go 1.2(0), …)
          • bad to annotate anything

        I guess I’ll stop spreading YAML because I hate it.

        use TOML instead.

        1. 2

          Look, you can assume all you want. I don’t even agree with the premise of the article, because imho they don’t compare well at all.

          And no, afaik I’ve never spread YAML by willfully introducing it anywhere, maybe I could have tried to dissuade more people from using it, but I’ll still stick to what I wrote. If you think I’m defending yaml by saying I dislike it less than something else, there’s no use arguing any further.

      2. 7

        From the article itself:

        It’s not good if it’s like:

        “I need to configure this server and the server needs to know if this value is true or false.”

        No, that’s bad. Don’t do that. That’s not a good use for XML.

        1. 3

          Well, there’s a bit of a difference between “true or false” and some actual complicated config, e.g. Maven’s ~/.m2/settings.xml.

    3. 23

      I agree with the author, and would summarize it like this.

      XML is a markup language like HTML, and YAML is a data exchange format like JSON.

      Markup languages are good for (surprise) marking up text. You start with text, and then layer structure and annotations on top of it with markup syntax. The markup syntax is verbose because it gives primacy to plain text, which is the default and requires no ceremony apart from escaping a few characters.

      Markup languages are bad for configuration; data exchange languages are good for configuration. It just so happens that YAML is a bad data exchange format. It has many human-friendly do-what-I-mean features but they backfire when you actually mean something else (e.g. NO for Norway, not false).

      1. 16

        How dare you RTFA and summarize it better than I made it. I am deeply offended by this breach of commenting etiquette. 😉

        1. 1

          A little off topic, but: I think your site sets the main text content’s foreground color (something dark), but neglects to set its background color, letting my browser’s personal fallback settings for unspecified styles take over.

          For unstyled sites, I have my fallback colors set to light-on-dark, so this half-enforced styling comes out as dark-on-dark.


          1. 2

            I’m just a guest blogger, but I’ll pass it along.

      2. 5

        Markup languages are bad for configuration; data exchange languages are good for configuration

        Mostly agree, but I would go further and say

        • XML/HTML are for documents, JSON is for records / “objects”, and CSV / TSV are for tables.

        However, JSON is a pretty good data exchange language, but it’s not good for configuration because the syntax is too fiddly (comments, quoting, commas)

        YAML is definitely not good for data exchange, and it has big flaws as a config language, but there’s no doubt that many people use it successfully as a config language.

        So config languages != interchange formats in my mind. Interchange formats are mostly for two programs to communicate (although being plain text helps humans too, so it’s a bit fuzzy.)

        The space of config languages is very large, AND it blends into programming languages. Whereas JSON is clearly not a programming language (though it was derived from one)


        1. 3

          Yeah that makes sense, config language deserves its own category. I guess I’d say I prefer JSON over XML if those are your only two options (and I find it works well enough in VS Code for example, though they allow comments and trailing commas I think).

      3. 3

        I think YAML is not suitable for data exchange, because of its complexity and shaky security record. Better to stick to JSON if you need text, or CBOR or protobufs etc. if you prefer binary. Good data exchange languages are bad config languages.

        YAML is barely tolerable as a data input or configuration language, but there are better options such as json5. TOML is ugly and confusing but still better than YAML.

        1. 3

          I wish more things would adopt UCL for configuration. Like YAML, it is a representation of the JSON object model but it also has a number of features that make it more useful as a configuration language:

          • Macros.
          • Include files.
          • Explicit merging rules for includes (replace objects, add properties to objects).
          • Cryptographic signing of includes, so you can use semi-trusted transports for them.
          • Syntactic sugar for units
          1. 1

            While I like UCL, it can turn into its own kind of hell. rspamd, for instance, is a fantastic piece of software, but the complex mass of includes and macros can make the configuration hard to reason about. Mind you, this isn’t UCL’s fault, just something it enables.

            1. 1

              The macros can be a bit exciting but I like the fact that rspamd doesn’t have any defaults in the program, they’re all visible in the config file directory that’s included with the lowest priority.

    4. 18

      I like to say that XML is something you inflict on others and not yourself. By this I mean, it is very good for interchange when you need all parties to agree on what constitutes a valid document. YAML and JSON are not great for this, although we use JSON for it a lot in practice for the same reason we use Python and not Haskell.

      I have also deployed it against myself for odd situations where I have a weird superset of information from several systems. For instance, we were migrating a database from one schema to another for compatibility reasons. I wanted to track what the source tables/columns were and what the targets were, generate documentation about both the new schema and the mapping from the old to the new and generate automated migrations to create the new schema. I made a little XML file with a trivial format for this and wrote three XSL stylesheets to generate the outputs. XML is an interesting tool. Not perfect for every scenario but it comes in handy sometimes.

      1. 1

        for odd situations where I have a weird superset of information from several systems

        Which is what we have in almost every microservice-based architecture when responses from multiple services have to be combined. We could go one step further and say that a graph data format like Turtle should be used instead of a tree-based one like XML/JSON to replace a non-trivial n-way tree merging with a straightforward graph union.

        it is very good for interchange when you need all parties to agree

        Seems like exactly what we should do when designing microservice architectures.
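
        The graph-union point can be sketched with plain sets of (subject, predicate, object) triples, no RDF library needed. The two service responses below are hypothetical data:

        ```python
        # Hypothetical responses from two services, as RDF-style triples.
        orders = {
            ("user:1", "placed", "order:7"),
            ("order:7", "total", "42.00"),
        }
        profile = {
            ("user:1", "name", "Ada"),
            ("user:1", "placed", "order:7"),
        }

        # Merging is a set union; the shared triple is deduplicated for free.
        merged = orders | profile
        print(len(merged))  # 3
        ```

        Merging the same data as nested trees would first need a decision about which node is the root and how to reconcile overlapping branches.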

        1. 2

          Can you point me to Turtle? It’s not something I’ve heard of before.

            1. 2

              Thank you!

          1. 1

            It’s a human-readable format, simplifying from N3/Notation3, to encode RDF tuples. I heart RDF and Linked Data and wish I had more excuses to use it!

    5. 12

      meh, in my career I’ve not really run into all these weird edge cases in real production apps - yaml has worked fine for me. not perfect, but nothing is perfect, sql is a mess, js is a mess, bash is a mess, the whole industry is a mess and things keep flying.

      yaml does have a lot of weird bloat features that do not need to be there - maybe it’s time for a new spec?

      1. 4
        1. 2

          That is a great project, but it is not a spec, just a tool.

          A great spec which is similar keeps typing out of scope altogether, and is called NestedText.

    6. 10

      Hm what’s the exact example where 1.20 gets interpreted as 1.2? I would have thought if you have quotes it doesn’t do that

      That said, I mostly agree with “there’s always something better than YAML”.

      I divided such languages into 5 categories here, which may help frame discussions / choices:

      https://github.com/oilshell/oil/wiki/Survey-of-Config-Languages (feel free to add missing languages)

      I’m not sure XML really competes with YAML or JSON though. It doesn’t have data types, and the data model for documents doesn’t conveniently map to struct/array or record/list data types in languages – it maps to the DOM, which is probably not what you want.

      Although I can see XML being used as a test file format – you can have <prog></prog> and <expectedStdout> sections. But most people seem to write their own file formats for that

      I wrote my own, and there was some lobste.rs article about the Go test framework that does that – it’s plain text, not XML (maybe someone remembers the link)

      Main problem with embedding code in XML is that <>& have to be manually escaped, and basically all languages use those chars. I guess you can use CDATA but I can’t remember how to do that …
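
      For reference, a minimal sketch of both options using Python’s standard library: wrapping the code in a CDATA section, or escaping it with xml.sax.saxutils.escape. The <prog> element comes from the example above; the code string is made up.

      ```python
      import xml.etree.ElementTree as ET
      from xml.sax.saxutils import escape

      code = "if (a < b && c > d) { run(); }"

      # Option 1: a CDATA section, where <, > and & need no escaping.
      with_cdata = "<prog><![CDATA[" + code + "]]></prog>"

      # Option 2: escape &, < and > as entities.
      with_escapes = "<prog>" + escape(code) + "</prog>"

      # Both parse back to the original text.
      print(ET.fromstring(with_cdata).text == code)    # True
      print(ET.fromstring(with_escapes).text == code)  # True
      ```

      The one thing a CDATA section cannot contain is the sequence ]]>, which ends it.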

      1. 12

        The problem is that 1.20.1 doesn’t need quotes, so you don’t think to add it to 1.20.
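
        A simplified sketch of why the two values end up with different types, assuming a hypothetical resolver that tries a float first and otherwise keeps the text (real YAML parsers match per-type patterns, but the fall-through has the same shape):

        ```python
        def resolve(text):
            """Hypothetical resolver: a float if it parses, else the string."""
            try:
                return float(text)
            except ValueError:
                return text


        print(resolve("1.20.1"))  # 1.20.1 (kept as a string)
        print(resolve("1.20"))    # 1.2 (a float; the trailing zero is gone)
        ```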

        1. 18

          I learned this one using GitHub Actions, where my [3.7, 3.8, 3.9] list of Python versions to try broke when Python 3.10 came out. I had to switch to ["3.7", "3.8", "3.9", "3.10"]

          1. 4

            That example would have bit you in any type-aware config language. :) You’re passing in floating point numbers, not strings.

            1. 14

              This is true, but autoquoting strings encourages the problem. In YAML, 3.10 doesn’t “look” wrong. It looks just like all the things which work correctly. In JSON, it would be more obvious that you were doing the wrong thing because [3.9, 3.10] would “look” wrong. It’s a psychological problem more than anything else.
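
              A quick check with Python’s standard json module, as a stand-in for any type-aware parser, shows that the JSON version loses the same information; only the quoting habit differs:

              ```python
              import json

              unquoted = json.loads("[3.7, 3.8, 3.9, 3.10]")
              quoted = json.loads('["3.7", "3.8", "3.9", "3.10"]')

              print([str(v) for v in unquoted])  # ['3.7', '3.8', '3.9', '3.1']
              print(quoted)                      # ['3.7', '3.8', '3.9', '3.10']
              ```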

            2. 8

              This is an argument in favor of properly typed configuration languages.

        2. 9

          Yeah 1.20.1 not requiring quotes is bad. In my head I call that the “else-whatever” or “else-me-no-care” anti-pattern

          The language author just writes the if statement (“happy path”), and then they don’t care what happens in the else statement

          1. if it looks like 1.20, then it’s a float
          2. if it looks like "1.20", it’s a string
          3. else me no care

          Shell has this all over the place.

          Valid Brace expansion:

          $ echo {foo,bar}@hi.com
          foo@hi.com bar@hi.com

          You can just leave off one char and then things are broken. Now your own source code is data that flows through the system:

          $ echo {foo,bar@hi.com

          YSH detects that at parse time, not at runtime:

            echo {foo,bar@hi.com
          [ -c flag ]:1: Word has unbalanced { }.  Maybe add a space or quote it like \{

          Although I think we can be even stricter, and say not just that {} should be balanced, but that they should always be quoted.

          Same with

          • globs that don’t match anything – you have to opt into nullglob or failglob options to fix that
          • tilde expansion for a user that’s not real
          • history expansion

          YSH addresses all these things

          So then you can use it reliably as a config file format!

          $ cat _tmp/h
          hay define Package
          Package foo {
            version = 1.2.0
          $ bin/ysh _tmp/h
              version = 1.2.0
          '_tmp/h':4: Syntax error in expression (near Id.Expr_DecInt)

          So it has typed data just like Python or JS, and you get a dictionary when it’s valid … This still needs some work (and documentation), but I think it’s better than YAML for sure, and more flexible than JSON.

          And it’s all in your shell, so you don’t have to switch to a new language – it’s all parsed as one language. Most YAML embeds shell these days

          Hay Ain’t YAML - https://www.oilshell.org/release/0.18.0/doc/hay.html

        3. 2

          Sure, but the way to avoid that is to use a subset of YAML. If you are willing to use the full set of XML capabilities then you run into a lot of similar alternative representation issues and human intuition mistakes. Both formats are usable if you stick to a very restricted subset and neither format is usable in full.

      2. 1

        I’ve suggested this before, but NestedText would be a good addition to your string data category.

    7. 9

      the point of the article is that XML has some good use cases while there are usually better choices than YAML whenever you would reach for YAML. that’s a bit like saying a hammer is better than a screwdriver because whenever you would use a screwdriver you could use an electric drill (or a small electric screwdriver, whatever) instead.

      YAML has deficiencies, but (as the article points out!) XML is not a substitute for YAML for the typical YAML use cases of human-readable, human-editable config files, markdown frontmatter and general “structured lists of things”.

    8. 6

      This rant seems limited to issues of semantic interpretation. However in practice there are many more concerns to weigh. The security track record of XML parsers is much, much worse than the track record of YAML parsers. This is especially true for the most recent versions of each.

      I’m not talking here about application choices of how to use either format. e.g. the famous issues with web frameworks blindly calling into code objects decoded from the format isn’t the fault of the format. I’m talking about the risk you incur simply by passing untrusted input to the parser.

      1. 6

        If you use a real YAML parser, you get the same billion laughs problem as XML. I don’t see much difference in terms of difficulty. YAML also has tags, which no one uses but are basically as bad as namespaces in terms of causing unexpected bugs.
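
        The arithmetic behind that attack is the same in both formats: each level refers to the previous one several times, so a parser that expands references eagerly multiplies the size at every level. A back-of-the-envelope sketch, assuming the classic shape of 9 levels with 10 references each (the function name is made up):

        ```python
        def expanded_leaves(levels, fanout):
            """Leaf count once every alias or entity reference is expanded."""
            count = 1
            for _ in range(levels):
                count *= fanout
            return count


        print(expanded_leaves(9, 10))  # 1000000000
        ```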

        1. 4

          YAML’s tags have caused some slapstick security failures, but I think most implementations have stopped using them to instantiate application objects by default.

          On the other hand XML still has problems with external entities which can cause hilarious firewall bypasses. And XML’s data model is a really bad match for the usual data structures.

          Really, they are both terrible, and both should be avoided. It’s better to spend time pointing people at actually good alternatives, such as json5.

        2. 3

          Yes, they do have some security concerns in common. Recursive issues in the spec are disappointing on both sides of the ledger. However, a simple review of vulnerabilities for the canonical implementations reveals the difference in how well each format can be practically implemented without introducing vulnerabilities specific to the implementation. XML has been stable for nearly two decades and we’re still suffering from serious vulnerabilities in libxml2.

    9. 5

      It never ceases to amaze me that instead of using a linter, learning to read the spec and language grammar, or pushing the industry to adopt YAML 1.2, there are always people set in their ways trying to rail against a consensus that has long been settled. I think as an industry we’ve really failed to pass down our history and learnings.

      Let me break down why the reverse of the premise of this piece is true, i.e. there are situations where YAML is appropriate, but there’s no situation where XML is appropriate.

      1. XML was designed in a silo, had unrealistic expectations on the world, and was designed as a baseline syntax to evolve HTML4 into XHTML. It was not designed to be a configuration language, and it even failed to serve the purpose of what it was designed for.

      The abstract of the spec says:

      The Extensible Markup Language (XML) is a subset of SGML that is completely described in this document. Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. XML has been designed for ease of implementation and for interoperability with both SGML and HTML.

      We haven’t seen anything where interop with SGML and HTML is paramount for decades now. In fact, a couple of years before it became a W3C recommendation, WHATWG, which consists of all the browser engine vendors at the time, had decided to move on to HTML5 instead because they had concluded that the HTML authors in the world past and present just aren’t going to rewrite all of their HTML that does not conform to the XML spec.

      2. XML is inefficient as both a representation of logical structure and storage structure.

      In the Origin and Goals of the spec, the last point literally says

      10. Terseness in XML markup is of minimal importance.

      Except some legacy tech firms who haven’t kept up with the times, I haven’t heard of any system architect say they’ll use XML as an interchange format for message passing between systems for over a decade. It’s inefficient as a wire format (wastes bandwidth), and it’s inefficient to process due to the number of unnecessary tokens in the streams (wastes CPU cycles).

      XML is an eyesore, information density is super low even for its time.

      3. XML is ambiguous, just as difficult to write correctly and has poor tooling.

      The author must have missed the debate in the early 00s where people were confused about when an attribute or a markup element should be used when designing an XML schema. Eventually people decided that markup elements should be used for future-proofing but lots of old documents using attributes still stuck around, so now any XML schema started with attributes had 2 different ways to convey that same information.

      3.1 XML entities are a scourge of humanity. You think YAML 1.0 cutting off the zero is bad? Try forgetting to wrap your character data in <![CDATA[]]>. Your XML document may not even parse.

      3.2 Has anyone ever found an XML validator that will actually download all the schemas and validate your documents? Do people actually write XML schemas and declare them in the docs correctly?

      XML started out as a syntax tree syntax to evolve HTML4 and ended up being a configuration language in many old-school Java frameworks and failed on both counts. YAML on the other hand, was designed purely as a minimalist syntax tree syntax for configuration only. The reason it’s everywhere is people do recognize it being easy to read, write, parse and transmit. If your favorite CI configuration (ahem, GitHub Actions) is still using YAML 1.0, the onus is on them to roll out support for YAML 1.2 ASAP, not on the rest of us sensible people preferring a language that is superior to XML on all counts. Even k8s has switched to YAML 1.2 now.

    10. 5

      “test this against Go 1.20”

      It interprets that as Go 1.2. 😭🤣

      All programming languages I remember would interpret decimal literal the very same way. JSON parser would too. What’s the problem?

      You just have to always quote everything.

      No, you have to always quote string literals (unless they are known not to be literals of different types, such as boolean Yes/No) and never quote numbers, booleans, nulls nor dates.

      Also the YAML specification has all these features that nobody ever uses, because they’re really confusing, and hard, and you can include documents inside of other documents, with references and stuff…

      References make life simpler when writing such documents by hand, though. I’ve written JSON schema documents in YAML because it’s simpler that way.

      So while being like 80% there and 20% kinda sucky for a lot of hand-kept data, sometimes I just don’t have the time nor willpower to repeat myself writing JSON nor roll a custom DSL. So YAML it is.

      1. 3

        All programming languages I remember would interpret decimal literal the very same way. JSON parser would too. What’s the problem?

        The problem is being loose with types. 1.20.1 is autocoerced to a string, 1.20 is a number. It makes it exceedingly easy to do the wrong thing, especially if you’re just editing a preexisting configuration. IMO, any time you’re placing additional burden on a human to understand and navigate unexpected behavior, that’s a problem.

        JSON has its own problems, but at least in this limited case you aren’t going to make a two-character content change and have your datatype completely altered.
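        To make the coercion concrete, here is a minimal Python sketch of how a plain (unquoted) scalar might be resolved. The two patterns are simplified from the YAML 1.2 core schema, and `resolve_plain_scalar` is a hypothetical helper, not any real library's API; actual parsers also handle booleans, null, hex, octal and infinity.

```python
import re

# Simplified from the YAML 1.2 core schema; illustration only.
INT_RE = re.compile(r"[-+]?[0-9]+")
FLOAT_RE = re.compile(r"[-+]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)([eE][-+]?[0-9]+)?")


def resolve_plain_scalar(text):
    """Return an int or float if the whole token looks numeric, else the string."""
    if INT_RE.fullmatch(text):
        return int(text)
    if FLOAT_RE.fullmatch(text):
        return float(text)
    # Anything that is not entirely a number falls back to a string.
    return text


print(repr(resolve_plain_scalar("1.20")))    # 1.2
print(repr(resolve_plain_scalar("1.20.1")))  # '1.20.1'
```

        Dropping the last two characters of the token changes the resulting type, which is the two-character edit described above.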

        No, you have to always quote string literals (unless they are known not to be literals of different types, such as boolean Yes/No) and never quote numbers, booleans, nulls nor dates.

        You should, but it’s incorrect to say you must.

        foo: bar is entirely valid.

        So while being like 80% there and 20% kinda sucky for a lot of hand-kept data, sometimes I just don’t have the time nor willpower to repeat myself writing JSON nor roll a custom DSL. So YAML it is.

        Seems pragmatic. I certainly haven’t gone out of my way to migrate random YAML files in my older Rails projects.

        FWIW, my opinion is that it’s objectively an awful config language, but it’s too late now to stuff that particular horror back into Pandora’s Box.

        1. 2

          It sure is an awful configuration language, but for a hand-administered dataset it’s the path of least resistance when you don’t feel up to the task of making your own format.

          On the other hand, there has been an enormous amount of developer time spent on format conversions and home grown formats tend to suck even more than YAML and XML.

    11. 4

      Nice article, Carl. Thx for the link to CUE. That looks pretty interesting. Personally, I prefer neither. If I had to pick, then it would be XML. The problem with both is the feature surface. This is, unfortunately, where JSON just zooms right past them. It is so much easier to implement securely.

    12. 4

      I have definitely partaken of more than my share of Google Kool-Aid, but I personally prefer textproto as my go-to configuration format. IMHO any format has to have schema validation. It also has to be understandable what nesting level you are at. Most formats fail on both fronts. XML succeeds on both, but protobufs are more succinctly expressed than XML tags.

    13. 4

      At the end of the article you mention TOML. TOML has its own problems.

    14. 4

      It’s not a coincidence that the real meaning of YAML is “Yet Another Migraine Looming”.

    15. 3

      XML and YAML are just not used for the same thing and are not comparable.

      First things first, not throwing an error when the type of an attribute is wrong when parsing a YAML schema is a mistake. A version number must be a string, not a number, so version: 1.20 shouldn’t be allowed. For example, if I remember correctly, Kubernetes is strict on the data types when parsing YAML, and it works well. If <version>1.20</version> works as you would expect, it’s only because everything is a string in XML (you can use a schema to validate a string, but it is still a string).
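      As a small illustration of the "everything is a string" point, here is a sketch using Python's standard-library `xml.etree.ElementTree`. The element name `version` is taken from the example above; no schema is involved.

```python
import xml.etree.ElementTree as ET

# Parse the element from the example; no schema is applied.
element = ET.fromstring("<version>1.20</version>")
version = element.text

print(repr(version))   # '1.20'
# Any numeric interpretation is an explicit step taken by the consumer.
print(float(version))  # 1.2
```

      The trailing zero survives because the parser hands back text; it only disappears if the consuming code chooses to convert it.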

      About XML, it is a language used for documents. Tags and attributes are used to provide more data around the text contained in the document. You can read a good article on XML here: XML is almost always misused (2019, Lobsters).

      YAML, on the other hand, is a language for storing data structures. It has its own limitations, but also some features that are not widely known, such as anchors.

    16. 3

      I’m not a fan of either format, but I strongly prefer working with XML. I’d even go so far as to say it doesn’t make sense to compare them because YAML doesn’t solve the same problems as XML. Does YAML have schemas? Does it have XPath? Does it have XSLT? Does it have DTDs? Does it have namespaces?

      And aside from tooling and programmatic access, the editor support for XML is a lot better, too.

      YAML feels like an ambiguous version of JSON when I have to use it - anything goes as long as you remember all of the weird rules and corner cases.

      1. 1

        Every time I have to work with XML I end up stripping away the namespaces as the first step. As for XSD, they frequently do not match the data. XSLT is horrible. Even imperative code is a more readable and flexible transformation tool than that. XPath is likely the only interesting piece of the stack.

        1. 4

          That’s what I mean, really.

          If you don’t need the tools XML provides then it’s just added complexity and you should use something else - or pick and choose specific parts to use. But when you do need those features then it’s nice to have a standard way of doing it across programming languages and platforms.

          YAML doesn’t have the tools - at least not built-in and standardized - so if you need them YAML isn’t an option.

          As for XSD, they frequently do not match the data.

          It’s like saying “the compiler gives a bunch of errors about my code”…

    17. 3

      In my experience, a lot of the widely-cited problems with YAML go away if you’re deserializing it into statically-typed data structures. Or maybe more precisely, if you have a schema that defines the data types, which you kind of get implicitly as part of deserialization depending on your language/environment. For example (Kotlin):

      import org.yaml.snakeyaml.Yaml
      val yamlDocument = "version: 1.20"
      // The target class declares "version" as a string
      data class YamlTest(var version: String? = null)
      val deserialized = Yaml().loadAs(yamlDocument, YamlTest::class.java)
      println(deserialized.version)

      prints 1.20, not 1.2. Not to say the complaints about YAML are without merit, but it’s easy to come away from articles like this with the incorrect conclusion that it’s outright impossible to avoid a lot of YAML’s problems.

      1. 1

        When you mentioned static typing I was expecting an example where the parser was expecting a String, but saw a numeric literal (1.20) and threw an error because of the mismatch. But the behavior of that library seems… worse than the alternative, almost? Like, 1.20 is unambiguously a number in YAML, right? Why does the library coerce it to a String for you?

        YAML’s behavior is surprising enough as it is; if some library does an additional surprising thing to try to be helpful then that seems like it would make the YAML ecosystem more surprising, not less.

    18. 3

      Some thoughts:

      • AFAIK (although maybe things have changed recently?) there’s no standard for YAML schema definition. With XML you can hand your customer / client an XSD schema and say “we will use this to validate your input against”.

      • YAML is less human-readable than is claimed. I’d argue that it’s only superficially human-readable; no human without previous YAML experience would expect no to be deserialised into false.

      1. 2

        Although I agree with anything that says XML is better for its own thing and its own thing alone, while everything else sucks in their own right, I gotta get triggered about XSD and WSDL and all that.

        They’re great until something shits the bed, like ever dealt with xmlsec? You got your canonicalization methods, and your digests, and your signatures, and all that good support network, only to learn how little canonicalization matters when someone does a pretty-print reindent in-between and invalidates the signature. Maybe an HTTP 500, maybe an HTTP 200, maybe the XML response follows a schema, maybe it lazily does not.

        I’m not a veteran in these matters necessarily, but I’ve seen plenty of BS seep through the cracks of strict definitions.

        1. 2

          I remember debugging differences between Ruby and .NET XML canonicalisation back in around 2008. Not a pleasant experience.

          1. 1

            Not surprised it never got better with IBM XML and Python in 15 years. Which is sad.

            1. 1

              But, even though I’m still traumatised by that experience ;) I’d take XSD over no schema definition any day.

    19. 2

      Sometimes I feel like a good chunk of the problems with yaml end up being a combo of unfamiliarity with the spec as well as out-of-date implementations.

      Half of the problems people commonly cite with yaml (Norway problem, sexagesimal numbers, surprise octal, merge operator) were removed in yaml 1.2. However, despite yaml 1.2 being 14 years old, surprisingly few parsers implement 1.2, so people continue running into the same issues that were fixed in the spec over a decade ago. Some implementations (the most popular one for go, for example) chose to implement a mix of 1.2 and 1.1 to support certain 1.1 features that were removed.
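      For the Norway problem specifically, the difference between the two spec versions can be sketched in a few lines of Python. The token lists follow the YAML 1.1 bool type and the YAML 1.2 core schema; `resolve_bool` is a hypothetical helper for illustration, not how any particular parser is structured.

```python
def _forms(word):
    # The specs list lower-case, Capitalized and UPPER-CASE spellings only.
    return {word, word.capitalize(), word.upper()}


YAML_1_1_BOOLS = {}
for word in ("y", "yes", "true", "on"):
    YAML_1_1_BOOLS.update({form: True for form in _forms(word)})
for word in ("n", "no", "false", "off"):
    YAML_1_1_BOOLS.update({form: False for form in _forms(word)})

YAML_1_2_BOOLS = {}
YAML_1_2_BOOLS.update({form: True for form in _forms("true")})
YAML_1_2_BOOLS.update({form: False for form in _forms("false")})


def resolve_bool(text, version):
    """Return a bool if the plain scalar is a boolean token, else the string."""
    table = YAML_1_1_BOOLS if version == "1.1" else YAML_1_2_BOOLS
    return table.get(text, text)


print(repr(resolve_bool("no", "1.1")))  # False
print(repr(resolve_bool("no", "1.2")))  # 'no'
```

      A parser that mixes 1.1 and 1.2 rules sits somewhere between these two tables, which is why the same document can load differently across tools.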

      1. 1

        One of the problems is that the YAML spec is moderately difficult to implement. I learned to use a PEG implementation by writing a CSV parser and then a JSON parser in a weekend. Decoding JSON was ~65 lines. I set aside my YAML 1.2 parser about 250 lines in and it was incomplete. I’ll get back to it one day but it’s a hairy spec.

        Having read the spec multiple times and looked at other implementations, I’m not plagued by the problems other people report but I don’t think that’s reasonable when all you want to do is write a config file.

    20. 1

      I want to like YAML, but it’s not always intuitive to me what it’s going to do, and I usually end up loading the file into Python and checking that the data structure is how I intend, that Unicode works as I expect, and that trailing or leading spacing happens like I want. I know it’s not that hard, but there are enough subtleties that it has never “stuck” for me. It feels similar to CoffeeScript (RIP).

      Flip side, I never use XML except for describing layouts, which, honestly, I love so much. I would never use it for data structures, much prefer JSON.

      And then if we’re talking configurations, I hate .ini, .xml, .json, .yaml, and slightly less .toml; they’re all terrible. I would much prefer a well-typed TypeScript API that gives me autocomplete and docstrings. I hate configuring things blind in a markup language.

      Part of me thinks this whole discussion is moot, and it’s just personal preference what one enjoys doing manual data entry with, and that sometime in the future we’ll have a better IDE experience, think spreadsheets for arbitrary data structures. And then we can spit out whatever flavor someone likes, but at that point we would just switch to more efficient binary encodings. My main contention with XML/YAML/JSON etc. is that sometimes I want to put images/videos/sounds/etc. in while doing manual data entry, but I simply can’t drag and drop. Also, multi-line text starts to get nasty, with escaping rules, etc.

    21. 1

      Honestly, it seems like the only people who are complaining about YAML are those that are overusing it, or using tech that overuses it. Every single complaint I’ve heard personally comes from people writing absurd configurations for Ansible or some other devops tech which should have been scripted instead. (Seriously, loops in YAML? Get out of here.)

      This article itself is a complaint about what is likely a floating-point rvalue, which it correctly rounds. Version numbers are strings and not numbers. You wouldn’t expect a scripting language to interpret 1.20 as "1.20", nor should you expect YAML to do so as well. (This assumes you have a basic understanding of YAML parsing.) Last I checked, XML won’t even solve this problem as it’s going to be dependent on your schema.

      1. 3

        I understand that 1.20 != "1.20" is operator error. The thing is, on the podcast, my cohost, who likes YAML, laughed because he had the same bug. In this comments section, Simon Willison said he had the same bug. I’m pretty sure I also had the same bug when Python 3.10 came out, although it’s been a while, and maybe I’m misremembering. But I feel like that’s why when the bug hit I figured it out relatively quickly. If you look around, you can find a lot of people experiencing variations of this same bug.

        If operator error is extremely common, maybe the operators aren’t at fault?

        1. 1

          Yep, I’ve had the same bug in TOML too. It’s not a YAML problem, and making me use JSON/XML and wrap every value in redundant quotes, braces and chevrons isn’t going to fix it.

          1. 3

            Really? I thought TOML requires all strings to be quoted, which would make the difference a lot more obvious.

          2. 2

            Perhaps the compounding problem is that the tools that process the YAML/JSON/TOML/whatever are accepting any data type and coercing to string. This, combined with YAML autoquoting sometimes working, produces a laxness that causes a lot of headaches.

            Ideally, both tools should be stricter: The config syntax you’re using shouldn’t yield strings or numbers depending on how something is written (ie, autoquoting is evil), and the tool that consumes the config document should not be accepting floating-point numbers where a string is expected. Pervasive permissiveness (aka “sloppiness”, but often misinterpreted as “convenience” or even “ergonomics”) is a scourge on our industry.
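            On the consuming side, a strict check might look like the following Python sketch. It uses the standard-library `json` module only to obtain parsed values; `require_str` is a hypothetical helper, and the `version` key is assumed for illustration.

```python
import json


def require_str(config, key):
    """Return config[key] only if it is already a string; never coerce."""
    value = config[key]
    if not isinstance(value, str):
        raise TypeError(
            f"{key!r} must be a string, got {type(value).__name__}: {value!r}"
        )
    return value


good = json.loads('{"version": "1.20"}')
bad = json.loads('{"version": 1.20}')

print(require_str(good, "version"))  # 1.20
try:
    require_str(bad, "version")
except TypeError as err:
    print(err)  # 'version' must be a string, got float: 1.2
```

            Rejecting the float early surfaces the mistake at load time instead of silently carrying 1.2 forward as a version.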

        2. 1

          YAML is a superset of JSON. If people struggle, they should keep this in mind, and maybe choose to use JSON syntax instead until they learn more YAML. (In the case of numbers/strings, you’ll have the same issue in JSON.)

    22. 1

      Why make the comparison if you recommend TOML for the things YAML is usually used for?

      1. 3

        It’s a little bit apples and oranges, but the thinking is something like

        • They are both file formats that people complain about
        • They both have serious problems with implementation security
        • They both are sometimes used for config files
        • Neither is a good choice for a config file
        • Neither is a good choice for an RPC API, which people did with XML (SOAP) and mostly haven’t done with YAML
        • XML is a good choice for a document format, ie a book or legal markup, and not bad for UI layout, ie JSX and XUL
        • There’s not really a use case where YAML is the best choice

        So adding it all up, both are flawed, but XML is “better” than YAML because it has some pros, whereas YAML is all cons.

        1. 1

          Thanks! Makes more sense to me now.

          TOML and Nickel look interesting, but it’s an academic matter for me as I’m usually on the receiving end of config file format choices. I will say this, though. If anyone out there is using ESLint and is considering using YAML instead of JSON, I’ve tried this and I can tell you it is not good!
