Threads for stickupkid

    1. 63

      It is an indictment of our field that this opinion is even controversial. Of course XML is better than YAML. YAML is a pile of garbage that should be abandoned completely. XML at least has a lot of decent engineering behind it for specific purposes.

      1. 69

        Meh, these kinds of absolute statements don’t really shed any light on the problem

        Seems like fodder for self-righteous feelings

        1. 28

          You’re right. The principles should be laid out: Ease of reasoning about configuration file formats is vastly more important than conveniences for writing specific values. Implicit conversion among types beyond very basic lifting of integer types is a bad idea, especially for configuration file formats. Grammars for configuration file formats should be simple enough to write a complete, correct grammar as a one day project.

          XML is kind of a weird animal because it’s playing the role equivalent to text for JSON. The principles above apply to the DTD you apply to your XML schema.

          1. 1

            Where does YAML do implicit type conversions?

            1. 6

              The Norway problem is a good example of this.

              1. 2

                There is no implicit type conversion going on on the YAML side. no is a boolean in YAML, just like false is a boolean in JSON. If a YAML parser converts it to a string, that’s the parser’s problem.

                1. 3

                  Ha. I can tell you’ve never written a parser before!

                  1. 2

                    No, @xigoi is right, strictly speaking. The parser is where this conversion is going on. Only if it cannot read it as anything else, it reads unquoted literals as if they were quoted strings. Of course, to a user that is neither here nor there: the rules need to be memorized to be able to use unquoted literals correctly.
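To make the disagreement above concrete, here is a minimal Python sketch of how a YAML 1.1 style loader might resolve an unquoted scalar. The helper name `resolve_plain_scalar` is hypothetical, the boolean spellings are the ones listed in the YAML 1.1 type registry, and the integer pattern is deliberately simplified; real parsers differ in which spellings they honor.

```python
import re

# Boolean spellings from the YAML 1.1 type registry. Real parsers may
# honor only a subset of these.
YAML11_BOOL = re.compile(
    r"y|Y|yes|Yes|YES|n|N|no|No|NO|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF"
)
YAML11_TRUE = {"y", "yes", "true", "on"}

# Simplified integer pattern (assumption: no underscores, octal, or hex).
SIMPLE_INT = re.compile(r"[-+]?(?:0|[1-9][0-9]*)")


def resolve_plain_scalar(text):
    """Hypothetical helper: give an unquoted scalar a type, falling back to str."""
    if YAML11_BOOL.fullmatch(text):
        return text.lower() in YAML11_TRUE
    if SIMPLE_INT.fullmatch(text):
        return int(text)
    return text  # nothing else matched, so it is read as a string


countries = ["DK", "SE", "NO", "FI"]
resolved = [resolve_plain_scalar(code) for code in countries]
print(resolved)  # ['DK', 'SE', False, 'FI']
```

In a real loader a quoted value never reaches this resolution step, which is why the country code for Norway only misbehaves when it is left unquoted.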

                    1. 6

                      the rules need to be memorized to be able to use unquoted literals correctly

                      You’ll have a better time if you just use quotes by default… I don’t understand the appeal of unquoted literals in YAML

                      This, for me, is the root of it. YAML is fine as long as you are explicit. Now what it takes to be explicit is going to be driven by what types you intend to use. It seems, to me, that the majority of YAML use cases intend to use only a handful of scalar types and a handful of collection types. That small set of types, not coincidentally, is basically the same as what you get in JSON, and properly formed JSON is always valid YAML. So I would assert that if you use YAML and explicitly quote string values, you are effectively getting a slightly looser JSON parser which happens to allow you to write a flavor of JSON which is much easier for human concerns; i.e. less picky about trailing commas, supports comments, and is easier on the eyes with some of its constructs.

                      Of course, we’ve got a whole shitload of options these days, so I wouldn’t be surprised if some other markup/serialization format is better in any given specific domain. Different tools for different jobs…

                      One thing I will absolutely agree with is that YAML is awful when used as a basis for pseudo-DSLs, as you see in things like Ansible and a lot of CI/CD systems.

                      1. 2

                        I think we basically agree, but in my opinion one should accept that people are lazy (or forgetful) and use shortcuts, or even copy/paste bad examples. This is like saying sloppiness in PHP or JS is not a problem because one can always use ===.

                        Most people don’t have the discipline to be explicit all the time (some don’t have the discipline to ever be explicit), therefore it’s probably safer to avoid tools with overly prominent inbuilt footguns entirely.

        2. 3

          TBH it seems that way because it almost feels pointless to reiterate the absurdity of YAML.

        3. 7

          Rubbish, the list of semantic surprises in YAML is long and established. The problems with XML boil down to “my fingies are sore fwom all the typing” and fashion.

          1. 21

            One of the most talented developers I know can only work for 2-3 hours a day on a good day because of RSI. I don’t think your patronising take carries the weight you think it does.

            1. 3

              That some people have physical difficulties does not at all impact the validity of the greater population’s supposed concerns about verbosity.

              1. 3

                Let’s also make websites inaccessible because most people don’t need screen readers, shall we?

                1. 1

                  You’re making my point. We have accessibility standards and specialised tools. We don’t demand web pages don’t have videos.

          2. 10

            There are other issues with XML. Handling entities is complex as are the rules for name spacing. Writing an XML parser is complex so most people use libxml2, which is a massive library that doesn’t have a great security track record. For most YAML use cases (where the input data is trusted) this doesn’t matter too much. Parsing YAML is also incredibly hard so everyone uses the same YAML parser library.

            1. 1

              Problems in a specific parser can’t be called problems in the format itself. For what it’s worth YAML’s popular parsers have also had horrible security problems in the past.

              If you have a minute to go into detail, I’m interested in what I’ve missed that makes namespaces complicated, I found them pleasing when used correctly, and frankly used so infrequently that it hardly ever came up, outside of specific formats that used xml as a container, for example MXML. But this knowledge is old now in my case, so I probably just missed the use case that you’re referring to.

              The entity expansions should never have been a thing, that much I’m sure we can all agree on. DTDs were a mistake, but XSD cleaned most of that up; but unless you were building general XML tooling you could in most cases ignore schemas and custom entities completely.

              What’s good about XML (aside from how much support and tooling it once had) is IMO:

              • The consistency with which the tree structure is defined. I don’t know why “modern” markups are all obsessed with the idea that the end of a node should be implied by what’s around it, rather than clearly marked, but I can’t stand it.
              • A clear separation of attributes and children.
              • Consistency in results, in that there are no “clever” re-interpretations of text.
              1. 2

                Consider this made up XML:

                <?xml version="1.0" encoding="UTF-8"?>
                <something>
                  <thing xmlns="mynamespace">
                    <item>An item.</item>
                  </thing>
                </something>

                Now, let’s query element item using XPath:

                //something/*[namespace-uri()='mynamespace' and local-name()='thing']/*[namespace-uri()='mynamespace' and local-name()='item']


                And now imagine querying some element from a deeply nested XML that might contain more than one custom namespace.

                In my opinion XML namespaces just make it harder to work with the documents.

                1. 1

                  Dear XPath, please adopt Clark notation so we can do /something/{mynamespace}thing/item
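For comparison, Python's standard-library `xml.etree.ElementTree` already accepts Clark notation in its limited XPath subset. The sketch below assumes the made-up document is wrapped in an un-namespaced `something` root, as the XPath query implies; the `m` prefix is an arbitrary name chosen for the example.

```python
import xml.etree.ElementTree as ET

# Assumed shape of the made-up document: an un-namespaced root whose
# child switches the default namespace to "mynamespace".
DOC = """
<something>
  <thing xmlns="mynamespace">
    <item>An item.</item>
  </thing>
</something>
"""

root = ET.fromstring(DOC.strip())

# Clark notation: the namespace URI is written inline in braces.
clark_item = root.find("{mynamespace}thing/{mynamespace}item")

# Equivalent lookup through a prefix map ("m" is an arbitrary prefix).
prefixed_item = root.find("m:thing/m:item", {"m": "mynamespace"})

# An unqualified path finds nothing, because the elements are namespaced.
unqualified_item = root.find("thing/item")

print(clark_item.text)           # An item.
print(unqualified_item is None)  # True
```

Either spelling is shorter than the `namespace-uri()`/`local-name()` predicates, though each step of the path still has to be qualified.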

                2. 1

                  Yeah that’s rough as guts 🤣 I’ve never seen somebody override the current namespace in the middle of the document, I never even considered that as something you could do. Nobody should have done this, ever.

                  1. 2

                    As a real-world use case, <svg> elements within HTML documents often set the namespace.

                  2. 1

                    Probably not specifically in this way but I am sure you’ve worked with documents which use different namespaces with nested elements.

              2. 2

                It’s almost 20 years since I did anything serious with XML, but I seem to remember the namespace things let you define alternative names for tags to avoid conflicts, so you had to parse tags as their qualified name, their unqualified name in the current namespace (or the parents?) or their aliased name.

                A lot of the security issues of libxml2 were due to the inherent complexity of the format. There are a lot of JSON parsers because the format is simple. You can write a JSON parser in a couple of hundred lines of code if you have a decent string library. A compliant XML parser is at least one, probably two, orders of magnitude more complex. That significantly increases the probability that it will have bugs.
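As a rough illustration of the size claim, here is a sketch of a recursive-descent JSON parser in Python that fits in under a hundred lines. It is not a compliant implementation: it does not reject raw control characters inside strings and does not combine `\u` surrogate pairs. All function names are made up for the example.

```python
import re

_WS = re.compile(r"[ \t\n\r]*")
_NUMBER = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][-+]?[0-9]+)?")
_HEX4 = re.compile(r"[0-9a-fA-F]{4}")
_ESCAPES = {'"': '"', "\\": "\\", "/": "/", "b": "\b",
            "f": "\f", "n": "\n", "r": "\r", "t": "\t"}

def parse(text):
    """Parse a complete JSON document and return the Python value."""
    value, pos = _value(text, _skip(text, 0))
    pos = _skip(text, pos)
    if pos != len(text):
        raise ValueError(f"trailing data at offset {pos}")
    return value

def _skip(text, pos):
    return _WS.match(text, pos).end()  # advance past whitespace

def _value(text, pos):
    ch = text[pos:pos + 1]
    if ch == "{":
        return _object(text, pos + 1)
    if ch == "[":
        return _array(text, pos + 1)
    if ch == '"':
        return _string(text, pos + 1)
    for literal, value in (("true", True), ("false", False), ("null", None)):
        if text.startswith(literal, pos):
            return value, pos + len(literal)
    match = _NUMBER.match(text, pos)
    if match is None:
        raise ValueError(f"unexpected input at offset {pos}")
    digits = match.group()
    number = float(digits) if any(c in digits for c in ".eE") else int(digits)
    return number, match.end()

def _string(text, pos):
    # `pos` points just past the opening quote.
    out = []
    while pos < len(text):
        ch = text[pos]
        if ch == '"':
            return "".join(out), pos + 1
        if ch != "\\":
            out.append(ch)
            pos += 1
            continue
        esc = text[pos + 1:pos + 2]
        if esc == "u" and _HEX4.fullmatch(text[pos + 2:pos + 6]):
            out.append(chr(int(text[pos + 2:pos + 6], 16)))
            pos += 6
        elif esc in _ESCAPES:
            out.append(_ESCAPES[esc])
            pos += 2
        else:
            raise ValueError(f"bad escape at offset {pos}")
    raise ValueError("unterminated string")

def _array(text, pos):
    items = []
    pos = _skip(text, pos)
    if text[pos:pos + 1] == "]":
        return items, pos + 1
    while True:
        item, pos = _value(text, pos)
        items.append(item)
        pos = _skip(text, pos)
        sep = text[pos:pos + 1]
        if sep == "]":
            return items, pos + 1
        if sep != ",":
            raise ValueError(f"expected ',' or ']' at offset {pos}")
        pos = _skip(text, pos + 1)

def _object(text, pos):
    obj = {}
    pos = _skip(text, pos)
    if text[pos:pos + 1] == "}":
        return obj, pos + 1
    while True:
        if text[pos:pos + 1] != '"':
            raise ValueError(f"expected string key at offset {pos}")
        key, pos = _string(text, pos + 1)
        pos = _skip(text, pos)
        if text[pos:pos + 1] != ":":
            raise ValueError(f"expected ':' at offset {pos}")
        obj[key], pos = _value(text, _skip(text, pos + 1))
        pos = _skip(text, pos)
        sep = text[pos:pos + 1]
        if sep == "}":
            return obj, pos + 1
        if sep != ",":
            raise ValueError(f"expected ',' or '}}' at offset {pos}")
        pos = _skip(text, pos + 1)
```

The whole grammar is six value kinds and one escape table. There is nothing here comparable to entities, DTDs, or namespace scoping, which is where much of an XML parser's extra code goes.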

                I’m also not sure I agree on the ‘clear separation of attributes and children’ thing. XML formats that I’ve worked with have never managed to be completely consistent here. Attributes are unstructured key-value pairs, children are trees, but there are a lot of cases where it’s unclear whether you should put something in an attribute or a child. Things using XML for text markup have to follow the rule that cdata is text that is marked up by surrounding text, but things using XML as a structured data transport often end up accidentally leaking implementation details of their first implementation’s data structures into this decision.

          3. 10

            If you’re creating XML by hand, you’re doing it wrong.

      2. 21

        I have zero real world issues with YAML, honestly. I’ll take YAML over XML every day of the week for config files I have to edit manually. Do I prefer a subset like StrictYAML? Yep. Do I still prefer YAML over anything else? Also yep.

        1. 11

          The problem with YAML is that you believe you have no real world issues until you find out you do.

          1. 5

            This sounds like all the folks who have “no real issues” with MySQL, or PHP (or dare I say it, JavaScript). Somehow the issues with YAML seem more commonly accepted as “objectively” bad, whereas the others tend to get defended more fiercely. I wonder why!

          2. 1

            What is an example of a problem with YAML that syntax highlighting won’t immediately warn you about?

            1. 11

              At a previous employer we had a crazy bug that took a while to track down, and when we did it turned out the root cause was YAML parsing something as a float over a string yet the syntax highlighting parsed it as a string.

              I wasn’t the developer on the case, so I don’t remember the exact specifics, but it boiled down to something like this: we used an object ID which was a hex string, and the ID in question was something along the lines of:

              oid: 123E456

              The YAML spec allows scientific notation, so this was read as a float. Of course, this could be chalked up to a bug in the syntax highlighting or a failure on our part to use quotation marks, but the results were the same: a difficult-to-track-down bug downstream.
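A small Python sketch of why an ID like this is ambiguous. The float pattern is the one given for the YAML 1.2 core schema; which YAML version and parser were involved in the story above is not known, and other versions use different patterns. The helper name `looks_like_float` is made up for the example.

```python
import re

# Float pattern of the YAML 1.2 core schema. Which schema the parser in
# the anecdote used is an assumption; other YAML versions differ.
CORE_FLOAT = re.compile(r"[-+]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)([eE][-+]?[0-9]+)?")


def looks_like_float(text):
    """Hypothetical helper: would this unquoted scalar resolve as a number?"""
    return CORE_FLOAT.fullmatch(text) is not None


oid = "123E456"

as_hex_id = int(oid, 16)   # what the authors meant: a hex string
as_float = float(oid)      # what scientific notation makes of it

print(looks_like_float(oid))        # True
print(looks_like_float("123F456"))  # False
print(as_float)                     # inf
```

Only IDs whose sole letter happens to be `e` or `E`, with digits on both sides, are affected, which fits a bug that stays hidden until one unlucky value shows up.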

        2. 6

          real-world problems i’ve had with yaml:

          • multiline strings (used the wrong kind)
          • the norway problem
          • no block end delimiter (chop a file in half arbitrarily and it still parses without error)
    2. 13

      I feel like this is something nushell is also trying to solve. I’ve not daily driven nushell, only experimented with it, and I’ve not touched powershell in some years, so I can’t give a definitive answer on it. Both feel like better UX and repeatability from a programmatic standpoint.

      ls | where type == "dir" | table
      1. 12

        ironically, i think that nushell on *nix systems is a harder sell than powershell on windows because of compat, despite shells being a much larger part of the way people typically work on *nix systems.

        i tried using nushell as my daily driver, and pretty frequently ran into scripts that assumed $SHELL was going to be at least vaguely bash-compatible. this has been true on most *nix boxes for the past 20+ years and has led to things quietly working that really shouldn’t have been written.

        OTOH cmd.exe is a much smaller part of the windows ecosystem and (at least, it seems to me) there is much less reliance on the “ambient default shell behaves like X”, so switching to the new thing mostly just requires learning.

        (i ultimately dropped nushell for other reasons, but this was also over 30 releases ago so it’s changed a bit since then)

        1. 16

          You’ve brought up several valid points, but I want to split this up a bit.

          The reason using something other than bash (or something extremely close, like zsh) is painful is due to the number of tools that want to source variables into the environment. This is actually a completely tractable problem, and my default *nix shell (fish) has good tools for working with it. It’s certainly higher friction, but it’s not a big deal; there are tools that will run a bash script and then dump the environment out in fish syntax at the end so you can source it, and that works fine 95% of the time. The remaining 5% of the time almost always has a small fish script to handle the specific use-case. (E.g., for ssh-agent, there’s Hasn’t been updated since 2020, but that’s because it works and is stable; there’s nothing else I could imagine really doing here.) And you could always set whatever shell for interactive-only if you really want it (so that e.g. bash would remain the default $SHELL).
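The "run it under bash, then dump the environment in fish syntax" trick can be sketched in Python roughly as follows. This is not any particular existing tool: the function names are hypothetical, it only sees exported variables, and it treats list-like variables such as `PATH` as plain strings.

```python
import json
import subprocess
import sys


def env_after_sourcing(script_path):
    """Source `script_path` in bash and return the resulting environment.

    Assumes `bash` is on PATH. Only exported variables are visible.
    """
    dump = "import json, os; print(json.dumps(dict(os.environ)))"
    shell_cmd = 'source "$1" >/dev/null && exec "$2" -c "$3"'
    result = subprocess.run(
        ["bash", "-c", shell_cmd, "_", script_path, sys.executable, dump],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def fish_quote(value):
    # Inside fish single quotes, only backslash and the quote need escaping.
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def to_fish(before, after):
    """Return fish commands that turn environment `before` into `after`."""
    lines = []
    for name in sorted(after):
        if before.get(name) != after[name]:
            lines.append(f"set -gx {name} {fish_quote(after[name])}")
    for name in sorted(before.keys() - after.keys()):
        lines.append(f"set -e {name}")
    return lines


example = to_fish({"A": "1", "B": "2"}, {"A": "1", "C": "it's"})
print("\n".join(example))
```

In use, one would call `to_fish(dict(os.environ), env_after_sourcing("some-script.sh"))` (the script name is a placeholder) and source the printed lines from fish.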

          PowerShell on Windows actually has to do this, too, for what it’s worth. For example, the way you use the Windows SDK is to source a batch file called vcvarsall.bat (or several very similar variants). If you’re in PowerShell, you have to do the same hoisting trick I outlined above–but again, there are known, good ways to do this in PowerShell, to the point that it’s effectively a non-problem. And PowerShell, like fish, can do this trick on *nix, too.

          Where I see Nushell fall down at the moment is three places. First, it’s just really damn slow. For example, sometimes, when I’m updating packages, I see something fly by in brew/scoop/zypper/whatever that I don’t know what it is. 'foo','bar','baz','quux'|%{scoop home $_} runs all but instantly in PowerShell. [foo bar baz] | each { scoop home $it } in Nushell can only iterate on about one item a second. But on top of that, Nushell has no job control, so if I want to start Firefox from the command line, I have to open a new tab/tmux window/what-have-you so I don’t lock my window. And third, it’s still churning enough that my scripts regularly break. And there are dozens of things like this.

          I really want to like Nushell, and I’m keeping a really close eye on it, but, at the moment, running PowerShell as my daily shell on *nix is entirely doable (even if I don’t normally do it). Nushell…not so much.

        2. 7

          You’re absolutely right. It’s a VERY hard sell.

          There are all the software problems, some of which you’ve detailed, and then there’s IMO the even bigger problem - the human problem :)

          UNIX users don’t just use, love, and build with the “everything is a stream of bytes” philosophy, it almost becomes baked into their DNA.

          Have you ever tried to have a discussion about something like object pipelines or even worse yet, something like what AREXX or Apple’s Open Scripting Architecture used to offer to a hardcore UNIX denizen?

          99 times out of 100 it REALLY doesn’t go well. There’s no malice involved, but the person on the other end can’t seem to conceptualize the idea that there are other modes with which applications, operating systems and desktops can interact.

          As someone whose imagination was kindled very early on with this stuff, I’ve attempted this conversation more times than I care to count and have pretty much given up unless I know that the potential conversation partner has at least had some exposure to other ways of thinking about this.

          I’d say it’s kind of sad, but I suspect it’s just the nature of the human condition.

          1. 5

            I believe that, in the end, it all boils down to the fact that plain text streams are human readable and universal. You can opt-in to interpreting them as some other kind of a data structure using a specialized tool for that particular format, but you can’t really do it the other way around unless the transmission integrity is perfect and all formats are perfectly backward and forward compatible.

        3. 2

          I would argue that scripts that don’t include a shebang at the top of them are more wrong than the shell that doesn’t really know any better what to do with them.

          I don’t want to pollute this thread with my love for nushell, but I have high expectations for it, and I’ve previously maintained thousands-of-lines scripts that passed shell-check and were properly string safe. Something I think many people just avoid thinking about. (example, how do you build up an array of args to pass to a command, and then properly string quote them, without hitting the case where you have an empty array and output an empty string that inevitably evokes an error in whatever command you’re calling – in nushell this doesn’t matter, you just call let opts = ["foo", "bar xyz"]; echo $opts and the right thing happens)

          I’ll just leave a small example, so as to not go overboard here. I ought to compile my thoughts more fully, with more examples. But even something like this: would not exactly be super fun to implement in bash.

          1. 2

            I would argue that scripts that don’t include a shebang at the top of them are more wrong than the shell that doesn’t really know any better what to do with them.

            Oh, the scripts absolutely are the problem. Unfortunately, that doesn’t mean that they don’t exist. Just another annoying papercut when trying out something new

          2. 2

            My experience with FreeBSD defaulting to csh is that scripts aren’t really the problem. Sure, some of them hard-code bash in the wrong location, but most of them are easy to fix. The real problem is one-liners. A lot of things have ‘just run this command in your shell’ and all of those assume at least a POSIX shell and a lot assume a bash-compatible shell. FreeBSD’s /bin/sh has grown a few bash features in recent years to help with this.

            1. 1

              FreeBSD’s /bin/sh has grown a few bash features in recent years to help with this.

              Oh that’s interesting, didn’t know that but makes sense. I’ve noticed this with busybox ash too – it’s been growing bash features, and now has more than dash, which is a sibling in terms of code lineage.

              A related issue is that C’s system() is defined to run /bin/sh. There is no shebang.

              If /bin/sh happens to be /bin/bash, then people will start using bash features unconsciously …

              Also, system() from C “leaks” into PHP, Python, and pretty much every other language. So now you have more bash leakage …

          3. 1

            example, how do you build up an array of args to pass to a command, and then properly string quote them, without hitting the case where you have an empty array

            FWIW my blog post is linked in hwayne’s post, and tells you how to do that !!! I didn’t know this before starting to write a bash-compatible shell :)

            Thirteen Incorrect Ways and Two Awkward Ways to Use Arrays

            You don’t need any quoting, you just have to use an array.

            a=( 'with spaces'   'funny $ chars\' )
            ls -- "${a[@]}"   # every char is preserved;  empty array respected

            The -- protects against filenames that look like flags.

            As mentioned at the end of the post, in YSH/Oil it’s just

            ls -- @a

            Much easier to remember :)

            This has worked for years, but we’re still translating it to C++ and making it fast.
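The same idea, keeping arguments as a list and never flattening them into one string, carries over to languages that spawn processes directly. Below is a minimal Python sketch. The helper name `echo_args` is made up, and the child process is simply the Python interpreter echoing its own arguments back as JSON.

```python
import json
import subprocess
import sys


def echo_args(args):
    """Spawn a child process with `args` and return the argv it received."""
    child_code = "import json, sys; print(json.dumps(sys.argv[1:]))"
    # No shell is involved: each list element becomes exactly one argument,
    # so spaces, `$`, and backslashes need no quoting at all.
    cmd = [sys.executable, "-c", child_code, *args]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


tricky = ["with spaces", "funny $ chars\\"]
print(echo_args(tricky) == tricky)  # True
print(echo_args([]))                # []
```

An empty list simply contributes no arguments, which is the case that string-based quoting tends to get wrong.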

            1. 1

              Ah, yes, you certainly know what I (erroneously) was referencing. For others, since I mis-explained it: indeed, most times starting with an empty array and building up is both reasonable and more appropriate for what’s being composed/expressed.

      2. 1

        I’m very tempted. I tried it out way back when it first came out and checked back in on them recently, and it’s amazing how far the Nushell project has come.