1. 37
  1.  

  2. 22

    I have this great book, the book of mathematical counterexamples. It’s filled with things like “functions that are continuous without being integrable” and the like. Very helpful to re-confirm that, indeed, some implication does not go the other way.

    But it’s also kind of interesting because you start thinking about the edges of the statement. For example, JSON is not a YAML subset, but is the output of a program that prepends %YAML 1.2 and --- to a JSON document a YAML subset? What qualities are missing to make it a subset?
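    A minimal sketch of such a "prepending program", assuming the intent is to put a %YAML 1.2 directive and a --- document-start marker in front of the JSON text. The name prepend_yaml_12 is hypothetical, and whether a given YAML parser actually honors the directive is a separate question.

```python
import json

# Directive line followed by the document-start marker.
YAML_12_HEADER = "%YAML 1.2\n---\n"


def prepend_yaml_12(json_text: str) -> str:
    """Return json_text prefixed with a YAML 1.2 directive and document marker."""
    # Validate first so that non-JSON input is never labeled as JSON-in-YAML.
    # json.loads raises a ValueError subclass on malformed input.
    json.loads(json_text)
    return YAML_12_HEADER + json_text


if __name__ == "__main__":
    print(prepend_yaml_12('{"a": 1e2}'))
```

    The function only rewrites the text; it does not make any claim about how the result is parsed.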

    One might consider this to not matter on the ground, but this is applicable in loads of places! Maybe some database operations are susceptible to race conditions, but you can carve out specific ways of doing things to make them no longer susceptible. Skylark is “Python but without infinite loops”, meaning that you can write “python” that actually is guaranteed to terminate!

    We could definitely use the book of computer science counterexamples.

    1. 7

      Tangential aside on a good comment: continuity does imply that a function is integrable! It’s the converse that doesn’t hold: not every integrable function is continuous, and maybe more interestingly, not every integrable function is the derivative of some other function.

    2. 15

      I’m not gonna pretend to be a yaml lover. I hate yaml. I have to use it often enough that I have the spec bookmarked for resolving problems, since SO is dogshit.

      This article implies that a Yaml 1.2 document must have a version header. This is not true, per the spec: https://yaml.org/spec/1.2.2/#681-yaml-directives

      A version 1.2 YAML processor must accept documents with an explicit “%YAML 1.2” directive, as well as documents lacking a “YAML” directive. Such documents are assumed to conform to the 1.2 version specification.

      If we stroll slightly back in time to the 1.2.1 spec, there is a useful section called “Relationship to JSON” that describes YAML 1.2.1’s remaining JSON incompatibility: https://yaml.org/spec/1.2.1/#id2759572

      The current spec version, 1.2.2 (from last year!) removes that section, but still has that incompatibility with json: https://yaml.org/spec/1.2.2/#mapping

      While this difference (YAML requires mapping keys to be unique) is notable, in all other respects YAML 1.2 is a superset of JSON. Functionally, given that a lot of languages deserialize JSON objects into that language’s map implementation, trying to use a JSON object with non-unique keys may be problematic anyway. To be fair, RFC 4627 says that keys in a JSON object SHOULD be unique.
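      To illustrate the duplicate-key point, here is a small sketch using Python’s standard json module as one example of a "map impl". The hook name reject_duplicate_keys is made up for this example; object_pairs_hook is the standard parameter of json.loads.

```python
import json


def reject_duplicate_keys(pairs):
    """object_pairs_hook that raises if a JSON object repeats a key."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen


doc = '{"a": 1, "a": 2}'

# Default behavior: the document is accepted and the last value wins.
lenient = json.loads(doc)

# Strict behavior: treat a repeated key as an error, which is the
# position YAML takes for mapping keys.
try:
    json.loads(doc, object_pairs_hook=reject_duplicate_keys)
    strict_error = None
except ValueError as exc:
    strict_error = str(exc)

if __name__ == "__main__":
    print(lenient)
    print(strict_error)
```

      With the default settings the first value is silently dropped, which is the kind of surprise the SHOULD in the RFC is steering people away from.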

      Ultimately, I think it’s fair to say YAML 1.2 is a good-enough superset of JSON, given that the only notable incompatibility is something the JSON RFC advises against.

      1. 3

        The question is not whether YAML 1.2 is a superset of JSON, but whether YAML as a whole is.

        The YAML 1.2 spec allows a document without a version header, but in practice they’re mandatory because parsers default to YAML 1.1 for compatibility. Consider a document like this:

        ---
        key_prefixes:
          - 1d2
          - 1e2
          - 1f2
        

        How should that be parsed? YAML 1.1 provides one answer, YAML 1.2 provides a different answer. If a YAML parser wants to avoid a breaking change, it needs to default to the 1.1 behavior in the absence of a version directive.

        This is the reason I chose exponential notation for the example:

        • There were other JSON incompatibilities in YAML 1.1 that caused parse failures, and those were fixed in 1.2 – since the document couldn’t parse as 1.1, the 1.2 parse is the only valid parse.
        • Changing the parse of no from false to "no" is a backwards-incompatible change for YAML, but it doesn’t affect JSON, since no isn’t a valid JSON token.
        • {"a": 1e2} is a document that is simultaneously valid JSON, valid YAML 1.1, and valid YAML 1.2 – but gets parsed differently depending on the YAML version.
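        A rough sketch of why 1e2 changes meaning between versions. The two patterns are my transcription of the base-10 float rule from the YAML 1.1 type repository and the float rule from the YAML 1.2 core schema, so treat them as an approximation rather than a substitute for the specs. The helper is_float is hypothetical.

```python
import re

# Assumption: transcribed from the YAML 1.1 float type, base-10 form only;
# the base-60, infinity and not-a-number forms are omitted.
# Note the mandatory "." and the mandatory sign in the exponent.
YAML_11_FLOAT = re.compile(r"[-+]?([0-9][0-9_]*)?\.[0-9.]*([eE][-+][0-9]+)?")

# Assumption: transcribed from the YAML 1.2 core schema float rule, again
# without infinity and not-a-number. This pattern also matches plain
# integers; the spec checks the int rule first, which this sketch skips.
YAML_12_FLOAT = re.compile(r"[-+]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)([eE][-+]?[0-9]+)?")


def is_float(token: str, version: str) -> bool:
    """Report whether an unquoted token matches the float rule for a version."""
    pattern = YAML_11_FLOAT if version == "1.1" else YAML_12_FLOAT
    return pattern.fullmatch(token) is not None


if __name__ == "__main__":
    for token in ("1d2", "1e2", "1f2", "1.0e+2"):
        print(token, is_float(token, "1.1"), is_float(token, "1.2"))
```

        Under these patterns 1d2 and 1f2 are strings in both versions, 1.0e+2 is a float in both, and only 1e2 flips from string to float.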

        IMO the best (only?) way to have prevented this would have been to make YAML 1.1 stricter about which unquoted tokens can be strings. If the spec had said “any token starting with [0-9] must parse as a number” then there would’ve been more room to maneuver in YAML 1.2.

        1. 3

          If the question is if yaml as a whole is a superset of json, then it’s not and nowhere does it claim to be. The only claims of being a superset of json are 1.2 (again, using specs as the SoT).

          imo there are much bigger problems with yaml and its version handling here. A document has very little control over how it gets parsed, even with yaml directives. They’re not the solution to what you’re describing.

          Adding a yaml directive to docs that require 1.2 handling isn’t a fix, since a valid yaml 1.1 parser will parse it anyway.

          Adding a yaml directive to a doc that requires 1.1 handling won’t trigger any kind of backwards-compat mode, at least if it’s spec conforming.

          A yaml directive for minor versions is essentially meaningless in a compliant parser (yes, I view warnings as easily-ignored and meaningless).

          imo 1.2 should’ve been a major version bump. I think that yaml directives should trigger either backwards-compatible parsing or an error if that version isn’t supported. tbh I’d prefer if version specifiers were mandatory.

          1. 1

            If the question is if yaml as a whole is a superset of json, then it’s not and nowhere does it claim to be. The only claims of being a superset of json are 1.2 (again, using specs as the SoT).

            This post isn’t about the YAML spec, as such. If I may direct your attention to the very first sentence of my post, and the screenshots immediately following it, you’ll understand who/what the post is in response to.

            1. 2

              Your post isn’t about specs until it is about specs.

              Bringing in yaml 1.2 and commenting that it doesn’t save you because the version tag isn’t present in JSON isn’t quite correct. The situation is more complicated than that.

              1. 1

                OK, sorry, I’ll be more explicit.

                When someone says “JSON is a subset of YAML, you can parse it with a YAML parser”, I show them that this does not in fact work by parsing the JSON document {"a": 1e2}. Pretty much every popular YAML parser prints that out as {"a": "1e2"}. The person is surprised / distressed / sad / etc.
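                The demonstration can be sketched like this, assuming Python with the standard json module and, optionally, the third-party PyYAML package as the YAML parser. The helper parse_both is hypothetical; if PyYAML is not installed it returns None for the YAML side.

```python
import json

DOC = '{"a": 1e2}'


def parse_both(text: str):
    """Parse text as JSON and, if PyYAML is installed, also as YAML."""
    as_json = json.loads(text)
    try:
        import yaml  # third-party PyYAML; optional for this sketch
    except ImportError:
        return as_json, None
    return as_json, yaml.safe_load(text)


if __name__ == "__main__":
    as_json, as_yaml = parse_both(DOC)
    print("json:", as_json)
    print("yaml:", as_yaml if as_yaml is not None else "PyYAML not installed")
```

                If the two results differ for a document that is valid JSON, that document is a counterexample to the subset claim for that parser.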

                Sometimes they ask a follow-up question about YAML 1.2, because the 1.2 spec does fix that issue. Usually of the form “when Psych/PyYAML/etc gets support for YAML 1.2 won’t this all be fixed?”.

                Then I demonstrate that YAML 1.1 and YAML 1.2 produce different parses for a valid YAML document, so they will have to version their YAML documents. This means they can’t parse JSON using YAML 1.2 rules unless they were willing to go through every one of their existing places that might have YAML and convert it from 1.1 to 1.2 syntax. Sometimes this is impossible, for example when YAML files in Git history must be processed.

                When you say that YAML 1.2 is a superset of JSON, that may or may not be true, but it doesn’t matter here. This post is for people who believe that YAML in general – all versions of it, any conforming parser – is a superset of JSON. Which is not true, as demonstrated.

                1. 1

                  Then I demonstrate that YAML 1.1 and YAML 1.2 produce different parses for a valid YAML document, so they will have to version their YAML documents.

                  This statement is incorrect and what I’ve been trying to point out. You do not need to version your yaml documents, since any compliant yaml 1.2 parser will parse it with 1.2 rules.

                  1. 1

                    Let’s say I have a Git repository with five years of history and … order of magnitude, a million YAML files.

                    Those YAML files are written for a YAML 1.1 parser. Their content might look like this:

                    ---
                    options:
                      preflight: yes
                      legacy_keys: no
                    key_prefixes:
                      - 1d2
                      - 1e2
                      - 1f2
                    

                    which gets parsed the same as this:

                    {
                      "options": {
                        "preflight": true,
                        "legacy_keys": false
                      },
                      "key_prefixes": ["1d2", "1e2", "1f2"]
                    }
                    

                    I have a set of tools, written in various languages (Ruby, Python, C, and Go) which parse those files. It is important that new versions of the tools be able to parse existing files.

                    If I were to take one of those tools, with its YAML 1.1 parser, and pass JSON through it, the JSON would be parsed incorrectly.

                    If I were to take one of those tools, and change its YAML parser to one that did YAML 1.2 by default, then JSON would be parsed correctly but every single one of the existing YAML files in my repository will now be parsed wrong.
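                    A small sketch of what "parsed wrong" means for the options block above. The word lists are my transcription of the boolean spellings in the YAML 1.1 bool type and the YAML 1.2 core schema, and resolve_plain_scalar only handles booleans; everything else is left as a string.

```python
# Assumption: spellings transcribed from the YAML 1.1 bool type.
YAML_11_TRUE = {"y", "Y", "yes", "Yes", "YES", "true", "True", "TRUE", "on", "On", "ON"}
YAML_11_FALSE = {"n", "N", "no", "No", "NO", "false", "False", "FALSE", "off", "Off", "OFF"}

# Assumption: spellings transcribed from the YAML 1.2 core schema.
YAML_12_TRUE = {"true", "True", "TRUE"}
YAML_12_FALSE = {"false", "False", "FALSE"}

# The unquoted scalars from the options mapping in the example file.
OPTIONS = {"preflight": "yes", "legacy_keys": "no"}


def resolve_plain_scalar(token: str, version: str):
    """Resolve an unquoted scalar to a bool, or leave it as a string."""
    if version == "1.1":
        true_set, false_set = YAML_11_TRUE, YAML_11_FALSE
    else:
        true_set, false_set = YAML_12_TRUE, YAML_12_FALSE
    if token in true_set:
        return True
    if token in false_set:
        return False
    return token


def resolve_options(version: str) -> dict:
    """Apply the version's boolean rules to every value in OPTIONS."""
    return {key: resolve_plain_scalar(value, version) for key, value in OPTIONS.items()}


if __name__ == "__main__":
    print("1.1:", resolve_options("1.1"))
    print("1.2:", resolve_options("1.2"))
```

                    Under the 1.1 rules the options come out as booleans; under the 1.2 rules the same file yields the strings yes and no, so any tool that tests them for truthiness changes behavior.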

                    Since you say that using %YAML directives is unnecessary, how would you recommend implementing a parser such that:

                    • all of the existing YAML 1.1 files are parsed as-is,
                    • any new YAML 1.2 files are parsed as YAML 1.2, and
                    • JSON files are parsed as JSON via a YAML 1.2 parser?
                    1. 1

                      This is again part of what I was saying above. Assuming fully spec-compliant parsers, you cannot do this.

                      Again, here are the relevant parts of the spec.

                      YAML 1.1 https://yaml.org/spec/1.1/#id895631 :

                      Documents with a “YAML” directive specifying a higher minor version (e.g. “%YAML 1.2”) should be processed with an appropriate warning.

                      YAML 1.2 https://yaml.org/spec/1.2.2/#681-yaml-directives :

                      A version 1.2 YAML processor must also accept documents with an explicit “%YAML 1.1” directive. Note that version 1.2 is mostly a superset of version 1.1, defined for the purpose of ensuring JSON compatibility. Hence a version 1.2 processor should process version 1.1 documents as if they were version 1.2, giving a warning on points of incompatibility (handling of non-ASCII line breaks, as described above).

                      What you want, given spec-compliant parsers, is impossible. Claiming that YAML directive headers are either necessary or a panacea is incorrect, because at best they give you warnings.

                      1. 1

                        I think there’s no way real-world parsers will be able to apply that non-binding optional advice from the YAML 1.2 spec, and that any attempt to do so would be doomed to failure.

                        If a spec says “parsers should break compatibility with the widely-adopted prior version of this spec, for no good reason and without a mitigation”, then that part of the spec will be ignored.

                        Since the spec says “should”, not “must”, it is fine for a conforming parser to apply YAML 1.2 semantics only when the document starts with %YAML 1.2.
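                        A sketch of that policy, assuming a parser front end that picks its rule set before resolving any scalars. The function detect_yaml_version and its default are hypothetical, and the directive handling is simplified compared with the spec grammar.

```python
import re

# Matches a "%YAML <major>.<minor>" directive line, optionally followed
# by a trailing comment.
YAML_DIRECTIVE = re.compile(r"^%YAML\s+(\d+)\.(\d+)\s*(?:#.*)?$")


def detect_yaml_version(text: str, default=(1, 1)):
    """Return (major, minor) from a leading %YAML directive, else default."""
    for line in text.splitlines():
        stripped = line.rstrip()
        if not stripped or stripped.lstrip().startswith("#"):
            continue  # blank lines and comments may precede directives
        match = YAML_DIRECTIVE.match(stripped)
        if match:
            return int(match.group(1)), int(match.group(2))
        if stripped.startswith("%"):
            continue  # some other directive, such as %TAG
        break  # reached '---' or content: no %YAML directive present
    return default


if __name__ == "__main__":
    print(detect_yaml_version("%YAML 1.2\n---\n- 1e2\n"))
    print(detect_yaml_version("---\n- 1e2\n"))
    print(detect_yaml_version('{"a": 1e2}'))
```

                        Note that a bare JSON file carries no directive, so under this policy it still falls through to the 1.1 default unless something prepends the header or the caller overrides default.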

                        1. 1

                          You’re right, I was wrong for asserting that there’s a requirement to do so rather than a strong recommendation.

                          However, I don’t think it makes sense to assert that real world parsers will uniformly ignore parts of the spec because we don’t like it. At least when offering generic advice without referring to specific real world parsers, I think it makes more sense to assume that it’s more likely that implementers will stick to the recommendations.

                          Since we’re talking about real-world parsers, I decided to check a few, and at least the handful that I checked did not exhibit the behavior you describe.

                          Given the following yaml, whose behavior differs based on whether we’re parsing with a yaml 1.1 or a yaml 1.2 parser:

                          - no
                          - 1e2
                          

                          I also have versions of these with the appropriate yaml version specified explicitly.

                          Using yaml-rust (which backs serde_yaml):

                          ~/p/yaml|15:14:55|0$ ./rust-yaml.rs *.yaml
                          working on 1.1-header.yaml
                          first value? no
                          second value? 100
                          
                          working on 1.2-header.yaml
                          first value? no
                          second value? 100
                          
                          working on no-header.yaml
                          first value? no
                          second value? 100
                          

                          Using ruamel.yaml:

                          working on ../1.1-header.yaml
                          /Users/william.orr/proj/yaml/ruamel/.venv/lib/python3.10/site-packages/ruamel/yaml/constructor.py:1223: MantissaNoDotYAML1_1Warning: 
                          In YAML 1.1 floating point values should have a dot ('.') in their mantissa.
                          See the Floating-Point Language-Independent Type for YAML™ Version 1.1 specification
                          ( http://yaml.org/type/float.html ). This dot is not required for JSON nor for YAML 1.2
                          
                          Correct your float: "1e2" on line: 3, column: 2
                          
                          or alternatively include the following in your code:
                          
                            import warnings
                            warnings.simplefilter('ignore', ruamel.yaml.error.MantissaNoDotYAML1_1Warning)
                          
                          
                            warnings.warn(MantissaNoDotYAML1_1Warning(node, value_so))
                          data[0] is False with type <class 'bool'>
                          data[1] is 100.0 with type <class 'ruamel.yaml.scalarfloat.ScalarFloat'>
                          
                          working on ../1.2-header.yaml
                          data[0] is no with type <class 'str'>
                          data[1] is 100.0 with type <class 'ruamel.yaml.scalarfloat.ScalarFloat'>
                          
                          working on ../no-header.yaml
                          data[0] is no with type <class 'str'>
                          data[1] is 100.0 with type <class 'ruamel.yaml.scalarfloat.ScalarFloat'>
                          

                          What’s notable here, is that ruamel deviates from the spec in that it does change behavior to 1.1-compatible behavior if the %YAML 1.1 directive is applied.

                          go-yaml has a section claiming 1.2 compatibility, but has some weird behavior.

                          looking at ../1.1-header.yaml
                          type of first string
                          type of second float64
                          
                          looking at ../1.2-header.yaml
                          could not parse ../1.2-header.yaml: yaml: found incompatible YAML document
                          
                          looking at ../no-header.yaml
                          type of first string
                          type of second float64
                          

                          We have YAML 1.2 semantics in the very limited test I provided, unless we explicitly specify that we want YAML 1.2 (they do imply their 1.2 compatibility is incomplete, so maybe that’s why?).


                          If a spec says “parsers should break compatibility with the widely-adopted prior version of this spec, for no good reason and without a mitigation”, then that part of the spec will be ignored.

                          So this is to say, I don’t agree with that assertion, and just quickly sampling some popular yaml parsers in popular languages, that assertion doesn’t bear out.

      2. 12

        It’s fashionable to hate XML because it was used in a lot of places it was a bad fit in the 00s, but at least it’s a pretty good document language.

        YAML though is always a bad fit. If you want machine readable config, use JSON; human readable, use TOML. When does YAML ever fit?

        https://mobile.twitter.com/carlmjohnson/status/1372224080749993988

        1. 7

          TOML hasn’t existed as long as YAML. YAML also has a lot of nice features, including whitespace sensitivity (if it’s a feature for Python, it’s definitely one here), which removes a lot of syntax noise. One thing I can’t live without when using it as a configuration language is anchors. Being able to reduce duplication with a feature of the language itself is very nice.

          In all, I’d prefer if apps just used a DSL or published a complete json-schema which lets you use whatever config you want.

          1. 14

            TOML hasn’t existed as long as YAML

            Sure, but people keep using YAML for new things.

            YAML also has a lot of nice features, including whitespace-sensitive (if it’s a feature for Python, it’s definitely one here) which removes a lot of syntax noise.

            Ugh, this is the single thing I hate most about YAML. I can defend Python, but YAML’s whitespace rules are picky and hard to internalize.

            One thing I can’t live without when using it as a configuration language is anchors. Being able to reduce duplication with a feature of the language itself is very nice.

            Cue has a good solution to this, but my top line is that if you really need references (as opposed to their being convenient), the thing you’re doing is essentially programming, and you should use a programming language, not a configuration language.

            In all, I’d prefer if apps just used a DSL or published a complete json-schema which lets you use whatever config you want.

            Caddy has an interesting approach where the canonical format is JSON, and then they make a ton of adaptors so you can use a Caddy specific DSL too and plug it into APIs etc.

            1. 14

              if you really need references … the thing you’re doing is essentially programming

              I disagree. Needing references just means the thing you’re describing is a DAG, and why is that programming when a tree is not? (Hopefully YAML doesn’t let you create loops, but even so that’s still just a directed graph.)

              As a counterexample, RDF lets you describe directed graphs but is definitely a description language not a programming language. You can of course describe RDF in a programming language but that’s not the same thing as RDF itself.

              1. 1

                fwiw I’d argue adjacency lists are easier to read, especially if you’re not familiar with the dark corners of YAML

        2. 5

          I’m not convinced by this example with %YAML 1.2 at the top, and 1e2 and “no”. That demonstrates that YAML 1 and YAML 1.2 are different, not that JSON isn’t a subset of YAML.

          To prove the claim you need to show an example of a JSON doc like

          { "foo": "1e2",
            "bar": "no"
          }
          

          which is interpreted incorrectly by YAML parsers. Both in terms of the spec and in practice.

          Since JSON isn’t valid YAML 1 syntax, and there’s no interpretation of JSON syntax under YAML 1, I believe there’s no conflict, even without the declaration.

          FWIW I have lightly tested this by converting some of my YAML to JSON, and it works fine with multiple YAML implementations

          https://github.com/oilshell/oil/blob/master/.travis.yml

          1. 5

            The example JSON doc that is interpreted differently is the second code listing:

            {"a": 1e2}
            

            When parsed as JSON, this should return {"a": 100} – the value is a number.

            The statement “JSON is a subset of YAML” means that every JSON document can be parsed with a YAML parser to obtain the same output as a JSON parser.

            1. 2

              Ahh sorry for missing that! You are right, and I repro’d it here:

              https://github.com/oilshell/oil/commit/e33031c6fc84b5461c5da8864cb4e17b58f407b3

              Hilariously the YAML library I’m using outputs the same wrong data structure even with the %YAML 1.2 prefix … (I wasn’t really using it; that was mainly an experiment that I didn’t follow up on.)

              Thanks for this blog post. I think for some purposes it will be OK to use the prefix, but nonetheless I agree that JSON isn’t a subset of YAML!

          2. 4

            YAML is just a bad format, the sooner people accept it the sooner we can abandon it and move on.

            1. 3

              This might be a silly question but…… why would you WANT to treat JSON as a YAML subset? Is it just a fun exercise, or are there real use cases/needs this solves?

              1. 2

                So that if you make a tool that can be configured by YAML, you can say “if you don’t know YAML, just write JSON”.

              2. 2

                I had a hard time until I realized this, but I solved it, at least partially [1], in nixpkgs by sending a pull request [2].

                [1] https://github.com/NixOS/nixpkgs/pull/133807#issuecomment-921712627 [2] https://github.com/NixOS/nixpkgs/pull/133807