1. 32
  1.  

  2. 11

    Author here. I heard of Cue on lobsters actually but only recently looked into it.

    (I swear someone had mentioned to me that it was much more familiar-looking to DevOps types than Dhall, but I can’t find the comment now.)

    This is my write-up on attempting to use it to validate some YAML.

    1. 1

      Hello Adam! Thanks for sharing, very interesting read. I’m a big fan of your CoRecursive podcast, BTW!

      1. 2

        Hi @chreke! Thanks for listening to the podcast!

    2. 3

      If the CNCF’s landscape is becoming increasingly and predominantly declarative, shouldn’t YAML validation (maybe json too) become a part of its remit to support the landscape across the board?

      1. 8

        I think the real solution is to move away from YAML altogether. It’s not actually human writable. Every time I need to produce some YAML, I have to begin by copy-pasting a block because its rules are so inscrutable. A better solution is something like what Caddy does: use JSON as the source-of-truth language and then write adapters that translate some DSL into JSON. That gives you a better DX on both sides: as a producer, you get an actually human-writable syntax, and as a consumer, you can consume JSON in any language with the standard library, with no weird language quirks or security gotchas.
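A minimal sketch of the adapter idea, assuming a made-up line-based `key value` DSL with dotted keys. This is not Caddy's Caddyfile syntax or its adapter API; the DSL and the `adapt` function are invented purely for illustration.

```python
import json


def adapt(dsl_text: str) -> str:
    """Translate a hypothetical line-based DSL into JSON.

    Each non-empty, non-comment line is `key value`; dotted keys nest.
    All values are kept as strings to avoid implicit typing surprises.
    """
    config: dict = {}
    for raw in dsl_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition(" ")
        *parents, leaf = key.split(".")
        node = config
        for part in parents:
            node = node.setdefault(part, {})  # create nested objects on demand
        node[leaf] = value.strip()
    # Sorted keys and fixed indentation keep the JSON output stable.
    return json.dumps(config, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(adapt("# example\nserver.host example.com\nserver.port 8080\n"))
```

The consumer only ever sees the JSON, so it can be read with any standard library, while the human-facing syntax is free to change.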

        1. 8

          Maybe we should all go back to XML? Hard to get wrong. :-)

          1. 2

            XML has its own problems. CDATA making it hard to parse on one end of the spectrum and a lot of ways to say very similar things. Presence/absence of a node, having something as an attribute or child, everyone doing booleans differently, etc. And then there is standards stuff, like being sometimes UTF-8, sometimes UTF-16, and sometimes implementations not allowing the latter (XMPP, etc.).

            1. 1

              React JSX is proof that XML is a good language for writing a DOM, with some helpers. It’s just a terrible language for data serialization.

            2. 4

              I think the real solution is to move away from YAML altogether. It’s not actually human writable.

              Guess I’m not human.

              (It did take me a couple weeks before I got comfortable with the indentation model.)

              1. 3

                Maybe I just need to sit down and actually read the spec for indenting instead of guessing how it works based on prior examples. But for a configuration language, why should you need to read something to figure it out? Why not have something that’s obvious?

              2. 3

                What bits of YAML do you find inscrutable? I hand-write syntactically correct YAML all the time. The YAML syntax that maps directly to JSON syntax (lists, objects, primitive types) doesn’t seem too inscrutable to me; it only gets weird if I use things like anchors.

                1. 3

                  If you have a nested list of objects and a pipe, how many spaces does the string need? How many spaces are in the result?

                  1. 3

                    You mean like this?

                    a:
                      b:
                        c: |
                          some
                           text
                          here
                        d: 1234
                    

                    Off the cuff, I believe the block needs to be indented at least one space more than the line with the pipe (c: |) and it will strip leading whitespace according to the indentation of the first line. So the JSON string equivalent of c would be "some\n text\nhere\n".

                    I admit I have to glance at the docs for some of the other text block modes, but pipe is the one I use 99% of the time when I use text blocks at all, and I don’t remember its behavior ever surprising me.
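The rule described above can be modelled in a few lines. This is a simplified sketch of the `|` literal style with default ("clip") chomping, not a YAML parser: it assumes the indentation is taken from the first non-empty line and that exactly one trailing newline is kept. The function name `literal_block` is hypothetical.

```python
def literal_block(lines: list[str]) -> str:
    """Model YAML's `|` block scalar on the raw body lines.

    Simplified: strip the indentation of the first non-empty line from
    every line, drop trailing blank lines, keep one final newline.
    """
    first = next(line for line in lines if line.strip())
    indent = len(first) - len(first.lstrip(" "))
    body = [line[indent:] if line.strip() else "" for line in lines]
    while body and body[-1] == "":
        body.pop()  # clip chomping: trailing blank lines are dropped
    return "\n".join(body) + "\n"


# The body of `c: |` from the example, with its original indentation.
block = [
    "      some",
    "       text",
    "      here",
]
print(repr(literal_block(block)))  # 'some\n text\nhere\n'
```

The extra space before `text` survives because only the first line's indentation is removed.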

                2. 2

                  At work we use the AWS CDK (Java in our case) [0] to do something along those lines. This is infinitely better than hand-editing YAML files. Even the Terraform folks have a CDK now [1]. I think this is the right way forward.

                  [0] https://aws.amazon.com/cdk/
                  [1] https://www.terraform.io/cdktf

                  1. 1

                    Working with CDKTF after working with AWS CDK is fairly annoying (you begin to understand how broken types in terraform are after you encounter your third integer typed as a string), but it’s still so much better than having to write HCL files. Having a programming language under you is just better than trying to cobble the config together by hand.

                  2. 2

                    And there’s TOML on one side and HCL/UCL (which translate to JSON) on the other, with the latter working just fine for DevOps people.

                    YAML has good use cases. It’s great for static blogs and also seems to be nice to add some metadata to packages, etc.

                    Judging by how often it comes up, it doesn’t seem to work so well for a lot of configuration scenarios, unless the configuration is super simple and straightforward, in which case space-separated key-value formats (as in most UNIX rc files) and most other languages work just as well.

                    I think YAML was basically the first thing people found when they looked for JSON with comments.

                    1. 1

                      It would be nice to have a minimalist YAML. Something with comments, nested data structures, literal string blocks.

                      But with a simpler subset of primitives (ex: no implied dates, or string/numeric encoded variants).

                  3. 3

                    Really looking forward to the JSON Schema writeup now that you’ve had this experience!

                    Ninja edit: Do you know about Spectral as well?

                    1. 1

                      I haven’t heard of Spectral! Very cool.

                      JSON Schema seems so verbose, at first look.

                      1. 4

                        JSON Schema pays in verbosity for being something you can serialize, store, and send over the network. Those are the advantages of being JSON.

                        BTW I can’t recommend JSON Schema enough.

                    2. 2

                      I’ve got a half-solution in the form of serde parser that parses YAML (and other formats) directly into strongly-typed data structures. This makes it complain at parse time about mismatched data types and missing required fields.
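The same parse-into-types idea, sketched in Python rather than Rust/serde (so this is an analogy, not the serde API). The `Author` type and `parse_author` helper are illustrative assumptions; the input is a plain dict, such as one produced by `json.loads` or a YAML loader.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Author:
    name: str
    bio: str
    avatar: str


def parse_author(raw: dict) -> Author:
    """Build an Author, failing at parse time on bad input."""
    expected = {f.name: f.type for f in fields(Author)}
    missing = expected.keys() - raw.keys()
    extra = raw.keys() - expected.keys()
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
    for name, typ in expected.items():
        if not isinstance(raw[name], typ):
            got = type(raw[name]).__name__
            raise TypeError(f"{name}: expected {typ.__name__}, got {got}")
    return Author(**raw)


if __name__ == "__main__":
    doc = json.loads('{"name": "A", "bio": "B", "avatar": "/a.jpg"}')
    print(parse_author(doc))
```

Missing keys, unknown keys, and mistyped values all surface as exceptions before the rest of the program sees the data.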

                      1. 2

                        Woah, Dhall looks sick. The only problem with these things, though, is that JSON is pretty universally supported, but all of these have generators/validators at various degrees of functionality. And I don’t want to shell out to run some things to dynamically regenerate.

                        1. 2

                          I try to write my configuration in Typescript, which has an excellent type system, and then emit and commit .yml or .json files. We have a CI step that runs all code generators, and then asserts that the repository is clean - no changes - which guarantees that generated (config) files are up-to-date with the generator scripts. This lets our system experts review diffs to the fully-realized native config file, while letting everyone write configuration in the monorepo’s main language.
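A rough sketch of that "generated files must be up to date" check, in Python rather than TypeScript and with file contents passed in as strings so it stays self-contained. The names `render` and `stale_files` are hypothetical; a real CI job would more likely run the generators and then fail on `git diff --exit-code`.

```python
import json


def render(config: dict) -> str:
    """Deterministic output, so regenerating never creates spurious diffs."""
    return json.dumps(config, indent=2, sort_keys=True) + "\n"


def stale_files(generated: dict[str, dict], committed: dict[str, str]) -> list[str]:
    """Return paths whose committed text differs from a fresh render."""
    return sorted(
        path
        for path, config in generated.items()
        if committed.get(path) != render(config)
    )


if __name__ == "__main__":
    generated = {"app.json": {"replicas": 3}}
    committed = {"app.json": '{\n  "replicas": 2\n}\n'}
    print(stale_files(generated, committed))  # a non-empty list should fail CI
```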

                          I think Cue is better tailored to this domain than typescript because it prevents extraneous keys and has these built-in tools for validating data files. But, Typescript is more than good enough, and avoids adding an extraneous language to our stack that we need to teach everyone.

                          The most frustrating issue with Cue, Typescript, or any other “compile to config” strategy is fixing runtime errors that slip through your fancy validator. I saw this with Kubernetes a lot at my last gig. When you generate a 2000-line YAML that you think is valid, but k8s decides it violated some obscure constraint, how do you map that error on line 1676 back to the right spot in your Cue file? In the JS ecosystem there’s a “source map” concept that records the original source position for each output token… haven’t seen any config generator scheme that can do that.
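To make the source-map idea concrete, here is a toy sketch that only handles flat `key: value` output. The `emit` function and the `file:line` origin strings are assumptions made up for illustration; no existing config generator is being described.

```python
def emit(entries):
    """Render flat `key: value` lines and remember where each came from.

    entries: iterable of (key, value, origin), origin being a 'file:line'
    string supplied by the hypothetical generator.
    Returns (output_text, source_map) with 1-based output line numbers.
    """
    lines = []
    source_map = {}
    for key, value, origin in entries:
        lines.append(f"{key}: {value}")
        source_map[len(lines)] = origin  # output line -> original position
    return "\n".join(lines) + "\n", source_map


if __name__ == "__main__":
    text, source_map = emit([
        ("replicas", 3, "deploy.cue:12"),
        ("image", "nginx", "base.cue:4"),
    ])
    # An error reported on output line 2 maps back to base.cue:4.
    print(source_map[2])
```

A real version would need to track nested structures and multi-line values, which is where most of the difficulty lies.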

                          1. 1

                            If you only use Haskell, there is no reason for using a separate IDL like Cue, because you can use an embedded DSL like Autodocodec.

                            At ZuriHac, Tie was presented, also in Haskell. It parses OpenAPI. The advantage of this is that some people prefer spec-oriented development, and if OpenAPI is the spec, you may not want to generate it, since it can be better if written manually.

                            The IDL approach seems nice if you have different languages for the client and server. But you’re probably gonna need to add/remove fields too. Rust’s Typical library handles this using the ‘asymmetric’ keyword. How does Cue deal with this?

                            I am missing a survey on this, because there seem to be so many different takes and dimensions to explore. For example: whether to support only JSON, or whether to embed a version number in the data.

                            1. 1

                              Validating YAML is important. I don’t know if this exists, but I think it would be cool to have a second pane open in my editor that uses the same spec and the context of my first pane to tell me what keys are available to me and what they mean. It would make writing k8s manifests easier. It might even work for Helm… but I don’t know how this would work for dynamically generated keys.

                              1. 1

                                Would probably require a specialized LSP, but all that information is accessible via kubectl explain

                                1. 1

                                  Shelling out on every key stroke might be kind of slow? Could cache though.

                                  1. 2

                                    The kubectl command is just printing struct information from the Go code. An LSP could access the same metadata and do something more efficient.

                                    I merely mention the command because you could construct a helper tool for your environment that could print something useful without the effort of writing an LSP.
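A small sketch of such a helper with the caching suggested above. It assumes `kubectl` is installed and configured; `make_explainer` and `run_kubectl_explain` are hypothetical names, and the runner is injectable so the cache can be exercised without a cluster.

```python
import subprocess
from functools import lru_cache


def run_kubectl_explain(path: str) -> str:
    """Shell out to `kubectl explain <path>` and return its output."""
    result = subprocess.run(
        ["kubectl", "explain", path],
        capture_output=True,
        text=True,
        check=True,  # raise if kubectl exits with an error
    )
    return result.stdout


def make_explainer(run=run_kubectl_explain, maxsize=256):
    """Wrap a runner in an in-memory cache so repeated lookups are cheap."""
    return lru_cache(maxsize=maxsize)(run)


if __name__ == "__main__":
    explain = make_explainer()
    # The second call for the same path is served from the cache.
    print(explain("pod.spec.containers"))
```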

                              2. 1

                                validating yaml, or any input data, is likely a good idea. does the validation have to be based on a human readable schema though?

                                i’ve started manually validating yaml the hard way, and writing human readable but otherwise useless schemas. it feels like a reasonable approach.

                                schema:

                                https://github.com/nathants/libaws#infrayaml

                                manual validation the hard way:

                                https://github.com/nathants/libaws/blob/26fe86c65bbb0ea63ca7d6728836a146964f7c10/lib/infra.go#L2492

                                1. 1

                                  I was just about to go into my standard “the human compiler at work” spiel but then I realized that one pretty nifty way to work with this would be an app that generated the schema automatically from your YAML.

                                  Like, you’d write

                                  Corey:
                                      name    : "Corey Larson"
                                      bio     : "Eats, runs, and codes. Dad. Engineer. Progressive. LDS. Disneyland fanatic."
                                      avatar  : "/assets/images/authors/coreylarson.jpg"
                                  Alex:
                                      name    : "Alex Couture-Beil"
                                      bio     : "Alex enjoys writing code, growing vegetables, and the great outdoors."
                                  

                                  And the app would make a .cue file with

                                  #Author : {
                                      name   : string
                                      bio    : string 
                                      avatar : string
                                  }
                                  

                                  and [string] : #Author and then you could look at that and be like “yes that’s right!” (Or if it were wrong, you’d have a very clear way of seeing exactly where the problem is.)

                                  And then from then on be able to validate from it.

                                  Edit: oh, wow, that is already possible, looks like they had the same great idea!
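For a feel of what schema inference involves, here is a hand-rolled sketch that emits CUE-like text from already-parsed records. It says nothing about how `cue import` works; `infer_schema` is a hypothetical helper, and marking fields that are absent from some records as optional (`?`) is a design assumption.

```python
def infer_schema(records: dict[str, dict], name: str = "#Author") -> str:
    """Emit a CUE-like definition from example records.

    Fields present in every record are required; fields present in only
    some records are marked optional with `?`.
    """
    type_names = {str: "string", int: "int", float: "float", bool: "bool"}
    seen: dict[str, set] = {}
    counts: dict[str, int] = {}
    for record in records.values():
        for key, value in record.items():
            # Unknown Python types fall back to `_`, CUE's "any value".
            seen.setdefault(key, set()).add(type_names.get(type(value), "_"))
            counts[key] = counts.get(key, 0) + 1
    lines = [f"{name}: {{"]
    for key, types in seen.items():
        optional = "" if counts[key] == len(records) else "?"
        lines.append(f"    {key}{optional}: {' | '.join(sorted(types))}")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    authors = {
        "Corey": {"name": "Corey Larson", "bio": "...", "avatar": "/a.jpg"},
        "Alex": {"name": "Alex Couture-Beil", "bio": "..."},
    }
    print(infer_schema(authors))
```

With the two authors above, `avatar` comes out as optional because Alex has none, which is exactly the kind of thing a reviewer would want to confirm or correct by eye.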

                                  1. 1

                                    I actually didn’t know about this cue import but I agree it looks super cool. Thanks for sharing.

                                  2. 1

                                    @adamgordonbell how is cue related to relax ng compact?

                                    1. 2

                                      I’m not sure. I’ve not heard of relax ng compact before, but it looks interesting.