For the time being, we haven’t yet worked out a robust, practical and powerful solution to use Nickel as a front-end for Nix development. However, we have been actively thinking about it, and Nix integration is the very next step on the roadmap.
I’ve been doing a lot of tinkering lately to re-imagine some local configuration + template setups (mostly about dynamic trees, transforms, and merging, always with some ad hoc DSL elements).
Nickel looks really nice, and I appreciate the bottom of the readme where they contrast it with other formats (e.g., Dhall mostly came to mind, and they call that out).
Reading the post earlier made me realize there’s at least one more in this category: https://ucg.marzhillstudios.com
I’m the author of UCG, and Nickel’s type-safety story is better than mine. I’ve got half-formed plans to add better type safety to UCG but keep getting sidetracked before I can finish the implementation.
Another one, very similar to Nickel or Dhall: https://cuelang.org/
Curious if you’ve used Cue? I’ve followed it loosely but never see anyone talking about using it or what it’s good / bad at.
I did use it to manage applying Kubernetes definitions: https://github.com/docteurklein/lube (à la docker-compose). It was a bit unstable at the time (API and language backwards-compatibility breaks, for example), but it was really good at reducing verbosity by composing generic boilerplate based on conventions.
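For flavour, here is a hedged sketch (in Python, not lube’s actual CUE code) of what convention-based boilerplate reduction looks like: a terse service spec expanded into full Kubernetes-style objects. The field names and defaults are illustrative only.

```python
# Hypothetical sketch of convention-based expansion (not the actual lube
# code): a terse service spec becomes full Kubernetes-style objects, with
# defaults and cross-references filled in by convention.
def expand(name, spec):
    port = spec.get("port", 80)  # illustrative default
    return [
        {"kind": "Deployment", "metadata": {"name": name},
         "spec": {"replicas": spec.get("replicas", 1),
                  "template": {"containers": [{"name": name,
                                               "image": spec["image"],
                                               "ports": [port]}]}}},
        {"kind": "Service", "metadata": {"name": name},
         "spec": {"selector": {"app": name}, "ports": [port]}},
    ]

objects = expand("web", {"image": "nginx:1.25", "port": 8080})
print([o["kind"] for o in objects])  # → ['Deployment', 'Service']
```

Two lines of spec become two complete objects; that is the verbosity reduction in miniature.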
IIRC* this was a deliberate decision by Crockford, because the entire point of JSON is not to be a programming language, and to be universally compatible with all programming languages.

This is also why it doesn’t have comments: people would have used comments to add extensions to the language (comments as pragma directives), and suddenly you don’t have JSON, you have 20 different varieties of the language, and the language is utterly useless as a universal data format.
IMO this is what people trying to replace JSON just don’t get. We already had a good executable interchange format 20, 30 years ago: it was called an “S-expression” (or XML, or YAML). Lisps are super easy to embed and super easy to emit programmatically (which is why we have Greenspun’s Tenth Rule). They are easy to read and parse mechanically (inb4 “parentheses!!”: vi from the 1970s supported % to jump between matching brackets, and most editors highlight them, so why doesn’t yours?), but they totally lack any real interoperability between languages, because now everything that parses your data format also needs to execute it, and to do that it has to execute it in the same way, and now you have fifty different specifications masquerading as one.

As mentioned, you can see this with YAML and XML too, although there are additional problems there like “Why am I writing so many damn angle brackets?” and “Why is my configuration reliant on whitespace?”. And all of them suffer from “Why has my configuration file’s parser blown up my computer when handed this suspicious config file I copy-pasted from IRC/Discord/SO?”.
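To make the “easy to parse mechanically” point concrete, here is a minimal S-expression reader: the whole parser is a tokenizer plus one recursive function.

```python
# A minimal S-expression reader. Tokenize by padding parentheses with
# spaces, then read recursively: "(" opens a nested list, anything else
# is an atom.
def parse_sexp(text):
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def read(pos):
        token = tokens[pos]
        if token == "(":
            items = []
            pos += 1
            while tokens[pos] != ")":
                item, pos = read(pos)
                items.append(item)
            return items, pos + 1  # skip the closing ")"
        return token, pos + 1      # a bare atom

    expr, _ = read(0)
    return expr

print(parse_sexp("(server (port 8080) (hosts (a b)))"))
# → ['server', ['port', '8080'], ['hosts', ['a', 'b']]]
```

That’s the whole reader; the interoperability problem the comment describes starts only once you try to *evaluate* what it returns.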
So fundamentally, this is a bad idea IMO, and it comes from not respecting the journey and history that led to JSON. Personally, I really, really hate writing JSON, but the alternatives are worse, and there’s a reason they are worse that goes beyond syntax alone. The executability and universality of all of them leave them vulnerable to, say, the billion laughs attack, among others. If I’m remembering correctly (I read this in a book whose title I have forgotten :(), in the 90s this was one of the major problems with either Postfix or Sendmail (the problem may have been an accidentally executable configuration file).
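The billion laughs attack works by exponential entity expansion, and the arithmetic alone shows why it is devastating; these are the figures for the classic nine-level XML variant.

```python
# Back-of-the-envelope for the classic "billion laughs" XML attack: nine
# nested entity definitions, each referencing the previous one ten times,
# so a file of a few hundred bytes expands to ~3 GB of "lol"s in memory.
levels, fanout = 9, 10
copies = fanout ** levels            # 10**9 copies of the innermost string
expanded_bytes = copies * len("lol")
print(copies, expanded_bytes)        # → 1000000000 3000000000
```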
We learned, and Crockford’s design was adopted everywhere because it tangibly fixed those problems by:

- Not making the data format executable.
- Making it simple as hell while still retaining utility.
- Deliberately making sure that it would be irritating to create variations (to the point of not even supporting comments).
All so that any language, any program, and any system can read it and understand the contents, rather than having slightly different versions for every system under the sun, and then doubling that for proprietary extensions and all the fun we used to have with configuration files.
And now some people come along and go “Wait why aren’t our configuration files executable! Why don’t they have comments?!” and the cycle unfortunately repeats, again.
When will we learn for good?
* - I remember the design justification used to be listed on json.org but the site has changed somewhat since I saw it last, and the only alternative source I can find goes to a Usenet / Google Groups post that has since been purged.
See also: https://www.cio.com/article/238300/xml-is-toast-long-live-json.html
JSON is a (mostly) reasonable data interchange format but it’s not a great configuration language. A data interchange format definitely should not be programmable. The degree of programmability that you want is still a bit of an open question to me.
I generally think UCL strikes a pretty good balance: it has simple macros and it also has built-in logic for merging different sources, including the ability to delete nodes from the tree that were present in an earlier version. It has the problem of being a single implementation of the specification (which is therefore largely defined by the implementation) and it’s not clear that the parsing logic for the combination modes could be properly abstracted.
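As an illustration of the merge-and-delete idea, here is a hypothetical Python sketch; this is not UCL’s actual semantics or syntax, and the `DELETE` sentinel is a stand-in for whatever deletion mechanism the real language provides.

```python
import json

# Hypothetical sketch of UCL-style merging: later sources override earlier
# ones, nested tables merge recursively, and a sentinel deletes a node.
DELETE = object()  # stand-in for UCL's actual node-deletion mechanism

def merge(base, override):
    result = dict(base)
    for key, value in override.items():
        if value is DELETE:
            result.pop(key, None)
        elif isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result

base = {"listen": {"host": "0.0.0.0", "port": 80}, "debug": True}
site = {"listen": {"port": 8080}, "debug": DELETE}
combined = merge(base, site)
# Emit the minified JSON a program would actually load.
print(json.dumps(combined, separators=(",", ":")))
# → {"listen":{"host":"0.0.0.0","port":8080}}
```

The last line is also the shape of the stand-alone tool described below: structured sources in, one flat minified JSON document out.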
Comments definitely are useful in a configuration language because it’s intended to be human-readable and it’s useful to be able to record why you made a particular configuration change. The degree of programmability, in part, depends on the flow of configuration information from the user to the program. In a classical UNIX system, config files were files in the filesystem and you’d deploy a service by installing it, editing its config file, and then starting it. This is nice and simple because the config file is just a file, and so you can version it as you would any other file. If it’s too complex to write by hand then you can write a program to generate it (if you’re Sendmail then this is your recommended way of writing a config file). There are two important things here:

- The person writing the config file is trusted (and so won’t be writing malicious configuration files).
- The code that runs to generate the static config file does not have to run with the privileges of the tool, and the tool can parse a much simpler version of the config.
It’s quite easy, for example, to write a stand-alone tool that takes a set of UCL config files and spits out minified JSON of the combined result. That’s fine if you assume that the only things that need to edit the config are a text editor for the input and the main program for the output.
If you want to be able to deploy containerised versions of the service with small tweaks to the config then you need something else to take your generic config and tweak it. This probably wants to operate on the simplified JSON but you might want to use some of the file separation bits of UCL (or whatever) to be able to split the instance-specific config, rather than doing that as a tree transform later. If you want to provide a graphical tool for editing the config then it needs to preserve comments and whitespace in the input so that diff tools work well, which means that it needs to operate on the source material (or possibly needs to just provide new override files).
I suspect that JSON or even something like BSON is what you want for the final version of the config that a program loads. It can be parsed very quickly and has an in-memory size that’s proportional to the size of the file (unlike XML, with entities) and so you can restrict yourself to parsing config files of a plausible size. If you want to do something more clever then you add some tooling in front.
To my mind, the biggest problem with configuration at the moment has nothing to do with the file format. The problem is the lack of schemas. This is why I wrote a tool that takes a JSON schema and generates C++ classes reflecting that format for use with UCL. This forces you to have a schema, which ends up embedded in your final binary (configs are validated against the schema before being used) and so can be trivially exported for other tools to use. JSON Schema does have comments (as special nodes) and so you can embed the description of the meaning of configuration options in the schema and then expose it in configuration editing tools. There’s also the problem of how you get configurations to stateless VMs / containers, though I believe etcd is trying to address that.
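A toy illustration of the schema point: a schema fragment carrying `description` fields alongside just enough machinery to validate a config against it. This checks only required keys and property types, a sliver of what real JSON Schema validators do, and the option names are made up.

```python
# A JSON Schema fragment with human-readable descriptions, plus a minimal
# validator covering only "required" and property "type" checks.
schema = {
    "type": "object",
    "required": ["port"],
    "properties": {
        "port": {"type": "integer", "description": "TCP port to listen on"},
        "debug": {"type": "boolean", "description": "Enable verbose logging"},
    },
}

TYPES = {"integer": int, "boolean": bool, "object": dict}

def validate(config, schema):
    # Only a sliver of JSON Schema; ignores edge cases such as Python
    # bools counting as ints.
    for key in schema.get("required", []):
        if key not in config:
            return False
    for key, rule in schema.get("properties", {}).items():
        if key in config and not isinstance(config[key], TYPES[rule["type"]]):
            return False
    return True

print(validate({"port": 8080, "debug": True}, schema))  # → True
print(validate({"debug": True}, schema))                # → False
```

A configuration editor could surface those `description` strings as inline help, which is exactly the "expose it in configuration editing tools" point above.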
By the way, nlohmann/json lists this URL as the source for your claim about comments, but it asks me for a Google account so I can’t verify that it actually does say what you and they quote.
All so that any language, any program, and any system can read it and understand the contents

I agree with this in broad strokes but have to disagree on the specifics. Parsing JSON is a minefield, and every language, program, and system has its own ad-hoc, undocumented, and buggy way of doing it.
Yep, this is a great comment. I can’t wait for my solution to be mature enough for people to understand what I am on about but I think I have the answer.
I’m a little sad seeing zero mentions of Dhall. After being ‘forced’ to use it with Spago, it became my preferred typed configuration language.
Check out the bottom of the readme: https://github.com/tweag/nickel/
I can agree with them; constantly needing to annotate code with types in Dhall when I just want 50 lines of template code to make my JSON outputs more reusable is annoying. That’s why I lean more towards Jsonnet.
This looks a little like Jsonnet, which sits at a weird middle ground: too powerful to be a readable config language, but not powerful enough to do very many useful things. At $thing we decided to use Python (we considered Deno, but at the time it was too new) when we need powerful config languages, and TOML when we don’t.
I wouldn’t normally expect this, but, given the overlap in communities, I’m a little surprised that there wasn’t passing mention of the advantages over Dhall. Maybe they decided that it wasn’t relevant to the audience of this post?
Can’t speak to reasoning here, but it looks like https://github.com/tweag/nickel/blob/master/RATIONALE.md#dhall-powerful-type-system addresses the question at least.
You broke JavaScript’s heart!
More like breaking E’s heart by forgetting that JSON is based on Data-E. JSON started out as the data-only subset of a language which could have been used to write programs. It was also used to store programs, by encoding programs as a miniature term language TermL.
Reminds me somewhat of Nix’s language, but for general configuration rather than just Nix (granted, you said Nix was an inspiration). I like it! I may use it for projects where I want a flexible configuration language. A library for embedding parsing/execution would be nice alongside the CLI/REPL, if there isn’t one already, preferably in C/C++ or another language with a C ABI so people can make wrappers for it.
Having played with a few of these, I feel like starting with JSON and building up functions and templating and contracts is the wrong direction. CUE’s approach of graph unification is much smoother once you make the mental shift.
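For readers unfamiliar with the term, a rough sketch of what graph unification means here (hypothetical Python, not CUE’s actual algorithm): combining two trees is symmetric, missing fields fill in from either side, and two different concrete values are a conflict rather than an override.

```python
# Rough sketch of the unification idea: unlike merge-with-override,
# unification is order-independent, and disagreement is an error.
class Conflict(Exception):
    pass

def unify(a, b):
    if isinstance(a, dict) and isinstance(b, dict):
        out = {}
        for key in a.keys() | b.keys():
            if key in a and key in b:
                out[key] = unify(a[key], b[key])   # both sides must agree
            else:
                out[key] = a.get(key, b.get(key))  # fill in from either side
        return out
    if a == b:
        return a
    raise Conflict(f"{a!r} and {b!r} do not unify")

base = {"replicas": 3, "image": "nginx"}
env = {"replicas": 3, "labels": {"tier": "web"}}
print(unify(base, env) == {"replicas": 3, "image": "nginx",
                           "labels": {"tier": "web"}})  # → True
```

The mental shift is that a config is a set of constraints to be satisfied simultaneously, not a sequence of templates to be applied in order.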
I suggested that they go whole hog in that direction instead of functions.
Nix with types! Love it! :)