1. 38

  2. 13

    I actually think the TOML is harder to scan/read than JSON in the example provided. Maybe it’s different in a bigger file, but I tend to doubt it. I will agree that JSON not having comments is annoying, though.

    1. 9

      I agree, but that may just be familiarity. Anyway, in the example TOML is objectively more repetitive. Also, the comparison isn’t showing the same things: the JSON version is pinning specific release versions, while the TOML version seems to be mostly pointing at git branches.

    2. 17

      There are only two real deficiencies with JSON, IMHO:

      • No standardized method of commenting
      • No multiline strings

      There are other problems, of course: no standardized date format, no standardized mechanism for including arbitrary binary data, no hexadecimal notation, etc. But really, the two things mentioned above are the only true problems it has.

      (Again, IMHO.)

      1. 36

        I would argue failure to support a trailing comma is a serious defect in any format expected to support the addition of elements over time.

        To clarify, that’s maybe not a defect in json per se. If it’s just a serialization format, fine, whatever. But for the use case of tracking dependencies, configurations, etc., it’s bad. Choosing to use json for this is a defect.
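To make the trailing-comma complaint concrete, here is a minimal sketch using Python's standard `json` module (Python is just my pick for illustration, and the `deps` key is made up). A strict parser rejects the trailing comma, so appending an element always means editing the previous line as well.

```python
import json

# The same list, written with and without a trailing comma.
without_trailing_comma = '{"deps": ["foo", "bar"]}'
with_trailing_comma = '{"deps": ["foo", "bar",]}'

def parses(text: str) -> bool:
    """Return True if the strict JSON parser accepts `text`."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

print(parses(without_trailing_comma))
print(parses(with_trailing_comma))
```

In a diff, adding `"baz"` to the valid version touches two lines: the old last element gains a comma and the new element is added.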

        1. 7

          EDN’s use of whitespace as element separators is better than commas. https://learnxinyminutes.com/docs/edn/

          1. 6

            I heard an anecdote from someone who had to make a streaming json parser, and its performance was significantly hampered because it had to check for the trailing comma.

            1. 2

              I have a “counter-dote”. https://github.com/stig/json-framework/blob/master/Classes/SBJson5StreamParserState.m implements a state machine for a streaming JSON parser that is not impaired at all by checking for trailing commas.

              Admittedly, I remember struggling with the trailing comma issue in previous versions, before I adopted the state machine. Also, my parser is not the most performant, even for Objective-C.
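For anyone wondering why a state machine makes the trailing comma check essentially free, here is a rough sketch. It is not the SBJson code, only the general idea reduced to a flat array of pre-split tokens, and the state names are my own. The "did we just see a comma?" question is answered by which state you are in, so there is no look-ahead or backtracking.

```python
from enum import Enum, auto

class State(Enum):
    EXPECT_VALUE_OR_END = auto()  # just after '[': a value or ']' may follow
    EXPECT_VALUE = auto()         # just after ',': only a value may follow
    EXPECT_COMMA_OR_END = auto()  # just after a value: ',' or ']' may follow
    DONE = auto()

def check_array(tokens: list[str]) -> bool:
    """Validate a flat array given as tokens such as ['[', '1', ',', '2', ']']."""
    if not tokens or tokens[0] != "[":
        return False
    state = State.EXPECT_VALUE_OR_END
    for tok in tokens[1:]:
        if state is State.DONE:
            return False  # garbage after the closing ']'
        if tok == "]":
            # ']' is legal unless a comma was the last thing consumed.
            if state is State.EXPECT_VALUE:
                return False
            state = State.DONE
        elif tok == ",":
            if state is not State.EXPECT_COMMA_OR_END:
                return False
            state = State.EXPECT_VALUE
        else:  # any other token counts as a value in this sketch
            if state is State.EXPECT_COMMA_OR_END:
                return False
            state = State.EXPECT_COMMA_OR_END
    return state is State.DONE

print(check_array(["[", "1", ",", "2", "]"]))
print(check_array(["[", "1", ",", "]"]))
```

Each token costs one state comparison and one assignment, whether or not the format allows trailing commas.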

            2. 2

              Lack of trailing commas is also one of the things that irks me most about SQL…

              1. 5

                haha, so I was even going to suggest sql as an alternative to json for configs. Create an in memory sqlite database, then .read and .dump. What could be better than the ability to query your config? And it plays nice with version control because every line ends with a semicolon, not just some of them.

                PRAGMA foreign_keys=OFF;
                BEGIN TRANSACTION;
                CREATE TABLE depends (name, version);
                INSERT INTO depends VALUES('foo',7);
                INSERT INTO depends VALUES('bar',3);
                COMMIT;

                Plus you get atomic updates, rollback, etc. when doing a hot reload. Awesome, right? :)
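Half joking or not, this is easy to try. A minimal sketch with Python's built-in `sqlite3` module, assuming the config lives in a SQL text blob like the one above (the query is just an example of mine):

```python
import sqlite3

# The dump from the comment above, closed with COMMIT so the transaction ends.
DUMP = """
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE depends (name, version);
INSERT INTO depends VALUES('foo',7);
INSERT INTO depends VALUES('bar',3);
COMMIT;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DUMP)  # load the "config file"

# Query the config like any other table.
rows = conn.execute(
    "SELECT name, version FROM depends WHERE version > ? ORDER BY name", (5,)
).fetchall()
print(rows)

# Serialize back to line-oriented SQL text, similar to the shell's .dump.
dumped = "\n".join(conn.iterdump())
print(dumped)
```

`iterdump()` emits one statement per line, which is what makes the diffs pleasant.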

                1. 5

                  Yeah but as someone who deals with code or system configs first, and databases as an afterthought, SQL just seems so flimsy. Database setup is imperative more than declarative. Data insertion is the same. And the actual data types are terribly weak unless you go through a lot of work trying to enforce invariants.

                  I do know a lot of this is my own bias, because there seems a fairly large cognitive gap between “functional programming first” programmers such as myself where functions and structures come first and composing them should be easy, and “database-first” DBAs like some of my friends where sets of structures come first, functions seem a special case, and composition seems to have different rules… But it’s a hard gap for me to cross.

                  1. 4

                    And the actual data types are terribly weak unless you go through a lot of work trying to enforce invariants.

                    This is one point that I disagree with. If you’re primarily working with MySQL or SQLite, I guess it may be true, or if you’re thinking about SQL databases as filtered through a “least common denominator” database library, but types are pretty robust in most databases.

                    1. 2

                      SQL Server/TSQL is also strongly typed.

                      1. 1

                        Yeah, it’s things like that which find me always reaching for postgres over MySQL. <3

                        As I said, part of the problem is mine… I’m not actually very good with databases and so I tend to solve everything the most complicated and brute-force way. I’m trying to improve though, bit by bit.

                      2. 1

                        If you haven’t read The Third Manifesto you should. Made me fall in love with the relational model all over again

                      3. 1

                        Kinda like what slapd does, but I must say I’m not very fond of configuring it that way.

                  2. 15

                    Also it doesn’t let you round trip numbers correctly.

                    EDIT: To elaborate, you can’t store +/-Inf or NaN in it.
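A quick way to see the Inf/NaN problem, sketched with Python's standard `json` module. By default the encoder writes tokens that are not part of the JSON grammar, and in strict mode it refuses to encode the value at all.

```python
import json
import math

# Default behaviour: emits Infinity / NaN, which strict JSON parsers reject.
lenient = json.dumps([math.inf, -math.inf, math.nan])
print(lenient)

# Strict behaviour: stay inside the JSON grammar, or fail.
try:
    json.dumps(math.inf, allow_nan=False)
    strict_ok = True
except ValueError:
    strict_ok = False
print(strict_ok)
```

Either way, a conforming consumer on the other end cannot get the original float back without some side convention.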

                    1. 6

                      This shortcoming is so important when you deal with complex number-crunching applications that many libs add support for them, for example Google’s Gson. I even made a library to parse those numbers client-side (JSON.parseMore). The disadvantage of this approach is that it isn’t as heavily optimized as the native JSON.parse of the most modern browsers, which is also, btw, why you can’t easily replace JSON with other formats like MessagePack that would otherwise be faster to parse.

                      1. 2

                        With regards to parsing performance, an interesting approach is to piggyback on those highly optimized parsers. For example, the Transit format provides a richer set of builtin types and an extension mechanism (pretty much equivalent to EDN) but is encoded as JSON or MsgPack while in transit (I guess that’s the reason for the name). That way, serialization and deserialization profit from the host format’s optimized parser implementations.
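The piggybacking idea can be sketched in a few lines. To be clear, this is not Transit's actual wire format: the `~t:` tag below is something I made up, and a real format would also need to escape ordinary strings that happen to start with the tag. The point is only that the host JSON parser does the heavy lifting and a thin layer revives the richer types.

```python
import json
from datetime import datetime, timezone

TAG = "~t:"  # hypothetical tag prefix for timestamps, not Transit's real encoding

def encode(value) -> str:
    """Replace datetimes with tagged strings, then emit plain JSON."""
    def lower(v):
        if isinstance(v, datetime):
            return TAG + v.isoformat()
        if isinstance(v, dict):
            return {k: lower(x) for k, x in v.items()}
        if isinstance(v, list):
            return [lower(x) for x in v]
        return v
    return json.dumps(lower(value))

def decode(text: str):
    """Parse with the optimized host parser, then revive tagged strings."""
    def lift(v):
        if isinstance(v, str) and v.startswith(TAG):
            return datetime.fromisoformat(v[len(TAG):])
        if isinstance(v, dict):
            return {k: lift(x) for k, x in v.items()}
        if isinstance(v, list):
            return [lift(x) for x in v]
        return v
    return lift(json.loads(text))

original = {"built": datetime(2019, 1, 2, 3, 4, 5, tzinfo=timezone.utc), "tags": ["a", "b"]}
wire = encode(original)
roundtrip = decode(wire)
print(wire)
```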

                        1. 1

                          Transit is just a hack to enable use of a native-code JSON parser for browser applications, because parsing EDN might be too slow with a parser written in JS.

                          It’s not even human-readable and of course is absolutely unusable for configs. We’re using it for React Native and I have to convert it to EDN every time I want to debug some output :(

                    2. 9

                      JSON5 addresses a lot of these issues.

                      1. 6

                        Ren has literals for most common types (dates, IP addresses, URLs, emails) built-in.

                        Another interesting approach, that is far more powerful (while still keeping it sane) is Dhall.

                        1. 2

                          Ren looks a little like REBOL, which is neat to me.

                          Dhall looks cool.

                          1. 6

                            Ren looks a little like REBOL, which is neat to me.

                            Oh, well, because it is to REBOL what JSON is to JavaScript :)

                            Dhall looks cool.

                            I’ve never personally used it extensively, but I’ve heard of people using it to also handle configuration updates (e.g. a Dhall function changing old config into new config) that are type-checked. Dhall functions are powerful but without Turing-completeness. In other words, JSON and TOML are just variations on syntax preferences and are ultimately alike; Dhall is something else.

                          2. 1

                            Dhall allows importing from external URLs, which looks crazy from a security standpoint.

                            1. 1

                              Yep, but you can add content digest to the URL (SHA sum).
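The general idea behind pinning a remote import to a digest can be sketched like this. It hashes raw bytes for simplicity and is not a description of how Dhall itself computes or checks its digests; `verify_import` and the sample config are hypothetical.

```python
import hashlib

def verify_import(content: bytes, expected_sha256: str) -> bytes:
    """Return `content` only if its SHA-256 digest matches the pinned value."""
    actual = hashlib.sha256(content).hexdigest()
    if actual != expected_sha256:
        raise ValueError(f"digest mismatch: expected {expected_sha256}, got {actual}")
    return content

# Stand-in for bytes fetched from a URL; no network access in this sketch.
config_bytes = b"{ port = 8080 }"
pinned = hashlib.sha256(config_bytes).hexdigest()  # recorded when first reviewed

accepted = verify_import(config_bytes, pinned)

try:
    verify_import(b"{ port = 1 }", pinned)
    tampered_accepted = True
except ValueError:
    tampered_accepted = False
print(tampered_accepted)
```

Once the digest is part of the reference, the remote host can no longer change what you load without the check failing.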

                              1. 1

                                It is not the default, so it will not always be used, and it will eventually become a security hole.

                          3. 4

                            I do like hjson as an alternative to json nowadays. When I used toml with Cargo/rust, it seriously cost me some time to understand what “data structure” the toml file would be parsed into. I guess I still don’t firmly understand it.

                            1. 1

                              • No streaming capability
                              • No efficient encoding
                              • Requires quotes escaping in strings (and consequently to parse every byte in every string to find a terminator)
                              • No ability to skip ahead during decoding
                              • No ability to store raw data

                              People invented TLV for a reason https://en.wikipedia.org/wiki/Type-length-value

                              ASN.1 DER: https://en.wikipedia.org/wiki/Abstract_Syntax_Notation_One#Example
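A tiny type-length-value sketch to show what the list above is getting at. The layout (1-byte tag, 4-byte big-endian length) and the tag numbers are my own assumptions, not ASN.1 DER. Because the length comes first, a reader can jump over a payload without scanning it for a terminator, and the payload can hold quotes or raw bytes unescaped.

```python
import struct

HEADER = struct.Struct(">BI")  # 1-byte type tag, 4-byte big-endian length

def encode_tlv(tag: int, value: bytes) -> bytes:
    """Prefix `value` with its tag and length."""
    return HEADER.pack(tag, len(value)) + value

def iter_tlv(buf: bytes):
    """Yield (tag, offset, length) per record without touching the payloads."""
    pos = 0
    while pos < len(buf):
        tag, length = HEADER.unpack_from(buf, pos)
        pos += HEADER.size
        yield tag, pos, length
        pos += length  # skip ahead using the length, no terminator search

# Hypothetical tags: 1 = UTF-8 string, 2 = raw bytes.
stream = encode_tlv(1, 'say "hi"'.encode()) + encode_tlv(2, bytes([0, 255, 34]))
records = [(tag, stream[off:off + n]) for tag, off, n in iter_tlv(stream)]
print(records)
```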

                              1. 1

                                Since others have mentioned some alternatives, there’s always Lua. I use Lua as a configuration file format both for personal projects and for work. Comments, multi-line strings, ability to call functions (like the example one calling os.setlocale()) and trailing commas. Oh, don’t like commas? Replace them with semicolons if you wish.

                              2. 9

                                Whew, that new format is repetitive:

                                targets = [ "//:satori" ]

                                [[dependency]]
                                package = "github.com/buckaroo-pm/google-googletest"
                                version = "branch=master"
                                private = true

                                [[dependency]]
                                package = "github.com/buckaroo-pm/libuv"
                                version = "branch=v1.x"

                                [[dependency]]
                                package = "github.com/buckaroo-pm/madler-zlib"
                                version = "branch=master"

                                [[dependency]]
                                package = "github.com/buckaroo-pm/nodejs-http-parser"
                                version = "branch=master"

                                [[dependency]]
                                package = "github.com/loopperfect/neither"
                                version = "branch=master"

                                [[dependency]]
                                package = "github.com/loopperfect/r3"
                                version = "branch=master"

                                How about a simple .ini?

                                name = satori
                                libuv/libuv         = 1.11.0
                                google/gtest        = 1.8.0
                                nodejs/http-parser  = 2.7.1
                                madler/zlib         = 1.2.11
                                loopperfect/neither = 0.4.0
                                loopperfect/r3r     = 2.0.0
                                buckaroo-pm/google-googletest = 1.8.0

                                1. 6

                                  TOML certainly is repetitive. YAML, since it hasn’t come up yet, includes standardized comments, hierarchy, arrays, and hashes.

                                  # Config example
                                  name: satori
                                  dependencies:
                                    libuv/libuv: 1.11.0
                                    google/gtest: 1.8.0
                                    nodejs/http-parser: 2.7.1
                                    madler/zlib: 1.2.11
                                    loopperfect/neither: 0.4.0
                                    loopperfect/r3: 2.0.0

                                  More standards! xkcd 927. I’m all for people using whatever structured format they like. The trouble is in the edges and in the attacks. CSV parsers are often implemented incorrectly and explode on complex quoting situations (the CSV parser in ruby is broken). And XML & JSON parsers are popular vectors for attacks. TOML isn’t new of course, but it does seem to be less widely used. I wish it luck in its ongoing trial by fire.
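The kind of "complex quoting situation" that trips up hand-rolled CSV parsers looks like this. A small sketch with Python's standard `csv` module and made-up data: one record whose fields contain a comma, an escaped quote, and an embedded newline.

```python
import csv
import io

# One header row, then a single record with awkward fields.
raw = 'name,note\n"Smith, J","said ""hi""\nand left"\n'

# Naive approach: split on newlines, then on commas.
naive = [line.split(",") for line in raw.splitlines()]

# The standard-library reader applies the usual quoting rules.
proper = list(csv.reader(io.StringIO(raw, newline="")))

print(len(naive), naive)
print(len(proper), proper)
```

The naive version sees three rows and splits the name in half; the real parser sees the two rows that were written.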

                                  1. 1

                                    YAML already has wide support so it’s quite odd it hasn’t been mentioned yet

                                  2. 5

                                    TOML can be written densely too, e.g. (taken from Amethyst’s Cargo.toml):

                                    nalgebra = { version = "0.17", features = ["serde-serialize", "mint"] }
                                    approx = "0.3"
                                    amethyst_error = { path = "../amethyst_error", version = "0.1.0" }
                                    fnv = "1"
                                    hibitset = { version = "0.5.2", features = ["parallel"] }
                                    log = "0.4.6"
                                    rayon = "1.0.2"
                                    serde = { version = "1", features = ["derive"] }
                                    shred = { version = "0.7" }
                                    specs = { version = "0.14", features = ["common"] }
                                    specs-hierarchy = { version = "0.3" }
                                    shrev = "1.0"

                                    1. 4

                                      More attributes are to come. For example, groups:

                                      package = "github.com/buckaroo-pm/google-googletest"
                                      version = "branch=master"
                                      private = true
                                      groups = [ "dev" ]

                                      1. 1

                                        Makes sense, I don’t see an obvious way to encode that in the ini without repeating the names of deps in different sections.

                                    2. 8

                                      Why can’t we just use S-expressions? It’s been around forever. Easily parseable. Human readable.

                                      1. 3

                                        Yup. It was one of the ones I used in the past. You don’t even need to study advanced parsing strategies. It’s simple enough to write a parser out by hand on paper.
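To back up the "write it by hand" claim, here is roughly the whole parser as a sketch. It assumes atoms contain no whitespace or parentheses and skips string literals and escapes entirely; the sample config is made up.

```python
def tokenize(text: str) -> list[str]:
    """Split into '(' , ')' and whitespace-separated atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens: list[str]):
    """Parse one expression from the front of `tokens` (consumed in place)."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    tok = tokens.pop(0)
    if tok == "(":
        items = []
        while tokens and tokens[0] != ")":
            items.append(parse(tokens))
        if not tokens:
            raise SyntaxError("missing ')'")
        tokens.pop(0)  # drop the closing ')'
        return items
    if tok == ")":
        raise SyntaxError("unexpected ')'")
    return tok  # atoms stay as plain strings in this sketch

config = parse(tokenize("(deps (libuv 1.11.0) (zlib 1.2.11))"))
print(config)
```

Nested lists fall out of the recursion; everything else a real config format needs (strings, comments, numbers) is an add-on to this core.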

                                      2. 12

                                        Don’t people get tired of reinventing the wheel?

                                        INI -> XML -> JSON -> “INI with a hierarchy”, and it only took 40 years.

                                        And then people wonder why we still can’t deliver anything on time or with fewer bugs.

                                        1. 3

                                          Doesn’t it stem from that same wheel turning and bringing us new people/ideas?

                                          I think it would be more a case of people coming into this and seeing what we have now, hating it, and doing their own thing.

                                          1. 2

                                            Why do the new people hate it, though?

                                            Is it because of actual, measurable deficiencies in the existing technology, or a vague feeling of “ickiness” or “my chosen development environment doesn’t mesh well with the existing technology”?

                                        2. 5

                                          Another one? When are we going to get off the self-describing data format treadmill? This happens every five years like clockwork. It would be nice if people would think about why instead of just treating this problem space as a popularity contest.

                                          I have more serious thoughts on the topic, and I’ve gone into them here before, but it’s late so I’m not up to getting into it right at the moment. I’m sure I’ll be happy to elaborate in the morning; my apologies for being cryptic meanwhile.

                                          1. 5

                                            What a lot of bikeshedding! At time of writing there are 44 comments here and nobody has mentioned S-expressions yet. We’ve had them since the 1950s, and they’re arguably more easily parseable, readable and layout-able by human and computer than anything mentioned here.

                                            1. 5

                                              nobody has mentioned S-expressions yet

                                              They were mentioned 11 hours ago by @jxy here.

                                              1. 14

                                                INI does not support hierarchies.

                                                1. 10

                                                  INI files have no well-defined spec, and very few parser implementations work the same way.

                                                2. 3

                                                  toml is one of the simplest and most effective changes I’ve seen in file formats for a long time.

                                                  I cooked up the first common lisp toml reader, but I have not adequately brought it up to date in far, far, far too long.

                                                  1. 3

                                                    How about my format: BOML ;-) Aimed to be parsed by shell scripts.

                                                    ben:QUOTA:1000 1500

                                                    1. 2

                                                      Why not simply go the Nix way? I have found it to be very reasonable during merges.

                                                      Specifically, one can either stick with a JSON like format or expand the data into a dotted format.
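The "expand into a dotted format" part is easy to illustrate outside Nix. This is plain Python, not Nix syntax, and the keys are invented; it only shows why the dotted form merges well: every setting lives on its own line with its full path.

```python
def flatten(tree: dict, prefix: str = "") -> dict:
    """Turn nested dicts into a flat {dotted.path: value} mapping."""
    flat = {}
    for key, value in tree.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat

nested = {"services": {"nginx": {"enable": True, "port": 8080}}, "name": "satori"}
flat = flatten(nested)

for path, value in flat.items():
    print(f"{path} = {value!r}")
```

Two branches that each add a different setting under `services.nginx` touch different lines, so the merge has nothing to conflict on.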

                                                      1. 3

                                                        My understanding is that Nix provides a DSL, but TOML and JSON are just data. Are these really equivalent?

                                                        1. 1

                                                          Nix provides the DSL as a configuration language, which can be used instead of TOML here. For example, see this example (not mine).

                                                          1. 2

                                                            Yeah, but it’s Turing Complete. Which is nice if you intend to always modify it by hand anyway, since you can write custom functions to reduce the file size, but it also means that programs like dependabot go from parse-tweak-serialize to IDE Refactoring Engine.
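For contrast, this is about all a tool needs when the manifest is plain data. A sketch with a made-up JSON manifest; the `dependencies` key and the `bump` helper are hypothetical.

```python
import json

manifest_text = '{"name": "satori", "dependencies": {"libuv/libuv": "1.11.0"}}'

def bump(text: str, package: str, new_version: str) -> str:
    """Parse, tweak one version, serialize."""
    manifest = json.loads(text)                       # parse
    manifest["dependencies"][package] = new_version   # tweak
    return json.dumps(manifest, indent=2)             # serialize

updated = bump(manifest_text, "libuv/libuv", "1.12.0")
print(updated)
```

With a Turing-complete config the version might be computed by a function, so there is no single value to overwrite and the tool has to understand the program instead.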

                                                            1. 6

                                                              Perhaps Dhall may be a better choice then?

                                                            2. 1

                                                              Looks like a nice language! Thanks for the example. How is the parser support across various languages?

                                                              1. 2

                                                                Well, lacking at this point. There are only Java, Go, Rust, Haskell, OCaml, and I suppose C.

                                                                I would like to note that Dhall might also be a better choice, in that it is better designed to be a configuration language than any of the alternatives and has a number of language bindings in progress. (Unfortunately, F# is not one of them.)

                                                              2. 1

                                                                Haven’t they simply copied HOCON? At least it looks a lot like it.

                                                                1. 3

                                                                  Good point

                                                                  According to the git history, HOCON started out in 2011:

                                                                  commit 9ca157d34a4f2e14ac0d88de001611bcf3e911d0
                                                                  Author: Havoc Pennington <hp@redacted>
                                                                  Date:   Sat Nov 5 16:45:25 2011 -0400
                                                                      WIP initial sketch

                                                                  whereas the Nix project started out in 2003:

                                                                  commit 2766a4b44ee6eafae03a042801270c7f6b8ed32a
                                                                  Author: Eelco Dolstra <eelco.dolstra@redacted>
                                                                  Date:   Fri Mar 14 16:43:14 2003 +0000
                                                                      * Improved Nix.  Resources (package descriptors and other source
                                                                        files) are now referenced using their cryptographic hashes.
                                                                        This ensures that if two package descriptors have the same contents,
                                                                        then they describe the same package.  This property is not as
                                                                        trivial as it sounds: generally import relations cause this property
                                                                        not to hold w.r.t. temporality.  But since imports also use hashes
                                                                        to reference other packages, equality follows by induction.
                                                                      svn path=/nix/trunk/pkg/; revision=5

                                                                  I guess Nix takes precedence by about a decade here.

                                                                  But to be completely honest, if we abstract away some implementation details (builtins, builtin types, string contexts, …), Nix is just about combining sets using some lambdas. They are probably not the first to come up with this design and are unlikely to be the last.

                                                                  1. 1

                                                                    I didn’t know. You are right, it does look a lot like it.

                                                            3. 3

                                                              I’m fairly certain TPW wrote the TOML spec as a joke and no one got it. Now everyone uses it.

                                                              1. 3

                                                                [citation needed]

                                                                Also, I guess the joke is on him! Ha!

                                                              2. 1

                                                                There’s also the option to make a superset of typed JSON, like Amazon Ion.


                                                                But the author seems to be concerned with human readability, while Ion is aimed at machines.