Threads for dloreto

    1. 2

      Congrats on shipping something, that right there is better than most.

      I didn’t see any discussion or comparison between the various other JSON extension languages, such as Dhall, just references to them at the end. What makes a Turing complete language more useful here than a total language like Dhall?

      1. 4

        My view is that it’s better to pick a “good” language that is widely adopted, has well known syntax and a thriving ecosystem, than it is to pick a “perfect” language for which there’s a bigger adoption curve because the syntax needs to be learned and the ecosystem needs to be developed.

        If you (or the users of your software) are willing to learn Dhall and invest in its ecosystem, then I think Dhall is a perfectly valid choice.

        FWIW, I don’t think the “total” vs “turing complete” distinction matters in practice. While Dhall programs are guaranteed to terminate, you could write a program that takes a very very long time to compute. In practice people don’t, by using good software engineering practices; but those are the same practices that let you develop using a turing complete language day-to-day.
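
        As a rough illustration of "guaranteed to terminate, but possibly very slow", here is a minimal Go sketch (not Dhall, and `Steps` is a hypothetical name). The function only ever recurses on a strictly smaller argument, which is the kind of recursion a total language can accept, yet the number of calls roughly doubles with each increment of `n`.

```go
package main

import "fmt"

// Steps counts the calls made by a function that recurses twice on n-1.
// It always terminates because n strictly decreases toward 0, but the
// total number of calls is 2^(n+1) - 1.
func Steps(n uint) uint64 {
	if n == 0 {
		return 1
	}
	return 1 + Steps(n-1) + Steps(n-1)
}

func main() {
	for _, n := range []uint{5, 10, 20} {
		fmt.Printf("n=%d calls=%d\n", n, Steps(n))
	}
}
```

        With a large enough `n` this would still terminate in principle while being useless in practice, which is why the termination guarantee alone says little about running time.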

        1. 1

          That makes sense. Thanks for the response!

    2. 1

      This is pretty much just JSON with a schema attached. The only difference in the data format itself is the inclusion of some newer JS syntax.

      The excruciating limits JSON has basically boil down to Crockford wanting to use eval to parse it, which meant it needed to be validated first, and he wanted that done with just a regex. More or less every annoyance in JSON boils down to that. It was such a major mechanism of data transfer that in JSC I added a JSON preflight step when parsing JS. It had to support a few idiosyncrasies (variations on JSONP, parens for eval, etc.), and was a massive PLT win on the majority of sites of the era. I’m not sure how much of a gain it is nowadays (I think XHR just directly supports JSON now, and CORS handles the JSONP cases).

      Edit: huh, I missed that it actually evaluates the interpolation strings. That’s a clear no go for a data interchange format. You can’t have “parsing data” include “execute code”, that’s unsound - and in the days of json-via-eval was a recurrent source of data compromises.

      1. 3

        I’ll add that this is not meant as a data interchange format. It’s meant as a programmable configuration language for trusted use cases.

        For data interchange, say, on an API, JSON or ProtoBuf would still be the right choice.

      2. 2

        This is intended for trusted configuration files more than anything.

    3. 6

      Neat. We use TypeScript for all the config we can at Notion, and we do things like spit out CircleCI config from a well-typed TypeScript file. It’s kinda cool you’re exposing this pattern without the consumer needing Node by embedding QuickJS/similar. That said, some of the goodness of TS-as-config comes from seamless interpretation of the config types and the program’s larger type system. I would hazard a guess that Rust people would rather config-in-rust-alike-via-macro-magic than use this, but what do I know. I’m a TS guy.

      You should be more clear about to what extent this is a “subset” of typescript or if it’s all of Typescript. From the README it sounds like it could be Python:Skylark::Typescript:TySON but — you allow all of typescript?

      1. 2

        It’s limited to TypeScript features that can be transpiled to ES5/ES6 (that’s what lets us use an embeddable JS engine). I’ll add more documentation on which features are supported and which are not, but to give you a concrete example, async/await is not supported.

    4. 5

      Why not simply use Lua tables? The Lua interpreter certainly has a smaller footprint than a JS interpreter.

      1. 2

        Lua is a good choice for similar use cases. We wanted TypeScript because it’s significantly more popular as a programming language. One way to think of TySON, is that it’s trying to enable Lua-like use cases but for TypeScript.

    5. 10

      Thanks for posting! This is interesting.

      I would like to see mention of JSON5 which is 11 years its elder. For comments in JSON, JSON5 is a good starting point.

      I see the need for a more strict JSON. Types and comments are just the tip of the iceberg. Joe Tsai has compiled a great list of other JSON related issues (this is the context of the Go library, but it’s a great resource).

      One of the larger issues I’ve run into is duplicates. Douglas Crockford, JSON’s inventor, tried to fix the duplicate issue but it was decided it was too late. Although Douglas Crockford couldn’t change the spec to force all implementations to error on duplicates, his Java JSON implementation errors on duplicates. Others use last-value-wins, support duplicate keys, or have other non-standard behavior. The JSON RFC states that implementations should not allow duplicate keys, notes the varying behavior of existing implementations, and states that when names are not unique, “the behavior of software that receives such an object is unpredictable.” Duplicate fields are a security issue, a source of bugs, and a surprising behavior to users. See the article “An Exploration of JSON Interoperability Vulnerabilities”. Disallowing duplicates conforms to the small I-JSON RFC, which is a stricter JSON. The author of I-JSON, Tim Bray, is also the author of JSON RFC 8259. See also the JSON5 duplicate issue.
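
      To make the duplicate problem concrete, here is a minimal Go sketch of a strict pre-check that rejects repeated keys before the document is decoded. It walks the token stream from the standard library's `encoding/json` decoder. The names `CheckDuplicateKeys` and `checkValue` are hypothetical, and this is only one possible approach, not how any particular library handles it.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// checkValue consumes exactly one JSON value from dec and returns an error
// if any object inside that value repeats a key.
func checkValue(dec *json.Decoder) error {
	tok, err := dec.Token()
	if err != nil {
		return err
	}
	delim, ok := tok.(json.Delim)
	if !ok {
		return nil // scalar: string, number, bool, or null
	}
	switch delim {
	case '{':
		seen := map[string]bool{}
		for dec.More() {
			keyTok, err := dec.Token()
			if err != nil {
				return err
			}
			key, ok := keyTok.(string)
			if !ok {
				return fmt.Errorf("unexpected non-string key %v", keyTok)
			}
			if seen[key] {
				return fmt.Errorf("duplicate key %q", key)
			}
			seen[key] = true
			if err := checkValue(dec); err != nil {
				return err
			}
		}
	case '[':
		for dec.More() {
			if err := checkValue(dec); err != nil {
				return err
			}
		}
	}
	// Consume the closing '}' or ']'.
	_, err = dec.Token()
	return err
}

// CheckDuplicateKeys returns an error if doc is not a single valid JSON
// value or if any object in it contains a repeated key.
func CheckDuplicateKeys(doc string) error {
	dec := json.NewDecoder(strings.NewReader(doc))
	if err := checkValue(dec); err != nil {
		return err
	}
	if _, err := dec.Token(); err != io.EOF {
		return fmt.Errorf("unexpected data after top-level value")
	}
	return nil
}

func main() {
	fmt.Println(CheckDuplicateKeys(`{"a": 1, "b": {"c": 2}}`))
	fmt.Println(CheckDuplicateKeys(`{"a": 1, "a": 2}`))
}
```

      Because the decoder unescapes strings before returning them, keys that differ only in escaping (for example `"a"` and `"\u0061"`) are treated as the same key in this sketch.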

      1. 8

        Good call out. I’ll add a mention to JSON5.

        In fact, for one of my use cases I considered using JSON5. I really liked it, except for the fact that I really wanted multi-line strings and string interpolation. Template literals weren’t introduced until ES6, so the JSON5 spec rejected the idea of including them in JSON5 (since JSON5 must be compatible w/ ES5). It would be possible to add those in a future JSON6 standard (but alas, I couldn’t find any Go implementations, which is what I needed).

        For a moment I considered implementing a JSON6 in Go … but discovered that it would be easier / faster to use an existing TypeScript bundler + an embeddable JS engine, and I ended up with TySON.

    6. 1

      Pretty cool. One thing I think would also be cool: replace JSON Schema with TypeScript.

      1. 2

        Yeah, agreed. I’m thinking that users should be able to write types using Typescript, and we can convert those to a JSON schema if people want to export the types.

        1. 1

          This is impossible unfortunately. TypeScript types and json-schema do not map into each other.

          1. 3

            There’s an overlapping subset. I wrote a compiler that does this and it seems to work fine.

            The newest version is closed source but here’s my work so far - compiles Typescript types to TypeBox/JSONSchema: https://github.com/justjake/ts-simple-type

      2. 2

        I think JSON Schema is still a better solution for cross-language and tooling support, as there are many implementations, e.g. allowing generation of Go and Java code, which would be harder if we only had TySON.

        1. 2

          True. But TypeScript is such a powerful language.

    7. 17

      Doesn’t supporting interpolation mean that an entire JavaScript engine is required?

      1. 5

        Yes, but we only need an engine that implements ES5 or ES6 (2015) at most, of which there are plenty of embeddable, lightweight options without requiring full-blown V8 or Node. For example, QuickJS https://bellard.org/quickjs/ or goja https://github.com/dop251/goja (which we’re using for our Go implementation).

        I’ll add that these engines are embeddable, so they would be part of the TySON library, and thus don’t require any external dependencies.

        1. 16

          You’re seriously embedding an entire JavaScript interpreter in your configuration language? I’m sorry, but that’s ludicrous.

          The rest of this format is basically JSON5, which I’ve been using since 2015.

          1. 4

            In fairness, the image format SVG almost got raw socket access :D

          2. 2

            You’re seriously embedding an entire JavaScript interpreter in your configuration language? I’m sorry, but that’s ludicrous.

            Is it? I tend to favour libucl for configuration files (it has a bunch of nice features for humans, such as include files with rules for composition, and I’ve written some tooling that lets me define a JSON Schema for my configuration and then have a nice C++ wrapper that exposes it) but the library is larger than either QuickJS or Duktape. If you have a configuration that is sufficiently complex that people might want to use code with it, embedding a JavaScript interpreter might not be such a bad idea.

            1. 3

              For sanity’s sake there needs to be a hard line between config languages and programming languages; that line is probably Turing-completeness. The attack surface of the latter is so much bigger … not just sandbox escapes but infinite loops, stack overflows, memory exhaustion, nondeterministic behavior.

              I’d be ok with a config language that allowed simple expression evaluation, and interpolating expressions in strings; that’d be useful.

              Beyond that, what you are creating is a program, and you should write it in something designed as a full PL, not a config language that’s metastasized. (CMake is my poster child for this disease.) Lua was literally created for this purpose.

              1. 1

                I think it depends a lot on the kind of configuration. If you’re setting some options, you have one set of requirements. If your configuration is really a DSL for defining how a complex set of components are assembled, then you have very different requirements.

                I’m less convinced by the security argument because I rarely think of configuration state as untrusted. I assume an attacker who can write to a configuration file for a program is allowed to do anything that that program is allowed to do. There are some cases where this is not true (for example, per-user configurations for a shared server) but that’s often better handled by dropping privileges to match those of the user than by assuming that the config file parser is bug free.

                Beyond that, what you are creating is a program, and you should write it in something designed as a full PL, not a config language that’s metastasized. (CMake is my poster child for this disease.) Lua was literally created for this purpose.

                I don’t see that Lua is better than TypeScript here. A TypeScript EDSL that gives typed definitions of the objects that your program is supposed to generate seems better than ‘please give me some Lua tables and we’ll check when we run your program whether you got the structures right’ as an approach.

          3. 1

            From my point of view whether the tradeoff is worth it depends on whether you want a programmable configuration language (with functions and imports) or not.

            If you have an application for which you don’t need programmability, then JSON5 is great and TySON is not for you. In fact, I would encourage you to use JSON5 for those cases. However, if you need programmability then we think TySON is a good tradeoff and the embedded JS interpreter is very small. As a point of comparison, consider languages like dhall, nickel, jsonnet and cue all of which include their own custom interpreters for their own custom languages.

            1. 4

              Why not just use JS as the config language then?

    8. 12

      TypeIDs are a modern, type-safe extension of UUIDv7.

      I am severely allergic to the use of “modern” in describing software techniques, but it is especially funny here.

      Prepending a string to a UUID is hardly novel.

      1. 5

        I can’t tell if these comments are tongue-in-cheek or not, so apologies if I’m misinterpreting.

        The point of TypeID is not to be novel, nor do we claim to be: people have been using similar techniques to TypeID for a long time, we acknowledge that. The point is to provide a standardized implementation for the pattern, with pre-existing libraries in a wide variety of languages that let you encode/decode the id.

        The choice of the word “modern” (“characteristic of present and recent time”) comes from choosing UUIDv7, a new, upcoming standard, as the base instead of the older UUIDv4 or similar. I’m open to other adjectives that can capture that.

        1. 3

          In truth, your use of “modern” is more reasonable than your use of “type safe”. In what way is any of this actually type safe?

          SQL implementation: you define a compound type as a tuple of (text, uuid). Type-safety here would mean that if my columns were intended for user prefixes, storing any other prefix would be a type error.

          typeid=# select * from test;
                                 id                     | description
            --------------------------------------------+-------------
             (foo,0189043c-655b-7473-b3a3-69f2896595b7) | test 1
             (bar,0189043c-7cf4-79f1-add7-d2214073e52c) | test 1
             (baz,0189043c-8b54-72ff-a8c2-5287adc423fd) | test 1
          

          But a compound type will happily store anything as long as they are strings and uuids. Actually checking the prefix is left as an exercise for the user.

          Why should I even use this rather than a simple CHECK constraint on a string column? The SQL “implementation” is totally superfluous.

          Maybe that’s a limitation of SQL. Let’s look at your Go implementation.

          func From(prefix string, suffix string) (TypeID, error) {
          	if err := validatePrefix(prefix); err != nil {
          		return Nil, err
          	}
          
          	if suffix == "" {
          		uid, err := uuid.NewV7()
          		if err != nil {
          			return Nil, err
          		}
          		suffix = base32.Encode(uid)
          	}
          
          	if err := validateSuffix(suffix); err != nil {
          		return Nil, err
          	}
          
          	return TypeID{
          		prefix: prefix,
          		suffix: suffix,
          	}, nil
          
          }
          

          So now, instead of a tuple that can store any arbitrary prefix, now it’s a struct that can store any arbitrary prefix.

          So, how does the Go type system enforce what prefix you are using? It can’t. Either you have to implement your own wrapper type or check it at runtime.

          I fail to see what you are actually adding in terms of type safety at all.

          You’ve implemented some utility code for generating UUIDv7s in base32, and splitting a string on an underscore. That’s not useless but let’s not oversell it.

          1. 2

            Users of the library do need to implement the wrapper types in order to enforce type checking in their application, the library can’t magically do that for you. The SQL example shows how you can use one of the helper functions to define the check constraint: https://github.com/jetpack-io/typeid-sql/blob/main/example/example.sql#L7

            In either Go or SQL you can only implement those type checks as a user of the library, because the type is encoded as part of the id. If it were “just UUIDv7 in base32” then the type information would be lost.

            You can say that it’s “just UUIDv7 in base32 with a string type prefix”, but the point is in standardizing the encoding (what alphabet in base32, what’s the separator between the type and the uuid, etc.) and having ready-made implementations across different languages. Just because something is simple (both in concept and implementation) doesn’t mean that its standardization is not useful.
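
            For readers wondering what "implementing the wrapper types" could look like in Go, here is a minimal, self-contained sketch. It does not use the TypeID library: `UserID`, `OrderID`, `splitID`, `ParseUserID`, `ParseOrderID`, and `LoadUser` are all hypothetical names, and the suffix is not validated as a base32 UUIDv7 here. The idea is only that the prefix is checked once at parse time, and from then on the compiler keeps the two ID kinds apart.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// UserID and OrderID are hypothetical wrapper types. They carry the same
// kind of data, but the compiler treats them as distinct types.
type UserID struct{ suffix string }
type OrderID struct{ suffix string }

// splitID splits "prefix_suffix" at the last underscore and checks that the
// prefix is the expected one. Suffix validation is deliberately omitted.
func splitID(s, wantPrefix string) (string, error) {
	i := strings.LastIndex(s, "_")
	if i < 0 {
		return "", errors.New("missing '_' separator")
	}
	prefix, suffix := s[:i], s[i+1:]
	if prefix != wantPrefix {
		return "", fmt.Errorf("prefix %q does not match %q", prefix, wantPrefix)
	}
	if suffix == "" {
		return "", errors.New("empty suffix")
	}
	return suffix, nil
}

// ParseUserID accepts only IDs with the "user" prefix.
func ParseUserID(s string) (UserID, error) {
	suffix, err := splitID(s, "user")
	if err != nil {
		return UserID{}, err
	}
	return UserID{suffix: suffix}, nil
}

// ParseOrderID accepts only IDs with the "order" prefix.
func ParseOrderID(s string) (OrderID, error) {
	suffix, err := splitID(s, "order")
	if err != nil {
		return OrderID{}, err
	}
	return OrderID{suffix: suffix}, nil
}

func (id UserID) String() string  { return "user_" + id.suffix }
func (id OrderID) String() string { return "order_" + id.suffix }

// LoadUser takes a UserID, so passing an OrderID is a compile-time error.
func LoadUser(id UserID) string { return "loading " + id.String() }

func main() {
	uid, err := ParseUserID("user_exampleSuffix")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(LoadUser(uid))

	// The runtime check rejects an ID with the wrong prefix.
	_, err = ParseUserID("order_exampleSuffix")
	fmt.Println(err)
}
```

            The runtime check still happens at the boundary where a string enters the program; the wrapper types only guarantee that a value already parsed as one kind is not later used as another.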

            1. 5

              My objection here is that the entire nomenclature of this project is designed to suggest that you are elevating the semantic content of the prefix into the type system.

              You place “type-safe” before k-sortable and globally unique in your description. This is front-and-center of what you’re communicating. The entire point of type safety in a language like Go is that the compiler enforces correctness, not you.

              If I wrote my own string split function and just did

              type UserID struct {
                ID uuid.UUID
              }
              

              it would be no less type-safe than using this library.

      2. 3

        Maybe it’s modern because it’s “type-safe”. Everybody knows that anything before 2015 wasn’t safe.

        1. 2

          Everybody knows that anything before 2015 wasn’t safe.

          This non-subtle jab at Rust is hardly appropriate. The domain of low-level programming languages is infamously unsafe (especially C/C++).

          1. 3

            Low-level languages are, by definition, not type safe: they can express things that define type safety. Both Rust and C++ have tools to enforce type-based restrictions on part of your program, the difference is that Rust’s are opt-out and C++’s are opt-in. Which of these makes more sense depends a lot on the domain. My personal bias is that very few things actually need to opt out of a safe type system (memory allocators, context switch routines, garbage collectors, and a handful of other things) and these things do primarily unsafe[1] things and so opt-in may make more sense. Things that don’t need to implement parts of a type system should not be allowed to opt out at all. I’m also aware that 95%+ of uses of low-level languages are things I think are inappropriate and so my opinion is a minority.

            All of that said, Ada has been doing type-safe low-level programming for a lot longer than Rust and the B5000 had an OS written in a dialect of type-safe (and garbage-collected!) ALGOL60 in 1961, so using type safety in low-level code is not exactly new.

            [1] Or, rather, they should do safe things but the set of invariants that they need to enforce are bespoke to the program, not generic rules that can be applied elsewhere. For example, in snmalloc we use C++ templates to describe a state machine of allowed transitions between kinds of memory, but no other C++ (or Rust) code should have to understand the difference between freed memory on a free list and freed address space returned to the OS but reserved for reuse. Ideally, these things would be written in a language like F* that allows specifying the axioms and invariants that they must preserve and then proving properties on them.

          2. 3

            Not a jab at Rust itself, I love Rust. More a jab at people ignoring all the work that has been done that led to Rust.

            Go is a memory-safe language. Python is a memory-safe language. Haskell is a memory-safe and type-safe language. Etc…

            It is possible with the use of static analysis and sanitizers to write safe C and C++, and those static analysis and sanitizers tools are what led to Rust. I’ve had people tell me that “starting a project in anything but Rust is irresponsible in 2023”. This is a jab at them.

            Yes, low-level programming languages are unsafe, but so is a chainsaw. And prepending a string to another string is something we know how to do safely since a very very very long time, even in unsafe languages.

      3. 1

        Prepending a string to a UUID is hardly novel.

        I also thought that was fairly entertaining. Doesn’t this also make it more difficult to store IDs as your key is no longer fixed-size?

        1. 3

          But it is fixed size. You’d just add size of the prefix for your particular column, as only the given “type” of that key should be stored there. Alternatively, you could strip the prefix, and add it back at the application layer.

    9. 4

      I discovered a few months ago ULIDs. In my DB I always have 2 IDs:

      • the database ID, in PostgreSQL it is the primary key, and of type SERIAL
      • a “business” ID, I used to use UUIDv4, but then I discovered ULIDs and UUIDv7 (which this spec uses)

      Then I computed the type_ prefix from the model’s name and that’s what I returned via the API.

      But ULIDs and UUIDv7 are not the only sortable unique identifier formats, so how does one choose? Flip a coin?

      1. 3

        ulid is great, and before UUIDv7, I would have chosen it as the way to generate globally unique identifiers. With UUIDv7 as an upcoming standard by the IETF, I think it’s now better to choose the standard. Once it’s approved as a standard, that will essentially guarantee wide support across many languages and ecosystems.

        TypeID thus takes UUIDv7 and adds types to it.

      1. 7

        One difference is that devbox wants to almost completely hide nix from you. For example, it automatically installs it for you. The abstraction is thicker, for sure.

        My very naive guess at which you should play with, based on how I decided: if you want an abstraction over nix that you can adopt without ever hearing the word nix, give devbox a try. If you want something to ease you into using nix before you dive in head first, give devenv a try. I have very low confidence that’s a good or accurate heuristic.

        1. 6

          Hi all, I’m the founder of jetpack.io (developers of Devbox), and I agree with your take:

          • Devbox is trying to give you the power of nix, but with a simplified interface that matches the simplicity of a package manager like yarn. If you want reproducible environments by simply specifying the list of packages you depend on, and not having to learn a new language, then devbox is a great match.

          • On the other hand, if you’re trying to do something that is involved and requires the full nix language, then devenv can be a great match.

        2. 3

          and it uses json as a surface language, with all the limitations this implies compared to nix-lang.

          1. 3

            yea, that’s what i surmised just from a cursory look… the devbox abstraction might be quite limiting in some ways, whereas devenv seems thinner and probably therefore less leaky (because it’s designed to leak?)

      2. 5

        Devbox is a commercial product in a pre-commercial “get free contributions” phase.

        1. 10

          I feel like this is misleading (I’m the founder of jetpack.io the company that makes Devbox)

          Yes, jetpack.io is a for-profit business: but we’ve committed to making the Devbox OSS tool free forever. We plan to monetize by offering managed services (Devbox Cloud), but those are an independent codebase, and you only need to pay for them if you want the managed services. If you don’t, then Devbox OSS is, and will forever be, free and completely open source.

          This is similar to how Cachix is a for-profit business. It offers a Cachix managed service that it monetizes. It is committed to making devenv a free open-source project. You only need to pay for Cachix if you want to use the managed service.

          In that sense, both companies/projects monetize in a similar way.

    10. -1

      I’m one of the developers behind devbox, one of the tools described in the article (https://github.com/jetpack-io/devbox). I’m a big believer in the idea that dev environments should be reproducible and portable: take them with you anywhere you want, whether you want to run locally or on the cloud, and ensure you always get the exact same environment no matter what.

      We’d love to hear your thoughts on how you’ve used a Cloud Development environment, and what your experience has been with them.

      1. 4

        I don’t think I have ever used a “real” cloud environment, but I did programming work on a remote server for a while. Nothing special, just me typing into a couple of terminal windows.

        This all went really well, until I started to work while riding the train. It then worked well most of the time, but at least twice during the journey the WiFi would stutter a bit and there would be noticeable lag when typing. Occasionally up to a couple of seconds, so that it disturbed my thought process and I completely lost my focus. This was in a West European country with a high standard of internet connectivity.

        This inconvenience alone was already enough for me to turn away from the idea of developing in the cloud. The idea that I could do just the same tasks on the machine on my lap without needing any external services is a no-brainer. As developers, we know it is best to avoid external dependencies. If you are going to include one, especially a flaky one like connectivity, it had better be worth it. The advantages I see listed are nice. But they can also be worked around.

        So I agree with the author. There is the opportunity to build some nice things, but I should be able to turn it all off and work offline.

        1. 2

          We agree totally. The ideal “cloud environment” is really a portable environment – one where you can easily run it on the cloud, or locally, and where switching is as seamless as possible. This way you can stay productive even if you lose connectivity, or if you are in an environment where you can’t access your dev machine.