1. 32

  2. 8

    See also Y.js, another widely used CRDT library in JS. I’ve read about both Automerge and Y.js; my personal assessment is that Y.js has more users (e.g. Input, a commercial note-taking app that competes with Notion, my employer) and more support (it offers consulting and advertises bindings for the major JS editor frameworks) than Automerge. For a while the Y.js CRDT also had clear advantages over Automerge, but then Martin landed a big rework, and I now expect they’re on roughly the same performance footing. I’d still need to benchmark them myself before I pick one for my next project.

    Another aspect to keep in mind for these libraries: can you write your database logic against the CRDT in Rust (or any other non-JS language)? Because CRDTs are CPU-expensive at scale, I would expect that using Rust or another threaded language that doesn’t die when CPU work happens on the server would eventually be required for a serious-scale backend for a CRDT service. Both Yjs and Automerge also have Rust port projects in the early stages.

    1. 5

      I’m a maintainer of the Rust port of Automerge. One of the big reasons I’m interested in it is less about performance and more about interoperability. Because Rust has no runtime and has a C-like FFI, it’s possible to build wrappers in other languages that use the Rust codebase for the complicated CRDT parts. E.g., this experimental library for Python: https://github.com/automerge/automerge-py

      1. 2

        Thank you for your work! I didn’t mean it just in the “Rust is a fast language” sense. You wrote:

        I’m interested in it is less about performance and more about interoperability.

        To me interoperability is also a performance advantage because I have more freedom to choose an ideal solution when I can run the CRDT algorithm in any process in my cluster, because Rust/C is embeddable.

        An example idea in the scaling solution space that uses Rust would be to add an Automerge column type and operators to Postgres as an extension that uses your library, so that my front-end web server doesn’t need to read the entire document from Postgres and can just pass the diff on to the database directly by writing UPDATE doc SET doc.crdt = automerge(doc.crdt, $update) WHERE doc.id = $id.
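
        A rough sketch of the shape of that idea, in TypeScript rather than SQL. Everything here is an assumption made for illustration: `LwwMap`, `merge`, and `DocStore` are hypothetical names, and the toy last-writer-wins map stands in for Automerge's real document format and merge logic, which this does not reproduce. The point is only that the storage layer owns the merge, so the caller sends the update and never reads the whole document.

        ```typescript
        // Toy state-based CRDT: a last-writer-wins map. This is NOT Automerge's data model.
        type Stamp = { counter: number; actor: string };
        type Entry = { value: string; stamp: Stamp };
        type LwwMap = { [key: string]: Entry };

        // Total order on stamps: higher counter wins, actor name breaks ties.
        function newer(a: Stamp, b: Stamp): boolean {
          return a.counter !== b.counter ? a.counter > b.counter : a.actor > b.actor;
        }

        // Hypothetical analogue of automerge(doc.crdt, $update).
        function merge(stored: LwwMap, update: LwwMap): LwwMap {
          const out: LwwMap = {};
          for (const key of Object.keys(stored)) {
            const entry = stored[key];
            if (entry !== undefined) out[key] = entry;
          }
          for (const key of Object.keys(update)) {
            const incoming = update[key];
            if (incoming === undefined) continue;
            const existing = out[key];
            if (existing === undefined || newer(incoming.stamp, existing.stamp)) {
              out[key] = incoming;
            }
          }
          return out;
        }

        // Stand-in for the database table: the merge happens here, next to the data.
        class DocStore {
          private docs: { [id: string]: LwwMap } = {};

          // Analogue of the UPDATE statement: the caller passes only the update.
          applyUpdate(id: string, update: LwwMap): void {
            const current = this.docs[id];
            this.docs[id] = merge(current === undefined ? {} : current, update);
          }

          read(id: string, key: string): string | undefined {
            const doc = this.docs[id];
            if (doc === undefined) return undefined;
            const entry = doc[key];
            return entry === undefined ? undefined : entry.value;
          }
        }

        const store = new DocStore();
        store.applyUpdate("doc-1", { title: { value: "Draft", stamp: { counter: 1, actor: "a" } } });
        store.applyUpdate("doc-1", { title: { value: "Final", stamp: { counter: 2, actor: "b" } } });
        // A stale update arriving late does not overwrite the newer value.
        store.applyUpdate("doc-1", { title: { value: "Old", stamp: { counter: 1, actor: "c" } } });
        console.log(store.read("doc-1", "title")); // prints "Final"
        ```

        Because `merge` gives the same result regardless of arrival order, the front-end server can forward updates as they come in without coordinating with other writers.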

      2. 2

        Thanks for sharing! I’m trying to figure out what storage/conflict-resolution library to use as the basis of an app that works in three modes: 1) browser-only 2) desktop 3) SaaS. Y.js and Automerge are now at the top of my list as generic solutions to my problem.

        1. 2

          Keep in mind that CRDTs come with space trade-offs: a CRDT storing a text will always use more space than a plain string, and its space usage may grow without bound as it stores history. For local-only or offline-first use, this is usually not a problem, but if you make a popular-enough SaaS, you’ll need to think about compaction and dropping old histories, as well as transmission costs when syncing data between nodes.
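
          To make the space cost concrete, here is a toy sketch of tombstone bookkeeping in a text CRDT. `ToyText` is a hypothetical single-replica illustration, not how Automerge or Y.js store text, and it has no merge logic at all. It only shows that deleted characters keep occupying storage until something compacts them, and `compact` assumes every replica has already seen the deletions.

          ```typescript
          // One stored element per character ever inserted; deletion only sets a flag.
          type Elem = { id: string; ch: string; deleted: boolean };

          class ToyText {
            private elems: Elem[] = [];
            private counter = 0;
            private actor: string;

            constructor(actor: string) {
              this.actor = actor;
            }

            // Map an index in the visible text to an index in the stored array.
            private physicalIndex(visibleIndex: number): number {
              let seen = 0;
              let i = 0;
              for (const e of this.elems) {
                if (!e.deleted) {
                  if (seen === visibleIndex) return i;
                  seen++;
                }
                i++;
              }
              return this.elems.length;
            }

            insert(visibleIndex: number, ch: string): void {
              this.counter++;
              const elem: Elem = { id: this.actor + ":" + String(this.counter), ch: ch, deleted: false };
              this.elems.splice(this.physicalIndex(visibleIndex), 0, elem);
            }

            // Deleting keeps the element as a tombstone so other replicas could still refer to its id.
            remove(visibleIndex: number): void {
              const target = this.elems[this.physicalIndex(visibleIndex)];
              if (target !== undefined) target.deleted = true;
            }

            text(): string {
              return this.elems.filter((e) => !e.deleted).map((e) => e.ch).join("");
            }

            storedCount(): number {
              return this.elems.length;
            }

            // Assumption: only safe once every replica has seen the deletions.
            compact(): void {
              this.elems = this.elems.filter((e) => !e.deleted);
            }
          }

          const doc = new ToyText("a");
          "hello".split("").forEach((ch, i) => doc.insert(i, ch));
          for (let n = 0; n < 3; n++) doc.remove(0);
          console.log(doc.text(), doc.storedCount()); // visible text is "lo", but 5 elements are stored
          doc.compact();
          console.log(doc.text(), doc.storedCount()); // still "lo", now 2 elements stored
          ```

          The per-character `id` is also where the "more space than a plain string" overhead comes from, even before any deletions happen.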

          I think the simplicity of CRDTs makes them a better choice than the Operational Transform model, especially for 0->1 scale, but I haven’t come up with a backend architecture for scaling big document CRDTs that doesn’t involve one million shards and hand waving at the storage layer.

          1. 1

            Yeah for sure. History aside from the current “session” is not incredibly important I think.

        2. 1

          Non-JS implementations are also important for native mobile & desktop apps, as well as embedded systems. Not all client-side code is built for web browsers (or for web browsers packaged into a facsimile of an app.)

          1. 1

            … I would expect that using Rust or another threaded language

            You don’t need a server in the first place, that’s part of the point of a CRDT; but a side effect is that if you want to have one (say passively making backups) the server can be rather lazy.

            1. 3

              You don’t need a server in the first place

              I think this is a reductive attitude. CRDTs enable offline-first authoring and system design, but many use-cases still work best with an online and timely server component.

              Even for a simple use-case of writing a shopping list on my desktop at home and then viewing that list on my phone at the market, an always-on server node reachable somehow on the public internet is very beneficial to facilitate syncing. Otherwise, I need to make sure my two devices are active on the same network at the same time, and I need to verify they show the same state before I break their connection.

              1. 1

                Did you seriously not read everything after the semicolon?

                1. 2

                  I did, sorry for not including this in my initial response.

                  Laziness is a fine solution for the family-and-friends use-case, and it certainly relaxes some constraints, but at scale you don’t have “down time” you can defer work to, and you always need enough CPU cores and storage bandwidth to burn through the sync/backup queue so it doesn’t grow unbounded. All I was trying to say is that having a Rust or C implementation of the CRDT makes the scaling solution space broader and much less expensive.

          2. 4

            I’ve been working on a similar project, Osmosis. I’m currently working on a rewrite that adds strong typing (based on a simplified JSON schema language) to make it truly conflict-free.

            The main thing Osmosis adds on top of Automerge or Y is a full network stack, using UDP service discovery to find other copies of an app on the same network and automatically sync in the background.

            1. 2

              The pairing stuff sounds cool! Is there something about your service discovery layer that makes it difficult to re-use Yjs or Automerge for the data layer?

              1. 2

                Honestly I started working on it before I was aware of Automerge or Y.js, and so I haven’t seriously considered reworking it to replace my own CRDT with one of those.

                I haven’t looked deep enough into other CRDTs to know if they fulfill this requirement, but one of my goals for Osmosis is to support at-rest encryption, and the CRDT was designed to make this easy. For example, the Osmosis CRDT can be easily stored in a LevelDB-like key-value store: each action is stored at a key named with the action’s ID (UUID + Lamport number), while the JSON structure itself is split up into one entry per node, where the keys are the node paths (a sequence of integers and/or strings, encoded in MessagePack). Queries and insertions use JsonPath, and the JsonPath parser emits a set of these binary path strings. And this (in theory) makes at-rest encryption really easy: path strings can be hashed (with a unique salt), values and actions can be encrypted with a key, and the action IDs can be left unencrypted.
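
                A hedged sketch of that key layout, for readers who want to see it spelled out. This is not Osmosis's actual code: every name here (`actionKey`, `nodeKey`, `putAction`, `putNode`, `Primitives`) is hypothetical, JSON encoding stands in for MessagePack, and the hash and cipher are caller-supplied placeholders rather than real cryptography.

                ```typescript
                type PathSegment = string | number;
                type KeyValueStore = { [key: string]: string };

                // Caller-supplied primitives. A real system would plug in a salted
                // cryptographic hash and a real cipher; this sketch provides neither.
                type Primitives = {
                  hashPath: (encodedPath: string) => string;
                  encrypt: (plaintext: string) => string;
                };

                // Action IDs (UUID + Lamport number) are left unencrypted.
                function actionKey(uuid: string, lamport: number): string {
                  return "action/" + uuid + "/" + String(lamport);
                }

                // One entry per JSON node, keyed by its hashed path.
                // Assumption: JSON encoding stands in for MessagePack here.
                function nodeKey(path: PathSegment[], p: Primitives): string {
                  return "node/" + p.hashPath(JSON.stringify(path));
                }

                function putAction(store: KeyValueStore, uuid: string, lamport: number, action: unknown, p: Primitives): void {
                  store[actionKey(uuid, lamport)] = p.encrypt(JSON.stringify(action));
                }

                function putNode(store: KeyValueStore, path: PathSegment[], value: unknown, p: Primitives): void {
                  store[nodeKey(path, p)] = p.encrypt(JSON.stringify(value));
                }

                // Placeholder primitives that only make the data flow visible. They give NO secrecy.
                const placeholder: Primitives = {
                  hashPath: (encodedPath) => "hash(" + encodedPath + ")",
                  encrypt: (plaintext) => "enc(" + plaintext + ")",
                };

                const kv: KeyValueStore = {};
                putAction(kv, "peer-1", 7, { op: "set", path: ["logins", 0, "password"] }, placeholder);
                putNode(kv, ["logins", 0, "password"], "hunter2", placeholder);
                console.log(Object.keys(kv));
                ```

                With real primitives swapped in, only the `action/...` keys would stay readable; node keys and all values would be opaque to anyone without the key and salt.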

                This allows an app like a password vault to run in “locked” mode by default (without the encryption key in memory), but still receive and propagate encrypted CRDT actions. As soon as it’s unlocked, it can apply all newly-received actions to the database.
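
                The locked mode can be sketched as a replica that relays ciphertext it cannot read and defers applying it. `Replica`, `receive`, `unlock`, and the other names are hypothetical, and the `decrypt` callback is a placeholder for whatever cipher the app uses; none of this is taken from Osmosis itself.

                ```typescript
                type EncryptedAction = { id: string; ciphertext: string };
                type Decrypt = (ciphertext: string, key: string) => string;

                class Replica {
                  private key: string | null = null;
                  private pending: EncryptedAction[] = [];
                  private outbox: EncryptedAction[] = [];
                  private applied: string[] = [];
                  private decrypt: Decrypt;

                  constructor(decrypt: Decrypt) {
                    this.decrypt = decrypt;
                  }

                  // Works with or without the key: the action is always queued for relaying to peers.
                  receive(action: EncryptedAction): void {
                    this.outbox.push(action);
                    if (this.key === null) {
                      this.pending.push(action); // locked: keep the ciphertext for later
                    } else {
                      this.applied.push(this.decrypt(action.ciphertext, this.key));
                    }
                  }

                  // Unlocking applies everything that arrived while locked, in arrival order.
                  unlock(key: string): void {
                    this.key = key;
                    for (const action of this.pending) {
                      this.applied.push(this.decrypt(action.ciphertext, key));
                    }
                    this.pending = [];
                  }

                  appliedActions(): string[] {
                    return this.applied.slice();
                  }

                  relayQueue(): EncryptedAction[] {
                    return this.outbox.slice();
                  }
                }

                // Placeholder "cipher" so the sketch runs; it provides no security.
                const fakeDecrypt: Decrypt = (ciphertext, key) => ciphertext.replace("enc[" + key + "]:", "");

                const vault = new Replica(fakeDecrypt);
                vault.receive({ id: "peer-1/1", ciphertext: "enc[k]:set a=1" });
                vault.receive({ id: "peer-2/1", ciphertext: "enc[k]:set b=2" });
                console.log(vault.appliedActions().length, vault.relayQueue().length); // 0 applied, 2 ready to relay
                vault.unlock("k");
                console.log(vault.appliedActions()); // both actions applied after unlocking
                ```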

                Another thing the Osmosis CRDT was designed for: an extremely uniform JSON-based API. I expect the final version to be usable and callable from several different languages. So I tried to base the entire API around JSON, JsonPath, and subscribing to query events. Osmosis’s internal state (“metadata”) is treated as a read-only part of the database, and is accessible using the same API. So, if you want to ask a question like “which peers am I connected to?” or “which actions were performed by peer X?” or “what was the state of the database 3 days ago?”, you can get this information through JsonPath queries that look exactly like queries for ordinary data.

                1. 2

                  Sounds like a good set of principles! I look forward to the fruits of your effort.

            2. 3

              Not directly related to this module, but it has a dependency on immutable.js.

              At one point in time immutable.js was no longer actively maintained [1] (and I took some pains to switch over, as I was using it very lightly and it was quite big for our app’s KB budget).

              But it looks like somebody has picked up maintenance of this big but otherwise wonderful library.

              [1] https://github.com/immutable-js/immutable-js/issues/1689