1. 64
    1. 18

      The observation of this article is spot on! CRDTs are an awesome formal model for distributed data structures, but I was always bothered by the notion that all conflicts must be resolved automatically (hence also the name, conflict-free replicated data type). As the article illustrated, this is a hopeless endeavor. I believe what is needed is a proper structured representation of conflicts, that allows for sharing them and resolving them collaboratively, giving back control to the users and supporting them in the process. One of my favorite papers Turning Conflicts into Collaboration makes a compelling argument for this idea.

      As part of my ongoing PhD studies, we have developed our own formal model for structured conflict representation, based on lattice theory: Lazy Merging: From a Potential of Universes to a Universe of Potentials . Incidentally, it is also a CRDT, but it does not attempt to resolve conflicts automatically. Instead, it represents them within the collaborative documents. Approaching the problem from mathematics allowed us to arrive at a simple conceptual model that can guarantee strong properties, like, e.g., the completeness, minimality, and uniqueness of merges, even after repeated merges of existing conflicts. And merges can be calculates very easily. I always wanted to make a blog post about it, but I never came around to do it.

      1. 5

        I always wanted to make a blog post about it, but I never came around to do it.

        I would love to read that blog post, if you ever get around to writing it. I’m sure I’m not the only one.

        1. 5

          You’re probably aware of it already, but channels (rough equivalent of branches) in the newish VCS Pijul are CRDTs that represent conflicts rather than resolving them automatically. So if two people push conflicting changes to the same channel it’s not an error, but a third change is needed to resolve the conflict before the channel state can be checked out into normal files again.

          https://pijul.org/manual/theory.html

          1. 5

            So it seems to me like the next step along this line of thinking is in the UI. How do we most intuitively represent conflicts to users and allow them to interact with them?

            1. 1

              We have environments to seek inspiration from where users regularly resolve synchronisation and conflict resolution in a shared space with more varied constraints and tighter timing windows with some reverse engineering tacked on for good measure; heck they even seek that specific challenge out: raid bosses in MMORPGs. Combination of body language, verbal communication and paratext ( https://www.youtube.com/watch?v=BKP1I7IocYU ).

            2. 3

              it does not attempt to resolve conflicts automatically. Instead, it represents them within the collaborative documents

              That’s essentially CouchDB’s document model. A document is a JSON object with a tree of revisions. Most often that’s linear, with one current (leaf) revision. But if two diverging versions are synced, it branches. One revision is (semi-arbitrarily) picked as current, but an application can compare the leaves and save a merged revision to resolve the conflict. A bit like git, except without named branches.

              (Couchbase Mobile, a CouchDB descendent, still works like that on the client side, but for efficiency reasons the server side has a more strict MVCC model that requires clients resolve conflicts before pushing.)

            3. 6

              Really slick post! One suggestion: could you add the explanation for why the example works the way it does to the success case? I guessed it correctly purely based on “what would be the most perverse answer”, and had to guess incorrectly to see the explanation!

              1. 2

                Wow thanks Hillel! I hope you are doing well. Yes—that is a good suggestion and I can make the change tomorrow no problem. Thanks for chiming in!

              2. 7

                Bob changes the spelling of Color to the British Colour. Alice deletes all of the text.

                I think the idea that an algorithm can find a “correct” result here is fundamentally wrong.

                1. 16

                  If you make it to the end of the post, you’ll see the author agrees!

                  1. 5

                    I don’t think any merging algorithm can ever be correct in the general case. I add a “as we have mentioned earlier …” to the last paragraph and you remove or change that mention, now the whole text has lost its coherence.

                    In software development, we deploy so many measures to fight against issues like this, like writing test suites, type systems, code review etc.

                    So I don’t think any algorithmic change to a body of text could ever be “correct” in the general case.

                  2. 2

                    related, a short exploration i wrote a couple of years ago after chatting a bit with john cowan: https://elronnd.net/patches.txt. focused more on code but the high level conceptual models overlap

                    1. 2

                      One thing that would help merging/composing problems is to record higher-level edition operations in a tooling-friendly format.

                      In the OP’s example, the change is understood by the tool as “inserting u at position N”. But if instead we recorded “Replace color by colour” (and for example spell-checking programs could easily generate commit messages where this is recorded in a tooling-friendly way), then it would be obvious how to compose with a patch that removes (or adds) occurrences of color.

                      In your example, a commit fixes a bug in two copies of the same logic, and another commit adds two other copies of the same (unfixed) logic. The original commit could come with a semantic-grep format description of the flawed pattern, for example as consumed by Coccinelle. If the description only knows how to recognize the flawed piece of code, then it could alert the user that new occurrences have shown up at merge time – and require intervention. If the description also knows how to transform the flawed code into the correct version, then it could update the two new copies. Tooling could help reverse-engineer these automated descriptions from manual code changes.

                    2. 2

                      Really great post.

                      I was initially confused to see OT lumped in with CRDT because I didn’t realize it tried to be conflict free. I used to work on a p2p app in the 2000s that used change synchronization, in a way that is very similar to OT, but it would only resolve unambiguously correct conflicts. I thought that was OT but I guess not. TIL!

                      I love the conclusion “Offline editing is a UI/UX problem, not an algorithms problem”

                      My favorite example of this idea is git. Software developers are so used to git’s (default) theirs/yours merge UX that it’s hard to explain to them that it is IMPOSSIBLE to properly merge source code without the base. In other words: git is solid because it won’t lose data on its own, but its UX demands that you lose the data for it as an unwilling accomplice, and we’re somehow okay with that. No one says “git’s default configuration loses data” simply because it is the UX that loses the data, not the underlying algorithms.

                      1. 1

                        btw https://www.moment.dev/blog doesn’t load for me in Chrome or Firefox, probably because it’s making failed requests to localhost:3333

                      2. 1

                        I can’t help thinking that there is a lot of prior art in managing version conflicts in source code management systems. Perhaps the only real solution is to allow the person to manually do the merge if there is a conflict. Possibly keeping a history of prior resolutions, as some version control systems do.

                        1. 1

                          Interesting post! I’ve been working on a simple database (document store) to support offline-first. It really depends on the business case what merge strategy is appropriate and what can be automated. In principle a conflict is something that can not be resolved in a generic way.

                          1. 5

                            This didn’t make it into the post but I actually do recommend people think about this as a database problem.

                            If I told you that I was going to have two database write replicas partitioned and accepting completely divergent writes and then I was going to use CRDTs or OT or some other ✨ magic ✨ to merge them together, you’d rightfully balk. But if one of those replicas is a browser with a text document in it or something, somehow a lot of people think that’s ok!

                            The cases this is fine are generally cases where direct conflicts are unlikely, or you can simply ignore direct conflicts, or the data is restricted enough there is no such thing as a conflict (like a monotonically increasing count).

                            1. 3

                              I think this is the right mental model. I also think that it’s unhelpful that we’ve strayed from CRDTs as I learned them.

                              When I was first introduced to the idea they were called “Commutative Replicated Data Types”. And I don’t think that name is any better or worse than conflict-free, but I do think it gives us a tool to talk about what you’re bring up in the blog post.

                              For instance in your above question, if the data type was an integer, and the operation was “sum”. I would pretty much be on board with saying that we CAN magically merge them! (with some edge cases around maximums?). Sum on integers I think we’ll all agree, is at least mostly, commutative.

                              But are document interactions commutative? I think that the answer is probably “NO”.

                              [edit] Immediately after writing this I got to wondering about when commutative became conflict-free and I wonder if one of the folks working on textual CRDTs had the same pedantic math realization and changed the name.

                              1. 1

                                Anders! How are you? Agree with all of this, and also, your last point is kind of funny. Another thing we took out was a section with a similarly pedantic header: “‘Conflict-free’ as in ‘we pretend conflicts do not exist”. We pulled it because it felt mean-spirited and we didn’t want to make it a post about making fun of people, and in any event that name is probably not changing.

                                1. 1

                                  I’m great! Sent an email so we don’t have to catch up via lobste.rs :-D