1. 48
  1. 21

    This is why people are interested in CRDTs. Or at least, it’s why I’m interested in CRDTs. There seems to be a lot of activity in this space at the moment. Two projects that come to mind are Automerge/Hypermerge, and OrbitDB. The irritating thing about all of these projects right now is that they pretty much require you to write your application in JavaScript, which I really don’t want to do. However, I’m contributing to the Rust port of Automerge so hopefully soon we’ll be able to write Automerge applications in Rust.

    1. 4

      This. I’ve been trying to build something on top of SQLite3 to do this as I need it for Rust. All the projects I come across for this type of thing seem to be geared towards Node.js-type applications.

      1. 1

        I have noticed something very similar, I see a lot of Node.js and JS-in-general projects cropping up such as SSB and OrbitJS. Not sure what that means for the ecosystem, actually.

        1. 1

          I would say that interactive use is probably the main driver so people have written it in front end-ish languages. I don’t think it means anything about this technology or theory.

          1. 1

            I would say your assumptions are reasonable. Thanks for sharing.

      2. 1

        That’s a good new keyword for my research, thanks!

      3. 13

        How close is Irmin to what you want?

        1. 1

          That looks really good, thanks for the tip!

          1. 0

            Hey this looks really cool. Thanks for sharing.

          2. 13

            Sorry to toot my own horn, but this is exactly what the project I work on, Couchbase Mobile, does. It’s an extension of the Couchbase server that allows structured data (string ⟶ JSON mappings, basically) to be synchronized between the server and client devices, or directly between clients (P2P). The client-side storage is a full database with queries and indexing, based on SQLite but with a schemaless JSON data model.

            This is a commercial product, but it’s 99% open source (apache2) and we provide a free “community edition” built from the OSS code. The commercial version adds a few features, like at-rest encryption, and tech support.

            Honestly, I’d be working on this stuff even if Couchbase weren’t behind it. Distributed/synced data has been my obsession for over a decade, because it seems the best way to implement services like social networks without corporate control. (Exactly what @tekknolagi wrote about in their comment here!)

            1. 1

              Interesting, especially that there is also a P2P option. I will take a look at this!

              1. 1

                How does it handle partial syncing? So if the database is too large to be completely copied locally?

                1. 1

                  The server-side component can filter the database sync, either to enforce access controls or to let clients pull down subsets.

                2. 1

                  Last year I built a package manager that used PouchDB internally. The use of replication to avoid concurrent writers was an interesting paradigm shift. Can definitely see more applications for this.

                3. 7

                  I basically want a DVCS that doesn’t operate on text files, but on a proper data model like relational algebra or algebraic datatypes.

                  Same! And <programming language> syntax trees are another such data model.

                  1. 1

                    I do wish we had this built into text editors! I picture https://github.com/lambdaisland/deep-diff, but for every AST. I think it would help catch a whole class of bugs, too!

                  2. 3

                    The hard part is, of course, “define ‘structured data’”.

                    1. 2

                      I give two examples of ways to structure data in the blog post, relational algebra and algebraic datatypes. Those terms have clear definitions.

                    2. 2

                      I want this for messaging. Think: Facebook messenger / WhatsApp / your favorite IM service here, but offline-first, messages are encrypted and can be proxied via other people in cases of lack of internet connectivity, …

                      1. 8

                        Yes! Have you looked at Scuttlebutt? That’s exactly what it does. I have some reservations about their specific schema and protocols, but the idea is right-on.

                        1. 2

                          Yes and no. SSB is interesting, but has a couple drawbacks:

                          • No multi-device support
                          • Private messaging is not first-class
                          • Syncing is very slow
                          • I can’t quite put my finger on it, but the delivery seems sort of hazy… It’s nebulous who has received what, if you’re seeing all of someone’s posts/messages, …
                          1. 4

                            No multi-device support

                            This is kinda in the top 3 requests since forever. Some people are working on it. We tend to call it the SameAs message because it is supposed to link multiple devices under the same identity.

                            Private messaging is not first-class

                            Different clients can choose what kind of messaging they use. A developer can build a client that only sends private messages. A client I’m working with primarily shares content using private messages.

                            Syncing is very slow

                            I think this is a common pattern of offline-first platforms that don’t use a DHT. It takes a while to find peers and start synchronizing. Another important caveat is that SSB verifies each message on a feed and since they are linked to each other in a signature chain, you end up needing to fetch all the messages for a given feed from the start. If you follow someone who has been there for a while this takes some time to sync. The same set of features that makes it slow compared to SaaS is the same set that makes it resilient. SSB can sync without the internet and in times of pandemics and government surveillance this is quite a good feature to have.

                            I can’t quite put my finger on it, but the delivery seems sort of hazy… It’s nebulous who has received what, if you’re seeing all of someone’s posts/messages, …

                            Since it is a gossip protocol there is no way to know who received what. The view of the network is necessarily subjective and this has its charms for people like me who enjoy these aspects. It is a different mode of seeing things and we’ve been trained to experience networked platforms in another way. Someone looking at SSB and trying to observe it through the lens of an “instant message platform” or “facebook” is going to have a hard time. SSB is its own thing and it is quite beautiful (IMHO).

                            There are people using SSB as an alternative to Git and NPM. You can manage repos, install stuff using NPM, all of that without ever touching the internet. There are people playing chess, reviewing books with a goodreads like interface. There is an app for Maori to manage their Whakapapas (family trees). It is very flexible.

                            I’ve been involved with many different clients in that platform and am actively developing Patchfox, which is a client as a WebExtension for Firefox (it still requires a local server running because WebExtensions don’t have TCP and UDP). Feel free to reach out if you want to chat about it.

                            1. 2

                              Syncing is very slow

                              I think this is a common pattern of offline-first platforms that don’t use a DHT.

                              It’s slow (in my experience) even in a minimal setup where I connect to 3 or 4 pubs and only follow a handful of people. The client follows everyone 2 degrees of separation away from those, including everyone the pubs follow, which results in a buttload of posts. IIRC, my database is about 600MB. The JS database code seems pretty slow, too, frequently blocking the client while it re-indexes.
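
A rough sketch of why "2 degrees of separation" balloons so quickly, assuming the replication set is simply everything reachable within a fixed number of follow hops. The function name `feeds_to_replicate`, the `follows` mapping, and the `max_hops` parameter are hypothetical names for illustration, not taken from any SSB client.

```python
from collections import deque


def feeds_to_replicate(follows: dict, me: str, max_hops: int = 2) -> set:
    """Return every feed within max_hops follow edges of `me` (excluding `me`)."""
    hops = {me: 0}          # feed id -> distance from me
    queue = deque([me])
    while queue:
        current = queue.popleft()
        if hops[current] == max_hops:
            continue        # do not expand past the hop limit
        for followed in follows.get(current, ()):
            if followed not in hops:
                hops[followed] = hops[current] + 1
                queue.append(followed)
    return set(hops) - {me}


follows = {
    "me": ["pub", "alice"],
    "pub": ["u1", "u2", "u3"],   # a pub typically follows many feeds
    "alice": ["bob"],
    "bob": ["carol"],            # three hops away, so not replicated
}
replicated = feeds_to_replicate(follows, "me")
```

Following only two feeds directly already pulls in six here, and a single well-connected pub at hop one dominates the total.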

                              1. 1

                                No multi-device support

                                Some people are working on it.

                                That’s good news. It’s really hamstringing my usage, not that anyone really cares about me in particular.

                                A developer can build a client that only sends private messages.

                                Sure, but nobody has built it, and it’s not built into any of the major clients, as far as I can tell. And yes, I could build it if I wanted it that bad, sure, but I would have to make time and shake my reservations about SSB.

                                I think this is a common pattern of offline-first platforms that don’t use a DHT.

                                A DHT would be a really really great bootstrap, if available. DHT + mDNS + BLE would help people who are nearby and far away.

                                Another important caveat is that SSB verifies each message on a feed and since they are linked to each other in a signature chain, you end up needing to fetch all the messages for a given feed from the start.

                                I think this ends up being a plus for private messaging, since you can have some amount of ordering in your messages (is it a complete Lamport clock?) and have “read receipts” based on what signature chain the message you receive has attached.

                                SSB can sync without the internet and in times of pandemics and government surveillance this is quite a good feature to have.

                                Absolutely agreed. I think it’s lacking in visibility as to what’s going on internally when it is syncing.

                              2. 2

                                Yup, those are all problems I have with it too.

                              3. 2

                                I guess here’s one more thing I’m missing: the ability to set a designated relay. Imagine both people are online when using this program. They should be able to communicate in near real time via the relay. Not gossip, in that instance.

                              4. 3

                                Matrix with a homeserver that you host on your device (laptop)? There’s work going on for p2p matrix.

                              5. 2

                                On the other hand there are great decentralized version control systems like Pijul/Git/Fossil and many more that check every requirement that I have, but they are built to work with textual data and are therefore unsuited to be a database backend of a graphical application.

                                I read a while back that the text is properly abstracted in Pijul, so could be swapped out, with a bit of effort.

                                1. 1

                                  That’s good to hear, I have to dig into Pijul some more :)

                                2. 2

                                  If you can come up with a mapping of arbitrary data to text, kind of like the --armor option in PGP, but so that mapping also has some nice functional properties (locality), then you can directly apply a text VCS. However, how to come up with that mapping isn’t obvious.
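
One naive attempt at such a mapping, purely as a sketch: flatten nested data to one sorted `path = value` line per leaf, so that changing one field changes one line. The helper name `to_lines` and the path syntax are invented for illustration, and string dictionary keys are assumed.

```python
import json


def to_lines(value, path=""):
    """Flatten nested dicts/lists into 'path = json' lines, one leaf per line."""
    if isinstance(value, dict) and value:
        lines = []
        for key in sorted(value):          # sorted keys give a stable line order
            lines += to_lines(value[key], f"{path}/{key}")
        return lines
    if isinstance(value, list) and value:
        lines = []
        for index, item in enumerate(value):
            lines += to_lines(item, f"{path}/{index}")
        return lines
    # Leaves (and empty containers) are written as a single JSON value.
    return [f"{path} = {json.dumps(value)}"]


before = {"user": {"name": "Ann", "tags": ["x", "y"]}, "n": 1}
after = {"user": {"name": "Bob", "tags": ["x", "y"]}, "n": 1}
changed = [a for a, b in zip(to_lines(before), to_lines(after)) if a != b]
```

This has locality for field edits, but not for lists: inserting an element at the front renumbers every later path and touches every later line. So it shows the idea and also why a good mapping is not obvious.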

                                  1. 2

                                    Have you heard of Irmin?

                                    1. 1

                                      Seems similar to OrbitJS as I mentioned previously.

                                      1. 2

                                        Maybe, never heard of that before. I suspect Irmin might have a stronger theoretical background, it originated from a PhD thesis, but it’s also in OCaml which might slow down adoption.

                                        1. 1

                                          it’s also in OCaml which might slow down adoption.

                                          Agreed, this is a drawback in my book. Not sure where to go from here.

                                    2. 2

                                      Is this project somehow related https://github.com/attic-labs/noms ?

                                      Noms is a decentralized database philosophically descendant from the Git version control system.

                                      1. 2

                                        We built a database product that uses noms as its backing store: https://github.com/liquidata-inc/dolt. It’s basically Git for data, with a command line that copies Git’s. See my comment elsewhere in this thread for details.

                                        1. 1

                                          Interesting, thanks for the link!

                                        2. 2

                                          We’re building Git for Data: https://github.com/liquidata-inc/dolt

                                          Dolt is a SQL database that stores table data in a Merkle DAG of commits, identical to git. This means you can clone, fork, branch, and merge your databases in a distributed fashion. Merges happen on a row-by-row, cell-by-cell basis. The dolt command line is a clone of git, so there’s no learning curve for people already familiar with git. If you also know SQL, then dolt sql pops you into a SQL shell where you can select or modify the data, create new tables, etc. Or start a mysql compatible server with dolt sql-server. Then when you’re done making updates, you can dolt add .; dolt commit -m "updates". If you also are using DoltHub, then push your changes back to master with dolt push origin master.
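
For readers wondering what a row-by-row, cell-by-cell merge could look like, here is a minimal three-way merge sketch over in-memory tables shaped `{primary_key: {column: value}}`. It is not Dolt's implementation: the function name `merge_tables`, the table shape, and the "keep ours and report it" conflict policy are assumptions for illustration, and a missing column is treated the same as a NULL.

```python
def merge_tables(base: dict, ours: dict, theirs: dict):
    """Three-way merge; returns (merged, conflicts) with conflicts as (key, column)."""
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(key), ours.get(key), theirs.get(key)
        if o == t or t == b:
            row = o                      # both agree, or only ours changed
        elif o == b:
            row = t                      # only theirs changed
        elif o is None or t is None:
            # One side deleted the row while the other edited it.
            conflicts.append((key, None))
            row = o if o is not None else t
        else:
            b = b or {}                  # row added on both sides: empty base
            row = {}
            for col in sorted(set(b) | set(o) | set(t)):
                bv, ov, tv = b.get(col), o.get(col), t.get(col)
                if ov == tv or tv == bv:
                    row[col] = ov
                elif ov == bv:
                    row[col] = tv
                else:
                    conflicts.append((key, col))
                    row[col] = ov        # keep ours, but report the conflict
        if row is not None:
            merged[key] = row
    return merged, conflicts


base = {1: {"name": "a", "qty": 1}, 2: {"name": "b", "qty": 2}}
ours = {1: {"name": "a", "qty": 5}, 2: {"name": "b", "qty": 2}}
theirs = {1: {"name": "z", "qty": 1}, 2: {"name": "b", "qty": 2}, 3: {"name": "c", "qty": 3}}
merged, conflicts = merge_tables(base, ours, theirs)
```

Two branches editing different cells of the same row merge cleanly here; only edits to the same cell (or delete versus edit) are reported as conflicts.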

                                          The mashup of SQL and git lets us do some interesting things. For example, I can run queries on a previous commit with SELECT * FROM table AS OF 'HEAD~', or on a branch with SELECT * FROM table AS OF 'branch-name'.

                                          Anyway, we’re pretty excited about contributing to this space. We have a blog that we’ve been updating several times a week if you’re interested in following along: https://www.dolthub.com/blog/

                                          1. 1

                                            Cool, I will check that out!

                                            1. 1

                                              Hope you find it useful. Feel free to message me with any questions! We’re also always interested in PRs if you have a contribution you want to make.

                                          2. 2

                                            You might want to check out Dolt. It aims to be “Git for relational data”. They have a blog post explaining what that means, but it’s basically what you’re asking for.

                                            Dolt is open source, written in Go. It’s usually deployed as a single executable that you interact with via CLI or its built-in MySQL server. Last time I dove into the source code, it included the software needed to run your own remote server. If you don’t want to run a remote network service, it also includes drivers to let you push and pull data sets from a file, S3 bucket or GCP bucket. It has roots in the defunct Noms project, if you’re familiar with that.

                                            Dolt is developed by Liquidata. They plan to monetize it with a hosted offering called DoltHub that basically follows Github’s pricing model (i.e. pay for privacy).

                                            1. 2

                                              Have you looked at any EAVT systems? They store facts about an entity with a time stamp that you can then go back and do calculations over. You can ask it “everything about this entity at this time” or “what has changed about this entity between time x and y”.

                                              Clojure has a few examples like the commercial Datomic database, the in-memory datascript, or the differential-datalog-based clj-3df.

                                              You can obviously combine streams of EAVT “datoms” but then you essentially have “last write wins”. But if you have the ability to ask “what has changed since these were last synchronized” you can do your own merges and branching. Datomic has transactions built into it which gives you a ff-only merge situation.

                                              https://en.m.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model describes a non-time-indexed version. Datalog, with some time added, is what most Clojure projects use: https://en.m.wikipedia.org/wiki/Datalog
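
A small sketch of the two EAVT questions mentioned above, using a plain list of facts. This is not the Datomic or datascript API: the `Datom` tuple, the `added` flag for assertions versus retractions, and both function names are assumptions for illustration, and every attribute is treated as single-valued.

```python
from collections import namedtuple

# entity, attribute, value, transaction time, and whether the fact was
# asserted (True) or retracted (False).
Datom = namedtuple("Datom", "entity attribute value tx added")


def entity_as_of(datoms, entity, tx):
    """Everything known about `entity` at transaction time `tx`."""
    state = {}
    for d in sorted(datoms, key=lambda d: d.tx):
        if d.entity != entity or d.tx > tx:
            continue
        if d.added:
            state[d.attribute] = d.value      # later assertions overwrite earlier ones
        else:
            state.pop(d.attribute, None)      # a retraction removes the attribute
    return state


def changes_between(datoms, entity, start, end):
    """Facts about `entity` recorded with start < tx <= end, in tx order."""
    return [
        d for d in sorted(datoms, key=lambda d: d.tx)
        if d.entity == entity and start < d.tx <= end
    ]


log = [
    Datom("u1", "name", "Ann", 1, True),
    Datom("u1", "email", "a@x", 2, True),
    Datom("u1", "name", "Anna", 3, True),
    Datom("u1", "email", "a@x", 4, False),
]
```

`changes_between` is the piece a custom merge would build on: it returns exactly the facts recorded since the last synchronization point.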

                                                1. 1

                                                  Wondering also about OrbitJS. Do you have any thoughts about it?