Very neat. I like how Niki starts from where people are likely to be (no server of their own, but relatively easy access to Syncthing or Dropbox), and then builds a bridge to a design that works well with the platform and for the programmer.
To my knowledge the best work done on this to date is DecSync: https://github.com/39aldo39/DecSync
I’ve used it; it has some warts but it does work and I think a version 3 of that protocol would help tremendously in this effort.
Are we just re-inventing distributed databases here?
No. CRDTs are a family of generic data structures that can withstand network partitions without conflicts (hence the C for Conflict-free). Distributed databases overlap with this space but they’re not the same thing, and I don’t know of many dbs that use CRDTs in practice.
What happens with your distributed database of choice when there’s a network partition between two of its nodes, the same key is modified on both in a conflicting way, and then the network connectivity gets restored?
CRDTs assume that you can model and automate conflict resolution for your data model. Depending on your needs this may be easy or difficult. If you can’t do that modeling for some reason, then you can’t really use a CRDT.
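To make "model and automate conflict resolution" concrete, here is a minimal sketch of one of the simplest CRDTs, a last-writer-wins map. The names (`Entry`, `merge`), the integer logical clock, and the node-id tie-breaker are my own assumptions for illustration, not anything taken from the blog post. The point is only that the merge rule is fixed in advance, so both sides of a partition converge without asking anyone.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    value: str
    timestamp: int  # assumed logical clock, bumped on every local write
    node_id: str    # assumed unique per device; breaks timestamp ties


def merge(a: dict[str, Entry], b: dict[str, Entry]) -> dict[str, Entry]:
    """Merge two replicas key by key, keeping the entry with the
    larger (timestamp, node_id) pair. Returns a new dict."""
    merged = dict(a)
    for key, entry in b.items():
        current = merged.get(key)
        if current is None or (entry.timestamp, entry.node_id) > (
            current.timestamp,
            current.node_id,
        ):
            merged[key] = entry
    return merged


# Two replicas edit the same key while partitioned.
laptop = {"title": Entry("Groceries", 3, "laptop")}
phone = {"title": Entry("Shopping list", 4, "phone")}

# Merge order does not matter: both sides end up with the same state.
assert merge(laptop, phone) == merge(phone, laptop)
```

Whether "later write silently wins" is acceptable is exactly the modeling question: fine for a note title, probably not for an account balance.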
If you haven’t already, I recommend taking a look at the local-first paper and the conf talk linked at the beginning. There’s a rationale behind opting for local (offline-first) data storage, and then a sort of call to experimentation on server-side components that can sync state across devices. I see this blog post as a little thought experiment in that space.
I would think of any database as a specific data structure with 0 or more wire protocols and specific semantics associated with partition tolerance, availability, and consistency. This feels like a database without a specific line protocol defined that has chosen AP over C and has a specific conflict resolution defined.
Do you have an example of an AP over C distributed database without a specific line protocol defined?
Git or other DVCSs seem similar, but I think they only have manual conflict resolution.
Exactly my take. I feel I’m missing something here, as I don’t see the problem or how an awful Dropbox hack could be easier than git or even a simple scp.
I have a couple of devices that can never reach each other over a network, but files happily get shuttled between them by Syncthing as an intermediate laptop moves hither and yon. Sometimes sneakernet is forced upon you by the rules of organisations; sometimes it’s easier not to have to think about another network security boundary.
Git requires the user to manually fix merge conflicts.
Your scp suggestion boils down to the user always having to copy the data from the node that was edited last right before editing; otherwise you degenerate to document.jakesVersion.final.FINAL.docx style version management.
In CRDT-based data, there’s no intervention needed from the user. Everything just works. If I’m editing notes on my phone and my laptop, I do not want to muck around with manual copy/push/pull/diff operations. I just want to type my notes, like I do today with my BigCo managed notes app, but ideally without the BigCo.
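As a rough illustration of how "no intervention" can work on top of a dumb file syncer, here is a sketch under one big assumption: each device only ever appends to its own file in the synced folder, so the syncer never sees two writers on the same file. The file layout (`<device>.jsonl`), the function names, and the record format are all hypothetical, and this may well differ from what the blog post actually does.

```python
import json
import tempfile
from pathlib import Path


def write_note(folder: Path, device: str, note_id: str, text: str) -> None:
    """Append a note to this device's own file (hypothetical layout).
    No other device writes to this file, so file sync cannot conflict."""
    path = folder / f"{device}.jsonl"
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps({"id": note_id, "text": text}) + "\n")


def read_all(folder: Path) -> dict[tuple[str, str], str]:
    """Build the merged view as the union of every device's file.
    Keys are (device, note_id), so entries from different devices
    can never collide."""
    notes: dict[tuple[str, str], str] = {}
    for path in sorted(folder.glob("*.jsonl")):
        device = path.stem
        for line in path.read_text(encoding="utf-8").splitlines():
            record = json.loads(line)
            notes[(device, record["id"])] = record["text"]
    return notes


# A temporary directory stands in for the Syncthing/Dropbox folder.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    write_note(folder, "phone", "1", "buy milk")
    write_note(folder, "laptop", "1", "call Bob")
    merged = read_all(folder)
```

This only covers adding notes. Edits and deletes need more modeling (for example, a last-writer-wins rule per note), which is where the real CRDT design work starts.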