Isn’t a CRDT overkill for a chat program? I have no experience with this, but it seems to me that no conflicts would arise in a chat log even if participants are allowed to add to it while offline. Couldn’t you merge the logs with ordering based on timestamps?
I think you could build a chat app without a CRDT per se, but there are a lot of things that it makes quite a bit easier for us to do.
Synchronization: We don’t need to implement our own synchronization algorithm. We can just use the one from Automerge and not have to worry about how to make sure that everybody ends up with the same list of chats in the end.
Serialization Format: Automerge comes with a compressed disk format that is quick to load and can be easily merged with other snapshots, which is important for our lock-free, concurrent storage on top of the remote PDS.
Causality: Automerge documents are structured similarly to git commits, which means that we can tell all of the chats that were visible when any given chat was posted. This could be useful for clarifying the context in which a chat was written offline and then later merged into the timeline.
“Upgrading” to Forum Topics / Wiki Pages: Another thing is that we want to be able to kind of “upgrade” and merge chats into forum topics and wiki pages, both of which could benefit a lot from having the git-like fine-grained edits offered by the CRDT.
So we’d probably end up just re-doing a lot of the things Automerge is doing for us.
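To make the synchronization and causality points a bit more concrete, here is a minimal sketch of the general idea in plain Python. It is not Automerge's API or data format; `Message`, `post`, `merge`, `heads`, and `visible_when_posted` are hypothetical names made up for illustration, and it assumes message ids are globally unique. Each message records the newest messages its author had seen, so merging is just a union, and the context of an offline message can be recovered afterwards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    id: str              # assumed globally unique (e.g. a content hash)
    author: str
    text: str
    parents: frozenset   # ids of the newest messages the author had seen


def heads(log):
    """Ids of messages that no other message lists as a parent."""
    referenced = set()
    for message in log.values():
        referenced |= message.parents
    return frozenset(set(log) - referenced)


def post(log, msg_id, author, text):
    """Return a new log with a message added on top of the current heads."""
    new_log = dict(log)
    new_log[msg_id] = Message(msg_id, author, text, heads(log))
    return new_log


def merge(a, b):
    """Union of two logs. The order of the arguments does not matter."""
    return {**a, **b}


def visible_when_posted(log, msg_id):
    """All message ids that causally precede msg_id."""
    seen = set()
    stack = list(log[msg_id].parents)
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(log[current].parents)
    return seen


# Alice and Bob both start from the same log, then diverge while Bob is offline.
base = post({}, "m1", "alice", "hello")
alice = post(base, "m2", "alice", "anyone around?")
bob = post(base, "m3", "bob", "(offline) hi alice")

merged = merge(alice, bob)
```

After the merge, `visible_when_posted(merged, "m3")` contains only `m1`, which tells us Bob had not seen `m2` when he replied. A real library also has to deal with compression, partial sync, and edits to existing content, which is the part we would otherwise end up rebuilding.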
Also, while we do have to figure out exactly how to chunk out long chat histories still, I think there’s a good way to do it. And in that case, I don’t think there’s a much larger overhead for using Automerge than there would be if we were making our own solution.
We’re not creating Automerge updates for every keystroke like you might in Google Docs, so each chat message is its own commit with its metadata, which is not all that different from what you’d need in any chat app.
We might need a little more metadata than normal, but it also compresses well.
Finally, one of the biggest reasons is that we are planning on using some upcoming work that will be integrated into Automerge: Beehive (soon to be renamed Keyhive) & Beelay.
Those combined will give us end-to-end encrypted peer-to-peer groups and memory-efficient, fine-grained sync. Those features are really important to us for our long-term goals, including growing into more use cases than just chat.
Good question; we might add an answer to that to the Q&A.
I’m not fluent in the technicals, but in short the answer is no, you can’t rely on local timestamps because they’re variable and can therefore even be used as an attack vector if not accounted for.
you can’t rely on local timestamps because they’re variable and can therefore even be used as an attack vector
I’m not an expert, but attack vectors seem iffy as a reason to prefer CRDTs. Don’t CRDTs also have attack vectors? As I understand it, most CRDT algorithms assume participants are acting in good faith; protecting against bad actors is usually not a guarantee.
I can think of ways to attack a hypothetical chat program that sorts messages by timestamp: for example, forward-date a spam message by 1000 years to pin the spam message to the front of the chat. But with CRDTs, I don’t know in what ways they can fail. Is there a malicious Automerge message I could craft that would prevent new chat messages from appearing, or that would cause the chatroom to look different for different people? Maybe. But CRDTs are complex enough that they obscure the exact nature of any vulnerabilities. Speaking strictly from a security standpoint, you’d be better off with plain old timestamps—yes, there are ways they could fail, but at least they’re obvious, and the average dev can reason about those failures and how to mitigate them.
(Not to be too harsh on CRDTs! I think it’s pretty cool that people can build distributed applications on top of high-quality libraries like Automerge, without having to solve every single tricky consensus problem themselves. They might even be the overall best solution for the chat room being discussed! I’m only saying that just because a black box magically solves consensus problems doesn’t mean it solves security problems.)
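For anyone who wants to see the forward-dating attack spelled out, here is a small sketch in Python. Everything in it is an assumption for illustration: the message shape, the `merge_by_timestamp` and `clamp_to_receipt` helpers, and the five-minute skew allowance. The clamp is only one possible mitigation, not something the thread proposes.

```python
from datetime import datetime, timedelta, timezone


def merge_by_timestamp(*logs):
    """Naive merge: concatenate all logs and sort by the sender's own clock."""
    merged = [message for log in logs for message in log]
    return sorted(merged, key=lambda m: (m["sent_at"], m["author"], m["text"]))


def clamp_to_receipt(message, received_at, max_skew=timedelta(minutes=5)):
    """Hypothetical mitigation: never trust a timestamp far in the future."""
    if message["sent_at"] > received_at + max_skew:
        return {**message, "sent_at": received_at}
    return message


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)

alice_log = [{"author": "alice", "text": "hi", "sent_at": now}]
bob_log = [{"author": "bob", "text": "hello", "sent_at": now + timedelta(hours=1)}]

# Mallory forward-dates a message by roughly 1000 years.
spam = {"author": "mallory", "text": "BUY NOW",
        "sent_at": now + timedelta(days=365 * 1000)}

# With the naive merge the spam always sorts as the newest message.
naive = merge_by_timestamp(alice_log, bob_log, [spam])

# With clamping, the spam is re-stamped with the time we actually received it.
received_at = now + timedelta(minutes=10)
clamped = merge_by_timestamp(alice_log, bob_log,
                             [clamp_to_receipt(spam, received_at)])
```

In `naive`, Mallory's message stays at the newest position no matter what anyone posts afterwards. In `clamped`, Bob's later message sorts after it. The catch is that every peer may receive the spam at a different time, so peers that clamp independently can end up with different orderings, which is the kind of consistency problem a shared causal history is meant to avoid.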
My bad for bringing up the attacker case as a casual example; I wasn’t trying to make a point about attack vectors as something that played meaningfully into our reasoning in our choice of tech stack. (Beyond filtering for major red flags).
Better to consider my haphazard reply retracted and refer to zicklag’s much more eloquent response xD