Choosing a random order is not a good idea, because there is a chance that both possibilities would eventually get inserted, and what would that mean?
We are just now going into the development of this very feature. And I have a feeling we have not taken this into the consideration. Thank you for saving us hours!
Basically, there is a single row, single column model too. I don’t think it’s performance would be good for anything but pairwise friend testing.
Basically any operation with the commutative property and a wide enough range to avoid collisions works. “Sort them and make them a tuple” is just an intuitive function with the commutative property that implicitly has the range needed.
Modeling such concepts in a relational model to me is only half of the equation. The real challenge is how to represent the numerous constraints (rules) between concepts and other logic, without the codebase evolving into a big ball of mud as size and complexity increase?
You could have single-row “core” table and then create view for simulating two-row view to simplify queries (my friends are friends where user_1 = my_id). This would provide best of both of the two worlds.
That’s true. However if you implement this view through UNION ALL it’s possible that it would be later used in some ad-hoc analytic query and the performance could be non-obvious (and hidden by the view). It’s manageable, but it needs to be kept in mind.
The first version of this query was buggy, because I carelessly used the obvious-looking condition “WHERE user1_id = 5 OR user2_id = 5”. This condition is wrong.
For the slower among us, what makes this condition wrong? Is it because the SELECT only gets one of the two user_id values?
Both of those models frankly feel somehow weird, they go strongly against the usual effortlessness of relational database modeling. Maybe this is because of the additional invariants that are not handled directly by the table structure?
It feels like the “right” approach would be some kind of set type, where the values are collections unique unordered user_ids. Then the table constraint is |set| = 2 and friendship tests are select ... where 5 in set. But I don’t know if any SQL databases have a set data type.
SELECT user2_id
FROM mutual_friendship
WHERE user1_id = 5 OR user2_id = 5
And this version survived several minutes of writing until I realized that it’s wrong.
You could rewrite it something like (as suggested on Reddit):
SELECT CASE user1_id WHEN 5 THEN user2_id ELSE user1_id END
FROM mutual_friendship
WHERE user1_id = 5 OR user2_id = 5
I guess the idea of that sentence is that this “double-part” complexity needs to live somewhere, and any query would be somewhat awkward. Maybe I should rewrite it a bit better.
It feels like the “right” approach would be some kind of set type
I tried playing with that idea, but so far all attempts, when compiled to a classic relational framework (scalar columns), do not become more beautiful.
My idea was also that maybe we should just treat each friendship as a ordered tuple (with tuple-typed columns), but that does not allow elegantly querying list of friends.
if you don’t have any other fields in the friendship table, maybe it’d make sense to store the undirected friendship edge as two rows in the table. An undirected graph with edges (v, u) can be converted to a directed one if you store two symmetric edges (v -> u), (u -> v). You can still INSERT and DELETE like now, and you can manage the symmetric edge with triggers. Perhaps this approach will simplify all those things that appear because of symmetry. I know you say this in the article, but n versus 2n isn’t that big a deal.
This came up for me in a prototype app about ten years ago. I found that I needed to represent the asymmetrical relation, whether or not the UI displayed it, because to verify a mutual friendship you have to get both people to attest to it.
In other words, I had to model “A claims to be friends with B” in the database. I added this relation when a user indicated they were friends with another user. But the important property (for the UI and for access control) was still mutual friendship, so I queried for “ A claims to be friends with B and B claims to be friends with A”.
I have trouble imagining how you’d prove that two users were mutual friends without a two-step process like this.
You could do it that way. But in my app there was meaning to the unidirectional “follow” relationship, and the “mutual friends” one was composed of a pair of “follows”.
A potentially silly third option might be to have a table dedicated to relationships (id, timestamps, notes, status, etc.) and then a join table to link users to relationships in (user, relationship) tuples. I think that’d solve the “which id comes first” problem, and is probably the more normalized version still.
CREATE TABLE friendship (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
metadata JSON NOT NULL DEFAULT '{}'::JSON,
created_at TIMESTAMPTZ DEFAULT now()
);
CREATE TABLE friends_friendship_join(
friend_id UUID NOT NULL,
friendship_id UUID NOT NULL
)
# assume a 'friend' table that has some id column.
That make more sense?
And then you just have however many entries in friends_friendship_join are needed to specify the friendship (typically 2, but you could go higher or lower depending on business needs).
I disagree, since the two-row version is really just representing both directions of an edge on a graph, where the verts are friends and then edges are friendships.
This version is a bit dumber and just says “hey, these friends all share a friendship”.
The OP explicitly described a situation where friendships existed only between pairs of people. Your version throws away this invariant, which could be useful in other situations, but would allow invalid data in the context described in the article.
Why not using a single row model with additional column for the friendship status (mutual, forward and backward, canceled)?
This would be better in term of storage and will permit easy analytics
Would you mind producing an analysis of such schema in the same vein as presented in the article? Four typical queries, storage requirements, invariants to be preserved, possible anomalies? Then we would all see how exactly it’s better and easy.
That’s the summary of something that we don’t see, there is no detailed explanation. You understand your own idea and it may be obvious for you, but I don’t really get how the additional column would work, what does “forward and backward” mean, why do we need “cancelled”, etc (given that we’re interested in mutual friendship). And especially I don’t understand how the additional column both improves space and simplifies analytics.
What I’d like to see is table schemas, how the data looks for a friendship between alice and bob, how the four SQL queries look like. At the moment I’m in the blind.
Isn’t storing mutual friendship seperately itself data duplication? It’s derivable from your friendship table, we would want it to he such that deleting a friendship would automatically delete the mutual friendship if any, that’s a good thing to put more thought on.
We are just now going into the development of this very feature. And I have a feeling we have not taken this into the consideration. Thank you for saving us hours!
haha awesome. Please share your experience with the schema that you decide on, later on when you’ve got some.
You can use a cryptographic operation like the one described in this recent post: https://lobste.rs/s/ousoal/how_play_poker_by_mail_without_trusting
Basically, there is a single row, single column model too. I don’t think it’s performance would be good for anything but pairwise friend testing. Basically any operation with the commutative property and a wide enough range to avoid collisions works. “Sort them and make them a tuple” is just an intuitive function with the commutative property that implicitly has the range needed.
Modeling such concepts in a relational model to me is only half of the equation. The real challenge is how to represent the numerous constraints (rules) between concepts and other logic, without the codebase evolving into a big ball of mud as size and complexity increase?
You could have single-row “core” table and then create view for simulating two-row view to simplify queries (my friends are friends where
user_1 = my_id
). This would provide best of both of the two worlds.That’s true. However if you implement this view through UNION ALL it’s possible that it would be later used in some ad-hoc analytic query and the performance could be non-obvious (and hidden by the view). It’s manageable, but it needs to be kept in mind.
Obligatory classic Maciej essay: https://blog.pinboard.in/2011/11/the_social_graph_is_neither/
For the slower among us, what makes this condition wrong? Is it because the
SELECT
only gets one of the twouser_id
values?It feels like the “right” approach would be some kind of set type, where the values are collections unique unordered user_ids. Then the table constraint is
|set| = 2
and friendship tests areselect ... where 5 in set
. But I don’t know if any SQL databases have a set data type.I think I wrote something like
And this version survived several minutes of writing until I realized that it’s wrong.
You could rewrite it something like (as suggested on Reddit):
I guess the idea of that sentence is that this “double-part” complexity needs to live somewhere, and any query would be somewhat awkward. Maybe I should rewrite it a bit better.
I tried playing with that idea, but so far all attempts, when compiled to a classic relational framework (scalar columns), do not become more beautiful.
My idea was also that maybe we should just treat each friendship as a ordered tuple (with tuple-typed columns), but that does not allow elegantly querying list of friends.
[Comment removed by author]
if you don’t have any other fields in the friendship table, maybe it’d make sense to store the undirected friendship edge as two rows in the table. An undirected graph with edges (v, u) can be converted to a directed one if you store two symmetric edges (v -> u), (u -> v). You can still INSERT and DELETE like now, and you can manage the symmetric edge with triggers. Perhaps this approach will simplify all those things that appear because of symmetry. I know you say this in the article, but n versus 2n isn’t that big a deal.
that’s what we started from, the two-row representation. In this thread we’re trying to go higher, beyond that.
This came up for me in a prototype app about ten years ago. I found that I needed to represent the asymmetrical relation, whether or not the UI displayed it, because to verify a mutual friendship you have to get both people to attest to it.
In other words, I had to model “A claims to be friends with B” in the database. I added this relation when a user indicated they were friends with another user. But the important property (for the UI and for access control) was still mutual friendship, so I queried for “ A claims to be friends with B and B claims to be friends with A”.
I have trouble imagining how you’d prove that two users were mutual friends without a two-step process like this.
You will have something like “friend requests” with “approve” and “reject” operations. This is trivially modelled in a separate table.
The article discusses what to do when the friend request was approved.
You could do it that way. But in my app there was meaning to the unidirectional “follow” relationship, and the “mutual friends” one was composed of a pair of “follows”.
A potentially silly third option might be to have a table dedicated to relationships (id, timestamps, notes, status, etc.) and then a join table to link users to relationships in (user, relationship) tuples. I think that’d solve the “which id comes first” problem, and is probably the more normalized version still.
Could you draft a concrete minimal structure of the tables? I suspect that it’s going to be one of the discussed structures in disguise.
Sure!
That make more sense?
And then you just have however many entries in
friends_friendship_join
are needed to specify the friendship (typically 2, but you could go higher or lower depending on business needs).So, it looks like a two-row version basically, doesn’t it?
I disagree, since the two-row version is really just representing both directions of an edge on a graph, where the verts are friends and then edges are friendships.
This version is a bit dumber and just says “hey, these friends all share a friendship”.
The OP explicitly described a situation where friendships existed only between pairs of people. Your version throws away this invariant, which could be useful in other situations, but would allow invalid data in the context described in the article.
I don’t enforce that invariant, but you could via triggers or constraints.
My point was to offer a third option in the solution space, not a good one.
No, absolutely. I don’t judge! The idea is to investigate the design space.
Now that I understand your idea better, I think that it’s actually an equivalent of a one-row solution :)
So, we’ll have for the
friendship
table:For
friends_friendship_join
:And then to query a list of friends of Alice, we use:
right?
I believe so.
Why not using a single row model with additional column for the friendship status (mutual, forward and backward, canceled)? This would be better in term of storage and will permit easy analytics
Would you mind producing an analysis of such schema in the same vein as presented in the article? Four typical queries, storage requirements, invariants to be preserved, possible anomalies? Then we would all see how exactly it’s better and easy.
That’s the summary of something that we don’t see, there is no detailed explanation. You understand your own idea and it may be obvious for you, but I don’t really get how the additional column would work, what does “forward and backward” mean, why do we need “cancelled”, etc (given that we’re interested in mutual friendship). And especially I don’t understand how the additional column both improves space and simplifies analytics.
What I’d like to see is table schemas, how the data looks for a friendship between alice and bob, how the four SQL queries look like. At the moment I’m in the blind.
Gotcha. I am working at the moment, then not too much time to prepare scripts. I will provide scripts and query in the late afternoon or tonight.
Isn’t storing mutual friendship seperately itself data duplication? It’s derivable from your friendship table, we would want it to he such that deleting a friendship would automatically delete the mutual friendship if any, that’s a good thing to put more thought on.
We do not store both tables simultaneously. Friendship and mutual friendship are alternative designs that we discuss.