1. 30
    1. 8

      Choosing a random order is not a good idea, because there is a chance that both possibilities would eventually get inserted, and what would that mean?

      We are just now going into the development of this very feature. And I have a feeling we have not taken this into the consideration. Thank you for saving us hours!

      1. 2

        haha awesome. Please share your experience with the schema that you decide on, later on when you’ve got some.

      2. 2

        You can use a cryptographic operation like the one described in this recent post: https://lobste.rs/s/ousoal/how_play_poker_by_mail_without_trusting

        Basically, there is a single row, single column model too. I don’t think it’s performance would be good for anything but pairwise friend testing. Basically any operation with the commutative property and a wide enough range to avoid collisions works. “Sort them and make them a tuple” is just an intuitive function with the commutative property that implicitly has the range needed.

    2. 5

      Modeling such concepts in a relational model to me is only half of the equation. The real challenge is how to represent the numerous constraints (rules) between concepts and other logic, without the codebase evolving into a big ball of mud as size and complexity increase?

    3. 4

      You could have single-row “core” table and then create view for simulating two-row view to simplify queries (my friends are friends where user_1 = my_id). This would provide best of both of the two worlds.

      1. 1

        That’s true. However if you implement this view through UNION ALL it’s possible that it would be later used in some ad-hoc analytic query and the performance could be non-obvious (and hidden by the view). It’s manageable, but it needs to be kept in mind.

    4. 3
    5. 2

      The first version of this query was buggy, because I carelessly used the obvious-looking condition “WHERE user1_id = 5 OR user2_id = 5”. This condition is wrong.

      For the slower among us, what makes this condition wrong? Is it because the SELECT only gets one of the two user_id values?

      Both of those models frankly feel somehow weird, they go strongly against the usual effortlessness of relational database modeling. Maybe this is because of the additional invariants that are not handled directly by the table structure?

      It feels like the “right” approach would be some kind of set type, where the values are collections unique unordered user_ids. Then the table constraint is |set| = 2 and friendship tests are select ... where 5 in set. But I don’t know if any SQL databases have a set data type.

      1. 1

        I think I wrote something like

        SELECT user2_id
        FROM mutual_friendship
        WHERE user1_id = 5 OR user2_id = 5
        

        And this version survived several minutes of writing until I realized that it’s wrong.

        You could rewrite it something like (as suggested on Reddit):

        SELECT CASE user1_id WHEN 5 THEN user2_id ELSE user1_id END
        FROM mutual_friendship
        WHERE user1_id = 5 OR user2_id = 5
        

        I guess the idea of that sentence is that this “double-part” complexity needs to live somewhere, and any query would be somewhat awkward. Maybe I should rewrite it a bit better.

        It feels like the “right” approach would be some kind of set type

        I tried playing with that idea, but so far all attempts, when compiled to a classic relational framework (scalar columns), do not become more beautiful.

        My idea was also that maybe we should just treat each friendship as a ordered tuple (with tuple-typed columns), but that does not allow elegantly querying list of friends.

        1. [Comment removed by author]

        2. 1

          if you don’t have any other fields in the friendship table, maybe it’d make sense to store the undirected friendship edge as two rows in the table. An undirected graph with edges (v, u) can be converted to a directed one if you store two symmetric edges (v -> u), (u -> v). You can still INSERT and DELETE like now, and you can manage the symmetric edge with triggers. Perhaps this approach will simplify all those things that appear because of symmetry. I know you say this in the article, but n versus 2n isn’t that big a deal.

          1. 1

            that’s what we started from, the two-row representation. In this thread we’re trying to go higher, beyond that.

    6. 2

      This came up for me in a prototype app about ten years ago. I found that I needed to represent the asymmetrical relation, whether or not the UI displayed it, because to verify a mutual friendship you have to get both people to attest to it.

      In other words, I had to model “A claims to be friends with B” in the database. I added this relation when a user indicated they were friends with another user. But the important property (for the UI and for access control) was still mutual friendship, so I queried for “ A claims to be friends with B and B claims to be friends with A”.

      I have trouble imagining how you’d prove that two users were mutual friends without a two-step process like this.

      1. 3

        You will have something like “friend requests” with “approve” and “reject” operations. This is trivially modelled in a separate table.

        The article discusses what to do when the friend request was approved.

        1. 1

          You could do it that way. But in my app there was meaning to the unidirectional “follow” relationship, and the “mutual friends” one was composed of a pair of “follows”.

    7. 2

      A potentially silly third option might be to have a table dedicated to relationships (id, timestamps, notes, status, etc.) and then a join table to link users to relationships in (user, relationship) tuples. I think that’d solve the “which id comes first” problem, and is probably the more normalized version still.

      1. 1

        Could you draft a concrete minimal structure of the tables? I suspect that it’s going to be one of the discussed structures in disguise.

        1. 5

          Sure!

          CREATE TABLE friendship (
            id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
            metadata JSON NOT NULL DEFAULT '{}'::JSON,
            created_at TIMESTAMPTZ DEFAULT now()   
          );
          
          CREATE TABLE friends_friendship_join(
            friend_id UUID NOT NULL,
            friendship_id UUID NOT NULL
          )
          
          # assume a 'friend' table that has some id column.
          

          That make more sense?

          And then you just have however many entries in friends_friendship_join are needed to specify the friendship (typically 2, but you could go higher or lower depending on business needs).

          1. 1

            So, it looks like a two-row version basically, doesn’t it?

            1. 3

              I disagree, since the two-row version is really just representing both directions of an edge on a graph, where the verts are friends and then edges are friendships.

              This version is a bit dumber and just says “hey, these friends all share a friendship”.

              1. 2

                The OP explicitly described a situation where friendships existed only between pairs of people. Your version throws away this invariant, which could be useful in other situations, but would allow invalid data in the context described in the article.

                1. 1

                  I don’t enforce that invariant, but you could via triggers or constraints.

                  My point was to offer a third option in the solution space, not a good one.

                  1. 1

                    No, absolutely. I don’t judge! The idea is to investigate the design space.

                    Now that I understand your idea better, I think that it’s actually an equivalent of a one-row solution :)

              2. 2

                So, we’ll have for the friendship table:

                | id | metadata | created_at |
                | 35 | {}       | 2021-10-20 |
                

                For friends_friendship_join:

                | friend_id | friendship_id |
                | 3         | 35            |
                | 5         | 35            |
                

                And then to query a list of friends of Alice, we use:

                SELECT friend_id 
                FROM ffj 
                WHERE friendship_id IN
                     ( SELECT friendship_id FROM ffj WHERE friend_id = 5)
                AND friend_id <> 5;
                

                right?

                1. 1

                  I believe so.

    8. 1

      Why not using a single row model with additional column for the friendship status (mutual, forward and backward, canceled)? This would be better in term of storage and will permit easy analytics

      1. 1

        Would you mind producing an analysis of such schema in the same vein as presented in the article? Four typical queries, storage requirements, invariants to be preserved, possible anomalies? Then we would all see how exactly it’s better and easy.

        1. 1
          • Establishing friendship: pre-processing is needed before INSERT, like the single row model;
          • Deleting friendship: pre-processing is needed before UPDATE, no need for a DELETE;
          • Getting the list of friends: two-part query is needed;
          • Are they friends?: pre-processing (or two-part query) is needed;
          • Storage requirements: optimal data size, but an additional index would be needed;
          • Potential invariant violations: two symmetric rows (this can be solved with a constraint, the first user_id will be the lower); wrong order of IDs.
          • You can easily extract the top friendship requestor or the top friendship “acceptor”
          1. 1

            That’s the summary of something that we don’t see, there is no detailed explanation. You understand your own idea and it may be obvious for you, but I don’t really get how the additional column would work, what does “forward and backward” mean, why do we need “cancelled”, etc (given that we’re interested in mutual friendship). And especially I don’t understand how the additional column both improves space and simplifies analytics.

            What I’d like to see is table schemas, how the data looks for a friendship between alice and bob, how the four SQL queries look like. At the moment I’m in the blind.

            1. 1

              Gotcha. I am working at the moment, then not too much time to prepare scripts. I will provide scripts and query in the late afternoon or tonight.

    9. 0

      Isn’t storing mutual friendship seperately itself data duplication? It’s derivable from your friendship table, we would want it to he such that deleting a friendship would automatically delete the mutual friendship if any, that’s a good thing to put more thought on.

      1. 2

        We do not store both tables simultaneously. Friendship and mutual friendship are alternative designs that we discuss.