1. 7
  1.  

  2. 2

    First, our toolkit should only allow fetching data by the array of IDs, and never by a scalar ID.

    Disallowing scalar ID access makes it a bit easier to prevent the so called N + 1 query problem (also called shotgun queries).

    So, we want to encourage the developer to think in batches from the very beginning.

    I like these nudges. How about also forcing you to be explicit about when a query happens? In Django it’s pretty implicit:

    users = User.objects.all()  # not sent to the DB yet
    users = users.filter(created__gt=something)   # not yet
    users = users[:10]  # not yet
    
    for u in users:  # now it sends it!
        ...
    

    Something like for u in users.execute(): would remind you where the round-trips are happening.


    Should business logic follow the same principle (everything is batches)? For example, you might have some rules that decide whether a comment is visible to a given user:

    class Comment:
        ...
        def visible_to(self, user):
            return (user.is_moderator
                    or self.owner_id == user.id
                    or not self.is_draft)
    

    Maybe at first you only use it for checking whether to 404 on /comments/<comment_id>, so there’s no N+1 query. You just load one comment and then call this method once.

    Then later you want to render a whole list of links, and hide the links that aren’t visible to the current user. Now you do have an N+1 query. How should visible_to() have been written?

    def select_visible_to(comments, user):
        if user.is_moderator:
            return comments
        else:
            return comments.filter( Q(is_draft=False) | Q(owner_id=user.id) )
    

    Something like this?

    1. 1

      a) yeah, it would be nice to have all three ways to initiate execution: lazy, explicit and implict. But this question is language-level: some languages have a “null-syntax” feature, that is it allows you to call a function when it is not obvious from the syntax that it’s going to happen. I believe that it would be better if the code execution would be explicit, but this is apparently a matter of debate.

      b) yes! That was one of my personal insights, when I realized that it makes sense to design high-level api batch-first too. Yes, something like your select_visible_to() function.

      I have a small social network (with pretty advanced access control) as one of the motivating examples, and I’ve been thinking about how to efficiently eliminate branches, like the one you have in user.is_moderator condition. Basically it’s similar to the idea of code vectorization.

      Thank you,