1. 59
    1. 23

      I need to think about it more to formalize why but…. I hate the user experience of things that use these keyset cursors pretty consistently. I think part of it is I’m pretty OK at guessing which page what I want would be on. With just the “next” link, maybe the computer is avoiding the linear scan, but it is putting it on me instead! With the pages, the computer might be linear scanning, but I’m now empowered to do a binary search (and sometimes the criteria is not easy to formalize, but I know it when I see it and can start clamping from there).

      With pages, if I want to see the first ever commit, I click on the last page and scroll to the bottom. With these cursors…. I’m at the mercy of the UI designer. How do you do it on Github, for example? If you know the date of the first commit, sure, the calendar helps, but, I don’t know that.

      I appreciate the article pointing out ideas to make the UI, but I’m still skeptical. But I also think I’m biased because I have some tricks and habits from the old pagination style and just don’t know all the same tricks for the new ways. So my view right now is still I’d rather the computer do the linear time computation than making me do it.

      edit: another i think related issue too is knowing scrollbar sizes. if activity is uniformly distributed, a date range should give a fairly predictable “page” to look at. But maybe there was only one commit in the whole year of 2019, then 1,000 in the year 2020. Your cursor+limit can handle this just fine in the computer, but how can you present this to the user? I’ve been imagining a two-stage scrollbar, where the first stage is the date start point, then second stage is the items returned on that “page” with some visual presentation of how much came. But I guess I’m getting a little off topic here, just I also really like having some idea of where I am in the list. “Page 6 of 25” achieves this. “Next” does not. But the cursor thing doesn’t preclude giving some kind of position indicator the user… just i don’t see it done often…

      1. 6

        This is kind of besides your point (which I wholeheartedly agree on), but:

        How do you do it on Github, for example? If you know the date of the first commit, sure, the calendar helps, but, I don’t know that.

        You can still binary-search based on offset rather than dates, with commits/<branch>@{<offset>}. Example.

        1. 3

          oh wow, lol i might use that, i had no idea that was even a thing.

          I find much of github’s online offerings to be pretty mediocre. They’re there, it is temping to use them, but it is so much more painful than just git cloning it, doing some local check (like git log then hitting the “end” key and boom, first commit up, no search, or doing a grep -R or whatever else), then deleting it again. Feels silly to download who knows how many megabytes for a one off operation…. but it tends to be faster and less painful than the online alternative.

          Of course, again I’m getting tangential with the topic here though, just implementations do bleed through to ui.

        2. 6

          It seems kind of crazy to me how simple pagination seems but how hard it is to have any sort of satisfying solution. The more I think about it, the less compelling I find the performance concern about the LIMIT/OFFSET solution. Maybe I’m missing something, but linear scan mostly seems like a problem only if you have a very large table that gets lots of requests and you have critical requirements for freshness such that caching isn’t a viable solution and you can’t afford to scale out read replicas.

          The correctness thing seems more compelling, but correctness depends on the application requirements. If you only care about consistency, then you could just write a table snapshot at some interval with a snapshot retention policy to automatically expire the snapshots. The pagination API would give you back the first page of the latest snapshot and its ID, which you would then pass back to the API when you request subsequent pages. Because the client doesn’t care about updates, they just want a consistent view of the API, things are fine so long as care is taken that they can reasonable complete their pagination before the snapshot expires. And the snapshots don’t have to live in your expensive SQL database–they could just be written out to object storage in a JSON blob on some cron, and from there they can be filtered by JavaScript isolates running on edge compute for dirt cheap, super simple, low latency processing.

          Or maybe that’s a lot of work to avoid keyset navigation? 🤷‍♂️

        3. 11

          I really like the discussion of how to match the UI and API to keyset pagination in this article.

          1. 1

            How I’ve done it in the past: API responses include a chunk of parameters for “next page”. When the UI fetches a resrouce list, if the next page field is filled, then the UI knows that it needs to add those to the filter parameters.

            It’s actually entirely opaque to the frontend what “next page” or “previous page” means here, simply that it might or might not exist.

            (An even more straightforward way is for the API to return the URL itself)

          2. 5

            No one shared my favorite resource on the topic: https://use-the-index-luke.com/no-offset.

            1. 3

              I remember reading that page quite some time ago (lots of good stuff on that website!), it got me to do some of this style in my framework but yeah, the UI just wasn’t very good so I put it back on the backburner. Let’s look at the ui advice there:

              However, this is not a problem when using infinite scrolling. Showing page number to click on is a poor navigation interface anyway—IMHO.

              …yeah im not a fan of infinite scroll, that’s basically the worst of all options. But I do try to think about how to make it less bad. I still think some kind of two-piece scrollbar is the way to do it. Inspired by the focus knobs on a microscope, the one is a coarse selector which gives you the “page number”, then the other one is the fine adjustment, scroll inside that page. But, what, exactly, is the page number widget? What happens when you get to the edges of a page?

              The one nice thing about infinite scroll (when it isn’t manipulative anyway) is reducing discontinuities - you can just keep hitting the same button over and over again and keep going, without having a case where two adjacent items cannot be seen together.

              (actually, my favorite pagination scheme is “none”. Just dump it all on me to the limits of that being possible. Then I can scroll it, jump around it, ctrl+f search it, etc. But when there’s too many things, the scrollbar becomes unusable because the handle is just too small and the slightest one pixel movement can jump hundreds of items… so that’s really the degenerate case I’m hoping to avoid. Lazy loading things as you scroll can be done is an implementation detail if the ui works right.)

              Anyway, my first thought for the coarse selector was that it represented a range. So, for example, it goes from the time of the first message to the present and you’d select any time range in there, say, like last two days to now. Then it loads that up, select stuff from posts where date >= 48 hours ago and date < now, and that populates the fine control scroll bar. The problem is there’s some days when nobody posts and other days when everybody posts. So who knows what you get back.

              But this keyset cursor does provide another idea: the coarse control still shows first and last entries to set its boundary range, but the size of the handle is determined by the lookup. Not date >= 48 hours ago and date < now, but instead where date >= 48 hours ago LIMIT 500 (or whatever number is reasonable to still be usable in your fine control scroll bar. It might even change by default with the size of the window. Taller window = load more. BTW again, you might not even actually load it, just pull the index so you know how many items are there and construct the scrollbar sizes, then lazy load the details as they scroll into view.)

              So now the coarse scrollbar’s handle size changes to indicate where the last item actually was. If the last item is at the present, it fills up to the present. If yesterday was talkative and there were thousands of posts, it shows a little sliver to indicate it covers only a short period of time.

              (BTW I think all scrollbars ought to have some extra controls too, like the ability to type in an input instead of just drag and drop to it. So assume that’s part of the widget too.)

              What happens when you scroll to the edges of the fine control though? We don’t really want to make the user go over to the coarse bar and have to drag it. No, ideally, their attempt to scroll up again past the top of this will automatically pull the “previous” keyset cursor, adjusting the coarse control for you in that single action…. and then it can keep some of the stuff on screen at the time as part of this page, or maybe the previous cursor has some deliberate overlap or something, but I guess that isn’t strictly necessary, since you should be able to just hit scroll down to go back to where you were anyway. The fine scroll bar would then go from the top back to the bottom, indicating you’ve changed pages. Not drag friendly though - when mouse dragging, you should NEVER change pages. You must stop dragging and hit the up arrow to trigger the page change. This avoids the scrollbar jumping position in the new page, then your next drag disorients you. (Unless you were to warp the mouse along with the scrollbar but…. probably don’t do that, that sounds wrong to me. And browsers can’t do that anyway.) So you get to the top and the “more” thing appears right there for you to tap/click, and the traditional up arrow button on the scrollbar, up key, mouse wheel, etc., can do it too. As long as it is an explicit, clearly intended action. Ditto on down for the next page.

              The coarse control btw might not be a bar per se, maybe it can expand into a calendar or something, but I do think a bar of some sort is a good indicator. Depending on what column you sort on, it might be a timeline, or an alphabet, or a number line, whatever. It might have its own zoom function or multi level control if it spans too much area to be easily used on screen.

              …I think this is something workable. The coarse bar has a thick line indicating that is the manipulateable thing that is your keyset cursor, then a shade indicating the direction it is loading and how much it managed to cover inside the limit… I gotta get implementing to try it for real now lol

            2. 5

              If you expose a large enough table with limit offset pagination then it can be used for DoS. (The article explains why). I saw it happen with Lemmy a while back.

              1. 3

                Not really related to the article but there are 2 cool tricks I like to use when implementing cursor pagination:

                1. In your query, select 1 more item than the requested limit (ie. if the client requests 20 items, select 21 in your query). This makes it really easy to return a boolean representing “has_next” / “has_more_items”. You still return n items to the client, but you can also let them know there are more items to fetch, with a single query

                2. If you want to return the total number of rows in the set (eg. the client requests 20 items, and you want to let them know there are 400 items matching their request in total), you don’t need to do that in 2 SQL queries. You can use window functions, specifically ˋCOUNT() OVER()`. This will add the total number of lines to each row (so the same value is added to each row). It’s not lightweight, but not much more than doing a select count() in a second query. If you encounter performance issues because of count, you probably need caching or approximate counting anyway

                1. 2

                  Doesn’t this not work for non-sequential IDs, like UUIDs? Or do you just leave the ID out in that case.

                  1. 4

                    It does. The ID does not need to be sequential, it just needs to be unique. You typically sort by another field, like a timestamp, and the ID serves as a tiebreaker. Both fields need to be in the where clause and the order by. If the other field is unique you don’t need to include the ID at all.

                    1. 1

                      Yep I see. Thank you.

                    2. 2

                      Fun fact: UUIDv7 can be sorted chronnologically, as the ID’s most’ significant 48 bits are comprized of the unix epoch + entropy.

                      Many implementations are now out in the wild. Presumably, one will natively reach a RDBMS near you. There are uuidv7 extensions for PG already.

                      1. [Comment removed by author]

                      2. 2

                        My favorite thing about this is that pagination is absolutely something that 95% of people consider a “solved problem.” And that same set of people also exclusively use offset for pagination (I’ve never worked at a company that didn’t).

                        1. 2

                          Let me pose this question for everyone: would you bother with cursor pagination if offset pagination were uniformly fast?

                          Tbh this seems like the kind of thing that a database could be fast at, if there were an index and no joins backing the query in question.

                          1. 2

                            A great point - no, I am not interested in patterns for their own sake. I could care less what the code looks like, I care about its properties: safe, correct, fast, etc.

                            But at the same time, this is a really big “if.” “If” we just had good enough compilers, lots of error-prone things would go away. But the world we live in is mired in tiny semantics that have a huge impact on things like performance. So the patterns for dealing with them become extremely important.

                        2. 1

                          Let’s not overstate the problems of offset pagination. Looking at slide 36 of this presentation from Markus Winand, offset pagination which can use only a single index is pretty good slow: https://www.slideshare.net/slideshow/p2d2-pagination-done-the-postgresql-way/22210863.

                          Clearly, cursor based pagination has even nicer performance properties for very large tables and that might matter, but let’s not pretend that every app could become 10 times faster if we gave up offset.

                          1. 1

                            What do you mean “pretty good”? The issue with offset pagination is that it’s very slow to fetch late pages, and the charts show that the index doesn’t improve last page slowness. (Of course querying is slow without an index! That isn’t really relevant to offset vs keyset pagination.)

                            1. 1

                              Oh man I misread the set of charts 🤦‍♂️. I’m quite surprised that there’s no difference in the speed with which rows can be discarded with an index.

                              1. 2

                                OFFSET needs a counted btree index but for whatever reason databases don’t have them. https://www.chiark.greenend.org.uk/~sgtatham/algorithms/cbtree.html

                                It’s apparently possible to make pagination faster by maintaining summary counts with triggers but it’s complicated! Dunno if anyone has done it in production https://www.depesz.com/2011/05/20/pagination-with-fixed-order/

                                1. 1

                                  The counted btree is interesting, but it only works for an offset if the where clause can take you perfectly to a place in the subtree. Which obviously is possible in some cases.

                          2. [Comment removed by author]

                            1. -2

                              This is always such a good first interview question.

                              Show the candidate a basic schema, ask them to write a reusable sql query for paginating over it.

                              If they use “offset”, end the damn call

                              1. 57

                                Not everyone knows everything. If they use offset it’s a great opportunity for a discussion: talk about the problems of offset, explain the common alternatives, ask them to try to use a cursor in front of you (making clear they may ask questions) and see how they adapt. Being able to learn and adapt fast is just as important as knowing stuff.

                                Keyset/cursor pagination is also not trivial, there are a lot of crappy implementations in the wild. A colleague had an issue with HubSpot just a few days ago: there’s a resource where they offer cursor pagination with just the date, and they set a max limit. When you end up with n resources with exactly the same date, n > their max limit, you end up in an infinite loop and can’t reach a next page. An application as big and well known as HubSpot does keyset pagination wrong.

                                Oh and also: sometimes you do want offset based pagination because you want… pages. If your question is about pagination and they use offset, that’s literally a right answer. Give more precise specs if you expect another answer.

                                1. 8

                                  You can have pages with keyset pagination. See the heading «OFFSET pagination emulation with cached page boundaries» in Lukas Eders article here: https://blog.jooq.org/faster-sql-pagination-with-keysets-continued/ for an example of how.

                                  1. 3

                                    This may be worse than doing it online though, as you have to compute and cache the pagination for every combination of filtering, ordering, and page size, on every table update.

                                    1. 2

                                      The way I would use this is to have the client ask for the pages as an extra request in context of the current search or what have you.

                                      The result of this pages-query is cached by the client and used to drive the pages-widget. Note that the next and previous page buttons should use the start and end cursors from the current page to make sure you don’t skip data in case the cached pages get out of sync with reality.

                                      The pages query only needs to scan N rows once, so the performance here should be good. Depending on UI needs you can probably limit N to X * pageSize as well which is even better for performance.

                                    2. 1

                                      Interesting, I always assumed it was possible to emulate pages with window functions but never took the time to think about the implementation. Thanks for the link

                                  2. 47

                                    So this is both interesting enough for an article with (as of now) 9 votes from seasoned practitioners, yet so basic you don’t deserve a chance at a dev job if you don’t know it already?

                                    You should get better at interviewing.

                                    1. 14

                                      Good Lord you are a cruel person.

                                      1. 9

                                        I’d use offset and I’m pretty comfy with that unless you’ve expressed some constraint to me. If I don’t know the answer, see if you can teach it to me. If I’m supposed to know arbitrary information at the job and no one’s going to teach me when I don’t know, I have zero interest - it’s impossible to grow in that environment.

                                        TBH Postgres hasn’t been a heavy lifter for teams I’ve been on. I’ve spent more time with Cassandra, and Postgres has largely served things like “user table” etc. I’ve built a queue on postgres before and that’s about as fancy as I’ve gotten.

                                        1. 4

                                          The linked article mentions that, to use keyset pagination, you need to order by at least one column having a natural order. If your “basic schema” doesn’t have this property, how can the candidate implement pagination without “offset” and “limit”?

                                          1. 1

                                            I would assume created_at least/ updated_at timestamps are part of a “basic schema”

                                      🇬🇧 The UK geoblock is lifted, hopefully permanently.