1. 32

  2. 8

    Sure. You can also eat soup with a fork. But maybe there is a better tool you can use, and it’s the same with “you can use [insert your database name here] instead of [other specialized database name]”. You can do it. Sometimes there are good reasons to do it. But a more specialized tool probably has some advantages.

    1. 18

      I think the idea here is more that you already have a fork and knife. You could bring along chopsticks, but now you have two sets of cutlery, and have to deal with that (even if chopsticks aren’t much!)

      EDIT: also I read the article and you know what? Redis (and most other queues) is not really useful for background task management unless you are ginormous. Poor queue visibility, in particular, makes life harder, all in exchange for theoretical performance gains that don’t matter if you’re only processing tens of tasks a second. It’s plug and play, but really you want records of most background tasks anyways.

      Like you’re probably using Postgres already, operationally might be easy to stick to just one DB. Less unknowns as well (tho redis is super duper simple ofc)

      1. 4

        If you go hiking the Appalachian trail, every gram you carry matters. Perhaps you take just a spoon or even a spork and don’t bother with a fork. That would be a more accurate analogy.

        I am getting off topic, but the fork is the specialized tool, which is modern and made with a narrow scope in mind. The spoon is an ancient multi-purpose tool with a broad range of uses. I think you got it the other way around.

      2. 6

        I happen to be working on a project that uses postgres in all the ways mentioned in this article, and I think that was a great decision that kept the system simpler and more coherent. I just want to mention a few caveats I’ve encountered though:

        • Keeping the job locked while you’re working on it is great, but it means you’re dedicating an entire connection to this purpose, and you don’t have that many of them in Postgres
        • Sometimes the worker dies a weird death and Postgres takes a long time to notice that the connection is dead, so the job can stay locked longer than you might expect
        • Be careful about what else you run over the connection holding that lock; the unintentional locks you might acquire can cause deadlocks
        • Be careful with foreign key references to and from the row that stays locked. Modifying rows that refer to, or are referred to by, other rows has non-trivial locking behaviour across the involved tables, so make sure you understand what’s going on
        • Be careful with these locked jobs in combination with migrations. Lock management in migrations is always tricky, but these long-running locks tend to reveal unlikely corner cases
        • If you’re going to use Postgres for Pub-Sub, design for periods of deafness. You might think you’re listening to a channel, but maybe the connection’s been dead for a while, so you need to take measures to detect that, and when you do, you need to look around to figure out what you might have missed in that period.

        Despite the above caveats, I still love using Postgres in this way.
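        To make the dead-worker caveat concrete: since Postgres can take a while to notice a dead connection, you typically end up pairing the lock with your own staleness bookkeeping. Here’s a rough sketch in plain Python, simulating the job rows in memory rather than talking to a real Postgres (the field names and timeout value are all made up):

```python
import time

# Hypothetical in-memory stand-in for a jobs table; in reality these
# would be rows in Postgres and the reaper would run a single UPDATE.
jobs = [
    {"id": 1, "state": "running", "locked_at": time.time() - 600},  # stale worker
    {"id": 2, "state": "running", "locked_at": time.time() - 5},    # healthy worker
    {"id": 3, "state": "pending", "locked_at": None},
]

LOCK_TIMEOUT = 300  # seconds; tune to your longest legitimate job


def requeue_stale(jobs, now=None):
    """Requeue jobs whose worker presumably died; return their ids."""
    now = now or time.time()
    requeued = []
    for job in jobs:
        if job["state"] == "running" and now - job["locked_at"] > LOCK_TIMEOUT:
            job["state"] = "pending"
            job["locked_at"] = None
            requeued.append(job["id"])
    return requeued


print(requeue_stale(jobs))  # job 1's lock is older than the timeout
```

        In a real setup the reaper is one UPDATE flipping stale running rows back to pending, but the timeout has to be longer than any legitimate job, which is exactly the “locked longer than you might expect” window.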

        1. 2

          Detailed technical comments based on experience like yours are why I love lobste.rs! Thank you.

        2. 4

          I have used many solutions for offline tasks, both Python frameworks and connecting directly to the broker. I have worked with RabbitMQ, Kafka, Redis, and PostgreSQL.

          I do not quite understand why Redis is so highly recommended as a message broker. If you already have a Redis instance, then depending on the requirements it might be the right choice. But plain Redis requires some work in order to provide a solid task queue backend.

          For background task processing that is backing some web interface, a custom PostgreSQL-backed task queue works amazingly well. Initially, I was a bit skeptical, because of what PostgreSQL is and how it works. Reading an old benchmark page gave me much needed confidence. If you are already using Postgres and your task throughput is small enough (you have to judge this yourself), using PostgreSQL as a broker has so many advantages over any other solution.

          If PostgreSQL is not enough for you as a task broker, you can pat yourself on the back, because you did a great job of building a popular product! Now take the afternoon to consider switching to another tool that will suit your needs better.

          1. 2

            I think the reason people recommend Redis is that for most average small websites it’d be replacing something really hard to set up, like RabbitMQ. Setting up a Redis instance is trivial. Most cloud providers I’ve used offer a hosted Redis solution, if you want a highly-available server for some reason. A lot of people are using Redis as a replacement for memcached these days, so for most setups you already have a Redis instance. It’s also trivial, and incredibly light on resources, to reproduce your setup locally for development.

            In other words: yes, Postgres might be good, but Redis carved the niche earlier and it’s super convenient and that’s why people recommend it.

            1. 1

              If you’re going to setup a hosted Redis on your cloud provider, there’s a good chance they provide an actual message broker or queue service already. On AWS there are several.

              1. 2

                Being locked into a proprietary AWS service doesn’t sound like a good time to me.

                1. 1

                  They’re not all proprietary - you can get managed Kafka (or something that’s API compatible), for example

          2. 1

            In the first use case, the author mentions a caveat:

            The biggest caveat for this technique is that, if you have a large number of workers trying to pull off this queue and a large number of jobs feeding them, they may spend some time stepping through jobs and trying to acquire a lock. In practice, most of the apps I’ve worked on have fewer than a dozen background workers, and the cost is not likely to be significant.

            I’m trying to understand what time would be spent “stepping through jobs”. It seems like the WITH query will either return a pending job record or it won’t. Maybe they mean that the job will have to continually poll to get new jobs? Either way, seems like redis’s blocking pop commands would be way better for this.

            Am I missing something?
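            For contrast, the blocking-pop shape (what Redis’s BLPOP gives you) in miniature, with Python’s stdlib queue standing in for a real Redis client:

```python
import queue
import threading
import time

q = queue.Queue()


def producer():
    # Simulate another process enqueueing a job a moment later.
    time.sleep(0.1)
    q.put("job-1")


threading.Thread(target=producer).start()

# Blocking pop: the consumer sleeps until a job arrives, instead of
# re-running a claim query on a polling interval.
job = q.get(timeout=5)
print(job)
```

            The consumer burns no cycles between jobs and picks up new work immediately, which is the property polling a table can’t give you without a LISTEN/NOTIFY wake-up on the side.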

            1. 5

              I think what they mean is that although the query is only returning a single row, it still needs to read and skip any rows currently locked by other transactions to find the first available row.
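              A toy illustration of that scanning cost, with plain Python standing in for what SELECT ... FOR UPDATE SKIP LOCKED does inside Postgres (the row layout here is invented):

```python
# Simulate a worker claiming the first unlocked pending job, the way
# SKIP LOCKED does: rows locked by other transactions are read, then
# skipped, before a claimable row is found.
jobs = [
    {"id": i, "locked": i < 10}  # jobs 1-9 are held by other workers
    for i in range(1, 21)
]


def claim_next(jobs):
    """Return (claimed job id, rows examined), or (None, rows examined)."""
    examined = 0
    for job in jobs:
        examined += 1
        if not job["locked"]:
            job["locked"] = True  # our FOR UPDATE lock
            return job["id"], examined
    return None, examined


print(claim_next(jobs))  # claims job 10 after examining 10 rows
```

              The more workers holding locks at the head of the queue, the more rows each claim has to step over, which is the cost the article is pointing at.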

            2. 1

              I recall Sage Griffin mentioning long ago that the crates.io job system is built this way

              1. 1

                For a simple job queue I reach for Beanstalkd. Fast, boring, just works.

                1. 4

                  I think the point of the article is to stay away from additional infrastructure.