I think the advice in this post is good advice. However, if you are using a job queue, then by definition your jobs all happen in the future, and you will always need to think about what’s going to happen to pending jobs when you deploy new code. Whether there are 2 affected jobs in the queue or 2000 is (“just”) a matter of degree.
Yeah, thinking that you don’t have to deal with this problem seems like a real mistake. If you ever want to change the behavior of an existing job, you first have to be sure all existing ones are finished. Realistically, that means creating an entirely new job, deploying it, and no longer enqueuing the old one; once you’ve verified that all the old ones have finished, you can get rid of that code.
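A minimal sketch of that rollout, assuming an in-memory queue of `(job_name, payload)` tuples. The job names, payload fields, and helpers here are all hypothetical, not taken from any particular queue library:

```python
from collections import deque

# Both versions stay registered while old payloads may still be queued.
def send_welcome_v1(payload):
    return f"Welcome, {payload['name']}"

def send_welcome_v2(payload):
    return f"Welcome, {payload['first_name']} {payload['last_name']}"

HANDLERS = {
    "send_welcome_v1": send_welcome_v1,
    "send_welcome_v2": send_welcome_v2,
}

def enqueue_welcome(queue, first_name, last_name):
    """Newly deployed code enqueues only the new job name."""
    payload = {"first_name": first_name, "last_name": last_name}
    queue.append(("send_welcome_v2", payload))

def work_one(queue):
    """Pop the oldest job and run whichever handler it was enqueued for."""
    name, payload = queue.popleft()
    return HANDLERS[name](payload)

def is_drained(queue, job_name):
    """True once no pending job uses job_name, so its handler can be deleted."""
    return all(name != job_name for name, _ in queue)
```

Only once `is_drained(queue, "send_welcome_v1")` holds in production would it be safe to remove the old handler in a later deploy.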
Then how do you make anything happen in the future?
The article gives a good example of this.
You can run periodic cron-style tasks that query for data that need to be processed right now. So rather than enqueuing a job to process data in a week, say, you save the current timestamp with the data. Then some sort of daily cron job can query for data at least a week old and do the processing immediately.
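Roughly, the periodic task could look like the sketch below. It assumes a hypothetical `signups` table in SQLite with a `created_at` ISO timestamp and a `processed` flag; the table, columns, and function names are made up for illustration:

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical schema: the timestamp is stored with the data, no job is enqueued.
SCHEMA = """
CREATE TABLE signups (
    id INTEGER PRIMARY KEY,
    created_at TEXT NOT NULL,
    processed INTEGER NOT NULL DEFAULT 0
)
"""

def due_rows(conn, now, min_age=timedelta(days=7)):
    """Return ids of unprocessed rows created at least min_age before now."""
    cutoff = (now - min_age).isoformat(timespec="seconds")
    cur = conn.execute(
        "SELECT id FROM signups "
        "WHERE processed = 0 AND created_at <= ? ORDER BY id",
        (cutoff,),
    )
    return [row[0] for row in cur.fetchall()]

def run_once(conn, now, handler):
    """One cron tick: process everything that is due, then mark it done."""
    ids = due_rows(conn, now)
    for row_id in ids:
        handler(row_id)
        conn.execute("UPDATE signups SET processed = 1 WHERE id = ?", (row_id,))
    conn.commit()
    return ids
```

Because the handler is looked up when the task runs, a deploy changes what happens to week-old rows immediately; nothing serialized a week ago is waiting in a queue.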
This has some advantages:
1. Your users can’t “make” your server busy at a later date by creating 1000 jobs in a second.
2. You can process jobs at whatever pace you like at a later date, e.g. run continuous integration/unit tests/etc. at 12:00am, but add a 5-minute delay between them to avoid hogging 100% of resources.
It has some disadvantages:
1. Your jobs all run at one fixed time. You are batching 7 days’ worth of jobs, and you may be planning to complete them within 1 hour at midnight on a Saturday/Sunday. This may not scale well; previously the jobs may have been scattered randomly throughout the week.
2. “1 week after I sign up at 3PM AEST” might be a better time to do something than “Midnight on a Saturday morning AKST”. The latter could, for example, push a notification to my phone while I’m sleeping.
If you’re running a daily cron-style task then you’re only batching 1 day’s worth of jobs (even though the job is operating on data created 7 days ago). I agree there’s a downside in that a bunch of jobs all run at a certain time, but you could run the task hourly/etc. to better distribute the work.
For example, say you signed up at 8:46AM on Tuesday 6/7/2016. Your email won’t be sent until the hourly task runs on or after 8:46AM on Tuesday 6/14/2016 (probably 9:00AM if jobs are run hourly). If that happens to be an inconvenient time to send the email (based on some condition), the job can just skip it, and it will get picked up the next time the task runs.
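To make the timing concrete, here is a small sketch. It assumes the hourly task fires on the hour and keeps pending signups in a plain dict; `first_run_at_or_after`, `tick`, and `is_convenient` are hypothetical names:

```python
from datetime import datetime, timedelta

DELAY = timedelta(days=7)

def first_run_at_or_after(due, interval=timedelta(hours=1)):
    """Earliest tick on a midnight-aligned grid that is not before due."""
    midnight = due.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = due - midnight
    ticks = elapsed // interval
    if elapsed % interval:  # not exactly on a tick, so round up
        ticks += 1
    return midnight + ticks * interval

def tick(now, pending, is_convenient):
    """Send what is due and convenient; everything else stays pending."""
    sent = []
    for user, signed_up in list(pending.items()):
        if now - signed_up >= DELAY and is_convenient(user, now):
            sent.append(user)
            del pending[user]
    return sent

signed_up = datetime(2016, 6, 7, 8, 46)
print(first_run_at_or_after(signed_up + DELAY))  # 2016-06-14 09:00:00
```

A skipped user is simply left in `pending`, so the next tick re-evaluates the condition without any rescheduling.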