There is something funny about this, in that the primary motivation seems to be about having jobs evicted from Redis due to the LRU setting.
Sidekiq itself recommends changing to noeviction: https://github.com/sidekiq/sidekiq/wiki/Using-Redis#memory. We’ve run literally trillions of jobs through Sidekiq (thanks Mike!) over the years and never had an eviction problem because we set noeviction and used appropriately sized instances.

That said, using Kafka in this fashion is pretty cool, though Kafka is often very complex for “relatively simple” job queues.
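For anyone who wants to check or flip that policy on a running instance, here’s a minimal sketch using the redis-rb gem; the connection URL is a placeholder, and the durable way to set this is maxmemory-policy noeviction in redis.conf:

    require "redis"

    redis = Redis.new(url: "redis://localhost:6379")

    # With noeviction, Redis returns write errors once maxmemory is reached
    # instead of silently evicting keys, so queued jobs are never dropped.
    redis.config(:set, "maxmemory-policy", "noeviction")

    redis.config(:get, "maxmemory-policy")
    # => {"maxmemory-policy"=>"noeviction"}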
Yeah, chalk it up to not paying close enough attention to the settings when I set it up 10 years ago. :) Once we had Kafka in place for our newer pipeline, it made sense to migrate our older pipeline to use it, too.
The LRU bit was actually a secondary motivation. The primary motivation was that we got to a point where the volume of data moving through Redis could fill it up too quickly (regardless of the other, non-queue items that could get evicted) if we had a downstream processing problem of any significant duration. Moving to Kafka gave us much more breathing room, since we now have terabytes instead of gigabytes of storage. :)
Yeah, makes total sense - might be worth discussing that issue in the post versus the “oops LRU results in data loss” reasoning.
Yeah, I’m curious what numbers they’re actually running. Like, what’s the RAM on the machine running their Redis server?
The redis instance in question has 64G of RAM.
Here at Stitch Fix it took us at least 2 years of slow burn migrating our 100+ apps from Resque to Sidekiq. These migrations can be tough! Granted, I don’t think we did it particularly well, in that we didn’t exactly focus on it.
We also have Kafka in the mix, and the idea of using Kafka/Karafka for jobs does come up periodically. I think it would be a good thing; Karafka is super awesome and I’d recommend it to anyone in the Ruby space. We don’t have the problems described in the article for most of our Sidekiq-ing (maybe in some niches, can’t rule it out).
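For anyone curious what the Karafka side of that would look like, here’s a minimal sketch of a job-style consumer against the Karafka 2.x API; the topic and class names are made up for illustration, not taken from anyone’s actual setup:

    require "karafka"

    # Consumes batches of job payloads from a hypothetical background_jobs topic.
    class BackgroundJobsConsumer < Karafka::BaseConsumer
      def consume
        messages.each do |message|
          # message.payload is the deserialized job data (JSON by default)
          perform(message.payload)
        end
      end

      private

      def perform(payload)
        # ... do the actual work here ...
      end
    end

    class KarafkaApp < Karafka::App
      setup do |config|
        config.kafka = { "bootstrap.servers": "127.0.0.1:9092" }
        config.client_id = "jobs_app"
      end

      routes.draw do
        topic :background_jobs do
          consumer BackgroundJobsConsumer
        end
      end
    end

You’d run it with bundle exec karafka server, and enqueueing a job is just producing a message to the topic.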