Interesting and funny that Bluesky is running into the same problems we did at early Twitter, and coming up with very similar solutions! We diverted “popular” users (high fanout) to a separate “ashton” cluster so they didn’t disrupt delivery to everyone else, and “drowning” users (following too many people) would fall out of haplo (the home timeline cache) and just get a random selection of their home timeline if they ever looked, because it wasn’t worth trying to deliver reliably to them. Obviously the latter solution became moot with Algorithms.
Notably, we did not try to store home timelines in a database – the whiteboard disk throughput numbers looked unreachable. I guess that’s one area where hardware has improved quite a bit. :)
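The popular/drowning routing described above can be sketched roughly like this. Everything here is illustrative: the thresholds, the `User` shape, and the cluster names are assumptions, not the actual Twitter values (other than “ashton” being the name mentioned above).

```python
from dataclasses import dataclass

# Assumed cutoffs, purely for illustration -- the real values aren't public.
POPULAR_FOLLOWER_THRESHOLD = 100_000    # authors above this go to the "ashton" cluster
DROWNING_FOLLOWING_THRESHOLD = 5_000    # followers above this get lossy delivery

@dataclass
class User:
    follower_count: int = 0
    following_count: int = 0

def route_fanout(author: User, followers: list) -> tuple:
    """Pick a delivery cluster for one post and drop 'drowning' followers.

    Returns (cluster_name, followers_to_deliver_to).
    """
    cluster = ("ashton" if author.follower_count > POPULAR_FOLLOWER_THRESHOLD
               else "default")
    # "Drowning" users are skipped at delivery time; they get a best-effort
    # view on read instead of reliable fanout.
    deliverable = [f for f in followers
                   if f.following_count <= DROWNING_FOLLOWING_THRESHOLD]
    return cluster, deliverable
```

The point of the split is isolation: a celebrity post with millions of deliveries queues on its own cluster instead of stalling everyone else’s fanout.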
Re: Storing Timelines. Yep, Scylla has been pretty solid for storing all the home timelines as long as we don’t have hot shards, but it’s getting to the point where we’re looking at building something custom and more circular-buffer shaped. Might happen this year if I can get the time to work on it, but we’ll see.
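A “circular-buffer shaped” timeline store presumably means fixed capacity per user with overwrite-oldest semantics. A minimal in-memory sketch of that idea (the real thing would be persistent and sharded, and this class name is invented):

```python
class TimelineRing:
    """Fixed-capacity ring of post references, newest-first reads."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.buf = [None] * capacity
        self.head = 0    # index of the next write slot
        self.size = 0

    def append(self, post_ref):
        # Overwrite the oldest entry once the ring is full -- no trim step,
        # no unbounded growth, which is the appeal over a generic table.
        self.buf[self.head] = post_ref
        self.head = (self.head + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def latest(self, n: int) -> list:
        """Return up to n most-recent refs, newest first."""
        n = min(n, self.size)
        return [self.buf[(self.head - 1 - i) % self.capacity] for i in range(n)]
```

The design choice being hinted at: a timeline never needs more than its last N entries, so a ring avoids both compaction and delete traffic that a general-purpose store pays for.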
At the very least we’ll probably end up with hybrid timelines where celeb post references/timestamps are cached heavily and merged in on-demand with typical fanned-out TLs.
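If both the fanned-out timeline and the cached celeb references are kept newest-first, that on-demand merge is a straightforward k-way merge on timestamp. A hedged sketch (function name and the `(timestamp, post_id)` tuple shape are my assumptions):

```python
import heapq
import itertools

def read_hybrid_timeline(fanned_out, celeb_refs, page_size: int) -> list:
    """Merge two newest-first lists of (timestamp, post_id) into one page.

    fanned_out: the user's materialized timeline (newest first).
    celeb_refs: heavily cached references to celebrity posts (newest first).
    """
    # heapq.merge is lazy, so we only pay for the page we actually return.
    merged = heapq.merge(fanned_out, celeb_refs,
                         key=lambda p: p[0], reverse=True)
    return list(itertools.islice(merged, page_size))
```

This keeps the write path cheap for high-fanout authors (one cached entry instead of millions of deliveries) at the cost of a small merge on every read.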
Additionally, I’m looking into hibernating the timelines of users who aren’t active in a given datacenter for some inactivity period, then doing on-demand generation on reactivation and re-enrolling them in fanout. It will require a bit more coordination work, but should help with scaling and allow us to have smaller PoPs with full network indices.
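The hibernation flow above could look something like this speculative sketch: drop materialized timelines for inactive users, then on their next read rebuild the timeline and re-enroll them in fanout. All names and the threshold are hypothetical.

```python
INACTIVITY_SECS = 30 * 24 * 3600  # assumed hibernation threshold

def hibernate_inactive(store: dict, last_active: dict, now: float):
    """Drop materialized timelines for users inactive past the threshold."""
    for uid in [u for u, t in last_active.items() if now - t > INACTIVITY_SECS]:
        # In the real system this would also un-enroll the user from fanout.
        store.pop(uid, None)

def read_timeline(user_id, store: dict, rebuild, enroll_fanout) -> list:
    """Serve a timeline, regenerating it on demand if it was hibernated."""
    if user_id not in store:
        store[user_id] = rebuild(user_id)   # on-demand generation
        enroll_fanout(user_id)              # re-enter the fanout set
    return store[user_id]
```

The coordination cost mentioned above lives in `rebuild` and `enroll_fanout`: regeneration has to walk the user’s follow graph, and re-enrollment has to race cleanly with posts arriving mid-rebuild.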
Yeah, we used a (custom-replicated) redis cluster to store a ring buffer of about 800 tweet ids per user for the home timeline. On write, it would fanout appends (push/trim) to each follower timeline. On read, each id was used to fetch a JSON fragment from an enormous memcache, to hydrate a page of the timeline. The whole thing depended on running from memory all the time. Madness. :)
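A toy re-creation of that scheme, with plain dicts standing in for the redis cluster and the memcache so it runs anywhere. In redis terms each delivery is an `LPUSH` followed by an `LTRIM` back to the ring length; everything else here is illustrative.

```python
TIMELINE_LEN = 800  # per-user ring of tweet ids, as described above

timelines = {}    # user_id -> list of tweet ids, newest first (the "redis")
tweet_cache = {}  # tweet_id -> JSON fragment (the "memcache")

def fanout_write(tweet_id: str, json_fragment: str, follower_ids: list):
    """On write: cache the tweet body, then append its id to each follower's ring."""
    tweet_cache[tweet_id] = json_fragment
    for uid in follower_ids:
        tl = timelines.setdefault(uid, [])
        tl.insert(0, tweet_id)    # LPUSH: newest id at the front
        del tl[TIMELINE_LEN:]     # LTRIM 0 799: cap the ring

def read_page(user_id, n: int) -> list:
    """On read: hydrate the first n ids from the tweet cache."""
    return [tweet_cache[tid] for tid in timelines.get(user_id, [])[:n]]
```

Note what the push/trim pair buys: the per-user ring can never grow past 800 ids, so memory use is bounded by user count, not by post volume, which is what made the all-in-RAM design survivable at all.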
Hm, neat. It’s tempting to wonder if a bloom filter would help for telling whether or not to hit redis to fetch the lossiness metric l, but I guess probably not, since that’s one extra lookup. :)
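For context on the trade-off being weighed above: a local Bloom filter can answer “definitely not present” without the redis round-trip, but a positive answer still requires the real lookup (false positives), so it only pays off when most queries miss. A tiny sketch, with arbitrary sizing:

```python
import hashlib

class Bloom:
    """Minimal Bloom filter: no false negatives, small false-positive rate."""

    def __init__(self, bits: int = 1 << 16, hashes: int = 4):
        self.bits, self.hashes = bits, hashes
        self.arr = bytearray(bits // 8)

    def _positions(self, key: str):
        # Derive k independent bit positions by salting one hash function.
        for i in range(self.hashes):
            h = hashlib.blake2b(f"{i}:{key}".encode(), digest_size=8).digest()
            yield int.from_bytes(h, "big") % self.bits

    def add(self, key: str):
        for p in self._positions(key):
            self.arr[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent" -- safe to skip the redis lookup.
        return all(self.arr[p // 8] & (1 << (p % 8)) for p in self._positions(key))
```

Which matches the skepticism above: if nearly every user being checked does have a lossiness entry, the filter almost always says “maybe” and you’ve just added a lookup in front of the lookup.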
Oh wow, that’s neat to hear!