Are you still using SQLite databases for the original repos, or did I miss a complete transition away from SQLite somewhere along the way? If the former, how do you handle durable storage with high availability in a cloud deployment? For example, do you use an NFS-based storage service, upload the SQLite files to object storage and download them when needed, or something else?
Good question! We’re still using SQLite for each bundle, which buys us some nice properties in terms of eviction (it’s easy to just delete an SQLite file when the commit it’s attached to gets sufficiently far away from the tip of the branch, as no one is likely to request code intel results for that commit anymore). It does have some issues with exclusive access.
I tried to swap out SQLite for something like BadgerDB (https://github.com/sourcegraph/sourcegraph/pull/11052), but the performance gain wasn’t big enough to merit the swap (and to force the required migrations on users). I’m still curious about other backends we could use - both embedded ones (a la LevelDB) and client/server ones (nothing is preventing us from re-evaluating Dgraph as a storage backend after the product and our knowledge of it have progressed a bit).
Right now we’re planning to scale the bundle manager horizontally by sharding rather than by replication. That means that each SQLite file would be guarded by one process, and we can split hot bundle managers by increasing the shard count. The performance of bundle managers at our current scale isn’t an issue and we haven’t had to increase the shard count past one (but we do have some precedent with sharding services like this - it’s how we scale our gitservers).
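As a rough illustration of the sharding idea, the sketch below routes each bundle to exactly one shard by hashing its ID. The `shardFor` helper and the choice of FNV hashing with a modulo are assumptions made for the example, not a description of how our services actually pick a shard.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor maps a bundle ID to one of shardCount bundle manager shards.
// Hypothetical helper: hashing the ID and taking it modulo the shard count
// is an assumed scheme, chosen only because it is simple to show.
func shardFor(bundleID string, shardCount int) int {
	h := fnv.New32a()
	h.Write([]byte(bundleID)) // Write on a hash.Hash never returns an error
	return int(h.Sum32() % uint32(shardCount))
}

func main() {
	const shardCount = 4
	for _, id := range []string{"bundle-1", "bundle-2", "bundle-3"} {
		// The same ID always lands on the same shard, so exactly one
		// process ever opens a given SQLite file.
		fmt.Printf("%s -> shard %d of %d\n", id, shardFor(id, shardCount), shardCount)
	}
}
```

One caveat with a plain modulo scheme like this: changing the shard count reassigns most bundles to a different shard, so the files would have to move with them.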
I’ve also had the idea for a while that we can keep “hot” bundles in a nearby SSD cache while storing them permanently in block storage. This would work well since the SQLite databases are write-once read-many, and it would also allow us to scale our bundle managers horizontally via replication: any bundle manager replica could pull down the same bundle without issue. I haven’t yet evaluated this strategy (bundles can be kind of large, so making the initial requests fast would take some engineering work), but it’s still on my radar.