This is a really fascinating write-up, especially the footnote about their ngram index design and how regular trigrams weren’t quite performant enough for their particular application: https://github.blog/2023-02-06-the-technology-behind-githubs-new-code-search/#fn-69904-bignote
Hello Lobsters! Thanks for sharing this. I work on this product, happy to answer questions about it or the blog post.
I couldn’t quite grasp the algorithm for the sparse grams: the chester example seemed to break down into trigrams, and I wasn’t clear on why hes was selected while ste wasn’t. I suspect you could make a post just on this topic, though. :-)
Bitbucket’s code search is also implemented using Rust.
Interesting! Are there any details about it on the web?
Don’t think there’s much public, sadly. Indexing uses syntect. SQS is used for queuing the jobs. The search is mostly powered by Elasticsearch with some custom analysers.