Sourcegraph has got to be one of the best tools I’ve used… with giant monorepos at Uber. Uber has a monorepo each for iOS, Android, Java and Go, and each is used by hundreds of teams. There’s custom tooling to work with them that I won’t get into, but I’m amazed at how pleasant Sourcegraph makes browsing them and how fast the code search is, with first-class regex support. It’s the speed of the regex searches that really puzzles me, and I’d love to know how it’s done behind the scenes.
It might sound like I’m selling something, but I’m not. I have no association with Sourcegraph, and I have no idea how much the tool costs the company or what the licensing terms are (which this article touches on, regarding the unconventional open source approach). But the tools I’ve learned to appreciate in this environment that I hadn’t used before are Kibana, Grafana and Sourcegraph. Obviously, your mileage may vary.
Update: I literally just came across a Software Engineering Daily podcast episode from a month ago where one of the Sourcegraph founders talks indexing large repos, using Uber as the example: https://softwareengineeringdaily.com/2020/07/22/sourcegraph-code-search-and-intelligence-with-beyang-liu/
Searching for regular expressions reminded me of this post by Russ Cox, which the interview you linked to briefly mentions.
I haven’t used Sourcegraph myself so I can’t speak to whether there are any similarities, but you might take a peek at the source code of livegrep, which also offers pretty speedy regex-capable search.
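For anyone wondering how regex search can stay fast on a big codebase, one well-known technique is a trigram index: map every three-character substring to the files containing it, use that to cut the candidate set, and only run the real regex on what is left. Below is a toy sketch of the idea. It is not a description of how Sourcegraph or livegrep work internally, the function names are made up for illustration, and it assumes the caller supplies a literal that every match must contain (a real engine would derive that from the regex itself).

```python
import re
from collections import defaultdict


def trigrams(text):
    """Return the set of all 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(docs):
    """Map each trigram to the set of document names that contain it."""
    index = defaultdict(set)
    for name, body in docs.items():
        for tri in trigrams(body):
            index[tri].add(name)
    return index


def search(docs, index, pattern, required_literal):
    """Regex search that first prunes documents via the trigram index.

    required_literal is a string every match must contain (an assumption
    of this sketch). If it is shorter than 3 characters there are no
    trigrams to filter on, so every document gets scanned.
    """
    candidates = set(docs)
    for tri in trigrams(required_literal):
        candidates &= index.get(tri, set())
    rx = re.compile(pattern)
    # The index only narrows the field; the regex makes the final call.
    return sorted(name for name in candidates if rx.search(docs[name]))


docs = {
    "a.cpp": "ptr->get();",
    "b.cpp": "int x = 0;",
    "c.cpp": "foo->getFileInfo(path)",
}
index = build_index(docs)
hits = search(docs, index, r"->get\w*\(", "->get")
```

The point is that the expensive regex only ever runs on files that already contain all the required trigrams, which on a large corpus is usually a tiny fraction of the total.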
When searching public GitHub with public Sourcegraph, you need to limit the query a bit so you don’t get a “too many repos” result. I tend to do it by narrowing the scope down to a single org/dev. It is not explicitly documented AFAIK, but for a single repo, I found that just appending the repo’s URL to theirs works as a quick shortcut, e.g.:
Then you can modify the URL in the query editbox to remove the suffix.
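A rough sketch of that shortcut, for anyone who wants to script it. This is my reading of the trick: drop the scheme from the repo URL and tack the rest onto the Sourcegraph host. The helper name is made up, and the mozilla/dxr repo is just a convenient example since it comes up elsewhere in this thread.

```python
SOURCEGRAPH_BASE = "https://sourcegraph.com"


def sourcegraph_repo_url(repo_url):
    """Build a Sourcegraph URL for a single repo from its GitHub URL.

    Hypothetical helper: assumes the shortcut is
    <sourcegraph host>/<repo host>/<org>/<repo>.
    """
    # Strip the scheme so only host/org/repo remains.
    for prefix in ("https://", "http://"):
        if repo_url.startswith(prefix):
            repo_url = repo_url[len(prefix):]
            break
    repo_path = repo_url.rstrip("/")
    # Clone URLs often end in .git; drop that too.
    if repo_path.endswith(".git"):
        repo_path = repo_path[:-len(".git")]
    return f"{SOURCEGRAPH_BASE}/{repo_path}"


url = sourcegraph_repo_url("https://github.com/mozilla/dxr")
```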
I’ve been running a Mozilla DXR instance for our internal code. Does anyone have experience with both? What are the advantages of Sourcegraph over DXR?
I’ve also been running a Mozilla DXR instance. I’ve been very happy with it. Disclaimer: I have been a contributor to DXR in the past.
I only have minimal experience with Sourcegraph. Sourcegraph does fairly well in my opinion. The only annoying thing that I notice missing is “Find declarations”. You can search for references and it looks like any declarations are in that list but there is no easy way to find the declaration(s) separately.
The main problem with DXR is that it has no future. Development has been abandoned, and any development effort has migrated to SearchFox. DXR was explicitly designed to be able to index arbitrary code, but it appears that SearchFox may be designed only to index Firefox. I’ve never tried to use it, so I don’t know how easy it would be to get your own custom code indexed by a SearchFox instance. With the recent layoffs at Mozilla, I doubt even SearchFox is going to get much work done on it. DXR also only works with ElasticSearch 1.7.x and not newer versions, which is becoming increasingly difficult to deal with.
Sourcegraph has two different ways to index your C++ code: lsif-cpp and lsif-clang, with the latter being the newer, recommended option. The lsif-cpp indexer is based on the DXR clang plugin. Compare https://github.com/sourcegraph/lsif-cpp/blob/master/clang/dxr-index.cpp with https://github.com/mozilla/dxr/blob/master/dxr/plugins/clang/dxr-index.cpp.
If you want to see what using Sourcegraph is like, they have a version at https://sourcegraph.com/search that indexes a bunch of public repos from GitHub. They have the DXR GitHub repo indexed so we can search within that.
For example, here are all the places where the string ->get appears in C++ files
And here are all the references to the function getFileInfo (look in the bottom frame)
Thanks for the explanation! I had a closer look and it seems pretty good. If I ever have to set up a code search tool again, it will probably be Sourcegraph. Our current setup still runs on Ubuntu 16.04, which will lose support in 2021. I remember trying to get DXR running on Ubuntu 20.04, but it was too much of a pain due to dependencies on old software (like the old Elasticsearch). The only potential issue with Sourcegraph is that multi-branch indexing is still experimental, and we will need that. At the moment I think Mozilla’s future is too uncertain to invest much time in SearchFox.