1. 17

Hey all, I’m keen to hear how you index code in your projects and use that index to jump to where specific symbols are defined (e.g. struct, function, interface/trait), and, as a bonus, for completion.

For the longest time I’ve been using Vim, and to get my cursor to some symbol I use a mixture of:

  • Fuzzy find and open a file in the project (FZF :Files or Ctrl+P)
  • Fuzzy find and jump to line in the open buffer (FZF :Lines)
  • Grep the project and jump to line (FZF :Rg)

But when working in Sublime, or other more IDE-like editors (e.g. IntelliJ IDEA), I personally find it more valuable to jump to a specific symbol that has been indexed by the editor. This is especially true when the project is very large, but even when it’s not.

In the past, for Vim, I’ve used vim-gutentags to run ctags and that has been alright. My main issues are that it can be slow (and CPU-intensive) for very large projects, and that you have to manage the tags file (though that isn’t the end of the world).

Additionally, from lurking round the web, I get the impression that ctags (and alternatives) aren’t really used anymore or that there’s now a better way of supporting jump-to-symbol in Vim. Is this accurate and, if so, what’s the new alternative?


  2. 8

    How do you index code in your projects?

    I used to use tags and now I use an LSP client (the one shipped with Neovim v0.5). I find that trying to use CLI tools to discover semantic information about your code stops working past a certain number of lines of code. I still regularly make use of grep, both in a terminal and with a “fuzzy” matcher (denite), but the larger the project, the less I use it. For working on Firefox and Linux, though, I prefer using https://searchfox.org/ and https://elixir.bootlin.com/linux/latest/source.

    Additionally, from lurking round the web, I get the impression that ctags (and alternatives) aren’t really used anymore or that there’s now a better way of supporting jump-to-symbol in Vim. Is this accurate and, if so, what’s the new alternative?

    I would say that it is accurate and that the alternative is LSP clients and servers.

    1. 3

      I would say that it is accurate and that the alternative is LSP clients and servers.

      Personally, I find it annoying that there was such a quick switch to something that is still as unstable and raw as LSP. Setting up a server is often not easier than using tags, crashing is not infrequent and they are usually just tested with VSCode, which allegedly isn’t a faithful implementation of the specification. Nevertheless, so many languages and work environments have deprecated whatever was working before to support LSP, which, at least for me, has broken more things than it has helped.

      It’s a classical example of where more thought should have been put into it, but the first working implementation came quicker.

      1. 4

        Setting up a server is often not easier than using tags, crashing is not infrequent

        The bad news is that it’s not just immaturity. A language server is necessarily a couple of orders of magnitude more complex than ctags-like approaches (for one, it needs to integrate with the project’s build system). Even if perfectly implemented, it would be less reliable and less robust than simpler approaches.

        That being said, as a server implementer I agree that the protocol could have been designed better.

        1. 1

          kinda silly but I wonder if you could build like… a ctags LSP shim or something for people who want to manage ctags but are using an LSP-based system
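A minimal sketch of the lookup such a shim would have to answer for a go-to-definition request, assuming a tags file generated with line-number addresses (`ctags -R --excmd=number .`). The `find_def` name is made up for illustration, and all the actual LSP plumbing (JSON-RPC framing, turning `file:line` into an LSP `Location`) is left out.

```shell
#!/bin/sh
# find_def SYMBOL [TAGSFILE]: print FILE:LINE for every tags entry named SYMBOL.
# Hypothetical helper; assumes the tags file uses line-number addresses, i.e.
# it was built with `ctags -R --excmd=number .`, so field 3 looks like `42;"`.
find_def() {
    sym=$1
    tags=${2:-tags}
    awk -F '\t' -v sym="$sym" '
        /^!_TAG_/ { next }            # skip pseudo-tag header lines
        $1 == sym {
            addr = $3
            sub(/;".*$/, "", addr)    # drop the ;" that precedes extension fields
            print $2 ":" addr
        }' "$tags"
}

# Usage, next to a tags file in the project root:
#   find_def main tags
```

A shim would run this kind of lookup on the word under the cursor and return each hit as a location; with pattern addresses (the ctags default) it would have to search the file for the pattern instead.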

          1. 1

            totally! This is one of the things I’ve suggested in my other comment: https://lobste.rs/s/ujr9mg/how_do_you_index_code_your_projects#c_buj3rg

          2. 1

            This has certainly been in line with my experiences with LSPs (mostly for Rust, Go, and Python). Great concept, and amazing when they work, but very finicky, with frequent crashes.

          3. 3

            It’s a classical example of where more thought should have been put into it, but the first working implementation came quicker.

            This is how I feel about pretty much all the code indexing stuff I’ve used (GNU Global, Ctags and its seemingly unending variants, Cscope). They’re all janky and have been janky for ages.

            1. 1

              The only tooling I’ve had work reliably in terms of indexing/intelligence is when the editor/IDE does it itself. Visual Studio, IDEA, and good Emacs major modes are the platonic ideal here.

            2. 2

              Nevertheless, so many languages and work environments have deprecated whatever was working before to support LSP

              Interesting, what are these environments and what did they remove support for? My knowledge is limited to (Neo)Vim and tags are still available and supported there.

              they are usually just tested with VSCode

              Yes, that is true. There’s even a great rant by an LSP client maintainer where he points out several issues with LSP, among which the fact that most LSP servers are bad citizens.

              It’s a classical example of where more thought should have been put into it, but the first working implementation came quicker.

              Wasn’t the first working implementation ctags (or whatever was there before)? It’s not like jump-to-definition is something completely new; there had already been several approaches by the time LSP was created. The fact that it became so popular so fast IMO proves that it has multiple advantages over the state of the art, despite the bad things in its design (UTF-16, really???).

              I think what’s really required now is a conformance test suite, fuzzers and other tools to improve the quality of current implementations.

              1. 2

                Interesting, what are these environments and what did they remove support for?

                I was thinking about it the other way around: Go and Haskell development tools are not being worked on any more (and, at least in the case of Go, no longer work), and instead everyone is focusing on LSP, which is still not stable.

          4. 6

            I use an IDE, which indexes the project and provides instant method/function signatures, jump to definition, find usages, refactoring, etc etc.

            1. 1

              Yes - and Vi keys in the IDE.

              1. 1

                Not so much.

            2. 5

              Heh, I personally just implement the thing myself (1, 2) :)

              More practically, I think there are these choices available:

              • use a smart IDE, like IntelliJ
              • use an editor with LSP support (and bug the LSP, editor and server developers to improve support for this, as they don’t allow you to filter your code vs library code, or test vs main code)
              • produce a symbol index as part of the build process (Google’s Kythe).
              • use an offline regex-based index (ctags). This works well enough, but might not be super precise (as regexes can’t parse programming languages), and has the drawback that index rebuilds must be triggered manually.

              The next two options are straightforward (Christmas-holiday-scale) extensions of the last one, but I don’t think they are available as a popular stand-alone tool.

              • online indexing – listen for the file changes and apply an incremental diff to the index, rather than recomputing it from scratch
              • plug tree sitter instead of regular expressions for richer and more correct indexes.
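A rough sketch of the online-indexing idea: when a file changes, drop all of its rows from the index and re-run the mapper on just that file. Everything here is an assumption for illustration: the `map_file` and `reindex_file` names, the tab-separated name/kind/file/line row format, and the mapper itself, a crude regex for Go-style `func Name(` lines standing in for a tree-sitter-based one.

```shell
#!/bin/sh
# Toy incremental index. Rows are: name<TAB>kind<TAB>file<TAB>line
# (a format invented for this sketch).

# Hypothetical regex mapper standing in for a tree-sitter one: emits a row
# for each line of the form `func Name(` (top-level Go functions only).
map_file() {
    awk -v f="$1" '
        match($0, /^func [A-Za-z_][A-Za-z0-9_]*[(]/) {
            name = substr($0, 6, RLENGTH - 6)   # text between "func " and "("
            printf "%s\tfunction\t%s\t%d\n", name, f, FNR
        }' "$1"
}

# reindex_file INDEX FILE: remove FILE's old rows, then append fresh ones.
reindex_file() {
    idx=$1 file=$2
    touch "$idx"
    tmp=$(mktemp) || return 1
    # Keep every row that belongs to some other file (column 3).
    awk -F '\t' -v f="$file" '$3 != f' "$idx" > "$tmp"
    # Deleted files simply drop out of the index.
    if [ -f "$file" ]; then
        map_file "$file" >> "$tmp"
    fi
    mv "$tmp" "$idx"
}
```

In practice a file watcher (something like `inotifywait -m` or `entr`) would call `reindex_file` for each changed path, so the cost of an update is proportional to the size of the changed file rather than the whole project.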
              1. 2

                Sourcegraph has an open source core, for a standalone tool: https://github.com/sourcegraph/sourcegraph

                1. 2

                  But does the core actually have indexing capabilities? I don’t know, but I think they are consumers of LSIF (the Language Server Index Format, a companion to LSP) and ctags, rather than producers?

                  GitHub’s semantic might be closer to what I am talking about (but, again, I haven’t looked closely at it):

                  And, naturally, IntelliJ is open source, that would be (well, it is :) ) the first place for me to look how to do language-aware tooling:

                  1. 2

                    We’ve run the OSS sourcegraph indexers internally and they worked quite well. It isn’t currently running (just for lack of use - mostly). But unless they changed the OSS offering recently, it definitely works as an indexer.

                    1. 2

                      I was thinking about this comment more after replying and I think my first reply is confusing/wrong. I was thinking, as an end-user, does open source Sourcegraph have code search capabilities.

                      You’re right - the core consumes LSIF.

                      https://docs.sourcegraph.com/code_intelligence/references/indexers https://lsif.dev

                  2. 1

                    I really wish Kythe had better open source support. The docs are out of date and it’s hard to get started without a viable frontend.

                    1. 1

                      plug tree sitter instead of regular expressions for richer and more correct indexes.

                      I assume you’re referring to this tree sitter? Looks like an awesome project!

                      @matklad are you aware of any projects using tree sitter for symbol tagging/indexing? And, with your work on rust-analyzer (and thank you for all you’ve put into that), do you have any thoughts on where effort is best placed to improve code indexing outside of big IDEs like IntelliJ? I.e. working more on LSPs to expand support for indexing and searching, and making sure they are robust and work well on large codebases, vs. building a modern ctags-like tool, perhaps based on tree sitter.

                      1. 2

                        @matklad are you aware of any projects using tree sitter for symbol tagging/indexing?

                        I think GitHub’s semantic does that (although I’m not sure), but it tries to go way beyond simple indexing. In terms of making a dent, I think if you want to support a specific language, you really should push on the LSP implementation for that language: making the server more robust and fast, making the client more powerful, and making the protocol more capable and less bet.

                        If you want to improve the baseline support for all languages, I think a tree-sitter-based map-reduce indexer could make a meaningful difference. Basically:

                        • some infra to write mappers which take tree-sitter’s syntax tree and output the list of (symbol name, symbol kind, symbol span) triples (with maybe some extra attributes)
                        • some fuzzy-search index for symbol names (if using Rust, just take the fst crate, or copy-paste symbol_index.rs from rust-analyzer)
                        • a driver which can:
                          • process a bunch of files offline, in an embarrassingly parallel fashion
                          • incrementally update the index when files change (that is, if file x changes, remove all x keys from the index, and then re-add the keys after the mapper step).
                        • some client API – LSP for easy integration, a custom search protocol for more features (streaming & filtering), or just a CLI, the Unix way
                        • an optional persistence layer, to load the index from disk (but you can most definitely live with a purely in-memory index for a long time; don’t implement this from the start, postpone it)
                        • some pluggability, so that it’s not you but your users who maintain grammars & mappers.
                        • a bit of feature creep, but why not – a trigram index on top of the driver/mapper infrastructure, to speed up text-based greps.
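The offline build step of such a driver, plus a simple query, can be sketched in shell as a toy version of the design above. Assumptions: the awk regex mapper (top-level Go `func`/`type` declarations) stands in for a tree-sitter mapper, the row format is invented for this example, `find -print0`, `xargs -0` and `xargs -P` are available (GNU and BSD tools have them; POSIX doesn't require them), and the query is a case-insensitive substring match rather than a real fuzzy index like the fst crate.

```shell
#!/bin/sh
# Toy map/reduce indexer. Rows are name<TAB>kind<TAB>file<TAB>line.
# The awk "mapper" is a regex stand-in for a tree-sitter mapper: it only
# recognises top-level Go `func Name` and `type Name` declarations.
MAPPER_AWK='
match($0, /^(func|type) [A-Za-z_][A-Za-z0-9_]*/) {
    split(substr($0, 1, RLENGTH), w, " ")
    kind = (w[1] == "func") ? "function" : "type"
    printf "%s\t%s\t%s\t%d\n", w[2], kind, FILENAME, FNR
}'
export MAPPER_AWK

# build_index ROOT OUT: map all *.go files in parallel, then reduce (sort).
build_index() {
    root=$1 out=$2
    parts=$(mktemp -d) || return 1
    # Each xargs job maps a batch of up to 32 files into its own part file,
    # so output from the 4 parallel jobs never interleaves mid-line.
    find "$root" -type f -name '*.go' ! -path '*/.git/*' -print0 |
        xargs -0 -P 4 -n 32 sh -c \
            'awk "$MAPPER_AWK" "$@" > "$(mktemp "$0/part.XXXXXX")"' "$parts"
    # Reduce: merge all part files into one sorted index.
    find "$parts" -type f -exec cat {} + | sort > "$out"
    rm -rf "$parts"
}

# query_index INDEX TEXT [KIND]: case-insensitive substring match on names,
# optionally restricted to one kind; prints FILE:LINE: KIND NAME.
query_index() {
    awk -F '\t' -v p="$2" -v k="${3:-}" '
        index(tolower($1), tolower(p)) && (k == "" || $2 == k) {
            print $3 ":" $4 ": " $2 " " $1
        }' "$1"
}
```

Writing each batch to its own part file is the important design choice here: letting parallel mappers share one pipe risks their buffered output getting spliced together mid-line.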
                    2. 3

                      In Emacs I use a mixture of plain grep/ripgrep/whatever, vc-git-grep, and GNU Global with ggtags-mode. Most of the time I lean on Global, and it is decent, but it misses many use cases that LSP (when I have used it) can catch. I mainly work on a GCC code base, which is rather large, and plain grep/ripgrep/whatever is just too slow there. Global is much faster. If I’m searching for something in the repo, `vc-git-grep` is the clear winner. If it’s in the generated build files, grep is fine.

                      I did try LSP, but because it has to hook into the build in a specific way, it’s a bit clunky to use in the GCC build. I’ve been meaning to try it out again, but what I have now works pretty well and the motivation to change isn’t sufficient right now.

                      1. 1

                        I did try LSP, but because it has to hook into the build in a specific way, it’s a bit clunky to use in the GCC build. I’ve been meaning to try it out again, but what I have now works pretty well and the motivation to change isn’t sufficient right now.

                        Yeah, a lot of C/C++ tooling assumes you use your IDE’s build system or maybe CMake. I end up working with seemingly everything but that, so unfortunately IDEs aren’t as useful to me as they once were.

                      2. 3

                        cscope still works well for me - I find it very easy to use, and I’ve used it on very large codebases. The indexing time is not significant.

                        I do not use completion frameworks, but that’s a personal choice.

                        1. 2

                          In the past, for Vim, I’ve used vim-gutentags to run ctags and that has been alright. My main issues are that it can be slow (and cpu intensive) for very large projects, and you have to manage this tags file (though that isn’t the end of the world).

                          I use my own ctags wrapper (a whopping 24 lines of VimScript!) with a set of heuristics that assemble an exclude list (so you don’t waste time parsing crap in node_modules, generated files, caches, virtualenv dirs, etc). I don’t even use a Vim tags plugin - I just bind that function to a hotkey.

                          It looks like a ghetto solution, but it’s been working surprisingly well for me. I haven’t touched that code in years, and I use ctags every day!
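For anyone wanting to roll something similar, here is a sketch of the exclude-list part only. It is not the commenter's script: the directory names are an assumed, non-exhaustive list, and the `pyvenv.cfg` check is just one possible heuristic for virtualenvs with arbitrary names. It relies on ctags' `--exclude=` option, which is matched against file and directory names.

```shell
#!/bin/sh
# Sketch: print one ctags --exclude=NAME option per line for directories
# that are usually noise (vendored deps, build output, caches, virtualenvs).
# The name list is an assumption; tune it per project.
ctags_excludes() {
    root=${1:-.}
    {
        # Well-known names, excluded wherever they appear in the tree.
        for name in .git node_modules __pycache__ .mypy_cache .tox dist build target; do
            printf '%s\n' "--exclude=$name"
        done
        # Virtualenvs can be named anything, but each contains a pyvenv.cfg.
        # Prune the big directories first so find itself stays fast.
        find "$root" \( -name .git -o -name node_modules \) -prune -o \
            -type f -name pyvenv.cfg -print 2>/dev/null |
            while IFS= read -r cfg; do
                printf '%s\n' "--exclude=$(basename "$(dirname "$cfg")")"
            done
    } | sort -u
}

# Usage, from the project root (requires Universal or Exuberant Ctags;
# with no file arguments, -R recurses from the current directory):
#   ctags_excludes . | xargs ctags -R
```

Binding that usage line to a hotkey gives roughly the workflow described above, without a tags plugin.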

                          1. 2

                            I use Emacs and mostly write Go. In the past I used godef integration to jump to definition (C-c j). I’ve been experimenting a bit with LSP, which provides the same functionality.

                            I’ve also come to really like Google’s code search indexing abilities, both for autocomplete and proper regex search and for references/jump to definition. For example, load this up: https://source.chromium.org/chromium/chromium/src/+/master:chrome/chrome_proxy/chrome_proxy_main_win.cc;l=19;bpv=1;bpt=1

                            Click on kChromeProxyExecutable and you’ll get a list of where it’s used. Click on FILE_PATH_LITERAL and it will jump to the definition.

                            [edit] Oh, and grep still. Sometimes your tooling doesn’t understand the project definition/layout.

                            1. 2

                              For my projects I mostly use IDEs (IntelliJ ones + IdeaVim) that have this out of the box. When I was experimenting with using Vim as an IDE, I used LSP servers (coc-vim), ripgrep and a few short scripts of my own that performed simple tasks across whole codebases. For libraries that mostly went unmodified but for which I needed indexing support for some reason, I’ve used ctags, gtags and an OpenGrok instance, which I use to explore unfamiliar codebases.

                              1. 2

                                I aliased this script to “findcode”:

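                                #!/bin/sh
                                # findcode PATTERN EXT: case-insensitive grep for PATTERN in
                                # every *.EXT file under the current directory, skipping .git.
                                # `-exec ... {} +` runs grep on batches of files instead of once per file;
                                # -n adds line numbers so results can be jumped to.
                                find .                  \
                                     -type f            \
                                     ! -path '*/.git/*' \
                                     -name "*.$2"       \
                                     -exec grep -Hin --color=auto -- "$1" {} +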
                                

                                So when I want to find lines in my codebase that contain “carWashFactory”, I type:

                                findcode carwashfactory php
                                
                                1. 2

                                  I just use a CLI grepper. Common ones these days are ag, ack and ripgrep. I admit that in-IDE indexed jumping is probably superior when you’re trying to look up things that are in third-party libraries, but, otherwise, I’m content with CLI for in-codebase searches.

                                  1. 3

                                    Interestingly, I somewhat recently learnt of git grep, and it’s since become my default grepprg (obviously with the -n flag). Then if I really need raw rg and/or some tricky filtering, there’s always :cex system('...') or :lex system('...')

                                    edit: Ah, and I just remembered: last month I also found out that you can filter filenames in git grep, for example:

                                    git grep '^func Test' -- ':*_test.go'
                                    git grep 'Print' -- ':!*_test.go'
                                    
                                  2. 2

                                    For most languages I use LSP. The main advantage is that it understands the language and knows where to find definitions, find usages, perform renames and display documentation. The disadvantage is that some server implementations are slow (or all of them are, depending on your definition of slow).

                                    1. 2

                                      With the dev version of Neovim, I use LSP whenever possible. The integration is much better than the grepping around I used to do.

                                      1. 2

                                        Not. I use plain grep in a terminal. Keeps me aware of the general structure of the project.

                                        1. 2

                                          LSP and Universal Ctags (not Exuberant Ctags, which is distributed with most systems as ctags). Sometimes I go with :grep in Vim, but in most cases the above tools are more than enough for my needs.

                                          1. 2

                                            On my main development machine I have Neovim and just greps. As most of the things I work on tend to break graphics, terminal, … on a regular basis, there is a second machine that doubles as local CI, ‘man viewer’ and backup. That machine also indexes and browses code using the rather awesome Sourcetrail (https://github.com/CoatiSoftware/Sourcetrail). I would love a CLI/TUI for it, though.

                                            1. 2

                                              I don’t. I just use Notepad++. As long as the project is below 100k LoC and I wrote everything myself, I don’t need it. For larger projects at work, I use Eclipse CDT (for C/C++). I’ve also been looking into Visual Studio Code combined with the Rust Language Server (RLS), which looks pretty promising and has helped me out a couple of times.

                                              1. 2

                                                ctags -R, then vim bindings to search