I still think it’s more of an N+M vs N*M thing, but the critical change LSP brought was a change in thinking about the problem.
Language support used to be assumed to be the IDE’s job. We’ve had “Java IDEs” and “Visual Studio for $LANG”, and we were hoping someone would write a “Rust IDE”. Not a language server, not a compiler infrastructure for an IDE, but an actual editor with Rust support built in.
The problem was that it took twice as much effort to write an IDE with an integrated analyzer as to write just a bare editor or just a GUI-less analyzer. On top of that, people who are good at writing IDEs aren’t necessarily also good at writing analyzers, so we’ve had many good editors with very shallow language support.
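To put rough numbers on the N+M vs N*M point (illustrative figures only, not from the thread):

```python
# Illustrative numbers only: M editors, N languages.
M_EDITORS, N_LANGUAGES = 10, 20

# Every editor ships its own analyzer for every language: M*N integrations.
without_protocol = M_EDITORS * N_LANGUAGES   # 200

# A shared protocol: one server per language plus one client per editor: M+N.
with_protocol = M_EDITORS + N_LANGUAGES      # 30

print(without_protocol, with_protocol)       # 200 vs 30
```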
the critical change LSP brought was a change in thinking about the problem.
I see it as a Betamax/VHS situation. We had language-agnostic, editor-agnostic editor protocols before (like nREPL), but they never saw mainstream adoption because they didn’t have a megacorp backing them.
Sure, compatibility with a rapidly growing editor from a big corp was definitely a big motivation that helped LSP.
But I’m not sure if nREPL was really an alternative. From the docs it seems very REPL-centric, and beyond that quite generic and unopinionated, almost to the point of being a telnet protocol. You can do anything over telnet, but “what methods are available in line 7 col 15 after typing ‘Z’” needs a more specific protocol.
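To give a sense of how specific that protocol has to be, here is roughly what such a query looks like as an LSP textDocument/completion request. This is a sketch of the JSON-RPC framing with a made-up file URI; LSP positions are zero-based, so “line 7 col 15” maps to roughly line 6, character 14.

```python
import json

# Sketch of the LSP request an editor sends for "what's available at this position?".
# The file URI is hypothetical; positions are zero-based in LSP.
request = {
    "jsonrpc": "2.0",
    "id": 42,
    "method": "textDocument/completion",
    "params": {
        "textDocument": {"uri": "file:///tmp/example.rs"},
        "position": {"line": 6, "character": 14},   # ~ "line 7, col 15"
        "context": {"triggerKind": 1},               # 1 = Invoked (typed an identifier / manual trigger)
    },
}

body = json.dumps(request)
# LSP frames each message with an HTTP-style Content-Length header over stdio.
print(f"Content-Length: {len(body.encode('utf-8'))}\r\n\r\n{body}")
```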
From the docs it seems very REPL-centric, and beyond that quite generic and unopinionated
This is a fair criticism; the docs on the reference implementation’s site are not very clear on what nREPL actually is. I think this might be because that site is maintained by the same person who runs Cider, the most widely used client for nREPL, which only works for Clojure; the language-agnostic aspects that were key to the protocol’s original designer appear on that site as a bit of an afterthought, because that’s not where his interest lies.
But the protocol is extensible; eval is only one of the supported operations. You can use it to look up documentation with the describe op, and they recently added a complete op as well: https://github.com/nrepl/nrepl/issues/174
The fact that the docs don’t make this clear is admittedly a big problem, as is the fact that the protocol doesn’t standardize a wider range of declarative queries and operations. I think of these as both being part of the “doesn’t have megacorp resources behind it” problem but that doesn’t help.
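For contrast, the nREPL requests for the ops mentioned above are just small maps, bencoded on the wire. A sketch only; the exact op and key names can differ between nREPL versions, so treat these as illustrative:

```python
# Sketch of nREPL-style requests; op/key names may vary by nREPL version.
eval_request = {
    "op": "eval",
    "id": "1",
    "code": "(+ 1 2)",       # the server replies with "value", "out", "status", ...
}

completion_request = {
    "op": "completions",     # the op added around nREPL 0.8 (see the issue linked above)
    "id": "2",
    "prefix": "ma",
    "ns": "user",
}
```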
I mean, VHS beat Beta for a variety of reasons, but the big one was that you could record more than 60 minutes of video at a time.
So how did rental video on Betamax work?
Multiple tapes.
Are you sure you’re not thinking of Philips/Grundig VCR (with the square cassettes) rather than Betamax?
I’ll admit I’m too young to have ever owned either so maybe that wasn’t the best analogy!
It’s cool. It used to be a very common, canonical example of the “better” technology losing to the inferior one; but “better” is a multidimensional comparison, and by certain important measures, VHS was technically superior to Beta. Beta did have superior picture quality, though.
Also note it’s not cut and dried either - while Beta “lost” the consumer market, it massively won the professional market, and variants there dominated.
I think the biggest takeaway is there’s no binary winner-loser, especially when multiple market segments exist.
This is a very good point.
TextMate language definition files didn’t have a megacorp backing them, yet they’re the de facto standard for syntax highlighting.
And, when compiler authors start thinking about IDE support, the first thought is “well, an IDE is kinda a compiler, and we have a compiler, so problem solved, right?”.
Yep, and I think most languages were / are not developed with tooling in mind, and this makes a huge difference. LSP provides a means of delivering commonly needed features around code analysis, but debugging support and code evaluation / incremental change at runtime are missing. Smalltalk and Common Lisp are the only ones that provide low-level language support for this; developing with them is a stellar experience that IMHO no other language has achieved yet, for exactly that reason.
Right; exactly. The problem here wasn’t the IDE authors, it was the compiler authors. The way to do this is to use the actual compiler, not to have tooling authors reimplement a bunch of features that are supposed to behave the same as the compiler but have a ton of edge cases where they don’t match.
The reason many languages didn’t do a good job at this was actually that most compilers just suck at offering an API that exposes the functionality needed for tooling, so tooling authors had to reinvent the wheel and write mountains of duplicated code.
Compilers that don’t have this problem tended to be pretty easy to adapt to new editors even before LSP existed. Common Lisp had the swank server, which contained all the smarts; even though it was originally developed for Emacs, you could write new clients for it in other editors without that much trouble, even on a shoestring effort budget compared to what LSP has available. Of course in that case it’s not language-agnostic, but that’s why the nREPL protocol was developed, which actually has a bunch of different implementations across several languages. (And writing an nREPL client is a lot easier than writing an LSP client because it just needs bencode+sockets; bencode takes a couple pages of code vs the complexity of JSON.)
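To back up the “couple pages of code” claim, a minimal bencode encoder really is tiny. A sketch covering the four bencode value types, which is all an nREPL message needs:

```python
# Minimal bencode encoder -- a sketch covering the four bencode value types
# (integers, byte strings, lists, dicts), which is all nREPL messages use.
def bencode(value) -> bytes:
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # bencode requires dict keys to be byte strings, in sorted order
        items = sorted((k.encode("utf-8") if isinstance(k, str) else k, v)
                       for k, v in value.items())
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value)!r}")

print(bencode({"op": "eval", "code": "(+ 1 2)", "id": "1"}))
# -> b'd4:code7:(+ 1 2)2:id1:12:op4:evale'
```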
I feel that nREPL is a different kind of thing. Image-based IDEs and static analysis-based IDEs are very different technologies, even if they can power similar end-user features.
Well, it’s a different thing in that it requires a specific evaluation model in a language in order to be supported, but for those supported languages, it’s dramatically more capable than what LSP offers: “show me the test results for running this given file inline”, “perform completion on the fields of this specific hashmap”, “toggle tracing for the function at point”. All super useful functionality that LSP can’t provide because it’s hamstrung in its evaluation model. Clojure’s LSP server can’t even perform macroexpansion in a way that makes such a basic feature as “find definition” work reliably.
I’m just objecting to the idea that “LSP-shaped things” weren’t around before LSP. It wasn’t that LSP introduced a new category of tooling; it just brought these tools, which have existed for decades, into more mainstream languages whose tooling has historically really sucked.
What baffles me about LSP is that it actually does not deal with the most fundamental aspect of language support: syntax highlighting. There is some work to extend the protocol, but last time I checked there were no IDEs that supported this extension. So while your auto-completion may be handled by fancy LSP, your syntax highlighting is likely handled by a brain-dead, regex-based TextMate syntax “grammar”.
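For readers who haven’t looked inside one, this is roughly the kind of rule a TextMate-style grammar is built from. A heavily simplified sketch (real grammars use Oniguruma regexes, begin/end pairs, and nested captures, and the scope names and toy language here are made up), but it is still fundamentally regex matching on lines of text:

```python
# Sketch of TextMate-style highlighting rules: regexes mapped to scope names.
# Heavily simplified; the scope names and the toy language are made up.
toy_grammar = {
    "scopeName": "source.toy",
    "patterns": [
        {"match": r"\b(fn|let|if|else)\b", "name": "keyword.control.toy"},
        {"match": r'"[^"]*"',              "name": "string.quoted.double.toy"},
        {"match": r"//.*$",                "name": "comment.line.double-slash.toy"},
    ],
}
```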
Several editors that are more serious about it are using Tree-sitter, both for syntax highlighting and for structural editing/navigation.
Neovim is doing a lot in this direction, with more still to come. Although a lot of this is still marked as experimental, I switched over to only tree-sitter-based highlighting about a month ago.
I found the early presentations about the development of tree-sitter to be excellent.
VS Code supports semantic highlighting via language servers. See here: https://code.visualstudio.com/api/language-extensions/semantic-highlight-guide
No idea if any other IDEs have taken this approach yet, though.
They added syntax highlighting in the latest version:
https://microsoft.github.io/language-server-protocol/specification#textDocument_semanticTokens
But yeah, it’s surprising that it took so long – it’s a rather fundamental operation, and it stresses the server’s architecture.
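Worth noting that the semanticTokens response doesn’t ship a whole annotated syntax tree: tokens come back as a flat, delta-encoded integer array, five numbers per token. A sketch of decoding one (the token-type legend is whatever the server declares; this one is made up):

```python
# Sketch: decoding an LSP semanticTokens "data" array.
# Each token is 5 ints: deltaLine, deltaStartChar, length, tokenType, tokenModifiers,
# with positions encoded relative to the previous token to keep the payload compact.
legend = ["keyword", "function", "variable"]   # hypothetical server-declared legend

data = [0, 0, 2, 0, 0,    # "fn"   at line 0, col 0 -> keyword
        0, 3, 4, 1, 0,    # "main" at line 0, col 3 -> function
        2, 4, 1, 2, 0]    # "x"    at line 2, col 4 -> variable

line = col = 0
for i in range(0, len(data), 5):
    d_line, d_start, length, ttype, _mods = data[i:i + 5]
    line += d_line
    col = col + d_start if d_line == 0 else d_start
    print(f"line {line}, col {col}, len {length}: {legend[ttype]}")
```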
I think that’s because syntax highlighting is often done synchronously in the UI thread, while all the other features are done in the background? It’s generally supposed to be cheap (and approximate).
Hm nice post, I agree it’s interesting how the LSP has had pretty big effects on the ecosystem (although I personally don’t want to use VS Code – pretty sure it’s had telemetry enabled by default from day 1).
I would frame it as Microsoft creating a new and successful “narrow waist”. It’s obviously beneficial to bootstrap a narrow waist with some kind of monopoly advantage – a huge piece of software like VSCode, or in WebAssembly’s case getting it in 3 browsers.
I mention them both briefly here: A Sketch of the Biggest Idea in Software Architecture
That post also alludes to the “lowest common denominator” / compromise problem of narrow waists – you can be stuck with the intersection of language features, not the union. Protobufs also have this issue.
As for the M x N argument, I’m not sure it’s as black and white as you seem to imply. I agree with some comments on Hacker News which back that up – e.g. I definitely knew people who swore by Eclipse and NetBeans, and Visual Studio 6 and .NET were extremely popular and relatively good IDEs. They had very good debugging support.
https://news.ycombinator.com/item?id=31151048
I was confused by some of the claims in the middle about duplicated implementations (perhaps because I don’t really know how VSCode works). I don’t think that invalidates the M x N argument.
You can still have a small amount of O(M x N) glue (duplicated LSP protocol implementations, some duplication in plugins), but the point is that you don’t want to duplicate compilers, yes? You don’t want to write a Java compiler for every IDE!
As far as I know Eclipse and IntelliJ used completely different compilers for their IDE support. (And NetBeans too?)
OK I think it relates to this comment on HN:
The point is that you don’t need common protocol. A bunch of language-specific protocols would have worked! It’s interesting to ask why that didn’t happen
I don’t know the details well enough, but I would guess that you can get some decent baseline support by implementing the common protocol, and then if you want more features, use all the bells and whistles of a particular language and of a particular editor?
I think that is a pretty common compromise in the “narrow waist” architectures. You have a lowest common denominator and then some extensions.
https://news.ycombinator.com/item?id=31152466 (It does seem like there is a fair bit of disagreement on this point)
I generally understand the motivation behind language servers and why they are a great asset, but where I get confused is their relationship with tree-sitter and tree-sitter grammars. I’m coming from a neovim ecosystem perspective, where there is the LSP for diagnostics and completion, but then tree-sitter is another tool that is an extension(?) of the LSP for highlighting and more advanced editing.
Not sure if they are complementary or competing technologies
Competing. If a language has a good LSP implementation, there shouldn’t be any reason to use tree-sitter. However, for languages without good LSP implementations, tree-sitter allows you to approximately implement a bunch of IDE features relatively cheaply.
I’m not sure I agree. Even a really good LSP server will be much slower than tree-sitter for, say, syntax coloring. The difference is that tree-sitter (as I understand it) can live as a plugin in the editor, using the editor’s buffer, and it’s fully incremental. On the other hand, LSP requires inter-process communication, and for syntax coloring it surely must return the whole (colored) AST via a json object.
The IPC aspect is fine for semantic queries that don’t carry that much data around, like completion or goto-def, but passing a fully annotated buffer at each keystroke seems a bit more of a stretch to me. That said, u/matklad has more experience with LSP, so I might be missing something.
IPC is not a bottleneck for these kinds of things. As an example, Xi-editor did everything via JSON IPC. As far as I understand from the retro (https://raphlinus.github.io/xi/2020/06/27/xi-retrospective.html), it was confirmed that the perf of JSON RPC per se is not a problem. Rather, the problems were:
- atrociously slow JSON impl in Swift (a limitation of a particular library, not a fundamental limitation of the approach)
- code bloat was a problem for Rust (can confirm: a lion’s share of rust-analyzer’s binary is essentially serde)
I don’t remember exact numbers from when I was measuring things, but I think RPC round-trip was on the order of hundreds of microseconds, which is well below the 16ms frame budget.
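That order of magnitude is easy to sanity-check. A rough sketch using a local socket pair and newline-delimited JSON echo (real editor/server traffic and payload sizes will obviously differ):

```python
# Rough sanity check: time JSON echo round-trips over a local socket pair.
# A sketch only; real LSP traffic and serialization costs will differ.
import json, socket, threading, time

a, b = socket.socketpair()

def echo_server(sock):
    f = sock.makefile("rwb")
    for line in f:                       # one newline-delimited JSON message per line
        f.write(json.dumps(json.loads(line)).encode() + b"\n")
        f.flush()

threading.Thread(target=echo_server, args=(b,), daemon=True).start()

client = a.makefile("rwb")
payload = {"method": "textDocument/completion",
           "params": {"position": {"line": 6, "character": 14}}}

N = 1000
start = time.perf_counter()
for _ in range(N):
    client.write(json.dumps(payload).encode() + b"\n")
    client.flush()
    json.loads(client.readline())
elapsed = time.perf_counter() - start
print(f"avg round-trip: {elapsed / N * 1e6:.0f} us (16ms frame budget = 16000 us)")
```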
Given that VS Code has no built-in support for LSP, would it be practical for Rust Analyzer to run inside the VS Code process, thus eliminating at least some of the overhead of asynchronous IPC? Or does a VS Code language plugin that’s not written in pure JS/TS practically have to use async IPC, to support the browser-based applications of the VS Code engine such as GitHub Codespaces?
You could do that (I think we even have ra running on a webpage as a WASM blob somewhere), but the IPC isn’t the bottleneck at all.