How does a company get so backwards? They went with utmost haste from leading to following. Where once I thought the sky was the limit for this product, now I just think they’re stuck on the exact same plateau as everyone else.
I think they have some definite polish on aspects of their editor, but I was (and still am) put off by the funding model for the editor. Good quality engineering isn’t free, and I remain unconvinced that hockey-stick-growth-style models are practical for guiding reasonable product development.
Another thing that gets me is marrying a reasonably fast process (the editor’s insertion speed and search speed) with a much slower process (LLM inference).
According to the Minimizing Latency: Serving The Model section, they are sending the text to online services to get predictions? Apart from the privacy concerns, I wonder who is paying for the GPU cost and how that factors into their business model.
FWIW I should be a little more precise and say “either LLM inference or network requests” since there’s clearly capacity to send over the net. Both seem slower (and more variable) than local editor business.
What’s backwards here? The presence of LLM at all?
I’ve come to consider this kind of LLM tab completion essential. It’s the only piece of LLM software I use and I find it saves me a lot of time, at least the implementation in Cursor. It often feels like having automatic vim macros over the semantics of the code rather than the syntax of the code. Like if I’m refactoring a few similar functions, I do the first one and then magically I can just press tab a few times to apply the spirit of the same refactor to the rest of the functions in the file.
My question is: why is that good? “Magical” is one of those words in programming that usually means something has gone horribly wrong.
Don’t get me wrong: I want my tool to make it easy to make mechanical changes that touch a bunch of code. I just don’t want the process that does it to be a magical heuristic.
You’re the one saying that the company is doing something backwards; I think it’s on you to justify that when asked, not to come back with a question, tbh.
“Magical” is one of those words in programming that usually means something has gone horribly wrong.
Statements like these are just dogma/rhetoric. Words like “magical” are just like “simplicity” or “ugly”: they mean something different to everyone.
I just don’t want the process that does it to be a magical heuristic.
Why not? What if it’s a problem best suited by heuristics?
Don’t get me wrong: I want my tool to make it easy to make mechanical changes that touch a bunch of code. I just don’t want the process that does it to be a magical heuristic.
I’m in a similar boat.
I’m less bullish on having the LLM do a large-scale refactoring than I am on using an LLM to generate a codemod that I can use to do the large-scale refactoring in a deterministic fashion.
But for small-scale changes—I wouldn’t even necessarily call them “refactorings”—like adding a new field to a struct and then threading that all the way through, I’ve found that our edit predictions can cut down on a lot of the mundanity of a change like that.
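To make the codemod idea above concrete, here is a minimal sketch of the kind of script an LLM could generate and a human could review once and then run deterministically. The fetch_user/load_user rename, the file layout, and the regex approach are all hypothetical; a real codemod would usually lean on a syntax-aware tool rather than a regex.

```python
#!/usr/bin/env python3
"""Hypothetical codemod: rename call sites of `fetch_user` to `load_user`.

A deterministic script like this can be reviewed once and re-run safely,
unlike a large LLM-generated diff that has to be re-checked line by line.
"""
import re
import sys
from pathlib import Path

OLD, NEW = "fetch_user", "load_user"      # hypothetical identifiers
PATTERN = re.compile(rf"\b{OLD}(?=\()")   # only rewrite call sites, i.e. name followed by "("

def main(root: str = ".") -> None:
    for path in Path(root).rglob("*.py"):
        original = path.read_text()
        updated = PATTERN.sub(NEW, original)
        if updated != original:
            path.write_text(updated)
            print(f"rewrote {path}")

if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else ".")
```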
The big question is: what environment does that codemod target?
For a system like this to work well there has to be a consistent high-level way of defining transformations that many people will use and write about so that models will understand it well. For that to happen you need an abstraction over the idea of a syntax node.
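As a sketch of what an abstraction over syntax nodes buys you, here is a transformation written against Python’s built-in ast module rather than raw text. Python’s ast is language-specific, so it is only a stand-in for the kind of language-agnostic layer (tree-sitter or similar) the comment above is asking for, and the assert_equal rewrite is a made-up example.

```python
import ast

# Hypothetical transformation: rewrite `assert_equal(a, b)` statements into
# `assert a == b`, defined against syntax nodes rather than raw text.
class AssertEqualRewriter(ast.NodeTransformer):
    def visit_Expr(self, node: ast.Expr) -> ast.AST:
        call = node.value
        if (isinstance(call, ast.Call)
                and isinstance(call.func, ast.Name)
                and call.func.id == "assert_equal"
                and len(call.args) == 2):
            left, right = call.args
            return ast.Assert(
                test=ast.Compare(left=left, ops=[ast.Eq()], comparators=[right]),
                msg=None,
            )
        return node

source = "assert_equal(total(items), 42)\n"
tree = AssertEqualRewriter().visit(ast.parse(source))
print(ast.unparse(ast.fix_missing_locations(tree)))  # prints: assert total(items) == 42
```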
Can the LSP protocol, married somehow to tree-sitter, be the answer here?
Tree-sitter is far closer to being the answer than LSP is.
My ideal interaction would be something like, “an LLM writes a script that modifies code and I decide whether I want to run that script”.
I’m not sure about Zed, but you can do that with Cursor. The tab model is very small and fast, but Cursor has a few options ranging from “implicit inline completion suggestions with tab” to “long-form agent instructions and review loop” similar to what you describe: you ask it to do stuff in a chat-like interface, it proposes diffs, and you can accept the diffs or request adjustments. But I find explicitly talking to the AI much slower and more flow-interrupting compared to tab completion.
I do use a mode that’s in between the two where I can select some text, press cmd-k, describe the edit and it will propose the diff inline with the document. Usually my prompt is very terse, like “fix”, “add tests”, “implement interface”, “use X instead of Y”, “handle remaining cases” that sort of thing.
I use plenty of heuristics in my editor already; for example, I appreciate fuzzy-file-find remembering my most-opened files and up-weighting them, same with LSP suggestions and auto-imports. The AI tab completion experience is a more magical layer on top, but after using it for about an hour it starts to feel just like regular tab completion that provides “insert function name”; it’s just providing more possible edits. Another time saver I appreciate is when it suggests an edit to balance some parentheses/braces for a long nested structure that I’m struggling to wrangle on my own.
I do use a mode that’s in between the two where I can select some text, press cmd-k, describe the edit and it will propose the diff inline with the document. Usually my prompt is very terse, like “fix”, “add tests”, “implement interface”, “use X instead of Y”, “handle remaining cases” that sort of thing.
These days, my favorite use of LLMs is to write a // TODO comment at the appropriate place, send a snippet with the lines to be changed to the LLM, and replace the selection with the response. With the right default prompt, this works really well with the pipe command in editors like Neovim, Kakoune, etc. and a command-line client like llm or smartcat.
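A minimal sketch of what that pipe setup could look like: a small filter script that Neovim’s :'<,'>! or Kakoune’s | could send the selected lines through. The script name, the default prompt wording, and the use of the llm CLI’s -s/--system flag are assumptions for illustration, not a description of anyone’s actual setup.

```python
#!/usr/bin/env python3
"""Hypothetical editor filter: pipe a selection through the `llm` CLI.

Example usage from Neovim: visually select some lines containing a
// TODO comment, then run :'<,'>!todo-filter.py to replace the
selection with the model's rewrite. (todo-filter.py is a made-up name.)
"""
import subprocess
import sys

DEFAULT_PROMPT = (
    "You are an in-editor code assistant. The input is a code snippet "
    "containing a TODO comment. Return only the rewritten snippet with "
    "the TODO implemented; no prose, no code fences."
)

def main() -> None:
    snippet = sys.stdin.read()
    # Assumes Simon Willison's `llm` CLI is installed and configured;
    # -s/--system sets the system prompt, stdin supplies the snippet.
    result = subprocess.run(
        ["llm", "-s", DEFAULT_PROMPT],
        input=snippet,
        capture_output=True,
        text=True,
        check=True,
    )
    sys.stdout.write(result.stdout)

if __name__ == "__main__":
    main()
```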
The place I miss an LLM the most is in my shell. I’d love to be able to fall back to llm to construct a pipeline rather than needing to read 6 different manpages and iterate through trial and error. Do you have a setup for ZSH/bash/etc that’s lightweight? I haven’t seen anything inspiring in this area yet outside proprietary terminal emulators (I’m not interested)
I’m spoiled because I can’t do anything like that. I’m inventing a genuinely new technology, so I always have to think for myself because there’s no one to follow or imitate. I’m sure it sounds weird to hear me be excited about building my internal model of where changed requirements will manifest as a need for changed code, but my mental model of that is razor sharp, and thinking about where changes are needed myself gives me leave to think about whether my code is expressive enough and has a strong architecture.
But yeah, I know I’m the weird one. I’m the kid who retyped the red-underlined word instead of right-clicking to correct the spelling, the idea being that I wanted to learn how to spell and spot/correct spelling mistakes myself rather than leave it to the machine.
Once the diff gets big enough it starts to have problems of its own. How will you know it’s all correct without redoing all the work? What if the diff is stale by the time it is reviewed and approved? Generating a script instead of a diff solves those problems, and incidentally has another property that I prize very highly: it is just as useful to humans as it is to LLMs. Once you can define large changes as small scripts, typing will no longer be the odious part of making changes that touch a lot of code.
I’ve been using JetBrains IDEs’ full-line code completion, mostly in CLion, which defaults to a locally run LLM. I’m on the fence about it; sometimes it’s spookily accurate, sometimes it just gets in the way, and sometimes it’s almost right but with minor mistakes that would lead to bugs if I didn’t fix them. I still have to code-review each line it suggests.
If I were a slower typist, or had a disability, or had less expertise in the language/library, I might find it more valuable.
I’m a slow typist due to physical disability. For me, once a “smart” feature has let me down enough times, I stop trusting it. It takes more effort for me to invoke the smart feature, carefully scrutinize the untrustworthy output, and potentially fix it manually than to just do it manually from the start, using tools/accessibility aids that I can trust.
Someone on Hacker News said it was like a person trying to guess what you were saying in a conversation and that really hit the mark for me. It totally breaks my line of thought to have text suggested. I either ignore it because I’m trying to work through my thought process (in which case it’s useless) or I have to stop to read it, which breaks my flow.
My experience exactly.
Zeta […] is fully open source, including an open dataset.
The link on “open dataset” leads me to a page where I can view the training materials, which is nice, but is there any explainer on where it came from? Scraped from permissively licensed codebases? Collected from Zed users? Artificially written by volunteers or staff? Generated by a larger LLM? Torrenting books on work laptops?
I’m a pretty anti-AI guy but I can tolerate a model made from fully consensually collected materials, if for no other reason than to disprove OpenAI’s claim that it can’t be done.
There’s a section about this in the blog post:
But we had a classic chicken-and-egg problem—we needed data to train the model, but we didn’t have any real examples yet. So we started by having Claude generate about 50 synthetic examples that we added to our dataset. We then used that initial fine-tune to ship an early version of Zeta behind a feature flag and started collecting examples from our own team’s usage.
So we did bootstrap it initially using some examples generated by Claude, and the rest has come from our team’s own usage.
Can I download the model, run it using ollama and use that from Zed? Zed already supports ollama for chat and the base model (qwen) is available for ollama. Therefore it should be doable.
We don’t yet support edit prediction providers outside of Zed (our hosted Zeta model), Copilot, and Supermaven, but we are looking to make this more extensible in the future and allow for running models locally through Ollama and the like.
I know this is taking a lot of hate, but I used this today to improve some HTML/CSS for my site and it was quite helpful imo. I’d say I’m a fan of it, and of Zed in general.