This seems to follow a model similar to Visual Studio Code's, where a server binary is uploaded to the remote host and the editor communicates with that, as opposed to e.g. Emacs' Tramp mode, where you access the files directly. That comes from being oriented more towards remote development than towards accessing remote files for e.g. sysadmin tasks.
This does have a different performance profile, especially once you have plenty of extensions running (language servers, file update checkers, etc.). On the other hand, I regularly ran into system issues with this on actual remote servers (i.e. not Docker containers) running Linux, as the per-user file notification resources ran out, mostly due to way too many extensions each monitoring the whole project, some of them buggy enough to also watch dependencies like node_modules.
Note that right now, Zed itself doesn't seem to support remote extensions anyway, but I'm sure that's coming.
Trying to use Tramp over an 80ms connection caused me to give up Emacs entirely :(
I think the remote server approach is the only one that works. You do have to bump up inotify limits and such but honestly this is a Linux distro issue – they should be shipping much higher inotify limits than is currently the case. Alternatively I wish more tools had adopted watchman, which acts as a clearinghouse for fs notifications, and which to my knowledge is still the only completely correct implementation of recursive file watching. (I’ve been looking at the Rust notify library recently, and I’ve sadly found that it isn’t resilient to TOCTOU races in all sorts of ways.)
Edit: sorry, while drafting this post I accidentally dropped a disclaimer that I personally worked on making watchman correct wrt recursive file watching. That’s how I know it’s a difficult problem, but it is doable.
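For reference, the notify crate under discussion hides the recursive-watch bookkeeping behind a single call. A minimal hedged sketch against the v6-era API (callback body and path are illustrative):

```rust
use notify::{recommended_watcher, RecursiveMode, Watcher};
use std::path::Path;

fn main() -> notify::Result<()> {
    // recommended_watcher picks the platform backend (inotify on Linux).
    let mut watcher = recommended_watcher(|res: notify::Result<notify::Event>| {
        match res {
            Ok(event) => println!("event: {event:?}"),
            Err(e) => eprintln!("watch error: {e}"),
        }
    })?;
    // One call, recursive: the per-directory inotify bookkeeping, and the
    // races discussed above, live behind this line.
    watcher.watch(Path::new("./src"), RecursiveMode::Recursive)?;
    std::thread::park(); // keep the process alive to receive events
    Ok(())
}
```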
Notify maintainer here, and you're right.

If I had to pick a reason, it is simply how damn hard it is to get inotify right, and how specific that can be to the use case you're dealing with.
To give an example: the Linux man page warns that, realistically, you won't be getting all events (which some people have verified). And with the way inotify is designed, you basically don't know that you're missing something. Each and every file you want to watch, you have to subscribe to, so missing a "create" event for a folder makes you blind to everything inside it. Now try to stay low-overhead by not regularly re-scanning (what even is a good interval for that?) and you will fail. And this doesn't yet account for editor-specific behavior, for getting one API right across multiple OSes, or for the fact that network mounts (WSL) won't give you any events.
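To make that concrete, here's a minimal sketch of the bookkeeping inotify forces on you; `add_watch` is a hypothetical stand-in for the real inotify_add_watch(2) call, and error handling is elided:

```rust
use std::fs;
use std::path::Path;

// Hypothetical stand-in for inotify_add_watch(2): inotify watches are
// per-directory and non-recursive, so every directory needs its own watch.
fn add_watch(dir: &Path) { /* inotify_add_watch(fd, dir, IN_CREATE | ...) */ }

// Walk the tree, adding the watch on each directory *before* scanning its
// children; scanning first leaves a window where a subdirectory can be
// created unseen.
fn watch_tree(dir: &Path) -> std::io::Result<()> {
    add_watch(dir);
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        if entry.file_type()?.is_dir() {
            watch_tree(&entry.path())?;
        }
    }
    Ok(())
}
```

And the same walk has to be re-run for every create event that reports a new directory; if that one event is lost to an overflow, everything beneath the new directory stays invisible until a full rescan.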
All of this to say: projects like watchman go to great lengths to give you reliable file events. And people would complain about the resource usage if you had multiple of these running. Sorry, Linux, but Windows gets it right (mostly).
Oh hi! Was meaning to reach out to you. Thank you for maintaining notify and I hope I didn’t sound too harsh!
There is some overhead in making notify correct, resilient to overflows and such but it isn’t too bad. I do think notify would have to provide a way to warn devs about inotify limits being too low, and ask them to bump their limits up.
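A minimal sketch of what such a warning could look like on Linux; the function name and threshold logic are made up, but the /proc path is the real kernel knob behind fs.inotify.max_user_watches:

```rust
use std::fs;

// Warn if the per-user inotify watch budget looks too small for the tree
// we are about to watch.
fn warn_if_watch_limit_low(watches_needed: usize) {
    let limit = fs::read_to_string("/proc/sys/fs/inotify/max_user_watches")
        .ok()
        .and_then(|s| s.trim().parse::<usize>().ok());
    if let Some(limit) = limit {
        if watches_needed > limit {
            eprintln!(
                "warning: ~{watches_needed} inotify watches needed, but \
                 fs.inotify.max_user_watches is only {limit}; consider \
                 raising it with sysctl"
            );
        }
    }
}
```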
My question is, would you be willing to take a patch series that fixed notify? I’m concerned that as usage of notify increases a lot of tools will be broken in subtle and not-so-subtle ways.
(The situation you mentioned is real, FWIW — there is no escape from having to lstat all the files within new directories, and you have to do that after you set up the watch so that you don’t miss any notifications in between. You basically have to treat it as a very complicated distributed system with chaotic event sources.)
Patches are always welcome - the hard part is getting them ironed out when crossing OS behavior boundaries.
I am sadly very limited in time for any FOSS since starting my last job. And while I managed to get dfaust on board as a maintainer, he is also pretty busy. And that is three years after announcing my intent to "EOL" my role as primary maintainer without any replacement ;)
Honestly inotify sounds so difficult to use that I do wonder if it would be easier to write a FUSE filesystem that fires callbacks for operations so that listening would be cheap and easy - but then, using FUSE would probably make reading/scanning files way more expensive, so that’s not a win for performance.
I just had an idea for bypassing FUSE most of the time that might speed this up:

- notification events are written to a shared-memory buffer
- ./src/ (the monitored path) is a FUSE mount
- ./real_src/ (not monitored) contains the actual source code
- the FUSE daemon forwards reads and writes to ./real_src/ and emits notifications as needed. Most processes never actually hit it, because…
- an LD_PRELOAD wrapper intercepts libc calls like open() and openat(). In any process that doesn't bypass libc to open files, attempts to read or write files in ./src/ get redirected to the correct file in ./real_src/ and also run the same "emit notification if needed" code that the FUSE daemon would (a rough sketch follows the list).
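The LD_PRELOAD half could look roughly like this as a Rust cdylib. This is a sketch under heavy assumptions: only open() is shown (openat(), stat() and friends need the same treatment), the fixed third argument leans on the usual C variadic calling convention, and the paths and shared-memory write are placeholders:

```rust
// Build as a cdylib, then run tools with LD_PRELOAD=libshim.so.
use libc::{c_char, c_int, dlsym, mode_t, RTLD_NEXT};
use std::ffi::{CStr, CString};

type OpenFn = unsafe extern "C" fn(*const c_char, c_int, mode_t) -> c_int;

#[no_mangle]
pub unsafe extern "C" fn open(path: *const c_char, flags: c_int, mode: mode_t) -> c_int {
    // Look up the libc open() we are shadowing.
    let real: OpenFn = std::mem::transmute(dlsym(RTLD_NEXT, b"open\0".as_ptr().cast()));
    if let Ok(p) = CStr::from_ptr(path).to_str() {
        if let Some(rest) = p.strip_prefix("./src/") {
            // ...write a notification record into the shared-memory buffer here...
            let redirected = CString::new(format!("./real_src/{rest}")).unwrap();
            return real(redirected.as_ptr(), flags, mode);
        }
    }
    real(path, flags, mode)
}
```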
For context: it’s been a few years since I’ve done serious kernel-level hacking for desktop OSes but day-to-day I work on relatively low-resource embedded systems (I’ve got like 1MB of RAM, which is quite spacious compared to lots of microcontrollers). Our stuff runs soft-real-time and we have to really carefully consider queue lengths and load shedding and things like that.
In the FUSE case, what would you expect to happen if the consumer of a callback got slow for whatever reason? If you're going for guaranteed event delivery to the callback, it seems like the only thing you could really do in that case would be to block the actual filesystem operation. Maybe you have a bounded queue so that a single slow callback doesn't immediately block the filesystem, but eventually you're either going to run out of RAM queueing events, block the filesystem because the notification receiver is being slow, or drop queued events (resulting in lost notifications).
> In the FUSE case, what would you expect to happen if the consumer of a callback got slow for whatever reason?
Eh, probably backpressure: the entire system slows down, so writing to the FUSE FS gets slow.

Edit: that said, I think with development workloads all you actually care about is "do I need to reload the dev server y/n?", and that doesn't require precise tracking of events at all. You just need to know whether or not any file was written since the last time you restarted it.
Yeah, thinking about this a little, without having looked at the underlying API for inotify and friends at all… it almost seems like this could be level-triggered instead of edge-triggered/event-driven: something similar to select/poll/epoll, where you give it a list of files you want to be notified about.
Thanks! “dev tools actually want level-triggered instead of edge-triggered” is a good framing.
For restarting dev servers or unit tests, I’m thinking:
it’s not okay to miss events: if one or more writes occurred then you need at least one notification generated
but it’s totally okay to coalesce notifications (if I save a file from vi twice in quick succession, I don’t mind if the dev server only restarts after the second one.)
and it’s actually okay to have spurious notifications too (a dev server restarter can check if files really changed).
it’s okay for a notification to come in a little late (I desire that my dev server restarts in <1s but don’t need it to be <1ms)
and it may still be useful even if it’s somewhat vague about exactly which files changed (a dev server restarter can examine all the files in a directory or something).
it would be enormously nicer if you only needed to establish one “watch” for an entire directory tree instead of one per file or directory in that tree (because it’s much faster and avoids having resource limit pitfalls)
I think for dev servers the ideal API call might be like “please notify me if any file anywhere under ./src/ has an mtime greater than 23456789”.
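A level-triggered version of that call is almost trivial to sketch, because it inspects current state rather than consuming an event stream: it can't miss changes, coalesces by construction, and is merely a little vague and a little late, which is exactly the trade-off in the list above. A hedged sketch, with no caching and the usual mtime-granularity caveats:

```rust
use std::{fs, path::Path, time::SystemTime};

// "Has anything under `root` been modified since `since`?" This checks
// state, so missed events are impossible; the cost is a scan per query.
fn changed_since(root: &Path, since: SystemTime) -> std::io::Result<bool> {
    for entry in fs::read_dir(root)? {
        let entry = entry?;
        let meta = entry.metadata()?;
        if meta.modified()? > since {
            return Ok(true);
        }
        if meta.is_dir() && changed_since(&entry.path(), since)? {
            return Ok(true);
        }
    }
    Ok(false)
}
```

A dev-server restarter would poll this every few hundred milliseconds with the timestamp of its last restart, which stays comfortably inside the <1s budget above.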
Edit: an idea to avoid memory exhaustion when there are too many events is to have adaptive imprecision. If I change src/engine/a.cpp, then src/engine/b.cpp in quick succession: if the events are being consumed promptly, emit both events individually, but if consumers are lagging, emit only a single "some things changed in src/engine" event (sketched below).
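That degradation policy is easy to sketch as a bounded dirty-set that collapses entries to their parent directories instead of dropping them; all names here are made up:

```rust
use std::collections::BTreeSet;
use std::path::PathBuf;

// Bounded set of changed paths. When the consumer lags and the set exceeds
// `cap`, precision degrades: entries collapse to their parent directory
// ("something under src/engine changed") rather than being dropped.
struct DirtySet {
    paths: BTreeSet<PathBuf>,
    cap: usize,
}

impl DirtySet {
    fn record(&mut self, path: PathBuf) {
        self.paths.insert(path);
        while self.paths.len() > self.cap {
            let coarser: BTreeSet<PathBuf> = self
                .paths
                .iter()
                .map(|p| p.parent().map(PathBuf::from).unwrap_or_else(|| p.clone()))
                .collect();
            if coarser.len() == self.paths.len() {
                break; // already maximally coarse; cannot coalesce further
            }
            self.paths = coarser;
        }
    }
}
```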
Could be. The kernel manpage even says that it's kind of expected that you won't be able to receive all events in time to react without overrunning buffers. I have issues filed by people with thousands of files who are losing events. So it's not even about "not missing a folder".

I'm honestly pretty tired of it.
> this is a Linux distro issue – they should be shipping much higher inotify limits
Yeah, it's not a hard problem. Sadly, the few times where I actually wanted or needed to develop on a remote system, I couldn't do it because the local hardware was too anemic or locked down (WSL? Docker? Nope, you'll get our standard office drone image plus an IntelliJ license, that's it). And would you know, that also sometimes meant the dev servers couldn't be easily modified ("I'd need to make sure that all the umpteen developers and test systems aren't affected by this!").

Almost makes me miss MFC dev work.
> Trying to use Tramp over an 80ms connection caused me to give up Emacs entirely :(
Wait, what? On a connection with 80ms of network latency, I can certainly notice the lag in an interactive shell session over SSH if I pay close attention, but it's still way below the threshold of being problematic for general use. And that's a setting where you pay the round-trip cost on the terminal echo of every keystroke; with TRAMP you only pay it on file saves and buffer clean->dirty transitions (for metadata checks), right? I routinely use TRAMP between systems with network RTTs in that range without any difficulty whatsoever; I'm genuinely curious how it ended up being so much of a problem for you.
If I load or save a file, it blocking the UI for more than 80ms is well over my threshold of acceptability.

Mosh + Emacs running on the remote server works well for me on bad connections. The Mosh website features a remote Emacs running Org, so I imagine this is a common use case.
Yeah I tried that out for a while. I used to ride the Facebook bus 2-3 hours a day with rather flaky mobile internet – and mosh was a lot better, but still not great. At least back in the day, security folks had serious concerns about it too.
Watchman is great!

My only problem with it is that it's too hard to build. Nixpkgs follows updates of most packages, providing updates with a short delay; for watchman, however, it's still on 2024.03 (the latest release is 2024.11). Even the release pages don't have binary packages for most distributions.

Yeah, you're definitely not the only person to bring this up! Stay tuned.
I don’t think either approach is universally applicable. Sometimes you want to remotely edit some files on a small machine without deploying an extra editor binary to it (might not even have the disk space or memory to do so), sometimes the remote machine is more powerful and you want as much work as possible pushed there. Sometimes both in the same project (although I’m not sure I’ve seen a setup that can do that well yet).
The one issue is that the server binary it deploys needs to support your platform. Not a problem for most people, but it is for me at my dayjob. I think the TRAMP style can also break down if the remote system differs from what it usually expects.

Yeah, same, though I've been working on porting Zed to illumos (wasmtime main now experimentally supports illumos :) )
Yeah, it broke a lot for me because of bad prompts etc., but the universality was a big advantage of this setup. You heard that a lot in the vim/emacs arguments: you couldn't just install a modern vim with all the plugins on the target system, whereas you always had your "home" Emacs.
But, well, that’s more the sysadmin perspective of the days of yore, I think. These days, it’s less likely that you also have to edit that /etc/sendmail.cf on that one weird Apollo Domain/OS server. And more likely that “remote” is on your own system or in a LAN. A lot more control, but also a lot more needs.
Thinking about it, didn’t Plan 9’s sam do that, too? I remember spreading some editor parts on various Unices in the early millennial years…
I get a pretty good experience doing remote dev stuff with plan9port Acme and sshfs.
I mount the remote folder I need on /mnt and then open it in Acme. I also open an Acme terminal with Win, in which I ssh to the remote server and cd to the mounted folder. From there, any relative reference to a file inside the mounted folder can be opened in Acme through my local plan9port Plumber with a simple right-click on its name.
I can also use stuff like grep and find directly on the sshfs mount point in Acme, but that can be rather slow, so I do it through the ssh connection in the Win window.
At some point I'd like to build an Acme program that takes a host and path, mounts that with sshfs on a mount point, and proxies every command it gets through an ssh tunnel to the targeted host; that way I don't need to keep an extra Win around.