As a user/developer, why do I want to be a systems programmer? So I can write an OS? Sure, I guess, but that’s a very niche use case. So I can drive a microcontroller? Even more niche. So I can write the fastest code? Meh, my code is already fast enough. Honestly this is less compelling as a marketing message than “safety!”
I’ve been thinking about taking up Rust in order to write software synthesizer patches. These have a hard-realtime constraint in that they need predictable and low latency for what are sometimes very complex and expensive operations.
Generally I’d say any sort of DSP code is right in the line of sight for this sort of thing.
To me, identifying as a “systems programmer” has always sounded more like a baseless superiority complex than anything else. I don’t need Rust to write “badass” code; this is a terrible selling point, and I hope it’s a minority position in the Rust community.
Anyway, I don’t see this “baseless superiority complex” among systems programmers I know.
I feel like I see it coming more from Google Go users than the Rust community, and I suspect it comes from something like “hm; this is really tedious and error-prone, but I guess that’s just because it’s systems programming which is inherently tough” when actually it’s just that they have a terrible type system.
I’ve been writing Rust for a long time now, so I don’t really get those “wow Rust is so awesome” moments any more. You know what I’m talking about; discovering that slick new technology for the first time is just such a wonderful high. But I got one recently.
Here is a somewhat simplified version of the problem I was trying to solve: I wanted to recursively iterate over a directory tree and yield file paths, but only the paths that aren’t ignored by .gitignore (and other files like it, so I couldn’t just ask git to do it for me, and I didn’t want to assume that git was installed anyway). I also wanted it to be as fast as possible, because I wanted it to be the best. Finally, in the common case, this traversal happens when the entire directory tree is in cache, so I don’t expect the process to be blocked on reading from disk.
The straightforward way to do this is to get your friendly neighborhood recursive directory iterator, iterate over each path, and only yield it back to the caller if it shouldn’t be ignored according to the .gitignore files on disk. Recursive directory iterators can be really fast, but in this case, processing the ignore files can actually take quite a bit of CPU time! I was incredulous at first too, but it turns out some really popular projects have really crazy gitignore files. The type of crazy varies. Sometimes it’s all bundled up into one big 3,000-line file and sometimes they’re scattered across almost 200 different files. Not only does it take CPU power to match each file path against matchers built from these ignore files, but it also takes time to build the matchers themselves.
I did my best to make the matchers themselves fast. There are seven different matching strategies, and they’re all tucked away behind an abstraction that anyone can use. But still, it wasn’t fast enough! Directory traversal was still taking up a large portion of my application’s CPU profile.
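A minimal sequential sketch of that straightforward approach, for readers who want to see where the CPU time goes. Everything here is hypothetical: `SimpleIgnore` understands only exact names and `*.ext` suffixes and matches by file name alone, which is nowhere near real gitignore semantics, and `walk_filtered` is an illustrative name, not the crate’s API. It still shows both costs: building a matcher for each ignore file, and testing every entry against the matchers of all enclosing directories.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical toy matcher: exact names ("target/") and suffix globs ("*.log") only.
struct SimpleIgnore {
    exact: Vec<String>,
    suffixes: Vec<String>,
}

impl SimpleIgnore {
    /// Build a matcher from one ignore file, skipping blank lines and comments.
    fn parse(contents: &str) -> SimpleIgnore {
        let mut m = SimpleIgnore { exact: Vec::new(), suffixes: Vec::new() };
        for line in contents.lines().map(str::trim) {
            if line.is_empty() || line.starts_with('#') {
                continue;
            }
            match line.strip_prefix('*') {
                Some(suffix) => m.suffixes.push(suffix.to_string()),
                None => m.exact.push(line.trim_end_matches('/').to_string()),
            }
        }
        m
    }

    fn is_ignored(&self, name: &str) -> bool {
        self.exact.iter().any(|e| e == name)
            || self.suffixes.iter().any(|s| name.ends_with(s.as_str()))
    }
}

/// Single-threaded walk that collects every file not ignored by the
/// `.gitignore` of its own directory or of any enclosing directory.
fn walk_filtered(dir: &Path, ancestors: &mut Vec<SimpleIgnore>, out: &mut Vec<PathBuf>) -> io::Result<()> {
    // Parse this directory's ignore file, if it has one.
    let pushed = match fs::read_to_string(dir.join(".gitignore")) {
        Ok(contents) => {
            ancestors.push(SimpleIgnore::parse(&contents));
            true
        }
        Err(e) if e.kind() == io::ErrorKind::NotFound => false,
        Err(e) => return Err(e),
    };
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let name = entry.file_name().to_string_lossy().into_owned();
        // Test the entry against the matchers of all enclosing directories.
        if ancestors.iter().any(|m| m.is_ignored(&name)) {
            continue;
        }
        // file_type() does not follow symlinks, so a symlinked directory is
        // recorded as a plain entry and cannot cause a loop here.
        if entry.file_type()?.is_dir() {
            walk_filtered(&entry.path(), ancestors, out)?;
        } else {
            out.push(entry.path());
        }
    }
    if pushed {
        ancestors.pop();
    }
    Ok(())
}
```

Calling `walk_filtered(Path::new("."), &mut Vec::new(), &mut files)` fills `files` with every non-ignored file below the current directory, all on one thread.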
IMO, the next obvious choice was to parallelize directory traversal. There are a lot of details here, and I’ve already written too much, so I’ll just list some bullet points:
- One way of structuring the parallelism is to create a pool of workers, one per thread, that churn through the directory tree. There’s no obvious way to split up the work up front because you don’t know what the tree looks like beforehand. Therefore, the workers have the unusual property of being both producers and consumers. This makes graceful termination a little tricky.
- A single .gitignore file should have a matcher built for it exactly once. Building big matchers isn’t necessarily cheap, so it’s important to do it only once to conserve CPU resources!
- Determining whether a file path should be ignored or not may require visiting .gitignore files in parent directories.
- Be careful to avoid symlink loops!
- The iterator needs to provide a way to quit gracefully.
- It needs to do proper error handling, e.g., return an error if a directory listing couldn’t be retrieved.
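One way to get the producer/consumer termination right is sketched below. This is my own illustration using std threads, a mutex-protected queue and a `Condvar`, not how the crate mentioned here does it; it takes a lock on every queue operation, and ignore-file handling is left out entirely. `Pool`, `stop` and `walk_parallel` are hypothetical names. The key idea is a `pending` counter of directories that are queued or being listed: a worker decrements it only after pushing the subdirectories it found, so when it reaches zero no worker can produce more work and all of them can exit.

```rust
use std::collections::VecDeque;
use std::fs;
use std::io;
use std::path::PathBuf;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Directories still to list, plus how many are queued *or* being listed.
/// When `pending` reaches zero, no worker can produce more work.
struct State {
    queue: VecDeque<PathBuf>,
    pending: usize,
}

struct Pool {
    state: Mutex<State>,
    cond: Condvar,
    quit: AtomicBool, // set by `stop` to request an early, graceful exit
}

impl Pool {
    /// Consumer side: wait for a directory; None once the walk is over.
    fn next(&self) -> Option<PathBuf> {
        let mut st = self.state.lock().unwrap();
        loop {
            if self.quit.load(Ordering::SeqCst) { return None; }
            if let Some(dir) = st.queue.pop_front() { return Some(dir); }
            if st.pending == 0 { return None; }
            st = self.cond.wait(st).unwrap();
        }
    }

    /// Producer side: a worker discovered a subdirectory.
    fn push(&self, dir: PathBuf) {
        let mut st = self.state.lock().unwrap();
        st.queue.push_back(dir);
        st.pending += 1;
        self.cond.notify_one();
    }

    /// A worker finished one directory; wake idle workers if all is done.
    fn done(&self) {
        let mut st = self.state.lock().unwrap();
        st.pending -= 1;
        if st.pending == 0 { self.cond.notify_all(); }
    }

    /// Graceful quit. Notifying while holding the lock avoids a lost wakeup.
    fn stop(&self) {
        self.quit.store(true, Ordering::SeqCst);
        let _guard = self.state.lock().unwrap();
        self.cond.notify_all();
    }
}

type Found = (Vec<PathBuf>, Vec<io::Error>);

/// Walk `root` with `threads` workers; returns files found and errors hit.
fn walk_parallel(root: PathBuf, threads: usize) -> Found {
    let pool = Arc::new(Pool {
        state: Mutex::new(State { queue: VecDeque::new(), pending: 0 }),
        cond: Condvar::new(),
        quit: AtomicBool::new(false),
    });
    let found: Arc<Mutex<Found>> = Arc::new(Mutex::new((Vec::new(), Vec::new())));
    pool.push(root);
    let workers: Vec<_> = (0..threads)
        .map(|_| {
            let (pool, found) = (Arc::clone(&pool), Arc::clone(&found));
            thread::spawn(move || {
                while let Some(dir) = pool.next() {
                    match fs::read_dir(&dir) {
                        Ok(entries) => {
                            for entry in entries {
                                // file_type() does not follow symlinks: no loops here.
                                match entry.and_then(|e| e.file_type().map(|ft| (e.path(), ft))) {
                                    Ok((path, ft)) if ft.is_dir() => pool.push(path),
                                    Ok((path, _)) => found.lock().unwrap().0.push(path),
                                    Err(err) => found.lock().unwrap().1.push(err),
                                }
                            }
                        }
                        // A directory that can't be listed is reported, not fatal.
                        Err(err) => found.lock().unwrap().1.push(err),
                    }
                    // Children were pushed first, so `pending` can't hit zero early.
                    pool.done();
                }
            })
        })
        .collect();
    for w in workers {
        w.join().unwrap();
    }
    let found = Arc::try_unwrap(found).ok().expect("all workers have exited");
    found.into_inner().unwrap()
}
```

Calling `stop` makes every worker return from `next` at its next check, even with work still queued, which covers the "gracefully quit" requirement in this simplified setting.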
In short, once I had the idea, Rust made it super easy to write the code to implement this. There’s no unsafe and there are no locks in the happy path. I used a simple linked list implemented via Arc (i.e., thread-safe shared ownership) to provide safe sharing of matchers across multiple threads.
Here’s the code. I wrote it pretty recently, and it’s my first draft, so it’s probably more complex than it needs to be. As a bonus, it’s now part of a crate that anyone can use! In fact, other projects are already using it.
And yes, it’s fast. Like really fast. :-)
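The Arc-based linked list described in this comment can be sketched roughly as follows. This is a simplified illustration in that spirit, not the crate’s actual types: `IgnoreNode`, `new`, `is_ignored` and `has_ancestor` are hypothetical names, and the "matcher" is just a list of exact file names. Each directory gets one node, built once when it is first visited. Its subdirectories hold a cloned `Arc` pointing at it, so a parent’s matcher is shared rather than rebuilt, and because the list is immutable once built, any thread can read it without a lock. Walking the parent links also covers two other points from the list: consulting ignore files in parent directories, and spotting a symlink loop by checking whether a directory is already an ancestor.

```rust
use std::path::{Path, PathBuf};
use std::sync::Arc;

/// One immutable link per directory: its path, the patterns from its ignore
/// file (plain names only, for brevity), and a shared pointer to its parent.
struct IgnoreNode {
    dir: PathBuf,
    patterns: Vec<String>,
    parent: Option<Arc<IgnoreNode>>,
}

impl IgnoreNode {
    /// Build the node for `dir` exactly once; children share it via Arc.
    fn new(parent: Option<Arc<IgnoreNode>>, dir: PathBuf, ignore_file: &str) -> Arc<IgnoreNode> {
        let patterns = ignore_file
            .lines()
            .map(str::trim)
            .filter(|l| !l.is_empty() && !l.starts_with('#'))
            .map(String::from)
            .collect();
        Arc::new(IgnoreNode { dir, patterns, parent })
    }

    /// This node followed by all of its ancestors, nearest first.
    fn chain(&self) -> impl Iterator<Item = &IgnoreNode> {
        std::iter::successors(Some(self), |n| n.parent.as_deref())
    }

    /// Ignored if this directory or any ancestor lists the name.
    /// Read-only, so any number of threads can call it without a lock.
    fn is_ignored(&self, name: &str) -> bool {
        self.chain().any(|n| n.patterns.iter().any(|p| p == name))
    }

    /// Symlink-loop guard: is `dir` already on the path from the root?
    /// Assumes callers store canonicalized directory paths.
    fn has_ancestor(&self, dir: &Path) -> bool {
        self.chain().any(|n| n.dir.as_path() == dir)
    }
}
```

Handing a subtree to another worker costs only an `Arc` clone. Comparing canonicalized paths for the loop check is a simplifying assumption; comparing file identity (device and inode) would be more robust.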
You know, in the good old days, you’d be an application programmer or even an analyst. :)
For what it’s worth, I kinda feel like the term “systems programmer” should be reserved for people who are writing operating systems, drivers, embedded software, or developing (as opposed to deploying) bespoke distributed systems. Then again, it doesn’t look as good on a resume if you can’t claim that title.
The fireflower is the ability to: