I’ve been working on and off on a client/server version of this, with a persistent daemon process taking care of the database. I feel that would be a better design, as the work performed by the client after each key press would be really minimal. That daemon could also eventually be used for automating other parts of the CLI experience.
I did consider this, but I didn’t want an entire process running on my computer for the sole purpose of serving me
cd suggestions. Moreover, given how fast
zoxide already is, I doubt that a client-server model would be a massive improvement.
You’re right, I’m sure it’s fast enough. The main point would be to add more features in addition to jumping: a central daemon that talks to all your shells simultaneously could certainly help for many other things.
Yeah, I’ve thought about this as well. I just couldn’t be bothered to figure out all the possible security implications, and never wanted it badly enough. But it could be the sort of generic project that opens up a lot of possibilities we don’t even see yet… Also, who doesn’t love over-engineering stuff? :-)
Right now I was thinking client/server on the same machine, using a UNIX socket for communication, so the security implications aren’t big. But maybe you’re right, and we could imagine a cloud-based version!
I’ll try to ping you if I ever put something on GitHub :)
Yeah, please do. I shudder to think what the reaction of some folks on HN, or here, or Reddit would be if you announced ‘I made a client-server version of
I will paste my compiled comments from the orange site (which were not so coherent, since I was typing them on my phone). The problem is that the article compares apples to oranges: the Darwin version of wc calls a multi-byte character function that takes up a significant amount of time, whereas the Go version does nothing related to multi-byte characters.
Take the Darwin version linked from the article. Run
perf record wc thefile.txt. Then run
perf report and you will see
iswspace in the call graph. This function tests for white space in wide characters.
Replace the iswspace call with the line
if (wch == L' ' || wch == L'\n' || wch == L'\t' || wch == L'\v' || wch == L'\f')
And I get a ~1.7x speedup:
$ time ./wc ../wiki-large.txt
854100 17794000 105322200 ../wiki-large.txt
./wc ../wiki-large.txt  0.47s user 0.02s system 99% cpu 0.490 total
$ time ./wc2 ../wiki-large.txt
854100 17794000 105322200 ../wiki-large.txt
./wc2 ../wiki-large.txt  0.28s user 0.01s system 99% cpu 0.293 total
Remove the unnecessary branching introduced by the multi-byte character handling. This actually resembles the Go code pretty closely. We get a speedup of 1.8x:
$ time ./wc3 ../wiki-large.txt
854100 17794000 105322200 ../wiki-large.txt
./wc3 ../wiki-large.txt  0.25s user 0.01s system 99% cpu 0.267 total
If we take the second table from the article and divide the C result (5.56) by 1.8, the C performance would be ~3.09, which is faster than the Go version (3.72). And why would the C version be ~2x slower than the non-parallelized Go version? They would basically compile to the same small state machine in machine code.
Edit: I took the Go version from the webpage and ran that on the same data as well:
$ time ./wcgo ../wiki-large.txt
854100 17794000 105322200 ../wiki-large.txt
./wcgo ../wiki-large.txt  0.32s user 0.02s system 100% cpu 0.333 total
So, the C version is indeed faster when removing this piece of multi-byte character handling.
This is a lot clearer, thank you! I ran your benchmarks and this is what I got on my machine:
100 MB: 0.33 s, 2032 KB
1 GB: 3.23 s, 2032 KB
This is indeed slightly faster than the non-parallelized Go version, although it still uses more memory! It does seem strange that it has been written this way.
Also, I have updated the post to remove the dependency on
fmt and to stop manually setting
GOMAXPROCS, which has improved performance and memory consumption significantly. You should check it out, I think you’ll find it interesting.
Additionally, I made the source code available here: https://github.com/ajeetdsouza/blog-wc-go
All of these articles are frustrating because they use different environments and test sets, and none of the ones I’ve read have posted their test sets. Some people use random characters, some people use existing files. Some people use files of 1 MiB, some 100 MiB, some several GiB in size. Not only that, but the people programming the replacements don’t even normalize for differences in machine/processor capability by compiling the competitors and GNU
wc from scratch. The system
wc is likely to be compiled differently depending on your machine. The multithreaded implementations are going to perform differently depending on if you’re running Chrome when you test the app or not, etc.
This would easily be solved by using the same distribution as a live USB, sharing testing sets, and compiling things from scratch with predefined options, but nobody seems to want to go to that much effort to get coherent comparisons.
I tested this on a fresh install of Fedora 31, so I didn’t really see any benefit of running it on a LiveUSB. As I mentioned in the article, the wc implementation I used for comparison has been compiled locally with
gcc 9.2.1 and
-O3 optimizations. I’ve also listed my exact system specifications there. I’ve used the unprocessed enwik9 dataset (Wikipedia dump), truncated to 100 MB and 1 GB.
I understand your frustrations with the previous posts, but I’ve tried to make my article as unambiguous as possible. Do give it a read; if you have any further suggestions or comments, I’d be happy to hear them!
These posts are all pointless because
wc doesn’t represent high performance C code. If anyone cared about optimizing
wc, it would use SIMD extensions like AVX to count multiple chars per cycle and trash all these blog posts (edit: apparently someone did, see this post on lobste.rs).
The real take away: all these languages are fast enough for general purpose use, because they beat a C program that everyone considers fast enough for general purpose use.
Someone did write a C version with SIMD. They got a ~100x speedup over
So they’re not pointless. The value in these posts (this one included) is that they describe how to solve and optimize a problem in various languages.
Most fail to mention the test setup’s locale as well! GNU’s version of wc, at least, uses the definition of “whitespace” from your current locale when counting words, while the linked Go implementation hard-codes whitespace as “ \t\n\r\v\f”. Whether this impacts speed and/or correctness depends on your locale.
As mentioned in the article, the test files are us-ascii encoded. I’m comparing with the OS X implementation, not GNU, and I have used the same definition of whitespace as them. I didn’t mention this in the post for the sake of brevity.
Well, unless you remove the wide-character whitespace function call (
iswspace) from the C version and replace it with an if/case statement as in the Go version. Then the C version is faster than the Go version, though by a small margin (they probably compile to similar machine code):
Still, the parallelization done in Go is nice!
loxcraft started off as yet another implementation of Crafting Interpreters, but I decided to take it up a notch and build:
Also, it’s really fast! Most implementations of Lox turn out to be significantly slower than the one in C, but this one comes really close. Suggestions for how to improve performance further would be highly appreciated!
Nicely done, @ajeetdsouza!
I’m currently working through the first implementation in the book (but in Go) and take this as a personal attack. You’re showing me up!
But seriously, this is really impressive! Nice work.