Threads for ajeetdsouza

  1. 11

    loxcraft started off as yet another implementation of Crafting Interpreters, but I decided to take it up a notch and build:

    • syntax highlighting, via tree-sitter
    • an online playground, via WebAssembly
    • IDE integration, via Language Server Protocol
    • a REPL
    • really good error messages

    Also, it’s really fast! Most implementations of Lox turn out to be significantly slower than the one in C, but this one comes really close. Suggestions for how to improve performance further would be highly appreciated!

    1. 3

      Nicely done, @ajeetdsouza!

      1. 1

        I’m currently working through the first implementation in the book (but in Go) and take this as a personal attack. You’re showing me up!

        But seriously, this is really impressive! Nice work.

      1. 1

        I think there are copy-and-paste errors in the instructions for installing via antigen and zplug.

        1. 2

          You’re right, I fixed it now. In my defence, it was late at night and I was half asleep!

        1. 1

          I’ve been working on and off on a client/server version of this, with a persistent daemon process taking care of the database. I feel that would be a better design, as the work performed by the client after each key press would be really minimal. That daemon could also eventually be used for automating other parts of the CLI experience.

          1. 1

            I did consider this, but I didn’t want an entire process running on my computer for the sole purpose of serving me cd suggestions. Moreover, given how fast zoxide already is, I doubt that a client-server model would be a massive improvement.

            1. 2

              You’re right, I’m sure it’s fast enough. The main point would be to add more features in addition to jumping: a central daemon that talks to all your shells simultaneously could certainly help for many other things.

            2. 1

              Yeah, I’ve thought about this as well. Just couldn’t be bothered to figure out all the possible security implications, and never wanted it bad enough. But it could be the sort of generic project that opens up a lot of possibilities we don’t even see yet… Also, who doesn’t love over-engineering stuff? :-)

              1. 1

                Right now I was thinking client/server on the same machine, using a UNIX socket for communication, so the security implications aren’t big. But maybe you’re right, and we could imagine a cloud-based version!
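                A same-machine daemon like that could use a very simple request/response protocol over the socket. Purely as a sketch of the shape of it (the protocol, the query, and the reply path here are all made up, and socketpair() stands in for a real listening UNIX socket so the example fits in one process):

                ```c
                #include <assert.h>
                #include <stdio.h>
                #include <string.h>
                #include <sys/socket.h>
                #include <unistd.h>

                /* Hypothetical wire protocol for a cd-suggestion daemon:
                 * the client sends "query <prefix>\n" and the server
                 * replies with one path. A real daemon would listen on a
                 * UNIX socket path; socketpair() keeps this sketch
                 * self-contained in a single process. */
                int main(void) {
                    int fds[2];
                    assert(socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == 0);

                    /* "client" sends a query */
                    const char *query = "query proj\n";
                    assert(write(fds[0], query, strlen(query)) == (ssize_t)strlen(query));

                    /* "server" reads it and answers with the best-ranked
                     * match (hard-coded here; a real daemon would consult
                     * its frecency database) */
                    char req[64] = {0};
                    assert(read(fds[1], req, sizeof(req) - 1) > 0);
                    assert(strncmp(req, "query ", 6) == 0);
                    const char *reply = "/home/user/projects\n";
                    assert(write(fds[1], reply, strlen(reply)) == (ssize_t)strlen(reply));

                    /* "client" prints the suggestion */
                    char resp[64] = {0};
                    assert(read(fds[0], resp, sizeof(resp) - 1) > 0);
                    printf("cd -> %s", resp);

                    close(fds[0]);
                    close(fds[1]);
                    return 0;
                }
                ```

                The nice property of this design is exactly what you describe: the per-keypress work in the client shrinks to one write and one read, and everything stateful lives in the daemon.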

                I’ll try to ping you if I ever put something on GitHub :)

                1. 2

                  Yeah, please do. I shudder to think what the reaction of some folks on HN or here or reddit would be if you announce ‘I made a client-server version of cd’! :-)

                  1. 1

                    Lol, I’m old enough not to care anymore :)

            1. 5

              I will paste my compiled comments from the orange site (which were not so coherent, since I was typing them on my phone). The problem is that the article compares apples to oranges: the Darwin version of wc calls a multi-byte character function that takes up a significant amount of time, whereas the Go version does not do anything related to multi-byte characters.

              Take the Darwin version linked from the article. Run perf record wc thefile.txt. Then run perf report and you will see iswspace in the call graph. This function tests for white space in wide characters.

              Replace the line

                  if (iswspace(wch))
              

              by

                  if (wch == L' ' || wch == L'\n' || wch == L'\t' || wch == L'\v' || wch == L'\f')
              

              And I get a ~1.7x speedup:

              $ time ./wc ../wiki-large.txt
                854100 17794000 105322200 ../wiki-large.txt
              ./wc ../wiki-large.txt  0.47s user 0.02s system 99% cpu 0.490 total
              $ time ./wc2 ../wiki-large.txt                     
                854100 17794000 105322200 ../wiki-large.txt
              ./wc2 ../wiki-large.txt  0.28s user 0.01s system 99% cpu 0.293 total
              

              Remove the unnecessary branching introduced by multi-byte character handling [1]. The result actually resembles the Go code pretty closely. We get a speedup of 1.8x:

              $ time ./wc3 ../wiki-large.txt
                854100 17794000 105322200 ../wiki-large.txt
              ./wc3 ../wiki-large.txt  0.25s user 0.01s system 99% cpu 0.267 total
              

              If we take the second table from the article and divide the C result (5.56) by 1.8, the C performance would be ~3.09, which is faster than the Go version (3.72). And why would the C version be ~2x slower than the non-parallelized Go version? They would basically compile to the same small state machine in machine code.
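              For reference, the small state machine in question can be sketched in a few lines. This is a simplified, byte-at-a-time counter for ASCII input only, not the actual Darwin or Go source:

              ```c
              #include <assert.h>
              #include <stdio.h>
              #include <string.h>

              /* Simplified byte-oriented line/word/byte counter: a two-state
               * machine (inside a word / outside a word), similar in spirit
               * to what both the patched C and the Go versions boil down to. */
              static void count(const char *buf, size_t len,
                                long *lines, long *words, long *bytes) {
                  int in_word = 0;
                  *lines = *words = 0;
                  *bytes = (long)len;
                  for (size_t i = 0; i < len; i++) {
                      char c = buf[i];
                      if (c == '\n')
                          (*lines)++;
                      if (c == ' ' || c == '\n' || c == '\t' ||
                          c == '\r' || c == '\v' || c == '\f') {
                          in_word = 0;
                      } else if (!in_word) {
                          in_word = 1;
                          (*words)++;  /* transition: start of a new word */
                      }
                  }
              }

              int main(void) {
                  const char *s = "hello world\nfoo  bar\n";
                  long l, w, b;
                  count(s, strlen(s), &l, &w, &b);
                  assert(l == 2 && w == 4 && b == 21);
                  printf("%ld %ld %ld\n", l, w, b);
                  return 0;
              }
              ```

              With the wide-character call gone, both compilers reduce the hot loop to roughly this branch structure, which is why near-identical timings are what you would expect.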

              Edit: I took the Go version from the webpage and ran that on the same data as well:

              $ time ./wcgo ../wiki-large.txt
              854100 17794000 105322200 ../wiki-large.txt
              ./wcgo ../wiki-large.txt  0.32s user 0.02s system 100% cpu 0.333 total
              

              So, the C version is indeed faster when removing this piece of multi-byte character handling.

              [1] https://gist.github.com/danieldk/f8cdaed4ba255fb2954ded50dd2931ed

              1. 2

                This is a lot clearer, thank you! I ran your benchmarks and this is what I got on my machine:

                100 MB: 0.33 s, 2032 KB

                1 GB: 3.23 s, 2032 KB

                This is indeed slightly faster than the non-parallelized Go version, although it still uses more memory! It does seem strange that it has been written this way.

                Also, I have updated the post to remove the dependency on fmt and to stop manually setting GOMAXPROCS, which has improved performance and memory consumption significantly. You should check it out, I think you’ll find it interesting.

                Additionally, I made the source code available here: https://github.com/ajeetdsouza/blog-wc-go

              1. 18

                All of these articles are frustrating because they use different environments and test sets, and none of the ones I’ve read have posted their test sets. Some people use random characters, some people use existing files. Some people use files of 1 MiB, some 100 MiB, some several GiB in size. Not only that, but the people programming the replacements don’t even normalize for the difference in machine/processor capability by compiling the competitors and GNU wc from scratch. The system wc is likely to be compiled differently depending on your machine. The multithreaded implementations are going to perform differently depending on whether you’re running Chrome while you test the app, etc.

                This would easily be solved by using the same distribution as a live USB, sharing testing sets, and compiling things from scratch with predefined options, but nobody seems to want to go to that much effort to get coherent comparisons.

                1. 7

                  I tested this on a fresh install of Fedora 31, so I didn’t really see any benefit of running it on a LiveUSB. As I mentioned in the article, the wc implementation I used for comparison has been compiled locally with gcc 9.2.1 and -O3 optimizations. I’ve also listed my exact system specifications there. I’ve used the unprocessed enwik9 dataset (Wikipedia dump), truncated to 100 MB and 1 GB.

                  I understand your frustrations with the previous posts, but I’ve tried to make my article as unambiguous as possible. Do give it a read; if you have any further suggestions or comments, I’d be happy to hear them!

                  1. 8

                    These posts are all pointless because wc doesn’t represent high performance C code. If anyone cared about optimizing wc, it would use SIMD extensions like AVX to count multiple chars per cycle and trash all these blog posts (edit: apparently someone did, see this post on lobste.rs).

                    The real take away: all these languages are fast enough for general purpose use, because they beat a C program that everyone considers fast enough for general purpose use.
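                    To illustrate the “multiple chars per cycle” idea without committing to actual AVX intrinsics, here is a portable SWAR sketch that counts newlines eight bytes at a time (my own toy example, not the SIMD implementation mentioned above; a real AVX2 version would do the same comparison on 32-byte registers with _mm256_cmpeq_epi8):

                    ```c
                    #include <assert.h>
                    #include <stdint.h>
                    #include <stdio.h>
                    #include <string.h>

                    /* SWAR sketch: count '\n' bytes eight at a time using
                     * 64-bit arithmetic instead of vector registers. */
                    static long count_newlines(const unsigned char *p, size_t n) {
                        const uint64_t low7 = 0x7f7f7f7f7f7f7f7fULL;
                        const uint64_t nl = 0x0101010101010101ULL * '\n'; /* '\n' in every byte */
                        long count = 0;
                        size_t i = 0;
                        for (; i + 8 <= n; i += 8) {
                            uint64_t w;
                            memcpy(&w, p + i, 8); /* safe unaligned load */
                            uint64_t x = w ^ nl;  /* zero byte wherever there was a '\n' */
                            /* exact zero-byte detector: 0x80 in each byte where x == 0 */
                            uint64_t zeros = ~(((x & low7) + low7) | x | low7);
                            count += __builtin_popcountll(zeros);
                        }
                        for (; i < n; i++) /* scalar tail */
                            count += (p[i] == '\n');
                        return count;
                    }

                    int main(void) {
                        const char *s = "one\ntwo\nthree\nfour lines total\n";
                        assert(count_newlines((const unsigned char *)s, strlen(s)) == 4);
                        puts("ok");
                        return 0;
                    }
                    ```

                    The same trick extends to the word count by comparing against each whitespace byte, which is why a vectorized wc leaves all of these scalar loops far behind.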

                    1. 8

                      Someone did write a C version with SIMD. They got a ~100x speedup over wc.

                      1. 5

                        So they’re not pointless. The value in these posts (this one included) is that they describe how to solve and optimize a problem in various languages.

                    2. 3

                      Most fail to mention the test setup’s locale as well! GNU’s version of wc, at least, uses the definition of “whitespace” from your current locale when counting words, while the linked Go implementation hard-codes whitespace as “ \t\n\r\v\f”. Whether this impacts speed and/or correctness depends on your locale.
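                      As a quick sanity check (my own sketch, not from any of the posts): in the plain “C” locale, iswspace agrees byte-for-byte with that hard-coded set over ASCII, which is the only situation where the substitution is harmless. Other locales are free to classify additional wide characters as whitespace:

                      ```c
                      #include <assert.h>
                      #include <locale.h>
                      #include <stdio.h>
                      #include <wctype.h>

                      /* The hard-coded whitespace set " \t\n\r\v\f". */
                      static int is_ascii_space(int c) {
                          return c == ' ' || c == '\t' || c == '\n' ||
                                 c == '\r' || c == '\v' || c == '\f';
                      }

                      int main(void) {
                          /* In the portable "C" locale the two definitions agree
                           * for every ASCII byte; under other locales iswspace
                           * may accept more characters, so results (and speed)
                           * can differ on non-ASCII input. */
                          setlocale(LC_ALL, "C");
                          for (int c = 0; c < 128; c++)
                              assert(is_ascii_space(c) == !!iswspace((wint_t)c));
                          puts("ASCII whitespace sets agree in the C locale");
                          return 0;
                      }
                      ```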

                      1. 5

                        As mentioned in the article, the test files are us-ascii encoded. I’m comparing with the OS X implementation, not GNU’s, and I have used the same definition of whitespace as it does. I didn’t mention this in the post for the sake of brevity.

                        1. 3

                          Well, unless you remove the multi-byte whitespace function call (iswspace) from the C version and replace it with an if/case statement as in the Go version. Then the C version is faster than the Go version, though by a small margin (they probably compile to similar machine code):

                          https://lobste.rs/s/urrnz6/beating_c_with_70_lines_go#c_flic9z

                          Still, the parallelization done in Go is nice!