1. 34
  1. 7

    i had a similar experience a few months ago: https://mastodon.technology/@robey/104635185018008699 – wasn’t worth a blog post, so i’ll just paste it below:

    assemblyscript piqued my interest, so yesterday i spent some time porting my buz-hash (rsync-style file chunking) implementation into wasm via assemblyscript. it was 50% faster than pure typescript!

    then, today, on a hunch, i took the assemblyscript version and back-ported it to typescript again. 50% faster than wasm!

    here’s why: the original version byte-scanned buffers and did a lot of slice and concat to keep the logic simple. the hash itself does a lot of bit operations like “rotate”, which are awkward in js. for wasm, passing memory across the js/wasm border is the awkward part, so i tuned the algorithm to track more internal state, receive a buffer, and return “you should cut it here”. no slice or concat.
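    the rolling loop described above can be sketched in plain typescript. this is a minimal illustration of the shape of the algorithm, not robey's actual code: the window size, mask, and table seeding below are made-up values, and a real buzhash would use a fixed table of random 32-bit words.

```typescript
// minimal buzhash-style rolling chunker sketch (illustration values only)

const WINDOW = 32;           // rolling window, in bytes
const MASK = (1 << 13) - 1;  // boundary when the low 13 hash bits are zero

// pseudo-random table mapping each byte value to a 32-bit word
// (a real implementation would ship a fixed random table)
const TABLE = new Uint32Array(256);
{
  let s = 0x9e3779b9;
  for (let i = 0; i < 256; i++) {
    s ^= s << 13; s >>>= 0;
    s ^= s >>> 17;
    s ^= s << 5; s >>>= 0;
    TABLE[i] = s;
  }
}

// rotate-left-by-1: the bit operation that is awkward in plain js
function rot1(x: number): number {
  return ((x << 1) | (x >>> 31)) >>> 0;
}

class Chunker {
  private hash = 0;
  private win = new Uint8Array(WINDOW);
  private pos = 0;
  private filled = 0;

  // receive a buffer, return "you should cut it here" (the index just
  // after the boundary byte), or -1 if no boundary is in this buffer
  feed(buf: Uint8Array): number {
    for (let i = 0; i < buf.length; i++) {
      const b = buf[i];
      if (this.filled < WINDOW) {
        // still filling the window: add only, nothing to remove yet
        this.hash = (rot1(this.hash) ^ TABLE[b]) >>> 0;
        this.filled++;
      } else {
        // with a 32-byte window and a 32-bit hash, the outgoing byte's
        // table entry has rotated a full turn, so a plain xor removes it
        const out = this.win[this.pos];
        this.hash = (rot1(this.hash) ^ TABLE[out] ^ TABLE[b]) >>> 0;
      }
      this.win[this.pos] = b;
      this.pos = (this.pos + 1) % WINDOW;
      if (this.filled >= WINDOW && (this.hash & MASK) === 0) return i + 1;
    }
    return -1;
  }
}
```

    because all the state lives in the chunker, feeding the same bytes as one buffer or as many small ones finds the same boundary — which is what lets the caller stream buffers in and cut on demand with no slice or concat.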

    but if that optimization matters a lot, js can do it too. and it won’t have to copy a buffer into the wasm sandbox. turns out that makes it even faster, because the copying was now (always?) the bottleneck.
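    for a sense of what crossing the border costs: every call has to copy the input bytes into the module's linear memory before the wasm code can see them. a stand-in sketch — here a plain ArrayBuffer plays the role of the exported WebAssembly.Memory, since the module itself is omitted:

```typescript
// `sandbox` stands in for the wasm module's exported linear memory;
// a real AssemblyScript module would expose a WebAssembly.Memory whose
// `.buffer` gets used the same way.
const sandbox = new ArrayBuffer(65536); // one 64 KiB wasm page

// copy the caller's bytes into the sandbox at `ptr` — this O(n) copy
// happens on every call, no matter how fast the wasm inside is
function copyIn(input: Uint8Array, ptr: number): void {
  new Uint8Array(sandbox, ptr, input.length).set(input);
}

// a pure-js version scans `input` in place and skips this step entirely
```

    the pure-js chunker never pays this per-call copy, which is (per the parent) where the wasm version lost its lead.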

    9_521_943’th piece of evidence that the slow code is never what you thought it was.

    1. 2

      9_521_943’th piece of evidence that the slow code is never what you thought it was.

      Yesterday, I was debugging a performance problem with some code. Came up with a potential solution. It was faster, but on the server it wasn’t quite as fast as I expected. I started poking around the server and it turned out it had a hard disk instead of an SSD. Spun up a server with an SSD, tested the code, and found it was about as fast as I expected. Ran a real workload on the server and got a 4.5x speed-up on the storage-heavy section, 2x speed-up overall. If we add some of the changes I have been testing, the speed-up for the storage-heavy section could be as much as 10-23x (depending on the option).

      Goes to show that (1) you need to benchmark and (2) you need to benchmark where you are actually going to run the code.

      1. 1

        It’s pretty impressive to me that WASM runtimes can come close to a JIT for a high-level language.

      2. 6

        It’s interesting (and heartening) to note that, in practice, in contrast to the black magic that optimizing JS occasionally involves, the optimizations applied to the WebAssembly-generating code were the same things that would help in any strongly typed language compiling to native code: remove unnecessary bounds checks, tune things for faster allocation, experiment with different compiler optimization settings, see why the compiler is generating sub-optimal code, etc.

        Also interesting to see that JS was generally at the top of the heap for the “naive” tests, but well-generated WebAssembly could get significantly better performance than JS with some work. These are all still microbenchmarks, but this is a fun test.

          1. 1

            I think my conclusion is that javascript jits are magic performance pixie dust, so they compare favorably to low-level assembly.