1. 25

  2. 3

    I was curious about that typedarrays part, but at least in my browser it’s the expected way around for every step: (i5-9300H, FF, linux)

     benchmarking vec add of size 4
       pure javascript:         15384615.38 iters/sec (0.74 GB/s)
       typed arrays:            19230769.23 iters/sec (0.92 GB/s)
     benchmarking vec add of size 131072
       pure javascript:         2820.08 iters/sec (4.44 GB/s)
       typed arrays:            3835.68 iters/sec (6.03 GB/s)

    Not an intel thing either - FF on Android:

     benchmarking vec add of size 4
       pure javascript:         5000000 iters/sec (0.24 GB/s)
       typed arrays:            9090909.09 iters/sec (0.44 GB/s)

    Looks like a V8 issue. Same Android on Chrome:

     benchmarking vec add of size 4
       pure javascript:         7733952.12 iters/sec (0.37 GB/s)
       typed arrays:            4940711.57 iters/sec (0.24 GB/s)

    Maybe worth reporting as a bug?
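    (The shape of the benchmark being compared can be sketched as below; the names, sizes, and iteration counts are illustrative, not the ones from the post. The GB/s figure assumes three arrays of 4-byte floats are touched per add, which matches the numbers above.)

    ```javascript
    // Hedged sketch of a vec-add microbenchmark: plain JS array vs Float32Array.
    function vecAdd(a, b, out) {
      for (let i = 0; i < a.length; i++) out[i] = a[i] + b[i];
    }

    function bench(label, a, b, out, iters) {
      const start = performance.now();
      for (let k = 0; k < iters; k++) vecAdd(a, b, out);
      const secs = (performance.now() - start) / 1000;
      // 3 arrays touched per add, 4 bytes per float
      const gbps = (iters * a.length * 3 * 4) / secs / 1e9;
      console.log(`${label}: ${(iters / secs).toFixed(2)} iters/sec (${gbps.toFixed(2)} GB/s)`);
    }

    const N = 131072, ITERS = 100;
    const plainA = Array.from({ length: N }, Math.random);
    const plainB = Array.from({ length: N }, Math.random);

    bench("pure javascript", plainA, plainB, new Array(N).fill(0), ITERS);
    bench("typed arrays", Float32Array.from(plainA), Float32Array.from(plainB),
          new Float32Array(N), ITERS);
    ```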

    1. 1

      ah, interesting. Testing it out now, SpiderMonkey (Firefox’s engine) is definitely showing consistent results for me as well

      Good catch, I’ll add this to the writeup!

    2. 2

      @bwasti I wonder if the plain JS array performs so well because the values are always 0? Perhaps if you actually stuff it with well-distributed floating point numbers, TypedArray will be more competitive? I figure you are comparing TypedArray float math to JIT’d int math.

      1. 3

        that’s a good point, the values aren’t zero (set to random values https://github.com/bwasti/wasmblr/blob/main/emscripten_example/benchmark.js#L125-L139), but they aren’t changing at all, which V8 may leverage to some degree.

        I’d suspect that if the JIT knew it was genuinely a no-op the measured “performance” at that point would be much higher.

        1. 1

          It’s good that you’re using random floats, as modern JS engines distinguish between arrays of ints, floats, etc. to shortcut various checks and conversions. It would be interesting to compare an int array vs. a float array, but you’d probably need to change the downstream code to avoid measuring int->float promotion on load. These days I have no idea what a float conversion on load costs, however :)
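          (A hedged illustration of the element-kind point: the transitions in the comments below reflect how V8 describes its array representations, but none of this is directly observable from JS, so they are assumptions about engine internals rather than guaranteed behavior.)

          ```javascript
          // Engines like V8 internally tag plain arrays by element kind
          // (e.g. packed small ints vs packed doubles). Storing a single
          // float transitions an int array to double elements, and the
          // transition is one-way. This affects which JIT fast paths apply.
          const xs = [1, 2, 3]; // likely PACKED_SMI_ELEMENTS in V8
          xs.push(4);           // still small-int elements
          xs.push(0.5);         // transitions to double elements
          xs.push(7);           // stays double elements (no way back)

          function sum(a) {
            let s = 0;
            for (let i = 0; i < a.length; i++) s += a[i];
            return s;
          }
          console.log(sum(xs)); // 17.5
          ```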

      2. 1

        I’m not sure if it’s good or bad that there are so many different ways of interacting with / generating WebAssembly. I’ve been evaluating them for a side project, and I hadn’t even heard of Wasmblr.

        The main thing I learned from this post is that the tool can actually have quite a big effect on performance, because the tools have different models for code generation. E.g. Emscripten creates a malloc implementation that uses WebAssembly linear memory, but wasmblr lets you generate WebAssembly rather than just compile your implementation to it, so you can control memory allocation more directly. Super interesting approach. I’m not an Emscripten expert, so I wonder if it has similar features as well?

        1. 1

          I really like your takeaway. I think that question really gets to the heart of what I was going for!

          With respect to your exact example about memory allocation, Emscripten actually does expose the entire heap (for example, check out Module.HEAPF32).

          That being said, the way memory is used by the compiled code is not exposed (you just have to trust that LLVM will do a good job). It’s a classic tradeoff between high- and low-level coding.
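          (A small sketch of what that heap view amounts to: Module.HEAPF32 is essentially a Float32Array over the module’s linear memory. Simulated below with a bare WebAssembly.Memory so it runs without Emscripten; the pointer value is made up, not from a real malloc.)

          ```javascript
          // A Float32Array view over wasm linear memory, the same shape
          // as Emscripten's Module.HEAPF32.
          const memory = new WebAssembly.Memory({ initial: 1 }); // 1 page = 64 KiB
          const HEAPF32 = new Float32Array(memory.buffer);

          // Writing through the view is how JS shares float data with wasm:
          // byteOffset >> 2 converts a byte address into a float index.
          const ptr = 1024; // pretend a wasm-side malloc() returned this
          HEAPF32[ptr >> 2] = 3.5;
          console.log(HEAPF32[ptr >> 2]); // 3.5
          ```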

          However, there are more ways in which wasmblr (or any fast-to-recompile tool) can come in handy. If you determine (at runtime) that a certain value will always live at an address in memory, you can hard-code that value directly into your WebAssembly. These types of situations are somewhat common: e.g. if a user resizes their window, you may want to use that size directly to render a bit faster.
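          (The “hard-code a runtime value” idea can be sketched without wasmblr by emitting a tiny module by hand. The bytes follow the standard wasm binary format; the single-byte constant patch below is only valid for values 0–63 because of LEB128 encoding, and the export name is mine.)

          ```javascript
          // Generate, at runtime, a wasm module whose exported function
          // returns a constant baked directly into the code rather than
          // loaded from memory. Valid only for constants 0-63.
          function moduleReturningConst(value) {
            const bytes = new Uint8Array([
              0x00, 0x61, 0x73, 0x6d,       // "\0asm" magic
              0x01, 0x00, 0x00, 0x00,       // binary version 1
              0x01, 0x05, 0x01,             // type section, 1 entry:
              0x60, 0x00, 0x01, 0x7f,       //   () -> i32
              0x03, 0x02, 0x01, 0x00,       // function section: func 0 uses type 0
              0x07, 0x07, 0x01,             // export section, 1 entry:
              0x03, 0x67, 0x65, 0x74,       //   name "get"
              0x00, 0x00,                   //   export kind func, index 0
              0x0a, 0x06, 0x01, 0x04, 0x00, // code section: 1 body, no locals
              0x41, value,                  //   i32.const <value>  (patched in)
              0x0b,                         //   end
            ]);
            return new WebAssembly.Instance(new WebAssembly.Module(bytes)).exports;
          }

          // e.g. after a resize, recompile with the new width baked in:
          console.log(moduleReturningConst(42).get()); // 42
          ```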