1. 17
  1. 9

    This is very cool, but note that it is a microbenchmark comparing the overhead of calling plus(int, int). This is a very specific case of FFI that is easy and simple.

    For Oil, I care more about moving strings back and forth across the VM boundary (not to mention bigger objects like records and arrays of records). There are many more design choices in that case, and I suspect the results will also look different.
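    The two cases can be sketched with Python’s ctypes against libc (a stand-in for the benchmark’s FFI, not the setup benchmarked here; assuming a Unix-like system where `CDLL(None)` exposes libc symbols):

    ```python
    import ctypes

    # Handle to the symbols of the current process, which include libc
    # on most Unix-likes.
    libc = ctypes.CDLL(None)

    # Scalar case, like plus(int, int): machine ints cross the boundary
    # with only a cheap wrap/unwrap.
    libc.abs.argtypes = [ctypes.c_int]
    libc.abs.restype = ctypes.c_int
    print(libc.abs(-42))  # 42

    # String case: a Python str must be encoded (copied) into a C buffer
    # before it can cross.
    libc.strlen.argtypes = [ctypes.c_char_p]
    libc.strlen.restype = ctypes.c_size_t
    s = "hello"
    print(libc.strlen(s.encode("utf-8")))  # 5 -- the encode() is the copy
    ```

    The scalar call only wraps a machine int, while the string call pays for an encode/copy on every crossing, which is exactly the cost a plus(int, int) microbenchmark never sees.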

    1. 2

      This was my thought as well: for anything more complex than basic types or char*, you’re essentially serializing/deserializing, with all the performance problems that entails.

      1. 3

        Hm, I don’t quite see what you’re getting at, because

        1. You don’t have to serialize complex data types to move them across the VM boundary. For example, you can move a C struct to a Python dict or a Lua table with a series of API calls, without serializing (i.e. integers and floats more or less retain their native representation; they’re just wrapped). It generally involves copying strings, but both of those languages provide various complex ways to avoid copying (buffer protocol, light user data).

        2. I don’t think of serialization as slow. In fact sometimes I have serialized data to protobufs and moved them across the VM boundary as a single string, rather than constructing a series of complex function calls, and doing incref/decref in Python, etc.
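        Point 1 can be sketched with ctypes: a hypothetical Point struct (layout and field names are my invention) is turned into a dict with a few per-field accesses, and there is no serialized intermediate form:

        ```python
        import ctypes

        # A hypothetical C struct: struct Point { int x; double y; char name[8]; };
        class Point(ctypes.Structure):
            _fields_ = [("x", ctypes.c_int),
                        ("y", ctypes.c_double),
                        ("name", ctypes.c_char * 8)]

        def point_to_dict(p):
            # A series of per-field API calls, no intermediate blob:
            # ints/floats keep their native representation and are just
            # wrapped; the string field is copied and decoded.
            return {"x": p.x, "y": p.y, "name": p.name.decode("ascii")}

        p = Point(3, 2.5, b"origin")
        print(point_to_dict(p))  # {'x': 3, 'y': 2.5, 'name': 'origin'}
        ```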

        I think tracking ownership can be more inefficient than serializing and copying, since it’s a global algorithm (i.e. it causes contention among threads), and people invent all sorts of hacks to try to make it easier (layers like SWIG). More layers means more inefficiency.

        Anyway I don’t have numbers to back that up, which is what I would have liked to see here. But it’s pretty difficult to measure that, since these issues are situated in complex programs, with big library bindings and complex call patterns.
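        Point 2 can be sketched with the stdlib struct module standing in for protobuf: pack many records into one blob, cross the boundary with a single call instead of one call (plus incref/decref) per field, and decode on the other side:

        ```python
        import struct

        records = [(1, 2.0), (3, 4.5), (5, 6.25)]

        # Serialize all (int, double) records into one blob; "<" disables
        # padding so the layout is exactly 12 bytes per record.
        blob = struct.pack("<" + "id" * len(records),
                           *[v for rec in records for v in rec])

        # On the other side of the boundary, decode the blob back.
        decoded = list(struct.iter_unpack("<id", blob))
        print(decoded)  # [(1, 2.0), (3, 4.5), (5, 6.25)]
        ```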

        1. 1

          You don’t have to serialize complex data types to move them across the VM boundary. For example, you can move a C struct to a Python dict or a Lua table with a series of API calls, without serializing

          My contention is that this is effectively serialization, only without the intermediate string (or whatever) representation; each element of the data is transformed from host to C and C to host. A similar problem arises in microkernel architectures and IPC, e.g. not being able to push/pop data directly onto a process stack.

    2. 5

      Congratulations to Lua, Zig, and Rust on being in C’s territory. Lua actually beat it. Nim and D are nearly where C++ is but not quite. Hope Nim closes that gap and any others given its benefits over C++, esp readability and compiling to C.

      1. 1

        To be clear, and a little pedantic: Lua ≠ LuaJIT.

        1. 1

          The only thing I know about Lua is that it’s a small, embeddable, JIT’d scripting language. So, what did you mean by that? Do Lua the language and LuaJIT have separate FFIs or something?

          1. 5

            I think just that there are two implementations. One is just called “Lua”; it’s an interpreter written in C and supposedly runs pretty fast for a bytecode interpreter. The other is LuaJIT and runs much faster (and is the one benchmarked here).

            1. 1

              I didn’t even know that. Things I read on it made me think LuaJIT was the default version everyone was using. Thanks!

                1. 2

                  I waited till I was having a cup of coffee. Wow, this is some impressive stuff. More than I had assumed. There’s a lot of reuse/blending of structures and space. I’m bookmarking the links in case I can use these techniques later.

                2. 2

                  I think people doing comparative benchmarks very often skip over the C Lua implementation because it isn’t as interesting to them.

              1. 4

                Extra context: LuaJIT isn’t up to date with the latest Lua either, so they’re almost different things, sorta.

                LuaJIT is extremely impressive.

          2. 4

            I’d like to try this to add support for another language, but holy hell, this is a pain in the arse. I had to install yet another build tool, Tup (why does it need libfuse?), and installing the languages the way the author expected is hard, especially when they bark at you for trying to use features on the stable channel (Rust….) or don’t even compile. (Zig….)

            1. 3

              Ok, I managed to do it. It wasn’t too bad in the end. (I’m seeing some… interesting numbers though.)

              1. 1

                Sleepy, but I’ll try to look at this. The faster languages are pretty close in yours, like in the original. There’s a near four-times multiplier for Java in the original. C# is around three times in your build, with Java around four again. Then, Go was 30x in theirs but around 5x in yours. That’s the biggest difference I see when looking at things in ratios.

                1. 2

                  You misread the Go results; it’s more like 49x in @calvin’s benchmark results.