ATS is a statically typed programming language that unifies implementation with formal specification. It is equipped with a highly expressive type system rooted in the framework Applied Type System, which gives the language its name. In particular, both dependent types and linear types are available in ATS.
Except the intermediate values are stack-allocated in Rust and C as well. This article claims that stack-allocated data is faster than heap-allocated as well, which is dubious without further clarification. malloc is indeed slower than pushing to the stack, and the stack will almost certainly be in cache, but this program is small enough that its working set will always be in cache, so there should be minimal difference.

I’m curious what the modular function is doing in Rust - the C implementation just uses a simple (n % 2) == 0 while the Rust code does ((n % 2) + 2) % 2, and it’s not clear why.

((n % 2) + 2) % 2 properly handles negatives, though it shouldn’t matter here.
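A quick sketch of the difference (my own example, not code from the article): Rust’s % is a truncated remainder whose result takes the sign of the dividend, so the extra + 2 step turns it into a non-negative modulus. For the positive inputs a Collatz loop actually sees, the two agree.

```rust
fn main() {
    let n: i64 = -3;
    // Rust's % is truncated remainder: the result takes the sign of the dividend.
    let rem = n % 2;                 // -1
    let modulus = ((n % 2) + 2) % 2; // 1: always non-negative
    println!("rem = {}, modulus = {}", rem, modulus);
    // For non-negative n, both expressions give the same answer.
    assert_eq!(7 % 2, ((7 % 2) + 2) % 2);
}
```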
I can’t wait for “faster than rust” to be the new meme.
Rust isn’t heap allocating anything.
And C isn’t either.
I haven’t had the time to dig into the codegen, but I posted it to /r/rust; we’ll see what they say :)
EDIT:
https://www.reddit.com/r/rust/comments/7m99wo/outperforming_rust_with_functional_programming/drsbori/
I doubt that; these are unmodified results (on my machine):
And here it is after replacing i64 with u64:
Yeah, there’s a bunch of different speculation going on in that thread.
Regardless, sometimes Rust will not be literally the fastest possible thing. That’s okay, as long as it’s close. I find these little examples interesting, regardless of who “wins”.
I really like how the Rust community is approaching this as “how can we do better?”
Something the other comments aren’t touching on is that what’s slow about functional programming has nothing to do with a collatz function. These are just integers, so the values will always be copied in and out of registers instead of mutated (probably - it depends on what the compiler feels like doing). What’s slow about ‘functional programming’ is when you have larger datatypes and you choose to build an entirely new value instead of mutating an old one. What’s slow is when the idiomatic data structure is the linked list, which wreaks havoc on your cache, or other tree-based persistent data structures. What’s slow is the abuse of dynamically dispatched first-class functions, which compilers go to great lengths to avoid but which is sometimes unavoidable.
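To illustrate (a minimal sketch of my own, not code from the article): a Collatz step count in Rust works entirely on machine integers, so a pure, non-mutating style costs essentially nothing here - there is no larger data type whose copying could hurt.

```rust
// Counts Collatz steps until n reaches 1. Everything fits in registers;
// nothing is heap-allocated, functional style or not.
fn collatz(mut n: u64) -> u64 {
    let mut steps = 0;
    while n != 1 {
        n = if n % 2 == 0 { n / 2 } else { 3 * n + 1 };
        steps += 1;
    }
    steps
}

fn main() {
    // 27 is a classic example of a long trajectory.
    println!("{}", collatz(27));
}
```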
The only way to make functional programming fast, as a paradigm, is to really commit to linear/affine types (like BOTH Rust and ATS have, actually), so the compiler has enough information to determine that it can reuse the memory space.
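As a hedged sketch of what that buys you (my own example, using Rust’s ownership rather than ATS’s linear types): because the vector is moved into the function, the compiler knows no one else can still observe the old value, so the “new” vector can simply reuse the old buffer in place - a functional interface with an imperative cost model.

```rust
// `v` is taken by value (moved), so no other code can still see it.
// We are therefore free to mutate the buffer in place and hand it back
// as a "new" value, without copying or reallocating.
fn increment_all(mut v: Vec<i64>) -> Vec<i64> {
    for x in v.iter_mut() {
        *x += 1;
    }
    v
}

fn main() {
    let v = vec![1, 2, 3];
    let v = increment_all(v); // old `v` is consumed; the buffer is reused
    println!("{:?}", v);
}
```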
I’m a secret Standard ML fan and I’ve been using Rust since late mid 2014 so of course I like ATS. :) I just find the “point” of the article pretty misleading.
Never heard of ATS, and it’s not linked to in the post. Here it is:
http://www.ats-lang.org
How was the C function tested? With/without -O2 makes a difference, and using unsigned integers also makes a difference.
collatz_c, linux/amd64, gcc 6.3.0 without optimizations:
collatz_c, linux/amd64, gcc 6.3.0, -O2:
collatz_c, linux/amd64, gcc 6.3.0, -O2, s/int/unsigned int/:
I was trying to get this to run on my machine, but I was running into issues on Arch Linux (something having to do with Stack and Cabal? I’m not a Haskell guy). One thing I noticed was that there’s no --release flag on the cargo command - the optimizations for Rust really do make a difference. Could someone add that argument to shake.hs and see what the speeds look like?