From my humble point of view, this comparison doesn’t make much sense. The C++ and Node.js code being used does a lot less than the code from Elixir’s Phoenix. It would have made more sense to compare the C++ or Node.js code against Cowboy’s websocket implementation. On the other hand, I think that measuring the deviation is very important in this type of benchmark. Also, I would like to see how the Erlang/Elixir VM is being started. For example: is kernel polling enabled? Like the JVM, Erlang has a lot of tuning options, and in general the defaults are not great. Also, from what I have seen the code uses Poison, a JSON encoder/decoder implemented in Elixir. For something like this I would probably use the Jiffy NIF (C code instead of Elixir).
Anyway, I think that Clojure, Go, and Erlang/Elixir support for writing websocket servers is top notch.
Lastly, I recommend that you watch this talk by Gil Tene, CTO of Azul (creators of the Zing JVM implementation), called How NOT to Measure Latency.
more sense to compare the C++ or Node.js code against Cowboy’s websocket implementation
Rust did better than C++ in my tests, but Haskell did dramatically better than all of them. From what I know of Erlang message send overhead, I do not think Erlang can win a broadcast benchmark without cheating and looping it through C FFI.
I just posted a write-up and a GitHub PR. It was not close. I’ve used Clojure in production. Tuning can only go so far here; http-kit tends to be pretty good without a lot of tweaking to begin with anyway. I don’t think tuning would help much in Erlang either unless you found the make-broadcasting-messages-to-fifty-thousand-processes-fast flag :)
A reddit conversation on the tradeoffs of Rust’s stack model vs. GHC’s
So, to be clear, this is a post about the threads in the standard library. Rust itself doesn’t mandate anything specific about threads, and since it’s low-level, you can do anything. See libfringe, for example:
libfringe does context switches in 3ns flat on x86 and x86_64!
And, soon, tokio. The initial work there is extremely promising.
How to know someone isn’t really doing a lot of Go programming: putting the Unlock after the critical section without a defer.
Yeah, for these shootouts to be worthwhile you need to have an expert or two in each language implement that language’s solution.
Or someone who knows how the allocator works in Go?
No need to create extra garbage, and in a 200k+ LOC Go codebase, my previous company banned using defer entirely.
You don’t always have to use defer, but banning it seems insane to me. If a little extra garbage is unacceptable for your service, apply targeted fixes after profiling, or Go is not the right language for your tasks.
We banned defer for many reasons. defer also has serious safety implications when used with Unlock together with Go’s panic/recover(): you can essentially get memory corruption when Go runs off and unwinds the stack on a panic while allowing other threads to keep executing. That was the real reason we banned it. I find it to be the least elegant and least thought-out part of the language. I hope for significant changes to panic, defer, and recover in 2.0.
That’s an aside though – this post is about a shootout! I hope anyone who codes for a language shootout attempts to reason about their memory usage/gc behavior in tight loops.
If you’re using a garbage-collected language and you’re not reasoning about garbage collection, then you’re being sloppy. It’s up to you whether that’s acceptable or not.
Finally, I disagree wholeheartedly with your comments about targeted fixes. In my career that has rarely been the right choice, and performance has been the #1 reason why groups I’ve worked in have had to rewrite huge portions of large systems.
Do you know if there’s an issue tracking this problem or an otherwise more detailed description of what’s happening?
Don’t know about their issue/bug tracker. I have reproduced the issue for you here though: https://play.golang.org/p/fMEWFPPr6r
I had to download it locally for it to actually show the issue:
; go run test.go
thread1 val++: 1
val = 1
thread2 val++: 2
panic: bad [recovered]
goroutine 5 [running]:
created by main.main
exit status 2
Now, I recognize that races are part of MP programming, and that this example is a little contrived, but I hope it illustrates the point that using panic as both os.Exit and as an exception mechanism leads to unexpected behavior unless you’re careful. Not to mention the oddity that the Go community writes if _, err := blah(); err != nil all day, but for some reason also wanted a panic that didn’t panic but raised an exception. In sufficiently bogged-down programs, you can complete writes out to the network before a panic has finished unwinding.
if _, err := blah(); err != nil
I guess I’m having trouble grokking the actual issue. Was the expected behavior not to print thread2 val++: 2? If so, that doesn’t seem quite right to me. The mutex is unlocked before thread1 re-panics inside the defer.
thread2 val++: 2
but I hope it illustrates the point that using panic as both os.Exit and as an exception stack leads to unexpected behavior unless you’re careful.
I agree that relying on panics for os.Exit behavior seems weird (and non-idiomatic).
I guess what I’m having trouble understanding is:
In a large codebase with a significantly tall call stack, the deferred Unlock will be far away and not obvious, which leads to bad accidental bugs, especially given how often the stdlib calls panic. This was a reduced example with two threads, just to show the interaction I was trying to describe.
I’m using the term “corruption” in the logical sense: the store to val occurring after the panic was raised is, to me, an unintended (although explainable) write to memory.
“non-idiomatic”… look, the code I wrote has a bug in it, yes. But if you assume your program can randomly pop up exceptions, then you realistically cannot put Unlock in a defer in a multi-threaded program that stores to sensitive memory (which the shootout example does, I might add).
Isn’t golang.org/x/net/websocket concurrency safe? Do you need the locks then?
Here is a discussion about the memory footprint of the Elixir implementation.
It was interesting to see C++ in here.