The author states they are mostly talking about client-side development, and they give two reasons:
The author never actually proves either of these statements, though. For example: do cache misses actually matter for most client-side applications? Given that many client-side apps only need to be responsive relative to a human, is a cache miss really going to make or break the application and make it seem slow? I’m not convinced.
It could be that the author is entirely correct; they just have not proven it. The author never really defines what “slow” means. When it comes to performance, you MUST use evidence. Performance claims without evidence have been a huge source of wasted development time. Very few performance problems have an umbrella cause one can just point to and say “never do that”. Systems are too complicated for that kind of blanket statement.
I think “responsiveness” is the wrong term. “Noticeable” is the important word.
Given that many rendering engines (both for desktop applications and for games) struggle with jitter that is noticeable to humans, even when running at high framerates, I am inclined to agree with the author. I’d prefer it if he had worked the argument out more, though.
And it doesn’t have to be a high-end game. I currently have the issue that the text input box of Discourse has a noticeable lag. It’s very short, but it definitely lags behind my typing.
I currently have the issue that the text input box of Discourse has a noticeable lag. It’s very short, but it definitely lags behind my typing.
But that doesn’t necessarily have anything to do with using a “high-level language” or not. Plenty of C++ code is slow because it’s badly written. Plenty of Java code is fast (once it’s been JIT-compiled) because it’s well written.
Is it fast, or is it predictably fast, without hitches? The problem with Java is that you cannot expect any piece of code to run in a predictable time frame, as it may be stopped by the garbage collector. Amortized over time, Java is often indeed extremely fast - sometimes faster than other memory-management strategies - and the JIT is very good.
It’s probably not the case in the Discourse example and that thing is just plain slow; I just wanted to make clear that those issues are still very present, even on modern, beefy machines. I probably should have avoided that example.
The problem with Java is that the semantics of the language make being smart about allocations very hard. As counterexamples: Go, Erlang, and OCaml all have GCs, and it’s pretty much a non-issue. Their semantics make it much easier to generate less garbage to collect.
But none of these have any guarantees. Go’s GC behavior is undefined, especially regarding freezes. Erlang has GC issues when handling binary data of non-trivial sizes - when you run the other GC. Erlang doesn’t have issues when used in its natural domain: fast message passing of small packages, which can be garbage-collected per process. OCaml can also lead to GC tuning for more speed.
I’m not saying GCs are bad - quite the contrary. All I’m saying is that you cannot take the interaction with the GC out of a program, and especially cannot make any guarantees about its behavior - least of all across runtime versions.
Rust removed its GC for precisely that reason: the lack of predictability.
But none of these have any guarantees.
I’m not talking about guarantees, though - I’m talking about real-world experience. These languages do not feel the pain of Java and, possibly, C#. My point is, it’s not fair to say GC == slow. That is simply not true, and there are multiple examples of languages that don’t have this issue. In short: it’s complicated.
Erlang has GC issues when handling binary data of non-trivial sizes - when you run the other GC.
These issues are not related to performance of the GC. The shared-heap GC issue in Erlang is completely orthogonal.
OCaml can also lead to GC tuning for more speed
Sure, tuning always becomes an issue at the limits. But you have to tune even non-GC’d languages at the limits, so this is not a GC issue but an at-the-limits issue. Also, the link you gave is about big-data processing. I’m not sure exactly what the context of this discussion is, but if we are relating it back to the original post, that is certainly not the context the author was speaking of (client-side applications).
All I’m saying is that you cannot take the interaction with the GC out of a program, and especially cannot make any guarantees about its behavior - least of all across runtime versions.
Yes, and I’m not disagreeing with this. I am saying that assertions like “high-level languages are slow because of GC” are not fact-based. There are languages with GCs that run just fine. Note, also, that languages like C give no guarantees on the latency of malloc or free. So a GC’d language lacking such guarantees is not necessarily any worse off than C.
Automatic Reference Counting (ARC) is awesome. It’s like GC without the GC, and it’s used by Objective-C and Swift to great success. Other languages should try it on for size.
Reference counting is not without downsides - the counter needs to be locked, which can lead to a lot of lock-contention in pointer-heavy multi-threaded programs.
Presumably for a simple refcount one can just use atomic increment/decrement operations, no? (Which still involve cache-line bouncing and thus performance loss, but are nevertheless cheaper time- and space-wise than using an actual lock.)
Yes, that’s a common strategy. Though slightly more complex than only atomic inc/dec:
You need an atomic decrement-and-test rather than just a decrement, because the test for zero refs has to be atomic with the decrement; otherwise multiple callers might each see a zero-refs-left result, leading to multiple attempts to free the memory;
On some architectures, depending on the architecture’s reordering guarantees, you need a memory barrier to make sure that the free isn’t reordered before any of the pending reads.
I mostly use OCaml and the GC is not problematic at all. ARC leaks very important memory information that can be non-trivial to get right in a large system.
ARC leaks very important memory information that can be non-trivial to get right in a large system.
Is this referring to marking weak references?
Agreed. In the context of client-side application programming for resource-constrained platforms like smartphones, I’ve never seen an app suffering performance bottlenecks because of RC’s limitations (the need for atomic updates, releasing the roots of complex object subtrees). I can’t say the same about tracing GC.
This isn’t to say that RC is strictly better or that it doesn’t have any disadvantages, but in the context of where it’s often deployed today those disadvantages are less relevant.
The standard line of thought (“GC is faster”) is technically accurate, but misleading here. The real problem is the UI thread is special and should never be blocked by anything…including the garbage collector! A naive, stop-the-world collector that doesn’t respect the low-latency requirements of the UI thread will create a horrible user experience if the GC is invoked often.
I vaguely recall early versions of Java having this problem. I was put off of GC for many years because of it. (Obviously, GCs are smarter now.)
A lot of the problems in the cache misses section are things that I’d like to address in C# 7. The current proposal for ref returns/locals would help here.