I would love (widespread support for) weak references in JavaScript.

Oh man how I want this to actually become a thing.

I’d prefer it if we actually had some better fleshed-out way of accessing the GC environment in the language. Similarly, it kinda sucks not having some way of doing finalize (though in Java that was a half-broken promise, but still).
Implementing weak references with the classic mark+sweep collector isn’t that difficult, but it requires a third “unlink” stage where you point all of the weak references now pointing to invalid objects to null.
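For concreteness, here's a minimal sketch of that unlink pass. It has to run between mark and sweep, while the dead objects' mark bits are still readable. All names (`Obj`, `Heap`, `weak`) are illustrative, not from any real collector.

```python
class Obj:
    def __init__(self):
        self.marked = False
        self.refs = []     # strong references (traced)
        self.weak = None   # a single weak reference (not traced)

class Heap:
    def __init__(self):
        self.objects = []

    def alloc(self):
        o = Obj()
        self.objects.append(o)
        return o

    def collect(self, roots):
        # Mark: trace strong references only; weak refs keep nothing alive.
        stack = list(roots)
        while stack:
            o = stack.pop()
            if not o.marked:
                o.marked = True
                stack.extend(o.refs)
        # Unlink: weak references to unmarked (dead) objects become null.
        for o in self.objects:
            if o.marked and o.weak is not None and not o.weak.marked:
                o.weak = None
        # Sweep: free the unmarked objects and reset marks for the next cycle.
        self.objects = [o for o in self.objects if o.marked]
        for o in self.objects:
            o.marked = False
```

If nothing but a weak reference points at an object, a collection nulls the weak slot and frees the object; if a strong path exists, the weak slot survives untouched.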
However, while mark+sweep is widely recognised as “easy”, it’s not fast.
Fast collectors aren’t much harder, so I don’t see the point of bothering with slow collectors: we have a choice, in that we can visit either live memory or dead memory. If we create a lot of garbage, it makes sense to visit live memory; if we don’t create much garbage, then we should only bother scavenging dead memory.
The live memory approach is really common, and the state of the art algorithms that I’m aware of are still based on Cheney’s algorithm. It might be improved upon with extra memory barriers, cards, threads, and colours, but it is still pretty easy to implement, and a by-product of its design makes weak references very easy to support: add a second generation for broken hearts, and null out values during the marking stage if they point to a broken heart.
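A sketch of how that trick falls out of Cheney’s design. Here `forward` plays the role of the broken heart left behind in from-space; after the copying scan, any weak reference whose target never got a broken heart is dead. Names are illustrative, not from any real collector.

```python
class Obj:
    def __init__(self):
        self.refs = []        # strong references (traced and copied)
        self.weak = None      # weak reference (not traced)
        self.forward = None   # broken heart: set once evacuated to to-space

def collect(roots):
    to_space = []

    def copy(o):
        if o.forward is None:          # not yet evacuated
            new = Obj()
            new.refs = o.refs
            new.weak = o.weak
            to_space.append(new)
            o.forward = new            # leave a broken heart behind
        return o.forward

    new_roots = [copy(r) for r in roots]
    # Cheney scan: to_space doubles as the work queue.
    i = 0
    while i < len(to_space):
        o = to_space[i]
        o.refs = [copy(c) for c in o.refs]
        i += 1
    # Weak pass: a weak target without a broken heart did not survive.
    for o in to_space:
        if o.weak is not None:
            o.weak = o.weak.forward    # None if the target is dead
    return new_roots
```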
The dead memory approach is less common, but finalisers are basically free if you have to visit dead memory anyway, so weak references can actually be implemented trivially.
we have a choice, in that we can visit either live memory or dead memory
One is called garbage collection, the other reference counting.
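The “finalisers are basically free” claim is easiest to see in naive reference counting, where the collector is already holding the dead object in its hands the moment the count hits zero. A sketch, with illustrative names:

```python
class RC:
    def __init__(self, finalizer=None):
        self.count = 0
        self.refs = []            # counted references this object holds
        self.finalizer = finalizer

def incref(o):
    o.count += 1

def decref(o):
    o.count -= 1
    if o.count == 0:
        if o.finalizer:
            o.finalizer()         # we are visiting the dead object anyway
        for child in o.refs:      # cascade: drop the references it held
            decref(child)
```

Death is detected per-object, at a precise moment, so running a finaliser costs one extra call rather than a separate discovery pass.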
the state of the art algorithms that I’m aware of are still based on Cheney’s algorithm
No. The copying algorithm usually causes poor mutator performance due to cache misses.
While semi-space partitioning schemes are popular, they are mainly used in the young generation of generational GC schemes, supported by other GC algorithms in older generations.
is still pretty easy to implement
It’s easy to implement if you don’t care about the usefulness of your implementation for any interesting workload.
The problem is not ease of implementation, but how long it takes until dead objects are recognised as unreachable and actually collected.
finalisers are basically free if you have to visit dead memory anyway, so weak references can actually be implemented trivially
Famous last words… :-)
(Although not many people use reference counting outside very constrained and controlled environments anymore, so that point is kind of moot anyway. Additionally, supporting finalizers signs you up for a lot of madness like resurrecting dead objects within finalizer methods, and figuring out the correct way to collect chains of objects to be finalized.)
One is called garbage collection, the other reference counting.
Henry Baker’s Linear Lisp and RAII are other “dead memory” systems that don’t use reference counting (or are special cases of reference counting, by only allowing a count of zero or one).
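The “count of zero or one” idea can be sketched as a handle with exactly one owner: a move keeps the count at one (invalidating the old handle), and a drop takes it to zero and finalises immediately, RAII-style. Illustrative names only:

```python
class Linear:
    def __init__(self, value, finalizer=None):
        self.value = value
        self.finalizer = finalizer
        self.alive = True          # the count is implicitly 1

    def move(self):
        # Transfer ownership: the count stays at 1, the old handle dies.
        assert self.alive, "use after move"
        new = Linear(self.value, self.finalizer)
        self.alive = False
        return new

    def drop(self):
        # Count goes 1 -> 0: finalise immediately, like RAII destruction.
        assert self.alive, "double drop"
        self.alive = False
        if self.finalizer:
            self.finalizer(self.value)
```

With the count pinned to zero or one there is nothing to store or update at runtime, which is why these systems can be resolved largely at compile time.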
the state of the art algorithms that I’m aware of are still based on Cheney’s algorithm
No.
Name one.
Although not many people use reference counting outside very constrained and controlled environments anymore,
Perl uses reference counting. Objective-C uses (automatic) reference counting. Node.JS uses explicit reference counting for tracking sockets. I think you’ll find reference counting is more popular than you thought.
Arthur (KDB, K5, etc.) chose reference counting specifically because it’s faster if you don’t produce much garbage (which is the case in array languages).
Henry Baker’s Linear Lisp and RAII are other “dead memory” systems that don’t use reference counting (or are special cases of reference counting, by only allowing a count of zero or one).
I’m not sure that approaches that use these techniques fit into this distinction at all. When people talk about automatic memory management, they usually have dynamic systems in mind. If you solve the liveness/reachability question at compile time, there is no “garbage” to “manage”.
Name one.
I’m not sure what you are trying to claim here. There are plenty of different algorithms and approaches, and Cheney’s is not special in any way:
Cheney is just one instance of the tri-color abstraction, Treadmill would be another instance of that abstraction.
Cheney is described in terms of partitioning the heap into semispaces. So everything involving mark&sweep and mark&compact wouldn’t count. Or would it, considering that it’s possible to adapt Cheney’s interpretation to other partitioning schemes and different designs like Beltway, Garbage-First, Immix, Shenandoah?
What’s your definition of “based on Cheney’s algorithm” by the way? Most industrial GCs have tens of thousands of lines of code, while Cheney would fit on a single page. Would they all count as “state of the art implementations” of Cheney if someone found something similar in these huge code-bases?
I’d actually be surprised if Cheney is used that much due to the adverse effects on mutator performance. (If the collector runs twice as slow it’s not that big of a deal if the GC only runs 5% of the time. But if the mutator runs 5% slower, it has a huge impact.)
Q.E.D.
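For reference, the tri-color abstraction mentioned above in a minimal sketch: white = not yet seen, grey = seen but children unscanned, black = fully scanned. Cheney’s to-space scan pointer and Baker’s treadmill are just two different representations of the grey set. Names are illustrative.

```python
WHITE, GREY, BLACK = 0, 1, 2

class Node:
    def __init__(self, *refs):
        self.refs = list(refs)

def mark(roots):
    color = {}                      # absent from the map == WHITE
    grey = list(roots)
    for r in roots:
        color[r] = GREY
    while grey:
        o = grey.pop()
        for child in o.refs:
            if color.get(child, WHITE) == WHITE:
                color[child] = GREY
                grey.append(child)
        color[o] = BLACK            # all children are now at least grey
    return color
```

Anything still white when the grey set empties is unreachable, regardless of whether the grey set lives in a semispace, a treadmill, or a mark stack.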
Although not many people use reference counting outside very constrained and controlled environments anymore
the state of the art algorithms that I’m aware of are still based on Cheney’s algorithm
No.
Name one.
There are plenty of different algorithms and approaches…
How about a state-of-the-art collector not based on Cheney? Or better: One that doesn’t have a forwarding pointer?
G1, Baker’s treadmill, Immix, Shenandoah, and Beltway are all absolutely based on Cheney. Whether they call them forwarding pointers, intraframe-pointers, or Brooks pointers is completely irrelevant.
There isn’t a single garbage collector I can think of that is made “hard or slow” by introducing weak references.
I’m not sure what you are trying to claim here.
That’s because you’re trying to needle a strawman of my argument, and using this to justify your earlier quip:
“Let’s add everything that makes GC hard and slow!”
Let me be clear: I can’t imagine how that can be true.
Either you have an idea of why GC is made hard and slow from weak references, or you don’t.
How about a state-of-the-art collector not based on Cheney? Or better: One that doesn’t have a forwarding pointer?
If “having a forwarding pointer” is your definition of “based on Cheney”, you could have stated that right in the beginning instead of back-tracking 10 comments in, and I would have not bothered wasting my time.