Abstract: Using a stack for managing the local state of procedures as popularized by Algol is a simple but effective way to achieve a primitive form of automatic memory management. Hence, the call stack remains the backbone of most programming language runtimes to the present day. However, the appealing simplicity of the call stack model comes at the price of strictly enforced limitations: since every function return pops the stack, it is difficult to return stack-allocated data from a callee upwards to its caller—especially variable-size data such as closures.
This paper proposes a solution by introducing a small tweak to the usual stack semantics. We design a type system that tracks the underlying storage mode of values, and when a function returns a stack-allocated value, we just don’t pop the stack! Instead, the stack frame is deallocated together with a parent frame the next time a heap-allocated value or primitive is returned. We identify a range of use cases where this delayed-popping strategy is beneficial, ranging from closures to trait objects to other types of variable-size data. Our evaluation shows that this execution model reduces heap and GC pressure and recovers spatial locality of programs, improving execution time by between 10% and 25% with respect to standard execution.
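To illustrate my reading of the mechanism, here is a minimal C++ sketch that simulates the delayed-pop discipline with an explicit bump-allocated shadow stack. This is not the paper's implementation (the paper targets a managed runtime); `ShadowStack`, `enter`, `alloc`, `ret_stack`, and `ret_prim` are invented names for this sketch.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

struct ShadowStack {
    std::vector<std::byte> buf = std::vector<std::byte>(1 << 20);
    std::size_t top = 0;              // bump pointer
    std::vector<std::size_t> frames;  // frame-entry marks

    void enter() { frames.push_back(top); }
    void* alloc(std::size_t n) { void* p = &buf[top]; top += n; return p; }

    // Return a stack-allocated value: do NOT pop. The callee's frame
    // merges into the parent's and survives the return.
    void* ret_stack(void* v) { frames.pop_back(); return v; }

    // Return a primitive (or heap pointer): reclaim this frame plus any
    // child frames that were merged into it.
    long ret_prim(long v) { top = frames.back(); frames.pop_back(); return v; }
};

ShadowStack ss;

struct Closure { long captured; };  // stand-in for variable-size data

void* make_adder(long n) {          // returns stack-allocated data upward
    ss.enter();
    auto* c = static_cast<Closure*>(ss.alloc(sizeof(Closure)));
    c->captured = n;
    return ss.ret_stack(c);         // frame stays live for the caller
}

long caller() {
    ss.enter();
    auto* c = static_cast<Closure*>(make_adder(41));
    return ss.ret_prim(c->captured + 1);  // both frames reclaimed here
}

int main() { std::printf("%ld\n", caller()); }  // prints 42
```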
No comparison to copy elision / return value optimization, as far as I can tell. Is this really any better?
It supports dynamically-sized values like closures, vectors, strings…
The drawback I see is that the caller keeps the callee’s entire stack frame (not just the returned value) until it returns. And if it too returns a variable-size value, it doesn’t pop the stack either.
This seems like it could cause major stack bloat in some circumstances (especially with recursion…)
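To make that concern concrete, continuing the hypothetical ShadowStack sketch above (`build` and `measure` are likewise invented names): a recursion that returns a stack value at every level reclaims nothing until some caller finally returns a primitive, so retained space grows linearly with depth.

```cpp
// Continues the ShadowStack sketch above (hypothetical, not the paper's API).
void* build(long depth) {
    ss.enter();
    void* v = ss.alloc(256);              // 256 bytes of frame-local data
    if (depth > 0) v = build(depth - 1);  // child frames also survive
    return ss.ret_stack(v);               // merge into parent; keep the bytes
}

long measure(long depth) {
    ss.enter();
    build(depth);
    std::printf("bytes retained: %zu\n", ss.top);  // ~256 * (depth + 1)
    return ss.ret_prim(0);                // only here is everything reclaimed
}
```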
Can those handle variable-sized data?
No, but they can handle forwarding. If I have a deep call stack that returns an object from the leaf to near the root, this can be done without copies in C++ with guaranteed copy elision. The root caller allocates space, passes the pointer down the stack, and each return uses the same space. This cannot support variable-sized structures, but supporting variable-sized structures via any non-heap mechanism would be impossible for anything other than a one-deep call. This is an unfortunate programming abstraction because outlining a piece of code changes the semantics. In particular, if your language exposes object identity then this will make a bunch of optimisations visible to the programmer.
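To make the forwarding point concrete, here is a minimal C++17 example (not from the paper; `Big`, `leaf`, `middle`, and `root` are made-up names) where a chain of by-value returns constructs the object exactly once, directly in the root caller's storage:

```cpp
#include <array>
#include <cstdio>

struct Big {
    std::array<int, 1024> data{};
    Big() { std::puts("constructed"); }
    Big(const Big&) { std::puts("copied"); }  // logs only; never runs below
};

Big leaf()   { return Big{}; }   // prvalue: built directly in final storage
Big middle() { return leaf(); }  // forwards the prvalue; no copy
Big root()   { return middle(); }

int main() {
    Big b = root();  // C++17: prints "constructed" once, "copied" never
    (void)b;
}
```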
Putting variable-sized objects on the stack increases the amount of metadata that needs to be tracked. If you’re passing them up the stack with a bunch of copies and a load of metadata, then you may well find that heap allocating is faster (if the object’s address never escapes, you can delete it without adding it to the GC).
See: lazy allocation (by Baker, referenced by the linked paper) and stack-based allocation.
That Azul deck is a fun counterpoint to Appel’s “garbage collection can be faster than stack allocation”.
The stack-based allocation shown in the Azul slides doesn’t apply here — we’re talking about objects being returned from a function. Azul’s escape analysis (just like Go’s) would detect that the object’s lifespan exceeds its allocating function’s frame, so it would be forced to heap-allocate it.
Escape analysis is very simple and everybody does it. The slide deck describes a more sophisticated mechanism which can in fact cope with escaping objects.
Ah, you’re right. It just requires custom hardware to implement the necessary write barrier. That was in 2006 — I assume it never took off, at least not in any mainstream sense?
It didn’t take off, indeed. That’s not so much a technical indictment of it as an indication that external factors did not work out favourably. For instance, Cliff Click was also responsible for the C4 garbage collector, which Azul uses successfully to this day, but which did not start to enter the mainstream until Oracle decided to fund an OSS clone of it a few years ago (ZGC). There are various sociotechnical factors—pretty much all of them artificial—which disincentivise production of hardware GC support.
I find it curious that they speak both of “trait objects” (a Rust term, as far as I’m aware) and of “GC pressure” (which implies not-Rust, unless they mean reference counting).
They’re applying the technique to Scala, which also has traits.