1. 18

  2. 4

    Some brainstorming. On the fixed-size heap and between-requests options, you’re missing (or ignoring as out-of-scope) that server-side systems often sit behind other systems such as load balancers. For between requests, the load balancer could insert the necessary delay between two requests to one machine. For a fixed-size heap, the app might also notify a monitoring system or load balancer so it reduces the workload on that machine, letting it reorganize its memory. Neither of these is part of the garbage collector’s design itself. However, I suspect GC designers who think about collectors in isolation might miss opportunities like these.

    Collecting after a certain amount of time. Let’s say there’s a way to know what the program’s memory usage is: what the GC has allocated, how much system memory it’s actually using (e.g., due to fragmentation), the rate at which usage is increasing, and the total available. What I could see is something running periodically to look at that data and decide whether a collection should be forced. The same data might also determine the split between actual computation and collection.
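
    As a sketch, that periodic check might boil down to a small decision function over those stats. Everything here is hypothetical: the struct, the thresholds, and the idea that the runtime exposes these numbers at all.

    ```c
    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical snapshot of the stats the runtime might expose. */
    typedef struct {
        size_t live_bytes;     /* bytes the GC believes are live      */
        size_t heap_bytes;     /* bytes actually reserved from the OS */
        size_t growth_per_sec; /* recent allocation rate, bytes/sec   */
        size_t heap_limit;     /* hard ceiling                        */
    } heap_stats;

    /* Called from a timer: decide whether to force a collection now.
     * Thresholds are illustrative, not tuned. */
    bool should_force_gc(const heap_stats *s) {
        double occupancy = (double)s->live_bytes / (double)s->heap_limit;
        double slack = (double)s->heap_bytes - (double)s->live_bytes;
        double secs_to_limit = s->growth_per_sec
            ? (double)(s->heap_limit - s->heap_bytes) / (double)s->growth_per_sec
            : 1e9;
        return occupancy > 0.75            /* mostly full                  */
            || slack > s->heap_limit / 4.0 /* fragmented or over-reserved  */
            || secs_to_limit < 5.0;        /* will hit the ceiling soon    */
    }
    ```

    The same numbers could also feed a pacing decision (how much CPU to give the collector) rather than a binary force/don’t-force.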

    Collecting on a schedule. Some languages can mix GC’d and non-GC’d memory. Let’s say most of the work happens with non-GC’d memory since it was easy to do correctly: memory pools, borrow checking, whatever. Then garbage might build up slowly in the background without using all that much memory. So, the app might order the GC to kick in during slow periods, turning it off as more requests come in. This is an old technique I originally learned from game developers and folks who used GCs outside the hot path.
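
    One way to implement “kick in during slow periods” is a gate with hysteresis, so a noisy request rate doesn’t make the collector flap on and off. A toy sketch; the watermarks and names are made up:

    ```c
    #include <stdbool.h>

    /* Start background collection when traffic drops below `low`
     * requests/sec; stop once it climbs back above `high`. */
    typedef struct {
        double low, high; /* watermarks, low < high */
        bool collecting;
    } gc_gate;

    /* Called each time the request rate is sampled; returns whether
     * the app should run one GC increment before the next sample. */
    bool gc_gate_update(gc_gate *g, double requests_per_sec) {
        if (g->collecting && requests_per_sec > g->high)
            g->collecting = false;
        else if (!g->collecting && requests_per_sec < g->low)
            g->collecting = true;
        return g->collecting;
    }
    ```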

    Collecting when out of memory. I’d consider this a failure; it shouldn’t get that far. The fail-safe principle says the program should shut down in a way that indicates an error, dump its state somewhere for analysis, and either restart clean or stay off until an administrator re-activates it. What to do at that point depends on the application.

    For Java, you might find Azul’s collector interesting. PDF here. It was the peak of Java GC for enterprise workloads when I last looked into it. Too bad it’s patented: gotta use a totally different design if I build one. :(

    1. 4

      the app might also notify a monitoring system or load balancer so it reduces workload on that machine to let it reorganize its memory

      I think I saw someone on twitter once a long time ago mentioning that they’d worked somewhere where they did that in production? May have been Kellabyte or, uh, dunno. I think it was someone in @aphyr’s social circle.

    2. 2

      Great post; it’s nice to see someone took the time to write this up! Keeping the overhead low when deciding when to garbage collect is a delicate balance. A classic example is HotSpot’s approach to fast safepoint polling. Mutator threads poll the VM every so often to see if they need to hand control to the collector (among other things). Polling happens often, and in the common case the mutator won’t need to enter GC code. To make this fast, they try to read a particular “polling” address in memory; its contents are irrelevant, but if the read succeeds, the mutator continues on as if nothing had happened. If the VM wants the safepoint poll to trigger a collection, it mprotects (or equivalent) the address, and a custom SIGSEGV handler jumps into GC land. The poll boils down to a single read on the fast path. [0]
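
      For the curious, the trick can be demonstrated in a few lines of C: the poll is one load from a dedicated page, and revoking access to that page diverts the next poll into a SIGSEGV handler. This is a toy, not how HotSpot actually does it (a real VM arms safepoints per thread and wouldn’t call mprotect from a signal handler so casually), but it shows the mechanism:

      ```c
      #include <setjmp.h>
      #include <signal.h>
      #include <sys/mman.h>
      #include <unistd.h>

      static char *poll_page;
      static long pagesz;
      static sigjmp_buf gc_entry;
      static volatile sig_atomic_t collections;

      static void on_segv(int sig) {
          (void)sig;
          /* "Enter GC land": disarm the safepoint and escape the faulting
           * load. (mprotect here is not strictly async-signal-safe, but
           * works in practice on Linux.) */
          mprotect(poll_page, pagesz, PROT_READ);
          collections++;
          siglongjmp(gc_entry, 1);
      }

      static void safepoint_poll(void) {
          /* Fast path: a single read; volatile keeps the compiler from
           * optimizing the load away. */
          (void)*(volatile char *)poll_page;
      }

      int safepoint_demo(void) {
          pagesz = sysconf(_SC_PAGESIZE);
          poll_page = mmap(NULL, pagesz, PROT_READ,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
          if (poll_page == MAP_FAILED) return -1;
          signal(SIGSEGV, on_segv);

          if (sigsetjmp(gc_entry, 1) == 0) {
              safepoint_poll();                       /* common case: no fault */
              mprotect(poll_page, pagesz, PROT_NONE); /* VM arms the safepoint */
              safepoint_poll();                       /* faults into "GC"      */
          }
          return collections; /* 1: the armed poll triggered one collection */
      }
      ```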

      V8 (Chrome’s JS engine) also has an interesting approach to deciding when to collect. They work out how much time is available on the main thread between rendering animation frames at 60fps and schedule GC work to hide in that idle time. They can also do clever things like compaction when a tab is inactive, etc. The PLDI paper is well worth a read [1].
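
      The scheduling decision itself is simple to sketch: given the idle time left before the next frame must start, run incremental GC steps only while their estimated cost (plus a safety margin) still fits. The numbers and names below are invented for illustration, not V8’s:

      ```c
      /* Run as many estimated-duration GC steps as fit in the idle
       * window, with a margin so a mis-estimate doesn't blow the
       * ~16.6ms frame budget. Returns the number of steps performed. */
      int run_idle_gc(double idle_ms, double step_estimate_ms,
                      double margin_ms, int pending_steps) {
          int done = 0;
          while (pending_steps > 0 &&
                 step_estimate_ms + margin_ms <= idle_ms) {
              /* ...perform one incremental mark/sweep step here... */
              idle_ms -= step_estimate_ms;
              pending_steps--;
              done++;
          }
          return done;
      }
      ```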

      [0] https://shipilev.net/jvm/anatomy-quarks/22-safepoint-polls/
      [1] https://ai.google/research/pubs/pub45361

      1. 1

        Using mprotect is an interesting approach, but I wonder how expensive it is when the SIGSEGV actually gets triggered. If memory serves, handling a signal is quite expensive; then again, it would only happen rarely.

        1. 1

          Indeed. Safepoint polls happen regularly inside tight loops in Java, but a collection is much less common, so the occasional SIGSEGV is fine when the common-case poll is a single test instruction on x86.