We use ZGC in production and just discovered an issue this week where threads were hanging due to allocation stalls. You’ve got to watch out for that with these new garbage collectors. Sure, your max GC pause time is <10ms, but we saw threads stalled for >300ms waiting for an allocation.
This is probably our fault, though. We should have had GC logging enabled and analyzed the logs; they would have shown us that we needed more CPU and memory, since we were running an embedded data grid.
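One way to see the gap between "GC pauses are short" and "threads waited >300ms" is to time allocations from the application side. Below is a rough, stdlib-only probe, not how ZGC itself reports stalls: it times each `new byte[]` and keeps the slowest. The class name, chunk size and retained count are arbitrary assumptions for illustration. An outlier can also come from other sources (page faults, safepoints, JIT), so treat it as a signal, not a diagnosis.

```java
import java.util.ArrayDeque;

public class AllocationLatencyProbe {
    /**
     * Allocates `iterations` arrays of `chunkBytes` each, keeping the most recent
     * `retain` arrays reachable so there is both live data and a steady stream of
     * garbage, and returns the slowest single allocation in nanoseconds.
     */
    static long maxAllocationNanos(int iterations, int chunkBytes, int retain) {
        ArrayDeque<byte[]> live = new ArrayDeque<>();
        long worst = 0;
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            byte[] chunk = new byte[chunkBytes];
            long elapsed = System.nanoTime() - start;
            worst = Math.max(worst, elapsed);
            live.addLast(chunk); // keep it reachable for a while
            if (live.size() > retain) {
                live.removeFirst(); // the oldest array becomes garbage
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        // Hypothetical workload: 50,000 x 64 KiB allocations with ~64 MiB kept live.
        long worstNanos = maxAllocationNanos(50_000, 64 * 1024, 1_000);
        System.out.printf("slowest single allocation: %.3f ms%n", worstNanos / 1e6);
    }
}
```

Running it with something like `-XX:+UseZGC -Xmx256m` versus a larger heap may make the difference visible, though results will vary by machine and JDK version.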
What made you choose ZGC over Shenandoah or G1GC? Will you stick with ZGC despite your experience this week?
Some other apps in our company use it, so it was recommended to us. Unfortunately, I’m not an expert on the differences between them.
We’re going to start using Spring’s Micrometer library to collect JVM metrics like GC pauses, allocation stalls, etc. If there are any issues, we may consider switching after profiling.
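Micrometer ships ready-made JVM binders (e.g. `JvmGcMetrics`, `JvmMemoryMetrics`), which is the sensible route in practice. To show the kind of raw data available without any dependency, here is a minimal sketch using the standard `java.lang.management` API: it snapshots each collector's cumulative count and time and prints the difference over an interval. The class and method names are hypothetical. Which collector beans appear, and what their "time" means, depends on the GC and JDK version, and these aggregate counters measure collector activity rather than how long application threads waited to allocate.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

public class GcStatsPoller {
    /** Cumulative collection count and time (ms) for one collector. */
    record GcSnapshot(long count, long timeMillis) {}

    // Written to in main so the allocations escape and can't be optimized away.
    static volatile byte[] sinkArray;

    /** Reads the current cumulative counters from every registered GC MXBean. */
    static Map<String, GcSnapshot> snapshot() {
        Map<String, GcSnapshot> result = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Either getter may return -1 if the value is unavailable for this collector.
            result.put(gc.getName(), new GcSnapshot(gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return result;
    }

    /** Per-collector difference between two snapshots. */
    static Map<String, GcSnapshot> delta(Map<String, GcSnapshot> before, Map<String, GcSnapshot> after) {
        Map<String, GcSnapshot> result = new LinkedHashMap<>();
        after.forEach((name, a) -> {
            GcSnapshot b = before.getOrDefault(name, new GcSnapshot(0, 0));
            result.put(name, new GcSnapshot(a.count() - b.count(), a.timeMillis() - b.timeMillis()));
        });
        return result;
    }

    public static void main(String[] args) {
        Map<String, GcSnapshot> before = snapshot();
        // Generate some garbage so the collectors have something to report.
        for (int i = 0; i < 20_000; i++) {
            sinkArray = new byte[16 * 1024];
        }
        Map<String, GcSnapshot> after = snapshot();
        delta(before, after).forEach((name, d) ->
            System.out.printf("%-25s collections=%d time=%dms%n", name, d.count(), d.timeMillis()));
    }
}
```

A metrics library does essentially this on a schedule (or via GC notifications) and exports the deltas, so you get trends over time instead of a one-off printout.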
Thanks, I hadn’t heard of Micrometer. I’m not a JVM person, but work has an Elasticsearch cluster that our DevOps team is always trying to tune, so I’ll pass that on to them.
Glad to help! Enabling GC logging is also a quick way to get a lot of information, but for our use case we didn’t have the storage in production (~10 MB per day, but we’re in Kubernetes).
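If storage is the blocker, JDK 9+ unified logging can rotate GC logs so disk use stays bounded. The sketch below just builds the `-Xlog` option string and estimates the footprint; the helper names and the `/tmp/gc.log` path are hypothetical, and the bound is approximate (roughly `filecount × filesize`, give or take one file). At the ~10 MB per day mentioned above, 5 files of 10 MB would hold on the order of the last five days.

```java
public class GcLogOptions {
    /**
     * Builds a unified-logging option that writes all gc-tagged messages,
     * decorated with wall-clock time, JVM uptime, level and tags, to a rotating
     * set of files. The path passed in is a hypothetical example.
     */
    static String xlogOption(String path, int fileCount, int fileSizeMb) {
        return "-Xlog:gc*:file=" + path
            + ":time,uptime,level,tags"
            + ":filecount=" + fileCount + ",filesize=" + fileSizeMb + "M";
    }

    /** Rough upper bound on disk used by the rotated logs, in megabytes. */
    static long approxMaxDiskMb(int fileCount, int fileSizeMb) {
        return (long) fileCount * fileSizeMb;
    }

    public static void main(String[] args) {
        System.out.println(xlogOption("/tmp/gc.log", 5, 10));
        System.out.println("approx. max disk: " + approxMaxDiskMb(5, 10) + " MB");
    }
}
```

In Kubernetes, another option is `-Xlog:gc*:stdout`, which sends the same output to the container log stream instead of the pod's filesystem.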
Is anyone running ZGC and/or Shenandoah? Why did you switch, and what has the experience been like?