  1. 14

    I am not convinced that Go does not suffer from fragmentation in general. The cited paper from 1998 appears to evaluate only single-threaded applications, but dynamic storage allocators for highly parallel applications that provide excellent performance and avoid fragmentation are a whole different beast. If an allocator wants to provide high throughput, the cores (or threads) must operate autarkically; but if you want to avoid fragmentation, they must collaborate, which imposes synchronization overhead and hinders throughput.

    Why can Go run its GC concurrently and not Java?

    Not sure what I (or the author) missed here. But parallel GCs exist for Java.

    1. 6

      autarkically

      At first I thought your keyboard was faulty, but no, I just learned a new word! As a bit of an aside, Merriam-Webster defines it thusly:

      in an autarkic manner

      OK then. So what’s autarkic?

      of, relating to, or marked by autarky

      How helpful.

      1. 2

        Funny, in German “autark” is pretty common.

        1. 1

          autarky, an economic system of self-sufficiency and limited trade. A country is said to be in a complete state of autarky if it has a closed economy, which means that it does not engage in international trade with any other country.

      2. 11

        Perhaps unfortunately, I think this article is more interesting for the assumptions the author makes or glosses over than for the actual content. A few I noticed were:

        • There’s no metadata header required per object in memory (there must be one somewhere, since you need it both to know which fields of an object the GC must trace and to run type casts)
        • That other languages that experience fragmentation do so simply because they don’t use a “modern” allocator, and that it’s not a tradeoff between space utilisation, allocation / deallocation speed and runtime overhead.

        The hackernews comments have a lot more details on mistakes, too, if that’s your jam.

        1. 9

          Agreed. I mostly agree with the author (and like him I prefer Go to Java), but there’s a lot I think he overlooks or gets wrong.

          • He did actually talk about the object metadata in Java; that’s the Klass* in the OpenJDK header. But Go must need something similar, even if it’s statically typed, since the GC doesn’t know compile-time types.
          • He mentions Java objects having to store mutexes. It’s true that synchronized was one of Java’s biggest mistakes, but to my knowledge the HotSpot runtime stores mutexes in a side table not in the object header.
          • He mentions cache coherency, but not that relocating objects is terrible for cache performance.
          • “Sooner or later you need to do compaction, which involves moving data around and fixing pointers. An Arena allocator does not have to do that.” Yes, but on the flip side, an arena allocator keeps the whole arena in memory as long as even one object in it is still in use — this can cause memory bloat, requiring you to track down how pointers are escaping.
          • “it’s likely to be more efficient to allocate memory using a set of per-thread caches, and at that point you’ve lost the advantages of a bump allocator.” My understanding is that you get around that with per-thread bump allocators.
          • “Modern memory allocators such as Google’s TCMalloc or Intel’s Scalable Malloc do not fragment memory.” — There’s no such thing as a non-fragmenting allocator, unless all your allocations are the same size. Modern allocators just fragment less.
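
          The per-thread bump-allocation idea from the list above can be sketched in Go. This is purely illustrative (names like bumpChunk are invented); real implementations such as HotSpot’s TLABs or Go’s per-P mcaches are far more involved.

```go
package main

import "fmt"

// Illustrative sketch of a per-thread bump allocator: each thread owns its
// own chunk, so the hot path is a bounds check plus a pointer increment,
// with no locking. All names here are made up for illustration.

type bumpChunk struct {
	buf []byte
	off int
}

// alloc hands out n bytes by bumping an offset; no synchronization is
// needed because each thread uses its own chunk.
func (c *bumpChunk) alloc(n int) []byte {
	if c.off+n > len(c.buf) {
		return nil // chunk exhausted; a real allocator would fetch a new one
	}
	p := c.buf[c.off : c.off+n]
	c.off += n
	return p
}

func main() {
	c := &bumpChunk{buf: make([]byte, 64)}
	a := c.alloc(16)
	b := c.alloc(16)
	fmt.Println(len(a), len(b), c.off) // 16 16 32
}
```

          Because each thread owns its chunk, allocation stays fast without cross-core synchronization; the fragmentation cost moves to partially-used chunks instead of going away.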
          1. 1

            Yes, but on the flip side, an arena allocator keeps the whole arena in memory as long as even one object in it is still in use — this can cause memory bloat, requiring you to track down how pointers are escaping.

            You can still selectively free unused pages in a nearly-empty arena to save RSS. But that’s a tricky latency/throughput tradeoff since madvise() is so expensive.

          2. 4

            Go’s GC needs to know the start and size for all objects, and where pointers are in them. Start and size are easy for objects up to 32 KB, which are allocated in size classes in a slab-style allocator; given a memory address, you can determine what slab class it’s in, which gives you both start and size using no per-object metadata. For pointers, the simple version is that Go keeps a bitmap of where there are active pointers in a given area (page, etc) of memory. The pointer alignment requirements mean that this bitmap can be quite dense. As an optimization, there are special slab classes for ‘size class X and contains no interior pointers’, which need no bitmaps at all.

            The important thing about these is that while they both require metadata, it’s large-scale metadata (and it’s aggregated together in memory, which probably helps efficient access as compared to chasing pointers to per-type information for each object).

            One corollary of this is that Go’s GC doesn’t actually know the type of any object in memory. It might not even know the exact size, since some size classes are ranges of sizes.

            (I don’t know exactly how Go implements objects larger than 32 KB, but I can think of various plausible approaches.)
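
            A toy version of that address-to-metadata trick, assuming one size class per fixed-size page (the real runtime uses spans and a size-class table; everything here is simplified and the names are invented):

```go
package main

import "fmt"

// Sketch of slab-style size classes: if each page holds objects of a single
// size, an address alone recovers an object's start and size with no
// per-object header. Purely illustrative, not Go's actual runtime layout.

const pageSize = 8192

// sizeClass maps page index -> object size for that page (recorded when
// the page is dedicated to a size class).
var sizeClass = map[uintptr]uintptr{}

// objectStartAndSize recovers the base address and size of the object
// containing addr, using only the page-level size-class table.
func objectStartAndSize(addr uintptr) (start, size uintptr) {
	page := addr / pageSize
	size = sizeClass[page]
	pageBase := page * pageSize
	// Round down to the object boundary within the page.
	start = pageBase + ((addr-pageBase)/size)*size
	return start, size
}

func main() {
	sizeClass[3] = 48 // pretend page 3 holds 48-byte objects
	start, size := objectStartAndSize(3*pageSize + 100)
	fmt.Println(start-3*pageSize, size) // 96 48
}
```

            The metadata is one entry per page rather than one header per object, which is the “large-scale metadata” point made above.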

            1. 1

              A number of JVMs optimise layout for GC by sorting all pointer fields before non-pointer fields. This means that they only need to carry one integer value for GC metadata: everything from the start of the object to some per-type offset is a pointer. This also means that the scanning logic plays nicely with branch predictors and caches.

              This is, as far as I am aware, the only down side of having value types in the language. .NET, for example, cannot do this trick because it has value types and so a class may interleave pointer and non-pointer data in a way that is exposed to the abstract machine. I believe the same is true of Go.
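
              As a sketch of why the pointers-first layout makes scanning cheap (the word-slice model of an object below is invented for illustration, not any runtime’s real representation):

```go
package main

import "fmt"

// Sketch of "pointers first" scanning: if every pointer field precedes
// every non-pointer field, the only per-type GC metadata needed is a single
// count, and scanning is a tight, branch-predictor-friendly loop.

// scanPointers visits the leading ptrCount words of an object, which by
// the layout convention are exactly its pointer fields.
func scanPointers(words []uintptr, ptrCount int, visit func(uintptr)) {
	for _, w := range words[:ptrCount] {
		if w != 0 { // skip nil pointers
			visit(w)
		}
	}
}

func main() {
	// An "object" with 2 pointer words followed by 2 primitive words.
	obj := []uintptr{0x1000, 0x2000, 42, 7}
	var visited []uintptr
	scanPointers(obj, 2, func(p uintptr) { visited = append(visited, p) })
	fmt.Println(visited) // [4096 8192]
}
```

              With interleaved value types, the single count would have to become a per-type bitmap or offset list, which is the downside the comment describes.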

              1. 1

                Which JVMs do this? Also, how does that interact with inheritance? In Hotspot, I believe that the fields for the subclass are just tacked on to the end of the fields from the parent class. If you reorder the fields for the subclass, you’d then need to generate different instructions when you JIT compile methods, right?

                1. 1

                  Which JVMs do this?

                  I think Hotspot did it, not sure if it still does.

                  In Hotspot, I believe that the fields for the subclass are just tacked on to the end of the fields from the parent class. If you reorder the fields for the subclass, you’d then need to generate different instructions when you JIT compile methods, right?

                  Sorry, I missed half of the explanation. The class pointer points to the middle of the object. Primitives go in front of the class pointer, pointers go afterwards. For subclasses, you just append pointers and prepend primitives. This isn’t possible if you have multiple inheritance but Java has only single inheritance.

            2. 3

              The lack of metadata header is interesting in Go because there is one, it’s just not always there and that has other tradeoffs. In Go, a pointer is either to a concrete type or to an interface. If it’s an interface then it’s a fat pointer that has to carry around both a pointer to the metadata header and a pointer to the object. This isn’t necessarily better or worse than the Java[1] model, it optimises for different things. In the Go model, any time you’re passing an object around via an interface, you need to pass two pointers. A Go array, slice, or a map of interface{} is twice as big as a Java array of Object. In exchange for this, you get smaller objects and if you don’t do dynamic dispatch, Java has more overhead. If you do, Go has more overhead.

              I generally prefer the Go approach to Java or .NET because it means that you pay for dynamic dispatch only when you want to use it and it moves the decision about whether to pay the cost to the caller.

              The other interesting thing that Go does is to separate the code-reuse mechanism from the subtyping mechanism. Pony also does this and we’re going to do the same for Verona. This conflation is part of the reason that Smalltalk-family languages are difficult to compile to efficient code. Anamorphic Smalltalk (StrongTalk) differentiated in the VM between subclassing (used for composition at the implementation) and subtyping (used at the call site to determine the set of allowed callees), even though the source language conflated them. Go doesn’t have a notion of subtyping but does have some syntactic sugar to make composition easier. Concrete types in Go are never subtypes of other concrete types, but interfaces can be subtypes of other interfaces and concrete types can be subtypes of interfaces. This is a very nice property, from an implementation perspective.
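
              A small example of that separation: struct embedding gives code reuse without creating a subtype, while interface embedding gives subtyping among interfaces (all type and method names below are made up):

```go
package main

import "fmt"

// Illustrates the separation described above: embedding a struct reuses its
// code but does not make the outer type a subtype of it, while interfaces
// can be subtypes of other interfaces.

type Reader interface{ Read() string }

// ReadCloser is a subtype of Reader: anything satisfying it also
// satisfies Reader.
type ReadCloser interface {
	Reader
	Close() error
}

type base struct{ name string }

func (b base) Read() string { return b.name }

// file reuses base's code via embedding; it is not a subtype of base,
// but it does satisfy ReadCloser (and therefore Reader).
type file struct {
	base
}

func (f file) Close() error { return nil }

func main() {
	var rc ReadCloser = file{base{"hello"}}
	var r Reader = rc // interface-to-interface subtyping
	fmt.Println(r.Read())
}
```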

              [1] Note: a lot of things that this article ascribes to Java were inherited directly from Smalltalk and pre-date the GC research that the author is talking about by at least a year.

              1. 1

                This is a dilemma for any system that uses vtbls for dynamic dispatch. You can either put the vtbl pointer in the object (like Java or C++), or in a fat pointer next to the object pointer (like Go interfaces or Rust trait objects). I’m curious which other languages have taken the latter approach.

            3. 1

              This article makes me wonder to what extent Chicken fragments memory, and whether it matters. We have Cheney on the MTA with a generational, copying GC, as explained by @sjamaan.

              I’ve sometimes run into it being slow, but I don’t know how it compares to other languages, since I wasn’t Rosetta-stoning my own apps.

              1. 1

                I didn’t get this: pointers to value objects in Go will put those objects on the heap?

                1. 2

                  In Go, objects go on the heap if they need to outlive the lifetime of the function they’re created in (or at least if the compiler thinks they do). Since Go uses pass and return by value, the dominant way to have an object outlive its function’s lifetime is to create a pointer to it and then assign or pass the pointer around in a way that causes the compiler to believe it lives on.

                  (In some circumstances the compiler is smart enough to recognize that a pointer passed as a function argument doesn’t escape the call stack; in other cases, it can’t see this.)
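
                  A minimal sketch of the two cases (function names invented); running go build -gcflags=-m on it prints the compiler’s escape-analysis decisions:

```go
package main

import "fmt"

// Sketch of the escape behaviour described above. Compile with
// `go build -gcflags=-m` to see the compiler's escape-analysis output.

type point struct{ x, y int }

// escapes returns a pointer that outlives the frame, so p must be
// heap-allocated ("moved to heap: p" under -gcflags=-m).
func escapes() *point {
	p := point{1, 2}
	return &p
}

// staysLocal takes p's address but never lets the pointer leave the
// function, so the compiler can keep p on the stack.
func staysLocal() int {
	p := point{3, 4}
	q := &p
	return q.x + q.y
}

func main() {
	fmt.Println(escapes().x, staysLocal()) // 1 7
}
```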

                  1. 1

                    Value types typically can’t have their address taken at the abstract machine level (that’s usually part of the definition: they’re passed by value, i.e. by copying). This means that even if they’re allocated on the heap, the compiler knows their exact lifetime (they are no longer valid at the end of the stack frame owning them, or when the object holding them is deallocated) and so they don’t need to be tracked for GC.