1. 51

  2. 14

    10/10 post, I love it when people do real deep dives into these topics. This post is about Ruby, but it affects any program that uses malloc.

    As for why the author’s patch is faster, I have a couple of theories, but I’d love to see a CPU flamegraph of the benchmarks.

    1. 2

      Some related recent work: Quantitative Overhead Analysis for Python

      1. 1

        As a follow-up to this article, what I would be interested in is a matrix of comparisons between jemalloc, the malloc_trim patch, and the MALLOC_ARENA_MAX env tweak. Also, an explanation with numbers of why jemalloc handled things better.
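
        A minimal sketch of how that matrix could be run, assuming a Linux box with glibc and a distro-packaged libjemalloc (the benchmark script name and library path here are hypothetical):

        ```shell
        # 1. glibc defaults (baseline)
        ruby benchmark.rb

        # 2. glibc with the arena count capped
        MALLOC_ARENA_MAX=2 ruby benchmark.rb

        # 3. jemalloc swapped in via LD_PRELOAD
        LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 ruby benchmark.rb

        # 4. the malloc_trim patch would need a patched Ruby build, e.g.:
        # ~/ruby-trim/bin/ruby benchmark.rb
        ```

        Tracking RSS over time for each run would give exactly that matrix.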

        1. 1

          Well, jemalloc was designed to avoid fragmentation and facilitate cache hits. By default it uses a 32 KiB thread-local cache (tcache) for small allocations. And the default number of shared arenas is a lot smaller, 4 per physical core, rather than glibc’s 8 per hyperthread.
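
          Those defaults can be overridden at startup, which makes it easy to check whether the arena count is what matters. A hedged sketch, assuming libjemalloc is preloaded (the library path is an assumption; adjust for your distro):

          ```shell
          # narenas caps the number of shared arenas; stats_print dumps
          # allocator statistics at exit so you can inspect arena and
          # tcache usage for yourself.
          MALLOC_CONF="narenas:2,stats_print:true" \
            LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 \
            ruby app.rb
          ```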

      2. 8

        In classic glibc form, malloc_trim has been freeing OS pages in the middle of the heap since 2007, but this is documented nowhere. Even the function comment in the source code itself is inaccurate.
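
        For the curious, the call is easy to poke at from Ruby itself via the stdlib's Fiddle; this is a sketch for experimentation, not the author's patch, and it assumes a glibc system where malloc_trim is available:

        ```ruby
        require 'fiddle'

        # Bind glibc's int malloc_trim(size_t pad).
        # Returns 1 if memory was released back to the OS, 0 otherwise.
        malloc_trim = Fiddle::Function.new(
          Fiddle::Handle::DEFAULT['malloc_trim'],
          [Fiddle::TYPE_SIZE_T],
          Fiddle::TYPE_INT
        )

        # Create some garbage, let Ruby's GC free it, then ask glibc
        # to return whatever whole free pages it can to the OS.
        100_000.times { 'x' * 256 }
        GC.start
        released = malloc_trim.call(0)
        puts "released=#{released}"
        ```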


        1. 6

          Magic is simply science that we don’t understand yet.

          This. I grow increasingly annoyed with technical problems and solutions being dismissed as “magic”. All that really means is “I don’t want to bother understanding it, as long as it works”. That’s fine, as far as it goes, but it also means you’re helpless when it ceases working.

          This is an excellent example of three phenomena: finding the real answer often requires some digging and the right tools make it far easier, the answer often isn’t actually that complicated when you break it down, and it’s often something unexpected.

          1. 6

            Great article, but I’m always kind of shocked that there are these kinds of low-hanging fruit in programming environments that big chunks of the industry are using professionally every day.

            1. 4

              This is an absolutely well-written article. Easy to follow, very detailed, and incredibly insightful. Kudos to the author! Is the patch already included in the next Ruby release, or is there a link to a pull request I can follow?

              1. 3

                Looks like it’s being tracked on the Ruby issue tracker! https://bugs.ruby-lang.org/issues/15667

                1. 1

                  That’s brilliant, thank you.

              2. 3

                This is a good investigation. It’s an article of faith that Mastodon is unusable without jemalloc, for example, and this could really simplify Mastodon administration.

                I’m really surprised that there’s no slowdown from calling malloc_trim on every GC. Clearly something about it is saving more time than the malloc_trim calls use up. Is it because it’s avoiding paging in the environment it was tested in? So would it be a slowdown on machines with plenty of RAM? As they say, “more studies are needed”.

                1. 2

                  Very informative. I knew the jemalloc trick, but not this env var.

                  1. 1

                    Excellent write-up. Trimming allocations in the middle of the heap would benefit certain applications and hurt others, though. By fragmenting the heap, you may make it harder to get contiguous memory blocks in the future, and it only benefits applications where you’re (a) running lots of long-lived processes with highly variable working sets and (b) actually memory-constrained as a result. Without both of these conditions, the existing behavior, where you asymptotically approach allocating slightly more than your peak working set, will probably perform better.