1. 10

  2. 4


    jemalloc wouldn’t help in the naive Python interpreter. pymalloc directly uses OS virtual memory allocation routines for its object heap, which is mmap on Linux

    … well in that case libc.malloc_trim should not help either?
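For anyone who wants to test this themselves, here is a minimal sketch of calling `malloc_trim` from Python through `ctypes`. The helper name `trim_glibc_heap` is made up for illustration, and the code assumes a glibc-based system; elsewhere it just returns `False`. One possible reason trimming can still matter (an assumption on my part, not something confirmed in this thread) is that pymalloc only serves small requests, up to 512 bytes, and larger ones fall through to the C library's `malloc`.

```python
import ctypes
import ctypes.util


def trim_glibc_heap(pad: int = 0) -> bool:
    """Ask glibc to hand free heap memory back to the OS.

    Returns True if memory was actually released, False if nothing was
    released or malloc_trim is unavailable (non-glibc libc, macOS, Windows).
    """
    try:
        # find_library may return None on some systems; fall back to the
        # usual glibc soname in that case.
        libc = ctypes.CDLL(ctypes.util.find_library("c") or "libc.so.6")
        malloc_trim = libc.malloc_trim
    except (OSError, AttributeError):
        return False
    malloc_trim.argtypes = [ctypes.c_size_t]
    malloc_trim.restype = ctypes.c_int
    # malloc_trim returns 1 if memory was released, 0 otherwise.
    return malloc_trim(pad) == 1
```

Comparing the process RSS before and after a call is the simplest way to see whether it does anything for a given workload.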

    1. 2

      There are all sorts of fun Python optimizations you can do if you know your specific use case. I wrote about optimizing thread stack size years ago. http://jrwren.wrenfam.com/blog/2016/02/16/optimizing-uwsgi-for-many-many-threads-and-processes/index.html

      1. 1

        This is surprising. The linked “What causes Ruby memory bloat?” has some numbers: 230 MB allocated vs only 7 MB being used (not freed) by Ruby. I wonder if there is some system-wide way to make libc release free memory sooner. It could have a big impact on available memory (at the expense of performance).

        EDIT: mallopt apparently has M_TRIM_THRESHOLD, which controls this. It specifies the minimum amount of space that can be released. But I suspect the man page is out of date, because it claims the default is 128 KiB (much less than the 230 MB - 7 MB gap in the Ruby case above) and neither Ruby nor CPython calls this. It also makes references to sbrk(), and I thought memory allocators used mmap() instead these days.
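A minimal sketch of setting this from Python, assuming glibc. The helper name `set_trim_threshold` is hypothetical, and the constant value is copied from glibc's `<malloc.h>`. Note that, per the man page, automatic trimming only applies to contiguous free memory at the top of the heap, so a small threshold does not guarantee that a fragmented heap shrinks.

```python
import ctypes
import ctypes.util

M_TRIM_THRESHOLD = -1  # parameter number from glibc's <malloc.h>


def set_trim_threshold(num_bytes: int) -> bool:
    """Set glibc's M_TRIM_THRESHOLD for the current process.

    Returns True on success, False if mallopt reports an error or is
    unavailable (non-glibc libc, macOS, Windows).
    """
    try:
        libc = ctypes.CDLL(ctypes.util.find_library("c") or "libc.so.6")
        mallopt = libc.mallopt
    except (OSError, AttributeError):
        return False
    mallopt.argtypes = [ctypes.c_int, ctypes.c_int]
    mallopt.restype = ctypes.c_int
    # mallopt returns 1 on success and 0 on error.
    return mallopt(M_TRIM_THRESHOLD, num_bytes) == 1
```

For something closer to system-wide, glibc also reads the `MALLOC_TRIM_THRESHOLD_` environment variable (with the trailing underscore) at process start, so it can be set without touching the program.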

        1. 2

          I think I’ve heard something about glibc’s malloc still using sbrk for some cases? Modern allocators of course only use mmap, and new platforms often just don’t have sbrk at all (e.g. FreeBSD on aarch64 and riscv64).

          1. 1

            Ah, that’s probably true given GNU’s focus on being compatible with many systems.

          2. 2

            This kind of thing is often misleading. In snmalloc, we use MADV_FREE on *NIX platforms that support it. This allows the kernel to reclaim pages but the kernel won’t unless physical memory is constrained. On Windows, we register for a low-memory notification and only decommit memory when there is physical memory pressure.

            There is no performance problem from using a load of memory, as long as that memory exists and nothing else wants to use it. There is a problem if using memory prevents allocation, causes swapping, or evicts hot things from the buffer cache. The goal for a memory allocator should be to avoid the latter. There’s nothing wrong with using 10GiB of memory on a system with 128GiB total and 32GiB free. There’s a big problem with using 1GiB on a system with 4GiB that’s starting to swap.
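To make the MADV_FREE behaviour concrete, here is a small sketch using Python's `mmap.madvise` (Python 3.8+). It is not how snmalloc does it; the helper names `make_region` and `advise_free` are made up for illustration, and the code assumes a POSIX system whose kernel supports `MADV_FREE` on private anonymous mappings. Where that is not the case, `advise_free` returns `False`.

```python
import mmap


def make_region(n_pages: int) -> mmap.mmap:
    """Create an anonymous mapping of n_pages pages."""
    size = n_pages * mmap.PAGESIZE
    if hasattr(mmap, "MAP_PRIVATE"):
        # MADV_FREE only applies to private anonymous memory on Linux,
        # so ask for a private mapping where the flags exist.
        return mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    return mmap.mmap(-1, size)  # e.g. Windows: no flags argument


def advise_free(region: mmap.mmap) -> bool:
    """Tell the kernel it may reclaim these pages under memory pressure.

    The pages stay mapped. Until the kernel actually reclaims them, the
    old contents may still be visible; writing to a page again cancels
    the hint for that page.
    """
    advice = getattr(mmap, "MADV_FREE", None)
    if advice is None or not hasattr(region, "madvise"):
        return False
    try:
        region.madvise(advice)
        return True
    except OSError:
        return False
```

The mapping remains usable after the call, which is what makes this cheap: nothing is unmapped, and nothing is given up unless the kernel actually needs the physical pages.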

          3. 1

            I’m really sceptical of anything that recommends using malloc_trim. This API is only possible to implement (as anything other than a no-op) on memory allocators that subdivide allocations. You leave a lot of performance on the floor as a result of that early design choice. Using any size-class allocator will give you far more performance on most workloads than any use of malloc_trim. The big string-processing benchmark in SPEC is xalanc. snmalloc gives a 30% performance increase relative to the default glibc malloc; no amount of malloc_trim will give the same speedup.
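For readers who haven't met the term, here is a toy sketch of the size-class idea. It is not snmalloc's design: the class sizes, the names `size_class` and `ToySizeClassAllocator`, and the use of integer offsets in place of real addresses are all assumptions made for illustration. Each request is rounded up to a fixed class, and freed slots are reused only within that class, so blocks are never split or merged.

```python
SIZE_CLASSES = (16, 32, 48, 64, 96, 128, 192, 256)  # hypothetical classes


def size_class(n: int) -> int:
    """Round a request of n bytes up to the smallest class that fits."""
    for cls in SIZE_CLASSES:
        if n <= cls:
            return cls
    raise ValueError("large allocations are out of scope for this toy")


class ToySizeClassAllocator:
    """Hands out integer offsets; one free list per size class."""

    def __init__(self) -> None:
        self.free_lists = {cls: [] for cls in SIZE_CLASSES}
        self.next_offset = 0

    def alloc(self, n: int) -> int:
        cls = size_class(n)
        if self.free_lists[cls]:
            # Fast path: reuse a freed slot of exactly this class.
            return self.free_lists[cls].pop()
        offset = self.next_offset
        self.next_offset += cls
        return offset

    def free(self, offset: int, n: int) -> None:
        # No coalescing with neighbours: the slot goes back to its class.
        self.free_lists[size_class(n)].append(offset)
```

Both paths are a list push or pop with no searching or splitting, which is the kind of fast path a size-class design is built around.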