1. 28
  2. 6

    Some people, when faced with an allocator problem, think they need another allocator.

    (And some people, faced with another allocator problem, think they need another nother allocator… but this is somewhat more forgivable imo.)

    1. 2

      The LLVM libraries are filled with their own allocators. Mind you, a lot of them are bump allocators, but nevertheless, new is overloaded a lot.

      Take that as you will.

    2. 4

      Now that there are only a few server platforms left, maybe “(just) fix malloc” is possible. I bet the original authors were dealing with some subset of the hundreds of (commercial, rarely open source) crap unixes everyone was expected to support back then, and “I’ll write my own malloc in the basement” seemed like the only way out.

      (Disclaimer: I didn’t write this code, so I can’t defend it. Just wanted to point out that there may be contextual explanations other than “people in 1990 were dumb”.)

      1. 1

        The tl;dr of this story is “we had issues with memory fragmentation and punted on solving it by dropping in jemalloc”. Also, “The right answer to ‘malloc is slow’ is to make it faster.”

        C++ is one of the few languages where you can solve this properly! They talk about worker threads abusing tiny allocations, which is a fairly easy case to deal with. Since each worker thread can only work on one task at a time, you can give each worker a large chunk of memory to use as a stack, and wipe it before you start the next job. It’s very fast, it will never fragment, and it will never leak.

        The downside is that your worker threads have a hard memory cap, but honestly they have one if you use malloc too. When you set the cap yourself, you can abort the single job that blew its budget. When your OS/RAM sets the cap, you swap and everything grinds to a halt or you get OOM killed and drop everything.
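
        A minimal sketch of that per-worker scheme, assuming C++17. The `JobArena` name and its interface are made up for illustration, and it only suits objects that don't need their destructors run, since resetting just rewinds an offset:

        ```cpp
        #include <cassert>
        #include <cstddef>
        #include <memory>

        // Hypothetical per-worker bump arena: one fixed buffer, rewound between jobs.
        class JobArena {
        public:
            explicit JobArena(std::size_t capacity)
                : buf_(new std::byte[capacity]), cap_(capacity), used_(0) {}

            // Hands out the next aligned slice of the buffer. Returns nullptr when
            // the job has blown its budget, so the caller can abort just that job.
            // Assumes `align` is a power of two no larger than the alignment of the
            // buffer returned by new[].
            void* allocate(std::size_t size,
                           std::size_t align = alignof(std::max_align_t)) {
                assert(align != 0 && (align & (align - 1)) == 0);
                std::size_t start = (used_ + align - 1) & ~(align - 1);
                if (start > cap_ || size > cap_ - start) return nullptr;
                used_ = start + size;
                return buf_.get() + start;
            }

            // "Wipe" the arena before the next job: everything is freed at once.
            void reset() { used_ = 0; }

            std::size_t used() const { return used_; }

        private:
            std::unique_ptr<std::byte[]> buf_;
            std::size_t cap_;
            std::size_t used_;
        };
        ```

        Each worker would own one of these, call `allocate` while running a job, and call `reset` before picking up the next one. A `nullptr` return is the self-imposed cap kicking in.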

        Have you ever actually looked at C++ STL code?

        Hahaha. I was looking in our standard library a few weeks ago and found a sort implementation which is probably not optimal.

        “I heard std::sort is solid, maybe we can use that!”

        template<typename _InputIterator1, typename _InputIterator2,
           typename _OutputIterator, typename _Compare>
        inline _OutputIterator
        merge(_InputIterator1 __first1, _InputIterator1 __last1,
          _InputIterator2 __first2, _InputIterator2 __last2,
          _OutputIterator __result, _Compare __comp)
        {
          // concept requirements
          __glibcxx_function_requires(_InputIteratorConcept<_InputIterator1>)
          __glibcxx_function_requires(_InputIteratorConcept<_InputIterator2>)
          __glibcxx_function_requires(_OutputIteratorConcept<_OutputIterator,
            typename iterator_traits<_InputIterator1>::value_type>)
          __glibcxx_function_requires(_OutputIteratorConcept<_OutputIterator,
            typename iterator_traits<_InputIterator2>::value_type>)
          __glibcxx_function_requires(_BinaryPredicateConcept<_Compare,
            typename iterator_traits<_InputIterator2>::value_type,
            typename iterator_traits<_InputIterator1>::value_type>)
          __glibcxx_requires_sorted_set_pred(__first1, __last1, __first2, __comp);
          __glibcxx_requires_sorted_set_pred(__first2, __last2, __first1, __comp);
          __glibcxx_requires_irreflexive_pred2(__first1, __last1, __comp);
          __glibcxx_requires_irreflexive_pred2(__first2, __last2, __comp);
        
          return _GLIBCXX_STD_A::__merge(__first1, __last1,
        			__first2, __last2, __result,
        			__gnu_cxx::__ops::__iter_comp_iter(__comp));
        }
        

        for thousands of lines, maybe not.

        1. 7

          The author goes on to say that jemalloc didn’t solve the problem, but they deployed it anyways to reduce CPU and memory usage.

          The problem being that GNU libstdc++ overloads new and pools allocations on top of malloc.

          If that is what you also took away from it, it may be worth updating your tl;dr to be more favorable to the author.

          1. 4

            Besides the ugly names, what don’t you like about that code snippet?

            See this SO answer for more info, but long story short, none of those __glibcxx_* macros cause any code to be emitted in a normal (non-debug) build.

            I also suspect the forwarding call to _GLIBCXX_STD_A::__merge would get automatically inlined.
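
            To illustrate the general pattern (not libstdc++'s actual definitions), here is a rough sketch with a made-up check macro, `MY_REQUIRES_SORTED`, and a made-up switch, `MY_DEBUG_CHECKS`. Unless the switch is defined, the macro expands to nothing and the wrapper is just a thin forwarding call:

            ```cpp
            #include <algorithm>
            #include <cassert>
            #include <iterator>
            #include <vector>

            // Hypothetical debug-only precondition check. Without MY_DEBUG_CHECKS it
            // expands to nothing, leaving only an empty statement behind.
            #ifdef MY_DEBUG_CHECKS
            #  define MY_REQUIRES_SORTED(first, last) assert(std::is_sorted((first), (last)))
            #else
            #  define MY_REQUIRES_SORTED(first, last)
            #endif

            // Thin wrapper: optional checks, then forward to the real algorithm.
            template <typename In1, typename In2, typename Out>
            inline Out checked_merge(In1 first1, In1 last1,
                                     In2 first2, In2 last2, Out result) {
                MY_REQUIRES_SORTED(first1, last1);
                MY_REQUIRES_SORTED(first2, last2);
                return std::merge(first1, last1, first2, last2, result);
            }
            ```

            In a build without the switch, the body reduces to the single `std::merge` call, which is the kind of thing a compiler will usually inline.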

            1. 1

              It’s not horrible if you really try to read it, but it’s a huge amount of code for something that doesn’t need it.

            2. 3

              “we had issues with memory fragmentation and punted on solving it by dropping in jemalloc”.

              As Xorlev says, that’s definitely not what happened. That was one of many attempts to solve the problem… that didn’t work.

            3. [Comment removed by author]