1. 7
  1.  

  2. 4

    Interesting analysis; it’s a shame that the conclusion skipped something obvious: the __tls_get_addr function is used only by the global-dynamic and local-dynamic TLS models. These models allow TLS to be allocated in non-contiguous space so that shared libraries can be loaded and unloaded. I don’t think there’s ever a use case for dlopening libc, so the simplest fix would be to add this one-line patch to the libc Makefile:

    CFLAGS+=-ftls-model=initial-exec
    

    After that, there’s no function call to __tls_get_addr, there’s just a dynamic relocation that provides a fixed offset from the thread pointer (thread segment register on x86[-64]).

    By itself, this isn’t quite sufficient to get the same speedup that they’re seeing, but it is a speedup that would likely apply to any program that calls libc functions that access thread-local state.
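
    For anyone who wants to see the difference without rebuilding libc, here is a minimal sketch that pins a single variable to initial-exec using the GCC/Clang tls_model attribute instead of the global flag. The variable and function names are made up for illustration and have nothing to do with the actual libc code.

    ```c
    #include <assert.h>

    /* Hypothetical per-thread counters; the names are illustrative only. */

    /* Default model: when built with -fPIC, the compiler typically picks
     * global-dynamic for an exported thread-local variable. */
    __thread int hits_default;

    /* Explicitly pinned to initial-exec, the per-variable equivalent of
     * building the whole file with -ftls-model=initial-exec. */
    __thread int hits_initial_exec __attribute__((tls_model("initial-exec")));

    /* Each accessor increments its own counter for the calling thread. */
    int bump_default(void) { return ++hits_default; }
    int bump_initial_exec(void) { return ++hits_initial_exec; }

    /* Sanity check: the two counters are separate objects, so bumping one
     * leaves the other untouched. */
    void check_independent(void) {
        int before = hits_default;
        bump_initial_exec();
        assert(hits_default == before);
    }
    ```

    Assuming x86-64 Linux with the default TLS dialect, compiling this with something like cc -O2 -fPIC -S and reading the assembly should show a call to __tls_get_addr in bump_default and only a thread-pointer-relative access in bump_initial_exec. The exact output depends on the compiler, target and flags.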

    1. 3

      And, apparently, I wasn’t the only person to notice this. This was committed yesterday.

      1. 2

        The Bugzilla entry “__get_locale() is inefficient” shows that the change has been applied on most architectures.

        1. 6

          It turns out I’m actually responsible for introducing the slow function. It honestly never occurred to me that libc would be compiled with anything other than initial-exec TLS.

          The fact that *NIX systems conflate libraries that are strict dependencies of a program and libraries that exist as plugins continues to annoy me. I wish we had a stricter separation of these things so that we could default to initial-exec for most things and have a saner symbol visibility story for plugins (i.e. they only see things that are explicitly exported from the loading process for use by the plugin, so it’s possible to maintain stable plugin ABIs).

      2. 2

        “SVG Graphic Coming Soon”

        If the one and only visual aid is not available, why not just wait to publish the article?