1. 17
    1. 7

      I wish more people talked about these tradeoffs and explored the design space. For example, how different really are the assembly of each of those 10 different monomorph versions? Is it mostly a matter of sizes and layouts of the values being copied around, or does it actually generate tons of different code for each variant with all the inlining and stuff it will also want to do? Can you automatically collapse those 10 variants into 2-4 or something?

      1. 4

        Exactly the same thing happens in C++ when (over)using templates. I discovered recently that the std::format feature of C++20 is prone to this — I was using spdlog, which is based on it, and discovered that more than half of my binary’s size was various formatting functions.

        I’m somewhat paranoid of this in my own code, so I try to factor out as much as possible of a template into a non-generic base class or function that can be shared by all the instantiations.

        (I sometimes wonder whether linkers could identify functions with identical machine code and collapse them into one. This could eliminate some template bloat — e.g. vector<size_t> and vector<char*> would generate identical code for their methods, which could be merged. Of course this would complicate looking up symbol names in a debugger or crash log!)

        1. 7

          You’re describing identical code folding (ICF), which is indeed used. I tried to find numbers for its impact on a big C++ codebase like chromium and I think this llvm thread has a lot of good info https://discourse.llvm.org/t/rfc-annotating-global-functions-and-variables-to-prevent-icf-during-linking/57905/6 (also discusses how unrestricted icf can break C++ semantics for function pointer identity).

          1. 3

            how unrestricted icf can break C++ semantics for function pointer identity

            Put a couple of nops in front of the implementation and use each one for a different function? (at some point a few jumps could be interspersed to speed things up)

            1. 1

              This thread is in the context of lld, which I think is LLVM’s own linker? So I suspect it isn’t relevant to Apple platforms which are what I mostly care about. But now I know the name of this feature which makes it easier to find out. Thanks!

              1. 3

                The Windows linker has done ICF for a very long time because so much code is C++ there. I don’t know is ld64 does it. It’s technically a violation of the standard because two function pointers should compare not equal if they are not the same, but I’ve never seen code where that matters (if a function is bit-for-bit identical to another, why would you care that you were passed a function pointer to one version and not the other?).

                1. 1

                  I’ve seen code that used a function’s address as a sentinel value affecting control-flow, but I guess that would work so long as other functions not meant to terminate the search don’t get fused with it. Could be a bit of a nasty gotcha.

            2. 7

              Compilers and linkers already unify identical functions. This regularly happens in Rust.

              However, it’s very easy to generate tons of almost-but-not-quite identical functions. Rust types like &str, String, &String, Box<str>, Cow<str> may all be “strings”, but have different memory representation, so the code using them is different enough that it can’t be unified.

              Even when the types are essentially identical, e.g. a 32-bit char vs u32, it’s easy to create generic functions that can’t be unified for silly reasons. For example, calls to unwrap() or assert_eq will reference debug print methods of the types, which are unique for every type.

              1. 2

                It’s interesting because std::format has some very aggressive type erasure, bin-packing format args into 16-bit discriminated unions, in order to reduce the number of template instantiations required compared to previous formatting libraries and it’s still large.

                1. 2

                  interesting — is there a description someplace of how this is done, or did you just read through the code?

                  1. 1

                    I first learned from these two lectures: https://youtu.be/ptba_AqFYCM?si=pVMJQYFTqRVYJnxx https://youtu.be/zssTF1uhxtM?si=IDGk10-z04WWzkK1

                    I later went through the code (MS STL is imo the easiest implementation to read through) because I needed to make my own format templates that are compatible with the memory allocators, SIMD, error handling, and integer types of a library I’m working on.

                2. 2

                  I sometimes wonder whether linkers could identify functions with identical machine code and collapse them into one.

                  I’m not an expert in compilers so I may be very wrong, but I think Go does something in this vein.

                  1. 3

                    Go doesn’t need it. Until recently, Go didn’t have generics at all and I believe even now Go’s generics are implemented with dynamic dispatch.

                    This is one of the design decisions that I really hate in a lot of languages: they leak compiler implementation details. From a language abstract machine perspective, there should be no difference between a generic function that takes a concrete type that implements some interface and a function that takes that an interface. In C++, these are totally distinct. The former is a template, the latter a function that takes an abstract class. You should be able to write code in the same way for both and have the compiler apply heuristics to determine whether to monomorphise (with attributes to tell it to do the other thing if it gets it wrong). Rust and C# both get this right, though (understandably given their different targets) with opposite defaults: Rust monomorphises unless told to make something dynamic, C# does dynamic dispatch unless told to monomorphise.

                    1. 3

                      Haskell has the property you desire. The compiler can choose to specialize polymorphic functions, or it can use a generic version and implement polymorphism by passing around (and constructing) type class dictionaries, (similar to v-tables). There is also a SPECIALIZE pragma if you want to force specialization.

                      This allows Haskell to specialize and inline whenever it wants, but it also allows the language to have features that would be impossible with specialization, like polymorphic recursion (the type changes every time the function recurses) and existential types.

                      1. 1

                        From a language abstract machine perspective, there should be no difference between a generic function that takes a concrete type that implements some interface and a function that takes that an interface. … Rust and C# both get this right

                        Why do you say Rust does what you want? IIUC, I don’t think it does: Rust has different syntax for a function generic over types implementing an interface (fn f<T>(x: T) where T: Interface) and a function taking a reference to an object implementing an interface, with a vtable (fn f(x: &dyn Interface)).

                        1. 1

                          My understanding was that you could put a dynamic attribute on a generic in Rust, which makes the latter simply syntactic sugar for the former + plus the dynamic attribute.

                          1. 2

                            It’s not quite just adding dyn keyword to a generic argument, since dynamic dispatch is somewhat of an alternative to the <> style generics.

                            Additionally, dynamic dispatch requires the type to have object safety (ie. Not all generics arguments can be one-to-one mapped to dynamic dispatch). This is a fairly subtle point that I have seen trip up a few colleagues in the past.

                            It all comes down to tradeoffs in design. Rust chose the fast and explicit over ease of use.

                            1. 2

                              you could put a dynamic attribute on a generic in Rust

                              I don’t think that exists. I don’t see it in the list of attributes, and the compiler raises an error on it.

                        2. 2

                          This is called “outlining.” No idea if Go does it though.

                      2. 3

                        Other top-level comments bemoan language-specific hacks. This is a good moment to consider referential transparency; a referentially-transparent language’s compilers can choose to factor out common code from any code objects which are being compiled together, and the factorization can be as simple as hash-consing all ASTs and only compiling each tree once.

                        1. 5

                          I don’t understand what referential transparency has to do with it. I think you can outline and share code with impunity in pretty much any language that doesn’t have self-modifying code. I think the principal problem (at least, at a surface level) here is that our compilers are insufficiently discerning in their choices of properties to monomorphise for.

                          1. 2

                            I wonder if outlining would be a useful technique to collapse the almost identical code sections together