  1. 5

    There’s another approach I came up with while trying to create clean C++ headers that didn’t expose any messy implementation details.

    • In the header, declare class Foo without any private methods or data members.
      • Constructors must be protected; instead of exposing them, declare static factory methods that return heap-allocated objects.
      • Destructor must be virtual, or else protected with some other mechanism to free the object, like a Delete method or ref-counting.
    • In the implementation cc or cpp file, declare FooImpl as a subclass of Foo. Add all the private stuff to FooImpl.
      • The factory methods instantiate a FooImpl.
      • FooImpl redeclares the public methods of Foo, as override if they’re virtual (but they don’t have to be virtual).
      • Nonvirtual methods of Foo must be implemented as stubs that cast this to FooImpl and call the same method on it.

    This is similar to making Foo a pure-virtual abstract interface, except that its methods don’t have to be virtual. As long as you build with LTO, the stub methods will be inlined, so there’s no overhead.
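
    To make this concrete, here’s a minimal sketch of what the two files can look like (the names and members are purely illustrative):

        // Foo.h: public header with no private members and no implementation details.
        #pragma once
        #include <memory>
        #include <string>

        class Foo {
        public:
            static std::unique_ptr<Foo> create();  // factory instead of a public constructor
            void setName(std::string name);        // non-virtual; defined as a stub in the .cc
            std::string name() const;
            virtual ~Foo() = default;              // virtual so deleting through Foo* is safe
        protected:
            Foo() = default;
        };

        // Foo.cc: all private state lives in the Impl subclass, invisible to clients.
        #include "Foo.h"

        class FooImpl : public Foo {
        public:
            std::string name_;
        };

        std::unique_ptr<Foo> Foo::create() {
            return std::make_unique<FooImpl>();
        }

        // Non-virtual stubs: downcast to FooImpl and touch the real state there.
        void Foo::setName(std::string name) {
            static_cast<FooImpl*>(this)->name_ = std::move(name);
        }

        std::string Foo::name() const {
            return static_cast<const FooImpl*>(this)->name_;
        }

    On the client side all you ever see is something like auto foo = Foo::create(); the layout of FooImpl can change freely without the header, or any client code, recompiling.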

    1. 2

      Oh wow, so at runtime it looks exactly like pimpl (the thing is a RAII-managed pointer), but this is expressed using more natural language idioms. I wonder why this trick isn’t better known; it seems quite beautiful!

      1. 2

        It isn’t commonly used for a few reasons.

        First, it doesn’t work with subclassing: no one else can subclass your classes correctly. They can subclass them (your constructor is only protected), but if they do, their subclass doesn’t actually get the implementation’s behaviour. This is basically the Class Cluster pattern.

        Second, it means that you’re doing dynamic dispatch for almost everything. In general, modern C++ tries to avoid virtual because anything that’s virtual doesn’t benefit from compile-time specialisation. One of the hardest things for me to learn with C++ was that trying to write Smalltalk in C++ is just as bad as trying to write C in C++.

        Third, where you’re not doing dynamic dispatch, your superclass methods are doing unsafe downcasts (or very slow dynamic_casts). Unsafe down-casts are generally disallowed by secure coding conventions and, in combination with the first point, are actually incorrect.

        Fourth, unless you run LTO, you don’t get inlining of any of your methods. No method that touches any of the state can be declared in the header because all of the state is private.

        1. 1

          This pattern doesn’t need dynamic dispatch. In my usage of it there are no virtual methods. (Even with virtual methods, I believe LTO can apply monomorphic optimization and inline them, when it sees there’s only one concrete subclass.)

          The downcasts are technically unsafe, but only if there exists an “unofficial” subclass. So don’t do that. (You can enforce that by changing the protected constructors to private and declaring the Impl subclass as a friend.)
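
          A sketch of that enforcement, reusing the Foo/FooImpl names from above:

              #include <memory>

              class Foo {
              public:
                  static std::unique_ptr<Foo> create();
                  virtual ~Foo() = default;
              private:
                  Foo() = default;       // private: nothing outside this class can construct it...
                  friend class FooImpl;  // ...except the one sanctioned Impl subclass in the .cc
              };

          Another subclass can still be declared elsewhere, but it can’t be constructed, because its constructors can’t reach Foo’s private constructor.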

          This does assume LTO, but nowadays I consider anyone not using LTO to be leaving performance on the table, for many reasons. I don’t know of realistic reasons not to use it.

          1. 1

            Even with virtual methods, I believe LTO can apply monomorphic optimization and inline them, when it sees there’s only one concrete subclass

            This is true only if LTO can track the concrete type, which isn’t the case if you store the objects on the heap and read them back elsewhere.

            The downcasts are technically unsafe, but only if there exists an “unofficial” subclass. So don’t do that. (You can enforce that by changing the protected constructors to private and declaring the Impl subclass as a friend.)

            That can work; you need to forward-declare the Impl classes in the header, but I suppose that doesn’t cause recompilation unless you add a new one. But now your classes don’t allow subclassing outside of your library at all, whereas allowing that is one of the main reasons to use the pImpl pattern: so that you can change the size of your class without subclasses that you don’t control changing size and needing to be recompiled.

            This does assume LTO, but nowadays I consider anyone not using LTO to be leaving performance on the table, for many reasons. I don’t know of realistic reasons not to use it.

            Note that LTO still doesn’t buy you template specialisation. If you have only one private subclass, that’s fine, but if you have more than one then any template that uses them will be specialised only for the superclass.
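
            For example, in the case with more than one hidden subclass (so the methods are virtual; the names are illustrative):

                #include <vector>

                class Foo {
                public:
                    virtual int value() const = 0;
                    virtual ~Foo() = default;
                };

                // Client code only ever names Foo, so this template is instantiated
                // once as total<Foo>; the call stays virtual, with no per-Impl
                // specialisation for the compiler to optimise.
                template <typename T>
                int total(const std::vector<T*>& items) {
                    int sum = 0;
                    for (const T* p : items)
                        sum += p->value();
                    return sum;
                }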

            Fat LTO still has much too long compilation times for most builds. ThinLTO generally doesn’t do more than one level deep inlining between compilation units, so you’re leaving a lot of performance on the table from not having things in the header. If you’re willing to accept the constraint that the class can’t be allocated on the stack and can’t be subclassed outside of your library then you can get better perf by just putting everything in the header, at the expense of build times (though, with modules, even that isn’t necessarily true).

            1. 1

              This is true only if LTO can track the concrete type, which isn’t the case if you store the objects on the heap and read them back elsewhere.

              Hm. My assumption is that, if A introduces a pure-virtual method and B overrides it, and if B is the only subclass of A, then LTO can treat that method as monomorphic and inline it. Is that true?

              I guess this optimization would break if another subclass of A were loaded at runtime … can the linker rule that out if A’s vtable isn’t being exported as a public symbol from the shared library or executable being linked?

              Fat LTO still has much too long compilation times for most builds. ThinLTO generally doesn’t do more than one level deep inlining between compilation units

              I’m not sure how these map to the build options in Xcode, which has IIRC “monolithic” and “incremental” LTO. If the latter is “thin” then it’s less effective, which I wasn’t aware of…

              1. 1

                Hm. My assumption is that, if A introduces a pure-virtual method and B overrides it, and if B is the only subclass of A, then LTO can treat that method as monomorphic and inline it. Is that true?

                This is possible in whole-program optimisation. I think with LLVM’s LTO there’s a flag to tell it to assume that it can see the entire program (though this can have some surprising results; see last week’s Chrome CVE).
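
                For reference, my recollection is that the current Clang/lld spellings are roughly the following (treat the exact flags as something to verify against your toolchain, not gospel):

                    // Compile with (Thin)LTO plus whole-program vtable optimisation:
                    //   clang++ -O2 -flto=thin -fwhole-program-vtables -c foo.cc
                    // Then tell the LTO link that no vtables escape the link unit:
                    //   clang++ -flto=thin -fuse-ld=lld -Wl,--lto-whole-program-visibility foo.o -o app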

                I guess this optimization would break if another subclass of A were loaded at runtime … can the linker rule that out if A’s vtable isn’t being exported as a public symbol from the shared library or executable being linked?

                That might work, if you compile your library with -fvisibility=hidden and expose the symbols you want (I don’t remember how well that actually works for C++ vtables, but I think it does). I’m not sure if LLVM actually does this, though; the devirtualisation work seems to be mostly driven by Chrome, which tends to favour modern C++ style and avoid virtual as much as possible.
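
                A rough sketch of the visibility side of that (MYLIB_API is a made-up macro name; the attribute and the -fvisibility=hidden flag are standard GCC/Clang):

                    // Built with -fvisibility=hidden, so everything is hidden by default;
                    // only what is marked below is exported from the shared library.
                    #define MYLIB_API __attribute__((visibility("default")))

                    class MYLIB_API Foo {         // exported: clients can use Foo
                    public:
                        virtual void bar();
                        virtual ~Foo();
                    };

                    class FooImpl : public Foo {  // not exported: its vtable stays local,
                    public:                       // so no outside code can see or extend it
                        void bar() override;
                    };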

                I’m not sure how these map to the build options in Xcode, which has IIRC “monolithic” and “incremental” LTO. If the latter is “thin” then it’s less effective, which I wasn’t aware of…

                No idea; it’s been a good 5 years since I last used Xcode, and even then I was generating Xcode projects from CMake, so the last time I actually bothered with Xcode compile options was probably closer to 10 years ago. See what -flto= option it generates (I presume it still lists the compile commands in the build log dialog)?

    2. 3

      I’m wondering if he would be willing to try this technique out to get around compilation times, in addition to using final on the derived objects to get around the virtual method slowdown. I’ve been using it and it seems OK… I think?

      https://stackoverflow.com/a/41292751/2088672
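
      In case it helps, my understanding of the final trick is roughly this (illustrative types):

          class Base {
          public:
              virtual int get() const = 0;
              virtual ~Base() = default;
          };

          class Derived final : public Base {   // final: no further override can exist
          public:
              int get() const override { return 42; }
          };

          int read(const Derived& d) {
              return d.get();   // static type is a final class, so this call can be
          }                     // devirtualised and inlined without any LTO help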

      1. 2

        I would add another one…

        The opaque idiom (my colleague Seb’s idea; I did the implementation).

        The core gotcha with C/C++ is that the compiler needs to know the sizeof() of the class/struct at compile time if it is ever going to allocate it statically, dynamically, or on the stack.

        The idea with the opaque idiom is to use preprocessor magic to make every client reference to “MyType” refer to “opaque_MyType”, which merely contains an array of elements that have the same alignment requirement as the real type and is big enough to just fit the real type.

        With a little help from #undef, the implementation sees MyType as containing all its actual elements and private member functions.

        A downside is that nothing can be inlined on the client side, but at least you don’t need a dynamic allocation per instance; you can allocate on the client’s stack.

        Of course, sizing it is a pain: you have to do it with the help of a debugger like gdb, i.e. oversize it, compile, find the actual size, trim. Then use a compile-time assert to check that it never gets too small if someone changes the implementation or the alignment requirement.
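
        The macro renaming itself I’ll leave out, but the sizing/alignment guard at the end can look something like this (the sizes and members here are invented):

            // Client-visible stand-in: just raw storage with a fixed size and alignment.
            struct opaque_MyType {
                alignas(8) unsigned char storage[24];
            };

            // In the implementation, where the real MyType is fully defined:
            struct MyType {
                double value;
                int    count;
            };

            static_assert(sizeof(MyType)  <= sizeof(opaque_MyType),
                          "opaque_MyType storage is too small for MyType");
            static_assert(alignof(MyType) <= alignof(opaque_MyType),
                          "opaque_MyType is under-aligned for MyType");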

        1. 2

          The core gotcha with C/C++ is that the compiler needs to know the sizeof() of the class/struct at compile time if it is ever going to allocate it statically, dynamically, or on the stack.

          It’s also necessary for subclasses, so they know the byte offsets of their member variables.

          There have been some C++ runtimes like IBM’s SOM (which I’ve used) that removed this limitation at the expense of adding some indirection for member variable access, but they didn’t catch on. On the other hand, Objective-C’s runtime has done this since 2.0 (mid-00s.)

          The idea with the opaque idiom is to use preprocessor magic to make every client reference to “MyType” refer to “opaque_MyType”, which merely contains an array of elements that have the same alignment requirement as the real type and is big enough to just fit the real type.

          Doesn’t that screw up name mangling? The call sites will be calling functions with “opaque_MyType” in the name, but those function implementations are named with “MyType”.