1. 17

  2. 2

    LTO typically causes large increases to the stack space needed. Since LTO results in aggressive cross-object inlining, a bunch of local variables from many different functions now wind up getting allocated at the same time. When you enable LTO for the first time, expect to see some stack overflows!

    Is that correct? Yes, individual stack frames may grow larger (though there will be fewer of them), but I wouldn’t expect total stack usage to grow.

    1. 2

      This is based on our experience enabling LTO on a few firmware projects over our careers. You can imagine cases where multiple branches of your execution tree are collapsed into a single stack frame, growing the worst-case stack usage.

      1. 2

        It seems possible.
        Consider a contrived example:
        a() - 32 bytes of stack
        b() - 32 bytes of stack
        c() - 64 bytes of stack
        a() calls both b() and c() once.

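        If b() and c() are inlined into a() and the compiler doesn’t reuse stack slots, then the maximum stack depth is now 128 bytes (32 + 32 + 64) instead of 96 (32 + 64, since without inlining b() and c() are never on the stack at the same time).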
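
        To make the arithmetic concrete, here is a minimal sketch in C. The frame sizes come from the contrived example; the constant and function names are made up for illustration, and this only models the bookkeeping, not what any particular compiler actually emits.

        ```c
        #include <stddef.h>

        /* Frame sizes in bytes, taken from the contrived example (hypothetical names). */
        enum { FRAME_A = 32, FRAME_B = 32, FRAME_C = 64 };

        static size_t max_size(size_t x, size_t y) { return x > y ? x : y; }

        /* No inlining: a()'s frame plus the deeper of its two callees,
         * because b() and c() are never on the stack at the same time. */
        size_t worst_case_separate_frames(void) {
            return FRAME_A + max_size(FRAME_B, FRAME_C);
        }

        /* b() and c() inlined into a(), every local given its own slot. */
        size_t worst_case_inlined_no_reuse(void) {
            return FRAME_A + FRAME_B + FRAME_C;
        }

        /* b() and c() inlined into a(), with the locals that came from b()
         * and c() sharing slots (assumes ideal reuse by the compiler). */
        size_t worst_case_inlined_ideal_reuse(void) {
            return FRAME_A + max_size(FRAME_B, FRAME_C);
        }
        ```

        With ideal slot reuse the inlined frame is no larger than the original worst case, so the question comes down to how well the compiler shares slots in practice.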

        1. 9

          Possible, sure, though in my experience gcc is pretty clever (aggressive, perhaps) about reusing stack slots. Even with a not-super-recent version of it (I think it was circa 5.1 or so), I recall a few years ago being impressed to discover that it had merged two distinct local arrays to share the same stack space, despite the fact that they had overlapping lifetimes – it had noticed (correctly) that while the lifetimes of the two arrays as a whole overlapped, the lifetimes of each individual corresponding pair of elements (e.g. A[0] and B[0]) did not, and hence arranged things so that the same underlying chunk of memory started out as array A and gradually, element by element, became array B.
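
          For illustration, a sketch of the kind of access pattern described here. The function name, array length, and arithmetic are hypothetical, and whether a given gcc version actually overlays the two arrays for this code is an assumption, not something this snippet demonstrates; an optimizer may just as well eliminate the arrays entirely.

          ```c
          #include <stdint.h>

          enum { ARRAY_LEN = 16 };  /* hypothetical size, chosen for illustration */

          uint32_t checksum_of_squares(const uint8_t *input) {
              uint32_t a[ARRAY_LEN];
              uint32_t b[ARRAY_LEN];
              uint32_t sum = 0;

              for (int i = 0; i < ARRAY_LEN; i++) {
                  a[i] = input[i];
              }

              /* Both arrays are live during this loop, but a[i] is read for the
               * last time just before b[i] is first written, so each pair of
               * elements has non-overlapping lifetimes. */
              for (int i = 0; i < ARRAY_LEN; i++) {
                  b[i] = a[i] * a[i] + 1u;
              }

              for (int i = 0; i < ARRAY_LEN; i++) {
                  sum += b[i];
              }
              return sum;
          }
          ```

          In a pattern like this, a and b could in principle occupy the same memory, with element i turning from a[i] into b[i] as the middle loop advances.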

          1. 1

            Hmm. Ok, you’re probably right. Thanks.