1. 27
  1. 2

    This is an interesting story of how “making good inlining decisions” is very tricky, and most simplifying choices/heuristics/guesses (we always have to make some) will come back to bite someone in the real world.

    1. 6

      It’s also very specific to how GCC does inline assembly. GCC made an explicit design decision to be loosely coupled with GAS. GCC doesn’t parse inline assembly at all; it is really just a form of embedded printf that passes the text through to GAS, which then parses the assembly. From the perspective of GCC, inline assembly is just a string, and so the only thing that the GCC optimisers can use is the length of the string. In contrast, LLVM parses inline assembly and so is able to at least count instructions, if not measure their size (which may depend on other things for ISAs like x86), though I don’t know whether it actually does.
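
      To make the "length of the string" point concrete: GCC's documentation describes its size estimate for an asm as roughly the number of lines in the template (newlines plus statement separators such as ';') multiplied by the longest instruction length for the target. Below is a minimal sketch of the counting step only. It is a simplified model, not GCC's actual code, and `estimate_asm_insns` is a hypothetical name.

      ```c
      #include <stddef.h>

      /* Hypothetical helper, not GCC's real code: treat every newline or ';'
       * in the template as the end of one "instruction". Returning 0 for an
       * empty template is an assumption of this sketch. */
      static size_t estimate_asm_insns(const char *tmpl)
      {
          if (*tmpl == '\0')
              return 0;

          size_t count = 1;           /* a non-empty template is at least one line */
          for (const char *p = tmpl; *p != '\0'; ++p) {
              if (*p == '\n' || *p == ';')
                  ++count;
          }
          return count;
      }
      ```

      Under this model, a template holding one real instruction followed by three lines of assembler directives counts as four instructions, even though only one of them ends up in the function body.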

      1. 2

        > the only thing that the GCC optimisers can use is the length of the string

        …for the purpose of duplicating code. More interesting in other contexts (dataflow, regalloc) are the constraints, which GCC does take full advantage of.
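
        A small illustration of constraints doing work even when the template is opaque: an empty asm with a "+r" operand emits no instructions, yet the compiler has to assume the value was read and rewritten, so it cannot constant-fold through it. This is a sketch assuming GCC or Clang; `launder_int` is a made-up name.

        ```c
        /* Hypothetical helper. The template is empty, so no instruction is
         * emitted; the "+r" constraint alone tells the compiler that x lives
         * in a register and is both read and written by the asm. */
        static inline int launder_int(int x)
        {
            __asm__ volatile("" : "+r"(x));
            return x;
        }
        ```

        The value comes back unchanged at run time, but the optimiser can no longer treat the result as a known constant.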

        There was a really neat bit of research where they made LLVM automatically verify inline assembly constraints. Transparent assembly is useful for semantic analysis, not just optimization!

        Another nice reason to have an integrated compiler/assembler is performance. Unix has failed […].

        All of which being said, I think inline assembly is an antifeature which, if supported at all, should be done in the manner of tcc (‘we must support this to be compatible’, not ‘we support this and make it fast’). If a loop is insufficiently fast in plain C, it should be written entirely in assembly. (I think the same of most SIMD intrinsics. I recognise this is an unpopular view.)

        The hack is really cute, and I like it a lot. I have recently come to think that faults should be used more frequently for exceptional control flow, but that compilers should add specific features to aid such constructs rather than provide a mechanism as general as ‘asm’. In this case, though, no special features are necessary; ud2 can be generated by __builtin_trap, and the location information can be retrieved with DWARF or similar. (Well, DWARF is a bloated mess, and that is a problem unto itself; still.)
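
        A rough sketch of the __builtin_trap route, assuming GCC or Clang on a POSIX system. The helper names `check_or_trap` and `dies_by_signal` are hypothetical, and the fork is only there so the trap can be observed without killing the caller. Which signal arrives depends on the target (typically SIGILL for ud2 on x86), so the sketch only checks that the child died from some signal.

        ```c
        #include <sys/types.h>
        #include <sys/wait.h>
        #include <unistd.h>

        /* Hypothetical helper: trap when an invariant fails. No inline asm is
         * needed; the compiler picks the trapping instruction for the target. */
        static void check_or_trap(int ok)
        {
            if (!ok)
                __builtin_trap();
        }

        /* Run the check in a child process. Returns 1 if the child was killed
         * by a signal, 0 if it exited normally, -1 if fork or waitpid failed. */
        static int dies_by_signal(int ok)
        {
            pid_t pid = fork();
            if (pid < 0)
                return -1;
            if (pid == 0) {
                check_or_trap(ok);
                _exit(0);               /* reached only when the check passed */
            }

            int status = 0;
            if (waitpid(pid, &status, 0) < 0)
                return -1;
            return WIFSIGNALED(status) ? 1 : 0;
        }
        ```

        Recovering the source location from debug info, as suggested above, is a separate step that this sketch does not attempt.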

      2. 1

        Compilers have been devised which speculatively inline and then roll back if the inlining proved unprofitable. Trouble is, they take forever to compile.

        EDIT: this addresses the more general point. Obviously, as the sibling mentions, you can build a better heuristic for inline assembly at the same cost.