1. 15
  1. 7

    I’m really excited to see this. I hope it will evolve into a full [Objective-]C[++] dialect and be integrated into the LLVM IR generation pipeline, so that the static analyser can operate over an SSA representation of the source code with good dataflow representations, and so that this representation is not distinct from the one used for lowering. This should also make it much easier to do things like optimising exception flow (if you use a C++ library flow-control structure and use exceptions to break out of it then, after inlining, it should be possible to lower this to purely local control flow, but that’s very hard in LLVM IR for a variety of reasons), eliding reference counts in shared_ptr, and so on.
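
    To make the exception-flow case concrete, here is a minimal sketch of the pattern I mean. The names (FoundNegative, has_negative_via_exception, has_negative_local) are made up for illustration. The first function escapes std::for_each by throwing; the second is the purely local control flow that one would hope the first could be reduced to once the lambda and the algorithm are inlined.

    ```cpp
    #include <algorithm>
    #include <vector>

    // Hypothetical tag type, thrown only to escape the library loop.
    struct FoundNegative {};

    // Uses an exception to break out of std::for_each. After inlining, the
    // throw and the catch end up in the same function body.
    bool has_negative_via_exception(const std::vector<int>& values) {
        try {
            std::for_each(values.begin(), values.end(), [](int v) {
                if (v < 0) throw FoundNegative{};
            });
        } catch (const FoundNegative&) {
            return true;
        }
        return false;
    }

    // The equivalent with purely local control flow: an early return
    // instead of a throw and an unwind.
    bool has_negative_local(const std::vector<int>& values) {
        for (int v : values) {
            if (v < 0) return true;
        }
        return false;
    }
    ```

    Both functions give the same answer for every input; the difference is only in how the early exit is expressed.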

    1. 1

      One thing that’s not at all clear to me: would this be replacing one of the other IRs currently used in clang, or would this be strictly adding extra passes? If the latter, it sounds like it would inevitably cause slower compiles since there’s more work to do?

      1. 4

        The current status of Clang is that code is parsed to an AST, a CFG is built from the AST, analysis is run on the CFG to produce warnings, the CFG is thrown away, and then LLVM IR is lowered from the AST. This is what is meant by “parallel lowering” in the RFC. One goal of CIR is to use the same IR for analysis and lowering to avoid divergence.

        1. 4

          It’s so much worse than that. There are at least three different IRs in clang. Everything is built from the AST, but then things diverge:

          • The static analyser uses its own CFG, built from the AST and completely disconnected from LLVM IR generation.
          • Constant expression evaluation uses two different IRs, built from the AST and distinct from IR generation.
          • LLVM IR is generated for optimisation and lowering.

          The static analyser doesn’t bother me too much (aside from duplicated work), because if the analyser has false positives or negatives that’s a QoI issue, not a compilation-correctness issue. The fact that evaluating constant expressions uses a completely different set of codepaths to IR generation is something I find increasingly terrifying with each new C++ standard and its extensions to what constexpr and consteval are allowed to do.
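
          A small sketch of why that matters, with made-up names (checksum, at_compile_time, at_run_time): the same function body is evaluated once by the compiler’s constant evaluator and once by code that went through normal IR generation, and correctness depends on the two agreeing.

          ```cpp
          // One function body, two evaluation paths.
          constexpr unsigned checksum(const char* s) {
              unsigned h = 0;
              while (*s) {
                  // Unsigned arithmetic wraps, so this is well defined on both paths.
                  h = h * 31u + static_cast<unsigned char>(*s);
                  ++s;
              }
              return h;
          }

          // Evaluated by the constant-expression evaluator at compile time.
          constexpr unsigned at_compile_time = checksum("clang");
          static_assert(checksum("") == 0u, "empty string hashes to zero");

          // Goes through ordinary IR generation and may run at run time.
          unsigned at_run_time(const char* s) { return checksum(s); }
          ```

          If the two paths ever disagreed, at_run_time("clang") would differ from at_compile_time even though both come from the same source.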

          I’d love to see constexpr evaluation in clang done by lowering to CIR and then lowering the CIR to LLVM IR with a slightly different lowering, one that diverges from the IR generation layer only by keeping pointers in abstract form (i.e. keeping an object + offset representation so that these can be generated as relocations).
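
          As a rough sketch of what “object + offset” could look like (this is not clang’s or CIR’s actual data structure; AbstractPointer, advance, difference and as_relocation are all hypothetical names), a pointer in the evaluator is a symbol plus a byte offset rather than a numeric address:

          ```cpp
          #include <cstdint>
          #include <string>

          // Hypothetical abstract pointer: which object, and how far into it.
          struct AbstractPointer {
              std::string object;   // symbol name of the pointed-to object
              std::int64_t offset;  // byte offset from the start of that object
          };

          // Pointer arithmetic only ever changes the offset.
          AbstractPointer advance(AbstractPointer p, std::int64_t bytes) {
              p.offset += bytes;
              return p;
          }

          // Subtraction is only meaningful within one object; returns false
          // (and leaves `out` untouched) when the objects differ.
          bool difference(const AbstractPointer& a, const AbstractPointer& b,
                          std::int64_t& out) {
              if (a.object != b.object) return false;
              out = a.offset - b.offset;
              return true;
          }

          // Renders the pointer as a symbol-plus-addend expression, the shape
          // a relocation needs.
          std::string as_relocation(const AbstractPointer& p) {
              if (p.offset == 0) return p.object;
              return p.object + (p.offset < 0 ? "" : "+") + std::to_string(p.offset);
          }
          ```

          Because no concrete address is ever chosen, the final value can be handed to the linker as symbol + addend instead of being baked in as a number.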

          1. 1

            Thank you for explaining this. That makes this sound like a very good idea just from the perspective of avoiding bugs caused by divergence between the different lowerings.