I’ve heard that one of the reasons llvm was designed with lots and lots of passes like this was that it makes it possible to unit test the passes. It’s neat to see some of them working one at a time.
That’s certainly a big advantage. Most compilers run as a set of passes though, because the quality of the output of any transform can depend a lot on the input. For example, simple basic-block vectorisation works much better after loop unrolling: if you have a 4-wide vector unit and you unroll your loop 4 times then the vectoriser just sees the a set of basic blocks doing the same thing four times on four different sets of data and so can easily vectorise the loop. A lot of passes work best if they see IR in some kind of canonical form and so having separate transforms that create this canonical form from a variety of different input styles lets other passes all depend on the same canonical form.
There are also some things that work best if they happen at different stages in the pipeline. For example, Objective-C automatic reference counting optimisation is made of three different phases. Early in the pipeline, it splits the combined operations (e.g. store-strong) into separate retain / release operations and makes sure that each retain / release call’s argument is threaded through and the return values are ignored (retain returns its argument). Other things then run and eliminate a lot of unreachable operations on Objective-C objects inline things into scopes where there are now duplicate retain/release pairs, and so on, then the main body of the retain / release operations runs. This eliminates duplicate retain / release operations. Then more optimisations run on the cleaned-up code. Finally, a late pass combines things like retain + store into storeStrong and makes sure that every consumer of an object passed to retain uses the return value, not the argument (this lets the register allocator skip a spill and restore on either side of the call).
Other passes produce their best results when they’re run multiple times. The best examples of this is simple dead-code elimination and CFG simplification, which are run multiple times during the process. The LLVM pass pipeline doesn’t yet do this, but you can get better performance if you run the inliner at least a couple of times with different thresholds at different places in the pipeline: a load of the early passes shrink functions and make them better inlining canididates.
I wonder if you run every llvm pass in a loop, how often do they converge? :)
Every pass? Exactly once? In what order? There are far more orders that won’t converge than there are ones that do converge.
Every pass in alphabetical order round-robin was what I was thinking.
The most likely outcome of that would be to crash. Passes are not fuzz tested and some of them depend on the output of ones the live earlier in the pipeline. Passes are allowed to assert or have llvm_unreachable blocks for cases where things are not in a form that an earlier pass can construct.
Another advantage of this design is that transforms can be expressed as a series of passes. Loop invariant code motion, for example, can be implemented by copying invariant expressions to the block(s) that dominates the loop and then running partial redundancy elimination