I wish this had a C API instead of me printing IR text.
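For context, this is roughly the textual IL a frontend has to emit today. The snippet below follows the style of the add example in QBE’s documentation, so treat the details as illustrative rather than exact:

    # QBE IL for a small function; '#' starts a comment
    function w $add(w %a, w %b) {       # define a word-returning function
    @start
        %c =w add %a, %b                # 32-bit add of the two arguments
        ret %c                          # return the sum
    }

You then feed a file like this to the qbe executable, which prints native assembly for the target.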
If you need an API, you can try https://github.com/vnmakarov/mir/. IIRC the author of QBE isn’t against an API, he just hasn’t had the time to implement one yet.
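MIR builds its IR through C calls instead of text. A minimal sketch, assuming the context/module functions as described in MIR’s README (the instruction-building calls are elided, and the exact signatures should be checked against mir.h):

    /* Sketch only: MIR_new_func / MIR_append_insn calls are omitted because
       their exact signatures should be checked against mir.h. */
    #include "mir.h"

    int main(void) {
        MIR_context_t ctx = MIR_init();             /* create a MIR context */
        MIR_module_t m = MIR_new_module(ctx, "m");  /* open a module        */
        (void) m;
        /* ... build functions and instructions through the API here ... */
        MIR_finish_module(ctx);                     /* close the module     */
        MIR_finish(ctx);                            /* release everything   */
        return 0;
    }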
QBE is also the backend of cproc: https://git.sr.ht/~mcf/cproc (a new C compiler from @mcf).
I looked at QBE about six months ago for a compiler project, but it seemed semi-abandoned. Is that not the case? I’d be delighted if so.
It’s a hobby project by someone with a day job. That being said, I know of at least two competent contributors who are using it for their own projects (and thus contribute small improvements), and the author is still present.
Excellent news, thanks.
I’ve been working on a compiler backend of my own, along similar lines, recently. Very fun, and I’ve learned a lot; would recommend making one.
A warning, though: optimizations are a bottomless pit, and you will never be as good as gcc.
That’s true – however, you can get surprisingly far with surprisingly little effort. To quote https://developers.redhat.com/blog/2020/01/20/mir-a-lightweight-jit-compiler-project/:

“Recently, I did an experiment by switching on only a fast and simple RA and combiner in GCC. There are no options to do this; I needed to modify GCC. (I can provide a patch if somebody is interested.) Compared to hundreds of optimizations in GCC-9.0 with -O2, these two optimizations achieve almost 80% performance on an Intel i7-9700K machine under Fedora Core 29 for real-world programs through SpecCPU, one of the most credible compiler benchmark suites.”
The place where these modern compilers really tend to win is when your code can be auto-vectorized.
There was an interesting talk at PLDI this year about using machine learning to auto-vectorize code. It’s not difficult to recognize opportunities for vectorization, but given a piece of vectorizable code, it’s O(big) to choose the best way to vectorize it. Existing solutions are mostly heuristics-based, but their ML solution was able to generate very favourable results compared with the brute-force O(big) solution, in much less time.
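As a toy illustration of the kind of loop involved (my own example, not from the talk): recognising that the loop below is vectorizable is easy, but the compiler still has to pick a vector width, an unroll factor, and a tail-handling strategy, and that choice is where the search space blows up:

    /* Independent element-wise work: a classic auto-vectorization candidate.
       At higher optimization levels GCC and Clang will usually emit SIMD code
       for this; deciding exactly how (width, unrolling, tails) is the hard part. */
    void saxpy(float *restrict y, const float *restrict x, float a, int n) {
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];   /* no loop-carried dependence */
    }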
It depends a bit on your goals and your source language.
The JavaScriptCore developers abandoned LLVM for their back end in part because they cared about compile latency a lot more than LLVM did (LLVM has improved this a lot with ORCv2 compared to the older JIT APIs). Optimisation is analysis plus transformation, and often the transformation part is the easy bit. If your source language expresses things that LLVM or GCC spends a lot of complexity trying to infer, then you may be able to do as well as one of them without too much effort.
I used to teach a compiler course where one of the examples was a toy language for 2D cellular automata. Every store in the language was guaranteed non-aliasing, and it ran kernels that, at the abstract machine level, were entirely independent. It used LLVM on the back end, but starting from that language you could fairly trivially get close to LLVM’s performance, because everything that you want autovectorisation to infer is statically present in the source language, and most of the loop optimisations can similarly be applied statically, given the shape of the grid, rather than implemented generically for all possible loop nests.
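A rough sketch of the same idea in plain C (my own example, not the course’s actual language): if the grid shape is a compile-time constant and the buffers are declared restrict, the properties that the toy language guarantees by construction become statically visible to the compiler:

    /* One generation of a 2D cellular automaton over a fixed-size grid.
       `next` and `cur` are declared restrict, so every store is known not to
       alias any load, and each cell update is independent of the others. */
    #define W 256
    #define H 256

    void step(unsigned char next[restrict H][W],
              const unsigned char cur[restrict H][W]) {
        for (int y = 1; y < H - 1; y++) {
            for (int x = 1; x < W - 1; x++) {
                int n = cur[y-1][x-1] + cur[y-1][x] + cur[y-1][x+1]
                      + cur[y][x-1]                 + cur[y][x+1]
                      + cur[y+1][x-1] + cur[y+1][x] + cur[y+1][x+1];
                /* Conway-style rule: alive with exactly 3 neighbours,
                   or with 2 neighbours if already alive */
                next[y][x] = (n == 3) || (n == 2 && cur[y][x]);
            }
        }
    }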
This project aims to get 70% of LLVM / GCC perf from 10% of the code. That’s probably easy on average. There are a lot of things in LLVM that give a huge perf win for a tiny subset of inputs. If you care about the performance of something that hits one of these on its most critical path, you will see a huge difference. If you care about aggregate performance over a large corpus, very few of them make a significant overall difference.