
The growing diversity of domain-specific accelerators spans all scales, from mobile devices to data centers. It constitutes a global challenge across the high-performance computing stack and is particularly visible in the field of Machine Learning (ML). Program representations and compilers need to support a variety of devices at multiple levels of abstraction, from scalar instructions to coarse-grain parallelism and large-scale distribution of computation graphs. This puts great pressure on the construction of both generic and target-specific optimizations, with domain-specific language support, interfaces to legacy and future infrastructure, and special attention to future-proofing, modularity, and code reuse. It motivates the construction of a new infrastructure that unifies graph representations and ML operators; supports optimizations at and across multiple abstraction levels; spans targets, ML frameworks, training and inference, and quantization; and interacts tightly with runtime systems. Compilers are expected to readily support new applications, port easily to new hardware, and bridge many levels of abstraction, from dynamic managed languages to vector accelerators and software-managed memories, while exposing high-level knobs for autotuning, enabling just-in-time operation, providing diagnostics, propagating functional and performance debugging information across the entire stack, and delivering performance close to hand-written assembly in most cases. We will share our vision, progress, and plans towards the design and public release of such a compiler infrastructure.

  1.

    Cool, thanks for posting. I might be able to use this in my own compiler project, since I have many of the same requirements. Building a domain-specific IR + optimization passes from scratch is a big job that I’ve been putting off. A library that takes care of all the boilerplate seems attractive.

    Here’s a question. My language can be either interpreted or compiled. Compilation is required for execution on a GPU, or for fast execution on a CPU, but compilation is slow. The interpreter works by quickly translating the source to an intermediate representation, then interpreting that IR. It starts instantly, with no discernible lag, which is helpful when using the REPL interface or doing live coding.

    In the optimizer that I want to build, constant folding and partial evaluation will be very important. That could lead to the optimizer containing a copy of the interpreter, and I don’t want to maintain two interpreters in parallel. So the question is: can I design an IR that serves both as the executable format for the interpreter and as the input and output of the optimizer? Is this a thing that people do? Are SSA, CPS, or ANF better or worse suited for this?
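    To make the question concrete, here is the kind of sharing I have in mind, as a minimal Python sketch (all names are made up; the IR is assumed to be a flat SSA-style register list): the constant folder calls the same per-instruction evaluator the interpreter uses, so there is only one interpreter to maintain.

    ```python
    from dataclasses import dataclass

    @dataclass
    class Instr:
        op: str            # "const", "add", "mul", ...
        dest: int          # register written by this instruction (written once)
        args: tuple = ()   # source registers, or the literal value for "const"

    def eval_instr(instr, env):
        # The single evaluator shared by the interpreter and the optimizer.
        if instr.op == "const":
            return instr.args[0]
        a, b = (env[r] for r in instr.args)
        return {"add": a + b, "mul": a * b}[instr.op]

    def interpret(prog):
        # The interpreter: execute every instruction in order.
        env = {}
        for instr in prog:
            env[instr.dest] = eval_instr(instr, env)
        return env

    def fold_constants(prog):
        # The optimizer's partial evaluator: run an instruction through the
        # very same eval_instr whenever all of its inputs are already known
        # constants, and leave it untouched otherwise.
        known, out = {}, []
        for instr in prog:
            if instr.op == "const" or all(r in known for r in instr.args):
                known[instr.dest] = eval_instr(instr, known)
                out.append(Instr("const", instr.dest, (known[instr.dest],)))
            else:
                out.append(instr)
        return out

    # fold_constants([Instr("const", 0, (2,)), Instr("const", 1, (3,)),
    #                 Instr("mul", 2, (0, 1))]) folds the mul into a const 6.
    ```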

    1.

      For a somewhat related paper, see “Adaptive Execution of Compiled Queries” (Kohn, Leis, and Neumann, ICDE 2018).
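      The trick there, if I remember it right, is to start interpreting the query's IR immediately while machine code is compiled in the background, then switch over once the compiled version is ready (the paper switches at a finer granularity mid-execution, but the shape is the same). A rough Python sketch of that shape, with made-up names:

      ```python
      import threading

      class AdaptiveFunction:
          # Hypothetical wrapper: callable immediately through the interpreter,
          # upgraded to native code once background compilation finishes.
          def __init__(self, ir, interpret, compile_ir):
              self.ir = ir
              self.interpret = interpret   # instant-start, slow-to-run path
              self.compiled = None         # slow-to-build, fast-to-run path
              threading.Thread(
                  target=lambda: setattr(self, "compiled", compile_ir(ir)),
                  daemon=True,
              ).start()

          def __call__(self, *args):
              fn = self.compiled           # the swap-over is one attribute read
              if fn is not None:
                  return fn(*args)
              return self.interpret(self.ir, *args)
      ```

      For a REPL, that would keep the session responsive while the compiler catches up.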

      1.

        Your questions remind me of this post describing WebKit’s IRs and an effort to replace LLVM IR with a custom domain-fit IR: https://webkit.org/blog/5852/introducing-the-b3-jit-compiler/

        The WebKit posts by Filip Pizlo about the evolution of WebKit’s JS engine are fascinating.