1. 25

  2. 4

    I feel like the bytecode specialization process is naturally isomorphic to typical JIT pipelines. The words “guard” and “deoptimize” are used to explain how optimized code is respecialized. If they were to also recall the term “tracing”, then suddenly the entire project can be described in terms of retrofitting an interpreter with a tracing JIT.
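    The guard/deoptimize pattern can be sketched roughly like this (a minimal illustration in plain Python; the function names are invented, not CPython's actual internals):

    ```python
    # Hypothetical sketch of a specialized instruction with a guard.
    # On guard failure it "deoptimizes" back to the generic path.

    def generic_add(left, right):
        # Stand-in for the fully dynamic implementation.
        return left + right

    def binary_add_int(left, right, deopt=generic_add):
        # Guard: this specialization is only valid for exact ints.
        if type(left) is int and type(right) is int:
            return left + right  # fast path, no dynamic dispatch
        # Guard failed: fall back to the generic implementation.
        return deopt(left, right)

    assert binary_add_int(2, 3) == 5        # fast path
    assert binary_add_int("a", "b") == "ab" # deoptimized path
    ```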

    They ignore lessons from PyPy at their own peril. For example, the list of execution tiers includes Tier 1, which lightly compiles code after it has been interpreted once. Unless this defers bytecode compilation itself, so that Tier 0 executes directly from an AST, it cannot lead to speedups, because of the cost of bytecode compilation. Similarly, Tier -1 would be simpler if it were built out of guards, as in PyPy, and there’s a tricky concept called object virtualization which is required for removing frames and other heap-allocated objects during JIT compilation. The barrier between Tiers 2 and 3 is nebulous due to overlapping criteria for the hottest code as well as the cliff in compiler costs; mitigations take strange shapes, as in recent PyPy improvements for large regions of hot code.

    1. 2

      I now wonder how clever you can reasonably be here.

      Rather than having separate “safe but optimised for integers” add/sub/mul/div operations: if a function performs a number of numeric operations on its arguments, and you can demonstrate that the types of those arguments don’t change within the function, then you could guard once at the beginning of the function and use “unsafe integer add with no type check” throughout it.

      You could then even unbox the integer values into int64s (rather than Python integers), and even into registers, for the duration of the function, so you are a lot closer to JITting.

      Lastly, one could have this type information, discovered at runtime, spill out to the caller and be propagated further and further.
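      The idea of hoisting the guard to the function entry can be sketched like this (a toy illustration in plain Python; the function names and the dispatch mechanism are invented for the sake of the example):

      ```python
      # Hypothetical sketch: guard on argument types once at entry,
      # then run a body free of per-operation type checks.

      def hypot_sq_generic(x, y):
          # Generic path: every * and + goes through dynamic dispatch.
          return x * x + y * y

      def hypot_sq_int_fast(x, y):
          # Specialized body: assumes x and y are ints, so no operation
          # needs its own type check. In a real JIT these would be raw
          # unboxed int64 operations, possibly held in registers.
          return x * x + y * y

      def hypot_sq(x, y):
          # A single guard at entry replaces a check at every operation.
          if type(x) is int and type(y) is int:
              return hypot_sq_int_fast(x, y)
          return hypot_sq_generic(x, y)

      assert hypot_sq(3, 4) == 25      # specialized path
      assert hypot_sq(1.5, 2.0) == 6.25  # generic path
      ```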

      1. 5

        Aren’t you just describing type specialization? The Psyco Python compiler did this years ago, in 2004 or earlier (and it was sort of popular at the time – people used it). Its author, Armin Rigo, then worked on the more ambitious PyPy project.

        I think Figure 1 is pretty much what you’re talking about. In the related work section they cite “Self”, etc.

        1. 5

          I know that WebKit’s JavaScriptCore does exactly that optimisation, and I believe other implementations do too. The DFG JIT (tier 3 in JSC) does type inference based on type guards. If you know the inputs to a trace (not necessarily a function; the DFG JIT works on a continuation-passing-style IR) are a particular concrete type (in JavaScript, typically double, though in some cases int32), you can then infer the output types and dispatch to the specialised version of the trace as long as the guards hold.
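          Guard-based type inference over a trace can be sketched like this (a toy illustration in Python rather than JSC’s actual IR; all names are invented):

          ```python
          # Hypothetical sketch: guard the inputs of a trace once; the
          # types of every intermediate value then follow by inference,
          # so the specialized trace needs no further checks.

          def trace_double(a, b):
              # Inputs guaranteed float by the guard below, so t1 and t2
              # are inferred float with no per-operation checks.
              t1 = a * b
              t2 = t1 + a
              return t2

          def trace_generic(a, b):
              # Fallback with full dynamic dispatch.
              return a * b + a

          def run_trace(a, b):
              # Type guard on the trace's inputs.
              if type(a) is float and type(b) is float:
                  return trace_double(a, b)
              return trace_generic(a, b)

          assert run_trace(2.0, 3.0) == 8.0  # specialized trace
          assert run_trace(2, 3) == 8        # generic fallback
          ```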

          These optimisations are based on ideas from Strongtalk (Animorphic Smalltalk) and Self, back in the ’80s and ’90s.