1. 10

  2. 1

    I liked this, but it didn’t seem to mention performance at all. To me, correctness alone is not enough of a motivation to move to static typing, but performance is. And of course, if performance gets worse due to runtime checks, that would be a point against switching.

    Oh actually it does in section 6.4 but it has the downside I expected:

    Our current implementation is built on top of the straight-forward (naive) Kernan interpreter, which does not focus on performance, and adding additional dynamic evaluations as we have can only slow it further, as we would also expect for other implementations of the approach.

    In any case Python has the first class objects part, but not the pattern matching part. The Python typing module is kind of voodoo, but it has only “dumb” type objects. The mechanism lets MyPy do its job, but doesn’t let the user customize type checks (neither static nor dynamic).

    >>> import typing
    >>> typing.Dict
    typing.Dict
    >>> type(typing.Dict)
    <class 'typing.GenericMeta'>
    >>> typing.Dict[str, str]
    typing.Dict[str, str]
    >>> type(typing.Dict[str, str])
    <class 'typing.GenericMeta'>

    In other words, types are objects that can be composed with the [] operator using normal Python expressions.
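    The “dumb” part is visible at runtime too. In current Python (where typing.GenericMeta has since been replaced by typing._GenericAlias), the bare alias still answers isinstance by delegating to dict, but the subscripted form refuses dynamic checks outright, so there is no hook for user-customized checking:

```python
import typing

# The bare alias supports isinstance by delegating to plain dict...
assert isinstance({}, typing.Dict)

# ...but the subscripted form refuses dynamic checks entirely:
try:
    isinstance({}, typing.Dict[str, str])
except TypeError as e:
    print(e)  # subscripted generics cannot be used with class and instance checks
```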

    1. 2

      I decided to investigate performance in Monte, a distant relative of Grace. Like E and Grace, Monte has guard syntax which integrates this patterns-as-types philosophy. Broadly, Monte’s reference interpreter performs roughly like CPython.

      Here are three functions which implement the same exponential function, but which have different guards for signatures. The first function is like Python or other untyped languages, the second function is like Java or other simply-typed languages, and the third is like Idris or other dependently-typed languages.

      ⛰  def f(x) { return x.exponential() }
      Result: <f>
      ⛰  def g(x :Double) :Double { return x.exponential() }
      Result: <g>
      ⛰  def h(x :Double) :(Double > 0.0) { return x.exponential() }
      Result: <h>
      ⛰  def offset := Timer.unsafeNow() # to not trivially overflow
      Result: 1587487401.913163
      ⛰  def pf := repl.benchmark(fn { f(Timer.unsafeNow() - offset) })
      Result: <unsafely-printed promise>
      ⛰  def pg := repl.benchmark(fn { g(Timer.unsafeNow() - offset) })
      Result: <unsafely-printed promise>
      ⛰  def ph := repl.benchmark(fn { h(Timer.unsafeNow() - offset) })
      Result: <unsafely-printed promise>
      ⛰  pf * 1_000_000_000
      Result: 930.404663
      ⛰  pg * 1_000_000_000
      Result: 1741.385460
      ⛰  ph * 1_000_000_000
      Result: 30171.060562

      Times are in nanoseconds. We can see that there is a significant penalty to each guard invocation. But what if that invocation is amortized across the life of a value in a long-running computation? I’m going to skip over some imports, and use a guard-laden numeric kernel which I have been using as a benchmark for a while.

      ⛰  def entropy := makeEntropy(currentRuntime.getCrypt().makeSecureEntropy())
      Result: <entropy>
      ⛰  def noise := makeSimplexNoise(entropy)
      Result: <noiseMaker>
      ⛰  def f2(x) { return noise.noise(V(x, x, x)) }
      Result: <f2>
      ⛰  def g2(x :Double) :Double { return noise.noise(V(x, x, x)) }
      Result: <g2>
      ⛰  def h2(x :Double) :(-1.0..1.0) { return noise.noise(V(x, x, x)) }
      Result: <h2>
      ⛰  def pf2 := repl.benchmark(fn { f2(Timer.unsafeNow()) })
      Result: <unsafely-printed promise>
      ⛰  pf2 * 1_000_000
      Result: 310.521507
      ⛰  def pg2 := repl.benchmark(fn { g2(Timer.unsafeNow()) })
      Result: <unsafely-printed promise>
      ⛰  pg2 * 1_000_000
      Result: 311.253691
      ⛰  def ph2 := repl.benchmark(fn { h2(Timer.unsafeNow()) })
      Result: <unsafely-printed promise>
      ⛰  ph2 * 1_000_000
      Result: 356.986547

      Times are now in microseconds. Here we can see that, indeed, bigger computations wash out part of the cost of the guards. But there is still a clear performance difference between Double and Double > 0.0 or -1.0..1.0, which seems strange and wrong, yeah? This is because the former guard is built into the interpreter, and the latter guards are written in a prelude dialect of Monte which is interpreted. The JIT can only do so much; it cannot completely paper over the difference.

      But, at the same time, the JIT can clearly remove quite a lot of overhead for simple guards. And it could do better, were my hand-written templates and prelude better.
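      For readers without a Monte handy, the three guard styles above can be imitated in plain Python with hand-written assertions. This is my own analogy, not Monte’s actual guard machinery; it only mimics the checks each signature implies:

```python
import math

def f(x):
    # untyped, like f above: no checks at all
    return math.exp(x)

def g(x):
    # simply typed, like g's :Double guards: class checks on input and output
    assert isinstance(x, float)
    result = math.exp(x)
    assert isinstance(result, float)
    return result

def h(x):
    # dependently typed, like the (Double > 0.0) return guard:
    # a class check plus a value predicate
    assert isinstance(x, float)
    result = math.exp(x)
    assert isinstance(result, float) and result > 0.0
    return result
```

      Timing these three with timeit shows the same ordering of per-call overhead, though the absolute numbers depend on the runtime.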

      1. 1

        Hm that is pretty interesting. What is the algorithm to decide whether the checks are performed? i.e. what is the JIT?

        FWIW I’m coming at this from the perspective of my Oil project. It was dynamically typed and too slow, so I added static types to speed it up. I probably wouldn’t have added static types otherwise, since it was a bunch of work.


        After the whole program passes under mypy --strict, it can be semi-automatically translated to C++. This generally involves changing reflection/metaprogramming to textual code generation, and a few other changes.
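        As a toy illustration (my own example, not Oil’s code) of the style that survives mypy --strict and maps naturally onto C++: every parameter and return is annotated, there is no implicit Any, and optional values are explicit, so a translator can pick concrete C++ types.

```python
from typing import List, Optional

def split_words(line: str) -> List[str]:
    # fully annotated, so a translator can emit e.g. std::vector<std::string>
    return line.split()

def first_word(line: str) -> Optional[str]:
    words = split_words(line)
    if len(words) == 0:
        return None  # becomes a null/empty optional on the C++ side
    return words[0]
```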

        I haven’t written that much about translating it to C++, but there are a few details here:


        Interestingly Oil is a program that conforms to TWO type systems: the MyPy type system and the C++ type system. They are admittedly similar, but it does have the effect of making the program more abstract and divorced from the specifics of a particular language.

        Anyway I’m relatively pleased with this process, even though it’s not done. I would like to someday program in a dynamically typed language that allows gradual refactoring to static types – and unlocking C++-like performance. Both in terms of raw throughput but also predictability/low variance.

        1. 2

          We use RPython for our JIT generator. The RPython-generated JIT is a tracing metainterpreter. When the JIT sees that a value is, for example, a wrapper for a double-precision floating-point number, then the JIT remembers this fact by remembering the value’s class. This allows the JIT to then optimize the field accesses for the value. The JIT can also see when primitive guards check the classes of values, and it can remove repeated checks. The resulting JIT traces will check the types of values once near the top of each loop, and then remove any intermediate repeated checks.
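          The repeated-check removal can be sketched as a pass over a straight-line trace. This is a drastic simplification of what RPython actually does, and the names are mine:

```python
def eliminate_repeated_guards(trace):
    """Drop guard ops whose (op, operand) pair was already
    established earlier in the same straight-line trace."""
    proven = set()
    optimized = []
    for op, operand in trace:
        if op.startswith("guard_"):
            key = (op, operand)
            if key in proven:
                continue  # already checked on this path; remove it
            proven.add(key)
        optimized.append((op, operand))
    return optimized

trace = [
    ("guard_class", "p1:Double"),
    ("getfield", "p1"),
    ("guard_class", "p1:Double"),  # redundant: removed by the pass
    ("call", "exp"),
]
```

          In a loop, the surviving guards cluster near the top of the trace, which is why an amortized hot loop pays for its checks roughly once per iteration rather than once per operation.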

          This approach is still a little heavyweight and we don’t take the best advantage of it. Indeed, when I went back to check, I was a little surprised to find that the JIT didn’t even kick in for the REPL benchmarking; all of my numbers above are in interpreted code, and the JIT only removed work done by lower-level code. I can still give you an example. Here is a cleaned-up snippet from a JIT trace log:

          +7316: guard_class(p328, ConstClass(InterpObject), descr=…)
          +7335: p353 = getfield_gc_r(p328, descr=<FieldP typhon.nano.interp.InterpObject.inst_script 112 pure>)
          +7339: guard_value(p353, ConstPtr(ptr354), descr=…)
          +7352: p355 = getfield_gc_r(p328, descr=<FieldP typhon.nano.interp.InterpObject.inst_frame 96 pure>)
          +7356: guard_nonnull_class(p22, ConstClass(StrObject), descr=…)
          +7381: p358 = call_r(ConstClass(subCoerce), p22, descr=<Callr 8 r EF=5>)
          +7444: guard_no_exception(descr=…)
          +7459: guard_nonnull(p358, descr=…)

          The JIT debugging messages all indicate method calls; .run/2 means something like f(x, y). .coerce/2 is the atom for guards performing coercions. Here, we are looking up some interpreter-level value from an object’s frame, and binding it to another object’s frame. The .matchBind/3 operation has to look up a script and frame, and then we check with a .coerce/2 that the value in the frame has a StrObject class. After that, the receiving object seems to coerce it again, or perhaps something unrelated is coerced? We will never know, because the JIT has already removed those operations. So, in this section, two of three coercions were removed entirely.

          And I should note that we don’t emit good code. I am not a good compiler engineer and I know that we waste a lot of time and space doing things that we ought not to do. But I only spend enough time on compiler internals to make things fast enough to enable everything else.

    2. 1

      Sounds a lot like what dry-types in Ruby is doing.