1. 57
  1. 19

    This is nice, interesting work, but I’m uncomfortable with the hyperbole in the description. It sounds like a small amount of effort is devoted to correctness and clear specifications, and a large amount of effort is devoted to good communication (great!) and finding (improbable) blockchain applications (uh…).

    It’s great that there is hobbyist interest in optimal reduction outside academia. Academic researchers never quite managed to get good-looking performance numbers out of “optimal reduction” work (or other similar topics: geometry of intraction, etc.), and it was never quite clear whether it was due to fundamental issues with the work, or a lack of effort and interest in the lower-level aspects of language implementation. We are learning some new things with these new attempts. Being competitive with mature implementations is impressive.

    On the other hand, many things remain unsaid here, and I find it a little worrying:

    • What is exactly the reduction strategy here, is it clearly specified somewhere (HOW.md looks like a tutorial with several parts missing). Are the details of the expected behavior, to be checked against the implementation, written down in full somewhre?
    • What is actually the subset of programs that can be evaluated correctly using this strategy? Are all lambda-terms evaluated correctly, or only some of them? (Past work had restrictions to “elementary affine” stuff, etc.) Can we check whether a given program is in the fragment that will be correctly evaluated, either statically (before we execute the program) or dynamically (during execution with this runtime)?

    Note: the description of this works points out in many place that performance numbers are comparing a prototype (this work) with a mature implementation (GHC); it then suggests that this is impressive and indicates that an optimized implementation (like GHC is) could do even better. I think that this is a bit misleading: mature implementations also handle a lot extra features and concerns not covered in the prototype, and many of those degrade performances instead of improving them. (For example: to improve debuggability, have clear reports when failure happens, etc.)

    I’m a bit worried about the combination of interesting, valuable hobbyist research (the optimal evaluation) with blockchain stuff. The blockchain community has a tendency to make hyperbolic claims for the purpose of visibility, investment and speculation, that is not conductive to learning truths about software systems.

    1. 13

      To add to this, am a bit concerned about the emphasis on microbenchmarking while also not disclosing that the VM does not have a real allocator and it just mallocs 32 GBs of RAM.

      1. 2

        Avoiding memory allocation as a performance optimization

      2. 5

        I followed this work for a while. As far as I know, there is no specification for reduction strategy (other than the implementation itself, that is), no documentation on what is the important innovation, no words on restrictions. As I understand, elementary affine restriction is still there in some form, and HVM silently evaluates it incorrectly without any static or dynamic check. HVM’s position is that it’s not their problem, they plan to document the restriction, compilers should figure out how to check, and it has not been a problem in practice in their experience.

        I understand this is frustrating for someone in academia. But in my opinion, it’s academia’s job to solve this. What’s in it for Victor? Victor isn’t interested in solving this, and I think it is understandable. Interested researcher in academia should reverse engineer and figure it out by themselves. They also could, I mean, ask Victor. I think Victor won’t be hostile to such effort, but it’s not on Victor to do this.

        1. 9

          I understand this is frustrating for someone in academia. But in my opinion, it’s academia’s job to solve this.

          Just to be clear, I’m not suggesting that this work should be done following “the academic way”, in particuliar, by publishing research papers at journals/conferences. I certainly agree that this is not a requirement one should have on our software systems (although it’s of course great when it happens), and it is unnecessarily difficult unless one collaborates with someone familiar with the academic practices.

          The things that I mentioned (to summarize: “having a clear description of what this is doing”, and “having a clear understanding of what the limitations are”), are not academic pursuits, they are what anyone should do if you want to write any software that works well. If you don’t do this, the only reason to believe that this is not utterly broken (most software at this level of complexity is) is the trust one has in the author. Good software systems should no rely on this! There should be enough documentation around that anyone interested can, at reasonable cost, dive into the specification and correctness claims, and see for themselves if they believe that the design is solid and that the implementation matches it.

          Publishing papers would require this clarification work to be done first, but the reason to do it is not to publish papers / academia something, it is to build a system that works well, and will continue to work well as it evolves.

          At a more basic level: what is there in this system right now to ensures that it works correctly? To my knowledge: nothing. There is a bench repository in the code, and no test directory. It’s rather easy to think of low-effort ways to test this, in particular property testing (quickcheck-style) against a known-valid implementation of reduction (possibly very simple). But this relies on the fact that we know what the software should do, what are the expected correctness properties, otherwise what would we test exactly?

          “Academic success” certainly doesn’t need to be a priority for the author, but I think that “software that works well and can be trusted” should be. It should be a priority for everyone, but especially if your intended application domains are “a proof assistant” (as in previous iterations, Kind/Formality I believe?) or “a blockchain”… or whatever.

      3. 12

        This is seriously, seriously impressive. The whole time I was reading “HOW.md”, I kept thinking it was too good to be true. There has to be a catch. And there is, kind of?

        If a lambda that clones its argument is itself cloned, then its clones aren’t allowed to clone each-other.

        For example, this term is not allowed:

        let g = λf(λx(f (f x)))
        (g g)

        That’s because g is a function that clones its argument (since f is used twice). It is then cloned, so each g on the second line is a clone. Then the first clone attempts to clone the second clone. That is considered undefined behavior, and a typed language that compiles to HVM must check that this kind of situation won’t happen.

        Runtime undefined behavior that has to be detected and forbidden by a compiler. What I can’t tell from the documentation is: how difficult is it to statically prove that this can’t happen? How much of a burden will that be on compiler implementers? The documentation claims that the undefined behavior is a very rare situation that won’t happen unless you’re doing Deep Functional Magic, which is fine, so long as you can know it won’t happen in a given program.

        This is something I’m mildly concerned about, but it doesn’t cancel out my enthusiasm. HVM is an idea with a lot of potential, and I can already see a lot of ways to use it in my own projects.

        1. 4

          Terms like this seem to rely on impredicativity. So I think one easy way a type system could rule this out is by ruling out impredicativity altogether (as does Hindley–Milner, for example), but that would also rule out some useful terms like those described in [1].

          [1] Alejandro Serrano, Jurriaan Hage, Simon Peyton Jones, and Dimitrios Vytiniotis. 2020. A quick look at impredicativity. Proc. ACM Program. Lang. 4, ICFP, Article 89 (August 2020), 29 pages. DOI:https://doi.org/10.1145/3408971

        2. 9

          What’s the actual definition of beta-optimal? Searching gives a lot of medical and finance related things, but nothing that looks applicable here.

          1. 2

            yeah, “beta optimal” afaik, is not a term in the literature, it’s probably referring to what Jean-Jacque Levy called “optimal reduction”

            1. 3

              Yeah, I think it’s a minced named for the optimal beta-reduction of an expression.

              disclaimer: I’m not a qualified academic in computational theory or lambda calculus

              As I understand, Levy recognised that in terms of reduction (i.e. application) strategies, simply beta-reducing expressions may not always offer the most optimal form for minimising the number of reductions needed since it’s possible to have non-reducible terms with identical subexpressions. As such, that would mean performing the same computation multiple times when reducing with known values.

              The definition is a bit loose, but the optimal reduction can be thought of as the expression rewritten such that a graph of it’s evaluation contains a minimal number of nodes, which provides a different model for optimizing the computation done for an expression than, for example, a call-by-need strategy as used by Haskell, where identical expressions are made memoizable by the runtime and lazily produced as evaluated.

          2. 4

            The program above creates a perfect binary tree with 2^n elements and adds them up. Since it is recursive, HVM will parallelize it automatically.

            The program above runs in about 6.4 seconds in a modern 8-core processor, while the identical Haskell code takes about 19.2 seconds in the same machine with GHC. This is HVM: write a functional program, get a parallel C runtime. And that’s just the tip of iceberg!

            This doesn’t sound super fast, although if you’re a haskell user this is fantastic, assuming the difference isn’t all start up time. Am I missing something?

            And yes I’m on record as believing that security concerns of predictive execution will likely force us to models more like this.

            1. 8

              GHC programs don’t have start up time, and are also very fast (if well written of course, blah blah).

              1. 4
                1. 1

                  I don’t understand, how can you tell that an exponential program running in 6.4 or 19.2 second “does not sound super fast”? Can you make guesses at how long summing 2^30 leaves should take, or are you comparing with the same program written in another system (which one)?

                  1. 1

                    Well let’s say that this benchmark doesn’t impress me except relatively.

                2. 4

                  This example takes advantage of beta-optimality to implement multiplication using lambda-encoded bitstrings. Once again, HVM halts instantly, while GHC struggles to deal with all these lambdas. Lambda encodings have wide practical applications. For example, Haskell’s Lists are optimized by converting them to lambdas (foldr/build), its Free Monads library has a faster version based on lambdas, and so on. HVM’s optimality open doors for an entire unexplored field of lambda-encoded algorithms that were simply impossible before.

                  Probably my favorite part of the writeup.