1. 31

  2. 1

    So, could you do this from the other end with a scheme-like syntax for Haskell itself?

    You’d have to implement the macro language from scratch I guess.

    1. 12

      You would lose a lot of expressive power if you just tried to re-use the GHC typechecker. I’ve spent a lot of time explaining why this is the case in the /r/haskell thread for this blog post, so I will not re-explain here. Instead, I’ll reproduce part of that discussion.

      Okay, I’ve been sort of avoiding getting into this particular discussion because it’s really complicated, but it seems like a point of confusion, so let me try and clear a couple things up.

      First of all, GHC is not a single thing, obviously. It has a lot of different pieces to it: it has a parser, a typechecker, a desugarer, an optimizer (itself composed of many different pieces), and a set of backends. When you say “reuse GHC”, you have to be specific about which pieces of GHC you are talking about. Obviously, if you just reuse all of it, then you just have Haskell, so presumably you mean one of two things: reusing the typechecker or reusing the optimizer and backends. It’s possible you also mean both of those things, in which case the compilation strategy would basically just be “compile to Haskell, then run the whole compilation pipeline”.

      However, let me make this perfectly clear: based on the way Hackett works, Hackett cannot reuse the GHC typechecker. The typechecking algorithms are fundamentally incompatible. If you are advising reusing GHC’s typechecker implementation, then the answer is “no, it can’t be done, no buts, full stop”. Why? Well, again, it’s the thing I keep referencing and quoting; Hackett requires typechecking to be interleaved with macroexpansion, but GHC’s typechecking algorithm is a whole-program analysis. These are incompatible ideas.

      GHC’s current typechecking algorithm is obviously wildly different from classic Hindley-Milner, but it keeps the general technique of generating a big bag of constraints and solving them at appropriate times (generally just before generalization). This technique has some really good properties, but it also has some bad ones. The good properties are that it provides fantastic type inference for basically all programs, and it does not impose any particular program order since it is such a global transformation. However, the downsides are that error messages can be frustratingly nonlocal and that it requires a full-program traversal to know the types of anything at all.

      For Haskell, this isn’t so bad. But what does it mean for macros? Well, keep in mind that a macro system wants all sorts of useful things, like the ability to inspect the type of some binding in order to direct its expansion. You can see this yourself in a highly limited form with Template Haskell, which has the reify and reifyModule. Of course, Template Haskell is not designed to be nearly as expressive as a macro system, but it still imposes severe constraints on the typechecker! From the section of the GHC Users Guide on Template Haskell:

      Top-level declaration splices break up a source file into declaration groups. A declaration group is the group of declarations created by a top-level declaration splice, plus those following it, down to but not including the next top-level declaration splice. N.B. only top-level splices delimit declaration groups, not expression splices. The first declaration group in a module includes all top-level definitions down to but not including the first top-level declaration splice.

      Each declaration group is mutually recursive only within the group. Declaration groups can refer to definitions within previous groups, but not later ones.

      Accordingly, the type environment seen by reify includes all the top-level declarations up to the end of the immediately preceding declaration group, but no more.

      Unlike normal declaration splices, declaration quasiquoters do not cause a break. These quasiquoters are expanded before the rest of the declaration group is processed, and the declarations they generate are merged into the surrounding declaration group. Consequently, the type environment seen by reify from a declaration quasiquoter will not include anything from the quasiquoter’s declaration group.

      These are serious restrictions, and they stem directly from the fact that GHC’s typechecking algorithm is this sort of whole-program transformation. In Hackett, every definition is a macro, and macro use is likely to be liberal. This restriction would be far to severe. To combat this, Hackett uses a fundamentally different, bidirectional typechecking algorithm, very similar to the one that PureScript uses, which allows the implementation of a Haskell-style type system without sacrificing modularity and incremental typechecking.

      My implementation of this type system has been remarkably successful given the novelty of the implementation and the amount of time I have spent on it, not least in part due to the availability of the PureScript source code, which has already solved a number of these problems. I don’t think that there’s reason to suggest that getting a large set of useful features will be difficult to achieve in a timely manner. The key point, though, is that the easy solution of “just call into GHC!” is a non-starter, and it is a dead end just for the reasons I mentioned above (and I haven’t even mentioned all the myriad problems with error reporting and inspection that sort of technique would create).

      Okay, so using GHC’s typechecker is out. What about reusing the optimizer and compiler? Well, this is actually something I do want to do! As far as I know, this should be completely feasible. It’s a lot more work than just compiling to the Racket VM for now, though, since the Racket VM is designed to be easy to compile to. In general, I want to support multiple backends—probably at least Racket, GHC, and JavaScript—but that is a big increase in work and complexity. Building for the Racket ecosystem to start gives me a trivial implementation with acceptable speed, an easy ecosystem of existing code to leverage, a host of useful built-in abstractions for building language interoperation, a fully-featured IDE that automatically integrates with my programming language, and an extensible documentation tool that can be used to write beautiful docs to make my new programming language accessible. Building a new language on the Racket platform is an obvious choice from a runtime/tooling point of view, it’s only the typechecker that is a lot of work.

      So, to summarize: reusing the typechecker is impossible, reusing the compiler optimizer/backend is feasible but extra work. If you have any additional suggestions for how I could take advantage of GHC, I’d love to hear them! But hopefully this explains why the simplest-looking route is not a viable choice for this project.

      — me, on /r/haskell

      Here’s some more context about what that additional expressive power actually is, from another part of the thread:

      I’m particularly interested about the metaprogramming aspect. At which point are macros run? In particular, can I get access to type info in a macro? That would allow implementing things like idiom brackets as a macro which would be pretty cool.

      — cocreature, on /r/haskell

      This is a great question, and it’s absolutely key to the goal of Hackett. Hackett macros are run at compile-time, obviously, but importantly, they are interleaved with typechecking. In fact, it would probably be more accurate to say that typechecking is subsumed by macroexpansion, since it’s the macros themselves that are actually doing the typechecking. This technique is described in more detail in the Type Systems as Macros paper that Hackett is based on.

      This means that yes, Hackett macros have access to type information. However, the answer is really a little trickier than that: since the Haskell type system is relatively complex but does not require significant type annotation, sometimes types may not be known yet by the time a macro is run. For example, consider typechecking the following expression:

      (f (some-macro (g x)))

      Imagine that f and g both have polymorphic types. In this case, we don’t actually know what type g should be instantiated to until some-macro is expanded, since it can arbitrarily change the expression it is provided—and it can even ignore it entirely. Therefore, the inferred type of (g x) is likely to include unsolved type variables.

      In many cases, this is totally okay! If you know the general shape of the expected type, you can often just introduce some new type variables with the appropriate type equality relationships, and the typechecker will happily try to solve them when it becomes relevant. Additionally, many expressions have an “expected type” that can be deduced from user-provide type annotations. In some situations, this is obvious, like this:

      (def x : Integer
        (my-macro (f y)))

      In this case, my-macro has access to the expected type information, so it can make decisions based on the expectation that the result expression must be an Integer. However, this information can also be useful in other situations, too. For example, consider the following slightly more complicated program:

      (def f : (forall [a] (SomeClass a) => {(Foo a) -> a}) ...)
      (def x : Integer
        (f (my-macro (g y)))

      In this case, we don’t directly know what the expected type should be just by observing the type annotation on x, since there is a level of application in between. However, we can deduce that, since the result must be an Integer and f is a function from (Foo a) to a, then the expected type of the result of my-macro must be (Foo Integer). This is a deduction that the typechecker already performs, and while it doesn’t work for all situations, it works for many of them.

      However, sometimes you really need to know exactly what the type is, and you don’t want to burden users with additional type annotations. Typeclass elaboration is a good example of this, since we need to know the fully solved type of some expression before we can pick an instance. In order to solve this problem, we make a promise to the typechecker that our macro’s expansion has a particular type (possibly in terms of existing unsolved type variables), and the typechecker continues with that information. Once it’s finished typechecking, it returns to expand the macro, providing it a fully solved type environment. This is not currently implemented in a general way, but I think it can be, and I think many macros probably fit this situation.

      In general, this is not a perfectly solvable problem. If a macro can expand into totally arbitrary code, the typechecker cannot proceed without expanding the macro and typechecking its result. However, if we make some restrictions—for example, by weakening what information the macro can obtain or by restricting the type of a macro’s expansion—we can create macros that can implement many different things while still being quite seamless to the user. I think implementing idiom brackets should not only be possible, but it should probably be a good test at whether or not the implementation is really as powerful as I want it to be.

      For a little bit more discussion along these lines, see this section of a previous blog post.

      — me, on /r/haskell

      1. 1

        Right, so the short version is: yes, but a naive implementation would be impoverished by comparison because in Hackett macro expansion is tightly integrated with type checking which means that macros have access to type information at expansion time which enables richer macro definitions.

        If you rewrote the Haskell frontend to do that, then you’d have to re-write the type checker along the way in order to end up with something that looked a lot like the existing Hackett compiler.

        I guess you’d also have to deal with all the questions about how the macro expansion would integrate with the endless extensions to the Haskell type system. Not a small task!

        I’ll look forward to seeing more of Hackett in the future!

      2. 2

        Reading her prior posts on the subject, she talks about the tools the racket ecosystem provides for creating programming languages. So I’m guessing if she ever does implement it in haskell (for instance, make it easier to import haskell libraries) she’ll have to wait until she’s gathered a few more helping hands.

        1. 4

          (The author is not a he – as per her twitter, https://twitter.com/lexi_lambda, she identifies as she/her).

          1. 2

            My bad, I’ll correct it!

        2. 1

          haskell has template haskell, which is it’s version of macros/metaprogramming, so it might not necessarily need to be done from entirely from scratch.

          1. 1

            Sure, but template Haskell’s syntax is legendarily awkward compared to the equivalent macro in lisp / scheme. I dread to think what implementing a macro language in Template Haskell would look like.

            But maybe I’m overthinking it :)