1. 18

From 2007, but still quite relevant in its use of PEGs/packrat parsing to compose very different grammars into a single language. The runtime-mutability aspect is also a fairly major departure from the traditional offline-compiler approach.

This example from the documentation sums up how powerful it can be:

import "fortran.kat";
import "python.kat";

fortran {
    SUBROUTINE RANDOM(SEED, RANDX)
        INTEGER SEED
        REAL RANDX
        SEED = 2045*SEED + 1
        SEED = SEED - (SEED/1048576)*1048576
        RANDX = REAL(SEED + 1)/1048577.0
        RETURN
    END
}

python {
    seed = 128
    randx = 0
    for n in range(5):
        RANDOM(seed, randx)
        print randx
}

  2. 3

    Very interesting. In Katahdin, can the grammar modifying syntax itself be modified?

    I’ve been experimenting quite a bit with this idea in multiple projects recently. One that probably isn’t useful is a grammar REPL. I haven’t pushed it yet, but here’s a quick hack. Each subsequent group of lines is parsed using rules accumulated from the previous lines (= overwrites a rule instead of defining an alternative).
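    A minimal sketch of that rule-accumulation idea (all names here are invented for illustration, not taken from the actual hack):

```python
# Hypothetical sketch: each group of input lines is parsed with the rules
# collected so far. A plain definition adds one more alternative to a rule;
# "=" (overwrite=True here) replaces the rule outright.

class Rules:
    def __init__(self):
        self.rules = {}  # rule name -> list of alternative bodies

    def define(self, name, body, overwrite=False):
        if overwrite or name not in self.rules:
            self.rules[name] = [body]      # "=": replace the whole rule
        else:
            self.rules[name].append(body)  # otherwise: add an alternative

g = Rules()
g.define("number", r"[0-9]+")
g.define("number", r"[0-9]+\.[0-9]+")            # second alternative
g.define("number", r"-?[0-9]+", overwrite=True)  # "=": overwrite instead
print(g.rules["number"])
```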

    I was also wondering how Katahdin managed to mix Fortran types with Python’s duck typing, but it looks like the Fortran types are just ignored? From the source:

    class Type
    {
        pattern
        {
            option buildTextNodes = true;
            type:("INTEGER" | "REAL")
        }
        
        method GetType()
        {
            if (this.type == "INTEGER")
                return System.Int32;
            else if (this.type == "REAL")
                return System.Double;
        }
    }
    
    
    class TypeStatement : Statement
    {
        pattern
        {
            type:Type name:Name
        }
        
        method Run(types)
        {
        }
    }
    

    Still pretty interesting and impressive.

    1. 2

      So are you using Python’s syntax (and its parser) to parse the grammars? That’s a very interesting idea and quite an original approach. I’d be interested in some examples, such as a JSON parser, to see how it compares to others in performance.

      1. 2

        Just realized that link expired. Here’s a permanent one.

        So are you using Python’s syntax (and its parser) to parse the grammars?

        No, I’m using my own PEG-based parser, pymetaterp. Here’s the gist of how it parses. It can parse Python but does not use Python’s parser pgen or pgen2.

        Unfortunately, it’s not that fast. The compiled versions are faster but probably still don’t compare to parsers built for a single language. (It’s “compiled” in the sense that it runs only one interpreter (Python) instead of two nested ones (Python + pymetaterp).)

        To test it on JSON, I’d need a grammar for JSON (and then either translate that grammar to the syntax of pymetaterp’s default grammar or write a parser for the grammar file’s syntax in pymetaterp).
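        For a sense of what that translation involves, here is a hedged sketch of a PEG-style JSON parser in plain Python (ordered choice with backtracking, one method per rule). This is not pymetaterp’s actual syntax or implementation, just an illustration:

```python
# Toy PEG-style JSON parser: each rule tries alternatives in order and
# backtracks on failure, as a PEG would. Escape handling is simplified.
import re

_FAIL = object()  # sentinel: an alternative failed to match

class JSONParser:
    def __init__(self, text):
        self.text, self.pos = text, 0

    def parse(self):
        v = self.value()
        self.skip_ws()
        if v is _FAIL or self.pos != len(self.text):
            raise ValueError("parse error at position %d" % self.pos)
        return v

    def skip_ws(self):
        while self.pos < len(self.text) and self.text[self.pos] in " \t\r\n":
            self.pos += 1

    def lit(self, s):
        self.skip_ws()
        if self.text.startswith(s, self.pos):
            self.pos += len(s)
            return True
        return False

    def value(self):
        # PEG ordered choice: try each alternative, restore position on failure
        for rule in (self.object_, self.array, self.string, self.number, self.const):
            saved = self.pos
            v = rule()
            if v is not _FAIL:
                return v
            self.pos = saved
        return _FAIL

    def object_(self):
        if not self.lit("{"):
            return _FAIL
        obj = {}
        if self.lit("}"):
            return obj
        while True:
            k = self.string()
            if k is _FAIL or not self.lit(":"):
                return _FAIL
            v = self.value()
            if v is _FAIL:
                return _FAIL
            obj[k] = v
            if self.lit("}"):
                return obj
            if not self.lit(","):
                return _FAIL

    def array(self):
        if not self.lit("["):
            return _FAIL
        items = []
        if self.lit("]"):
            return items
        while True:
            v = self.value()
            if v is _FAIL:
                return _FAIL
            items.append(v)
            if self.lit("]"):
                return items
            if not self.lit(","):
                return _FAIL

    def string(self):
        self.skip_ws()
        m = re.compile(r'"((?:[^"\\]|\\.)*)"').match(self.text, self.pos)
        if not m:
            return _FAIL
        self.pos = m.end()
        return m.group(1).replace('\\"', '"').replace("\\\\", "\\")

    def number(self):
        self.skip_ws()
        m = re.compile(r'-?\d+(\.\d+)?([eE][+-]?\d+)?').match(self.text, self.pos)
        if not m:
            return _FAIL
        self.pos = m.end()
        return float(m.group()) if (m.group(1) or m.group(2)) else int(m.group())

    def const(self):
        for s, v in (("true", True), ("false", False), ("null", None)):
            if self.lit(s):
                return v
        return _FAIL

print(JSONParser('{"a": [1, 2.5, true], "b": null}').parse())
```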

    2. 2

      Now what about a compiled language where the syntax is mutable at runtime, eh?

      1. 10

        Yeah, like Forth!

        1. 2

          Touché!

      2. 2

        …so it’s basically Rebol? Or am I missing something?

        1. 2

          I think it’s more like OMeta (by Warth and Piumarta, working with Alan Kay during their stint at VPRI)

          1. 2

            Yes, although Katahdin is about an executable AST, not just about parsing. Also, the fact that you can change the grammar in-source was not part of OMeta.

        2. 1

          This is a bit tangential, but I would love a language that encapsulates all other languages but also provides great tools for jumping between them.

          We’ve all had that situation where just throwing in a touch of Prolog into the middle of our Python program would be useful.

          Of course there’s huge semantic/runtime difficulties involved. The execution models could be totally different! But that’s why it would be valuable if something like this could exist.

          And no, C is not this, mainly because C itself is pretty miserable to use as a glue language (no built-in iteration primitives? Come on).

          1. 4

            Sounds to me like you’re describing Racket?

            1. 1

              Hmm… maybe this is the case; I haven’t looked into Racket’s multi-language abilities in a while.

              To be honest, it might be that just having the right set of high-level libraries in any language could give you what is needed here. For example, just having a library to spin up a JS env and send stuff over to it.

              But then you have stuff like timeouts to deal with, or concurrency, and a bunch of other complications from running an environment that is subordinate to another…

            2. 2

              I tried to build it before. It’s really hard. I plan to try again in the future now that we have great schemes for metaprogramming, intermodule communication, and linking. There will still be some overhead due to calls between incompatible languages. My old concept for that was to modify the languages or bytecodes to all pick one standard for data types and calling conventions. That’s hard, too.

              Just saying there are people thinking about or working on it. :)

              1. 2

                Isn’t .NET doing something like that (a shared calling convention and base data types)? I’d really like to see a low-level runtime that goes beyond memory allocation, function calls, and basic computation. Specifically, I think unified type identifiers with operations on them (is-subtype, is-like-type) and dynamic dispatch would be two building blocks that would support many languages.
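                A toy sketch of those two building blocks, unified type identifiers with an is-subtype check plus dispatch keyed on them (all names invented for illustration):

```python
# Hypothetical runtime building blocks: type identifiers forming a simple
# subtype chain, and a dispatch table that walks the chain so the most
# specific registered method wins.

class TypeId:
    def __init__(self, name, parent=None):
        self.name, self.parent = name, parent

    def is_subtype_of(self, other):
        t = self
        while t is not None:
            if t is other:
                return True
            t = t.parent
        return False

ANY = TypeId("Any")
NUMBER = TypeId("Number", ANY)
INT = TypeId("Int", NUMBER)

# dispatch table: keyed on type identifiers, not language-specific classes
methods = {NUMBER: lambda x: "number path", INT: lambda x: "int path"}

def dispatch(type_id, x):
    t = type_id
    while t is not None:           # walk up until a method is registered
        if t in methods:
            return methods[t](x)
        t = t.parent
    raise TypeError("no method for " + type_id.name)

print(dispatch(INT, 3))
```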

                1. 2

                  Yeah, I mentioned that in our other discussion: the prior art was OpenVMS standardizing calling conventions for multi-language development at the OS level, followed by Microsoft doing it in the .NET runtime. That Microsoft had acquired the OpenVMS core team always made me wonder if those were connected. As far as types, I know of some hardware approaches to doing stuff like that:

                  • The SAFE architecture, with more papers at crash-safe.org. It has a tagged architecture with a general-purpose metadata-processing unit (the PUMP) that can enforce all kinds of properties on memory and data.

                  • The HISC processor adds Java-style OOP operations. I can’t remember how much, though.

                  As far as software, the Ten15 system used strong typing across a whole system for multi-language use on the high-level side. The Flex machine supported its operations in hardware. Dynamic dispatch could have some overhead, though, that people with languages that don’t need it would want to avoid. They might not use the VM unless the overhead of such features could be disabled.

                  1. 2

                    Yes, I remember you mentioned OpenVMS; I really need to look into it now ;) I think augmenting pointers to data with metadata (type info, memory layout) and capabilities would be a key element in reaching this holy grail of a platform that allows for efficient runtime implementation of a wide range of languages. Both SAFE and Ten15 seem to favor that approach (although for different goals).

                    I have a toy project called libref, in C, that tries to formalize what this metadata looks like and what operations can be built with it. One of my inspirations was Pony’s capabilities, which provide a perspective on data that is completely orthogonal to its type. One of the challenges in libref is defining the proper set of capabilities and their relations.
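                    To make the capability idea concrete, here is a toy sketch (in Python for brevity, though libref itself is C) of references carrying a type tag plus a Pony-inspired capability. The capability set and its rules here are an invented subset, not libref’s or Pony’s actual semantics:

```python
# A reference pairs a value with metadata: a type tag and a capability.
# Operations consult the capability before allowing mutation or sharing.
# Toy lattice: iso (isolated), val (immutable), ref (mutable, thread-local).

CAPS = {"iso", "val", "ref"}

class Ref:
    def __init__(self, value, type_tag, cap):
        assert cap in CAPS
        self.value, self.type_tag, self.cap = value, type_tag, cap

    def write(self, new_value):
        if self.cap == "val":
            raise PermissionError("val references are immutable")
        self.value = new_value

    def share(self):
        # only immutable (or isolated, hand-waving the consume step) data
        # may cross a concurrency boundary in this toy model
        if self.cap == "ref":
            raise PermissionError("ref may not cross a concurrency boundary")
        return self

r = Ref([1, 2], "list", "ref")
r.write([3])                      # mutation allowed for ref
v = Ref((1, 2), "tuple", "val")
try:
    v.write((3,))
except PermissionError:
    pass                          # immutable, as intended
```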