1. 21

  2. 15

    PicoLisp is an interesting example of a purely interpreted language, though it doesn’t meet your request for a popular interpreter. It uses dynamic variable binding, which was historically a common feature of the directly-walking-sexpressions approach to interpreting Lisp.

    Emacs’s Elisp used to be implemented similarly, but now has a byte-compiler. Earlier than that, almost all Lisp and Scheme systems used to include a directly-walking-sexpressions interpreter for use at the REPL because compilers had too high latency to use interactively. But compilers have gotten fast enough that many systems have switched over to just using a compiler everywhere, to avoid having to maintain a separate interpreter. One that still has both an interpreter and a compiler is Chez Scheme, which is now open source (and like all Scheme interpreters, it does lexical variable binding). The interpreter has lost most of its reason for existence though: historically, the Chez business model was that the interpreter, Petit Chez, was free and cross-platform, while platform-specific native-code compilers were sold as products.

    1. 10

      Emacs’s Elisp used to be implemented similarly, but now has a byte-compiler.

      Though Emacs has a byte compiler, it’s not used exclusively. Nearly every Emacs setup runs a mix of interpreted and compiled code, so Emacs may contain the most widely-used non-bytecode interpreter in the world.

      1. 2

        This is very interesting to learn. Are there large pure PicoLisp projects? Do you happen to know what’s the last version of ELisp that was still interpreted? I’ve never looked but I’m guessing Emacs is mostly written in ELisp?

        Edit: Ok, so I just looked at some PicoLisp source for PilBox and it seems to me that the call graph is pretty shallow. That is, functions in these libraries call primitives but not each other very much. That and using its Java FFI for Android is really cool! But doesn’t quite help with examples of what I’m trying to find (tricks to get around the speed issue from nesting in direct AST interpretation).

          1. 1

            On a related note, femtolisp was used in the Julia compiler. They rewrote Julia mostly in Julia, so I’m not sure how much femtolisp is in there now.

            1. 2


              even though many primitives (e.g. filter and for-each) are written in the language instead of C.

              This sounded promising but then it went on to

              femtolisp uses a bytecode compiler and VM

              which is unfortunate for the current question. Otherwise, Julia should have certainly been a large enough example of a femtolisp program.

              1. 1

                I seem to recall that femtolisp was a straight up interpreter at one point in time, but I think I may be confusing that with the tiny implementation, which I’d guess is less used than femtolisp itself, and therefore purely academic in nature. :)

              2. 2

                On a related note

                I’m not sure how this is at all related, to be honest. :)

                1. 1

                  Yeah, I read it too quickly or forgot the no-bytecode requirement by the time I replied. My bad. At least the author found the link interesting.

          2. 11

            Awk and the various members of the Unix shell family?

            1. 6

              Yup, I’ve looked at 5-10 shell implementations, and all of them are essentially AST interpreters + on-the-fly parsing/evaluating.

              Just like Ruby was an AST interpreter that recently became a bytecode interpreter, R made the same jump recently:

              A byte code compiler for R


              1. 1

                Do you happen to know how recently they’ve switched? (And an example of a large project written purely (or almost purely) in that language?) Thanks.

                1. 5

                  Ruby switched in 2007; I used it for a few years before that and the speedups were quite dramatic. See https://en.wikipedia.org/wiki/YARV for a bit more info.

                  1. 1

                    Cool! That links to some benchmarks which saves me from trying to do them.

                    Looks like it’s only about 4x on average which is hopeful (for having no bytecode).

                  2. 3

                    It looks like it happened around R releases 2.13 and 2.14, which was around 2011. R is from the early/mid ’90s, similar to Ruby, and I think they both switched to bytecode compilers at around the same time.



                    R is for data analysis, so it’s hard to say what a “large project” is. It’s also a wrapper around Fortran numeric libraries in the same way Python is a wrapper around C (or NumPy is a wrapper around Fortran.)

                    There are thousands of libraries written in R:


                    1. 1

                      Thanks! This actually looks like a pretty good example. That website even lists package dependencies so I can try to find a long chain of pre-2011 dependencies.

                2. 4

                  Unix shell, for sure. GNU awk and mawk compile to bytecode. Not sure about nawk, though the regexp implementation in nawk uses bytecode.

                  1. 2

                    GNU awk and mawk compile to bytecode

                    Really? That seems weird, since gawk is slower than both nawk and mawk. Or is it because of the features the GNU project added, that it’s overall slower?

                    1. 6

                      One big difference is that gawk operates on sequences of Unicode characters, while mawk and nawk operate on sequences of bytes. If your text is all ASCII, setting your locale to ‘C’ will cause gawk to also operate on bytes, which should close at least some of the performance gap. But gawk’s Unicode support can be nice if you have UTF-8. mawk and nawk will often still produce the right result on UTF-8 input, especially when non-ASCII characters are only passed through unmodified. But sometimes you get:

                      $ echo $LANG
                      $ echo "ÜNICÖDE" | gawk '{print tolower($0)}'
                      $ echo "ÜNICÖDE" | nawk '{print tolower($0)}'
                      $ echo "ÜNICÖDE" | mawk '{print tolower($0)}'
                  2. 1

                    Thanks for the example. I don’t know how they work internally. Although to me, they are in the write-more-primitives-in-a-faster-language category. The primitives being the external commands called (maybe awk a bit less?).

                    Are there examples of multi-layered library (or use) for awk and shell?

                    Edit: typo

                    1. 2

                      What does “multi-layered push” mean?

                      As for awk, when I use it, I never shell out. If I need to shell out from awk, I just write the whole thing in Go or something. So I just use the primitives exposed by awk. Though, before Go, I shelled out of awk more often.

                      1. 1

                        multi-layered push

                        Oops, that’s the wrong words. That’s what I get for alt-tabbing between the comment window and a shell.

                        I mean something like calling a library which calls another library which calls another library, with all intermediate libraries written in awk (or shell). (Or the same thing with functions, if there are layers one on top of another. By this I mean functions in one layer treat functions in the previous layer as primitives.)

                        As for awk, when I use it, I never shell out.

                        That would be a good example then if you also have at least two layers of functions (although more would be nice).

                  3. 10

                    I think Tcl counts:

                    The way tcl works: a line of code is parsed once. Only once. Only ever once. Once, once, once. No exceptions at all. None. Zero. Zip. Nada. After substitutions are performed the first word becomes the command and all the other words become the arguments. Always. Always, always. Always.

                    Source: https://wiki.tcl.tk/12927

                    Sounds like an AST to me.

                    1. 2

                      Yes, Tcl is even worse than shell, because it dynamically parses code everywhere. Everything is a string, including code. This is useful for Lisp-ish metaprogramming, but it’s also inefficient:


                      The two main sources of performance problems in the current Tcl system (Tcl 7.5) are script reparsing and conversions between strings and other data representations. The current interpreter spends as much as 50% of its time parsing. It reparses the body of a loop, for example, on each iteration.

                      So really it’s not even an AST interpreter – it’s an interpreter that parses on the fly.

                      1. 14

                        Tcl has been on-the-fly bytecode compiled and internally typed since version 8.0, which was released twenty-one years ago.

                        1. 1

                          Yup, the “stringly typed” nature of Tcl made me think of it first. Like wrapping every line of code in eval :D

                      2. 8

                        Someone can correct me if I’m wrong but Ruby used to be interpreted up to 1.8.x, then 1.9 introduced a bytecode and internal VM.

                        Rails was created while 1.8 was still the stable version, and that remained true for a couple of years, so many Rails applications were running in production with a purely interpreted language.

                        1. 7

                          I’d triple-upvote this if I could, if only for pointing out that the term “interpreted language” is nonsense. Interpreting is a property of a language implementation, not a language!

                          1. 3

                            Woah! Totally off-topic but I’ve literally been researching atreus keyboards for the last hour!

                            1. 2

                              Cool; feel free to PM if you have any questions!

                              1. 2

                                Bought one! Will do thanks :)

                          2. 5

                            I question the assertion that the interpreting slowdown is exponential; whether it takes a few dozen cycles per statement (as in a bytecode interpreter) or a few thousand (as in a pure AST interpreter), the same number of statements would need to be executed, and so the slowdown is roughly linear no matter how deep the call stack gets. The exception to that would be if the interpreter were itself getting interpreted for a subroutine call, and I can’t see any reason that an implementation would do such a thing.
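                            To make the constant-factor point concrete, here’s a toy sketch (a hypothetical mini-language of my own, not any implementation discussed here) of the same program run by an AST walker and by a bytecode loop. Both dispatch once per operation, so the difference is the cost per dispatch, not anything that compounds with call depth:

```python
# Toy AST walker vs. bytecode loop for the same arithmetic program.
# Both dispatch once per operation; the AST walker just pays more per
# dispatch (recursion, pointer chasing), i.e. a constant factor.

def eval_ast(node, env):
    """Direct AST interpretation over tuples like ('add', l, r)."""
    op = node[0]
    if op == 'num':
        return node[1]
    if op == 'var':
        return env[node[1]]
    if op == 'add':
        return eval_ast(node[1], env) + eval_ast(node[2], env)
    if op == 'mul':
        return eval_ast(node[1], env) * eval_ast(node[2], env)
    raise ValueError(op)

def eval_bytecode(code, env):
    """Flat dispatch loop over (opcode, arg) pairs with an operand stack."""
    stack = []
    for opcode, arg in code:
        if opcode == 'PUSH':
            stack.append(arg)
        elif opcode == 'LOAD':
            stack.append(env[arg])
        elif opcode == 'ADD':
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif opcode == 'MUL':
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return stack[0]

# (x + 2) * 3 with x = 5, both ways:
ast = ('mul', ('add', ('var', 'x'), ('num', 2)), ('num', 3))
code = [('LOAD', 'x'), ('PUSH', 2), ('ADD', None), ('PUSH', 3), ('MUL', None)]
print(eval_ast(ast, {'x': 5}), eval_bytecode(code, {'x': 5}))  # → 21 21
```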

                            That said, one interesting place you might look for purely interpreted applications is in the Forth family; it’s fairly common to implement forth using some form of threaded code, which is effectively an AST modulo the fact that neither the S nor the T really apply to forth. You don’t see many programs written in forth directly (though apparently Chuck Moore has written a VLSI layout package), but postscript is fairly common and every implementation I’ve seen has been a pure interpreter.

                            Another possibility, also outside the realm of traditional languages, is TeX, which seems to be completely uncompileable (even to bytecode); I’d definitely consider LaTeX to be a large application written entirely in TeX.

                            1. 2

                              and so the slowdown is roughly linear no matter how deep the call stack gets.

                              You’re the first one to say anything about this! I need to think about this again. Right now I think you’re right. Hm, so the slowdown isn’t from where I think it is in that case. I hope I didn’t just waste more than a month on a false premise.

                              Also, then I’m surprised there aren’t more examples of direct AST interpretation if it’s “just” a constant factor.

                              Edit: Ok, so I think the issue is that the slowdown scales exponentially with the height of the call stack but that’s unaffected by whether bytecode or direct AST is interpreted. I’m trying to push primitives (like loops) out of the host language and have them written in L (a bit like Forth but it’s more interpreted than compiled). Now, instead of calling a primitive D directly, I call A which calls B which calls C which calls D, and if each of those has ~10 instructions in L then I get a 1000x slowdown. Is this reasoning right?

                              The exception to that would be if the interpreter were itself getting interpreted for a subroutine call

                              There are many nested layers but the interpreter itself isn’t interpreted (except possibly as a (very deliberate) test).


                              I guess I’m thinking of Forth as basically already bytecode since it’s a stack machine like most bytecode VMs. The same goes for Postscript although I don’t know the language.

                              TeX and LaTeX

                              That’s an interesting example. I don’t know how it works internally, just heard it described as a macro expansion language. I’m not sure I’d say that’s interpreted, though it’s definitely not compiled to either assembly or bytecode. So it would be an example of what I’m looking for, although I don’t think I can actually make use of this example.

                              1. 4

                                I think the way to look at it is that, as you push the veil back and write more and more of your logic in L, the slowdown applies to more and more of your execution time. It’s still not exponential; if a mixed program takes time t to run, and you have an interpretation overhead of o, the slowest you can possibly get by moving everything to interpretation is t*o and the fastest it can get is t/o. Those bounds could be tightened somewhat by considering the time that it spends in interpreted code vs. compiled code separately; take i to be the ratio of time spent in interpreted code. Then the worst case runtime would be t*i+(1-i)*t*o and the best case runtime would be t*(1-i)+i*t/o.

                                These bounds are difficult to think about intuitively though, because they look at if you moved everything to compiled code or interpreted code. So, instead of looking at the possible performance improvement or loss, let’s look at the actual performance relative to the best case of everything being compiled. If the proportion of interpreted code in your program is i, the slowdown relative to the best possible case will simply be i*o+(1-i)

                                It’s worth noting, though, that for the first few levels of pushing logic into your language, i grows (roughly) exponentially larger. The base has nothing to do with the interpretation overhead though; it’s simply the branching factor of your routines.
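                                As a sanity check on the model (with a made-up overhead figure, purely for illustration), the slowdown formula is easy to tabulate:

```python
# The model above, tabulated: i = fraction of time in interpreted code,
# o = interpretation overhead. Slowdown vs. an all-compiled baseline:
def slowdown(i, o):
    return i * o + (1 - i)

o = 30  # hypothetical AST-interpreter overhead, chosen arbitrarily
for i in (0.0, 0.25, 0.5, 1.0):
    print(i, slowdown(i, o))
# Linear in i: 1x when nothing is interpreted, the full 30x when
# everything is -- never exponential in call depth.
```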

                                1. 1

                                  Thanks! Your comments here should be all the way at the top!

                                  If the proportion of interpreted code in your program is i, the slowdown relative to the best possible case will simply be i*o+(1-i)

                                  Yes, I agree, and the all-compiled program is a better baseline. This means that I should get a factor of o at worst, even if it’s all AST interpreted.

                                  It’s worth noting, though, that for the first few levels of pushing logic into your language, i grows (roughly) exponentially larger.

                                  Hm, I’m wondering if this is what I’ve been seeing and led me to the incorrect exponential conclusion.

                                  Still, that means both direct AST interpretation and layering will result in “only” a constant factor slowdown? And it’s also “only” a constant factor to go from bytecode to native assembly.

                                  In general, is there somewhere I can run my reasoning (like this one) by some people for flaws? Other than posting on lobste.rs, that is. I’ve told some others and no one had seen the problem. At the same time I’m re-doing the time bound estimates, I’m also estimating how much time I lost because of this :(.

                                  Ok, one more thing. If functions in level j only call functions in level j-1 (and level 0 is primitives of the host language), then the time for a level-j call is exponential in j, right? (And this happens regardless of whether the levels are (all) interpreted bytecode or direct AST.)

                                  And that means I should make calls in as low level as possible (instead of j-1) to make things faster. Except this goes in the wrong direction in terms of the abstraction I’m trying to provide (and potential polymorphism that would allow). For example, not using the object system would be faster than using it.

                                2. 1

                                  Is this reasoning right?

                                  I don’t think so.

                                  Perhaps think of it like this. A function can have more than one call site. Those call sites may have different stack depths. When executing that function, the interpreter will do the same work irrespective of what is on the stack.

                                  You may be thinking of nested loops? But that isn’t anything to do with how the program is run, but rather the algorithm which is implemented.
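                                    A toy illustration of this (a hypothetical tuple-based interpreter, counting dispatches only): the interpreter does the same work for a function body no matter how deep the call stack is when it runs, and each wrapper call adds only a constant, so depth costs linearly:

```python
# Dispatch-count sketch: the work to run a function body is independent
# of stack depth. 'leaf' costs the same whether called directly or
# through wrappers; each wrapper adds one constant dispatch.
dispatches = 0

def ev(node, funcs):
    global dispatches
    dispatches += 1
    op = node[0]
    if op == 'num':
        return node[1]
    if op == 'add':
        return ev(node[1], funcs) + ev(node[2], funcs)
    if op == 'call':
        return ev(funcs[node[1]], funcs)
    raise ValueError(op)

funcs = {
    'leaf': ('add', ('num', 1), ('num', 2)),  # body costs 3 dispatches
    'mid':  ('call', 'leaf'),
    'top':  ('call', 'mid'),
}

def count(entry):
    global dispatches
    dispatches = 0
    ev(('call', entry), funcs)
    return dispatches

print(count('leaf'), count('mid'), count('top'))  # → 4 5 6
```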

                                  1. 1

                                      Also, then I’m surprised there aren’t more examples of direct AST interpretation if it’s “just” a constant factor.

                                      You need to take into consideration the number of toy languages that make it to languages in production use, where that constant factor results in a significant savings in time and money. (Fewer servers, less developer downtime while tests run, etc.)

                                    1. 1

                                      I guess I’m thinking of Forth as basically already bytecode since its a stack machine like most bytecode’s VM. The same goes for Postscript although I don’t know the language.

                                      Hmm. Forths are typically pretty low level, and often run directly on hardware (though they often will use their own calling conventions, instead of the C one). The dynamism comes from the fact that words are “compiled” into a list of addresses to the other words, which can basically then be “jumped” to in succession. So, it’s actually slightly strange, and not really a VM with “a list of known operators that you switch on” in the traditional sense.
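                                        A rough sketch of that threading idea, transliterated to Python closures standing in for the machine addresses a real Forth would use (so purely illustrative, not how any Forth is built):

```python
# Threaded-code sketch: a word is "compiled" to a thread of references
# to other words; running it just calls each in turn -- no opcode
# switch, no AST.

def lit(n):
    """A word that pushes a literal."""
    return lambda stack: stack.append(n)

def dup(stack):
    stack.append(stack[-1])

def add(stack):
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)

def word(*thread):
    """Build a new word as a thread of existing words."""
    def run(stack):
        for w in thread:
            w(stack)
    return run

double = word(dup, add)  # roughly : DOUBLE DUP + ;

stack = []
word(lit(21), double)(stack)
print(stack)  # → [42]
```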

                                      1. 2

                                        Good point. Although it means Forths are even more compiled (and a less good example of what I’m looking for) than languages that only stop at the bytecode.

                                  2. 4

                                    I think the question shouldn’t be “are there common purely interpreted languages”, but “are there common pure interpreters for common languages”. I’m sure there have been pure interpreters for ruby, python, perl, js, all of them! The reason why they get replaced is speed, and I’m not really sure why you’d prefer a 100x-1000x slowdown for direct ast interpretation. Maybe for the dynamism?

                                    1. 2

                                      Came here to say something similar. Pure interpreters are a step in (one method of doing) language development, but there’s nothing that says they’re the last step. If a language becomes popular, and performance (runtime, memory, or otherwise) becomes a pain point, you can more or less guarantee a rewrite of the interpreter into a bytecode compiler/executor, a JIT compiler, or a straight compiler.

                                      I’m also curious to know why the author wants to find a pure interpreted language. If it’s AST dynamism, that’s something that lisp macros take care of handily, while still being able to compile at least somewhat. Perhaps as a learning exercise?

                                    2. 4

                                      It’s usually simpler to do a byte code interpreter. Many language constructs boil down to the same bytecode instructions, which simplifies the interpreter loop. And in C, bytecode interpreters are more intuitive to write and maintain. Most languages that aim to be portable are written in C/C++, with the notable exceptions of Go and Rust.

                                      AST-only interpreters are basically only easier to write in languages with pattern matching.

                                      And you’re underestimating how slow AST-based interpreters are. AST nodes tend to be dynamically allocated, which means way more cache misses. I cannot stress enough how ridiculously slow memory is compared to CPU these days. Lots of fully compiled applications are lucky to average even 1.0 instruction per cycle (IPC). Intel has been making 4 and 5 IPC cores for years.

                                      1. 2

                                        Yeah one thing I realized is that bytecode interpreters are more natural in C than AST interpreters, because they’re non-recursive. This simplifies error handling (and probably memory management to a degree). With an AST interpreter, you have to use longjmp in C or exceptions in C++ all over the place, and they both have downsides.

                                        1. 2

                                          This is a good, non-trivial observation one only gets from the experience of implementation. I did an AST interpreter some years ago; it is an awesome learning experience.

                                      2. 4

                                        Do languages that implement compiling to closures count? They aren’t strictly AST style interpreters at that point, but they also aren’t byte code interpreters, either. I mention it, because I’ve successfully sped up some direct AST walking templating languages in the past with this technique. So, I’d nominate “templating languages” as a potential consideration.
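                                          For anyone unfamiliar with the technique, a minimal sketch (assuming a tuple-based AST of my own invention): the AST is walked once to build nested closures, so repeated evaluation pays no per-node dispatch on node types:

```python
# Minimal "compile to closures": walk the AST once, building nested
# closures; later evaluations are plain closure calls, with the
# node-type dispatch already paid at compile time.

def compile_node(node):
    op = node[0]
    if op == 'num':
        n = node[1]
        return lambda env: n
    if op == 'var':
        name = node[1]
        return lambda env: env[name]
    if op == 'add':
        left, right = compile_node(node[1]), compile_node(node[2])
        return lambda env: left(env) + right(env)
    if op == 'mul':
        left, right = compile_node(node[1]), compile_node(node[2])
        return lambda env: left(env) * right(env)
    raise ValueError(op)

# Compile once, run many times with different environments:
prog = compile_node(('mul', ('add', ('var', 'x'), ('num', 2)), ('num', 3)))
print(prog({'x': 5}), prog({'x': 10}))  # → 21 36
```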

                                        1. 1

                                            Very interesting! I wasn’t aware of this technique so I’ll have to read that before I can really tell. What’s an example of a language that uses it (other than their Scheme)?

                                          On the practical side, it will depend on how much extra is needed to make use of this method (compared to adding compilation to bytecode).

                                          1. 1

                                            I’m not aware of any real programming languages using this technique (at least not in general use), but once upon a time I added this technique to toffee, and I experimentally added it to one of the Python mustache implementations (but never finished/upstreamed it), with promising initial results.

                                            Conceptually it’s much simpler than compiling to bytecode, I think, so it’s the thing I always reach for when performance of a little AST evaluator matters. I’ve written several small scale interpreters this way for things like evaluation of basic “rules” and, more recently a prototype JSON document query / merger thingy.

                                        2. 3

                                          Another example that might not completely count: Mono re-added its interpreter; while it does have a JIT/AOT mode it primarily uses, the interpreter is useful in contexts like reflection on restrictive platforms (cough, iOS) or the cases where compiler latency causes something to be slower than just interpreting the IL.

                                          1. 2

                                            The author (@asrp) writes:

                                            But a (bytecode) interpreted language already has a speed penalty so why not just make that a bit greater and allow simpler internals? I’d be happy with a 100x slowdown (compared to, say C) to get direct AST interpretation in a dynamic language.

                                            Why? The amount of energy consumed by execution time for any common language is astounding, if you can improve average performance for a language like Python or Ruby by 10%, you are talking about massive energy savings (and there are lots of other related benefits). Compilers have been around for a long time for a reason – you can speed up code a lot by doing transformations ahead of time (you can also work at higher levels of abstraction, rule out unsafe patterns, etc). Are you opposed to any compilation, or just ones that transform between very different ASTs? What about if they just eliminate forms? (i.e., desugaring) What about arithmetic simplification? Inlining? Should DSLs no longer be implemented (they are little compilers)?

                                            Having reference interpreters with the simplest possible implementation makes sense as specification, but ruling out having any faster implementation for real execution seems… bizarre.

                                            1. 1

                                              K doesn’t have anything intermediate; it executes the “ast” directly. It’s also widely accepted as the fastest language.

                                              1. 4

                                                It’s also widely accepted as the fastest language.

                                                Citation needed. It might be the fastest of the APLs (but I don’t know), but surely the language that K is implemented in is faster?

                                                1. 4

                                                  but surely the language that K is implemented in is faster?

                                                  While that’s certainly a true statement, it’s all about perspective of what “fastest” means, and how much effort it takes to get there, right? Turing completeness shows that all Turing complete languages can compute the same set of programs. But, while I can write a Brainfuck program to evaluate Lisp, it’s not going to be very fast.

                                                  As such, the complexity of implementing, in C say, the operations that K makes available at a much higher level, leads to a code size to speed ratio that vastly exceeds the C equivalent, or Assembly equivalent, or other low language equivalent. This likely holds even if you use the underlying K primitives directly.

                                                  This notion, plus your typical benchmark suite, is likely what contributes to the “fastest language” claim (but citation still needed), and it’s pretty easy to see why that might be. While the benchmarks game requires that all implementations use the same algorithm, K (which isn’t in the shootout, so this is somewhat moot) probably parallelizes the crap out of array multiplication and anything else it can, which might be an instrumental part of many of the benchmarks. But because that’s part of the “core” feature set of the language, you don’t have to spend 1,000+ lines of code to build an efficient, generalizably parallel set of array routines that are super suspicious, and “unfair” when you include it in your language’s solution…

                                                  1. 4

                                                    Citation needed

                                                    The most complete benchmarks are probably the ones published by STAC. I like them because they publish details about the hardware, and deal with (contributed) real-world problems, rather than toy problems like “the benchmarks game”

                                                    surely the language that K is implemented in is faster?

                                                    K is implemented in C, but kdb+ also beats out a lot of other databases (Oracle, DB2, Impala, Mongo) which are also all written in C. It’s difficult to say that C is “faster” than K in this situation, since it’s also clearly slower.

                                                    One way that K often beats C is by being an interpreter. The source code and an interpreter can almost always be smaller than directly writing the solution in a lower level language, but that small size means your program neatly fits into a modern CPU’s L1 cache. Getting on-chip can mean a 1000x improvement in speed simply because main memory is that slow. That means you might (with enough time) beat a K solution by writing C implementations of what K is doing, but in many of those cases you’d just end up with K (or something just like it).

                                                2. 1

                                                  I think Io (http://iolanguage.org/) used to be purely interpreted, but the last time I had a look at it was 4 years ago (and I don’t think it counts as popular, though I don’t know how popular it is these days)

                                                  1. 1

                                                    Of the several Brainfuck implementations I have read, only one wasn’t purely interpreted. Some don’t use ASTs. Brainfuck is “popular enough” but you probably meant “popular enough and practical enough” :)