I’m gonna throw my hat in for another couple reasons to write a language as a personal project:
To build a language you want to use. I find it interesting to look at a problem and think, “if I had a completely custom tool, what would my ideal solution look like?” That solution might not look like how other people would solve it, but who cares; it’s just for yourself.
Having an interesting and technically challenging project that can scale up or down to other problems you want to solve. What I mean is, let’s say you try to build the next k8s that fixes all your issues with it. That’s an interesting and technically challenging problem, but it’s hard to “scale down”: the difficult bits come from running huge clusters, collaborating with other engineers, etc., and most people don’t have the resources or patience to deal with that. Building your own lang can apply to a breadth of problems, scaling down to “generate some static HTML” and potentially scaling up to “build your own k8s”. It can also make those smaller projects more fun and interesting, though it will inevitably involve some yak shaving.
computers are fast now, so you can also half-ass just about any part of the implementation and still have something functional
like, you don’t have to use optimal single-pass parsing algorithms. It all easily fits in RAM, and computers are fast enough that you can throw in some O(n*n) approaches and still be fine, leaving you time to focus on whatever part of it actually grabs your interest.
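For instance, a deliberately naive sketch (all names made up for illustration): name resolution that linearly scans the whole declaration list for every identifier use is quadratic in the worst case, yet still instantaneous at the scale a hobby language will ever see.

```python
# Deliberately naive name resolution: for each identifier use,
# linearly scan the whole declaration list. Quadratic in the
# worst case, yet trivially fast for hobby-language inputs.
def resolve(uses, decls):
    resolved = {}
    for use in uses:
        for decl in decls:  # O(len(decls)) scan per use
            if decl == use:
                resolved[use] = decl
                break
    return resolved

decls = [f"var_{i}" for i in range(2000)]
uses = list(reversed(decls))  # worst case: last declarations looked up first
table = resolve(uses, decls)
assert len(table) == 2000
```

That’s roughly two million comparisons, which a modern machine chews through in milliseconds; a hash table would be the “right” answer, but at this scale it genuinely doesn’t matter.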
This was a fun read. I love reading langdevs talk about where they’ve stubbed a toe or blown their own leg off. I saw the headline and thought it might link to this, which always makes me chuckle.
I wish we had a fix for the “nobody will ever use it / care about it” problem. It feels like in the mid/late aughts we were a lot more eager to experiment with the language and semantics layer for innovation; then all that capital investment / startup economics made it seem like experimentation of any kind was too risky. From the customer side, I’ve written a lot about how “Use Boring Technology,” like Agile before it, took initially decent advice and bastardized it into something that sucks.
But from the supplier side, I think of this talk by Evan Czaplicki, creator of Elm. While I never took to Elm, I think he does a great job describing why it’s hard, even when you have a corporate sponsor. There might be more room to explore how to mitigate these issues; the parent post’s line about “your language needs to be friends with a grown-up language” is wonderful.
This is pretty good, but the main thing it’s missing is a discussion of the difference between writing a language that sits on an existing runtime vs. one that is fully independent. The former is like… oh cool, fun side project. The latter is a tremendously huge amount of work; even if you succeed in getting code running, now you have to build an entire ecosystem from scratch, which will be close to impossible to do on your own.
Context: I’m the lead developer of Fennel, which is a compiler that targets Lua, and we’ve benefited immensely from having access to an existing VM and library ecosystem.
Yeah I definitely noticed this! There is kind of a split between what you can get done with and without funding (often monopoly funding), just based on sheer engineering effort.
Without funding, it’s definitely economical to “be friends with a grown-up language”, as the post says :-)
Professional and funded:
Java (Sun and Oracle, advanced GC)
Go (Google, advanced GC)
v8 (Google)
Dart (Google)
Swift (Apple)
Rust (Mozilla)
SpiderMonkey (Mozilla)
.NET (Microsoft)
LLVM / Clang (Google, Apple)
Erlang and BEAM VM (Ericsson - not sure how much this cost to make, but it’s interesting that there aren’t many successors after 3+ decades?)
Academic or “Hobby” projects, which are less funded:
Perl / PHP / Python / Ruby / Lua - these are huge achievements, but have less total engineering effort put into them than the above, which lean more toward “fast”
Clojure - reuses JVM
Scala - reuses JVM (although there is a company behind it IIRC)
Elm / Reason / many others - compile to JS
Elixir and Gleam - reuse the BEAM
Even big companies build on others
Kotlin - reuses JVM
TypeScript - compiles to JS
Deno - ecosystem on v8
Bun - ecosystem on JavaScriptCore
Possible exceptions:
GCC - the original open source compiler, which is very impressive in hindsight!
D language - this now looks more impressive, because they did all that work before LLVM!
Zig seems to be doing a huge amount of work, even moving toward their own back end
LuaJIT - this is an advanced compiler and GC done by a lightly-funded individual
OCaml - not sure where it falls, but it is impressive for being an academic project, having a sophisticated type system, and pretty good runtime and GC
Julia leverages LLVM, but it’s also super impressive given its (somewhat academic) origins, and has a unique ecosystem
In particular, writing a GC is super expensive. Successful GCs tend to turn into 10-20 year research projects.
I found that even JetBrains, a billion-plus-dollar company as far as I know, cannot necessarily justify funding a GC for Kotlin Native (the version of Kotlin that runs without a JVM).
I don’t remember where I read it, but this thread has some similar sentiments: https://old.reddit.com/r/Kotlin/comments/1br81tv/kotlin_native/
Pony is an interesting case. It started as an itch by an industry developer, was incubated as a PhD thesis project, and attracted some academic collaboration and industry interest, including some non-toy projects like Wallaroo… but then it sort of languished, for all the reasons (many non-technical!) that promising new languages struggle against industry incumbents. It’s performance-oriented and uses LLVM, with a C FFI. It punted on the GC issue by using boring mark-and-sweep, but in a clever way that works well with its actor semantics.
One thing that probably didn’t help Pony is that its capabilities system can be rather difficult to grok, combined with the documentation being not that great. I myself didn’t really grok it either until I started adopting bits of it for Inko.
TBH I’d put Python solidly in the “exceptions” camp; it’s the only top-5 language to have been created outside a major corporate backer. It’s definitely an outlier, and there’s not really anything about its success that has been replicated since the 90s.
Now I think I should have put Perl / PHP / Python / Ruby / Lua in a category of their own – they started from scratch, and have their own ecosystems. And they’re all independent projects, which didn’t originate with corporate funding!
And they’re not 1M-2M-line engineering efforts, which is how I think of Clang/GCC/v8/JVM/.NET, and they’re not statically typed (which, like GC, has its own engineering challenges).
I’m biased toward Python, but I think Ruby and PHP are close. Ruby is close in the sense that people freely choose it, even though it’s not attached to some platform.
PHP is also nearly as popular as Python, from what I remember … for most of their lives, I think PHP was more popular.
Perl is also very impressive, although it seems like there was consensus that Perl 5 “topped out” in terms of language and implementation complexity. Python’s original implementation, by contrast, still seems to be going strong, and it has gained a ton of features in recent years too.
BTW my theory on why Python became so big is an early architectural decision – the narrow waist of PyObject*
Among other things, PyObject* supports operator overloading, and has an extensive C API. This led to NumPy, which led to a “monopoly” on machine learning frameworks – there is an interview with NumPy creator Travis Oliphant linked in that blog post.
If you look at the implementation of Perl/Ruby/PHP, I don’t think they could support such extensibility, certainly not as cleanly.
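To illustrate the mechanism (a toy Python-level sketch, not NumPy’s actual code): the uniform `__add__`/`__radd__` protocol, backed in C by the `nb_add` slot on every PyObject, is what lets a user-defined type slot into ordinary arithmetic expressions.

```python
# A toy array type overloading "+" via the same protocol (backed
# by PyObject's nb_add slot in C) that NumPy arrays use to take
# part in ordinary arithmetic expressions.
class TinyArray:
    def __init__(self, data):
        self.data = list(data)

    def __add__(self, other):
        if isinstance(other, TinyArray):
            return TinyArray(a + b for a, b in zip(self.data, other.data))
        return TinyArray(a + other for a in self.data)  # scalar "broadcast"

    def __radd__(self, other):  # lets "5 + arr" work too
        return self.__add__(other)

a = TinyArray([1, 2, 3])
b = TinyArray([10, 20, 30])
assert (a + b).data == [11, 22, 33]
assert (a + 5).data == [6, 7, 8]
assert (5 + a).data == [6, 7, 8]
```

Because the dispatch lives on the object itself, any library can define new numeric-like types without the interpreter knowing about them in advance; that uniformity is the “narrow waist” being described.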
It’s also impressive how little breakage there has been in Python. If you look at Python 0.9 or 1.0, it’s basically the same code. They just kept adding functionality for 30+ years without breaking very much (except for Python 3, which IMO was regrettable). And even the Python 3 interpreter is really the same interpreter; the string types are just different.
Personally I tend to refactor and revisit my code a lot, but Python just kinda ships stuff and mostly does it right the first time. (Compared with say JS and PHP, which have many more footguns / regretted design decisions)
That success also took a long time. Basically no one used Python 1.x in the 90s; it was Python 2, in the early 2000s, that started to both displace Perl (for Unix admin / text processing / web sundry) and snowball rapidly to its current successes and dominant niches.
“spec of a small language, or a small implementation of a language”
One thing I’ve noticed trying to implement Ada is that with a large mature language spec, it can be difficult to find a particular slice you can implement without pulling in everything. This has turned into many hours of reading the spec and throwing away implementations as I find things incompatible with what I’ve written. I did both parts of “Crafting Interpreters” and maybe I should have started with something much smaller, but I’m still having fun.
If you write in the same language you’re trying to interpret or compile, you can make some neat developer tooling for yourself, and you get a lot of example code to test your lexer/parser on for free. :)
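A sketch of that “free test corpus” idea (toy names, not any particular implementation): walk your own source tree and try to parse every file, collecting the failures.

```python
import tempfile
from pathlib import Path

def smoke_test_parser(parse, root, ext):
    # Every source file in the tree becomes a free parser test case.
    failures = []
    for path in sorted(Path(root).rglob(f"*{ext}")):
        try:
            parse(path.read_text())
        except SyntaxError as exc:
            failures.append((path.name, str(exc)))
    return failures

# Demo with a toy "parser" that only rejects unbalanced parentheses.
def toy_parse(src):
    if src.count("(") != src.count(")"):
        raise SyntaxError("unbalanced parentheses")

with tempfile.TemporaryDirectory() as d:
    Path(d, "ok.toy").write_text("(print hello)")
    Path(d, "bad.toy").write_text("(print hello")
    failures = smoke_test_parser(toy_parse, d, ".toy")

assert [name for name, _ in failures] == ["bad.toy"]
```

If the toolchain is self-hosted, `root` is simply the implementation’s own source directory, so the corpus grows automatically as the project does.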
Namespaces are hard.
I am on my third attempt at namespaces, and it’s looking like maybe, just maybe, I’ve got something workable this time.
Write tests and run the tests.
Write permanent instrumentation.
Making my own built-in functions and pragmas and hooking them into my testbed helps make a lot of internals testable. E.g., in my actual end-to-end driver tests, I can verify the current scope name of a place inside a package chain: pragma Verify_Scope_Named ("Outer.Mid.Inner");
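The pattern looks roughly like this (a hypothetical Python sketch of the idea, not the commenter’s actual Ada toolchain): pragma dispatch routes test-only pragmas to assertions against the analyzer’s internal state.

```python
# Hypothetical sketch: a semantic analyzer whose pragma handler
# lets end-to-end driver tests assert against internal state.
class Analyzer:
    def __init__(self):
        self.scope_stack = ["Standard"]  # implicit root scope

    def current_scope_name(self):
        # Skip the implicit root when forming the dotted name.
        return ".".join(self.scope_stack[1:])

    def enter_scope(self, name):
        self.scope_stack.append(name)

    def handle_pragma(self, name, arg):
        if name == "Verify_Scope_Named":
            # Test-only pragma: fail loudly if the internal scope
            # chain doesn't match what the driver test expects.
            actual = self.current_scope_name()
            assert actual == arg, f"expected {arg!r}, got {actual!r}"

a = Analyzer()
for pkg in ["Outer", "Mid", "Inner"]:
    a.enter_scope(pkg)
a.handle_pragma("Verify_Scope_Named", "Outer.Mid.Inner")
```

Because the pragma executes inside a normal compile, the test exercises the real scope machinery rather than a mocked-up copy of it.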
Last year I worked through Crafting Interpreters, and building my own language that’s just for me is high on the list of projects to start this year. The r/ProgrammingLanguages subreddit is an excellent source of discussion and ideas for anyone who’s interested in this topic.
I started working on a programming language yesterday. What a coincidence!
Nice. I started writing a single-pass compiler for a custom language last Monday: https://github.com/scottjmaddox/single-pass-compiler/
Nice! Thanks for sharing.