This is not a complaint about how this is a repost. I’m happy to see it posted again. It’s one of the most interesting languages I’ve seen in a good long while. It’s beautiful.
Hey y’all, I was surprised to see this here today! I guess a few updates since the last time this was posted:
Currently we’re in the middle of a massive rewrite of the entire compiler pipeline. It currently lives in the big-refactor branch, and we’re trying to get it done by July to create the next release.
Since then I’ve done a lot of research and thinking on traits and effect systems, and think I have a neat unified approach that I’d like to start prototyping. We’ve also been working on the way Passerine integrates with Rust, and we hope to provide a safe sandboxed parallelizable runtime with a high-level API that integrates well with existing Rust libraries.
We’ve also been rewriting the macro system to allow for compile-time evaluation. This will be much more powerful lisp-style procedural macro system. Lisp is a powerful programming language for manipulating lists, which is why lisp macros, that operate on lists, fit the language so well. Passerine aims to be a powerful programming language for manipulating ADTs, so by representing the language as an ADT to be operated on by macros, we hope to capture the same magic and power of lisp’s macro system. (Note that the current rule-based macro system will still be available, just implemented in terms of the new one.)
The majority of the discussion around the development of the language happens on our Discord server[1]. We have meetings the first Saturday of each month with presentations on PLT, compiler engineering, and other neat stuff.
I’ve been working on an experimental BEAM-style runtime called Qualm that has a custom allocator that supports vaporization (the memory management technique Passerine uses, essentially tagged pointer runtime borrow checking.) I’m not sure it’ll be ready for the next release (as it requires type layouts to be specified, and the typechecker is currently WIP), but it is a nice demo for what I think is possible for the language.
I’m here to answer any questions you may have about the language! I’m based in Europe so I might not see them till tomorrow morning, but don’t be afraid to ask, even if you think it’s a silly question.
[1]: I tried starting a Matrix channel after people on lobste.rs brought it up last time. After a grand total of zero users had joined six months later, I went ahead and scrapped it. I love Matrix, so I might consider bridging the server in the future.
Sorry for the late response. I started writing something, but like 2k words later I realized it should probably be like a blog post, and not a random comment. I’ll summarize the gist of the argument, and link to the blog post later once I’ve finished it:
So, as I’m sure you know, Perceus is a form of reference counting (limited to inductive types) that drastically reduces the number of reference count instructions that are produced. This makes reference counting more efficient, but Perceus is still reference counting at its core.
Vaporization uses a single tag bit to keep track of whether a reference is currently immutable shared or mutable owned. This is a bit different than reference counting, as the number of shared references is not kept track of.
When passing an object reference to a function, we set the reference tag to immutable shared if the reference is used again later in the caller’s scope. If this is the last-use of a value, we leave it as-is, allowing for efficient in-place updates in linear code. To update an object, the reference we have to it must be mutable owned; if the reference is immutable shared instead, the runtime will make a copy of the portion required. All references returned from functions must be mutable owned; when a function returns, all other mutable owned references tied to that stack frame are deallocated.
If effect, a mutable owned reference is owned by a single stack frame; ownership can be passed up or down the call-stack on function call or return. When calling a function, we create a child stack frame that is guaranteed to have a shorter lifetime than the parent stack frame. Therefore, we can make as many immutable references as we’d like to data owned by parent stack frames, because all immutable references to that data will disappear when the child stack frame returns to the parent stack frame.
Both Perceus and Vaporization do not allow cyclic data structures to be created (without requiring users to manage circular references themselves). This operating assumption drastically limits the type of data structures that can exist (in terms of pointer graphs). Because object graphs only be trees rooted at the stack (e.g. anything trivially serializable to JSON), it’s very simple to keep track of when things should be pruned from the heap.
Someone much smarter than I am described Vaporization as “using the stack as a topological sort of the acyclic object graph.” I’m not sure whether this statement is fully correct, but I think it captures the gist of the idea.
So, to answer your question, here’s what Vaporizations does with respect to compile-time and runtime memory management:
At compile-time, we annotate the last use of every variable in a given scope. When generating code, if we encounter a non-last-use variable, we emit an instruction to set the tag of the reference to shared immutable. (We also ensure that all closure captures are immutable).
At runtime, we update reference tags as annotated at compile-time, and we create deep copies of (portions of) objects as required when converting references from immutable shared to mutable owned.
If we know type layouts, it’s possible to inline the code responsible for copying data and updating reference tags such that there is no runtime dependency. This makes Vaporization suitable for both static and dynamic languages alike.
Hope this helps!
PS—Oh, I see you’re the author of Roc! I remember watching your talk “Outperforming Imperative with Pure Functional Languages” a while back, it was quite interesting!
I’m interested in the run-time borrow checking idea. The part that makes sense to me is doing copy-on-write with refcounting: so you have pass-by-value semantics, but you can also do an in-place update when the refcount is 1.
But by “borrow checking”, do you mean you want to give the programmer a way to say: “destroy this value at the end of this scope; if I mess up and retain a reference to it, let me know”? As opposed to: “keep this alive as long as I have references to it”.
See my sibling answer for some more info on vaporization. We essentially use a single bit embedded in the pointer for the refcount, which can either be ‘1’ (mutable owned) or ‘more than 1’ (immutable shared).
All values not returned from a function and not passed in as parameters to a function will be destroyed at the end of a given scope. The only way to retain a reference to an object would be to return it, which makes it very hard to mess up and retain a reference to it.
While you’re open to big ideas, have you considered capability security?
One of the coolest things I’ve added to awesome-ocap lately is Newspeak, with a really cool module system:
Newspeak is an object-capability programming platform that lets you develop code in your web browser. Like Self, Newspeak is message-based; all names are dynamically bound. However, like Smalltalk, Newspeak uses classes rather than prototypes. The current version of Newspeak runs on top of WASM.
Either ‘pass-er-in’ (rhymes with ‘grin’) or ‘pass-er-ine’ (rhymes with ‘brine’) work. Usually shortens to ‘pas-rin’ in conversation. It’s not the most obvious word to pronounce, to the extent that I’ve considered changing it to something else.
I really like this, but don’t understand repeating the Result + Exceptions mistake from Haskell. Especially when the exceptions can be caught in a Result why not just return it to begin with?
Thanks! I will admit that some parts of Passerine’s README are out of date; my opinions with respect to exception handling have also changed a tad. I need to rewrite this section, so thank you for the reminder.
That being said, I don’t think Passerine repeats the Result + Exceptions mistake from Haskell. Haskell Results/Exceptions overlap in functionality, whereas Passerine’s Results and exceptions serve completely different use-cases:
Exceptions (using the Fatal effect) are used to denote broken invariants or programmer errors. Much more like panic or unreachable in Rust. Exceptions should be used to signal that some expected invariant is fundamentally wrong. Unlike Rust, these exceptions can be handled and converted into errors through the effect system to keep things running. In the spirit of Erlang: let the parts of the system crash, but ensure that all crashes are isolated and can be recovered from.
Results are used to denote expected outcomes, including errors. If you think that something could go wrong in the codepath, use a result. If you’re not sure that invariants are being upheld (e.g. for an array access thay may be out of bounds), use a function that returns a Result. If you ever encounter an exception in practice, you should find out where it occurred and either fix the code assuming incorrect invariants, or switch to using a Result type at the source (e.g. don’t catch an exception and turn it into a Result!)
The standard library will use Results except for in the most obvious of cases (e.g. division by zero, indexing past the end of an array, etc. In short, exceptions should never be raised in well-factored code; handling exceptions is a mechanism for building resilient and long-running systems even in the presence of exceptional events (e.g. bit flips, etc.), but should not be used for routine error handling.
Ah, that sounds great. Result + Panic is just fine of course. The readme example was of opening a file that didn’t exist which seemed a lot more like an expected case to handle than a panic, but maybe that’s part of what’s out of date?
Thanks for the kind words! The standard library would define three foundational modules:
kernel, which contains all possible system level effects (I/O, etc.) that must be handled by the host runtime
compiler, which defines Passerine compiler types (e.g. AST) that are used by the macro system.
core, which implements the core language (if, for, match, etc.) in terms of foundational language primitives, like tail-recursive functions and lazy && evaluation.
These modules are all part of the prelude, and included in other standard library modules, which are written in relatively-normal Passerine. Actually writing out and binding these libraries to the compiler pipeline is something we’re working on right now.
The vm implementation seems to be based on or at least influenced by one described in Crafting Interpreters - pratt parsing, up values and references to wren.
Yep, I’m a big fan of Nystrom’s past work, including Wren and Magpie. I found his chapters on upvalues and pratt parsing to be quite helpful and well-written while working on Passerine. The current VM is very simple (something like 17 instructions), and not the fastest (but still faster than a tree-walk interpreter); I’m currently working on a much lower-level runtime that should be able to compile to MiniVM IR or Wasm. Getting a rough working prototype out the door helped validate the design principles behind the language, and I hope that as the compiler matures we’ll be able to improve the performance and capabilities of the language.
This was posted before.
This is not a complaint about how this is a repost. I’m happy to see it posted again. It’s one of the most interesting languages I’ve seen in a good long while. It’s beautiful.
Previous discussion on lobste.rs. The creator, @slightknack, is also on lobste.rs!
Hey y’all, I was surprised to see this here today! I guess a few updates since the last time this was posted:
Currently we’re in the middle of a massive rewrite of the entire compiler pipeline. It currently lives in the
big-refactor
branch, and we’re trying to get it done by July to create the next release.Since then I’ve done a lot of research and thinking on traits and effect systems, and think I have a neat unified approach that I’d like to start prototyping. We’ve also been working on the way Passerine integrates with Rust, and we hope to provide a safe sandboxed parallelizable runtime with a high-level API that integrates well with existing Rust libraries.
We’ve also been rewriting the macro system to allow for compile-time evaluation. This will be much more powerful lisp-style procedural macro system. Lisp is a powerful programming language for manipulating lists, which is why lisp macros, that operate on lists, fit the language so well. Passerine aims to be a powerful programming language for manipulating ADTs, so by representing the language as an ADT to be operated on by macros, we hope to capture the same magic and power of lisp’s macro system. (Note that the current rule-based macro system will still be available, just implemented in terms of the new one.)
The majority of the discussion around the development of the language happens on our Discord server[1]. We have meetings the first Saturday of each month with presentations on PLT, compiler engineering, and other neat stuff.
I’ve been working on an experimental BEAM-style runtime called Qualm that has a custom allocator that supports vaporization (the memory management technique Passerine uses, essentially tagged pointer runtime borrow checking.) I’m not sure it’ll be ready for the next release (as it requires type layouts to be specified, and the typechecker is currently WIP), but it is a nice demo for what I think is possible for the language.
I’m here to answer any questions you may have about the language! I’m based in Europe so I might not see them till tomorrow morning, but don’t be afraid to ask, even if you think it’s a silly question.
[1]: I tried starting a Matrix channel after people on lobste.rs brought it up last time. After a grand total of zero users had joined six months later, I went ahead and scrapped it. I love Matrix, so I might consider bridging the server in the future.
I’m very curious about the differences between what Passerine does and what Perceus does in terms of both compile-time and runtime memory management!
(For context, I’m working on a programming language that currently uses Perceus.)
Sorry for the late response. I started writing something, but like 2k words later I realized it should probably be like a blog post, and not a random comment. I’ll summarize the gist of the argument, and link to the blog post later once I’ve finished it:
So, as I’m sure you know, Perceus is a form of reference counting (limited to inductive types) that drastically reduces the number of reference count instructions that are produced. This makes reference counting more efficient, but Perceus is still reference counting at its core.
Vaporization uses a single tag bit to keep track of whether a reference is currently
immutable shared
ormutable owned
. This is a bit different than reference counting, as the number of shared references is not kept track of.When passing an object reference to a function, we set the reference tag to
immutable shared
if the reference is used again later in the caller’s scope. If this is the last-use of a value, we leave it as-is, allowing for efficient in-place updates in linear code. To update an object, the reference we have to it must bemutable owned
; if the reference isimmutable shared
instead, the runtime will make a copy of the portion required. All references returned from functions must bemutable owned
; when a function returns, all othermutable owned
references tied to that stack frame are deallocated.If effect, a
mutable owned
reference is owned by a single stack frame; ownership can be passed up or down the call-stack on function call or return. When calling a function, we create a child stack frame that is guaranteed to have a shorter lifetime than the parent stack frame. Therefore, we can make as many immutable references as we’d like to data owned by parent stack frames, because all immutable references to that data will disappear when the child stack frame returns to the parent stack frame.Both Perceus and Vaporization do not allow cyclic data structures to be created (without requiring users to manage circular references themselves). This operating assumption drastically limits the type of data structures that can exist (in terms of pointer graphs). Because object graphs only be trees rooted at the stack (e.g. anything trivially serializable to JSON), it’s very simple to keep track of when things should be pruned from the heap.
Someone much smarter than I am described Vaporization as “using the stack as a topological sort of the acyclic object graph.” I’m not sure whether this statement is fully correct, but I think it captures the gist of the idea.
So, to answer your question, here’s what Vaporizations does with respect to compile-time and runtime memory management:
At compile-time, we annotate the last use of every variable in a given scope. When generating code, if we encounter a non-last-use variable, we emit an instruction to set the tag of the reference to
shared immutable
. (We also ensure that all closure captures are immutable).At runtime, we update reference tags as annotated at compile-time, and we create deep copies of (portions of) objects as required when converting references from
immutable shared
tomutable owned
.If we know type layouts, it’s possible to inline the code responsible for copying data and updating reference tags such that there is no runtime dependency. This makes Vaporization suitable for both static and dynamic languages alike.
Hope this helps!
PS—Oh, I see you’re the author of Roc! I remember watching your talk “Outperforming Imperative with Pure Functional Languages” a while back, it was quite interesting!
Very helpful, thank you so much for the detailed explanation!
Also, glad you found the talk interesting. Feel free to DM me if you have any questions about Roc!
I’m interested in the run-time borrow checking idea. The part that makes sense to me is doing copy-on-write with refcounting: so you have pass-by-value semantics, but you can also do an in-place update when the refcount is 1.
But by “borrow checking”, do you mean you want to give the programmer a way to say: “destroy this value at the end of this scope; if I mess up and retain a reference to it, let me know”? As opposed to: “keep this alive as long as I have references to it”.
See my sibling answer for some more info on vaporization. We essentially use a single bit embedded in the pointer for the refcount, which can either be ‘1’ (
mutable owned
) or ‘more than 1’ (immutable shared
).All values not returned from a function and not passed in as parameters to a function will be destroyed at the end of a given scope. The only way to retain a reference to an object would be to return it, which makes it very hard to mess up and retain a reference to it.
While you’re open to big ideas, have you considered capability security?
One of the coolest things I’ve added to awesome-ocap lately is Newspeak, with a really cool module system:
How do I pronounce it?
Either ‘pass-er-in’ (rhymes with ‘grin’) or ‘pass-er-ine’ (rhymes with ‘brine’) work. Usually shortens to ‘pas-rin’ in conversation. It’s not the most obvious word to pronounce, to the extent that I’ve considered changing it to something else.
To be honest, I read it as rhyming with tangerine.
I really like this, but don’t understand repeating the Result + Exceptions mistake from Haskell. Especially when the exceptions can be caught in a Result why not just return it to begin with?
Thanks! I will admit that some parts of Passerine’s README are out of date; my opinions with respect to exception handling have also changed a tad. I need to rewrite this section, so thank you for the reminder.
That being said, I don’t think Passerine repeats the Result + Exceptions mistake from Haskell. Haskell Results/Exceptions overlap in functionality, whereas Passerine’s Results and exceptions serve completely different use-cases:
Exceptions (using the
Fatal
effect) are used to denote broken invariants or programmer errors. Much more likepanic
orunreachable
in Rust. Exceptions should be used to signal that some expected invariant is fundamentally wrong. Unlike Rust, these exceptions can be handled and converted into errors through the effect system to keep things running. In the spirit of Erlang: let the parts of the system crash, but ensure that all crashes are isolated and can be recovered from.Results are used to denote expected outcomes, including errors. If you think that something could go wrong in the codepath, use a result. If you’re not sure that invariants are being upheld (e.g. for an array access thay may be out of bounds), use a function that returns a Result. If you ever encounter an exception in practice, you should find out where it occurred and either fix the code assuming incorrect invariants, or switch to using a Result type at the source (e.g. don’t catch an exception and turn it into a Result!)
The standard library will use Results except for in the most obvious of cases (e.g. division by zero, indexing past the end of an array, etc. In short, exceptions should never be raised in well-factored code; handling exceptions is a mechanism for building resilient and long-running systems even in the presence of exceptional events (e.g. bit flips, etc.), but should not be used for routine error handling.
Ah, that sounds great. Result + Panic is just fine of course. The readme example was of opening a file that didn’t exist which seemed a lot more like an expected case to handle than a panic, but maybe that’s part of what’s out of date?
I adore the macrocentric approach to adding even basic syntax. What would a stdlib even look like in terms of both functionality and syntax?
Thanks for the kind words! The standard library would define three foundational modules:
kernel
, which contains all possible system level effects (I/O, etc.) that must be handled by the host runtimecompiler
, which defines Passerine compiler types (e.g. AST) that are used by the macro system.core
, which implements the core language (if
,for
,match
, etc.) in terms of foundational language primitives, like tail-recursive functions and lazy&&
evaluation.These modules are all part of the prelude, and included in other standard library modules, which are written in relatively-normal Passerine. Actually writing out and binding these libraries to the compiler pipeline is something we’re working on right now.
Thanks for such a killer answer. Really impressed by what you’ve put together here and looking forward to learning even more as things progress.
The vm implementation seems to be based on or at least influenced by one described in Crafting Interpreters - pratt parsing, up values and references to wren.
Yep, I’m a big fan of Nystrom’s past work, including Wren and Magpie. I found his chapters on upvalues and pratt parsing to be quite helpful and well-written while working on Passerine. The current VM is very simple (something like 17 instructions), and not the fastest (but still faster than a tree-walk interpreter); I’m currently working on a much lower-level runtime that should be able to compile to MiniVM IR or Wasm. Getting a rough working prototype out the door helped validate the design principles behind the language, and I hope that as the compiler matures we’ll be able to improve the performance and capabilities of the language.
I’m also a fan of his work. Writing my rust implementation of lox vm/compiler was lots of fun :)