To clarify what appears to be a miscommunication, this untyped concatenative calculus (UCC) is not Dawn, but Dawn is ultimately based on the UCC. Formalizing the UCC is just the first step in formalizing Dawn.
Also, on your choice to make composition a binary constructor: without an extra normalization step, this has the disadvantage of assigning different representations to e1 (e2 e3) and (e1 e2) e3.
This is awesome, though; I can probably learn a thing or two by studying it!
Edit: Ah, just saw that you address the composition representation issue at the end!
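Thanks for checking it out! I’ve fixed the references to Dawn to now refer to the UCC.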
I think that there are some issues with the semantics as presented:
Minor technical issue: the way compose is defined, it can only ever build expression-lists of size 2; for example, [e1] [e2] [e3] compose compose should leave [e1 (e2 e3)] on the stack, instead of [e1 e2 e3] as seems to be assumed in the rest of the post. (Note that e1 (e2 e3) and e1 e2 e3 are both valid and distinct according to the grammar of expressions; see the worked reduction after this list.)
Conceptual issue: in the lambda-calculus, “Church encodings” give a systematic way to encode any (algebraic) datatype: from, say, type bool = true | false or type nat = zero | succ nat, you can mechanically derive the encodings in the lambda-calculus (technically we call this systematic encoding the Boehm-Berarducci encoding, but it is a generalization of the Church encoding of booleans and natural numbers). Here it is unclear from your description whether a systematic encoding exists, or whether one has to “solve an equation” to find the terms in an inventive way. There may be a systematic approach that gives the definitions but with non-normal terms, and you somehow normalized/optimized the encodings for the purpose of the presentation, but this process (if it exists) is not clear to the reader.
Similarly, I would expect that there is a systematic way to derive functions on booleans or natural numbers from a primitive elimination construct. In the lambda-calculus, the truth table of boolean or can be expressed as if x then (if y then true else true) else (if y then true else false), which can be expressed in Church-encoded form \x y. x (y true true) (y true false), and then optimized into the equivalent form \x y. x true y. I would expect that it is possible to systematically derive or instead of having to “guess” it.
Performance issue: with the encoding of natural numbers given, n+1 contains the definition of n duplicated two times. This means that the encoding of n has size exponential in n, which seems extremely impractical. It may be that a better encoding exists. (For example you give an encoding of succ later, so I guess just zero succ succ ... succ would qualify.)
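To make the first issue concrete, here is that reduction worked out, assuming compose pops the top two quotes and pairs their bodies with the binary constructor:
[e1] [e2] [e3] compose compose
= [e1] [(e2 e3)] compose
= [e1 (e2 e3)]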
Thanks for the constructive feedback! I think the issue with composition should be easy to address. I specifically defined composition as n-ary in order to avoid this issue, but I see your point that the compose intrinsic is not defined clearly in the text.
As for the encodings, that’s good to know! My main goal for those was to exercise the calculus a bit. Ultimately, I do not plan to use such encodings in Dawn. I’ll try to clarify that these are not systematic or efficient encodings. Perhaps someone else who is particularly interested in encodings can explore how to improve on those aspects!
Minor technical issue: the way compose is defined, it can only ever build expression-lists of size 2
e1 (e2 e3) is equivalent to e1 e2 e3. This equivalence is so fundamental to the concatenative calculus that I would regard it as a mistake to even allow them to have different representations. I think a better definition would be:
Sequence s = e_1 ... e_n
Expression e = i | [s]
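Under this grammar the earlier example has only one possible result, since compose now just concatenates the two sequences (a sketch under that definition):
[e1] [e2] [e3] compose compose
= [e1] [e2 e3] compose
= [e1 e2 e3]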
Performance issue: with the encoding of natural numbers given, n+1 contains the definition of n duplicated two times
Working it out, applying it to a function [F]:
[F] [clone] N apply [compose] N apply apply
= [F] ...n+1 [F] [compose] N apply apply
= [F ...n+1 F] apply
= F ...n+1 F
Which does definitely duplicate N. I think this is avoidable, though: instead of making n copies of F and then composing them, clone F just once, then apply F n times (using N), then apply F one last time.
Yes, you can define a helper term to make that approach to defining natural numbers easy. I started with that encoding, but then I changed to this encoding because it applies all of the copies at the same time, so it strictly fulfills the big-step semantics I provided, rather than applying the expression one at a time.
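I’ve updated the post to address your comments: https://www.dawn-lang.org/posts/foundations-ucc/
Thanks again for the feedback!
Author here. Ask me anything :-)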
This is great; I feel like concatenative languages should be studied more.
In the long run, I expect optimizing compilers for Dawn terms to consistently produce higher performing machine code than for all but the most highly tuned C procedures, thanks in no small part to Dawn’s extremely simple and highly analyzable core syntax and semantics.
My understanding is that “compile down to a simple and highly analyzable IR, then optimize the hell out of it” is a widely used strategy. Like I think LLVM fits this description, though I don’t really know the details. Do you think Dawn will have some advantage here, over IRs that use variables/names? It does have the advantage(?) of being naturally linear, as you said.
Nonetheless, recursion is possible by passing a copy of a quoted term to itself using higher order expressions such as [clone apply] clone apply.
Have you tried finding a type for dup apply? I discovered equirecursive types that way :-)
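For reference, the quoted term steps back to itself (a quick derivation from the clone and apply rules), which is exactly what makes typing it tricky:
[clone apply] clone apply
= [clone apply] [clone apply] apply
= [clone apply] clone apply
= ...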
My plan is for Dawn to act as its own IR, with the lower levels of IR defining memory layouts and having access to platform specific terms, with the lowest level essentially being assembly. I haven’t prototyped any of that yet, though, so we’ll see how it works out in practice.
Dawn’s primary advantages over C, in terms of optimizability, will be that pointer aliasing is disallowed (or explicit) and that memory layout is not specified in the highest level of the language, and therefore can be optimized by the compiler. I also hope to enable the user to directly control compilation, including adding their own optimizations. I want compilation to be much less of a black box than it typically is. Again, though, none of this is prototyped, so we’ll see how it plays out.
Assigning a type to dup apply / clone apply requires a form of self types, which I don’t currently plan to include.
My plan is for Dawn to act as its own IR, with the lower levels of IR defining memory layouts and having access to platform specific terms, with the lowest level essentially being assembly. I haven’t prototyped any of that yet, though, so we’ll see how it works out in practice.
That sounds like a lot of fun!
I also hope to enable the user to directly control compilation, including adding their own optimizations.
This frightens me a bit; it means I could encounter a language-level bug due to a library I import. I guess the assumption is that people won’t add optimizations unless they really know what they’re doing.
pointer aliasing is disallowed (or explicit)
This is hard! Or rather, it’s hard to do while making the language nice to use. Are you thinking of having a Rust-like borrow system, and distinguishing mutable references from shared references? If you are, have you seen withoutboats’ posts on what a simpler Rust would look like?
Assigning a type to dup apply / clone apply requires a form of self types, which I don’t currently plan to include.
I think that’s a good decision. Equirecursive types are neat, but don’t seem to be very useful in practice. You can’t give a type to lambda f. f f either, for the same reason (and thus can’t type the Y combinator, as it uses that), but no one seems to complain.
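For concreteness, here is the type that self-application would need (standard equirecursive notation, not anything from the post):
f : μt. t → a        (which unrolls to (μt. t → a) → a)
f f : a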
This frightens me a bit; it means I could encounter a language-level bug due to a library I import.
Such a bug would be scoped to uses of the library, so it’s really no different from existing libraries. My hope is for all optimizations to be formally verified, though. But that might have to come later.
This is hard! Or rather, it’s hard to do while making the language nice to use.
The hard part is making it performant and ergonomic at the same time, i.e. having explicit reference types which statically enforce mutually exclusive mutability. I have plans for how to do this without requiring a separate borrow checker, but we’ll have to see how ergonomic it is in practice. And, of course, there has to be a solution for cyclic references and other patterns that are not possible in safe Rust.
Yes, I did read those posts a while back, but they’re probably worth rereading.
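There’s one very interesting use for self types: https://homepage.divms.uiowa.edu/~astump/papers/fu-stump-rta-tlca-14.pdf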
Equirecursive types are neat, but don’t seem to be very useful in practice.
The Kind language, and the other offshoots from FormCore, make excellent use of them. There’s a lot of work needed to adapt dependent types to the concatenative calculus, though, and I don’t expect I’ll be able to tackle that alone or in reasonably short order. If they can be made to work, though, they would be excellent for formal verification.
Such a bug would be scoped to uses of the library, so it’s really no different from existing libraries.
My point is that it is different from a bug in a library in a typical language. If a library is allowed to muck with things in a way that violates invariants that the language relies on, that can cause a more-or-less arbitrary bug at a more-or-less arbitrary location. For example, if I use a badly written library in Rust that uses unsafe, it could potentially cause my variables to have two mutable references to the same data, or cause a segfault in my code. Bugs inside unsafe are nonlocal; while the problem originates inside the unsafe block, it may manifest itself much later. Thus “unsafe” code is important to audit, and likewise user-written compiler optimizations are important to audit (or formally verify!).
I have plans for how to do this without requiring a separate borrow checker, but we’ll have to see how ergonomic it is in practice.
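Care to share your high-level idea? :-)
Oh interesting, thanks for the link.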
Sorry, I worded that poorly. The point I mean to make is this. The class of bugs that can be introduced in unsafe code in Rust is much worse than the class of bugs that can come from safe code; likewise the class of bugs a library could introduce with (non formally verified) compilation optimizations is much worse than it could introduce without.
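Yes, I certainly agree with that.
Static reference counting, essentially, by leveraging qualified types with substructural restrictions and equality constraints.
Slightly OT, but why is the “clone” rule called “contraction”?
In the sequent calculus, it looks more like contraction than like clone: https://en.m.wikipedia.org/wiki/Structural_rule
Do you know about dmbarbour’s Awelon? There seem to be similarities between Awelon and Dawn.
I did not! Looks interesting!
In simple English, what does it mean to be transcendental?
FYI, in “ask me anything”, it is usually implied that the question should relate to the topic.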
Excited to see progress on Dawn! The post is well written and easy to follow. At least, as easy as you can hope for with denotational semantics — definitely had to stop and think about the reductions :)
I see you’ve implemented ucc in Rust. When you get to implementing Dawn, are you planning to use Rust again?
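Thanks! I’m glad to hear it’s easy to follow!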
Yes, I plan to stick with Rust for the time being. Eventually it will be self-hosting, and there will eventually be a backend that compiles to C, which could provide a mechanism for bootstrapping via a C compiler.
Nitpick: this is not a denotational semantics, but an operational semantics. We call “denotational semantics” the semantics that proceed by interpreting programs as mathematical objects, while “operational semantics” define programs by their computation/reduction behavior (operation).
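Schematically (a sketch for contrast, not notation from the post):
denotational:  ⟦e⟧ : Stack → Stack
operational:   ⟨s, e⟩ ⟶ ⟨s′, e′⟩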
This is fascinating — I’m not versed enough in theory to contribute any meaningful commentary, but as someone who learned FORTH as a kid I’m enjoying seeing the math underlying it. It took a bit of squinting to make out FORTH’s two stacks and instruction storage in the linear representation here. 🤯
I’m curious how you build up from this to an actually-practical language. For example, integers — do you implement ints as shorthand, like macros that theoretically expand to their Churchian form but which are special-cased by the evaluator before the expansion occurs?
It took a bit of squinting to make out FORTH’s two stacks and instruction storage in the linear representation here.
Yeah, the formal semantics don’t make it clear how you would implement this efficiently on actual hardware, which is what Forth does with its return stack and by restricting higher-order functions. Forth isn’t strictly concatenative, though.
do you implement ints as shorthand, like macros that theoretically expand to their Churchian form but which are special-cased by the evaluator before the expansion occurs?
No, I don’t plan to use these sorts of encodings for values in Dawn. That part of the post was mostly just to exercise the formal semantics of this untyped concatenative calculus a bit.
However, the Kind proof language essentially does use similar sorts of encodings to great effect, in that it keeps the core language an order of magnitude or two smaller than other proof languages with similar amounts of functionality.
So I haven’t completely ruled this out as a possibility.
Looks great! I wrote up an article in which I define part of this post in Coq.
Nice!