I’m curious: Why are expressions converted to a linear RPN form, rather than something like a parse tree? Is it for performance?
This reminds me of the way queries are handled in Couchbase Lite (my day job). The various language bindings have “fluent” APIs for describing & building a query; these produce an intermediate JSON representation that the cross-platform core code traverses like an AST to generate SQL.
The JSON representation is mostly composed of arrays whose first elements are strings identifying the operation — very much inspired by S-expressions. For example, [“<“, [“$X”], 5] to represent $X < 5.
Later we added support for writing queries in N1QL, aka SQL++, a SQL extension that operates on JSON-like data instead of columnar, and it was just a matter of implementing a PEG grammar that produces our JSON format.
Fantastic question! Yep, the first version I designed was a parse tree. I roughly had a msgpack ext type for each AST node. For example, I had a dedicated “function” ext structure that mapped args to bodies.
The problem with this approach was that msgpack generally incurred a heavy 3-byte overhead for each ext type. 3-bytes at every AST node was really not great design.
But even worse, the parse tree was more complicated to explain and implement. To give you an idea, the docs were roughly 4x longer before I switched to RPN.
Anyway. To make things simpler, I had the idea of just encoding the tokens in an “expr” msg ext type. To make the implementation simpler, I wanted to get rid of operator precedence. And once I landed on RPN, a lot of other nifty features got unlocked.
I’m curious: Why are expressions converted to a linear RPN form, rather than something like a parse tree? Is it for performance?
This reminds me of the way queries are handled in Couchbase Lite (my day job). The various language bindings have “fluent” APIs for describing & building a query; these produce an intermediate JSON representation that the cross-platform core code traverses like an AST to generate SQL.
The JSON representation is mostly composed of arrays whose first elements are strings identifying the operation — very much inspired by S-expressions. For example,
[“<“, [“$X”], 5]to represent$X < 5.Later we added support for writing queries in N1QL, aka SQL++, a SQL extension that operates on JSON-like data instead of columnar, and it was just a matter of implementing a PEG grammar that produces our JSON format.
Fantastic question! Yep, the first version I designed was a parse tree. I roughly had a msgpack ext type for each AST node. For example, I had a dedicated “function” ext structure that mapped args to bodies.
The problem with this approach was that msgpack generally incurred a heavy 3-byte overhead for each ext type. 3-bytes at every AST node was really not great design.
But even worse, the parse tree was more complicated to explain and implement. To give you an idea, the docs were roughly 4x longer before I switched to RPN.
Anyway. To make things simpler, I had the idea of just encoding the tokens in an “expr” msg ext type. To make the implementation simpler, I wanted to get rid of operator precedence. And once I landed on RPN, a lot of other nifty features got unlocked.
Prediction: Scrapscript 2.0 becomes a concatenative language ;-)