Is this approach fundamentally biased towards languages that favor dynamism and introspection, or can it be meaningfully used with languages that provide strong static guarantees that go beyond memory safety? Could I compose, say, Standard ML and Haskell, without losing the features that make them interesting? In SML’s case: strictness, abstract types and the module system. In Haskell’s case: laziness, higher kinds and effect segregation.
I’d really like to try putting static languages in the mix.
Although you would expect such static languages to be more suited to ahead-of-time (AOT) compilation, there may be situations where runtime feedback is still useful.
For example, traces by definition aggressively inline calls, and since traces are linear, a trace compiler may be able to optimise code more effectively than an AOT compiler, which must account for branches.
Perhaps C code which uses lots of function pointers would benefit from runtime speculation…
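To make the speculation idea concrete, here is a toy Python sketch (not a real JIT, and all names are invented) of the transformation a tracing JIT performs on a hot indirect call: once one target has been observed, the trace inlines its body behind a cheap guard, falling back to the generic path if the guard fails.

```python
def add_one(x):
    return x + 1

def generic_loop(fn, xs):
    # AOT view: `fn` could be anything, so every call is indirect.
    return [fn(x) for x in xs]

def traced_loop(fn, xs):
    # Trace view: the recorded trace assumes `fn` is add_one, protected
    # by a cheap identity guard; the call itself has been inlined away.
    out = []
    for x in xs:
        if fn is not add_one:       # guard fails: bail to the generic path
            return generic_loop(fn, xs)
        out.append(x + 1)           # inlined body of add_one
    return out

assert traced_loop(add_one, [1, 2, 3]) == [2, 3, 4]
```

The same loop compiled ahead of time must keep the indirect call (or rely on whole-program analysis), which is why function-pointer-heavy C code is a plausible beneficiary of runtime speculation.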
Indeed, runtime optimizations are very useful, and I certainly don’t want to deprive myself of the opportunity to take advantage of them. But my original concern is orthogonal to this: can I compose two languages that make strong static guarantees, without losing these guarantees? I want to avoid what usually happens when one uses a C FFI: as soon as you call C code, memory safety is gone, type safety is gone, module privacy and language-enforced separation of concerns are gone.
Ah. I see what you are asking now. And good question!
Currently, our language boxes are (bytecode) compiled lazily upon first invocation, and completely independently of the contents of other boxes. The type conversions at the language peripheries are then decided in an “on-demand” fashion using dynamic types.
To answer your question in short: no. If you wanted to compose two strongly statically typed languages whilst keeping their guarantees, using the techniques we have today, you would probably end up having to perform dynamic type checks when crossing from one language to another.
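As a rough illustration of what those boundary checks look like, here is a hedged Python sketch (the class and conversion rules are hypothetical, not our actual implementation) of an on-demand adapter at a PHP/Python periphery: a PHP-style associative array is only type-checked and converted when Python code actually touches one of its values.

```python
class PhpArrayAdapter:
    """Wraps a PHP-side associative array, represented here as
    (key, value) pairs, for consumption by Python code."""

    def __init__(self, pairs):
        self._pairs = dict(pairs)

    def __getitem__(self, key):
        v = self._pairs[key]
        # Dynamic check at the language periphery: the mapping for each
        # value is decided on demand, not ahead of time.
        if isinstance(v, (int, float, str)):
            return v                            # scalars convert directly
        if isinstance(v, list):
            # PHP lists become nested adapters with integer keys.
            return PhpArrayAdapter(enumerate(v))
        raise TypeError(f"no cross-language mapping for {type(v).__name__}")

arr = PhpArrayAdapter([("name", "eco"), ("version", 2)])
assert arr["name"] == "eco"
```

Note that the `TypeError` can only surface at runtime, which is exactly the loss of static guarantees being discussed.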
That’s not to say that we couldn’t start thinking about ways to fix that. Perhaps we could devise compile-time cross-language type mappings somehow? Any ideas?
Ah, that’s sad, but it’s an honest answer, and thus very much appreciated. Even in this case, your proposal is already a strict improvement over “all cross-language communication goes through C” and “all cross-language communication goes through byte or text streams that have to be deserialized and serialized all over the place”.
And, sadly, nope, no ideas yet. But hopefully in the near future.
This is pretty neat. People interested in this topic might also find two approaches from the past interesting. The link below is the Ten15 VM, which tried to integrate all languages using a high-level rather than low-level form. The other was the OpenVMS Common Language Environment: a standard for data types and calling conventions that let you do cross-language programming at the native level. Microsoft, having poached a VMS designer, later did the same in their VM for applications, which has a strangely similar name. ;) Anyway, it’s a trick that more OSes and platforms should adopt, because it simplifies the FFI and IPC aspects. It can have compatibility issues, but we could probably build source-to-source translators that fix those over time.
Neat, but I wonder whether the benefit justifies the cost of doing this in production. Xerox initiated the Inter-Language Unification (ILU) mechanism and it worked very well in practice. Something like that seems much more affordable for most cases.
[Comment removed by author]
Because of PHP and Python?
We didn’t really end up with any additional syntactic quirks via the composition, but semantic quirks, yes, granted. These semantic quirks were an interesting outcome that we did not anticipate, but we didn’t encounter one we couldn’t address. There’s more info on this in the full conference paper:
Semantic quirks aside, the aim of this research was to show that we can go above and beyond the limited architecture of your typical FFI, at a higher level than the system ABI, and whilst having good performance. These are (hopefully) the takeaway messages of our work.
As the article explains, we work around the problems associated with composing PL grammars (ambiguity, shadowing, “undefinedness”) by using “language boxes”. A paper describing this technology can be found here:
In short, each language box is parsed separately, so you never end up in a situation where ambiguity can occur. Otherwise we would have fallen at the first hurdle with PHP and Python, which both have (for example) a “for” keyword.
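The idea of parsing each box with its own grammar can be sketched in a few lines of Python (a toy, with invented names; the real editor uses incremental parsing over a tree of nested boxes): because dispatch happens on the box’s language tag before any parser runs, PHP’s `for` and Python’s `for` never meet in a single grammar.

```python
# One parser per language; stubs stand in for real grammars.
PARSERS = {
    "php":    lambda src: ("php-ast", src),
    "python": lambda src: ("py-ast", src),
}

def parse_document(boxes):
    """boxes: [(language, source), ...] -- the document's language
    boxes, flattened for illustration. Each is parsed independently."""
    return [PARSERS[lang](src) for lang, src in boxes]

doc = [("php", "for ($i = 0; $i < 3; $i++) {}"),
       ("python", "for i in range(3): pass")]
assert parse_document(doc)[0][0] == "php-ast"
```

No combined grammar ever exists, so cross-language keyword clashes and ambiguity simply cannot arise.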
In our editor, language boxes are shown in different colours.
It’s not as hard as you might imagine. CTRL+L opens a menu asking what kind of language box to open, CTRL+SHIFT+L moves one level up the language box stack. That’s really all there is to it.
Currently, all boxes on the same level have the same colour, and each level has a different colour. In theory, if the grammar allowed, you could have two same-level boxes next to each other, yes. That said, changing the shading mechanism so that each and every box has a unique colour would be trivial.
In newer versions of our editor, the current language box also has 1-pixel-wide square brackets drawn around the contents of the box to help user comprehension.
Finally, the parsing status window on the right – which is effectively a tree – can help you know where you are in your document.
Hope that helps.