In CHICKEN, the compiler performs CPS transformation on all user code, but then it tries to eliminate as many CPS calls to C intrinsics (builtin functions) as possible. Also, user code which does not allocate and is not exported outside of the compilation unit is converted into flat C code that returns.
The reason is that explicitly wrapping up the continuation (which is basically an array with the function pointer and the arguments), calling the intrinsic and then unwrapping the continuation again when the intrinsic is done is, well, simply slower than just calling the C function and letting it return.
However, CPS calls are the underlying mechanism that allows for fast allocations on the stack (due to Cheney on the MTA), and gives you call/cc for free. It also makes green threads very easy to implement; you already have the suspension inside the continuation, so you can just schedule another thread from your list of threads and invoke the current continuation from it. This means we don’t need any of that async “which colour is this function” bullshit.
So, as always, it’s a tradeoff :)
CPS as an intermediate representation of compilers can actually make the CPS more efficient than the direct encoding, as it matches the compiler’s model of the control flow graph more closely (and CPS is structurally equivalent to SSA). The really important point is closure representation, which can make the CPS variant very efficient even with a compiler that is not “sufficiently smart”, when you have optimizations like what Appel called “callee save registers” and specialized representations (as described in the papers on closure analysis from Zhao, which I believe, have been implemented in SML/NJ). Escape analysis is crucial, of course, but even a direct style compiler will have to do this to be competitive.
So, unless the compiler is very simple or when continuation closures escape, it basically doesn’t make a difference what style you use, with the exception that the performance of the worst case (the continuation itself escapes) will always be abysmal with the direct style compiler.
This might be beside the point, but when I look at the example code, it looks relatively ordinary ‘callback-oriented’ code. Is there anything more specific to CPS that this code illustrates?
CPS is mostly for code transformation as a compiler/optimizer pass, or a straightforward way to get call/cc. Though there are a few use cases besides it. Checkout: http://matt.might.net/articles/by-example-continuation-passing-style/
Yeah, I have the same confusion. The only other use of CPS I know of is in parsers, where it’s a bit more esoteric than just using callbacks..l
No, it’s the same thing, unless you use call/cc.
This all seems predicated on Sufficiently Aggressive Inlining. What do you do when the Inlining becomes too expensive? (In branch mispredictions, instruction cache misses, decoding time…)
You don’t have to inline to make it faster; as I explain in another comment here, you can also convert the CPS call to a regular returning function if you know you can customize the function for the local compilation unit to not require a continuation argument (i.e., it doesn’t escape and isn’t exported).