Very interesting read. Makes me think pure functional programming is a leaky abstraction.
I dare say most abstractions become leaky if you push on them hard enough. :) Perhaps one dimension of the utility of an abstraction is just how hard you have to push before it leaks.
Many of the problems they discuss are specific to Haskell and GHC, rather than being a general problem with all pure functional languages. Things like: they can’t inline loops, they have to copy a string in order to pass it to a foreign function. Even the problem with not being able to write low level code in a way that will be compiled and optimised predictably is fixable with the right language abstractions. (This is on my mind because I’m working on fixing some of these issues in my own pure functional language.)
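For concreteness, the string-copying point looks like this in practice (a small sketch, not code from the article; the strlen binding is just a stand-in foreign function): even a ByteString whose bytes are already contiguous gets copied during marshalling, because the usual path builds a fresh NUL-terminated buffer to hand to C.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module FfiCopy (byteStringLen) where

import qualified Data.ByteString as BS
import Foreign.C.String (CString)
import Foreign.C.Types (CSize (..))

-- Plain C strlen, just to have a foreign function to call.
foreign import ccall unsafe "string.h strlen"
  c_strlen :: CString -> IO CSize

byteStringLen :: BS.ByteString -> IO CSize
byteStringLen bs =
  -- useAsCString copies the bytes into a fresh NUL-terminated buffer
  -- before handing the pointer to C, even though the ByteString's
  -- payload is already a contiguous block of memory.
  BS.useAsCString bs c_strlen
```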
I get where you’re coming from, but from experience, Haskell is a lot leakier than others due to thunking and laziness. OCaml’s a functional language, but I don’t find it any leakier than, say, Python, because it’s strict and has a straightforward runtime. (Which isn’t to say Haskell’s bad/slow/whatever, just that the abstraction tower gets a lot higher the moment you pull laziness into the picture.)
Wow, lots of great advice on exactly how to discover useful optimizations for Haskell code. I gotta go home and try some of these right now!
Great article!
It’s a bit crummy that optimizations in GHC are so hit and miss. We really need a good story for reliable optimizations. As the author notes, it is way too easy for your carefully tuned code to suddenly fall off the good path due to some change in GHC. I wish there were some way to at least say: “don’t compile unless this optimization manages to fire!”. Same goes for hackery around rewrite rules.
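For reference, the kind of rewrite rule in question is just a compiler hint, e.g. (a generic sketch, nothing from the article):

```haskell
module MapFusion (doubleThenShow) where

-- A classic rewrite rule: collapse two list traversals into one.
-- GHC treats rules as optional hints; the build will not fail if this
-- never fires, though -ddump-rule-firings will report it when it does.
{-# RULES
"map/map/example" forall f g xs. map f (map g xs) = map (f . g) xs
  #-}

doubleThenShow :: [Int] -> [String]
doubleThenShow = map show . map (* 2)
```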
That’s actually possible! There’s a recent package called inspection-testing which (with the help of a GHC plugin) allows testing certain properties of an expression’s compiled form, e.g. that calling some high-level API in a certain way is equal to some particular optimised implementation. AFAIK, since it runs at compile time, it will bail out upon failure, like you ask :)
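Usage is roughly along these lines (a minimal sketch; lhs and rhs here are just illustrative names, not anything from the article):

```haskell
{-# LANGUAGE TemplateHaskell #-}
{-# OPTIONS_GHC -O -fplugin Test.Inspection.Plugin #-}
module InspectionSketch where

import Data.Maybe (isNothing)
import Test.Inspection

-- The high-level call we care about...
lhs :: (a -> b) -> Maybe a -> Bool
lhs f x = isNothing (fmap f x)

-- ...and the optimised form we expect GHC to reduce it to.
rhs :: (a -> b) -> Maybe a -> Bool
rhs _ Nothing  = True
rhs _ (Just _) = False

-- Checked by the plugin during compilation; the module fails to
-- compile if the two don't end up with equivalent Core.
inspect $ 'lhs === 'rhs
```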
I’ve also come across other testing libraries for non-functional properties (i.e. things other than input/output behaviour), e.g. http://hackage.haskell.org/package/StrictCheck
It’s also feasible to check this sort of thing with benchmarks (e.g. criterion, weigh or bench). Projects like asv (which I’ve used for Haskell via the asv-nix plugin) can track performance over commits and look for step-changes, although that requires a reasonably stable environment to keep timings consistent. It might be more robust to just check the performance of multiple implementations, e.g. that implementation X takes between 1.2 and 1.5 times as long as hard-optimised implementation Y, and between 0.1 and 0.2 times as long as naive implementation Z.
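A minimal criterion sketch of that kind of relative comparison might look like the following (the three implementations are hypothetical stand-ins for X, Y and Z; criterion only reports the timings, so asserting the 1.2 to 1.5 times band would be a separate check over its output):

```haskell
module Main (main) where

import Criterion.Main (bench, bgroup, defaultMain, nf)

-- Hypothetical stand-ins: a general solution (X), a hard-optimised
-- reference (Y) and a deliberately naive lower bound (Z).
generalSum, optimisedSum, naiveSum :: Int -> Int
generalSum   n = sum [1 .. n]
optimisedSum n = n * (n + 1) `div` 2
naiveSum     n = length (concatMap (\k -> replicate k ()) [1 .. n])

main :: IO ()
main = defaultMain
  [ bgroup "sum-to-n"
      [ bench "general (X)"        (nf generalSum   2000)
      , bench "hard-optimised (Y)" (nf optimisedSum 2000)
      , bench "naive (Z)"          (nf naiveSum     2000)
      ]
  ]
```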
If I’ve got time to make a hard-optimised implementation which I know won’t regress, that’s what I’ll use!
I agree that tracking performance over time is a good technique, if sometimes a bit tricky to set up properly. GHC itself has a cute approach of storing performance metrics inside of git notes (all as part of its CI process).
That’s not to say that GHC plugin backed solutions aren’t pretty neat too. I forget the particulars, but I know the generic-lens package has some neat testing to check that generically-derived lenses are as efficient (meaning all the generic code inlines properly) as manually written ones.
If I’ve got time to make a hard-optimised implementation which I know won’t regress, that’s what I’ll use!
I meant something more like the blog post, which compares against alternatives written in C and Rust, which turned out to be unusable in their project for different reasons. I also like the blog’s use of a dummy implementation to provide a lower bound, which we could do if there are no suitable alternatives to benchmark against.
Also, just in case I was misunderstood about inspection-testing, the idea there is that we compare a general solution called with some known arguments, against a non-solution which just-so-happens to hard-code those arguments. For example, if we want to check that the function prefixEachWith will unroll/inline the recursion on its first argument, we can compare:
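Something along these lines, say (the actual definition of prefixEachWith isn’t shown here, so the one below, and its signature, are assumed purely for illustration):

```haskell
{-# LANGUAGE TemplateHaskell #-}
{-# OPTIONS_GHC -O -fplugin Test.Inspection.Plugin #-}
module PrefixSketch where

import Test.Inspection

-- Assumed definition, recursing on its first argument.
prefixEachWith :: [String] -> String -> [String]
prefixEachWith []     _ = []
prefixEachWith (p:ps) s = (p ++ s) : prefixEachWith ps s

-- The general solution applied to known arguments...
lhs :: String -> [String]
lhs = prefixEachWith ["foo", "bar"]

-- ...versus a non-solution that hard-codes those arguments, i.e. the
-- recursion already unrolled by hand.
rhs :: String -> [String]
rhs s = ("foo" ++ s) : ("bar" ++ s) : []

-- Whether GHC discharges this obligation depends on the real
-- definition and optimisation settings; the point is the shape of
-- the comparison.
inspect $ 'lhs === 'rhs
```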
The rhs is optimised, in a way that won’t regress, but only for this particular test. It isn’t a general solution to the problem.
GHC itself has a cute approach of storing performance metrics inside of git notes (all as part of its CI process).
It’s a nice idea, but unfortunately the only time I see it come up is when the allowed boundaries on the timings get bumped to stop spurious-seeming build failures :( I think tracking benchmarks over time is good for human perusal, and maybe some warnings, but not worth aborting a build over. Benchmarking different implementations (perhaps in different languages, perhaps dummies) is less fragile, and perhaps worth aborting for, since we’re measuring relative performance, which should cancel out many aspects of the particular machine being used.