1. 13
  1. 2

    The foldr implementation looks pretty familiar - it’s roughly the same implementation found in OCaml’s Base library. There’s basically two tricks going on there - the first, which is most of what makes it so ugly, is that it unrolls up to 4 iterations per recursion. This shrinks the callstack and, I expect, improves performance. In the OCaml version, this would happen in a single match rather than a bunch of nested ones, which is a lot easier to understand - maybe there’s something in elm that makes this not work? The second trick is that it counts how many recursions have occurred, and bails to the less-performant tail-call optimized version after hitting some threshold. In this case, the threshold appears to be 500 calls, times the 4x unrolling, means that lists up to 2000 elements get the fast version, and only beyond that does it flip to the slower foldl+reverse implementation to avoid blowing the stack.

    1. 1

      this would happen in a single match rather than a bunch of nested ones

      I imagine it’s for better performance, because unwrapping one level is faster than unwrapping multiple in the current Elm output? I’m not sure, but I imagine that both have been tried out and this one was deemed the winner.

    2. 2

      Reading through this (great!) article made me think about the similarities between TRMC and what Haskell seems to do “by default” via laziness. It turns out, there’s a name for this idea in Haskell as well – guarded recursion[1].

      This helped me understand why we can write the “natural” code in Haskell and it is safe as long as the recursive call is consumed by a lazy constructor (such as (:)).

      module Example where
      
      myMap :: (a -> b) -> [a] -> [b]
      myMap _ [] = []
      myMap f (a : as) = f a : myMap f as
      

      And the usage:

      λ> take 2 $ drop 100000000 $ myMap (+ 1) [0 ..]
      [100000001,100000002]
      

      [1] - https://en.wikipedia.org/wiki/Tail_call#Tail_recursion_modulo_cons

      1. 1

        No CPS? :)

        map f xs =
         let map_helper c f xs =
          match xs:
          | [x:xs] -> map_helper (\xs -> c ((f x) : xs)) f xs
          | nil -> c nil
         in map_helper id f xs
        
        1. 1

          I expect that would be no better than version that maps with an accumulator, and then reverses the result? In this case your intermediate representation is a pile of closures instead of a backwards list, but it has roughly the same shape (i.e. 2-passes, one in each direction).

          1. 1

            I discovered CPS during the course of working on this, and I failed to make it work every time I used it. I actually think that the compile target of Elm (which is JavaScript) doesn’t support CPS because of how closures/scopes work there. In the while loop that the code compiles to, c would be redefined at every iteration (c = function (...) { /* do something with 'c' */) but the c referenced inside the function now references itself, leading to an infinite recursion. And actually, it seems like JavaScript engines don’t remove the tail-call from the c function, meaning this is actually not stack-safe and results in a stack overflow. So there are 2 reasons why this doesn’t work with the (current) compile target.

            Or at least, that’s my current understanding of it and will look into it some more because it would be valuable for this to work.

            But regardless of that, I think that CPS is a lot more complex to write than TRMC functions, and it’s also a lot slower. The benefit of CPS is only that it is more general and can be used in a lot more situations than TRMC and plain TCO. Therefore, I still think that TRMC makes a lot of sense to have in the language.

            1. 2

              JavaScript, and indeed any flavor of ECMAScript, can host trampoline-oriented dispatch loops. I found a tutorial explaining how to do it in a couple different ways, both by hand and with popular libraries. It sounds like the issue is more accurately described as a limitation of the Elm compiler, rather than a limitation of the target platform.

              In general, for any language, tail calls are not optimized. Some languages, like Scheme, require compilers to do it; most languages are like C and leave it up to the compiler.

              The main benefits of CPS come when they are structurally applied to an entire program. I don’t know the current state of Elm’s separate-compilation strategy, but it should be possible to transform either individual modules or entire programs to CPS and emit trampolined JS code.

              1. 1

                the issue is more accurately described as a limitation of the Elm compiler, rather than a limitation of the target platform

                From looking into it a bit more, it’s a bit of both: JavaScript doesn’t help us with its lack of tail call optimization and Elm compiler could do better based on the information that CPS doesn’t seem to be working, I guess it’s just not something that was ever encountered or reported as far as I know, which I find surprising shrug

                With regards to trampoline, isn’t that like really slow? I feel like it could work for when we really want to avoid stack overflows, but the performance cost seems pretty high. Have you ever benchmarked this by any chance?

                1. 1

                  In general, for any language, tail calls are not optimized. Some languages, like Scheme, require compilers to do it; most languages are like C and leave it up to the compiler.

                  At one point, the ecmascript spec mandated tail calls. No browser implemented them, though (except maybe webkit?), so it was removed from the spec. I have no idea what these people are smoking.

                  Even absent tail calls, though, it is possible to do a full CPS transform and get somewhere useful; just do ‘setTimeout(cont, 0)’ every n calls. (Unfortunately, I am told this is somewhat slow.)

                  1. 1

                    That is indeed slow, but also, that is not possible unless you’re working within an asynchronous function (Promise, callback, etc.). You couldn’t call a setTimeout in a function/method like “”.toLowerCase() for instance and expect the result of the function to have the correct value.

                    This would not be applicable to Elm because all functions are synchronous (asynchronous operations are handled differently in Elm).

                    1. 1

                      As I said, this assumes you do a full CPS transform.

                      1. 1

                        Gotcha. Do you know of any languages that do this?