1. 12
  1. 3

    When I tested this myself the tail recursive version was substantially faster.

    code

    -module(tco).
    -compile(export_all).
    
    map_body([], _Func) -> [];
    map_body([Head | Tail], Func) ->
      [Func(Head) | map_body(Tail, Func)].
    
    map_reversed([], Acc, _Func) -> Acc;
    map_reversed([Head | Tail], Acc, Func) ->
      map_reversed(Tail, [Func(Head) | Acc], Func
    

    in the erlang shell

    Erlang/OTP 18 [erts-7.0] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:10] [hipe] [kernel-poll:false]
    
    Eshell V7.0  (abort with ^G)
    1> c(tco).
    {ok,tco}
    2> Data = lists:seq(1,1000000).
    [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
     23,24,25,26,27,28,29|...]
    3> Succ = fun(X) -> X + 1 end.
    #Fun<erl_eval.6.54118792>
    4> timer:tc(tco, map_reversed, [Data, [], Succ]).
    {2844687,
    [1000001,1000000,999999,999998,999997,999996,999995,999994,
     999993,999992,999991,999990,999989,999988,999987,999986,
     999985,999984,999983,999982,999981,999980,999979,999978,
     999977,999976,999975|...]}
    5> timer:tc(tco, map_body, [Data, Succ]).
    {4678078,
    [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,
     24,25,26,27,28|...]}
    
    1. 2

      You did not reverse the list, and for some reason the first measurement of timer.tc can often be off. Plus, it might be that garbage collection triggered while benchmarking map_body. Benchee measures multiple runs and runs garbage collection in between. Might also be something else, though - as the erlang page mentions architecture can also have an impact.

      1. 4

        Here it is with also reversing it after and its still faster. There is a consistent 2 second difference, this is not random fluctuations.

        map_tco(List, Func) -> lists:reverse(map_reversed(List, [], Func)).
        
        5> timer:tc(tco, map_tco, [Data, Succ]).
        {2776833,
         [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,
          24,25,26,27,28|...]}
        6> timer:tc(tco, map_body, [Data, Succ]).
        {4498311,
         [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,
          24,25,26,27,28|...]}
        
        1. 1

          In the shell it indeed seems to behave like mapbody is slower on the first run (at least with the input list you used, 1000000 elements, I ran the benchmark with 10000 elements)

          iex(1)> list = Enum.to_list 1..1000000
          [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
           23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42,
           43, 44, 45, 46, 47, 48, 49, 50, ...]
          iex(2)> my_fun = fn(i) -> i + 1 end
          #Function<6.50752066/1 in :erl_eval.expr/5>
          iex(3)> :timer.tc fn -> MyMap.map_tco(list, my_fun) end
          {458488,
           [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
            23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41,
            42, 43, 44, 45, 46, 47, 48, 49, 50, ...]}
          iex(4)> :timer.tc fn -> MyMap.map_body(list, my_fun) end
          {971825,
           [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
            23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41,
            42, 43, 44, 45, 46, 47, 48, 49, 50, ...]}
          

          However, running it more often in the same iex session map_body gets faster:

          iex(5)> :timer.tc fn -> MyMap.map_tco(list, my_fun) end 
          {555394,
           [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
            23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41,
            42, 43, 44, 45, 46, 47, 48, 49, 50, ...]}
          iex(6)> :timer.tc fn -> MyMap.map_tco(list, my_fun) end
          {505423,
           [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
            23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41,
            42, 43, 44, 45, 46, 47, 48, 49, 50, ...]}
          iex(7)> :timer.tc fn -> MyMap.map_tco(list, my_fun) end
          {467228,
           [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
            23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41,
            42, 43, 44, 45, 46, 47, 48, 49, 50, ...]}
          iex(8)> :timer.tc fn -> MyMap.map_body(list, my_fun) end
          {636665,
           [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
            23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41,
            42, 43, 44, 45, 46, 47, 48, 49, 50, ...]}
          iex(9)> :timer.tc fn -> MyMap.map_body(list, my_fun) end
          {493285,
           [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
            23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41,
            42, 43, 44, 45, 46, 47, 48, 49, 50, ...]}
          iex(10)> :timer.tc fn -> MyMap.map_body(list, my_fun) end
          {490130,
           [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
            23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41,
            42, 43, 44, 45, 46, 47, 48, 49, 50, ...]}
          

          That might be it. Benchee runs a warmup of 2 seconds where it doesn’t measure (to simulate a warm/runnign system) and then takes measurements for 5 seconds so that we have lots of data points.

          Might also still be something with elixir and/or hardware :) Maybe I should retry it with Erlang, but not this morning :)

          1. 2

            Never benchmark in the shell, it’s an interpreter. Compile the module with the benchmarker included and run that.

            1. 1

              Not sure what exactly you mean, the original benchmark was run compiled not in the shell. This here was just done for comparison with the reported erlang benchmarks - my erlang_fu (which is not very existent at this point in time) I currently fail to do that and don’t have the time to look it up atm :)

              1. 1

                Ok, not like an elixir script executable like I’d want to but I wrote a benchmark function that benchmarks it and then called that in the shell - good enough for now I guess. There it is all much faster, and map_body seems to be about as fast as the non reversed tco version or even faster. I’d still need a proper benchmark to determine it all though.

                  3> c(tco).         
                {ok,tco}
                4> tco:benchmark().
                map_tco
                23412
                18666
                18542
                19709
                20939
                map_body
                19908
                20046
                19854
                19753
                18869
                ok
                4> tco:benchmark().
                map_tco
                23729
                21282
                24711
                23922
                18387
                map_body
                19274
                19624
                18598
                19073
                18685
                ok
                

                code

            2. 1

              Ok, I ran your code in erlang and I also get consistently faster results for the TCO version. I don’t get it, it is the same function I wrote in Elixir. The interesting thing for me, comparing Erlang and Elixir is, that with the same list size and what I think are equivalent implementations map_body seems to be much slower in Erlang. E,g, compare the numbers here to the other post where I do the same in Elixir and iex. In Elixir map_body settles at around 490k microseconds, the erlang version is between 904k microseconds and 1500k microseconds.

              1>  c(tco).
              {ok,tco}
              2> Data = lists:seq(1,1000000).
              [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
               23,24,25,26,27,28,29|...]
              3>  Succ = fun(X) -> X + 1 end.
              #Fun<erl_eval.6.50752066>
              4> timer:tc(tco, map_reversed, [Data, [], Succ]).
              {477397,
               [1000001,1000000,999999,999998,999997,999996,999995,999994,
                999993,999992,999991,999990,999989,999988,999987,999986,
                999985,999984,999983,999982,999981,999980,999979,999978,
                999977,999976,999975|...]}
              5> timer:tc(tco, map_body, [Data, Succ]).
              {826180,
               [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,
                24,25,26,27,28|...]}
              6> timer:tc(tco, map_reversed, [Data, [], Succ]).
              {472715,
               [1000001,1000000,999999,999998,999997,999996,999995,999994,
                999993,999992,999991,999990,999989,999988,999987,999986,
                999985,999984,999983,999982,999981,999980,999979,999978,
                999977,999976,999975|...]}
              7> timer:tc(tco, map_reversed, [Data, [], Succ]).
              {471386,
               [1000001,1000000,999999,999998,999997,999996,999995,999994,
                999993,999992,999991,999990,999989,999988,999987,999986,
                999985,999984,999983,999982,999981,999980,999979,999978,
                999977,999976,999975|...]}
              8> timer:tc(tco, map_reversed, [Data, [], Succ]).
              {461504,
               [1000001,1000000,999999,999998,999997,999996,999995,999994,
                999993,999992,999991,999990,999989,999988,999987,999986,
                999985,999984,999983,999982,999981,999980,999979,999978,
                999977,999976,999975|...]}
              9> timer:tc(tco, map_body, [Data, Succ]).        
              {904630,
               [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,
                24,25,26,27,28|...]}
              10> timer:tc(tco, map_body, [Data, Succ]).
              {970073,
               [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,
                24,25,26,27,28|...]}
              11> timer:tc(tco, map_body, [Data, Succ]).
              {1485897,
               [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,
                24,25,26,27,28|...]}
              
          2. 2

            I reran your benchmark and got similar results (consistently over 10x runs) that is around 35-40% faster:

            1> Data = lists:seq(1,1000000).
            [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
             23,24,25,26,27,28,29|...]
            2> Succ = fun(X) -> X + 1 end.
            #Fun<erl_eval.6.50752066>
            3> timer:tc(tco, map_reversed, [Data, [], Succ]).
            {810879,
             [1000001,1000000,999999,999998,999997,999996,999995,999994,
              999993,999992,999991,999990,999989,999988,999987,999986,
              999985,999984,999983,999982,999981,999980,999979,999978,
              999977,999976,999975|...]}
            4> timer:tc(tco, map_body, [Data, Succ]).
            {1250838,
             [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,
              24,25,26,27,28|...]}
            

            Does it have something to do with the fact he used Elixir?

          3. 2

            I’m probably just really dense, but it isn’t obvious to me why the reverse lists stuff is being used. Is it just to deal with the fact that the TCO versions create a swapped list because of the way they’re written?

            I’m also curious if there are any good ways of comparing the compiled reductions/bytecode from the different methods–in C you’d just look at the assembly and clearly see what the compiler thought it was doing.

            Thank you for introducing me to the benchmarking tool! I’ll be using that in my refresh of my math library.

            1. 1

              No you are not dense :) The list is reversed because the first element in the list ends up being the last as the first element is the first you ad to the new list and you always append at the start (as the list is a linked list). If you look for a concrete example, the version of MyMap in my elixir_playground repo has doctests showing how it behaves without reverse. For the bytecode thing, there is a gist from sasa juric showcasing just that. I omitted it from the blog post, maybe should have included it :) Cool that you’ll use benchee, if you want to know more about it there is also an introduction blog post.

              1. 2

                So, given two datapoints (TCO slower than non-TCO, and flat_map slower than map |> flatten)…

                …is it possible at all that Benchee might have a bug?

                1. 2

                  It is always possible. Benchee or something Benchee uses might have a bug. I don’t think so, Benchee is well tested and rather simple. Plus you can reproduce the same with running :timer.tc a few times in a shell. Before I wrote benchee I also used Benchfella - results are similar much identical:

                    ## MyMapBench
                    [06:38:12] 1/3: map with TCO reverse
                    [06:38:15] 2/3: map with TCO and ++
                    [06:38:17] 3/3: map simple without TCO
                  
                    Finished in 6.26 seconds
                  
                    ## MyMapBench
                    map simple without TCO       10000   164.55 µs/op
                    map with TCO reverse         10000   211.55 µs/op
                    map with TCO and ++             10   188430.40 µs/op
                  

                  Did the same double check with flat_map back then. Plus I rather blog about unusual findings, other benchmarks are pretty much what you’d expect them to be (recursion is fastest for repeating something n times, sorting 100k elements is ~15 times slower than sorting 10k elements etc.) There are some more samples in benchees samples folder :)

                  Plus especially for this I sat down and read a lot (the Erlang performance myths etc.) as I also found it highly unlikely and needed to see that this discussion has been had and explained before to be courageous enough to blog about it in an environment that I’m not super familiar with (yet)

            2. [Comment removed by author]