1. 10
  1.  

  2. 12

    While I agree with the analysis in this article it strikes me that these were precisely the caveats that “effectively” was meant to capture.

    1. 10

      The conclusion of the fact that big-O notation is defined using an infinite limit should be that “standard big-O analysis doesn’t really make perfect sense for functions over finite types”, not “let’s pretend 32-bit integers go to infinity”.

      You don’t need to fudge a hypothetical comparison when you can just compare a 4 billion element Scala vector vs a 4 billion element array.

      1. 7

        The author is quite correct though I feel they could have written a shorter post that was a little less polemic. The basic point is that while O(log n) algorithms are so powerful precisely because log n grows so slowly one should not mis-characterize them as O(1) algorithms.

        From the article linked in the post the scala team writes:

        Vectors allow accessing any element of the list in “effectively” constant time. It’s a larger constant than for access to the head of a list or for reading an element of an array, but it’s a constant nonetheless. As a result, algorithms using vectors do not have to be careful about accessing just the head of the sequence. They can access and modify elements at arbitrary locations, and thus they can be much more convenient to write.

        and

        Two hops from the root of the tree to the final element node are sufficient for vectors with up to 2^15 elements, three hops for vectors with 2^20, four hops for vectors with 2^25 elements and five hops for vectors with up to 2^30 elements. So for all vectors of reasonable size, an element selection involves up to 5 primitive array selections. This is what we meant when we wrote that element access is “effectively constant time”.

        The words “effectively constant time” here are meaningless. I prototype my code with a test data set of 1000, find some run time, then I run it on my real data set of 35 million points and discover it now takes twice as long.

        It’s not a constant, though they could say they have made a space-time tradeoff that favors extremely slow growth. But then they would have to benchmark to show us that the constant is not so large that it’s a wash. And that kind of benchmarking is hard.

        1. 4

          If you’d like to see an example of how these numbers play out in practice, take a look at Zach Tellman’s benchmarks for Bifurcan (https://github.com/lacuna/bifurcan/blob/master/doc/benchmarks.md), which compare Java’s ArrayList to Clojure’s vectors (which, like Scala’s vectors, are 32-way hash array mapped tries), and also Java’s HashMap and HashSet to Clojure’s hashmaps. You can also see how some of the decisions in his linear and persistent collections (List vs Linearlist, Map vs LinearMap, etc) affect performance.

          1. 3

            It’s a mistake to write O(log32(n)) ‘cause logarithm base doesn’t matter in terms of big-Oh.