1. 15

Has anyone ever seen a code base that documents the Big O notation on methods - especially memory or runtime? Most of the (structured) documentation I’ve seen is for C#, Java, Go and JavaScript, and I haven’t run across anything that has the algorithmic characteristics in the comments.

I’m familiar with the debates over “programmers should understand Big O notation” and “programmers should understand how the code they are calling works”, but I think some documentation would help both. While this might be hard, since algorithmic analysis can get hairy, I still think it would make it easier for users of an API to make smarter decisions.

A rough example:

public class HashSet {
    /**
     * @BigO - O(1) memory - O(1) runtime
     */
    public void add(Object o) {
        // ...
    }
}


  2. 10

    There are numerous packages on Hackage which provide Big O annotations - for example, see the Data.Map.Strict module (persistent maps).

    I find it useful both as a rough baseline for my own code and as a signal that the author has thought through some of the higher-level API.

    Unfortunately it’s not completely pervasive, and Haskell documentation tends to vary wildly in quality, but a lot of the more popular base-level libraries do have these annotations.

    1. 6

      It’s surprising how little work has been done on automatic analysis and inference of algorithmic complexity (e.g. incorporating this information into a type system or other static metadata). On the other hand, it may actually be a harder problem than it looks: even program termination is undecidable in general and often requires complex manual proofs (e.g. termination of GCD in Coq).

      On a side note: algorithmic complexity of containers in the C++ standard library is documented.

      1. 1

        I’m a little out of my depth with Coq and the like, but I’d love any pointers to things I could look into more.

        1. 1

          In Coq, the Fixpoint construct encodes recursion, and every Fixpoint is checked for termination: termination must be either inferred or proved. The check is easy for structural recursion over inductive types, including Peano-encoded numbers (each call recurses on a strict subterm until the input is exhausted), but Euclid’s algorithm is harder: it is guaranteed to terminate, yet the reason is not structural and requires a number-theoretic proof.

          Coq uses a small set of conservative, syntactic criteria to check termination of all recursive definitions. These criteria are insufficient to support the natural encodings of a variety of important programming idioms. [0]

          [0] http://adam.chlipala.net/cpdt/html/GeneralRec.html
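
          To make the contrast concrete outside Coq, here is a minimal Java sketch of the two recursion shapes (the class and method names are mine; Java of course performs no such termination check):

          public class TerminationShapes {
              // Structural recursion: each call recurses on n - 1, the
              // "predecessor" of n, so a syntactic checker can see the
              // argument shrink toward the base case.
              static int plus(int m, int n) {
                  if (n == 0) return m;
                  return 1 + plus(m, n - 1);
              }

              // Euclid's algorithm: the recursive call gcd(b, a % b) passes
              // arguments that are not subterms of the inputs. It terminates
              // because 0 <= a % b < b, so the second argument strictly
              // decreases - a number-theoretic fact, not a syntactic one.
              static int gcd(int a, int b) {
                  if (b == 0) return a;
                  return gcd(b, a % b);
              }
          }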

      2. 6
        1. 7

          That’s probably a good thing: when I see those kinds of statements, they’re always wrong.

          Here’s one from Google’s dense_hash_map documentation: “dense_hash_map has a factor of 2-3 memory overhead: if your hashtable data takes X bytes, dense_hash_map will use 3X-4X memory total.”

          This, of course, is not true. There is in fact an inflection point where the memory usage explodes.

          Something that bothers me is that many programmers believe O(1) is fast, or perhaps at least that O(1) is faster than O(n). It’s often so much more useful to me to know about the “invisible” coefficients when n is small, and where the crossover point is. Even the cache effects. Especially the cache effects.
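
          To illustrate, here is a deliberately naive Java sketch (single-shot System.nanoTime timings, no warmup, so treat the numbers as rough) that pits an O(n) scan against an O(1) hash lookup at small n:

          import java.util.HashSet;

          public class CrossoverSketch {
              public static void main(String[] args) {
                  for (int n : new int[] {4, 16, 64, 256, 1024}) {
                      int[] array = new int[n];
                      HashSet<Integer> set = new HashSet<>();
                      for (int i = 0; i < n; i++) { array[i] = i; set.add(i); }
                      int probe = n - 1; // worst case for the linear scan

                      long t0 = System.nanoTime();
                      boolean foundScan = false;
                      for (int v : array) if (v == probe) { foundScan = true; break; }
                      long scanNs = System.nanoTime() - t0;

                      t0 = System.nanoTime();
                      boolean foundHash = set.contains(probe);
                      long hashNs = System.nanoTime() - t0;

                      System.out.printf("n=%5d scan=%7d ns hash=%7d ns (%b/%b)%n",
                              n, scanNs, hashNs, foundScan, foundHash);
                  }
              }
          }

          On many machines the contiguous scan wins at small n despite the worse asymptotics - it streams through a few cache lines while the hash lookup boxes the probe and chases references - and where the crossover lands is exactly the machine-dependent detail Big O hides.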

          There remains the risk that the documentation falls out of date. Notebooks seem to help, since they can keep the benchmarks next to the functions, but this kind of tooling doesn’t seem to be very popular for application development.

          1. 3

            Wouldn’t the better answer be to improve the documentation? I think you’d start with Big O and then eventually include:

            • “invisible” coefficients
            • inflection points (à la Timsort)
            • effects from caching?

            Theoretically (and this is pretty blue-sky), you could automatically benchmark against the stated Big O and discover or update some of the above, right?
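
            On the blue-sky part: one could imagine tooling that times an operation at doubling sizes and checks the growth ratio against the documented bound. A naive single-shot Java sketch (no warmup or statistics, so the ratios are only indicative):

            import java.util.Arrays;
            import java.util.Random;

            public class GrowthRatio {
                public static void main(String[] args) {
                    Random rng = new Random(42);
                    long prev = -1;
                    // For an O(n log n) sort, doubling n should a bit more than
                    // double the time (ratio slightly above 2); a ratio near 4
                    // would suggest quadratic behavior instead.
                    for (int n = 1 << 12; n <= 1 << 18; n <<= 1) {
                        int[] data = rng.ints(n).toArray();
                        long t0 = System.nanoTime();
                        Arrays.sort(data);
                        long elapsed = System.nanoTime() - t0;
                        System.out.printf("n=%7d time=%9d ns ratio=%s%n", n, elapsed,
                                prev < 0 ? "-" : String.format("%.2f", (double) elapsed / prev));
                        prev = elapsed;
                    }
                }
            }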

            1. 1

              It is very difficult to do a good job of discovering and documenting these things; making good benchmarks is extremely tricky, to the point that there are few programmers I would trust to do it.

              Thinking about it, writing benchmarks is perhaps trickier than any other software development skill, because the code can run and produce correct output for decades and still be wrong.

              My preference, in general, is not to have this kind of extra documentation. It won’t be up to my standard, so I won’t believe it; if my application uses some of this code and is fast enough, the documentation was wasted effort, and if it isn’t, I’ll have to benchmark the code myself anyway, so the effort was still wasted.

          2. 4

            The C++ reference documents the complexity of various containers and algorithms, although it doesn’t use Big-O notation.

            std::map::find

            std::find

            1. 2

              Go sometimes has it in its docs; for example: https://golang.org/pkg/sort/#Sort

              1. 3

                I hadn’t seen that. I know that Russ Cox is pretty algorithmically inclined - his analysis of the algorithms for the new dependency versioning mechanism is really thoughtful - but I usually see this as a comment in the implementation (which is often more appropriate).

                1. 2

                  That’s one thing I like about Go: they value the research that has been done and try to implement the best solution (within sane limits).

              2. 2

                I think the most underdocumented area in terms of computational complexity might actually be machine learning, deep learning and the like. I’ve never seen any discussion of complexity there.

                1. 2

                  Java often includes the algorithmic complexity, e.g. https://docs.oracle.com/javase/8/docs/api/java/util/HashMap.html

                  1. 1

                    Yeah, but mostly for their core stuff.

                    1. 1

                      I missed that constant-time performance comment - it’s very helpful, but I wish the memory/runtime complexity were called out as explicitly as the load factor, which is liberally sprinkled throughout.
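
                      For what it’s worth, the load factor the docs keep mentioning is exposed directly on the constructor; a small sketch of why you’d touch it:

                      import java.util.HashMap;
                      import java.util.Map;

                      public class LoadFactorExample {
                          public static void main(String[] args) {
                              // HashMap(initialCapacity, loadFactor): the table is
                              // resized (and every entry rehashed) once size exceeds
                              // capacity * loadFactor, so pre-sizing avoids rehash
                              // pauses. The documented constant-time get/put also
                              // assumes hashCode() disperses keys well.
                              Map<String, Integer> counts = new HashMap<>(1 << 16, 0.75f);
                              counts.put("example", 1);
                              System.out.println(counts.get("example"));
                          }
                      }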

                    2. 2

                      I don’t trust Big O notation. Too many factors throw it off. I’m in favor of a new type of analysis that includes those factors. Alternatively, libraries could just include performance tests that demonstrate the actual behavior.

                      1. 2

                        This! If performance really matters to you as a library user, don’t blindly trust its documentation. Profile! Use performance analysis tools; there are plenty.

                        1. 1

                          So what’s the new type of analysis? My understanding was that you could include more factors in Big O but people don’t, out of laziness or a desire for (too much) generality.

                        2. 1

                          I think the best docs can do is give you an intuition. Oftentimes there is a mix of data types and optimizations in play that can make performance super unintuitive. If you want fast code, eventually you’re going to have to reach for a profiler and actually measure with differently sized inputs.
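
                          As one concrete way to measure with differently sized inputs, here is a sketch using the JMH microbenchmark harness (the class name and sizes are illustrative); JMH handles the warmup, forking and statistics that a hand-rolled timing loop gets wrong:

                          import java.util.HashSet;
                          import java.util.Set;
                          import java.util.concurrent.TimeUnit;
                          import org.openjdk.jmh.annotations.*;

                          @State(Scope.Benchmark)
                          @BenchmarkMode(Mode.AverageTime)
                          @OutputTimeUnit(TimeUnit.NANOSECONDS)
                          public class ContainsBenchmark {

                              @Param({"16", "1024", "65536"}) // differently sized inputs
                              int size;

                              Set<Integer> set;
                              int probe;

                              @Setup
                              public void setup() {
                                  set = new HashSet<>();
                                  for (int i = 0; i < size; i++) set.add(i);
                                  probe = size - 1;
                              }

                              @Benchmark
                              public boolean contains() {
                                  // return the result so the JIT can't eliminate it
                                  return set.contains(probe);
                              }
                          }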