1. 7
  1.  

  2. 6

    If you read more about big-O, you may see the word “amortized” thrown around. That’s a fancy word for “average,”

    Uh, no. “Amortized” means that earlier operations may pay the cost of later ones. By the time you need to perform an expensive operation, you will have performed enough cheap ones, so that the cost of the entire sequence of operations is bounded above by the sum of their amortized costs.
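
    A minimal sketch of that idea, assuming a doubling array and a made-up cost model (1 unit per element written, 1 unit per element copied on resize). The names `append_costs` and `AMORTISED` are hypothetical, not taken from any library:

```python
def append_costs(n):
    """Actual cost of each of n appends to a doubling array (assumed cost model)."""
    costs = []
    capacity, size = 1, 0
    for _ in range(n):
        cost = 1                  # writing the new element
        if size == capacity:      # array is full: copy every element to a bigger one
            cost += size
            capacity *= 2
        size += 1
        costs.append(cost)
    return costs


AMORTISED = 3  # assumed amortised cost per append, in the same units

costs = append_costs(1000)
running = 0
for k, actual in enumerate(costs, start=1):
    running += actual
    # The cheap appends so far have already "paid for" any expensive resize.
    assert running <= AMORTISED * k
```

    Individual appends cost up to hundreds of units here, yet the running total never exceeds 3 units per operation performed so far.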

    1. 3

      Thinking about it briefly, “average” is a weaker notion than amortised. All amortised-cost algorithms are also average-cost algorithms with the same bound, but not all average-cost algorithms qualify as amortised-cost algorithms with the same bound.

      If some algorithm does n operations each with O(1) cost followed by 1 operation with O(n) cost, that qualifies as O(1) per-operation amortised and qualifies as O(1) per-operation on average.

      If another algorithm does 1 operation with O(n) cost followed by n operations each with O(1) cost, it qualifies as O(1) per-operation on average, but does not qualify as O(1) per-operation amortised.

      As a consumer of some algorithm, if you are given an amortised cost bound and you translate that to “average” in your head, then you’ll be slightly pessimistic. Amortised cost bounds imply that some of the cheap operations happen before any of the expensive ones, but average cost bounds merely imply that the cost of the expensive operations is balanced out by the cost of the cheap ones.
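
      The two sequences above can be compared with a small sketch. It assumes an “O(n)” operation costs exactly n units and an “O(1)” operation costs exactly 1 unit; the helper names are hypothetical:

```python
def average(costs):
    """Mean cost per operation over the whole sequence."""
    return sum(costs) / len(costs)


def bound_holds_on_every_prefix(costs, per_op):
    """True if the running total never exceeds per_op * (operations so far)."""
    total = 0
    for k, cost in enumerate(costs, start=1):
        total += cost
        if total > per_op * k:
            return False
    return True


n = 100
cheap_first = [1] * n + [n]      # n cheap operations, then one expensive one
expensive_first = [n] + [1] * n  # one expensive operation, then n cheap ones

# Both sequences have the same average cost per operation (just under 2).
print(average(cheap_first), average(expensive_first))

# Only the first stays within 2 units per operation at every point in time.
print(bound_holds_on_every_prefix(cheap_first, per_op=2))
print(bound_holds_on_every_prefix(expensive_first, per_op=2))
```

      This checks the stricter, prefix-by-prefix reading of “amortised” that this comment uses; whether that reading is the right definition is a separate question.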

      1. 3

        If another algorithm does 1 operation with O(n) cost followed by n operations each with O(1) cost, it qualifies as O(1) per-operation on average, but does not qualify as O(1) per-operation amortised.

        This is a common misconception I see pop up often (I guess I’d blame the poor naming?): “Amortised” means that, for any sequence of n operations, you can prove that the total amortised cost of the operations is an upper bound on the total actual cost. It doesn’t mean that you have to run a couple of cheap instructions before you can “afford” to run an expensive one.

        People usually tend to prove a stronger claim, though: that this upper bound also holds at all intermediate stages. That’s what you do when you use the banker’s/physicist’s method, by ensuring you don’t spend “credits” you don’t yet have. But there are other ways to analyse the total cost that do not force you to save up “credits” before running an expensive operation.

        1. 2

          “Amortised” means that, for any sequence of n operations, you can prove that the total amortised cost of the operations is an upper bound on the total actual cost. It doesn’t mean that you have to run a couple of cheap instructions before you can “afford” to run an expensive one.

          Uh… can you show me a case where “you must run at least O(n) operations that are O(1) apiece before you may run an operation that is O(n)” is not equivalent to “you can’t burn all your time credits before they’re allocated by operations that yield more than they spend”?

          I think they’re equivalent unless you specify that some operations allocate more than O(1) credits at a time… which I don’t think anyone ever does, and I don’t think is allowed in the definition.

          But there are other ways to analyse the total cost that do not force you to save up “credits” before running an expensive operation.

          Doesn’t it fall out of the rest of the definition? If I can perform a given sequence of operations, then I can perform any prefix of that sequence of operations and stop early. Suppose that you’ve got a proof of amortised time complexity for a data structure for which a sequence of operations A,B is legal, where the upper bound doesn’t hold after A has completed but does hold again by the time that B has completed. Suppose I stop early after performing A. I’ve now performed a sequence of operations. The upper bound does not hold at this point. I’m not going to perform any more operations. The upper bound doesn’t hold at the end of this sequence. The invariant “for any sequence of n operations, I can prove that the total amortised cost of the operations is an upper bound on the total actual cost” has been violated.

          1. 2

            Suppose I stop early after performing A. I’ve now performed a sequence of operations.

            If you stopped after performing A, you have not performed a sequence of m operations (in your case, 2), which is what the amortised bound was given for. When you have proved an amortised upper bound O(f(n)) for an operation, then you have proved that any sequence of m operations will have the upper bound O(m · f(n)). Typically m = n.

            It’s probably confusing that I wrote n instead of m in the previous comment, so sorry about that.
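
            In symbols, the two readings being discussed might be written as below. The notation is an assumption introduced for illustration: c_i is the actual cost of the i-th operation and \hat{c}_i = O(f(n)) is its amortised cost.

```latex
% Total-only reading: the bound is claimed for the full sequence of m operations.
\[
  \sum_{i=1}^{m} c_i \;\le\; \sum_{i=1}^{m} \hat{c}_i \;=\; O\bigl(m \cdot f(n)\bigr)
\]
% Prefix reading: the bound is claimed after every intermediate step as well.
\[
  \sum_{i=1}^{k} c_i \;\le\; \sum_{i=1}^{k} \hat{c}_i
  \qquad \text{for every } k \le m
\]
```

            The second statement implies the first (take k = m); the disagreement in this thread is about whether the first alone already counts as an amortised bound.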

            1. 2

              Is the distinction that an amortised cost proof can specify a minimum number of operations, prior to which the bound doesn’t have to hold?

              I’ve never seen them used like that, only in the “all sequences of API calls result in actual cost bounded by amortised cost” way (which is the kind of proof Chris Okasaki gives for lots of the structures in his book).

              1. 2

                Well, yes, a proof of an amortised bound only requires you to show that, for exactly m operations, the total actual cost is bounded above by the total amortised cost. And as you mention, it’s very common to prove it for every count up to and including m operations instead.

                1. 1

                  Isn’t there a forall m. in there somewhere?

          2. 1

            Right. By “cheap”, I meant “actual cost < amortized cost”, not “low actual cost”.

          3. 2

            There are more fundamental differences between amortized and average-case analyses:

            Amortized analysis uses an accounting system to track computation whose cost has been paid for, but not performed yet. This accounting system is justified entirely by the needs of the analysis, not external problem domain constraints. Amortized analysis naturally arises in the design of batch processes, e.g. formatting disk partitions or compiling programs ahead of time.

            Average-case analysis assumes a probability distribution over the input space, which is necessarily not uniform, because no uniform distribution exists over a countably infinite set (such as the set of lists, trees, graphs, etc. that an algorithm can process). This probability distribution can only possibly be justified by the external nature of the problem domain, in which some inputs are deemed more likely than others. Average-case analysis naturally arises in the design of heuristic algorithms that perform well except in unlikely pathological cases.

            1. 3

              Average-case analysis assumes a probability distribution over the input space

              Pretty sure this isn’t what Ned’s going for here. TFA actually means something much simpler, like “when a library says ‘amortised O(1) time per operation’, you can add up the total number of nanoseconds you spent in the library, divide it by the number of operations you did, and come out with a number that will be bounded above by some constant.”

        2. 5

          Another writeup (mine): http://nathanmlong.com/2015/03/understanding-big-o-notation/

          The real-world examples I gave were around greeting people at a conference.

          • Shaking every hand is O(N): 100 people means 100 handshakes.
          • Introducing each person to everyone else is O(N^2): if there are 10 people and 1 more walks in, it’s 10 more handshakes; if there are 100 people and 1 more walks in, it’s 100 more handshakes.
          • Waving at everyone from stage is O(1); the number of people is irrelevant.
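
          The three cases above can be counted directly. This is only an illustrative sketch; the function names are made up, and “introducing everyone” is assumed to mean one handshake per pair of people:

```python
def greet_each(n):
    """Shake every hand once: one handshake per person, so O(N)."""
    return n


def introduce_all(n):
    """Everyone shakes hands with everyone else: one per pair, so O(N^2)."""
    return n * (n - 1) // 2


def wave_from_stage(n):
    """One wave regardless of audience size, so O(1)."""
    return 1


# Extra handshakes caused by one more person walking in.
print(introduce_all(11) - introduce_all(10))
print(introduce_all(101) - introduce_all(100))
```

          The cost of each newcomer grows with the size of the room, which is what makes the total quadratic.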

          I applied this to “how would you re-implement Ruby’s Hash?” in http://nathanmlong.com/2015/10/reimplementing-rubys-hash/

          1. 1

            Well, this taught me one thing I didn’t know (that ‘O’ really can literally be read as ‘on the order of’, per https://en.wikipedia.org/wiki/Big_O_notation#History_.28Bachmann.E2.80.93Landau.2C_Hardy.2C_and_Vinogradov_notations.29). But it’s unfortunate that the post abandons some precision and accuracy, even if the intention is to introduce big-O to newcomers.

            For example, big-O is an upper bound, not a tight bound. Also, the reason we throw away the lower-order terms is that the higher-order terms dominate as n gets bigger. For small n, the lower-order terms can dominate.
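
            A quick way to see the point about small n, using a made-up cost function n^2 + 1000n (the coefficient 1000 is an arbitrary assumption chosen for illustration):

```python
def cost_terms(n):
    """Split the hypothetical cost n^2 + 1000n into its two terms."""
    quadratic = n * n
    linear = 1000 * n
    return quadratic, linear


for n in (10, 1_000, 100_000):
    quadratic, linear = cost_terms(n)
    if quadratic > linear:
        bigger = "quadratic"
    elif linear > quadratic:
        bigger = "linear"
    else:
        bigger = "neither"
    print(n, quadratic, linear, bigger)
```

            Below n = 1000 the “lower-order” linear term is the larger one; only past that point does the n^2 term take over, which is why O(n^2) describes the growth but says little about small inputs.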

            Seems like these issues come around every time there’s a new ‘big-O for beginners’ article - which is often - so maybe we have to treat this like grammatical usage and figure that the battle is lost. I checked, and I wrote a (much longer) response like this back in 2008 (https://stackoverflow.com/questions/107165/big-o-for-eight-year-olds/107883#107883) so I guess I’m just a consistent curmudgeon :)