If you read more about big-O, you may see the word “amortized” thrown around. That’s a fancy word for “average,”
Uh, no. “Amortized” means that earlier operations may pay the cost of later ones. By the time you need to perform an expensive operation, you will have performed enough cheap ones, so that the cost of the entire sequence of operations is bounded above by the sum of their amortized costs.
Thinking about it briefly, “average” is a weaker notion than amortised. All amortised-cost algorithms are also average-cost algorithms with the same bound, but not all average-cost algorithms qualify as amortised-cost algorithms with the same bound.
If some algorithm does n operations each with O(1) cost followed by 1 operation with O(n) cost, that qualifies as O(1) per-operation amortised and qualifies as O(1) per-operation on average.
If another algorithm does 1 operation with O(n) cost followed by n operations each with O(1) cost, it qualifies as O(1) per-operation on average, but does not qualify as O(1) per-operation amortised.
As a consumer of some algorithm, if you are given an amortised cost bound and you translate that to “average” in your head then you’ll be being slightly pessimistic. Amortised cost bounds imply that some of the cheap operations happen before any of the expensive ones, but average cost bounds merely imply that the cost of the expensive operations is balanced out by the cost of the cheap ones.
If another algorithm does 1 operation with O(n) cost followed by n operations each with O(1) cost, it qualifies as O(1) per-operation on average, but does not qualify as O(1) per-operation amortised.
This is a common misconception I see pop up often (I guess I’d blame the poor naming?): “Amortised” means that, for any sequence of n operations, you can prove that the total amortised cost of the operations is an upper bound on the total actual cost. It doesn’t mean that you have to run a couple of cheap instructions before you can “afford” to run an expensive one.
People usually tend prove a stronger claim though: That this upper bound also holds for all intermediate stages as well. That’s what you do when you end up using the banker’s/physicist’s method, by ensuring you don’t spend “credits” you don’t yet have. But there are other ways to analyse the total cost that does not force you to save up “credits” before running an expensive operation.
“Amortised” means that, for any sequence of n operations, you can prove that the total amortised cost of the operations is an upper bound on the total actual cost. It doesn’t mean that you have to run a couple of cheap instructions before you can “afford” to run an expensive one.
Uh… can you show me a case where “you must run at least O(n) operations that are O(1) apiece before you may run an operation that is O(n)” is not equivalent to “you can’t burn all your time credits before they’re allocated by operations that yield more than they spend”?
I think they’re equivalent unless you specify that some operations allocate more than O(1) credits at a time… which I don’t think anyone ever does, and I don’t think is allowed in the definition.
But there are other ways to analyse the total cost that does not force you to save up “credits” before running an expensive operation
Doesn’t it fall out of the rest of the definition? If I can perform a given sequence of operations, then I can perform any prefix of that sequence of operations and stop early. Suppose that you’re got a proof of amortised time complexity for a data structure for which a sequence of operations A,B is legal, where the upper bound doesn’t hold after A has completed but does hold again by the time that B has completed. Suppose I stop early after performing A. I’ve now performed a sequence of operations. The upper bound does not hold at this point. I’m not going to perform any more operations. The upper bound doesn’t hold at the end of this sequence. The invariant “for any sequence of n operations, I can prove that the total amortised cost of the operations is an upper bound on the total actual cost” has been violated.
Suppose I stop early after performing A. I’ve now performed a sequence of operations.
If you stopped after performing A, you have not performed a sequence of m operations (in your case, 2), which is what one has given an amortised bound for. When you have proved an amortised upper bound O(f(n)) for an operation, then you have proved that any sequence of m operations will have the upper bound O(m (f(n)). Typically m = n.
It’s probably confusing that I wrote n instead of m in the previous comment, so sorry about that.
Is the distinction that an amortised cost proof can specify a minimum number of operations, prior to which the bound doesn’t have to hold?
I’ve never seen them used like that, only for “all sequences of API calls result in actual cost bounded by amortised cost” way (which is the kind of proof Chris Okasaki gives for lots of the structures in his book).
Well, yes, proof of an amortised bound only requires you to specificy that for exactly m operations, the actual total cost has an upper bound of the total amortised cost. And as you mention, it’s very common to prove up to and including m operations instead.
There are more fundamental differences between amortized and average-case analyses:
Amortized analysis uses an accounting system to track computation whose cost has been paid for, but not performed yet. This accounting system is justified entirely by the needs of the analysis, not external problem domain constraints. Amortized analysis naturally arises in the design of batch processes, e.g. formatting disk partitions or compiling programs ahead of time.
Average-case analysis assumes a probability distribution over the input space, which is necessarily not uniform, because no uniform distribution exists over a countable set (such as the set of lists, trees, graphs, etc. that an algorithm can process). This probability distribution can only possibly be justified by the external nature of the problem domain, in which some inputs are deemed more likely than others. Average-case analysis naturally arises in the design of heuristic algorithms that perform well but in unlikely pathological cases.
Average-case analysis assumes a probability distribution over the input space
Pretty sure this isn’t what Ned’s going for here. TFA actually means something much simpler like “when a library says ‘amortised O(1) time per operation’, you can add up the total number of nanoseconds you spent in the library and divide it by the number of operations you did and come out with a number that will be bounded above by some constant’.”
Leiningen doesn’t default to any latest version as far as I know. Leiningen does
Versioning/pinning is not only about having an API-compliant library though, it’s also about being sure that you can build the exact same version of your program later on. Hyrum’s Law states that any code change may effectively be a breaking one for your consumers. For example:
Of course, pinning is not a panacea: We usually want to apply security issues and bugfixes immediately. But for the most part, there’s no way we can know a priori that new releases will be backwards compatible for our software or not. Pinning gives you the option to vet dependency updates and defer them if they require changes to your system.
1: Unless you use version ranges or dependencies that use them. But that happen so infrequently and is strongly advised against – I don’t think I’ve ever experienced it in the wild.
FYI, Hyrum finally made http://www.hyrumslaw.com/ with the full observation. Useful for linking. :)
Hmm, perhaps I misunderstood the doc I read. I’m having trouble finding it at the moment. I’m not a Clojure user. Could you point me at a good link? Do library users always have to provide some sort of version predicate for each dependency?
Your point about reproducing builds is a good one, but it can coexist with my proposal. Imagine a parallel universe where Bundler works just like it does here and maintains a
Gemfile.lockrecording precise versions in use for all dependencies, but we’ve just all been consistently including major version in gem names and not foisting incompatibilities on our users. Push security fixes and bugfixes, pull API changes.Edit: based on other comments I think I’ve failed to articulate that I am concerned with the upgrade process rather than the deployment process. Version numbers in
Gemfile.lockare totally fine. Version numbers inGemfileare a smell.Oh, yes, sorry for not being clear: I strongly agree that version “numbers” might as well be serial numbers, checksums or the timestamp it was deployed. And I think major versions should be in the library name itself, instead of in the version “number”.
In Leiningen, library users always have to provide some sort of version predicate for each dependency, see https://github.com/technomancy/leiningen/blob/master/doc/TUTORIAL.md#dependencies. There is some specific stuff related to snapshot versions and checkout dependencies, but if you try to build + deploy a project with those, you’ll get an error unless you setup some environment variable. This also applies to boot afaik ; the functionality is equivalent with how Java’s Maven works.
Thanks! I’ve added a correction to OP.
Hmm, I’ve been digging more into Leiningen, and growing increasingly confused. What’s the right way to say, “give me the latest 2.0 version of this library”? It seems horrible that the standard tutorial recommends using exact versions.
There’s no way to do that. The Maven/JVM dependency land always uses exact versions. This ensures stability.