Threads for hyPiRion

  1. 5

    Nerd sniped myself into writing a Go version:

    type Trie struct {
    	children map[byte]*Trie
    }
    
    func NewTrie() *Trie {
    	return &Trie{make(map[byte]*Trie)}
    }
    
    func (t *Trie) Insert(word string) {
    	for _, c := range []byte(word) {
    		if t.children[c] == nil {
    			t.children[c] = NewTrie()
    		}
    		t = t.children[c]
    	}
    	t.children['\x00'] = nil
    }
    
    func (t *Trie) wordsWith(prefix string, yield func(string)) {
    	for c, t := range t.children {
    		if c == '\x00' {
    			yield(prefix)
    		} else {
    			t.wordsWith(prefix+string([]byte{c}), yield)
    		}
    	}
    }
    
    func (t *Trie) Autocomplete(prefix string) []string {
    	for _, c := range []byte(prefix) {
    		t = t.children[c]
    		if t == nil {
    			return nil
    		}
    	}
    	var words []string
    	t.wordsWith(prefix, func(word string) {
    		words = append(words, word)
    	})
    
    	return words
    }
    
    1. 2

      Thoughts:

      1. You could make this Unicode aware by changing byte to rune.
      2. You could keep it as a byte map but implement that as an array of 256 pointers to Tries. 2KB per node seems like a lot of overhead, but maybe it’s worth it?
      1. 3

        A pretty common technique I have used is to have a node for every half byte and use slices, so at most 16 pointers per node. Then, you can use an integer to represent which slots are used (slot 0 used => bit 0 set, etc), and use bits.OnesCount16 to find out which of the branches have been populated. It’s essentially the technique used in ideal hash trees by Bagwell, just half the size, as it’s a little neater to work with for bytes.
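
        A rough sketch of such a node (the names here are made up for illustration, and insertion of values is left out): a uint16 bitmap records which of the 16 slots are occupied, and bits.OnesCount16 over the lower bits turns a nibble into an index into the compact child slice.

        package main
        
        import (
            "fmt"
            "math/bits"
        )
        
        type node struct {
            bitmap   uint16  // bit i set => slot i is populated
            children []*node // only the populated slots, kept in slot order
        }
        
        func (n *node) child(nibble uint8) *node {
            bit := uint16(1) << nibble
            if n.bitmap&bit == 0 {
                return nil
            }
            // The child's index is the number of populated slots below it.
            idx := bits.OnesCount16(n.bitmap & (bit - 1))
            return n.children[idx]
        }
        
        func (n *node) insertChild(nibble uint8) *node {
            if c := n.child(nibble); c != nil {
                return c
            }
            bit := uint16(1) << nibble
            idx := bits.OnesCount16(n.bitmap & (bit - 1))
            c := &node{}
            n.children = append(n.children, nil)       // grow by one
            copy(n.children[idx+1:], n.children[idx:]) // shift the tail right
            n.children[idx] = c
            n.bitmap |= bit
            return c
        }
        
        func main() {
            root := &node{}
            // Insert the two nibbles of the byte 0xA7, high nibble first.
            root.insertChild(0xA).insertChild(0x7)
            fmt.Println(root.child(0xA).child(0x7) != nil) // true
            fmt.Println(root.child(0xB))                   // <nil>
        }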

        1. 1

          If you insist that words are just 26 English alphabetical letters, then you could use an array of 27 pointers, which is probably pretty cheap.

          1. 1

            You could make this Unicode aware by changing byte to rune.

            I think folks typically build a byte trie over the UTF-8 encoding, to keep the number of children small. That's how Unicode automata generally tend to work.

          2. 2

            t.children['\x00'] = nil that’s a funny hack :-) Why not just use a boolean?

            1. 1

              I started with a boolean but then decided that a null character would be invalid anyway, so might as well use it.

          1. 2

            First off, a common handy function I have in several Go projects I am working on:

            // New returns a pointer to x.
            func New[T any](x T) *T {
            	return &x
            }
            

            If you put it in a package named ptr you can do ptr.New(true) for optional arguments. Since it’s so small I figured there’s really no need to look for a library doing that (I also have things like ptr.DerefOrZero and stuff like that in that package.)

            With that function, it’s not that painful to add new fields and stay backwards compatible.
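
            For instance, here's a hedged sketch of what that looks like (ListOptions, Deployed and list are made-up names): the new field is a pointer, nil means "not specified", so older callers keep compiling unchanged.

            package main
            
            import "fmt"
            
            // New returns a pointer to x (the helper from above, inlined here).
            func New[T any](x T) *T { return &x }
            
            type ListOptions struct {
                Deployed *bool // nil = don't filter on deployment state
            }
            
            func list(opts ListOptions) {
                if opts.Deployed == nil {
                    fmt.Println("listing everything")
                    return
                }
                fmt.Println("listing deployed =", *opts.Deployed)
            }
            
            func main() {
                list(ListOptions{})                    // old callers are unaffected
                list(ListOptions{Deployed: New(true)}) // new callers opt in
            }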

            But also, there’s an unmentioned alternative you can do which I would probably do in the example given. You can make the zero value useful via enums:

            type DeploymentState string
            
            const (
            	DeploymentStateAny         DeploymentState = ""
            	DeploymentStateDeployed    DeploymentState = "deployed"
            	DeploymentStateNotDeployed DeploymentState = "not_deployed"
            	// (or camelCase or whatever case you prefer)
            )
            

            It seems unlikely in this particular scenario that the enum will change, but I have been burnt more often than not in cases where we have a boolean flag for filtering and need to extend it in some way. Nowadays I usually take the hit up front so that adding in the new alternative in a backwards compatible manner looks prettier for everyone.

            1. 3

              Experience has taught me that booleans are really better expressed as an enum with 2 values in 95% of cases, when performance isn't critical. Go is lacking the exhaustive matching primitives that would make enums the superior solution in all but the most extreme situations.

              1. 2

                I made a new.Of helper that works like your ptr.New if anyone cares: https://github.com/carlmjohnson/new. I hope it becomes a builtin soon.

              1. 1

                How well do these alternatives fare compared to S2? My experience is that, apart from some geojson translation and edge cases around it, it fulfills most of my needs – mostly thanks to its efficient spatial indices. While I suppose GEOS has some of those, my experience with C bindings is mixed, and they sometimes feel very clunky to use from the language on top.

                1. 4

                  S2 performs a similar role to GEOS or Rust’s geo package: functions for manipulating vector geometries. The main difference is that S2 works on spherical geometry (points and shapes lie on the surface of a sphere) and GEOS and geo mostly do planar geometry (points and shapes lie on a plane).

                  Planar geometry is fine so long as all of the objects you are interested in are fairly close to each other so that you can project them all onto the same plane with little distortion. If that’s not possible, then you either need to partition your data so that each partition can be safely projected without distortion or accept noticeable errors when operating on objects that are far apart or far from the most accurate parts of the projection.

                  Spherical geometry lets you use a single projection for the entire earth’s surface with little distortion. Calculations will still be slightly wrong because the Earth is not a sphere, it is irregularly squished and bulging, but the errors are much less noticeable.

                  GEOS and geo both provide some functions for calculating geodetic distances between lat/lon points, but don’t provide geodetic or spherical spatial indices, intersection, etc.

                  The very useful R sf package uses both S2 and GEOS, dependent on whether coordinates are lat/lon or planar.

                  1. 1

                    Ah, thanks for the clarification. I assumed GEOS worked on spherical geometry.

                1. 16

                  I find the minified-font for the logo to be a little overkill, or is it just me?

                  Wouldn’t it be more efficient to just use a traced (= convert text to path in Inkscape) SVG as a logo? If you do this, you can also manually adjust the kerning to make a really neat logo.

                  1. 3

                    Having it be properly textual is kinda necessary for non-graphical users, although I guess that’s what the alt attribute is for.

                    1. 4

                      Making the text part of a logo actual text also makes it automatically adjust its color when the user switches from light mode to dark. I agree it's not a worthwhile endeavor for every website, but I think it may be here.

                      1. 7

                        You can get that with SVG too, using fill="currentColor".

                    2. 2

                      I didn’t actually think about that, but that sounds like a good option to me! Maybe I’ll explore that at some point in the future.

                    1. 5

                      Generics in Go are probably gonna be a big deal, and the implementation looks nice and everything.

                      But I’m a bit concerned that the rest of the language doesn’t really “support” generics as well as most other languages which feature them. Take the Max example from the article:

                      func Max[T constraints.Ordered](x, y T) T {
                          if x > y {
                              return x
                          }
                          return y
                      }
                      

                      This is a Max function which can take “any ordered type”. But “ordered” here just means the types which have comparison operators – which is a fixed set of built-in types. The Max function works with the signed integer types, the unsigned integer types, the floating point types, and strings. And that’s much better than the situation before generics where each built-in type would need a separate Max function (or, more likely, as is the case with Go’s standard library, just support float64 and require lossy casting between your type and float64). But it’s extremely limiting compared to any other generics I’ve heard of, because no user-defined type can be an “ordered” type.

                      Generics (well, templates) work in C++ because a user-defined type can implement essentially all the operators a built-in type can, and can therefore act indistinguishably from a built-in type. A max function template in C++ can use the < operator to compare, and then any type with a < operator will work with that function.

                      Generics “work” in Java because the generic type must be an object, not a primitive, and so a generic function can just require that the type implements the Comparable interface, which is implemented by all the relevant boxed types (Integer, Float, etc), and can be implemented by all user-defined types. So again, a max function in Java can use the compare method to compare, and then any type which implements Comparable will work with that function.

                      Go is somewhere in the middle, where you can’t overload operators, so you can’t make a custom type act like a built-in primitive type, but it also doesn’t require boxing so people will be reluctant to make generic functions which require boxing (after all, in Go, that would be kind of ridiculous; you might just as well use runtime polymorphism using interfaces if you’re gonna box your types anyways).

                      Maybe this turns out to not really be a problem in practice. Or maybe it turns out to be one of those things which Go users live with and Go haters bring up in every discussion as (valid, if overblown) criticism. I’d be curious to read other people’s thoughts on this. Has it been discussed in the Go community or core team? Does anyone here have experience with generics in other languages and in Go?

                      Maybe the solution will be to pass in the operators/functions you need? A proper generic Max in Go could be written like this:

                      func Max[T any](x, y T, compare func(T, T) int) T {
                          if compare(x, y) > 0 {
                              return x
                          } else {
                              return y
                          }
                      }
                      

                      Though this looks like it would be rather cumbersome to call all the time. I’m also slightly worried about how smart the Go compiler will be when it comes to inlining that comparison function, which will be important to reach performance comparable to C++ templates.

                      All in all, generics look like a positive development for the Go language, but from what I can tell, I think it demands that we re-tread old discussions about operator overloading, function overloading, or if there are better ways to achieve what we need.

                      1. 3

                        But “ordered” here just means the types which have comparison operators – which is a fixed set of built-in types.

                        No, constraints.Ordered matches all types whose underlying types are one of those builtin types. See the source code and note the ~ notation in the interfaces. For example, time.Duration is defined as type Duration int64 and satisfies this constraint.

                        This mechanism doesn’t inherit any methods, but it does inherit all the operators. (The builtin primitive types don’t have methods, but if you do type A int64 and then type B A, B doesn’t inherit methods from A.) I don’t know whether this was an intentional design decision or not, but it now certainly poses a practical challenge for treating operators like methods (which is probably never going to happen anyway).
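
                        A small sketch of that point (Ordered is spelled out locally here so the snippet stands alone; it mirrors what constraints.Ordered contains): the ~int64 term admits time.Duration because its underlying type is int64.

                        package main
                        
                        import (
                            "fmt"
                            "time"
                        )
                        
                        // Ordered mirrors constraints.Ordered: the ~ forms match any type
                        // whose *underlying* type is one of the listed builtins.
                        type Ordered interface {
                            ~int | ~int8 | ~int16 | ~int32 | ~int64 |
                                ~uint | ~uint8 | ~uint16 | ~uint32 | ~uint64 | ~uintptr |
                                ~float32 | ~float64 | ~string
                        }
                        
                        func Max[T Ordered](x, y T) T {
                            if x > y {
                                return x
                            }
                            return y
                        }
                        
                        func main() {
                            // time.Duration is "type Duration int64", so it satisfies Ordered.
                            fmt.Println(Max(3*time.Second, 2*time.Second)) // 3s
                        }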

                        Operator overloading has also been explicitly rejected in the Go FAQ (see https://go.dev/doc/faq#overloading). In comparison, the FAQ never explicitly rejected generics - it used to say something like “it’s an open question, we may or may not add it”. Some people took that as a corp-speak way of rejecting generics, but it has now turned out to be an earnest statement.

                        1. 3

                          Fair, I suppose “fixed set of built-in types or type aliases of those built-in types” would be more technically correct. I don’t think that materially changes much of what I wrote, but it’s an important detail.

                          1. 2

                            These aren’t type aliases. Type aliases look like type T = int64 (note the =) and T becomes equivalent to int64 everywhere. Defining type T int64 creates a new type. The new type has the same machine representation as int64 and “inherits” the operators (which work between values of type T, not between T and int64) - but nothing else. The new type is convertible to int64 (and vice versa), but an explicit conversion is needed.

                            This is probably the more quirky aspect of Go’s type system. It is internally consistent though. If you have two type definitions type T struct{} and type U struct{}, these are not the same types and have distinct method sets, but they can be converted to each other with an explicit type cast.
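
                            A compact illustration of the difference (T and U are throwaway names):

                            package main
                            
                            import "fmt"
                            
                            type T int64   // defined type: a new, distinct type
                            type U = int64 // alias: just another name for int64
                            
                            func main() {
                                var t T = 2
                                var u U = 3
                            
                                fmt.Println(t * 2) // operators carry over to the defined type
                                // fmt.Println(t + u) // compile error: mismatched types T and int64
                                fmt.Println(t + T(u))     // explicit conversion is required...
                                fmt.Println(int64(t) + u) // ...in either direction
                            }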

                        2. 2

                          But it’s extremely limiting compared to any other generics I’ve heard of, because no user-defined type can be an “ordered” type.

                          Just to clarify, are you saying it’s limited because there’s no way to implement constraints.Ordered yourself, or because there’s no way to implement a generic “ordered”? If it’s the former, then yes, right now there’s still a difference between primitives and types with methods. If it’s the latter, then the self-referential trick documented in the type parameters proposal can implement what you want:

                          package main
                          
                          import "fmt"
                          
                          type Lesser[T any] interface {
                          	Less(T) bool
                          }
                          
                          func Max[T Lesser[T]](a T, more ...T) T {
                          	max := a
                          	for _, candidate := range more {
                          		if max.Less(candidate) {
                          			max = candidate
                          		}
                          	}
                          	return max
                          }
                          
                          type FlippedInt int
                          
                          func (n FlippedInt) Less(other FlippedInt) bool {
                          	return n > other
                          }
                          
                          func main() {
                          	a := FlippedInt(1)
                          	b := FlippedInt(2)
                          	c := FlippedInt(3)
                          
                          	fmt.Println(Max(b, a, c)) // Prints 1
                          }
                          

                          I wouldn’t say this is “extremely limiting” though. It just means there has to be two methods instead of just one, and you need to pick the right one based on what kind of type you’re dealing with. I mean, OCaml has two different operator sets for integers and floats, and it’s not considered “extremely limiting” from what I gather.

                          For containers, they’ll likely be implemented by using a comparison function underneath. If the community/standard library decides on a particular pattern, you’ll likely have one convenience function for the ones that are constraints.Ordered, another for Lesser (if that ever becomes a thing), then a raw one which lets you specify whatever comparison function you want.

                          1. 2

                            You can write func Max[T constraints.Ordered | interface{ Less(T) bool }](a, b T) T, but with the current implementation, it’s hard to do a type switch inside of Max to figure out if you got something with a Less method or not. It can be done, but it requires reflection or some other trickery. There’s a proposal to allow type switching on T, and I think it will happen in 1.19 or 1.20. Everyone wants type switching, and it’s just a matter of ironing out all the weird corner cases with it.
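
                            One hedged sketch of the sort of trickery that works today: drop the compile-time constraint entirely and detect a Less method with a runtime type assertion (Lesser and Celsius are made-up names, and this is only an illustration, not the constrained Max above).

                            package main
                            
                            import "fmt"
                            
                            type Lesser[T any] interface {
                                Less(T) bool
                            }
                            
                            // Max gives up the compile-time constraint and checks at runtime
                            // whether the concrete type argument has a Less method.
                            func Max[T any](a, b T) (T, bool) {
                                if al, ok := any(a).(Lesser[T]); ok {
                                    if al.Less(b) {
                                        return b, true
                                    }
                                    return a, true
                                }
                                var zero T
                                return zero, false // no known way to order T
                            }
                            
                            type Celsius float64
                            
                            func (c Celsius) Less(other Celsius) bool { return c < other }
                            
                            func main() {
                                fmt.Println(Max(Celsius(3), Celsius(7))) // 7 true
                                fmt.Println(Max("a", "b"))               // "" false: string has no Less method
                            }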

                            1. 2

                              One alternative to switching would be to have methods which duplicate the effect of the binary operators. So instead of having to have FlippedInt, we’d have an actual .Less method on all types which are ordered, so generic code which wanted to support - say bigint and int64 - could just call the methods and eschew the operators.

                              One nice side effect is that the method sets define natural interfaces which can be used as constraints, and I think you no longer need typesets.

                              Do you happen to know if this has been discussed (or if it has obvious negatives I haven’t seen)?

                              1. 2

                                Yes, it was discussed. It turned out to be hard to get a set of methods that captured things like: uint can’t be negative, int8 can’t be more than 127, etc., so they need type sets, and at that point natural methods are redundant.

                              2. 1

                                You can try a type assertion to the interface, just like any other type assertion, where the second return value indicates whether it was successful.

                              3. 1

                                Author here. I agree, the current implementation is limited!

                                Coming from C++, not being able to implement comparison operators on user-defined types is a big gap. The core team actually acknowledged that in their generics presentation at GopherCon this past month. I’m not sure if they plan to address this gap, but it does feel like generics (at this point) are really designed for primitives.

                                When I was first exploring the constraint system, I was excited to see that, in theory, any interface could be used in a generic type set. But then I had to pause and question when I would use type-set constrained generics and when I would use method-set constrained interfaces. In the absence of user-defined operators, I don’t think there’s a compelling reason to switch from the latter to the former.

                                For example, consider two user defined interfaces Foo and Bar. Now we have the option to define a type-set

                                type Buzz interface {
                                    Foo | Bar
                                }
                                
                                func Do[T Buzz](x T) {
                                    x.F()
                                }
                                

                                But we already had a variant on this pattern with the Reader, Writer, and ReadWriter interfaces, which (in my opinion, at least) is one of the more powerful interfaces in the standard library.

                                type ReadWriter interface {
                                	Reader
                                	Writer
                                }
                                
                                func Do(rw ReadWriter) {
                                    rw.Read(...)
                                }
                                

                                So that leaves the question of “what’s the advantage of generics over ‘compound’ interfaces”. I argue that being able to generalize over operators is the biggest selling point. We’re never going to implement OOP patterns like Java, so to your point, yeah, we’re sitting in limbo until we can do operator comparisons on user-defined types.

                                All that said, I think the constraint system is a compelling and clean implementation. If anything, I want to see how they expand on this concept to include variables etc. I’m confident this constraint system can be leveraged elsewhere with the right support.

                                1. 1

                                  Coming from C++, not being able to implement comparison operators on user-defined types is a big gap.

                                  I don’t want comparison operators to be overloaded, but I do wish we could define methods on any type so we could define a “Less() bool” method on builtin types to make them implement a more general interface. This comes up a lot, and creating a subtype doesn’t work because there’s no cheap way to cast a slice of ints to a slice of MyInt, for example. And on top of that it’s not very ergonomic to have to subtype (remember the days before sort.Slice(), anyone?).

                              1. 2

                                to incrementally create type-safe contexts via middleware functions

                                What did you mean by this? I’ve not much middleware experience, so I’m unclear why anything but the router interface would use contexts.

                                1. 1

                                  A lot of people use contexts something like this for HTTP servers (simplified, obviously): https://play.golang.org/p/kZngCKeBt_Y – Also, uh, people have opinions on whether this is good practice or not because it’s not type-safe.

                                  In the playground example, I add the user ID and the logger incrementally by enriching the context with more data in two different functions (a.k.a. middlewares), but ugh, it’s not type-safe.

                                  But in the blog post, I created one function (a.k.a. middleware) to attach both the user ID and the logger to the context. It’s arguably bad form to keep them together, as they are independent pieces of logic. So if one could have different functions adding those elements separately, but in a type-safe manner, then that would be neat.
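
                                  Roughly like this (a toy sketch, not the linked playground code; the key names and handler are made up): two middlewares each enrich the request context, and the handler has to type-assert the values back out, which is exactly the not-type-safe part.

                                  package main
                                  
                                  import (
                                      "context"
                                      "fmt"
                                      "log"
                                      "net/http"
                                      "os"
                                  )
                                  
                                  type ctxKey string // unexported key type, the usual trick
                                  
                                  const (
                                      userIDKey ctxKey = "userID"
                                      loggerKey ctxKey = "logger"
                                  )
                                  
                                  // withUserID enriches the context with a user ID.
                                  func withUserID(next http.Handler) http.Handler {
                                      return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
                                          ctx := context.WithValue(r.Context(), userIDKey, "user-123")
                                          next.ServeHTTP(w, r.WithContext(ctx))
                                      })
                                  }
                                  
                                  // withLogger enriches the context with a logger, independently.
                                  func withLogger(next http.Handler) http.Handler {
                                      return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
                                          logger := log.New(os.Stderr, "", log.LstdFlags)
                                          ctx := context.WithValue(r.Context(), loggerKey, logger)
                                          next.ServeHTTP(w, r.WithContext(ctx))
                                      })
                                  }
                                  
                                  func handler(w http.ResponseWriter, r *http.Request) {
                                      // These assertions are where type safety is lost: nothing guarantees
                                      // the values are present or have the asserted types.
                                      userID, _ := r.Context().Value(userIDKey).(string)
                                      logger, _ := r.Context().Value(loggerKey).(*log.Logger)
                                      logger.Printf("handling request for %s", userID)
                                      fmt.Fprintln(w, "hello,", userID)
                                  }
                                  
                                  func main() {
                                      http.Handle("/", withUserID(withLogger(http.HandlerFunc(handler))))
                                      log.Fatal(http.ListenAndServe(":8080", nil))
                                  }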

                                1. 8

                                  For the record, MVS is more or less what Clojure’s deps uses as well. I think there are two things that make MVS pretty good in practice:

                                  • MVS is predictable and easy to understand. If you ignore the I/O part, you can implement the algorithm itself in less than 50 lines (… depending on language); see the sketch after this list.
                                  • By stating that MVS is the standard, library authors now have to/feel they have to enforce that for their libraries. This means minor version changes don’t tend to contain breaking changes, and since major version bumps are a new package altogether, upgrading tends to be pretty painless. There are some exceptions that prove the rule though (grpc-go, what are you doing?)
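
                                  The sketch, hedged: this is the gist of MVS rather than the real cmd/go implementation. The requirement graph is made up, and versions are compared as plain strings instead of semantic versions. Walk everything reachable from the root and keep the highest minimum version requested for each module.

                                  package main
                                  
                                  import "fmt"
                                  
                                  type module struct {
                                      name, version string
                                  }
                                  
                                  // reqs maps a module@version to the minimum versions it requires.
                                  var reqs = map[module][]module{
                                      {"app", "v1"}:   {{"a", "v1.1.0"}, {"b", "v1.0.0"}},
                                      {"a", "v1.1.0"}: {{"b", "v1.2.0"}},
                                      {"b", "v1.0.0"}: {},
                                      {"b", "v1.2.0"}: {},
                                  }
                                  
                                  func mvs(root module) map[string]string {
                                      selected := map[string]string{root.name: root.version}
                                      visited := map[module]bool{root: true}
                                      work := []module{root}
                                      for len(work) > 0 {
                                          m := work[0]
                                          work = work[1:]
                                          for _, dep := range reqs[m] {
                                              if dep.version > selected[dep.name] { // keep the highest minimum
                                                  selected[dep.name] = dep.version
                                              }
                                              if !visited[dep] {
                                                  visited[dep] = true
                                                  work = append(work, dep)
                                              }
                                          }
                                      }
                                      return selected
                                  }
                                  
                                  func main() {
                                      fmt.Println(mvs(module{"app", "v1"})) // map[a:v1.1.0 app:v1 b:v1.2.0]
                                  }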

                                  I think the latter is the most important aspect: Library authors seem to care about backwards compatibility much more than they used to. That automatically makes upgrading easier, regardless of the dependency algorithm.


                                  One problem with Go’s dependency management (but not really MVS) I experience now and then is the entire “built on top of Git(Hub)” issue: Sometimes libraries and programs can just disappear.

                                  The worst part of that aspect is the case where you depend on a library, found and resolved a bug, and need to use that version before it’s merged into the upstream repository. In those cases, you’re left with two options:

                                  1. Depend on your PR’s latest commit on the upstream repository, and hope the library author doesn’t rebase or squash the PR. If they do, your builds will break because the commit doesn’t exist on a branch any more.
                                  2. Depend on your own commit in your own fork, then remember to switch back to the original repository when the PR has been merged (if you’re able to remember it).

                                  i.e. you have to pick between potentially failing builds or potentially stale dependencies. Granted, I don’t think this is worse than what other dependency management solutions do, but it’s a trap that I’ve fallen into a couple of times.

                                  1. 9

                                    Leiningen for Clojure once again defaults to the latest version.

                                    Leiningen doesn’t default to any latest version as far as I know. Leiningen does:

                                    1. reproducible dependency resolution: For the same dependency list, you always get the same dependencies ¹
                                    2. in no way “default” to anything, as inserting dependencies and their versions is the user’s responsibility

                                    Versioning/pinning is not only about having an API-compliant library though, it’s also about being sure that you can build the exact same version of your program later on. Hyrum’s Law states that any code change may effectively be a breaking one for your consumers. For example:

                                    • Fixed a bug in your library? Someone will depend on the buggy behaviour, or attempt to fix the bug downstream while it’s still an issue. If you forget to quote apostrophes, for example, fixing that in a bugfix release may cause some tools to double escape them.
                                    • Fixed an edge case/security issue? You’ve most likely introduced some checks which will have some overhead. If your library is used in a hot spot for a consumer, then it may lead to performance degradation of the program they’ve made.
                                    • User complains that an old version of your software is buggy/breaks, but you cannot reproduce on HEAD and you want to know what fixed it? That’s hard if you cannot use old dependencies. If git-bisect doesn’t test a commit with the dependencies you used at the commit time, you’re not testing your software as it was at that point in time. And if the bug is upstream, it’s hard to figure out what dependency caused it and how it was fixed.

                                    Of course, pinning is not a panacea: We usually want to apply security issues and bugfixes immediately. But for the most part, there’s no way we can know a priori that new releases will be backwards compatible for our software or not. Pinning gives you the option to vet dependency updates and defer them if they require changes to your system.

                                    1: Unless you use version ranges or dependencies that use them. But that happens so infrequently and is so strongly advised against that I don’t think I’ve ever experienced it in the wild.

                                    1. 3

                                      Hyrum’s Law

                                      FYI, Hyrum finally made http://www.hyrumslaw.com/ with the full observation. Useful for linking. :)

                                      1. 2

                                        Hmm, perhaps I misunderstood the doc I read. I’m having trouble finding it at the moment. I’m not a Clojure user. Could you point me at a good link? Do library users always have to provide some sort of version predicate for each dependency?

                                        Your point about reproducing builds is a good one, but it can coexist with my proposal. Imagine a parallel universe where Bundler works just like it does here and maintains a Gemfile.lock recording precise versions in use for all dependencies, but we’ve just all been consistently including major version in gem names and not foisting incompatibilities on our users. Push security fixes and bugfixes, pull API changes.

                                        Edit: based on other comments I think I’ve failed to articulate that I am concerned with the upgrade process rather than the deployment process. Version numbers in Gemfile.lock are totally fine. Version numbers in Gemfile are a smell.

                                        1. 3

                                          Oh, yes, sorry for not being clear: I strongly agree that version “numbers” might as well be serial numbers, checksums or the timestamp it was deployed. And I think major versions should be in the library name itself, instead of in the version “number”.


                                          In Leiningen, library users always have to provide some sort of version predicate for each dependency, see https://github.com/technomancy/leiningen/blob/master/doc/TUTORIAL.md#dependencies. There is some specific stuff related to snapshot versions and checkout dependencies, but if you try to build + deploy a project with those, you’ll get an error unless you set up some environment variable. This also applies to boot afaik; the functionality is equivalent to how Java’s Maven works.

                                          1. 2

                                            Thanks! I’ve added a correction to OP.

                                            1. 1

                                              Hmm, I’ve been digging more into Leiningen, and growing increasingly confused. What’s the right way to say, “give me the latest 2.0 version of this library”? It seems horrible that the standard tutorial recommends using exact versions.

                                              1. 3

                                                There’s no way to do that. The Maven/JVM dependency land always uses exact versions. This ensures stability.

                                        1. 6

                                          If you read more about big-O, you may see the word “amortized” thrown around. That’s a fancy word for “average,”

                                          Uh, no. “Amortized” means that earlier operations may pay the cost of later ones. By the time you need to perform an expensive operation, you will have performed enough cheap ones, so that the cost of the entire sequence of operations is bounded above by the sum of their amortized costs.

                                          1. 3

                                            Thinking about it briefly, “average” is a weaker notion than amortised. All amortised-cost algorithms are also average-cost algorithms with the same bound, but not all average-cost algorithms qualify as amortised-cost algorithms with the same bound.

                                            If some algorithm does n operations each with O(1) cost followed by 1 operation with O(n) cost, that qualifies as O(1) per-operation amortised and qualifies as O(1) per-operation on average.

                                            If another algorithm does 1 operation with O(n) cost followed by n operations each with O(1) cost, it qualifies as O(1) per-operation on average, but does not qualify as O(1) per-operation amortised.

                                            As a consumer of some algorithm, if you are given an amortised cost bound and you translate that to “average” in your head then you’ll be slightly pessimistic. Amortised cost bounds imply that some of the cheap operations happen before any of the expensive ones, but average cost bounds merely imply that the cost of the expensive operations is balanced out by the cost of the cheap ones.
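
                                            A concrete toy example of the first case (a sketch, not from the article): a growable stack that doubles its backing array. An individual push is occasionally O(n), but by then the preceding cheap pushes have already paid for it, so the cost per push is O(1) amortised.

                                            package main
                                            
                                            import "fmt"
                                            
                                            type stack struct {
                                                buf []int
                                                n   int
                                            }
                                            
                                            func (s *stack) push(x int) {
                                                if s.n == len(s.buf) {
                                                    // The occasional expensive step: copy into a buffer twice as large.
                                                    newBuf := make([]int, 2*len(s.buf)+1)
                                                    copy(newBuf, s.buf)
                                                    s.buf = newBuf
                                                }
                                                // The common cheap step: write into a pre-allocated slot.
                                                s.buf[s.n] = x
                                                s.n++
                                            }
                                            
                                            func main() {
                                                var s stack
                                                for i := 0; i < 10; i++ {
                                                    s.push(i)
                                                }
                                                fmt.Println(s.buf[:s.n]) // [0 1 2 3 4 5 6 7 8 9]
                                            }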

                                            1. 3

                                              If another algorithm does 1 operation with O(n) cost followed by n operations each with O(1) cost, it qualifies as O(1) per-operation on average, but does not qualify as O(1) per-operation amortised.

                                              This is a common misconception I see pop up often (I guess I’d blame the poor naming?): “Amortised” means that, for any sequence of n operations, you can prove that the total amortised cost of the operations is an upper bound on the total actual cost. It doesn’t mean that you have to run a couple of cheap instructions before you can “afford” to run an expensive one.

                                              People usually tend to prove a stronger claim though: that this upper bound also holds for all intermediate stages as well. That’s what you do when you end up using the banker’s/physicist’s method, by ensuring you don’t spend “credits” you don’t yet have. But there are other ways to analyse the total cost that do not force you to save up “credits” before running an expensive operation.

                                              1. 2

                                                “Amortised” means that, for any sequence of n operations, you can prove that the total amortised cost of the operations is an upper bound on the total actual cost. It doesn’t mean that you have to run a couple of cheap instructions before you can “afford” to run an expensive one.

                                                Uh… can you show me a case where “you must run at least O(n) operations that are O(1) apiece before you may run an operation that is O(n)” is not equivalent to “you can’t burn all your time credits before they’re allocated by operations that yield more than they spend”?

                                                I think they’re equivalent unless you specify that some operations allocate more than O(1) credits at a time… which I don’t think anyone ever does, and I don’t think is allowed in the definition.

                                                But there are other ways to analyse the total cost that does not force you to save up “credits” before running an expensive operation

                                                Doesn’t it fall out of the rest of the definition? If I can perform a given sequence of operations, then I can perform any prefix of that sequence of operations and stop early. Suppose that you’ve got a proof of amortised time complexity for a data structure for which a sequence of operations A,B is legal, where the upper bound doesn’t hold after A has completed but does hold again by the time that B has completed. Suppose I stop early after performing A. I’ve now performed a sequence of operations. The upper bound does not hold at this point. I’m not going to perform any more operations. The upper bound doesn’t hold at the end of this sequence. The invariant “for any sequence of n operations, I can prove that the total amortised cost of the operations is an upper bound on the total actual cost” has been violated.

                                                1. 2

                                                  Suppose I stop early after performing A. I’ve now performed a sequence of operations.

                                                  If you stopped after performing A, you have not performed a sequence of m operations (in your case, 2), which is what one has given an amortised bound for. When you have proved an amortised upper bound O(f(n)) for an operation, then you have proved that any sequence of m operations will have the upper bound O(m · f(n)). Typically m = n.

                                                  It’s probably confusing that I wrote n instead of m in the previous comment, so sorry about that.

                                                  1. 2

                                                    Is the distinction that an amortised cost proof can specify a minimum number of operations, prior to which the bound doesn’t have to hold?

                                                    I’ve never seen them used like that, only for “all sequences of API calls result in actual cost bounded by amortised cost” way (which is the kind of proof Chris Okasaki gives for lots of the structures in his book).

                                                    1. 2

                                                      Well, yes, proof of an amortised bound only requires you to specify that for exactly m operations, the actual total cost has an upper bound of the total amortised cost. And as you mention, it’s very common to prove it up to and including m operations instead.

                                                      1. 1

                                                        Isn’t there a forall m. in there somewhere?

                                                2. 1

                                                  Right. By “cheap”, I meant “actual cost < amortized cost”, not “low actual cost”.

                                                3. 2

                                                  There are more fundamental differences between amortized and average-case analyses:

                                                  Amortized analysis uses an accounting system to track computation whose cost has been paid for, but not performed yet. This accounting system is justified entirely by the needs of the analysis, not external problem domain constraints. Amortized analysis naturally arises in the design of batch processes, e.g. formatting disk partitions or compiling programs ahead of time.

                                                  Average-case analysis assumes a probability distribution over the input space, which is necessarily not uniform, because no uniform distribution exists over a countably infinite set (such as the set of lists, trees, graphs, etc. that an algorithm can process). This probability distribution can only possibly be justified by the external nature of the problem domain, in which some inputs are deemed more likely than others. Average-case analysis naturally arises in the design of heuristic algorithms that perform well except in unlikely pathological cases.

                                                  1. 3

                                                    Average-case analysis assumes a probability distribution over the input space

                                                    Pretty sure this isn’t what Ned’s going for here. TFA actually means something much simpler like “when a library says ‘amortised O(1) time per operation’, you can add up the total number of nanoseconds you spent in the library and divide it by the number of operations you did and come out with a number that will be bounded above by some constant.”