1. 20

  2. 11

    Mergesort is one of those algorithms that is not used too much because there is another one that does the same job better. Mergesort is a sorting algorithm outshined by quicksort.

    /me winces

    If you are sorting anything whose backing store does not lend itself to random access, say a linked list, magnetic tape, or a similar structure, you really don’t want quicksort. Also, mergesort’s worst-case running time is O(n log n), not O(n^2) as it is with quicksort.

    Then again, for small n, insertion sort can often beat quicksort. And radix sort, when you can get away with it, beats everybody else silly.
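    To make the linked-list point concrete, here’s a minimal C sketch (my own illustration, not from the article) of mergesort on a singly linked list. Note that it only ever walks pointers forward, so no random access is needed:

    ```c
    #include <stdlib.h>

    typedef struct Node { int val; struct Node *next; } Node;

    /* Split the list in two using slow/fast pointers (purely sequential). */
    static Node *split(Node *head) {
        Node *slow = head, *fast = head->next;
        while (fast && fast->next) { slow = slow->next; fast = fast->next->next; }
        Node *second = slow->next;
        slow->next = NULL;
        return second;
    }

    /* Merge two sorted lists; taking from `a` on ties keeps the sort stable. */
    static Node *merge(Node *a, Node *b) {
        Node dummy = {0, NULL}, *tail = &dummy;
        while (a && b) {
            if (a->val <= b->val) { tail->next = a; a = a->next; }
            else                  { tail->next = b; b = b->next; }
            tail = tail->next;
        }
        tail->next = a ? a : b;
        return dummy.next;
    }

    Node *mergesort_list(Node *head) {
        if (!head || !head->next) return head;
        Node *second = split(head);
        return merge(mergesort_list(head), mergesort_list(second));
    }
    ```

    No swaps, no indexing, no extra array: just pointer rewiring, which is exactly why it wins on this kind of structure.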

    1. 5

      Mergesort is also stable, while quicksort isn’t (at least as commonly implemented; you can make it stable, but the cost of doing so makes it no better than mergesort).

      Also, two examples of non-random-access data sources are ordinary text files (unless you pre-index all of the line positions), and stdin (which may not be seekable), which is why Unix sort will generally use mergesort, and will generally handle input of arbitrary size, even larger than RAM.

      If the data gets too big, the locally-sorted chunk is simply written to disk and a new one started; then at the end of input all of those chunks can be merged (theoretically pairwise recursively, although GNU’s apparently pulls out all the stops and does up to a 16-way merge at once). As you say, a technique that works just as well even if the files are on tape, because all of the access is perfectly linear.
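      The merge pass at the heart of that external sort is tiny. A C sketch over two in-memory runs (my illustration; a real external sort would stream the runs from temp files, but the access pattern is identical, strictly front to back):

      ```c
      #include <stddef.h>

      /* Merge two sorted runs into out, reading each run strictly in order.
       * The same loop works when the runs live on disk or tape, since every
       * read is sequential. */
      void merge_runs(const int *a, size_t na, const int *b, size_t nb, int *out) {
          size_t i = 0, j = 0, k = 0;
          while (i < na && j < nb)
              out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
          while (i < na) out[k++] = a[i++];
          while (j < nb) out[k++] = b[j++];
      }
      ```

      A k-way merge like GNU’s is the same idea with a small heap over the k run heads.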

      1. 2

        props to the author for identifying mergesort as one of the “three algorithms everyone should know”, though

        1. 1

          i think i use linked lists about as often as magnetic tape

          edit: it really is beautiful though

          1. 1

            Going by your past comments you’ve used Haskell a fair bit?

            1. 2

              yeah yeah, but even then lists mostly get used out of laziness

              I remember someone asked why their code was slow, and people suggested Data.Sequence because it was O(1) for the common ops they were using, but I tried rewriting it to use vectors and it was substantially faster. it’s hard to beat good locality

        2. 5

          TL;DR: Binary search, mergesort, and maximum subarray. The third one isn’t even an algorithm though, it’s a computer science problem. An algorithm for that would be, e.g., Kadane’s algorithm.
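          For reference, Kadane’s algorithm fits in a few lines. A C sketch (my own, assuming a non-empty int array):

          ```c
          #include <stddef.h>

          /* Kadane's algorithm: O(n) maximum subarray sum.
           * best_here is the best sum of a subarray ending at index i. */
          int max_subarray(const int *a, size_t n) {
              int best = a[0], best_here = a[0];
              for (size_t i = 1; i < n; i++) {
                  best_here = (best_here > 0) ? best_here + a[i] : a[i];
                  if (best_here > best) best = best_here;
              }
              return best;
          }
          ```

          Starting from a[0] rather than 0 means it also handles all-negative arrays correctly.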

          1. 1

            His binary search has a bug:

            mid := (begin+end)/2

            The “begin+end” can overflow and cause issues. Better to have:

            mid := begin + ((end - begin) / 2)
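            For completeness, a compilable C sketch with the safe midpoint (my own, assuming a sorted int array and a half-open interval; returns the index of key, or n if absent):

            ```c
            #include <stddef.h>

            /* Binary search over a sorted array. The midpoint is computed as
             * lo + (hi - lo) / 2 so the sum lo + hi can never overflow. */
            size_t bsearch_safe(const int *a, size_t n, int key) {
                size_t lo = 0, hi = n;          /* search interval [lo, hi) */
                while (lo < hi) {
                    size_t mid = lo + (hi - lo) / 2;
                    if (a[mid] < key)      lo = mid + 1;
                    else if (a[mid] > key) hi = mid;
                    else return mid;
                }
                return n;
            }
            ```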
            1. 2

              On one hand, yes. On the other hand, at worst this restricts your search to half the address space of your machine, and that case is only hit when the elements being sorted are a single byte in size.