1. 28
  1.  

  2. 23

    It always frustrates me when I see a fantastic article up at the top of Lobsters with a lot of discussion that completely misses the point. The point of this article isn’t the question in the title; it’s the fact that nobody bothered to check what the real reason for this decision actually was. This paragraph is really the meat of the article:

    Whatever programmers think about themselves and these towering logic-engines we’ve erected, we’re a lot more superstitious than we realize. We tell and retell this collection of unsourced, inaccurate stories about the nature of the world without ever doing the research ourselves, and there’s no other word for that but “mythology”. Worse, by obscuring the technical and social conditions that led humans to make these technical and social decisions, by talking about the nature of computing as we find it today as though it’s an inevitable consequence of an immutable set of physical laws, we’re effectively denying any responsibility for how we got here. And worse than that, by refusing to dig into our history and understand the social and technical motivations for those choices, by steadfastly refusing to investigate the difference between a motive and a justification, we’re disavowing any agency we might have over the shape of the future. We just keep mouthing platitudes and pretending the way things are is nobody’s fault, and the more history you learn and the more you look at the sad state of modern computing the more pathetic and irresponsible that sounds.

    We’re forgetting our history. Alan Kay said it best when he called computing a “pop culture” more than an actual scientific field.

    I admit I’m as guilty of this as anyone, but we should at least try to be aware of our sins.

    1. 18
      1. 10

        I think that “indexes” should start at 1 and “offsets” should start at 0. That is, I think a simple name change could fix the whole argument.

        1. 2

          [EDIT: This comment is largely wrong. Collections are 1-based, Streams are 0-based. It’d been so long that I’d forgotten. The point in general stands, but the code as presented below will not work as expected; even when doing a range copy on a SequenceableCollection, it works 1-based with inclusive upper bounds. Using a PositionableStream on the underlying collection will work as presented here. Sorry about that.]

          That’s exactly what Smalltalk does.

          c := #(a b c d e f).  "Declare a six-element array."
          c at: 1. "a"
          c at: 6. "f"
          c at: 7. "throws an error"
          c copyFrom: 0 to: 1. "#(a) EDIT: Nope, wrong!"
          c copyFrom: 0 to: 5. "#(a b c d e f) EDIT: Nope, wrong!"
          

          This made instant and intuitive sense to me when I learned it as a kid in a way that 0-based indexing definitely did not, so I’m fully on board with it. But I’d also note that in Smalltalk, as in many other modern languages, I barely ever even type the array index in the first place; doing things like first, last, allButFirst: 2, etc. is much more common. So I can’t honestly say it made a huge difference either way. (When I picked up Smalltalk’s successor, Self, I programmed in it for several days before I realized that its indices did in fact start with 0. And I only realized that because I thought I’d found a bug in the debugger.)

          1. 2

            If you see everything as an offset from the beginning of an array, you can stop using the word index, and the confusion should stop.

            1. 0

              If you see everything as an offset from the beginning of an array, you’re thinking in terms of pointers and RAM blocks instead of thinking in terms of arrays. That abstraction slip will cause you problems.

          2. 6

            The most useful thing I’ve ever read about indexing is that indices can be interpreted as pointing between elements, not at elements. In other words, indices are points, and items are intervals between those points. In this view, the difference between 0-indexed and 1-indexed systems isn’t really a difference in the interpretation of the indices so much as a difference in the behavior of the indexing operator – does it choose the cell to the left or right of the index?

            Phrased this way, it seems pretty natural to me to choose the cell on the right (that is, index from 0), since it’s very common to refer to regions and intervals of all kinds by an origin and a positive “extent” beyond that origin – think rectangles in graphics libraries (x, y, width, height) or intervals of time in common speech (e.g. “the 1990s”). It’s very rare to refer to an interval by a point on its right and then assume it has negative extent, and that’s exactly what 1-indexing does.

            Thinking of indices as points between elements, with the origin at 0, has a lot of practical benefits. Most importantly, it becomes immediately obvious why, when representing ranges of elements, lower bounds should be inclusive and upper bounds exclusive: the items between the lower and upper bound are the items contained in the range, so the range includes the element to the right of the lower bound but not the element to the right of the upper bound. This also makes it crystal clear why, in such a representation, the length of a range is the difference between its upper and lower bounds. Treating indices as points also makes modular arithmetic, floor, and clamping operations a lot easier in my experience, because you can go back to thinking of your indices as ordinary numbers on a number line and stop thinking of them as buckets. Finally, this interpretation works very nicely in image processing, where it complements the view that pixel values are samples of a function at points, which is often a very useful way of looking at things.

            1. 3

              How many off-by-one disasters could we have avoided if the “foreach” construct that existed in BCPL had made it into C?

              I don’t have concrete evidence, but my intuition is that it’s not usually the loop that’s wrong, it’s the allocation. So not many. But I don’t know anything about how BCPL implemented foreach either.

              1. 9

                How many off-by-one errors could be avoided if we just looped over lists rather than use indices :)

                1. 7

                  Yeah. People hate on Lua for being 1-based, but I almost never think about it because I don’t use manual indices.

              2. 3

                Clearly, it should be a user definable setting. At runtime. If Perl can do it, why can’t anyone else?

                1. 3

                  Visual Basic can.

                  1. 1

                    Anything .NET-based supports it (although many things will break if you do this).

                  2. 3

                    Perl has “highly discouraged” this for a very very long time. It was finally deprecated in 5.12 and the old “feature” removed in 5.16.

                    1. 1
                    2. 2

                      rand(200) at runtime, and has a maximum value of floor(rand(100)*PI).

                      Then, finally, we’ll all have something tangible and useful to complain about together.

                      1. 1

                        I’m not even sure we should attribute these, very likely independently established, conventions to a single cause. Remember, this was an era of rapid development in a world with borders, expensive travel, and no Internet to speak of. There were conferences and CACM, sure, but they were hardly authorities for the few hubs of expertise worldwide that worked on programming language development. What we now see as the “dominating convention” is a baseline that evolved over the decades according to some vague fitness metric.

                        The reasons behind the same phenotype of 0-indexing could differ, though. In the case of C, I strongly suspect base 0 comes from memory addressing starting at 0 on nearly every computer system out there. The PDP-11 instruction set[1], for example, uses zero-based addresses, zero-based offsets in its index addressing modes, and so on. Sure, you can make the compiler translate 1-based offsets, but why on Earth would you want to waste cycles on that? Especially since a C programmer of the day was always an assembly programmer as well, and the translation would confuse debugging substantially.

                        [1] That instruction set seems to be responsible for a number of C quirks. For example, it has direct and deferred autoincrement and autodecrement modes, allowing efficient execution of --var and var++. My suspicion is that ++var and var-- were added to C as complements for completeness’ sake.

                        1. 1

                          You want indices in C starting at 1?

                          #include <stdio.h>
                          
                          int main(void) {
                              char a[] = {1, 2, 3};
                              char *b = a-1;          /* shift the base back one element */
                              printf("%d\n", b[1]);   /* b[1] is *(b + 1), i.e. a[0] */
                              return 0;
                          }
                          

                          I love that the reason they give here for zero-based indexing is that a CEO wanted to race yachts.

                          1. 1

                            That’s undefined behavior.