Mike Hoye linked to this same article in his acerbic but nevertheless well-researched article, Citation Needed (2013):
There are … arguments for zero-indexing involving “natural numbers” or “elegance” or some other unresearched hippie voodoo nonsense that are either wrong or too dumb to rise to the level of wrong. The fact of it is this: before pointers, structs, C and Unix existed, at a time when other languages with a lot of resources and (by the standard of the day) user populations behind them were one- or arbitrarily-indexed, somebody decided that the right thing was for arrays to start at zero. So I found that person and asked him.
He then goes on to share his correspondence with Dr. Martin Richards and Tom Van Vleck, the creators of BCPL, which would then go on to influence C. Although ALGOL 60 (for which Dijsktra wrote the first compiler) predated both languages, Van Vleck describes a conscious, practical reason for zero-indexing that Hoye sums up thusly:
The technical reason we started counting arrays at zero is that in the mid-1960’s, you could shave a few cycles off of a program’s compilation time on an IBM 7094. The social reason is that we had to save every cycle we could, because if the job didn’t finish fast it might not finish at all and you never know when you’re getting bumped off the hardware because the President of IBM just called and fuck your thesis, it’s yacht-racing time.
The social reason is that we had to save every cycle we could, because if the job didn’t finish fast it might not finish at all and you never know when you’re getting bumped off the hardware because the President of IBM just called and fuck your thesis, it’s yacht-racing time.
This is the one good thing yacht-racing has done for humanity. Arrays should start at 0.
Meh, I don’t find Dijkstra’s line of argument convincing here because:
a) a subsequence of natural numbers can also be conventionally denoted in the form of its length along with its offset from the smallest Natural, as in APL 1+⍳11, and
b) does one really count ordinally by considering the “number of elements preceding … in the sequence” rather than simply the ordinal of the sole element immediately preceding?
As for (a) I’d not be surprised if EWD knew this and intentionally omitted it because of his disregard for APL and Iverson.
This thread contains the reasoning from the last time that this discussion happened. I would summarize it in this way: The number of elements in a list/array/vector can be zero, one, two, …
Yeah, my pov is that the length of a subsequence, rather than its upper bound, is a preferable property in denotation for programmers. Notwithstanding the heady mathematical arguments of whether zero is Natural, in practice I think the most common uses of index ranges in software are for loop invariants and substring extraction. In both cases it is more ergonomic and less error-prone to express these in terms of length and offset.
One thing that’s harder: list[n % len(list)] will always be in-bounds if n ≥ 0 and arrays start at 0, but not if they start at 1. And writing a “fixed” version of % that goes 1-n is trickier than it looks.
In general, I think if you’re using a language that has any concept of iterators, there’s no need to think about ways to index a list. The authors of the iterator will certainly be doing that thinking, but in general code, this results in usually tricky, badly documented operations (like list[n % len(list)]). If we take out the whole concept of dealing directly with the bounds of the array, or behavior going past the bounds of the array and leave that to an iterator, then I see much fewer reasons for 0 to be the start of counting arrays or for most programmers to ever explicitly index a list.
FWIW I started programming when I was young and learned 0-based indexing really early. I found making that work with my math classes/knowledge a bit tricky. Later in life, as I did more math, I found 1-based indexing natural (most of the books/papers I read define the naturals upon {1, 2, ... }, and I am sympathetic to this definition for the many mathematical arguments out there), and now I find 0-based indexing a bit silly when I use it.
For fun, I tried to see how tricky it would be do achieve this with 1-based indexing. The first attempt is somewhat cumbersome but produces a sequence that can be used to index a list without bounds error:
list←'it was the best of times'
ICYCLE←{1+⍺⍺|¯1+⍺+⍳⍵} ⍝ usage: offset (upbound ICYCLE) length
list[1 ((≢list)ICYCLE) 100]
t was the best of timesit was the best of timesit was the best of timesit was the best of timesit wa
But in APL it is simpler just to reshape the original list which results in the desired cyclic repetition and no no direct indexing is required:
REPEAT←{⍺↓(⍺+⍵)⍴⍺⍺} ⍝ usage: offset (list REPEAT) length
1 (list REPEAT) 100
t was the best of timesit was the best of timesit was the best of timesit was the best of timesit wa
Huh… I wrote this li’l Offset Indexing rant last October. Little did I know then that among the “cargo cult bull from newer programmers who only get half this stuff” we would have the legendary EWD♥!
Yes, when I we say “OK, can I borrow volumes two through twelve from you?” we do mean eleven volumes (convention C) and when we say “I’ll be there between two and twelve” we mean ten hours (convention B—also used for milestone markers along roads).
But that doesn’t mean the first ordinal should be zero. What part of offset don’t you understand?
Quoting myself:
Think of it as how many rooms do I need to go to to get to the kitchen? Zero if you already are in the kitchen. One if you are in the second room. Two if you are in the third room.
The key is to not think of e[0] as “this is the zeroeth element in e”. The number 0 doesn’t belong to the element, it belongs to the array subscript operator. How many steps do you need to tell it to go to find the element you want? If zero steps, then e[0]. If five steps, then e[5]. Don’t think of e[5] as some kind of Leeloo Dallas Multipass fictional “fifth element”. That’s watching way too much Captain Planet. Instead, think of it as asking the array subscript operator to go five steps to get you an element.
Like on your hand, the elements are the fingers and the steps are the spaces between your fingers.
I didn’t mean to troll (I’m not really sure how lobste.rs works yet).
Ordinals are one, two, three, four, matching up with the cardinality of those sets.
To get the second element in e, a.k.a. element number two, we use e[1].
To get to the first element in e, a.ka. element number one, we use e[0]. The array subscript operator denotes offset, not ordinality.
Same in Lisp where the number of d in car, cadr, caddr, cadddr etc indicate which element we wanna get: the first, second, third, fourth etc respectively. Also known as list-ref 0, 1, 2, 3 etc respectively.
EWD says:
So let us let our ordinals start zero.
That is just wrong. That is not what an ordinal is.
Then he says:
An element’s ordinal (subscript) equals the number of elements preceding it in the sequence.
This is also wrong. An element’s ordinal (which is distinct from its subscript) equals the amount of the elements in its set including it.
An element’s subscript (which is distinct from its ordinal) equals the amount of steps you had to take to get there. So yes, it’s all well and good that array subscripting starts at zero but that is not “numbering” or, specifically, that is not ordinality.
And then he says
we had better regard—after all those centuries!—zero as a most natural number.
Well, ISO 80000-2 agrees with him there.
The two sets “positive integers” a.ka. “counting numbers” vs “non-negative integers” differ by the exclusion or inclusion of zero, respectively. Refering to the second of those two sets as “natural numbers” is fine but not relevant to his main point, which is that “numbering should start at zero”.
The case for including zero among natural numbers is to use it for the empty set. I have zero tomatoes, then I get my first tomato now I have one tomato.
Is he sarcastic? Is that why my comment got marked as “troll”, because I fell for some April fool’s thing?
The guy is literally saying that if I have zero apples, and I get my “zeroeth apple”, I now have one apple. That’s not just not how English works, nor how the C family of languages works.
Instead, zero corresponds to the empty set, not the set of cardinality one.
This is reminiscent of counting storeys / floors in a building, which differs in UK vs US. You can count the literal floors (the elements: one, two, three, etc) or you can count the stairs you have to climb to get there (zero, one, two, etc).
One of many reasons why C has to work this way is because you want the pointer addressing offset (when you do string_name+1 for example) to match up with the array offset subscript.
Mike Hoye linked to this same article in his acerbic but nevertheless well-researched article, Citation Needed (2013):
He then goes on to share his correspondence with Dr. Martin Richards and Tom Van Vleck, the creators of BCPL, which would then go on to influence C. Although ALGOL 60 (for which Dijsktra wrote the first compiler) predated both languages, Van Vleck describes a conscious, practical reason for zero-indexing that Hoye sums up thusly:
This is the one good thing yacht-racing has done for humanity. Arrays should start at 0.
New word for me! Thanks!
In case it’s new for others too:
HTML version
Meh, I don’t find Dijkstra’s line of argument convincing here because:
a) a subsequence of natural numbers can also be conventionally denoted in the form of its length along with its offset from the smallest Natural, as in APL
1+⍳11
, andb) does one really count ordinally by considering the “number of elements preceding … in the sequence” rather than simply the ordinal of the sole element immediately preceding?
As for (a) I’d not be surprised if EWD knew this and intentionally omitted it because of his disregard for APL and Iverson.
This thread contains the reasoning from the last time that this discussion happened. I would summarize it in this way: The number of elements in a list/array/vector can be zero, one, two, …
Yeah, my pov is that the length of a subsequence, rather than its upper bound, is a preferable property in denotation for programmers. Notwithstanding the heady mathematical arguments of whether zero is Natural, in practice I think the most common uses of index ranges in software are for loop invariants and substring extraction. In both cases it is more ergonomic and less error-prone to express these in terms of length and offset.
One thing that’s harder:
list[n % len(list)]
will always be in-bounds ifn ≥ 0
and arrays start at 0, but not if they start at 1. And writing a “fixed” version of%
that goes 1-n is trickier than it looks.In general, I think if you’re using a language that has any concept of iterators, there’s no need to think about ways to index a list. The authors of the iterator will certainly be doing that thinking, but in general code, this results in usually tricky, badly documented operations (like
list[n % len(list)]
). If we take out the whole concept of dealing directly with the bounds of the array, or behavior going past the bounds of the array and leave that to an iterator, then I see much fewer reasons for 0 to be the start of counting arrays or for most programmers to ever explicitly index a list.FWIW I started programming when I was young and learned 0-based indexing really early. I found making that work with my math classes/knowledge a bit tricky. Later in life, as I did more math, I found 1-based indexing natural (most of the books/papers I read define the naturals upon
{1, 2, ... }
, and I am sympathetic to this definition for the many mathematical arguments out there), and now I find 0-based indexing a bit silly when I use it.True that’s a hairy case, but if possible I would choose to use something like Python’s itertools.cycle rather than fumbling with modulo arithmetic.
For fun, I tried to see how tricky it would be do achieve this with 1-based indexing. The first attempt is somewhat cumbersome but produces a sequence that can be used to index a list without bounds error:
But in APL it is simpler just to reshape the original list which results in the desired cyclic repetition and no no direct indexing is required:
Huh… I wrote this li’l Offset Indexing rant last October. Little did I know then that among the “cargo cult bull from newer programmers who only get half this stuff” we would have the legendary EWD♥!
Yes, when I we say “OK, can I borrow volumes two through twelve from you?” we do mean eleven volumes (convention C) and when we say “I’ll be there between two and twelve” we mean ten hours (convention B—also used for milestone markers along roads).
But that doesn’t mean the first ordinal should be zero. What part of offset don’t you understand?
Quoting myself:
Hi, wow, this got downvoted and marked “troll”.
I didn’t mean to troll (I’m not really sure how lobste.rs works yet).
Ordinals are one, two, three, four, matching up with the cardinality of those sets.
To get the second element in e, a.k.a. element number two, we use e[1].
To get to the first element in e, a.ka. element number one, we use e[0]. The array subscript operator denotes offset, not ordinality.
Same in Lisp where the number of d in car, cadr, caddr, cadddr etc indicate which element we wanna get: the first, second, third, fourth etc respectively. Also known as list-ref 0, 1, 2, 3 etc respectively.
EWD says:
That is just wrong. That is not what an ordinal is.
Then he says:
This is also wrong. An element’s ordinal (which is distinct from its subscript) equals the amount of the elements in its set including it.
An element’s subscript (which is distinct from its ordinal) equals the amount of steps you had to take to get there. So yes, it’s all well and good that array subscripting starts at zero but that is not “numbering” or, specifically, that is not ordinality.
And then he says
Well, ISO 80000-2 agrees with him there.
The two sets “positive integers” a.ka. “counting numbers” vs “non-negative integers” differ by the exclusion or inclusion of zero, respectively. Refering to the second of those two sets as “natural numbers” is fine but not relevant to his main point, which is that “numbering should start at zero”.
The case for including zero among natural numbers is to use it for the empty set. I have zero tomatoes, then I get my first tomato now I have one tomato.
Is he sarcastic? Is that why my comment got marked as “troll”, because I fell for some April fool’s thing?
The guy is literally saying that if I have zero apples, and I get my “zeroeth apple”, I now have one apple. That’s not just not how English works, nor how the C family of languages works.
Instead, zero corresponds to the empty set, not the set of cardinality one.
This is reminiscent of counting storeys / floors in a building, which differs in UK vs US. You can count the literal floors (the elements: one, two, three, etc) or you can count the stairs you have to climb to get there (zero, one, two, etc).
One of many reasons why C has to work this way is because you want the pointer addressing offset (when you do
string_name+1
for example) to match up with the array offset subscript.