Strong agree. Shot myself in the foot too many times with closed intervals. I’ve been using ‘start’ and ‘endx’ as variable names, where x means exclusive.
In an earlier job this was one of the design decisions I would not budge on, as it allowed me to write a relatively complex piece of code (basically, dealing with overlapping time segments, including forever into the past and future) with IMO close to minimal complexity. It was really satisfying to be able to join intervals and check whether a set of intervals were contained within another set of intervals, all backed by a thorough test suite. I’ve wondered about re-implementing it as a library, but I’m sure there are already excellent alternatives out there, and I don’t really have a use case.
In some way I can’t quite express it seems analogous to how indexes should start at zero.
Here’s at least one way it’s analogous to “indexes should start at zero”: [begin, end) is the same as [start, start+length); the full interval becomes [0, length). If you use 1-indexing, [closed, open) doesn’t work as nice; you end up with [start+1, start+length+1), or (start, start+length] (where the start is exclusive and end is inclusive).
I have been bitten be mistakes using closed intervals, but I also recently found a use case in which I think they’re entirely appropriate. I wrote a text line wrapping algorithm for the Go Text Project, and it needs to be able to map from a range of input runes to a range of shaped output font glyphs. The glyphs are always returned from the text shaper in left-to-right order, even when the language displayed runs right-to-left. They also have the property that every rune is logically part of one of the output glyph clusters, so there’s never an empty interval to worry about. Using inclusive ranges helped a lot with building the mapping between runes and glyphs because I didn’t need to worry about which end of the range was getting chopped off.
For instance, asking for the glyph range corresponding to runes [12:17] could return glyphs [8:12] or glyphs [12:8] depending on text direction (where I can just swap the start and end to get valid indices).
Agreed on general, though sometimes you need a closed-closed interval. You can’t represent the range [0, MAXINT] with open-closed, because MAXINT+1 is not one bigger than MAXINT.
No, this is stupid. Use the right interval for the job. I do have a strong bias for such half-open intervals, but they are obviously not always the right thing. In a recent project, I thought I was being clever by using [) intervals where most of the literature uses [] intervals; turns out that using exclusively either leads to breakage in certain edge-cases.
It’s not obvious to me that the benefits from choosing a specific type of interval for each application outweigh the downsides from lack of consistency. Do you perhaps have any notational conventions that help alleviate these pitfalls?
Consistency is not an excuse for giving incorrect answers.
I don’t follow. You can always rewrite a [] interval as a [) interval, can’t you? I don’t see why enforcing consistency has to lead to incorrect answers.
You can’t naively choose the open bound for a closed bound because you need to know the implied precision. You can often infer it but not always. Especially tricky when you mix in floating point errors.
You can’t convert an open bound to a closed one if your type has arbitrary precision.
Closed intervals are great and intuitive for all kinds of things. E.g. fetching the second to fifth elements in an array: A[2:5] in Julia (1-based indexing with inclusive ranges) vs A[1:5] in python (0-based with right-open interval).
Open intervals are good for stuff too, but I don’t think Dijkstra or this author give compelling reasons to always choose [) intervals. E.g. their argument for finding the length of an interval, if you have interval objects then you can just use length(2:5).
It’s somewhat involved, and I don’t think the details are too important; it does involve nonintegers, as the siblings say. A 2-D vector rendering algorithm will exhibit artifacts if the bounds on its primitives are always half-open.
The GP asked about notational conventions—I realised I do have something to say here: in this project, I use bitmaps to indicate which bounds are inclusive and which exclusive. As a final preprocessing step, purely for performance reasons, I regularise the bounds, pursuant to the particular floating/fixed-point representation in use.
not obvious to me that the benefits from choosing a specific type of interval for each application outweigh the downsides from lack of consistency.
To be fair the benefits in the reverse direction are not clear to me. I think I agree with the “right interval for right job” perspective, but this may be down to personal preference.
Often you will want to operate on the end of an interval and not recalculate it each time. It’s common for range functions to accept either start end or start length for construction, tho
Always? No. I’ve had to write range(0, x+1) so many times when I want to work with an inclusive endpoint, and it’s not just annoying, but also fails to communicate intent. Is the +1 part of the algorithm, or just a choice of closed interval? It’s not clear.
I appreciate that Kotlin gives you both: a..b is closed, a until b is half-open. I use both all the time! Having just one is really irritating, no matter which one it is.
(I also appreciate that IntelliJ displays them with an appropriate mix of < and ≤ signs, removing all ambiguity.)
Yep, half-open intervals and zero-based indexes are the way. When I was learning programming with BASIC, I had so many off-by-one errors due to the way that for-loops there are inclusive. When I later learned C and then C++, embracing the half-open interval was helpful. Then, learning the STL-derived part of the standard library really hammered the lesson home. I honestly can’t remember the last time now that I had an off-by-one error when using the half-open interval.
Also, I work in graphics. Dealing with linearized storage of multi-dimensional arrays at a low level would be so much more of a hassle with closed intervals or one-based indexing.
Strong agree. Shot myself in the foot too many times with closed intervals. I’ve been using ‘start’ and ‘endx’ as variable names, where x means exclusive.
In an earlier job this was one of the design decisions I would not budge on, as it allowed me to write a relatively complex piece of code (basically, dealing with overlapping time segments, including forever into the past and future) with IMO close to minimal complexity. It was really satisfying to be able to join intervals and check whether a set of intervals were contained within another set of intervals, all backed by a thorough test suite. I’ve wondered about re-implementing it as a library, but I’m sure there are already excellent alternatives out there, and I don’t really have a use case.
In some way I can’t quite express it seems analogous to how indexes should start at zero.
Here’s a famous article by Dijkstra about it: https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html
Here’s at least one way it’s analogous to “indexes should start at zero”: [begin, end) is the same as [start, start+length); the full interval becomes [0, length). If you use 1-indexing, [closed, open) doesn’t work as nice; you end up with [start+1, start+length+1), or (start, start+length] (where the start is exclusive and end is inclusive).
I have been bitten be mistakes using closed intervals, but I also recently found a use case in which I think they’re entirely appropriate. I wrote a text line wrapping algorithm for the Go Text Project, and it needs to be able to map from a range of input runes to a range of shaped output font glyphs. The glyphs are always returned from the text shaper in left-to-right order, even when the language displayed runs right-to-left. They also have the property that every rune is logically part of one of the output glyph clusters, so there’s never an empty interval to worry about. Using inclusive ranges helped a lot with building the mapping between runes and glyphs because I didn’t need to worry about which end of the range was getting chopped off.
For instance, asking for the glyph range corresponding to runes [12:17] could return glyphs [8:12] or glyphs [12:8] depending on text direction (where I can just swap the start and end to get valid indices).
Agreed on general, though sometimes you need a closed-closed interval. You can’t represent the range [0, MAXINT] with open-closed, because MAXINT+1 is not one bigger than MAXINT.
No, this is stupid. Use the right interval for the job. I do have a strong bias for such half-open intervals, but they are obviously not always the right thing. In a recent project, I thought I was being clever by using [) intervals where most of the literature uses [] intervals; turns out that using exclusively either leads to breakage in certain edge-cases.
It’s not obvious to me that the benefits from choosing a specific type of interval for each application outweigh the downsides from lack of consistency. Do you perhaps have any notational conventions that help alleviate these pitfalls?
Consistency is not an excuse for giving incorrect answers.
I do not. Else-thread, it was suggested to use a name ending in ‘x’ to refer to an exclusive bound on a range.
I don’t follow. You can always rewrite a [] interval as a [) interval, can’t you? I don’t see why enforcing consistency has to lead to incorrect answers.
Here’s a different way to put this: can you give more details about why [] intervals were so much better in the project you’re discussing? (whereswaldon gives an example in another part of the comments, but I’m curious to hear yours.)
You can’t naively choose the open bound for a closed bound because you need to know the implied precision. You can often infer it but not always. Especially tricky when you mix in floating point errors.
You can’t convert an open bound to a closed one if your type has arbitrary precision.
Closed intervals are great and intuitive for all kinds of things. E.g. fetching the second to fifth elements in an array: A[2:5] in Julia (1-based indexing with inclusive ranges) vs A[1:5] in python (0-based with right-open interval).
Open intervals are good for stuff too, but I don’t think Dijkstra or this author give compelling reasons to always choose [) intervals. E.g. their argument for finding the length of an interval, if you have interval objects then you can just use length(2:5).
It’s somewhat involved, and I don’t think the details are too important; it does involve nonintegers, as the siblings say. A 2-D vector rendering algorithm will exhibit artifacts if the bounds on its primitives are always half-open.
The GP asked about notational conventions—I realised I do have something to say here: in this project, I use bitmaps to indicate which bounds are inclusive and which exclusive. As a final preprocessing step, purely for performance reasons, I regularise the bounds, pursuant to the particular floating/fixed-point representation in use.
To be fair the benefits in the reverse direction are not clear to me. I think I agree with the “right interval for right job” perspective, but this may be down to personal preference.
What is the case to use (start, end) of either style over (start, length) ?
Often you will want to operate on the end of an interval and not recalculate it each time. It’s common for range functions to accept either start end or start length for construction, tho
A system that only supports “inclusive..inclusive” intervals is definitely a problem.
It is useful to support “inclusive..exclusive”, but it is also useful to support “inclusive..inclusive”.
So I’d suggest that the key is to make it quite obvious which is which.
Always? No. I’ve had to write
range(0, x+1)
so many times when I want to work with an inclusive endpoint, and it’s not just annoying, but also fails to communicate intent. Is the+1
part of the algorithm, or just a choice of closed interval? It’s not clear.I appreciate that Kotlin gives you both:
a..b
is closed,a until b
is half-open. I use both all the time! Having just one is really irritating, no matter which one it is.(I also appreciate that IntelliJ displays them with an appropriate mix of
<
and≤
signs, removing all ambiguity.)Thanks for the info about ranges in Kotlin. Ranges in other languages that support both types:
Range
objects.0..3
is closed and0...3
is half-open. Also,..3
is beginless and0..
is endless.0..<3
creates a half-openRange
and0...3
creates aClosedRange
.Ah, cool! Ruby’s syntax scares me a little (so visually similar!) but Swift’s looks nice.
Yep, half-open intervals and zero-based indexes are the way. When I was learning programming with BASIC, I had so many off-by-one errors due to the way that for-loops there are inclusive. When I later learned C and then C++, embracing the half-open interval was helpful. Then, learning the STL-derived part of the standard library really hammered the lesson home. I honestly can’t remember the last time now that I had an off-by-one error when using the half-open interval.
Also, I work in graphics. Dealing with linearized storage of multi-dimensional arrays at a low level would be so much more of a hassle with closed intervals or one-based indexing.
Good article. Reads almost like a Code Complete chapter.