SQLite was originally designed with a policy of avoiding arbitrary limits. Of course, every program that runs on a machine with finite memory and disk space has limits of some kind. But in SQLite, those limits were not well defined. The policy was that if it would fit in memory and you could count it with a 32-bit integer, then it should work.
Unfortunately, the no-limits policy has been shown to create problems. Because the upper bounds were not well defined, they were not tested, and bugs (including possible security exploits) were often found when pushing SQLite to extremes.
Have you seen any other good illustrations? I’ve suffered through a lot of code like this, which is a good illustration of why “Zero, One, Infinity” is often better than the alternative:
And of course we all remember how pre-GNU Unix utilities almost all had some (usually undocumented) maximum line length, and pre-Perl scripting languages almost all had some maximum string length (often 255 bytes), and so anything that might contain unusually long lines was prone to break shit, often in surprising and exploitable ways. Or maybe we don’t all remember this. But it sure happened.
A lot of this depends on how you handle errors, including resource-exhaustion errors. If you just log a stack trace and restart the system without the error-inducing input, that’s probably hard to exploit, although maybe an attacker who can detect the restart can infer something about the length of some data that’s supposed to be secret from them, which can eventually reveal the secret data to them if it’s, say, getting gzipped along with data they themselves supply. But if you handle errors by truncating strings, or by truncating result sets, or by overwriting your stack frame (as in my above example C code), it’s often a major vulnerability!
Maybe one example of saying no to infinity would be in the C standard; §5.2.4.1 of ISO/IEC 9899:TC2 (at least as of the 2005 draft) says:
The implementation shall be able to translate and execute at least one program that contains at least one instance of every one of the following limits:
127 nesting levels of blocks
63 nesting levels of conditional inclusion
12 pointer, array, and function declarators (in any combinations) modifying an arithmetic, structure, union, or incomplete type in a declaration
…
Plauger wrote about the reasoning behind this kind of thing at some point. He talked about his experience with some previous language standard that required that some such thing, maybe identifier lengths, would have no limit. (I read the book in the 1990s, so I can’t remember exactly.) But since that’s untestable, people who wrote standards testing suites had to choose some maximum length to test, and then people wrote their compilers to pass the tests, and then rejected future standards testing suites that tested larger maximum lengths. So, effectively, the standards committee had merely delegated the determination of the minimum supported identifier length to the test-suite authors.
So the C standard, where Plauger was on the committee (maybe still is?) was careful to specify some minimum maximums. So if your C compiler only allocates space for 16 levels of nested blocks, it’s definitely not compliant (at least with the draft I quoted above), even though that’s far more levels than it’s reasonable to write by hand.
That seems like a perfectly reasonable reason to use zero-one-infinity. It’s genuinely is an arbitrary limit. Rather, it’s one based on certain factors which are nigh guaranteed to change over the lifetime of the deployed artifact (e.g., size of memory).
Some such constraints are valuably arbitrary. 32-bit architectures were an “arbitrary” choice but also one so significant that (a) not setting that limit would destroy performance and (b) then change from 25 to 26 was going to be a big one in any circumstance. It doesn’t make 5 less arbitrary than 6, though.
A place where zero-one-infinity fails badly might be backpressure in a distributed system. Zero and one are clearly out when talking about rates of delivery—there’s not even any good sense in which they could be the answer. However, if you jump next to infinity, as many people do, you design systems which are highly susceptible to failure under spikes in demand. It turns out that you have a fuzzily-bounded resource limit and by treating the upper bound arbitrarily you’ll suffer very bad side effects.
The right answer is to, of course, pick a “moderate load” set point and build a control system around that set point which you tune so as to set the distribution on state space to have the right expected load and the right variance.
It’s a completely different kind of resource limiting when you avoid zero-one-infinity. You cannot even use the terminology of hard limits any longer in the same way that taking “infinity” seriously means that you have to start thinking of streaming/lazy algorithms instead of batch ones.
a good illustration of why “Zero, One, Infinity” is often better than the alternative:
The version where you malloc() and realloc() the buf as you go doesn’t fix the whole problem. You still need to handle the case that you run out of memory gracefully, and you still need to apply a limit to input length. It’s better in that it scales to machines with more memory more gracefully (modulo vm_overcommit issues and the like), but it’s worse in that it hides the problem from the programmer.
There’s no single clear solution, of course. SQLite’s “buttons and knobs” approach is better in some circumstances, but adds a bunch of complexity. A fixed-size buffer is better in other circumstances, because it’s simple. The realloc approach is better in some circumstances, because it’s more flexible and future proof. Any solution without bounds checking is broken.
A lot of this depends on how you handle errors, including resource-exhaustion errors.
vm_overcommit behavior in Linux (and similar behavior in other modern OSs) makes resource exhaustion of memory hard to do right. Accepting input until malloc or realloc return NULL is going to lead to some very interesting impacts on the overall system.
Plauger wrote about the reasoning behind this kind of thing at some point.
That’s a great example. Any idea where I could dig up Plauger’s writing?
The version where you malloc() and realloc() the buf as you go doesn’t fix the whole problem. You still need to handle the case that you run out of memory gracefully, and you still need to apply a limit to input length.
Yeah, I agree. But it seems like you could do a pretty good job of that with a ulimit, as long as the software handles running out of memory in a reasonable way, and using stralloc rather than C strings usually results in simpler and more robust software.
Any idea where I could dig up Plauger’s writing?
He wrote some popular columns, but probably his books, which republished most of the columns, are the easiest way to read what he wrote. I read him for the first time when the university library in my small town was throwing out Elements of Programming Style. I probably read this reasoning in one of the Programming on Purpose series. Sadly, he doesn’t seem to be blogging.
https://www.sqlite.org/limits.html
Thanks you, that’s a great illustration.
Have you seen any other good illustrations? I’ve suffered through a lot of code like this, which is a good illustration of why “Zero, One, Infinity” is often better than the alternative:
And of course we all remember how pre-GNU Unix utilities almost all had some (usually undocumented) maximum line length, and pre-Perl scripting languages almost all had some maximum string length (often 255 bytes), and so anything that might contain unusually long lines was prone to break shit, often in surprising and exploitable ways. Or maybe we don’t all remember this. But it sure happened.
A lot of this depends on how you handle errors, including resource-exhaustion errors. If you just log a stack trace and restart the system without the error-inducing input, that’s probably hard to exploit, although maybe an attacker who can detect the restart can infer something about the length of some data that’s supposed to be secret from them, which can eventually reveal the secret data to them if it’s, say, getting gzipped along with data they themselves supply. But if you handle errors by truncating strings, or by truncating result sets, or by overwriting your stack frame (as in my above example C code), it’s often a major vulnerability!
Maybe one example of saying no to infinity would be in the C standard; §5.2.4.1 of ISO/IEC 9899:TC2 (at least as of the 2005 draft) says:
Plauger wrote about the reasoning behind this kind of thing at some point. He talked about his experience with some previous language standard that required that some such thing, maybe identifier lengths, would have no limit. (I read the book in the 1990s, so I can’t remember exactly.) But since that’s untestable, people who wrote standards testing suites had to choose some maximum length to test, and then people wrote their compilers to pass the tests, and then rejected future standards testing suites that tested larger maximum lengths. So, effectively, the standards committee had merely delegated the determination of the minimum supported identifier length to the test-suite authors.
So the C standard, where Plauger was on the committee (maybe still is?) was careful to specify some minimum maximums. So if your C compiler only allocates space for 16 levels of nested blocks, it’s definitely not compliant (at least with the draft I quoted above), even though that’s far more levels than it’s reasonable to write by hand.
That seems like a perfectly reasonable reason to use zero-one-infinity. It’s genuinely is an arbitrary limit. Rather, it’s one based on certain factors which are nigh guaranteed to change over the lifetime of the deployed artifact (e.g., size of memory).
Some such constraints are valuably arbitrary. 32-bit architectures were an “arbitrary” choice but also one so significant that (a) not setting that limit would destroy performance and (b) then change from 25 to 26 was going to be a big one in any circumstance. It doesn’t make 5 less arbitrary than 6, though.
A place where zero-one-infinity fails badly might be backpressure in a distributed system. Zero and one are clearly out when talking about rates of delivery—there’s not even any good sense in which they could be the answer. However, if you jump next to infinity, as many people do, you design systems which are highly susceptible to failure under spikes in demand. It turns out that you have a fuzzily-bounded resource limit and by treating the upper bound arbitrarily you’ll suffer very bad side effects.
The right answer is to, of course, pick a “moderate load” set point and build a control system around that set point which you tune so as to set the distribution on state space to have the right expected load and the right variance.
It’s a completely different kind of resource limiting when you avoid zero-one-infinity. You cannot even use the terminology of hard limits any longer in the same way that taking “infinity” seriously means that you have to start thinking of streaming/lazy algorithms instead of batch ones.
The version where you malloc() and realloc() the buf as you go doesn’t fix the whole problem. You still need to handle the case that you run out of memory gracefully, and you still need to apply a limit to input length. It’s better in that it scales to machines with more memory more gracefully (modulo vm_overcommit issues and the like), but it’s worse in that it hides the problem from the programmer.
There’s no single clear solution, of course. SQLite’s “buttons and knobs” approach is better in some circumstances, but adds a bunch of complexity. A fixed-size buffer is better in other circumstances, because it’s simple. The realloc approach is better in some circumstances, because it’s more flexible and future proof. Any solution without bounds checking is broken.
vm_overcommit behavior in Linux (and similar behavior in other modern OSs) makes resource exhaustion of memory hard to do right. Accepting input until malloc or realloc return NULL is going to lead to some very interesting impacts on the overall system.
That’s a great example. Any idea where I could dig up Plauger’s writing?
Yeah, I agree. But it seems like you could do a pretty good job of that with a ulimit, as long as the software handles running out of memory in a reasonable way, and using
stralloc
rather than C strings usually results in simpler and more robust software.He wrote some popular columns, but probably his books, which republished most of the columns, are the easiest way to read what he wrote. I read him for the first time when the university library in my small town was throwing out Elements of Programming Style. I probably read this reasoning in one of the Programming on Purpose series. Sadly, he doesn’t seem to be blogging.