Hm, it seems to me that any assertion discussion which doesn’t touch on the ideas from https://www.sqlite.org/assert.html is misleadingly incomplete.
In large, non-batch programs which change over time, some assertions are bound to fire in production after subtle refactors. With today’s typical assertion machinery, this leads to people either not using assertions at all out of caution, or disabling them for prod builds (which in turn means that asserts are what the programmer wishes to be true, not something which is actually true empirically).
The SQLite idea that you want assertions that abort in testing, but recover via short-circuiting (and logging, if appropriate) in production deserves to be better known, and better enshrined in standard libraries.
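A minimal sketch of the pattern in Rust (this is not SQLite’s actual macro, just a hypothetical `never` helper of the same shape): the check aborts in debug/test builds via `debug_assert!`, but in release builds merely reports the violation and lets the caller short-circuit:

```rust
/// Hypothetical sketch of an SQLite-style "never" check: aborts in
/// testing (debug builds), but in production evaluates to the
/// condition, so the caller can bail out instead of crashing.
fn never(cond: bool) -> bool {
    if cond {
        // Fatal under `cfg(debug_assertions)`; a no-op in release.
        debug_assert!(false, "'never' condition actually happened");
        // In release builds we fall through after reporting, so the
        // caller can recover gracefully.
        eprintln!("invariant violated; recovering");
    }
    cond
}

// "The index is always in bounds here" -- and if that assumption is
// ever wrong in production, bail out along a trivially simple path.
fn lookup(table: &[i32], index: usize) -> Option<i32> {
    if never(index >= table.len()) {
        return None;
    }
    Some(table[index])
}

fn main() {
    let table = [10, 20, 30];
    println!("{:?}", lookup(&table, 1)); // Some(20)
}
```

The key property is that the production behavior (return `None`) is a real code path, not dead code stripped out with the assert.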
The idea of using testcase(cond) for test coverage is great.
But I’m not sure about the utility of short-circuiting on never(cond). If your assumption that something never happens is violated, that must be a serious issue. It smells like a recipe for having completely untested “error handling” paths that will cause cascading damage if they ever run.
If your assumption that something never happens is violated, that must be a serious issue.
Not necessarily. Big programs usually have a boatload of features, the majority of which are not essential/critical for operation, and there’s usually some kind of a good recovery boundary anyway. Every large reliable system is Erlang in the limit :-)
It is true that the bail-out paths would be untested by definition, so you always want them to be essentially return None; or some such.
I think the under-articulated point is that “cascading damage” is often not how software works: you don’t have a giant, intricate clockwork where every detail depends exactly on every other detail; we are simply not good enough to make those kinds of architectures work. The reality is usually more like the Linux kernel, which has a relatively narrow core to it, and a massive amount of drivers of varying code quality.
Is this an accurate summary of your position?
Asserts are useful in situations where the assert failing means the program as a whole is deeply, irrecoverably messed up, and there’s nothing you can do but die gracefully.
Failures in non-core components should use exceptions/optional types/whatever your language’s mechanism for recoverable failure is, and the core system should be able to reset those components to a known good state even from an ‘impossible’ bad one.
I think this makes sense, but it doesn’t really address library code. A data structure library can’t possibly know whether or not it’s core to the program it winds up running in; should it use asserts or recoverable failure?
Yeah, that’s a good summary! A couple more details:
Asserts are also OK if the reasoning for them is purely local (“the developer has a proof”, per the SQLite docs).
Sometimes (as in Erlang) an assert is a perfectly fine recoverable failure.
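In Rust, for instance, such an Erlang-style recovery boundary can be coarsely sketched with `std::panic::catch_unwind` (a deliberate simplification of what a real supervisor does; the feature and names here are made up):

```rust
use std::panic;

// A non-essential "feature" whose internal assert may fire.
fn render_widget(input: i32) -> String {
    assert!(input >= 0, "widget input must be non-negative");
    format!("widget({input})")
}

// Recovery boundary: if the feature's assertion fires, degrade to a
// placeholder instead of taking the whole process down.
fn render_widget_or_placeholder(input: i32) -> String {
    panic::catch_unwind(|| render_widget(input))
        .unwrap_or_else(|_| "<placeholder>".to_string())
}

fn main() {
    println!("{}", render_widget_or_placeholder(1));  // widget(1)
    println!("{}", render_widget_or_placeholder(-1)); // <placeholder>
}
```

The assert stays fatal for the component, but the system as a whole treats it as a recoverable failure, which is the Erlang point above.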
For libraries, the situation is trickier. If you do data structures, you have a chance to test them exhaustively (property-based testing/fuzzing), and data structures are usually small, so this case probably falls into the “purely local assert” category. But this is also the easy case.
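As a sketch of what “exhaustively test” can mean for a small data structure (a toy sorted-insert here; no property-testing framework assumed, just brute-force enumeration against a trivially correct oracle):

```rust
// Operation under test: insert into a Vec while keeping it sorted.
fn sorted_insert(v: &mut Vec<u8>, x: u8) {
    let pos = v.partition_point(|&y| y < x);
    v.insert(pos, x);
}

// Enumerate every insert sequence up to a small bound; after each
// sequence the vector must equal the sort-based oracle.
fn check_exhaustively(max_len: usize, alphabet: &[u8]) {
    let mut stack: Vec<Vec<u8>> = vec![vec![]];
    while let Some(seq) = stack.pop() {
        let mut v = Vec::new();
        for &x in &seq {
            sorted_insert(&mut v, x);
        }
        let mut oracle = seq.clone();
        oracle.sort();
        assert_eq!(v, oracle, "invariant broken for {seq:?}");
        if seq.len() < max_len {
            for &x in alphabet {
                let mut next = seq.clone();
                next.push(x);
                stack.push(next);
            }
        }
    }
}

fn main() {
    check_exhaustively(4, &[0, 1, 2, 3]);
    println!("all insert sequences up to length 4 pass");
}
```

For a structure this small, a few hundred cases cover every behavior, which is why internal asserts in such code can be treated as “developer has a proof”.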
A case which helped me conceptualize the library issue recently is an SVG rendering library. It could have a simple interface like this:
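(The original snippet is missing here; the following is a plausible reconstruction of the kind of interface being described, with all names illustrative and the body a stub.)

```rust
/// Hypothetical SVG rendering interface; names are illustrative.
#[derive(Debug, PartialEq)]
pub struct Image {
    pub width: u32,
    pub height: u32,
    pub rgba: Vec<u8>,
}

#[derive(Debug, PartialEq)]
pub struct RenderError(pub String);

/// The design point: malformed input surfaces as `Err`, and the
/// implementation is expected to have no panicking paths at all.
pub fn render(svg_text: &str) -> Result<Image, RenderError> {
    // Stub body: reject anything not obviously an <svg> document.
    if !svg_text.trim_start().starts_with("<svg") {
        return Err(RenderError("not an SVG document".to_string()));
    }
    Ok(Image { width: 1, height: 1, rgba: vec![0, 0, 0, 0] })
}

fn main() {
    println!("{:?}", render("<svg></svg>").is_ok());
}
```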
But internally, rendering SVG is, I would imagine, quite hard and fiddly, and likely to have some latent bugs somewhere. At the same time, as a user of this library, I obviously don’t want some bogus SVG to bring my whole process down. So I think we do want to ensure (ideally, statically) that this library doesn’t have panicking paths anywhere.
LLVM has suffered from this a lot over the years. LLVM is intended to be usable as a library, but most of its consumers are programs that take a single trusted input, produce output, and then exit. In a lot of places, LLVM uses asserts rather than reporting errors for invalid inputs. This is absolutely fine for, say, clang: if clang generates invalid LLVM IR (or uses the LLVM support APIs incorrectly), then there’s a clang bug, and providing a stack trace and asking users to file a bug report is probably the simplest thing to do. But then you try using LLVM somewhere like a WebGL stack, and now you’re forced to accept input written by malicious actors specifically to try to exploit bugs in the compiler; if you crash, then you may take down the entire browser renderer process with you. If you build without asserts, then the code doesn’t have proper error handling, and so the attacker can get it into an invalid state. There’s been a lot of work over the years to try to introduce better error handling.
Assertions feel like a half-completed programming feature, especially when used for preconditions and postconditions. The notion of design-by-contract is that the contract is supposed to be visible to the client, to indicate the requirements and products of a particular function. This isn’t true unless you have access to the source of the code that uses assertions for pre/postconditions.
I like and use assertions, but they often feel like a crutch used to express program design and expected value constraints when the language itself is inadequate in this regard. There are often better ways to express these things, e.g. creating a new type whose constraints are verified at construction, as with Rust’s newtype pattern.
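For example, a newtype can check the assumption once at the boundary, after which the constraint is visible in every signature (a minimal sketch; the `Percent` type and names are made up for illustration):

```rust
/// Newtype: a percentage guaranteed to be in 0..=100. The constraint
/// is checked once, at construction, and is then carried by the type
/// -- unlike an assert buried inside a function body.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Percent(u8);

impl Percent {
    pub fn new(value: u8) -> Option<Percent> {
        if value <= 100 { Some(Percent(value)) } else { None }
    }
    pub fn get(self) -> u8 {
        self.0
    }
}

// No precondition assert needed: out-of-range discounts are
// unrepresentable by the type system.
fn apply_discount(price_cents: u64, discount: Percent) -> u64 {
    price_cents * u64::from(100 - discount.get()) / 100
}

fn main() {
    let d = Percent::new(25).unwrap();
    println!("{}", apply_discount(1000, d)); // 750
}
```

Unlike an internal assert, the requirement is part of the function’s public signature, which is exactly the visibility design-by-contract asks for.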
Compilers such as GCC and LLVM ship with assertions enabled, making the compiler more likely to die and less likely to emit incorrect object code. On the other hand, I heard from a NASA flight software engineer that some of the tricky Mars landings have been done with assertions turned off because an assertion violation would have resulted in a system reboot and by the time the reboot had completed, the spacecraft would have hit the planet. The question of whether it is better to stop or keep going when an internal bug is detected is not a straightforward one to answer.
This is the right discussion to have, and approaches what I believe the correct usage of assertions is: detecting absurd or absolutely incorrect situations. In my first job we built telephone softswitches, big computers that could service phone calls for towns of < 1 million people. I remember an assert going off for a customer that eventually boiled down to 3 + 1 == 5, even though the code itself was correct. I recommended that the particular component that came up with the calculation be replaced, and it was, the theory being hardware failure / cosmic rays / who knows. Moreover, once basic arithmetic no longer works you don’t want to keep running the switch; it may, e.g., bill a customer incorrectly. The overall system was architected to have two systems running concurrently as primary / warm backup, so there was only a small availability impact, but it was still concerning. I’m pretty concerned that the NASA lander did not have a hot/warm backup running.
Assertions should have no side effects and nothing to do with exceptions or errors; they are the last line of defense in a world full of actual hardware that can fail.