It would be helpful to know which languages / compilers / tools you had in mind for this.
Therefore, I will choose simplicity and expressiveness over strict static analysis, every single time.
Is this really the trade-off? The language I use that has really strict static analysis is simpler and more expressive than the language I use which has much weaker static analysis. And in that weaker language (it’s C) things get more consistent (and therefore I would say simpler) when we enable more static analysis through warnings and analyzers.
Reliable software is extensively tested software, and extensively tested software is running software. If static analysis helps us get there, this is good.
Sure, we’re still writing tests for our statically analysed code. If anything, we can write more tests when we don’t need to worry about entire classes of errors that the analysis has ruled out.
But if it is too restrictive, too baroque, and gives too many false positives, then nothing will be run. We’re left with no software, just a bitter argument about dead code that a machine has won.
Whether something is too restrictive is personal preference, but there’s a whole lot of code out there written in strictly analysed languages. The machine is helping us win.
But while they can help with our goal, they cannot judge if we have achieved it. For this goal static analysis is merely a tool; it is not an assurance.
In the case of the language I’m thinking of (it’s Rust) it is an assurance. In exchange for the rare false positive we have assurance that a whole class of bugs is impossible. The places where you need to work around the false positives are where you write your own tests.
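To make “a whole class of bugs is impossible” concrete, here is a minimal Rust sketch (invented for illustration) of the ownership rules doing that work; the rejected lines are kept as comments, since they would not compile:

```rust
fn main() {
    let data = vec![1, 2, 3];
    let owner = data; // ownership of the vector moves to `owner`

    // println!("{:?}", data); // rejected at compile time: `data` was moved,
    //                         // so use-after-move (and the use-after-free it
    //                         // would imply) is impossible, not just untested

    println!("{:?}", owner); // fine: `owner` is the current owner

    let len;
    let _dangling: &String;
    {
        let s = String::from("scoped");
        len = s.len(); // fine: we copy a plain value out of the scope...
        // _dangling = &s; // ...but storing a reference that outlives `s`
        //                 // is rejected: `s` does not live long enough
    } // `s` is freed here, deterministically
    println!("{}", len);
}
```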
Judging from the comments here, one thing I absolutely did not make clear is that I view static analysis as a sort of continuum. Let’s take static typing. It’s not a binary. There are levels:
ruby - everything is a method, no feedback if you misspell a method name
racket - can statically determine if a procedure exists or not
go, C - primitive static type checking, can help you with basic things
ocaml, haskell - sophisticated static type checking, first-class modules / type classes, excellent type inference, sum types, higher-kinded types, etc
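To make the top of that continuum concrete, a small sketch in Rust (the `Event` type is invented for illustration) of what sum types plus compiler-checked exhaustive matching buy you:

```rust
// A sum type: a value is exactly one of these variants, checked statically.
enum Event {
    Connected { session_id: u64 },
    Payload(Vec<u8>),
    Disconnected,
}

fn describe(event: &Event) -> String {
    // The compiler verifies this `match` covers every variant. Add a new
    // variant to `Event` and this function stops compiling, pointing at
    // exactly the code that needs updating.
    match event {
        Event::Connected { session_id } => format!("connected ({session_id})"),
        Event::Payload(bytes) => format!("{} bytes", bytes.len()),
        Event::Disconnected => "disconnected".to_string(),
    }
}

fn main() {
    println!("{}", describe(&Event::Payload(vec![0, 1, 2])));
}
```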
What I’m trying to say is: pick your point on the continuum as high as you can such that you’re still productive and still writing clear code - but no higher. The languages I had in mind when writing this were rust and typescript, where you can, if you so wish, dial back the static analysis, and where a lot of people (including myself) get stuck in extremely unproductive mental masturbation exercises trying to satisfy the static analysis - exercises that can completely grind projects to a halt.
I disagree with the notion that more static analysis means you can write fewer tests. Or rather, not for the sort of testing I was talking about (which, in fairness, I made too vague). I definitely did not have simple example-based testing in mind. I was thinking of deterministic simulation, fuzzing, etc. Doing less fuzzing of your inputs because you used a borrow checker does not make sense.
If you’re not doing this sort of testing, then let’s be frank and say you don’t care that much about reliability. Which is fine! Not every single project needs that care. But then we can also say it’s fine that a 50 line javascript web app doesn’t use typescript.
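For concreteness, a minimal sketch of the kind of testing meant above - deterministic fuzzing with a fixed seed, so any failure replays exactly - using nothing beyond the Rust standard library. `parse_len_prefixed` and the xorshift generator are invented for illustration; a real project would reach for cargo-fuzz or proptest:

```rust
fn parse_len_prefixed(input: &[u8]) -> Option<&[u8]> {
    let len = *input.first()? as usize; // first byte declares the length
    input.get(1..1 + len) // `get` returns None instead of panicking
}

fn main() {
    let mut state: u64 = 42; // fixed seed => the whole run is reproducible
    for case in 0..100_000u32 {
        // xorshift64: a tiny deterministic RNG, good enough for a sketch
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        let len = (state % 9) as usize; // 0..=8 bytes of input
        let buf = state.to_le_bytes();
        let bytes = &buf[..len];
        // Property under test: the parser never panics and never over-reads.
        if let Some(body) = parse_len_prefixed(bytes) {
            assert!(body.len() < bytes.len(), "case {case} over-read");
        }
    }
    println!("100000 cases, no panics, no over-reads");
}
```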
I didn’t quite “get” your point in the article until this comment, so thank you.
I’ll add, if only to make it explicit, that there are different areas in both safety and design that static tools apply to: you’re right that a borrow checker won’t help much with parser code, and neither will it help you design data structures well, but to that end you might still do well going higher on the typing continuum with the confidence to tell the compiler to shut up when you know better (there’s always an escape hatch).
That’s where “expressiveness” has nuance: I only really want to express programs that are going to work; otherwise I inevitably waste a lot of time exploring useless design space. “Simplicity” similarly needs disambiguating between its 5+ interpretations to mean anything (Hickey wasn’t precise enough).
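A tiny Rust sketch of that escape hatch (the example is invented): stay high on the continuum, but override the analysis locally and explicitly when you know better:

```rust
fn parse_port(s: &str) -> u16 {
    // The type checker cannot know this particular string is a valid port;
    // we can. `expect` is the explicit, greppable "shut up, I know better" -
    // and it documents the claim being made.
    s.parse().expect("hard-coded port must be valid")
}

fn main() {
    println!("listening on {}", parse_port("8080"));
}
```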
As I like to say, types as a correctness tool are only useful insofar as you can’t just run the code and see if it works.
The catch is that running code can be hard. The two examples where code often isn’t run even once:
The magic of formal methods is that, in restricted circumstances, they give you a true “for all” quantifier.
Eh, most useful when you can’t run the code and see if it works, sure. Only useful?
Shifting errors earlier in the development (or manufacturing, where I think I stole this idea from) process always makes the errors cheaper to handle. Even a short shift (compile time to test time) means I have a much more specific statement as to where the mistake is, it means it shows up in my IDE and I don’t have to context switch back to my code, it means I don’t have to wait for the test-suite to run, etc.
There’s also a cost to correctness tools of course (false positives, time spent annotating code to assist the tool, etc). If the only value in them was shifting errors from unit tests to IDE squiggles I’m not sure many of them would be worth it. That doesn’t mean that the tools don’t also deliver value though, rather that like most things they have both pros and cons.
Eh, most useful when you can’t run the code and see if it works, sure. Only useful?
This reminds me of a quip I heard from one of the old folks who taught at my uni a long time ago. Back in the sixties or seventies, something like that (“old folks” isn’t some euphemism here; the good man was about ninety at the time), he’d worked on an otherwise very interesting research project on what we’d now call symbolic synthesis for electrical circuits – basically a sort of unified methodology through which you’d be given a target transfer function of a circuit along with the transfer functions of some components and some topological restrictions, and come up with a circuit that did exactly that, in an easily-repeatable manner.
The whole thing was meant to fix a problem that the advent of smaller and smaller transistors and increasingly integrated circuits was introducing – you couldn’t “just” take the transfer function of a filter and come up with the damn thing; they basically hoped they’d be able to “fuse” a qualitative model of their passive and active components with what their circuit was supposed to do, and come up with something that worked (more or less) the first time around.
To my fourth year mind that sounded both really cool and incredibly useful – I was working on something very different at the time but it boiled down to trying to fix a similar problem, where you’d have to take a stab at what you wanted to make, get the physical thing made (which was expensive and took a long time) only to find out it wasn’t quite what you needed and repeat the whole thing a few times. Each iteration was basically an exercise in figuring out what physical phenomenon you’d missed or what incorrect assumption you’d made.
So I asked him where I could read all that and he just shrugged it away like, sure, I can tell you all about it, but it didn’t really take off, I don’t know if it’s going to help. Everyone I showed it to hated it, it turned out to be so impractical that you were better off making ten prototypes than doing that whole thing once. I was like BUT WHY, the whole point was that you’d never do something wrong, that sounds super useful! He said nope, it’s only useful if you can also do everything right with it – if all you want is to make sure the design engineers never do something stupid you can just unplug the PVD machine.
Obviously, apples and oranges – code and thin film inductors are very different things – but the whole thing stuck with me. Both guarantee the absence of errors, but there’s a spectrum between “never making a filter with an incorrect transfer function” and “never making a filter except on paper”, and the usefulness of a tool that claims to eliminate errors is measured not only by how close it is to the former, but also by how far it is from the latter.
Eh, most useful when you can’t run the code and see if it works, sure. Only useful?
“As a correctness tool” is an important qualifier!
Types are tremendously useful for making the code fast, and for various IDE tooling (where the simplest “IDE tooling” is a human reading a function signature as succinct documentation).
Shifting errors earlier in the development (or manufacturing, where I think I stole this idea from) process always makes the errors cheaper to handle. Even a short shift (compile time to test time) means I have a much more specific statement as to where the mistake is, it means it shows up in my IDE and I don’t have to context switch back to my code, it means I don’t have to wait for the test-suite to run, etc.
Shifting error detection earlier in development is indeed great. But strong static analysis often comes with long compile times. For many simpler problems, an error that static analysis would catch during compilation can take longer to surface than it would by just running the test suite in an interpreted language, which catches the same problem.
Don’t get me wrong - static analysis is amazing at catching errors that would happen rarely in use but may be of critical importance - but I feel like the “earlier in development” argument starts to lose weight as the time static analysis takes increases. “Dumb errors”, at least in my experience, are the majority of the “write-test-fix” loop, and they can be trivially found by just running basic tests without static analysis.
Oh absolutely, that’s a variant of what I was trying to address with
If the tool costs too much to run, including taking too much time, it can not be worth it, or be worth it but not that often.
Agreed. I like how Types And Programming Languages put it:
On one end of the spectrum [of formal methods] are powerful frameworks such as Hoare logic, algebraic specification languages, modal logics, and denotational semantics. These can be used to express very general correctness properties but are often cumbersome to use and demand a good deal of sophistication on the part of programmers. At the other end are techniques of much more modest power – modest enough that automatic checkers can be built into compilers, linkers, or program analyzers and thus be applied even by programmers unfamiliar with the underlying theories.
…
A type system can be regarded as calculating a kind of static approximation to the run-time behaviors of the terms in a program.
Ideally, if something typechecks, that doesn’t prove the program is correct; but if something fails to typecheck, that definitely proves it’s incorrect (within the bounds of the proof written, i.e. the program’s types).
I really like this perspective!
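A tiny Rust sketch of that asymmetry (both functions are invented for illustration):

```rust
fn add(a: i32, b: i32) -> i32 {
    a - b // typechecks fine, yet wrong with respect to the intended spec
}

fn main() {
    // let s: String = add(1, 2); // fails to typecheck: definitely a bug
    assert_eq!(add(1, 2), 3); // passes the type checker but fails at run
                              // time - exactly the gap tests must cover
}
```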
I think one other really important context is code at security boundaries facing adversarial inputs. That’s another place where the code really needs to work – or, at least, uphold certain properties – for all inputs, not just the ones you tested or the ones “similar to” the ones you tested.
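As a minimal sketch of what upholding a property for all inputs can look like at such a boundary (in Rust; `read_chunk` and its framing format are invented for illustration): never trust a length field.

```rust
/// Parses one length-prefixed chunk: a 2-byte big-endian length, then the
/// payload. Returns (payload, rest), or None on any malformed input - for
/// *all* inputs, not just the tested ones.
fn read_chunk(input: &[u8]) -> Option<(&[u8], &[u8])> {
    let len = u16::from_be_bytes([*input.first()?, *input.get(1)?]) as usize;
    let end = 2usize.checked_add(len)?; // no silent overflow
    let payload = input.get(2..end)?;   // no over-read: None instead
    Some((payload, &input[end..]))
}

fn main() {
    // A well-formed chunk parses; a lying length field is rejected.
    assert_eq!(
        read_chunk(&[0, 3, b'a', b'b', b'c']),
        Some((&b"abc"[..], &b""[..]))
    );
    assert_eq!(read_chunk(&[0, 200, 1, 2]), None);
}
```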
Reliable software is extensively tested software, and extensively tested software is running software
This has been proven false countless times. In non-trivial programs the state space can be too large to test exhaustively, and the states that lead to vulnerabilities are not the states used during normal program execution.
For example, if you’re parsing a file, your code will be battle-tested on correct and slightly broken files. It may be further tested on random fuzzed files, but there may still be a very specific combination of inputs that never happens in real files, and that’s too rare to be discovered by random chance, but which can be deduced by an attacker.
e.g. NSS – millions of users, decades of real-world testing, fuzzed, and yet it missed a basic safety issue:
https://googleprojectzero.blogspot.com/2021/12/this-shouldnt-have-happened.html?m=1
Yes, testing your software is a way to ensure you reach your goal, but that post doesn’t talk about “how do we make sure the full system is really tested completely?” For that matter, I think static analysis coupled with testing is necessary. It’s super easy not to test some part of the code (conditional compilation, target-based branches, edge cases that require non-trivial property-based testing, etc.).
Indeed, testing is for verifying the presence of bugs, but is incapable in general of showing their absence.
That’s a Dijkstra quote, correct? I feel it’s only fair I respond with Knuth’s:
“Beware of bugs in the above code; I have only proved it correct, not tried it.”
We enjoy short posts, but it’s hard to tell whether in this one the author chooses to confuse correctness wrt a spec with safe execution. Without a spec, nothing is “correct”, and without a type system your runtime errors are deferred to the point where they’re hardest to fix (the call site).
Sigh. More of this. Again.
Type systems let you reason about the code you do not or cannot run. Hopefully you realize this some day.
If my post was so short that it was merely a blurb, then I might hope it could be read a little more carefully. E.g.:
If a machine can find our mistakes early, if it can prevent us from even running code that will not behave how we expect, then this is a good and useful thing.
This was not a call to abandon static typing.
Judging from his other posts, I think @Lac-tech is a rustacean.
Perhaps my lack of empathy comes from the lack of effort in the original post. And frankly, after 20+ years of hearing the same arguments made about type systems, expressiveness, and simplicity with not much to show for it, I get annoyed by a self-submitted blurb that neither says anything original nor carries any weight. “I will choose simplicity and expressiveness over strict static analysis, every single time” is about as meaningful these days as the Agile Manifesto.
Reliable software is software where anyone at all cares about reliability. Who here does not have an example of a system failing a basic data-consistency check, with nobody with sufficient access to the system reacting to the reported problems?
Well-chosen safety properties let you avoid spending proactive care on some of the things you definitely want upheld.
Sometimes you cannot make writing reliable code sufficiently easy, so the next best thing is to make writing unreliable code harder.
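One concrete reading of that, sketched in Rust (the `Email` newtype is an invented illustration): encode the property in a type, so that the unreliable version simply does not compile.

```rust
// The field is private: outside this module there is no way to construct
// an `Email` that skipped validation.
pub struct Email(String);

impl Email {
    pub fn parse(s: &str) -> Result<Email, String> {
        // Deliberately naive check; the point is the boundary, not the rule.
        if s.contains('@') {
            Ok(Email(s.to_string()))
        } else {
            Err(format!("not an email address: {s}"))
        }
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

// Everything downstream of the boundary gets the property for free: this
// function cannot be handed an unvalidated string by accident.
fn send_welcome(to: &Email) {
    println!("sending welcome mail to {}", to.as_str());
}

fn main() {
    match Email::parse("user@example.com") {
        Ok(addr) => send_welcome(&addr),
        Err(err) => eprintln!("{err}"),
    }
}
```

The design choice here is that validation happens exactly once, at the boundary, and the compiler propagates the guarantee everywhere else.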