1. 20

    Funnily enough I just gave a talk today on the differences between “public” perception of formal verification and “practitioner” perception of formal verification. The big one is that FM people are much less likely to believe there’s something innate to specific paradigms. Pretty much every style of programming has things that impede correctness and interesting things that help us get closer. For example, you say

    The thing you need to look at if you’re using, say, a dynamic language, or object oriented design, is that in the long term, what is the language and mindset of “objects” with dynamic dispatch providing you apart from an endless stream of bugs that seem to keep recurring every time you try and evolve the software to introduce a new requirement?

    You’re suggesting here that “dynamic” and “object oriented design” are bad for correctness. Neither is a priori true. Eiffel and Ada are object-oriented languages, and both are considered fantastic at writing correct software, often by leveraging their OOPness. As for dynamic typing, ACL2 is a dynamically-typed, provably-correct lisp. You also see some really interesting correctness stuff with APLs, which are so terse and dense that they almost encourage a “correct-by-construction” style of writing algorithms.

    Fact is, there are many, many ways we strive for correctness. We’ve been seeing a lot of work in typed, pure, functional languages, but we often conflate that with typed, pure, functional languages being intrinsically better for correctness. The spaces of correctness and formal verification are barely explored, and all sorts of interesting stuff turns up outside the narrow band of the public’s perception. And even in the little we’ve explored, “Scala, Typescript, Purescript, Elm, Haskell, Rust, Coq, Agda, Idris” is itself just a couple of facets of what we think works. You’d probably agree it’s unreasonable to talk about correctness without mentioning Haskell. But what about SPARK? What about Esterel? P? Dafny? OpenJML? DRAKON?

    Finally, one criticism about form. You say

    To say that it is just “functional programming”, or “object oriented programming”, is simply not true.

    And then immediately follow by talking about how terrible everything but static FP is. If that’s what you believe, don’t hedge or sugarcoat it.

    EDIT: oh forgot to say: this is all on writing correct code. Correct software development is its own topic with its own set of crazy interesting ideas. My specialty is in formal specification, which I didn’t even talk about here. I think it’s revolutionary. And that is itself another niche of what we already know is possible.

    1. 7

      On reading the comments here and on HN, I certainly realise that I’ve stepped onto a larger stage than I had originally intended. I’ve conflated “avoidable runtime bugs” with “formal methods” which wasn’t my intention. Some more context on how I arrived at this post might be useful.

      I’ve been writing software in Ruby and JavaScript extensively over the last couple of years. We did a significant rewrite of our web frontend code in React. It failed for many reasons, not just at the code level. Chose to do it again in Elm. A huge success - again, not just from a code perspective. I then ventured into doing the same thing with a backend service and chose Haskell. Similar story, both of which I’ll be following up with detailed blog posts (taking into account many of the comments that I’ve read both on HN and Lobsters).

      My target audience was intended to be developers who face errors like NoMethodError: public method 'foo' called for nil:NilClass or undefined is not a function in production code on a daily basis. I certainly faced these issues, and after taking my first professional steps into the strongly typed functional realm, I saw that these problems weren’t just solved: solving them led to other benefits, such as increased developer happiness and better agility with the code base, which in turn led to financial gains. This was my experience and I was excited to share it with others, albeit a little prematurely.
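
      Those failure modes are easy to reproduce. Here’s a minimal Ruby sketch of the first one (the User struct and helper methods are made up for illustration): the nil only blows up when that code path actually runs, which is exactly what a compiler in a language like Haskell or Elm rules out by forcing the nil case to be handled up front.

      ```ruby
      # Hypothetical example: a nil that slips through is only caught at runtime.
      User = Struct.new(:name)

      def first_name(user)
        user.name.split.first  # raises NoMethodError if user (or name) is nil
      end

      begin
        first_name(nil)
      rescue NoMethodError => e
        puts "only surfaced at runtime: #{e.class}"
      end

      # One runtime-level mitigation: the safe-navigation operator returns nil
      # instead of raising, propagating the nil rather than eliminating it.
      def first_name_safe(user)
        user&.name&.split&.first
      end

      puts first_name_safe(nil).inspect                 # nil, no exception
      puts first_name_safe(User.new("Grace Hopper"))    # Grace
      ```

      Note that safe navigation only moves the problem around; a Maybe/Option type makes the possible absence visible in the function’s type instead.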

      Picking the word “correctness” to convey these ideas was a huge semantic error given what I now know.

      I spent some time today reading some of your blog posts and watched a talk you gave at CodeMesh about TLA+. Super interesting stuff, which I have not come across before. I think the closest thing that I’ve come across (which might be similar but you might be able to help me with the differences) is modelling finite state machines and then using property based testing to check all the permutations - which was introduced to me because of my recent experiences with writing Haskell for production.
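
      The technique described above can be sketched without any library. A hand-rolled Ruby version (the turnstile machine is a made-up example): model the states and transitions explicitly, then fire random event sequences at the machine and assert an invariant after every step. Real property-based testing tools add input generation, shrinking, and failure replay on top of this basic shape.

      ```ruby
      # A tiny explicit finite state machine: a turnstile.
      TRANSITIONS = {
        locked:   { coin: :unlocked, push: :locked },
        unlocked: { coin: :unlocked, push: :locked }
      }.freeze

      def step(state, event)
        TRANSITIONS.fetch(state).fetch(event)
      end

      # Property: no random event sequence ever drives the machine
      # out of its defined state space.
      rng = Random.new(42)  # seeded so runs are repeatable
      1_000.times do
        state = :locked
        rng.rand(1..20).times do
          event = %i[coin push].sample(random: rng)
          state = step(state, event)
          raise "left the state space: #{state}" unless TRANSITIONS.key?(state)
        end
      end
      puts "property held over 1000 random runs"
      ```

      The same shape scales up: replace the invariant with whatever safety property your model is supposed to guarantee.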

      If the idea of “correctness” hadn’t been introduced to me (even imperfectly), I might not have traveled this far down the learning path. Programming in different paradigms and languages helped me do that. Something along the lines of “we shape our tools, and our tools shape us”.

      This was the first time that I’ve submitted an article that I’ve written to the wider programming community and I was super nervous about it. I don’t regret posting it. The criticisms have helped me understand a broader context that I need to be aware of that I might not have come across for months if I kept my head down.

      You are right. I was talking about how to write correct code, but more than that I’m interested in what real world effects can come from that, which I will be exploring in my upcoming posts.

      Thanks for taking the time to read the article, and thanks for your detailed reply. We live and learn. If you’re ever in the vicinity of Melbourne, Australia let me know - it’d be great to catch up and spend some time talking about these points.

      1. 2

        Eliminating NoMethodError and undefined is not a function is definitely an aspect of correctness. I really doubt there’s a software specification whose requirements say either of these should happen. Why do you think that “correctness” was an inappropriate word for this?

        What are you doing with Haskell in production in Melbourne?

        Thanks for writing about what you’ve been learning!! Really good job.

        1. 2

          btw I live in the bush of Tasmania, I guess that’s maybe within the “vicinity” of Melbourne :)

          1. 1

            Most of the people I know who live central to Melbourne would consider Belgrave (where I live) and “in the bush of Tasmania” as roughly equivalent ;)

            1. 1

              I lived in Harkaway in 2015 & 2016, yeah, they do :)

          2. 1

            It may not be inappropriate, but I certainly wrote the post with a very different meaning in mind. In retrospect, I’m probably more in the space of compile-time guarantees with some higher-level type-driven development (I wish there were a single word that captured all of that), but that’ll be a post for another day.

            On the Haskell front, I’m porting some parts of a Rails codebase into microservices. There are many reasons for this: reduced runtime errors, putting software from another paradigm into production, having another technology stack for comparison within the team, facilitating my own growth as a developer and the growth of those around me, etc.

            Thanks for reading! Tasmania is “close”, let’s catch up :)

          3. 1

            Sorry if I was too aggressive. One thing I want to riff on:

            I’ve been writing software in Ruby and Javascript extensively over the last couple of years. We did a significant rewrite of our web frontend code in React. It failed for many reasons, not just at the code level. Chose to do it again in Elm. A huge success - again, not just from a code perspective. I then ventured into doing the same thing with a backend service and chose Haskell.

            I empathize: I was a Rails dev for five years and have seen some horrible, horrible things in that language. And I’d assume that the average Elm codebase will have fewer defects than the average Javascript codebase.

            However, I want to be careful to distinguish the contextual benefit from the fundamental benefit. The four languages you mentioned had very different design goals. Ruby was designed to maximize “developer happiness”, JavaScript was originally designed for small webpage functionality (“make the monkey dance”), Haskell as a research language for studying FP and type systems, and Elm to make FRP easier. The former two weren’t intended to be “safe”, while the latter two were (or at least thought more consciously about it).

            That’s why I don’t want to make general claims about software correctness: we’re comparing dynamic/imperative languages not designed for safety to static/functional languages that were. You could argue that “they chose static/FP because they wanted safety”, but we’ve seen plenty of people in software correctness design powerful and effective languages outside those paradigms, so I don’t think that’s the case.

            I just get really excited about how much there is to explore here!

          4. 4

            “You also see some really interesting correctness stuff with APLs, which are so terse and dense that they almost encourage a “correct-by-construction” style of writing algorithms.”

            I kind of feel what you’re saying. This might be worth elaborating on. If not a blog post, then a detailed comment at some point on an APL thread.

            “but we often conflate that with typed, pure, functional languages being intrinsically better for correctness.”

            Aside from typed, don’t we have evidence that functional in general is easier to verify since it reduces state-driven, combinatorial explosion? I mean, you can visually see the complexity expand when you formalize it due to the heap and such. Then, the effects on ordering if multithreaded.
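
            The ordering point can be made concrete with a bit of arithmetic (a sketch with a hypothetical helper, not a claim about any particular tool): two threads performing m and n atomic steps can interleave in C(m+n, m) distinct ways, and that is the space a model checker may have to cover for shared mutable state.

            ```ruby
            # Number of distinct interleavings of two threads doing m and n
            # atomic steps: the binomial coefficient C(m+n, m).
            def interleavings(m, n)
              fact = ->(k) { (1..k).reduce(1, :*) }
              fact.(m + n) / (fact.(m) * fact.(n))
            end

            [1, 2, 4, 8].each do |k|
              puts "#{k} steps per thread -> #{interleavings(k, k)} orderings"
            end
            # 8 steps per thread already yields 12870 orderings to consider.
            ```

            Pure code with no shared mutable state collapses that dimension of the state space entirely, which is the intuition behind the verification claim.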

            I have nothing to say about typing. That’s an endless debate. I’d rather just see controlled experiments in many domains.

            “Correct software development is its own topic with its own set of crazy interesting ideas. “

            Getting the requirements right and being able to change them easily are two of the main drivers for correct software development. Some early formal methods formalized requirements for that reason. How well does the language help someone do these things with minimal baggage? That’s another correctness driver.

            “So that is the first thing we must understand: there is a large cost to building something complex, and nothing can drastically reduce it. Formal methods – or any other useful technique – are beneficial only once you understand this. Most of the cost is determined by what you build, not how you build it, so the only way to drastically cut it is to build drastically less.” (Ron Pressler)

            On the HN side, pron made that point, and it’s correct. One of the biggest drivers for correctness in high-quality and high-assurance software is simplifying it. Such systems tend to be leaner, with cleaner interfaces. Reducing complexity is the biggest way to increase success and reduce costs.

            1. 2

              Aside from typed, don’t we have evidence that functional in general is easier to verify since it reduces state-driven, combinatorial explosion? I mean, you can visually see the complexity expand when you formalize it due to the heap and such. Then, the effects on ordering if multithreaded.

              From the same essay:

              This is why people say that “imperative programming is harder than functional programming” – the unstated assumption is that you have functions, and that state is aliasable pointer-like state.

              However, by making the treatment of state a little less lax, you can retain the ease of reasoning while still permitting the controlled use of state.
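
              The “aliasable pointer-like state” problem, and one way of making state less lax, are both easy to demonstrate in Ruby (a small illustration of my own, not from the essay):

              ```ruby
              # Two names, one underlying object: a mutation through either alias
              # is visible through the other, which is what defeats local reasoning.
              prices = [100, 200]
              report = prices          # aliases the same array; no copy is made
              report << 300            # a "local" change to report...

              puts prices.inspect      # ...mutated prices too: [100, 200, 300]

              # Controlled use of state: copy at the boundary, then freeze.
              safe = prices.dup.freeze
              begin
                safe << 400
              rescue FrozenError => e  # FrozenError exists in Ruby 2.5+
                puts "mutation rejected: #{e.class}"
              end
              puts prices.length       # still 3; the frozen copy protected it
              ```

              Verification tools face the same two worlds: with aliasing they need machinery like separation logic, while the copy-and-freeze discipline keeps reasoning local.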

              1. 1

                aside from typed, don’t we have evidence that functional in general is easier to verify since it reduces state-driven, combinatorial explosion?

                That’s not evidence of anything. It’s an argument based on plausibility.

                1. 3

                  That reducing combinatorial explosion increases the ability to verify is empirically supported by the fact that automated analyses, especially model checkers, have mainly been limited in effectiveness by combinatorial explosion. It logically follows that a significant reduction in combinatorial explosion may increase the effectiveness of such tools.

                  Now, there might be new methods discovered in the future that make combinatorial explosion irrelevant to verification difficulty. If that happens, reducing it might have little to no benefit. Every team claiming that has been wrong or fraudulent (overpromising) so far. Whereas stuff like counterexample-guided abstraction refinement helped a lot with effectiveness. That just corroborates my point that anything reducing combinatorial explosion is better for verification by default.

                  1. 1

                    Does functional programming reduce combinatorial explosion? Evidence?

                    1. 2

                      The word “evidence” in the first comment was a link to it. The author explained how programs passing around pointers have to worry about aliasing and mutable state. Folks use stuff like separation logic to deal with that, with VCC productivity at 2 LOC a day at one point.

                      People analyzing languages with less of that, like SPARK without pointers, or Pascal decades before, were moving way faster with more automation.

                      1. 1

                        I read the link. It is an argument based on plausibility, with no data.

                        1. 3

                          You need more data to know that adding pointers and shared mutable state increases the number of errors that are possible, especially heisenbugs? Or that these make program analysis harder? Or both?

                          1. 1

                            Of course I do. Plausibility is not evidence.

                            1. 2

                              How about you just tell me the minimum proof that would constitute evidence to you? Then I can keep an eye out for it or run some tests sometime.

                              1. 2

                                How about 3 readable verifications of programs more than 1000 lines of code long?

                                1. 3

                                  As in, you want it as human-inspectable as it is machine-checked? OK, I can see the value. I don’t know if that will happen, though, given that almost all R&D is on human-readable specs that machines check.

                                  Remember, I said automated analysis, too. Those tools create the formal models internally, apply the analysis, and show us where the problems are. With a prover like Why3, it should either fail somehow on known-bad code or succeed on known-good code. What about that as evidence, on human-readable code more than 1,000 lines long?

                                  I mean, that’s kind of what I meant by saying we already have some data to work with. Those tools were finding lots of problems. Outside the false positives and negatives caused by program errors, most of the limits had to do with the analysis exploding. If you count those, then someone might be able to make an objective comparison in the future just by looking at analytical results or performance on immutable, functional code vs stateful, imperative code, maybe in the same language.

            1. 1

              I think this is a good article, but what am I supposed to take away from this, Chris? Is anyone arguing against correct software?

              1. 3

                Most of the market and F/OSS projects. They put other priorities higher than correctness. Silicon Valley likes to move fast and break things. Traditional suppliers sell buggy stuff, then sell the fixes along with more bugs. Most open-source coders are scratching their own itch, which ain’t maximum correctness.

                1. 3

                  Thanks David! There are multiple takeaways I was reaching for, depending on which perspective you take. But to address your question specifically: I don’t think anyone is arguing against correct software explicitly, and if they are, it isn’t something I’ve seen first-hand.

                  I do think that, with the rapid growth in accessibility that lets anyone become a programmer, many of us (including myself) have implicitly accepted flaws in our languages and tooling that don’t exist in others.

                  In this case, I’d say one takeaway is raising awareness of your language and tooling choices, especially the hidden, implicit choices whose shortfalls often don’t surface until days, months, or years later.

                  For developers trying to extend their understanding of languages and tooling - don’t be discouraged by friction, persevere with your journey. Reach out to people on community chats, ask for help, post your stories on HN and Lobste.rs! There are many feedback loops available to us all. Use them, learn, then bring that knowledge back to share with your peers to improve your individual, team and company work.

                1. 2

                  This is great news! Reproducibility was one of the defining characteristics that made me want to switch to NixOS a year ago, and it wouldn’t have been possible without the support of the community and people like you, Graham. I don’t even want to think of the number of personal hours/months/years that have been poured into this effort - it’s really something to be proud of. I hope to advance my Nix + NixOS knowledge to also contribute in a meaningful way in the near future. Thanks to you, and all the contributors for your ongoing hard work on this project!

                  1. 3

                    Thank you, Chris! I must say that I personally have not done all too much work on pushing for binary reproducibility. This has been truly a community effort. My goal with this project was to showcase the incredible effort of the team, and also try and encourage a push to get the rest done.