Counterpoints: Correctness is Easily Defined (maybe), Levels Mix Up, and People Issues Over Tech Issues
(or “Brace Yourself Peer Review is Coming!”)
My typical model for describing correctness is Abstract, State Machine since ASM’s are more like Turing machines for structures. Really easy to learn for ordinary programmers compared to most formal models. Moore and Mealy state machines are basic for this on finite side. If modeling as such, then correctness is simply that your system produces the intended behavior in terms of external output and/or internal states/transitions upon given input(s). Yours are subsets of that at best. Definition 1 is subset observing one or more state machines give incorrect output for input. Definition 2 is subset identifying specific inputs and transitions that led to incorrect output. Definition 3 kind of jumps sideways to focus on arguments made about specifications rather than the specifications themselves. Both might be incorrect. Correctness still reduces to same thing that I could model each of your definitions with. Even modifying it to say “correct under reasonable operating assumptions” is same where “CPU, RAM, or HD aren’t faulty” becomes another input that must be true when CurrentState or Output are correct.
In objective case, your argument that correctness is hard to define or has many definitions seems wrong. If applied to subjective definitions, it’s true where people come up with definitions narrow enough to miss important aspects of correctness. You do a good job illustrating several perspectives such as runtime, code, and logical analysis that can help or hurt depending on what aspect of correctness one wants to achieve. These levels might be too simplistic in real world to tech correctness, though. Some examples follow.
In Level 2: Code, we do consider some behaviors that can not happen directly in code when writing reliable or secure programs. We might write code to get the algorithm correct first, then do an assessment of known issues outside of code that affect code, modify the code to include those solutions, and then we’re done. It still happens at code level even if originally the knowledge came from a different level. A good example is software caches that aren’t really about the semantics of the code but we do modify the code we write to reduce misses. It’s a requirement/design detail that, when applicable to code, causes a transformation of that code. Eventually, the coder might automatically write code that way as part of coding style (esp with caches). Other examples include running out of memory at random points, bit flips, or side channels in CPU’s. These have code styles to mitigate them people can use without even understanding the logic behind it which can be enforce by code analysis tools.
In Level 3: Design/Logic, it assumes logic specs and implementation are a different thing. The most successful by the numbers use of formal methods in imperative programs are contracts and assertions. Most practical that subsets formal methods being Design-by-Contract. These tightly integrate logic dictating correctness with the source code where they’re both created and considered at Level 2. In some implementations, the assertions become runtime checks that are activated in Level 1. So, the assertion-oriented methods operate at Levels 2-3 simultanteously when writing specs/code with 1-3 happening together when testing specs/code.
“That can only be seen when doing formal verification, where a program’s properties and assumptions are written just as concretely as its source code. “
Or Design-by-Contract with verification condition generator. The more specs and VC’s they see, the more complex the behavior might be. These things could possibly be converted to English statements about the software, too. There’s ongoing work in natural-language provers. You’re right that the common case is the specs are implicit in code or in developers’ heads “scattered” about in inconsistent way. That’s a strong argument for explicit specs.
“Level 3 is all about how modular are the interactions between the components of your software”
They’re going to tell you OOP or FP plus dynamic languages solve that. It’s why Smalltalk and LISP systems are so malleable. They also supported live debugging and updates. Many long-running systems with acceptable level of failures achieved with English specs plus memory-safe, dynamic, maintainable code. Not a formal spec in sight. Lots of half-assed versions of aforementioned concept in mainstream languages with bolted-on abstractions, middleware, clusters, and so on. You might want to think on this more to determine how to describe why you’re solution is superior to that for practical reasons. If it even is given that static, verified code vs safe, easy-to-modify, dynamic code being better in general case for “acceptable correctness” is ongoing debate among developers in CompSci, industry, etc.
“At Level 2, this was totally fine, since freed memory in DOS was valid until the next malloc, and so the program worked. At Level 3, this was a defect, “
Another reason I don’t like the levels for these subjects is we already have the SDLC models and other stuff like that which has a way to specify requirements, design, and code with traceability. There’s piles of methodologies and tools for doing this. What this section says is that, ignoring best practice, a developer only considers the code rather than the hardware or OS considerations that will be introduced in “requirements” or “design” documents saying the code has to fit within that model. Then, you illustrate the problem that ignoring everything but code creates. I know there’s plenty of developers out there doing it but it doesn’t mean we need new levels or other models for the basics. Just illustrate with good examples like yours why we have those basic concepts there for software development lifecycle. Note that I’m not pushing order of Waterfall so much as the categories of development.
“The HTTP “referer” is forever misspelled, and that SimCity special-casing code is still there 30 years later.”
You’re treating it all as one thing. The SimCity software might have been cheap to the people who built it. The company still makes games. So, I’m guessing SimCity made them a lot of money, too. That software was totally successful as a product at its goal “for that group at that point in time.” It’s true the software might become a problem for another group down the line needing it to do a different thing due to technical concerns in the code. However, it was at acceptable quality and successful before with it not being in another context later. What you’re seeing here is the effect of economic/human factors dominating technical ones when deciding if specific methods are good methods (esp good enough).
We should always consider such things in our exhortations because those listening in FOSS or industry sure as hell will. Many’s model of operation means they don’t need to care about long-term quality. FOSS folks or CompSci students (esp w/ non-reproducible research) might not need to care about even current quality. Others don’t need high runtime quality so much as “it works often enough with it easy to fix.” Others add fast rate of change to that. Others, esp bigger companies, will talk developer costs for specific lines of code on top of that. The goals and metrics vary across groups. If you want to convince them, you need to target your message to those goals and metrics. If you don’t want to relent, you’ll have to target groups that are using your goals and metrics such as low-defect rate, higher flexibility for change, increased predictability (aka lower liability), and so on. That’s a smaller segment.
Like you, I spent a lot of time talking about objective metrics focused on the tech when the groups’ intended goals, failure tolerances, and subjective metrics are where the ability to influence is at. I wasted a lot of time. Now, I focus on teaching them how defects can be found faster with little extra work (esp at interfaces), how software can be changed faster, how it can be diagnosed faster, how it can be fixed faster, and how rollback can work well and/or fast. And with what cost, level of support, etc. This is their language.
“I now have two years experience teaching engineers a better understanding of how to avoid complexity, improve encapsulation, and make code future-proof.”
Glad you’ve been doing that. Keep at it. All I’m doing here is showing you how to improve the message.
“ The three levels deal with different views of a program: executions, code, and specifications. Each corresponds to its own kind of reasoning.1 Let’s bring in the math!”
Now this is true and where you’r use of levels shines with good examples like overflow. However, the analysis seems to miss that these are common problems at the code level where simple checks can take care of them. Even your malloc example might be handled in a library implementation where some things the spec tracks were tracked by compile-time analyses or run-time library. Such non-formal methods ares typical of memory-safe languages, embedded systems catching integer overflows in code, watchdog timers to dodge termination proofs, and so on. The overhead they have is acceptable to many or maybe most given dominance of such methods in industry and FOSS. Others who want less overhead or more determinism might like the techniques your bring to the table for Level 3. I already refuted separating them too much with examples such as Design-by-Contract that can do all of it at once with lots of automation (eg property-based testing or fuzzing plus runtime checks). I agree prior work shows many tools work best focusing on just one level, though.
I think you prefer clean, isolated, and static ways of analyzing or looking at system correctness when real world is just messier in terms of solutions available for one or multiple levels. It’s a lot messier. Embrace the mess! Like a consultant, factor it into your analyses so your recommendations show the various options at each level for different classes of problem with the trade-offs they take: time/money saved now or later via less debugging or fixes earlier in SDLC, extra time/money for specific verifications (these two biggest where development pace is biggest), compile-time vs runtime analysis, needs runtime checks or not at what cost, and support in their tooling of choice. It’s more work for people like us to do this. However, as some see value and more uptake happens, network effects might kick in where they do some of the evangelizing and tool work for us. That’s on top of potential niche market for specific solutions that work well like we see in DO-178B/C for static analyzers and so on.
You should make a blog or something and keep all the longer form writing there. Then you could reference it easily on forums.
I’m actually closer to having a blog than before. It’s probably going to be up this year after a decade of procrastination on that.
That would be a blog, where I would subscribe before you even publish something. Please do. :)
What has been your primary hindrance, if you don’t mind my asking? There are (and have been) many free blog hosting platforms, that make it quite trivial to start writing. Also, if you desired to host your own, services like github pages and netlify offer very compelling free tiers.
There are (and have been) many free blog hosting platforms
There are (and have been) many free blog hosting platforms
Is there a good one?
I dont mind you asking but rather not say. You could just say my background, current circumstances, and procrastination habit all added up to hold off since I already had places to share ideas. On that latter point, I also found it easier to help others by going where crowds were already at than trying to do lots of self-promotion. I was drafting the heavy hitters instead of building a turbo-charged, 18 wheeler of my own. Whether good or bad idea who knows.
I am switching gears on quite a few things this year. Gonna try a piece or just a few at a time to avoid overwhelming myself.
Hi, OP here.
Thanks for the long response. The typography of this comments section is really not meant for text of this length. As such, I’m having a lot of trouble reading this, for multiple reasons. I can’t tell how you disagree with me. In fact, I can’t even figure out if you disagree with me.
Okay, I’ve read it through again. Here’s my attempt at a summary of what you wrote:
None of this seems at all opposed to anything I wrote. Am I missing a key point?
There is one minor point of difference though: I’m not sure that Level 3 has an analogue in some variations of automata-based specification. This is because an automata-based specification is a very whole-system view of the world. There’s not exactly a notion of “This implementation automaton is a correct refinement of that specification automaton, but for the wrong reasons” unless you have a hierarchical construction of both. Hence, if you try to think of your system as implementing some kind of transition system, you may not realize if you have poorly-defined component boundaries.
I expect this phenomenon is related to why LTL-based program synthesis performs so poorly.
Sidenote: Objects are very misunderstood and underrated among the intelligentsia. I highly recommend reading William Cook.
Sidenote: Objects are very misunderstood and underrated among the intelligentsia. I highly recommend reading William Cook.
Sidenote to your sidenote: while we’ve mostly accepted that there’s several divergent strands of Functional Programming (ML, Haskell, Lisp, APL), we haven’t really done the same with OOP. That’s why people commonly conflate OOP with either Java ObjectFactoryFactory or say “that’s really not OOP, look at Smalltalk.” While Java is a Frankenstein of different strands, Smalltalk isn’t the only one: I’d argue Simula/Eiffel had a very different take on OOP, and to my understanding so did CLU. But a lot of CLU’s ideas have gone mainstream and Eiffel’s ideas have gone niche, so we’re left with C++ and Smalltalk as the main ‘types’ of OOP.
There are old programming concepts called objects, but they don’t necessarily correspond to meaningful constructs. Cook and Cardelli did discover something that does capture what’s unique about several strands of OOP, and also showed why CLU’s ideas are actually something else. I consider this a strong reason to consider their work to be the defining work on what is an “object.”
This is largely summarized in this essay, one of my all-time favorites: http://www.cs.utexas.edu/~wcook/Drafts/2009/essay.pdf
By the numbers, I’d consider it whatever C++, Java, and C# did since they had the most OOP programmers in industry. Then, there’s the scripting languages adding their take. So, my hypothesis about mainstream definition is that what most people think OOP is will be in one of those camps or a blend of them.
I’ll take the main comment paragraph by paragraph showing what I did for context:
I disagreed with your three models for correctness in favor of one that says “Do states, transitions, and outputs happen as intended on specific inputs?” All yours fit into that. There’s also formal models for it deployed in high-assurance CompSci and industry.
I agreed your levels represent different vantage points people might use. I also agreed some tools work best focused only on them.
I disagreed you need these to explain why code alone isn’t sufficient. The standard software, development lifecycle (even Waterfall) shows places where problems show up. Lots of articles online to pull examples from with your DOS thing being one. Best to just use what’s widely-used and well-understood for that.
I disagreed that stuff in the three levels stayed separate during development. I gave specific examples of where stuff at one was solved at another without mentally leaving that other level. I also showed methods that do multiple ones simultaneously with strong, code focus.
I disagreed that strong correctness of software (a) mattered for many groups or (b) mattered in long term. You have to examine the priorities of people behind products or projects to know if it matters or how much. Then, tie your recommendations to their priorities instead of yours. Ours if it’s software quality improving. :)
I disagreed that strong, logical methods were needed for long-term when specific examples from Smalltalk/LISP to industry’s knock-offs regularly produce and maintain software that meet organization’s goals. Since they’re status quo, you must show both those kind of solutions and yours side-by-side arguing why your costlier or stranger (to them) methods are better.
I agreed some would have your view in niche segments of FOSS and industry that are willing to sacrifice some upfront-cost and time-to-market to increase quality for higher levels of (insert good qualities here). You will do best marketing to them again tying the message to their interests and metrics. I gave DO-178B (now DO-178C) market as example where they invest in better tooling or buy better-made software due to regulations. Ada/SPARK, Astree Analyzer, formal methods, and so on have more uptake in safety-critical embedded than anywhere else except maybe smartcards.
So there’s a summary that will hopefully be more clear by itself or if you read the original post with it in mind.
Thanks. That I can read.
First, please tell me where I say that strong correctness is important or that we should be using heavyweight formal methods in today’s industry. I don’t believe either of those statements except in a small number of high-assurance cases, but you’re not the first to think that I do. Also, what is the “your solution” that you speak of? Points 5-7 seem to all be arguing against something I don’t believe. (As an aside, where are you getting your claims about tool adoption? My understanding from talking to their competitors is that Astree has no market share.)
Point 4: Obviously, you need to think about code (level 3), write code (level 2), and run code (level 1) concurrently with each other. Thinking about these three levels of abstraction absolutely do happen concurrently. What should set off alarm bells is writing code that special-cases a certain input (level 1 influencing level 2), and shaping a design around a poorly-thought-out implementation (level 2 influencing level 3). Obviously, your value/time curve will determine when exceptions are worth making.
Point 3: Ummm…..I don’t know what to say. Obviously, there are a lot of people with no training in formal methods who are nonetheless very good software designers. I do believe that the concepts of PL and formal methods, in various incarnations, whether logic-based or automata-based, are the correct way to think about many software design concepts that were previously only vaguely-defined. These all provide a notion there is a layer of reasoning which shapes the code, but is invisible when looking only at the code. This is what I call Level 3, or “the hidden layer of logic,” and closely related to what Don Batory calls “Dark Knowledge.” (Someone on HNH just linked me to a Peter Naur article which might discuss the same concept.) Probably the clearest example of this concept is ghost code, which many static analysis tools rely on. I also believe that understanding the hidden layer of logic is a shortcut to the higher levels of software design skill, as I’m proving in my coaching practice.
I could’ve misread the intent of your post. I thought you were promoting that people consider, maybe adopt, specific ideas. If you’re just brainstorming, then you could read me as liking or shooting down some of the brainstormed ideas. It all depends on what your goal is. I’ll answer some of this.
First, one thing about your post that looks like an evangelistic push of specific solutions like applying your concepts of Level 1-3 or software engineering practices in general. Here’s some examples of why I thought that:
“No, our goal is to be able to continue to deliver working software far into the future.”
“People sometimes tell me how software is easy and you can just… Hogwash… This is why it’s important to get programs right at Level 3, even if it passes all your testing and meets external requirements.”
“This is why it’s important to make your APIs conform as strictly to the spec as possible”
“the most important parts of our craft deal not with code, but with the logic underneath. When you learn to see the reasoning behind your system as plainly as the code, then you have achieved software enlightenment.”
“always think of the components of your… So much becomes clearer when you do, for logic is the language of software design.”
Your replies to me were talking like it’s idle categorization or brainstorming but those quotes sound like pushing specific practices to some audience of programmers to solve their problems. I assume industrial and FOSS programmers as default audience since they account for most working, practical software out there. If you’re (a) talking to real-world programmers and (b) pushing specific advice, then I responded to that: in terms of what their requirements are; what methods they use (eg design review, OOP, patches, runtime checks); told you to compare what you were pushing to good versions of what they did; and, if your methods still sound better for them, then modify your pushes to argue stuff more meaningful to them, esp project managers or FOSS volunteers. That what I meant by your “solutions” or “recommendations” with my counterpoints on suggested modifications.
I should’ve added that this quote that was 100% in agreement with my recommendations:
“I always recommend trying to think in pure concepts, and then translate that into the programming language, in the same way that database designers write ER diagrams before translating them into tables. So whether you’re discussing coupling or security, always think of the components of your software in terms of the interface, its assumptions and guarantees”
Back to other points.
“Obviously, you need to think about code (level 3), write code (level 2), and run code (level 1) concurrently with each other. “
I’m glad you intended for them to know that. It would be obvious if you’re audience already knows benefits of level 1-3 activities but then you wouldn’t have to explain the basics. Problem was your audience (as I perceived them!) was composed of people you were convincing to use more than coding and a debugger. Given Design-by-Contract reactions, I know it’s not going to be obvious for many of them that logic and code can work together in a way that’s also checked at runtime. Maybe it should be in a similar, future write-up if you do them that they can interleave or that some methods do several at once.
“These all provide a notion there is a layer of reasoning which shapes the code, but is invisible when looking only at the code.”
It’s a neat way of describing the other factors. The industrial developers do have methods for that, though, that you might leverage to help them understand. Big attention goes to modeling languages with UML’s probably being most widespread. In safety-critical, Esterel SCADE is good example. Anyway, the industrial side has modeling techniques and diagrams for various parts of the lifecycle that show relationships, constraints, and so on. Adoption varies considerably just like with strong design in general. Many will have seen that stuff, though. Starting with something like it might increase how fast they understand what you’re saying. You can then show how formal logic has benefits of simplicity, consistency, and tooling. And I tend to especially highlight the automated analyses that can happen since resource-constrained groups love solutions that involve a button push. :)
“I also believe that understanding the hidden layer of logic is a shortcut to the higher levels of software design skill, as I’m proving in my coaching practice.”
Definitely! In that case, you also have people who are trying to improve rather than randomly reading your blog. It’s a different audience willing to invest time into it. It’s definitely beneficial to teach them about the implicit stuff so they see just what kind of complexity they’re dealing with. I fully support that. An example I give to people learning formal verification is this report doing a landing system in Event-B. Specifically, I tell them to note how the small number of requirement and design details balloons into all the stuff in section C onward. I tell them, “Although some is due to the logic, much of that is just you explicitly seeing the huge pile of details you have to keep track of throughout your system to make it correct in all situations. It’s always there: you just don’t see it without the formal specs and explicit over implicit modeling.” That always gets a reaction of some kind. Heck, for me, it made me want to reconsider both not using formal methods (what am I ignoring?) and using formal methods (wow, that’s a lot of work!). :)
Oh boy, this is getting unwieldy. Replying in separate comments.
I view this post as saying “Here’s a good way to think about software.” This does make it easier to come up with a number of design claims, such as “conforming strictly to an API reduces the complexity of future clients.” Obviously, everything has tradeoffs, and needs to be tailored to your situation. For the better engineers I’ve taught, the reaction to a good fraction of this material is “It clarified something I already had a vague intuition for,” and that is also the aim of this post.
This lens does lead to a number of recommendations. In the follow-up post, currently #3 on Hacker News, I give a counterintuitive example of reducing coupling. This is, of course, only useful to those who desire to reduce coupling in their code.
I’m compressing most of the second half of your post into “For people already familiar with certain design-capture methodologies, you can help explain your ideas by comparing them to what they’re familiar with.” Although I once came close to taking an internship with the SEI, I can’t say I’ve had the (mis)fortune of working anywhere that used one of these methodologies (startups gotta iterate, yo), and I’ve read 0 books on UML or any of them.
So, if I’m going to go this route, do you have anything in mind that, say, more than 10% of Hacker News readers would likely be familiar with? Otherwise, referencing it won’t fill the desired purpose.
Thanks for sharing the Event-B report, BTW.
First of all, I love love love how vibrant the Lobsters formal methods community is getting. I’m much more likely to find cool FM stuff here than any other aggregator, and it’s awesome.
Second, maybe I’ve been spending too much time staring at specifications, but I’m not seeing how level 1 is different from level 2. Is level 1 “this is broken for probable inputs”, while level 2 is “this is broken for some inputs”? Different in degrees of probability.
Level 1 is statements about specific executions; level 2 is statements about the implementation. This is for all statements, not just correctness.
First of all, I love love love how vibrant the Lobsters formal methods community is getting.
First of all, I love love love how vibrant the Lobsters formal methods community is getting.
I believe crustaceans care more for high-quality code than the average programmer. Maybe pride of the craftsmen? This is a distinction to Hacker News, where monetary considerations get more attention. Formal methods is certainly one of the big topics to improve the quality of code.
I also love that we get more formal methods discussions taking place here!
I am however not sure how much “depth” is accepted; should I post any paper I find interesting here, with short summaries why and personal reflections?
I usually just post the papers. One thing I do, though, is try to make sure they show some practical use along with what they could or couldn’t handle. Especially in abstract so readers can assess with a glance. I’ll sometimes add summaries, reflections, brainstorming, etc if I feel it’s useful. I will warn the acceptance are hit or miss with many of the PDF’s getting 1-3 votes. Then, out of nowhere, they really appreciate one. Also, I only submit anything important on Monday-Friday since many Lobsters seem to be off-site on weekends.
So, that’s been my MO. Hope that helps.