
  2. 7

    Once again, this claim shows up despite evidence to the contrary. The comparison to NASA is ridiculous. That process is so slow and expensive that other high-assurance projects regularly compare themselves to it on both defect rate and cost to show how unnecessary it is. You’ll see taglines like “compared to the Space Shuttle’s… blah blah.” Altran/Praxis is an example: they supply nearly defect-free software at about a 50% premium, and it doesn’t take hundreds of people to write. Their unique aspects are good requirements-gathering, Z specs just to catch English or interface inconsistencies, and Ada + SPARK to prevent boatloads of implementation flaws. Before that, the Cleanroom methodology got quick time-to-market at anywhere from lower cost to a 30% premium; the cost stays low because it knocks problems out early, when they’re cheap to fix. Similarly with the Fagan inspection process. The LOCK project, secure computing at the highest levels of rigor, ran a 30-40% premium. You’re talking 1-2 extra developers on a 10+ person team, with tooling that cost about the same as existing tooling. Some recent efforts use Haskell with QuickCheck, OCaml, or Prolog to reduce spec-to-implementation incompatibilities or implementation errors.

    So, the claim that it costs so much to knock out lots of problems is false from the start. The Hypothesis tool is interesting, and Python programmers should certainly try it. Meanwhile, methods like Cleanroom from the 1980’s knocked out most defects (especially user-facing ones) from the start, often on a team’s first try, and sometimes without developers ever executing the code (an obsolete but impressive requirement). I’m sure modern tradeoffs with memory-safe languages, interface checks, and testing tools could do even more, even faster, given work like Altran/Praxis and Galois Inc. One person I read a while back even combined Python and Cleanroom for the advantages of both.

    So, the author can certainly argue that his tool will find problems easily in an environment where good QA isn’t supported. It might even supplement good QA. But the idea that you need a NASA budget to get near their error rate is a myth. The only thing consistently in jeopardy when increasing assurance is time-to-market. That’s why I’ve been writing posts about methods that knock out tons of errors without taking much time or effort: high elimination of defects with a low wait for delivery. A totally different picture from CMM-style processes.

    1. 3

      Thank you for bringing up the Cleanroom methodology. I had never heard of it before, and the high-level Wikipedia entry makes me think some of its tenets are basically the good parts of agile and TDD but without the bullshit. Do you have any suggestions for reading about Cleanroom to get a better feel for it?

      1. 2

        It took a while to find a good intro, since the resources I speak from are paywalled in ACM/IEEE & I could find no training guides. The good intro I did find said that was on purpose: IBM made people learn at paid training events, since the method was a competitive advantage and money-maker for them. Similar to them keeping the PL/S language secret & squashing a publication with lawyers. The best page on Cleanroom, from someone who went to that training, is here:

        http://infohost.nmt.edu/~al/cseet-paper.html

        Note: I submitted that here 5 months ago, where it got 11 upvotes & no comments. Sad, given it’s something that should really provoke discussion & experimentation on software forums everywhere.

        Dr. Dobb’s did a write-up on it showing the more formalized aspects:

        http://www.drdobbs.com/architecture-and-design/cleanroom-software-engineering/184405405

        Good guess on the similarity to agile and TDD: I used to tell their zealots they were largely reinventing the Cleanroom wheel with less quality. The basic method is decomposition of high-level specs into functions, with a constrained way of expressing them and annotations of intended behavior that make verification easy. This is done iteratively, as in Spiral and Agile. A verification team systematically analyzes the code for problems in both its meaning & the coding itself. That requires limiting the programming constructs used & how the program is expressed to make the analysis tractable. The testing is usage-based, created from simulations of how customers will use the software or specific features. The clever idea is that it’s better to leave in 10 bugs users never experience in production than even 1 they see. The user’s positive perception increasing the software’s value is a business argument for Cleanroom.

        Finally, after enough software is produced by a team, Mills theorized the defect rate should fall in a range that could be statistically certified. This number could be used (and was used) in warranties that rated software at a specific defect rate. It could also be used to improve weak spots in developers’ skills.
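
        To make the intended-function part concrete, here’s a minimal sketch of my own (the style applied in Python, not an example from the Cleanroom literature): every function carries a precise statement of intent, and a reviewer checks by reading that the body matches it before any testing happens.

        ```python
        # Intended-function style: each function's header states, precisely,
        # what it computes. A reviewer verifies by reading that the body
        # matches the stated intention before any testing happens.

        def trim_blank_edges(lines):
            # Intended function: [return lines with leading and trailing
            # blank lines removed; interior blank lines are preserved]
            start = 0
            while start < len(lines) and lines[start].strip() == "":
                start += 1
            end = len(lines)
            while end > start and lines[end - 1].strip() == "":
                end -= 1
            return lines[start:end]

        def normalize(lines):
            # Intended function: [return trim_blank_edges(lines) with
            # trailing whitespace stripped from each remaining line]
            return [line.rstrip() for line in trim_blank_edges(lines)]

        print(normalize(["", "  a  ", "", "b", " "]))  # ['  a', '', 'b']
        ```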

        Now for some meta stuff I learned. First, the paywalled sources on early use showed the method worked. The defect rate was super low on Mills’ teams. The other effect, unusual in QA, is that it dropped the defect rate a lot on the first try. That shows it captures intrinsic aspects of software correctness, but those should be obvious to y’all from my description. They didn’t allow developers to compile & test their own code, even though many admitted they did anyway before submitting, to avoid embarrassing failures. Although it was sort of thought unnecessary, I found one source saying the real reason for no compiles was that computer time was scarce enough in the 1980’s that compiling fewer programs let them get more actual work done. We can drop “don’t compile & test it” since we know doing that is beneficial. Although PSP/TSP shows metrics can improve programmers, I’m currently for dropping any claim of a final, statistical defect rate stated to customers unless the team keeps doing similar kinds of software. Being great at one domain doesn’t mean they’ll be great at another & vice versa. The metrics can be shown as a differentiator or used for warranties, though, in that the company promises to fix at least a certain amount on the house. When picking the rate to use, one might go with either the current median or the highest ever encountered, depending on how greedy or altruistic one is. ;)

        Also, although CASE tools were made for it, the Cleanroom method could be embedded in a high-level programming language, much like people do functional programming in C or C++ by only using specific language features in specific ways. I thought this long ago, and the link I just shared confirms it: Stavely’s students did exactly that. I also believe that combining Cleanroom with functional programming, Design-by-Contract, assertion-based testing, and random testing could lead to some of the highest productivity and quality achieved with the method. They’d be working at a higher level, with languages intended for functional composition. Right tool for the job. Note that it doesn’t have to be top-down so much as there being a connected chain of functions between high-level intent and low-level modules at the point the software is distributed.

        There’s still exploration to do on intended functions vs Design-by-Contract-style assertions. Do we use one, the other, both, intended functions in the assertions, what? (shrugs) One thing I’m sure of, at least to start with, is making people do human verification before testing, to drill into their brains the patterns of what correct and incorrect programs look like along with a systematic way of looking at them. None of that “I glanced and thought it looked right” bullshit people call code review these days. Also, prior work in high-assurance on formal specifications showed that teams found more defects while modifying the specification for use in a prover than by using the prover itself. I intuitively know there’s a tie-in here, where Stavely points out how they had to learn to simplify & properly structure their programs for easy, human verification. It’s that mental process or habit I’m trying to embed in their minds by not letting them throw their programs at testing tools before human verification. Like Agile showed, people will learn to spot the patterns of where trouble is likely to show up in a code base & know to eliminate it immediately.
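
        To show what the “intended functions in the assertions” option could look like, here’s a rough Python sketch of mine (not from the Cleanroom literature): the intention is restated as executable pre/postconditions, so the human verifier and the testing tools check the same statement.

        ```python
        def sorted_insert(xs, x):
            # Intended function: [return a new list equal to xs with x
            # inserted, such that if xs is sorted, the result is sorted]
            assert all(xs[i] <= xs[i + 1] for i in range(len(xs) - 1)), \
                "precondition: xs is sorted"
            result = xs[:]
            i = 0
            while i < len(result) and result[i] <= x:
                i += 1
            result.insert(i, x)
            # Postconditions restate the intended function executably, so
            # human verification and testing tools check the same claim.
            assert result == sorted(result), "postcondition: result is sorted"
            assert sorted(result) == sorted(xs + [x]), \
                "postcondition: result is xs plus x, reordered"
            return result

        print(sorted_insert([1, 3, 5], 4))  # [1, 3, 4, 5]
        ```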

      2. 3

        Your comment appears to support the thesis of the article. “Make it cheaper to find bugs.” Hypothesis is a Python version of Haskell’s QuickCheck, with more functionality than the original QuickCheck.
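
        For anyone who hasn’t tried it, a minimal property-based test with Hypothesis looks something like this (a generic example of mine, not one from the article):

        ```python
        from hypothesis import given, strategies as st

        def encode(s):
            return s.encode("utf-8")

        def decode(b):
            return b.decode("utf-8")

        # Property: decoding inverts encoding for *any* string. Hypothesis
        # generates hundreds of inputs (including "" and odd unicode) and
        # shrinks any failure to a minimal counterexample.
        @given(st.text())
        def test_decode_inverts_encode(s):
            assert decode(encode(s)) == s

        test_decode_inverts_encode()  # run the property directly
        ```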

        1. 2

          @nickpsecurity wrote:

          The comparison to NASA is ridiculous. That process is so slow and expensive that other high-assurance projects regularly compare themselves to it on both defect rate and cost to show how unnecessary it is.

          I’m not sure if I misunderstood your comment here, but I believe the author agrees with you. He specifically wrote:

          Also, if you look at the NASA development process you will probably conclude that we can’t do that. It’s orders of magnitude more work than we ever put into software development. It’s process heavy, laborious, and does not adapt well to changing requirements or tight deadlines.

          I’m not familiar with Cleanroom, Altran, or Praxis, but it sounds to me like the author is advocating the use of tools and techniques similar to those to make it easier (or “cheaper”) to write correct-enough software than it is with a NASA-style approach.

          He didn’t specifically acknowledge those tools, and perhaps that’s what you’re reacting to, but I didn’t read him as saying “without Hypothesis and specifically Hypothesis you need a NASA-level effort to get anything right!” He even wrote:

          And so this is the lever we get to pull to change the world: If you want better software, make or find tools that reduce the effort of finding bugs.

          Obviously I think Hypothesis is an example of this, but it’s neither the only one nor the only one you need. Better monitoring is another. Code review processes. Static analysis. Improved communication. There are many more.

          1. 2

            The author disagreed with my position before on the basis that most developers won’t do QA at all, or won’t use anything but a near-pushbutton tool. Hypothesis was his solution. So I countered that bringing them methods like Cleanroom would eliminate more defects. They can add Hypothesis to that, too.

            Also, the author is misleading people by making them think low-defect software has the typical cost or time-to-market of NASA’s. They’re such an outlier that people probably shouldn’t even mention them. He’d have been better off referring to formal methods in industry, where at least refinement proofs support his position. Yet I could still counter that if we’re talking about lightweight methods like model-checkers or formal specs, since those surveyed usually say the time spent is less than the time saved during debugging.

            So, there was more than this post in isolation I was responding to.

            1. 2

              Ah, I understand. Thanks for elaborating.

          2. 2

            You’re mostly right, but Altran / Praxis / Galois do have the advantage of being staffed by people who especially hate bugs (I’d expect). From personal experience, that alone helps a lot.

            That said, Cleanroom / Fagan / etc. have been used widely enough - and Praxis etc. claim sufficiently spectacular results - that there is definitely something there. Do you happen to be aware of studies looking at the more junior end of developers?

            1. 1

              The reply I just posted to apy in this thread has a lot of detail on Cleanroom applied to people with no experience at all. Check it first. Defect rates stayed under 10 per 1,000 lines of code if the developers were familiar with the programming language. Data on the Eiffel method and Design-by-Contract was always positive when people were doing real specs. One experiment on stronger specs (not proof, though) caught more problems. However, industry data indicated many weren’t doing anything past null checks, since management didn’t push them on it. The Abstract State Machine method was used to formalize all kinds of stuff by teams that must have had a mix of skill levels. It always succeeded in knocking out spec errors that would’ve been costly or impossible to fix once deployed (especially industry standards). NIST data indicated combinatorial testing of 3-way interactions knocked out 90+% of errors, with nothing new detected past 6-way.

              Each simple, inexpensive method got good results in its case studies. Combining the simple ones on the developer side with semi-automatic test generators & fuzzing running overnight should arguably get high quality at a reasonable cost-benefit ratio given the above data. My hypothesis has to be tested, of course, but prior data gives me a lot of confidence in it. At worst, some person-hours will have been wasted overdoing it a bit, paid for by the hours saved during debugging, if we’re talking inexperienced programmers. Those experimenting can then just dial down the methods used a bit to see what happens.
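
              To illustrate the combinatorial part, here’s a naive, greedy sketch of 3-way test construction in Python (my own toy example; real tools like NIST’s ACTS build far smaller covering arrays much faster):

              ```python
              import itertools

              def three_way_suite(domains):
                  # Greedy covering-array construction: keep adding whichever test
                  # covers the most still-uncovered 3-way parameter/value triples.
                  params = sorted(domains)
                  uncovered = {(trip, vals)
                               for trip in itertools.combinations(params, 3)
                               for vals in itertools.product(*(domains[p] for p in trip))}
                  all_tests = [dict(zip(params, vals))
                               for vals in itertools.product(*(domains[p] for p in params))]
                  def covers(test):
                      return {(trip, tuple(test[p] for p in trip))
                              for trip in itertools.combinations(params, 3)}
                  suite = []
                  while uncovered:
                      best = max(all_tests, key=lambda t: len(covers(t) & uncovered))
                      suite.append(best)
                      uncovered -= covers(best)
                  return suite

              domains = {"os": ["linux", "mac", "win"], "fs": ["ext4", "zfs"],
                         "locale": ["en", "de", "ja"], "ipv6": [True, False]}
              suite = three_way_suite(domains)
              print(f"{len(suite)} tests cover all 3-way combos vs "
                    f"{len(list(itertools.product(*domains.values())))} exhaustive")
              ```

              The payoff grows with parameter count: with 10 binary options, exhaustive testing needs 1,024 runs, while 3-way coverage needs a small fraction of that.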

              I mean, just look at how reliable things like OpenVMS and the AS/400 got just from doing something similar to the Fagan inspection process on design & implementation. I’ve never met an AS/400 user who’s witnessed the system crash, and many used them daily for 10-20 years. VMS admins sometimes forgot the reboot command, with clusters running for years (the record was 17). Those companies kept a competitive pace with the industry, with plenty of profit, despite whatever design and quality assurance they were doing; their decline happened for market reasons rather than development failures. I hope and believe we can use methods like those I describe above to get junior developers closer to that level of quality and reliability. It’s probably still best to have at least one senior dev on the team who will spot bad ideas or re-apply prior solutions.

            2. 2

              I always find your comments very interesting and insightful. Do you have any recommended readings for those outside the high-assurance field who’d like to start learning about it (and ideally start applying some of its practices in their day jobs)?

              1. 3

                Why thank you! :) I did. The problem is the info is scattered, usually specific to a subset of the field, and little mainstream uptake leads many site owners to let their links die. I periodically go dredge up whatever I can, but it’s time-consuming work. I might in the near future do that again to at least find intros to the field overall, examples from various categories of assurance methods, case studies in industry if I can find non-paywalled ones (almost all are…), whatever tools I’ve found to support it, and suggestions for improvements on any of that. I’ll just add your name to my list of people to send that to if I get around to it.

                Meanwhile, the Cleanroom reply I posted in this thread will give you a good idea of the mindset. Another, more rigorous one is the Altran/Praxis method below, which combined Z, CSP, and the SPARK language. There are intros & FOSS tools you can Google, but replace CSP with TLA+, optionally via hwayne’s tutorial. DeepSpec is a cutting-edge example of the “mathematically prove everything” style of development. A few companies cheated by encoding business rules or specs into first-order logic executed in the Prolog or Mercury languages. That prevents lots of mismatches & coding errors with no extra effort, albeit you gotta be able to work with the notation. Eiffel’s Design-by-Contract, combined with a memory-safe and/or functional language, would be another good cheat for getting some formal benefits without really doing formal methods.

                http://www.anthonyhall.org/c_by_c_secure_system.pdf

                https://deepspec.org/main

                https://dtai.cs.kuleuven.be/CHR/files/Elston_SecuritEase.pdf

                https://www.eiffel.com/values/design-by-contract/introduction/

            3. 2

              This should be common knowledge by now, but it isn’t and it’s good to see somebody else saying it.

              1. 2

                How can you post this article and not have read the book… literally called… The Economics Of Software Quality?

                So this is the problem with shipping buggy software: Bugs found by users are more expensive than bugs found before a user sees them. Bugs found by users may result in lost users, lost time and theft. These all hurt the bottom line.

                So close, but no cigar. Do not pass go, do not collect $200. Imagine how cheap it would be if you just decided to be more efficient at finding the bugs than users are! What if you decreased your defects immensely merely by a 2-hour meeting where you went over the detailed design?!

                The following paragraphs just read like apologies for modern software management theory, which has largely produced some of the least trusting, least empowered users in history.

                1. 2

                  I’m not sure what makes you think the author hasn’t read that book, or how the article is an “apology for modern software management theory”. Could you elaborate on what you mean? (Especially for those of us who are not familiar with the book.)

                  @codemac wrote:

                  What if you decreased your defects immensely merely by a 2-hour meeting where you went over the detailed design?!

                  I don’t understand what you’re getting at, but I think the author would approve of any cost-effective method of reducing defect rates, including design reviews. He even wrote:

                  And so this is the lever we get to pull to change the world: If you want better software, make or find tools that reduce the effort of finding bugs.

                  Obviously I think Hypothesis is an example of this, but it’s neither the only one nor the only one you need. Better monitoring is another. Code review processes. Static analysis. Improved communication. There are many more.

                  1. 3

                    To your first point about me thinking the author hasn’t read the book: he named his post after it AFAICT, and then made the opposite conclusion around who should be finding the bugs. The engineer should be, as early as possible. Optimizing the case where users find bugs is fine and all, but the textbook I describe goes to pretty extreme lengths (like 80%+ of the book is just case studies) to show that a single bug found by a user costs as much as 100s if not 1000s of design bugs. Regardless of how “efficiently” they found it, the cost to the organization will always be more later in the pipeline of product delivery.

                    This then continues into the concepts of modern software management theory, look at the sentence just after your quote:

                    But one thing that won’t improve your ability to find bugs is feeling bad about yourself and trying really hard to write correct software then feeling guilty when you fail. This seems to be the current standard, and it’s deeply counter-productive. You can’t fix systemic issues with individual action, and the only way to ship better software is to change the economics to make it viable to do so.

                    The claim that you can’t fix these issues because the economics are not in our favor is flatly wrong. Cleanroom gives you statistical defect tracking so companies can make process changes, much like silicon fabrication does, facilitating systemic product-quality improvements. The economics are already in our favor to make it viable, and the data is in the book I posit once again the author has not read, plus countless studies that show the opposite.
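
                    As a toy illustration of the statistical side (the numbers and scenario names are made up, not from the book), Cleanroom-style certification samples scenarios from a usage profile and estimates reliability from the observed failures, which gives a number that process changes can be measured against:

                    ```python
                    import random

                    def certify(run_scenario, usage_pool, n_runs=1000, seed=42):
                        # Toy Cleanroom-style certification: execute randomly sampled
                        # usage scenarios, then estimate per-use reliability as the
                        # fraction of successful runs (a Nelson-style estimate). Real
                        # certification models are considerably more sophisticated.
                        rng = random.Random(seed)
                        failures = sum(1 for _ in range(n_runs)
                                       if not run_scenario(rng.choice(usage_pool)))
                        return failures, 1 - failures / n_runs

                    # Hypothetical system under test with a defect on one rare path.
                    def run_scenario(s):
                        return s != "bulk-import"

                    # Usage profile: how often customers actually hit each feature.
                    pool = ["login"] * 60 + ["search"] * 30 + ["bulk-import"] * 10
                    failures, reliability = certify(run_scenario, pool)
                    print(f"{failures} failures in 1000 runs -> reliability {reliability:.1%}")
                    ```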

                    Modern management theory basically says that because it’s too time-consuming and expensive to prove anything in detail (due to the ever-changing marketplace of shitty products), and because engineers can’t go from mathematical proof to executing code without hand-writing it, it’s all worthless. Neither of these is true, even though they’re regularly repeated.

                    The best is the question from Lamport: why do architects use blueprints if they can’t generate buildings from blueprints?

                    1. 3

                      “Optimizing the case where users find bugs is fine and all, but the textbook I describe goes to pretty extreme lengths (like 80%+ of the book is just case studies) to show that a single bug found by a user costs as much as 100s if not 1000s of design bugs.”

                      I’ll add that this has been confirmed by about every survey, lessons-learned report, industry study, and academic study I’ve ever seen. The cost/time can vary wildly with the methods used for correctness or verification itself. However, the fix is always cheaper or easier early in the lifecycle. That’s because flexibility is higher, the fix is applied by people who know the context/intent of what they’re fixing, the baggage of the rest of the implementation hasn’t built up around it yet, and there’s no risk of disrupting production users. There may be more angles, but those are some of the more critical reasons off the top of my head.

                      1. 3

                        To your first point about me thinking the author hasn’t read the book: he named his post after it AFAICT, and then made the opposite conclusion around who should be finding the bugs. The engineer should be, as early as possible. … a single bug found by a user costs as much as 100s if not 1000s of design bugs.

                        I don’t mean to belabor the point here, but I don’t understand how you concluded from the article that the author believes users should be finding bugs. While he said it might be nice in theory to have users find bugs, he ultimately called it “rubbish” from a business point of view precisely because it’s so expensive to have users find bugs.

                        He believes, as you do, that engineers should find bugs, and his point is that the cheaper we make it for engineers to find bugs – whether with better testing tools, design reviews, or something else – the more likely engineers are to write correct, high-quality software.

                        His primary contribution to this effort, which he mentions at the end of the article, is Hypothesis. It’s a property-based testing library that engineers use to thoroughly test their code and, to some extent, even their design. It’s not the be-all and end-all solution to software quality, by the author’s own admission, but rather a tool to be used among others. And it’s not something any user is going to be using! It’s a developer tool.

                        Anyway, I guess I should stop defending someone else’s writing because I don’t really have a dog in this fight. I just didn’t understand some of the objections raised in the comments here and wanted to clear things up.

                        1. 2

                          “Anyway, I guess I should stop defending someone else’s writing because I don’t really have a dog in this fight. I just didn’t understand some of the objections raised in the comments here and wanted to clear things up.”

                          I think we did as far as the two of us go in this conversation. You brought something up, we all discussed/debated it, people learned things (I found a whole book! :), and now we’re apparently done. All good to me.

                      2. 3

                        Hey, look what I found:

                        http://romisatriawahono.net/lecture/sad/reference/Jones%20-%20The%20Economics%20of%20Software%20Quality%20-%202012.pdf

                        Enjoy. :) I might read it, too, as I never got to in the past. I just read summary points that jibed with what I saw in my research.

                        1. 2

                          Verrrry dry reading, FYI. But after you spend an hour or so poking around in it, you get a feel for how to use it as reference.

                          1. 2

                            Appreciate the warning. Thousands of CompSci papers got me used to skimming around to what I need. Hopefully it helps. :)