1. 4

    Yes, generative/property/fuzz tests are awesome. I use them in Elm whenever I can. I’ve also written a library that uses them for the `Msg`/`Model`/`update` triplets of The Elm Architecture: http://package.elm-lang.org/packages/Janiczek/elm-architecture-test/latest

    I used them, e.g., on my implementation of a text editor, and they found more edge cases than you’d believe. https://github.com/Janiczek/elm-editor/blob/master/tests/CodeEditor.elm

    They effectively allow you to pin down the behaviour of a function to a certain degree; giving you counter-examples like “You said the cursor wouldn’t go to a negative column, but if you do [Insert ‘a’, Left, Backspace], the cursor is -1!”, with the Msg list minimized and all that (it probably found the bug with ~100 Msgs in the list, and threw out those that don’t matter)

    EDIT: What I would love in generative testing is some scheme that allows it to be a monad (to have an `andThen : (a -> Fuzzer b) -> Fuzzer a -> Fuzzer b` function) and still be able to shrink well.
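
    For illustration, here is the shape of that `andThen` as a toy Python generator (names are hypothetical; this deliberately omits shrinking, which is exactly the hard part):

```python
import random

class Fuzzer:
    """Toy value generator illustrating monadic bind (andThen).

    A real library (Hypothesis exposes this as .flatmap) also has to
    make *shrinking* of the dependent value work, which this toy skips.
    """

    def __init__(self, gen):
        self.gen = gen  # gen: random.Random -> value

    def run(self, rng):
        return self.gen(rng)

    def and_then(self, f):
        # bind: draw a value, feed it to f, run the fuzzer f returns
        return Fuzzer(lambda rng: f(self.gen(rng)).run(rng))

def int_range(lo, hi):
    return Fuzzer(lambda rng: rng.randint(lo, hi))

# A list whose length is itself generated -- the kind of dependency
# that plain applicative-style fuzzers can't express.
sized_list = int_range(0, 5).and_then(
    lambda n: Fuzzer(lambda rng: [rng.randint(0, 9) for _ in range(n)])
)

sample = sized_list.run(random.Random(0))
```

    Hypothesis’s answer, as I understand it, is to record the underlying random choices so that dependent draws can still be shrunk as a single sequence.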

    1. 3

      EDIT: What I would love in generative testing is some scheme that allows it to be a monad (to have an `andThen : (a -> Fuzzer b) -> Fuzzer a -> Fuzzer b` function) and still be able to shrink well.

      FWIW this is literally the main innovation in Hypothesis. :-)

    1. 7

      Advance warning: Uh, this got long. Sorry. I’ll probably turn it into a blog post.

      I would say I’m pretty indifferent to coding per se, but that what I love is things that combine intellectual challenges with practical problem solving, and coding is often a good source of that.

      Thing is, coding at work usually isn’t. Most tech jobs are kinda boring, because most of the problems people are paid to solve are variations on a theme, and while you can make them intellectually engaging, this often results in a worse outcome than if you’d just ploughed through and done the work (there’s a reason we have a notion of an “overly clever solution”).

      As far as whether programmers who enjoy coding are intrinsically better at it… no, of course they’re not. They are however often extrinsically better at it: People who enjoy a thing do more of it, and get more practice at it as a result, and so get better at it.

      But it’s important to bear both of the above points in mind: People get better at the thing they practice, but the sort of coding that people practice on their own time is not the sort of coding that you typically need for your job. Some of the skills are absolutely transferable, but some of them are not, and often what you end up with is developers who are over-optimised in some skills (often far beyond what they need for what they’re actually doing) and weak in other equally vital ones. Often those are “soft” skills, but they don’t have to be - I think ops, testing, debugging, and architecture are all pretty common examples of under-practiced skills for developers.

      Answering question from down thread…

      That’s awesome that your hobby and profession aligns 🙂

      I think this is an overrated goal, personally. I have a lot of hobbies that I have briefly considered turning into professions and immediately gone “noooope” over. The nice thing about having your hobby and your profession separate is that emotional baggage carries over from one to the other less - if you turn your hobby into a job, it’s very likely that the things that inevitably make you mad about your job will also ruin the hobby for you.

      It kinda works out OK in programming because the hobby version is so different from the professional version, but the end result of that is often feeling like the professional version is a bad imitation of the hobby version. I’ve seen a lot of burnout happening that way (I think. I may be projecting RE reasons).

      Do you have any advice for people who currently don’t enjoy doing what they do for a profession and how to get there?

      (I do not want to personally offer career advice because my career is in a very weird place, so this is a bit generic)

      My advice to everyone in all circumstances is to over-analyse your problem to death in writing (I’m currently using a hand-written journal for this). Even if you don’t come up with a solution you’ll probably have a better understanding of the shape of the problem and feel better for it.

      Generally I ask things like:

      1. What do I actually want as opposed to just what I think I should want?
      2. What aspects of this problem that I might reasonably be expected to care about do I actually not care about much at all?
      3. What would my ideal scenario look like?
      4. What smallish changes can I do that would improve things right now?

      Specific things that might be worth asking:

      1. Do you actually care about liking your job or do you just want to not dislike it?
      2. What are the things you actually want to get out of a job?
      3. If your concern is being bad at your job, what are some aspects of how you do it that you could improve without practicing it in your spare time?
      4. What are some things that you do enjoy doing, and could you do more of those? Maybe focus separately on things that you could do in and out of work.
      1. 4

        I think @drmaciver uses prop testing in a much different way than I do. I don’t do a lot of prop testing, so it’s quite likely that I am the oddball here, but statements like:

        Often you want to generate something much more specific than any value of a given type.

        And

        Especially if we’re only using these in one test, we’re not actually interested in these types at all, and it just adds a whole bunch of syntactic noise when you could just pass the data generators directly.

        What made me find these odd is that I am generally generating the types my API expects, not special types just for testing. If I need a special type just for testing, it tells me that my API hasn’t defined its types strictly enough (Correct by Construction). This post reads as if maybe the author uses these testing types more like contracts?

        1. 5

          One of the examples I like is related to parsers of any kind. Let’s pick one of the simplest cases: a word counter that looks for space characters to delimit words and counts how many words it has seen. Unicode has 1,114,112 code points; a single one of them is the ASCII space, and 17 fall into the ‘space separator’ family.

          Generating a random string with a uniform distribution that contains more than 2 spaces is tricky. Generating one that contains two of them in a sequence is rarer still. Generating one that mixes either with combining marks that could change the contextual interpretation is yet another challenge.

          What you may want to control is the distribution: 20% of characters are going to be space or separator-related, 70% will be anything at all, and 10% will be combining marks.

          If all you have is a type string, you can hardly optimize for some character sets: in CSV, you’d want more commas, linebreaks, and quotation marks than in the example above; if you’re dealing with HTML, you may want to throw in more brackets and HTML entities. Similarly, if you’re dealing with years, 1970 and 2000 may sometimes prove to be more interesting as central points than 0 (which is not representable in the Gregorian calendar anyway).
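
          As a sketch, a directed generator with the 20/70/10 split described above might look like this (hypothetical helper; the character classes are rough approximations, not full Unicode category data):

```python
import random

# Rough character classes; a real generator would use Unicode data.
SEPARATORS = [" ", "\t", "\u00A0"] + [chr(cp) for cp in range(0x2000, 0x200B)]
COMBINING = [chr(cp) for cp in range(0x0300, 0x0370)]  # combining diacritics

def directed_text(rng, length):
    """20% separators, 70% arbitrary code points, 10% combining marks."""
    out = []
    for _ in range(length):
        r = rng.random()
        if r < 0.20:
            out.append(rng.choice(SEPARATORS))
        elif r < 0.90:
            # any printable-ish code point below the surrogate range
            out.append(chr(rng.randrange(0x20, 0xD800)))
        else:
            out.append(rng.choice(COMBINING))
    return "".join(out)

s = directed_text(random.Random(42), 100)
```

          The point is just that the distribution is a tunable knob of the generator, independent of the string type itself.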

          At that point, the precision of your property-tests, or their ability to truly exercise your code, depends on getting generators that are likely fancier and better-directed than what your type system lets you express. Either you need to beef up the type system, or you need to be stuck with less control than what could be ideal.

          1. 0

            Let’s pick one of the simplest cases with a word-counter, that looks for all space characters to delimit words and count how many words have been seen.

            The definition of this problem is already problematic, and not at all simple, though. You can’t assume that out of 1,114,112 code points “space like” characters are the only word separators. “犬と猫” (Japanese), according to Google translate, means “the dog and the cat.”

            (This makes your point stronger, but was also a troll).

          2. 4

            I think @drmaciver uses prop testing in a much different way than I do.

            Possibly, but observationally I think my way is pretty common among anyone who uses property-based testing libraries that are less tied to types. Certainly it’s very common in Python land, but there’s a bit of a biasing factor there, in that I wrote the main Python property-based testing library, so they may just be copying me. :-)

            What made me find these odd is that I am generally generating the types my API expects, not special types just for testing.

            Note that I am not saying that you shouldn’t test the whole range of types your API accepts. I’m saying that not every test of your API needs to do that and it is worth testing more specific domains. @ferd’s parser example is a good one, but in general there will be properties that are only interesting in a subset of your domain (e.g. because they test a behaviour that handles a special case).

            Take the mean example. It restricts the domain in large part because the property it’s testing is not valid outside of that restricted domain. You still need to test what happens with NaN, or empty lists, but what happens there is probably just an error condition that you need to test separately.
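
            A minimal Hypothesis-style sketch of that restricted mean test (bounded integers and non-empty lists, so the property actually holds; NaN, infinities and the empty list would be tested separately as error conditions):

```python
from hypothesis import given, strategies as st

# Restrict the domain to where the property is valid: non-empty
# lists of bounded integers, so the mean must lie within the bounds.
@given(st.lists(st.integers(min_value=-10**6, max_value=10**6), min_size=1))
def test_mean_is_within_bounds(xs):
    m = sum(xs) / len(xs)
    assert min(xs) <= m <= max(xs)
```

            (Bounded integers rather than floats here on purpose: with floats, accumulated rounding in `sum` can push the computed mean just past the maximum, which is a separate, interesting property of its own.)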

            1. 2

              One example:

              def fact(n: int) -> int:
                  if n < 0:
                      raise ValueError("n must be non-negative")
                  elif n == 0:
                      return 1
                  else:
                      return n * fact(n - 1)
              

              You’re properly handling the negative number case, but if one of your property tests is f(n) = n!, you don’t want that specific test “failing” because Hypothesis tried n = -1.
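
              Spelled out with Hypothesis (repeating `fact` so the sketch is self-contained), the property restricts its generator to the valid domain, and the error path gets its own test:

```python
from hypothesis import given, strategies as st

# `fact` repeated from above so this sketch is self-contained.
def fact(n: int) -> int:
    if n < 0:
        raise ValueError
    elif n == 0:
        return 1
    else:
        return n * fact(n - 1)

# The recurrence only holds for n >= 0, so generate only that domain.
@given(st.integers(min_value=0, max_value=50))
def test_fact_recurrence(n):
    assert fact(n + 1) == (n + 1) * fact(n)

# The n < 0 behaviour is an error condition with its own test.
def test_fact_rejects_negatives():
    try:
        fact(-1)
    except ValueError:
        return
    raise AssertionError("expected ValueError")
```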

            1. 3

              I’m interested in this part:

              this project (OPIC including the hash table implementation) is approved by google Invention Assignment Review Committee as my personal project. The work is done only in my spare time with my own machine and does not use and/or reference any of the google internal resources.

              Does Google not allow you to spend your spare time making something unless it’s approved by a committee?

              1. 5

                Does Google not allow you to spend your spare time making something unless it’s approved by a committee?

                Basically, yes.

                (More accurately, you can spend your spare time making whatever you like as long as you don’t release it to the world and do give it to Google. The IARC is “only” required if you for some silly reason don’t want to do that)

                1. 1

                  This is how it’s worked at my two latest employers as well. The reasoning, as it was described to me, had several parts:

                  • Protecting the company from liability. If you release a project that violates some legal boundary and someone realizes that you work for Big Corp, it’s very hard to prove you didn’t use some of Big Corp’s resources in developing that project, so Big Corp becomes a legal target.
                  • Protecting you. If you develop something that is quite popular/makes you money, the company can claim that you used company resources to develop it and thus own some of it. Of course, they would be a giant dick to do that, but laws aren’t about people being dicks or not.
                  • Making sure you aren’t possibly using domain knowledge you’re being paid to use at the company to further your private benefit.

                  I think one problem with Google is that they are so large it’s quite hard to work on a side project that does not use some of your Google domain knowledge, so their committee is much more elaborate in dealing with it. Where I have worked, you make an amendment to your contract describing the personal project and the company waives the IP.

                  1. 4

                    “Protecting you. If you develop something that is quite popular/makes you money, the company can claim that you used company resources to develop it and thus own some of it.”

                    That’s some funny sophistry on the part of who described it to you. It should be read as follows:

                    “ROI for company. The company wants to make as much money as possible off your time and brains. Anything you release that might make you money down the line is something that might make the company money down the line. It’s better for the company if they own everything you do during your employment with any immediate or future value.”

                    Also why I’d never take a position for Microsoft, IBM, Oracle, etc if I was doing R&D. Universities would at least give me a chance of getting my work out there for people to use.

                  2. 1

                    I think most places put this kind of clause in any contract you sign.

                    I suspect those who think it is unreasonable are probably unwittingly under such terms right now, whilst those who do notice simply get it struck out of the contract.

                    I have mine amended to state if using any resources or time of the other party, they own it otherwise it’s mine.

                    Most are okay with this; it is, after all, just part of the negotiations[1]. Those that are not are probably best steered clear of anyway.

                    [1] remember kids, don’t be silly and think only to negotiate on money ;-)

                    1. 1

                      I think most places put this kind of clause in any contract you sign.

                      Most places in the US maybe. It’s less common in Europe, and depending exactly where you are in Europe it may not even be enforceable under local employment law (which doesn’t stop companies putting it in there anyway).

                  1. 4

                    This post couldn’t have been timed better. I’m exploring formal methods for proving certain invariants about a system I’m designing, and I stumbled across Lamport’s TLA+ tools.

                    I spent half a day trying to get TLA+ running and failed. As far as I can tell, TLA+ was an OCaml program that relied on a bunch of random SAT solvers from various universities (which were very difficult to run). It seems like maybe it’s been rewritten in Java now?

                    If someone who uses TLA+ would make a Dockerfile that would be much appreciated.

                    The Python version looks quite interesting. Is someone familiar with the problem space able to chime in on the features of Hypothesis vs TLA+?

                    1. 11

                      The Python version looks quite interesting. Is someone familiar with the problem space able to chime in on the features of Hypothesis vs TLA+?

                      Hi, Hypothesis author here (though I’m only passingly familiar with TLA+).

                      The big difference is that Hypothesis is not a formal methods system, it’s a way of doing normal testing augmented by an external source of data and decision making. This is both its advantage and its disadvantage over TLA+.

                      It’s an advantage because it’s flexible and easy to use (the bit shown in the article is actually one of the least user-friendly bits of Hypothesis IMO) - there are a lot of things that you can readily test with Hypothesis that it would probably be a real struggle to model in TLA+ - anything you could write tests in Python for, you can test using Hypothesis with almost no extra work.

                      It’s a disadvantage because this means that there are bugs that TLA+ will catch that there’s just no way that Hypothesis will ever find. Hypothesis is a well tuned and somewhat smart random tester, but there exist “deep” bugs that no random tester will ever find that formal methods can just spot immediately because they understand the system thoroughly.

                      Fortunately it turns out that a lot of interesting bugs are surprisingly shallow, and a lot of software has such bugs, so in practice I think you get a much better effort : reward ratio by starting with something like Hypothesis, but once you’ve done that if you still wanted to try TLA+ it would probably flush out some interesting bugs that Hypothesis had missed.

                      The other big difference is that TLA+ is explicitly about modelling concurrent systems and I haven’t done much with concurrency in Hypothesis, mostly because Python.

                      1. 2

                        The other other big difference is that you can do useful stuff with Hypothesis after like an hour while you have to invest a couple of days into learning TLA+ to get any benefit. I love TLA+ as much as the next guy but I’m not gonna pretend it’s user-friendly.

                      2. 10

                        The best way to start with TLA+ is probably by downloading the TLA+ Toolbox, which is an all-in-one IDE for modelling and testing. The model checker (TLC) does an exhaustive search, while the proof system (TLAPS) uses SAT solvers on the backend. For the most part you can get pretty far with just TLC, and I’ve not yet had a need to write full proofs.

                        As for TLA+ vs Hypothesis, I want to caveat that I’m much more familiar with TLA+ than Hypothesis, so I can’t speak to it as accurately. From my understanding, though, they’re for different things. Hypothesis is designed for testing actual code: it goes right in your test suite and can partially replace your unit tests. That’s what makes it so fantastic: there’s almost no overhead to start using it and it provides much more rigorous testing than simple unit tests do.

                        TLA+, on the other hand, is more for testing designs: you have to translate it to code, and that leaves room for errors. But since it’s declarative, it can specify complex properties in very simple ways. And since it’s abstract, it natively handles liveness and concurrency problems a lot better than a concrete tool, like Hypothesis, can.

                        So I think they’re complementary: Hypothesis works great for testing your code, while TLA+ works great for making sure the algorithm itself isn’t broken.

                      1. 3

                        Was going to write a comment asking @DRMacIver if he intended to combine coverage-driven analysis with the property-based testing of Hypothesis and then scrolled down to see the “glass box testing” section.

                        I think that unguided fuzzing will inherently require more time than property-based testing. Have you considered having a semi-guided mode? If you can find a roughly robust way to partition inputs by the program flow that they cause AND infer additional input constraints from those partitions (both of these aren’t easy, I understand) a semi-guided tool sounds great– “hmm, spend the next 5 minutes just doing inputs that produce traces at least this long”.

                        Is there prior art in partitioning execution traces by similarity or inferring properties from input space partitions?

                        1. 2

                          Could you elaborate on what exactly you mean by guided?

                          1. 1

                            “Guidance” as in human eyes looking at some representation of intermediate fuzzing results and human input specifying to the fuzzer things like, “spend your time only on inputs (that produce traces) like these” or “omit inputs like these; the crash is (upon my insightful human inspection) trivial”.

                        1. 2

                          Depends on your goals. If you want high impact, it looks like it could have it. Reading it made me think it could benefit from IDE assistance, where certain analyses happen to the code every time it’s introduced or changed. Then those guided test cases that took hours might take seconds to minutes… at least a subset of them. Might also be a fun project to try to formally verify in the K framework or Coq. We need more verified tooling.

                          I’d also like to see more testing and analysis tools applied to CakeML to show it maintains its invariants in every path of execution. That would increase the odds it could be used in DO-178B certified tools without the source-to-object code inspections like those required for SCADE (done in OCaml). Just a pile of equivalence tests on each transform instead, which can be batched. It would also be good for reference implementations of code used to detect compiler errors in mainstream compilers.

                          1. 9

                            I’d also like to see more testing and analysis tools applied to CakeML to show it maintains its invariants in every path of execution. That would increase the odds it could be used in DO-178B certified tools without the source-to-object code inspections like those required for SCADE (done in OCaml).

                            I don’t think those kinds of improvements really capture what makes Hypothesis so magical. Hypothesis is accessible. You can slide it into any Python project and using it doesn’t require much learning. The implementation is, of course, amazingly complex, but on the surface none of that appears. Just put a @given on your test and it’s fundamentally changed, giving you something unit tests don’t - can’t - give you. It’s hard to overstate how incredible this is.
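
                            For anyone who hasn’t seen it, the whole pitch fits in a few lines; the `encode`/`decode` pair here is just a stand-in for real code under test:

```python
from hypothesis import given, strategies as st

# Stand-ins for real code under test.
def encode(s: str) -> bytes:
    return s.encode("utf-8")

def decode(b: bytes) -> str:
    return b.decode("utf-8")

# A unit test checks one input; @given searches the whole input
# space and shrinks any failure to a minimal counterexample.
@given(st.text())
def test_roundtrip(s):
    assert decode(encode(s)) == s
```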

                            In startupland, everyone thinks in terms of unit tests. Even the common ‘alternatives’, like integration tests and feature specs, are really larger-scale unit tests: each is just a single point in the state space, even if the size of the subdimension is larger. Anything more sophisticated than single-state tests has a steep learning curve. It took me two months to finally feel comfortable with even simple formal specifications. And convincing someone that type systems are good is a running gag at this point. “A monad is just a monoid in the category of endofunctors, what’s the problem?”

                            The thing with steep learning curves is it’s hard to convince people that it’s worth learning, or even that there’s something there to be learned at all. When your options are “unit tests” and “learning category theory”, the default becomes believing that single-state tests are the only useful test, maybe even the only practical test.

                            Then you spin up Hypothesis and property tests are possible, then trivial. And that opens up an entire world of testing, because you know that single-state tests aren’t everything and that there are really amazing discoveries just waiting out there.

                            I’m a huge advocate of formal methods, to the point of writing an entire beginner’s guide to TLA+. The only reason I started, though, was because I learned Hypothesis. Without that formal methods would have just been another weird academia thing.

                            1. 6

                              I just wanted to say that getting comments like this about Hypothesis is delightful, thank you. :-)

                              1. 2

                                “I don’t think those kinds of improvements really capture what makes Hypothesis so magical.”

                                Of course not. That’s what your comment did pretty well. Mine was about extra ways to target or improve it that were interesting for researchers. That’s what OP asked about in the text of the submission. If aiming for max deployment, I’d say port the methodology to every mainstream language you can with prebuilt integrations for all kinds of build systems. Also apply it on common projects. Write guides for new people.

                                That’s all just boring as hell to most CompSci people wanting to extend cutting-edge projects. Integrating this with a verified stack to produce full coverage or spec testing that might be applied in domains like aerospace… less boring. ;)

                                1. 6

                                  That’s all just boring as hell to most CompSci people wanting to extend cutting-edge projects.

                                  So there’s actually at least one major reason why the widespread deployment of Hypothesis isn’t boring from a research point of view: It’s a great source of data for whether things done to improve its implementation actually find more bugs in practice. If and when I want to test an exciting new algorithm I can just fire up all the open source code that’s tested with Hypothesis and run it and see if it finds interesting new bugs.

                                  1. 1

                                    There’s another for the list. Yet another, building on that, would be tech that seeds bugs of specific types into codebases, like the tools moyix on Hacker News builds. The algorithm introduced into Hypothesis might catch certain types of bugs, or might only do so in certain contexts. Such limitations aren’t always obvious from one or two runs. Seeding more of that same kind of bug in many, random places might show it. This continues on with the bug-finding and bug-seeding tools running opposite each other, testing a variety of algorithms on large codebases. A hostile co-evolution, much like Danny Hillis used for sorting algorithms on Thinking Machines' hardware.

                            1. 4

                              I’m torn. I think what you and others are saying makes a lot of sense, but I can’t help feeling that a little more bondage and discipline (which comes with multiple repos and dependency management) helps people not be sloppy about APIs between components. You say “don’t do that” about making everything a nebulous megaproject, but I think the risk of that is huge.

                              You can have dependency management and partial builds with binary deliverables even between folders in a monolith repo, but I think the megarepo encourages megabuildsystem and megaproject tendencies. Maybe hardline managers and a system like Bazel can make it work better.

                              1. 6

                                Anecdotally, every time I’ve seen the bondage and discipline approach it has produced much worse results, and migrating from it to a monorepo has almost always improved matters in ways that would have been very hard to achieve while keeping things in separate repos.

                                I mean you certainly can have clean code with multiple repos. Many people do. But I don’t feel it actively encourages it. Both multiple and mono repos can have bad incentives (though I personally haven’t seen it get too bad with monorepos, I totally believe it can), but the difference is that the monorepo makes it easier rather than harder to fix the problems, by making refactoring in the direction of better modularity easier.

                                1. 1

                                  Anecdotally, every time I’ve seen the bondage and discipline approach it has produced much worse results…

                                  Seconded. Yes, it’s possible, but it turns out to be very difficult. It requires work that tools generally don’t make frictionless, so it doesn’t get done. (And I’m not convinced yet that making new tools to address this will actually fix it.)

                                2. 4

                                  A megabuildsystem sounds a lot like build consistency across projects to me. I’ve never seen a multi-repo system where the build systems were consistent; there’s always some oddball thing you have to do differently for every repo.

                                  Even worse, during cross-team collaboration it becomes a total nightmare to synchronize everyone’s pet repos as you work together on a change. Two teams is manageable, but when three, four, five teams have to collaborate across 10+ repos you might as well just scrub the feature and tell the customer it’s impossible.

                                  • When is it okay to pull from your repo?
                                  • Which branch is stable right now?
                                  • Where is your repo, I don’t see it in central?

                                  I have never been in a multi repo environment and not been forced to ask each of those questions at least once a month.

                                1. 16

                                  I think almost everyone will benefit from organising their code this way.

                                  No. Sweeping generalizations like this don’t help anyone. It all depends on your development process, and which VCS you choose to support this development process, and what this VCS was designed to handle.

                                  For example, at the ASF we have a big monorepo (svn.apache.org) which works because it’s an SVN repository so each project only needs one top-level directory, beneath which the project models its development process using SVN-style branches and tags (i.e. copies).

                                  And there are ASF projects which use git, and these live in distinct repositories (git.apache.org, which also mirrors projects from SVN). Because of how branches are modeled in git that’s the only reasonable option.

                                  1. 9

                                    No. Sweeping generalizations like this don’t help anyone.

                                    “Sweeping generalisation” is an interesting term for the conclusion to an argument that starts from widely applicable problems to reach a generally applicable solution.

                                    And by interesting I mean “wrong”.

                                    There may be valid reasons someone can’t do it, or that it would be too costly to do it, but I’m pretty confident in saying that almost everyone who can do it would benefit from it.

                                    Also Apache are pretty out of the scope of the scenario I was talking about, as might have been hinted at by the title including the phrase “All your company’s projects”

                                  1. 4

                                    Everything on this list seems reasonable, but I see teams doing many of these things and still having problems. You can say “oh, they aren’t doing it right”, which is probably true.

                                    For myself, I think the biggest tool I have for writing high quality software that I don’t see other people use is that I have like 5 or 6 things, and I just use those in different combinations. I really really really hate adding a new technique to my repertoire. I separate stuff with processes a lot. I use state machines + simple interpreters like crazy. I care a lot about pinning dependencies. I also spend a lot of time describing the semantics of my APIs (and I’m in a position where I can demand that of my team). And I do a few other things.

                                    I don’t really care that much about testing; I do it generally but it’s not a big priority for me. I don’t know what my secret sauce is, but I consistently see my teams ship things with fewer and less severe bugs. Not all, but many of the bugs we hit take about 5 minutes to diagnose and maybe an hour or two more to get deployed through to production (assuming it’s high enough priority to execute on now).

                                    1. 6

                                      Everything on this list seems reasonable, but I see teams doing many of these things and still having problems. You can say “oh, they aren’t doing it right”, which is probably true.

                                      Gosh, no. That’s not a thing I’m ever likely to say, and if I do I hope someone calls me on it. I hate it when people use that as an explanation.

                                      It’s not intended as either a necessary or sufficient list of things to do to write good software. It’s literally what it says on the tin: A list of things that might help.

                                      You can certainly ship great software without most of the things on this list, and very few of them are things I’d consider mandatory (only two really). I wouldn’t be very surprised by someone doing all these things and still shipping bad software, though I’d be really interested to talk to the team and see if I could figure out why. Instead they’re things where I feel like if you’re not doing them, adding them is likely to be an improvement.

                                      All of the things you list sound sensible (except not caring about testing ;-) ).

                                      1. 3

                                        very few of them are things I’d consider mandatory (only two really)

                                        I wonder what your two are? I’d go with attitude and manual testing. If you have infinite money and patience, you can just go slow and manually wall-bang the hell out of everything you try to ship; all of the other factors merely reduce the cost of shipping not-garbage down to realistic numbers rather than “the yearly GDP of an entire neighbourhood per LoC”.

                                        1. 2

                                          I wonder what your two are? I’d go with attitude and manual testing.

                                          Hmm. Maybe more than two then. :-)

                                          I was going with “have CI” and “use version control”. Manual testing is probably a third though (and I guess “regression testing” is pretty close to mandatory in my book too).

                                          Attitude should be mandatory, but it’s not worth making things mandatory that you can’t actually enforce. Changing people’s attitudes is a hard problem.

                                        2. 1

                                          “You can certainly ship great software without most of the things on this list, and very few of them are things I’d consider mandatory (only two really)”

                                          From Apollo to Cleanroom, the minimal things that got lowest-defect software were (a) attitude, (b) code review, (c) design/code done for easy verification, and (d) thorough testing w/ focus on usage-based. CI wasn’t even invented yet with Cleanroom developers not even allowed to compile their code. Even by Orange Book, the build system caught basically nothing in high-assurance systems since the key errors were knocked out with above plus formal specs. So, I’d say above is mandatory minimum. It worked for everything from security kernels to huge monoliths like OpenVMS.

                                          Then the other stuff on your list starts coming in. It’s all valuable in that it would potentially catch problems. CI itself is so easy to do today one might as well do it since one needs a build system anyway. The thing you didn’t really put in your article was (c). You came close with the library part but not enough. The safety-critical, security-critical, and hard-real-time systems all use simple, easy-to-analyze, easy-to-compose structuring and language subsets so very little will surprise them. This aids both static analysis and whole-program optimization as well. The program is just easier to analyze.

                                          So, worth putting something like that in there.

                                      1. 3

                                        Oh, that’s what’s going on with those things. I noticed it when loading some pages but wasn’t sure why it was happening or how I could make it go away.

                                        1. 76

                                          Interesting article. Yeah, some advocates are going off the rails with their praise of static typing. I find it helps to look to prior successes and tradeoffs to try to understand how to best do engineering. The people who invented the engineering of software were independently Bob Barton of Burroughs B5000, Dijkstra w/ THE OS, and Margaret Hamilton in Apollo & 001 Tool Suite. Each had pieces of the problem with some culminating with later work into the TCSEC and Orange Book’s B3/A1 classes of high-assurance security. Big companies like IBM came up with formal inspections and Cleanroom’s cost-effective approach. Cleanroom consistently resulted in low-defect software in commercial applications even on first use. Became my baseline. Meanwhile, safety-critical embedded community continued using all kinds of variations of them with lessons learned. CompSci continued working on formal methods, type systems, testing strategies, and runtimes trying to figure stuff out with numerous successes. Private companies differentiating on high-quality systems used a combination of formal methods (esp Altran/Praxis or Galois), safer languages (all), and DSL’s (iMatix and Galois). So, we have plenty to draw on in the lessons learned reports.

                                          Let’s start with what high-assurance projects showed would help by many real-world demonstrators:

                                          1. Ambiguous, English requirements created problems. The requirements, functional or otherwise, must be specified in a precise, unambiguous form. They used formal specifications for that combined with English descriptions so people knew what they were looking at.

                                          2. Design should be modular with easy to follow control flow plus careful attention to correctness of interfaces and any assumptions modules have about interfacing. All of them did this. B5000 did it with argument and type checks at both compiler level and CPU level during runtime. Hamilton noted interface errors were 80+% of problems on Apollo development. Her USL and 001 Tool Suite used formal specs of functions and interfacing to generate code free of interface errors so long as specs were correct. Would knock out 80+% of flaws potentially. This showed up in larger, commercial use with Eiffel Method/Language and SPARK Ada later combined it with static analysis.

                                          3. Highest-assurance had formal proof that the design matched the requirements or security policy. Medium-assurance did this informally where each part mapped to other parts with clear, convincing arguments they corresponded. Almost every project that applied formal proofs, whether it helped a lot or barely, noted that simply redoing specs and code simple enough for provers to use itself caught errors without doing the proving process. Lasting lesson there.

                                          4. Since Bob Barton, it was known that there would be fewer problems if the language, CPU, and OS made common activities safe to perform by default. His B5000 did that with it being immune to most forms of code injection today. MULTICS’s reverse stack and non-null-terminated strings with PL/I made it immune to some problems. Wirth’s languages checked bounds, argument types, etc. Hansen had one free of race-conditions. Eiffel’s SCOOP took that further with Rust doing it again. Various ML’s, Haskell, and IDRIS took it even further in various ways. The focus was designing something so that average use could make sure entire problems would never happen. More typing led to more time so tools often did prototyping in unsafe mode with quick tests to make sure it worked initially. Full type-checks and thorough tests could run in background, on clusters, or overnight to get more thorough results.

                                          5. Review of all that. Burroughs did it implicitly with Barton seeming to invent the concept in an off-hand comment or two. Apollo team did it religiously. Commercial sector got the message when Fagan introduced Software Inspection Process at IBM that dedicated percentage of time to finding bugs based on checklists of kind that keep cropping up. Worked wonders. Later, OpenVMS team alternated one week building, one week reviewing/fixing based on priority, and all tests ran during weekend. One of most reliable OS’s ever built. Microsoft did a 180 in security when high-assurance engineer, Steve Lipner, gave them a watered-down, ship-quick version of this list under SDL. The reviews & QA tests knocked 0-days down to lower than most FOSS that I can tell. So, review, review, review!

                                          6. Testing. Came in many forms with many benefits. Acceptance tests were about customers as much as proving the tech works. Unit tests checked every module and improved quality but test-building and running vs other QA trade-off is still debated. Cleanroom decided bugs users see are worse than those they don’t. So they did usage-driven testing based on models of all the ways the users might use the interfaces to the product. Combining it with formal specs or Design-by-Contract allowed test generation based on specs of actual behavior. You don’t manage those tests at all but still get the benefit. Fuzz testing caught errors manual tests often missed so they’re good idea. So far, automated testing from formal specs and/or types combined with manual tests for what those can’t easily express seems like strongest combo. Run fuzz tests, QuickCheck, etc over night with prioritizing fixes based on importance or severity.

                                          7. It was known since early days that compilers were complicated beasts that could trash your software’s correctness. Wirth went with simplified ones that did quick iterations with OK performance. Ada compilers tried to do everything under the sun for high safety & performance with high build times and costs resulting. LISP’s had ideal method of REPL for instant results, incremental checks/compilation of individual functions for quick iteration of performing code, and full compilation with static types for best results which could be batched. It also showed the complex functionality could be macro’d onto simpler language that’s easy to compile. 4GL’s did that for BASIC as well with tremendous productivity boosts. Certifying compilers showed up that converted HLL to assembly with evidence they’d always be correct. My recent proposal is to add macros, Design-by-Contract, Rust’s temporal safety, and a few other things to a language like Modula-3 that’s already ultra-fast to compile and turn into production code. Give it LISP-like combo of fast interpreter, incremental compilation, and fully-optimizing compilation caching & integrating the results as development happens. Fully test it in overnight runs that do spec-based test generation, acceptance/regression testing, fuzz testing, full analysis, and certifying compilation with tests against the fast versions for correctness checking. Come into work next day to find either what problems to fix or ultra-optimized, correct code ready to integrate into whatever you’re about to build.

                                          8. Information-flow analysis. The early version was to detect covert channels that are the bane of high-assurance security’s existence. The only thing we couldn’t reliably, consistently eliminate. The initial method modeled the system, software and hardware, as interacting parts to find what components they shared. If they can read/write it, it’s a storage channel. If they can modify & time it, it’s a timing channel. Later on, such stuff turned into information flow control that labeled various variables in accordance with your correctness or security model to then track via type systems or other analysis how information flows in your system. Violations of your expectations vs code’s reality are problems to be corrected. Design-by-contract can do some of this but maybe not all. Open problem. Already practical enough that tools like SIF can do web applications that automatically find or prevent leaks then generate client and server-side code with specific algorithms in right spot for acceleration or security. Such methods could benefit software in general.

                                          9. Recovery-oriented computing and fault-tolerance. Hamilton’s people proved this out in Apollo using asynchronous communicating processes, hardware redundancy, code to handle failures of processes/components, and a human-in-the-loop method to let people do what they do better. Tandem NonStop used similar features to design a massively-parallel, five-9’s platform for mission-critical assets. OpenVMS got close to that with decade plus uptimes by just focusing on code review, error handling in kernel, versioned files, standardized locking/batching/whatever to do hard stuff for apps, and bulletproof clustering. Byzantine fault-tolerance and all kinds of exotic stuff went from there but recent stuff like Copilot monitoring scheme and micro-restarts paper show it can be really simple to react to lots of stuff properly. :) The strategy for the rest was fail fast, safe, and as obviously as possible with error messages pointing in the right direction if possible.

                                          10. Build-systems. I’ll briefly mention this as it doesn’t cause most problems. Should protect integrity of software keeping track of all changes to know where problems came from. If security-critical project, it should run on strong TCB with access controls per project or maybe even file to reduce malicious input. Automate any tedious things while letting people decide on the rest as “managing” software in build systems has so many angles I’m still learning basic things in new contexts. Immutable store with per commit integrity, access controls for confidentiality/malice, and replication for availability are the basics.

                                          11. Static and dynamic analysis. These came later in research for lighter, formal methods. The best static analysis tools… probably Astree Analyzer for C and SPARK Ada’s built-in one… can prove software free of common errors if you can write it in the constrained way necessary. Many model-checkers and SMT solvers were formed that could automatically prove properties about specific software. This ties into spec or property based testing where the solvers can handle what they’re good at with tests handling others. Such thinking led to dynamic analysis that combined specs and/or static analysis techniques with actually running the program. The benefits of that, from safety to extra optimization, led to my idea of staged compilation process that does quick checks in interpreter early on, slightly-slower ones on sub-optimized EXE in background, and full battery with full optimization last. In any case, developers should be using whatever automated analysis or testing tools they can for their language/app.

                                          12. Functional/OOP programming. Each group claimed their methods of development prevented flaws common in structured, imperative programming esp as things got big. More flexibility and safety they say. Subsets of these principles were shown to do exactly that. Smalltalk was the early success and is reigning king of OOP model in productivity with Eiffel maybe champ in productivity, speed, and correctness tradeoff. Functional finally matured enough with Ocaml and Haskell ecosystems to do amazing things. Using functional in as pure a style as possible to combine with automated analysis and testing methods seems like great promise in work involved vs correctness obtained. OOP I can’t speak on as I avoided it and have less data on how experts in say Smalltalk or Eiffel might do that.

                                          13. CPU or ASIC level. I left this last as it’s often the least-likely to be available. B5000 architecture made it immune-by-default to stack overflows, out-of-bounds accesses, mis-matched arguments in function calls, and malicious data becoming new code due to app flaws. Wirth and Jurg’s Lilith computer designed the CPU, assembly, compiler, and HLL to work together ideally so no abstraction gaps posed a problem. The Secure Ada Target and Army Secure OS (ASOS) projects made tagged hardware plus high-security runtimes that enforced at runtime many protections which Ada may or may not catch at compile time. They failed but numerous Java CPU’s doing something similar succeeded in marketplace. Recently, at high-end, Cambridge’s CHERI makes C apps capability secure with any violation of POLA being caught by CPU. More important here, the SAFE processor at crash-safe.org has a policy-neutral, metadata engine built-in that can enforce at runtime something like 50 different policies in any combination on running software. A combination of above spec and testing strategies with such a CPU in build-system could have tools produce the EXE’s and policies to check correctness with CPU doing that natively with no penalty. Most important policies, like memory safety or leak prevention, can be used in production as well to be sure any unknown vectors won’t screw with you. CPU’s themselves or at least critical parts can be made with correct-by-construction techniques that often ensure first-pass with asynchronous knocking out timing issues, triplicated checking w/ voter to knock out general issues, and SOI/rad-hard processes knocking out many EM or radiation issues.
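
                                          A toy illustration of how points 2 and 6 combine, with the Design-by-Contract spec doubling as a test oracle (pure standard library; nothing like SPARK’s or Eiffel’s real tooling, and all names here are made up):

```python
import random

def contract(pre, post):
    """Design-by-Contract as a decorator: check the precondition on the
    arguments and the postcondition on the result."""
    def wrap(fn):
        def checked(*args):
            assert pre(*args), f"precondition violated: {args!r}"
            result = fn(*args)
            assert post(*args, result), f"postcondition violated: {result!r}"
            return result
        return checked
    return wrap

@contract(pre=lambda xs: isinstance(xs, list),
          post=lambda xs, out: out == sorted(out) and len(out) == len(xs))
def my_sort(xs):
    return sorted(xs)

# Usage-driven testing a la Cleanroom: generate inputs the way callers
# would produce them and let the executable spec judge every run.
for _ in range(1000):
    my_sort([random.randint(-50, 50) for _ in range(random.randint(0, 20))])
```

                                          You don’t hand-manage those thousand tests at all; the spec generates the verdicts, which is the “test generation from specs of actual behavior” idea above in miniature.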

                                          So, there’s what over a decade studying high-assurance systems taught me had big payoff in various projects. Methods like Cleanroom, formal specification esp of interface expectations, spec-based testing, safe languages, static analysis, and easy iteration w/ full-on testing done later have collectively always paid off. Combinations of these were done on various projects by small teams with limited expertise in them and still got better success than usual at correctness. The companies like Altran or Galois that consistently apply some of this rarely had failures in production with most defects being non-critical. Such empirical evidence of specific methods consistently getting good results in terms of low defects argues strongly to start with what they do in coming up with next method with better time/work/defect tradeoffs.

                                          It’s about as quick a summary as I can put in. I’ll be busy today chilling with family but will chime in to attempt to answer more detailed questions as time permits.

                                          1. 8

                                            Interesting article. Yeah, some advocates are going off the rails with their praise of static typing. I find it helps to look to prior successes and tradeoffs to try to understand how to best do engineering.

                                            So I basically agree that these are all good ideas for improving the quality of software. But a lot of them work by increasing the cost of software development to the point where it’s not viable for most people to program this way.

                                            Which is the context in which I’m making this point: Given an unlimited (or at least relatively large) budget allocated to software quality, I’d definitely take types + tests, but given a more typically modest one I don’t think the types are adding much.

                                            Such empirical evidence of specific methods consistently getting good results in terms of low defects argues strongly to start with what they do in coming up with next method with better time/work/defect tradeoffs.

                                            I don’t think this is necessarily true if the goal is to actually see widespread adoption. The approach I’m taking is trying to take the tools that people are already using and make them more powerful, which seems a lot more likely to bear fruit than taking the tools that people aren’t already using and trying to make them cheaper/easier to use.

                                            1. 9

                                              But a lot of them work by increasing the cost of software development to the point where it’s not viable for most people to program this way.

                                              The same principle applies to doctors and engineers: true depth of experience and proper protocol means that not every person gets to perform heart surgery or build a bridge. As a society, we’ve accepted that this is desirable.

                                              The goal of widespread adoption is a bit at odds with the goal of secure and reliable software. Taken to even a slight extreme, you end up with:

                                              • Deploying code on hardware architectures that do not afford protection against certain attacks or bad input.
                                              • Requiring deployments to be managed by fewer people than is required to have proper operational checks-and-balances (consider having the developer, the admin, and the security/compliance person all being the same job when management comes calling for an “urgent” feature request).
                                              • Providing simplified services to people that cannot understand the importance or significance of what the software is doing and cannot detect when it is operating incorrectly in time to fix or at least deactivate it.
                                              • Purposefully breaking automation to prevent users from consolidating workforces or losing upsell opportunities (Frank is our reporting guy, who won’t sign off on your new BI platform that makes him redundant…better sell Frank on some obscure voodoo he can be master of and hope he doesn’t break the product).

                                              These things all happen, every day.

                                              I think that there are kinda two asymptotic views we can take about software:

                                              • Accept that all software is disposable, and accept that we will need to replace/upgrade it regularly. At the same time, we should be open about making frequent upgrades and tolerant that occasionally things will break and that every 5 years the business should just budget for new code in the same way that they’d budget for, say, carbon copy paper or new laptops.
                                              • Accept that all software needs to be built as securely and reliably as possible, and accept that this makes software expensive. At the same time, we should learn how to specify software well so that it can actually be built to match what we need and we should become comfortable with dealing with minor rough edges when a given package doesn’t do exactly whatever piddling little thing we ask. Nobody decides to repaint highway spans every year to match the color of their Tesla, because that’d be really stupid–and yet, in software, people expect that this is reasonable.
                                              1. 7

                                                Good points except your dichotomy at bottom because it seems false to me like most dichotomies. Call me extra skeptical whenever someone tells me to choose between two options. Usually a trap. ;) So, here’s your options:

                                                1. Software will be crap and change too much. Result should be designed so we can throw it away. Easy updates and integrations are the preference here.

                                                2. Software needs to be built robustly with strong specifications. Build it to last.

                                                There will be projects that might fall into one extreme or the other. Reality says most won’t, though. The fact is that robust, cathedral-style software can be made under principles of No 1. They just need to clearly decompose it, do interface specifications, tests for probable successes/failures of key functions, be agnostic to storage method, and be agnostic to communication method. Most are straight from medium-assurance development methods. The last two I devised when I saw how Microsoft did data lockin: formats and protocols. They have to be interchangeable easily to counter that whether intentional or incidental. To counter platform lock-in, the platform’s API needs to be open with ideally it also being simple enough for a small team to port or re-implement the whole thing. Hence my early love for industrial BASIC’s, Turbo Pascal, and Oberon derivatives for future-proof software.

                                                Simple OOP a la Smalltalk or Component Pascal. Close to purely functional development like sane Ocaml or Haskell. These two can be combined with above tricks to make correct code easier to crank out, easy to extend, easy to replace, and easy to throw away since everything else is easy vs vanilla, legacy stack. Also, that just using it in the common way creates fewer defects (esp catastrophic defects) means getting stuck on it (worst-case) ain’t as bad as COBOL, J2EE, or MFC/C++. ;)

                                                Note: Many people countering adopting Ada 2012 point out that most of AdaCore’s customers are probably legacy businesses who can’t move. The legacy is a future-proof-so-far language with safe defaults, built-in concurrency, good performance, safer extensions, and safer integrations. Customers probably think it’s a nice thing to be “stuck” on vs what veterans of other stacks write about. So, I think about how I can do the same with something that isn’t Ada and has even more power/safety/ecosystem. Julia, Rust, some Scheme’s, ParaSail, and Ocaml are all in my mind as I wonder about that.

                                                1. 4

                                                  I believe we need to start treating software development more like engineering and less like the anti-intellectual hacker’s dream that many of us are currently in. All devices are internet connected now and therefore software is not disposable unless the hardware becomes disposable. That would be terrible!

                                                  1. 2

                                                    This is a very good point. We can also see what’s happening due to poor security in these internet connected devices. At the very least, it’s extremely wasteful (think about all the resources wasted on generating and then mitigating the Dyn DDoS), and at the other end of the spectrum, it’s unworkable. Throwaway software is definitely not an option when everything is a part of one massively interconnected graph.

                                                    1. 2

                                                      “Throwaway software is definitely not an option when everything is a part of one massively interconnected graph.”

                                                      Interesting you mention that. One looking-back paper that I often post is from one of the pioneers of the INFOSEC field, David Bell. His paper traces the start, successes, shutdown, and effective disappearance of high-assurance security in government. There was a graph and theorem in his paper where I’m not mathematical enough to be sure whether he was overreaching with it. Section B on Intermediate, Security Value Theorem.

                                                      http://lukemuehlhauser.com/wp-content/uploads/Bell-Looking-Back-Addendum.pdf

                                                      One early counter to high-assurance was that we only need to use them for most critical systems. So, we decide which we can run single-level, low security and which needs multi-level, high security. Thing is, once you interconnect them, Bell et al predicted that the middle systems where any overlap happens are effectively multi-level. They have to be secure or they start compromising the others in some way from attacks to covert channels. The modern net uses protocols where some nodes depend on the others, including strangers. There’s also many direct connections through various paths plus things going through firewalls due to over-reliance on specific ports. A graph of that might look like Bell’s on steroids (or LSD). That the minimal networks ended up needing protection for all systems depending on others means the modern, Internet systems probably do as well.

                                                      That’s how the theory would’ve played out. In practice, we’ve seen hackers abuse the identity/transport aspects of hosts to compromise others more easily and use low-assurance hosts to DDOS those with greater protection. Maybe Bell was right. They might all have to be high-security against code injection & internetworking attacks. I’ve been recommending to protect them as that’s what Orange and Red Books taught us anyway. Just remembered Bell’s claim was based on a graph when you said that.

                                                  2. 3

                                                    The same principle applies to doctors and engineers: true depth of experience and proper protocol means that not every person gets to perform heart surgery or build a bridge. As a society, we’ve accepted that this is desirable.

                                                    Right. To be clear, when I say “these ideas are not suitable for widespread adoption” I do not mean “these ideas are bad”. I mean “these ideas are great for improving the situation in areas to which they are applicable and should be actively encouraged there, but they will not do much to raise the average quality of software because of the comparative size of those areas”.

                                                    Accept that all software needs to be built as securely and reliably as possible, and accept that this makes software expensive.

                                                    I don’t think we get to have this option, because what happens in most cases is instead that end users don’t value security and reliability enough to pay the amount that it actually costs, so we end up not developing the software because someone else got there first/cheaper instead of developing the software securely and reliably.

                                                    …which is why I’m interested in techniques suitable for widespread adoption, because they allow you to make changes to the quality of software within that constraint. If you can make better software on the same budget then software quality improves. If you can make better software on a slightly higher budget software quality might improve. If you can make better software on a substantially higher budget, only software that is already very good is likely to improve.

                                                  3. 4

                                                    “Given an unlimited (or at least relatively large) budget allocated to software quality, I’d definitely take types + tests, but given a more typically modest one I don’t think the types are adding much.”

                                                    Types are a small part of assurance. Investing more won’t add much if we’re talking about the average developer. Maybe it could still benefit cases where 3rd parties in the ecosystem build specific components that way for an extra boost in quality. That already happens to a degree. However, static typing itself probably won’t do it. They’re better off with lightweight, formal specifications that feed into checks on their code, drive test generation, and aid compiler optimization. Types are a form of formal spec that can sometimes do that but general ones like Hoare logic or Z are better suited.
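
                                                    A tiny example of that gap: the type signature below is satisfied by any list-to-float function, while the interesting parts of the spec (non-emptiness, the result staying in range) live in Hoare-style pre/postconditions that then drive test generation for free. Toy code, not any real tool:

```python
import random

def mean(xs: list) -> float:
    """The type says list -> float; the real spec needs more than that."""
    assert len(xs) > 0, "pre: xs must be non-empty"        # precondition
    result = sum(xs) / len(xs)
    assert min(xs) <= result <= max(xs), "post: in range"  # postcondition
    return result

# The executable spec generates its own tests:
for _ in range(500):
    mean([random.randint(-1000, 1000) for _ in range(random.randint(1, 30))])
```

                                                    A type checker accepts a `mean` that returns garbage; the contract version rejects it on the first generated input that exposes the lie.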

                                                    “The approach I’m taking is trying to take the tools that people are already using and make them more powerful, which seems a lot more likely to bear fruit than taking the tools that people aren’t already using and trying to make them cheaper/easier to use.”

                                                    It’s sound advice per Worse is Better theory. It even worked with C++, VB.NET, the PHP enhancements at FB, especially SPARK for Ada, etc. Thing is, most of the good stuff hasn’t worked. Think Spec# for C# plus Microsoft’s other tools. The safe subsets or variants of C combined with static analysis. Java with JML and all its checkers (esp concurrency). Ironclad C++ with vanilla C++. These have all failed to gain any traction. The average developer simply will not use them. Problem is with developers, the tooling available, or both. There’s also almost no effort by developers to improve those tools. Little chicken-and-egg problem.

                                                    So, I’m not entirely sure where to go if we’re starting with broken tools in mainstream languages. I was thinking about reimplementing Python interpreter and standard library in SPARK Ada with Rust checking temporal safety. Then, modify one of those nice, web frameworks to do things fast and securely. A few mods to the language to help specific correctness optionally for the programmer. Developers get more safety than average simply using the product as they normally do with an easy route to increased safety. Something like that might take off, at least in industry. It needs to painlessly integrate with an existing ecosystem, though, as Scala, Clojure, F#, and Julia do. It’s a high bill given the size of all these projects plus the fact that the API’s weren’t designed for correctness or security from the get go.

                                                    1. 5

                                                      The safe subsets or variants of C combined with static analysis.

                                                      Well, Cyclone, which I suppose is one of these safe variants of C, eventually led to the development of Rust, but it’s still too early to tell whether that will affect the general quality of software. I’m hopeful.

                                                      On the other hand, a corollary of your paragraph is that Rust may very well be too much for your average developer. Does the “average developer”, if such a thing exists — let alone exists in such a manner that we would be able to characterise it without sounding too elitist or patronising — really care about the reliability of software? I think the answer is decidedly no. For now. Maybe things will change.

                                                      I don’t really like Uncle Bob, nor do many people here, for his patronising tone; but I think he was right in pointing out that one day, a careless programmer will code a mistake that will cost lives or a significant amount of money. When this happens, society will begin to expect programmers to possess the same rigorous training as engineers, doctors, and lawyers do.

                                                      1. 4

                                                        Does the “average developer”, if such a thing exists — let alone exists in such a manner that we would be able to characterise it without sounding too elitist or patronising — really care about the reliability of software? I think the answer is decidedly no. For now. Maybe things will change.

                                                        I’m not really a fan of the term “average developer” either, but you don’t need to care about reliability to find Rust attractive. You just need to not like debugging seg faults, data races or other memory safety related bugs.

                                                        1. 3

                                                          “but I think he was right in pointing out that one day, a careless programmer will code a mistake that will cost lives or a significant amount of money.”

                                                          Unfortunately, this sort of thing has happened plenty of times with mixed results. It’s often ignored after some money is paid out, with no changes, because it’s so rare. Barely impacts the profitable status quo. The times when change happened were in safety-critical industries like aerospace or railways, where the average developer is expected to try to prevent whatever they can. They have regulations where people have to code in safer ways with the docs, design, and code getting inspected. These don’t apply to people outside those sectors, though.

                                                          This leaves little opportunity to change the average case. Two possibilities are regulations enforcing a baseline with a few, sane techniques that shouldn’t add much to the cost of everything. Just a few items on my list would be a great start. The other is liability legislation so they have to pay for their misdeeds. Fewer screwups, fewer lawsuits they lose. Ranum notes the lawyers will have a field day with that, to the point it might be more trouble than it’s worth. That’s why I’m in favor of an established baseline of good engineering and safety techniques in whatever law passes. Like we had with TCSEC and do with DO-178C.

                                                      2. 2

                                                        Is there data on the relative cost of defects in software? Cuz in the end, that’s what it all boils down to. Will writing mostly defect-free software result in less cost? Of course, you need to consider the loss in revenue caused by the lag in the initial release, but it only takes one catastrophic defect to kill you dead permanently. I really wanna side with science.

                                                        1. 3

                                                          Outside the first-mover advantage, there’s evidence that software that meets expectations can sell more due to better user experience or reliability. That niche is actually a large market, even though it pales in comparison with the normal market for software.

                                                          The other angle to look at is the maintenance phase. This is actually the best selling point where no liability legislation exists. The maintenance phase involves fixes and enhancements of production systems that can’t change too much from the user perspective. The further in the lifecycle you discover a bug, the more it costs to both develop and field the fix. Empirical evidence is clear on fixing them as early as possible. Technical debt shows us codebases get harder to fix, extend, and port if a certain amount of upfront or refactoring work isn’t put in on their structure. These combine to show a methodology that forces cleaner design with defects knocked out early will cost less over time while allowing more agility in development. It just logically follows from proven issues. The question is what kinds of effort, and how much, to put in to get an acceptable tradeoff.

                                                          The methodology that did that best early on was Cleanroom. It enforces clean structure, does code reviews, ties the amount of verification to the specific item in question, catches many problems before the software even runs, centers testing on what happens in real-world use, keeps control flow easy to analyze by hand, and delivers consistently low defect rates at reasonable cost (sometimes cost savings). The defect rate was so low and predictable they even statistically certified certain teams to specific defect rates, in such a way that firms would sometimes offer a warranty on the software where that number of fixes was free. I’m not a believer in statistical certification, but bug rates were low enough for warranties. Often on non-critical software, too. Same with the Altran/Praxis Correct-by-Construction method that charges a 50% premium on average for low-defect code warrantied at a specific bug count.

                                                          My base recommendation for a low-cost version is Cleanroom done with industrial-strength, functional programming with interface checks, static analysis (esp QuickCheck-style), and usage-driven testing. Add over time battle-hardened libraries for common tasks that do them correctly and have clear guidance to users on proper use with sane defaults. Not just stuff like crypto but also things like strings, names, or date/time handling. They need to be able to just reach for a working library that takes almost no brains to use right. The language and checkers are already like that. Learning how and what to formally spec at interfaces or algorithms will be the hard part. Worst case, they can do asserts.

                                                      3. 3

                                                        nick – one of the best comments I’ve ever read. Nice work, keep it up.

                                                        1. 4

                                                          Much thanks. Figuring out the above has been my life’s work, pretty much. Been trying to condense it all into forms like that so people don’t have to dig through 100+ papers in IEEE or ACM or blogs. Still a work in progress.

                                                          1. 2

                                                            If you wrote a book, I’d read it. I enjoy reading your comments, they provide a lot of historical background I don’t usually see.

                                                      1. [Comment removed by author]

                                                        1. 7

                                                          It’s not that I’m saying you’d comment without having read the post, but uh ok yes it is. Have you read the post? Because it’s literally a discussion of that.

                                                          1. [Comment removed by author]

                                                            1. 6

                                                              It’s the responsibility of internet commenters to not make facile dismissive comments in response to things they haven’t read, right?

                                                        1. 22

                                                          This came up again in the context of the recent Dyn DDoS and comes up regularly in discussion of the Internet of Things and its general total lack of software or security.

                                                          Link? I’m wondering why people think this. Do you mean people are arguing it would have made taking over the IoT devices impossible in the first place, or that a static type system would somehow make handling a DDoS possible? In any case, I think any reasonable person would see through these statements pretty quickly. You can find someone who has said just about anything.

                                                          As static typing becomes more elaborate the scope of the bugs it can prevent becomes wider, but it comes with an additional cost: Build times.

                                                          Ocaml has pretty reasonable build times. IMO, I would say Ocaml build times are close to Go given how much more expressive the type system is than Go’s.

                                                          My general experience of the correctness of software written in fancy statically typed languages is not overwhelmingly positive compared to that of software written in dynamic languages. If anything it trends slightly negative. This suggests that for the scale of many projects the costs and benefits are close enough that this actually matters.

                                                          Having written and inherited code in a dynamically typed language, I can say the big win, in my experience, is refactoring. IME, a lot of people (on both sides) talk about producing code the first time. IME, that is pretty easy no matter what language one is in. Coming back to code 6 months or 6 years later is a much different story.

                                                          1. 14

                                                            Coming back to code 6 months or 6 years later is a much different story.

                                                            Oh gosh this so much.

                                                            I’ve inherited various chunks of under-documented under-tested code over time. Type-y assistance where present has made all the difference in these situations!

                                                            1. 11

                                                              Maintaining nearly 80KLOC of OCaml over three years, I was able to use repeated edit-compile-error-scratch-head sessions as a means to re-acquaint myself with the system. But this is such a low bar; we need better. Maintaining more than 1MLOC of Smalltalk over the past twelve years, I found leaning on a type system–even OCaml’s–to be quite impoverished compared to a live, self-navigating system of uniform, recursively designed objects and messages.

                                                              1. 4

                                                                I’ve heard that refactoring smalltalk is a great experience. I don’t hear the same about erlang. Is this just a matter of the smalltalk folks focusing more on tooling? They’re such similar languages in a lot of ways, I would think you could achieve similar things, but I don’t hear people talking about refactoring erlang systems with such reverence.

                                                                1. 4

                                                                  I can’t do it justice in a few lines, and I don’t want to clutter the discussion here. I will say that’s not just about Smalltalk but the mindset that Smalltalk fostered and, in turn, fed back into the evolution of Smalltalk. Go back, at least, to the writings of Ward Cunningham, Kent Beck, Ralph Johnson, and Brian Foote. As for Smalltalk itself, tooling almost emerges spontaneously from it because–I think–of the attributes I noted in the last line of my previous reply.

                                                                2. 3

                                                                  to be quite impoverished compared to a live, self-navigating system of uniform, recursively designed objects and messages.

                                                                  Do you mean that the code bases are uniform in this way or that any program written in Smalltalk will have this quality? One reason I like a good type system is because most code is not uniform and the type system let’s me take messes and make them clean, pretty safely. But a type system is no panacea, of course.

                                                                  1. 5

                                                                    You can write bad code in Smalltalk, as in any other language. One thing that sets Smalltalk apart is that it’s as much a system as it is a language. Further, it’s a system represented solely in terms of objects sending messages to other objects. Further still, it’s a running, fully reflective, live system with that entire object graph readily introspected and manipulated. So, in Smalltalk–as Alan Kay said–architecture (uniform recursive object system) dominates materials (code). In Smalltalk, even bad code is easily navigated, understood, and refactored.

                                                              2. 7

                                                                “Ocaml build times are close to Go given how much more expressive the type system is than Go’s.”

                                                                Build time is measured the same, no matter the language. One could say a longer build time is justified because of certain benefits, but it’s still a longer build time.

                                                                1. 4

                                                                  I’m not sure what I said that you’re disagreeing with or correcting me on. I am saying Ocaml builds fast, but slower than Go, and you get a lot of typing out of that extra time.

                                                                  1. 3

                                                                    That seems reasonable. The original statement might be more clear to me if it said “close enough to Go given…”

                                                                2. 3

                                                                  Ocaml has pretty reasonable build times. IMO, I would say Ocaml build times are close to Go given how much more expressive the type system is than Go’s.

                                                                  That’s a good example, thanks. I’ll add it in.

                                                                1. 2

                                                                  Does anyone know if a dictionary implemented in a low-level language like Rust/C/C++/D/Go, and using the same compact/ordered structure, would have the same benefits?

                                                                  1. 3

                                                                    It seems likely. Most of the benefits discussed in the pypy blogpost about it don’t have anything to do with Python (GC friendliness is the only one that does, and that would probably also affect Go and D depending on the details of their GC).

                                                                    1. 6

                                                                      Interesting. FWIW, Go intentionally “randomises” map iteration order (this was introduced after people started relying on a stable iteration order).

                                                                        1. 7

                                                                          Users are the worst.

                                                                    2. 1

                                                                      That would be exciting, and there is precedent for this. Timsort was developed for Python, and has found adoption elsewhere. It’s now the default sort for Java 7!

                                                                      1. 2
                                                                      1. 10

                                                                        Maybe someone stole his database?

                                                                        (I updated the URL.)

                                                                        1. 1

                                                                          Huh, bizarre. I just copy and pasted the URL, and the fetch title button worked fine so it must have worked at some point! Thanks jcs for fixing it.

                                                                        1. 4

                                                                          I was disappointed to see the 2011 project is now dead after reading the authors' descriptions in the reddit ama.

                                                                          It isn’t entirely clear to me if the ‘generic programming’ and ‘type propagation’ of Clay2011 is spiritually different from, say, the type inference system in Haskell. Is one more expressive? Is one more concise? Any insight? Different implementations with an equivalent result?

                                                                          If the two are different, are there other languages with an approach like Clay2011?

                                                                          1. 4

                                                                            I was disappointed to see the 2011 project is now dead after reading the authors' descriptions in the reddit ama.

                                                                            Yeah, I used it a bit. It was good. It ran into a bunch of problems - AIUI primarily lack of funding/time to develop it. The primary author has made some promising noises about wanting to resume the project and looking for funding to do so, but I don’t know what the status of that is.

                                                                            It isn’t entirely clear to me if the ‘generic programming’ and ‘type propagation’ of Clay2011 is spiritually different from, say, the type inference system in Haskell. Is one more expressive? Is one more concise? Any insight?

                                                                            Clay’s generic programming can be better thought of as a rationalisation and improvement on C++ style compile time metaprogramming, only not terrible like that makes it sound. It’s much more ad hoc than the Haskell system, but also much more flexible. It’s quite pleasant to use, although does suffer a bit from the problem of it not always being very obvious what went wrong when it doesn’t work IIRC.

                                                                            1. 2

                                                                              I’m not sure why they need funding given they spun it off into a business with interesting projects:

                                                                              http://claylabs.com/portfolio

                                                                              Maybe there’s just not enough revenue coming in to spare development time? One thing I suggest for situations like these is making, marketing, and licensing at least one app whose profits cover the cost of at least one developer (i.e. the inventor) continuing to work on Clay with the majority of their time. More as sales increase. Ideally, the app could be written in Clay taking advantage of its features. Then, it and other projects later become success stories that promote the language. If uptake starts, they can sell support for it directly.

                                                                              I haven’t done an analysis to prove this model or anything. It just combines two, proven ones in a way that should imply the combo would work to sponsor a given project. In this case, developing the language. I’ll also add they should use one of the compiler or DSL frameworks that make this stuff easy to avoid compiler development being a time-sink of drudgery. I think that as default would help many new languages develop at least in prototyping phase where their features are in flux and they have stuff to prove.

                                                                              1. 2

                                                                                I’m not sure why they need funding given they spun it off into a business with interesting projects: http://claylabs.com/portfolio

                                                                                AIUI it’s the other way around. The language came out of the labs, and most of their projects are not using it. I don’t really have any inside knowledge on the situation though so may have the details entirely wrong.

                                                                                1. 1

                                                                                  Yeah, they built it. I’m not implying they’re doing the stuff in Clay. I’m just saying the Clay team has a business bringing in income on high-performance server apps. Each could benefit the other in some way. They must just not have a lot of money coming in. Likely, they’re charging for their time instead of I.P. That often does it.

                                                                            2. 1

                                                                              The author gave a great, brief description of it here:

                                                                              https://www.reddit.com/r/programming/comments/ctmxx/the_clay_programming_language/c0v6o0r

                                                                              That and sub-thread with Felicia may give you what you need to do the comparison.

                                                                              1. -5

                                                                                I was disappointed to see the 2011 project is now dead after reading the authors' descriptions in the reddit ama.

                                                                                just like every other ‘for future programming’ and ‘innovative’ new programming languages nowadays

                                                                              1. 2

                                                                                I tend not to learn new languages much these days unless I’ve got a good excuse to. I miss doing so (I used to do it quite regularly) but not quite enough to put in the effort.

                                                                                That being said, I’m more or less actively looking for an excuse to learn Rust and Elm.

                                                                                Elm, mostly to see what they did right and learn a bit more about their good type error messages. Rust simply because I’ve got a general interest in being able to write efficient low-level code in a safer language than C.

                                                                                I’ve also got a long-running unfulfilled desire to write something non-trivial in a forth, but I’m unlikely to ever get around to it.

                                                                                1. 1

                                                                                  Creator of Thyme here. We made this to track application usage and find possible interventions to boost our productivity at Sourcegraph. Happy to answer any questions and hope you find it useful!

                                                                                  1. 1

                                                                                    I guess my main question is how this differs from RescueTime?

                                                                                    1. 1

                                                                                      or ManicTime, whose main advantage is that all records are stored locally; however, you can have your own ManicTime server that you can publish your time records to.

                                                                                  1. 3

                                                                                    “Literally ten billion people whose surnames start with “O’” live in Ireland”

                                                                                    There are fewer than 5 million people living in Ireland. Makes me worry about how well researched this is…

                                                                                        1. 6

                                                                                          https://en.wiktionary.org/wiki/literally#Usage_notes

                                                                                          It’s literally in the dictionary that using “literally” as a generic intensifier for figurative statements (i.e. to mean “not literally”) has been part of the language since the 1800s.

                                                                                          1. 3

                                                                                            It also says that anyone who isn’t a complete savage rejects this, although not literally.

                                                                                            1. 2

                                                                                              Only if you take a prescriptivist view on language; descriptivists have no choice but to accept it.

                                                                                        2. 2

                                                                                          If I was doing some research about how to write my schema, I wouldn’t appreciate unnecessary rhetoric and exaggerations.

                                                                                          EDIT: To clarify, I’m not making the same argument as adsouza, I don’t doubt that the article is well-researched.

                                                                                          1. 8

                                                                                            I don’t think that’s the point though, this isn’t a well researched, peer-reviewed document, this is somebody expressing frustrations with existing systems, and pointing out some good “you should do this” points, in contrast to the existing articles which are all “don’t do this thing”.

                                                                                        3. 1

                                                                                          They only have to fill in 6 web forms a day to hit 10 billion O' a year :~)