1. -5

    It is only a disaster if your business relies on making use of other people’s work, in which they own the copyright.

    Not everybody can afford to create stuff and give it away for free, and there are plenty of people who want to earn money from their creative work.

    Those who have made a living from stealing other people’s material are up in arms that their free lunch is not going to be free anymore.

    1. 17

      Or you run any kind of site where users can input anything that another visitor can see. Not just video and file sharing sites; Lobsters users could paste copyrighted content into a comment/PM and I’d be liable for not having a system implementing some kind of copyright controls.

      (To say nothing of Article 11 wanting us to start paying the news sites we link to for the privilege of sending them traffic.)

      1. -2

        If somebody posted something here that I owned the copyright to, and I asked the Lobsters admins to remove the material, then I imagine they would. If somebody kept posting this material, they could be banned.

        Or are you saying that the Lobsters site should be a place where anybody can post copyrighted material, without any recourse for the copyright holder?

        1. 13

          The new law changes this standard safe harbor behavior. Lobsters (me) is presumptively at fault for copyright infringement for not proactively checking for possibly-copyrighted material before posting. So yes, your scenario is the current, reasonable law and accurately describes why everyone is concerned about this change.

          1. -2

            Lots of FUD is being generated by those who will lose out. Copyright holders are not making much noise about the fact that they will probably make some money (or rather lose less).

            Some good points about what is going on.

          2. 4

            The law isn’t about that, though. The new law doesn’t say admins must take down on request (that’s already the case under existing law) but rather that they must have an AI system that prevents any infringing uploads from happening in the first place.

            The link tax is a much bigger problem, especially for Lobsters, but both articles are very bad.

            1. 1

              AI system that prevents any infringing uploads from happening in the first place.

              How is that any different from what @pushcx said? As the owner/operator of lobste.rs he would have to abide by this law and produce, or buy access to, some sort of copyrighted-work database in order to check all content that is created on Lobsters against it.

              That’s not going to make it easy for startups. That’s not going to make it easy for privately owned, independent side projects. That’s just going to hurt.
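
              A filter like the one described would have to fingerprint every submission and match it against a database of known works. Here is a toy sketch (the exact-hash matching and example strings are illustrative assumptions; real systems such as Content ID use fuzzy perceptual fingerprints, which is far harder and more expensive):

```python
import hashlib

def fingerprint(text: str) -> str:
    """Crude fingerprint: lowercase, collapse whitespace, hash the result."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

class UploadFilter:
    """Blocks submissions that match a database of known works."""
    def __init__(self, known_works):
        self.known = {fingerprint(work) for work in known_works}

    def blocked(self, submission: str) -> bool:
        return fingerprint(submission) in self.known

f = UploadFilter(["All rights reserved: my great novel"])
print(f.blocked("all  RIGHTS reserved:  my great novel"))  # True
print(f.blocked("an original comment"))                    # False
```

              Even this trivial exact-match version presupposes licensed access to a database of protected works; anything robust against paraphrase is a research problem, which is exactly the cost burden being objected to.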

              1. 2

                ALSO, you’d better not quote any part of my message if you reply, because I could, apparently, legitimately sue Lobsters for not enforcing my copyright. i.e. there’s no such thing as fair use anymore.

                (yes, that’s a stretch, but that seems to be the basic threat)

                1. 1

                  I replied before @pushcx and yes, it seems we agree on how bad it is :)

                  1. 2

                    Blargh! I am sorry. I misread the thread and thought you were replying to pushcx.

          3. 6

            Or Lobsters gets fined when you submit a link to any European news site.

            1. 1

              What’s worse is that people will devise a way to signal which content is linkable and which is available only under license. This will limit quality news dissemination and strengthen the position of fake news. This will help to kill the EU. Sad, right?

            2. 1

              Most probably, Lobsters will not be able to post most of the links.

            1. 2

              Ok, the summer project has just started and he has not read the standard yet.

              “undefined behavior: behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements”

              Let’s hope he reads the C standard before implementing any checks. A freely available PDF exists that is not the actual C standard you buy in the shops (which has a fancy ISO cover on it).

              1. 2

                “A minimalist knowledge approach to software engineering is cost effective because most code does not exist long enough to make it worthwhile investing in reducing future maintenance costs. Yes, it is more expensive for those that survive to become commonly used, but think of all the savings from not investing in those that did not survive.”

                This is something that I’m probably going to have to think more on. @derek-jones might even have data to support it in his collection. My data, though, indicated that most real-world projects from the 1960’s on up to present times run into problems late in the lifecycle they have to fix. Those fixes usually cost more in money or reputation. Some groups spent small, upfront investment preventing most problems like that. They claim it usually paid off in various ways. This was especially true if software was long-lasting. There were times when the quality cost more overall on top of a thrown-together project.

                Another issue is that pervasively-buggy software has conditioned users to expect that it’s normal. This reduces the demand for, or competitive advantage of, high-quality, mass-market software. Many firms, especially small ones or startups, can profitably supply buggy software so long as it meets a need and they fix the bugs. In the enterprise market, you can even sell software that barely works, or doesn’t work at all, so long as it appears to meet a need and makes someone in the company look good. So, this needs to be factored into the decision of whether to engineer software vs throw it together.

                I still say lean toward well-documented, easy-to-change software just in case you get stuck with it. You can also charge more in many markets with better rep. Use the amount of QA practices that the market will pay for. If they pay nothing, use stuff that costs about nothing like interface checks, usage-based testing, and fuzzing. If they’ll pay significant amount, add more design/code review, analysis/testing, slices of specialist talent (eg UI or security), improvements to dependencies, and so on.

                1. 4

                  For applications, there is also a less rigorous cost/benefit analysis.

                  Code containing a fault is more likely to be modified (removing the fault as a side-effect) than have the fault reported (of course it may be experienced and not reported); see Figure 10.75.

                  Other kinds of related data currently being analysed.

                  Microsoft/Intel is responsible for conditioning users to treat buggy software as normal. When companies paid lots of money for their hardware, they expected the software to work. Success with mass market software meant getting good enough software out the door, or being wiped out by the competition.

                  1. 2

                    I think IBM’s mainframe data might not fit your argument. IBM kept coming up with things like Fagan Inspections, Cleanroom, formal specs, and safe languages. They often experimented on mainframe software. A good deal of it was written in high-level languages like PL/I and PL/S that prevent many problems a shop using C might have. They have lifecycles that include design and review steps. In other words, IBM was regularly doing upfront investments to reduce maintenance costs down the line. The investments varied depending on which component we’re talking about. The fact they were trying stuff like that should disqualify them, though, as a baseline. A much better example would be Microsoft doing fast-moving, high-feature development in C or C++ before and after introducing SDL and other reliability tools. It made a huge difference.

                    Other issues are backward compatibility and lock-in. The old behavior had to be preserved as new developments happened. The two companies also made their data formats and protocols closed and complicated to make moves difficult. The result is that both IBM and Microsoft eventually developed a customer base that couldn’t move. Their development practices on the maintenance side probably factor this in. So, we might need multiple baselines, some allowing lock-in and some being companies that can lose customers at any time. I expect upfront vs fix-or-change-later decisions to be more interesting in the latter.

                    1. 2

                      The data is on customer application usage that ran on IBM mainframes (or at least plug compatibles).

                      1. 1

                        Oh Ok. So, mainframe apps rather than mainframe systems themselves. That would be fine.

                1. 20

                  Having got one business prediction right, I will stick my neck out and make another one (plus the obvious one that the world will not end because of this purchase).

                  1. -3

                    Since we are doing wild predictions … Here is one I’ll stick a neck out on … You’re probably young, early in your career, say <5 years’ experience in the industry. Certainly not in the industry since the 90s/early 2000s. It’s fine, there is nothing I can say on this topic that will change anything whatsoever. But save this thread. Queue it up for the day after your 30th birthday, or say in 7 years. You’ll be amused by the gap between what the you of today believes and what the you of tomorrow will have learned in this industry and specifically about this purchase.

                    1. 22

                      Checking that one is too easy after he linked to a blog with posts dating back 10 years. And checking out posts from 2008… there he mentions not having taught programming for 25 years.

                      1. 19

                        IIRC @derek-jones was on the C99 standardization committee.

                        1. 3

                          The beauty of predictions is their capacity for being wrong. I was wrong, surprised to be so, but wrong.
                          However, another prediction is still undecided: the impact of MS buying GitHub and how they will wield their influence over it compared to the counterfactual. I’m seriously not a tin-foil-hat kind of guy, but MS is just never a good thing when they step into any area, whether the Internet, browsers, software development, OSes, you name it. It is always a net-net negative (not from a business standpoint, of course) but in terms of the overall “good” in that respective area. Far more harm than good will result.

                          1. 5

                            I still don’t get your reply to Derek. He never claimed that MS purchases are good for the community. In fact, he is predicting an EDG buyout solely because he thinks it will allow for vendor lock-in.

                            1. 4

                              I believe Derek triggered him with the line

                              (plus the obvious one that the world will not end because of this purchase)

                              Where he probably refers to his experience from earlier (pre-git, pre-internet) times and how there will be other ways for open source and development (back to mailing lists, GitLab, …).

                              But Grey, when hearing that GitHub would not be changed too much (as Derek also stated in his posting: “sluggish integration”, but also “more data friendly”), remembered Microsoft’s history (they were anti-open-source and are working hard on changing their image). GitHub, being an “open-source community”, is therefore in danger of getting swallowed by this “anti-open-source business”.

                              And I can understand getting emotional about such things. And emotion kills rationality. Which probably led to this misunderstanding.

                      1. -2

                        That domain name is the worst thing ever, so many hyphens

                        1. 9

                          No, that’s this one.

                      1. 2

                        I’d offer to buy the companies or groups controlling most of the mining power. It’s usually an oligopoly like in the for-profit, non-crypto markets. Even possible to pay off individual executives to cut deals to lower the price. The resulting buy would probably be way, way, way less than $100 billion for Bitcoin. Hell, it might be less than a billion. That’s owning it, too, rather than a bribe to sabotage it in some way that looks like an accident. That could be mere millions.

                        1. 2

                          Mining power is actually quite decentralized, because for various reasons cheap electricity is decentralized. Mining pools are an oligopoly, but mining pools don’t own mining power. Miners can and will leave pools if something happens.

                          1. 2

                            No, mining power is very centralized.


                            1. 1

                              Your link agrees with me. To quote:

                              Mining farms are perhaps the one area where manufacturers and economies of scale are not dominant. Good electricity deals tend to come in smaller packages, tend to be distributed around the world, and tend to be difficult to find and each involve unique circumstances.

                              1. 1

                                The link agrees on one point, good electricity deals. But economies of scale apply to the building of devices that use less electricity.

                            2. 1

                              I appreciate the clarification.

                          1. 1

                            Cutting free from the ball and chain makes it easier for committee members to add more complexity.

                            1. 2

                              There are PoW-less cryptocurrencies being developed. If these gain traction and turn out to be secure, then we can leave behind the first generation of cryptos based on PoW.

                              1. 2

                                Would you provide some more detail here/a URL perhaps?

                                1. 2

                                  These are the two I know of, which also seem to have serious teams backing them up.

                                  Nano: https://nano.org/en Iota: https://www.iota.org/

                                  They’re based on new architectures that make it possible to dispense with the concept of a miner, providing the services miners gave in different manners. For example, Nano uses proof of stake. When there are two conflicting transactions, i.e. a double spend, the network votes to resolve the conflict, and each node has a voting stake proportional to the amount of currency it holds plus the amount in accounts that delegate their vote to that particular node. Thus conflicts are resolved through voting. Iota uses a DAG architecture where the cost of making a transaction is doing PoW in the form of “confirmations”. The most robust transactions are those with the largest number of confirmations. Both currencies have a fixed supply, so no new coins will ever be produced; all the coins that will ever exist were generated in the first block.
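
                                  The stake-weighted vote described for Nano might be sketched roughly like this (a simplification with made-up account and node names; the real protocol’s representative and quorum mechanics are more involved):

```python
from collections import defaultdict

def voting_weights(balances, representatives):
    """Each account's balance counts toward the node it delegates to."""
    weights = defaultdict(int)
    for account, balance in balances.items():
        weights[representatives.get(account, account)] += balance
    return weights

def resolve_double_spend(votes, weights):
    """Keep the conflicting transaction backed by the most stake."""
    tally = defaultdict(int)
    for node, tx in votes.items():
        tally[tx] += weights[node]
    return max(tally, key=tally.get)

balances = {"alice": 40, "bob": 30, "carol": 30}
reps = {"alice": "node1", "bob": "node2", "carol": "node2"}
weights = voting_weights(balances, reps)  # node1: 40, node2: 60
winner = resolve_double_spend({"node1": "tx_a", "node2": "tx_b"}, weights)
print(winner)  # tx_b
```

                                  The design choice worth noting is that the tie between voting power and held/delegated currency is what makes the 51%-stake concern below a permanent rather than ongoing cost.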

                                  1. 1

                                    The problem with proof of stake is that once an entity has 51% they own the currency forever. With proof of work, it is a continual effort to own 51% (this is covered in the linked-to article).

                                    A quick look at IOTA (not knowing anything about it), and it does not involve a blockchain and it’s not on Wikipedia.

                                    1. 1

                                      I see both PoW and PoS as protocols depending on the rationality assumption. Those that hold the power will act rationally, thus will want to preserve the value of their investment and as a consequence protect the network. Without the rationality assumption, we could have the top N miners combining their hashing power to destroy the network. What stops them from doing this?

                                      Whether IOTA or Nano are blockchains or not isn’t important, I think; what matters is that they satisfy (theoretically, and Nano somewhat practically) certain properties that allow them to function as decentralized cryptocurrencies.

                              1. 4

                                Ok, it’s time to stop developing software, get some Verilog books, and go after them coins. That’s what we need to do people. :)

                                On a serious note, one thing the author didn’t mention that might reduce the barrier to entry is structured ASICs. They use a bit more energy, they’re a bit slower, and they have a much lower cost to develop/deploy. The main company doing it is eASIC with their Nextreme line. I remember their 90nm option had a prototyping cost of fifty to sixty grand for a pile of chips at one point. Get it working on an FPGA designed for a conversion. Then, see if they can put it on a Nextreme.

                                1. 2

                                  It’s an interesting situation. Essentially China is extracting wealth from the world with this lock on ASIC production (Bitmain) and cheap hydro.

                                  1. 3

                                    It’s not China, but people of influence in China who are extracting money from China (which has currency controls).

                                    The electricity is cheap because the ‘right’ people are being paid off (ever wondered how electronics ordered on Amazon that come from China have zero postage?)

                                1. 2

                                  pdf graphs to csv is the most interesting project :-)

                                  1. 13

                                    The author has a pretty explicit bias toward Earley parsing, and using LL(k) over LR(k) methods (which I mostly share). There’s a wealth of information here, but he’s missing a couple other significant strands of generalized parsing research outside of Marpa’s particular Earley variant (perhaps because they don’t fit within his narrative).

                                    GLR (Generalized LR):

                                    • Tomita, LR parsers for natural languages, 1984.
                                    • Tomita, Efficient parsing for natural language, 1986.
                                    • Scott and Johnstone, Generalised bottom up parsers with reduced stack activity, 2005. This segues into their later work with Earley and GLL.

                                    GLL (Generalized LL):

                                    • Scott and Johnstone, GLL Parsing, 2010.
                                    • Afroozeh and Izmaylova, Faster, Practical GLL Parsing, 2015.

                                    Outside of Earley, GLL seems like a very practical generalized parsing approach. Instaparse is one implementation.

                                    Earley / Shared Packed Parse Forests:

                                    • Scott, SPPF-style parsing from Earley recognisers, 2008 (and several related papers by Scott and Johnstone). Note: I’ve never been able to get the approach described in the paper implemented correctly, and not for lack of trying.

                                    Earley Intersection Parsing (not sure if there’s a canonical name for this):

                                    • Bar-Hillel, Perles, and Shamir, On formal properties of simple phrase structure grammars in Zeitschrift für Phonetik, Sprachwissenschaft und Kommunikationsforschung, 1961. Proves some important results about intersecting automata and context free grammars.
                                    • Chapter 13 (“Parsing as Intersection”) of Grune and Jacobs (also cited elsewhere in his timeline), particularly 13.4 (pgs. 437-439): This describes Bar-Hillel’s Intersection Parsing approach using contemporary CS & parsing terminology, and then suggests combining it with Earley parsing. While the basic intersection parsing method produces an astonishing amount of dead-end nodes to garbage-collect later, Earley parsers limit searching to productions reachable from the start symbol. If the completer is modified to produce non-terminals in the intersection parsing format (which is easy), intersection parsing nicely handles producing the parse forest (with structural sharing, when ambiguity produces multiple derivations).

                                    I’ve been working on a C library for Earley Intersection parsing, and an awk-like, pattern-match-directed language based on it, but working on testing them thoroughly tends to lead me to working on theft instead.
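
                                    As a rough illustration of the intersection idea, here is a toy emptiness check built on the Bar-Hillel triples: a triple (q, X, r) is productive if X can derive some string that drives the DFA from state q to state r (the grammar and DFA are made up for the example; a real parser would materialize the triples as productions rather than just test emptiness):

```python
def derivable(q, rhs, r, productive, states):
    """Can the symbol sequence rhs take the DFA from state q to state r?"""
    if not rhs:
        return q == r
    return any((q, rhs[0], mid) in productive
               and derivable(mid, rhs[1:], r, productive, states)
               for mid in states)

def intersection_nonempty(grammar, start, delta, q0, finals, states):
    """Fixpoint over Bar-Hillel triples (q, A, r); seeded with DFA moves."""
    productive = {(q, a, r) for (q, a), r in delta.items()}
    changed = True
    while changed:
        changed = False
        for lhs, rhss in grammar.items():
            for rhs in rhss:
                for q in states:
                    for r in states:
                        if (q, lhs, r) not in productive and \
                           derivable(q, rhs, r, productive, states):
                            productive.add((q, lhs, r))
                            changed = True
    return any((q0, start, f) in productive for f in finals)

# Toy CFG for { a^n b^n : n >= 1 } ...
grammar = {"S": [("a", "S", "b"), ("a", "b")]}
# ... intersected with a DFA tracking string length mod 2.
delta = {(0, "a"): 1, (0, "b"): 1, (1, "a"): 0, (1, "b"): 0}
print(intersection_nonempty(grammar, "S", delta, 0, {0}, {0, 1}))  # True: "ab" has even length
print(intersection_nonempty(grammar, "S", delta, 0, {1}, {0, 1}))  # False: a^n b^n is never odd-length
```

                                    The naive version enumerates all state pairs per production, which is where the dead-end blowup mentioned above comes from; the Earley-guided variant only explores triples reachable from the start symbol.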

                                    1. 3

                                      Thanks for details about alternatives.

                                      I have played around with Bison’s GLR option and have almost managed to create a C parser that does not require special handling of typedefs in the lexer (ambiguities still appear at runtime for a few constructs).

                                      Grune and Jacobs’ Parsing Techniques - A Practical Guide is sitting in a pile waiting to be read (the first edition was manageable; the second looks daunting).

                                      1. 1

                                        I have a print copy of the second, and it’s never struck me as daunting – reading it cover to cover, perhaps, but it’s well suited to dipping into for specific techniques, with a well-curated bibliography for further details.

                                      2. 2

                                        packrattle is another example of a GLL parser.

                                        The times I’ve read about Earley parsers, they seem like the same solution as GLL, but attacking the problem from a different angle. Hopefully someone in academia will eventually prove that.

                                        1. 2

                                          I’m particularly interested in Earley parsers because some use cases of mine assume ambiguity (reverse-engineering binaries, scraping damaged data), and the chart that Earley parsers build up supports several operations beyond just recognizing & building parse trees:

                                          • what terminals/nonterminals would advance the current parse(s) in progress? (i.e., autocomplete)
                                          • if we don’t assume where the starting position is, what overlapping instruction encodings are there?
                                          • if we inserted a fake nonterminal here, would enough of the parse have completed to match any instances of (some structure) between here and there?
                                          • incremental re-parsing of changing input, since earlier columns in the chart are constant once the parser has moved on.
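
                                          The autocomplete bullet falls out of the chart directly: run the recognizer over a prefix, then read off which terminals the items in the last column are waiting on. A minimal Earley recognizer sketch (toy expression grammar of my own, no ε-productions, which would need the usual extra care):

```python
def earley(grammar, start, tokens):
    """Build the Earley chart; items are (lhs, rhs, dot, origin)."""
    chart = [set() for _ in range(len(tokens) + 1)]
    chart[0] = {(start, rhs, 0, 0) for rhs in grammar[start]}
    for i in range(len(tokens) + 1):
        changed = True
        while changed:
            changed = False
            for lhs, rhs, dot, origin in list(chart[i]):
                if dot < len(rhs) and rhs[dot] in grammar:      # predict
                    for prod in grammar[rhs[dot]]:
                        item = (rhs[dot], prod, 0, i)
                        if item not in chart[i]:
                            chart[i].add(item); changed = True
                elif dot < len(rhs):                            # scan
                    if i < len(tokens) and tokens[i] == rhs[dot]:
                        chart[i + 1].add((lhs, rhs, dot + 1, origin))
                else:                                           # complete
                    for l2, r2, d2, o2 in list(chart[origin]):
                        if d2 < len(r2) and r2[d2] == lhs:
                            item = (l2, r2, d2 + 1, o2)
                            if item not in chart[i]:
                                chart[i].add(item); changed = True
    return chart

def next_terminals(grammar, chart):
    """Terminals that would advance some parse in progress (autocomplete)."""
    return {rhs[dot] for _, rhs, dot, _ in chart[-1]
            if dot < len(rhs) and rhs[dot] not in grammar}

g = {"S": [("S", "+", "M"), ("M",)],
     "M": [("M", "*", "T"), ("T",)],
     "T": [("1",), ("2",)]}
chart = earley(g, "S", ["1", "+"])
print(sorted(next_terminals(g, chart)))   # ['1', '2']
full = earley(g, "S", ["1", "+", "2"])
accepted = any(lhs == "S" and dot == len(rhs) and origin == 0
               for lhs, rhs, dot, origin in full[-1])
print(accepted)  # True
```

                                          The incremental re-parsing bullet follows from the same structure: columns to the left of the edit point are unaffected and can be reused.
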
                                        2. 2

                                          Thank you for adding this. I understand why everybody is excited about Earley parsing, though I personally don’t feel sufficiently grounded in it to share that excitement, but at the very least other lines of research deserve to be mentioned.

                                        1. 6

                                          Amazing. Well written, well researched and concise. Parsing is such an interesting problem - it arises naturally almost anywhere you find text.
                                          I’ve read the phrase “parsing is a solved problem” so often in discussions about it but this article puts that in its place. Thoroughly.

                                          1. 5

                                            Parsing is a solved problem in the sense that if you spend enough time working on a grammar it is often possible to produce something that Bison can handle. I once spent several months trying to get the grammar from the 1991(?) SQL standard into a form that Bison could handle without too many complaints.

                                            Bison’s GLR option helps a lot, but at the cost of sometimes getting runtime ambiguity errors, something that non-GLR guarantees will not happen.

                                            So the ‘newer’ parsing techniques don’t require so much manual intervention to get them into a form that the algorithm can handle.

                                            1. 3

                                              I think that usually means for most languages and problems people are applying it to. If so, then it probably is a solved problem given we have parsers for all of them plus generators for quite a lot of those kinds of parsers. Just look at what Semantic Designs does with GLR. There are FOSS tools that try to be similarly thorough on parsing, term-rewriting, metaprogramming, etc. At some point, it seemed like we should be done with parsing algorithm research for most scenarios to instead polish and optimize implementations of stuff like that. Or put time into stuff that’s not parsing at all, much of which is nowhere near as solved.

                                              I still say it’s a solved problem for most practical uses if it’s something machine-readable like a data format or programming language.

                                              Note: It is as awesome an article as you all say, though. I loved it.

                                              1. 5

                                                A solved problem, but for a subset of constrained grammars and languages. Still people say it without any qualification as if there’s nothing more to be said or understood.

                                                I take your points though and there are definitely other areas that need work.

                                                I think my point is just that I’m glad people are still putting effort and research into solved problems like this. No telling where breakthroughs in one area can lead, and we’re still restricted in how we construct and represent language grammars.

                                                1. 3

                                                  A solved problem, but for a subset of constrained grammars and languages. Still people say it without any qualification as if there’s nothing more to be said or understood.

                                                  Yeah, they should definitely be clear on what’s solved and isn’t. It’s inaccurate if they don’t.

                                            1. -3

                                              The authors obviously don’t know anything about R.

                                              1. 3
                                                1. 0

                                                  What is the difference between array and vector languages? Vector languages operate on slices of arrays (which might be called array languages).

                                                  1. 1

                                                    I’ve always heard the terms used interchangeably.

                                              1. 3

                                                The author has discovered the work of Anne Chao (a very worthwhile read if you are into fault prediction) and has built a tool for calculating some of the formulae involved (the paper is a bit of a repetitive ego trip in places).

                                                Anne Chao has written some software, which, shall we say, has an academic user interface.

                                                One I did earlier.

                                                1. 2

                                                  This group have been doing the best work in software cost estimation (something of a lost cause) for many years now.

                                                  More data on the variability of programs implementing the same specification

                                                  1. 1

                                                    With you saying that, I’ll definitely look into their other work. Thanks for the tip!

                                                  1. [Comment removed by author]

                                                    1. 2

                                                      I haven’t thoroughly read/vetted this one yet. My skimming of it showed they do verified data structures, use KLEE for symbolic analysis, and tie things together with contracts. KLEE and contracts have a bunch of bugs exposed in the field as empirical justification for their use. The things worth more exploration are how libVigor works (esp. the formalism) plus their specs for safety of things like pointers.

                                                      Even if those unknowns were bogus, the NAT-level specs and/or empirical components they’re chaining may prove useful to a reader doing something similar. So I submitted it for that.

                                                      1. 3

                                                        They use KLEE (well worth investigating) to “..prove that VigNAT is free of the following undesired behaviors: buffer over/underflow, invalid pointer dereferences, misaligned pointers, out-of-bounds array indexing, accessing memory that is not owned by the accessor, use after free, double free, type conversions that would overflow the destination, division by zero, problematic bit shifts, and integer over/underflow. Proving these properties boils down to proving that a set of assertions introduced in the VigNAT code—either by default in the KLEE symbolic execution engine [9] or using the LLVM undefined behavior sanitizers [42, 43, 57]—always hold.”

                                                        At least they define what level of proof they achieve.

                                                        1. 3

                                                          KLEE can be used to prove some other things, you can make assertions on the symbolic values and thus prove properties about them.

                                                      2. 1

                                                        Just reread the paper in more detail. They do more than I originally thought. They annotate the code to collect lots of state information during program execution. When the program terminates this state information is checked for consistency “… we wrote a NAT specification that formalizes our interpretation of the RFC, and which we believe to be consistent with typical NAT implementations.”

                                                        I did not see a discussion of how they obtained all the input necessary to invoke all of the behavior specified by RFC 3022. Runtime checks are limited by the input that gets processed.

                                                        1. 1

                                                          FWIW, in my opinion it is better to put such an update as an edit note in the original, rather than delete it. Gives better context and would probably stop any down votes.

                                                        1. 3

                                                          The question is not whether these techniques are practical and useful, but whether they are cost effective (after discounting the claims that soap powder manufacturers would not get away with).

                                                          Would you buy second hand software from a formal methods researcher?

                                                          Does using formal methods mean anything?

                                                          1. 1

                                                            whether they are cost effective

                                                            For whom? Doesn’t that depend on a budget and an assessment of the cost of failure?

                                                            1. 1

                                                              The vendor gets to make the cost decision, based on what the client is willing to pay.

                                                            2. 1

I don’t trust soap claims any more than the FDA. Plus, advertisers are among the most full-of-shit groups out there. They claim they’re about one thing but really only care about pushing product. It’s always weird to me that you use that as the point of comparison instead of evaluations like Underwriters Laboratories or Consumer Reports. I’m glad most formal methods research uses techniques with more evidence of effectiveness than soap ads.

As far as your other links go, I’m already in agreement that something like an unverified C parser needs extensive validation before anything depending on it is “verified” in any way. On the errors-in-proofs link, I noticed you’re not differentiating between two things that even that 1979 critique did: human-oriented proofs that leave out key details vs proofs done in systems that machine-check each low-level step for consistency. Most or all of the proof errors I’ve seen look like they’re in the former. The errors in the latter that people have shown me were only specification errors so far, despite a lot of proof terms. The LCF/machine-checked approach that “most mathematicians” avoid out of tradition seems like it would help them with their mistakes. That means your claim, if matched against empirical data, weakens to “there could be errors in the specs while the proofs are usually robust.” And then fairness would note that formal specs usually reduce errors over informal ones. And then we’re at the cost/effectiveness assessment, which is a fair critique for the average use case that wouldn’t tolerate the cost.

Another problem I saw was that VDM was a major player but not really the best point of comparison for cost-effectiveness. Cleanroom, Praxis’ work, or maybe SPIN on distributed systems would be. Those had the most industrial success, with the first two covering regular software: Cleanroom studies showed low defects at good cost-effectiveness, sometimes at no extra cost; Praxis’ Correct-by-Construction with Z and SPARK Ada achieved tiny defect rates in delivered code at a 50% premium. Both had companies warranting their code at specific defect rates, too. Cleanroom faded due to IBM charging for the method, and I hear less of the Praxis stuff after Altran bought them out. Oh well.

Since there are already plenty of SPARK case studies, I just got Stavely’s Cleanroom book to review that method. It’s the most approachable one I’ve seen yet, with key features that should give a lower defect rate based on the results of similar, prior methods. However, the careful, algebraic methods might fall apart on modern libraries or big, messy problems requiring basic science to handle effectively under realistic constraints. ;) I’m mainly reading it to assess it for a future study with a modern language and tooling, to test effectiveness in a variety of scenarios. Especially combined with something like SPARK or Haskell.

                                                            1. 7

Ok, there’s been some debate here about mathematical vs empirical evidence. Some people strongly believe in formal proof, for reasons that probably vary. @derek-jones is the main opponent, saying those methods are nonsense and advocating a strictly empirical approach. I’m somewhere in the middle: I want any formal method to have evidence that it works in an empirical sense, but I don’t see the concepts as that different. I think we believe in specific types of math/logic because they’re empirically proven. As I formulated those points and dug up related work, I found this submission that argued a lot of what I was going to argue, plus some other points (esp on intuition) I wasn’t thinking about. So, enjoy!

If you’ve read it at this point, I’ll weaken that claim to mine: that certain maths/logics are empirically valid by being relied on in hundreds of thousands to billions of uses. If they were wrong, we’d see a massive amount of evidence in the field that the fundamental logical methods didn’t work. Matter of fact, some of these failing would make it impossible for Derek to even write his rebuttals online, given that propositional logic and calculus are how mixed-signal ASICs work. ;)

                                                              So, I’m going to list some specific types of math with widespread, empirical confirmation as a start. Anyone else feel free to add to the list if it’s math/logic used in formal proofs that has massive level of real-world use with diverse inputs or environmental setups.

                                                              1. Arithmetic on integers and real numbers. Computers can be boiled down to glorified calculators. Arithmetic can help bootstrap trust in other things, too.

                                                              2. Algebraic laws. Logics like Maude build on these.

                                                              3. Calculus like integration and derivatives. Heavily used in engineering. Also, building blocks of analog circuits and computers implement these primitives.

4. Propositional and Boolean logic. Digital designs are synthesized, verified, and tested in this form. With all the CPUs shipped, it’s probably the most well-tested logic after arithmetic and algebra.

5. Some subset of first-order logic. There’s been a lot of experimental confirmation that it works, in the form of Prolog and Datalog programs that do what the logic says they should. I’ll note the older Prolog work was a bit weaker since it was associated with lots of zealotry that led to an AI winter. We can focus on working applications, though, of which there are many, both commercial and FOSS. Although it has a niche, Prolog was pushed way outside of it to implement a diverse array of programs and environments. The SMT solvers also catch problems they’re supposed to on a regular basis. All problems I’ve seen investigated in these were implementation errors that didn’t invalidate the logic itself.

6. Set theory. Experiments with the principles seem to always work as long as the problems and solutions fit the simple modeling style. It’s pretty intuitive for people. It’s been used to model many problems, esp theoretical ones, whose applications and rules were reviewed by humans for accuracy. There is some use of it in programming, in data structures and composition. Deployments haven’t invalidated the basic operators.

                                                              7. Basic forms of geometry. You can get really far in making stuff happen predictably if it’s modeled in or built with simple shapes. See Legos and construction in general.

                                                              8. Basic probability theory. There’s a ton of use of this in empirical research. They all seem to trust the basic concepts after their observations of the field.

One can do all kinds of hardware/software verification using 1–6 alone. I’ve seen 7 used periodically in applications such as digital synthesis and collision avoidance for guidance systems. I’ve seen 8 and similar logics used when the problem domain is imprecise. If those logics and their basic rules are empirically proven, then applications of them carry high empirical confidence so long as they use the rules correctly. At that point, most if not all problems would show up in the extra pieces with less or no empirical verification, such as formal specifications of the problem and implementations/checkers for those logics. Formal methods people always note that things like checkers are in the TCB, keeping them simple and small. Verified checkers also exist, which we can further test if we want. Regardless, looking at these logics and their rules as theories with massive empirical validation means we’d trust or watch out for exactly the things the formal verification papers say we should: everything in the TCB except the logical system and rules, if it uses one of the above.
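The empirical-validation claim for item 4 can even be demonstrated mechanically: propositional identities can be checked over every truth assignment, which is essentially what equivalence checkers do for digital designs. A tiny sketch (the identities chosen here are just familiar examples):

```python
from itertools import product

def holds_for_all(prop, nvars):
    """Check a propositional identity over every truth assignment."""
    return all(prop(*vals) for vals in product([False, True], repeat=nvars))

# De Morgan: not (p and q)  ==  (not p) or (not q)
de_morgan = lambda p, q: (not (p and q)) == ((not p) or (not q))

# Distributivity: p and (q or r)  ==  (p and q) or (p and r)
distrib = lambda p, q, r: (p and (q or r)) == ((p and q) or (p and r))

assert holds_for_all(de_morgan, 2)
assert holds_for_all(distrib, 3)
print("both identities hold on all assignments")
```

For n variables this is 2^n checks, so it only scales as a demonstration; real hardware equivalence checking uses BDDs or SAT solvers to cover the same space implicitly.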

The mathematical concepts themselves are as empirical as anything else. Probably more so, given they’ve had more validation than techniques many empiricists use in experiments. The statisticians and experimenters arguing among themselves about the validity of certain techniques probably don’t have the slightest doubt in basic arithmetic or algebra. It’s clear, given that they rely on math in their experiments to justify their own claims. Gotta wonder why they doubt these maths work only when the inputs are numbers representing programs.

                                                              1. 4

                                                                A few quick comments:

1. One serious problem with empiricism (or any claim about “reality”) is that it’s always conditioned (and usually unconsciously) by cultural values, politics, language, and more subtle aspects of human cognition. If you look, it’s easy to find plenty of historical and contemporary examples of people claiming empirical proofs of “facts” which are (to our way of thinking) absurd, repugnant, or both. Note that this is a separate concern from what philosophers call the problem of empiricism, which is basically that the decision to draw conclusions from any finite amount of data is an arbitrary one and thus the conclusions incomplete. That’s like what formal methods people say about unit tests.
                                                                2. Mathematical practice predates the “Enlightenment” and what historians call “modern science” by some millennia. When Gauss called mathematics the “Queen of the sciences” he meant the older sense of the word “science” as an organized body of human knowledge, without any specific constraints on organizing principles. I would say that modern mathematics is somewhat unique in rejecting empiricism and insisting on symbolic, axiomatic reasoning.
                                                                3. What’s empirically durable are the relations between formal systems of symbols and various kinds of measurements of more concrete phenomena. We should certainly hope that these would apply to our artificial computing machines at least as well as messier “natural” phenomena, since we intentionally build them to behave according to your systems 1-4 (and 5 to some extent).
                                                                1. 2

                                                                  What I have been saying, and will continue to say, is that many of the claims made in papers using formal methods are intellectually dishonest and are something that soap powder manufacturers would not be allowed to claim (by the advertising standards authority).

Rather than respond to what I am saying, you appear to be branding me as anti-formalism in general. This is an effective technique for getting others to ignore what I am saying, and is as intellectually dishonest as much of the published formal methods work in software engineering.

                                                                  1. 2

Rather than respond to what I am saying, you appear to be branding me as anti-formalism in general. This is an effective technique for getting others to ignore what I am saying, and is as intellectually dishonest as much of the published formal methods work in software engineering.

                                                                    Whoa, now! I can only go on what you’re saying in the comments and the articles you link. Overstatements are just one thing you’ve said. I’ve never seen you approve of a single thing in formal methods as doing what it claims to do. You have dismissed quite a few of them. You said KCC was more accurate on C semantics but that could mean less of a screwup or waste to you. You also state a preference for empirical methods over them when I’ve argued some of their foundations are themselves empirical with massive sample size. Given what you say, it’s reasonable to think you’re anti-formal methods instead of just trying to help them improve their accuracy or trustworthiness. It was only recently that I saw you use more narrow complaints like cost that mean you might accept some form of formal methods. I had to look at that in light of your other comments here, though, when trying to interpret them.

                                                                    So, which formal methods techniques or projects in software have you liked, would build on empirically, and why? Which logically-deduced claims in empirically-verified logics will you trust over experiments using less-verified methods that claim the same things? If I’m misinterpreting your position, those would be a nice start of a better interpretation.

                                                                    1. 1

You are a true believer, and membership in this tribe invokes a response to those who criticize what you believe in. Intellectual dishonesty is not the appropriate term to apply to your approach (though it is the term I would use for much formal methods work): drinking the Kool-Aid, tinted glasses, blinkered view, something along those lines. Even when I say something positive (e.g., KCC), you attribute negative intentions.

                                                                      Anyway, you are honestly searching for answers and reading what I have to say (which is a cut above most). But, I am not inclined to debate a true believer.

                                                                      1. 1

                                                                        re true believer and part of a tribe. Let’s test us empirically. :)

An empiricist will collect data for and against the various methods out there. There have been more empirical studies done without formal methods, so I’d expect you to have more data on them. You do. There have also been lots of case studies, on individual projects and on piles of code, for static analyzers, model-checkers, and provers. These collectively show formal methods work across a lot of situations for a lot of properties. They’ve been pretty consistent, too, on what kinds of problems they catch. You don’t report or account for any positive findings in these conversations, or in the write-ups that make Lobste.rs, or in the paper collection on your website, that I could tell. You do describe the negatives in great detail. Your mention of KCC matched that established pattern, having almost none of the supporting or differentiating detail your takedowns do. So, my interpretation of an empiricist discussing formal methods without ever crediting positive experimental results is heavy bias against those methods. You also may only count specific types of experiments as evidence, which filters out most results in formal methods.

Now let’s look at what I’m doing. I did two write-ups arguing for applying every method of assurance, because we can’t trust any one of them to do it all. I’m heavily in favor of high-assurance certifications that combine them. I submit esoteric methods for assurance that people might miss, which are mostly formal methods since those are missed most in these forums. I also submit clever informal methods like this, this, and this that you may not see due to lower votes. On the skeptical side, I also called for tiny checkers to find errors in proof terms, exhaustive testing of both formal specs and provers, checking program logics against empirically-vetted ones (e.g., set theory), avoiding overselling so as not to repeat the AI Winter with formal methods, and even warning newcomers that maybe they shouldn’t try it at all. That’s more Skeptic Magazine than true-believer talk.

What I do have from my background is a bias in favor of them, which I counter by involving skeptics like you on specific submissions likely to have errors my bias will miss if I’m in a hurry. That’s worked out. My consistent recommendations on this forum are empirically-backed safe languages and automated testing, combined with lightweight formal methods like Design-by-Contract for interface checks or TLA+ for concurrency. A true believer would recommend Coq and Verdi, but the others have better cost-effectiveness based on prior experiments. I’d say the true-believer hypothesis gets a null result here. More like “biased with false positives, but also inviting constant peer review.”
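For anyone unfamiliar with the Design-by-Contract style mentioned above, it can be approximated in most languages with very little machinery. A minimal Python sketch (a toy stand-in for Eiffel’s require/ensure or SPARK’s contracts, which check these far more rigorously; all names here are made up):

```python
def contract(pre, post):
    """Attach a precondition and a postcondition check to a function.
    pre takes the arguments; post takes the result plus the arguments."""
    def wrap(fn):
        def checked(*args):
            assert pre(*args), f"precondition violated: {args}"
            result = fn(*args)
            assert post(result, *args), f"postcondition violated: {result}"
            return result
        return checked
    return wrap

@contract(pre=lambda x: x >= 0,
          post=lambda r, x: r * r <= x < (r + 1) * (r + 1))
def isqrt(x: int) -> int:
    """Integer square root by simple search (clarity over speed)."""
    r = 0
    while (r + 1) * (r + 1) <= x:
        r += 1
    return r
```

The point of the technique is that interface violations fail loudly at the boundary where they happen, instead of corrupting state and surfacing somewhere far away.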

                                                                        “Anyway, you are honestly searching for answers and reading what I have to say (which is a cut above most). But, I am not inclined to debate a true believer.”

                                                                        I didn’t actually want a debate. I wanted two things:

                                                                        1. In this submission, I pointed out that specific maths have so much empirical evidence behind them that anything using them as they’re used successfully in the field should have more weight than most empirical experiments with smaller samples or error-prone math. Given the logic is stronger empirically, I wanted to know when you’d trust a logical result as much as or more than experimental evidence for a method for software correctness. I’m not even talking in general where you have to factor in things like requirements errors. I’m talking about logically modeled properties that provers say a piece of software has with a tiny TCB checking the analysis in a logic with massive empirical proof of accuracy.

2. Since you seemed surprised in the tangent, I asked what specific formal methods you believe in from reading empirical data on their effectiveness. I’ve seen and posted such data for requirements, models, code, testing, repair, and so on. If you’re not anti-formal-methods, you’ll probably have a list of methods, with corresponding papers/articles you think well of, that researchers like you might agree with. If there’s mutual agreement, leveraging such things might get support from your crowd in the form of studies corroborating or rejecting the methods with review and/or experiments. One can’t begin to know what you’d work with or build on until you tell us which ones you found experimentally justified and why. My list so far has you liking KCC’s realistic semantics for C, with no comments on the assurance of its tooling (esp. the compiler). That’s it.

                                                                    2. 2

Would it be reasonable to say that there’s some evidence that code verified with formal methods is, based on our empirical observations, more likely to be correct than code not verified with them, all other things being equal? It doesn’t mean that it will be correct, or that we get the same benefit from all formal methods, but it means that we can be enthusiastic about FM without being blind to the limits.

                                                                      I believe in this for a few reasons:

                                                                      • We’ve seen mathematically elegant ideas in CS crash and burn before. The main one that comes to mind is N-version programming, which seemed wonderful in theory but failed in practice.
                                                                      • We’ve seen some evidence that code using formal methods still has bugs, but fewer bugs than similar systems without it.
• It sidesteps the whole “is math abstract or empirical?” question, which appeals to the Wittgenstein fanboy in me.
                                                                      1. 2

If you put effort into verifying/checking code, it will be more reliable than if you don’t. The question is: which technique is most cost effective, i.e., gives the most bang for the buck in reducing fault experiences?

N-version programming has many benefits; it is the idea of it being a magic bullet that crashed and burned.

                                                                        1. 1

                                                                          The main one that comes to mind is N-version programming, which seemed wonderful in theory but failed in practice.

I’m still not sure it failed, unless we mean the naive, original form. When researching security via diversity, I remember one paper pointed out that people were doing multiple implementations that often had the same algorithms, dependencies, and so on. They were too similar, even when different people wrote them. So, the researchers tried an automatic approach to swapping out functions with implementations that were structurally different but equivalent. Then, flaws due to the structure of one might not affect the other.

Multiple versions behind a similar interface often don’t have the same problems in hardware and OSs, probably for this reason. Sometimes they do, but often they don’t. I think ensuring the implementations are structurally and/or temporally diverse is at least worth some further research in multiple areas before I’ll call the N-version concept as a whole a failure. I think the initial version of the idea just missed a key requirement, as those experiments showed.
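The idea can be sketched concretely: run structurally different but intended-to-be-equivalent implementations and take a majority vote, so a structural flaw in one version is outvoted by the others. A hedged toy in Python, not the published mechanism:

```python
from collections import Counter

# Three structurally different integer-square-root implementations.
def isqrt_search(x):
    """Linear search upward."""
    r = 0
    while (r + 1) ** 2 <= x:
        r += 1
    return r

def isqrt_newton(x):
    """Newton's method on integers."""
    if x < 2:
        return x
    r = x
    while r * r > x:
        r = (r + x // r) // 2
    return r

def isqrt_float(x):
    """Float-based estimate with integer correction steps."""
    r = int(x ** 0.5)
    while r * r > x:
        r -= 1
    while (r + 1) ** 2 <= x:
        r += 1
    return r

def n_version(x):
    """Majority vote across the diverse implementations."""
    votes = Counter(f(x) for f in (isqrt_search, isqrt_newton, isqrt_float))
    answer, count = votes.most_common(1)[0]
    assert count >= 2, "no majority: versions disagree"
    return answer
```

The critique from the research still applies here: if all three versions had copied the same flawed algorithm, the vote would agree on the wrong answer, which is why structural diversity is the key requirement.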

                                                                          “We’ve seen some evidence that code using formal methods still has bugs, but fewer bugs than similar systems without it.”

Yep. I’ll add that most of the case studies also say they caught bugs that review and testing missed. That’s usually specifications and model-checking. Then, the proofs caught the deepest bugs but required the most effort and simplifications. That should become the baseline belief about them, given it’s been pretty consistent going back decades. What changes is tooling applicability to platform/language X, labor cost, time-to-run, effectiveness, and so on. Mostly getting better over time, but still with limitations.

                                                                      2. 2

Yeah, that all sounds pretty sensible to me. Just because you can describe something mathematically doesn’t mean that’s how the real world behaves. Basically, you can prove something is logically inconsistent, or gather evidence that it is logically consistent, but that doesn’t mean your computer is even fast enough to do it. Theory and practice both exist, and I think it’s reasonable to assume they both exist because they’re both useful. I think certain domains also lend themselves more to formal verification. As I understand it, that’s because the cost of testing is high, or the cost of failure is high, or both. If a program costs you a million dollars every time it fails, you’d better believe formal methods are going to be a priority.

                                                                        1. 3

                                                                          That last point also ties into discussion of whether there should be software liability. The reason formal verification got popular in hardware probably traces back to the Intel FDIV bug that cost them a pile of money. That’s because they might have to do recalls or something on hardware. With software, they try to maintain EULA’s that say they’re not responsible for their own work. If that changed, we might see an uptick in use of both formal verification and other assurance techniques that covers at least what regulations or court rulings required them to cover.

This isn’t hypothetical, though: the market responded with activities such as formal verification, model-driven development, and static analysis in response to both TCSEC and DO-178B (now DO-178C) requirements. They’ll definitely do it if failures cost a fortune. Due to basic economics, the tooling will improve and get cheaper as demand and volume go up. EiffelStudio and Ada/SPARK with contract-based techniques are probably the highlights here: they can knock out tons of errors but don’t cost well into five digits a seat. That is, if I’m recalling the pricing correctly.

                                                                          1. 1

                                                                            With software, they try to maintain EULA’s that say they’re not responsible for their own work

                                                                            This is probably going to change with the widespread adoption of tools driven by artificial intelligence.

Somebody must be accountable for these errors, and they can always be traced back to humans, even once we are able to create an artificial general intelligence.

                                                                            1. 1

                                                                              That’s unlikely. It’s a political problem like a lot of the “technical” problems. The software companies in the U.S. pay bribes to politicians to pass laws that benefit them or not pass laws that hurt them. They’ve fought both acquisition policies demanding high security and liability laws making them responsible for their work. An AGI can notice anything it wants but corrupt politicians will just ignore it.

                                                                              In other areas, though, AI’s could be helpful like this one that checks contracts.

                                                                              1. 2

                                                                                It’s a political problem like a lot of the “technical” problems.

                                                                                Just like the first death caused by a self driving car.

Since you cannot punish software, someone else needs to pay for these deaths.

Maybe the CEO, or the stockholders.
And I guess that all the politicians who now vote to protect the industry will have a hard time facing the parents of a victim and surviving politically.

                                                                                AI is a great technology, but it’s not yet ready to decide over humans.
                                                                                And we should not be fooled by marketing and hype: we should fight business-aided ignorance by sharing knowledge of what AI is and how it works.

There is always at least one person accountable. That person cannot hide behind a black box.

                                                                                1. 2

That gets into a bigger problem, though. Companies might be held accountable when things don’t work out. If it’s a big company, they’ll lose some money instead of the conspirators doing prison time. Often, the conspirators will be rewarded with future, high-paying jobs if they were acting in the interests of their company when they made the gamble. It’s those who misled the company itself that might get severe punishment, unless they had a golden parachute built into the deal. The result may get those self-driving cars off the streets, force improvements on them while people are still dying/suing, and/or get companies to invest in more lobbying or defense attorneys to reduce liability.

                                                                                  So, it’s a possibility that depends a lot on what companies are involved, the nature of the accidents, and so on. Personally, I agree that the unknowns in self-driving cars make them a risk that has high likelihood of negative consequences for users or suppliers. I can neither be sure nor make a general claim about technological harm, though, given all the corruption and successful defenses by predatory companies in the past. I will give you this excellent article that talks about the problem from a different side than usual. I love the example in it of how people will react to autopilot disengaging when a bus is coming at them. Scary shit. ;)

                                                                                  1. 2

                                                                                    WONDERFUL READ.

I just added a link to it in my article about AGI: I didn’t know it was called “the paradox of automation”, but the concept was already there as the only true threat to humanity from an artificial general intelligence.

                                                                      1. 0

                                                                        This paper spends a lot of time pointing out possible reasons there might be faults in the GNU Multi-Precision library and then suggests that if an arbitrary-precision integer library was implemented using a language co-invented by the authors of this paper, the code would be correct.

                                                                        Lobste.rs needs a vanity-research tag.

                                                                        1. 9

                                                                          We might need a fantasy-skepticism tag, too, if you keep selectively reading papers or ignoring prior work like that. Right after the section you mention, they describe the foundation of their solution: the ML-like language with formal specifications and the solvers in Why3. The prior work on ML-like languages indicates using them lowers defect rates compared to C. The prior work with Why3, by itself or with stuff like SPARK on top, caught plenty of defects of the exact nature they’re describing, such as range errors, aliasing, etc. It’s also cathedral-style, where a focused team takes it a component at a time instead of drive-by contributions. That by itself has increased quality in prior projects. Based on prior data, any of these should make a safer library; a combination even more so.

                                                                          Re-skimming it, I find that they attempt to address some of the problems that come to my mind:

                                                                          1. Mismatch between safe language and target. They use an imperative subset of Why3’s language which has already been integrated with C, Java, and Ada with real-world use. Lots of errors caught but few introduced. Subset itself is close to C. Increases confidence.

                                                                          2. The C memory model can have errors. They make a simple one for just the features of C they need. Less complexity = fewer errors. Legacy code doesn’t get machine-checked against a memory model at all.

                                                                          3. Humans need to inspect the code it produces to find errors in their scheme. They include numerous examples of the WhyML code side-by-side with extracted C that show extracted code is quite readable. That means reviewers can check it. Since it’s C, the reviewers can also apply a huge ecosystem of field-proven tools to find bugs.

                                                                          4. Provers could get something wrong. Although this area needs more work, this project uses four provers mainly so each can catch things the others miss. A side effect is an increase in trust for whatever they all agree on. They said that data is hard to know precisely for some reason; fixing that will help in future experiments. Also, they corroborated the first part when finding that specific provers were really good at handling specific properties. Any empirical confirmation or use of provers could increase confidence by tying problem types to provers shown experimentally to handle them well.

                                                                          5. Difference in integer types for the adder. They use two one-limb additions whereas GMP uses a two-limbed addition. That leads to a slowdown. They say they plan to switch, but I wonder why they broke compatibility there in the first place. Best case, it could be as simple as them not knowing it was important to do the addition a specific way; there’s a hint of that in the problem statement, where a lot of optimizations are built into GMP that people might not know about. Worst case, it was too hard to verify in time for paper submission.

                                                                          If ignoring backward compatibility, the stuff in the related-work section is stronger than this work, especially the work using HOL4 with its small checkers in the TCB, or using SPARK, which is field-proven in many projects. A drop-in, compatible replacement for GMP makes sense, though. This work, even with its deficiencies, inspires more confidence than the methods used to build GMP, with their associated vulnerabilities and difficulty of analysis.

                                                                          1. 6

                                                                            You are highly biased against formal verification research and have already made several incorrect/misinformed claims about it (in blog posts). I respect your C expertise, but I think people should be aware that you are extremely biased in this case.

                                                                            Besides, frankly, this “Vanity Research or Real Project” blog post is insulting, and I think you could avoid that. You have a different philosophy about scientific research than what is commonly accepted (you seem to disagree with formalism and believe only in experiments, which is an extreme but possible position); that does not justify insulting or “naming and shaming” people who recognize, accept and follow other, widely accepted scientific processes.

                                                                            1. 2

                                                                              “you seem to disagree with formalism and believe only in experiments”

                                                                              The amusing thing is that his experiments use formal methods in the form of numbers, algebra, probability, and propositional logic (i.e., the computer gates they run on), and possibly model-checking (i.e., the concurrent algorithms they use). He seems to trust math, given he derives empirical results by plugging data into such mathematical formulas. That’s what people do with formal verification. Except there’s both more of a faith-it-works aspect and higher error margins with the statistical math versus the simpler forms of logic used in program verification. Maybe there’s something I’m missing, but empirical methods seem highly dependent on math functions and on reasoning about the output of math functions. For some, especially probabilistic versus simple logics, it also looks to take more trust to believe statistical models than logical ones.

                                                                              I’ve been thinking on that since he first started talking about the superiority of [math-based] methods he called empirical over math-based methods others called formal. I dug up some articles today in what spare time I had to further explore this contradiction. I was guessing core areas of math/logic could probably be considered empirically proven, given all the field data confirming the basic primitives and rules in them. The stuff I was finding was corroborating that, or vainly trying to reject falsifiability due to math’s abstraction level. I might post some references or summaries on this topic in the future when I have more time and sleep.

                                                                              1. 3

                                                                                Let me mention that I appreciated your replies in this thread. I found the general skeptical tone of the discussion baffling, but I think that you did an excellent job addressing concerns and discussing them with patience.

                                                                                1. 2

                                                                                  Thank you. :)

                                                                                  The skepticism comes mostly from his extreme bias, but there are kernels of truth in it that make me address this stuff:

                                                                                  1. Overselling of what’s accomplished in some projects. For instance, his blog correctly mentions CompCert was not a verified C compiler: it was a verified Clight compiler, IIRC, with an unverified front-end for C or a C subset. I think they added something formal to the front-end later. Another is when some people talk like there can be no errors because they did a portion formally. Countering that stuff is worthwhile.

                                                                                  2. There’s also a habit of not doing rigorous testing after something was formally specified or proven. The old Gutmann paper had a lot of examples of verified programs failing after someone so much as ran them once or did some testing. The specs or verification methods can be wrong. That’s why simultaneously doing formal and empirical methods makes sense. Of course, the high-assurance field has been advocating that longer than any other, with formal proof one tool among many. It usually catches the deepest bugs, but other methods are more cost-effective for other bugs.

                                                                                  3. The results in many of these experiments aren’t reproducible or cross-checked enough. It helps that most are using the same FOSS software for the proofs, with a lot of the work published as open source. There have also been efforts he doesn’t assess to reduce the TCB of checkers and extractors down to nearly nothing. There’s still a lot of room for improvement on experiment design, reproducibility, and review, though. My bet is that we’ll see errors in some of this work as other parties either take a close look at it or try to apply it to lots of real-world software.

                                                                                  4. Cost. Formal methods, depending on how heavy, can add significant cost in terms of fewer features and longer time-to-market/completion of a project. Specific ones in industry, like Astree, can also cost money. This is a tangent here, given this is just a research report showing how formal methods were applied to a piece of code, but it is worth thinking about when deciding whether to use them or how much. I usually advocate lightweight formal methods combined with review and automated testing if we’re talking regular software developers with limited interest in or resources for highest correctness.

                                                                                  Although still waking up, those are four reasons that come to mind for why you might see a skeptic in formal methods threads. I didn’t think “pixie dust” was warranted on this work given they picked battle-tested foundations that were FOSS to improve a problematic piece of software. It’s one of the better applications imho. If people want more empirical data, they could have some people code up some libraries like this in C and a group do it in Frama-C, SPARK, or WhyML. Then, run a bunch of static analyzers and fuzzers on it looking for the kind of errors these tools prevent. If our side is correct, they’ll see a drop in bug count. I got money on that given we’ve already seen a drop in bug count every time these methods are used. ;)

                                                                                  1. 3

                                                                                    Nothing wrong with addressing concerns.

                                                                                    For the record, I don’t agree with those claims that Compcert (or related work) is oversold. I think that it is perfectly fair to describe Compcert as a “formally verified C compiler”. The Compcert website is clear about which parts of the compiler have or don’t have a formal specification, and the publications about this work are explicit about what precisely has been specified or proven.

                                                                                    I think that the accusation of overselling comes from taking a scientific work out of its context (a scientific community discussing software verification) and blaming it for generating unrealistic expectations outside this context. I can see how someone misinformed about what formal specification is could believe that “formally verified” means “will not ever have a bug” without asking themselves what specification it was verified against; but I don’t think you can blame Compcert for that.

                                                                                    I agree of course with your points 2, 3 and 4. But I think it’s also important to point out that those should not necessarily be expected of academic work. Academics who are trying to push the boundary on verified software may find out how to build interesting programs without being experts on how to do production testing of those programs, and I don’t think it would be reasonable to expect them to do both – assuming that industrial/production testing methods in the area have not been made publicly and easily available. “There was no production-grade testing of this system yet” is a perfect reason for an engineer not to jump ahead and use the code proposed by a verification effort, but I don’t think it is a valid criticism of the research work.

                                                                                    Finally, I think there is something interesting and delicate to be said about the notion of TCB. In my experience, people who are discovering computer-assisted theorem proving tend to over-emphasize the “who guards the guardians?” aspect (what if there is a bug in the verifier?), while depending on the sort of verification technique used (I’m thinking of proof-term-generating proof assistants or evidence-generating automated provers) it is usually extremely safe to assume that a practical or theoretical bug in a proof assistant would not result in a mistakenly-verified program being written in good faith (I’m not talking about adversarial contexts) – sort of like how the (very real) existence of bugs in text editors does not mean it is important to worry about the source code we write not being the source text we expect. On the other hand, there are other tooling-related parts of verification efforts where mistakes can more easily turn into mistaken assurance; for example, I think it’s right to worry about extractors.

                                                                                    1. 2

                                                                                      You seem to be saying that researchers don’t need to be intellectually honest about what they claim in papers.

                                                                                      Is it OK for doctors to claim to have found a cure for a disease when they have solved part of the problem? Can mathematicians claim to have proved a theorem when they have only proved a subset of it?

                                                                                      Soap powder manufacturers are held to a level of honesty about what they claim (by the advertising standards authority).

                                                                                      What makes formal methods researchers so special, that they can make claims that soap powder manufacturers are not allowed to make?

                                                                                      The claims made in the Compcert paper are not perfectly clear, at least to somebody who has spent several decades working on compilers and formal systems (i.e., me). They have to be carefully unpicked to reveal that some of them are hollow.

                                                                                      1. 2

                                                                                        I do certainly think that researchers need intellectual honesty.

                                                                                        The point I was making is that academic publications are mostly intended for peers (other experts in the field), and the way they formulate their caveats may differ from what you would expect when they are writing for the general public (which also happens and is quite different) or, for example, for industrial people working on critical software or for certification agencies. To reuse your analogy, doctors talking to each other about the effect of a medication may use terms that wouldn’t be effective communication when talking to the general public – to the point of being perceived as misleading if taken out of context.

                                                                                        Of course, it may also happen that people oversell their claims (the publish-or-perish system that some academics must live under is an external pressure making this more likely to happen), and I think it is perfectly reasonable to be vigilant – not only about actual malpractice, but also about good-faith people who oversimplify and oversell. I believe that in the specific cases of the work that you are trashing (Compcert, Verasco, and now this unrelated Why3 research), your accusations are not reasonable. The fact that you decided, in addition, to be extremely insulting makes it very difficult for experts to have a hope of a constructive conversation with you.

                                                                                        1. 1

                                                                                          These papers may be targeted at other academics, but their headline claims get posted and widely cited (by the authors and others).

                                                                                          Soap powder manufacturers are under pressure to make money by increasing sales, does this pressure make it ok for them to make overblown claims about their products? Academics work in a publish-or-perish environment, does this make it ok for them to make overblown claims?

                                                                                          I certainly did not mean to be insulting. However, the researchers involved are likely to have been very uncomfortable with my soap powder comparison or with the suggestion that their work be labeled as vanity research (what label would you apply to work that invents something claimed to have various properties/characteristics, where the claims are based on the researchers’ preferences and not experimental evidence?)

                                                                                          1. 3

                                                                                            In my experience, overblown claims in the field of formal software verification are fairly rare, because the research paper format requires authors to precisely state their claims and give experts the necessary material to understand precisely what has been achieved. (There have been, of course, mistakes of various sorts, although this is getting rarer with the adoption of proof assistants to mechanically verify the formal arguments – a level of rigor that exceeds that of many other scientific fields.) Again, titles and abstracts might simplify claims (it is their purpose to be synthetic), but the precise claims are in the articles themselves, and this is what should be looked at – unlike soap powder advertising, paper abstracts are not aimed at the general public but at other researchers in the field.

                                                                                            If you don’t mean to be insulting then maybe you should stop comparing an entire sub-field of research to “soap powder manufacturing”, and call perfectly sensible work (such as the one presented in this thread) “vanity research”. Generating verification conditions and discharging them with SMT solvers seem like a robust methodology, rather than “researcher preferences” – you seem to be misunderstanding or misrepresenting this research.

                                                                                      2. 1

                                                                                        I think that it is perfectly fair to describe Compcert as a “formally verified C compiler”.

                                                                                        The reason I disagree on that point is the formal semantics. Formally verified parsers for stuff that’s simpler than C were hard enough work. The formal semantics of C usually had features missing or trouble handling undefined behavior. It wasn’t until just a few years ago that realistic models of C, with and without undefined behavior, were formally specified. So, we have two reasons a verified C parser would be significant: (a) formal specifications or verifications of tedious, text-processing algorithms often find errors; (b) C itself had been really hard to specify.

                                                                                        So, a “verified compiler”… as in everything is verified… for a hard language should cover each of the hard problems with a formal specification and verification. If it leaves one out, it’s not a verified compiler: it’s a partially verified compiler. I don’t think more accuracy in titles would hurt here. They might do “A Mostly-Verified Compiler for a Useful Subset of C” with a subtitle in smaller print or parentheses saying “we verified everything except the front-end.” People would still be impressed, with nobody misled. I should note that I was misled by the original title, summaries, and discussions that talked like it was fully verified. When I read the paper(s), I got more accurate information that required me to change my claims a bit. The only one that really handled C’s semantics was KCC: a C semantics in the K framework. Derek actually gives them credit for realistic semantics on his blog.

                                                                                        “I think that the accusation of overselling come from taking a scientific work out of its context (a scientific community discussing software verification) “

                                                                                        They are promoting it for use in DO-178B software in industry. It’s proprietary software that AbsInt sells. That should probably be considered when assessing what terminology they should use for target audience. Of course, it would depend on when they first committed to doing that and what they were writing at the time.

                                                                                        “Academics that are trying to push the boundary on verified software may find out how to build interesting programs, without being expert on how to do production-testing of those programs, and I don’t think it would be reasonable to expect them to do both”

                                                                                        That could be fair if it’s hard to test. I’ve always said they could do generation of tests from correctness goals or properties, with runtime checks. Even Data61 is now using property-based testing on specs in at least one project. Generating some checks by hand in the final code is straightforward for a lot of properties, with a fuzzer able to deliver the input. I think they should definitely do more on the empirical side, confirming what their specs say with tests. I’m not expecting the full battery of testing a high-assurance product would get, though, if the researchers’ stated goal is just working on formal methods. I’d like to see the tests mainly to check their specifications and tooling for errors.

                                                                                        “ people that are discovering computer-assisted theorem proving tend to over-emphasize the “who guards the guardians?” aspect “

                                                                                        I agree people worry too much about it. I’d like some specific tools, esp. checkers, done in a foundational way. Past that, I don’t care as much, since we’re mostly just knocking out problems without believing the result will be perfect. I will note there have been errors in solvers significant enough that they led to the evidence-generating ones in the first place. A lot of the speed-demons are already complex, with some written in unsafe languages. The results of that might get worse as they’re accelerated with multicores, GPU’s, FPGA’s, etc. to run through programs faster or with more lines of code. I’m for a pile of annotated code as a benchmark to throw at all the provers, individual projects and version to version, to get an empirical idea of what they’re catching and missing. Plus, trusted checkers for all of them.

                                                                                        “for example, I think it’s right to worry about extractors”

                                                                                        Total agreement there. The good news is there are quite a few projects verifying them. There’s also an extractor for the C language now with a lower TCB than its predecessor for ML. Before I saw it (and still, a bit), I was thinking of something like Tolmach et al’s ML-to-Ada-or-C translator being verified, so people can go from verified extractors to ML, to verified translators to Ada/C, with their tooling running verified compilation. Like in hardware, formal equivalence checks are run on each step for a range of values on every path. A verified translator straight to C would be nicer, but I’m not a proof engineer: I can’t say how hard formal logic to C would be versus ML to C.

                                                                                        1. 3

                                                                                          You claim three limitations of Compcert:

                                                                                          1. It only covers a subset of C.
                                                                                          2. The parser is not verified. (An expert would add: nor the CIL-like post-parsing elaboration pass.)
                                                                                          3. Undefined behaviors are not fully modelled.

                                                                                          (1) is certainly true. I am not convinced that this can be described as “overselling”, though. MSVC only covers a subset of C (plus some extensions), and most people would not object to calling it a “C compiler” – although it is certainly important to point out, in more detailed descriptions of it, that there are limitations.

                                                                                          I don’t agree with calling (2) and (3) overselling, for technical reasons that merit discussions.

                                                                                          For (2), the question is: what would a verified parser even be? If you think about it, you will realize that it is very hard to define a formal specification for a parser that is not an implementation (in particular: none existed before Compcert was created), and same for the elaboration pass – or assembly-level printing for that matter. If there is no specification other than the implementation, verification does not make much sense anyway. The best work on what “formally verifying a parser” would mean has been done precisely in the context of Compcert (Validating LR(1) parsers, Jacques-Henri Jourdan, François Pottier and Xavier Leroy, 2012), and what it does is produce a formal “grammar” description that you have to manually inspect (and convince yourself that it corresponds to the grammar of C you have in mind… which is no small task) along with a parser automaton that is formally proved to verify this grammar. If one small part of a tool has no specification other than its implementation (this is not specific to the Compcert work but a general property of C compilers and programming languages), I don’t think it is misleading to call the whole tool “verified” – or is it perpetually invalid to call any compiler “verified” because the parsing phase has no reasonable specification?

                                                                                          For (3), there is a misunderstanding in your criticism: a semantics of C does not need to model every form of undefined behavior to build a verified compiler, or more generally to allow all behaviors permitted by the standard. It is perfectly valid to use, as Compcert does, a reference semantics that is more defined than what the standard says (typically: it defines some overflow behaviors and fixes expression evaluation order). This is of course exactly what “implementation-defined” means in the standard, but it is also perfectly valid to do the same for undefined behaviors: giving a defined meaning, in an implementation, to a program with undefined behavior is perfectly valid – given that any behavior is valid for this program. We then have to check (informally) that the semantics given by Compcert is, indeed, stronger than the standard’s (it does not make different, incompatible choices in some situations). But it is a misunderstanding to claim that a compiler’s semantics is incomplete because it does not model all undefined behaviors. (In fact, the vast majority of C compilers out there will work with a stronger semantics than strictly required by the standard, if only because it makes the compiler simpler.)

                                                                                          On the other hand, a fine-grained modelling of undefined behavior matters if you want to draw conclusions that are valid for any C compiler, and not just for one fixed implementation. This is why it is an important research endeavor if you want to build static analysis tools for C that draw conclusions valid for any C compiler. The Verasco project, for example, is a static analyser that uses Compcert’s specification, and its results are only valid if you compile the program using Compcert – Verasco can only check that a certain class of runtime error will be absent from the executable produced by Compcert (or any compiler respecting Compcert’s stronger semantics). Other works, such as KCC (which is more recent than Compcert) and more recently Robert Krebbers’ work, have produced more complete formal specifications of very-undefined semantics for C.

                                                                              2. 2

                                                                                Cargo cult arguments against C are popular lately. “X is written in C, therefore it’s probably buggy and needs to be rewritten in Y.” Always lots of hedging and talk about what could be, but usually not many real bugs mentioned.

                                                                                1. 5

                                                                                  Your accusation of this work being a “cargo cult argument” is highly unfair. The authors built an implementation of a GMP-like library and defined a specification for it, such that a computer can verify that the implementation matches the specification. This is a very strong argument in favor of the correctness of their implementation, not cargo-culting.

                                                                                  This sort of work is possible to do in C (see Frama-C, VeriFast) but fairly difficult, due to the complexity of the full semantics of the C programming language. Instead, the authors work in a language designed for formal verification, whose semantics is easier to reason about – and with good tooling support for automating verification – writing their program in such a style that they can translate it back to C.

                                                                                  For a given specification of what a piece of software should do, it is reasonable to assume that a program that has not been verified against this specification has a higher likelihood of having defects than a program that was verified against it. There is nothing cargo-culted in that.
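                                                                                  As a hypothetical illustration of “implementation matches specification” (the code and names are mine, not the authors’): a Why3-style postcondition can be mimicked in C with a runtime assertion, except that a deductive verifier proves the property for all inputs statically, instead of checking it one run at a time.

```c
#include <assert.h>

/* Toy integer square root. The assert states the specification:
   r is the largest integer with r*r <= n. Tools like Why3 or Frama-C
   discharge such contracts for every possible n; here the contract is
   only checked dynamically, on the inputs we happen to run. */
unsigned long long isqrt(unsigned long long n) {
    unsigned long long r = 0;
    while ((r + 1) * (r + 1) <= n)
        r++;
    assert(r * r <= n && n < (r + 1) * (r + 1));  /* postcondition */
    return r;
}
```

                                                                                  The difference in assurance is the whole point: a passing test suite samples the spec, a machine-checked proof covers it.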

                                                                                  1. 3

                                                                                    There have been bugs reported in GMP already. Range errors are also common vulnerabilities in the general case. How much stronger a case needs to be made, at that point, that immune-by-default is the better position?

                                                                                    1. 0

                                                                                      Ah, the magic pixie dust of mathematics, which “must be correct” yet is just as buggy as software.

                                                                                      1. 2

                                                                                        That’s correct. They can screw them up. It’s why I referenced prior work using the language and tool to find defects in software. I thought your view was that experiments or field data showing a method gets results, esp consistently on similar problems, increase the confidence we have in it working. If so, then you should have at least as much confidence in Why3’s math-based approach that got results as you would in a purely code-based approach.

                                                                                        A tool that usually works is a tool that usually works. Nothing more but nothing less. Do explain further if your perspective disallows as evidence field application of a method for software correctness by multiple parties over time.

                                                                                        1. 1

                                                                                          Yes, if experiments or field data show that a method gets results, I’m all for it (of course the cost/benefit may not work out, but that’s another matter).

                                                                                          But “immune by default” applied to formal techniques is a magical pixie dust mindset.

                                                                                          Formal techniques, like many other techniques, find problems in code. How cost effective are they in comparison? I wish somebody would do some experiments to find out (the main obstacle is the likely very high cost).

                                                                                          1. 1

                                                                                            But “immune by default” applied to formal techniques is a magical pixie dust mindset.

                                                                                            I held off replying to this because I want to handle it in a separate thread. That is, empirical proof of the fundamentals of specific formal methods that (a) justifies/rejects trusting them in an empirical sense and (b) maybe helps derive empirical evidence for/against them as they’re used. The initial idea is having concrete examples and test patterns for every primitive, logical rule, etc. Obviously, rely on the simplest ones, as the formal methodists are doing with stuff like HOL Zero and the Milawa prover. The specifications and activities are vetted by potentially hundreds to hundreds of thousands of tests with diverse strategies as they’re employed. Another quick one is a scheme like @moyix’s that inserts specific flaws in code that formal methods are supposed to catch. Whoever uses them and that tool might let it run overnight on new code to generate background measurements of effectiveness that a central organization collects and reports for those studying effectiveness.

                                                                                            That concept will need more thought about not only the answers (how to test/review) but the right questions to ask (the what). If I put a proposal together, I’ll tag you in it if you’re interested in reviewing it.

                                                                                      2. 0

                                                                                        Is there any popular software that hasn’t had bugs reported already?

                                                                                        There are plenty of good reasons not to use C, but bugs that don’t actually exist aren’t one of them.

                                                                                        1. 2

                                                                                          “Is there any popular software that hasn’t had bugs reported already?”

                                                                                          That’s peripheral to my point. The point is we say don’t use C or trust C apps by default if one is worried about common defects, esp code injections. The reason starts with the fact that it was intentionally designed to let its creators hack together code on a PDP-11. Nothing more. To meet their goals within 1970’s constraints, they didn’t add any safety or maintainability features that were in ALGOL-derived languages on better hardware. Some other languages, esp Ada and the Modulas, did add safety and maintainability features. The C apps had lots of defects over time, with even experts bitten by them. The apps in the other languages had fewer defects due to their design, even when people had the same or less skill. Studies confirmed this for Ada, with the CVE list showing it for a lot of other languages.

                                                                                          Given that backdrop, both the logical and empirical evidence support that anyone wanting to get low bugs and high maintainability with little effort should avoid C by default unless they have really good reasons to use it. Even then, it may be easier to do a reference implementation in a safe, non-C language that a C deployment is derived from. It’s not even C bashing or anything from this viewpoint: it’s a BCPL derivative with rough semantics designed for unsafe hacking of small, non-security-critical programs on 1970’s hardware in benign environments. It was great for that. It was just never designed in the first place for what people are using it for today. (Something similar happened with HTML and Javascript, for similar reasons, caused by a killer app.) The burden of proof for defaulting to C for security-critical work has been on C promoters for some time, given all the evidence of its risk versus alternative language designs.

                                                                                          “There are plenty of good reasons not to use C, but bugs that don’t actually exist aren’t one of them.”

                                                                                          The average C app has plenty of bugs that alternatives either never have or rarely have, depending on their design. This tech can prevent them, too. GMP has had them. Writing GMP with alternatives would let it inherit their property of having either none of these bugs or fewer of them. So, they did. That’s the logic of it. Glad you brought it up, though, since looking at the bug reports shows everything from buffer overruns to these gems in v5.0.3:

                                                                                          “There are several problems in the multiply code that can be seen as miscomputations and/or ASSERT failures. These bugs have been triggered in our testing environment, but the probability for miscomputations using uniformly distributed random numbers is extremely small. These bugs have been around since GMP 5.0.”

                                                                                          “There are several problems in the functions for gcd, gcdext, and invert, causing miscomputations and/or ASSERT failures. These bugs have been triggered in our testing environment, but the probability for miscomputations using uniformly distributed random numbers is infinitesimally small. These bugs have been around since GMP 4.3.”

                                                                                          They not only have bugs: they have bugs that persisted over numerous versions, which they couldn’t eradicate with their informal reviews and testing methods. A good time to try something new with a track record of eliminating the kinds of bugs that trigger in situations they can’t foresee or test for.
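                                                                                          To see why “uniformly distributed random numbers” almost never trigger such bugs, consider a toy base-256 increment (my own sketch, nothing like GMP’s actual code): a full carry chain through n limbs only fires when every limb is 0xFF, which uniform random inputs hit with probability (1/256)^n.

```c
#include <stdint.h>
#include <stddef.h>

/* Toy little-endian base-256 increment. Returns 1 if the carry
   propagates out of the top limb (overflow), 0 otherwise. A bug in
   the carry propagation here would only surface on inputs where a
   long run of limbs is 0xFF -- which uniform random testing almost
   never generates, exactly as the GMP changelog excerpts describe. */
int limb_inc(uint8_t *limbs, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (++limbs[i] != 0)
            return 0;    /* carry stopped at limb i */
    }
    return 1;            /* carried out of every limb */
}
```

                                                                                          A deductive proof of the loop invariant covers the all-0xFF case for free; a random tester essentially never sees it.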

                                                                                      3. 2

                                                                                        If a C code base includes calls to unsafe functions like strcpy(43), sprintf(29), strncpy(27), or strncat(1) then it seems likely that one or more of them incorrectly computes the length of a string.
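                                                                                        A hypothetical sketch of the kind of length miscomputation meant here (mine, not taken from any particular code base): strncpy silently drops the NUL terminator when the source fills the buffer, so a later strlen or printf on the destination reads past the end.

```c
#include <stdio.h>
#include <string.h>

/* Classic pitfall: when strlen(src) >= n, strncpy leaves dst with no
   NUL terminator, and any subsequent string operation on dst becomes
   a buffer over-read. */
void copy_risky(char *dst, size_t n, const char *src) {
    strncpy(dst, src, n);            /* may leave dst unterminated */
}

/* snprintf always NUL-terminates for n > 0 and truncates safely. */
void copy_safe(char *dst, size_t n, const char *src) {
    snprintf(dst, n, "%s", src);
}
```

                                                                                        The buggy variant passes every test whose inputs happen to be shorter than the buffer, which is why counting calls to these functions is a reasonable proxy for latent defects.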

                                                                                        1. 1

                                                                                          then it seems likely

                                                                                          This is exactly what I mean. There’s either a bug or there’s not.

                                                                                          I might as well claim that it “seems likely” a rewrite of GMP will introduce regressions, so everybody should keep using the original in order to avoid them.