1. 2

    What if I don’t want to align to the adjacent line?

    1. 1

      Well, then you have to add a blank line in between.

    1. 4

      Geometrically, a square is a specialisation of a rectangle: every square is a rectangle, but not every rectangle is a square. For all s in Squares, s is a Rectangle and the width of s is equal to the height of s. As a type, this relationship is reversed: you can use a rectangle everywhere you can use a square (by having a rectangle with the same width and height), but you cannot use a square everywhere you can use a rectangle (for example, you can’t give it a different width and height).

      I think there is a reasoning mistake in this paragraph. The definition of (square specializes rectangle) in the first part, which I believe is correct, is “every square is a rectangle”. But then the relation is said to be “reversed” because “you can use a rectangle everywhere you can use a square”. But while this may be correct for some rectangles (those that are square), it is not true that every rectangle can be used where a square was expected. (For example, code that would assume that the area of the argument is width*width would be wrong.) So the idea that “the relationship is reversed” is wrong.

      To make sense, the paragraph should be precise about what it means by “use an X where you use a Y”.
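
      One way to be precise about “use an X where you use a Y” is to separate read-only use from mutation: under read-only use a square stands in for a rectangle, but code that mutates a rectangle’s dimensions independently breaks the square’s invariant. A minimal Python sketch (the class and function names are illustrative, not from the comments above):

      ```python
      class Rectangle:
          def __init__(self, width, height):
              self.width = width
              self.height = height

          def area(self):
              return self.width * self.height


      class Square(Rectangle):
          # Geometrically, a square is a rectangle with equal sides.
          def __init__(self, side):
              super().__init__(side, side)


      def stretch(rect):
          # Code written against Rectangle may assume width and height
          # vary independently -- an assumption a Square cannot honor.
          rect.width *= 2
          return rect.area()


      print(stretch(Square(3)))      # 18 -- the "square" now has width 6, height 3
      print(Rectangle(4, 4).area())  # 16 -- a rectangle with equal sides stands in
                                     # for a square under read-only use
      ```

      Which direction of substitution is valid thus depends entirely on the interface: with only `area`, Square substitutes for Rectangle; once a width-setter is in play, it does not.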

      1. 5

        I’m curious to know if someone is using OCaml for applications where there’s usually interest in Rust (systems programming) and its resource-management capabilities.

        There is a small but lively community doing systems programming in OCaml; see the MirageOS project and its ecosystem. In particular, Mirage’s TCP/IP stack is deployed in the native macOS and Windows Docker applications, so it ships to a sizeable number of users.

        1. 18

          I’m writing a summary article about the design of Merlin, a language server for OCaml. (A language server is a program/service/daemon that talks to editors (Emacs, Vim, SublimeText, vscode etc…) to provide support for language-aware editing actions, making it easier to bring IDE-like features to a new editor for a given programming language. The terminology was popularized by the vscode-specific Language Server Protocol (LSP), but Merlin predates it and uses a different protocol.)

          I’m not one of the developers of Merlin, so it’s a lot of fun to dive into the design and discuss it with them. Having a good language server requires a language frontend (from source to type-checking) that supports incrementality (being fast to react to small updates) and partiality (incomplete or partially erroneous text buffers). The Merlin codebase and authors’ brains are full of very interesting knowledge on how to build incremental lexers, parsers and type checkers.

          1. 2

            Note: when linking to an arXiv article, I think that it is better to link to the online abstract (the /abs/ url, in the present case https://arxiv.org/abs/1602.07455) rather than the PDF: the PDF is just one click away, and the abstract page gives the abstract (so you don’t have to repeat it), the authors, and the year.

            The evaluation in this article is extremely weak: it claims to do better than the auto tactic of Coq, but people with Coq experience know that auto is an extremely limited form of automation (it does the “obvious” things only, unless you enlarge it with large collections of hints). Generic automation tactics have been written (by hand) for Coq, such as the crush tactic, that would be a much better base to compare against. Evaluating an automated prover by comparing with auto does not give any useful information.

            I think that simple evolutionary approaches like the one presented here have been tried many times (it’s basically a master’s internship topic that any researcher would have in mind, and it has certainly been tried countless times since the eighties), and it doesn’t really work. Much more powerful approaches are available today, for example using a translation into first-order goals that SMT solvers can check (CVC4 or Z3 would eat the examples in this paper alive). There is a growing literature on “Hammers” for proof assistants, which delegate the hard proving work to automated theorem provers; Isabelle has the very nice sledgehammer tool, and there is recent work in the Coq space, CoqHammer. It’s not quite as good as Isabelle’s sledgehammer yet (or automated provers for other assistants), but it’s already much stronger than the tool described in this article, and than any potential descendants of it using genetic search only.

            1. 1

              re Coq evolutionary algorithms

              Well, that’s what I thought, as in someone had to have tried it. GAs/EAs are one of those general-purpose methods so effective at so many things that I’d have tried them on about every angle of theorem proving if I had a chance and was in that field. They don’t always deliver, though. Thanks for chiming in about that weakness in the paper and those alternatives.

              re arXiv

              What I did is specific to this site. I normally include the title, name(s), and year. IIRC an admin asked me some time ago not to put names on. Got a damaged memory, so that may or may not have been here. It does clutter the front page with long titles, though, with info that’s usually the first thing you see in CompSci papers. I was definitely asked by pushcx not to put the year on it when it’s current, but otherwise it’s OK. And in metas, most Lobsters liked that some of us put abstracts, GitHub links, etc. in the text field to help them avoid wasted time. With all that, I just submit the article, its year, and a trimmed abstract. If they need the citation, they can always Google it. They should anyway as part of due diligence in case there’s more recent work or my citation was wrong. Academics here have caught me screwing up on dates or other stuff on occasion.

              1. 2

                re. arXiv, my point was that you could have submitted the URL https://arxiv.org/abs/1602.07455 instead of https://arxiv.org/pdf/1602.07455, and I think in general it’s good hygiene for articles in an open-access archive. (I agree with your preference for citing authors and publication year when you cite a work, and I apply it in in-text citations, although I understand that Lobsters has a policy on titles that suggests only including the title. If you link to the arXiv web abstract instead of the PDF document, the terse title is never an issue as the metadata is all available on the landing page.)

                1. 1

                  For arXiv, I’ve mainly just been trying to save people a click taking them right to the PDF. I might experiment with doing it that way, though.

            1. 1

              This looks potentially interesting, but I’d be much more interested if I could actually find source code for it…

              1. 2

                It’s not at that stage yet. They’re presenting a paper on it so far and that’s about it. They might open it up after the research phase is over, fingers crossed.

                1. 1

                  “We are in the research phase” is a bad reason not to show code. It’s a bad reason before you get a paper out, but it’s an even worse reason once your publication has been confirmed (and indeed the authors mention that they got an ICDE18 paper out of this work). Research is about sharing ideas in the open, and talking proudly about your work on a specific software prototype without giving the sources is a douche move.

                2. 1

                  It sounds almost too good to be true. I find it weird when researchers claim things that no one can reproduce. So much for transparency.

                  1. 2

                    These folks have a decent pedigree both with respect to research and open-sourcing their code. I haven’t looked closely at the paper yet, but the numbers seem non-crazy. That said, as with all the work in this particular sub-area, consistency models mean everything, and the words “flexibly consistent” are guaranteed to be doing a lot of work.

                    1. 1

                      This is great news! Redis seems to me one of the best-engineered “low level” DBs, so the claim to outperform it by a 10x factor is definitely very interesting. They are certainly doing a good job of teasing us with these promising results!

                1. 0

                  This paper spends a lot of time pointing out possible reasons there might be faults in the GNU Multi-Precision library and then suggests that if an arbitrary-precision integer library was implemented using a language co-invented by the authors of this paper, the code would be correct.

                  Lobste.rs needs a vanity-research tag.

                  1. 9

                    We might need a fantasy-skepticism tag, too, if you keep selectively reading papers or ignoring prior work like that. Right after the section you mention, they describe the foundation of their solution: the ML-like language with formal specifications and solvers in Why3. The prior work on ML-like languages indicates using them lowers defects compared to C. The prior work with Why3, by itself or with stuff like SPARK on top, caught plenty of defects of the exact nature they’re describing, such as range errors, aliasing, etc. It’s also cathedral-style, where a focused team takes it a component at a time instead of drive-by contributions. That itself has increased quality in prior projects. Based on prior data, any of these should make a safer library. A combination even more so.

                    Re-skimming it, I find that they attempt to address some of the problems that come to my mind:

                    1. Mismatch between safe language and target. They use an imperative subset of Why3’s language which has already been integrated with C, Java, and Ada with real-world use. Lots of errors caught but few introduced. Subset itself is close to C. Increases confidence.

                    2. C memory model can have errors. They make a simple one for just the features of C they need. Less complexity = fewer errors. Legacy code doesn’t get machine-checked against a memory model at all.

                    3. Humans need to inspect the code it produces to find errors in their scheme. They include numerous examples of the WhyML code side-by-side with extracted C that show extracted code is quite readable. That means reviewers can check it. Since it’s C, the reviewers can also apply a huge ecosystem of field-proven tools to find bugs.

                    4. Provers could get something wrong. Although this area needs more work, this project uses four provers mainly so each can catch things others miss. A side effect is an increase in trust for whatever they all agree on. They said that data is hard to precisely know for some reason. Fixing that will help with 4 for future experiments. Also, they corroborated the first part when finding specific provers were really good at handling specific properties. Any empirical confirmation or use of provers could increase confidence by tying problem types with provers shown experimentally to handle them well.

                    5. Difference in integer types for adder. They use two, one-limb whereas GMP uses two, two-limbed addition. That leads to a slowdown. They say they plan to switch, but I wonder why they broke compatibility there in the first place. Best case, it could be as simple as them not knowing it was important to do the addition a specific way. There’s a hint of that in the problem statement, where a lot of optimizations are built into GMP that people might not know about. Worst case, it was too hard to verify in time for paper submission.

                    Ignoring backward compatibility, the stuff in the related-work section is stronger than this work, especially the work using HOL4 with its small checkers in the TCB, or the work using SPARK, which is field-proven in many projects. A drop-in, compatible replacement for GMP makes sense, though. This work, even with deficiencies, inspires more confidence than the methods used to build GMP, with their associated vulnerabilities and difficulty of analysis.

                    1. 6

                      You are highly biased against formal verification research and have already made several incorrect/misinformed claims about it (in blog posts). I respect your C expertise, but I think people should be aware that you are extremely biased in this case.

                      Besides, frankly, this “Vanity Research or Real Project” blog post is insulting, and I think you could avoid that. You have a different philosophy about scientific research than what is commonly accepted (you seem to disagree with formalism and believe only in experiments, which is an extreme but possible position); it does not justify insulting or “naming and shaming” people that recognize, accept and follow other (perfectly consensus-based) scientific processes.

                      1. 2

                        “you seem to disagree with formalism and believe only in experiments”

                        The amusing thing is that his experiments use formal methods in the form of numbers, algebra, probability, and propositional logic (i.e. the computer gates they run on), and possibly model-checking (i.e. the concurrent algorithms they use). He seems to trust math, given he derives empirical results by plugging data into such mathematical formulas. That’s what people do with formal verification. Except there’s both more of a faith-it-works aspect and higher error margins with the statistical math versus the simpler forms of logic used in program verification. Maybe there’s something I’m missing, but empirical methods seem highly dependent on math functions and reasoning on the output of math functions. For some, especially probabilistic versus simple logics, it also looks to take more trust to believe statistical models than logical ones.

                        I’ve been thinking on that since he first started talking about the superiority of [math-based] methods he called empirical over math-based methods others called formal. I dug up some articles today in what spare time I had to further explore this contradiction. I was guessing core areas of math/logic could probably be considered empirically proven given all the field data confirming the basic primitives and rules in them. The stuff I was finding was corroborating that, or vainly trying to reject falsifiability due to math’s abstraction level. I might post some references or summaries in the future on this topic when I have more time and sleep.

                        1. 3

                          Let me mention that I appreciated your replies in this thread. I found the general skeptical tone of the discussion baffling, but I think that you did an excellent job addressing concerns and discussing them with patience.

                          1. 2

                            Thank you. :)

                            The skepticism comes mostly from his extreme bias, but there are kernels of truth in it that make me address this stuff:

                            1. Overselling of what’s accomplished in some projects. For instance, his blog correctly mentions CompCert was not a verified C compiler. It was a verified Clight compiler, IIRC, with an unverified front-end for C or a C subset. I think they added something formal in the front later. Another is when some talk like there can be no errors because they did a portion formally. Countering that stuff is worthwhile.

                            2. There’s also a habit of not doing rigorous testing after something was formally specified or proven. The old Gutmann paper had a lot of examples of verified programs failing after someone so much as ran them once or did some testing. The specs or verification methods can be wrong. That’s why simultaneously doing that and empirical methods makes sense. Of course, the high-assurance field has been advocating that longer than any other, with formal proof one tool among many. It usually catches the deepest bugs, but other methods are more cost-effective for other bugs.

                            3. The results in many of these experiments aren’t reproducible or cross-checked enough. It helps that most are using the same FOSS software for the proofs, with a lot of the work published as open source. There have also been efforts he doesn’t assess to reduce the TCB of checkers and extractors down to nearly nothing. There’s still a lot of room for improvement on experiment design, reproducibility and review, though. My bet being we’ll see errors in some of this work as other parties either take a close look at it or try to apply it to lots of real-world software.

                            4. Cost. Formal methods, depending on how heavy, can add enough cost in terms of fewer features and longer time-to-market/completion of a project. Specific ones in industry, like Astrée, can also cost money. This is a tangent here, given this is just a research report showing how formal methods were applied to a piece of code. It is worth thinking about when deciding whether to use them or how much. I usually advocate lightweight formal methods combined with review and automated testing if we’re talking regular software developers with limited interest in or resources for highest correctness.

                            Although still waking up, those are four reasons that come to mind for why you might see a skeptic in formal-methods threads. I didn’t think “pixie dust” was warranted on this work, given they picked battle-tested foundations that were FOSS to improve a problematic piece of software. It’s one of the better applications imho. If people want more empirical data, they could have some people code up some libraries like this in C and a group do it in Frama-C, SPARK, or WhyML. Then, run a bunch of static analyzers and fuzzers on them looking for the kind of errors these tools prevent. If our side is correct, they’ll see a drop in bug count. I got money on that given we’ve already seen a drop in bug count every time these methods are used. ;)

                            1. 3

                              Nothing wrong with addressing concerns.

                              For the record, I don’t agree with those claims that CompCert (or related work) is oversold. I think that it is perfectly fair to describe CompCert as a “formally verified C compiler”. The CompCert website is clear about which parts of the compiler have or don’t have a formal specification, and the publications about this work are explicit about what precisely has been specified or proven.

                              I think that the accusation of overselling comes from taking a scientific work out of its context (a scientific community discussing software verification) and blaming it for generating unrealistic expectations outside this context. I can see how someone misinformed about what formal specification is could believe that “formally verified” means “will not have a bug, ever”, without asking themselves what specification it was verified against; but I don’t think you can blame CompCert for that.

                              I agree of course with your points 2, 3 and 4. But I think it’s also important to point out that those should not necessarily be expected of academic work. Academics that are trying to push the boundary on verified software may find out how to build interesting programs without being experts on how to do production testing of those programs, and I don’t think it would be reasonable to expect them to do both – assuming that industrial/production testing methods in the area have not been made publicly and easily available. “There was no production-grade testing of this system yet” is a perfect reason for an engineer not to jump ahead and use the code proposed by a verification effort, but I don’t think it is a valid criticism of the research work.

                              Finally, I think there is something interesting and delicate to be said about the notion of TCB. In my experience, people that are discovering computer-assisted theorem proving tend to over-emphasize the “who guards the guardians?” aspect (what if there is a bug in the verifier?), while depending on the sort of verification technique used (I’m thinking of proof-term-generating proof assistants or evidence-generating automated provers), it is usually extremely safe to assume that a practical or theoretical bug in a proof assistant would not result in a mistakenly-verified program being written in good faith (I’m not talking about adversarial contexts) – sort of like the (very real) existence of bugs in text editors does not mean it is important to worry about the source code we write not being the source text we expect. On the other hand, there are other tooling-related parts of verification efforts where mistakes can more easily turn into mistaken assurance; for example, I think it’s right to worry about extractors.

                              1. 2

                                You seem to be saying that researchers don’t need to be intellectually honest about what they claim in papers.

                                Is it OK for doctors to claim to have found a cure for a disease when they have solved part of the problem? Can mathematicians claim to have proved a theorem when they have only proved a subset of it?

                                Soap powder manufacturers are accountable to a level of honesty about what they claim (by the advertising standards authority).

                                What makes formal methods researchers so special, that they can make claims that soap powder manufacturers are not allowed to make?

                                The claims made in the Compcert paper are not perfectly clear, at least to somebody who has spent several decades working on compilers and formal systems (i.e., me). They have to be carefully unpicked to reveal that some of them are hollow.

                                1. 2

                                  I do certainly think that researchers need intellectual honesty.

                                  The point I was making is that academic publications are mostly intended for peers (other experts in the field), and the way they formulate their caveats may differ from what you would expect when they are writing for the general public (which also happens and is quite different) or, for example, for industrial people working on critical software or certification agencies. To reuse your analogy, doctors talking to each other about the effect of a medication may use terms that wouldn’t be effective communication when talking to the general public – to the point of being perceived as misleading if taken out of context.

                                  Of course, it may also happen that people do oversell their claims (the publish-or-perish system that some academics must live under is an external pressure making this more likely to happen), and I think it is perfectly reasonable to be vigilant – not only about actual malpractice, but also about good-faith people that oversimplify and oversell. I believe that in the specific cases of the work that you are trashing (CompCert, Verasco, and now this unrelated Why3 research), your accusations are not reasonable. The fact that you decided, in addition, to be extremely insulting makes it very difficult for experts to have a hope of a constructive conversation with you.

                                  1. 1

                                    These papers may be targeted at other academics, but their headline claims get posted and widely cited (by the authors and others).

                                    Soap powder manufacturers are under pressure to make money by increasing sales, does this pressure make it ok for them to make overblown claims about their products? Academics work in a publish-or-perish environment, does this make it ok for them to make overblown claims?

                                    I certainly did not mean to be insulting. However, the researchers involved are likely to have been very uncomfortable with my soap powder comparison or with my suggesting that their work be labeled as vanity research (what label would you apply to work that invents something claimed to have various properties/characteristics, where the claims are based on the researchers’ preferences and not experimental evidence?)

                                    1. 3

                                      In my experience, overblown claims in the field of formal software verification are fairly rare, because the research-paper format requires authors to precisely state their claims and give experts the necessary material to understand precisely what has been achieved. (There have been, of course, mistakes of various sorts, although this is getting rarer with the adoption of proof assistants to mechanically verify the formal arguments – a level of rigor that exceeds that of many other scientific fields.) Again, titles and abstracts might simplify claims (it is their purpose to be synthetic), but the precise claims are in the articles themselves, and this is what should be looked at – unlike soap powder advertising, paper abstracts are not aimed at the general public but at other researchers in the field.

                                      If you don’t mean to be insulting, then maybe you should stop comparing an entire sub-field of research to “soap powder manufacturing” and calling perfectly sensible work (such as the one presented in this thread) “vanity research”. Generating verification conditions and discharging them with SMT solvers seems like a robust methodology, rather than “researcher preferences” – you seem to be misunderstanding or misrepresenting this research.

                                2. 1

                                  I think that it is perfectly fair to describe CompCert as a “formally verified C compiler”.

                                  The reason I disagree on that point is the formal semantics. Formally verified parsers for stuff that’s simpler than C were hard enough work. The formal semantics of C usually had features missing or trouble handling undefined behavior. It wasn’t until just a few years ago that realistic models of C, with and without undefined behavior, were formally specified. So, we have two reasons the lack of a verified C parser is significant: (a) formal specifications or verifications of tedious, text-processing algorithms often find errors; (b) C itself had been really hard to specify.

                                  So, a “verified compiler”… as in everything is verified… for a hard language should cover each of the hard problems with a formal specification and verification. If it leaves one out, it’s not a verified compiler: it’s a partially-verified compiler. I don’t think more accuracy in titles would hurt here. They might do “A Mostly-Verified Compiler for a Useful Subset of C” with a subtitle in smaller print or parentheses saying “we verified everything except the front-end.” People would still be impressed, with nobody misled. I should note that I was misled by the original title, summaries, and discussions that talked like it was fully verified. When I read the paper(s), I got more accurate information that required me to change my claims a bit. The only one that really handled C’s semantics was KCC: a C semantics in the K framework. Derek actually gives them credit for realistic semantics on his blog.

                                  “I think that the accusation of overselling come from taking a scientific work out of its context (a scientific community discussing software verification) “

                                  They are promoting it for use in DO-178B software in industry. It’s proprietary software that AbsInt sells. That should probably be considered when assessing what terminology they should use for target audience. Of course, it would depend on when they first committed to doing that and what they were writing at the time.

                                  “Academics that are trying to push the boundary on verified software may find out how to build interesting programs, without being expert on how to do production-testing of those programs, and I don’t think it would be reasonable to expect them to do both”

                                  That could be fair if it’s hard to test. I always said they could do generation of tests from correctness goals or properties, with runtime checks. Even Data61 is now using property-based testing on specs in at least one project. Generating some checks by hand in final code is straightforward for a lot of properties, with a fuzzer able to deliver the input. I think they should definitely do more on the empirical side, confirming what their specs say with tests. I’m not expecting the full battery of testing a high-assurance product would get, though, if the researchers’ stated goal is just working on formal methods. I’d like to see the tests mainly to check their specifications and tooling for errors.
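
                                  The spec-vs-test cross-check described here is cheap to sketch even without a framework. The toy limb-addition function and the property below are illustrative (a stand-in for bignum code, not from GMP or any of the projects mentioned):

                                  ```python
                                  import random

                                  def limb_add(a, b):
                                      # Toy arbitrary-precision addition on base-2^32 "limbs",
                                      # least-significant limb first.
                                      base = 2 ** 32
                                      out, carry = [], 0
                                      for i in range(max(len(a), len(b))):
                                          s = ((a[i] if i < len(a) else 0)
                                               + (b[i] if i < len(b) else 0) + carry)
                                          out.append(s % base)
                                          carry = s // base
                                      if carry:
                                          out.append(carry)
                                      return out

                                  def to_int(limbs):
                                      return sum(d << (32 * i) for i, d in enumerate(limbs))

                                  # Property: limb addition must agree with Python's integers.
                                  random.seed(0)
                                  for _ in range(1000):
                                      a = [random.randrange(2 ** 32) for _ in range(random.randrange(1, 5))]
                                      b = [random.randrange(2 ** 32) for _ in range(random.randrange(1, 5))]
                                      assert to_int(limb_add(a, b)) == to_int(a) + to_int(b)
                                  print("property held on 1000 random inputs")
                                  ```

                                  A real effort would derive the property from the formal spec and drive the inputs with a fuzzer, but even this level of checking catches a wrong carry immediately.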

                                  “ people that are discovering computer-assisted theorem proving tend to over-emphasize the “who guards the guardians?” aspect “

                                  I agree people worry too much about it. I’d like some specific tools, esp. checkers, done in a foundational way. Past that, I don’t care as much, since we’re mostly just knocking out problems without believing the result will be perfect. I will note there have been errors in solvers significant enough that this led to the evidence-generating ones in the first place. A lot of the speed demons are already complex, with some written in unsafe languages. The results of that might get worse as they’re accelerated with multicores, GPUs, FPGAs, etc. to run through programs faster or with more lines of code. I’m for a pile of annotated code as a benchmark to throw at all the provers, individual projects and version to version, to get an empirical idea of what they’re catching and missing. Plus, trusted checkers for all of them.

                                  “for example, I think it’s right to worry about extractors”

                                  Total agreement there. The good news is there are quite a few projects verifying them. There’s also an extractor for the C language now with a lower TCB than its predecessor for ML. Before I saw it, and still a bit, I was thinking of something like Tolmach et al’s ML-to-Ada-or-C translator being verified, so people could go from verified extractors to ML, to verified translators to Ada/C with their tooling running, to verified compilation. Like in hardware, formal equivalence checks would be run on each step for a range of values on every path. A verified translator straight to C would be nicer, but I’m not a proof engineer: I can’t say how hard formal logic to C would be versus ML to C.

                                  1. 3

                                    You claim three limitations of Compcert:

                                    1. It only covers a subset of C.
                                    2. The parser is not verified. (An expert would add: nor the CIL-like post-parsing elaboration pass.)
                                    3. Undefined behaviors are not fully modelled.

                                    (1) is certainly true. I am not convinced that this can be described as “overselling”, though. MSVC only covers a subset of C (plus some extensions), and most people would not object to calling it a “C compiler” – although it is certainly important to point out, in more detailed descriptions of it, that there are limitations.

                                    I don’t agree with calling (2) and (3) overselling, for technical reasons that merit discussions.

                                    For (2), the question is: what would a verified parser even be? If you think about it, you will realize that it is very hard to define a formal specification for a parser that is not itself an implementation (in particular: none existed before Compcert was created), and the same goes for the elaboration pass – or assembly-level printing, for that matter. If there is no specification other than the implementation, verification does not make much sense anyway.

                                    The best work on what “formally verifying a parser” would mean has been done precisely in the context of Compcert (Validating LR(1) parsers, Jacques-Henri Jourdan, François Pottier and Xavier Leroy, 2012), and what it does is produce a formal “grammar” description that you have to manually inspect (and convince yourself that it corresponds to the grammar of C you have in mind… which is no small task), along with a parser automaton that is formally proved to implement this grammar. If one small part of a tool has no specification other than its implementation (this is not specific to the Compcert work but a general property of C compilers and programming languages), I don’t think it is misleading to call the whole tool “verified” – or is it perpetually invalid to call any compiler “verified” because the parsing phase has no reasonable specification?

                                    For (3), there is a misunderstanding in your criticism: a semantics of C does not need to model every form of undefined behavior to build a verified compiler, or more generally to allow all behaviors permitted by the standard. It is perfectly valid to use, as Compcert does, a reference semantics that is more defined than what the standard says (typically: it defines some overflow behaviors and fixes expression evaluation order). This is of course exactly what “implementation-defined” means in the standard, but it is also perfectly valid to do the same for undefined behaviors: giving a defined meaning, in an implementation, to a program with undefined behavior is perfectly valid – given that any behavior is valid for such a program. We then have to check (informally) that the semantics given by Compcert is, indeed, stronger than the standard’s (it does not make different, incompatible choices in some situations). But it is a misunderstanding to claim that a compiler’s semantics is incomplete because it does not model all undefined behaviors. (In fact, the vast majority of C compilers out there make similar choices, working with a stronger semantics than strictly required by the standard, if only because it makes the compiler simpler.)

                                    On the other hand, a fine-grained modelling of undefined behavior matters if you want to draw conclusions that are valid for any C compiler, and not just for one fixed implementation. This is why it is an important research endeavor if you want to build static analysis tools for C that would draw conclusions valid for any C compiler. The Verasco project, for example, is a static analyser that uses Compcert’s specification, and its results are only valid if you compile the program using Compcert – Verasco can only check that a certain class of runtime error will be absent from the executable produced by Compcert (or any compiler respecting Compcert’s stronger semantics). Other works, such as KCC (which is more recent than Compcert) and more recently Robert Krebbers’ work, have produced more complete formal specifications of very-undefined semantics for C.

                        2. 2

                          Cargo cult arguments against C are popular lately. “X is written in C, therefore it’s probably buggy and needs to be rewritten in Y.” Always lots of hedging and talk about what could be, but usually not many real bugs mentioned.

                          1. 5

                            Your accusation of this work being a “cargo cult argument” is highly unfair. The authors built an implementation of a GMP-like library and defined a specification for it, such that a computer can verify that the implementation matches the specification. This is a very strong argument in favor of the correctness of their implementation, not cargo-culting.

                            This sort of work is possible to do in C (see Frama-C, VeriFast) but fairly difficult, due to the complexity of the full semantics of the C programming language. Instead, the authors work in a language designed for formal verification, whose semantics is easier to reason about – and with good tooling support for automating verification – writing their program in such a style that they can translate it back to C.

                            For a given specification of what a piece of software should do, it is reasonable to assume that a program that has not been verified against this specification has a higher likelihood of having defects than a program that was verified against the specification. There is nothing cargo-culted in that.

                            1. 3

                              There have been bugs reported in GMP already. Range errors are also common vulnerabilities in the general case. How much stronger a case needs to be made at that point that immune-by-default is the better position?

                              1. 0

                                Ah, the magic pixie dust of mathematics, which “must be correct,” is just as buggy as software.

                                1. 2

                                  That’s correct. They can screw them up. It’s why I referenced prior work using the language and tool to find defects in software. I thought your view was that experiments or field data showing a method gets results, esp consistently on similar problems, increases the confidence we have in it working. If so, then you should have at least as much confidence in Why3’s math-based approach that got results as you would a purely code approach.

                                  A tool that usually works is a tool that usually works. Nothing more but nothing less. Do explain further if your perspective disallows, as evidence, field application of a method for software correctness by multiple parties over time.

                                  1. 1

                                    Yes, if experiments or field data show that a method gets results, I’m all for it (of course the cost/benefit may not work out, but that’s another matter).

                                    But “immune by default” applied to formal techniques is a magical pixie dust mindset.

                                    Formal techniques, like many other techniques, find problems in code. How cost effective are they in comparison? I wish somebody would do some experiments to find out (the main obstacle is the likely very high cost).

                                    1. 1

                                      But “immune by default” applied to formal techniques is a magical pixie dust mindset.

                                      I held off replying to this because I want to handle it in a separate thread. That is, empirical proof of the fundamentals of specific formal methods that (a) justifies/rejects trusting them in an empirical sense and (b) maybe helps derive empirical evidence for/against them as they’re used. The initial idea is having concrete examples and test patterns of every primitive, logical rule, etc. Obviously, rely on the simplest ones, as the formal methodists are doing with stuff like HOL Zero and the Milawa prover. The specifications and activities are vetted by potentially hundreds to hundreds of thousands of tests with diverse strategies as they’re employed. Another quick one is a scheme like @moyix’s that inserts specific flaws in code that formal methods are supposed to catch. Whoever uses them and that tool might let it run overnight on new code to generate background measurements of effectiveness that a central organization collects and reports for those studying effectiveness.

                                      That concept will need more thought about not only the answers (how to test/review) but the right questions to ask (the what). If I put a proposal together, I’ll tag you in it if you’re interested in reviewing it.

                                2. 0

                                  Is there any popular software that hasn’t had bugs reported already?

                                  There are plenty of good reasons not to use C, but bugs that don’t actually exist aren’t one of them.

                                  1. 2

                                    “Is there any popular software that hasn’t had bugs reported already?”

                                    That’s peripheral to my point. The point is we say don’t use C or trust C apps by default if one is worried about common defects, esp code injections. The reason starts right with the fact that it was intentionally designed to let them hack together code on a PDP-11. Nothing more. To meet their goals within 1970s constraints, they didn’t add any safety or maintainability features that were in ALGOL-derived languages on better hardware. Some other languages, esp Ada and the Modulas, did add safety and maintainability features. The C apps had lots of defects over time, with even experts bitten by them. The apps in the other languages had fewer defects due to their design, even when people had the same or less skill. Studies confirmed this for Ada, with the CVE list showing it for a lot of other languages.

                                    Given that backdrop, both the logical and empirical evidence support that anyone wanting to get low bugs and high maintainability with little effort should avoid C by default unless having really good reasons to use it. Even then, it may be easier to do a reference implementation in a safe, non-C language that a C deployment is derived from. It’s not even C bashing or anything from this viewpoint: it’s a BCPL derivative with rough semantics designed for unsafe hacking of small, non-security-critical programs on 1970s hardware in benign environments. It was great for that. It was just never designed in the first place for what people are using it for today. (Something similar happened with HTML and Javascript for similar reasons caused by a killer app.) The burden of proof for defaulting on C for security-critical work has been on C promoters for some time, given all the evidence of its risk vs alternative language designs.

                                    “There are plenty of good reasons not to use C, but bugs that don’t actually exist aren’t one of them.”

                                    The average C app has plenty of bugs that alternatives either never have or rarely have, depending on design. This tech can prevent them, too. GMP has had them. Writing GMP with alternatives would let it inherit their property of having either none of these bugs or fewer of them. So, they did. That’s the logic of it. Glad you brought it up, though, since looking at the bug reports shows everything from buffer overruns to these gems in v5.0.3:

                                    “There are several problems in the multiply code that can be seen as miscomputations and/or ASSERT failures. These bugs have been triggered in our testing environment, but the probability for miscomputations using uniformly distributed random numbers is extremely small. These bugs have been around since GMP 5.0.”

                                    “There are several problems in the functions for gcd, gcdext, and invert, causing miscomputations and/or ASSERT failures. These bugs have been triggered in our testing environment, but the probability for miscomputations using uniformly distributed random numbers is infinitesimally small. These bugs have been around since GMP 4.3.”

                                    They not only have bugs: they’re bugs that have persisted over numerous versions, that they couldn’t eradicate with their informal reviews and testing methods. A good time to try something new with a track record of eliminating the kinds of bugs that get triggered in situations they can’t see or test for.

                                3. 2

                                  If a C code base includes calls to unsafe functions like strcpy (43 call sites), sprintf (29), strncpy (27), or strncat (1), then it seems likely that one or more of them incorrectly computes the length of a string.

                                  1. 1

                                    then it seems likely

                                    This is exactly what I mean. There’s either a bug or there’s not.

                                    I might as well claim that it “seems likely” a rewrite of GMP will introduce regressions, so everybody should keep using the original in order to avoid them.

                              1. 4

                                The new type checker, based on a constraint-generating but bidirectional approach, can type a lot more programs than the older, Algorithm W-derived checker. As an example, consider the following definition. […]

                                OCaml (which uses rather standard Hindley-Milner-Damas type inference) can type-check this definition just fine.

                                val map : ('a -> 'b) -> 'a list -> 'b list
                                
                                1. 2

                                  hindley-milner or bust

                                  1. 2

                                    That isn’t a rank-n type. Things become much trickier once you start nesting foralls, which is where bidirectional type checking shines.

                                    1. 2

                                      Certainly, but still I am a bit confused by this blog post that seems to imply that this function is not typable with traditional HM algorithms for prenex polymorphism, while it does seem to work just fine. Is there an error in the blog post itself, and the author meant to insert an example making essential use of higher-order polymorphism? Or was there a bug in the previous type-checker that could have been fixed while keeping the same inference specification, instead of moving to higher-rank bidirectional algorithms?

                                      1. 3

                                        Perhaps I didn’t make this clear enough, but I did not mean to imply this was a fault with Algorithm W in general, but rather a fault of my implementation of HM inference.

                                        Or was there a bug in the previous type-checker that could have been fixed while keeping the same inference specification, instead of moving to higher-rank bidirectional algorithms?

                                        Indeed, but I chose to do both at once. (I had previously tried to add rank-N polymorphism to the old type checker before figuring out it was broken beyond saving.)

                                        1. 2

                                          I see, thanks for the clarification!

                                          Edit: Also, I only realize now but: when you write “ε-contraction” (epsilon-contraction), you mean “η-contraction” (eta-contraction), as a transformation from (fun x -> t x) to just (t), right?

                                          1. 1

                                            Er.. Yes, thanks. Greek was never my forte, maybe I should stick to the Latin alphabet. (Edit: fixed.)

                                  1. 2

                                    So this creates the function code from the types? I don’t understand how that works.

                                    If I have Int -> Int -> Int how would that know I want the function foo bar baz = bar + baz and not foo bar baz = bar - baz?

                                    1. 10

                                      It can’t. Your type is too general. Not sure what this specific plugin will do, but it will probably just come up with some expression that will have that type. Possibly the one you want – but probably not. :)

                                      For examples of function types that allow only one, correct implementation, look up implementations of e.g. length-indexed vectors or red-black trees in dependently typed programming languages. It’s also interesting to look at the interactive modes for languages such as Agda and Idris, where you can tell your editor “complete this expression” and it will do that (or tell you that it’s not able to, and what’s missing).

                                      1. 2

                                        I think it is relying on the free theorems guaranteed by parametricity.

                                        See Philip Wadler’s paper Theorems for Free!

                                        1. 1

                                          I believe that these tools are not based on parametricity, merely on syntactic proof search (in essence, enumerating possible terms at this type, using techniques from the proof search literature). If you think about it, theorems-for-free does not actually help you generate code at arbitrary types.

                                          1. 1

                                            Ok, thanks.

                                            But I think the fact that this is possible is a side effect of the free theorems granted by parametricity.

                                            Which explains why it is not possible to do this with a concrete type signature, as per the original question.

                                            1. 2

                                              Abstract types and parametric polymorphism are a nice language feature that has several consequences. One is the existence of free theorems (technically, a nice denotational semantics in the category of relations), another is the possibility of having a search space restricted enough that blind proof search produces sensible/interesting programs. Those two consequences are independent; I don’t think you can claim that one is a “side effect” of the other.

                                              “Theorems for Free” is an interesting, thought-provoking and well-written research article, so people will easily share it and think about it. This is great! But it also results in some miscomprehensions (or at least exaggerations) about what the result says, which results in the article sometimes being cited in situations where it is not actually relevant (besides being in the same scientific domain as the work being discussed). I thought you might be interested in a slightly more detailed picture of the technical details at play here.

                                              1. 1

                                                Thanks for the clarification. As you say, this is stuff I know exists, but do not work with it daily so do not understand the full implications and shortcomings.

                                      1. 19

                                        re OpenBSD’s development

                                        I’m sure it took hard work. Might stop some attacks. Good work getting it done. :)

                                        re: the title and the OS comparisons rather than OpenBSD team’s good, technical work

                                        I always love the comparisons to Windows. The Windows OS was designed to create a monopoly sucking billions of dollars out of people and big companies. It succeeds in that every day with the company founders being billionaires. There’s a bunch of other groups that focused on creating profitable companies that straight-up hire developers to work on FOSS using the piles of money they’re receiving. FOSS-loving people doing that could potentially do more good for FOSS by having a pile of money to use to support it with paid programmers. Most of Linux’s development comes from proprietary companies, for instance. So, Microsoft did it better than all in terms of both making bank and being able to fund whatever they care about.

                                        Now, on the technology. That’s MS Research mostly, with some in operations (eg Steve Lipner). So, Steve Lipner came in from DEC, watering down Orange Book for Microsoft into SDL. They have fewer 0-days in the kernel these days than most FOSS software despite lots of rewriting. MS Research is the real gem, where they have tools preventing drivers from crashing kernels, verifying pointer safety in C code w/out checks, a language competing with SPARK for provably-safe code that’s low-level, design-by-contract for high-level code, an OS verified to the assembly for safety, protocol verification with Lamport, the Haskell stuff with SPJ, something for asynchronous code recently, web services I think, ASM’s for modeling, and so on. The list goes on. In every aspect of development, they have tools that can achieve near perfection of the deliverable, with useful prototypes or real-world examples of doing that. Most FOSS, including OpenBSD, have shown no interest in doing something similar despite having the free-labor advantage. Barely a comparison, or at least not one that makes MS Research look stupid or incompetent.

                                        So, the author is hilarious talking trash about Microsoft like they don’t know or contribute to security because a product designed for surveillance and monopoly profits lacks (feature or assurance here). Nah, their organization has the talent and does better than almost all of them I know outside CompSci and a few companies. They use the massive profits from that shit product to fund it, too. Maybe better to learn from them imitating their best qualities (dropping the rest) instead of knocking the “drawbacks” of products that profit massively off those “drawbacks.” It’s not a failure if it was the intended goal.

                                        Note: Similar statements could be made for mainstream, Linux distros that barely care about security vs usability and adoption. Of course they’re behind on security but ahead on their goals. Whereas, groups that cared put Linux on separation kernels or did compiler-assisted transforms that immunized against common forms of attack. FreeBSD got capability security down to hardware level by CHERI team. Almost nobody cared to adopt or invest in them much like OpenBSD and other security-focused projects. It’s the market’s preference whether proprietary or FOSS.

                                        1. 6

                                          Most FOSS, including OpenBSD, have shown no interest in doing something similar despite having the free, labor advantage.

                                          Probably the biggest advantage they have is that some project can be carried to completion slowly over years without risk of running out of money and getting cancelled. Unfortunately the risk of running out of labor is real. It is cheap, but not plentiful.

                                          Projects that are primarily driven by volunteers working for free on their limited free time tend to have a labor disadvantage, especially for any subproject that would require cooperation and long term sustained, focused effort from multiple individuals. It is difficult to organize, people come and go as their pool of energy and time fluctuates, work gets spread out over longer time and context switching reduces efficiency further. By contrast, even a small team working full time for a few weeks can have a massive advantage on such a project.

                                          And this is why, without belittling the work the OpenBSD developers have done, I think their approach is best suited for low-hanging fruit. Little things that can be spearheaded by one person, possibly with some help from a handful of others. For example, let’s randomize xyzzy and see what breaks. Randomizing xyzzy isn’t a ton of work. It’s a fairly isolated change, and doesn’t require you to rewrite or redesign any of the existing software; bugfixes and relatively minor changes suffice.

                                          Finding all the things that break and fixing them can take a lot of time, but they can all be handled as individual incidents in isolation, and then you move on. There may be tough corner cases that require people to get together and reconsider the impact of the change & possible solutions, but most of the work can be done in small steps. Steps that can be easily spread over time and people, without much organization. Bugs get fixed, and the result is slow evolution towards more secure software. Furthermore, it doesn’t burden other developers’ backs in the future: if anything, bugs will become easier to spot and fix as the system grows less tolerant of them. That sort of thing is manageable with patchy effort from people working remotely across the globe.

                                          Introducing the straitjacket of verification and provably safe code (but who proves that the proof proves the right thing?) is much more like revolution. We all know you just don’t slap C with it, fix a few dozen bugs, and be done with it. No, it’s a ton of work, it may require substantial rewrites, it requires someone to write the contracts. It changes the way things work in the future, so it’s not a little isolated thing someone can do alone. No, if you want these guarantees for more than just one snapshot of the code, you need to get everyone on board and keep doing the work in the future. That is so much more work, and much harder to manage without a tight-knit team working together full time. Doing the work and getting everyone on board can be as realistic as “rewrite it in rust.”

                                          I guess if the same limited labor pool were expended on an endeavor like that, the marginal project would be marginalized even further to something on par with an academic exercise. One that you can then point at as an example of something good that no real-world security-focused project is doing. I guess this underlines the market preference you mentioned, although I associate “market” too strongly with commercial products to use the word in relation to a project of volunteers scratching their own itches.

                                          And then, in that alternative reality, I could be nickpsecurity^(-1) and provide a list of all the low hanging fruit – that could’ve easily improved the security of real world software – that OpenBSD weren’t interested in doing because they were all too busy doing something else…

                                          1. 4

                                            I don’t believe in “labor pool” as an explanation of the difference. FOSS communities do not have a formal methods culture, at all. When academia shows up with ideas of tools to adopt to improve software production, they are met with a lot of implicit resistance and “I won’t do anything until the tool is super easy and convenient to use”.

                                            In contrast, Microsoft has managed to introduce formal-methods tools in its development process; I suspect by a lot of top-down authority moves “here is how we’re going to do things now”, but I don’t actually know how the changes happened internally. I’m sure it had costs, because the tools were not necessarily able to scale to the project’s scale on day one, and there certainly was a lot more necessary work poured into making them easier to use after they started getting adoption internally.

                                            Personally I think that the lack of interest for formal methods and, in general, most sorts of interaction with academia that do not fit the existing “just send us patches in the way we are used to” workflow (those do happen, for example the systems research community contributes to Linux, and the compiler people contribute to GCC and LLVM), is one of the main issues with open-source communities today.

                                            The Linux kernel, for example, has a lot of companies funding its development today, it’s not a volunteer-on-their-spare-time project, yet almost no one is willing to get serious about interacting with academia (some exceptions I know of are Coccinelle, and the recent work on formalizing Linux memory models and in particular the RCU mechanism, which happened in great part thanks to the motivation of Paul McKenney).

                                            It is not an “open-source vs. industry” split, because a large part of industry is just as bad at interacting with academia. Actually Microsoft is in a rather unique position there, given the long-term investment they made in Microsoft Research, so it’s understandable that they would be better at this than pretty much everyone else. (IBM still does have some good research groups, and I’m sure Intel has interesting stuff in hardware-land and physics, but not that much in software as far as I know.)

                                            1. 2

                                              I don’t believe in “labor pool” as an explanation of the difference. FOSS communities do not have a formal methods culture, at all. When academia shows up with ideas of tools to adopt to improve software production, they are met with a lot of implicit resistance and “I won’t do anything until the tool is super easy and convenient to use”.

                                              Perhaps one can view the lack of a labor pool as a manifestation of the “culture problem.” Actually putting results from academia into use in the real world takes work and effort, and there needs to be someone to do it – labor. Now if you find a dedicated individual with the drive, maybe that work gets done. Alternatively, someone could do targeted funding to get that work done.

                                              Now as you point out, funding alone doesn’t magically make people work on the things nickp mentions, unless that funding goes toward that specific goal. Mr. Torvalds may be funded to do basically whatever he wants with his kernel (scratching his itches), and there may be other RH, Intel, etc. employees paid to scratch the itches of their respective company. They still do not get to tell each other to undertake some particular project. Just because they’re paid doesn’t mean they automatically start to work on formal verification of the code.

                                              I still think funding is a game changer. Coccinelle got funding. Would it be interesting enough for you to mention it if it hadn’t received that funding? Maybe, maybe not. OpenBSD got features because they were funded. Not enough volunteer labour to make it happen.

                                              Of course project goals (and top-down authority if it takes that) would play a role too. But you just don’t get to tell volunteers exactly what to do.. well, one can try:

                                              “5 weeks ago at d2k17 I started work on randomized kernels. I’ve been having conversations with other developers for nearly 5 years on the topic… but never got off to a good start, probably because I was trying to pawn the work off on others.” https://marc.info/?l=openbsd-tech&m=149887978201230&w=2

                                              1. 1

                                                “they are met with a lot of implicit resistance and “I won’t do anything until the tool is super easy and convenient to use”.”

                                                Interestingly, they still get that reaction even when they make stuff that’s fairly easy to use or works on existing code. Especially for Linux and FreeBSD, academia does all kinds of push-button tech related to reliability, security, configuration, parallelism, etc. Very little of it gets accepted or built on. We can easily see what benefits utilization would’ve brought just by looking at the benefits the work brought in their papers. For example, although Linux coders don’t do much static analysis or automated generation of tests, the academics that use or build such tools, deploying them on Linux or FOSS apps, almost always find problems. The projects probably accept the bug reports or fixes but won’t adopt the tools that produced them to begin with.

                                                There’s been a few that made it. The overall trend is negative toward CompSci’s contributions, though.

                                                “given the long-term investment they made”

                                                Another angle is that CompSci people taking the initiative effectively supplies that long-term investment for FOSS projects like Linux. They’re getting the brains and labor free to them without that initiative. Some of the same stuff is being created, too. They still don’t want to invest in using most of it. One specific example I liked was how Microsoft eliminated tons of crashes or blue screens with their driver-verification tool. An academic team made one for Linux drivers, too, as shown below. I’m not sure if this one is available for download, but I doubt it would get used, copied, or extended by FOSS folks even if they had the talent.

                                                https://pdfs.semanticscholar.org/41ab/8a4be4a0ac015a282d70099147063d2fa72c.pdf

                                                1. 3

                                                  I see what you mean, but I find your view a bit too unbalanced. In my experience with academic software production, it’s exceedingly rare that a solution can be used as-is by people other than the authors at the scale of a realistic software project. Even more so for program analysis / verification tools, which are by nature incomplete and which a majority of academics develop in an exploratory fashion (trying experiments to see what sticks) rather than with the mindset of creating a robust long-term tool.

                                                  In any case I think it’s not great to point the finger at one group and say: “it’s the fault of those people”. Even when it might be true (and I don’t think it is the case here), this is not an efficient approach to get people to change their habits! I prefer to discuss the problem in terms of the need for cooperation, with efforts coming from both sides (there are also academics that don’t engage with FOSS communities enough, for example). But this requires both sides to be interested in the interaction, and this is a matter of community culture; I agree that, on most FOSS communities’ side, there is unfortunately a lack of verification culture, and in general a lack of interest in interaction with academia on software development. (Whereas in my experience, academia tends to value “impact” on FOSS communities as part of its culture.)

                                                  1. 1

                                                    Oh I’m not trying to imply they’re always very usable. Just that the ones that are pretty usable go unused normally. You’re also right about the verification tools. I really cherry-pick the ones I submit here. Most of the work isn’t as practical. I also agree cooperation is better than finger-pointing to increase odds of acceptance. The solution on the FOSS side might be just better marketing of the total benefits (including maintainability & less debugging) instead of just “it might find or stop bugs.” Better interfaces on the tools, too, with them covering more corner cases. Better packaging & auto-config as well. At least some are on GitHub now.

                                              2. 2

                                                “Projects that are primarily driven by volunteers working for free on their limited free time tend to have a labor disadvantage, especially for any subproject that would require cooperation and long term sustained, focused effort from multiple individuals.”

                                                Ok, I misspoke in my prior post. Let me modify it to say they potentially have a labor advantage where the contributors are free and not forced to use whatever tools companies demand. You’re right in that the actual effect is usually either no contribution or a low amount that’s hit-and-miss. The big ones are outliers.

                                                “Finding all the things that break and fixing them can take a lot of time, but they can all be handled as individual incidents in isolation, and then you move on.”

                                                This is an interesting perspective. It could be a psychological motivator for some of these activities. I know two others that we must remember, though. One is backward compatibility with the UNIX model. That architecture dictates quite a bit in terms of what they can do and with what results. The other is that they prefer mitigations that have barely any observable slowdown and that work with little effort from users. The usability aspect is a great default, but the performance constraint means they’ll use risky mitigations or obfuscations. Those addressing the root causes almost always have a significant slowdown if still using the C language. So: UNIX model, on by default, little to no performance hit, and maybe, like you said, easy to work on a little at a time.

                                                Your point would probably apply to other projects even more since the OpenBSD team is unusually committed to doing the hard work.

                                                “Introducing the straight jacket of verification and provably safe code (but who proves that the proof proves the right thing?) is much more like revolution. We all know you just don’t slap C with it, fix a few dozen bugs, and be done with it.”

                                                You actually can if aiming at medium assurance. As I told one of them, the least-painful option would be to create a safe variant like Cyclone that integrates well with C in terms of calling it or compiling to it.

                                                https://en.wikipedia.org/wiki/Cyclone_programming_language

                                                New code can be written in Cyclone or a similar language. Old code can be gradually rewritten over time, or they can do a big rewrite. The language differences amount to annotations or differences in structuring where you usually don’t have to change the actual algorithms. The port wouldn’t be easy, since they never are, but it would be way easier than porting to a non-C language and easier than trying to make everyone experts at never introducing vulnerabilities in C code. Integration with tools such as Frama-C, KLEE, affine types, or assertion-based test generation would just make the quality go through the roof if supplementing their code review.

                                                That’s all before we even consider things like formal verification done by hand or really changing the structure of the OS. The safe language option is quite doable, especially seeing what Redox OS did in Rust and what the Muen kernel and IRONSIDES DNS did with SPARK Ada. Those projects, given the borrow checker or annotations, would be harder than what C developers would face using a safer C or lightweight verification tools. What stops them is personal preference of the team as far as I can tell. They just like using C, want a UNIX, prefer to do the code review, and so on.

                                                “who proves that the proof proves the right thing?”

                                                That’s barely relevant even though I see why you’d ask. Your TCB always includes your requirements, design, source, tooling, underlying platform, and so on. All you can do is reduce it and make it easier to verify. Decades of evidence shows formal specifications in particular find problems in many parts of the lifecycle, esp ambiguities or interface errors. Formal proofs can find nearly impossible-to-spot bugs but seem to be best for tiny, critical stuff that’s similar to prior, verified work so you know it’s possible to do in the first place.

                                                Far as how deep trust goes, a lot of this work is done in Coq or HOL specifications. HOL itself has been verified at a logic level, with another person making a low-TCB implementation, HOL Light, that is a few hundred lines of code IIRC. Foundational proof-carrying code and other methods essentially use a prover to generate all the transformations from high-level specs or source down to assembly, producing proof terms in as untrusted a way as possible. A trusted, lightweight checker such as HOL Light then confirms each action was logically valid in the rules of the logic and in terms of the formal specification. You essentially trust a pile of unambiguous specifications, which are easier to machine-check, plus a checker that’s easy to check. Examples from a leading researcher:

                                                https://www.cl.cam.ac.uk/~mom22/stacks.html

                                                So, there that is for your curiosity. They’ve taken it much farther than I ever expected reading about the old days of the Gypsy Verification Environment, VDM, and so on. With work like CertiKOS, the field has come a long way. :) That kind of thing takes a lot of PhDs. So, I don’t ask FOSS projects to do that much. The most I’d push is Cleanroom with safe languages or Altran/Praxis’ Correct by Construction. The latter combines good design, formal specs, coding what one can in SPARK or something similar, thorough review of specs/code, and plenty of testing. Their defect rate is usually around 0.04-0.05 per thousand lines of code, with some products provably immune to common vulnerabilities. It seems practical given they charge a 50% premium on top of the going rate for software projects by experienced professionals. As you said, though, raising the bar on quality using such specialist techniques will reduce contributions a lot. It can probably be done in a quality-focused project such as OpenBSD, though, since they already raised the bar quite high.

                                                1. 2

                                                  Let me modify it to say they have potentially a labor advantage where the contributors are free and not forced to use whatever tools companies demand.

                                                  That is a very good point.

                                                  You actually can if aiming at medium assurance. As I told one of them, the least-painful option would be to create a safe variant like Cyclone that integrates well with C in terms of calling it or compiling to it.

                                                  “Least-painful” might actually be a fair choice of words in that it admits the work is far from easy. When you say just create a safe variant like Cyclone, you’re perhaps a little too optimistic and assuming that it’s not a hairy problem to solve. Arguably the biggest safety-improving feature of Cyclone is fat & bounded pointers. Neither is compatible with C, so the project quickly becomes one of rewriting all the code, or perhaps writing a whole new libc in the new dialect to coexist with the old libc. And then the same for every other library the applications depend on. Unfortunately, “legacy” C code does not benefit from this. Not-null pointers could be compatible with C, but you still can’t eliminate null checks from functions accepting such pointers if you want to let legacy C call these functions and still have safety. So there’s not much point, and frankly null pointers aren’t such a big safety concern in application code anyway…
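To make the fat-pointer incompatibility concrete, here is a rough sketch of the idea (written in OCaml rather than Cyclone's C dialect, purely for illustration; all names are invented): the pointer carries its bounds so every dereference can be checked, which is exactly the multi-word representation a plain C callee cannot accept.

```ocaml
(* Illustrative sketch of a Cyclone-style fat pointer (hypothetical names).
   The pointer bundles a base and an offset, so dereferences are checked;
   the cost is that it is no longer a bare machine address. *)
type 'a fat_ptr = { base : 'a array; off : int }

let deref p =
  if p.off < 0 || p.off >= Array.length p.base then
    invalid_arg "deref: out of bounds"
  else p.base.(p.off)

(* Pointer arithmetic stays cheap: only dereferences pay for the check. *)
let advance p n = { p with off = p.off + n }

let () =
  let p = { base = [| 10; 20; 30 |]; off = 0 } in
  assert (deref p = 10);
  assert (deref (advance p 2) = 30);
  (* walking past the end is caught at deref time, not silently read *)
  assert (try ignore (deref (advance p 3)); false
          with Invalid_argument _ -> true)
```

Every C function that receives such a value would have to be recompiled against the new representation, which is why the dialect cannot simply call into legacy code.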

                                                  It gets hairier still when you consider allocators or close-to-the-metal code, where limitations like the inability to convert integers to pointers just won’t fly. Okay, so perhaps we can drop that one feature too because it doesn’t provide meaningful assurance anyway – applications that need to treat pointers as integers are vanishingly rare and ought to get caught in review. Code that needs to do it still needs to do it.

                                                  Then you consider regions. Again, not a useful feature if you’re interfacing with any legacy C, as it won’t understand these. Then again, current C compilers actually do a fair job of tracking the easy cases of dangling pointers to automatic storage. Complicated cases are still there, of course. But these are a rare problem to begin with, and mitigations on the stack (canaries mainly) do a fair job of making such things hard to exploit. And that works for all C software.

                                                  Heap regions are the more interesting feature. Until you realize you can’t free these heap pointers, and Cyclone “deals” with the problem by using a conservative garbage collector. Nope, nope, that is a non-starter. So now our developer needs to reinvent Rust in order to create this little safe variant of C? Or maybe we should forget that feature too?

                                                  There’s a bucketload of nontrivial design issues before you even get around to the implementation. And then there’s the implementation. It doesn’t look like OpenBSD has too many compiler hackers to begin with, and just having an up-to-date C compiler that supports the relevant architectures has been a depressingly uphill battle. There are dead ends like pcc and TenDRA. Old versions of GCC that were kept around for their old-arch support. Or license. Now there’s Clang, which went straight into “gigantic, nasty, and unavoidable.” If this is how life goes with C compilers, I don’t see them designing, implementing, and maintaining a dialect of C (including a copy of libc) to go along with it. Not anytime soon.

                                                  I’d argue that a more likely route is, once again, evolution in small steps. That means, on one hand, safer APIs that make things like NUL-termination of C strings (which Cyclone seems to be obsessed with) less and less of an issue. On the other hand, attributes like bounded and nonnull along with improved static analysis. I’m somewhat optimistic about C being slowly evolved into something that can have some additional guarantees it does not have now, and something that rejects bad or likely-bad code constructs. I wouldn’t even rule out the possibility of something like fat pointers coming to live alongside normal pointers. But slapping on a bunch of restrictions is a huge deal. I’m not optimistic about revolution.

                                                  1. 2

                                                    Interesting comments. I know too little about C programming right now. I’ll have to think on some of this over time. One I can address immediately, though:

                                                    “Until you realize you can’t free these heap pointers and cyclone “deals” with the problem by using a conservative garbage collector. Nope, nope, that is a non-starter.”

                                                    There have been plenty of GC-using operating systems. There’s usually some unsafe stuff at the bottom. Past that, they work fine, with the Oberon-based systems such as A2 Bluebottle being highly responsive, more so than the C-based competition I use. There are also low-latency and real-time GC’s. Go makes a lot of use out of one in performance-sensitive code. Then some OS designs such as JX allow one to swap out different collectors for different parts of the system based on their needs. So, technologically, it’s not a non-starter so much as a proven solution that worked even with the crappy GC’s some of the older OS’s used. A mix of non-real-time GC’s, real-time GC’s, ref counting, and unsafe code is a valid strategy that stops most problems by default. I’m also for adding the option of Rust-style borrow checking, which was inspired by Cyclone by the way.

                                              1. 3

                                                Managers that let their programmers use tabs use this as a perk to have them accept a smaller salary.

                                                1. 3

                                                  What turned me off calibre (I never used the collection features, but a good ebook reader is useful) is the generally bad security engineering and inappropriate reaction of the author to security-related bug reports (see this LWN.net article from 2011). Has this improved in the recent years, any reason to hope that Calibre 3 will be an improvement or that the authors changed their development practices?

                                                  1. 3

                                                    Have they evaluated using PyPy? (Some libraries’ FFIs are not supported, but that may be less of an issue for server workloads than for NumPy?) I also wonder if JIT startup time would be an issue in their environment, or if they have long-running processes.

                                                    1. 2

                                                      The graphs are fairly confusing because they compare, for each system, the performance ratio of SQLite versus the filesystem, but they look like they compare the performance across filesystems. In particular, looking at the graphs, it looks like Windows 7 and 10 are much faster at reading files than all the other systems, while in fact the results seem to be explained by the fact that (when antivirus protection is enabled, as in these tests) the filesystem access overhead is noticeably larger.

                                                      The point of the author is to talk about the time ratio between filesystem and database, so it is of course reasonable to emphasize that. But I still think that the very easy confusion suggests that this is not the best visualization approach. If the timescales of access on the various systems are comparable enough, I think it would be best to have absolute time charts, plotting one bar for the filesystem and one bar for the database on each system. Otherwise, no graphic plot needs to be given: a numeric representation with one ratio per column would convey the same information and be less confusing.

                                                      The key problem is that bar graphs like that are designed to make it easy to visually compare the performance of the various measurements, which is nonsensical here (it is not interesting to know that the ratio between filesystem and database is 1.5X worse on Apple systems than on Ubuntu systems; you want to know that the ratio is surprisingly large on all systems). This is a case of a tool used for the wrong purpose.

                                                      1. 1

                                                        I agree that it is somewhat misleading.

                                                        However, don’t you think that the difference in ratio across systems is indicative of OS/filesystem efficiency? Assuming that SQLite and direct read/write performance is optimal in all systems and that the different file systems are comparable in terms of features.

                                                      1. 2

                                                        I’m reviewing/grading proposals submitted to the OCaml Workshop and the ML Workshop (ML is a family of programming languages including OCaml, Standard ML, F# and, depending on the context, may even encompass aspects of Haskell, Rust and Scala), which are going to happen consecutively on September 7th-8th in Oxford, UK, colocated with the International Conference on Functional Programming. The OCaml workshop is more application-oriented (cool new libraries or tools for the OCaml ecosystem), while the ML workshop is more theory-oriented (it’s about research ideas that would apply to several different programming languages). Both have incredibly exciting submissions this year, and I’m already eager to actually see the talks.

                                                        1. 8

                                                          Thanks to ST, I now know that this not the case, which is a huge relief. If required, I can do a literal translation of the algorithm, and clean it up (or not) later.

                                                          That is not completely true, because you can only embed in ST algorithms that follow a rather simple region discipline in the way the effects flow through the program (the computation has to promise to stop performing effects in the future, and at that time its value becomes accessible as a pure value). This captures a large class of interesting programs, but still there are some algorithms that are observationally pure but internally use mutable state in a way that does not follow this discipline. They cannot be “proven pure” by using ST alone. You can of course write them in Haskell, but their type will carry the impurity annotation and may force you to restructure the rest of your program.

                                                          For a practical example, take laziness: a simple implementation of call-by-need evaluation (where thunks are “memoized” after they have been forced) in a strict language is a function that captures a mutable reference, and mutates it on the first time it is called. This is observationally pure (Haskell uses mutation in this way underneath), but it cannot be given a non-effectful type using ST – its effect discipline is more complex than that, as the initial value (the unevaluated thunk) is returned to the rest of the program while there is still one write to be performed in the future.
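For concreteness, the memoizing thunk described above can be written in a few lines of strict OCaml (a toy sketch with invented names); the key point is that the unevaluated thunk escapes to the caller while one write to the reference is still pending, which is the effect pattern ST's region discipline cannot express:

```ocaml
(* Call-by-need in a strict language: observationally pure from the
   outside, but the reference is mutated the first time we force it. *)
type 'a state = Unforced of (unit -> 'a) | Forced of 'a

let delay f = ref (Unforced f)

let force t =
  match !t with
  | Forced v -> v
  | Unforced f ->
      let v = f () in
      t := Forced v;  (* the write happens after [t] already escaped *)
      v

(* The underlying computation runs exactly once, so the mutation
   cannot be observed by callers. *)
let count = ref 0
let t = delay (fun () -> incr count; 21 * 2)
let () =
  assert (force t = 42);
  assert (force t = 42);
  assert (!count = 1)
```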

                                                          1. 11

                                                             I’ve seen some OpenStreetMap renderings that are better and more detailed than both. I wish Apple would embrace OSM as a way to compete with Google.

                                                            1. 6

                                                              I investigated using openstreetmap.org as a map provider for a private commercial project I worked on, but hit a wall. When it came to their policies (which are reasonable), they all generally resulted in rolling your own system and administering it, whether your project donates to the foundation or not.

                                                              https://operations.osmfoundation.org/policies/api/ (tagged map data, for directions and annotated locations)
                                                              https://operations.osmfoundation.org/policies/tiles/ (provided image assets)
                                                              https://operations.osmfoundation.org/policies/hosting/ (contributing hardware for free, open use)
                                                              https://operations.osmfoundation.org/policies/nominatim/ (search appliance usage)

                                                              I really wanted to use it since there wasn’t much expectation for heavy use, but couldn’t guarantee that any potential spike in usage wouldn’t get our access blacklisted, and couldn’t justify adding the overhead of a mirror to the cost of the project, to get an instance hosted.

                                                              Hypothetically speaking, if Apple adopted OpenStreetMap as a provider, and contributed to the foundation, and assisted in crowd-sourcing an accurate data asset, they’d have to deal with a number of complicated issues (running isolated systems that firewall usage habits from leaking into the open; curating information in a commercially relevant manner while operating someone else’s data set; controlling the effect of new data introduced from the other direction on their own product), all while hosting their own massive cluster for reliable service delivery (which they probably already effectively do), and certainly rolling their own map skins to match Apple UI conventions.

                                                              Somewhere along the line, it becomes clear that in such a situation, Apple’s influence over Open Street Map would be very nearly total, and not unlike Google’s near total influence over Firefox prior to Chrome’s existence.

                                                              1. 2

                                                                Does that mean that there is a space for an OSM mirror that would offer an API to others, having them pay for resource usage?

                                                                1. 2

                                                                  Aren’t there? At least Mapbox seems to use OSM data (and looks like it’s building lots of stuff around it, pretty cool). I’ve also used MapQuest in the past (but remember they were serving map tiles not even as PNG but as JPEG with artefacts). There seem to be a lot of such services out there.

                                                                  1. 1

                                                                    I’m not entirely sure, but I see that the Open Street Map Foundation has some answers to common licensing and legal questions over here:

                                                                    https://wiki.osmfoundation.org/wiki/Licence/Licence_and_Legal_FAQ

                                                                    https://opendatacommons.org/licenses/odbl/

                                                                    https://wiki.osmfoundation.org/wiki/Contact

                                                                  2. 2

                                                                    It’s my understanding that Apple does use OSM as a provider and contributes back.

                                                                    1. 2

                                                                      OSM did a blog post on this back in 2012: https://blog.openstreetmap.org/2012/10/02/apple-maps/.

                                                                      To save you a click, it suggests that Apple uses OSM for some locations (they use Islamabad as an example), but largely sources map data from Tomtom. Granted, OSM’s blog post is coming up on half a decade old, so Apple’s map data preferences may have shifted.

                                                                      1. 1

                                                                        Ha! Shows how much I know… Or could even be bothered to Google about…

                                                                        But then again, I’m neither an Apple employee, nor a contributor to OSM (at the moment).

                                                                1. 5

                                                                  I would rather just rewrite the thousands of lines of code. You already have the architecture laid out so this is a well-defined and easy (albeit boring) task. Writing a Python interpreter, even a small one, will require some research. And when I read

                                                                  OPy will reflect this evolution. It could even end up a specialized language for writing languages — a meta-language — rather than a general-purpose language like Python.

                                                                  I wonder if you’re keeping your eyes on the prize, no offense but this sounds like the kind of dicking around I like to do when I’m bored of whatever I’m supposed to be working on. You’re trying to make a shell, not a meta-language, right?

                                                                  1. 4

                                                                    Especially since multiple meta-languages already exist, such as SML, OCaml, and I’d include Haskell here. And for a Python-like language, the PyPy people made a simple meta-language (RPython).

                                                                    The upside to these is that they come with optimizing compilers, debuggers, community, existing libraries, and semantics.

                                                                    If the goal is small, surely something like Lua makes more sense than writing ones own Python implementation.

                                                                    1. 3

                                                                      On top of that, they (especially OCaml) come with libraries specifically for writing compilers faster.

                                                                      1. 3

                                                                        After using ASDL a lot, I definitely wondered if I should have written it in OCaml. In 2013, I did another parsing project in Python, and I actually resolved to write future projects like that in OCaml.

                                                                        But there are a couple problems with OCaml. I would say it’s a meta-language for semantics and not syntax. ANTLR/yacc are meta-languages for syntax. In particular:

                                                                        • OCaml’s Yacc is NOT good enough for bash. It misses by a mile. If you look at the first half of my blog, you will see I spent a lot of time developing a somewhat novel parsing algorithm for bash.
                                                                        • Moreover, the VERY first thing you hit when writing a language – the lexer – is heavily stateful. OCaml is worse than C for writing this kind of code. Parsers and lexers are slightly paradoxical, because from the outside they’re purely functional, but from the inside they’re heavily stateful. I don’t like writing imperative code in OCaml; there’s no real benefit.
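As an illustration of that statefulness (a toy sketch with invented names, far simpler than what a real shell lexer needs): the tokenization rules flip with a mutable mode, the way a double quote switches shell between word rules and string rules.

```ocaml
(* Hypothetical mode-switching lexer: whitespace splits words only in
   Unquoted mode, and '"' toggles the mode as a side effect. *)
type mode = Unquoted | DoubleQuoted
type token = Word of string | DquoteOpen | DquoteClose | Eof

type lexer = { input : string; mutable pos : int; mutable mode : mode }

let make input = { input; pos = 0; mode = Unquoted }

let rec next lx =
  if lx.pos >= String.length lx.input then Eof
  else
    match lx.input.[lx.pos], lx.mode with
    | '"', Unquoted ->
        lx.pos <- lx.pos + 1; lx.mode <- DoubleQuoted; DquoteOpen
    | '"', DoubleQuoted ->
        lx.pos <- lx.pos + 1; lx.mode <- Unquoted; DquoteClose
    | ' ', Unquoted ->
        (* whitespace separates words only outside quotes *)
        lx.pos <- lx.pos + 1; next lx
    | _ ->
        let stop c = c = '"' || (lx.mode = Unquoted && c = ' ') in
        let start = lx.pos in
        while lx.pos < String.length lx.input
              && not (stop lx.input.[lx.pos]) do
          lx.pos <- lx.pos + 1
        done;
        Word (String.sub lx.input start (lx.pos - start))

let () =
  let lx = make {|ab "c d" e|} in
  assert (next lx = Word "ab");
  assert (next lx = DquoteOpen);
  assert (next lx = Word "c d");   (* the space did not split the word *)
  assert (next lx = DquoteClose);
  assert (next lx = Word "e");
  assert (next lx = Eof)
```

Purely functional interfaces can wrap this, but the natural implementation is a record of mutable fields threaded through every call.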

                                                                        The second problem is that OCaml is not a good meta-language for itself. I talked about that in “Type Checking vs. Metaprogramming” [1].

                                                                        So yes I’m daydreaming about a meta-language that’s a cross between ML/Lisp/Yacc/ANTLR – some details here [2]. But it is toward a somewhat practical end. I actually have to implement multiple languages (OSH, Oil, awk/make). If it can implement those languages, I’ll call it done.

                                                                        Now, I’m not necessarily saying that OCaml would be worse than Python. I certainly worked around a lot of Python’s quirks, no doubt. But they both have flaws for this application, and I chose the one I was more familiar with and which had a bigger toolset.

                                                                        [1] http://www.oilshell.org/blog/2016/12/05.html

                                                                        [2] https://www.reddit.com/r/ProgrammingLanguages/comments/62shf7/domain_specific_languages_for_building_compilers/

                                                                        1. 3

                                                                          I am fairly familiar with some lexing/parsing tools for OCaml and I wonder if some of the limitations you mention have not been lifted by some advanced users encountering similar needs.

                                                                          I don’t know exactly what you mean by “novel parsing algorithm for bash” (would you have more precise pointers?), but for example your blog mentions the idea of having several lexical states, and it happens that this question has been studied by people writing parsers for multi-language documents (say HTML+CSS, etc.) using standard lex+yacc-inspired parsing tools. See Dario Teixeira’s on-the-fly lexer switching with Menhir for example.

                                                                          Menhir recently gained an incremental parsing mode where, instead of the parser taking a (lexbuf -> token) lexer as argument, there is an explicit representation of the parser state as a persistent value (a state may be accepting, an error, expecting further tokens, etc.), and a function that takes a state and a token and returns the next parsing state. This is advanced and not as convenient as the usual approach, but it gives you a lot of flexibility (you can inspect the state in the middle, and for example decide to switch to a different lexer if you want) that would allow you to do lexer switching in a nicer way than described above.

Finally, while Menhir would be my go-to choice of parser-generator technology for OCaml – and I haven’t yet understood whether bash parsing is fundamentally non-LR(1) – you may want to experiment with Dypgen, a GLR parser generator that also explicitly supports scannerless parsing.

                                                                          1. 2

                                                                            Off the top of my head, the difficulties with parsing shell are:

                                                                            • It has to interact with readline. I call this the $PS2 problem. When you hit enter after if true, it should print PS2 rather than executing the command. Most shells do something ad hoc for this and get it wrong in some cases, but my parser handles it without any special cases.
• I reuse the parser for tab completion. This imposes some constraints on error handling (parsing incomplete commands). At least some shells do this with a second, ad hoc half-parser and get it wrong.

                                                                            Both of these make bottom-up algorithms unsuitable IMO. You want control over how much input you are reading. And you want to know which production you are looking at earlier rather than later.
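A toy sketch of the $PS2 decision described above — the hypothetical `classify` function below is invented for illustration; a real shell must also track double quotes, heredocs, case/esac, and much more:

```python
# After the user hits Enter, decide whether the buffer so far is a
# complete command, an incomplete one (print $PS2 and keep reading),
# or an outright syntax error. Toy version: only tracks if/fi nesting
# and unterminated single quotes.

def classify(src):
    depth = 0           # if/fi nesting depth
    in_quote = False    # inside an unterminated 'single quote'
    for tok in src.split():
        if in_quote:
            if tok.endswith("'"):
                in_quote = False
            continue
        if tok.startswith("'") and not tok.endswith("'"):
            in_quote = True
        elif tok == "if":
            depth += 1
        elif tok == "fi":
            depth -= 1
            if depth < 0:
                return "error"        # fi without a matching if
    if depth > 0 or in_quote:
        return "incomplete"           # prompt with $PS2
    return "complete"

assert classify("if true") == "incomplete"
assert classify("if true; then echo hi; fi") == "complete"
assert classify("fi") == "error"
```

The point is that the parser itself answers the question, instead of a separate ad hoc scan of the line buffer.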

                                                                            Some other issues:

• Shell is best thought of as not one language, but four mutually recursive languages implemented by four parsers. Something like "dq"$(echo)'sq'${var:-default}unquoted is a single word, and the word language is fully recursive. But the word is used as a token for something like for i in x y z. It is an unusual structure and many parser generators don’t accommodate it (e.g. ANTLR).
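As an illustration of that mutual recursion, here is a hypothetical AST sketch (invented names, not Oil’s actual representation) in which word parts recurse back into the command language:

```python
from dataclasses import dataclass

# A single shell word like "dq"$(echo)'sq'${var:-default}unquoted is a
# sequence of parts, and some parts recurse: $(...) contains a whole
# command, and ${var:-default} contains another word.

@dataclass
class Literal:          # unquoted or 'single-quoted' text
    text: str

@dataclass
class DoubleQuoted:     # "..." may itself contain substitutions
    parts: list

@dataclass
class VarSub:           # ${var:-default}; the default is again a word
    name: str
    default: list

@dataclass
class CommandSub:       # $(...) recurses into the command language
    command: "Command"

@dataclass
class Command:          # e.g. `for i in x y z` — words used as tokens
    words: list         # each element is a list of word parts

word = [
    DoubleQuoted([Literal("dq")]),
    CommandSub(Command(words=[[Literal("echo")]])),
    Literal("sq"),
    VarSub("var", default=[Literal("default")]),
    Literal("unquoted"),
]
assert isinstance(word[1].command.words[0][0], Literal)
```

Four grammars in miniature: commands contain words, words contain substitutions, and substitutions contain commands and words again.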

                                                                            I should have really said that I don’t want to use a parser generator at all. It might be possible, but I don’t think it would save any work over a hand-written parser due to the work above. The blog post you linked seems to indicate that the approaches used are a little hacky.

                                                                            I have some daydreams about writing a custom parser generator to express it.

But as mentioned, OCaml is close and I wondered if I should have used it. One of the things that tips it is that not only do I not want a runtime dependency on a Python interpreter, I don’t want any build-time dependencies besides a C compiler either. In the world of shell, configure && make && make install is still a useful and, I think, expected workflow.

                                                                            EDIT: I also have the usual complaints about custom error messages and parser generators. I don’t have any specific experience with Menhir, but there are definitely places in shell where the location of the parse error is not the one you want to point out to the user.

                                                                          2. 3

OCaml is only one of the languages I mentioned; there are other options. In case you are not aware, though, there are other parser libraries such as Menhir and stream combinator parsers. I’d also wager it’s easier to make a specialized parser DSL in OCaml than building a Python implementation. The types alone mean you can get a well-typed AST out of it.

I’m also very surprised that you claim Python is better than something like OCaml for parsing; that has not been my experience at all. I still can’t really imagine that a home-brewed mini-Python is better in terms of all the other aspects that come with a language implementation, such as optimizations and debugging.

                                                                            But it’s your project, so have fun!

                                                                            1. 2

Menhir is still an LALR(1) generator as far as I know, and that algorithm doesn’t work with shell at all. I always see people recommending parser combinators, but I can literally think of zero production-quality languages that use them. And I’ve looked at perhaps 30 “real” parsers.

                                                                              As far as I can tell, OCaml, SML, and Haskell all have the same thing going for them – algebraic data types and pattern matching. I don’t know of any non-toy or non-academic programs written in SML. OCaml seems to be what you use if you need to write something production quality. Between OCaml and Haskell, OCaml actually has a Unix tradition and has better tools as far as I can tell. Supposedly Haskell is not efficient with strings.

                                                                              I claimed that C is better than OCaml for writing hand-written lexers. I explicitly didn’t say that Python is better – I said the opposite. OCaml would probably be the #2 choice. The first post describes how I ended up with Python:

                                                                              http://www.oilshell.org/blog/2016/10/10.html

                                                                              I started with 3000 lines of C++ and realized I would never finish at that rate. The prototype became production… but I think that will end up being a good thing.

                                                                            2. 2

                                                                              “cross between ML/LISP/YACC/ANTLR”

                                                                              Closest I know is sklogic’s tool at:

                                                                              https://github.com/combinatorylogic/mbase

                                                                              He mixes and matches about every technique in the book in one system for building arbitrary applications in this space. The commercial use is a product for static analysis. He’s always DSL'ing problems. His main development is done in a mix of LISP, Standard ML, and Prolog. He threw SML in for stuff he wants done safely with easy analysis. Prolog is obviously for anything done easy with logic programming. He has LISP for everything else with the full information of the compiled app available to assist him. So, that’s the sort of thing you can do if you start with the right foundation in a language that can embed the rest on a use-case by use-case basis.

                                                                              Now, one like that outside .NET platform building on something like Racket Scheme might be quite interesting. Even better, something like PreScheme available at least selectively with added benefits of dynamic safety & concurrency of Rust. There’s a lot of options out there that build fundamentals on languages with strong ecosystems and prior work before one attempts to do fully-custom stuff.

                                                                              1. 2

                                                                                That’s exactly what I linked to in my post :)

Yeah I don’t like the .NET platform and have some other reservations, as you can see in the conversation there. I think doing something fully general-purpose may be infeasible, but if you narrow the goals to expressing/reimplementing the DSLs in Unix, of which there are many, it can probably be done.

                                                                                1. 1

                                                                                  Oh I missed that. I thought it was a Reddit elaboration of Oil or something. Ok, you looked at the toolkit, I agree on .NET, and there were disputes about what it could handle. Most of these I’d have said try and see. I see many people argue with him about whether his method could handle a job he’s already used it for. I think he said on HN he uses it to build static analyzers for C and Verilog that his company sells… written in a smooth combo of something like 4-8 DSL’s. I usually say, “Seriously? You think he doesn’t know how to parse weird stuff?” Your case is an exception where it’s not clear cut because Bash might be the pain you describe from my own research.

I remember looking up a formal semantics. Plus, an off-hand comment by David Wheeler in our verifiable builds discussion got me thinking about using it for bootstrapping an untrusted compiler since bash is always there & already trusted. So, needed to see if people wrote compilers in it or even formalized it (aka precisely describe it). I found a partial formalization that talked about how difficult it was since it was like several languages at once with static and dynamic aspects. That’s exactly what you said on Reddit. I thought, “How we going to cleanly parse or verify it if they can’t friggin describe its semantics?” The other thing I found were compilers and such that the people semi-regretted writing. Reluctantly shared with caveats haha. Still, even if not sklogic’s method, such a thing made me think you might get best results using two or more fundamental techniques that are each fast instead of a one-size-fits-all method. At least keep it in the back of your mind as he does it regularly and high-assurance did too. They did sequential stuff in ASM’s and concurrency in SPIN. I could also definitely see Bash done in a rewriting logic with primitives or heuristics for corner cases since they can handle about anything if it’s parsing and transforms. Maude & K framework come to mind.

                                                                                  So, I guess I can’t offer any specific help on this one past remember using more than one tool might be better on occasion for parsing. Not always but sometimes. In any case, I’m glad you at least got to look into such hybrid tools. Bash is also hard enough to cleanly model that it might be worth some PhD’s throwing effort at to produce a tool like GOLD parsing engine, Semantic Design’s, or sklogic’s that can handle it cleanly w/ straight-forward grammar + embedded functions + synthesis. Just can’t let ourselves be stopped by the challenge of something as crude and old as Bash. ;)

                                                                                  1. 1

                                                                                    What would you want to accomplish with a formal semantics of shell?

I have looked at papers for formalizing Python [1] and problems with formalizing ANSI C. The Python one gives examples of using it to express securities contracts and this pox networking project [2]. However, AFAICT they didn’t say what you could actually prove about those types of programs.

                                                                                    I’m all for formalization but I would want it to be in service of something practical. Lamport’s TLA+ seems like a good example. People used it to find real bugs.

                                                                                    I suppose you could do some sort of taint analysis on programs to prove that data isn’t used as code, as in shellshock. However I think it is easier not to write such a silly bug in the first place :) That just feels too obvious.

                                                                                    Also, one of the reasons I want to keep it in Python is precisely because it’s easier to reason about the semantics of the interpreter than it would be in C or C++. For one, you get the memory safety guarantee for free. What other properties would you want to prove about shell?

                                                                                    Shell is definitely foundational, in that you generally need it to configure and build C programs. However I’m not sure the work on C foundations has borne that much fruit yet. I feel like dynamic analysis has been more practical and widely used (valgrind, ASAN).

                                                                                    [1] https://cs.brown.edu/~sk/Publications/Papers/Published/pmmwplck-python-full-monty/

                                                                                    [2] https://github.com/noxrepo/pox

                                                                                    1. 1

“What would you want to accomplish with a formal semantics of shell?”

                                                                                      Old rule is how can one get a simple, elegant solution to something one can’t even formally express? Among other things like verification. The formal semantics problems were just evidence it might be hard to parse or reason about.

“I’m all for formalization but I would want it to be in service of something practical.”

                                                                                      I agree. I’m thinking on parser generators with efficient synthesis in this case. Gotta formally describe it somehow, though.

                                                                          3. 3

If I were just trying to make one shell, that would be true. But I’m writing two: an old shell, OSH, and a new shell, Oil. The new one is bigger because it will have the functionality of Awk and Make.

                                                                            So yeah there is some yak shaving, but so far my yak shaving has paid off, like writing ASDL and the test framework.

                                                                            Of course, OSH is open source. I will be happy if someone doubts my plan, and instead wants to rewrite it in C or C++ for me. :) It’s true it isn’t that much code – around 12K lines of Python. It will probably expand into 20K lines of C++ or 30K lines of C if you’re up for it.

                                                                            Also, the meta-language thing is speculating about evolution AFTER getting OSH working on a normal Python interpreter.

                                                                            1. 4

                                                                              The new one is bigger because it will have the functionality of Awk and Make.

                                                                              Huh, this sounds a lot like mash: http://www.vitanuova.com/inferno/man/1/mash.html , http://www.vitanuova.com/inferno/man/1/mash-make.html

                                                                              1. 3

                                                                                Woah are there THREE mash shells? That’s a record. I listed two here! https://github.com/oilshell/oil/wiki/ExternalResources

                                                                                If there are any lessons I should take from that mash shell, please let me know.

                                                                                It feels obvious that make and shell should be the same program. The point of both is to start processes, and 90% of the lines of a Makefile are literally shell, or they’re simple assignments that can be expressed in shell.

                                                                                In case you missed it I wrote about that here: http://www.oilshell.org/blog/2016/11/13.html

                                                                                This makes it obvious: http://www.oilshell.org/blog/2016/11/14.html

                                                                          1. 1

                                                                            Abstraction … Modularization

                                                                            Yer double-counting Harry.

                                                                            Not mentioned,

                                                                            • non-strictness
                                                                            • explicit effects
                                                                            • purity
• typeclasses (not just in Haskell now; they’re what Rust’s traits are based on)

                                                                            I could probably think of a few others. It’s a bit strange that the author picks the oldest possible paper for modularization rather than an assortment of historically or educationally interesting papers.

                                                                            1. 2

                                                                              They are all great points (aren’t “explicit effects” and “purity” in essence the same thing?), but I am not sure that I would trade them for the ones that are proposed in the list.

An interesting aspect of the items you propose is that they are also less well understood, I think: explicit effects/purity is fascinating, but the consensus on how best to achieve it, both in theory and in practice, is not yet clear (monads? algebraic effects? should effect inference be done Koka-style, or Frank-style?). We have a bit more experience with laziness, but the gap between the theory (where it’s well understood that evaluation order should be a per-type property) and practice (where a language has a global strategy with explicit opt-out markers) also gives me pause.

You point out that Rust has traits, but that’s a strange narrative. Scala has had implicits for a while, and I think that the Coq/Agda/Idris/Isabelle use of type classes is notable as well. They may appear less “mainstream” than Scala or Rust, but they had, for example, the important consequence of demonstrating that coherence is not an issue in a dependent language: as the elaborated instance shows up in the type, any issue caused by locally distinct resolutions will be caught by the type system.

                                                                              1. 1

                                                                                (aren’t “explicit effects” and “purity” in essence the same thing?)

                                                                                I’d like to believe so but some people believe Haskell isn’t pure because we have partial functions :)

                                                                                ‘Sides, you could have explicit effects in something like Rust and still not be purely functional because it’s an imperative programming language. Adding explicit effects to OCaml would be much of the same. Y'all hacked some limbs off a dead cow and tacked them onto your lambda calculus, so purity’s a dead letter.

                                                                                consensus on how to best achieve it, both in the theory and in the practice, is not yet clear

                                                                                Monads work great for me and I have a range of methods at my disposal for working with them. mtl-style (nota bene: not the actual mtl library) has been best-of-all-worlds. I have not been impressed with Eff even when an entire language was catering to it.

                                                                                You point out that Rust has traits, but that’s a strange narrative. Scala had implicits for a while

                                                                                I point out Rust because it disallows orphans entirely.

                                                                                Scala had implicits for a while

Not canonical, so not typeclasses. Implicits are terrible and not equivalent to typeclasses at all. This isn’t some hypothetical: implicit-scope behavior in Scala bites users all the time.

                                                                                Adding an import shouldn’t silently change the behavior of my code at runtime with no warning. That is obscene.

                                                                                important consequence of demonstrating that coherence is not an issue in a dependent language

                                                                                shrugs I’ve only kicked around tutorials in Agda, Idris, and Coq, whereas I work in and write about Haskell. I’d prefer to have the explicit option of making the overloading unique per type across the entire class for the sake of the humans reading and reasoning about the code.

                                                                                1. 1

                                                                                  I’d like to believe so but some people believe Haskell isn’t pure because we have partial functions :)

Touché, but the expected reply is that non-termination (or other sources of partiality) should be tracked in the types. There is something not completely clear about this idea: I don’t personally know whether it’s non-termination that is special, or whether the current way we track effects is just too heavy for this orthodox reply to be really convincing.

                                                                                  ‘Sides, you could have explicit effects in something like Rust and still not be purely functional because it’s an imperative programming language. Adding explicit effects to OCaml would be much of the same. Y'all hacked some limbs off a dead cow and tacked them onto your lambda

                                                                                  I don’t see a difference with the do notation. (One minor difference is that the do-notation forces explicit sequencing of evaluation order, so it would be compared to these languages in ANF form. Some people think it’s good, some think it’s bad.)
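For a concrete view of the desugaring being discussed: do-notation is sugar for a chain of nested binds, which names every intermediate result — essentially ANF. A minimal Maybe-style sketch (the `bind` and `safe_div` helpers are toy inventions, using Python’s None as Nothing):

```python
# bind short-circuits on None, like Maybe's >>= in Haskell.
def bind(x, f):
    return None if x is None else f(x)

def safe_div(a, b):
    return None if b == 0 else a / b

# "do { x <- safe_div 10 2; y <- safe_div x 0; return y }"
# desugars to nested binds, with each step explicitly named:
result = bind(safe_div(10, 2), lambda x:
         bind(safe_div(x, 0), lambda y: y))
assert result is None       # the failure propagates

ok = bind(safe_div(10, 2), lambda x:
     bind(safe_div(x, 5), lambda y: y))
assert ok == 1.0
```

The sequencing of effects is explicit in the nesting, which is the point being made about comparing do-notation to imperative code in ANF.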

                                                                                  or the sake of the humans reading and reasoning about the code.

Proof-aware languages bring much more powerful ways of reasoning about the code (e.g. carrying properties around for the various type-class laws) that are mediated by the type system and, I think, completely subsume the guilty comfort of modularity-breaking determinism. It’s liberating.

                                                                                  1. 1

but I don’t personally know whether it’s non-termination that is special, or whether the current way we track effects is just too heavy for this orthodox reply to be really convincing.

                                                                                    Start concrete, shift into mtl-style as needed: light as a feather baby.

                                                                                    I don’t see a difference with the do notation.

                                                                                    I find it hard to believe you don’t care about the semantics of your programming language.

Proof-aware languages bring much more powerful ways of reasoning about the code (e.g. carrying properties around for the various type-class laws) that are mediated by the type system and, I think, completely subsume the guilty comfort of modularity-breaking determinism. It’s liberating.

                                                                                    Sometimes I don’t want proofs about a set of implementations over a type, sometimes I want one implementation per type as a hard invariant. Nobody’s going to squander brain budget to make an academic happy. PL work would reach industry more often if y'all understood that.

                                                                                    1. 1

                                                                                      I find it hard to believe you don’t care about the semantics of your programming language.

We may be talking about different things, as I don’t follow your reply; I do care. I am just pointing out that “imperative language” means “you write effectful programs in direct style”, and that the do-notation is precisely there to let people write effectful programs in (ANF-)direct style, so it gives the same notational convenience and (if both sides track effects in types) exactly the same reasoning difficulties.