Threads for notanymore

    1. 13

      Many open source projects struggle with the fact that not enough people are interested in reviewing contributions, as opposed to authoring them. This “Collective maintenance” blurb represents my attempt to improve this for the OCaml compiler. It introduces a responsibility among frequent contributors to spend a portion of their time on “collective maintenance”. I wonder what people think / how other projects deal with this. (There is a discussion of this for the Rust implementation at https://internals.rust-lang.org/t/did-we-start-reviewing-prs-slower/18033; they use automated processes to try to encourage this, in ways that I described there.)

      1. 4

        LLVM has a well documented review process that encourages both experts and non-experts to review code, deferring to experts for approval: https://llvm.org/docs/CodeReview.html#who-can-should-review-code

        There is a balance here, though. A useful question to ask is whether adding a review would improve the quality or move the PR closer to merging, or whether it would just generate more controversy/bikeshedding by adding an opinion to something that already has N different opinions (unless there is a very good reason why it shouldn’t be merged in its current form). The LLVM guide mentions that reviewers should be explicit about whether they intend their comments to block merging or not. What I try to do in other projects for minor nitpicks is to add the comments but approve the PR anyway. That way the author can apply my suggestions, or reply to / resolve the individual review comments as they see fit. For example: “I would’ve done it this way instead; there isn’t anything particularly wrong with how the PR does it, but it is very unusual, can you explain why that is better?” That can result in some clarifying comments, in the author realizing there is a better way, or in them pointing out why my suggestion would actually make the code worse.

        (and even with all that in mind my initial comment on one of the OCaml PRs sounded quite “dismissive”, which wasn’t my intent)

        1. 3

          Do you know how effective this LLVM policy is in practice? Are people satisfied with the speed of the decision-making process on each contribution? Is the review load spread among many people, or are a few people working full-time on LLVM doing the vast majority of it?

          1. 1

            I’ve actively contributed to LLVM only during a brief period of time, but during that time I haven’t seen a shortage of reviews (OTOH I wasn’t in a particular hurry either). They also have quite a few maintainers listed: https://github.com/llvm/llvm-project/blob/main/llvm/CODE_OWNERS.TXT

            Although not all areas of LLVM are equal, e.g. see here for some problems where people do get timely initial feedback but then actually getting an approval to merge takes a long time: https://discourse.llvm.org/t/rfc-improve-code-review-process-for-clang-tidy/66740/44

            Similar problems exist in the Xen community: there are areas where there are very few code owners and they are very busy. “The rest” can review code and approve as a last resort, but that is used sparingly because merging patches not reviewed by the code owner can introduce security issues that are more costly to fix down the line. One approach discussed there is for (busy) code owners to indicate whether they are happy with the review done by others, or whether the patch is tricky enough that it should wait until it gets reviewed properly by the code owner(s); or for maintainers of other areas to approve only “trivial” changes, where triviality needs to be agreed upon by 2 other maintainers (if they disagree on whether it is trivial or not, then the disagreement is resolved in favour of “not trivial”).

            I think reviewer bottlenecks probably exist everywhere, even if people are full-time employed and working for the same company. It is difficult when there are more active features/PRs in flight than maintainers, and major changes can get stuck in review for a long time. I don’t have particularly good solutions: tests and good comments in a PR certainly help.

            1. 1

              LLVM review speed probably differs a lot by area… In the corners that I’m watching, reviews typically take weeks / months to arrive. Sometimes explicitly nudging reviewers to look helps speed that up a bit, but in general I don’t really consider review velocity “solved” for LLVM. It’s often quite a bit slower than the patch author would prefer.

              The part where the LLVM community encourages both experts and non-experts to review does help get some feedback earlier on, but still you often want at least one of those mysterious experts to chime in, and (unsurprisingly) they are the busiest.

              They also have quite a few maintainers listed: https://github.com/llvm/llvm-project/blob/main/llvm/CODE_OWNERS.TXT

              Unfortunately, most of that file has been out of date for years. I don’t believe there is currently a clear record of who reviews what. As part of LLVM’s recent migration from Phabricator to GitHub PRs for review, there does seem to be renewed interest in actually keeping that file up to date, so perhaps it will change soon.

      2. 1

        I feel like this is also at least partially a problem with mathlib, where it’s mostly maintainers doing review, in addition to their own contribution work. PRs often take very long to get reviewed, to the point of often getting forgotten. I guess it has to do, somehow, with the fact that to properly review you often need much more expertise than to simply author a PR, which unbalances the ratio.

        1. 3

          I think that it depends on the project, but for OCaml at least I don’t think that reviewing requires more expertise than contribution. There are a few places that are extremely tricky and soundness-critical, and we need a code expert to review for correctness. Many other places are not and any good programmer can provide a useful review. (A typical correctness assessment would be “I checked that the behavior of the compiler is unchanged by this patch unless exotic option X maintained by the patch contributor is enabled.”; this typically does not require being an expert.)

          In my experience reviews are only partially effective at avoiding bugs (the testsuite is at least as effective if not more; review can help enforce that contributions maintain a strong testsuite), but they are the only effective approach to ensuring that the code remains maintainable: that it is clear, somewhat consistent, appropriately documented, etc. If only super-experts can understand a patch, it should probably be explained better, and some extra comments should go in the code for other people to follow along. Having a non-expert reviewer provide this feedback is very valuable.

          Reviews are also very useful to spread knowledge about the code around. From the perspective of a given contributor, doing review work can be an excellent way to gradually become more knowledgeable about certain parts of the codebase.

          For these things to work, we need to make it clear that it is okay for non-experts to do reviews. In some cases a non-expert review is not enough, but it is then often useful to have both kinds of review. (Maybe the expert is busy and willing to vouch for the correctness of the patch or to point out a specific issue, but not interested in commenting on the style, etc.) Non-expert reviewers can be pretty good at assessing their own confidence in a change – what parts of the change need to be looked at by an expert, and what parts they are confident about.

          1. 1

            There are a few places that are extremely tricky and soundness-critical, and we need a code expert to review for correctness.

            Theoretically, could soundness proofs be automated with, say, Coq?

    2. 1

      Is this meant to live in the same space as maddy and mox?

    3. 1

      Great article, but I don’t understand the footnote:

      GnuPG was invented in 1990. This has made a lot of people very angry and been widely regarded as a bad move

      What has made people very angry? age being released? Or the fact that GnuPG has file encryption? Or that it was released in the 90s??

      If it’s age being released, is it really true that it’s been widely regarded as a bad move? And if so, why is the author creating plugins for it?

      1. 15

        I may be off, but I think this is mostly a combination of:

        • this quote
        • People generally criticizing some aspects of PGP/GPG.
        1. 1

          Indeed, I seem to have missed the reference :)

          1. 4

            Correct, it’s a funny meme for the people that agree with the sentiment “GnuPG is not that great, really”.

      2. 7

        It’s a reference to a famous line from The Hitchhiker’s Guide to the Galaxy by Douglas Adams.

        In the beginning the Universe was created. This has made a lot of people very angry and been widely regarded as a bad move.

        In this context, it’s not much more than a joke about how GPG has a very mixed reputation. GPG is a very complex piece of software, and age is an attempt to provide one specific feature of it with a smaller/less complex tool.

      3. 5

        Setting aside the Hitchhiker’s Guide reference, a lot of cryptography people do criticize PGP. The main reasons cited seem to be:

        • Most cryptographic tools these days are purpose-built for a single use case and choose exactly the right set of basic cryptographic primitives and protocols to accomplish that, deliberately avoiding configurability (if a problem is found in the selected set, do a new version that chooses a different non-problematic set). PGP tries to serve lots of wildly-different use cases and has a combinatorial explosion of different cryptographic options, including many legacy/compatibility options that should never ever be used, which it largely dumps on the end user to sort out. This drastically increases the chance of end-user mistake compromising not only their security but the security of everyone they communicate with. TLS/SSL has gone through a similar problem.
        • Most cryptographic tools these days have forward secrecy – to the extent that they have long-lived keys at all, they are used only to generate ephemeral per-session or per-message keys, with the effect that a compromise of the key-exchange step of the protocol only compromises the session or message which used that key. PGP uses long-lived keys repeatedly and directly for signing and encryption, with the effect that a compromise breaks not only the current message but all messages that key ever was used for or will be used for.
        • PGP continues to promote the idea that email can be a secure encrypted messaging system, when it isn’t – even if everyone involved uses PGP, and configures it correctly (difficult), and generates and exchanges keys correctly (difficult), and remembers to run PGP on every message (rather than accidentally quote-replying and sending unencrypted, which is regrettably easy in many implementations), the subject line and sender/recipients still are not encrypted and are visible to every server the message passes through. This is much more than enough information for the Evil Secret Police of Opressistan to figure out who are the members of the People’s Revolutionary Dissident Cadre and round them up and start XKCD 538 operations on them.

        So at best it’s treated as a sort of relic of its era, a piece of history to acknowledge and learn from but certainly not to use for anything sensitive anymore.

      4. 2

        I read it as criticism against GPG, probably for being clunky and difficult to use? (Same might apply to PGP, of which GPG is kind of a clone?)

        With the wording being a Hitchhiker’s Guide reference, I presume the author isn’t completely serious in calling it a “bad move”. Perhaps you could argue that overall adoption of encryption suffered because GPG, as the “good enough” default for many years, crowded out alternative software without actually making encryption usable for most people. I’m not sure it was a net negative, but I do get that it has usability problems.

    4. 1

      Okay, so what programming language is category theory? Is there a language where you can define algebraic laws and enforce them?

      1. 12

        None of them are “category theory” in the same way no programming languages are “linear algebra” or “complex analysis”.

        1. 2

          Suppose you define a typeclass that represents an algebraic structure (like Monoid for example). This structure has some associated algebraic laws. I understand that in Haskell, you can’t enforce that instances of your typeclass obey these algebraic laws. Are there any programming languages that do a better job than Haskell at enforcing algebraic laws in typeclass instances?

          1. 7

            Pretty much “any” dependently typed language should allow formulating those laws I presume: Coq, Agda, Idris, Lean, …
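
            For instance, here is a rough sketch of what that can look like in Lean 4 (the class name is made up for illustration, and the exact standard-library lemma names/forms may need adjusting):

            ```lean
            -- A monoid-like class whose instances must supply proofs of the laws,
            -- not just the operations.
            class MonoidWithLaws (α : Type) where
              unit : α
              op : α → α → α
              op_assoc : ∀ a b c : α, op (op a b) c = op a (op b c)
              unit_left : ∀ a : α, op unit a = a
              unit_right : ∀ a : α, op a unit = a

            -- An instance for Nat under addition; the law fields must be discharged
            -- with actual proofs, so an unlawful instance is rejected.
            instance : MonoidWithLaws Nat where
              unit := 0
              op := Nat.add
              op_assoc := Nat.add_assoc
              unit_left := Nat.zero_add
              unit_right := Nat.add_zero
            ```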

          2. 6

            Idris does, but then you get into the general madness of formal verification.

          3. 3

            You can write proofs in Agda about your code in relation to the laws and then generate Haskell from it; see https://github.com/agda/agda2hs. That’s the ideal, but it has a lot of restrictions.

      2. 6

        In general, there is a precise correspondence between variants of typed lambda calculi and what is known as the “internal logic” of categories which satisfy certain extra properties.


        Here are some words:

        Simply typed lambda calculus is the internal logic of Cartesian closed categories.

        Linear typed lambda calculus is the internal logic of symmetric monoidal closed categories.
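
        For instance (a sketch of the first correspondence, using standard notation), lambda abstraction corresponds to transposition along the exponential adjunction of a Cartesian closed category:

        ```latex
        % A term  x : A, y : B |- t : C  denotes a morphism  A x B -> C;
        % currying it to  x : A |- fun y => t : B -> C  corresponds to the natural isomorphism
        \[
          \mathrm{Hom}_{\mathcal{C}}(A \times B,\; C) \;\cong\; \mathrm{Hom}_{\mathcal{C}}(A,\; C^{B}).
        \]
        ```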


        The “reason” for these correspondences is that one can use a category as an “abstract machine” to compile these lambda calculi into. For worked-out examples, see the paper “Compiling to Categories” by Conal Elliott. Finally, I apologise for not linking to anything, as I’m typing on my phone :)

    5. 22

      I imagine that many people will be wondering how Hare differs from Zig, which seems similar to me as an outsider to both projects. Could someone more familiar with the goals of Hare briefly describe why (assuming a future in which both projects are reasonably mature) someone may want to choose Hare over Zig, and Zig over Hare?

      1. 19

        I imagine that many people will be wondering how Hare differs from Zig, which seems similar to me as an outsider to both projects.

        As someone who used Hare briefly last year when it was still in development (so this may be slightly outdated), I honestly see no reason to use Hare for the time being. While it provides huge advances over C, it just feels like a stripped-down version of Zig in the end.

      2. 18

        My understanding is that Hare is for people who want a modern C (fewer footguns, etc) but who also want a substantially more minimalist approach than what Zig offers. Hare differs from Zig by having a smaller scope (eg it doesn’t try to be a C cross-compiler), not using LLVM, not having generic/templated metaprogramming, and by not having async/await in the language.

        1. 2

          That definitely sounds appealing to me as someone who has basically turned his back on 95% of the Rust ecosystem due to it feeling a bit like stepping into a candy shop when I just wanted a little fiber to keep my programs healthier by rejecting bad things. I sometimes think about what a less-sugary Rust might be like to use, but I can’t practically see myself doing anything other than what I am doing currently - using the subset of features that I enjoy while taking advantage of the work that occasionally improves the interesting subset to me. And every once in a while, it’s nice to take a bite out of some sugar :]

          1. 2

            If I remember correctly, there was some discussion about a kind of barebones Rust at some point around here. Is that what you would ideally have/work with? Which features would survive, and which be dropped?

      3. 14

        It looks like it’s a lot simpler. Zig is trying to do much more. I also appreciate that Hare isn’t self-hosting and can be built using any standard C compiler, and that it chooses QBE over LLVM, which is simpler and more lightweight.

        1. 13

          As I understand it, the current Zig compiler is in C++; they are working on a self-hosting compiler, but intend to maintain the C++ compiler alongside it indefinitely.

          1. 4

            Ah, thanks for the correction!

          2. 2

            Indefinitely, as in, there will always be two official implementations?

            1. 3

              Well, at some point it would make sense, much like C compilers are ubiquitously self-hosted. As long as it doesn’t make it too hard to bootstrap (for instance, if it has decent cross-compilation support), it should be fine.

    6. 3

      I am lacking the formal background to understand why this is important. Can someone ELI5 this to me?

      Update: Yes, I read the project’s README. It was of no help, except to get some historical background.

      1. 3

        There is a nice tutorial on cubical type theory in Agda, if you want the precise statements.

        Imagine a series of generalized cubes in different dimensions. The zero-dimensional cube is a point. The one-dimensional cube is an interval or line segment with two points. The two-dimensional cube has four points, four edges, and one face. Each higher-dimensional cube is built from copies of the previous cube glued together, plus a new hyperface.

        We could imagine embedding these cubes into an (infinite-dimensional) space in order to study the space; because a cube is rigid, embedding a cube into a space creates a hierarchy of components (cube-ponents?). Specifically, a cube requires its sides to be parallel, and this means that if we have a cube with one missing side, then we can usually fill it in with a unique filler side.

        Why does this matter for computation of type theory? Let’s imagine a square with one missing edge, embedded into a universe of types. The corners of the square are types. We’ll say that the edge parallel to the missing edge is a proof that its endpoint types are equivalent, and the other two sides are both transformations from those equivalent endpoint types to some other types. Then, filling in the square would transport the proof of equivalence, generating the unique edge which represents the transformed proof; additionally, the face of the square would be a transformation itself, if perhaps the square were a face of a (three-dimensional) cube, which itself could be filled in.

      2. 2

        In the paper.tex file, we can read, from the abstract:

        A proposed model of homotopy type theory built via a reverse iterated Grothendieck construction process formalizing semi-cubical sets in dependent type theory with extensional equality. The accompanying artifact is a Coq formalization that uses highly sophisticated rewriting techniques and pushes the boundary of proof assistant technology.

        And further down:

        Cubical type theory […] is an extension of dependent type theory that provides a computational interpretation of Voevodsky’s univalence axiom, in a field known as homotopy type theory, or HoTT. Two real-world projects that implement cubical type theory are worthy of mention: an experimental branch of Agda, known as Cubical Agda, and the Arend theorem prover. Cubical type theory is usually modeled as cubical sets, in the set-theoretic setting. Efforts to model cubical type theory have been ongoing since 2014, and many of these efforts are in the set-theoretic setting. Our contribution is to model the core of cubical types, semi-cubical sets, in type theory, using an indexed representation, in contrast to previous efforts that use a fibered representation. We use an iterated reverse-Grothendieck construction as the basis of our formalization, which we expound on in […] Another feature of our work is that, as we are in the type-theoretic setting, we use unicity of identity of proofs(UIP), which would be an inherent property in the set-theoretic setting (see […]).

        So, as far as I understand (which isn’t much), they formalize cubical type theory, by providing a model inside type theory rather than in set theory, and provide the Coq code witnessing this formalization.

    7. 2

      What about Infomaniak (since you’re based in CH)?

      1. 1

        I was not aware they existed :) I’m definitely looking into them as they seem to offer all TLDs I’m interested in, and I like the option to pay for many years in advance. Thanks for the recommendation.

    8. 1

      What is LispE? I can’t find anything about it using Google. It is also not mentioned anywhere in the Lisp Wikipedia article.

      1. 1

        The GitHub repo says:

        Lisp Elémentaire, a version of Lisp that is ultra-minimal but contains all the basic instructions of the language.

      2. 1

        This is a Lisp that is being developed at Naver Labs Europe (France). You have pre-compiled versions in the binaries section for Windows and Mac OS. You can also download it and compile it for Linux…

    9. 2

      The problem I found with parser combinator libraries is that left recursion seems to be essentially impossible to tackle, thus rendering them useless for things as simple as arithmetic expressions. I’ve also heard that Packrat fails to intelligently parse A ::= a A a | \epsilon: presumably, only strings whose length is a power of two are accepted. All in all, after hearing about PEG being the one true way to solve real-life parsing, I’ve been pretty disappointed… Open to refutation though!

      1. 4

        I think that you can do good engineering work with PEG parsers, even if they don’t handle certain difficult cases. Your job as a programmer is to understand the limitations of the tools available and e.g. rewrite a grammar with a A a in it to something the parser system can handle. I don’t think the problem of taking a completely arbitrary CFG and generating an efficient parser for it has been solved. https://tratt.net/laurie/blog/entries/parsing_the_solved_problem_that_isnt.html

        One of the positives about PEG parsers for parsing things like programming languages is that the left biasing is useful for expressing common PL syntax idioms, where a more theoretically pure description could have ambiguity in it. (Relevant terms: ordered choice / soft cut.)

        If I were using Rust, I would want to take advantage of nom for its zero-copy properties.

        1. 2

          Interesting link, thanks! The author refers to “boolean grammars”; I wonder if these got any traction, either in research or applications.

      2. 4

        If you are not too worried about the performance impact, you can limit the recursion depth to num(nonterminals) * (1 + |input_length|), which lets you escape the infinite recursion. (e.g.)

        See “Richard A. Frost, Rahmatullah Hafiz, and Paul C. Callaghan. Modular and efficient top-down parsing for ambiguous left recursive grammars. IWPT 2007”

      3. 3

        There’s a clever way that PEG / packrat parsers can use the memoization cache to support left recursion quite simply: https://medium.com/@gvanrossum_83706/left-recursive-peg-grammars-65dab3c580e1

        I still think that expressing infix operators with the precedence climbing / Pratt parser algorithm is both more natural to write and more performant than via left recursion, though.

        (I maintain another PEG library in Rust that implements both of these.)
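
        For reference, here is a minimal hand-rolled sketch of precedence climbing, independent of any particular parser library (the token and AST types are made up for illustration):

        ```rust
        // Minimal precedence-climbing sketch over a slice of tokens.
        #[derive(Debug, Clone, PartialEq)]
        enum Tok { Num(i64), Plus, Minus, Star, Slash }

        #[derive(Debug)]
        enum Expr { Num(i64), Bin(Box<Expr>, Tok, Box<Expr>) }

        // Binding power of each binary operator; None for non-operators.
        fn prec(t: &Tok) -> Option<u8> {
            match t {
                Tok::Plus | Tok::Minus => Some(1),
                Tok::Star | Tok::Slash => Some(2),
                _ => None,
            }
        }

        // Parse an expression whose operators all have precedence >= min_prec.
        fn parse_expr(toks: &[Tok], pos: &mut usize, min_prec: u8) -> Option<Expr> {
            // Primary: a single number (parentheses etc. would go here).
            let mut lhs = match toks.get(*pos)? {
                Tok::Num(n) => { *pos += 1; Expr::Num(*n) }
                _ => return None,
            };
            // Climb: keep attaching operators of sufficient precedence to lhs.
            while let Some(op) = toks.get(*pos) {
                let p = match prec(op) {
                    Some(p) if p >= min_prec => p,
                    _ => break,
                };
                let op = op.clone();
                *pos += 1;
                // Left-associative: the right operand only grabs strictly higher precedence.
                let rhs = parse_expr(toks, pos, p + 1)?;
                lhs = Expr::Bin(Box::new(lhs), op, Box::new(rhs));
            }
            Some(lhs)
        }

        fn main() {
            // 1 + 2 * 3 - 4  parses as  (1 + (2 * 3)) - 4
            let toks = [Tok::Num(1), Tok::Plus, Tok::Num(2), Tok::Star,
                        Tok::Num(3), Tok::Minus, Tok::Num(4)];
            let mut pos = 0;
            println!("{:?}", parse_expr(&toks, &mut pos, 0));
        }
        ```

        The `min_prec` parameter is what replaces the chain of left-recursive rules: each operator is attached to the left operand as soon as it is seen, which yields the left-associative tree directly.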

        1. 1

          I looked at your library before deciding on Pest for my attempt at Make A Lisp. It looks quite robust but I had trouble with the pseudo-Rust syntax in the parser macro. What made you decide to go that route instead of a more traditional PEG syntax?

          (Not to say Pest is “traditional”, it is fairly different than the PEG grammar described in the original paper, but it’s at least close enough to make translation easy.)

          1. 1

            Using a procedural macro makes it integrate better into the Rust ecosystem. Many of the tokens are passed through to the generated Rust code with span information intact, so error messages point to the code you wrote rather than a generated file. Even works live in an editor with rust-analyzer! It does mean the syntax has to be compatible with the Rust lexer, though.

            The biggest departure in syntax is the use of Rust’s patterns (['a'..='z']) for matching a character vs the regex-like syntax in most other PEG libraries ([a-z]). It’s more verbose, but this is to enable operation over data types other than strings – for example you can use an external lexer and match tokens like [Identifier(n)].

            Other than that, it’s heavily inspired by PEG.js and some of the operators like $ come from there.

      4. 1

        As far as I can tell, most every left recursive expression can be reformed into a non-left recursive form: term <- term "+" num / num can be rewritten as term <- num ("+" num)*, in the simplest example. This isn’t an ideal solution as the resulting parse tree is not as “natural” as the left recursive parse tree would be, but it is possible.
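
        For illustration, the left-associative tree can still be recovered from the iterative form by folding the repeated ("+" num) matches onto the first operand; a rough sketch (with a made-up AST type, not tied to any particular library):

        ```rust
        // Sketch: given the pieces matched by  term <- num ("+" num)*,
        // rebuild the tree that  term <- term "+" num / num  would have produced.
        #[derive(Debug)]
        enum Term {
            Num(i64),
            Add(Box<Term>, Box<Term>),
        }

        // `first` is the leading num, `rest` holds one entry per ("+" num) repetition.
        fn fold_left(first: i64, rest: Vec<i64>) -> Term {
            rest.into_iter().fold(Term::Num(first), |acc, n| {
                Term::Add(Box::new(acc), Box::new(Term::Num(n)))
            })
        }

        fn main() {
            // "1 + 2 + 3"  =>  Add(Add(Num(1), Num(2)), Num(3)), i.e. (1 + 2) + 3
            println!("{:?}", fold_left(1, vec![2, 3]));
        }
        ```

        Many parser libraries let you attach an action like this to the repetition, so in practice the rewrite costs a fold rather than the “natural” shape of the grammar.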

        For your “only even forms” objection, Pest has added specific operators for “range” matching: A = @{ "a" ~ ("a"{2})* } matches 1 “a” plus any number of pairs of “a”. (This could be accomplished more easily with @{ "a" ~ "aa"* } but that’s less fun. ;) )

        There are methods for handling left recursion directly (Python’s new PEG parser implements one such method, and the Pegged library for D implements another), but they’re more complicated than the base case.

      5. 1

        Re left recursion: see https://www.youtube.com/watch?v=b2AUW6psVcE. Instaparse solves left recursion for BNF/EBNF/…: https://github.com/Engelberg/instaparse