1. 2

    Nice! Thanks for all the links, you led me to a lot more reading material :)

    It’s easy to recognize (Int, Double) as a product between Int and Bool.


    1. 2

      Thanks for the catch! Definitely a typo :)

    1. 2

      Awesome article!

      Clojure and ClojureScript compile to JVM bytecode and JavaScript, respectively. In contrast, an interpreter directly executes the input code. Some popular interpreted languages are JavaScript, Python, and Ruby.

      To be fair, Python gets compiled to bytecode too, so if it’s interpreted then so are JVM languages.
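This is easy to see from Python itself: the standard `dis` module shows the bytecode that every function is compiled to before execution (a minimal sketch, not from the article):

```python
import dis

def add(a, b):
    return a + b

# Python source is compiled to bytecode objects before execution,
# much as JVM languages are compiled to JVM bytecode.
code = add.__code__
print(type(code.co_code))  # the raw bytecode lives in a bytes object
dis.dis(add)               # human-readable disassembly of that bytecode
```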

      1. 2

        There was a good discussion of this a while back: https://lobste.rs/s/21ik7o/are_there_any_interpreted_languages

        1. 15

C++ got its act together. Java and Perl came along (and later, Python). Meanwhile, Lisp hadn’t advanced much in about a decade (since the ANSI spec was finalized). It was still a win, but the competition had narrowed the field. Once upon a time Lisp had a long list of features that no other language had (GC, full numeric tower, CLOS, incremental development, macros); that list kept getting shorter and shorter, and the case for Lisp, which had never been easy even in the best of times, was getting harder and harder to make.

          I think that this is definitely an issue with Lisp. The core is really, really good; it still has plenty of features that common languages don’t have. But even though it’s more advanced than they are, it’s still stuck in 1994. The median language is a lot better in 2018 than it was in 1994: it’s JavaScript, which is a horrible, terrible, no-good language but still orders of magnitude better than C for software development.

          Lisp is still better: it still has things JavaScript, Python, Java, Ruby, Scala, Erlang & Go are all missing. But it’s not as much better as it used to be, because all of those languages are much more capable than C, assembler & Pascal.

          But then a very strange thing happened: I noticed that all around me people were writing code using C++ and Java and Python and even (gasp!) Perl, and they were actually getting it to work without tearing their hair out. In fact, in a number of cases I saw people whip things up in Perl in a couple of hours that would have taken me days or weeks to do in Lisp.

          I think that’s a factor of batteries being included. Once upon a time Common Lisp was derided for including hash tables: nowadays they are table stakes. Once upon a time people laughed at Common Lisp for being a huge standard; now they laugh at it for not including a web server, let alone standardising sockets.

          The good news is that this is fixable (we can always have a new standard, if we want to go to the effort), and in practice it more-or-less is fixed: Quicklisp provides semi-standard libraries for everything. Would it be nice to have a truly-standardised library equivalent to, say, Go’s? Hell yes. Can we get by with what we have now? Sure.

          I do think that the Lisp world needs to consider a compatibility-breaking Common Lisp 2 at some point. The world has changed a lot in 24 years; it’s time Lisp took note.

          1. 9

            What do you think of other newer age Lisps like Clojure?

            1. 9

              Honestly, I don’t like Clojure — it’s not (IMHO) really a Lisp (among other things, it breaks conses). It’s a neat language, but it’s not my cup of tea. I’d like to see a Lisp continued & carried on, not a break with tradition.

              1. 8

cons, car, and cdr are what I miss least in Clojure; I guess they are a bit too low-level for my taste (and kind of complicated when reading code, although I never had a problem writing them out for some reason).

What I do kind of miss in Clojure is CLOS, i.e. a multiple-dispatch object system.

                1. 3

                  I am not familiar with CLOS, but Clojure does have this.

                  1. 1

Yes, and if it didn’t have this, one could probably write a multiple-dispatch system with macros. I guess I could build my own island of multimethods. But that is a little different from an ecosystem that invests in them the way Dylan or CL does.

                    1. 1

I’ve been meaning to try Common Lisp. Things like the condition system, image dumping, CLOS, and MOP seem different from what I know and worth learning. I’m going to try out PAIP. Any insight on what the CL ecosystem tends to do with multi-dispatch?

                      1. 2

I might be the wrong person to ask, because I never really used it (CL, and multiple dispatch) for more than a few example programs. First of all, it implements OO in a way that nicely fits into the Lisp functional-programming world, i.e. procedures/functions first. Then there are the instances where you’d normally resort to the visitor pattern and can instead use multiple dispatch for truly polymorphic functions. I think they aren’t as widespread as one might think, but every now and then, multiple dispatch really helps keep code simple.
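For readers who haven’t seen multiple dispatch, here’s a toy Python sketch of the idea (exact-type lookup only; real CLOS also handles inheritance and method combination, so this is only the flavor, not the feature):

```python
class MultiMethod:
    """Tiny multiple-dispatch sketch: the method is selected on the
    runtime types of *all* arguments, not just the first."""
    def __init__(self):
        self.methods = {}

    def register(self, *types):
        def deco(fn):
            self.methods[types] = fn
            return fn
        return deco

    def __call__(self, *args):
        fn = self.methods.get(tuple(type(a) for a in args))
        if fn is None:
            raise TypeError(f"no method for {args!r}")
        return fn(*args)

describe = MultiMethod()

@describe.register(int, str)
def _(n, s):
    return f"int then str: {n}, {s}"

@describe.register(str, int)
def _(s, n):
    return f"str then int: {s}, {n}"

print(describe(1, "a"))   # dispatches on both argument types
print(describe("a", 1))   # a different method, same generic function
```

This is roughly the visitor-pattern replacement mentioned above: instead of one class owning the traversal, each type combination gets its own method.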

              2. 8

You beat me to it. I’ll go straight to saying it seems to solve that very problem, the proof being its mainstream status: a big community and plenty of job ads for it. Need more LISP initiatives like that.

              3. 8

                I do think that the Lisp world needs to consider a compatibility-breaking Common Lisp 2 at some point.

i’d love to see that effort put into optimising the racket runtime and adding libraries to it, personally. i think it has real potential to be an “industrial-strength” language; the design is already excellent.

                1. 6

                  i’d love to see that effort put into optimising the racket runtime and adding libraries to it

Some work is being done right now to integrate the Chez Scheme backend with Racket, which would mean some really nice speed improvements for compiled code.

                  1. 3

                    The problem with Racket is that it’s very much a Scheme rather than Lisp, with all the mistakes that come with being a Scheme: Lisp-1, #f, (car '()) being an error. It’s an incredibly impressive body of work, but I wish that effort had been expended on Common Lisp instead.

                    1. 8

                      wait why shouldn’t that be an error though.

                      1. 4

                        wait why shouldn’t that be an error though.

Because it makes writing code which walks through lists a bit simpler. It’s useful that (car nil) → nil & (cdr nil) → nil. This — like having multiple namespaces — is one of the places where Lisp is more pragmatic than Scheme. I find that quality attractive in a programming language meant to be used.

                        I really wish that I could find a good article about this to share with you. I know that I’ve read one in the past, found the arguments convincing and have had the ‘nil should be carable’ bit set in my head since.
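The list-walking argument is easy to sketch outside Lisp. Here is a Python analogue (my own illustration, using None for nil and pairs for conses):

```python
# car/cdr that pass nil (here: None) through instead of raising,
# modeled on Common Lisp, where (car nil) => nil and (cdr nil) => nil.
def cons(a, d):
    return (a, d)

def car(x):
    return None if x is None else x[0]

def cdr(x):
    return None if x is None else x[1]

# Walking a list needs no explicit empty check inside the loop:
def nth(lst, n):
    while n > 0:
        lst = cdr(lst)
        n -= 1
    return car(lst)

lst = cons(1, cons(2, cons(3, None)))
print(nth(lst, 1))   # 2
print(nth(lst, 5))   # None -- falls off the end quietly, Lisp-style
```

Whether that quiet fall-off is pragmatic or a delayed error is exactly the disagreement in this thread.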

                        1. 7

                          It’s useful that (car nil) → nil & (cdr nil) → nil.

That sounds horrible, a recipe for “foo is not a property of undefined”-style errors miles away from where the problem actually occurred. Surely your functions should know what kind of data they expect to be operating on, and fail fast if not.

                          1. 3

Yes, that is my view too. I think bargap’s proposal could work for small projects, but as a project grows in size, the need for errors to happen close to the source of the problem grows too.

                            1. 1

                              It actually works out very well with the sort of code which is written in Lisp. Typically, Lisp systems depend on it.

                              1. 2

                                I’ve found it to be the single largest source of errors in the lisp systems I work with.

                                1. 1

                                  Are you writing Lisp or Scheme? If you’re finding it to be a huge source of errors in Scheme, then wouldn’t that tend to support the Common Lisp way of doing things?

                                  1. 3

                                    I mostly write Clojure and Emacs Lisp, (since 2008 and 2004 respectively) which use nil in the same way Common Lisp does. I don’t have these problems when I write Racket.

                                    1. 1

                                      I don’t know Clojure, but I’ll agree elisp is a good basis for comparison. All I can say is that my experience with elisp & Common Lisp differs from your own, and that my experience with Scheme has convinced me that I do not want to write a large system in it — nor use a large system written in it.

                            2. 6

                              Leaving nil out of the language is hands-down my favorite thing about Racket. So many of the problems of CL and Clojure just evaporate completely.

                              1. 6

                                Of course Racket has nil. It’s just written '().

                                1. 7

                                  When I say nil here I’m referring to a single value which can mean “emptiness”, “false”, or “not found” depending on the context. Racket has three different values for each of those cases instead of lumping them all into a single value, and that is much less error-prone.
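A Python analogue of the point (my own sketch; Racket’s actual values are #f, '(), and a caller-supplied failure result): keeping “not found” distinct from stored falsy values removes the ambiguity a single nil creates.

```python
# A distinct sentinel keeps "genuinely absent" separate from
# "present, but stored as a falsy value".
NOT_FOUND = object()

settings = {"verbose": False, "tags": []}

def lookup(d, key):
    return d.get(key, NOT_FOUND)

print(lookup(settings, "verbose"))              # False  (stored false value)
print(lookup(settings, "tags"))                 # []     (stored empty value)
print(lookup(settings, "missing") is NOT_FOUND) # True   (actually absent)
```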

                              2. 2

Could you write a car that doesn’t error? Call it care.

                        2. 3

                          That reference to Quicklisp made me think about slib

I like the idea of having a standard library built only on spec Scheme, and apps built with the specific particularities of Scheme implementations.

                        1. 1

                          I’m not fully understanding what issue is being described here. Is it that the archive URLs are unreliable, i.e. the “Source code (zip / tar.gz)” URL?

                          1. 2

The hash of the auto-generated tar files is not stable. I assume the compression level, or the tar implementation used to create them, changes.

                            1. 1

                              And what about the zip files?

                              1. 3

                                Same problem with zip files.

                                The OpenBSD ports tree stores checksums of release artifacts to ensure authenticity of code that is being compiled into packages.

                                Github’s source code links create a new artifact on demand (using git-archive, I believe). When they upgrade the tooling which creates these artifacts the output for existing artifacts can change, e.g. because the order of paths inside the tarball or zip changes, or compression level settings have changed, etc.

                                Which means that trying to verify the authenticity of a github source link download against a known hash is no better than downloading a tarball and comparing its hash against the hash of another distinct tarball created from the same set of input files. Hashes of two distinct tarballs or zip files are not guaranteed to match even if the set of input files used to create them is the same.
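The effect is easy to demonstrate with compression settings alone (a Python sketch of my own; the same applies to tar metadata such as path order and mtimes):

```python
import gzip
import hashlib

data = b"identical file contents\n" * 100

# Compress the same bytes at two different compression levels,
# as an upgraded tool on the server side might silently do.
# (mtime is pinned to 0 so only the level differs.)
a = gzip.compress(data, compresslevel=1, mtime=0)
b = gzip.compress(data, compresslevel=9, mtime=0)

print(hashlib.sha256(a).hexdigest())
print(hashlib.sha256(b).hexdigest())
print(a == b)                                   # False: different artifacts
print(gzip.decompress(a) == gzip.decompress(b)) # True: identical contents
```

So a checksum pinned to one artifact says nothing about a later artifact built from the same inputs.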

                                1. 1

                                  Thank you for the detailed response! I understand the issue now.

                                  There are likely tradeoffs from GitHub’s perspective on this issue, which is why they create a new artifact on demand. They maintain a massive number of repositories on their website, so they probably can’t just store all those artifacts for long periods of time as one repository could potentially be gigantic. There are a number of other reasons I can think of off the top of my head.

                                  Why not have the checksum run against the file contents rather than the tarball or zip?

                                  1. 3

                                    Why not have the checksum run against the file contents rather than the tarball or zip?

                                    One reason is that this approach couldn’t scale. It would be insane to store and check potentially thousands of checksums for large projects.

                                    It is also harder to keep secure because an untrusted archive would need to be unpacked before verification, see https://lobste.rs/s/jdm7vy/github_auto_generated_tarballs_vs#c_4px8id

I’d rather turn your argument around and ask why software projects hosted on github have stopped doing releases properly. The answer seems to be that github features a button on the web site and these projects have misunderstood the purpose of this button. Meanwhile, some other projects which understand the issue actively try to steer people away from the generated links by creating marker files in large friendly letters: https://github.com/irssi/irssi/releases

I’d rather blame the problem on a UI design flaw on github’s part than blame best practices that software integrators in the Linux and BSD ecosystems have followed for ages.

                                    1. 2

                                      Some more specifics on non-reproducible archives: https://reproducible-builds.org/docs/archives/.

                                      Why not have the checksum run against the file contents rather than the tarball or zip?

                                      Guix can do something like that. While it’s preferred to make packages out of whatever is considered a “release” by upstream, it is also possible to make a package based directly on source by checking it out of git. Here’s what that looks like.

                            1. 2

                              This looks cool but I don’t understand what a ring is in this case! The readme’s pretty sparse… do you have some more information?

                              1. 6

                                Working on implementing an Earley parser using this amazing tutorial. I’ve read that site about a dozen times, and last night it finally clicked. Money quote from the author:

                                Most compiler texts are obsessed with slaying the compiler dragons of the 60s with 70s technology.

                                If you have any sort of appreciation for algorithms or parsing, you should study it.
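For readers who haven’t seen Earley in action, here is a deliberately minimal recognizer sketch (my own toy code, not the tutorial’s; ε-productions and parse-forest construction are omitted):

```python
def earley_recognize(grammar, start, text):
    """Minimal Earley recognizer. grammar maps nonterminals to tuples of
    productions; terminals are single characters. An item is
    (head, production, dot, origin)."""
    chart = [set() for _ in range(len(text) + 1)]
    for prod in grammar[start]:
        chart[0].add((start, prod, 0, 0))
    for i in range(len(text) + 1):
        added = True
        while added:                       # iterate chart[i] to a fixed point
            added = False
            for head, prod, dot, origin in list(chart[i]):
                if dot < len(prod):
                    sym = prod[dot]
                    if sym in grammar:     # predictor
                        for p in grammar[sym]:
                            if (sym, p, 0, i) not in chart[i]:
                                chart[i].add((sym, p, 0, i)); added = True
                    elif i < len(text) and text[i] == sym:   # scanner
                        chart[i + 1].add((head, prod, dot + 1, origin))
                else:                      # completer
                    for h2, p2, d2, o2 in list(chart[origin]):
                        if d2 < len(p2) and p2[d2] == head:
                            if (h2, p2, d2 + 1, o2) not in chart[i]:
                                chart[i].add((h2, p2, d2 + 1, o2)); added = True
    return any(h == start and d == len(p) and o == 0
               for h, p, d, o in chart[len(text)])

# An ambiguous grammar for sums: S -> S + S | a
grammar = {"S": (("S", "+", "S"), ("a",))}
print(earley_recognize(grammar, "S", "a+a+a"))   # True
print(earley_recognize(grammar, "S", "a+"))      # False
```

Note that the grammar above is ambiguous (“a+a+a” has two derivations), and the recognizer accepts it anyway, which is exactly the point discussed further down the thread.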

                                1. 4

                                  The coverage in Dick Grune’s Parsing Techniques: A Practical Guide is great, and his page for the first edition includes PDFs. If you’re interested in parsers in general, it’s definitely worth picking up.

I’ve also been working on implementing Earley parsers, and the Earley/intersection parsing approach seems particularly compelling. It’s described in chapter 13 of the second edition; it doesn’t seem to appear in the first.

                                  1. 2

                                    So Earley will recognize all context-free grammars, including ambiguous ones? I have two problems with that:

                                    1. Doesn’t it have to return a “forest” of all possible derivations of ambiguous grammars? What do you do with that? That doesn’t seem to be solving the problem; it’s more like pushing it to a later stage.
                                    2. Context free grammars aren’t powerful enough to represent most languages we use (C, C++, shell, Ruby, Perl 5 and 6, etc.). So even if Earley magically solved problem #1, it still wouldn’t be an ideal algorithm.

I’ve written about this at length in various comments, but I think CFGs are the wrong abstraction for human programming languages. They’re simultaneously not powerful enough and too inefficient.

My argument is analogous to the one here regarding CFGs and network protocols. This is admittedly a fringe claim, whereas “CFGs should be used to parse programming languages” is the claim you will read in nearly every textbook:


                                    One of the langsec people responded on HN: https://news.ycombinator.com/item?id=16126234

I read that paper about “calc-regular languages”, but it was immature (which the authors admitted); it still can’t handle some common constructs.

                                    Anyway, my overall point is that we should use formalism, but formalism should match practice, not the other way around. Practice shouldn’t be put in the straitjacket of a theory (especially when that theory doesn’t provide you with good efficiency or correctness guarantees, as CFGs don’t).

(FWIW I have a very messy wiki page about the difficulties with parsing here: https://github.com/oilshell/oil/wiki/Parsing-is-Difficult)

                                    I don’t believe any of them are addressed by Earley parsing. The main bugaboo I have now is that every language has at least TWO parsers. One for the compiler/interpreter, and a totally separate one for linting/formatting. The former ignores whitespace; the latter must recognize whitespace. Even a brand new language like Julia has two parsers!!!

A system based on CFGs doesn’t solve that problem. I do NOT want parse trees (which include the full derivation of the string from the grammar); I want ASTs or “lossless syntax trees”.

                                    1. 2

                                      Well, I’m not happy about how Earley detects ambiguity TBH. But I am looking for something that is simple enough to take the question off the table so I can continue moving forward for now.

                                      You ended up using Pratt parsing for Oil, correct? The choice of a parser for my project is still very much up in the air; the most important consideration is whether I can build the grammar immediately before parsing a module, which isn’t a great use case for a lot of existing tools. I don’t need the ability to mutate the parser while I’m parsing, as I don’t mind cheating by pre-processing to locate all the grammars in use.

                                      From my research, Pratt parsing looks really underutilized, but I wasn’t able to discern whether it was painful to build the rules programmatically after parsing patterns and such, and whether it’d be substantially different from PEGs. For example, if I’m building a for x in iterable statement, would matching the in keyword be painful, especially if the metagrammar specification was declarative in nature? I won’t be writing the led() and nud() methods by hand here, they’ll be generated from specifications provided by the user. Also, I was initially put off by all the precedences on everything, then later realized those are implicit in most grammar specifications anyway, so that is pretty unfounded.

                                      1. 2

Also remember you can always use more than one parser. sklogic used to point that out on HN when people were choosing between one that succeeds/fails fast and one easy for error messages. Something like that.

                                        His tool on “combinatorylogic” Github is a LISP that embeds a bunch of DSL’s he mainly uses for program analysis. So, his double-parser strategy might be well-informed. ;)

                                        1. 1

                                          Oil is best described as 3 interleaved recursive descent parsers + one Pratt parser for expressions, with 15 or so lexer modes. [1]

                                          Pratt parsing is really best suited for expression languages, not a whole programming language, as Crockford does in his widely read article (which I reviewed here [2]).

                                          Although Pratt parsing isn’t a grammar-based approach, there is something called an “operator precedence grammar” which is a subset of CFGs, and what it handles roughly corresponds to that set of languages:


                                          Bob Nystrom has a good way of putting it:


                                          If recursive descent is peanut butter, Pratt parsing is jelly. When you mix the two together, you get a simple, terse, readable parser that can handle any grammar you throw at it.

                                          I would add the caveat that this technique is really good for implementing existing languages. If you are designing a language from scratch, it might be helpful to use some kind of parser generator to guide the design.

                                          In summary, I would say the parsing method you use depends very heavily on the language. And many languages are compositions of multiple languages. I don’t have experience with Earley but I’m sure the same thing is true: it’s good for some things and not good for other things.

                                          [1] http://www.oilshell.org/blog/2017/12/17.html

                                          [2] http://www.oilshell.org/blog/2016/11/02.html
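As a sketch of why Pratt parsing suits expression languages so well (my own toy example, not Oil’s or Crockford’s code): the whole technique is one loop plus a table of binding powers.

```python
import re

# Binding powers: higher binds tighter. "^" is right-associative.
BP = {"+": 10, "-": 10, "*": 20, "/": 20, "^": 30}
RIGHT_ASSOC = {"^"}

def tokenize(src):
    return re.findall(r"\d+|[-+*/^()]", src)

def parse(tokens, min_bp=0):
    tok = tokens.pop(0)
    if tok == "(":
        left = parse(tokens, 0)
        tokens.pop(0)                    # consume ")"
    else:
        left = int(tok)
    # The Pratt loop: keep absorbing operators at least as strong as min_bp.
    while tokens and tokens[0] in BP and BP[tokens[0]] >= min_bp:
        op = tokens.pop(0)
        next_bp = BP[op] if op in RIGHT_ASSOC else BP[op] + 1
        right = parse(tokens, next_bp)   # right operand binds tighter ops first
        left = (op, left, right)
    return left

print(parse(tokenize("1+2*3")))   # ('+', 1, ('*', 2, 3))
print(parse(tokenize("2^3^2")))   # ('^', 2, ('^', 3, 2)) -- right-assoc
```

The precedence numbers are exactly the information an operator-precedence grammar encodes declaratively, which is why the two correspond so closely.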

                                        2. 2

                                          Have you heard about parsing with derivatives?

                                          I recently explored the possibility of doing this with Clojure’s spec. Here’s an example:

                                          (s/conform ::s-expression "(abc  def)")
                                          [[:sub {:left  \(
                                                  :nest  [[:atom [\a \b \c]]
                                                          [:whitespace [\space \space]]
                                                          [:atom [\d \e \f]]]
                                                  :right \)}]]

Notice that in the third line, the whitespace has been captured. A linter could examine whitespace and a compiler/interpreter could throw it out. Specs can be bidirectional too, so a formatting tool could take a data structure like the above, adjust whitespace, and unform it, producing formatted source code.

                                          Regarding returning a parse forest, the author of parsing with derivatives goes into that here. On the next slide, the author alludes to the possibility of modifying the grammar during parsing (!) which may be of particular interest to you.
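For anyone curious about the core idea, here is a tiny sketch for regular languages only (my own code; Might’s work extends the technique to CFGs with memoization, laziness, and fixed points). The derivative of a language L with respect to a character c is the set of strings w such that cw is in L; matching is just repeated derivatives plus a nullability check.

```python
# Regular expressions as tagged tuples.
EMPTY, EPS = ("empty",), ("eps",)

def char(c):   return ("char", c)
def cat(a, b): return ("cat", a, b)
def alt(a, b): return ("alt", a, b)
def star(a):   return ("star", a)

def nullable(r):
    """Does r accept the empty string?"""
    tag = r[0]
    if tag in ("eps", "star"):   return True
    if tag in ("empty", "char"): return False
    if tag == "cat": return nullable(r[1]) and nullable(r[2])
    if tag == "alt": return nullable(r[1]) or nullable(r[2])

def deriv(r, c):
    """Brzozowski derivative of r with respect to character c."""
    tag = r[0]
    if tag in ("empty", "eps"): return EMPTY
    if tag == "char": return EPS if r[1] == c else EMPTY
    if tag == "alt":  return alt(deriv(r[1], c), deriv(r[2], c))
    if tag == "star": return cat(deriv(r[1], c), r)
    if tag == "cat":
        d = cat(deriv(r[1], c), r[2])
        return alt(d, deriv(r[2], c)) if nullable(r[1]) else d

def matches(r, s):
    for c in s:
        r = deriv(r, c)
    return nullable(r)

r = star(cat(char("a"), char("b")))   # (ab)*
print(matches(r, "abab"))   # True
print(matches(r, "aba"))    # False
```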

                                      1. 2

                                        I had a hard time understanding the Matt Might paper, but his lecture on it really helped me. He also motivates it really well.

                                        1. 1

                                          Somewhere around the 40 minute mark, those definitions of D and δ using define/memoize and define/fix really made my day. :-)

                                        1. 1

                                          Great stuff! I love the format of iterating on implementations and criticisms.

                                          Clojure also has de-nesting shortcuts via -> and related macros, but while the first argument to nest is the outermost expression, the first argument to -> and similar is the innermost expression. This makes it convenient for expressing function composition. However, it seems backward to use it for binding. These two expressions are equivalent:

(let [x 1 y 2]
  (when (even? y)
    x))

(->> x
     (when (even? y))
     (let [x 1 y 2]))

                                          Which reminds me of Haskell’s where.

                                          Similarly, the author illustrates the utility of nest with binding forms, but using it for function composition may feel backwards. If I understand nest correctly, then these should be equivalent:

                                          (reduce + (filter odd? (map inc [1 2 3 4])))
                                          (nest (reduce +)
                                                (filter odd?)
                                                (map inc)
                                                [1 2 3 4])

                                          But, this outermost-first order is the usual order of function composition in math, Haskell, and others.

                                          It’s interesting that you can express either binding or composition in either order.
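The symmetry is easy to see if you model both pipelines as a plain fold (a Python sketch of my own, since nest and ->> are macros): the two orders are the same fold over a list of functions, read in opposite directions.

```python
from functools import reduce

def thread_last(value, fns):
    """Like ->> : the innermost step is listed first."""
    return reduce(lambda acc, f: f(acc), fns, value)

def nest(fns, value):
    """Like nest : the outermost step is listed first."""
    return thread_last(value, list(reversed(fns)))

map_inc    = lambda xs: [x + 1 for x in xs]
filter_odd = lambda xs: [x for x in xs if x % 2]

# (reduce + (filter odd? (map inc [1 2 3 4]))) == 8, written both ways:
print(thread_last([1, 2, 3, 4], [map_inc, filter_odd, sum]))  # 8
print(nest([sum, filter_odd, map_inc], [1, 2, 3, 4]))         # 8
```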

                                          PS: nest is implemented in a library for Clojure as <<-