1. 7

    Pyret looks very well designed, but the compiler implementation looks very frustrating. It’s a 4-stage bootstrapping process involving massive compiled JS blobs checked into the source tree. I wonder why it’s not a 2-stage compiler, with a minimal compiler written in JS and the full compiler written in Pyret.

    I’ve thought about writing another compiler so that Pyret can be used without the bootstrapping madness.

    1. 4

      I work on Pyret. You’re not wrong. I’ve often daydreamed about writing an interpreter, or a compiler in OCaml with bucklescript!

      If you ever get around to writing that compiler, our group would love to hear about it. :-)

      1. 5

        This is such a small thing but I love how Pyret lets you use dashes in identifiers! I’ve never seen this in a non-Lisp before, other than Forth.

        1. 5

          COBOL allows dashes in identifiers. Only a dash surrounded by spaces is interpreted as a minus.
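
          A rough sketch of that rule as a tokenizer. The token pattern here is hypothetical (it is not COBOL's or Pyret's actual lexer); it only shows how a dash inside a name can stay part of the identifier while a dash set off by spaces becomes an operator:

```python
import re

# Hypothetical token pattern: an identifier may contain interior dashes,
# so "total-count" is one name while "total - count" is a subtraction.
TOKEN = re.compile(
    r"\s*(?:(?P<ident>[A-Za-z][A-Za-z0-9]*(?:-[A-Za-z0-9]+)*)"
    r"|(?P<num>\d+)"
    r"|(?P<op>[-+*/()]))"
)

def tokenize(src):
    """Split src into (kind, text) pairs, keeping dashed names whole."""
    src = src.rstrip()
    tokens = []
    pos = 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise ValueError(f"unexpected character at {pos}: {src[pos]!r}")
        kind = m.lastgroup          # name of the alternative that matched
        tokens.append((kind, m.group(kind)))
        pos = m.end()
    return tokens

# tokenize("total-count - 1")
# -> [("ident", "total-count"), ("op", "-"), ("num", "1")]
```

          The trade-off is that `a-b` can never mean subtraction, so the language has to require whitespace around the binary minus.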

        2. 3

          Are there any weird language corners/features/bugs that a future interpreter/compiler implementor should know about? Things like Python’s descriptor protocol, etc, etc

          1. 1

            i’ve daydreamed of exactly that too, though i was envisioning a gradual migration of the existing compiler to bucklescript. not something i could do on my own but if there were people doing it i would love to jump in and help!

            a command line interpreter in ocaml would be awesome too.

        1. 8

          Single-file version control. I do have something I called fh, but it’s mostly a joke given its design constraints (and performance characteristics as a result of them). The only drive I have is personal issues with licensing on the existing solutions SCCS (CDDL) and RCS (GPL or OpenBSD’s that has some 4-clause BSD files); SRC is just a wrapper around the aforementioned ones. The part that scares me the most is definitely the diffing. I kind of do want to use interleaved deltas on the backend, but I’ve failed multiple times to even retrieve data from an SCCS weave, much less insert a diff.
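
          For reference, retrieval from an interleaved-delta store can be sketched roughly as follows. This is a deliberately simplified weave, not the real SCCS file format: the tuple records, the `WEAVE` data and the `checkout` name are all made up for illustration, and it assumes a strictly linear history in which a delta never has an insert block and a delete block open at the same time.

```python
# Simplified weave: plain strings are text lines, tuples are control records.
# ("I", n) opens a block of lines inserted by delta n, ("D", n) opens a block
# of lines deleted by delta n, and ("E", n) closes the open block for delta n.
WEAVE = [
    ("I", 1),
    "alpha",
    ("D", 2), "beta", ("E", 2),     # delta 2 deleted "beta" ...
    ("I", 2), "BETA", ("E", 2),     # ... and inserted "BETA" in its place
    "gamma",
    ("I", 3), "delta", ("E", 3),    # delta 3 appended "delta"
    ("E", 1),
]

def checkout(weave, version):
    """Return the lines visible at `version`, assuming a linear history."""
    open_inserts = []   # deltas whose insert blocks are currently open
    open_deletes = []   # deltas whose delete blocks are currently open
    lines = []
    for entry in weave:
        if isinstance(entry, tuple):
            kind, delta = entry
            if kind == "I":
                open_inserts.append(delta)
            elif kind == "D":
                open_deletes.append(delta)
            elif delta in open_deletes:     # "E" closing a delete block
                open_deletes.remove(delta)
            else:                           # "E" closing an insert block
                open_inserts.remove(delta)
            continue
        # A line is visible if every enclosing insert already exists at
        # `version` and no enclosing delete has been applied yet.
        inserted = all(d <= version for d in open_inserts)
        deleted = any(d <= version for d in open_deletes)
        if inserted and not deleted:
            lines.append(entry)
    return lines
```

          With this data, `checkout(WEAVE, 1)` gives `["alpha", "beta", "gamma"]` and `checkout(WEAVE, 3)` gives `["alpha", "BETA", "gamma", "delta"]`. Inserting a new delta is the harder half and is not covered here.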

          Liberally licensed elliptic curve cryptography over a binary field (e.g. NIST B-163). I just kind of feel like I should know this and occasionally I do run into one of those obscure binary field curves. However, I wholly lack the mathematics for it and never felt like actually sitting down for a week to learn discrete math and elliptic curves and whatnot. Library support is actually lacking (does anything other than OpenSSL even support these curves?) but OpenSSL is kind of the be-all-end-all of cryptography libraries to begin with.

          Self-hosted, open source base for DRM. Keygen and Qeys have some very attractive solutions for DRM that are simple to integrate. But I kind of sort of want my own and have it be open source so that people with a lot of paranoia won’t have to rely on third parties. The irony of open source DRM is not lost on me, no.

          Yet another typesetting system. TeX has great output and is painful to actually work with if you want to customize it; LaTeX exists for a reason after all. Troff has mediocre output (let’s not get into the nitty-gritty tradeoff that is choosing between groff and heirloom-troff here) and is somewhere between amenable and hell to actually work with. Why not both?

          1. 2

            Yet another typesetting system. TeX has great output and is painful to actually work with if you want to customize it; LaTeX exists for a reason after all. Troff has mediocre output (let’s not get into the nitty-gritty tradeoff that is choosing between groff and heirloom-troff here) and is somewhere between amenable and hell to actually work with. Why not both?

            (La)TeX is lovely and Knuth is brilliant, but his lack of PLT expertise shows. I’m exceedingly eager to see what would happen if a PL expert tackled typesetting languages. (Unfortunately, the story of TeX is itself a cautionary tale for any PLT PhD students thinking they might like to tackle the problem and finish their dissertation within a decade.)

            1. 4

              TeX is the way it is not because Knuth couldn’t do a better language, but because it was built at a time when computers couldn’t fit an AST in memory. The TeX compiler has to generate the output in a few sequential passes through the code.

              1. 2

                A lot of people also confuse LaTeX (Lamport) with TeX (Knuth).

                Knuth was not interested in creating a markup system but in producing a professional typesetting tool for the expert user that pushed the boundaries of what had been done before.

                It’s unfair to say he didn’t care about abstractions, rather I think he chose the abstractions that served his goals and did not abstract where they were unhelpful or introduced performance penalties.

                For example, people complain about having to rerun LaTeX to get page numbering right in the table of contents. IIRC Knuth’s text documents just generate the TOC at the end, and he can then reorder pages in a production step.

                One counterexample to the “no abstractions” argument would be metafont.

              2. 2

                What is “PLT” and how does it apply here? I think the issue with TeX doesn’t have to do with “theory” – it has more to do with design, i.e. the “soft” parts of programming languages.

                It doesn’t really make sense to say Knuth lacks “PLT” expertise. First, he invented a lot of the theory we use in practice, like LR parsing.

                As another example, he also showed up in the LLVM source code for a minimum spanning tree algorithm:


                And: this might be a nitpick, but I’ve heard at least one programming language researcher say that “PLT” doesn’t mean anything outside of PLT Scheme. It seems to be an “Internet” word.

                So bottom line, I don’t disagree that TeX could be improved upon after several decades, but your analysis doesn’t get to the core of it. People who study theory don’t write the programming languages we use. Probably Haskell is the counterexample, but Haskell is also known for bad tooling like package management. On the other hand, Go is known for good, usable tools but not being “exciting” in terms of theory.

                I don’t think you need a Ph.D. anymore to write TeX. When Knuth wrote it, that may have been true, but the knowledge has been disseminated since then.

                1. 6

                  PLT is the initialism for Programming Language Theory. PLT Scheme got its name because it came out of the Rice PLT research group. PLT is simply the field of exploring the abstractions we use to program (and creating the theoretical tools to reason about those abstractions). While we very rarely directly use systems created by researchers (Haskell being a notable exception), the abstractions they develop absolutely shape the development of the programming languages we do use.

                  I’m not arguing that you need a Ph.D. to write TeX. I’m claiming that the task of developing the right abstractions for a typesetting programming language has largely been ignored by the group of people who study abstractions and programming languages (PLT researchers!).

                  Addendum: There’s a very distinct flair common to Knuth’s programming projects. His language design is primarily influenced by the details of the machines his languages program and the algorithmics of executing them. For instance, the reason TeX documents may require multiple rounds of computation before they’re “done” is that Knuth baked the requirement that a TeX compiler be single-pass into TeX’s execution model. Conversely, Knuth wasn’t at all concerned about statically reasoning about TeX programs, as evidenced by the fact that merely parsing TeX is itself Turing complete. (And TeX’s development happened during a really exciting period for PLT research: right around the discovery of Hindley-Milner type inference and hygienic macros!)
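
                  To make the "multiple rounds" point concrete, here is a toy model. It is an assumption-laden sketch, not how TeX actually paginates: `typeset_once`, `typeset_until_stable` and `lines_per_page` are invented names. Each run can only print a table of contents from the page numbers the previous run recorded, so the numbers keep shifting until two consecutive runs agree.

```python
def typeset_once(sections, prev_toc, lines_per_page=40):
    """One single-pass run: the front-matter TOC is sized from the
    previous run's data, then fresh page numbers are recorded."""
    # Ceiling division; zero pages when no previous data exists.
    toc_pages = -(-len(prev_toc) // lines_per_page)
    page = toc_pages + 1
    new_toc = {}
    for title, length_in_pages in sections:
        new_toc[title] = page
        page += length_in_pages
    return new_toc

def typeset_until_stable(sections, max_runs=10):
    """Rerun until the recorded page numbers stop changing."""
    toc = {}
    for run in range(1, max_runs + 1):
        new_toc = typeset_once(sections, toc)
        if new_toc == toc:
            return toc, run
        toc = new_toc
    raise RuntimeError("page numbers never stabilised")

# typeset_until_stable([("Intro", 2), ("Body", 5)])
# -> ({"Intro": 2, "Body": 4}, 3)
```

                  Run 1 has no TOC data, run 2 gains a TOC page that shifts every section by one, and run 3 confirms the numbers. A multi-pass design could compute the same fixpoint internally, at the cost of holding more of the document in memory.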

                  1. 6

                    Words aside, it’s silly to say that Knuth lacks expertise in “PLT”, since his work was foundational to the field.

                    Secondly, I don’t buy the claim that lack of abstractions is what’s wrong with TeX – at least without some examples / justification. Like anything that’s 30 years old, I think you could simply take the knowledge we have today and make a nicer version (assuming you have 5-10 years free :-/ ).

                    TeX is constrained by compatibility and a large user base, which I imagine explains most of its warts – much like Unix shell, and Unix in general. And there’s also the problem that it’s a huge amount of work that you likely won’t get paid for.

                  2. 0

                    LR parsing is completely irrelevant to what PLT research is about.

                  3. 2

                    You should have a look at scribble.

                    1. 1

                      Also SILE, which takes the good parts of TeX, rips them out with no remorse, and glues them together using Lua for ease of hacking. One important con is it doesn’t have math typesetting yet… which is, fitting the theme of the thread, why I tried to add the support ;)

                  4. 2

                    I’ve always thought of open sourcing Keygen. But I’m trying to grow it into a viable business, and I don’t think that would be wise at the moment. However, I do think open source tooling for things like this is incredibly valuable. (Disclosure: I’m the founder of Keygen.)

                    1. 3

                      But I’m trying to grow it into a viable business, and I don’t think that would be wise at the moment.

                      I can’t help but be curious about the magic sauce (and run my own—despite not even selling any software), but oh well. Maybe someday.

                      Open sourcing seems unwise to me, too. Small-scale deployments would just grab the open source edition and run with it. Plus I don’t think it’s entirely unrealistic to expect the big cloud providers to take it and make it theirs.

                      (Possibly there’s a market for obfuscation that you could also work with, but requires very high expertise in very low level development on multiple platforms; Denuvo had decent success there until they didn’t.)

                      While I have your attention, maybe you’ll find these points interesting; they are things I would do differently:

                      • Likely a simple pure binary protocol (possibly with a much more lightweight cryptographic protocol than TLS) for the endpoints that clients require. That might be easier to handle in C/embedded platforms.
                      • Possibly drop the notion of a “user” and leave that to the API consumers’ discretion entirely. People need to stop rolling more authentication schemes out.
                      • Built-in license key checksum. I know your license schemes allow for extension with cryptography, but a minor amount of typo correction before making a possibly expensive network request could be helpful, depending on how things are set up.
                      • Elliptic curves (Ed25519 or P-521, specifically) as alternatives or possibly the only signing option. Outright drop any padding scheme for RSA that isn’t PSS.
                      • Always sign and encrypt everything on top of TLS so that TLS isn’t your only line of defense against a cracker.
                      • Possibly considering encrypted certificates (X.509 or something of your own) as bundled information about a license. This could save some database lookups based on license ID, allow a “stateless” CDN for delivery—it only has to verify the certificate’s signature and its expiry time. They could also optionally embed (wrapped) encryption keys to wrap contents in, alleviating the catch-22 mentioned in the dist documentation: You could have regular licenses for a stub application that is used to make a request for a certificate and download the real application with the encryption key contained therein.
                      • SHA256 or higher for any sort of file checksum.
                      • Expose a “simple” API that skips the notion of policy, which is instead managed on the backend server. This can be useful for very small deployments.
                      • Flexible elevated access tokens (deleting a product is such a dissimilarly dangerous operation in comparison to creating one or even just issuing new licenses).
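
                      The built-in checksum idea from the list above could look roughly like this. It is a sketch under assumptions: the alphabet, the key layout and the function names are invented here and have nothing to do with Keygen's actual key formats. It uses the Luhn mod N scheme, which catches every single-character typo and most adjacent swaps before any network request is made.

```python
# Hypothetical 32-symbol alphabet (Crockford-style: no I, L, O or U).
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

def _luhn_sum(chars, factor):
    """Luhn mod N sum, walking right to left with alternating factors."""
    n = len(ALPHABET)
    total = 0
    for ch in reversed(chars):
        addend = factor * ALPHABET.index(ch)
        factor = 1 if factor == 2 else 2
        # Fold the product back into the range 0..n-1 (base-n digit sum).
        total += addend // n + addend % n
    return total

def check_char(body):
    """Return the check character to append to an already valid body."""
    n = len(ALPHABET)
    remainder = _luhn_sum(body, 2) % n
    return ALPHABET[(n - remainder) % n]

def is_valid(key):
    """Cheap local check, meant to run before contacting the server."""
    key = key.replace("-", "").upper()
    if not key or any(c not in ALPHABET for c in key):
        return False
    return _luhn_sum(key, 1) % len(ALPHABET) == 0
```

                      A client would call `is_valid(user_input)` first and only hit the licensing API when it returns `True`. This guards against typos only; it says nothing about whether the key is genuine.
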
                      1. 2

                        Thanks for the great feedback! I’m actually working on a lot of those, especially introducing ECC into the stack and calculating better file checksums (I’m essentially proxying S3 at the moment, which uses MD5 checksums, much to my dismay). And FWIW, user management is an optional feature of the API, so if it’s not needed, one can simply create user-less licenses and be done with it. Better access token privileges have also been on my list since day 1; I just haven’t gotten around to that one yet. But hopefully soon.

                  1. 2

                    As you can see using Emoji solves the classical naming problem you have when programming: waste no more time on “Should I call the table User or Users?”, just use 👤.

                    👤 or 👤s?

                    1. 15

                      👤 or 👤s?

                      Do you mean 👤 or 👥? ;)

                      1. 2

                        The former is the name of your object, the latter is the name of your database table.

                        1. 2

                          Only if you think in objects. If you think in terms of relations (as in, relational algebra), calling it the “user” relation makes sense.

                    1. 7

                      Good read. Does anyone know which competitor he is talking about and where Graydon Hoare worked in 2005?

                      1. 12
                        1. 8

                          Yep. To expand on that:

                          • The technology Bryan Cantrill was working on at Sun was DTrace (for Solaris).
                          • SystemTap was a Linux alternative, developed in part by Red Hat where Graydon Hoare was working.
                          • When Oracle bought Sun they started porting DTrace to Linux, but that happened years later.
                      1. 3

                        Further recommendations from past offerings of @shriram’s Programming Language Theory course:

                        I’ve personally referenced both Design Concepts in Programming Languages and TAPL quite a bit. Redex is a really fantastic tool for experimenting with executable operational reduction semantics.

                        1. 1

                          “Design Concepts” could be a bit heavy for a first course in PLT. I’d look at it after you get into the material a bit further. It seems like a decent book, though. I have to sit down and get through it some time.

                        1. 2

                          For me, the space-in-filenames issue is the main thing preventing me from using Make for everything. The limitations on pattern rules make scaling up Makefiles difficult, too.

                          I’ve long thought that a prolog-like DSL would be perfect; makefiles are basically already logical specifications—I just needed something that worked with spaces and could perform some sort of unification-like process for smarter pattern rules. I nearly started writing my own.

                          But then I discovered that someone already created such a tool: biomake. I’ve recently started using biomake to automate a large data science research project. It has been absolute bliss! Not only does it solve those two pain points with using Make (and countless other pain points), it features GridEngine integration out of the box, and fantastic debugging information for when things don’t work as expected. I cannot see myself using any other tool.

                            1. 7

                              This made me full of blood rage when I happened upon it a few months ago. I just want to have some well-named markdown files be automatically converted to HTML and also concatenated, converted to HTML, and that HTML converted to PDF. All with good-looking file names with spaces in them. Like how normal people name files.

                            1. 2

                              Do you use Rust as your daily driver for your work/hobby stuff?

                              For work, I now mostly write Pyret. I typically implement my side projects in either Rust or Racket, though! I have a strong preference towards data-oriented languages. The languages I use most (Pyret, Rust, Racket, Bash, MLs) all favor a data-oriented style of programming, so how I choose a language for a project tends to depend on the ‘shape’ of the data I’m working with.

                              What language features do you leverage from Rust, and how do they make those projects easier?

                              I reach for Rust when I need performance, sophisticated types (traits!), and well-defined structured data (ADTs!). It’s a thrill writing code at a high-level of abstraction and being confident that it will be performant. Cargo makes the entire ecosystem a joy to interact with and contribute to.

                              Why aren’t you using your non-Rust languages for that project?

                              There are a handful of projects for which I haven’t reached for Rust. Racket, though not tremendously performant, is tremendously expressive! When your data look like s-expressions, Racket is usually the best choice. I used Racket extensively for my encyclopedia remix project Liber Brunoniana, which involved processing lots of HTML.

                              What could Rust do to make your life easier?

                              I would love to use Rust for programming embedded devices, like Arduinos, and targeting browsers via WebAssembly.

                              1. 5

                                Hi, Pyret developer here (I hack on error messages)! If anyone has any questions, I’ll do my best to answer or grab someone who can!

                                1. 4

                                  A couple questions, about the check and where blocks:

                                  1. Do you know if they’re consciously inspired by the Design by Contract ideas from Eiffel (and other languages)?
                                  2. The front page says “These assertions are checked dynamically” – are those conditions checked at compile time, first execution of each function, only during a separate testing run, or what?
                                  1. 2

                                    check, where and examples blocks all exist to support the ‘examples’ and ‘testing’ phases of How to Design Programs’ functional design recipe. The placeholder expression syntax (...) and doc blocks are other features that Pyret has to encourage the design recipe.

                                    Testing statements are executed in the order they appear in the program alongside other top-level expressions. If you run:


                                    The result is that 1, 2, and 3 are printed (in that order).

                                    1. 1

                                      Interesting. For the where expressions on function definition: are those executed when the function definition is first encountered, or when the function itself is first executed?

                                      1. 2

                                        At the point when the definitions are “encountered” is a good way to think about it. So, the former.

                                1. 5

                                  Warning: live streams require flash.

                                  1. 3

                                    I adore the simplicity of make as a declarative build tool, but for a build system whose operative principle is a close correspondence with the filesystem, the placement of additional, subtle restrictions on valid filenames is ridiculous. I’d happily trade away most of make’s features for a core that didn’t choke on spaces.

                                    (If anybody knows of such an alternative, please share!)

                                    1. 2

                                      I think it’s pretty common on UNIX platforms to avoid spaces where possible. Finding spaces in filenames is peculiar, and generally not a good sign.

                                    1. 2

                                      Internet: Toponymic: Providence, Rhode Island

                                      1. 8

                                        Oh I do so hope Apple counter-whacks them with a DMCA suit for reverse engineering a protection mechanism…

                                        …. my sense of Schadenfreude would know no bounds.

                                          1. 1

                                            Would’ve been funny.