1. 8

    Single-file version control. I do have something I called fh, but it’s mostly a joke given its design constraints (and performance characteristics as a result of them). The only drive I have is personal issues with licensing on the existing solutions SCCS (CDDL) and RCS (GPL or OpenBSD’s that has some 4-clause BSD files); SRC is just a wrapper around the aforementioned ones. The part that scares me the most is definitely the diffing. I kind of do want to use interleaved deltas on the backend, but I’ve failed multiple times to even retrieve data from an SCCS weave, much less insert a diff.

    Liberally licensed elliptic curve cryptography over a binary field (e.g. NIST B-163). I just kind of feel like I should know this and occasionally I do run into one of those obscure binary field curves. However, I wholly lack the mathematics for it and never felt like actually sitting down for a week to learn discrete math and elliptic curves and whatnot. Library support is actually lacking (does anything other than OpenSSL even support these curves?) but OpenSSL is kind of the be-all-end-all of cryptography libraries to begin with.

    Self-hosted, open source base for DRM. Keygen and Qeys have some very attractive solutions for DRM that are simple to integrate. But I kind of sort of want my own and have it be open source so that people with a lot of paranoia won’t have to rely on third parties. The irony of open source DRM is not lost on me, no.

    Yet another typesetting system. TeX has great output and is painful to actually work with if you want to customize it; LaTeX exists for a reason after all. Troff has mediocre output (let’s not get into the nitty-gritty tradeoff that is choosing between groff and heirloom-troff here) and is somewhere between amenable and hell to actually work with. Why not both?

    1. 2

      Yet another typesetting system. TeX has great output and is painful to actually work with if you want to customize it; LaTeX exists for a reason after all. Troff has mediocre output (let’s not get into the nitty-gritty tradeoff that is choosing between groff and heirloom-troff here) and is somewhere between amenable and hell to actually work with. Why not both?

      (La)TeX is lovely and Knuth is brilliant, but his lack of PLT expertise shows. I’m exceedingly eager to see what would happen if a PL expert tackled typesetting languages. (Unfortunately, the story of Tex is itself a cautionary tale for any PLT PhD students thinking they’d might like to tackle the problem and finish their dissertation within a decade.)

      1. 4

        TeX is the way it is not because Knuth couldn’t do a better language, but because it was built at a time computer’s couldn’t fit an AST in memory. The TeX compiler has to generate the output with few sequential passes through the code.

        1. 2

          A lot of people also confuse latex (Lamport) with Tex (Knuth)

          Knuth was not interested in creating a markup system but to produce a professional typesetting tool for the expert user that pushed the boundaries of what had been done before.

          It’s unfair to say he didn’t care about abstractions, rather I think he chose the abstractions that served his goals and did not abstract where they were unhelpful or introduced performance penalties.

          Like people complain about having to rerun latex to page numbering right in the table of contents. Iirc knuths text document just generate the TOC at the end and he can then reorder pages in a production step.

          One counterexample to the “no abstractions” argument would be metafont.

        2. 2

          What is “PLT” and how does it apply here? I think the issue with TeX doesn’t have to do with “theory” – it has more to do with design, i.e. the “soft” parts of programming languages.

          It doesn’t really make sense to say Knuth lacks “PLT” expertise. First, he invented a lot of the theory we use in practice, like LR parsing.

          As another example, he also showed up in the LLVM source code for a minimum spanning tree algorithm:

          https://www.reddit.com/r/ProgrammingLanguages/comments/b22tw6/papers_and_algorithms_in_llvms_source_code/

          And: this might be a nitpick, but I’ve heard at least one programming language researcher say that “PLT” doesn’t mean anything outside of PLT Scheme. It seems to be an “Internet” word.

          So bottom line, I don’t disagree that TeX could be improved upon after several decades, but your analysis doesn’t get to the core of it. People who study theory don’t write the programming languages we use. Probably Haskell is the counterexample, but Haskell is also known for bad tooling like package management. On the other hand, Go is known for good, usable tools but not being “exciting” in terms of theory.

          I don’t think you need a Ph.D. anymore to write TeX. When Knuth wrote it, that may have been true, but the knowledge has been disseminated since then.

          1. 6

            PLT is the initialism for Programming Language Theory. PLT Scheme got its name because it came out of the Rice PLT research group. PLT is simply the field of exploring the abstractions we use to program (and creating the theoretical tools to reason about those abstractions). While we very rarely directly use systems created by researchers (Haskell being a notable exception), the abstractions they develop absolutely shape the development of the programming languages we do use.

            I’m not arguing that you need a Ph.D. to write TeX. I’m claiming that the task of developing the right abstractions for a typesetting programming language have largely been ignored by the group of people who study abstractions and programming languages (PLT researchers!).


            Addenda: There’s a very distinct flair common to Knuth’s programming projects. His language design is primary influenced by the details of the machine which they program and the algorithmics of executing them. For instance, the reason TeX documents may require multiple rounds of computation before they’re “done” is because Knuth baked the requirement that a TeX compiler be single-pass into TeX’s execution model. Conversely, Knuth wasn’t at all concerned about statically reasoning about TeX programs, as evidence by the fact that merely parsing TeX is itself turing complete. (And TeX’s development happened during a really exciting period for PLT research: right around the discovery of Hindley-Milner type inference and hygienic macros!)

            1. 6

              Words aside, it’s silly to say that Knuth lacks expertise in “PLT”, since his work was foundational to the field.

              Secondly, I don’t buy the claim that lack of abstractions are what’s wrong with TeX – at least without some examples / justification. Like anything that’s 30 years old, I think you could simply take the knowledge we have today and make a nicer version (assuming you have 5-10 years free :-/ ).

              TeX is constrained by compatibility and a large user base, which I imagine explains most of its warts – much like Unix shell, and Unix in general. And there’s also the problem that it’s a huge amount of work that you won’t likely won’t get paid for.

            2. 0

              LR parsing is completely irrelevant to what PLT research is about.

            3. 2

              You should have a look at scribble.

              1. 1

                Also SILE, which takes the good parts of TeX, rips them out with no remorse, and glues together using Lua for ease of hacking. One important con is it doesn’t have math typesetting yet… which is, fittingly to the theme of the thread, why I tried to add the support ;)

            4. 2

              I’ve always thought of open sourcing Keygen. But I’m trying to grow it into a viable business, and I don’t think that would be wise at the moment. However, I do think open source tooling for things like this are incredibly valuable. (Disclosure: I’m the founder of Keygen.)

              1. 3

                But I’m trying to grow it into a viable business, and I don’t think that would be wise at the moment.

                I can’t help but be curious about the magic sauce (and run my own—despite not even selling any software), but oh well. Maybe someday.

                Open sourcing seems unwise to me, too. Small-scale deployments would just grab the open source edition and run with it. Plus I don’t think it’s entirely unrealistic to expect the big cloud providers to take it and make it theirs.

                (Possibly there’s a market for obfuscation that you could also work with, but requires very high expertise in very low level development on multiple platforms; Denuvo had decent success there until they didn’t.)

                While I have your attention, maybe you’ll find these points interesting which I would be doing differently:

                • Likely a simple pure binary protocol (possibly with a much more lightweight cryptographical protocol than TLS) for the endpoints that clients require. That might be easier to handle in C/embedded platforms.
                • Possibly drop the notion of a “user” and leave that to the API consumers’ discretion entirely. People need to stop rolling more authentication schemes out.
                • Built-in license key checksum. I know your license schemes allow for extension with cryptography, but a minor amount of typo correction before making a possibly expensive network request could be helpful, depending on how things are set up.
                • Elliptic curves (Ed25519 or P-521, specifically) as alternatives or possibly only signing option. Outright drop any padding scheme for RSA that isn’t PSS.
                • Always sign and encrypt everything on top of TLS so that TLS isn’t your only line of defense against a cracker.
                • Possibly considering encrypted certificates (X.509 or something of your own) as bundled information about a license. This could save some database lookups based on license ID, allow a “stateless” CDN for delivery—it only has to verify the certificate’s signature and its expiry time. They could also optionally embed (wrapped) encryption keys to wrap contents in, alleviating the catch-22 mentioned in the dist documentation: You could have regular licenses for a stub application that is used to make a request for a certificate and download the real application with the encryption key contained therein.
                • SHA256 or higher for any sort of file checksum.
                • Expose a “simple” API that skips the notion of policy, which is instead managed on the backend server. This can be useful for very small deployments.
                • Flexible elevated access tokens (deleting a product is such a dissimilarly dangerous operation in comparison to creating one or even just issuing new licneses).
                1. 2

                  Thanks for the great feedback! I’m actually working on a lot of those, especially introducing ECC into the stack and calculating better file checksums (I’m essentially proxying S3 at the moment, which uses MD5 checksums, much to my dismay). And FWIW, user management is an optional feature of the API, so if it’s not needed, one can simply create user-less licenses and be done with it. Better access token privileges has also been on my list since day 1 — just haven’t gotten around to that one yet. But hopefully soon.