1. 11

  2. 6

    In the general case, I have developed a deep and long-lasting skepticism of DSLs. I was a very heavy proponent of them during my grad studies, and investigated pretty thoroughly using a rules engine for Space Invaders Enterprise Edition and a runtime monitor for Super Mario World.

    I went a little further down this path before abandoning it, for reasons unrelated to the DSL skepticism; the skepticism came later. I mention this to give context that I was naturally predisposed to liking them.

    What I have come to believe in my time as a software engineer is that it is effectively axiomatic that all DSLs eventually tend towards Turing completeness. New requirements appear, features are added, and the DSL heads further towards Turing completeness, except the DSL does not have the fundamental mechanics to express it; by design it is supposed not to. What you end up with is something very complex, where users perform all sorts of crazy contortions to get the behavior they want, and you can never roll that back. I feel DSLs are essentially doomed from the outset.

    I am much, much more optimistic about opinionated libraries as the means to solve the problems DSLs aim at (Ruby on Rails being the most obvious example). That way any contortions can be performed in a familiar language that the developer is happy to use, with no crazy syntax, and the library is then called to do whatever limited subset of things it chooses to support. Basic users interact with the library only and never see the programming language. As things progress, the base language can be brought in to handle more complex cases as pre/post-processing by the caller, without infringing on the design of the library.
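
    To make that split concrete, here is a minimal sketch in Python. The `Report` class and its methods are hypothetical, invented purely for illustration; the point is only that basic users stay inside the library’s vocabulary while power users escape to plain Python around it.

    ```python
    # Hypothetical opinionated library: a small, fixed vocabulary.
    class Report:
        def __init__(self, title):
            self.title = title
            self.rows = []

        def add_row(self, **fields):
            self.rows.append(fields)
            return self  # chaining keeps simple usage declarative-looking

        def render(self):
            lines = [self.title, "-" * len(self.title)]
            lines += [", ".join(f"{k}={v}" for k, v in row.items())
                      for row in self.rows]
            return "\n".join(lines)

    # Basic user: interacts with the library only.
    print(Report("Sales").add_row(region="EU", total=42).render())

    # Advanced user: arbitrary pre-processing in the host language,
    # with no contortions inside a DSL.
    data = [(r, t) for r, t in [("EU", 42), ("US", 7)] if t > 10]
    report = Report("Sales")
    for region, total in data:
        report.add_row(region=region, total=total)
    print(report.render())
    ```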

    At Google, we have a number of DSLs that perform many different tasks, which I won’t go into here. Each one requires a certain learning curve and has a topping-out point where you can’t express what you want. I was much happier with an opinionated-library approach in Python, where I could do a great deal of what I wanted without peering behind the curtain at what was going to be performed.

    1. 6

      sklogic on Hacker News had a different view: you start with a powerful, Turing-complete language that supports DSLs, with the DSLs taking the place of libraries. He said he’ll use DSLs for things like XML querying, Prolog where a logic approach makes more sense, Standard ML when he wants type safety in a simple form, and, if all else fails or is too kludgy, drop back into the LISP that hosts it all. He uses that approach to build really complicated tools like his mbase framework.

      I saw no problem with the approach. The 4GLs and DSLs got messy because they had to be extended toward power. Starting with something powerful that you constrain where possible eliminates those concerns. Racket Scheme and REBOL/Red are probably the best examples. The Ivory language is an example of low-level programming done with Haskell DSLs. I have less knowledge of what Haskell’s DSLs can do, though.
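
      As a toy illustration of that hosting idea, here is the same shape sketched in Python rather than LISP, with every name invented for the example: the “DSL” below is nothing but ordinary host-language values, so dropping back to the host is always possible.

      ```python
      class Field:
          """A query-DSL term; operators build predicates instead of evaluating."""
          def __init__(self, name):
              self.name = name

          def __gt__(self, value):
              # `age > 40` returns a predicate rather than a boolean.
              return lambda row: row[self.name] > value

      def where(rows, predicate):
          return [row for row in rows if predicate(row)]

      age = Field("age")
      rows = [{"name": "ada", "age": 36}, {"name": "alan", "age": 41}]
      print(where(rows, age > 40))  # [{'name': 'alan', 'age': 41}]
      ```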

      1. 3

        I think it’s a good approach, but it’s still hard to make sure that the main language hosting all the DSLs can accommodate all of their quirks. Lisp does seem to be an obvious host language, but if it were that simple then this approach would have taken off years ago.

        Why didn’t it? Probably because syntax matters and error messages matter. Towers of macros produce bad error messages. And programmers do want syntax.

        And I agree that syntax isn’t just a detail; it’s an essential quality of the language. I think there are fundamental “information theory” reasons why certain syntaxes are better than others.

        Anything involving s-expressions falls down – although I know that sklogic’s system does try to break free of s-expression by adding syntax.

        Another problem is that ironically by making it too easy to implement a DSL, you get bad DSLs! DSLs have to be stable over time to be made “real” in people’s heads. If you just have a pile of Lisp code, there’s no real incentive for stability or documentation.

        1. 4

          “but if it were that simple then this approach would have taken off years ago.”

          It did. The results were LISP machines, Common LISP, and Scheme. Their users write little DSLs all the time to quickly solve their problems. LISP was largely killed off by the AI Winter, in a form of guilt by association. It was also really weird next to things like Python. At least two companies, Franz and LispWorks, are still in the Common LISP business with plenty of success stories on complex problems. Clojure brought it to Java land. Racket is heavy on DSLs, backed by How to Design Programs and Beautiful Racket.

          There was also a niche community around REBOL, now making a comeback via Red; transformation languages like Rascal; META II follow-ups like Ometa; and Kay et al’s work in the STEPS reports using “IS” as the foundational language. Now we have Haskell, Rust, Nim, and Julia programmers doing DSL-like stuff. Even some people in formal verification are doing metaprogramming in Coq etc.

          I’d say the idea took off repeatedly with commercial success at one point.

          “Probably because syntax matters and error messages matter. Towers of macros produce bad error messages. And programmers do want syntax.”

          This is a good point. People also pointed out in other discussions with sklogic that each parsing method has its pros and cons. He countered that you can just use more than one. I think a lot of people don’t realize that today’s computers are so fast, and we have so many libraries, that this is a decent option, especially if we use or build tools that autogenerate parsers from grammars.

          So, IIRC, he would use one parser for raw efficiency first. If it failed on something, that input would get run through a parser designed for error detection and good messages. That’s now my default recommendation to people looking at parsers.
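
          That two-parser strategy fits in a few lines. In the sketch below, `fast_parse` and `diagnostic_parse` are hypothetical stand-ins (deliberately trivial) just to show the control flow:

          ```python
          class ParseError(Exception):
              pass

          def fast_parse(src):
              # Stand-in for a parser optimized for throughput: it either
              # returns a result or fails with little context.
              if "(" in src and ")" not in src:
                  raise ParseError("syntax error")
              return ("ast", src)

          def diagnostic_parse(src):
              # Stand-in for a slower parser that tracks positions purely
              # to produce a good message.
              col = src.index("(")
              raise ParseError(f"unclosed '(' at column {col}; expected ')'")

          def parse(src):
              try:
                  return fast_parse(src)        # raw efficiency first
              except ParseError:
                  return diagnostic_parse(src)  # rerun only to explain the failure

          print(parse("f(x)"))  # fast path succeeds
          # parse("f(x") would raise with the detailed column message
          ```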

          “Anything involving s-expressions falls down – although I know that sklogic’s system does try to break free of s-expression by adding syntax.”

          Things like Dylan, Nim, and Julia improve on that. There’s also just treating it like a tree with a tree-oriented language to manipulate it. A DSL for easily describing DSL operations.
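
          Since s-expressions are just nested lists, that tree-oriented manipulation can be very small. Here is a toy reader and rewriter in Python (not any particular tool’s API):

          ```python
          def read_sexpr(tokens):
              tok = tokens.pop(0)
              if tok == "(":
                  node = []
                  while tokens[0] != ")":
                      node.append(read_sexpr(tokens))
                  tokens.pop(0)  # drop the ")"
                  return node
              return tok

          def rewrite(tree, rule):
              # Apply a rewrite rule bottom-up over the whole tree.
              if isinstance(tree, list):
                  return rule([rewrite(t, rule) for t in tree])
              return rule(tree)

          tree = read_sexpr("( + 1 ( + 2 3 ) )".split())

          # Example rule: rename the "+" operator throughout.
          print(rewrite(tree, lambda n: "add" if n == "+" else n))
          # ['add', '1', ['add', '2', '3']]
          ```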

          “Another problem is that ironically by making it too easy to implement a DSL, you get bad DSLs!”

          The fact that people can screw it up probably shouldn’t be an argument against it, since they can screw anything up. The real risk of gibberish, though, led (per online commenters) a lot of teams using Common LISP to mandate a specific coding style with libraries and no macros for most of their apps. Then they used macros only to handle what makes sense, like portability, knocking out boilerplate, and so on. And the experienced people wrote and/or reviewed them. :)

          1. 2

            “Probably because syntax matters and error messages matter. Towers of macros produce bad error messages. And programmers do want syntax.”

            “Another problem is that ironically by making it too easy to implement a DSL, you get bad DSLs! DSLs have to be stable over time to be made ‘real’ in people’s heads. If you just have a pile of Lisp code, there’s no real incentive for stability or documentation.”

            I’m so glad to see this put into words. Although, for me, it’s frustrating that this seems to be universally true. I was pretty surprised the first time around, when my debugger was telling me almost nothing: my syntax was so uniform that I couldn’t really tell where I was in the source anymore!

            Some possibilities I’m hoping for that could keep this from being true: maybe it’s like goto statements, and if we restrict ourselves to making DSLs in a certain way, they won’t become bad (or at least won’t become bad too quickly). By restricting the kinds of gotos we use (and presenting them differently), we managed to keep the “alter control flow” aspect of goto.

            Maybe there’s also something to be done about errors. Ideally, there’d be a way to spend time proportional to the size of the language to create meaningful error messages, perhaps by adding some extra information somewhere that is currently implicit in the language design.
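
            One concrete form that extra information could take is a source location recorded on every node the DSL builds, so errors can point back at the user’s text. A minimal sketch in Python, with all names invented for the example:

            ```python
            class Node:
                def __init__(self, kind, value, line, col):
                    self.kind, self.value = kind, value
                    self.line, self.col = line, col  # carried along for free

            def check(node, known_names):
                if node.kind == "name" and node.value not in known_names:
                    # The message can now be phrased in the user's own terms.
                    raise SyntaxError(
                        f"{node.line}:{node.col}: unknown name '{node.value}'; "
                        f"known names: {sorted(known_names)}")

            try:
                check(Node("name", "prnt", line=3, col=7), {"print", "format"})
            except SyntaxError as e:
                print(e)  # 3:7: unknown name 'prnt'; known names: ['format', 'print']
            ```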

            I don’t know what to do about stability though. I mean you could always “freeze” part of the language I guess.

            For this particular project, I’m more afraid that they’ll go the SQL route, where you need to know so much about how the internals work that it mostly defeats the purpose of having a declarative language in the first place. I’d rather see declarative languages with well-defined, succinct transformations to some version of the code that corresponds to the actual execution.
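
            Something like the following shape, sketched in Python with an invented spec format: a declarative description plus a small, inspectable function that lowers it to the steps actually executed, so there is no guessing about internals.

            ```python
            spec = {"select": ["name"], "from": "users", "where": ("age", ">", 30)}

            def lower(spec):
                """Return, in order, the concrete steps the spec denotes."""
                field, op, value = spec["where"]
                return [
                    ("scan", spec["from"]),
                    ("filter", f"{field} {op} {value}"),
                    ("project", spec["select"]),
                ]

            for step in lower(spec):
                print(step)
            # ('scan', 'users')
            # ('filter', 'age > 30')
            # ('project', ['name'])
            ```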

            1. 1

              (late reply) Someone shared this 2011 essay with me, which has apparently been discussed to death, but I hadn’t read it until now. It says pretty much exactly what I was getting at!

              http://winestockwebdesign.com/Essays/Lisp_Curse.html

              “In this essay, I argue that Lisp’s expressive power is actually a cause of its lack of momentum.”

              I said:

              “Another problem is that ironically by making it too easy to implement a DSL, you get bad DSLs!”

              So that is the “curse of Lisp”. Although he clarifies that they’re not just “bad”: there are too many of them.

              He mentions documentation several times too.

              “Thus, they will have eighty percent of the features that most people need (a different eighty percent in each case). They will be poorly documented. They will not be portable across Lisp systems.”

              Domain knowledge is VERY hard to acquire, and the way you share that is by developing a stable and documented DSL. Like Awk. I wouldn’t have developed Awk on my own! It’s a nice little abstraction someone shared with me, and now I get it.

              The “bipolar lisp programmer” essay that he quotes also says the same things… I had not really read that one either, but now I get more of what they’re saying.

              1. 1

                Thanks for sharing that link again! I don’t think I’ve seen it before, or at least have forgotten. (Some of the links from it seem to be broken unfortunately.)

                One remark I have is that I think you could transmit information instead of code and programs to work around this curse. Implicit throughout the article is that collaboration is only possible if everyone uses the same language, or dialect of it; indeed, this is how version-controlled open-source projects are typically structured: around the source.

                Instead, people could collaboratively share ideas and findings so everyone is able to (re)implement them in their own DSL. I say a bit more on this in my comment here.

                In my case, on top of documentation (or even instead of it), I’d like to have enough instructions for rebuilding the whole thing from scratch.

                To answer your comment more directly:

                “Domain knowledge is VERY hard to acquire, and the way you share that is by developing a stable and documented DSL”

                I totally agree that domain knowledge is hard to acquire, but I’m saying that this is only one way of sharing that knowledge once found. The other way is through written documents.

        2. 4

          Since I like giving things names, I think of this as the internal DSL vs external DSL argument [1]. This applies to your post and the reply by @nickpsecurity about sklogic’s system with Lisp at the foundation. If there is a better or more common name for it, I’d like to know.

          I agree that internal DSLs (ones embedded in a full programming language) are preferable because of the problems you mention.

          The external DSLs always evolve into crappy programming languages. It’s “failure by success” – they become popular (success) and the failure mode is that certain applications require more power, so they become a programming language.

          Here are my examples with shell, awk, and make, which all started out non-Turing-complete (even Awk) and then turned into programming languages.

          http://www.oilshell.org/blog/2016/11/14.html

          Ilya Sher points out the same problems with newer cloud configuration languages.

          https://ilya-sher.org/2018/09/15/aws-cloudformation-became-a-programming-language/

          I also worked at Google, and around the time I started, there were lots of Python-based internal DSLs (e.g. the build system that became Blaze/Bazel was literally a Python script, not a Java interpreter for a subset of Python).

          This worked OK, but these systems eventually got rewritten because Python isn’t a great language for internal DSLs. The import system seems to be a pretty significant barrier. Another missing piece is Ruby-style blocks, which are used in configs like the Vagrantfile and, I think, Puppet. Ruby is better, but not ideal either. (Off the top of my head: it’s large, starts up slowly, and has version stability issues.)
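
          For comparison, the closest idiomatic Python gets to that Vagrantfile shape is probably a context manager. The sketch below (all names invented) mirrors the form, but a `with` block is statement-only and can’t be passed around as a value the way a Ruby block can:

          ```python
          from contextlib import contextmanager

          class Config:
              def __init__(self):
                  self.settings = {}

          @contextmanager
          def configure():
              cfg = Config()
              yield cfg                        # the "block" body runs here
              print("applying", cfg.settings)  # leaving the block commits the config

          with configure() as config:
              config.settings["vm.box"] = "ubuntu/focal64"
              config.settings["vm.memory"] = 2048
          ```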

          I’m trying to address some of this with Oil, although that’s still a bit far in the future :-/ Basically the goal is to design a language that’s a better host for internal DSLs than Python or Ruby.

          [1] https://martinfowler.com/bliki/InternalDslStyle.html

          1. 3

            If a programming language is flexible enough, the difference between DSL and library practically disappears.

            1. 1

              DSLs work great when the domain is small, stays small, and is backed by corporal punishment. Business software is an astronomically large domain.

            2. 3

              Just want to point out (since I couldn’t find it on the site at first) that “the license is in development” while it’s in beta, but the plan is not to make this free software. AFAICT the community edition will be Creative Commons BY-NC-SA. So that’s a bummer.

              1. 1

                What exactly does it mean that the community edition is BY-NC-SA? That you have to attribute and publicly share any code that you write with Alan?

                1. 1

                  The BY bit means you have to provide attribution if you make a derivative work. The SA means that derivative works have to be under the same license, similar to the GPL. NC means that commercial use is forbidden (the NC stands for non-commercial), which violates freedom 0 and makes this not free software. They don’t specify what version they’d use, but presumably it would be the latest, which would make https://creativecommons.org/licenses/by-nc-sa/4.0/ the license in question.

                  1. 1

                    What does derivative work mean here? An app you build using their framework, or just making changes to the framework itself?

                    1. 1

                      I don’t know. There may be an answer out there, but Creative Commons is not designed for software. If we were discussing, say, the GPL, there might be a clearer answer. I have a very hazy guess, but it’s based on no research and overall I’m so uninformed about your question that I don’t want to speculate in public :P

                      I don’t know how prevalent software using CC licenses is, but it’s possible no one would know and we’d have to wait for a court to decide.

              2. 1

                It is always exciting to read about something that helps “close the gap between requirement and implementation”, which the documentation states as its primary motivation.

                I’m sure that Alan works well for a subset of domain problems, particularly trivial ones, or those that can be well represented by the recording of domain elements. It occupies the same space as solutions like SalesForce or Zoho. Like them, it can be attractive for getting “services” like authentication/authorization, a UI, or a provided implementation of other common non-functional requirements for free.

                However, I’m not convinced that a “data-driven” approach, where process and change are modeled through a state machine (as I assume is the case with Alan; there is no documentation on this?), is the general answer for large, complex systems outside the domain of computing.

                What was the result of our industry spending a decade pursuing a data-flow approach through “structured analysis”?