1. 4

  2. 3

    I used to work a lot on DSLs for data modeling. It is useful to think about “Internal” vs “External” DSLs.

    In the former, you borrow programming language features to create code patterns that “read as” a DSL. The classic examples here are “fluent APIs” in Java, the builder pattern, or the Django/SQLAlchemy ORMs in Python.
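
    A minimal sketch of the fluent/builder style in Python, for illustration only. The `Query` class and its methods are hypothetical and are not the API of Django, SQLAlchemy, or any real library; conditions are plain strings, so this is not safe for untrusted input.

    ```python
    class Query:
        """Hypothetical fluent query builder; every method returns a new Query."""

        def __init__(self, table, conditions=(), limit=None):
            self._table = table
            self._conditions = tuple(conditions)
            self._limit = limit

        def where(self, condition):
            # Return a new object so partially built queries can be reused.
            return Query(self._table, self._conditions + (condition,), self._limit)

        def limit(self, n):
            return Query(self._table, self._conditions, n)

        def to_sql(self):
            sql = f"SELECT * FROM {self._table}"
            if self._conditions:
                sql += " WHERE " + " AND ".join(self._conditions)
            if self._limit is not None:
                sql += f" LIMIT {self._limit}"
            return sql


    # The chained calls are ordinary method calls that "read as" a small language.
    query = Query("users").where("age >= 18").where("active = 1").limit(10)
    print(query.to_sql())
    ```

    Nothing here needs a parser: the host language supplies the syntax, and the "DSL" is just a naming and chaining convention.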

    An “External” DSL, by contrast, is a full-blown language that has its own syntax and semantics, and is not hosted by some other language. A good example here is SQL; a smaller example might be Dockerfiles.

    Of course, internal/external can also form a continuum of sorts. You can have “escape hatches” in the external DSL that allow for arbitrary code; for example, many SQL implementations allow for UDFs written in some other language. Likewise, in some languages, you can have an internal DSL that is quite restricted and starts to veer in the direction of being an external DSL; some Clojure macro-based APIs start to have this feel, for example.

    I spent nearly 2 years on DSLs professionally, and my conclusion at the end of that exercise is that you almost always want an internal DSL. Not only are they easier to build, but they are also easier to use. Every attempt I have seen at an external DSL written specifically for one project has ended up as “A Big Pile”.

    I think this is because the best external language designers are programming language designers, and designing a language is generally a Herculean effort. It turns out parsing and compiling code is hard. Thus the best hosts for domain-specific programming end up being flexible programming languages that already have well-understood syntax and semantics.

    Looking back at this article, Python’s support for internal DSLs is quite limited, especially compared to a language like Clojure.

    Some of the metaprogramming features (descriptors, decorators, context managers, metaclasses) let you write somewhat declarative code, but for the most part, true internal DSLs are rare. Instead, I like the take in the O’Reilly book, Fluent Python, that these features are just hallmarks of “Pythonic” API design, not meant to be abused to support non-obvious syntax for odd business domains.
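
    As a rough illustration of the "somewhat declarative" code these features allow, here is a sketch that uses a descriptor for field validation. The `Positive` and `LineItem` names are hypothetical examples chosen for this sketch, not code taken from the article or from any library.

    ```python
    class Positive:
        """Hypothetical descriptor: a field that rejects non-positive values."""

        def __set_name__(self, owner, name):
            # Called once at class creation; remember where to store the value.
            self._storage = "_" + name
            self._public = name

        def __get__(self, obj, objtype=None):
            if obj is None:
                return self
            return getattr(obj, self._storage)

        def __set__(self, obj, value):
            if value <= 0:
                raise ValueError(f"{self._public} must be positive, got {value!r}")
            setattr(obj, self._storage, value)


    class LineItem:
        # The class body reads like a declaration of the fields and their rules.
        quantity = Positive()
        price = Positive()

        def __init__(self, quantity, price):
            self.quantity = quantity
            self.price = price

        def total(self):
            return self.quantity * self.price


    item = LineItem(3, 2.5)
    print(item.total())
    ```

    The syntax is still plain Python attribute access, which is arguably why this feels like API design rather than a separate language.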

    I think this lack of full-blown metaprogramming support in Python may be a feature, and not a bug, of the language, when it comes to long term maintainability of code. DSLs often feel magical within a language, and Python’s community frowns on too much magic. That said, as this article details, there are some exceptions (like Django ORM and numpy) where it feels like the magic was well-utilized.

    1. 2

      As far as “internal” DSLs go, you also have some choice as to the “embedding depth” of the DSL. A shallow embedding means that a DSL expression is immediately evaluated to a value in the host language, whereas a deep embedding means that a DSL expression is translated to an AST which can then be manipulated before evaluation.

      For example, if you wanted to make a math DSL that supported differentiation of expressions, you could maybe use dual numbers for doing this with a shallow embedding, or you could just use AST-based differentiation with a deep embedding.
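
      A minimal sketch of the shallow-embedding option using dual numbers, written in Python to match the article under discussion. The `Dual` class and the `sin` and `derivative` helpers are hypothetical names for this sketch, and only addition, multiplication, and sine are covered.

      ```python
      import math


      class Dual:
          """Dual number value + deriv*eps, where eps**2 == 0."""

          def __init__(self, value, deriv=0.0):
              self.value = value
              self.deriv = deriv

          @staticmethod
          def lift(x):
              # Treat plain numbers as constants (derivative 0).
              return x if isinstance(x, Dual) else Dual(x)

          def __add__(self, other):
              other = Dual.lift(other)
              return Dual(self.value + other.value, self.deriv + other.deriv)

          __radd__ = __add__

          def __mul__(self, other):
              other = Dual.lift(other)
              # Product rule falls out of (a + b*eps) * (c + d*eps).
              return Dual(self.value * other.value,
                          self.deriv * other.value + self.value * other.deriv)

          __rmul__ = __mul__


      def sin(x):
          x = Dual.lift(x)
          return Dual(math.sin(x.value), math.cos(x.value) * x.deriv)


      def derivative(f, at):
          # Seed the input with derivative 1 and read the derivative back out.
          return f(Dual(at, 1.0)).deriv


      slope = derivative(lambda x: sin(x + 1.0), 0.5)
      print(slope, math.cos(1.5))
      ```

      Each DSL expression is evaluated straight to a host-language value; there is never an expression tree to inspect, which is what makes the embedding shallow.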

      Using the finally tagless style, we can actually make our DSL expressions parametric over the level and manner of embedding. The expression “sin(x+y)” could optionally be interpreted as having type e.g. Double, in which case the code would compile to an extremely fast low-overhead shallow embedding, or it could be interpreted as having some recursive type like

      data Exp = Const Double | Sin Exp | Sum Exp Exp | ...

      in which case we could inspect the structure of the expression, à la:

      d x (Const a) = Const 0
      d x (Sin a) = Prod (Cos a) (d x a)
      d x (Var v) = if x == v then Const 1 else Const 0
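
      For readers following along in Python, here is a duck-typed approximation of the same idea. It is only a sketch: Python has no type classes, so the interpreter is passed explicitly, and nothing here gives the compile-time guarantees or the low-overhead compilation described above. All names (`Eval`, `Tree`, `d`, `run`, `expr`) are hypothetical.

      ```python
      import math


      class Eval:
          """Shallow interpretation: every operation yields a float immediately."""
          def __init__(self, env):
              self.env = env
          def const(self, a): return float(a)
          def var(self, name): return self.env[name]
          def add(self, a, b): return a + b
          def mul(self, a, b): return a * b
          def sin(self, a): return math.sin(a)
          def cos(self, a): return math.cos(a)


      class Tree:
          """Deep interpretation: every operation yields a nested tuple (an AST)."""
          def const(self, a): return ("const", float(a))
          def var(self, name): return ("var", name)
          def add(self, a, b): return ("add", a, b)
          def mul(self, a, b): return ("mul", a, b)
          def sin(self, a): return ("sin", a)
          def cos(self, a): return ("cos", a)


      def d(x, e):
          """Differentiate AST e with respect to variable x (no simplification)."""
          tag = e[0]
          if tag == "const":
              return ("const", 0.0)
          if tag == "var":
              return ("const", 1.0 if e[1] == x else 0.0)
          if tag == "add":
              return ("add", d(x, e[1]), d(x, e[2]))
          if tag == "mul":
              return ("add", ("mul", d(x, e[1]), e[2]), ("mul", e[1], d(x, e[2])))
          if tag == "sin":
              return ("mul", ("cos", e[1]), d(x, e[1]))
          if tag == "cos":
              return ("mul", ("mul", ("const", -1.0), ("sin", e[1])), d(x, e[1]))
          raise ValueError(f"unknown node: {tag!r}")


      def run(interp, e):
          """Replay an AST through any interpreter, e.g. Eval to get a number."""
          tag, *args = e
          if tag in ("const", "var"):
              return getattr(interp, tag)(args[0])
          return getattr(interp, tag)(*(run(interp, a) for a in args))


      def expr(i):
          """sin(x + y), written once against the interpreter interface."""
          return i.sin(i.add(i.var("x"), i.var("y")))


      env = {"x": 0.5, "y": 1.0}
      value = expr(Eval(env))               # shallow: a plain float
      tree = expr(Tree())                   # deep: an inspectable AST
      slope = run(Eval(env), d("x", tree))  # differentiate the AST, then evaluate
      print(value, tree, slope)
      ```

      The same `expr` definition serves both embeddings; only the interpreter passed in changes.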
    2. 2

      The title is misleading; it should be “DSLs embedded in Python” or “Python as a host language for DSLs”.