1. 13
  1. 4

    @akkartik what are your thoughts on having many little languages floating around?

    1. 11

      I see right through your little ploy to get me to say publicly what I’ve been arguing privately to you :) Ok, I’ll lay it out.

      Thanks for showing me this paper! I’d somehow never encountered it before. It’s a very clear exposition of a certain worldview and way of organizing systems. Arguably this worldview is as core to Unix as “do one thing and do it well”. But I feel this approach of constantly creating small languages at the drop of a hat has not aged well:

      • Things have gotten totally insane when it comes to the number of languages projects end up using. A line of Awk here, a line of Sed there, makefiles, config files, m4 files, Perl, the list goes on and on. A newcomer potentially may want to poke at any of these, and now (s)he may have to sit with a lengthy manpage for a single line of code. (Hello man perl with your 80+ parts.) I’m trying to find this egregious example in my notes, but I noticed a year or two ago that some core Ruby project has a build dependency on Python. Or vice versa? Something like that. The “sprawl” in the number of languages on a modern computer has gotten completely nuts.

      • I think vulnerabilities like Shellshock are catalyzing a growing awareness that every language you depend on is a potential security risk. A regular tool is fairly straightforward: you just have to make sure it doesn’t segfault, doesn’t clobber memory out of bounds, doesn’t email too many people, etc. Non-trivial but relatively narrow potential for harm. Introduce a new language, though, and suddenly it’s like you’ve added a wormhole into a whole new universe. You have to guard against problems with every possible combination of language features. That requires knowing about every possible language feature. So of course we don’t bother. We just throw up our arms and hope nothing bad happens. Which makes sense. I mean, do you want to learn about every bone-headed thing somebody threw into GNU make?!

      Languages for drawing pictures or filling out forms are totally fine. But that’s a narrower idea: “little languages to improve the lives of non-programmers”. When it comes to “little languages for programmers” the inmates are running the asylum.

      We’ve somehow decided that building a new language for programmers is something noble. Maybe quixotic, but high art. I think that’s exactly wrong. It’s low-brow. Building a language on top of a platform is the easy expedient way out, a way to avoid learning about what already exists on your platform. If existing languages on your platform make something hard, hack the existing languages to support it. That is the principled approach.

      1. 4

        I think the value of little languages comes not from what they let you do, but rather from what they won’t let you do. That is, have they stayed little? Your examples, such as Perl and Make, are languages that did not stay little, and hence are no longer as helpful (because one has to look at 80+ pages to understand the supposedly little language). I would argue that those that have stayed little are still very much useful and do not contribute to the problem you mentioned (e.g. grep, sed, troff, dc – although even these have been affected by feature creep in the GNU world).

        Languages for drawing pictures or filling out forms are totally fine. But that’s a narrower idea: “little languages to improve the lives of non-programmers”. When it comes to “little languages for programmers” the inmates are running the asylum.

        This I agree with. The little languages have little to do with non-programmers; as far as I am concerned, their utility is in the discipline they impose.

        1. 3

          On HN a counterpoint paper was posted. It argues that using embedded domain specific languages is more powerful, because you can then compose them as needed, or use the full power of the host language if appropriate.

          Both are valid approaches, however I think that if we subdivide the Little Languages the distinction becomes clearer:

          • languages for describing something (e.g. regular expressions, format strings, graph .dot format, LaTeX math equations, etc.) that are usable both from standalone UNIX tools, and from inside programming languages
          • languages with a dedicated tool (awk, etc.) that are not widely available embedded inside other programming languages. Usually these languages allow you to perform some actions / transformations

          The former is accepted as “good” by both papers, in fact the re-implementation of awk in Scheme from the 2nd paper uses regular expressions.

          The latter is limited in expressiveness once you start using them for more than just ad-hoc transformations. However they do have an important property that contributes to their usefulness: you can easily combine them with pipes with programs written in any other language, albeit only as streams of raw data, not in a type-safe way.

          With the little language embedded inside a host language you get more powerful composition, however if the host language doesn’t match that of the rest of your project, then using it is more difficult.

          1. 3

            First, a bit of critique on Olin Shivers’ paper!

            • He attacks the little languages as ugly, idiosyncratic, and limited in expressiveness. While the first two are subjective, I think he misses the point when he says they are limited in expressiveness. That is sort of the point.
            • Second, he criticizes that a programmer has to implement an entire language including loops, conditionals, variables, and subroutines, and that these can lead to suboptimal design. Here again, in a little language, structures such as variables, conditionals, and loops should not be included unless there is a very strong argument for including them. The rest of the section (3) is more an attack on incorrectly designed little languages than on the concept of little languages per se. The same attacks can be leveled against his preferred approach of embedding a language inside a more expressive language.

            For me, the whole point of little languages has been the discipline they impose. They let me remove considerations of other aspects of the program, and focus on a small layer or stage at a time. It helps me compose many little stages to achieve the result I want in a very maintainable way. On the other hand, while embedding, as Shivers observes, the host language is always at hand, and the temptation for a bit of optimization is always present. Further, the host language does not always allow the precise construction one wants to use, and there is an impedance mismatch between the domain lingo and what the host language allows (as you also have observed). For example, see the section 5.1 on the quoted paper by Shivers.

            My experience has been that programs written in the fashion prescribed by Shivers often end up much less readable than those built from little languages as pipeline stages.

            1. 1

              That’s tantalizing. Do you have any examples of a large task built out of little stages, each written in its own language?

              1. 2

                My previous reply was a bit sparse. Since I have a deadline coming up, and this is the perfect time to write detailed posts on the internet, here goes :)

                In an earlier incarnation, I was an engineer at Sun Microsystems (before the Oracle takeover). I worked on the iPlanet[1] line of web and proxy servers, and among other things, I implemented the command line administration environment for these servers[2], called wadm. This was a customized TCL environment based on Jacl. We chose Jacl as the base after careful study, which looked at both where it was going to be used most (as an interactive shell environment) and its ease of extension. I prefer to think of wadm as its own little language above TCL because it had a small set of rules beyond TCL, such as the ability to infer the right options based on the current environment, that made life a bit simpler for administrators.

                At Sun, we had a very strong culture of testing, with a dedicated QA team that we worked closely with. Their expertise was the domain of web and proxy servers rather than programming. For testing wadm, I worked with the QA engineers to capture their knowledge as test cases (and to convert existing ad-hoc tests). When I looked at existing shell scripts, it struck me that most of the testing was simply invoking a command line and verifying the output. Written out as a shell script, these may look ugly to a programmer because the scripts are often flat, with few loops or other abstractions. However, I have since come to regard them as a better style for the domain they are in. Unlike in general programming, for testing, one needs to make the tests as simple as possible, and loops and subroutines often make simple stuff more complicated than it is. Further, tests once written are almost never reused (as in, as part of a larger test case), but only rerun. What we needed was a simple way to verify the output of commands based on some patterns, the return codes, and simple behavior such as response to specific requests, and contents of a few administration files. So, we created a testing tool called cat (command line automation tool) that essentially provided a simple way to run a command line and verify its result. This was very similar to expect[3]. It looked like this:

                wadm> list-webapps --user=admin --port=[ADMIN_PORT] --password-file=admin.passwd --no-ssl
                wadm> add-webapp --user=admin --port=[ADMIN_PORT] --password-file=admin.passwd --config=[HOSTNAME] --vs=[VIRTUAL_SERVER] --uri=[URI_PATH]

                The =0 implies the return code would be 0, i.e. success. For matching, // represented a regular expression, “” represented a string, [] represented a shell glob, etc. Ordering was not important, and all matches had to succeed. The names in square brackets were variables that were passed in from the command line. If you look at our man pages, this is very similar to the format we used in the man pages and other docs.
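                To make the matching rules concrete, here is a rough sketch in Python. It is not the actual cat implementation; the function names (`parse_matcher`, `check`, `run_and_check`) are hypothetical, and applying the glob to each output line is an assumption. It only illustrates the rules described above: an expected return code plus a set of unordered matchers that must all succeed.

```python
import fnmatch
import re
import subprocess


def parse_matcher(spec):
    """Turn a matcher spec into a predicate over the command output.

    /.../ is a regular expression, "..." is a plain substring, and [...] is a
    shell glob (assumed here to be tried against each output line).
    """
    if len(spec) < 2:
        raise ValueError(f"unknown matcher: {spec!r}")
    body = spec[1:-1]
    if spec.startswith("/") and spec.endswith("/"):
        pattern = re.compile(body)
        return lambda out: pattern.search(out) is not None
    if spec.startswith('"') and spec.endswith('"'):
        return lambda out: body in out
    if spec.startswith("[") and spec.endswith("]"):
        return lambda out: any(
            fnmatch.fnmatchcase(line, body) for line in out.splitlines()
        )
    raise ValueError(f"unknown matcher: {spec!r}")


def check(output, returncode, expected_code, matchers):
    """Succeed only if the return code matches and every matcher matches.

    The order of the matchers does not matter.
    """
    if returncode != expected_code:
        return False
    return all(parse_matcher(m)(output) for m in matchers)


def run_and_check(argv, expected_code, matchers):
    """Run a command line and verify its return code and standard output."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    return check(proc.stdout, proc.returncode, expected_code, matchers)
```

                A test then stays flat: one command line, one expected return code, and a handful of matchers, with no loops or subroutines in sight.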

                Wadm had two modes – standalone, and as a script (other than the repl). For the script mode, the file containing wadm commands was simply interpreted as a TCL script by the wadm interpreter when passed as a file input to the wadm command. For standalone mode, wadm accepted a subcommand of the form wadm list-webapps --user=admin ... etc., which could be executed directly from the shell. The return codes (=0) are present only in standalone mode, and do not exist in TCL mode, where exceptions were used. With the test cases written in cat, we could make it spit out either a TCL script containing the wadm commands, or a shell script containing standalone commands (it could also directly interpret the language, which was its most common mode of operation). The advantage of doing it this way was that it gave the QA engineers, who had the domain knowledge, an easy environment to work in. The cat scripts were simple to read and maintain. They were static, and eschewed complexities such as loops, changing variable values, etc., and could handle what I assumed to be 80% of the testing scenarios. For 80% of the remaining 20%, we provided simple loops and loop variables as a pre-processor step. If the features of cat were insufficient, engineers were welcome to write their test cases in any of perl, tcl, or shell (I did not see any such scripts during my time there). The scripts spat out by cat were easy to check and were often used as recipes for accomplishing particular tasks by other engineers. All this was designed and implemented in consultation with the QA engineers, with their active input on what was important and what was confusing.

                I would say that we had these stages in the end:

                1. The preprocessor that provides loops and loop variables.
                2. cat that provided command invocation and verification.
                3. wadm that provided a custom TCL+ environment.
                4. wadm used the JMX framework to call into the webserver admin instance. The admin instance also exposed a web interface for administration.
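                As an illustration of the first stage only, here is a minimal Python sketch of what a loop-unrolling pre-processor can look like. The original directive syntax is not shown above, so this avoids inventing one: `substitute` and `expand_loop` are hypothetical helpers, and the assumption that variables are upper-case names in square brackets is taken from the example command lines.

```python
import re


def substitute(line, variables):
    """Replace each [NAME] with its value; unknown names are left untouched."""
    return re.sub(
        r"\[([A-Z_]+)\]",
        lambda m: variables.get(m.group(1), m.group(0)),
        line,
    )


def expand_loop(template, name, values, variables):
    """Unroll a loop: emit the template lines once per value of the loop variable."""
    flat = []
    for value in values:
        bound = {**variables, name: value}  # loop variable shadows outer ones
        flat.extend(substitute(line, bound) for line in template)
    return flat
```

                The output of such a stage is a flat script again, so the later stages never need to know that loops exist.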

                We could instead have done all of the web server testing in Java. While that may have been possible, I believe that splitting it into stages, each with its own little language, was better. Further, I think that keeping the little language cat simple (without subroutines, scopes, etc.) helped in keeping the scripts simple and understandable, with little cognitive overhead for its intended users.

                Of course, each stage existed on its own, and had independent consumers. But I would say that the consumers at each stage could have chosen to use any of the more expressive languages above them, and chose not to.

                1: At the time I worked there, it was called the Sun Java System product line.

                2: There existed a few command lines for the previous versions, but we unified and regularized the command line.

                3: We could not use expect as Jacl at that time did not support it.

                1. 1

                  Surely, this counts as a timeless example?

                  1. 1

                    I thought you were describing decomposing a problem into different stages, and then creating a separate little DSL for each stage. Bentley’s response to Knuth is just describing regular Unix pipes. Pipes are great, I use them all the time. But I thought you were describing something more :)

                    1. 1

                      Ah! From your previous post

                      A line of Awk here, a line of Sed there, makefiles, config files, m4 files, Perl, the list goes on and on … If existing languages on your platform make something hard, hack the existing languages to support it. That is the principled approach.

                      I assumed that you were against that approach. Perhaps I misunderstood. (Indeed, as I re-read it, I see that I have misunderstood.. my apologies.)

                      1. 1

                        Oh, Unix pipes are awesome. Particularly at the commandline. I’m just wondering (thinking aloud) if they’re the start of a slippery slope.

                        I found OP compelling in the first half when it talks about PIC and the form language. But I thought it went the wrong way when it conflated those phenomena with lex/yacc/make in the second half. Seems worth adding a little more structure to the taxonomy. There are little languages and little languages.

                        Languages are always interesting to think about. So even as I consciously try to loosen their grip on my imagination, I can’t help but continue to seek a more steelman defense for them.

            2. 2

              Hmm, I think you’re right. But the restrictions a language imposes have nothing to do with how little it is. Notice that Jon Bentley calls PIC a “big little language” in OP. Lex and yacc were tiny compared to their current size, and yet Jon Bentley’s description of them in OP is pretty complex.

              I’m skeptical that there’s ever such a thing as a “little language”. Things like config file parsers are little, maybe, but certainly by the time it starts looking like a language (as opposed to a file format) it’s well on its way to being not-little.

              Even if languages can be little, it seems clear that they’re inevitably doomed to grow larger. Lex and Yacc and certainly Make have not stood still all these years.

              So the title seems a misnomer. Size has nothing to do with it. Rust is not small, and yet it’s interesting precisely because of the new restrictions it imposes.

            3. 3

              I use LPeg. It’s a Lua module that implements Parsing Expression Grammars and in a way, it’s a domain specific language for parsing text. I know my coworkers don’t fully understand it [1] but I find parsing text via LPeg to be much easier than in plain Lua. Converting a name into its Soundex value is (in my opinion) trivial in LPeg. LPeg even comes with a sub-module to allow one to write BNF (here’s a JSON parser using that module). I find that easier to follow than just about any codebase you could present.

              So, where does LPeg fall? Is it another language? Or just an extension to Lua?

              I don’t think there’s an easy answer.

              [1] Then again, they have a hard time with Lua in general, which is weird, because they don’t mind Python, and if anything, Lua is simpler than Python. [2]

              [2] Most programmers I’ve encountered have a difficult time working with more than one or two languages, and it takes them a concerted effort to “switch” to a different language. I don’t have that issue—I can switch among languages quite easily. I wonder if this has something to do with your thoughts on little languages.

              1. 2

                I think you are talking about languages that are not little, with large attack surfaces. If a language has a lengthy man page, we are no longer speaking about the same thing.

                Small configuration DSLs (TOML, etc), text search DSLs (regex, jq, etc), etc are all marvelous examples of small languages.

                1. 1

                  My response to vrthra addresses this. Jon Bentley’s examples aren’t all that little either.[1] And they have grown since, like all languages do.

                  When you add a new language to your project you aren’t just decorating your living room with some acorns. You’re planting them. Prepare to see them grow.

                  [1] In addition to the quote about “big little language”, notice the “fragment of the Lex description of PIC” at the start of page 718.

                  1. 1

                    What, so don’t create programming languages because they will inevitably grow? What makes languages different from any other interface? In my experience, interfaces also tend to grow unless carefully maintained.

                    1. 2

                      No, that’s not what I mean. Absolutely create programming languages. I’d be the last to stop you. But also delete programming languages. Don’t just lazily add to the pile of shit same as everybody else.

                      And yes, languages are exactly the same as any other interface. Both tend to grow unless carefully maintained. So maintain, dammit!

            4. 2

              Is a page lost after 713? The continuation after “These routines are rather primitive; more clever …” is not found on page 714.