Threads for edwintorok

  1. 2

    There are regex libraries that avoid some of these regex DoS issues. Of course this comes with limitations: no support for backreferences in the regex itself (\1, etc.), but for the vast majority of applications that would be an acceptable limitation. (I’m not sure whether matching with backreferences is even doable in linear time in the general case, but it is certainly not trivial, and I haven’t seen any major regex library implement it efficiently.)

    Fixing this “vulnerability” in each application doesn’t seem right; a better way to fix it is at the foundations level: almost all libcs (except OpenBSD’s) would be vulnerable to this, but a better regex engine, e.g. TRE or Google’s RE2, would avoid the problem.

    (Or their python bindings: https://pypi.org/project/tre/ and https://pypi.org/project/re2/)

    There is a good series of articles describing regex matching, and how to get actual linear time matching (although some care needs to be taken during the compilation of the regex itself to avoid exponential blowup, a combination of a NFA/DFA as in Google’s re2 achieves a good performance balance while still retaining linear time matching): https://swtch.com/~rsc/regexp/
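
    To make the “swap the engine, keep the application” idea concrete, here is a minimal sketch using the standard POSIX regcomp()/regexec() interface. TRE, for example, keeps this same interface while aiming for predictable matching time that is linear in the input, so code like this can be relinked against it without changes; the pattern and input below are only illustrative.

        /* Minimal sketch: matching with the POSIX regex API. The same code can be
         * linked against the system libc regex or against an engine with
         * predictable worst-case matching time (e.g. TRE, which provides this
         * same interface). Pattern and input are illustrative only. */
        #include <regex.h>
        #include <stdio.h>

        int main(void)
        {
            regex_t re;
            /* A pattern like "(a+)+$" triggers catastrophic backtracking in some
             * engines; an automaton-based engine matches it in linear time. */
            if (regcomp(&re, "(a+)+$", REG_EXTENDED) != 0)
                return 1;

            const char *input = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaab";
            int rc = regexec(&re, input, 0, NULL, 0);
            printf("%s\n", rc == 0 ? "match" : "no match");

            regfree(&re);
            return 0;
        }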

    There is nothing “new” about this vulnerability; I switched ClamAV to use OpenBSD’s implementation of regex back in 2007. Not necessarily due to DoS, but there were some platforms where the system-provided regexec was orders of magnitude slower than OpenBSD’s (quadratic or worse runtime even on non-malicious regexes), so using one with known runtime and performance characteristics was a good replacement.

    1. 2

      I never really understood how you’re supposed to sanely make use of these CPU features in general and specifically from C - I mean, if you assume everyone’s a Gentoo user, sure, these flags will be optimally tuned specifically to the user’s machine. But normally you’d want to distribute general-purpose binaries and/or have a build process that’s as portable as possible and makes as few assumptions as possible.

      1. 2

        The attribute was meant to choose the optimal codepath at runtime based on the end user’s machine (not the build machine), by compiling the same function with multiple optimization flags (in this case selecting different instruction sets). Regular users who don’t run Gentoo can then benefit from the newer instructions their CPUs provide, without producing a binary that requires that particular CPU as a minimum.

        However, as the article points out, it is not so simple, because that only works reliably if you use gcc+glibc. Other toolchains (clang, musl libc) have various issues.
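
        For reference, here is a hedged sketch of what such runtime dispatch looks like, assuming the attribute under discussion is GCC’s target_clones (function multi-versioning): the compiler emits one clone per listed target, and a resolver picks the best one on the end user’s CPU at load time, which is the gcc+glibc mechanism mentioned above.

            /* Sketch of GCC function multi-versioning via target_clones: one clone is
             * emitted per listed target, and a resolver picks the best one at load
             * time on the end user's CPU. Relies on gcc+glibc (ifunc) support; other
             * toolchains may not handle this reliably, as noted above. */
            #include <stddef.h>

            __attribute__((target_clones("avx2", "sse4.2", "default")))
            void add_arrays(float *dst, const float *a, const float *b, size_t n)
            {
                for (size_t i = 0; i < n; i++)
                    dst[i] = a[i] + b[i];
            }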

      1. 3

        /tmp/ape

        I hope this isn’t executing some random file out of /tmp if it happens to exist.

        1. 2

          Author here. Here’s the context you didn’t quote. APE first tries to use the ape program off your $PATH. If it isn’t there, then it tries to extract the embedded one to $TMPDIR/ape since it’s defined by POSIX and used by systems like MacOS to create secure user-specific directories. If $TMPDIR isn’t defined, then finally, as a last resort, it extracts to /tmp/ape. This creates a potential race condition where multiple users try to run an APE binary at the same time. So their sysadmin should install the ape program systemwide to fix that.

          1. 1

            I don’t see how this addresses the problem though. Are you saying that it will never use an existing /tmp/ape? If that is the case, why use a predictable name instead of something random that doesn’t have the race condition? If it will reuse an existing /tmp/ape, then a malicious user can create a malicious /tmp/ape and another user will try to run it.

            1. 2

              That’s a problem with Unix distros rather than APE. The POSIX standard recommendation is to use $TMPDIR. It’s not APE’s responsibility to secure your computer. There’s very little we can do if your distro doesn’t follow standards. We have however always been clear and transparent in documentation. The APE source code has paragraphs talking about this topic. The concern has been raised multiple times on GitHub. If you think there’s something more we could be doing that’s actionable, let me know and I’ll take it into consideration.

              1. 1

                What standard are you talking about? I don’t recall any standard that suggests it is safe to execute files out of $TMPDIR.

                I would also be interested if you have links to previous discussions or documentation.

                1. 2

                  The Open Group Base Specifications Issue 7, 2018 edition, IEEE Std 1003.1-2017 (Revision of IEEE Std 1003.1-2008), talks at length in multiple places about TMPDIR and how it addresses concerns about race conditions between multiple users. Concerns about whether or not it’s safe to run programs you’ve downloaded off the internet are a non-technical issue that’s orthogonal to this discussion.

                  1. 2

                    Just naming a spec isn’t a strong argument. But since apparently I’m bored, I decided to grep the version of that spec I could find, and I only see two notable references to TMPDIR. I don’t know if I have a partial or wrong version.

                    Neither of these seem to imply that TMPDIR should be per-user or otherwise safe to execute files from. In fact the second one explicitly says that “Implementations shall ensure that temporary files, when used by the standard utilities, are named so that different utilities or multiple instances of the same utility can operate simultaneously without regard to their working directories, or any other process characteristic other than process ID.”, which doesn’t appear to be the case with a static name, although this is talking more about standard utilities than about the properties of the directory itself.

                    But even if there were something in the standard, I don’t think it is very useful if no one follows it, and AFAICT all major distros have a common TMPDIR that is world-writable but sticky. So if you execute a file out of it that you can’t prove your own user wrote, you are going to have a bad time.

                    1. 1

                      Neither of these seem to imply that TMPDIR should be per-user

                      Could you explain to me how a multi-user directory defined as an environment variable would work?

                      1. 2

                        Forgive me for misunderstanding the conversation, but my assumption in reading the thread is that the question is actually:

                        why aren’t you using mktemp(1) or similar?

                        1. 1

                          Because ${TMPDIR:-/tmp}/ape.$$ effectively does the same thing, except with shell builtins. Fast performance is the most important thing. It then atomically renames the file so there are no race conditions and you only need to store one copy. Not wasting disk space with a zillion $TMPDIR/ape.9QaW9BO1NB files is very important. It’s not possible to program an executable to delete itself when it’s done running. It’s also not acceptable to have executables delete themselves while they’re running. Maximum transparency is a goal of the project.
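
                          A rough sketch of the write-then-atomically-rename pattern described above, in C rather than shell builtins; the function name and error handling are illustrative, not the actual APE loader code:

                              /* Write the payload to a unique, user-owned temp file, then rename() it
                               * into place. rename() atomically replaces any existing file, so other
                               * processes never see a half-written loader. Illustrative only. */
                              #include <stdio.h>
                              #include <stdlib.h>
                              #include <unistd.h>
                              #include <sys/stat.h>

                              int install_loader(const char *final_path, const void *data, size_t len)
                              {
                                  char tmp[4096];
                                  snprintf(tmp, sizeof tmp, "%s.XXXXXX", final_path);

                                  int fd = mkstemp(tmp);   /* unique name, created with mode 0600 */
                                  if (fd < 0)
                                      return -1;

                                  int ok = write(fd, data, len) == (ssize_t)len &&
                                           fchmod(fd, 0755) == 0;   /* make it executable */
                                  if (close(fd) != 0)
                                      ok = 0;

                                  if (!ok || rename(tmp, final_path) != 0) {   /* atomic replace */
                                      unlink(tmp);
                                      return -1;
                                  }
                                  return 0;
                              }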

                          1. 3

                            $TMPDIR is not guaranteed to be user- or process-specific, and may be /tmp, so the concern is the classic “time of check to time of use” (TOCTOU) vulnerability. I think that’s a valid concern here.

                            The major problem is that I may not be aware that a program I am using is executed via ape. This means that I might not know to install /usr/bin/ape beforehand leaving me open to exploitation.

                            Also, I’m surprised that you wouldn’t be able to register an atexit handler (sketched after this list) that:

                            • forks
                            • setsid
                            • rm argv[0]
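
                            For what it’s worth, here is a rough sketch of that suggestion (a hypothetical cleanup hook, not anything APE actually ships); on most Unixes unlinking a running binary is allowed, the inode just lives until the last reference goes away:

                                /* Hypothetical atexit() hook: fork a detached child that removes the
                                 * extracted executable after the parent exits. Illustrative only. */
                                #include <stdio.h>
                                #include <stdlib.h>
                                #include <unistd.h>

                                static char self_path[4096];

                                static void cleanup(void)
                                {
                                    if (fork() == 0) {      /* child outlives the exiting parent */
                                        setsid();           /* detach from the session/terminal */
                                        sleep(1);           /* crude: give the parent time to exit */
                                        unlink(self_path);  /* remove e.g. /tmp/ape */
                                        _exit(0);
                                    }
                                }

                                int main(int argc, char **argv)
                                {
                                    (void)argc;
                                    snprintf(self_path, sizeof self_path, "%s", argv[0]);
                                    atexit(cleanup);
                                    /* ... normal work ... */
                                    return 0;
                                }
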
                            1. 2

                              We can address your concern by making sure the APE loader is included in Unix distros by default. So far operating systems have had a positive reaction to this new format and have gone to great lengths to help us out. For example, the last two years we spent upstreaming changes to zsh, fish, FreeBSD, and NetBSD are what helped us get this far. Back when I started doing this, there was even a meeting with the POSIX committee (Austin Group) about whether or not the rules against executing shell scripts containing binary code should be lifted, in order to permit formats like this. The answer was yes. So we moved forward. This whole topic of “what if /tmp is compromised by an adversary” never really came up. But then again, back then, the favored approach was to self-modify the binary in place. That approach should address your concerns too. However it made other communities like the NixOS crowd unhappy. So what we’re doing is providing a variety of APE utilities and strategies so you can choose the one that works best for you!

                        2. 1

                          I don’t understand the question.

                          What I meant is that if all users have the same directory in TMPDIR then you have to worry about untrusted files in that directory. If each user had a unique TMPDIR that only they could write to, then they could trust its contents (to some degree).

            2. 1

              /tmp can also be mounted as noexec in some cases, in which case this may not work. Finding someplace where you are guaranteed to be able to write on every system can be quite difficult (/dev/shm looks like a candidate, although I really don’t see why you’d ever want to execute things from that one), but some XDG env vars might help, e.g. XDG_RUNTIME_DIR if set.

              1. 3

                IIUC the only path that is actually “guaranteed” to be executable is $HOME/.local/bin, as the spec says “… executable files may be written”. It doesn’t say anything specific about exec permissions elsewhere.

                But $XDG_RUNTIME_DIR does say “The directory MUST be owned by the user, and he MUST be the only one having read and write access to it” which should mitigate the security problems here.
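
                A tiny sketch of that fallback order in C; the preference for XDG_RUNTIME_DIR over a shared ${TMPDIR:-/tmp} is this comment’s suggestion, not what any particular loader actually does:

                    /* Pick a directory for extracting an executable: prefer the per-user,
                     * mode-0700 XDG_RUNTIME_DIR; fall back to TMPDIR, then /tmp. */
                    #include <stdlib.h>

                    const char *exec_tmpdir(void)
                    {
                        const char *dir = getenv("XDG_RUNTIME_DIR");  /* user-owned */
                        if (dir && *dir)
                            return dir;
                        dir = getenv("TMPDIR");                       /* may still be shared */
                        if (dir && *dir)
                            return dir;
                        return "/tmp";                                /* world-writable, sticky */
                    }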

                1. 1

                  Would be nice if ape didn’t have to rely on a shell script to start itself, but could just be executed directly (I think after you’ve started it at least once it may rewrite itself so it doesn’t have to, but then you lose the “cross OS” portability part).

                  1. 1

                    Author here. You can make your APE binaries directly executable by “assimilating” them. One of the features of the shell script is a ./foo.com --assimilate flag. It causes your binary to be modified in-place to be ELF or Mach-O depending on the system.

                2. 1

                  /tmp can also be mounted as noexec in some cases

                  If a system administrator has made that choice, then the APE format is designed to respect their decision by not working around it. What APE binaries do instead is print an error. That way the user can make a decision about whether or not he/she wants to install the ape loader systemwide.

              1. 11

                So this is not a dig at them, but what are they trying to achieve? Mozilla overall has recently had issues with both user retention and funding. I’m not sure I understand why they’re pushing for an entirely new thing (and I assume acquiring K-9 cost them some money) rather than improving the core product situation?

                Guesses: a) those projects are so separate in funding that it’s not an issue at all, or b) they’re thinking of an enterprise client with a paid version?

                1. 9

                  These things are indeed separate in funding. Thunderbird is under a whole different entity than, say, Firefox.

                  1. 2

                    Aren’t they both funded by the Mozilla Foundation? How are they separate?

                      1. 7

                        @caleb wrote:

                        Aren’t they both funded by the Mozilla Foundation? How are they separate?

                        Your link’s first sentence:

                        As of today, the Thunderbird project will be operating from a new wholly owned subsidiary of the Mozilla Foundation […]

                        I’m confused…

                        1. 2

                          Seems pretty clear from the usage of the word “subsidiary”:

                          Subsidiaries are separate, distinct legal entities for the purposes of taxation, regulation and liability. For this reason, they differ from divisions, which are businesses fully integrated within the main company, and not legally or otherwise distinct from it.[8] In other words, a subsidiary can sue and be sued separately from its parent and its obligations will not normally be the obligations of its parent.

                          The parent and the subsidiary do not necessarily have to operate in the same locations or operate the same businesses. Not only is it possible that they could conceivably be competitors in the marketplace, but such arrangements happen frequently at the end of a hostile takeover or voluntary merger. Also, because a parent company and a subsidiary are separate entities, it is entirely possible for one of them to be involved in legal proceedings, bankruptcy, tax delinquency, indictment or under investigation while the other is not.

                  2. 5

                    They’re going to need to work on a lot of things, including a lot of stability improvements as well as better/more standard support for policies and autoconfig/SSO, for Thunderbird to really be useful in the enterprise.

                    Frankly, Thunderbird is the only real desktop app that I know of that competes with Outlook, and it’s kind of terrible… there really is a market here, and I don’t think that working on an Android client is what they need.

                    1. 2

                      GNOME Evolution works better than Thunderbird in an enterprise. For Thunderbird, IIUC, you need a paid add-on to be able to connect to Office365 Outlook mailboxes (in the past there used to be an EWS plugin that worked with on-prem Outlook, but it doesn’t seem to work with O365), whereas Evolution supports OAuth out of the box.

                      1. 4

                        Thunderbird supports IMAP/SMTP OAuth2 out of the box, which O365 has if your org has it enabled. What it lacks (and what Evolution has as an advantage) is Exchange support.

                        If your org has IMAP/SMTP/activesync enabled then you can even do calendaring and global address completion using TbSync, which I rely heavily on for CalDAV / CardDAV support anyway (though I hear Thunderbird is looking to make these two an OOB experience as well)

                    2. 3

                      I can’t say for certain, but I think maybe they’re looking to provide a similar desktop experience on mobile. I use Firefox and Thunderbird for work, and it is a curious thing to note that Thunderbird did not get any kind of Android version. Firefox already released base and Focus as Android applications, so it would be cool to see a Thunderbird exist in the (F)OSS Android ecosystem.

                      I have been a K-9 user for a number of years, but I do think its UI could use a bit of an update. I have been using it since Android 5.0 and it has basically had the same interface since the initial Material release. This could be an exciting time for K-9 to get a new coat of paint. I will love K-9 Mail even if this doesn’t pan out well.

                      1. 5

                        K-9 mail is almost perfect the way it currently is on Android (at least when it comes to connecting to personal mailboxes). I can’t speak about how well it’d work in an enterprise because I keep work stuff off my phone on purpose.

                        1. 4

                          The biggest functional shortcoming with K-9 is no support for OAuth2 logins, such as GMail and Office365. You can currently use K-9 Mail with an app-specific password in GMail, but Google will be taking that ability away soon. I also have some minor issues with notifications; my home IMAP server supports IDLE, but I still often see notifications being significantly delayed.

                          In terms of interface, there was a Material cleanup a while ago, and the settings got less complicated and cluttered, so it’s very usable and reasonably presentable. But it does look increasingly out of date (though that’s admittedly both subjective and an endless treadmill).

                          1. 3

                            OAuth2 was merged a few days ago: https://github.com/thundernest/k-9/pull/6082

                            1. 1

                              Ah, yeah, I saw elsewhere that it’s the only priority for the next release.

                    1. 2

                      From the title I would’ve assumed this relies on some Obj magic, nice to see it doesn’t and just shows how to use pretty printers written using Fmt in the top-level.

                      1. 1

                        Building blocks are fine when you’ve got time to build. But nowadays not many do. We have stuff to do; we don’t have time to figure out configurations, unfortunately.

                        If I were to use Emacs, I would just use doom-emacs and call it a day.

                        I have my own Neovim config, however, because all the configs I could find were full of stuff I didn’t need.

                        1. 2

                          If you liked doom-emacs, you might like doom-nvim. Its philosophy is that you can use only what you need from it. I also found kickstart.nvim and neovim-from-scratch useful for writing my own Neovim config (well, for “modernising” my previous Vim8 config).

                          1. 1

                            I should give doom-nvim another look then. Didn’t know about kickstart nor neovim-from-scratch, thanks!

                        1. 1

                          It should be possible to use Search Console to figure out where the problem is: https://developers.google.com/web/fundamentals/security/hacked/hacked_with_malware

                          Or try some specific URLs here: https://transparencyreport.google.com/safe-browsing/search?url=gw90.de

                          Using the open-source ClamAV anti-virus to scan all the pages you host on the domain might also give some clues.

                          1. 5

                            What’s the point of using DocBook over HTML5? Every one of the elements described maps 1:1 to an HTML5 element. Is it because there’s a larger set of tools to take DocBook and typeset it for printing? Or is the verbosity of DocBook a usability advantage (<para> vs. <p>)?

                            1. 3

                              Great question!

                              DocBook has lots of tools for rendering to much more than just HTML: literal books, PDFs, manpages, knowledge bases, even special formats for some editors to find and interpret and provide contextual help on a project.

                              DocBook has special tags to help represent EBNF and functions and types and GUIs and error messages and function arguments and command line program options and variable names and and and and.

                              Yes, every element described maps 1:1, but there are hundreds more that this post doesn’t describe and that are useful for large documentation projects.

                              Edited to say: much of this is not strictly necessary for getting started and writing some docs, which is the goal of this document. It is so easy to look at the massive number of tags and try to pick the most perfect semantics for your writing, when the truth is that having any docs contributed at all is much more important. Leave the precise and fiddly semantics to the maintainers and PR reviewers. Let’s just write some docs.

                              1. 2

                                I’ve been considering building a system to auto-generate documentation from the source code and comments of different languages into a common format with no styling information. Would DocBook be a good format-of-record for my project?

                                I’d like to use the preferred tool for each language to extract comments-based docs, and then a single new tool to combine source-derived documentation with human-authored guides into a final presentation format:

                                • Ruby -> YARD -> DocBook
                                • Java -> Javadoc -> DocBook
                                • Lang -> Native tool for Lang -> DocBook
                                • Markdown -> DocBook

                                Then at the end, I can take all the DocBook and unify the presentation style:

                                • DocBook -> HTML
                                1. 1

                                  You could try integrating pandoc, it has a Lua or Haskell API, and an internal intermediate representation: https://pandoc.org/index.html

                                2. 1

                                  Follow up questions: what toolchain do you use?

                                  And are there styles for rendering docbook that feel satisfactory for people used to latex typesetting?

                              1. 4

                                @akkartik what are your thoughts on having many little languages floating around?

                                1. 11

                                  I see right through your little ploy to get me to say publicly what I’ve been arguing privately to you :) Ok, I’ll lay it out.

                                  Thanks for showing me this paper! I’d somehow never encountered it before. It’s a very clear exposition of a certain worldview and way of organizing systems. Arguably this worldview is as core to Unix as “do one thing and do it well”. But I feel this approach of constantly creating small languages at the drop of a hat has not aged well:

                                  • Things have gotten totally insane when it comes to the number of languages projects end up using. A line of Awk here, a line of Sed there, makefiles, config files, m4 files, Perl, the list goes on and on. A newcomer potentially may want to poke at any of these, and now (s)he may have to sit with a lengthy manpage for a single line of code. (Hello man perl with your 80+ parts.) I’m trying to find this egregious example in my notes, but I noticed a year or two ago that some core Ruby project has a build dependency on Python. Or vice versa? Something like that. The “sprawl” in the number of languages on a modern computer has gotten completely nuts.

                                  • I think vulnerabilities like Shellshock are catalyzing a growing awareness that every language you depend on is a potential security risk. A regular tool is fairly straightforward: you just have to make sure it doesn’t segfault, doesn’t clobber memory out of bounds, doesn’t email too many people, etc. Non-trivial but relatively narrow potential for harm. Introduce a new language, though, and suddenly it’s like you’ve added a wormhole into a whole new universe. You have to guard against problems with every possible combination of language features. That requires knowing about every possible language feature. So of course we don’t bother. We just throw up our hands and hope nothing bad happens. Which makes sense. I mean, do you want to learn about every bone-headed thing somebody threw into GNU make?!

                                  Languages for drawing pictures or filling out forms are totally fine. But that’s a narrower idea: “little languages to improve the lives of non-programmers”. When it comes to “little languages for programmers” the inmates are running the asylum.

                                  We’ve somehow decided that building a new language for programmers is something noble. Maybe quixotic, but high art. I think that’s exactly wrong. It’s low-brow. Building a language on top of a platform is the easy expedient way out, a way to avoid learning about what already exists on your platform. If existing languages on your platform make something hard, hack the existing languages to support it. That is the principled approach.

                                  1. 4

                                    I think the value of little languages comes not from what they let you do, but rather what they won’t let you do. That is, have they stayed little? Your examples such as Perl, Make, etc. are languages that did not stay little, and hence are no longer as helpful (because one has to look at 80+ pages to understand the supposedly little language). I would argue that those that have stayed little are still very much useful and do not contribute to the problem you mentioned (e.g. grep, sed, troff, dc – although even these have been affected by feature creep in the GNU world).

                                    Languages for drawing pictures or filling out forms are totally fine. But that’s a narrower idea: “little languages to improve the lives of non-programmers”. When it comes to “little languages for programmers” the inmates are running the asylum.

                                    This I agree with. The little languages have little to do with non-programmers; as far as I am concerned, their utility is in the discipline they impose.

                                    1. 3

                                      On HN a counterpoint paper was posted. It argues that using embedded domain specific languages is more powerful, because you can then compose them as needed, or use the full power of the host language if appropriate.

                                      Both are valid approaches, however I think that if we subdivide the Little Languages the distinction becomes clearer:

                                      • languages for describing something (e.g. regular expression, format strings, graph .dot format, LaTeX math equations, etc.) that are usable both from standalone UNIX tools, and from inside programming languages
                                      • languages with a dedicated tool (awk, etc.) that are not widely available embedded inside other programming languages. Usually these languages allow you to perform some actions / transformations

                                      The former is accepted as “good” by both papers; in fact, the re-implementation of awk in Scheme from the 2nd paper uses regular expressions.

                                      The latter is limited in expressiveness once you start using them for more than just ad-hoc transformations. However, they do have an important property that contributes to their usefulness: you can easily combine them, via pipes, with programs written in any other language, albeit only as streams of raw data, not in a type-safe way.

                                      With the little language embedded inside a host language you get more powerful composition, however if the host language doesn’t match that of the rest of your project, then using it is more difficult.

                                      1. 3

                                        First, a bit of critique on Olin Shivers’ paper!

                                        • He attacks the little languages as ugly, idiosyncratic, and limited in expressiveness. While the first two are subjective, I think he misses the point when he says they are limited in expressiveness. That is sort of the point.
                                        • Second, he criticizes that a programmer has to implement an entire language including loops, conditionals, variables, and subroutines, and that these can lead to suboptimal design. Here again, in a little language, each of these structures such as variables, conditionals, and loops should not be included unless there is a very strong argument for its inclusion. The rest of the section (3) is more an attack on incorrectly designed little languages than on the concept of little languages per se. The same attacks can be leveled against his preferred approach of embedding a language inside a more expressive language.

                                        For me, the whole point of little languages has been the discipline they impose. They let me remove considerations of other aspects of the program, and focus on a small layer or stage at a time. It helps me compose many little stages to achieve the result I want in a very maintainable way. On the other hand, while embedding, as Shivers observes, the host language is always at hand, and the temptation for a bit of optimization is always present. Further, the host language does not always allow the precise construction one wants to use, and there is an impedance mismatch between the domain lingo and what the host language allows (as you also have observed). For example, see the section 5.1 on the quoted paper by Shivers.

                                        My experience has been that programs written in the fashion prescribed by Shivers often end up much less readable than those written as pipelines of little-language stages.

                                        1. 1

                                          That’s tantalizing. Do you have any examples of a large task built out of little stages, each written in its own language?

                                          1. 2

                                            My previous reply was a bit sparse. Since I have a deadline coming up, and this is the perfect time to write detailed posts in the internet, here goes :)

                                            In an earlier incarnation, I was an engineer at Sun Microsystems (before the Oracle takeover). I worked on the iPlanet[1] line of web and proxy servers, and among other things, I implemented the command line administration environment for these servers[2], called wadm. This was a customized TCL environment based on Jacl. We chose Jacl as the base after careful study, which looked both at where it was going to be used most (as an interactive shell environment) and at its ease of extension. I prefer to think of wadm as its own little language above TCL because it had a small set of rules beyond TCL, such as the ability to infer the right options based on the current environment, that made life a bit simpler for administrators.

                                            At Sun, we had a very strong culture of testing, with a dedicated QA team that we worked closely with. Their expertise was the domain of web and proxy servers rather than programming. For testing wadm, I worked with the QA engineers to capture their knowledge as test cases (and to convert existing ad-hoc tests).

                                            When I looked at existing shell scripts, it struck me that most of the testing was simply invoking a command line and verifying the output. Written out as a shell script, these may look ugly to a programmer because the scripts are often flat, with few loops or other abstractions. However, I have since come to regard them as a better style for the domain they are in. Unlike in general programming, for testing one needs to make the tests as simple as possible, and loops and subroutines often make simple stuff more complicated than it is. Further, tests once written are almost never reused (as in, as part of a larger test case), but only rerun. What we needed was a simple way to verify the output of commands based on some patterns, the return codes, and simple behavior such as the response to specific requests and the contents of a few administration files.

                                            So, we created a testing tool called cat (command line automation tool) that essentially provided a simple way to run a command line and verify its result. This was very similar to expect[3]. It looked like this:

                                            wadm> list-webapps --user=admin --port=[ADMIN_PORT] --password-file=admin.passwd --no-ssl
                                            /web-admin/
                                            /localhost/
                                            =0
                                            
                                            wadm> add-webapp --user=admin --port=[ADMIN_PORT] --password-file=admin.passwd --config=[HOSTNAME] --vs=[VIRTUAL_SERVER] --uri=[URI_PATH]
                                            =0 
                                            

                                            The =0 means the return code should be 0, i.e. success. For matching, // represented a regular expression, “” represented a string, [] represented a shell glob, etc. Ordering was not important, and all matches had to succeed. The names in square brackets were variables that were passed in from the command line. If you look at our man pages, this is very similar to the format we used in the man pages and other docs.

                                            Wadm had two modes besides the REPL: stand-alone, and as a script. For the script mode, the file containing wadm commands was simply interpreted as a TCL script by the wadm interpreter when passed as a file input to the wadm command. For stand-alone mode, wadm accepted a subcommand of the form wadm list-webapps --user=admin ... etc., which can be executed directly in the shell. The return codes (=0) are present only in stand-alone mode, and do not exist in TCL mode, where exceptions were used. With the test cases written in cat we could make it spit out either a TCL script containing the wadm commands, or a shell script containing stand-alone commands (it could also directly interpret the language, which was its most common mode of operation).

                                            The advantage of doing it this way was that it provided the QA engineers, who had the domain knowledge, with an easy environment to work in. The cat scripts were simple to read and maintain. They were static, and eschewed complexities such as loops, changing variable values, etc., and could handle what I assumed to be 80% of the testing scenarios. For 80% of the remaining 20%, we provided simple loops and loop variables as a pre-processor step. If the features of cat were insufficient, engineers were welcome to write their test cases in any of Perl, TCL, or shell (I did not see any such scripts during my time there). The scripts spat out by cat were easy to check and were often used as recipes for accomplishing particular tasks by other engineers. All this was designed and implemented in consultation with the QA engineers, with their active input on what was important and what was confusing.

                                            I would say that we had these stages in the end:

                                            1. The preprocessor that provides loops and loop variables.
                                            2. cat that provided command invocation and verification.
                                            3. wadm that provided a custom TCL+ environment.
                                            4. wadm used the JMX framework to call into the webserver admin instance. The admin instance also exposed a web interface for administration.

                                            We could instead have done the entire testing of the web server by just implementing the whole thing in Java. While that may have been possible, I believe that splitting it into stages, each with its own little language, was better. Further, I think that keeping the little language cat simple (without subroutines, scopes, etc.) helped keep the scripts simple and understandable, with little cognitive overhead for its intended users.

                                            Of course, each stage had an existence of its own, and had independent consumers. But I would say that the consumers at each stage could have chosen to use any of the more expressive languages above them, and chose not to.

                                            1: At the time I worked there, it was called the Sun Java System product line.

                                            2: There existed a few command lines for the previous versions, but we unified and regularized the command line.

                                            3: We could not use expect as Jacl at that time did not support it.

                                            1. 1

                                              Surely, this counts as a timeless example?

                                              1. 1

                                                I thought you were describing decomposing a problem into different stages, and then creating a separate little DSL for each stage. Bentley’s response to Knuth is just describing regular Unix pipes. Pipes are great, I use them all the time. But I thought you were describing something more :)

                                                1. 1

                                                  Ah! From your previous post

                                                  A line of Awk here, a line of Sed there, makefiles, config files, m4 files, Perl, the list goes on and on … If existing languages on your platform make something hard, hack the existing languages to support it. That is the principled approach.

                                                  I assumed that you were against that approach. Perhaps I misunderstood. (Indeed, as I re-read it, I see that I have misunderstood… my apologies.)

                                                  1. 1

                                                    Oh, Unix pipes are awesome. Particularly at the commandline. I’m just wondering (thinking aloud) if they’re the start of a slippery slope.

                                                    I found OP compelling in the first half when it talks about PIC and the form language. But I thought it went the wrong way when it conflated those phenomena with lex/yacc/make in the second half. Seems worth adding a little more structure to the taxonomy. There are little languages and little languages.

                                                    Languages are always interesting to think about. So even as I consciously try to loosen their grip on my imagination, I can’t help but continue to seek a more steelman defense for them.

                                        2. 2

                                          Hmm, I think you’re right. But the restrictions a language imposes have nothing to do with how little it is. Notice that Jon Bentley calls PIC a “big little language” in OP. Lex and yacc were tiny compared to their current size, and yet Jon Bentley’s description of them in OP is pretty complex.

                                          I’m skeptical that there’s ever such a thing as a “little language”. Things like config file parsers are little, maybe, but certainly by the time it starts looking like a language (as opposed to a file format) it’s well on its way to being not-little.

                                          Even if languages can be little, it seems clear that they’re inevitably doomed to grow larger. Lex and Yacc and certainly Make have not stood still all these years.

                                          So the title seems a misnomer. Size has nothing to do with it. Rust is not small, and yet it’s interesting precisely because of the new restrictions it imposes.

                                        3. 3

                                          I use LPeg. It’s a Lua module that implements Parsing Expression Grammars and in a way, it’s a domain specific language for parsing text. I know my coworkers don’t fully understand it [1] but I find parsing text via LPeg to be much easier than in plain Lua. Converting a name into its Soundex value is (in my opinion) trivial in LPeg. LPeg even comes with a sub-module to allow one to write BNF (here’s a JSON parser using that module). I find that easier to follow than just about any codebase you could present.

                                          So, where does LPeg fall? Is it another language? Or just an extension to Lua?

                                          I don’t think there’s an easy answer.

                                          [1] Then again, they have a hard time with Lua in general, which is weird, because they don’t mind Python, and if anything, Lua is simpler than Python. [2]

                                          [2] Most programmers I’ve encountered have a difficult time working with more than one or two languages, and it takes them a concerted effort to “switch” to a different language. I don’t have that issue—I can switch among languages quite easily. I wonder if this has something to do with your thoughts on little languages.

                                          1. 2

                                            I think you are talking about languages that are not little, with large attack surfaces. If a language has a lengthy man page, we are no longer speaking about the same thing.

                                            Small configuration DSLs (TOML, etc.) and text-search DSLs (regex, jq, etc.) are all marvelous examples of small languages.

                                            1. 1

                                              My response to vrthra addresses this. Jon Bentley’s examples aren’t all that little either.[1] And they have grown since, like all languages do.

                                              When you add a new language to your project you aren’t just decorating your living room with some acorns. You’re planting them. Prepare to see them grow.

                                              [1] In addition to the quote about “big little language”, notice the “fragment of the Lex description of PIC” at the start of page 718.

                                              1. 1

                                                What, so don’t create programming languages because they will inevitably grow? What makes languages different from any other interface? In my experience, interfaces also tend to grow unless carefully maintained.

                                                1. 2

                                                  No, that’s not what I mean. Absolutely create programming languages. I’d be the last to stop you. But also delete programming languages. Don’t just lazily add to the pile of shit, same as everybody else.

                                                  And yes, languages are exactly the same as any other interface. Both tend to grow unless carefully maintained. So maintain, dammit!

                                        1. 7

                                          An alternative to the pseudo-tty-pipe program would be stdbuf -oL; it uses LD_PRELOAD instead of creating a new pty, and then calls setvbuf to change the buffering mode of the standard streams. Not a very elegant way, but it works.
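
                                          A minimal sketch of how such an LD_PRELOAD shim can work (the real libstdbuf.so that stdbuf preloads also reads environment variables to pick the mode; this one simply hard-codes line buffering for stdout, and the file name is illustrative):

                                              /* Build as a shared object and run with LD_PRELOAD pointing at it,
                                               * e.g. LD_PRELOAD=./linebuf.so some-program | tee log. The constructor
                                               * runs before main() and forces line buffering on stdout. */
                                              #include <stdio.h>

                                              __attribute__((constructor))
                                              static void force_line_buffering(void)
                                              {
                                                  setvbuf(stdout, NULL, _IOLBF, 0);  /* _IOLBF = line buffered */
                                              }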

                                          1. 1

                                            Perhaps the maintainers of GNU utilities should follow the security vulnerabilities fixed by the BSDs, and treat those as if they are security vulnerabilities in the GNU implementation too unless proven otherwise.

                                            1. 1

                                              The argument seems vaguely interesting, but it is basically a popularization of an academic article that is linked to but turns out to be behind a paywall, which makes the whole exercise rather useless.