1. 56
    1. 11

      It’s secretly Forth (Joy/Retro Forth, oooh), but for Unix pipes! It’s such a shame there’s no room in the world for new command-line tools/shell builtins designed for shell scripting unless POSIX adopts them, because you’re more likely to plonk a full programming language into your enterprise environment or stick to POSIX plus what’s already available… If you’re installing a new tool to do shell-y things, you’d tend to want that tool to be Python/Ruby/Raku, to cover all bases. But this tool is really sweet; the only thing it’s missing is cryptic symbolic imagery like & for readf.

      1. 19

        Debian, Red Hat, Linux, and GNU have a lot more influence over what’s common or idiomatic than POSIX. (i.e. look at what’s in BSD ports)

        POSIX is not well maintained, and hasn’t been for a long time. It was funded during the Unix wars by Unix vendors.

        Those vendors no longer support or care about Unix. They have all moved on to Linux, or they don’t exist.

        I think when people say “POSIX” now they largely mean something else, something that has been memed but isn’t really true

        In particular, if people like and use dt, it will become common just like curl or jq have

        It has nothing to do with POSIX. Are you not using curl or a similar tool in your shell scripts?

        1. 12

          After taking the time to think about what you’ve said, thanks for talking some sense into me. You are right.

          1. 8

            Thanks for the response … I think there are a lot of memes about POSIX floating out there, to the point that I noticed ChatGPT even picked them up :)

            But if we look at the history of Unix, and the current state of things, it doesn’t support the memes. I might write a little history of Unix to clear up several misconceptions I’ve seen (Unix vs. Linux, etc.)

        2. 5

          Thanks, I’m generally optimistic that it could be useful and be used!

          I’ve also made an effort to have zero dependencies and statically compile to increase portability, although I may take some dependencies to handle things like UTF-8 properly.

          1. 3

            One library I was looking at was this, used by Julia: https://juliastrings.github.io/utf8proc/

            Seems like it could be pretty small and self-contained? (Let me know, as I might use it for Oils :) )

            1. 1

              Ooh thanks, I’ll definitely take a look!

              I’m tentatively also looking into “zigstr” since it might fit into my current Zig implementation without needing to FFI over the C ABI.

              1. 1

                (Not that I’m at all opposed to C or C++ of course! Also vice versa, Zig can also produce C ABI .so or .a artifacts for libraries)

        3. 4

          LSB was somewhat successful in pushing in this direction. It would be great if some of the most common and useful 2010+ tools were adopted into a similar effort.

          1. 3

            LSB?

            And as for new tools, I think things like ripgrep, fzf, and the new variants of top (more aligned with modern multicore systems, etc.) are getting enough traction.

            1. 4

              LSB stands for Linux Standard Base. It was trying to be an analog of POSIX for the 2000s. It was a mess overall, but then again so was POSIX. These amalgamations of software always seem to run into an organizational self-interest problem where they succeed in part of their mission but then become both unnecessary and unwilling to shut themselves down. Nevertheless they also seem to be the best of a bunch of bad options for establishing new baselines.

          2. 3

            LSB was already dead-on-arrival when containers arrived in ~2013, and my impression is that both VMs and containers put the nail in the coffin (I’m guessing 90%+ of people on this site don’t even know what we mean by “LSB”)

            i.e. in most contexts, it’s easier to ship your new tool in a Debian container than to rewrite it using LSB-only utilities. Not to mention it’s sometimes impossible to rewrite!

            Generally people write software by testing, not by reading documents, so if any of those standards want to gain traction, they have to provide a testable concept of conformance

            “It would be great” means “nobody is going to do this” :)

            1. 4

              I see it as much more of a “yes, and” situation. POSIX, LSB, etc were aiming to raise the floor of long-term support for applications. Containers are great but they are not a panacea, especially for interactive tools used at the command line or used by cross-platform shell scripts.

              POSIX came about because UNIX engineers in the mid 1980s were saying “It would be great if my shell scripts could make more assumptions about the system”. LSB came about because commercial software vendors in the late 1990s were saying “It would be great if our desktop and server software could make more assumptions about the system”. So to the extent that both POSIX and LSB were successful in raising the floor, I expect another standardization effort to come about. “It would be great” just means “there’s a market here for anyone willing to pursue it”.

              It likely won’t look the same because the players aren’t looking for parity in the same domains. Open source has completely taken over since the last wave too. However lots of application developers and script authors would like to be able to make more assumptions, especially cross-platform assumptions. They aren’t well served by containerization. Even if you can ship your own software in a container and solve the direct dependency assumption problem, you’re still stuck with a fractured set of options for how to run that container and how that can be on a user’s shell path.

              1. 6

                One sign that LSB didn’t raise the floor is that Debian dropped support for it in 2015 - https://wiki.debian.org/LSB

                What modern systems support it?

                I am not saying this is DESIRABLE – I would like something like LSB to exist. I’m just saying it’s not reality

                Continually mentioning POSIX on lobste.rs threads does not cause POSIX to standardize curl or jq, not to mention containers or async I/O :)

                The way that has historically happened is for interested vendors to assign skilled engineers to the project. There is no inkling of that happening. It doesn’t happen by magic.

                POSIX is basically in maintenance-only mode, and I don’t see much evidence that LSB exists anymore


                OK let’s look at Wikipedia, which is often helpful - https://en.wikipedia.org/wiki/Linux_Standard_Base

                The standard stopped being updated in 2015 and current Linux distributions do not adhere to or offer it; however, the lsb_release command is sometimes still available.[citation needed] On February 7, 2023, a former maintainer of the LSB wrote, “The LSB project is essentially abandoned.”[5]

                That article reminded me that LSB chose the Red Hat package format. This is another sign of specs being out of touch with reality, and maybe even a sign that the problem couldn’t have been solved, even in principle

                1. 4

                  POSIX is basically in maintenance-only mode, and I don’t see much evidence that LSB exists anymore

                  Well, there do seem to be some long-overdue features getting standardized in the next revision.

                  1. 3

                    Ahh nice list … It’s interesting/cool that $'\n', brace expansion, and {fd}> are getting standardized!
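
                    (For anyone unfamiliar, here is a quick bash session showing the three features in question, all long-standing bash-isms:)

                    $ echo $'a\nb'        # ANSI-C quoting: \n becomes a real newline
                    a
                    b
                    $ echo {a,b}.txt      # brace expansion, independent of what files exist
                    a.txt b.txt
                    $ exec {fd}>log.txt   # the shell picks a free descriptor and stores it in $fd
                    $ echo hello >&$fd
                    $ exec {fd}>&-        # close it again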

                    I wonder when a new command was last standardized?

                    Maybe there is a requirement that it have multiple implementations, and I guess not many commands do.

                    ninja is one command I can think of, and it’s at the foundation of many systems.

      2. 8

        Thanks!

        It’s not too much of a secret, Forth heads like Chuck Moore and Leo Brodie have definitely had an influence on me here. (Also others from the concatenative-multiverse like von Thun, Pestov, Ertl, and Purdy)

        As a proof of concept, I also defined a script that defines a bunch of Joy combinators and launches a REPL: https://github.com/booniepepper/dt/blob/core/demos/joy-combinators.dt (Maybe later I’ll figure out how I want to handle libraries and optionally-loaded code, for now this is just a toy)

        the only thing it’s missing is cryptic symbolic imagery like & for readf

        The language is somewhat hackable already; you could define & and plenty of single-symbol “commands” to do whatever you want! Although readability can suffer, of course. The only control I’ve imposed here is that parsing dt code intentionally evaluates in strict left-to-right order, avoiding some Forth-isms like pushing/popping a return stack, or a ' (tick) operator that can semantically access values “to the right”.

        1. 5

          FWIW I took a closer look at the language. I generally like it, but one issue is that it conflicts syntactically with shell:

          $ echo [3]
          [3]
          
          $ touch 3
          $ echo [3]
          3
          

          That is, I don’t think you want dt to get different arguments based on whether there’s a file 3 on the system. The [] in POSIX shell is for glob character classes, and unfortunately it “decays” to the literal [3] when there are no files that match.
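
          (Quoting shows the intended behavior; the shell passes a quoted [3] through untouched regardless of what files exist:)

          $ touch 3
          $ echo '[3]'
          [3]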

          Ways to solve it:

          • It should always take a single string, like python -c, perl -e, awk
          • It should always take argv words, like test and find
            • in that case [ would have to be its own word, which it is for test

          I would choose one style or the other, and not have a mix of both styles, which seems to be the case in the docs right now.

          i.e. this section has a mix of unquoted and quoted styles, which I think is confusing - https://dt.plumbing/#quick-start

          1. 3

            Doesn’t basically everything have this issue too (sed, awk, grep, jq all have variants)? I’d expect the program to be quoted.

            1. 3

              Yeah, the most common style is -e 'single quoted program', and also you want -v NAME=value like awk for var substitution (sed is notably missing this)

              This results in safer string substitution vs. the shell doing it with something like awk -e "x == $NAME". That leads to string injection problems (I’ve been meaning to write a blog post about that).
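
              (A minimal sketch of the difference, using a hypothetical hostile NAME value; the -v form passes it as data, while interpolation splices it into the program text:)

              $ NAME='x" || "1" == "1'
              $ awk -v name="$NAME" 'BEGIN { print (name == "admin") }'   # safe: value stays data
              0
              $ awk "BEGIN { print (\"$NAME\" == \"admin\") }"            # unsafe: value becomes code
              1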

          2. 3

            That was also something I was wondering about - as I read it, I thought “but can that even work?”. I didn’t know about the decaying to literal stuff, just that it was a type of glob!

            1. 2

              Yeah actually shopt -s nullglob turns off the “decay”, which is another reason not to use unquoted [] in such a language:

              $ shopt -s nullglob
              $ echo [3]
              3
              
              $ echo [4]
              <nothing because file 4 doesn't exist>
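
              (bash also has shopt -s failglob, a third behavior where a non-matching glob becomes an error rather than decaying or vanishing:)

              $ shopt -s failglob
              $ echo [4]
              bash: no match: [4]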
              
          3. 2

            Thanks for taking a look!

            I didn’t know about that bracket glob form. I’ll have to think that through; there are other footguns, like an unescaped * meant for multiplication turning into a glob, or the same with the ? character.

            Also 100% agree, I need to standardize the conventions. I’m in the process of writing up a proper tutorial and user guide (probably including a style guide).

        2. 2

          It’s alright, that was a little joke about its Perl-ness. I think avoiding push/pop as a hard-and-fast rule was a great design decision, and I’ve always hated forward parsing in Forth.

    2. 5

      Ohhh! Another concatenative programming language! I am fascinated by the concatenative paradigm, and I ended up creating my own little language, too (min).

      One thing I noticed is that although I thought I wanted it to be minimalist (hence the name), I ended up adding a lot of stuff to it, like regular expressions, dictionaries, cryptography, and network and HTTP support… and that gets complicated real quick. But I was able to build more complex applications with it, and use it as a backend for some web apps, too.

      For the record, I also forked my own language and created mn, which is very similar to dt, as it has a more limited feature set focused on shell programming.

      What are you planning for dt? Any plan to implement more concatenative combinators? I checked the standard library and I noticed that apart from the common things like swap, dup, etc. you didn’t implement the more exotic ones that Joy provides, like linrec or similar.

      The problem with concatenative languages is that they tend to get pretty much unreadable and difficult to debug really quickly. I tried to address this in min in different ways, from encouraging the use of variables to more elaborate constructs for defining new symbols in a more standard way, and I added some type checking… but that bloated the language.

      My advice: keep it small and focused! Oh, and get it added on concatenative.org ;)

      1. 3

        In my experience this is where Forth and concatenative languages in general diverge. If your Forth is getting unreadable, you’re writing words that are too long or that do too much, and using more than one or maybe two variables is a sign that you’re trying to do too much, because you could be taking better advantage of the stack. But concatenative languages, coming from a more theoretical background, love scoping, functional data structures, etc., and these layers of complexity remove the simplicity and accessibility of Forth, necessitating longer word definitions and thus more complexity.

        1. 3

          I agree in general that too-long definitions are a code smell in concatenative languages, and I think Forth folks are correct to insist on many short definitions in this paradigm.

          I think with concatenative languages (also array languages like APL) you have to measure complexity in terms of “words” instead of like other paradigms. Also you have to think about things like “phrasing” or “chording” and grouping words together in a way that’s hard to explain except by “feel.” In other languages you’d make similar measurements in terms of “lines” or “cyclomatic complexity” (layers of nesting), and vertically break up lines into chunks in a similar way to how Forth devs break up words horizontally. But… maybe we just haven’t seen the right IDE experience yet that makes it clearer what’s going on!

          (Side note: dt needs an idea of hiding information in modules… and also needs modules! But maybe this is just a documentation problem; it may be possible even within current constraints, with just scoping figured out.)

      2. 3

        Hey min and mn were inspirations! Thanks for chiming in

        What are you planning for dt?

        The goals are to be easy-to-grok, useful, and performant, primarily in the context of Unix-style pipes. For me that means if there’s some obvious other thing for network calls (curl, wget) then my philosophy should be … just pipe into those! It’s very much intended to be duct tape for patching small holes, not exactly a batteries-included do-anything machine. I think at this point that does mean small and focused.

        If people like the experience of concatenative languages then I’d be happy if dt is just the gateway to other languages like Forth, Factor, min, Kitten, PostScript or GhostScript, and many others, which are all capable of building general purpose software. I’m exploring the idea of a bigger general purpose concatenative language or VM, but it would not be dt.

        Any plan to implement more concatenative combinators?

        I don’t think as part of the core language. The easy-to-grok idea is part of why I have variable binding fairly prominent. I have played with some more exotic combinators (See: joy-combinators.dt) but I don’t think these are very accessible – I might get them into dt anyway, but make them opt-in and require some kind of “include” or something that I haven’t implemented yet.

    3. 4

      This seems super cool. Is there some page with less trivial dt programs? I want some ideas for when I should use this

      1. 3

        Thanks!

        There are not many examples, and I expect dt to primarily settle into a role similar to awk or perl one-liners

        One example I have of a shebang program is this, which prints n lines of numbers from the Fibonacci sequence:

        https://github.com/booniepepper/dt/blob/core/demos/fib.dt

        On my machine this runs roughly twice as fast as a Haskell-based, GHC-compiled Fibonacci CLI I implemented previously: https://github.com/booniepepper/fib (Although the dt version will hit 64-bit integer overflow if the numbers get big enough! …considering a big-integer implementation)
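
        (For a sense of the awk-one-liner niche mentioned above, here is a rough sketch of the same idea, assuming fib.dt simply iterates and prints; note that awk’s default floating-point numbers lose integer exactness past 2^53, so it runs out of precision even sooner than dt’s 64-bit integers:)

        $ awk -v n=5 'BEGIN { a = 0; b = 1; while (n-- > 0) { print a; t = a + b; a = b; b = t } }'
        0
        1
        1
        2
        3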

    4. 4

      This looks cool, but the semantic model looks like it pulls everything into memory to process it; would that be correct?

      Still handy for jobs that fit in memory, ofc, but it gets a little awkward for larger things, though that may not be a goal for you.

      1. 2

        This is the current behavior, and it’s easy to grok, but I agree it doesn’t scale well. I’m still thinking through how I want to model streaming for larger or longer-lived processes. See also this thread.

    5. 3

      Oddly, I was just yesterday thinking about the similarities between concatenative programming and the shell-style pipe operator. In the C++20 ranges library you can use “|” as syntactic sugar for function composition, e.g. stuff | a | b instead of b(a(stuff)), which is closer to concatenative syntax.

      The difference is that in a shell the data type being passed around is a FIFO (infinite byte stream), while in concatenative languages it’s LIFO (a stack).

      1. 2

        The underlying behavior is different, but in practice concatenative programming looks and behaves very similar to pipe-style programming. There really isn’t that big of a difference. It’s how I conceptualize words when writing FORTH and it makes it a bit easier for me.

        1. 3

          I think Andy Chu put it best when he said “Shell Has a Forth-like Quality” https://www.oilshell.org/blog/2017/01/13.html

          I haven’t quite figured out how I want to resolve FIFO streams vs LIFO stacks in the long term. This has big implications in long-running use cases where you do want everything in a pipe to be streaming, but I think/hope that for many simple use cases it doesn’t matter.

          1. 2

            A looping construct would probably be sufficient to turn a LIFO stack into a FIFO stream.

            1. 1

              True. I also have tail calls optimized, so the implementation itself is as simple as

              [ read-line    <do the stuff>    go ] \go def
              go
              

              which is exactly how the REPL is implemented. What I need to think about is how exactly I want to expose that in a way that feels intuitive.

              (A very long-running process would also need an update to the naive memory model, which is just arena allocation right now.)

    6. 2

      I would definitely suggest using printf instead of echo in examples, especially examples with newlines in the text to be printed, since echo’s handling of escapes like \n is not portable.
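
      (For example, bash’s built-in echo prints \n literally by default, while dash’s expands it; printf is consistent everywhere:)

      $ bash -c 'echo "a\nb"'
      a\nb
      $ dash -c 'echo "a\nb"'
      a
      b
      $ printf 'a\nb\n'
      a
      b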

    7. 2

      Lots of RPN around these days