1. 55
    1. 16

      what can be scripted with AWK, should be scripted with AWK (over Python, Ruby, Node.js, etc.). I’m not saying that you should write large applications, but for medium scripts, AWK is an absolutely fine alternative to mainstream scripting languages with many benefits.

      I’ve been exploring this exact idea for the past few weeks.

      For years, I’ve been unhappy with Python as a scripting language. I also lost interest in Python for building applications. So unless I need one of the killer apps (e.g. machine learning libraries), what’s the point?

      20 years ago I bought into “if you know Perl/Python then you don’t have to know bash/sed/awk”, but that was a scam. It took me 5 minutes to learn AWK, and I immediately used it to do something I would’ve done before with Python, but with less code and less friction.

      The use case of AWK for Unix-y text processing, etc. is well accepted, but I’m definitely interested in using it for small apps, where I would’ve used Python or NodeJS in the past.

      I’d rather have a “real” scripting language, with no features, no package management, nothing to learn. And when I need more (type checking, richer syntax, libraries, etc.), I’d rather have a “real” language, not Python with bolted on type annotations pretending to be a good application language.

      1. 12

        I went down this path several years ago – I was unhappy with Python for scripting and shell-like tasks, e.g. startup time, dependencies, and also Python 3 unicode. (It’s arguably better for HTTP apps, but clearly worse for the file system.)

        So I started writing some tools in Awk.

        It was OK, but the only thing I use Awk for now is stuff like this:

        awk '  { sum += $1 }  END { printf("%.1f", sum / 1000000) }  '
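For comparison, here is a rough Python sketch of the same one-liner: sum the first whitespace-separated field and print the total in millions. The function name and sample input are made up for illustration. One difference is an assumption of this sketch rather than awk's behavior: awk silently treats a non-numeric `$1` as 0, while `float()` here would raise an error.

```python
import io


def sum_first_field_in_millions(lines):
    """Sum the first whitespace-separated field of each line, scaled to millions."""
    total = 0.0
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines; awk would count a missing $1 as 0
            total += float(fields[0])
    return total / 1_000_000


# Hypothetical input standing in for stdin.
sample = io.StringIO("1500000 a\n2500000 b\n\n")
print(f"{sum_first_field_in_millions(sample):.1f}")
```

The awk version is clearly shorter, which is the case where awk still wins as a pipeline tool.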
        

        I wrote some longer awk scripts like this: https://github.com/oilshell/oil/blob/master/test/spec-cpp.sh#L100

        which eventually results in this table of test results: https://www.oilshell.org/release/0.14.2/test/spec.wwz/cpp/osh-summary.html

        But these days I would just write that in Python. I found awk to be pretty limited. It’s missing bazillions of string, integer, float, list, and dict functions that Python has.

        The syntax is better than shell, but it has a different way of doing everything which is annoying, e.g. opening files vs redirects, print vs. printf vs. echo, etc.

        Basically I’d rather use shell + Python than shell + awk + Python. Because shell + awk isn’t powerful enough.


        More fundamentally, both shell and Awk lack garbage collection, so you can’t do simple things like return an associative array from a function.

            $ awk 'function f() { a[1]=2; return a } BEGIN { f() }' </dev/null
            awk: cmd. line:1: fatal: attempt to use array `a' in a scalar context
        

        You can’t nest associative arrays either!

        These things are trivial in Python, so I don’t think awk is a good language for bigger programs, or even medium-sized ones.
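To make "trivial in Python" concrete, here is a minimal sketch of both operations that fail in awk above: returning an associative array from a function, and nesting one inside another. The function and key names are invented for the example.

```python
def word_lengths(words):
    """Build a dict and return it; the caller keeps it alive after the function exits."""
    lengths = {}
    for word in words:
        lengths[word] = len(word)
    return lengths


result = word_lengths(["awk", "shell"])

# Dicts nest freely: a value can itself be a dict.
nested = {"counts": result, "meta": {"source": "example"}}
print(nested["counts"]["awk"])
```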


        There’s a saying that Python is the second best language for everything. I’d agree it’s the second-best shell-like language AND the second-best awk-like language, which makes it quite powerful. (Python dicts are better than the equivalents in JS, PHP, arguably Perl)

        BUT I did not really want to settle for second best, hence https://www.oilshell.org/ . Many years ago I wrote these posts and still agree:

        ( Similarly to Awk, I also wrote 3 makefiles from scratch starting around that time. I mentioned here how I came around to Ninja instead: https://lobste.rs/s/mtw9kb/makefile_websh_tconfig_json_js )

        The old tools like Awk and Make have a lot of good qualities that we don’t see in modern tools, but they are also unloved and unimproved.


        On the other hand, Oil now has a garbage collector :) https://www.oilshell.org/blog/2023/01/garbage-collector.html

        Thus it can be a more powerful language with real data structures.

        So to anyone who got this far and wants the best language for shell- and Awk-like stuff, not the second best: check out https://github.com/oilshell/oil/wiki/Contributing and ask questions on https://oilshell.zulipchat.com/ :)

        Anyone who knows either shell or awk, and has worked on some kind of tree interpreter (e.g. toy Lisp), should be able to contribute. Most of the code is in Python with MyPy types.

        Latest blog post has the status of the project: https://www.oilshell.org/blog/2023/03/release-0.14.2.html

        You can even be paid to work on it: we paid 50K euros in total to 4 contributors in the last year, and have another 50K starting now.

        It seems from the renaming thread that there’s a lot of interest in the newer YSH/Oil language.

        It seems like you’re encountering the same “missing language hole” that I did several years ago (although of course you may not agree with the conclusion)

        1. 3

          Another tidbit after trying awk: the match() function from GNU awk lets you extract a subgroup, and is extremely useful, but not present in other awks.

          It’s basically the same as my_re.match().group(X) in Python. I remember someone else here mentioning that too.

          So I’d say POSIX awk is even more limited, and GNU awk (gawk) has stuff you need (as usual)

          Incidentally that primitive is also something that ripgrep has, but grep doesn’t.
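For readers who have not used the Python idiom referred to above, here is a small sketch of `my_re.match().group(X)`. In gawk the same job is done by passing an array as the third argument to `match()`. The log line and pattern are hypothetical, chosen only to have a few groups to pick from.

```python
import re

# Hypothetical log line and pattern, for illustration only.
line = "GET /index.html 200 1532"
request_re = re.compile(r"(\w+) (\S+) (\d{3}) (\d+)")

m = request_re.match(line)
if m:
    # group(3) is the third parenthesized subgroup: the status code.
    status = m.group(3)
    print(status)
```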

        2. 2

          Thanks for sharing your experience with it. I’ll check out Oil.

          I’m definitely going to continue down the AWK + shell path to find out for myself, but I am going into it seeing that there probably is a language hole, like, “this project is too big for AWK, better write it in Go…”, and I’m guessing that is where Oil fits into the picture.

          My knowledge of both Powershell and Oil is limited, but that’s the comparison I’m thinking of right now. Powershell is a modern scripting language, you have objects instead of just strings, there is probably a lot less weird syntax than in shell scripts, etc.

          Is Oil the open-source, POSIX competitor to Powershell?

          1. 5

            Awk is very fun and small, so yeah I definitely don’t regret learning it. It’s just that I have a limited number of languages I can remember, and one of them has to be Python.

            Yes Oil can be compared to PowerShell – adding structured data, but in a more Unix way with text, not a Windows way.

            Or really a .NET way, because everything lives inside a .NET VM. Oil’s “VM” is the Unix kernel itself ! :)

        3. 2

          “You can’t nest associative arrays either!”

          I doubt that GC absence imposes this restriction. Gawk does have nested arrays, obviously:

          $ gawk 'BEGIN { a[0][0] = "hello"; a[0][1][0]="world"; a[1]=123; print typeof(a), typeof(a[0]), typeof(a[1]), typeof(a[0][0]) }'
          array array number string
          
          1. 2

            Hm interesting, I can reproduce that

            But try this

            $ gawk 'BEGIN { a[0]=1; b[0]=a }'
            gawk: cmd. line:1: fatal: attempt to use array `a' in a scalar context
            
            $ gawk --version
            GNU Awk 4.1.4, API: 1.1 (GNU MPFR 4.0.1, GNU MP 6.1.2)
            

            The equivalent works in Python.

            I believe this is basically because Awk doesn’t have pointers/references like Python. It has a model where each cell is “owned” in a restrictive fashion, so it can be de-allocated without GC.
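A short sketch of what "the equivalent works in Python" means here. The variable names mirror the gawk snippet above. The point is that `b[0] = a` stores a reference rather than a copy, so the dict has to stay alive for as long as anything still refers to it.

```python
a = {0: 1}
b = {0: a}   # stores a reference to the same dict, not a copy

a[1] = 2     # mutate through the original name
print(b[0])  # the change is visible through b as well

del a        # dropping one name does not free the dict
print(b[0][1])  # still reachable through b
```

That shared ownership is what the runtime has to track, and it is the part that awk's one-owner-per-cell model avoids.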

      2. 4

        Python (and Perl) are also nice when you need a library to talk to a thing (thing being a device or a service etc.) because there is very likely already a library to do that. In AWK, probably not.

      3. 3

        I went down the path of writing a full script in AWK instead of Python too. AWK has proven very valuable for running the script anywhere, macOS or GNU/Linux, and in particular in CI without installing anything.

    2. 7

      My impression many years ago was that Awk looked a little too much like C outside the application domain, but reading over the source code for translate-shell, it doesn’t look that bad actually. I appreciate the hot take about using it over other tools, because it is already there. Not sure I’m ready to enact it myself though.

      I do use Awk as a power tool from the shell, a lot. I often wish that it could handle CSV files natively, since then I wouldn’t need to reach for something like Python to do something simple, but CSV files often have embedded quotes and whatnot that require a stronger parser than just awk -F,. My sense has been that if you want to do this kind of record-based programming with Awk, but it can’t parse your input with the field separator, you might as well go somewhere else. Perhaps someone here knows better?
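To illustrate the quoting problem, here is a small sketch that compares naive comma splitting (roughly what `awk -F,` does) with Python's standard `csv` module. The sample record is made up: it has a comma inside a quoted field and doubled quotes as an escape.

```python
import csv
import io

# Hypothetical data: a quoted field containing a comma, and escaped quotes.
data = 'name,comment\n"Smith, Jane","said ""hi"""\n'

# Naive splitting on commas breaks inside the quoted field.
naive = data.splitlines()[1].split(",")
print(len(naive))  # more pieces than the 2 real fields

# csv.reader understands the quoting rules.
rows = list(csv.reader(io.StringIO(data)))
print(rows[1])
```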

      1. 6

        Perhaps you might find Miller or rq (Record Query) useful?

        1. 3

          Thank you! I have not heard of these!

      2. 5
        1. 2

          This looks really interesting, thank you!

          1. 4

            There’s an interview with Brian Kernighan that’s mostly about AWK here: https://www.youtube.com/watch?v=GNyQxXw_oMQ

            At 8m25s he mentions that he’s added a “quick and dirty” hack to the original AWK to handle CSV. You can see that on GitHub here.

            It hasn’t been merged into master yet though.

      3. 4

        Another tool recommendation: I very much like xsv for CSV-mongling.

      4. 2

        https://github.com/ezrosent/frawk supports CSV as well

    3. 5

      There are people doing really crazy things with awk, like this one: https://github.com/patsie75/awk-demo

    4. 4

      My awk usage has been firmly in the one-liner arena, transforming or analyzing text as part of a pipeline with other shell commands: it’s been my go-to for years. Recently, after banging my head on a tricky substitution and considering the resulting monstrosity, I resolved to use perl for this sort of work for a while.

      I’m a couple of months in and really happy with that decision.