1. 37
  1.  

  2. 15

    You may consider learning Perl instead:

    • the programs are as terse as awk
    • much more flexible
    • wide set of libraries
    • as widely available

    Remember the power of Perl’s ‘while(<>) func’, which hides command-line argument parsing, stdio handling, and the per-line loop. I highly recommend the Perl Cookbook; you will be amazed how practical it is. Don’t let your Perl (or awk) programs grow beyond 10 lines; they become a pain to maintain as they grow.
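
    For readers who don’t know Perl, here is a rough Python sketch of what that idiom hides, using the standard fileinput module. The names shout and run are made up for illustration (shout stands in for ‘func’), and this is only an approximation of Perl’s behaviour, not a drop-in equivalent.

    ```python
    import fileinput


    def shout(line: str) -> str:
        """Hypothetical per-line function; stands in for 'func' above."""
        return line.rstrip("\n").upper()


    def run(files=None):
        """Yield shout(line) for every input line.

        With files=None, fileinput reads the paths given in sys.argv[1:]
        and falls back to standard input when there are none, which is
        roughly what Perl's <> does.
        """
        with fileinput.input(files=files) as stream:
            for line in stream:
                yield shout(line)
    ```

    In a script, something like ‘for out in run(): print(out)’ would then process whatever files are named on the command line, or stdin if there are none.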

    1. 8

      I know Perl, but found that awk was much more likely to be available, and it’s much faster for quick scripts because there is less overhead in running the binary. I’ve moved most of my muscle memory to reach for awk instead.

      1. 6

        Awk does have the advantage of being a smaller language. You can understand awk enough to do useful things with it in an afternoon. Perl is Byzantine in its complexity as a language, which always scared me off using it for one-liners.

        1. 3

          This is true especially when reading Perl scripts written by someone else. On the other hand, learning enough Perl to write useful one-liners and process text streams is quite easy.

          1. 2

            “You can understand awk enough to do useful things with it in an afternoon.”

            I did that with Perl and a bunch of other languages at various times. The trick is you learn just the subset you need for structured programming, basic I/O, and whatever data format you deal with. Much tinier. Just cuz it’s there doesn’t mean you have to use it.

            For Perl, I also had to learn regular expressions. They kept paying off outside of Perl, though.

            1. 1

              “Just cuz it’s there doesn’t mean you have to use it.”

              Yeah, but finding a useful subset of Perl means I have to learn enough Perl to know what a useful subset would be, where awk is already that useful subset.

              Doesn’t mean you can’t approach it like that, for sure, but I was pleasantly surprised how easy awk was to pick up when I decided to try to learn it a while back.

            2. 1

              Perl’s most ardent users are the language’s worst enemy ;) The downside of TMTOWTDI[1] is that experienced Perl hackers settle into a set of personal idioms that they are comfortable with, but that others may not be.

              Bondage and discipline languages with a much stricter focus on what’s “officially” idiomatic, like Python, don’t have this problem, and neither do small, focused languages like AWK.

              [1] “There’s More Than One Way To Do It!”

            3. 4

              One of the reasons I prefer Perl is its portability between BSD and Linux. Sadly, this isn’t the case with AWK due to different implementations.

              1. 2

                Scripts written for the One True awk (which most BSDs use) should work with gawk.

                1. 1

                  GNU awk is also available as a package.

                2. 3

                  I used to think this, but after stuff like this I exited.

                  1. 1

                    Update: That diff is not very clear, but here’s the issue from another repo I own that triggered the patch: https://github.com/akkartik/wart/issues/5

                    1. 1

                      Sad that enabling warnings caused this; it’s usually a given when writing scripts.

                      1. 2

                        It was a warning for a few minor versions, and then an error at some minor version.

                        1. 3

                          Wow, a lot of software was affected by this, based on this Google search. Looks like an Autotools artifact? Someone wanted to avoid using / as delimiters?

                          Edit: for this specific use case, I think sed and AWK are a better fit…

                  2. 8

                    No one has mentioned the book on AWK, so I include it here: The AWK Programming Language. Its authors are the creators of AWK (Alfred Aho, Peter Weinberger, and Brian Kernighan). Also, some people forget that AWK is an actual programming language (Turing-complete), albeit a domain-specific one.

                    1. 1

                      I have this book and reading it has been more useful than anything on the internet. Really worth picking up a used copy for a couple bucks.

                    2. 6

                      AWK can be good for prototyping an idea, but you (very) quickly run into its limitations. No typing, no array literals, functions aren’t first-class citizens, you can’t pass arrays by value, no imports. It’s even missing basic functions like array length.

                      But the biggest negative is the myriad implementations: NAWK, MAWK (2 versions), GAWK. That makes it very difficult to write portable code.

                      1. 6

                        “AWK can be good for prototyping an idea, but you (very) quickly run into its limitations.”

                        If I consider when AWK was created (1977), I must say that it is an incredibly well-designed and successful piece of software. Yes, it is sometimes ugly, sometimes limited … but it is still in use after 44 years! God bless Alfred, Peter and Brian.

                        (Even though we usually use the GNU implementation, it is still based on the original idea and language.)

                        AWK and the classic Unix approach are quite limited when it comes to structured data. But we can push it a bit further by borrowing ideas from the relational data model, and still use classic tools like AWK.

                        1. 3

                          Funnily enough, gawk’s --lint option will let you know which constructs are gawk-specific (not POSIX awk), which helps with your biggest negative case. If you use vim, ALE for (g)awk scripts will highlight them inline.

                          1. 3

                            “AWK can be good for prototyping an idea, but you (very) quickly run into its limitations.”

                            A good programmer can work around these limitations. Just look at dwatch(8) on FreeBSD. Heavy use of awk.

                            https://svnweb.freebsd.org/base/head/cddl/usr.sbin/dwatch/

                            Or how about an HTTP caching proxy in gawk?

                            https://pastebin.com/raw/Fmf1Fu4b

                            1. 7

                              “A good programmer can work around these limitations.”

                              “Should they?” is a better question. They’re better off using a powerful tool that doesn’t limit them. Then, limit their use of it to what they need for maintainability. Subsets, DSLs, and so on.

                              1. 3

                                Shell script with embedded awk in functions paired with fd redirection AND eval’ed sudo. That looks like a maintenance nightmare for anyone who’s not the original author.

                                1. 1

                                  It was reviewed and signed off by three other core developers, so I don’t think that’s going to be a problem.

                                  https://reviews.freebsd.org/D10006

                              2. 2

                                I don’t write awk for work, more so for pleasure, and its limitations can make it fun to use. It clearly was influential on the languages we use today and it would be interesting to see a programming historian trace that lineage.

                              3. 4

                                I used to use awk a lot, but nowadays for anything bigger than a one-liner I use Go instead.

                                1. 2

                                  Can Go really be used for throwaway scripting? I usually resort to awk/sed scripts for parsing chunks of approximately sorted data, but only come up with a successful solution after a bunch of trial and error against my data. Wouldn’t it be tedious in Go, considering that you add recompilation to the process?

                                  1. 3

                                    Recompilation is instant, though, and if you use go run, you don’t even have to be aware of the generated binaries.

                                    I mean, once you move past the one-liner stage, you write some code in a text editor. It doesn’t make much of a difference in terms of overhead if that code is written in Go or awk. It’s just some code in a file that you call through a one-line invocation in both cases.

                                    The main feature of awk is that it automatically parses input based on lines and fields. However, sadly that’s rarely enough, and making awk parse other kinds of input is usually more difficult than parsing the same kind of input in Go.
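
                                    To make concrete what that automatic parsing amounts to, here is a rough Python sketch. The helper names records and sum_column are made up for illustration, and it only covers the default case (newline-separated records, whitespace-separated fields). Unlike awk, it raises an error on a non-numeric field instead of treating it as 0.

                                    ```python
                                    def records(text, fs=None):
                                        """Yield (NR, fields) pairs, roughly the way awk presents input.

                                        fs=None splits on runs of whitespace, like awk's default FS.
                                        """
                                        for nr, line in enumerate(text.splitlines(), start=1):
                                            yield nr, line.split(fs)


                                    def sum_column(text, column):
                                        """Rough equivalent of: awk '{ s += $column } END { print s }'."""
                                        total = 0.0
                                        for _nr, fields in records(text):
                                            # Fields are 1-based in awk; skip lines that are too short.
                                            if len(fields) >= column:
                                                total += float(fields[column - 1])
                                        return total
                                    ```

                                    Anything beyond this shape (nested or multi-line formats, for example) needs hand-written parsing in either language.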

                                    1. 1

                                      It’s not bad, actually. I choose Go when it’s something I’m going to need to deploy to our cluster, where Python would be a packaging mess and the “script” would be too difficult to actually write correctly in shell. It generates a single binary that you can just scp around, and if you have a project skeleton already set up that you can copy and paste, you get arg parsing and all kinds of other things for free.

                                  2. 3

                                    Some people may think “why should I learn awk today when there is Python, Go, etc”. But there are lots of tasks (even big ones) that can be done easier with this tool. The fact that most OSs have a POSIX compliant version of it in the base system also makes it very valuable, not just for the people crunching data but also to sysadmin/DevOps people.

                                    1. 3

                                      “It’s even more useful with hashes: …”

                                      It should be pointed out that the container[index] structure is not a real hash, but usually (from what I know) an associative list. The gawk man page says:

                                      Arrays are subscripted with an expression between square brackets ([ and ]). If the expression is an expression list (expr, expr …) then the array subscript is a string consisting of the concatenation of the (string) value of each expression, separated by the value of the SUBSEP variable. This facility is used to simulate multiply dimensioned arrays. For example:

                                               i = "A"; j = "B"; k = "C"
                                               x[i, j, k] = "hello, world\n"
                                      

                                      assigns the string “hello, world\n” to the element of the array x which is indexed by the string "A\034B\034C". All arrays in AWK are associative, i.e., indexed by string values.
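
                                      The same key construction can be sketched in Python with an ordinary dict. The helper name subscript is made up for illustration, and str() is only an approximation of how awk converts numeric subscripts to strings.

                                      ```python
                                      SUBSEP = "\034"  # awk's default subscript separator (octal 034)


                                      def subscript(*indices):
                                          """Build the single string key awk would use for x[i, j, k]."""
                                          return SUBSEP.join(str(index) for index in indices)


                                      x = {}
                                      i, j, k = "A", "B", "C"
                                      # One flat string key, not a nested structure.
                                      x[subscript(i, j, k)] = "hello, world\n"
                                      ```

                                      So a “multidimensional” awk array is really one flat mapping whose keys happen to contain the separator character.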

                                      1. 3

                                        My only real complaint about the article is the claim that awk has Perl-like regular expressions; they are actually extended regular expressions. Think grep -E.

                                        That being said, awk can definitely help turn a 5-6 command deep pipeline into more like 2 or 3. That alone should be a big motivator to use more awk over tools like sed or grep. Not only that, awk at least behaves and looks like a normal imperative programming language, unlike some of the other strange POSIX-defined DSLs.

                                        1. 2

                                          I am fairly sure that a “Why learn SED?” could be as relevant as this one. Interesting article, though a bit too opinionated for my tastes.

                                          1. 3

                                            It does set the scene for the tutorials 1, 2, 3, and finishing off with tricks.

                                          2. 2

                                            “Imagine programming without regular expressions. Can you even imagine the alternative? Would it entail building FSMs from scratch?”

                                            I covered/countered that with these examples back when I was trying out the Stack Exchange network. There are all kinds of DSLs and libraries for data/program transformation that might apply.

                                            “Available everywhere.”

                                            This is the best reason to learn both Awk and other standard stuff.