1. 46
  1.  

  2. 4

    Nice introduction (based on both tutorials). One suggestion would be to use

    -v FPAT='[^,]*|"[^"]+"'
    

    instead of

    BEGIN { FPAT = "[^,]*|\"[^\"]+\"" }
    

    If you get the awk programming language manual…you’ll read it in about two hours and then you’re done. That’s it. You know all of awk.

    I can’t wrap my head around this quote. That’s a ridiculous claim. Even for an experienced programmer, learning a new programming language in 2 weeks, let alone 2 hours, would be nothing short of a miracle. I’ve been using awk for the past 2-3 years or so, and I wrote a book on GNU awk one-liners earlier this year (https://learnbyexample.github.io/learn_gnuawk/). I’m nowhere close to knowing all of awk.

    1. 6

      I’m nowhere close to knowing all of awk

      Awk or GNU Awk? Awk itself is very small and simple. If you have mawk on your system, then man mawk will probably outline the entire thing (about 1000 lines). I have a copy of The AWK Programming Language from 1988 and it’s a thin book that’s easy to digest.

      GNU Awk has so many extensions that you can’t possibly learn it in any reasonable amount of time, and frankly, most of those extensions are not all that useful. Awk is stunningly great for simple text processing and incredibly useful, but as a general-purpose text processing language, which is what GNU Awk seems to be going for, it’s subpar. You are better off with something like Perl or Python, because data structures and functions in Awk kind of suck. (I swear the extensions in GNU Awk were added to deal with the bizarreness of some of the GNU builds: a tool was in use, the situation changed in a way that really called for a different tool, but for whatever reason the existing tool was extended in unnatural ways to deal with it.)

      1. 1

        Yeah, I had GNU awk in mind when I said all of awk. I just glanced through man mawk, and it is short indeed. I’d say I’m reasonably familiar with most concepts. I know getline for basic usage, but tend to avoid it because of its caveats (see http://awk.freeshell.org/AllAboutGetline). Also, my awk usage is mostly limited to one-liners, so I haven’t bothered to learn about functions (whose syntax has significant whitespace).
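        One of those getline caveats is easy to show. Below is a minimal sketch that assumes only a POSIX awk, and count_lines is a hypothetical helper name. getline returns 1 on success, 0 at end of file and -1 on error. Because -1 is true, the loop has to compare against > 0. close() then releases the file.

        ```shell
        # count_lines is a hypothetical helper: counts the lines of file $1 from
        # inside awk. A bare `while (getline line < f)` would loop forever on a
        # missing file, because getline returns -1 there and -1 is true.
        count_lines() {
          awk -v f="$1" 'BEGIN {
              n = 0
              while ((getline line < f) > 0)   # stop on EOF (0) and on error (-1)
                  n++
              close(f)                         # release the file handle
              print n
          }'
        }

        count_lines /etc/passwd
        ```

        With a missing file, the loop ends on the first -1 and the function prints 0 instead of hanging.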

        I’d disagree with your take on GNU awk extensions. Many of them are useful for one-liners too. For example: FPAT, multicharacter and regexp based RS (plus RT), FIELDWIDTHS, in-place editing, BEGINFILE, ENDFILE, array sorting with PROCINFO, 4th argument to split, 3rd argument to match and so on.

        And I do use Python or Ruby these days if I need to write a program file instead of one-liners.

        1. 1

          I’d disagree with your take on GNU awk extensions.

          The stuff you cite can be useful (although I use Awk pretty much every day, and I think I’ve used FIELDWIDTHS once and that’s about it), but then there’s the other stuff.

          And that’s not even all of it.

          1. 1

            sort

            It is possible to implement sort() in awk (this is a quicksort):

            function swap(array, a, b,
                    tmp)
            {
                    tmp = array[a]
                    array[a] = array[b]
                    array[b] = tmp
            }
            
            function sort(array, beg, end,
                    a, b) # locals: each recursive call needs its own a and b
            {
                    if (beg >= end) # end recursion
                            return
                    a = beg + 1 # 1st is the pivot, so +1
                    b = end
                    while (a < b) {
                            while (a < b && array[a] <= array[beg]) # beg: skip lesser
                                    a++
                            while (a < b && array[b] > array[beg]) # end: skip greater
                                    b--
                            swap(array, a, b) # found 2 misplaced
                    }
                    if (array[beg] > array[a]) # put the pivot back
                            swap(array, beg, a)
                    sort(array, beg, a - 1) # sort lower half
                    sort(array, a, end) # sort higher half
            }
            

            This sorts the array values using integer keys: array[1], array[2], … It sorts from array[beg] to array[end] inclusive, so you can start your array indices at 0 or 1, or sort just part of the array.

            Example usage, with both functions above:

            {
                    LINES[NR] = $0
            }
            
            END {
                    sort(LINES, 1, NR)
                    for (i = 1; i <= NR; i++)
                            print(LINES[i])
            }
            

            Performance is far from terrible!

            $ od -An /dev/urandom | head -n 1000000 | time ./test.awk >/dev/null
            real    0m 19.23s
            user    0m 17.90s
            sys     0m 0.12s
            
            $ od -An /dev/urandom | head -n 1000000 | time sort >/dev/null
            real    0m 4.39s
            user    0m 3.00s
            sys     0m 0.10s
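            Here is a self-contained way to try the quicksort above. It wraps the code in a hypothetical shell function called qsort_lines. The only change from the comment’s code is that a and b are declared as extra parameters of sort(), which makes them local. As globals, the recursive call sort(array, beg, a - 1) would overwrite the caller’s a before sort(array, a, end) runs.

            ```shell
            # qsort_lines is a hypothetical wrapper around the quicksort above.
            qsort_lines() {
              awk '
              function swap(array, a, b,    tmp) {
                  tmp = array[a]
                  array[a] = array[b]
                  array[b] = tmp
              }

              # a and b are extra parameters, so every recursive call gets its own copy.
              function sort(array, beg, end,    a, b) {
                  if (beg >= end)              # end recursion
                      return
                  a = beg + 1                  # 1st is the pivot, so +1
                  b = end
                  while (a < b) {
                      while (a < b && array[a] <= array[beg])   # skip lesser
                          a++
                      while (a < b && array[b] > array[beg])    # skip greater
                          b--
                      swap(array, a, b)                         # found 2 misplaced
                  }
                  if (array[beg] > array[a])   # put the pivot back
                      swap(array, beg, a)
                  sort(array, beg, a - 1)      # sort lower half
                  sort(array, a, end)          # sort higher half
              }

              { LINES[NR] = $0 }

              END {
                  sort(LINES, 1, NR)
                  for (i = 1; i <= NR; i++)
                      print LINES[i]
              }'
            }

            printf 'pear\napple\nfig\nbanana\napple\n' | qsort_lines
            # prints apple, apple, banana, fig, pear (one per line)
            ```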
            
        2. 2

          I do not know GAWK, even though I read Robbins’s book at some point. However, I read what Ultrix called nawk in a day and was productive right away, writing “almost C” in nawk with very few awk idioms. Then Perl4 came along and I switched.

        3. 2

          This post is great! The follow-up (linked in the post) is also good.

          1. 2

            If your awk doesn’t have FPAT, you can do a match-loop.
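            Below is a minimal sketch of such a match-loop. It assumes only POSIX awk features (match(), RLENGTH, substr()), and csv_fields is a hypothetical name. It mimics the FPAT pattern '[^,]*|"[^"]+"' suggested above. Like that pattern, it treats "..." as a single field but does not handle doubled quotes ("") inside a quoted field.

            ```shell
            # csv_fields is a hypothetical helper: prints "N: field" for each field
            # of each input line, splitting on commas outside "..." quotes.
            csv_fields() {
              awk '{
                  nf = 0
                  line = $0
                  while (1) {
                      # Try a quoted field first, then an unquoted (possibly empty) one.
                      if (match(line, /^"[^"]+"/) || match(line, /^[^,]*/))
                          len = RLENGTH
                      else
                          len = 0
                      field[++nf] = substr(line, 1, len)
                      line = substr(line, len + 1)
                      # Stop unless the next character is a separating comma.
                      if (substr(line, 1, 1) != ",")
                          break
                      line = substr(line, 2)
                  }
                  for (i = 1; i <= nf; i++)
                      print i ": " field[i]
              }'
            }

            printf 'a,"b,c",,d\n' | csv_fields
            # 1: a
            # 2: "b,c"
            # 3:
            # 4: d
            ```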

            1. 1

              Can you give an example of that? I’m not familiar with the concept.

            2. 2

              Great article! I just started getting into awk myself, and I hadn’t considered a few of the examples you showed. If you’re interested, @thingskatedid does some really interesting things with awk and tweets about it regularly (along with other cool stuff).

              1. 2

                Speaking from experience, processing CSV files with Awk (even GNU Awk) is a fool’s errand. Use something that already handles all the weird corner cases.

                That said, Awk is great and you should learn it. You really can learn it in under a day.

                1. 1

                  There is csvawk: https://github.com/DavyLandman/csvtools

                  It uses a C converter that rewrites the .csv with a custom binary delimiter not expected to occur in the input, then calls awk set up with this delimiter.

                  It sadly uses a custom BEGIN { IFS = "..." } instead of awk -F "...", but it should really not be too hard to convert it to use -F instead.
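                  Here is a minimal sketch of the awk side of that approach, using -F as suggested. It makes two assumptions. First, the ASCII unit separator (octal 037) stands in for csvawk’s actual delimiter. Second, the hypothetical fake_converted function stands in for the C converter’s output.

                  ```shell
                  US=$(printf '\037')   # ASCII unit separator, assumed not to occur in the data

                  # Hypothetical stand-in for the converter: fields already joined by $US,
                  # so commas and quotes inside fields are just ordinary characters.
                  fake_converted() {
                    printf 'name%sNote, with comma\n' "$US"
                    printf 'bob%s"quoted" text\n' "$US"
                  }

                  # Plain -F splits on the single separator character.
                  fake_converted | awk -F "$US" '{ print NF ": " $2 }'
                  # 2: Note, with comma
                  # 2: "quoted" text
                  ```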

                  1. 1

                    Yeah, here’s what a robust solution for parsing csv with awk looks like: https://stackoverflow.com/questions/45420535/whats-the-most-robust-way-to-efficiently-parse-csv-using-awk

                  2. 2

                    “Awk is not a solution for every programming problem, but it’s an indispensable part of a programmer’s toolbox, especially on Unix, where easy connection of tools is a way of life. Although the larger examples in the book might give a different impression, most awk programs are short and simple and do tasks the language was originally meant for: counting things, converting data from one form to another, adding up numbers, extracting information for reports.”

                    Maybe spending two hours with the 1988 book would be a good idea? You can parse population data about the Soviet Union based on data from 1984! It’s a fun trip.

                    I also think it’s of note that Larry Wall was using awk for some task, it “ran out of steam”, and we got Perl.

                    1. 1
                      }
                      
                      # OCD fix, do not upvote.
                      
                      1. 1

                        Must resist urge to upvote

                      2. 1

                        I often discover new ways to use awk through discussion or experimentation:

                        Print text to an external pager from plain #!/usr/bin/awk -f scripts, without having to run awk '[...]' | less:

                        function pager(msg) {
                                printf "%s", msg | "less"
                        }
                        

                        Thanks E.B. for this ^
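                        Here is a slightly extended sketch of the same idea, written as a hypothetical page_input wrapper. It buffers all input and honours $PAGER, falling back to less. Reading $PAGER is an assumption on my part, since the tip above hard-codes less. The wrapper also calls close() so awk waits for the pager to exit before moving on.

                        ```shell
                        # page_input is a hypothetical wrapper: collects stdin and sends it
                        # to $PAGER (or less) through the pager() function.
                        page_input() {
                          awk '
                          function pager(msg,    cmd) {
                              cmd = ENVIRON["PAGER"]
                              if (cmd == "")
                                  cmd = "less"
                              printf "%s", msg | cmd
                              close(cmd)       # wait for the pager to finish
                          }
                          { buf = buf $0 "\n" }
                          END { pager(buf) }'
                        }

                        printf 'first\nsecond\n' | PAGER=cat page_input
                        ```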