Nice introduction (based on both tutorials). One suggestion would be to use
-v FPAT='[^,]*|"[^"]+"'
instead of
BEGIN { FPAT = "[^,]*|\"[^\"]+\"" }
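To make the quoting difference concrete, here is a minimal sketch. It does not use FPAT itself (that needs GNU awk); instead it passes the quoted-field part of the regex in an ordinary variable and applies it with POSIX match(), so it should run on any awk. The variable name pat and both function names are made up for this example.

```shell
# Hypothetical helpers; "pat" is an illustrative variable, not an awk built-in.

# Regex passed with -v: inside shell single quotes the double quotes need no escaping.
quoted_via_v() {
    awk -v pat='"[^"]+"' 'match($0, pat) { print substr($0, RSTART, RLENGTH) }'
}

# Same regex assigned in BEGIN: every double quote in the awk string needs a backslash.
quoted_via_begin() {
    awk 'BEGIN { pat = "\"[^\"]+\"" } match($0, pat) { print substr($0, RSTART, RLENGTH) }'
}

# Both calls print the first double-quoted run of the line: "a,b"
printf '%s\n' 'x,"a,b",y' | quoted_via_v
printf '%s\n' 'x,"a,b",y' | quoted_via_begin
```

One caveat with -v: awk still processes backslash escapes in the assigned value, so a regex that contains backslashes would need them doubled.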
“If you get the awk programming language manual…you’ll read it in about two hours and then you’re done. That’s it. You know all of awk.”
I can’t wrap my head around this quote. That’s a ridiculous claim. Even for an experienced programmer, learning a new programming language in 2 weeks, let alone 2 hours, would be nothing short of a miracle. I’ve been using awk for the past 2-3 years or so and I wrote a book on GNU awk one-liners earlier this year (https://learnbyexample.github.io/learn_gnuawk/). I’m nowhere close to knowing all of awk.
Awk or GNU Awk? Awk itself is very small and simple. If you have mawk on your system, then man mawk will probably outline the entire thing (about 1000 lines). I have a copy of The AWK Programming Language from 1988 and it’s a thin book that’s easy to digest.
GNU Awk has so many extensions you can’t possibly learn it in any reasonable amount of time, and frankly, most of those extensions are not all that useful. Awk is stunningly great for simple text processing and incredibly useful, but as a general-purpose programming language for text processing, which is what GNU Awk seems to be going for, it’s subpar. You are better off with something like Perl or Python because data structures and functions in Awk kind of suck. (I swear the extensions in GNU Awk were added to deal with the bizarreness of some of the GNU builds where a tool was used, the situation changed in a way that really called for a different tool, but for whatever reason the existing tool was extended in unnatural ways to deal with it.)
Yeah, I had GNU awk in mind when I said all of awk. Just glanced through man mawk and it is short indeed. I’d say I’m reasonably familiar with most concepts. I know getline for basic usage, but tend to avoid it because of its caveats (see http://awk.freeshell.org/AllAboutGetline). Also, my awk usage is limited to one-liners for the most part, so I haven’t bothered to learn about functions (which have significant spaces in their syntax).
I’d disagree with your take on GNU awk extensions. Many of them are useful for one-liners too. For example: FPAT, multicharacter and regexp based RS (plus RT), FIELDWIDTHS, in-place editing, BEGINFILE, ENDFILE, array sorting with PROCINFO, 4th argument to split, 3rd argument to match and so on.
And I do use Python or Ruby these days if I need to write a program file instead of one-liners.
“I’d disagree with your take on GNU awk extensions.”
The stuff you cite can be useful (although I use Awk pretty much every day and I think I’ve used FIELDWIDTHS once and that’s about it), but then there’s the other stuff.
It is possible to implement sort() in awk (this is a quicksort):
function swap(array, a, b,
    tmp)
{
    tmp = array[a]
    array[a] = array[b]
    array[b] = tmp
}
function sort(array, beg, end,
    a, b)    # a, b: locals, so recursive calls do not clobber them
{
    if (beg >= end)    # end recursion
        return
    a = beg + 1    # 1st is the pivot, so +1
    b = end
    while (a < b) {
        while (a < b && array[a] <= array[beg])    # beg: skip lesser
            a++
        while (a < b && array[b] > array[beg])    # end: skip greater
            b--
        swap(array, a, b)    # found 2 misplaced
    }
    if (array[beg] > array[a])    # put the pivot back
        swap(array, beg, a)
    sort(array, beg, a - 1)    # sort lower half
    sort(array, a, end)    # sort higher half
}
This sorts the array values using integer keys: array[1], array[2], …
It sorts from array[beg] to array[end] inclusive, so you can choose your
array indices starting at 0 or 1, or sort just a part of the array.
Example usage, with both functions above:
{
    LINES[NR] = $0
}
END {
    sort(LINES, 1, NR)
    for (i = 1; i <= NR; i++)
        print(LINES[i])
}
Performance is far from terrible!
$ od -An /dev/urandom | head -n 1000000 | time ./test.awk >/dev/null
real 0m 19.23s
user 0m 17.90s
sys 0m 0.12s
$ od -An /dev/urandom | head -n 1000000 | time sort >/dev/null
real 0m 4.39s
user 0m 3.00s
sys 0m 0.10s
I do not know GAWK, even though I’ve read Robbins’s book at some point in time. However, what Ultrix called nawk I read up on in a day, and I was productive right away writing “almost C” in nawk with very few idioms. Then Perl4 came along and I switched.
Great article! I just started getting into awk myself, and I hadn’t considered a few of the examples you showed. If you’re interested, @thingskatedid does some really interesting things with awk and tweets about it regularly (along with other cool stuff).
Speaking from experience, processing CSV files with Awk (even GNU Awk) is a fool’s errand. Use something that already handles all the weird corner cases.
That said, Awk is great and you should learn it. You really can learn it in under a day.
“Awk is not a solution for every programming problem, but it’s an indispensable part of a programmer’s toolbox, especially on Unix, where easy connection of tools is a way of life. Although the larger examples in the book might give a different impression, most awk programs are short and simple and do tasks the language was originally meant for: counting things, converting data from one form to another, adding up numbers, extracting information for reports.”
Maybe spending two hours with the 1988 book would be a good idea? You can parse population data about the Soviet Union based on data from 1984! It’s a fun trip.
I also think it’s of note that Larry Wall was using awk for whatever task and it “ran out of steam” and we got Perl.
And that’s not even all of it.
This post is great! The follow-up (linked in the post) is also good.
If your awk doesn’t have FPAT, you can do a match-loop.
Can you give an example of that? I’m not familiar with the concept.
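One way to read "match-loop" (a sketch of the idea, not necessarily what the parent had in mind): instead of letting awk split the line, repeatedly match() one field at the start of what is left, print it, and chop it off. The function name csv_fields is made up for this example, and it only handles the simple case that the FPAT above covers (no escaped quotes, and a trailing empty field is dropped).

```shell
# csv_fields (hypothetical name): print each field of every input line on its own line,
# using only POSIX awk features (match, RLENGTH, substr, sub).
csv_fields() {
    awk '{
        rest = $0
        while (rest != "") {
            # Try a double-quoted field first, else take everything up to the next comma.
            # Both patterns are anchored, so a match always starts at position 1.
            if (!match(rest, /^"[^"]+"/))
                match(rest, /^[^,]*/)
            print substr(rest, 1, RLENGTH)
            rest = substr(rest, RLENGTH + 1)   # chop off the field just printed
            sub(/^,/, "", rest)                # and the separator after it
        }
    }'
}

printf '%s\n' 'x,"a,b",y' | csv_fields
```

In a real script you would store the fields in an array instead of printing them, but the loop stays the same.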
For CSV, there is csvawk: https://github.com/DavyLandman/csvtools
It uses a C converter from .csv to a custom binary delimiter that is not expected to occur in the input, then calls awk set up with this delimiter.
It sadly uses a custom BEGIN { IFS = "..." } instead of awk -F "...", but it should really not be too hard to convert it to use -F instead.
Yeah, here’s what a robust solution for parsing csv with awk looks like: https://stackoverflow.com/questions/45420535/whats-the-most-robust-way-to-efficiently-parse-csv-using-awk
Must resist urge to upvote
I often discover new ways to use awk by discussing or experimenting:
Print text to an external pager from plain #!/usr/bin/awk -f scripts, without awk '[...]' | less.
Thanks E.B. for this ^
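A minimal sketch of what that can look like, assuming the trick is awk's own "print | command" output pipe. The function name page_numbered and the pager variable are made up here, and the pager command is a parameter only so that the example can be tried with cat instead of less.

```shell
# page_numbered (hypothetical name): number the input lines inside awk and let awk
# itself start the pager, so no "| less" is needed after the awk command.
page_numbered() {
    awk -v pager="${1:-less}" '
        { printf "%4d  %s\n", NR, $0 | pager }   # send output to the pager process
        END { close(pager) }                      # wait for the pager to exit
    '
}

# Non-interactive try-out: cat stands in for the pager.
printf 'a\nb\n' | page_numbered cat
```

In a plain #!/usr/bin/awk -f script the same two rules would go in the file itself, with pager assigned in a BEGIN block rather than with -v.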