1. 48
  1. 12

    My favourite awk one-liner, which I have memorized, extracts the contents between specific begin and end patterns/fences in multiple files:

    awk '/begin-regex/{p=1}; p; /end-regex/{p=0}'

    (I think you don’t need curly braces, but not sure now) For example, contents of all init functions in all Go files:

    awk '/^func init/{p=1}; p; /^}/{p=0}' *.go

    By swapping the expressions between semicolons, you can make it include or exclude the fence lines in the output.

    Explanation: the variable p is uninitialized, and therefore 0 (i.e. false), by default. The default action for a pattern with no action is to print the current line, so the lone p in the middle is equivalent to: p!=0 {print}.
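
    To make the ordering trick concrete, here is a minimal sketch of both variants. The BEGIN/END fence lines and the sample input are made up for illustration; only the awk programs follow the one-liner above.

    ```shell
    # Hypothetical sample input: two fence lines around two payload lines.
    input=$(printf '%s\n' before BEGIN one two END after)

    # Fences included: set the flag, print, then clear the flag.
    with_fences=$(printf '%s\n' "$input" | awk '/^BEGIN$/{p=1}; p; /^END$/{p=0}')

    # Fences excluded: clear the flag, print, then set the flag.
    without_fences=$(printf '%s\n' "$input" | awk '/^END$/{p=0}; p; /^BEGIN$/{p=1}')

    printf 'with fences:\n%s\n' "$with_fences"
    printf 'without fences:\n%s\n' "$without_fences"
    ```

    The first variant should print BEGIN through END, the second only the lines in between. Mixing the two orders (for example, set the flag before p but clear it before p as well) keeps one fence and drops the other.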

    1. 3

      I think you don’t need curly braces, but not sure now

      Since assignment is an Action, you would have to use the curly brackets to change the value of p.

      Unless you want this to go over files, you could also just do

      /begin-regex/, /end-regex/

      which uses “Pattern Ranges” and doesn’t require the auxiliary variable. If you still want it to match between patterns across multiple files, you’d probably have to concatenate the files beforehand and pipe them in.
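
      A small sketch of the range-pattern form on a single input stream, for comparison with the flag version. The BEGIN/END fences and the sample lines are invented for the example.

      ```shell
      # Hypothetical sample input piped into awk as one stream.
      input=$(printf '%s\n' before BEGIN one two END after)

      # A range pattern with no action prints every line from the first
      # match of the left regex through the next match of the right regex.
      ranged=$(printf '%s\n' "$input" | awk '/^BEGIN$/,/^END$/')

      printf '%s\n' "$ranged"
      ```

      Note that the range form always includes both fence lines; dropping them needs an extra condition, which is where the flag variable starts to pay off.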

      1. 2

        Note that you can also achieve this very concisely with sed, including across multiple files:

        $ sed -n '/begin-regex/,/end-regex/p' file1 file2 ...

        1. 1

          Can this let me exclude the begin and/or end fence line from the output? Given that the sed language is Turing-complete, I suppose there is some way; the question is how easy it is. In “my” awk expression, this is a matter of changing the order of the sub-expressions.

          1. 1

            I haven’t looked this up recently, but I believe the canonical sed version is:

            sed -ne '/start/,/end/ { /start\|end/ !p }'

            I thought there was another solution by abusing labels and gotos but I can’t seem to get one written.
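
            For what it’s worth, here is a sketch of the same idea that avoids the \| alternation (which is a GNU extension in basic regular expressions) by nesting two negated addresses instead. The BEGIN/END fences and sample input are made up; this is just one way to write it, not necessarily the canonical one.

            ```shell
            # Hypothetical sample input.
            input=$(printf '%s\n' before BEGIN one two END after)

            # Inside the range, print only lines that match neither fence.
            inner=$(printf '%s\n' "$input" |
              sed -n '/^BEGIN$/,/^END$/{/^BEGIN$/!{/^END$/!p;};}')

            printf '%s\n' "$inner"
            ```

            To keep only one of the fences, drop the corresponding negated address, e.g. remove the /^BEGIN$/! wrapper to keep the opening fence.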

      2. 3

        Thanks, this tip led me to refactor some of my awk code :) I like that pattern too, but I forgot that the booleans can go on the left too. I always think of Awk as “patterns and actions”, but it’s really “predicates and actions”.

        Context: as part of hollowing out the Python interpreter, I use this Awk snippet to extract the C struct initializers in the CPython codebase.


        Then I parse that very limited language of {"string", var1, var2, ...} with a small recursive descent parser.

        Overall I’ve found good use for awk in 5-10 places over the last couple years, i.e. NOT the typical “field extraction” use case of { print $5 }.

        Now that I know awk better, I like it more than I used to. On the other hand, I’ve also written a few hundred lines of Make from scratch in the last couple years, and I think less of it than I used to :-/ Make always seems to give me half-working and slow solutions, whereas Awk gives you a precise and fast solution.

        style patch: https://github.com/oilshell/oil/commit/eee1ee13feca3e6de31f34b778fdc3bdb0b520fd

        (I also didn’t know about the implicit { print } but that seems way too obscure for me :) )

        1. 2

          (I think you don’t need curly braces, but not sure now)

          you can do this without any explicit action statements:

          awk -- '/start-reg/ && (p = 1) && 0; p; /end-reg/ && p = 0;'

          This works only because assignment is an expression that evaluates to the value assigned to the lvalue. But mind the order of operations.
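
          A runnable sketch of that action-free form, with made-up BEGIN/END fences. I’ve parenthesized the second assignment as well, so the example doesn’t depend on how a particular awk parses && next to =; that is my own choice, not something the one-liner above requires.

          ```shell
          # Hypothetical sample input.
          input=$(printf '%s\n' before BEGIN one two END after)

          # Rule 1: on the start fence, (p = 1) sets the flag; the trailing
          #         && 0 makes the pattern false so the rule itself prints nothing.
          # Rule 2: bare p prints the line while the flag is set.
          # Rule 3: on the end fence, (p = 0) clears the flag and, being 0,
          #         also makes the pattern false.
          out=$(printf '%s\n' "$input" |
            awk '/^BEGIN$/ && (p = 1) && 0; p; /^END$/ && (p = 0)')

          printf '%s\n' "$out"
          ```

          Without the && 0 on the first rule, the start fence would be printed twice: once by rule 1 (since p = 1 is true) and once by the bare p.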