awk scripts

-f option

The -f command line option allows you to pass the awk code via a file instead of writing it all on the command line. Here's a one-liner seen earlier, converted to a multiline script. Note that ; is no longer needed to separate the statements, since a newline does that job too.

$ cat buf.awk
/error/{
    f = 1
    buf = $0
    next
}

f{
    buf = buf ORS $0
}

/state/{
    if(f)
        print buf
    f = 0
}

$ awk -f buf.awk broken.txt
error 2
1234
6789
state 1
error 4
abcd
state 3
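The -f option can also be repeated. POSIX specifies that multiple -f program files are concatenated, in the order given, into a single awk program. The sketch below relies on that behavior. The file names count.awk and report.awk and the input lines are hypothetical, made up only for illustration.

```shell
# hypothetical program split across two files
cat > count.awk << 'EOF'
/error/{
    n++
}
EOF

cat > report.awk << 'EOF'
END{
    print "errors:", n + 0
}
EOF

# both files are joined, in order, into one awk program
printf 'error 2\n1234\nerror 4\nabcd\n' | awk -f count.awk -f report.awk
```

This prints errors: 2. Splitting code this way can be handy for keeping reusable rules or functions in one file and the main logic in another.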

Another advantage is that single quotes can be used freely, since the code no longer sits inside a single-quoted shell string.

$ echo 'cue us on this example' | awk -v q="'" '{gsub(/\w+/, q "&" q)} 1'
'cue' 'us' 'on' 'this' 'example'

# the above solution is simpler to write as a script
$ cat quotes.awk
{
    gsub(/\w+/, "'&'")
}

1

$ echo 'cue us on this example' | awk -f quotes.awk
'cue' 'us' 'on' 'this' 'example'
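Since the program now lives in a file, it can also be made directly executable with a shebang line. When run as ./quotes_exec.awk, the kernel invokes awk with -f followed by the script path. A minimal sketch, assuming awk is installed at /usr/bin/awk (check with command -v awk) and using the hypothetical file name quotes_exec.awk. The ASCII bracket expression [A-Za-z0-9_] stands in for gawk's \w so that the sketch also works with other awk implementations.

```shell
cat > quotes_exec.awk << 'EOF'
#!/usr/bin/awk -f
# run as ./quotes_exec.awk, this becomes: /usr/bin/awk -f ./quotes_exec.awk
# [A-Za-z0-9_] approximates gawk's \w for ASCII text
{
    gsub(/[A-Za-z0-9_]+/, "'&'")
}

1
EOF
chmod +x quotes_exec.awk

echo 'cue us on this example' | ./quotes_exec.awk
```

The output is the same 'cue' 'us' 'on' 'this' 'example' line as before. The shebang line starts with #, so awk treats it as a comment and awk -f quotes_exec.awk still works too.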

-o option

If the code was first tried out on the command line, add the -o option (a gawk feature) to get a pretty printed version. A different output filename can be given by attaching it directly to the option, for example -opretty.awk. Otherwise, awkprof.out is used by default.
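The filename argument of -o is optional, so gawk only recognizes it when it is attached to the option, as in -opretty.awk or --pretty-print=pretty.awk. With a space in between, the filename would be taken as the program text instead. A minimal sketch, assuming gawk is on your PATH. The name pretty.awk is hypothetical, and a BEGIN-only program is used so that nothing is read from input.

```shell
# the filename must be attached directly to -o (no space in between)
gawk -opretty.awk 'BEGIN{x = 5; print x * 2}'

# the pretty printed program goes to pretty.awk instead of awkprof.out
cat pretty.awk
```

Depending on your gawk version, the program may also be executed while it is pretty printed, so you might see its output as well.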

$ # adding -o after the one-liner has been tested
$ # input filenames and -v would simply be ignored
$ awk -o -v OFS='\t' 'NR==FNR{r[$1]=$2; next}
         {$(NF+1) = FNR==1 ? "Role" : r[$2]} 1' role.txt marks.txt

$ # pretty printed version
$ cat awkprof.out
NR == FNR {
        r[$1] = $2
        next
}

{
        $(NF + 1) = FNR == 1 ? "Role" : r[$2]
}

1 {
        print
}

$ # calling the script
$ # note that other command line options have to be provided as usual
$ awk -v OFS='\t' -f awkprof.out role.txt marks.txt
Dept    Name    Marks   Role
ECE     Raj     53      class_rep
ECE     Joel    72
EEE     Moi     68
CSE     Surya   81
EEE     Tia     59      placement_rep
ECE     Om      92
CSE     Amy     67      sports_rep
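The -v OFS='\t' option had to be repeated when calling awkprof.out. If you'd rather keep such settings inside the script, one option is to assign them in a BEGIN block. The sketch below uses small hypothetical stand-ins for role.txt and marks.txt, since their full contents aren't shown here. The names role.awk, roles_demo.txt and marks_demo.txt are also hypothetical.

```shell
# hypothetical stand-ins for role.txt and marks.txt
printf 'Raj class_rep\nTia placement_rep\n' > roles_demo.txt
printf 'Dept Name Marks\nECE Raj 53\nECE Joel 72\nEEE Tia 59\n' > marks_demo.txt

cat > role.awk << 'EOF'
# OFS is set here, so -v OFS='\t' isn't needed on the command line
BEGIN{
    OFS = "\t"
}

NR == FNR{
    r[$1] = $2
    next
}

{
    $(NF + 1) = FNR == 1 ? "Role" : r[$2]
}

1
EOF

awk -f role.awk roles_demo.txt marks_demo.txt
```

One trade-off to keep in mind: -v assignments are done before the BEGIN block runs, so a -v OFS=... passed on the command line would now be overwritten by the assignment inside the script.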

Summary

So, now you know how to write program files for awk instead of just one-liners. You also saw the useful -o option, which helps convert complicated one-liners into pretty printed program files.

The next chapter will discuss a few gotchas and tricks.

Exercises

a) Before explaining the problem statement, here's an example of markdown headers and their converted link versions. Note the use of -1 for the second occurrence of the Summary header. Also note that this sample doesn't exercise all the rules.

# Field separators
## Summary
# Gotchas and Tips
## Summary

* [Field separators](#field-separators)
    * [Summary](#summary)
* [Gotchas and Tips](#gotchas-and-tips)
    * [Summary](#summary-1)

For the input file gawk.md, construct table of contents links as described below.

  • Identify all header lines
    • there are two types of header lines, one starting with # and the other starting with ##
    • lines starting with # inside code blocks defined by ```bash and ``` markers should be ignored
  • The header lines should then be converted as per the following rules:
    • content is defined as the portion of the header after the initial # or ## characters and the space character that follows them
    • initial ## should be replaced with four spaces and a *
    • else, initial # should be replaced with *
    • create a copy of the content, change it to all lowercase, replace all space characters with the - character and then place it within (# and )
      • if there are multiple headers with the same content, append -1, -2, etc. to the second, third, etc. headers respectively
    • surround the original content with [] and then append the string obtained from the previous step
  • Note that the output should have only the converted headers; all other input lines should not be present

As the input file gawk.md is too long, only the commands to verify your solution are shown.

$ awk -f toc.awk gawk.md > out.md
$ diff -sq out.md toc_expected.md
Files out.md and toc_expected.md are identical

b) For the input file odd.txt, surround with {} the first two whole words of each line that start and end with the same word character. Assume that the input file will not require case insensitive comparison. This is a contrived exercise that needs around 10 instructions and makes you recall various features presented in this book.

$ cat odd.txt
-oreo-not:a _a2_ roar<=>took%22
RoaR to wow-

$ awk -f same.awk odd.txt
-{oreo}-not:{a} _a2_ roar<=>took%22
{RoaR} to {wow}-