awk scripts

So far, you've only seen how to provide awk scripts directly on the command line. In this chapter, you'll see basic examples for executing scripts saved in files.

The example_files directory has all the files used in the examples.

-f option

The -f command line option allows you to pass the awk script via files instead of writing everything on the command line. Here's an one-liner seen earlier that's been converted to a multiline script. Note that ; is no longer necessary to separate the commands, newline will do that too.

$ cat buf.awk
/error/{
    f = 1
    buf = $0
    next
}

f{
    buf = buf ORS $0
}

/state/{
    if(f)
        print buf
    f = 0
}

$ awk -f buf.awk broken.txt
error 2
1234
6789
state 1
error 4
abcd
state 3

Another advantage is that single quotes can be freely used.

$ echo 'cue us on this example' | awk -v q="'" '{gsub(/\w+/, q "&" q)} 1'
'cue' 'us' 'on' 'this' 'example'

# the above solution is simpler to write as a script
$ cat quotes.awk
{
    gsub(/\w+/, "'&'")
}

1

$ echo 'cue us on this example' | awk -f quotes.awk
'cue' 'us' 'on' 'this' 'example'

-o option

If the code has been first tried out on the command line, you can use the -o option to get a pretty printed version. Output filename can be passed along as an argument to this option. By default, awkprof.out will be used as the filename.

# adding -o after the one-liner has been tested
# input filenames and -v would be simply ignored
$ awk -o -v OFS='\t' 'NR==FNR{r[$1]=$2; next}
         {$(NF+1) = FNR==1 ? "Role" : r[$2]} 1' role.txt marks.txt

# pretty printed version
$ cat awkprof.out
NR == FNR {
        r[$1] = $2
        next
}

{
        $(NF + 1) = FNR == 1 ? "Role" : r[$2]
}

1 {
        print
}

# calling the script
# note that other command line options have to be provided as usual
$ awk -v OFS='\t' -f awkprof.out role.txt marks.txt
Dept    Name    Marks   Role
ECE     Raj     53      class_rep
ECE     Joel    72      
EEE     Moi     68      
CSE     Surya   81      
EEE     Tia     59      placement_rep
ECE     Om      92      
CSE     Amy     67      sports_rep

Summary

So, now you know how to write program files for awk instead of just the one-liners. And about the -o option, which helps to convert complicated one-liners to pretty printed program files.

Next chapter will discuss a few gotchas and tricks.

Exercises

The exercises directory has all the files used in this section.

1) Before explaining the problem statement, here's an example of markdown headers and their converted link version. Note the use of -1 for the second occurrence of the Summary header. Also note that this sample doesn't illustrate every rule explained below.

# Field separators
## Summary
# Gotchas and Tips
## Summary

* [Field separators](#field-separators)
    * [Summary](#summary)
* [Gotchas and Tips](#gotchas-and-tips)
    * [Summary](#summary-1)

For the input file gawk.md, construct a Table of Content section as per the details described below:

Identify all header lines
- there are two types of header lines, one starting with # and the other starting with ##
- lines starting with # inside code blocks defined by ```bash and ``` markers should be ignored
The headers lines should then be converted as per the following rules:
- content is defined as the portion of the header ignoring the initial # or ## characters and the space character
- ## should be replaced with four spaces and a * character
- else, # should be replaced with * character
- create a copy of the content, change it to all lowercase, replace all space characters with the - character and then enclose it within (# and )
  - if there are multiple headers with the same content, append -1, -2, etc respectively for the second header, third header, etc
- surround the original content with [] and then append the string obtained from the previous step
Note that the output should have only the converted headers, all other input lines should not be present

The script file should be named as toc.awk and save the output in out.md.

$ awk -f toc.awk gawk.md > out.md
$ diff -sq out.md toc_expected.md
Files out.md and toc_expected.md are identical

2) For the input file odd.txt, surround the first two whole words of each line with {} that start and end with the same word character. Assume that the input file will not require case insensitive comparison. This is a contrived exercise that needs around 10 instructions and makes you use various features presented in this book.

$ cat odd.txt
-oreo-not:a _a2_ roar<=>took%22
RoaR to wow-

$ awk -f same.awk odd.txt
-{oreo}-not:{a} _a2_ roar<=>took%22
{RoaR} to {wow}-

CLI text processing with GNU awk

awk scripts

-f option

-o option

Summary

Exercises