Control Structures

You've already seen various examples requiring conditional expressions. This chapter will revisit the if-else control structure along with the ternary operator. Then you will see some examples with explicit loops (recall that awk is already looping over input records), followed by keywords that control loop flow. Most of the syntax is very similar to the C language.

if-else

Mostly, when you need an if control structure, you can get away with using the condX{actionX} format in awk one-liners. But sometimes you need additional condition checking within such action blocks, or inside loops. The syntax is if(cond){action} where the braces are optional if you need only one statement. if can optionally be followed by multiple else if conditions and a final else condition. These can also be nested as needed.

$ # print all lines starting with 'b'
$ # additionally, if last column is > 0, then print some more info
$ awk '/^b/{print; if($NF>0) print "------"}' table.txt
brown bread mat hair 42
------
blue cake mug shirt -7

$ # same as above, but includes 'else' condition as well
$ awk '/^b/{print; if($NF>0) print "------"; else print "======"}' table.txt
brown bread mat hair 42
------
blue cake mug shirt -7
======
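
The examples above use a single if and a single else. As a minimal sketch of the else if chain mentioned earlier, the snippet below labels numbers by sign. The helper name classify and the sample input are made up for illustration and are not part of the original examples.

```shell
# hypothetical helper: label the first field of each line by its sign
classify() {
    awk '{
        if ($1 > 0)
            print $1, "positive"
        else if ($1 < 0)
            print $1, "negative"
        else
            print $1, "zero"
    }'
}

# sample input invented for this sketch
printf '42\n-7\n0\n' | classify
# 42 positive
# -7 negative
# 0 zero
```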

The ternary operator often reduces the need for single statement if-else cases.

$ # same as: awk '{if(NR%3) ORS="-" ; else ORS=RS} 1'
$ seq 6 | awk '{ORS = NR%3 ? "-" : RS} 1'
1-2-3
4-5-6

$ # note that parentheses are necessary for print in this case
$ awk '/^b/{print; print($NF>0 ? "------" : "======")}' table.txt
brown bread mat hair 42
------
blue cake mug shirt -7
======
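
The ternary operator can also appear on the right-hand side of an assignment. Here is a small sketch that tracks the largest value seen in the first column. The helper name col_max and the input values are assumptions made for this illustration.

```shell
# hypothetical helper: print the largest value found in the first column
col_max() {
    awk '{
        v = $1 + 0                       # force a numeric value
        max = (NR == 1 || v > max) ? v : max
    }
    END { print max }'
}

# sample input invented for this sketch
printf '3\n10\n-2\n' | col_max
# 10
```

The NR == 1 check seeds max with the first value, so the sketch also works when every number is negative.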

See also stackoverflow: finding min and max value of a column.

See also gawk manual: switch.

loops

for loops are handy when you are working with arrays. They are also useful for processing input fields, since the $N syntax accepts an expression instead of a fixed value.

$ awk 'BEGIN{for(i=2; i<7; i+=2) print i}'
2
4
6

$ # looping over each field
$ awk -v OFS=, '{for(i=1; i<=NF; i++) if($i ~ /^[bm]/) $i="["$i"]"} 1' table.txt
[brown],[bread],[mat],hair,42
[blue],cake,[mug],shirt,-7
yellow,[banana],window,shoes,3.14
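
Since $i accepts any expression, a loop counter can be used to read fields as well as modify them. The following sketch adds up the fields of each line. The helper name row_sum and the sample numbers are hypothetical.

```shell
# hypothetical helper: print each line followed by the sum of its fields
row_sum() {
    awk '{
        total = 0                        # reset for every input line
        for (i = 1; i <= NF; i++)
            total += $i
        print $0, "=>", total
    }'
}

# sample input invented for this sketch
printf '1 2 3\n10 20\n' | row_sum
# 1 2 3 => 6
# 10 20 => 30
```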

Here's an example of looping over a dynamically constructed array.

$ cat marks.txt
Dept    Name    Marks
ECE     Raj     53
ECE     Joel    72
EEE     Moi     68
CSE     Surya   81
EEE     Tia     59
ECE     Om      92
CSE     Amy     67

$ # average marks for each department
$ awk 'NR>1{d[$1]+=$3; c[$1]++} END{for(k in d) print k, d[k]/c[k]}' marks.txt
ECE 72.3333
EEE 63.5
CSE 74
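
The order in which for(k in d) visits the keys is not specified, so the department order shown above may differ between awk implementations. One possible way to get a predictable order is to pipe the result through sort, as in this sketch. The helper name avg_by_key and the two-column sample data are assumptions for illustration.

```shell
# hypothetical helper: average of column 2 for each key in column 1
avg_by_key() {
    awk '{sum[$1] += $2; cnt[$1]++}
         END{for (k in sum) print k, sum[k] / cnt[k]}' | sort
}

# sample input invented for this sketch
printf 'ECE 53\nEEE 68\nECE 72\nCSE 81\n' | avg_by_key
# CSE 81
# ECE 62.5
# EEE 68
```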

You can use break and continue to alter the normal flow of loops. break will cause the current loop to quit immediately without processing the remaining statements and iterations. continue will skip the remaining statements in the loop and start the next iteration.

$ awk -v OFS=, '{for(i=1; i<=NF; i++) if($i ~ /b/){NF=i; break}} 1' table.txt
brown
blue
yellow,banana
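
The example above demonstrates break. For continue, here is a minimal sketch that adds up only the integer fields of each line and skips everything else. The helper name sum_numeric and the sample input are made up for this illustration.

```shell
# hypothetical helper: sum only the integer fields of each line
sum_numeric() {
    awk '{
        total = 0
        for (i = 1; i <= NF; i++) {
            # skip fields that are not plain integers
            if ($i !~ /^-?[0-9]+$/)
                continue
            total += $i
        }
        print total
    }'
}

# sample input invented for this sketch
printf 'apple 3 fig 4\n-5 x 10\n' | sum_numeric
# 7
# 5
```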

See also stackoverflow: find missing numbers from sequential list.

awk supports while and do-while loop mechanisms as well.

$ awk 'BEGIN{i=6; while(i>0){print i; i-=2}}'
6
4
2

$ # recursive substitution
$ echo 'titillate' | awk '{while(gsub(/til/, "")) print}'
tilate
ate
$ echo 'titillate' | awk '{do{print} while(gsub(/til/, ""))}'
titillate
tilate
ate

next

next is similar to the continue statement, but it acts on the default loop that goes through the input records. It doesn't affect BEGIN or END blocks as they are outside the record looping. When next is executed, the rest of the statements will be skipped and the next input record will be fetched for processing.

$ awk '/\<par/{print "%% " $0; next} {print /s/ ? "X" : "Y"}' word_anchors.txt
%% sub par
X
Y
X
%% cart part tart mart
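
A common use of next is to discard unwanted records early so that later blocks don't need to repeat the check. The sketch below skips lines starting with # and numbers the remaining lines. The helper name strip_comments and the sample input are hypothetical.

```shell
# hypothetical helper: drop lines starting with '#', number the rest
strip_comments() {
    awk '/^#/{next} {n++; print n ": " $0}'
}

# sample input invented for this sketch
printf '# header\nalpha\n# note\nbeta\n' | strip_comments
# 1: alpha
# 2: beta
```

Because next is executed for the comment lines, the counter n in the second block is incremented only for lines that are printed.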

You'll see more examples with next in coming chapters.

exit

You saw the use of exit earlier to quit early and avoid unnecessary processing of records. If an argument isn't passed, awk considers the command to have finished normally and the exit status will indicate success. You can pass a number to indicate other cases.

$ seq 3542 4623452 | awk 'NR==2452{print; exit}'
5993
$ echo $?
0

$ awk '/^br/{print "Invalid input"; exit 1}' table.txt
Invalid input
$ echo $?
1

$ # any remaining files to be processed are also skipped
$ awk 'FNR==2{print; exit}' table.txt greeting.txt
blue cake mug shirt -7

If exit is used in BEGIN or normal blocks, any code in the END block will still be executed. For more details and corner cases, see gawk manual: exit.

$ # first print is executed
$ # on seeing exit, rest of BEGIN and normal blocks are skipped
$ # code in END is then executed
$ awk 'BEGIN{print "hi"; exit; print "hello"}
       /^b/;
       END{print "bye"}' table.txt
hi
bye
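
Since END still runs after exit, one possible approach is to record the reason for quitting in a variable and let the END block decide what to print and which status to return. This is only a sketch under that assumption. The helper name total_or_fail, the variable bad and the sample input are invented for illustration.

```shell
# hypothetical helper: print the total of column 1, or fail on bad input
total_or_fail() {
    awk '
        $1 !~ /^[0-9]+$/ { bad = NR; exit }   # jump to END right away
        { total += $1 }
        END {
            if (bad) {
                print "invalid input on line " bad
                exit 1
            }
            print "total: " total
        }'
}

# sample input invented for this sketch
printf '3\n4\n' | total_or_fail
# total: 7
printf '3\nx\n4\n' | total_or_fail || echo "exit status: $?"
# invalid input on line 2
# exit status: 1
```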

Summary

This chapter covered some of the control flow structures provided by awk. These features make awk flexible and easier to use compared to sed for those familiar with programming languages.

The next chapter will discuss some of the built-in functions.

Exercises

a) The input file nums.txt contains a single column of numbers. Change positive numbers to negative and vice versa. Can you do it using only the sub function and without explicit use of if-else or the ternary operator?

$ cat nums.txt
42
-2
10101
-3.14
-75

$ awk ##### add your solution here
-42
2
-10101
3.14
75

b) For the input file table.txt, change the field separator from space to , character. Also, any field not containing digit characters should be surrounded by double quotes.

$ awk ##### add your solution here
"brown","bread","mat","hair",42
"blue","cake","mug","shirt",-7
"yellow","banana","window","shoes",3.14

c) For each input line of the file secrets.txt, remove all characters except the last character of each field. Assume space as the input field separator.

$ cat secrets.txt
stag area row tick
deaf chi rate tall glad
Bi tac toe - 42

$ awk ##### add your solution here
gawk
field
ice-2

d) Emulate q and Q commands of sed as shown below.

$ # sed '/are/q' sample.txt prints up to and including the first line containing 'are'
$ awk ##### add your solution here
Hello World

Good day
How are you

$ # sed '/are/Q' sample.txt prints up to but excluding the first line containing 'are'
$ awk ##### add your solution here
Hello World

Good day

e) For the input file addr.txt:

  • if line contains e
    • delete all occurrences of e
    • surround all consecutive repeated characters with {}
    • assume that input will not have more than two consecutive repeats
  • if line doesn't contain e but contains u
    • surround all lowercase vowels in that line with []

$ awk ##### add your solution here
H{ll}o World
How ar you
This gam is g{oo}d
T[o]d[a]y [i]s s[u]nny
12345
You ar fu{nn}y