Control Structures
You've already seen various examples requiring conditional expressions. This chapter will revisit the if-else
control structure and the ternary operator. Then you will see some examples with explicit loops (recall that awk
is already looping over input records), followed by the keywords that control loop flow. Most of the syntax is
very similar to the C language.
The example_files directory has all the files used in the examples.
if-else
Mostly, when you need to use the if control structure, you can get away with using condX{actionX} blocks instead. But sometimes, you need additional condition checking within such action blocks or inside loops. The syntax is if(cond){action} where the braces are optional if you need only one statement. if can optionally be followed by multiple else if conditions and a final else branch. These can also be nested as needed.
# print all lines starting with 'b'
# additionally, if the last column is > 0, then print some more text
$ awk '/^b/{print; if($NF>0) print "------"}' table.txt
brown bread mat hair 42
------
blue cake mug shirt -7
# same as above, but uses the 'else' condition as well
$ awk '/^b/{print; if($NF>0) print "------"; else print "======"}' table.txt
brown bread mat hair 42
------
blue cake mug shirt -7
======
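The examples above use a single else, so here's a minimal sketch of an else if chain as well. It assumes the last field is numeric. The classify function name and the sample input lines are made up for illustration and aren't part of the example files.

```shell
# hypothetical helper: label each line based on the sign of its last field
classify() {
    awk '{
        if ($NF > 0)
            print $1, "positive"
        else if ($NF < 0)
            print $1, "negative"
        else
            print $1, "zero"
    }' "$@"
}

printf '%s\n' 'brown bread mat hair 42' 'blue cake mug shirt -7' 'red ink pen cap 0' | classify
# brown positive
# blue negative
# red zero
```

Only the first matching branch is executed, so the order of the conditions matters when they overlap.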
The ternary operator often reduces the need for single statement if-else control structures.
# same as: awk '{if(NR%3) ORS="-" ; else ORS=RS} 1'
$ seq 6 | awk '{ORS = NR%3 ? "-" : RS} 1'
1-2-3
4-5-6
# note that parentheses are necessary for print in this case
$ awk '/^b/{print; print($NF>0 ? "------" : "======")}' table.txt
brown bread mat hair 42
------
blue cake mug shirt -7
======
See also stackoverflow: finding min and max value of a column and gawk manual: switch.
loops
for loops are handy when you are working with arrays. They are also useful for processing input fields, since the $N syntax accepts an expression instead of just fixed values.
$ awk 'BEGIN{for(i=2; i<7; i+=2) print i}'
2
4
6
# looping each field
$ awk -v OFS=, '{for(i=1; i<=NF; i++) if($i ~ /^[bm]/) $i="["$i"]"} 1' table.txt
[brown],[bread],[mat],hair,42
[blue],cake,[mug],shirt,-7
yellow,[banana],window,shoes,3.14
Here's an example of looping over a dynamically constructed array.
$ cat marks.txt
Dept Name Marks
ECE Raj 53
ECE Joel 72
EEE Moi 68
CSE Surya 81
EEE Tia 59
ECE Om 92
CSE Amy 67
# average marks for each department
$ awk 'NR>1{d[$1]+=$3; c[$1]++} END{for(k in d) print k, d[k]/c[k]}' marks.txt
ECE 72.3333
EEE 63.5
CSE 74
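Note that for(k in d) visits the keys in an unspecified order, so the department order shown above may differ on your system. One way to get a stable order, sketched below, is to pipe the result through sort. The dept_average function name is hypothetical and the input is a shortened copy of marks.txt.

```shell
# hypothetical helper: average marks per department, sorted by department name
dept_average() {
    awk 'NR>1{d[$1]+=$3; c[$1]++} END{for(k in d) print k, d[k]/c[k]}' "$@" | sort
}

printf '%s\n' 'Dept Name Marks' 'ECE Raj 53' 'EEE Moi 68' 'CSE Surya 81' 'EEE Tia 59' |
    dept_average
# CSE 81
# ECE 53
# EEE 63.5
```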
You can use break and continue to alter the normal flow of loops. break will cause the current loop to quit immediately without processing the remaining statements and iterations. continue will skip the remaining statements in the loop and start the next iteration.
$ awk -v OFS=, '{for(i=1; i<=NF; i++) if($i ~ /b/){NF=i; break}} 1' table.txt
brown
blue
yellow,banana
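The example above uses break. Here's a comparable sketch for continue, which skips non-numeric fields and adds up the rest. The sum_numeric function name, the regular expression used to detect numbers and the sample input are assumptions made for illustration.

```shell
# hypothetical helper: add up only the numeric fields of each line
sum_numeric() {
    awk '{
        total = 0
        for (i = 1; i <= NF; i++) {
            # skip fields that are not plain integers or decimals
            if ($i !~ /^-?[0-9]+(\.[0-9]+)?$/) continue
            total += $i
        }
        print total
    }' "$@"
}

printf '%s\n' 'apple 3 fig 4.5' 'x y z' '10 -2 kiwi' | sum_numeric
# 7.5
# 0
# 8
```

Unlike break, the loop keeps going after continue, so every field still gets checked.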
See also stackoverflow: find missing numbers from sequential list.
awk supports the while and do-while loop mechanisms as well.
$ awk 'BEGIN{i=6; while(i>0){print i; i-=2}}'
6
4
2
# recursive substitution
$ echo 'titillate' | awk '{while(gsub(/til/, "")) print}'
tilate
ate
$ echo 'titillate' | awk '{do{print} while(gsub(/til/, ""))}'
titillate
tilate
ate
next
next is similar to the continue statement but it acts on the default loop that goes through the input records. It doesn't affect BEGIN or END blocks as they are outside the record looping. When next is executed, the rest of the statements will be skipped and the next input record will be fetched for processing.
$ awk '/\<par/{print "%% " $0; next} {print /s/ ? "X" : "Y"}' anchors.txt
%% sub par
X
Y
X
%% cart part tart mart
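Another common use of next is to discard unwanted records early so that the remaining blocks don't have to repeat the check. The sketch below skips comment lines and empty lines, then numbers whatever is left. The number_lines function name and the sample input are hypothetical.

```shell
# hypothetical helper: ignore comments and blank lines, number the rest
number_lines() {
    awk '/^#/ || !NF {next} {print (++n) ": " $0}' "$@"
}

printf '%s\n' '# config' 'a=1' '' 'b=2' | number_lines
# 1: a=1
# 2: b=2
```

Since next moves on to the following record right away, the counter n is incremented only for the lines that are printed.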
You'll see more examples with next in the coming chapters.
exit
You saw the use of exit earlier to quit early and avoid unnecessary processing of records. If an argument isn't passed, awk considers the command to have finished normally and the exit status will indicate success. You can pass a numeric argument for other cases.
$ seq 3542 4623452 | awk 'NR==2452{print; exit}'
5993
$ echo $?
0
$ awk '/^br/{print "invalid data"; exit 1}' table.txt
invalid data
$ echo $?
1
# any remaining files to be processed are also skipped
$ awk 'FNR==2{print; exit}' table.txt greeting.txt
blue cake mug shirt -7
If exit is used in BEGIN or normal blocks, any code in the END block will still be executed. For more details and corner cases, see gawk manual: exit.
# first print is executed
# on seeing exit, rest of BEGIN and normal blocks are skipped
# code in the END block is then executed
$ awk 'BEGIN{print "hi"; exit; print "hello"}
/^b/;
END{print "bye"}' table.txt
hi
bye
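Here's a small sketch showing that the END block still runs after exit in a normal block and that the value passed to exit becomes the exit status of the command. The first_negative function name, the status value 2 and the sample input are assumptions chosen for illustration.

```shell
# hypothetical helper: stop at the first record whose last field is negative
first_negative() {
    awk '$NF < 0 {print "negative value on line " NR; exit 2}
         END {print "records read: " NR}' "$@"
}

status=0
printf '%s\n' 'brown 42' 'blue -7' 'yellow 3.14' | first_negative || status=$?
echo "exit status: $status"
# negative value on line 2
# records read: 2
# exit status: 2
```

The third line is never read, which is why NR is still 2 when the END block runs.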
Summary
This chapter covered some of the control flow structures provided by awk. These features make awk flexible and easier to use compared to sed.
The next chapter will discuss some of the built-in functions.
Exercises
The exercises directory has all the files used in this section.
1) The input file nums.txt contains a single column of numbers. Change positive numbers to negative and vice versa. The solution should use the sub function and shouldn't explicitly use the if-else control structure or the ternary operator.
$ cat nums.txt
42
-2
10101
-3.14
-75
$ awk ##### add your solution here
-42
2
-10101
3.14
75
2) For the input file table.txt, change the field separator from space to the , character. Also, any field not containing digit characters should be surrounded by double quotes.
$ awk ##### add your solution here
"brown","bread","mat","hair",42
"blue","cake","mug","shirt",-7
"yellow","banana","window","shoes",3.14
3) For each input line of the file secrets.txt, remove all characters except the last character of each field. Assume space as the input field separator.
$ cat secrets.txt
stag area row tick
deaf chi rate tall glad
Bi tac toe - 42
$ awk ##### add your solution here
gawk
field
ice-2
4) For the input file sample.txt, emulate the q and Q commands of sed as shown below.
# sed '/are/q' sample.txt will print till the line containing 'are'
$ awk ##### add your solution here
Hello World
Good day
How are you
# sed '/are/Q' sample.txt is similar to the 'q' command,
# but the matching line won't be part of the output
$ awk ##### add your solution here
Hello World
Good day
5) For the input file addr.txt:
- if a line contains e
    - delete all occurrences of e
    - surround all consecutive repeated characters with {}
    - assume that the input will not have more than two consecutive repeats
- if a line doesn't contain e but contains u
    - surround all lowercase vowels in that line with []
$ awk ##### add your solution here
H{ll}o World
How ar you
This gam is g{oo}d
T[o]d[a]y [i]s s[u]nny
12345
You ar fu{nn}y
6) The goal is to print found you if the input file contains you and not found otherwise. However, both the print statements are executed in the awk code shown below. Change it to work as expected.
$ awk '/you/{print "found you"; exit} END{print "not found"}' addr.txt
found you
not found