Do you find awk one-liners cryptic? Stuff like !a[$0]++, 1, $1=$1, NR==FNR and -v RS=? You'll find examples and brief explanations for such idioms in this post.


awk command structure

awk 'cond1{action1} cond2{action2} ... condN{actionN}'

When a conditional expression isn't provided, the action is always executed. When an action isn't provided, the $0 variable (which has the contents of the current record being processed) is printed if the conditional expression evaluates to true.
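
As a quick sketch of these three forms, here's a condition without an action, an action without a condition, and both together. The `nums` helper and its sample data are made up for illustration.

```shell
# hypothetical sample input: the numbers 1 to 3, one per line
nums() { printf '1\n2\n3\n'; }

# condition only: $0 is printed when the condition is true
nums | awk '$0 >= 2'
# prints 2 and 3

# action only: the action runs for every record
nums | awk '{print $0 * 10}'
# prints 10, 20 and 30

# condition and action together
nums | awk '$0 >= 2 {print $0 * 10}'
# prints 20 and 30
```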


Regexp filtering

# same as: grep 'at' and sed -n '/at/p'
$ printf 'gate\napple\nwhat\nkite\n' | awk '/at/'
gate
what

# same as: grep -v 'e' and sed -n '/e/!p'
$ printf 'gate\napple\nwhat\nkite\n' | awk '!/e/'
what

The generic syntax is string ~ /regexp/ to check if the given string matches the regexp and string !~ /regexp/ to invert the condition.

  • /regexp/ is a shortcut for $0 ~ /regexp/{print $0}
  • !/regexp/ is a shortcut for $0 !~ /regexp/{print $0}
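
The generic form is handy when you want to match against a particular field instead of the whole record. Here's a small sketch, with a made-up `drinks` helper providing the sample data:

```shell
# hypothetical sample data: a name followed by a drink
drinks() { printf 'raj tea\nkate coffee\nom latte\n'; }

# whole-record match: 'kate' and 'latte' both contain 'at'
drinks | awk '/at/'
# prints: kate coffee, om latte

# match only against the second field
drinks | awk '$2 ~ /at/'
# prints: om latte

# invert the match for the first field
drinks | awk '$1 !~ /at/'
# prints: raj tea, om latte
```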

Idiomatic use of 1

Non-zero numeric values and non-empty strings are truthy (zero and empty strings are falsy). Idiomatically, 1 is used as a conditional expression to print the contents of $0.

$ echo 'ring amazing jar' | awk '{sub(/ing/, "ed", $2)} 1'
ring amazed jar

$ seq 2 | awk 'BEGIN{print "---"} 1; END{print "==="}'
---
1
2
===

Special variables

  • $0 contains the current record being processed
  • $1 first field
  • $2 second field and so on
  • FS input field separator
  • OFS output field separator
  • NF number of fields
  • RS input record separator
  • ORS output record separator
  • NR number of records (i.e. line number) for the entire input
  • FNR number of records per file
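
To see how NR, FNR and NF differ, here's a sketch that prints them for each record of two small files. The temporary directory and the file names f1.txt and f2.txt are assumptions made for this illustration.

```shell
# create two throwaway sample files (names are made up)
tmp=$(mktemp -d)
printf 'a b\nc\n' > "$tmp/f1.txt"
printf 'd e f\n' > "$tmp/f2.txt"

# OFS is placed between the comma separated arguments of print
out=$(awk -v OFS=: '{print NR, FNR, NF, $0}' "$tmp/f1.txt" "$tmp/f2.txt")
printf '%s\n' "$out"
# 1:1:2:a b
# 2:2:1:c
# 3:1:3:d e f

rm -r "$tmp"
```

NR keeps counting across files, whereas FNR resets to 1 when the second file starts.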

Removing duplicates

awk '!a[$0]++' is one of the most famous awk one-liners. It eliminates line based duplicates while retaining the input order.

$ cat purchases.txt
coffee
tea
washing powder
coffee
tea
coffee milkshake
soap
tea
washing soda

$ awk '{print +a[$0] "\t" $0; a[$0]++}' purchases.txt
0	coffee
0	tea
0	washing powder
1	coffee
1	tea
0	coffee milkshake
0	soap
2	tea
0	washing soda

# only the entries with zero in the first column will be retained
$ awk '!a[$0]++' purchases.txt
coffee
tea
washing powder
coffee milkshake
soap
washing soda

a[$0] creates an uninitialized element in array a with $0 as the key (if the key doesn't exist yet). Thus, !a[$0] will succeed only on the first occurrence of an item (since an uninitialized value is falsy) and the post-increment operator will ensure that further instances of an item will fail the conditional expression.
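
The same idea works with any key, not just $0. The sketch below removes duplicates based on the first field only and also spells out the one-liner in long form. The `items` helper, its data and the array name `seen` are made up for illustration.

```shell
# hypothetical sample data: an item followed by a quantity
items() { printf 'tea 10\ncoffee 20\ntea 30\nsoap 5\ncoffee 8\n'; }

# keep only the first record for each distinct value of the first field
items | awk '!seen[$1]++'
# prints: tea 10, coffee 20, soap 5

# long form of the same logic
items | awk '{ if (seen[$1] == 0) print $0; seen[$1] = seen[$1] + 1 }'
# prints the same three records
```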


Rebuild $0

Sometimes you just want to change the field separator, or perform some record-level text processing and then print it with a new field separator. In such cases, you'll have to explicitly fake a field operation such as $1=$1 so that awk rebuilds $0 using OFS. Otherwise, $0 is printed with its original separators.

$ s='sample123string42with777numbers'

$ echo "$s" | awk -F'[0-9]+' -v OFS=, '{$1=$1} 1'
sample,string,with,numbers

$ echo "$s" | awk -F'[0-9]+' -v OFS=- '{gsub(/[aeiou]/, ""); $1=$1} 1'
smpl-strng-wth-nmbrs
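
For contrast, here's a sketch of what happens when no field is assigned, and a reminder that assigning any field (not just $1=$1) triggers the rebuild. The replacement value "text" is made up for illustration.

```shell
s='sample123string42with777numbers'

# no field is assigned, so the record is printed as is and OFS is never used
echo "$s" | awk -F'[0-9]+' -v OFS=, '1'
# prints: sample123string42with777numbers

# assigning any field rebuilds the record with OFS between the fields
echo "$s" | awk -F'[0-9]+' -v OFS=, '{$2 = "text"} 1'
# prints: sample,text,with,numbers
```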

Paragraph mode

When RS is set to an empty string, one or more consecutive empty lines are treated as the input record separator.

$ cat para.txt
hello world

hi there
how are you

just doing
believe it

banana
papaya
mango

much ado about nothing
he he he
adios amigo

# uninitialized variable 's' will be empty for the first match
# afterwards, 's' will provide the empty line separation
$ awk -v RS= '/do/{print s $0; s="\n"}' para.txt
just doing
believe it

much ado about nothing
he he he
adios amigo
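
In paragraph mode, newline also acts as a field separator in addition to FS. So, setting FS to newline makes each line of a paragraph a separate field. Here's a rough sketch with made-up input, which also shows that multiple consecutive empty lines count as a single separator:

```shell
# hypothetical input: three paragraphs, the last two separated by two empty lines
out=$(printf 'hello world\n\nhi there\nhow are you\n\n\njust doing\nbelieve it\n' |
      awk -v RS= -F'\n' -v OFS=: '{print NR, NF, $1}')
printf '%s\n' "$out"
# 1:1:hello world
# 2:2:hi there
# 3:2:just doing
```

The second column is the number of lines in each paragraph and the third column is its first line.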

Two file processing

For two files as input, NR==FNR will be true only when the first file is being processed. The next statement will skip the rest of the code for the current record.

$ cat marks.txt
dept    name    marks
ece     raj     53
ece     joel    72
eee     moi     68
cse     surya   81
eee     tia     59
ece     om      92
cse     amy     67

$ cat dept_mark.txt
ece 70
eee 65
cse 80

# match dept and minimum marks specified in dept_mark.txt
$ awk 'NR==FNR{d[$1]=$2; next}
       $1 in d && $3 >= d[$1]' dept_mark.txt marks.txt
ece     joel    72
eee     moi     68
cse     surya   81
ece     om      92

Warning: The NR==FNR logic will fail if the first file is empty, since NR wouldn't get a chance to increment and NR==FNR would then stay true while the second file is being processed. You can set a flag after the first file has been processed to avoid this issue. For example: awk '!f{a[$0]; next} !($0 in a)' file1 f=1 file2. See this unix.stackexchange thread for more workarounds.
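
Here's a sketch comparing NR==FNR with the flag-based workaround when the first file is empty. The temporary directory and the contents of file2 are assumptions made for this illustration.

```shell
# throwaway sample files: file1 is empty, file2 has two lines
tmp=$(mktemp -d)
: > "$tmp/file1"
printf 'tea\nsoap\n' > "$tmp/file2"

# NR==FNR stays true for file2, so every record is stored and skipped
nr_fnr=$(awk 'NR==FNR{a[$0]; next} !($0 in a)' "$tmp/file1" "$tmp/file2")
printf '[%s]\n' "$nr_fnr"
# []

# f=1 is assigned only after file1 has been read, so file2 is filtered as intended
flag=$(awk '!f{a[$0]; next} !($0 in a)' "$tmp/file1" f=1 "$tmp/file2")
printf '%s\n' "$flag"
# tea
# soap

rm -r "$tmp"
```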


Forcing string and numeric context

Strings are automatically converted to a number when used in an arithmetic expression (for example, "42" + 5). You can use the unary + and - operators to force numeric context. If the string doesn't start with a valid number (ignoring any leading whitespace), it will be treated as 0.

$ seq 3 | awk '{sum += $0} END{print sum}'
6
$ awk '{sum += $0} END{print sum}' /dev/null

$ awk '{sum += $0} END{print +sum}' /dev/null
0
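
To see how the conversion rule plays out for strings that aren't purely numeric, here's a small sketch. The sample strings are made up, and adding 0 is used as another common way to force numeric context.

```shell
out=$(awk 'BEGIN{
    print "42abc" + 0     # the leading number is used: 42
    print "  3.5kg" + 0   # leading whitespace is ignored: 3.5
    print "abc" + 0       # no leading number: 0
    print -"7up"          # unary minus also forces numeric context: -7
}')
printf '%s\n' "$out"
```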

Similarly, you can concatenate a string to a number to force string context.

$ awk 'BEGIN{n1="5.0"; n2=5; if(n1==n2) print "equal"}'
$ awk 'BEGIN{n1="5.0"; n2=5; if(n1==n2".0") print "equal"}'
equal

See gawk manual: How awk Converts Between Strings and Numbers for more details.

