Multiple file input
You have already seen blocks like BEGIN, END and statements like next. This chapter will discuss features that are useful for making decisions around each file when multiple files are passed as input.
The example_files directory has all the files used in the examples.
BEGINFILE, ENDFILE and FILENAME
BEGINFILE: this block gets executed before the start of each input file
ENDFILE: this block gets executed after processing each input file
FILENAME: special variable having the filename of the current input file
Here are some examples:
# can also use: awk 'BEGINFILE{printf "--- %s ---\n", FILENAME} 1'
$ awk 'BEGINFILE{print "--- " FILENAME " ---"} 1' greeting.txt table.txt
--- greeting.txt ---
Hi there
Have a nice day
Good bye
--- table.txt ---
brown bread mat hair 42
blue cake mug shirt -7
yellow banana window shoes 3.14
# same as: tail -q -n1 greeting.txt table.txt
$ awk 'ENDFILE{print $0}' greeting.txt table.txt
Good bye
yellow banana window shoes 3.14
nextfile
The nextfile statement helps to skip the remaining records from the current file being processed and move on to the next file. Note that the ENDFILE block will still be executed, if present.
# print filename if it contains 'I' anywhere in the file
# same as: grep -l 'I' f[1-3].txt greeting.txt
$ awk '/I/{print FILENAME; nextfile}' f[1-3].txt greeting.txt
f1.txt
f2.txt
# print filename if it contains both 'o' and 'at' anywhere in the file
$ awk 'BEGINFILE{m1=m2=0} /o/{m1=1} /at/{m2=1}
m1 && m2{print FILENAME; nextfile}' f[1-3].txt greeting.txt
f2.txt
f3.txt
# print filename if it contains 'at' but not 'o'
$ awk 'BEGINFILE{m1=m2=0} /o/{m1=1; nextfile} /at/{m2=1}
ENDFILE{if(!m1 && m2) print FILENAME}' f[1-3].txt greeting.txt
f1.txt
nextfile cannot be used in the BEGIN, END or ENDFILE blocks. See gawk manual: nextfile for more details, how it affects ENDFILE and other special cases.
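As a further illustration of nextfile, here is a minimal sketch that prints only the first record of each input file (similar in spirit to head -q -n1). The temporary directory, the file names a.txt and b.txt, and the first_lines variable are made up for this sketch and are not part of the example_files directory.

```shell
# create two small sample files in a temporary directory (hypothetical names)
dir=$(mktemp -d)
printf 'apple\nbanana\n' > "$dir/a.txt"
printf 'cherry\nfig\n' > "$dir/b.txt"

# print the first record of a file, then skip the rest of that file
first_lines=$(awk '{print; nextfile}' "$dir/a.txt" "$dir/b.txt")
printf '%s\n' "$first_lines"

# clean up the temporary files
rm -r "$dir"
```

Since nextfile runs right after the first print, the remaining records of each file are never read.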
ARGC and ARGV
The ARGC special variable contains the total number of arguments passed to the awk command, including awk itself as an argument. The ARGV special array contains the arguments themselves.
# note that the index starts with '0' here
$ awk 'BEGIN{for(i=0; i<ARGC; i++) print ARGV[i]}' f[1-3].txt greeting.txt
awk
f1.txt
f2.txt
f3.txt
greeting.txt
Similar to manipulating NF and modifying $N field contents, you can change the values of ARGC and ARGV to control how the arguments should be processed.
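One way to see this in action is to blank out an ARGV element in the BEGIN block, since awk skips arguments that are empty strings. The sketch below assumes two throwaway files (a.txt and b.txt, created only for this illustration) and stores the result in a hypothetical skipped variable.

```shell
# create two one-line sample files in a temporary directory (hypothetical names)
dir=$(mktemp -d)
printf 'one\n' > "$dir/a.txt"
printf 'two\n' > "$dir/b.txt"

# ARGV[1] holds the path of a.txt; setting it to an empty string
# in BEGIN means that argument is not read as an input file
skipped=$(awk 'BEGIN{ARGV[1]=""} {print}' "$dir/a.txt" "$dir/b.txt")
printf '%s\n' "$skipped"

# clean up the temporary files
rm -r "$dir"
```

Only the contents of b.txt are printed, because the first file argument was removed before any input was read.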
However, not all arguments are necessarily filenames. awk allows assigning variable values without the -v option if it is done in the place where you usually provide file arguments. For example:
$ awk 'BEGIN{for(i=0; i<ARGC; i++) print ARGV[i]}' table.txt n=5 greeting.txt
awk
table.txt
n=5
greeting.txt
In the above example, the variable n will get a value of 5 after awk has finished processing the table.txt file. Here's an example where FS is changed between two files.
$ cat table.txt
brown bread mat hair 42
blue cake mug shirt -7
yellow banana window shoes 3.14
$ cat books.csv
Harry Potter,Mistborn,To Kill a Mocking Bird
Matilda,Castle Hangnail,Jane Eyre
# for table.txt, FS will be the default value
# for books.csv, FS will be the comma character
# OFS is comma for both the files
$ awk -v OFS=, 'NF=2' table.txt FS=, books.csv
brown,bread
blue,cake
yellow,banana
Harry Potter,Mistborn
Matilda,Castle Hangnail
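The timing of such assignments can be checked with a small sketch like the one below, which prints the value of n for every record. The files a.txt and b.txt and the timing variable are assumptions made for this illustration only.

```shell
# create two one-line sample files in a temporary directory (hypothetical names)
dir=$(mktemp -d)
printf 'x\n' > "$dir/a.txt"
printf 'y\n' > "$dir/b.txt"

# n=5 is placed between the two file arguments, so it is assigned
# only after a.txt has been processed and before b.txt is read
timing=$(cd "$dir" && awk '{print FILENAME ": n=[" n "]"}' a.txt n=5 b.txt)
printf '%s\n' "$timing"

# clean up the temporary files
rm -r "$dir"
```

The record from a.txt shows an empty value inside the brackets, whereas the record from b.txt shows 5.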
See stackoverflow: extract positions 2-7 from a fasta sequence for a practical example of changing field/record separators between the files being processed.
Summary
This chapter introduced a few more special blocks and variables that are handy for processing multiple file inputs. These will show up in examples in the coming chapters as well.
The next chapter will discuss use cases where you need to make decisions based on multiple input records.
Exercises
The exercises directory has all the files used in this section.
1) Print the last field of the first two lines for the input files table.txt, scores.csv and fw.txt. The field separators for these files are space, comma and fixed width respectively. To make the output more informative, print filenames and a separator as shown in the output below. Assume that the input files will have at least two lines.
$ awk ##### add your solution here
>table.txt<
42
-7
----------
>scores.csv<
Chemistry
99
----------
>fw.txt<
0.134563
6
----------
2) For the input files sample.txt, secrets.txt, addr.txt and table.txt, display only the names of files that contain at or fun in the third field. Assume space as the field separator.
$ awk ##### add your solution here sample.txt secrets.txt addr.txt table.txt
secrets.txt
addr.txt
table.txt