CLI tip 32: text processing between two files with GNU awk
`awk` is handy for comparing records and fields between two or more files. The key features used in the solution below:
- For two files as input, `NR==FNR` will be true only when the first file is being processed
- `next` will skip the rest of the script and fetch the next record
- `a[$0]` by itself is a valid statement. It will create an uninitialized element in array `a` with `$0` as the key (assuming the key doesn't exist yet)
- `$0 in a` checks if the given string (`$0` here) exists as a key in the array `a`
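To see why `NR==FNR` identifies the first file, here is a minimal sketch that prints both counters for every record. The files `f1.txt` and `f2.txt` and the `tmp` and `nr_fnr` variables are hypothetical names made up for this demonstration.

```shell
# Hypothetical sample files, created in a temporary directory for this demo
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/f1.txt"
printf 'x\ny\nz\n' > "$tmp/f2.txt"

# NR keeps counting across files, FNR restarts at 1 for every file
nr_fnr=$(awk '{print NR, FNR, (NR==FNR ? "first" : "second"), $0}' "$tmp/f1.txt" "$tmp/f2.txt")
printf '%s\n' "$nr_fnr"
# 1 1 first a
# 2 2 first b
# 3 1 second x
# 4 2 second y
# 5 3 second z
```

The two counters agree only while the first file is being read, which is what the `NR==FNR{...; next}` idiom relies on.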
$ cat colors_1.txt
teal
light blue
green
yellow
$ cat colors_2.txt
light blue
black
dark green
yellow
# common lines
$ awk 'NR==FNR{a[$0]; next} $0 in a' colors_1.txt colors_2.txt
light blue
yellow
# lines from colors_2.txt not present in colors_1.txt
$ awk 'NR==FNR{a[$0]; next} !($0 in a)' colors_1.txt colors_2.txt
black
dark green
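The file that builds the array is always the one listed first, so swapping the arguments reverses the comparison. As a small sketch, the same sample data is recreated in a temporary directory (the `tmp` and `only_in_1` variables are assumptions for this demo) to get the lines from `colors_1.txt` not present in `colors_2.txt`.

```shell
# Recreate the sample files from above in a temporary directory
tmp=$(mktemp -d)
printf 'teal\nlight blue\ngreen\nyellow\n' > "$tmp/colors_1.txt"
printf 'light blue\nblack\ndark green\nyellow\n' > "$tmp/colors_2.txt"

# colors_2.txt is now read first, so its lines become the keys of array a
only_in_1=$(awk 'NR==FNR{a[$0]; next} !($0 in a)' "$tmp/colors_2.txt" "$tmp/colors_1.txt")
printf '%s\n' "$only_in_1"
# teal
# green
```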
Note that the `NR==FNR` logic will fail if the first file is empty, since `NR` wouldn't get a chance to increment. As a result, `NR==FNR` stays true while the second file is being read and its lines are treated as if they belonged to the first file. You can set a flag after the first file has been processed to avoid this issue. See this unix.stackexchange thread for more workarounds.
# no output
$ awk 'NR==FNR{a[$0]; next} !($0 in a)' /dev/null <(seq 2)
# gives the expected output
$ awk '!f{a[$0]; next} !($0 in a)' /dev/null f=1 <(seq 2)
1
2
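Another way to handle an empty first file is to compare `FILENAME` against `ARGV[1]` instead of relying on the record counters. This is only a sketch of one possible workaround, not necessarily one from the linked thread. The files `empty.txt` and `nums.txt` and the `tmp` and `result` variables are hypothetical. It assumes the first argument is a file name (not a variable assignment) and that the same file isn't passed twice.

```shell
# Hypothetical inputs: an empty first file and a two-line second file
tmp=$(mktemp -d)
: > "$tmp/empty.txt"
printf '1\n2\n' > "$tmp/nums.txt"

# FILENAME==ARGV[1] is true only for records read from the first file,
# so an empty first file simply never triggers the first block
result=$(awk 'FILENAME==ARGV[1]{a[$0]; next} !($0 in a)' "$tmp/empty.txt" "$tmp/nums.txt")
printf '%s\n' "$result"
# 1
# 2
```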
Here's an example of comparing specific fields instead of whole lines. When you use a `,` separator between strings to construct the array key, the value of `SUBSEP` is inserted between them. This special variable has a default value of the non-printing character `\034` which is usually not used as part of text files.
$ cat marks.txt
Dept Name Marks
ECE Raj 53
ECE Joel 72
EEE Moi 68
CSE Surya 81
EEE Tia 59
ECE Om 92
CSE Amy 67
$ cat dept_name.txt
EEE Moi
CSE Amy
ECE Raj
$ awk 'NR==FNR{a[$1,$2]; next} ($1,$2) in a' dept_name.txt marks.txt
ECE Raj 53
EEE Moi 68
CSE Amy 67
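To see what the comma-separated key looks like internally, here is a minimal sketch that stores one element and then inspects its key. The array contents and the `subsep_demo` variable are made up for illustration.

```shell
subsep_demo=$(awk 'BEGIN {
    a["ECE", "Raj"]                      # stored under the key "ECE" SUBSEP "Raj"
    for (k in a) {
        n = split(k, parts, SUBSEP)      # recover the two pieces of the key
        print n, parts[1], parts[2]
    }
    # building the key by hand finds the same element
    found = (("ECE" SUBSEP "Raj") in a)
    print (found ? "same key" : "different key")
}')
printf '%s\n' "$subsep_demo"
# 2 ECE Raj
# same key
```

So `($1,$2) in a` is a lookup of a single string key with `SUBSEP` joining the two fields.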
See also my CLI text processing with GNU awk ebook.