CLI tip 17: common and unique lines
Consider these sample input files that are already sorted and the default output from comm:
$ paste colors_1.txt colors_2.txt
Blue Black
Brown Blue
Orange Green
Purple Orange
Red Pink
Teal Red
White White
$ comm colors_1.txt colors_2.txt
	Black
		Blue
Brown
	Green
		Orange
	Pink
Purple
		Red
Teal
		White
The following comm options will help you construct solutions to get common and unique lines:
-1  suppress lines unique to the first file
-2  suppress lines unique to the second file
-3  suppress lines common to both the files
# common lines
$ comm -12 colors_1.txt colors_2.txt
Blue
Orange
Red
White
# lines unique to colors_2.txt
$ comm -13 colors_1.txt colors_2.txt
Black
Green
Pink
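The remaining combination, lines unique to the first file, follows the same pattern: suppress the second and third columns. Here is a minimal sketch that recreates the sample files in a scratch directory first (the mktemp step and the unique_1 variable are only for illustration):

```shell
# recreate the sample files in a scratch directory (illustrative setup)
tmpdir=$(mktemp -d)
cd "$tmpdir"
printf '%s\n' Blue Brown Orange Purple Red Teal White > colors_1.txt
printf '%s\n' Black Blue Green Orange Pink Red White > colors_2.txt

# lines unique to colors_1.txt: suppress columns 2 and 3
unique_1=$(comm -23 colors_1.txt colors_2.txt)
printf '%s\n' "$unique_1"
# Brown
# Purple
# Teal
```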
If the input files are not already sorted, or if you want to preserve the order of input lines, you can use awk instead:
# common lines
$ awk 'NR==FNR{a[$0]; next} $0 in a' colors_1.txt colors_2.txt
Blue
Orange
Red
White
# lines unique to colors_2.txt
$ awk 'NR==FNR{a[$0]; next} !($0 in a)' colors_1.txt colors_2.txt
Black
Green
Pink
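To see how the awk one-liners behave on unsorted input, here is a small sketch. NR==FNR is true only while the first file is being read, so a[$0] records each of its lines as an array key and next skips the rest of the script; the second pattern then runs only for the other file. The file names unsorted_1.txt and unsorted_2.txt and their contents are made up for this illustration. Swapping the file order gives the lines unique to the first file.

```shell
# hypothetical unsorted inputs, created in a scratch directory
tmpdir=$(mktemp -d)
cd "$tmpdir"
printf '%s\n' Teal Blue White Brown > unsorted_1.txt
printf '%s\n' White Pink Blue Black > unsorted_2.txt

# common lines, in the order they appear in the second file
common=$(awk 'NR==FNR{a[$0]; next} $0 in a' unsorted_1.txt unsorted_2.txt)
printf '%s\n' "$common"
# White
# Blue

# lines unique to unsorted_1.txt: pass it as the second argument
unique_1=$(awk 'NR==FNR{a[$0]; next} !($0 in a)' unsorted_2.txt unsorted_1.txt)
printf '%s\n' "$unique_1"
# Teal
# Brown
```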
You can also use grep -Fxf colors_1.txt colors_2.txt (add -v for unique lines), but this wouldn't scale well for larger input files.
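For small inputs, the grep approach might look like the sketch below. -f reads the patterns from colors_1.txt, -F treats them as fixed strings rather than regular expressions, and -x requires a whole-line match. The scratch directory and variable names are only for illustration.

```shell
# recreate the sample files in a scratch directory (illustrative setup)
tmpdir=$(mktemp -d)
cd "$tmpdir"
printf '%s\n' Blue Brown Orange Purple Red Teal White > colors_1.txt
printf '%s\n' Black Blue Green Orange Pink Red White > colors_2.txt

# common lines: lines of colors_2.txt that exactly match a line of colors_1.txt
common=$(grep -Fxf colors_1.txt colors_2.txt)
printf '%s\n' "$common"
# Blue
# Orange
# Red
# White

# lines unique to colors_2.txt: invert the match with -v
unique_2=$(grep -vFxf colors_1.txt colors_2.txt)
printf '%s\n' "$unique_2"
# Black
# Green
# Pink
```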
See also my Linux Command Line Computing ebook.