comm
The comm command finds common and unique lines between two sorted files. These results are formatted as a table with three columns and one or more of these columns can be suppressed as required.
Three column output
Consider the sample input files as shown below:
# side by side view of the sample files
# note that these files are already sorted
$ paste colors_1.txt colors_2.txt
Blue Black
Brown Blue
Orange Green
Purple Orange
Red Pink
Teal Red
White White
By default, comm gives a tabular output with three columns:
- first column has lines unique to the first file
- second column has lines unique to the second file
- third column has lines common to both the files
The columns are separated by a tab character. Here's the output for the above sample files:
$ comm colors_1.txt colors_2.txt
	Black
		Blue
Brown
	Green
		Orange
	Pink
Purple
		Red
Teal
		White
You can change the column separator to a string of your choice using the --output-delimiter option. Here's an example:
# note that the input files need not have the same number of lines
$ comm <(seq 3) <(seq 2 5)
1
		2
		3
	4
	5
$ comm --output-delimiter=, <(seq 3) <(seq 2 5)
1
,,2
,,3
,4
,5
The collating order for comm should be the same as the one used to sort the input files. The --nocheck-order option can be used for unsorted inputs. However, as per the documentation, this option "is not guaranteed to produce any particular output."
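As a minimal sketch of keeping the collating order consistent, the example below sorts two unsorted files and runs comm under the same locale. The file names a.txt and b.txt and their contents are hypothetical, created only for this illustration, and LC_ALL=C is just one possible choice of locale.

```shell
# hypothetical sample files, created only for this illustration
tmp=$(mktemp -d)
printf '%s\n' banana Apple cherry > "$tmp/a.txt"
printf '%s\n' cherry apple Apple > "$tmp/b.txt"

# use the same locale for sorting and for comm
LC_ALL=C sort "$tmp/a.txt" > "$tmp/a_sorted.txt"
LC_ALL=C sort "$tmp/b.txt" > "$tmp/b_sorted.txt"

# -12 keeps only the lines common to both files
common=$(LC_ALL=C comm -12 "$tmp/a_sorted.txt" "$tmp/b_sorted.txt")
printf '%s\n' "$common"

# clean up the temporary directory
rm -r "$tmp"
```

With these inputs, the common lines should be Apple and cherry. Note that apple and Apple are different lines since the comparison is case sensitive.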
Suppressing columns
You can use one or more of the following options to suppress columns:
- -1 to suppress the lines unique to the first file
- -2 to suppress the lines unique to the second file
- -3 to suppress the lines common to both the files
Here's what the output looks like when you suppress one of the columns:
# suppress lines common to both the files
$ comm -3 colors_1.txt colors_2.txt
	Black
Brown
	Green
	Pink
Purple
Teal
Combining two of these options gives three useful solutions. -12 will give you only the common lines.
$ comm -12 colors_1.txt colors_2.txt
Blue
Orange
Red
White
-23 will give you the lines unique to the first file.
$ comm -23 colors_1.txt colors_2.txt
Brown
Purple
Teal
-13 will give you the lines unique to the second file.
$ comm -13 colors_1.txt colors_2.txt
Black
Green
Pink
You can combine all three options as well. This is useful with the --total option to get only the count of lines for each of the three columns.
$ comm --total -123 colors_1.txt colors_2.txt
3 3 4 total
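If you need these counts in a script, here's one possible sketch that splits the single output line into separate variables. It recreates the colors_1.txt and colors_2.txt sample files in a temporary directory so that it can run on its own. The variable names are chosen only for this illustration, and the word splitting relies on the default IFS of a POSIX style shell.

```shell
# recreate the sample files in a temporary directory
tmp=$(mktemp -d)
printf '%s\n' Blue Brown Orange Purple Red Teal White > "$tmp/colors_1.txt"
printf '%s\n' Black Blue Green Orange Pink Red White > "$tmp/colors_2.txt"

# with -123, --total prints one line: three counts followed by 'total'
counts=$(comm --total -123 "$tmp/colors_1.txt" "$tmp/colors_2.txt")

# unquoted expansion splits the tab separated fields into $1, $2, $3, $4
set -- $counts
only_first=$1
only_second=$2
common=$3
echo "unique to first: $only_first, unique to second: $only_second, common: $common"

# clean up the temporary directory
rm -r "$tmp"
```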
Duplicate lines
The number of duplicate lines in the common column will be the minimum of the duplicate occurrences between the two files. The rest of the duplicate lines, if any, will be considered unique to the file having the excess lines. Here's an example:
$ paste list_1.txt list_2.txt
apple cherry
banana cherry
cherry mango
cherry papaya
cherry
cherry
# 'cherry' occurs only twice in the second file
# rest of the 'cherry' lines will be unique to the first file
$ comm list_1.txt list_2.txt
apple
banana
		cherry
		cherry
cherry
cherry
	mango
	papaya
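If you'd rather treat the inputs as sets, so that excess duplicates aren't reported as unique lines, one option is to remove the duplicates before the comparison. Here's a minimal sketch using sort -u, with the sample files recreated in a temporary directory. The set_1.txt and set_2.txt names are hypothetical.

```shell
# recreate the sample files in a temporary directory
tmp=$(mktemp -d)
printf '%s\n' apple banana cherry cherry cherry cherry > "$tmp/list_1.txt"
printf '%s\n' cherry cherry mango papaya > "$tmp/list_2.txt"

# sort -u keeps a single copy of each line
sort -u "$tmp/list_1.txt" > "$tmp/set_1.txt"
sort -u "$tmp/list_2.txt" > "$tmp/set_2.txt"

# lines unique to the first file; 'cherry' is no longer part of this output
unique_first=$(comm -23 "$tmp/set_1.txt" "$tmp/set_2.txt")
printf '%s\n' "$unique_first"

# clean up the temporary directory
rm -r "$tmp"
```

With the duplicates removed, only apple and banana should be reported as unique to the first file.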
NUL separator
Use the -z option if you want to use the NUL character as the line separator. In this scenario, comm adds a final NUL character to the output even if it is not present in the input.
$ comm -z -12 <(printf 'a\0b\0c') <(printf 'a\0c\0x') | cat -v
a^@c^@
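The input still has to be sorted when NUL is the separator. As a rough sketch, assuming GNU sort is available, the -z option of sort can prepare such inputs. The f1 and f2 files below are hypothetical, and tr is used only to make the result readable.

```shell
# hypothetical NUL separated inputs, not sorted
tmp=$(mktemp -d)
printf 'c\0a\0b\0' > "$tmp/f1"
printf 'x\0c\0a\0' > "$tmp/f2"

# sort -z treats NUL as the line separator as well
sort -z "$tmp/f1" > "$tmp/f1_sorted"
sort -z "$tmp/f2" > "$tmp/f2_sorted"

# common items, with NUL converted to newline for display
common=$(comm -z -12 "$tmp/f1_sorted" "$tmp/f2_sorted" | tr '\0' '\n')
printf '%s\n' "$common"

# clean up the temporary directory
rm -r "$tmp"
```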
Alternatives
Here are some alternate commands you can explore if comm isn't enough to solve your task. These alternatives do not require the input files to be sorted.
- zet — set operations on one or more input files
- Comparing lines between files section from my GNU grep ebook
- Two file processing chapter from my GNU awk ebook, has examples for both line and field based comparisons
- Two file processing chapter from my Perl one-liners ebook, has examples for both line and field based comparisons
Exercises
The exercises directory has all the files used in this section.
1) Get the common lines between the s1.txt and s2.txt files. Assume that their contents are already sorted.
$ paste s1.txt s2.txt
apple banana
coffee coffee
fig eclair
honey fig
mango honey
pasta milk
sugar tea
tea yeast
##### add your solution here
coffee
fig
honey
tea
2) Display lines present in s1.txt but not s2.txt and vice versa.
# lines unique to the first file
##### add your solution here
apple
mango
pasta
sugar
# lines unique to the second file
##### add your solution here
banana
eclair
milk
yeast
3) Display lines unique to the s1.txt file and the common lines when compared to the s2.txt file. Use ==> to separate the output columns.
##### add your solution here
apple
==>coffee
==>fig
==>honey
mango
pasta
sugar
==>tea
4) What does the --total option do?
5) Will the comm command fail if there are repeated lines in the input files? If not, what would be the expected output for the command shown below?
$ cat s3.txt
apple
apple
guava
honey
tea
tea
tea
$ comm -23 s3.txt s1.txt