comm

The comm command finds the common and unique lines between two sorted files. The results are formatted as a table with three columns, and one or more of these columns can be suppressed as required.

Three column output

Consider the sample input files as shown below:

# side by side view of the sample files
# note that these files are already sorted
$ paste colors_1.txt colors_2.txt
Blue    Black
Brown   Blue
Orange  Green
Purple  Orange
Red     Pink
Teal    Red
White   White

By default, comm gives a tabular output with three columns:

first column has lines unique to the first file
second column has lines unique to the second file
third column has lines common to both the files

The columns are separated by a tab character. Here's the output for the above sample files:

$ comm colors_1.txt colors_2.txt
        Black
                Blue
Brown
        Green
                Orange
        Pink
Purple
                Red
Teal
                White
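
If you want to confirm where the tab separators are, one option is to pipe the output through cat -T, a GNU cat option that shows each tab as ^I. Here's a minimal sketch that uses throwaway files created with printf (the file names are made up for this illustration and aren't part of the sample files):

```shell
# hypothetical temporary files, created only for this illustration
dir=$(mktemp -d)
printf '%s\n' a b c > "$dir/f1.txt"
printf '%s\n' b c d > "$dir/f2.txt"

# cat -T displays each tab character as ^I
# lines in the second column get one leading tab, third column gets two
out=$(comm "$dir/f1.txt" "$dir/f2.txt" | cat -T)
printf '%s\n' "$out"

# clean up the temporary directory
rm -r "$dir"
```

With these inputs, 'a' has no leading tab, 'b' and 'c' are preceded by ^I^I and 'd' is preceded by a single ^I.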

You can change the column separator to a string of your choice using the --output-delimiter option. Here's an example:

# note that the input files need not have the same number of lines
$ comm <(seq 3) <(seq 2 5)
1
                2
                3
        4
        5

$ comm --output-delimiter=, <(seq 3) <(seq 2 5)
1
,,2
,,3
,4
,5

The collating order used by comm should be the same as the one used to sort the input files.

The --nocheck-order option can be used for unsorted inputs. However, as per the documentation, this option "is not guaranteed to produce any particular output."
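
For unsorted inputs, a safer route is usually to sort them first, using the same locale for both sort and comm. Here's a minimal sketch that assumes the C locale is acceptable for your data (the file names and contents are made up for this illustration):

```shell
# hypothetical unsorted inputs, created only for this illustration
dir=$(mktemp -d)
printf '%s\n' teal Blue red > "$dir/unsorted_1.txt"
printf '%s\n' red Green Blue > "$dir/unsorted_2.txt"

# sort both inputs with the same collating order (C locale here)
LC_ALL=C sort "$dir/unsorted_1.txt" > "$dir/sorted_1.txt"
LC_ALL=C sort "$dir/unsorted_2.txt" > "$dir/sorted_2.txt"

# run comm under the same locale that was used for sorting
common=$(LC_ALL=C comm -12 "$dir/sorted_1.txt" "$dir/sorted_2.txt")
printf '%s\n' "$common"

# clean up the temporary directory
rm -r "$dir"
```

In the C locale, uppercase letters sort before lowercase ones, so the common lines come out as 'Blue' followed by 'red'.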

Suppressing columns

You can use one or more of the following options to suppress columns:

-1 to suppress the lines unique to the first file
-2 to suppress the lines unique to the second file
-3 to suppress the lines common to both the files

Here's what the output looks like when you suppress one of the columns:

# suppress lines common to both the files
$ comm -3 colors_1.txt colors_2.txt
        Black
Brown
        Green
        Pink
Purple
Teal

Combining two of these options gives three useful solutions. -12 will give you only the common lines.

$ comm -12 colors_1.txt colors_2.txt
Blue
Orange
Red
White

-23 will give you the lines unique to the first file.

$ comm -23 colors_1.txt colors_2.txt
Brown
Purple
Teal

-13 will give you the lines unique to the second file.

$ comm -13 colors_1.txt colors_2.txt
Black
Green
Pink
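
If you use these three combinations often, you could wrap them in small shell functions. This is only a sketch: the function names below are hypothetical, and both arguments are assumed to be already sorted files.

```shell
# hypothetical helper names; both arguments must be sorted files
common_lines()   { comm -12 "$1" "$2"; }
only_in_first()  { comm -23 "$1" "$2"; }
only_in_second() { comm -13 "$1" "$2"; }

# throwaway sample files, created only for this illustration
dir=$(mktemp -d)
seq 1 4 > "$dir/a.txt"
seq 3 6 > "$dir/b.txt"

both=$(common_lines "$dir/a.txt" "$dir/b.txt")
first=$(only_in_first "$dir/a.txt" "$dir/b.txt")
second=$(only_in_second "$dir/a.txt" "$dir/b.txt")
printf 'common:\n%s\nonly in first:\n%s\nonly in second:\n%s\n' \
    "$both" "$first" "$second"

# clean up the temporary directory
rm -r "$dir"
```

Here, 3 and 4 are common, 1 and 2 are unique to the first file, and 5 and 6 are unique to the second file.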

You can combine all three options as well. This is useful with the --total option to get only the count of lines for each of the three columns.

$ comm --total -123 colors_1.txt colors_2.txt
3       3       4       total
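
Since the summary line is tab separated, the individual counts can be picked out with cut for use in a script. Here's a minimal sketch with made-up inputs (the variable and file names are assumptions for this illustration):

```shell
# throwaway sample files, created only for this illustration
dir=$(mktemp -d)
printf '%s\n' a b c > "$dir/f1.txt"
printf '%s\n' b c d e > "$dir/f2.txt"

# -123 suppresses all the columns, leaving only the summary line
counts=$(comm --total -123 "$dir/f1.txt" "$dir/f2.txt")

# fields are separated by tabs, which is the default delimiter for cut
only_first=$(printf '%s\n' "$counts" | cut -f1)
only_second=$(printf '%s\n' "$counts" | cut -f2)
in_both=$(printf '%s\n' "$counts" | cut -f3)
printf 'first: %s, second: %s, both: %s\n' \
    "$only_first" "$only_second" "$in_both"

# clean up the temporary directory
rm -r "$dir"
```

For these inputs, one line is unique to the first file, two are unique to the second file and two are common.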

The number of duplicate lines in the common column will be the minimum of the duplicate occurrences in the two files. The remaining duplicate lines, if any, will be treated as unique to the file that has the excess lines. Here's an example:

$ paste list_1.txt list_2.txt
apple   cherry
banana  cherry
cherry  mango
cherry  papaya
cherry  
cherry  

# 'cherry' occurs only twice in the second file
# rest of the 'cherry' lines will be unique to the first file
$ comm list_1.txt list_2.txt
apple
banana
                cherry
                cherry
cherry
cherry
        mango
        papaya

NUL separator

Use the -z option if you want to use the NUL character as the line separator. In this scenario, comm adds a final NUL character to the output even if it is not present in the input.

$ comm -z -12 <(printf 'a\0b\0c') <(printf 'a\0c\0x') | cat -v
a^@c^@
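
One place where NUL separation can help is comparing file names, since names may contain spaces or even newline characters. The sketch below assumes GNU find, sort and comm, and uses made-up directory and file names. The final tr is only there to make the result readable on a terminal.

```shell
# hypothetical directories and files, created only for this illustration
dir=$(mktemp -d)
mkdir "$dir/d1" "$dir/d2"
touch "$dir/d1/a.txt" "$dir/d1/b c.txt"
touch "$dir/d2/b c.txt" "$dir/d2/z.txt"

# build NUL separated, sorted lists of file names relative to each directory
(cd "$dir/d1" && find . -type f -print0 | LC_ALL=C sort -z) > "$dir/list_1"
(cd "$dir/d2" && find . -type f -print0 | LC_ALL=C sort -z) > "$dir/list_2"

# names present in both directories, converted to newlines for display
common=$(LC_ALL=C comm -z -12 "$dir/list_1" "$dir/list_2" | tr '\0' '\n')
printf '%s\n' "$common"

# clean up the temporary directory
rm -r "$dir"
```

Only './b c.txt' is present in both directories here.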

Alternatives

Here are some alternative commands you can explore if comm isn't enough to solve your task. These alternatives do not require the input files to be sorted.

zet — set operations on one or more input files
Comparing lines between files section from my GNU grep ebook
Two file processing chapter from my GNU awk ebook, which has examples for both line and field based comparisons
Two file processing chapter from my Perl one-liners ebook, which has examples for both line and field based comparisons

Exercises

The exercises directory has all the files used in this section.

1) Get the common lines between the s1.txt and s2.txt files. Assume that their contents are already sorted.

$ paste s1.txt s2.txt
apple   banana
coffee  coffee
fig     eclair
honey   fig
mango   honey
pasta   milk
sugar   tea
tea     yeast

##### add your solution here
coffee
fig
honey
tea

2) Display lines present in s1.txt but not s2.txt and vice versa.

# lines unique to the first file
##### add your solution here
apple
mango
pasta
sugar

# lines unique to the second file
##### add your solution here
banana
eclair
milk
yeast

3) Display lines unique to the s1.txt file and the common lines when compared to the s2.txt file. Use ==> to separate the output columns.

##### add your solution here
apple
==>coffee
==>fig
==>honey
mango
pasta
sugar
==>tea

4) What does the --total option do?

5) Will the comm command fail if there are repeated lines in the input files? If not, what'd be the expected output for the command shown below?

$ cat s3.txt
apple
apple
guava
honey
tea
tea
tea

$ comm -23 s3.txt s1.txt

CLI text processing with GNU Coreutils