comm

The comm command finds common and unique lines between two sorted files. The results are formatted as a table with three columns, and you can suppress one or more of these columns as required.

Three-column output

Consider the sample input files as shown below:

# side by side view of the sample files
# note that these files are already sorted
$ paste colors_1.txt colors_2.txt
Blue    Black
Brown   Blue
Orange  Green
Purple  Orange
Red     Pink
Teal    Red
White   White
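To follow along, you can recreate the two sample files from the paste output above. This minimal sketch writes them to the current directory. It then uses sort -c to confirm that both files are sorted. The sort -c check runs under the C locale, which is an assumption; any locale that orders these words the same way would work.

```shell
#!/bin/sh
# Recreate the sample files shown above in the current directory
printf '%s\n' Blue Brown Orange Purple Red Teal White > colors_1.txt
printf '%s\n' Black Blue Green Orange Pink Red White > colors_2.txt

# sort -c exits with a non-zero status if a file is not sorted
# (the C locale is an assumption made for this check)
LC_ALL=C sort -c colors_1.txt && LC_ALL=C sort -c colors_2.txt && echo 'both files are sorted'
```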

By default, comm produces tabular output with three columns:

  • first column has lines unique to the first file
  • second column has lines unique to the second file
  • third column has lines common to both files

Columns are separated by tab characters: lines in the second column are prefixed with one tab, and lines in the third column with two. Here's the output for the sample files above:

$ comm colors_1.txt colors_2.txt
        Black
                Blue
Brown
        Green
                Orange
        Pink
Purple
                Red
Teal
                White
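The number of tabs before a line tells you its column: none for the first, one for the second, two for the third. So awk can recover the column by splitting each line on tabs. This sketch recreates the sample files in a temporary directory. It assumes the input lines themselves contain no tab characters.

```shell
#!/bin/sh
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' Blue Brown Orange Purple Red Teal White > "$tmp/colors_1.txt"
printf '%s\n' Black Blue Green Orange Pink Red White > "$tmp/colors_2.txt"

# With a tab as the field separator, the number of fields (NF) equals the
# column number, and the last field ($NF) is the line itself.
# Assumption: the input lines contain no tabs of their own.
out=$(comm "$tmp/colors_1.txt" "$tmp/colors_2.txt" |
      awk -F'\t' '{ print "column " NF ": " $NF }')
printf '%s\n' "$out"
```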

You can change the column separator to a string of your choice using the --output-delimiter option. Here's an example:

# note that the input files need not have the same number of lines
$ comm <(seq 3) <(seq 2 5)
1
                2
                3
        4
        5

$ comm --output-delimiter=, <(seq 3) <(seq 2 5)
1
,,2
,,3
,4
,5
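The delimiter can be longer than a single character. This sketch uses ' | ' as the separator. It writes the seq-style inputs to temporary files with printf, instead of using process substitution, so it also runs under a plain POSIX shell. The file names are made up for the example.

```shell
#!/bin/sh
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' 1 2 3 > "$tmp/a.txt"
printf '%s\n' 2 3 4 5 > "$tmp/b.txt"

# A line in the second column is preceded by one copy of the delimiter,
# a line in the third column by two copies.
out=$(comm --output-delimiter=' | ' "$tmp/a.txt" "$tmp/b.txt")
printf '%s\n' "$out"
```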

info The collating order used by comm should be the same as the one used to sort the input files. Both sort and comm take this order from the current locale (the LC_COLLATE setting).

info The --nocheck-order option can be used for unsorted inputs. However, as per the documentation, this option "is not guaranteed to produce any particular output."
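One way to make sure both tools use the same collating order is to run sort and comm under the same locale. This sketch uses the C locale, an arbitrary choice for the example, and sample file names invented for illustration. It also uses GNU comm's --check-order option, which makes comm exit with an error if an input is not sorted.

```shell
#!/bin/sh
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' banana Apple cherry > "$tmp/a.txt"
printf '%s\n' cherry banana date > "$tmp/b.txt"

# Sort and compare under the same locale (C here, an arbitrary choice)
LC_ALL=C sort "$tmp/a.txt" > "$tmp/a.sorted"
LC_ALL=C sort "$tmp/b.txt" > "$tmp/b.sorted"
out=$(LC_ALL=C comm --check-order -12 "$tmp/a.sorted" "$tmp/b.sorted")
printf '%s\n' "$out"

# Passing the unsorted file instead makes --check-order report an error
if ! LC_ALL=C comm --check-order -12 "$tmp/a.txt" "$tmp/b.sorted" >/dev/null 2>&1; then
    echo 'unsorted input detected'
fi
```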

Suppressing columns

You can use one or more of the following options to suppress columns:

  • -1 to suppress lines unique to the first file
  • -2 to suppress lines unique to the second file
  • -3 to suppress lines common to both files

Here's what the output looks like when you suppress one of the columns:

# suppress lines common to both the files
$ comm -3 colors_1.txt colors_2.txt
        Black
Brown
        Green
        Pink
Purple
Teal

Combining two of these options gives three useful results. -12 will give you only the common lines.

$ comm -12 colors_1.txt colors_2.txt
Blue
Orange
Red
White

-23 will give you the lines unique to the first file.

$ comm -23 colors_1.txt colors_2.txt
Brown
Purple
Teal

-13 will give you the lines unique to the second file.

$ comm -13 colors_1.txt colors_2.txt
Black
Green
Pink
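These three combinations behave like set operations. The helper functions below sort both inputs under the C locale before calling comm, so the files passed to them don't need to be sorted. The names common_lines, only_in_first and only_in_second are hypothetical, chosen for this sketch.

```shell
#!/bin/sh
# Hypothetical helpers for unsorted inputs: usage is helper FILE1 FILE2.
# sh has no 'local', so _tmp and _status are ordinary global variables.
_comm_sorted() {  # $1 = comm options, $2 and $3 = input files
    _tmp=$(mktemp) || return 1
    LC_ALL=C sort "$2" > "$_tmp"
    # '-' makes comm read the sorted second file from standard input
    LC_ALL=C sort "$3" | LC_ALL=C comm "$1" "$_tmp" -
    _status=$?
    rm -f "$_tmp"
    return "$_status"
}
common_lines()   { _comm_sorted -12 "$1" "$2"; }
only_in_first()  { _comm_sorted -23 "$1" "$2"; }
only_in_second() { _comm_sorted -13 "$1" "$2"; }

# Demo with two small unsorted files
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' Teal Blue Red > "$tmp/x.txt"
printf '%s\n' Red Pink Blue > "$tmp/y.txt"
common_lines "$tmp/x.txt" "$tmp/y.txt"     # Blue and Red
only_in_first "$tmp/x.txt" "$tmp/y.txt"    # Teal
only_in_second "$tmp/x.txt" "$tmp/y.txt"   # Pink
```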

You can also combine all three options. This is useful with the --total option to get only the count of lines for each of the three columns.

$ comm --total -123 colors_1.txt colors_2.txt
3       3       4       total
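A script can read the totals line directly. This minimal sketch assumes GNU comm, which provides --total, and recreates the sample files in a temporary directory. The variable names are illustrative.

```shell
#!/bin/sh
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' Blue Brown Orange Purple Red Teal White > "$tmp/colors_1.txt"
printf '%s\n' Black Blue Green Orange Pink Red White > "$tmp/colors_2.txt"

# The totals line holds: unique-to-first, unique-to-second, common, "total".
# The unquoted substitution is split on whitespace into $1, $2, $3, $4.
set -- $(comm --total -123 "$tmp/colors_1.txt" "$tmp/colors_2.txt")
only1=$1 only2=$2 both=$3
echo "only in first: $only1, only in second: $only2, common: $both"
```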

Duplicate lines

The number of duplicate lines in the common column will be the minimum of the number of occurrences in the two files. The rest of the duplicate lines, if any, will be considered unique to the file having the excess lines. Here's an example:

$ paste list_1.txt list_2.txt
apple   cherry
banana  cherry
cherry  mango
cherry  papaya
cherry  
cherry  

# 'cherry' occurs only twice in the second file
# rest of the 'cherry' lines will be unique to the first file
$ comm list_1.txt list_2.txt
apple
banana
                cherry
                cherry
cherry
cherry
        mango
        papaya
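You can check this rule by counting occurrences. This sketch recreates the two list files from the paste output in a temporary directory. The sort -u step at the end goes beyond the example above: it shows one way to get set-like behaviour, with each line counted once, when duplicates aren't wanted.

```shell
#!/bin/sh
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' apple banana cherry cherry cherry cherry > "$tmp/list_1.txt"
printf '%s\n' cherry cherry mango papaya > "$tmp/list_2.txt"

# 'cherry' occurs 4 and 2 times, so min(4, 2) = 2 copies are common
# and the remaining 4 - 2 = 2 copies are unique to the first file.
common=$(comm -12 "$tmp/list_1.txt" "$tmp/list_2.txt" | grep -c '^cherry$')
extra=$(comm -23 "$tmp/list_1.txt" "$tmp/list_2.txt" | grep -c '^cherry$')
echo "common: $common, unique to first: $extra"

# For set-like behaviour, remove duplicates first with sort -u
LC_ALL=C sort -u "$tmp/list_1.txt" > "$tmp/u1.txt"
LC_ALL=C sort -u "$tmp/list_2.txt" > "$tmp/u2.txt"
LC_ALL=C comm -12 "$tmp/u1.txt" "$tmp/u2.txt"
```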

NUL separator

Use the -z option if you want to use the NUL character as the line separator. In this case, comm adds a final NUL character to the output even if the input doesn't end with one.

$ comm -z -12 <(printf 'a\0b\0c') <(printf 'a\0c\0x') | cat -v
a^@c^@
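NUL separators matter most when records can themselves contain newlines, as file names can. This sketch finds file names present in two directories, including a name with an embedded newline. It assumes find supports -print0 and sort supports -z; both options are available in the GNU and BSD versions. The directory and file names are made up for the example.

```shell
#!/bin/sh
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
mkdir "$tmp/d1" "$tmp/d2"
nl_name=$(printf 'new\nline')     # a file name containing a newline
touch "$tmp/d1/a.txt" "$tmp/d1/$nl_name" "$tmp/d2/$nl_name" "$tmp/d2/d.txt"

# List each directory's files as NUL-terminated records, sorted the same way
(cd "$tmp/d1" && find . -type f -print0) | LC_ALL=C sort -z > "$tmp/l1"
(cd "$tmp/d2" && find . -type f -print0) | LC_ALL=C sort -z > "$tmp/l2"

# Count the NUL-terminated records common to both lists
count=$(LC_ALL=C comm -z -12 "$tmp/l1" "$tmp/l2" | tr -cd '\0' | wc -c | tr -d ' ')
echo "files present in both directories: $count"
```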

Alternatives

Here are some alternative commands you can explore if comm isn't enough for your task. These alternatives do not require the input files to be sorted.