The comm command finds common and unique lines between two sorted files. The results are formatted as a table with three columns, and one or more of these columns can be suppressed as required.
Consider the sample input files as shown below:
    # side by side view of the sample files
    # note that these files are already sorted
    $ paste colors_1.txt colors_2.txt
    Blue    Black
    Brown   Blue
    Orange  Green
    Purple  Orange
    Red     Pink
    Teal    Red
    White   White
comm gives a tabular output with three columns:
- first column has lines unique to the first file
- second column has lines unique to the second file
- third column has lines common to both the files
The columns are separated by a tab character. Here's the output for the above sample files:
    $ comm colors_1.txt colors_2.txt
            Black
                    Blue
    Brown
            Green
                    Orange
            Pink
    Purple
                    Red
    Teal
                    White
You can change the column separator to a string of your choice using the --output-delimiter option. Here's an example:
    # note that the input files need not have the same number of lines
    $ comm <(seq 3) <(seq 2 5)
    1
                    2
                    3
            4
            5

    $ comm --output-delimiter=, <(seq 3) <(seq 2 5)
    1
    ,,2
    ,,3
    ,4
    ,5
The collating order for comm should be the same as the one used to sort the input files. The --nocheck-order option can be used for unsorted inputs. However, as per the documentation, this option "is not guaranteed to produce any particular output."
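
As a minimal sketch of the point about collating order, the function below sorts both inputs and runs comm under the same locale. The function name common_lines and the sample data are made up for illustration, and LC_ALL=C is just one possible choice; what matters is that sort and comm agree.

```shell
# common_lines FILE1 FILE2 (hypothetical helper)
# prints lines present in both files; inputs need not be sorted beforehand
# LC_ALL=C is an arbitrary choice, the key point is that
# sort and comm run with the same locale
common_lines() {
    workdir=$(mktemp -d) || return 1
    LC_ALL=C sort "$1" > "$workdir/1.sorted"
    LC_ALL=C sort "$2" > "$workdir/2.sorted"
    LC_ALL=C comm -12 "$workdir/1.sorted" "$workdir/2.sorted"
    rm -r "$workdir"
}

# made-up unsorted sample files in a temporary directory
demo=$(mktemp -d)
printf '%s\n' pear Apple fig > "$demo/a.txt"
printf '%s\n' fig apple Apple > "$demo/b.txt"
common_lines "$demo/a.txt" "$demo/b.txt"
rm -r "$demo"
```

With these inputs, the lines Apple and fig are reported as common. In the C locale, uppercase letters sort before lowercase ones, so results for mixed-case data may differ from what other locales would give.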
You can use one or more of the following options to suppress columns:
- -1 to suppress lines unique to the first file
- -2 to suppress lines unique to the second file
- -3 to suppress lines common to both the files
Here's what the output looks like when you suppress one of the columns:
    # suppress lines common to both the files
    $ comm -3 colors_1.txt colors_2.txt
            Black
    Brown
            Green
            Pink
    Purple
    Teal
Combining two of these options gives three useful solutions.
-12 will give you only the common lines.
    $ comm -12 colors_1.txt colors_2.txt
    Blue
    Orange
    Red
    White
-23 will give you the lines unique to the first file.
    $ comm -23 colors_1.txt colors_2.txt
    Brown
    Purple
    Teal
-13 will give you the lines unique to the second file.
    $ comm -13 colors_1.txt colors_2.txt
    Black
    Green
    Pink
You can combine all three options as well. This is useful with the --total option to get only the count of lines for each of the three columns.
    $ comm --total -123 colors_1.txt colors_2.txt
    3       3       4       total
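
If the counts are needed in a script, the summary line can be split into variables. This is only a sketch: the function name count_summary and the output labels are made up for illustration, and it relies on the --total line being whitespace separated with the word total as the last field, as in the output shown above.

```shell
# count_summary FILE1 FILE2 (hypothetical helper)
# prints the three counts from 'comm --total -123' with labels
# both inputs are assumed to be sorted already
count_summary() {
    comm --total -123 "$1" "$2" | {
        # fields are tab separated; the last one is the word 'total'
        read -r only_1 only_2 both _label
        echo "first=$only_1 second=$only_2 common=$both"
    }
}

# recreate the sample files from this section in a temporary directory
demo=$(mktemp -d)
printf '%s\n' Blue Brown Orange Purple Red Teal White > "$demo/colors_1.txt"
printf '%s\n' Black Blue Green Orange Pink Red White > "$demo/colors_2.txt"
count_summary "$demo/colors_1.txt" "$demo/colors_2.txt"
rm -r "$demo"
```

For the sample files, this prints first=3 second=3 common=4.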
The number of duplicate lines in the common column will be the minimum of the duplicate occurrences between the two files. The rest of the duplicate lines, if any, will be considered unique to the file having the excess lines. Here's an example:
    $ paste list_1.txt list_2.txt
    apple   cherry
    banana  cherry
    cherry  mango
    cherry  papaya
    cherry
    cherry

    # 'cherry' occurs only twice in the second file
    # rest of the 'cherry' lines will be unique to the first file
    $ comm list_1.txt list_2.txt
    apple
    banana
                    cherry
                    cherry
    cherry
    cherry
            mango
            papaya
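
If the duplicate counts are not of interest and you'd rather treat each file as a set, one option is to remove duplicates before the comparison. The sketch below uses sort -u for that, which is a separate preprocessing step and not something comm does itself. The function name compare_as_sets is made up for illustration, and the sample data is recreated from the listing above.

```shell
# compare_as_sets FILE1 FILE2 (hypothetical helper)
# removes duplicate lines from each input, then runs comm on the results
compare_as_sets() {
    workdir=$(mktemp -d) || return 1
    sort -u "$1" > "$workdir/1.uniq"
    sort -u "$2" > "$workdir/2.uniq"
    comm "$workdir/1.uniq" "$workdir/2.uniq"
    rm -r "$workdir"
}

# recreate list_1.txt and list_2.txt in a temporary directory
demo=$(mktemp -d)
printf '%s\n' apple banana cherry cherry cherry cherry > "$demo/list_1.txt"
printf '%s\n' cherry cherry mango papaya > "$demo/list_2.txt"
compare_as_sets "$demo/list_1.txt" "$demo/list_2.txt"
rm -r "$demo"
```

With duplicates removed, cherry shows up exactly once in the common column and no longer appears as unique to the first file.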
Use the -z option if you want to use the NUL character as the line separator. In this scenario, comm will ensure that a final NUL character is added even if it is not present in the input.
    $ comm -z -12 <(printf 'a\0b\0c') <(printf 'a\0c\0x') | cat -v
    a^@c^@
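
NUL separated inputs have to be sorted as well. Here's a rough sketch, assuming GNU sort is available so that its -z option can be paired with comm -z. The function name common_records and the sample data are made up for illustration.

```shell
# common_records FILE1 FILE2 (hypothetical helper)
# FILE1 and FILE2 hold NUL terminated records in any order
common_records() {
    workdir=$(mktemp -d) || return 1
    # sort -z treats NUL as the record separator, matching comm -z
    sort -z "$1" > "$workdir/1.sorted"
    sort -z "$2" > "$workdir/2.sorted"
    comm -z -12 "$workdir/1.sorted" "$workdir/2.sorted"
    rm -r "$workdir"
}

# made-up unsorted sample data
demo=$(mktemp -d)
printf 'c\0a\0b\0' > "$demo/1"
printf 'x\0c\0a\0' > "$demo/2"
# tr converts NUL to newline for display purposes only
common_records "$demo/1" "$demo/2" | tr '\0' '\n'
rm -r "$demo"
```

For this data, the common records are a and c.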
Here are some alternative commands you can explore if comm isn't enough to solve your task. These alternatives do not require the input files to be sorted.