Comparing Files

In this chapter, you'll learn how to find and report differences between the contents of two files.

The example_files directory has the sample input files used in this chapter.

cmp

The cmp command compares two files byte by byte, so it works for both text and binary files. If the two input files have the same content, no output is displayed and the exit status is 0. If there is a difference, it reports the byte offset and line number of the first difference, and the exit status will be 1.

$ mkdir practice_cmp
$ cd practice_cmp
$ echo 'hello' > x1.txt
$ cp x{1,2}.txt
$ echo 'hello.' > x3.txt

# files with the same content
$ cmp x1.txt x2.txt
$ echo $?
0

# files with differences
$ cmp x1.txt x3.txt
x1.txt x3.txt differ: byte 6, line 1
$ echo $?
1

Use the -s option to suppress the output when you just need the exit status. The -i option will allow you to skip initial bytes from the input.
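
As a minimal sketch of both options, assuming GNU cmp and a scratch directory created with mktemp (the file names a.txt, b.txt, p.txt and q.txt are made up for illustration):

```shell
# work in a throwaway directory so that no existing files are touched
cd "$(mktemp -d)" || exit 1

printf 'hello\n'  > a.txt
printf 'hello.\n' > b.txt

# -s prints nothing, only the exit status conveys the result
if cmp -s a.txt b.txt; then
    result='same'
else
    result='different'
fi
echo "$result"

# -i 4 skips the first four bytes of both files before comparing
printf 'v1: data\n' > p.txt
printf 'v2: data\n' > q.txt
cmp -i 4 p.txt q.txt
skip_status=$?
echo "$skip_status"
```

Since -s keeps the command silent, it fits well inside conditional expressions in scripts, where only the success or failure of the comparison matters.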

diff

The diff command is useful to find differences between text files. All the differences are printed, which might not be desirable for long files.

Common options

Commonly used options are shown below. Examples will be discussed in the later sections.

-i ignore case
-w ignore all whitespace
-b ignore changes in the amount of whitespace
-B ignore changes where all the lines are blank
-E ignore changes due to tab expansion
-Z ignore trailing whitespace at the end of lines
-y two column output
-r recursively compare files between the two directories specified
-s report when two files are identical
-q report if files differ, not the details of differences
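
Here is a short sketch of the -s, -q and -r options, assuming GNU diff. The d1 and d2 directories and the files inside them are hypothetical and created only for this illustration.

```shell
# work in a throwaway directory and keep the messages in English
cd "$(mktemp -d)" || exit 1
export LC_ALL=C

mkdir d1 d2
printf 'apple\n'  > d1/a.txt
printf 'banana\n' > d2/a.txt
printf 'extra\n'  > d1/b.txt
printf 'same\n'   > d1/c.txt
printf 'same\n'   > d2/c.txt

# -s reports identical files instead of staying silent
same_msg=$(diff -s d1/c.txt d2/c.txt)
echo "$same_msg"

# -q only reports that the files differ, without the details
differ_msg=$(diff -q d1/a.txt d2/a.txt)
echo "$differ_msg"

# -r compares the two directories, -q keeps the report brief
tree_msg=$(diff -rq d1 d2)
echo "$tree_msg"
```

In the recursive report, files present in only one of the directories are listed with an "Only in" message, while identical files such as c.txt are not mentioned.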

Default diff

By default, the diff output shows lines from the first input file prefixed with < and lines from the second file prefixed with >. A line containing --- is used as the group separator. Each difference is prefixed by a command such as 2c2 that indicates the kind of change (a for append, c for change, d for delete) along with the affected line numbers. These commands are understood by tools like patch.

# change to the 'example_files/text_files' directory
# side-by-side view of sample input files
$ paste f1.txt f2.txt
1       1
2       hello
3       3
world   4

$ diff f1.txt f2.txt
2c2
< 2
---
> hello
4c4
< world
---
> 4

$ diff <(seq 4) <(seq 5)
4a5
> 5
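
The examples above show the change and append commands. The sketch below covers the delete command and also records the exit status of diff, which is 0 when the inputs are the same, 1 when they differ and 2 when there is trouble (a missing file, for example). The file names are made up for this illustration.

```shell
# work in a throwaway directory
cd "$(mktemp -d)" || exit 1
printf '1\n2\n3\n4\n5\n' > five.txt
printf '1\n2\n3\n4\n'    > four.txt

# line 5 exists only in the first file, so a delete command is reported
delete_out=$(diff five.txt four.txt)
delete_status=$?
echo "$delete_out"
echo "$delete_status"

# same content: no output
diff four.txt four.txt
same_status=$?
echo "$same_status"

# trouble: the second file does not exist
diff four.txt no_such_file.txt 2> /dev/null
trouble_status=$?
echo "$trouble_status"
```

The first comparison prints 5d4 followed by < 5, meaning line 5 of the first file has to be deleted to match the second file.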

Ignoring whitespaces

There are several options to ignore specific whitespace characters during comparison. Here are some examples:

# ignore changes in the amount of whitespace
$ diff -b <(echo 'good day') <(echo 'good    day')
$ echo $?
0

# ignore all whitespaces
$ diff -w <(echo 'hi    there ') <(echo ' hi there')
$ echo $?
0
$ diff -w <(echo 'hi    there ') <(echo 'hithere')
$ echo $?
0
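
Blank lines and trailing whitespace can be ignored in a similar way. This is a minimal sketch assuming GNU diff, where -B is the short form of --ignore-blank-lines and -Z is the short form of --ignore-trailing-space. The file names are made up for this illustration.

```shell
# work in a throwaway directory
cd "$(mktemp -d)" || exit 1

# the second file has an extra blank line
printf 'a\nb\n'   > tight.txt
printf 'a\n\nb\n' > spaced.txt
diff -B tight.txt spaced.txt
blank_status=$?
echo "$blank_status"

# the first file has a space character at the end of the line
printf 'hi \n' > trail.txt
printf 'hi\n'  > clean.txt
diff -Z trail.txt clean.txt
trail_status=$?
echo "$trail_status"

# without -Z, the trailing space counts as a difference
diff trail.txt clean.txt > /dev/null
plain_status=$?
echo "$plain_status"
```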

Side-by-side output

The -y option is handy to view the differences side-by-side. By default, all the input lines will be present in the output and the line width is 130 print columns. You can use the -W option to change the width when dealing with short input lines. The --suppress-common-lines option helps to focus only on the differences.

$ diff -y f1.txt f2.txt
1                                                               1
2                                                             | hello
3                                                               3
world                                                         | 4

$ diff -W 60 --suppress-common-lines -y f1.txt f2.txt
2                            |  hello
world                        |  4

Exercises

Use the example_files/text_files directory for input files used in the following exercises.

1) Which cmp option would you use if you just need the exit status reflecting whether the given inputs are the same or not?

2) Which cmp option would you use to skip the initial bytes for comparison purposes? The below example requires you to skip the first two bytes.

$ echo '1) apple' > x1.txt
$ echo '2. apple' > x2.txt
$ cmp x1.txt x2.txt
x1.txt x2.txt differ: byte 1, line 1

$ cmp # ???
$ echo $?
0

$ rm x[12].txt

3) What does the diff -d option do?

4) Which option will help you get colored output with diff?

5) Use appropriate options to get the desired output shown below.

# instead of this output
$ diff -W 40 --suppress-common-lines -y f1.txt f2.txt
2                  |    hello
world              |    4

# get this output
$ diff # ???
1                  (
2                  |    hello
3                  (
world              |    4

6) Use appropriate options to get the desired output shown below.

$ echo 'hello' > d1.txt
$ echo 'Hello' > d2.txt

# instead of this output
$ diff d1.txt d2.txt
1c1
< hello
---
> Hello

# get this output
$ diff # ???
Files d1.txt and d2.txt are identical

$ rm d[12].txt

Linux Command Line Computing