uniq

The uniq command identifies similar lines that are adjacent to each other. There are various options to help you filter unique or duplicate lines, count them, group them, etc.

Retain single copy of duplicates

This is the default behavior of the uniq command. If adjacent lines are the same, only the first copy will be displayed in the output.

# only the adjacent lines are compared to determine duplicates
# which is why you get 'red' twice in the output for this input
$ printf 'red\nred\nred\ngreen\nred\nblue\nblue' | uniq
red
green
red
blue

You'll need sorted input to make sure all the input lines are considered to determine duplicates. For some cases, sort -u is enough, like the example shown below:

# same as sort -u for this case
$ printf 'red\nred\nred\ngreen\nred\nblue\nblue' | sort | uniq
blue
green
red

Sometimes though, you may need to sort based on some specific criteria and then identify duplicates based on the entire line contents. Here's an example:

# can't use sort -n -u here
$ printf '2 balls\n13 pens\n2 pins\n13 pens\n' | sort -n | uniq
2 balls
2 pins
13 pens
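
As a rough illustration of why sort -n -u isn't a substitute here, the sketch below runs both pipelines on the same input. With -n, the -u option treats lines with equal leading numbers as duplicates, so only one of the lines starting with 2 survives. The variable names are made up for this example.

```shell
input='2 balls
13 pens
2 pins
13 pens'

# -u applies the numeric comparison, so '2 balls' and '2 pins' count as equal
numeric_unique=$(printf '%s\n' "$input" | sort -n -u)

# uniq compares entire lines, so both entries starting with 2 are kept
whole_line_unique=$(printf '%s\n' "$input" | sort -n | uniq)

printf '%s\n' "$numeric_unique"
echo '---'
printf '%s\n' "$whole_line_unique"
```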

The sort+uniq combination won't be suitable if you need to preserve the input order as well. You can use alternatives like awk, perl and huniq for such cases.

# retain single copy of duplicates, maintain input order
$ printf 'red\nred\nred\ngreen\nred\nblue\nblue' | awk '!seen[$0]++'
red
green
blue
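
If the awk one-liner looks cryptic, here is a longhand sketch of the same idea. It assumes nothing beyond standard awk behavior: an array element that hasn't been set yet compares equal to 0. The variable name ordered_unique is only for illustration.

```shell
# longhand form of awk '!seen[$0]++'
# seen[$0] is 0 the first time a line appears, so that line gets printed;
# the counter is then incremented, which suppresses any later copies
ordered_unique=$(printf 'red\nred\nred\ngreen\nred\nblue\nblue\n' |
    awk '{ if (seen[$0] == 0) print; seen[$0]++ }')

printf '%s\n' "$ordered_unique"
```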

Duplicates only

The -d option will display only the duplicate entries. That is, a line is shown (once) only if it occurs more than once in adjacent positions.

$ cat purchases.txt
coffee
tea
washing powder
coffee
toothpaste
tea
soap
tea

$ sort purchases.txt | uniq -d
coffee
tea

To display all the copies of duplicates, use the -D option.

$ sort purchases.txt | uniq -D
coffee
coffee
tea
tea
tea

Unique only

The -u option will display only the unique entries. That is, only if a line doesn't occur more than once.

$ sort purchases.txt | uniq -u
soap
toothpaste
washing powder

# just a reminder that uniq works based on adjacent lines only
$ printf 'red\nred\nred\ngreen\nred\nblue\nblue' | uniq -u
green
red

Grouping similar lines

The --group option allows you to visually separate groups of similar lines with an empty line. This option can accept four values: separate, prepend, append and both. The default is separate, which adds an empty line between the groups. prepend will add an empty line before the first group as well and append will add an empty line after the last group. both combines the prepend and append behavior.

$ sort purchases.txt | uniq --group
coffee
coffee

soap

tea
tea
tea

toothpaste

washing powder
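
Here's a small sketch of the both value, assuming GNU uniq with --group support. Since empty lines at the start and end of the output are easy to miss, sed is used here to replace them with a visible marker. The marker text and the variable name are arbitrary choices for this example.

```shell
# input already has the duplicates adjacent to each other
# sed marks each empty separator line so that the ones
# before the first group and after the last group are visible
grouped_both=$(printf 'red\nred\ngreen\nblue\nblue\n' |
    uniq --group=both | sed 's/^$/(empty line)/')

printf '%s\n' "$grouped_both"
```

With this input, a marker is expected before each of the three groups and once more after the last group.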

The --group option cannot be used with the -c, -d, -D or -u options. The --all-repeated long form of the -D option uses none as the default grouping. You can change that to separate or prepend.

$ sort purchases.txt | uniq --all-repeated=prepend

coffee
coffee

tea
tea
tea

Prefix count

If you want to know how many times a line has been repeated, use the -c option. The count will be added as a prefix to each output line.

$ sort purchases.txt | uniq -c
      2 coffee
      1 soap
      3 tea
      1 toothpaste
      1 washing powder

$ sort purchases.txt | uniq -dc
      2 coffee
      3 tea

The output of this option is usually piped to sort for ordering the output by the count value.

$ sort purchases.txt | uniq -c | sort -n
      1 soap
      1 toothpaste
      1 washing powder
      2 coffee
      3 tea

$ sort purchases.txt | uniq -c | sort -nr
      3 tea
      2 coffee
      1 washing powder
      1 toothpaste
      1 soap
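
Building on the same idea, the sketch below keeps only the most frequent entries by adding head to the pipeline. The input is stored in a shell variable (a name chosen just for this example) instead of purchases.txt so that the snippet is self-contained.

```shell
purchases='coffee
tea
washing powder
coffee
toothpaste
tea
soap
tea'

# highest counts first, then keep only the top two entries
top_two=$(printf '%s\n' "$purchases" | sort | uniq -c | sort -nr | head -n2)

printf '%s\n' "$top_two"
```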

Ignoring case

Use the -i option to ignore case while determining duplicates.

# depending on your locale, sort and sort -f can give the same results
$ printf 'cat\nbat\nCAT\ncar\nbat\nmat\nmoat' | sort -f | uniq -iD
bat
bat
cat
CAT
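
As a minimal sketch of the default behavior combined with -i, the example below uses input where the duplicates are already adjacent. Only the first line of the group is expected in the output, with its original case intact. The variable name is illustrative.

```shell
# all three spellings are adjacent and differ only by case,
# so -i treats them as one group and the first line is the one retained
first_kept=$(printf 'Cat\ncat\nCAT\n' | uniq -i)

printf '%s\n' "$first_kept"
```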

Partial match

uniq has three options to change the matching criteria to partial parts of the input line. These aren't as powerful as the sort -k option, but they do come in handy for some use cases.

The -f option allows you to skip the first N fields. Field separation is based on one or more space/tab characters only. Note that these separators will still be part of the field contents, so fields that differ only in the number of blanks before them won't be treated as duplicates.

# skip first field, works as expected since no. of blanks is consistent
$ printf '2 cars\n5 cars\n10 jeeps\n5 jeeps\n3 trucks\n' | uniq -f1 --group
2 cars
5 cars

10 jeeps
5 jeeps

3 trucks

# example with variable number of blanks
# 'cars' entries were identified as duplicates, but not 'jeeps'
$ printf '2 cars\n5 cars\n1 jeeps\n5  jeeps\n3 trucks\n' | uniq -f1
2 cars
1 jeeps
5  jeeps
3 trucks
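
One possible workaround, assuming you don't need to preserve the original spacing, is to squeeze repeated blanks before passing the input to uniq. This sketch uses tr -s for that purpose and handles only space characters, not tabs.

```shell
# tr -s ' ' squeezes runs of spaces into a single space, so the blanks
# that precede each field are consistent by the time uniq compares them
squeezed=$(printf '2 cars\n5 cars\n1 jeeps\n5  jeeps\n3 trucks\n' |
    tr -s ' ' | uniq -f1)

printf '%s\n' "$squeezed"
```

Both 'cars' and 'jeeps' entries should be identified as duplicates with this change. Keep in mind that any line in the output will have the squeezed spacing.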

The -s option allows you to skip the first N characters (calculated as bytes).

# skip first character
$ printf '* red\n* green\n- green\n* blue\n= blue' | uniq -s1
* red
* green
* blue

The -w option restricts the comparison to the first N characters (calculated as bytes).

# compare only first 2 characters
$ printf '1) apple\n1) almond\n2) banana\n3) cherry' | uniq -w2
1) apple
2) banana
3) cherry

When these options are used simultaneously, the priority is -f first, then -s and finally -w option. Remember that blanks are part of the field content.

# skip first field
# then skip first two characters (including the blank character)
# use next two characters for comparison ('bl' and 'ch' in this example)
$ printf '2 @blue\n10 :black\n5 :cherry\n3 @chalk' | uniq -f1 -s2 -w2
2 @blue
5 :cherry

If a line doesn't have enough fields or characters to satisfy the -f and -s options respectively, a null string is used for comparison.
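
Here's a small sketch of the null string behavior, using made-up input where the first two lines have only one field each.

```shell
# 'a' and 'b' have no second field, so both compare as empty strings
# and are treated as duplicates of each other
# 'c d' has a second field, so it is compared using ' d'
short_lines=$(printf 'a\nb\nc d\n' | uniq -f1)

printf '%s\n' "$short_lines"
```

Only the first of the two single-field lines should be retained here, followed by the 'c d' line.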

Specifying output file

uniq can accept a filename as the source of input, but at most one file. If you specify a second file, it will be used as the output file.

$ printf 'apple\napple\nbanana\ncherry\ncherry\ncherry' > ip.txt
$ uniq ip.txt op.txt

$ cat op.txt
apple
banana
cherry
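
If the input comes from a pipeline and you still want uniq to write the output file, you can pass - as the input argument to refer to standard input. The sketch below assumes mktemp is available and writes to a temporary directory (the workdir and saved names are only for illustration) so that no existing file is overwritten.

```shell
# temporary directory to hold the output file
workdir=$(mktemp -d)

# '-' stands for stdin, the second argument is the output file
printf 'apple\napple\nbanana\n' | uniq - "$workdir/op.txt"

saved=$(cat "$workdir/op.txt")
printf '%s\n' "$saved"

# clean up
rm -r "$workdir"
```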

NUL separator

Use the -z option if you want to use the NUL character as the line separator. In this scenario, uniq will add a final NUL character even if it isn't present in the input.

$ printf 'cherry\0cherry\0cherry\0apple\0banana' | uniq -z | cat -v
cherry^@apple^@banana^@

If grouping is specified, NUL will be used as the separator instead of the newline character.
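
A quick sketch of grouping with NUL separated input is shown below, assuming GNU uniq. The NUL characters are converted to newlines with tr at the end, purely so that the group separator shows up as an empty line when displayed.

```shell
# with -z, the separator added between groups is a NUL character
# tr converts every NUL to a newline for display purposes
grouped=$(printf 'a\0a\0b' | uniq -z --group | tr '\0' '\n')

printf '%s\n' "$grouped"
```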

Alternatives

Here are some alternative commands you can explore if uniq isn't enough to solve your task.