The uniq command identifies duplicate lines, but only if they are adjacent to each other. There are various options to help you filter unique or duplicate lines, count them, group them, and so on.
Retaining a single copy of duplicates is the default behavior of the uniq command. If adjacent lines are identical, only the first copy will be displayed in the output.
# only the adjacent lines are compared to determine duplicates
# which is why you get 'red' twice in the output for this input
$ printf 'red\nred\nred\ngreen\nred\nblue\nblue' | uniq
red
green
red
blue
You'll need sorted input to make sure all the input lines are considered to determine duplicates. For some cases,
sort -u is enough, like the example shown below:
# same as sort -u for this case
$ printf 'red\nred\nred\ngreen\nred\nblue\nblue' | sort | uniq
blue
green
red
Sometimes though, you may need to sort based on some specific criteria and then identify duplicates based on the entire line contents. Here's an example:
# can't use sort -n -u here
$ printf '2 balls\n13 pens\n2 pins\n13 pens\n' | sort -n | uniq
2 balls
2 pins
13 pens
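To see why sort -n -u alone doesn't work for such cases: with -n, lines that compare numerically equal are treated as duplicates even if their full contents differ, so one of the entries would be lost. A quick sketch:

```shell
# '2 balls' and '2 pins' compare equal numerically,
# so only one of them survives
$ printf '2 balls\n13 pens\n2 pins\n13 pens\n' | sort -n -u
2 balls
13 pens
```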
sort+uniq won't be suitable if you need to preserve the input order as well. You can use alternatives like awk, perl and huniq for such cases.
# retain single copy of duplicates, maintain input order
$ printf 'red\nred\nred\ngreen\nred\nblue\nblue' | awk '!seen[$0]++'
red
green
blue
The -d option will display only the duplicate entries. That is, a line is shown only if it is seen more than once.
$ cat purchases.txt
coffee
tea
washing powder
coffee
toothpaste
tea
soap
tea

$ sort purchases.txt | uniq -d
coffee
tea
To display all the copies of duplicates, use the -D option.
$ sort purchases.txt | uniq -D
coffee
coffee
tea
tea
tea
The -u option will display only the unique entries. That is, a line is shown only if it doesn't occur more than once.
$ sort purchases.txt | uniq -u
soap
toothpaste
washing powder

# just a reminder that uniq works based on adjacent lines only
$ printf 'red\nred\nred\ngreen\nred\nblue\nblue' | uniq -u
green
red
The --group option allows you to visually separate groups of similar lines with an empty line. This option can accept four values — separate, prepend, append and both. The default is separate, which adds an empty line between the groups. prepend will add an empty line before the first group as well and append will add an empty line after the last group. As the names suggest, both combines the prepend and append behaviors.
$ sort purchases.txt | uniq --group
coffee
coffee

soap

tea
tea
tea

toothpaste

washing powder
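As a sketch of the both value with the same purchases.txt input, you'd also get an empty line before the first group and after the last one:

```shell
# note the empty lines at the start and the end of the output
$ sort purchases.txt | uniq --group=both

coffee
coffee

soap

tea
tea
tea

toothpaste

washing powder

```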
The --group option cannot be used with the -c, -d, -D or -u options. The --all-repeated alias for the -D option uses none as the default grouping. You can change that to prepend or separate if needed.
$ sort purchases.txt | uniq --all-repeated=prepend

coffee
coffee

tea
tea
tea
If you want to know how many times a line has been repeated, use the -c option. The count will be added as a prefix to each output line.
$ sort purchases.txt | uniq -c
      2 coffee
      1 soap
      3 tea
      1 toothpaste
      1 washing powder

$ sort purchases.txt | uniq -dc
      2 coffee
      3 tea
The output of this option is usually piped to
sort for ordering the output by the count value.
$ sort purchases.txt | uniq -c | sort -n
      1 soap
      1 toothpaste
      1 washing powder
      2 coffee
      3 tea

$ sort purchases.txt | uniq -c | sort -nr
      3 tea
      2 coffee
      1 washing powder
      1 toothpaste
      1 soap
You can use the -i option to ignore case while determining duplicates.
# depending on your locale, sort and sort -f can give the same results
$ printf 'cat\nbat\nCAT\ncar\nbat\nmat\nmoat' | sort -f | uniq -iD
bat
bat
cat
CAT
uniq has three options to change the matching criteria to partial parts of the input line. These aren't as powerful as sort's -k option, but they do come in handy for some use cases.
The -f option allows you to skip the first N fields. Field separation is based on one or more space/tab characters only. Note that these separators are still part of the field contents, so this option will not work as expected with a variable number of blanks.
# skip first field, works as expected since no. of blanks is consistent
$ printf '2 cars\n5 cars\n10 jeeps\n5 jeeps\n3 trucks\n' | uniq -f1 --group
2 cars
5 cars

10 jeeps
5 jeeps

3 trucks

# example with variable number of blanks
# 'cars' entries were identified as duplicates, but not 'jeeps'
$ printf '2 cars\n5 cars\n1 jeeps\n5  jeeps\n3 trucks\n' | uniq -f1
2 cars
1 jeeps
5  jeeps
3 trucks
The -s option allows you to skip the first N characters (counted as bytes).
# skip first character
$ printf '* red\n* green\n- green\n* blue\n= blue' | uniq -s1
* red
* green
* blue
The -w option restricts the comparison to the first N characters (counted as bytes).
# compare only first 2 characters
$ printf '1) apple\n1) almond\n2) banana\n3) cherry' | uniq -w2
1) apple
2) banana
3) cherry
When these options are used together, the priority is -f first, then -s and finally -w. Remember that blanks are part of the field content.
# skip first field
# then skip first two characters (including the blank character)
# use next two characters for comparison ('bl' and 'ch' in this example)
$ printf '2 @blue\n10 :black\n5 :cherry\n3 @chalk' | uniq -f1 -s2 -w2
2 @blue
5 :cherry
If a line doesn't have enough fields or characters to satisfy the -f and -s options respectively, a null string is used for comparison.
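For example, if you skip more characters than some of the lines contain, all of those lines compare as the null string and collapse into a single entry:

```shell
# none of these lines have characters left after skipping the first three
# so they all compare equal and only the first one is displayed
$ printf 'abc\na\nb\n' | uniq -s3
abc
```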
uniq can accept a filename as the source of input contents, but only one file at most. If you specify a second file, it will be used as the output file.
$ printf 'apple\napple\nbanana\ncherry\ncherry\ncherry' > ip.txt

$ uniq ip.txt op.txt
$ cat op.txt
apple
banana
cherry
You can use the -z option if you want to use the NUL character as the line separator. In this scenario, uniq will add a final NUL character even if it isn't present in the input.
$ printf 'cherry\0cherry\0cherry\0apple\0banana' | uniq -z | cat -v
cherry^@apple^@banana^@
If grouping is specified, NUL will be used as the separator instead of the newline character.
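For example, combining -z with --group, the separator between groups is a NUL character as well (made visible here via cat -v):

```shell
# the two 'a' lines form one group, 'b' forms another
# the extra ^@ between them is the NUL group separator
$ printf 'a\0a\0b' | uniq -z --group | cat -v
a^@a^@^@b^@
```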
Here are some alternative commands you can explore if uniq isn't enough to solve your task.
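For instance, here's an awk sketch that counts duplicates like uniq -c but preserves the first-seen input order and doesn't require sorted input (the cnt and order array names are arbitrary):

```shell
# remember each distinct line's first-seen position, then print counts
$ printf 'tea\ncoffee\ntea\ntea\ncoffee\n' |
  awk '{if(!($0 in cnt)) order[++n]=$0; cnt[$0]++}
       END{for(i=1;i<=n;i++) print cnt[order[i]], order[i]}'
3 tea
2 coffee
```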