Frequently used options

This chapter will cover many of the options provided by GNU grep. Regular expressions will be covered from the next chapter onwards, so the examples in this chapter will use only literal strings for input patterns. Literal or fixed string matching means exact string comparison is intended; no character has any special meaning.

info Files used in examples are available chapter-wise from the learn_gnugrep_ripgrep repo. The directory for this chapter is freq_options.

By default, grep prints all input lines that match the given search patterns. The newline character \n is considered as the line separator. This section will show you how to filter lines matching a given search string using grep.

$ # sample input file for this section
$ cat programming_quotes.txt
Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it by Brian W. Kernighan

Some people, when confronted with a problem, think - I know, I will
use regular expressions. Now they have two problems by Jamie Zawinski

A language that does not affect the way you think about programming,
is not worth knowing by Alan Perlis

There are 2 hard problems in computer science: cache invalidation,
naming things, and off-by-1 errors by Leon Bambrick

To filter the required lines, invoke the grep command, pass the search string and then specify one or more filenames to be searched. As a good practice, always use single quotes around the search string. Examples requiring shell interpretation will be discussed later.

$ grep 'twice' programming_quotes.txt
Debugging is twice as hard as writing the code in the first place.

$ grep 'e th' programming_quotes.txt
Therefore, if you write the code as cleverly as possible, you are,
A language that does not affect the way you think about programming,

If the filename is - or is omitted, grep performs the search on stdin data.

$ printf 'avocado\nmango\nguava' | grep 'v'
avocado
guava

warning If your input file uses some other line ending format, such as \r\n (carriage return and newline characters), convert it to Unix style before processing. See stackoverflow: Why does my tool output overwrite itself and how do I fix it? for a detailed discussion and mitigation methods. Make sure to remember this point; it'll come up in the exercises.

$ # Unix and DOS style line endings
$ printf '42\n' | file -
/dev/stdin: ASCII text
$ printf '42\r\n' | file -
/dev/stdin: ASCII text, with CRLF line terminators
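A minimal sketch of the problem and one common fix, using made-up sample data and assuming tr is available (sed 's/\r$//' or dos2unix are alternatives). With CRLF endings, an "empty" line actually contains a lone \r and every other line ends with one, so whole-line matches fail until the carriage returns are deleted.

```shell
# made-up sample with DOS style (CRLF) line endings; the middle line looks empty
printf 'apple\r\n\r\nbanana\r\n' | grep -cx ''                   # → 0

# deleting every carriage return with tr makes the empty line truly empty
printf 'apple\r\n\r\nbanana\r\n' | tr -d '\r' | grep -cx ''       # → 1

# whole-line matching suffers from the same problem
printf 'apple\r\n\r\nbanana\r\n' | grep -x 'apple'                # no output
printf 'apple\r\n\r\nbanana\r\n' | tr -d '\r' | grep -x 'apple'   # → apple
```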

The search string (pattern) is treated as a Basic Regular Expression (BRE) by default, but regular expressions are a topic for the next chapter. For now, use the -F option to indicate that the patterns should be matched literally. As a performance optimization, GNU grep automatically tries to perform a literal search, even if the -F option is not used, depending on the nature of the search string.

$ # oops, why did it not match?
$ echo 'int a[5]' | grep 'a[5]'
$ # where did that error come from??
$ echo 'int a[5]' | grep 'a['
grep: Invalid regular expression
$ # what is going on???
$ echo 'int a[5]' | grep 'a[5'
grep: Unmatched [, [^, [:, [., or [=

$ # use -F option to match strings literally
$ echo 'int a[5]' | grep -F 'a[5]'
int a[5]
$ # fgrep is equivalent to grep -F, but GNU grep 3.8 and later warn that it is obsolescent
$ echo 'int a[5]' | fgrep 'a[5]'
int a[5]
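The . character is another place where the two modes differ, sketched below with made-up input: as a BRE metacharacter it matches any single character, whereas -F treats it as a literal dot.

```shell
# in a BRE, '.' matches any character, so both lines are printed
printf 'a.b\naxb\n' | grep 'a.b'
# → a.b
# → axb

# with -F, '.' is just a dot, so only the first line matches
printf 'a.b\naxb\n' | grep -F 'a.b'
# → a.b
```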

Sometimes, you don't know if the log file has error or Error or ERROR and so on. In such cases, you can use the -i option to ignore case.

$ grep -i 'jam' programming_quotes.txt
use regular expressions. Now they have two problems by Jamie Zawinski

$ printf 'Cat\ncOnCaT\nscatter\ncut' | grep -i 'cat'
Cat
cOnCaT
scatter

Invert matching lines

Use the -v option to get lines that do not match the given search string.

$ seq 5 | grep -v '3'
1
2
4
5

$ printf 'goal\nrate\neat\npit' | grep -v 'at'
goal
pit

info Text processing often involves negating a logic to arrive at a solution or to make it simpler. Look out for opposite pairs like -l/-L and -h/-H, negative logic in regular expressions, etc. in the coming sections.

Line number and count

The -n option prefixes each output line with its line number and a colon character. This is useful to quickly locate the matching lines for further editing.

$ grep -n 'not' programming_quotes.txt
3:by definition, not smart enough to debug it by Brian W. Kernighan
8:A language that does not affect the way you think about programming,
9:is not worth knowing by Alan Perlis

$ printf 'great\nnumber\numpteen' | grep -n 'r'
1:great
2:number

Counting the total number of matching lines comes up often. Somehow, piping grep output to the wc command is prevalent instead of simply using the -c option.

$ # number of lines matching the pattern
$ grep -c 'in' programming_quotes.txt
8

$ # number of lines NOT matching the pattern
$ printf 'goal\nrate\neat\npit' | grep -vc 'g'
3
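A quick sanity check on made-up input, showing that -c reports the same number as piping to wc -l, without needing the extra wc process in the pipeline:

```shell
# three made-up lines, two of which contain 'cat'
printf 'cat\ndog\ncatalog\n' | grep 'cat' | wc -l   # → 2
printf 'cat\ndog\ncatalog\n' | grep -c 'cat'        # → 2
```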

With multiple input files, the count is displayed for each file separately. Use cat to combine the inputs if you need a single total count.

$ # here - is used to specify stdin as a file to be searched
$ seq 15 | grep -c '1' programming_quotes.txt -
programming_quotes.txt:1
(standard input):7

$ # useful application of cat command
$ cat <(seq 15) programming_quotes.txt | grep -c '1'
8
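The same behavior with two small files is sketched below; the scratch directory and the file names f1.txt and f2.txt are made up for illustration.

```shell
# scratch directory with two made-up files
cd "$(mktemp -d)" || exit 1
printf 'apple\nbanana\n' > f1.txt
printf 'grape\napricot\nplum\n' > f2.txt

# one count per input file
grep -c 'ap' f1.txt f2.txt
# → f1.txt:1
# → f2.txt:2

# concatenate the inputs first to get a single combined count
cat f1.txt f2.txt | grep -c 'ap'
# → 3
```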

warning The output given by -c is the total number of lines matching the given patterns, not the total number of matches. Use the -o option and pipe the output to wc -l to get the total number of matches (example shown later).

Limiting output lines

Sometimes there are too many results, in which case you could pipe the output to a pager like less, or use the -m option to limit how many matching lines are displayed for each input file. grep stops processing an input file as soon as the limit specified by -m is reached. Just like the -c option, -m works by line count, not by number of matches.

$ grep -m3 'in' programming_quotes.txt
Debugging is twice as hard as writing the code in the first place.
by definition, not smart enough to debug it by Brian W. Kernighan
Some people, when confronted with a problem, think - I know, I will

$ seq 1000 | grep -m4 '2'
2
12
20
21
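Because grep stops reading as soon as the -m limit is reached, it works even on endless input. The sketch below assumes the yes command (from coreutils), which prints a line repeatedly until the pipe is closed; it also shows that -m caps the count reported by -c.

```shell
# yes prints 'hello' forever; grep exits after two matching lines,
# which closes the pipe and stops yes as well
yes 'hello' | grep -m2 'hello'
# → hello
# → hello

# with -c, the reported count never exceeds the -m limit
yes | grep -c -m3 'y'
# → 3
```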

Multiple search strings

The -e option can be used to specify multiple search strings from the command line. This is similar to conditional OR boolean logic.

$ # search for '1' or 'two'
$ grep -e '1' -e 'two' programming_quotes.txt
use regular expressions. Now they have two problems by Jamie Zawinski
naming things, and off-by-1 errors by Leon Bambrick

If there are a lot of search strings, save them in a file, one search string per line, and make sure there are no empty lines (an empty pattern matches every line). Use the -f option to specify a file as the source of search strings. You can use this option multiple times and also add more patterns from the command line using the -e option.

$ printf 'two\n1\n' > search_strings.txt
$ cat search_strings.txt
two
1

$ grep -f search_strings.txt programming_quotes.txt
use regular expressions. Now they have two problems by Jamie Zawinski
naming things, and off-by-1 errors by Leon Bambrick

$ grep -f search_strings.txt -e 'twice' programming_quotes.txt
Debugging is twice as hard as writing the code in the first place.
use regular expressions. Now they have two problems by Jamie Zawinski
naming things, and off-by-1 errors by Leon Bambrick
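The advice about empty lines matters because an empty pattern matches every line. A minimal sketch with a made-up patterns.txt file in a scratch directory:

```shell
cd "$(mktemp -d)" || exit 1

# pattern file with an accidental empty line at the end
printf 'two\n\n' > patterns.txt

# the empty pattern matches every line, so all three lines are printed
printf 'one\ntwo\nthree\n' | grep -f patterns.txt
# → one
# → two
# → three

# without the empty line, only the intended match remains
printf 'two\n' > patterns.txt
printf 'one\ntwo\nthree\n' | grep -f patterns.txt
# → two
```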

To find lines matching all of the search strings, you'd need to resort to regular expressions (covered later) or work around it by using shell pipes. This is similar to conditional AND boolean logic.

$ # match lines containing both 'in' and 'not' in any order
$ # same as: grep 'not' programming_quotes.txt | grep 'in'
$ grep 'in' programming_quotes.txt | grep 'not'
by definition, not smart enough to debug it by Brian W. Kernighan
A language that does not affect the way you think about programming,
is not worth knowing by Alan Perlis

Get filename instead of matching lines

Often, you just want a list of filenames that match the search patterns. The output might get saved for future reference, passed to another command like sed/awk/perl/sort/etc. for further processing, and so on. Some of these commands can handle searching by themselves, but grep is a fast and specialized tool for searching, and using shell pipes can improve performance if parallel processing is available. Similar to the -m option, grep stops processing an input file as soon as the given condition is satisfied.

  • -l will list files matching the pattern
  • -L will list files NOT matching the pattern

$ # list filename if it contains 'are' anywhere in the file
$ grep -l 'are' programming_quotes.txt search_strings.txt
programming_quotes.txt
$ # no output because no match was found
$ grep -l 'xyz' programming_quotes.txt search_strings.txt
$ # list filename if it contains '1' anywhere in the file
$ grep -l '1' programming_quotes.txt search_strings.txt
programming_quotes.txt
search_strings.txt

$ # list filename if it does NOT contain 'xyz' anywhere in the file
$ grep -L 'xyz' programming_quotes.txt search_strings.txt
programming_quotes.txt
search_strings.txt
$ grep -L 'are' programming_quotes.txt search_strings.txt
search_strings.txt
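Since -l prints each matching file only once, its output can be fed to other commands, for example wc -l to count how many files match. A sketch with three made-up files in a scratch directory:

```shell
cd "$(mktemp -d)" || exit 1
printf 'apple pie\n' > a.txt
printf 'banana split\n' > b.txt
printf 'apple tart\napple juice\n' > c.txt

# -l prints each matching file once, no matter how many lines match
grep -l 'apple' a.txt b.txt c.txt
# → a.txt
# → c.txt

# so piping to wc -l counts files, not lines
grep -l 'apple' a.txt b.txt c.txt | wc -l   # → 2

# -L gives the complement
grep -L 'apple' a.txt b.txt c.txt
# → b.txt
```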

Filename prefix for matching lines

If there are multiple input files, grep automatically prefixes the filename to each matching line. You can also use options to control whether or not the prefix is added.

  • the -h option suppresses the filename prefix in the output (this is the default for single file input)
  • the -H option always shows the filename prefix (this is the default for multiple file input)

$ # -h is on by default for single file input
$ grep '1' programming_quotes.txt
naming things, and off-by-1 errors by Leon Bambrick
$ # using -h to suppress filename prefix for multiple file input
$ seq 1000 | grep -h -m3 '1' - programming_quotes.txt
1
10
11
naming things, and off-by-1 errors by Leon Bambrick

$ # -H is on by default for multiple file input
$ seq 1000 | grep -m3 '1' - programming_quotes.txt
(standard input):1
(standard input):10
(standard input):11
programming_quotes.txt:naming things, and off-by-1 errors by Leon Bambrick
$ # using -H to always show filename prefix
$ # another trick instead of -H is to provide /dev/null as an input file
$ grep -H '1' programming_quotes.txt
programming_quotes.txt:naming things, and off-by-1 errors by Leon Bambrick
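The /dev/null trick mentioned in the comments above works because grep then sees two input files, while /dev/null, being empty, never contributes a match. A sketch with a made-up notes.txt file:

```shell
cd "$(mktemp -d)" || exit 1
printf 'off-by-1\nno digits\n' > notes.txt

# single file: no prefix by default
grep '1' notes.txt
# → off-by-1

# adding /dev/null makes it "multiple files", so the prefix appears
grep '1' notes.txt /dev/null
# → notes.txt:off-by-1

# same result with -H
grep -H '1' notes.txt
# → notes.txt:off-by-1
```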

The vim editor has a -q option that makes it easy to edit the matching lines from grep output, provided the output has both filename and line number prefixes.

$ grep -Hn '1' *
programming_quotes.txt:12:naming things, and off-by-1 errors by Leon Bambrick
search_strings.txt:2:1

$ # use :cn and :cp to navigate to next/previous occurrences
$ # the status line at bottom will have additional info
$ # use -H or /dev/null to ensure filename is always present in the output
$ vim -q <(grep -Hn '1' *)

Colored output

When working from the terminal, enabling the --color option makes it easier to spot the matching portions in the output, especially when you are experimenting to get the correct regular expression. Modern terminals will usually have color support; see unix.stackexchange: How to check if bash can print colors? for details.

The --color option highlights matching portions, line numbers, filenames, etc. It has three different settings:

  • auto will result in color highlighting when results are displayed on the terminal, but not when the output is redirected to another command, file, etc. This is the setting used when --color is given without a value
  • always will result in color highlighting both when results are displayed on the terminal and when the output is redirected to another command, file, etc
  • never explicitly disables color highlighting

grep color output

The image below shows the difference between auto and always. With always, in is highlighted even after piping, whereas with auto, it is not. In practice, always is rarely used, as it adds extra information (color escape sequences) to the matching lines, which could cause undesirable results when processing such lines.

grep auto vs always

Usually, both the ls and grep commands are aliased to include --color=auto.

$ # this is usually saved in ~/.bashrc or ~/.bash_aliases
$ alias grep='grep --color=auto'
$ # another use case for 'always' is piping the results to 'less' command
$ grep --color=always 'not' programming_quotes.txt | less -R
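One way to see what auto and always actually change is to look for the ESC character (\033) that starts every ANSI color sequence. The sketch below assumes GNU grep's default colors (no custom GREP_COLORS) and made-up input; each grep's output goes into a pipe rather than a terminal.

```shell
# ESC is the byte that starts every ANSI color sequence
esc=$(printf '\033')

# stdout is a pipe here, so 'auto' adds no color codes
printf 'in the inn\n' | grep --color=auto 'in' | grep -c "$esc"     # → 0

# 'always' adds color codes even though the output is piped
printf 'in the inn\n' | grep --color=always 'in' | grep -c "$esc"   # → 1

# 'never' never adds color codes, on a terminal or otherwise
printf 'in the inn\n' | grep --color=never 'in' | grep -c "$esc"    # → 0
```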

Match whole word or line

A word character is any letter of the alphabet (irrespective of case), any digit, or the underscore character. You might wonder why digits and underscores are included as well; why not only letters? This comes from variable and function naming conventions, where letters, digits and underscores are typically allowed. So, the definition is more programming oriented than natural language oriented. The -w option ensures that the given patterns are neither preceded nor followed by other word characters. For example, this helps to distinguish par from spar, park, apart, par2, _par, etc.

warning The -w option behaves a bit differently than word boundaries in regular expressions. See Word boundary differences section for details.

$ # this matches 'par' anywhere in the line
$ printf 'par value\nheir apparent\n' | grep 'par'
par value
heir apparent
$ # this matches 'par' only as a whole word
$ printf 'par value\nheir apparent\n' | grep -w 'par'
par value
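Running -w against the other examples mentioned earlier (spar, park, apart, par2, _par), plus two made-up lines where par is bordered by punctuation, shows that only occurrences not adjacent to a word character are kept:

```shell
printf 'spar\npark\napart\npar2\n_par\npar.\n(par)\n' | grep -w 'par'
# → par.
# → (par)
```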

Another useful option is -x, which displays a line only if the entire line matches the given pattern.

$ # this matches 'my book' anywhere in the line
$ printf 'see my book list\nmy book\n' | grep 'my book'
see my book list
my book
$ # this matches 'my book' only if no other characters are present
$ printf 'see my book list\nmy book\n' | grep -x 'my book'
my book

$ grep '1' *.txt
programming_quotes.txt:naming things, and off-by-1 errors by Leon Bambrick
search_strings.txt:1
$ grep -x '1' *.txt
search_strings.txt:1

$ # counting empty lines, won't work for files with DOS style line endings
$ grep -cx '' programming_quotes.txt
3

Comparing lines between files

The -f and -x options can be combined to get the lines common to two files, or the difference between them when -v is used as well. In these cases, using -F is advisable, since you might not know whether the input files contain regular expression metacharacters.

$ printf 'teal\nlight blue\nbrown\nyellow\n' > colors_1
$ printf 'blue\nblack\ndark green\nyellow\n' > colors_2

$ # common lines between two files
$ grep -Fxf colors_1 colors_2
yellow

$ # lines present in colors_2 but not in colors_1
$ grep -Fvxf colors_1 colors_2
blue
black
dark green

$ # lines present in colors_1 but not in colors_2
$ grep -Fvxf colors_2 colors_1
teal
light blue
brown
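Here's why -F is advised, sketched with two made-up files: without it, a . in the pattern file acts as a regex metacharacter, so a line that merely resembles a pattern is reported as common.

```shell
cd "$(mktemp -d)" || exit 1
printf '1.5\n' > versions_1
printf '1.5\n105\n' > versions_2

# without -F, '.' matches any character, so '105' is wrongly reported
grep -xf versions_1 versions_2
# → 1.5
# → 105

# with -F, only the exact common line is reported
grep -Fxf versions_1 versions_2
# → 1.5
```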

See also stackoverflow: Fastest way to find lines of a text file from another larger text file (make sure to go through all the answers).

Extract only matching portion

If the total number of matches is required, use the -o option to display only the matching portions (one per line) and then use wc -l to count them. This option is more commonly used with regular expressions.

$ grep -o -e 'twice' -e 'hard' programming_quotes.txt
twice
hard
hard

$ # -c only gives count of matching lines
$ grep -c 'in' programming_quotes.txt
8
$ # use -o to get each match on a separate line
$ grep -o 'in' programming_quotes.txt | wc -l
13
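The difference between lines and matches is easiest to see with a single made-up line that contains in three times:

```shell
# one line, three occurrences of 'in' (in, inn, inside)
printf 'in the inn, inside\n' | grep -c 'in'           # → 1
printf 'in the inn, inside\n' | grep -o 'in' | wc -l   # → 3
```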

Summary

In my initial years of CLI usage as a VLSI engineer, I knew maybe about five of the options listed in this chapter. I didn't even know about the color option. I've seen comments from others who didn't know about the -c option. These are some of the reasons why I'd advise going through the list of all the options of any command you use frequently. Bonus points for maintaining a list of example usages for future reference, passing it on to your colleagues, etc.

Exercises

info All the exercises are also collated together in one place at Exercises.md. For solutions, see Exercise_solutions.md.

First, create an exercises directory and then, within it, create another directory for this chapter, say freq_options or chapter_2. The input is a file downloaded from the internet (https://www.gutenberg.org/files/345/old/345.txt) and saved as dracula.txt. To solve the exercises, modify the partial command shown just before the expected output.

a) Display all lines containing ablaze

$ mkdir -p exercises/freq_options && cd $_
$ wget https://www.gutenberg.org/files/345/old/345.txt -O dracula.txt

$ grep ##### add your solution here
the room, his face all ablaze with excitement. He rushed up to me and

b) Display all lines containing abandon as a whole word.

$ grep ##### add your solution here
inheritors, being remote, would not be likely to abandon their just

c) Display all lines that satisfy both of these conditions:

  • professor matched irrespective of case
  • either quip or sleep matched case sensitively

$ grep ##### add your solution here
equipment of a professor of the healing craft. When we were shown in,
its potency; and she fell into a deep sleep. When the Professor was
sleeping, and the Professor seemingly had not moved from his seat at her
to sleep, and something weaker when she woke from it. The Professor and

d) Display the first three lines containing Count.

$ grep ##### add your solution here
town named by Count Dracula, is a fairly well-known place. I shall enter
must ask the Count all about them.)
Count Dracula had directed me to go to the Golden Krone Hotel, which I

e) Display the first six lines containing Harker but neither Journal nor Letter.

$ grep ##### add your solution here
said, "The Herr Englishman?" "Yes," I said, "Jonathan Harker." She
"I am Dracula; and I bid you welcome, Mr. Harker, to my house. Come in;
I shall be all alone, and my friend Harker Jonathan--nay, pardon me, I
Jonathan Harker will not be by my side to correct and aid me. He will be
"I write by desire of Mr. Jonathan Harker, who is himself not strong
junior partner of the important firm Hawkins & Harker; and so, as you

f) Display lines containing Zooelogical Gardens along with line number prefix.

$ grep ##### add your solution here
5597:         _Interview with the Keeper in the Zooelogical Gardens._
5601:the keeper of the section of the Zooelogical Gardens in which the wolf
8042:the Zooelogical Gardens a young one may have got loose, or one be bred

g) Find the total count of the whole word the (irrespective of case).

$ grep ##### add your solution here
8090

h) The code snippet below tries to get the number of empty lines, but apparently shows the wrong result. Why?

$ grep -cx '' dracula.txt
0