Frequently used options

This chapter will cover many of the options provided by GNU grep. Regular expressions will be covered later, so the examples in this chapter will only use literal strings as search patterns. Literal (fixed string) matching refers to exact string comparison, so no special meaning is assigned for any of the search characters.

The example_files directory has all the files used in the examples.

Basic string search

By default, grep will print all the input lines that match the given search patterns. The newline character \n is the line separator by default. This section will show you how to filter lines matching a given search string using grep. Consider this sample input file:

$ cat ip.txt
it is a warm and cozy day
listen to what I say
go play in the park
come back before the sky turns dark

There are so many delights to cherish
Apple, Banana and Cherry
Bread, Butter and Jelly
Try them all before you perish

To filter desired lines, invoke the grep command, pass the search string and then specify one or more filenames that have to be searched. As a good practice, always use single quotes around the search string. Examples requiring shell interpretation will be discussed later.

$ grep 'play' ip.txt
go play in the park

$ grep 'y t' ip.txt
come back before the sky turns dark
Try them all before you perish

grep will perform the search on stdin data if there are no file arguments or if - is used as a filename.

$ printf 'apple\nbanana\nmango\nfig\n' | grep 'an'
banana
mango

$ printf 'apple\nbanana\nmango\nfig\n' | grep 'an' -
banana
mango

Here's an example where grep reads user written stdin data and the filtered output is redirected to a file.

# press Ctrl+d after the line containing 'histogram'
$ grep 'is' > op.txt
hi there
this is a sample line
have a nice day
histogram

$ cat op.txt
this is a sample line
histogram

$ rm op.txt

If your input file has \r\n (carriage return and newline characters) as the line ending, convert the input file to Unix-style before processing. See stackoverflow: Why does my tool output overwrite itself and how do I fix it? for a detailed discussion and mitigation methods.
# Unix style
$ printf '42\n' | file -
/dev/stdin: ASCII text

# DOS style
$ printf '42\r\n' | file -
/dev/stdin: ASCII text, with CRLF line terminators

Fixed string search

The search string (pattern) is treated as a Basic Regular Expression (BRE) by default. But regular expressions is a topic for the next chapter. For now, use the -F option to indicate that the patterns should be matched literally.

# oops, why did it not match?
$ echo 'int a[5]' | grep 'a[5]'
# where did that error come from??
$ echo 'int a[5]' | grep 'a['
grep: Invalid regular expression
# what is going on???
$ echo 'int a[5]' | grep 'a[5'
grep: Unmatched [, [^, [:, [., or [=

# use the -F option to match strings literally
$ echo 'int a[5]' | grep -F 'a[5]'
int a[5]

If the search string doesn't have any regular expression metacharacters, GNU grep will try a literal search even if the -F option isn't used.

Case insensitive search

Sometimes, you don't know if a log file contains search terms, such as error, Error, or ERROR. In such cases, you can use the -i option to ignore case.

$ grep -i 'the' ip.txt
go play in the park
come back before the sky turns dark
There are so many delights to cherish
Try them all before you perish

$ printf 'Cat\ncOnCaT\ncut\n' | grep -i 'cat'
Cat
cOnCaT

Invert matching lines

Use the -v option to get lines other than those matching the search term.

$ seq 4 | grep -v '3'
1
2
4

$ printf 'goal\nrate\neat\npit' | grep -v 'at'
goal
pit

Text processing often involves negating a logic to arrive at a solution or to make it simpler. Look out for opposite pairs like -l -L, -h -H, negative logic in regular expressions and so on in the examples to follow.

Line number and count

The -n option will prefix line numbers to matching results, using a colon character as the separator. This is useful to quickly locate matching lines for further processing.

$ grep -n 'to' ip.txt
2:listen to what I say
6:There are so many delights to cherish

$ printf 'great\nneat\nuser' | grep -n 'eat'
1:great
2:neat

Having to count the total number of matching lines comes up often. Somehow piping grep output to the wc command is prevalent instead of simply using the -c option.

# number of lines matching the pattern
$ grep -c 'is' ip.txt
4

# number of lines NOT matching the pattern
$ printf 'goal\nrate\neat\npit' | grep -vc 'g'
3

When multiple input files are passed, the count is displayed for each file separately. Use cat if you need a combined count.

# here - represents the stdin data
$ printf 'this\nis\ncool\n' | grep -c 'is' ip.txt -
ip.txt:4
(standard input):2

# useful application of the cat command
$ cat <(printf 'this\nis\ncool\n') ip.txt | grep -c 'is'
6

The output given by the -c option is the total number of lines matching the given patterns, not the total number of matches. Use the -o option, and pipe the output to wc -l to count every occurrence (example will be shown later).

Limiting output lines

Sometimes, there are too many results, in which case you could pipe the output to a pager tool like less. Or use the -m option to limit how many matching lines should be displayed for each input file. grep will stop processing an input file as soon as the condition specified by -m is satisfied. Note that just like the -c option, -m works by line count and not based on the total number of matches.

$ grep -m2 'is' ip.txt
it is a warm and cozy day
listen to what I say

$ seq 1000 | grep -m4 '2'
2
12
20
21

Multiple search strings

The -e option can be used to specify multiple search strings from the command line. This is similar to conditional OR boolean logic.

# search for 'what' or 'But'
$ grep -e 'what' -e 'But' ip.txt
listen to what I say
Bread, Butter and Jelly

If you have a huge list of strings to search, save them in a file, one search string per line. Make sure there are no empty lines. Then use the -f option to specify a file as the source of search strings. You can use this option multiple times and also add more patterns from the command line using the -e option. Also, add the -F option when searching for literal matches. It is easy to miss regular expression metacharacters in a big list of terms.

$ cat search.txt
say
you

$ grep -Ff search.txt ip.txt
listen to what I say
Try them all before you perish

# example with both -f and -e options
$ grep -Ff search.txt -e 'it' -e 'are' ip.txt
it is a warm and cozy day
listen to what I say
There are so many delights to cherish
Try them all before you perish

To find lines matching more than one search term, you'd need to either resort to using regular expressions (covered later) or workaround by using shell pipes. This is similar to conditional AND boolean logic.

# match lines containing both 'is' and 'to' in any order
# same as: grep 'to' ip.txt | grep 'is'
$ grep 'is' ip.txt | grep 'to'
listen to what I say
There are so many delights to cherish

Get filename instead of matching lines

Often, you just want a list of filenames that match the search patterns. The output might get saved for future reference or passed to another command like sed, awk, perl, sort, etc for further processing. Some of these commands can handle search by themselves, but grep is a fast and specialized tool for searching and using shell pipes can improve performance if parallel processing is available. Similar to the -m option, grep will stop processing the input file as soon as the given condition is satisfied.

-l will list files matching the pattern
-L will list files NOT matching the pattern

Here are some examples:

# list filename if it contains 'are'
$ grep -l 'are' ip.txt search.txt
ip.txt
# no output because no match was found
$ grep -l 'xyz' ip.txt search.txt
# list filename if it contains 'say'
$ grep -l 'say' ip.txt search.txt
ip.txt
search.txt

# list filename if it does NOT contain 'xyz'
$ grep -L 'xyz' ip.txt search.txt
ip.txt
search.txt
# list filename if it does NOT contain 'are'
$ grep -L 'are' ip.txt search.txt
search.txt

Filename prefix for matching lines

If there are multiple input files, grep will automatically prefix the filename when displaying the matching lines. You can also control whether or not to add the prefix using the following options:

-h option will prevent filename prefix in the output (default for single input file)
-H option will always show filename prefix (default for multiple input files)

# -h is on by default for single input file
$ grep 'say' ip.txt
listen to what I say
# use -h to suppress filename prefix for multiple input files
$ printf 'say\nyou\n' | grep -h 'say' - ip.txt
say
listen to what I say

# -H is on by default for multiple input files
$ printf 'say\nyou\n' | grep 'say' - ip.txt
(standard input):say
ip.txt:listen to what I say
# use -H to always show filename prefix
# instead of -H, you can also provide /dev/null as an additional input file
$ grep -H 'say' ip.txt
ip.txt:listen to what I say

Quickfix

The vim editor has a quickfix option -q that makes it easy to edit the matching lines from grep's output. Make sure that the output has both line numbers and filename prefixes.

# -H ensures filename prefix and -n provides line numbers
$ grep -Hn 'say' ip.txt search.txt
ip.txt:2:listen to what I say
search.txt:1:say

# use :cn and :cp to navigate to the next/previous occurrences
# command-line area at the bottom will show the number of matches and filenames
# you can also save the grep output and pass that filename instead of <()
$ vim -q <(grep -Hn 'say' ip.txt search.txt)

Colored output

When working from the terminal, having the --color option enabled makes it easier to spot the matching portions in the output. Especially useful when you are experimenting to find the correct regular expression. Modern terminals will usually have color support, see unix.stackexchange: How to check if bash can print colors? for details.

The --color (or --colour) option will highlight matching patterns, line numbers, filenames, etc. There are three different settings:

auto will result in color highlighting when results are displayed on terminal, but not when the output is redirected to another command, file, etc. This is the default setting
always will result in color highlighting when results are displayed on terminal as well as when the output is redirected to another command, file, etc
never explicitly disables color highlighting

Here are couple of examples with the --color option enabled (default is auto).

grep color output

It is typical to alias both the ls and grep commands to include --color=auto.

# aliases are usually saved in ~/.bashrc or ~/.bash_aliases
$ alias ls='ls --color=auto'
$ alias grep='grep --color=auto'

Using --color=always is handy if you want to retain color information even when the output is redirected. For example, piping the results to the less command.

$ grep --color=always -i 'the' ip.txt | less -R

The below image will help you understand the difference between the auto and always features. In the first case, is gets highlighted even after piping, while in the second case is loses the color information. In practice, always is rarely used as it provides extra information to matching lines, which could cause undesirable results when processed.

grep auto vs always

Match whole word

A word character is any alphabet (irrespective of case), digit and the underscore character. You might wonder why there are digits and underscores as well, why not only alphabets? This comes from variable and function naming conventions — typically alphabets, digits and underscores are allowed. So, the definition is more programming oriented than natural language. The -w option will ensure that given patterns are not surrounded by other word characters. For example, this helps to distinguish par from spar, park, apart, par2, _par, etc.

# this matches 'par' anywhere in the line
$ printf 'par value\nheir apparent\n' | grep 'par'
par value
heir apparent
# this matches 'par' only as a whole word
$ printf 'par value\nheir apparent\n' | grep -w 'par'
par value

The -w option behaves a bit differently than word boundaries in regular expressions. See the Word boundary differences section for details.

Match whole line

Another useful option is -x, which will display a line only if the entire line satisfies the given pattern.

# this matches 'my book' anywhere in the line
$ printf 'see my book list\nmy book\n' | grep 'my book'
see my book list
my book
# this matches 'my book' only as a whole line
$ printf 'see my book list\nmy book\n' | grep -x 'my book'
my book

$ grep 'say' ip.txt search.txt
ip.txt:listen to what I say
search.txt:say
$ grep -x 'say' ip.txt search.txt
search.txt:say

# count empty lines, won't work for files with DOS style line endings
$ grep -cx '' ip.txt
1

Comparing lines between files

The -f and -x options can be combined to get common lines between two files or the difference when -v is used as well. If you want to match the lines literally, it is advised to use the -F option as well, because you might not know if there are regular expression metacharacters present in the input files or not.

$ printf 'teal\nlight blue\nbrown\nyellow\n' > colors_1
$ printf 'blue\nblack\ndark green\nyellow\n' > colors_2

# common lines between two files
$ grep -Fxf colors_1 colors_2
yellow

# lines present in colors_2 but not in colors_1
$ grep -Fvxf colors_1 colors_2
blue
black
dark green

# lines present in colors_1 but not in colors_2
$ grep -Fvxf colors_2 colors_1
teal
light blue
brown

See also stackoverflow: Fastest way to find lines of a text file from another larger text file — go through all the answers.

Extract only the matching portions

If the total number of matches is required, use the -o option to display only the matching portions (one per line), and then use wc to count them. This option is more commonly used with regular expressions.

$ grep -oi 'the' ip.txt
the
the
The
the

# -c only gives the count of matching lines
$ grep -c 'an' ip.txt
4
# use -o to get each match on a separate line
$ grep -o 'an' ip.txt | wc -l
6

Summary

In my initial years of CLI usage as a VLSI engineer, I knew only some of the options listed in this chapter. Didn't even know about the --color option. I've come across comments about not knowing the -c option in online forums. These are some of the reasons why I'd advise going through the list of all the options if you use a command frequently. Bonus points for maintaining a cheatsheet of example usage for future reference, passing on to your colleagues, etc.

Interactive exercises

I wrote a TUI app to help you solve some of the exercises from this book interactively. See GrepExercises repo for installation steps and app_guide.md for instructions on using this app.

Here's a sample screenshot:

GrepExercises example

Exercises

All the exercises are also collated together in one place at Exercises.md. For solutions, see Exercise_solutions.md.

The exercises directory has all the files used in this section.

1) Display lines containing an from the sample.txt input file.

##### add your solution here
banana
mango

2) Display lines containing do as a whole word from the sample.txt input file.

##### add your solution here
Just do-it

3) Display lines from sample.txt that satisfy both of these conditions:

he matched irrespective of case
either World or Hi matched case sensitively

##### add your solution here
Hello World
Hi there

4) Display lines from code.txt containing fruit[0] literally.

##### add your solution here
fruit[0] = 'apple'

5) Display only the first two matching lines containing t from the sample.txt input file.

##### add your solution here
Hi there
Just do-it

6) Display only the first three matching lines that do not contain he from the sample.txt input file.

##### add your solution here
Hello World

How are you

7) Display lines from sample.txt that contain do along with line number prefix.

##### add your solution here
6:Just do-it
13:Much ado about nothing

8) For the input file sample.txt, count the number of times the string he is present, irrespective of case.

##### add your solution here
5

9) For the input file sample.txt, count the number of empty lines.

##### add your solution here
4

10) For the input files sample.txt and code.txt, display matching lines based on the search terms (one per line) present in the terms.txt file. Results should be prefixed with the corresponding input filename.

$ cat terms.txt
are
not
go
fruit[0]

##### add your solution here
sample.txt:How are you
sample.txt:mango
sample.txt:Much ado about nothing
sample.txt:Adios amigo
code.txt:fruit[0] = 'apple'

11) For the input file sample.txt, display lines containing amigo prefixed by the input filename as well as the line number.

##### add your solution here
sample.txt:15:Adios amigo

12) For the input files sample.txt and code.txt, display only the filename if it contains apple.

##### add your solution here
code.txt

13) For the input files sample.txt and code.txt, display only whole matching lines based on the search terms (one per line) present in the lines.txt file. Results should be prefixed with the corresponding input filename as well as the line number.

$ cat lines.txt
banana
fruit = []

##### add your solution here
sample.txt:9:banana
code.txt:1:fruit = []

14) For the input files sample.txt and code.txt, count the number of lines that do not match any of the search terms (one per line) present in the terms.txt file.

##### add your solution here
sample.txt:11
code.txt:3

15) Count the total number of lines containing banana in the input files sample.txt and code.txt.

##### add your solution here
2

16) Which two conditions are necessary for the output of the grep command to be suitable for the vim -q quickfix mode?

17) What's the default setting for the --color option? Give an example where the always setting would be useful.

18) The command shown below tries to get the number of empty lines, but apparently shows the wrong result, why?

$ grep -cx '' dos.txt
0

CLI text processing with GNU grep and ripgrep