Introduction
Quoting from Wikipedia:
grep is a command-line utility for searching plain-text data sets for lines that match a regular expression. Its name comes from the ed command g/re/p (global / regular expression search / and print), which has the same effect.
Use of grep has become so ubiquitous that it has found its way into the Oxford dictionary as well. As part of everyday computer usage, the need to search comes up often. It could be finding the right emoji by name on social media, searching your browser bookmarks, locating a particular function in a programming file and so on. Some of these tools have options for refining a search further, like controlling case sensitivity, restricting matches to whole words, using regular expressions, etc.
grep provides all of the above features and much more when it comes to searching and extracting content from text files. After getting used to grep, the search features provided by GUI programs feel slower and inadequate.
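As a rough taste of the refinements mentioned above, here is a minimal sketch using options covered later in this chapter. The `sample` variable and its contents are made up purely for illustration.

```shell
# Made-up sample input, one item per line
sample='Hi there
history class
this is fine
THIS too'

# -i ignores case, so both "this" and "THIS" match
printf '%s\n' "$sample" | grep -i 'this'
# this is fine
# THIS too

# -w matches whole words only; the "is" inside "this" and "history" does not count
printf '%s\n' "$sample" | grep -w 'is'
# this is fine

# -E enables extended regular expressions such as alternation
printf '%s\n' "$sample" | grep -E '^(this|Hi) '
# Hi there
# this is fine
```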
Installation
If you are on a Unix-like system, you will most likely have some version of grep already installed. This book is primarily about GNU grep and also has a chapter on ripgrep. As there are syntax and feature differences between various implementations, make sure you have these particular commands installed to follow along with the examples presented in this book.
GNU grep is part of the text creation and manipulation tools and comes by default on GNU/Linux distributions. To install a particular version, visit gnu: grep software. See also the release notes for an overview of changes between versions, and the bug list if you think some command isn't working as expected.
Sample instructions for compiling the latest version are shown below. You might need to install a PCRE library first, for example sudo apt install libpcre2-dev.
$ wget https://ftp.gnu.org/gnu/grep/grep-3.10.tar.xz
$ tar -xf grep-3.10.tar.xz
$ cd grep-3.10/
# see https://askubuntu.com/q/237576 if you get compiler not found error
$ ./configure
$ make
$ sudo make install
$ grep -V | head -n1
grep (GNU grep) 3.10
If you are not using a Linux distribution, you may be able to access GNU grep using one of the options below:
- Git for Windows — provides a Bash emulation used to run Git from the command line
- Windows Subsystem for Linux — compatibility layer for running Linux binary executables natively on Windows
- brew — Package Manager for macOS (or Linux)
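Whichever route you take, it may help to confirm that the command you end up with is actually GNU grep. The sketch below reuses the `grep -V | head -n1` check shown earlier; the `is_gnu_grep` helper is a hypothetical name introduced here, and it assumes the version banner contains the text "GNU grep", as in the sample output above.

```shell
# Hypothetical helper: succeeds if the given command reports itself as GNU grep
is_gnu_grep() {
    "$1" -V 2>/dev/null | head -n1 | grep -q 'GNU grep'
}

if is_gnu_grep grep; then
    echo "GNU grep found: $(grep -V | head -n1)"
else
    echo "the grep on PATH does not appear to be GNU grep"
fi
```

If GNU grep is installed under a different command name on your system, pass that name to the helper instead of `grep`.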
Options overview
It is always good to know where to find documentation. From the command line, you can use man grep for a short manual and info grep for the full documentation. I prefer using the online GNU grep manual, which feels much easier to use and navigate.
$ man grep
NAME
grep - print lines that match patterns
SYNOPSIS
grep [OPTION...] PATTERNS [FILE...]
grep [OPTION...] -e PATTERNS ... [FILE...]
grep [OPTION...] -f PATTERN_FILE ... [FILE...]
DESCRIPTION
grep searches for PATTERNS in each FILE. PATTERNS is one or more
patterns separated by newline characters, and grep prints each
line that matches a pattern. Typically PATTERNS should be quoted
when grep is used in a shell command.
A FILE of “-” stands for standard input. If no FILE is given,
recursive searches examine the working directory, and
nonrecursive searches read standard input.
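The three forms in the synopsis can be sketched as follows. The sample data and the temporary pattern file are made up for illustration; all three invocations are expected to select the same lines.

```shell
# Made-up sample input
sample='apple
banana
cherry'

# Form 1: a single PATTERNS argument, with a newline separating two patterns
printf '%s\n' "$sample" | grep 'apple
cherry'

# Form 2: one -e option per pattern
printf '%s\n' "$sample" | grep -e 'apple' -e 'cherry'

# Form 3: -f reads the patterns from a file, one per line
pattern_file=$(mktemp)
printf 'apple\ncherry\n' > "$pattern_file"
printf '%s\n' "$sample" | grep -f "$pattern_file"
rm -f "$pattern_file"

# Each of the three commands above prints:
# apple
# cherry

# A FILE of "-" names standard input explicitly
printf '%s\n' "$sample" | grep 'banana' -
# banana
```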
For a quick overview of all the available options, use grep --help
from the command line. These are shown below in table format:
Regexp selection:
Option | Description |
---|---|
-E, --extended-regexp | PATTERNS are extended regular expressions |
-F, --fixed-strings | PATTERNS are strings |
-G, --basic-regexp | PATTERNS are basic regular expressions |
-P, --perl-regexp | PATTERNS are Perl regular expressions |
-e, --regexp=PATTERNS | use PATTERNS for matching |
-f, --file=FILE | take PATTERNS from FILE |
-i, --ignore-case | ignore case distinctions in patterns and data |
--no-ignore-case | do not ignore case distinctions (default) |
-w, --word-regexp | match only whole words |
-x, --line-regexp | match only whole lines |
-z, --null-data | a data line ends in 0 byte, not newline |
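To make the difference between some of these selection options concrete, here is a small sketch on made-up input. It assumes GNU grep, where `+` is a literal character in basic regular expressions and a quantifier in extended ones.

```shell
# Made-up sample input
sample='a.b
axb
a+b
aab'

# Default (-G, basic regexp): "." matches any character, so every line matches
printf '%s\n' "$sample" | grep 'a.b'
# a.b
# axb
# a+b
# aab

# -F treats the pattern as a fixed string, so "." is literal
printf '%s\n' "$sample" | grep -F 'a.b'
# a.b

# In basic regexp "+" is literal; with -E it means "one or more"
printf '%s\n' "$sample" | grep 'a+b'
# a+b
printf '%s\n' "$sample" | grep -E 'a+b'
# aab

# -x requires the pattern to match the whole line
printf 'ab\nabc\n' | grep -x 'ab'
# ab
```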
Miscellaneous:
Option | Description |
---|---|
-s, --no-messages | suppress error messages |
-v, --invert-match | select non-matching lines |
-V, --version | display version information and exit |
--help | display this help text and exit |
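A quick sketch of two of these options follows. The input text and the `missing` path are hypothetical; the path is assumed not to exist on your system.

```shell
# -v selects the lines that do NOT match, here dropping comment lines
printf 'keep\n# comment\nalso keep\n' | grep -v '^#'
# keep
# also keep

# -s suppresses the error message for a nonexistent or unreadable file,
# but the exit status still reports the problem
missing='/no/such/dir/no_such_file.txt'
grep -s 'pattern' "$missing" || echo "exit status: $?"
# exit status: 2
```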
Output control:
Option | Description |
---|---|
-m, --max-count=NUM | stop after NUM selected lines |
-b, --byte-offset | print the byte offset with output lines |
-n, --line-number | print line number with output lines |
--line-buffered | flush output on every line |
-H, --with-filename | print file name with output lines |
-h, --no-filename | suppress the file name prefix on output |
--label=LABEL | use LABEL as the standard input file name prefix |
-o, --only-matching | show only nonempty parts of lines that match |
-q, --quiet, --silent | suppress all normal output |
--binary-files=TYPE | assume that binary files are TYPE; TYPE is 'binary', 'text', or 'without-match' |
-a, --text | equivalent to --binary-files=text |
-I | equivalent to --binary-files=without-match |
-d, --directories=ACTION | how to handle directories; ACTION is 'read', 'recurse', or 'skip' |
-D, --devices=ACTION | how to handle devices, FIFOs and sockets; ACTION is 'read' or 'skip' |
-r, --recursive | like --directories=recurse |
-R, --dereference-recursive | likewise, but follow all symlinks |
--include=GLOB | search only files that match GLOB (a file pattern) |
--exclude=GLOB | skip files that match GLOB |
--exclude-from=FILE | skip files that match any file pattern from FILE |
--exclude-dir=GLOB | skip directories that match GLOB |
-L, --files-without-match | print only names of FILEs with no selected lines |
-l, --files-with-matches | print only names of FILEs with selected lines |
-c, --count | print only a count of selected lines per FILE |
-T, --initial-tab | make tabs line up (if needed) |
-Z, --null | print 0 byte after FILE name |
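A few of the output control options are sketched below on made-up input, to show how they change what gets printed rather than which lines are selected.

```shell
# Made-up sample input
sample='red apple
green pear
red cherry'

# -n prefixes each output line with its line number
printf '%s\n' "$sample" | grep -n 'red'
# 1:red apple
# 3:red cherry

# -c prints only the count of selected lines
printf '%s\n' "$sample" | grep -c 'red'
# 2

# -o prints only the matching part, here the last word of each line
printf '%s\n' "$sample" | grep -oE '[a-z]+$'
# apple
# pear
# cherry

# -m1 stops after the first selected line
printf '%s\n' "$sample" | grep -m1 'red'
# red apple

# -q prints nothing; only the exit status tells whether a match was found
printf '%s\n' "$sample" | grep -q 'pear' && echo 'found'
# found
```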
Context control:
Option | Description |
---|---|
-B, --before-context=NUM | print NUM lines of leading context |
-A, --after-context=NUM | print NUM lines of trailing context |
-C, --context=NUM | print NUM lines of output context |
-NUM | same as --context=NUM |
--group-separator=SEP | print SEP on line between matches with context |
--no-group-separator | do not print separator for matches with context |
--color[=WHEN], --colour[=WHEN] | use markers to highlight the matching strings; WHEN is 'always', 'never', or 'auto' |
-U, --binary | do not strip CR characters at EOL (MSDOS/Windows) |
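The context options can be sketched as follows, again on made-up input. Note the `--` separator that GNU grep prints between groups of lines that are not adjacent in the input.

```shell
# Made-up sample input
sample='one
two
three
four
five
six
seven'

# -A1 adds one line of trailing context
printf '%s\n' "$sample" | grep -A1 'three'
# three
# four

# -B1 adds one line of leading context
printf '%s\n' "$sample" | grep -B1 'three'
# two
# three

# -C1 adds one line on both sides; separate groups are divided by "--"
printf '%s\n' "$sample" | grep -C1 -e 'two' -e 'six'
# one
# two
# three
# --
# five
# six
# seven

# --no-group-separator drops the "--" line
printf '%s\n' "$sample" | grep -C1 --no-group-separator -e 'two' -e 'six'
```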