ripgrep

ripgrep is a very popular alternative to the grep command. Editors like Visual Studio Code are using ripgrep to power their search and replace features. The major selling point is its default behavior for recursive search, parallel processing and speed. ripgrep doesn't aim to be compatible with POSIX or GNU grep and there are various differences in terms of features, option names, output style, regular expressions and so on.

info See Feature comparison of ack, ag, git-grep, GNU grep and ripgrep for an overview of features among various grep implementations. See also benchmark among grep implementations.

Installation

See ripgrep: installation for details on various methods and platforms. Instructions shown below is for Debian-like distributions.

# link shown here on two lines as it is too long
# visit using the first part to get latest version
$ link='https://github.com/BurntSushi/ripgrep/releases/'
$ link="$link"'download/13.0.0/ripgrep_13.0.0_amd64.deb'
$ wget "$link"
$ sudo gdebi ripgrep_13.0.0_amd64.deb

# note that the installed command name is rg, not ripgrep
$ rg --version
ripgrep 13.0.0 (rev 7ec2fd51ba)
-SIMD -AVX (compiled)
+SIMD -AVX (runtime)

Command line text processing with Rust tools

Earlier versions of this book discussed ripgrep from the basics, just like GNU grep. That led to a lot of repetitive details that were very similar to GNU grep. This chapter will now cover only notable features and differences.

You can still access the earlier version from my work-in-progress Command line text processing with Rust tools ebook.

Default behavior differences

Here are some notable differences in behavior between ripgrep and GNU grep when they are invoked without any options:

  • Regular expressions flavor is provided by the regex crate
    • to put it roughly, this provides more features compared to BRE/ERE but less compared to PCRE
  • Line number prefix and color options are enabled by default
  • Blank line separates matching lines from different files
  • Filename is added as a prefix line above the matching lines instead of a prefix for each matching line
  • Recursive search is on by default for directories provided as an argument (current directory if input source is not specified). In addition,
    • files and directories that match rules specified by ignore files like .gitignore are not searched
    • hidden files and directories are ignored
    • binary files (determined by the presence of the ASCII NUL character) are ignored, but a matching line is displayed if found before encountering the NUL character along with a warning

Options overview

It is always a good idea to know where to find the documentation. From command line, you can use man rg for the manual and rg -h for a list of all the options. See also ripgrep: User guide.

This section will cover some of the options provided by ripgrep with examples. As mentioned earlier, the focus will be on differences compared to GNU grep. So, options like -F, -f, -i, -o, -v, -w, -x, -m, -q, -b, -A, -B and -C won't be discussed as they behave the same as GNU grep. Regular expressions will be covered in a later section.

info The example_files directory has all the files used in the examples.

Line number

As mentioned earlier, line number prefix is enabled by default. However, if the output is redirected or if the input is being read from stdin, this option won't be on by default. You can override the default behavior by using -n to always add the line number prefix and -N to turn off the numbering.

# default behavior
$ rg 'day' ip.txt
1:it is a warm and cozy day
$ printf 'apple\nbanana\ncherry' | rg 'an'
banana

# using options explicitly
$ rg -N 'day' ip.txt
it is a warm and cozy day
$ printf 'apple\nbanana\ncherry' | rg -n 'an'
2:banana

Here are some examples with output redirection:

# saving output to a file
$ rg 'to' ip.txt > out.txt
$ cat out.txt
listen to what I say
There are so many delights to cherish
$ rm out.txt

# passing output to another command
$ rg 'to' ip.txt | rg 'many'
There are so many delights to cherish

# use options explicitly if required
$ rg -n 'to' ip.txt | rg 'many'
6:There are so many delights to cherish

Count

Unlike GNU grep, the -c will not display files that don't have a match. You can add the --include-zero option to display files without matches as well.

$ rg -c 'to' ip.txt search.txt
ip.txt:2

$ rg -c --include-zero 'to' ip.txt search.txt
search.txt:0
ip.txt:2

When -o is combined with -c, you'll get the total count of matches. Unlike GNU grep, you don't have to use another command like wc. You can also use --count-matches instead of the -co combination.

$ rg -co 'an' ip.txt
6

$ rg --count-matches 'an' ip.txt
6

Get filename instead of matching lines

Similar to GNU grep, you can use -l or --files-with-matches to get filenames when a match is found. But the -L option in ripgrep is used to follow links, so only the long option --files-without-match is available to get filenames when a match is not found.

$ rg --files-without-match 'to' ip.txt search.txt
search.txt

Filename prefix for matching lines

Filename prefix is automatically added for recursive search and multiple file arguments. You can use -H to always show the prefix and -I to suppress it.

# filename prefix automatically added for multiple file arguments
$ rg -N 'say' ip.txt search.txt
search.txt
say

ip.txt
listen to what I say
$ rg -NI 'say' ip.txt search.txt
say

listen to what I say

# single file search
$ rg -N 'play' ip.txt
go play in the park
$ rg -NH 'play' ip.txt
ip.txt
go play in the park

Use the --no-heading option to get filename prefix for each matching line. This will also remove the newline separation between multiple files. When output is redirected, --no-heading option will be automatically active.

$ rg -N --no-heading 'say' ip.txt search.txt
search.txt:say
ip.txt:listen to what I say

$ rg 'say' ip.txt search.txt | cat -
search.txt:say
ip.txt:listen to what I say

# add -I to suppress the filename prefix
$ rg -NI --no-heading 'say' ip.txt search.txt
say
listen to what I say

Field separator

By default, : is used to separate prefixes like filename and line numbers. You can use the --field-match-separator option to customize this separator.

$ rg --field-match-separator ')' 'the' ip.txt
3)go play in the park
4)come back before the sky turns dark
9)Try them all before you perish

$ rg --no-heading --field-match-separator ';' 'par' ip.txt pets.txt
pets.txt;2;I like parrots
ip.txt;3;go play in the park

Colored output

The --color option works similar to the one seen earlier with GNU grep.

The --colors (note the plural form) option is useful to customize colors and style for matching text, line numbers, etc. A common usage is to highlight multiple terms in different colors. See manual for more details.

rg colors customize

Context matching

The options for context matching are very similar to GNU grep. The customization options are named differently: --context-separator and --no-context-separator. Also, escape sequences like \t, \n, etc can be used as part of the separator.

info Unlike GNU grep, using 0 as the context number will never add a separator in the output.

$ seq 29 | rg --context-separator '=====' -A1 '3'
3
4
=====
13
14
=====
23
24

$ seq 29 | rg --no-context-separator -A1 '3'
3
4
13
14
23
24

By default, - is used to separate the fields such as filename and line number prefix for context lines. You can use the --field-context-separator option to customize this separator.

$ rg --no-heading -H -A1 'play' ip.txt
ip.txt:3:go play in the park
ip.txt-4-come back before the sky turns dark

$ rg --no-heading -H --field-context-separator ')' -A1 'play' ip.txt
ip.txt:3:go play in the park
ip.txt)4)come back before the sky turns dark

Scripting options

You can use the -q option to suppress stdout and --no-messages to suppress stderr.

# when file doesn't exist
$ rg 'in' xyz.txt
xyz.txt: No such file or directory (os error 2)
$ rg --no-messages 'in' xyz.txt
$ echo $?
2

# some errors will require explicit redirection
$ rg --no-messages 'a(' ip.txt
regex parse error:
    a(
     ^
error: unclosed group
$ rg --no-messages 'a(' ip.txt 2> /dev/null
$ echo $?
2

Substitution

The -r option will help you perform substitution operations. Here's an example:

# 'day' is the search pattern
# 'morning' is the replacement string
$ rg 'day' -r 'morning' ip.txt
1:it is a warm and cozy morning

Using rg --passthru -N 'search' -r 'replace' is very similar to how you can use the command sed 's/search/replace/g' for substitution. Some advantages with ripgrep include fixed string matching, recursive search (and speed benefit due to parallel processing), etc.

# replace 'and' with '&'
$ rg --passthru -N 'and' -r '&' ip.txt
it is a warm & cozy day
listen to what I say
go play in the park
come back before the sky turns dark

There are so many delights to cherish
Apple, Banana & Cherry
Bread, Butter & Jelly
Try them all before you perish

Multiline matching

The -U option will allow you to match across multiple lines. Here's an example:

$ rg -U 'y\ng' ip.txt
2:listen to what I say
3:go play in the park

info See my blog post Multiline fixed string search and replace with CLI tools for more examples with the -U option.

NUL separator

The --null-data option helps to process data that use the ASCII NUL character as the separator.

$ printf 'cred\nteal\0a2\0spared' | rg --null-data 'red' | sed 's/\x0/\n---\n/g'
cred
teal
---
spared
---

ripgrep regex

From regex crate:

Its syntax is similar to Perl-style regular expressions, but lacks a few features like look around and backreferences. In exchange, all searches execute in linear time with respect to the size of the regular expression and search text.

By default, rg treats the search term as a regular expression. You can use the following options to alter the default behavior:

  • -F option will cause the search patterns to be treated literally
  • -P option will enable Perl Compatible Regular Expression (PCRE) instead of regex crate
  • --engine=auto option will dynamically use PCRE when needed

This section will cover syntax and features that are different from the BRE/ERE flavor seen earlier. PCRE will be discussed later in a separate section.

String vs line anchors

\A restricts the match to the start of string and \z restricts the match to the end of string. You'll also need the -U multiline option to use string anchors.

# start of the line vs start of the string
$ printf 'hi-hello\ntop-spot\n' | rg -o '^\w+'
hi
top
$ printf 'hi-hello\ntop-spot\n' | rg -Uo '\A\w+'
hi

# end of the line vs end of the string
$ printf 'hi-hello\ntop-spot\n' | rg -o '\w+$'
hello
spot
# note that you need to match \n as well (if present) for \z
$ printf 'hi-hello\ntop-spot\n' | rg -Uo '\w+\n\z'
spot

Alternation precedence

The alternative which matches earliest in the input gets higher precedence. Left-to-right precedence if there are alternatives that match from the same starting index.

# alternative which matches earliest gets higher precedence
$ echo 'best years' | rg 'year|years' -r 'X'
best Xs
$ echo 'best years' | rg 'years|year' -r 'X'
best X

# left to right precedence if alternatives match from the same index
$ printf 'spared PARTY PaReNt' | rg -io 'par|pare|spare'
spare
PAR
PaR

# workaround is to sort alternations based on length, longest first
$ printf 'spared PARTY PaReNt' | rg -io 'spare|pare|par'
spare
PAR
PaRe

The dot metacharacter

The dot metacharacter matches any character except newline. You can set the s modifier to enable . to match the newline character as well. Modifiers will be discussed in more detail later.

# here '.' will not match newline characters
$ printf 'blue green\nteal brown' | rg -Uo 'g.*n'
green

$ printf 'blue green\nteal brown' | rg -Uo '(?s)g.*n'
green
teal brown

Greedy Quantifiers

The *, +, ? and {m,n} quantifiers are similar to those in BRE/ERE but there are a few differences too. The {m,n} quantifiers can include whitespace characters inside {}. Also, the {,n} version isn't allowed.

$ echo 'abc ac adc abbc xabbbcz bbb bc abbbbbc' | rg -o 'ab{1, 4}c'
abc
abbc
abbbc

$ echo 'abc ac adc abbc xabbbcz bbb bc abbbbbc' | rg -o 'ab{,4}c'
regex parse error:
    ab{,4}c
       ^
error: repetition quantifier expects a valid decimal

Greedy quantifiers will try to match as much as possible and give back characters if it can help match the overall pattern, which is similar to the behavior in BRE/ERE. One key difference is that precedence is left-to-right instead of longest match wins.

$ echo 'fig123312apple' | rg -o 'g[123]+(12apple)?'
g123312

Non-greedy quantifiers

These quantifiers will try to match as minimally as possible. Appending a ? to greedy quantifiers makes them non-greedy.

$ echo 'foot' | rg 'f.??o' -r 'X'
Xot

# overall pattern has to be satisfied as well
$ echo 'frost' | rg 'f.??o' -r 'X'
Xst

Character classes

In addition to \w, \s and their opposites, you can also use \d to match digit characters. Use \D for non-digit characters. Also, these escapes can be used inside [] too.

$ echo 'Sample123string42with777numbers' | rg '\d+' -r ':'
Sample:string:with:numbers

$ echo 'Sample123string42with777numbers' | rg '\D+' -r ':'
:123:42:777:

$ echo 'tea sea-(pit sit);lean bean' | rg -o '[\w\s]+'
tea sea
pit sit
lean bean

Named character sets are supported as well. One additional feature is that you can use [:^name:] to negate that particular set alone.

# delete all non-punctuation characters as well as the '*' character
$ echo "Hi. How *are* you?" | rg '[[:^punct:]*]+' -r ''
.?

Character class metacharacters can be matched literally by specific placement or by using \ to escape them. These are all similar to those seen in BRE/ERE, except that [ has to be always escaped for a literal match.

Set operations

These operators can be applied inside character class between sets. Mostly used to get intersection or difference between two sets, where one/both of them is a character range or a predefined character set. To aid in such definitions, you can use [] in nested fashion.

# intersection of lowercase alphabets and non-vowel characters
# can also use set difference: rg -ow '[a-z--aeiou]+'
$ echo 'tryst glyph pity why' | rg -ow '[a-z&&[^aeiou]]+'
tryst
glyph
why

# symmetric difference, [[a-l]~~[g-z]] is same as [a-fm-z]
$ echo 'gets eat top sigh' | rg -ow '[[a-l]~~[g-z]]+'
eat
top

# remove all punctuation characters except . ! and ?
$ para='"Hi", there! How *are* you? All fine here.'
$ echo "$para" | rg '[[:punct:]--[.!?]]+' -r ''
Hi there! How are you? All fine here.

Backreferences

The syntax is $N where N is the capture group you want. Leftmost ( in the regular expression is $1, next one is $2 and so on. By default, $0 will give entire matched portion. Use ${N} to avoid ambiguity between backreference and other characters.

# remove square brackets that surround digit characters
$ echo '[52] apples [and] [31] mangoes' | rg '\[(\d+)]' -r '$1'
52 apples [and] 31 mangoes

# add something around the matched strings
$ echo '52 apples and 31 mangoes' | rg '\d+' -r '(${0}4)'
(524) apples and (314) mangoes

Use $$ to represent $ literally in the replacement section. This is only needed for ambiguous cases.

$ echo 'a b a' | rg 'a' -r '$$x'
$x b $x

$ echo 'a b a' | rg 'a' -r '$${a}'
${a} b ${a}

# no ambiguity here, so $$ not needed
$ echo '100' | rg '^' -r '$'
$100

warning Backreferences aren't allowed in the search pattern. Use PCRE flavor if needed.

$ echo 'fort effort' | rg -ow '\w*(\w)\1\w*'
regex parse error:
    \w*(\w)\1\w*
           ^^
error: backreferences are not supported

Consider enabling PCRE2 with the --pcre2 flag, which can handle backreferences
and look-around.

Non-capturing groups

You can use non-capturing groups (?:pattern) to avoid keeping a track of groups not needed for backreferencing.

# the first group is needed to apply quantifier, not backreferencing
$ echo '1,2,3,4,5,6,7' | rg '^(([^,]+,){3})([^,]+)' -r '$1($3)'
1,2,3,(4),5,6,7

# you can use non-capturing groups in such cases
$ echo '1,2,3,4,5,6,7' | rg '^((?:[^,]+,){3})([^,]+)' -r '$1($2)'
1,2,3,(4),5,6,7

Named capture groups

The syntax is (?P<name>pattern) to define a named capture group, useful for readability purposes. Use $name to backreference such groups ($N can also be used). Use ${name} to avoid ambiguity between backreference and other characters.

$ echo 'good,bad 42,24' | rg '(?P<fw>\w+),(?P<sw>\w+)' -r '$sw,$fw'
bad,good 24,42

$ row='today,2008-24-03,food,2012-12-08,nice,5632'
$ echo "$row" | rg '(?P<dd>-\d{2})(?P<mm>-\d{2})' -r '$mm$dd'
today,2008-03-24,food,2012-08-12,nice,5632

Extract matches with surrounding conditions

Using backreferences in combination with -o and -r options will allow to extract matches that should also satisfy some surrounding conditions.

# extract digits that follow =
$ echo 'apple=42, fig=314, banana:512' | rg -o '=(\d+)' -r '$1'
42
314

# extract digits only if it is preceded by - and followed by ; or :
$ echo '42 apple-5, fig3; x-83, y-20: f12' | rg -o '\-(\d+)[:;]' -r '$1'
20

$ s='cat scatter cater scat concatenate catastrophic catapult duplicate'
# extract 3rd occurrence of 'cat' followed by optional lowercase letters
$ echo "$s" | rg -o '^(?:.*?cat.*?){2}(cat[a-z]*)' -r '$1'
cater
# extract occurrences at multiples of 3
$ echo "$s" | rg -o '(?:.*?cat.*?){2}(cat[a-z]*)' -r '$1'
cater
catastrophic

Modifiers

ModifierDescription
icase sensitivity
mmultiline for line anchors (enabled by default for -U option)
smatching newline with . metacharacter
xreadable pattern with whitespace and comments
uunicode

To apply modifiers selectively, specify them inside a special grouping syntax. This will override the modifiers applied to entire pattern, if any. The syntax variations are:

  • (?modifiers:pattern) will apply modifiers only for this portion
  • (?-modifiers:pattern) will negate modifiers only for this portion
  • (?modifiers-modifiers:pattern) will apply and negate particular modifiers only for this portion
  • (?modifiers) when pattern is not given, modifiers (including negation) will be applied from this point onwards
# same as: rg -i 'cat' -r '[$0]'
$ echo 'Cat cOnCaT scatter cut' | rg '(?i)cat' -r '[$0]'
[Cat] cOn[CaT] s[cat]ter cut
# override -i option
$ printf 'Cat\ncOnCaT\nscatter\ncut' | rg -i '(?-i)cat'
scatter
# same as: rg -i '(?-i:Cat)[a-z]*\b' or rg 'Cat(?i)[a-z]*\b'
$ echo 'Cat SCatTeR CATER cAts' | rg 'Cat(?i:[a-z]*)\b' -r '[$0]'
[Cat] S[CatTeR] CATER cAts

# multiple modifiers can be used together
# 'm' is on by default for -U option
$ printf 'Cat\ncOnCaT\nscatter\nCater' | rg -Uo '(?is)on.*^cat'
OnCaT
scatter
Cat

The x modifier allows you to use literal unescaped whitespaces for readability purposes and add comments after an unescaped # character.

$ echo 'fox,cat,dog,parrot' | rg -o '(?x) ( ,[^,]+ ){2}$ #last 2 columns'
,dog,parrot

# need to escape whitespaces or use them inside [] to match literally
$ echo 'a cat and a dog' | rg '(?x)t a'
$ echo 'a cat and a dog' | rg '(?x)t\ a'
a cat and a dog

$ echo 'foo a#b 123' | rg -o '(?x)a#.'
a
$ echo 'foo a#b 123' | rg -o '(?x)a\#.'
a#b

Unicode

Similar to named character classes and escapes, the \p{} construct offers various predefined sets to work with Unicode strings. See regular-expressions: Unicode for more details. See -E option regarding encoding support.

# all consecutive letters
# note that {} can be omitted for single characters
$ echo 'fox:αλεπού,eagle:αετός' | rg '\p{L}+' -r '($0)'
(fox):(αλεπού),(eagle):(αετός)

# extract all consecutive Greek letters
$ echo 'fox:αλεπού,eagle:αετός' | rg -o '\p{Greek}+'
αλεπού
αετός

# escapes like \d, \w, \s are unicode aware
$ echo 'φοο12,βτ_4,bat' | rg '\w+' -r '[$0]'
[φοο12],[βτ_4],[bat]
# can be disabled by using the 'u' modifier
$ echo 'φοο12,βτ_4,bat' | rg '(?-u)\w+' -r '[$0]'
φοο[12],βτ[_4],[bat]

# extract all characters other than letters, \PL can also be used
$ echo 'φοο12,βτ_4,bat' | rg -o '\P{L}+'
12,
_4,

Characters can be specified in the hexadecimal \x{} format as well.

# {} are optional if only two hexadecimal characters are needed
$ echo 'a cat and a dog' | rg 't\x20a'
a cat and a dog

$ echo 'fox:αλεπού,eagle:αετός' | rg -o '[\x61-\x7a]+'
fox
eagle

$ echo 'fox:αλεπού,eagle:αετός' | rg -o '[\x{3b1}-\x{3bb}]+'
αλε
αε

Perl Compatible Regular Expressions

Use -P option to enable Perl Compatible Regular Expressions (PCRE) instead of the default regex. Both GNU grep and ripgrep use the PCRE2 version of the library, so most of the pattern matching features will work the same way.

One significant difference is that ripgrep provides substitution via the -r option. And there are a few subtle differences, like the -f and -e options, empty matches, etc.

# empty match handling
$ echo '1a42z' | grep -oP '[a-z]*'
a
z
$ echo '1a42z' | rg -oP '[a-z]*'

a

z

$ printf 'sub\nbit' | grep -P -f- five_words.txt
grep: the -P option only supports a single pattern
$ printf 'sub\nbit' | rg -P -f- five_words.txt
2:subtle
4:exhibit

$ grep -P -e 'sub' -e 'bit' five_words.txt
grep: the -P option only supports a single pattern
$ rg -P -e 'sub' -e 'bit' five_words.txt
2:subtle
4:exhibit

Here are some examples where you might need the -P option over the default regex features. See the Perl Compatible Regular Expressions chapter for more examples.

# lookarounds is a major feature not supported by the regex crate
# words containing all lowercase vowels in any order
$ rg -NP '(?=.*a)(?=.*e)(?=.*i)(?=.*o).*u' five_words.txt
sequoia
questionable
equation

# same as: rg -o '^(?:.*?cat.*?){2}(cat[a-z]*)' -r '$1'
$ s='cat scatter cater scat concatenate catastrophic catapult duplicate'
$ echo "$s" | rg -oP '^(.*?cat.*?){2}\Kcat[a-z]*'
cater

# match if 'go' is not there between 'at' and 'par'
$ echo 'fox,cat,dog,parrot' | rg -qP 'at((?!go).)*par' && echo 'match found'
match found

# backreference in the search pattern
# remove any number of consecutive duplicate words that are separated by a space
$ echo 'aa a a a 42 f_1 f_1 f_13.14' | rg -P '\b(\w+)( \1)+\b' -r '$1'
aa a 42 f_1 f_13.14

# mix regex and literal matching
$ expr='(a^b)'
$ echo 'f*(2-a/b) - 3*(a^b)-42' | rg -oP '\S*\Q'"$expr"'\E\S*'
3*(a^b)-42

If you wish to use default regex and switch to PCRE when needed, use the --engine=auto option.

# using a feature not present normally
$ echo '123-87-593 42 apple-12-345' | rg -o '\G\d+-?'
regex parse error:
    \G\d+-?
    ^^
error: unrecognized escape sequence

# automatically switch to PCRE
# all digits and optional hyphen combo from the start of string
$ echo '123-87-593 42 apple-12-345' | rg -o --engine=auto '\G\d+-?'
123-
87-
593

info See my blog post Search and replace tricks with ripgrep for more examples.

This section will discuss the recursive features provided by ripgrep and related options.

Sample directory

For sample files and directories used in this section, go to the example_files directory and source the grep.sh script.

$ source grep.sh

$ tree -a
.
├── backups
│   ├── color list.txt
│   └── dot_files
│       ├── .bash_aliases
│       └── .inputrc
├── colors_1
├── colors_2
├── .hidden
└── projects
    ├── dot_files -> ../backups/dot_files
    ├── python
    │   └── hello.py
    └── shell
        └── hello.sh

6 directories, 8 files

Default behavior

As mentioned earlier, ripgrep will search the current working directory recursively if no path is given. Here's an example:

$ rg 'blue'
backups/color list.txt
3:blue

colors_2
1:blue

colors_1
2:light blue

Some files will not be searched by default. These are files matched by rules specified by ignore files (such as .gitignore), hidden files and binary files. Symbolic links found while traversing a directory are also ignored by default. You can use the --files option to list the files that would be searched:

# in this example, only the hidden files are absent
# there are no ignore or binary files
# you can use 'find -type f' to get the full list of files
$ rg --files
backups/color list.txt
colors_2
colors_1
projects/shell/hello.sh
projects/python/hello.py

Here's an example of passing files and directories as arguments. In this case, the current directory won't be searched.

$ rg --files projects colors_1
colors_1
projects/shell/hello.sh
projects/python/hello.py

Ignore files

The presence of a .git directory (current or parent directories) would mark .gitignore to be used for ignoring. You can use the --no-require-git option to enable such ignore rules even for a non-git directory. For illustration purposes, an empty .git directory would be created here instead of an actual git project. In addition to .gitignore, filenames like .ignore and .rgignore are also used for determining files to ignore. For more details, refer to the manual as well as the ripgrep: user guide. Here's an example to show .gitignore in action:

$ mkdir .git
$ echo 'color*' > .gitignore
$ rg --files
projects/shell/hello.sh
projects/python/hello.py

You can use the --no-ignore option to disable the default pruning of ignore files.

$ rg --no-ignore --files
backups/color list.txt
colors_2
colors_1
projects/shell/hello.sh
projects/python/hello.py

Delete the .git folder and .gitignore file as they will hinder examples to be presented next.

$ rm -r .git .gitignore

Hidden files

Use the --hidden or -. option to search hidden files as well.

$ rg -l 'blue'
backups/color list.txt
colors_2
colors_1

# same as: rg -. -l 'blue'
$ rg --hidden -l 'blue'
backups/color list.txt
colors_2
colors_1
.hidden

-u option

As a shortcut, you can use:

  • -u to indicate --no-ignore
  • -uu to indicate --no-ignore --hidden
  • -uuu to indicate --no-ignore --hidden --binary

With rg -uuu you can match the default behavior of the grep -r command.

Use the -L option to search symbolic links that are found while traversing a directory. Here's an example:

$ rg --hidden -l 'pwd'
backups/dot_files/.bash_aliases

# dot_files is a symbolic link
$ stat -c '%N' projects/dot_files
'projects/dot_files' -> '../backups/dot_files'

# -L option enables searching symbolic links
$ rg --hidden -lL 'pwd'
projects/dot_files/.bash_aliases
backups/dot_files/.bash_aliases

NUL separator for filenames

The -0 option will use the ASCII NUL character as the separator for file paths in the output. This is helpful to avoid issues due to shell metacharacters in the filenames.

# error due to 'backups/color list.txt' having a shell metacharacter
$ rg -l 'blue' | xargs rg -l 'teal'
backups/color: No such file or directory (os error 2)
list.txt: No such file or directory (os error 2)
colors_1

# NUL separator to the rescue
$ rg -l0 'blue' | xargs -r0 rg -l 'teal'
colors_1

Predefined file types

The -t option provides a handy way to search files based on their extension. Use rg --type-list to see all the available types and their glob patterns.

# both 'md' and 'markdown' match the same file types
$ rg --type-list | rg 'markdown'
markdown: *.markdown, *.md, *.mdown, *.mkdn
md: *.markdown, *.md, *.mdown, *.mkdn

$ rg --type-list | rg '^c:'
c: *.[chH], *.[chH].in, *.cats

Here are some examples featuring the -t option:

# python and shell files
$ rg -t 'py' -t 'sh' --files
projects/shell/hello.sh
projects/python/hello.py

# files ending with .txt
$ rg -t 'txt' --files
backups/color list.txt

You can use the -T option to invert the selection.

# other than files ending with .txt
$ rg -T 'txt' --files
colors_2
colors_1
projects/shell/hello.sh
projects/python/hello.py

Glob pattern matching

The -t option helps you search based on already defined types. The -g option allows you to define your own glob pattern for matching the filenames. If / is not present in the glob provided, files will be matched against the basename only, not the entire path.

# files ending with '.sh' or '.py'
$ rg -g '*.{sh,py}' --files
projects/shell/hello.sh
projects/python/hello.py

# files having 'color' in their name
$ rg -g '*color*' --files
backups/color list.txt
colors_2
colors_1

Using ! as the first character in the glob pattern will negate the matching. For example, -g '!*.py' will match other than files ending with .py.

# files not having 'color' in their name
$ rg -g '!*color*' --files
projects/shell/hello.sh
projects/python/hello.py

You can apply file type and glob based matching multiple times:

$ rg -g '*color*' -g '!*1*' --files
backups/color list.txt
colors_2

$ rg -T 'txt' -g '!*.sh' --files
colors_2
colors_1
projects/python/hello.py

The -g option uses the .gitignore rules for pattern matching (which differs from shell globbing rules). See git documentation: gitignore pattern format for more details. The ** pattern serves as a placeholder for zero or more levels of directories.

# path (not just basename) containing 'b' or 'y'
$ rg -g '**/*[by]*/**' --files
backups/color list.txt
backups/dot_files/.inputrc
backups/dot_files/.bash_aliases
projects/python/hello.py

# * instead of ** will match only a single level
$ rg -g '*/*[by]*/**' --files
projects/python/hello.py
$ rg -g '**/*[by]*/*' --files
backups/color list.txt
projects/python/hello.py

info Use the -iglob option to match filenames case insensitively.

Limit traversal levels

The --max-depth option will help you limit the recursion depth. For example, with 1 as the value, the search won't descend into sub-directories.

# exclude all sub-directories
# same as: rg -g '!*/' --files
$ rg --max-depth 1 --files
colors_2
colors_1

Debug and other options

The --debug and --trace (more detailed debug) options can be used for debugging purposes, for example to know why a file is being ignored.

There are many more options to customize the search experience. For example, the --type-add option allows you to define your own type, the --max-depth option controls the depth of traversal and so on.

See ripgrep user guide: configuration for examples and details on how to maintain them in a file.

Speed comparison

ripgrep automatically makes use of parallel resources to provide quicker results. GNU grep would need external tools like xargs and parallel for such cases.

A sample comparison is shown below using the directory that was previously mentioned in the Parallel execution section.

# assumes 'linux-4.19' as the current working directory
# my machine can run four processes in parallel

$ time grep -rw 'user' > ../f1
real    0m0.886s
$ time rg -uuu -w 'user' > ../f2
real    0m0.287s

$ diff -sq <(sort ../f1) <(sort ../f2)
Files /dev/fd/63 and /dev/fd/62 are identical

# clean up
$ rm ../f[12]

Lot of factors like file size, file encoding, line size, sparse or dense matches, hardware features, etc will affect the performance. ripgrep provides options like -j, --dfa-size-limit and --mmap for tuning performance related settings.

info See Benchmark with other grep implementations by the author of ripgrep command for a methodological analysis and insights.

ripgrep-all

Quoting from the GitHub repo:

rga is a line-oriented search tool that allows you to look for a regex in a multitude of file types. rga wraps the awesome ripgrep and enables it to search in pdf, docx, sqlite, jpg, movie subtitles (mkv, mp4), etc.

The main attraction is pairing file types and relevant tools to enable text searching. rga also has a handy caching feature that speeds up the search for subsequent usages on the same input.

Summary

ripgrep is an excellent alternative to GNU grep. If you are working with large code bases, I'd definitely recommend ripgrep for its performance and customization options. There are interesting features in the pipeline too, for example ngram indexing support.

Exercises

Would be a good idea to first redo all the exercises using rg from all the previous chapters. Some exercises will require reading the manual, as those options aren't covered in this book.

info The exercises directory has all the files used in this section.

1) Which option will change the line separator from \n to \r\n?

# no output
$ rg -cx '' dos.txt

##### add your solution here
4

2) Default behavior of ripgrep changes depending on whether the output is redirected or not. Use appropriate option(s) to filter lines containing are from the sample.txt and patterns.txt input files and pipe the output to tr 'a-z' 'A-Z' to get results as shown below.

##### add your solution here
PATTERNS.TXT
12:CARE
15:SCARE

SAMPLE.TXT
4:HOW ARE YOU

3) Replace all occurrences of ].*[ with _ for the input file regex_terms.txt.

##### add your solution here
^[c-k].*\W$
ly.
[A-Z_0-9]

4) For the input file nul_separated, use the ASCII NUL character as the line separator and display lines containing fig. Perform additional transformation to convert ASCII NUL characters, if any, to the newline character.

##### add your solution here
apple
fig
mango
icecream

5) For the input file nul_separated, replace the ASCII NUL character with a newline character, followed by --- and another newline character.

##### add your solution here
apple
fig
mango
icecream
---
how are you
have a nice day
---
dragon unicorn centaur

6) Extract all whole words from the sample.txt input file. However, do not extract words if they contain any character present in the ignore shell variable.

$ ignore='aety'
##### add your solution here
World
Hi
How
do
Much
Adios

$ ignore='eosW'
##### add your solution here
Hi
it
it
banana
papaya
Much

7) How would you represent a $ character literally when using the -r option?

8) From the patterns.txt input file, extract from car at the start of a line to the very next occurrence of book or lie in the file.

##### add your solution here
care
4*5]
a huge discarded pile of book
car
eden
rested replie

9) From the pcre.txt input file, extract from the second field to the second last field from rows having at least two columns considering ; as the delimiter. For example, b;c should be extracted from a;b;c;d and a line containing less than two ; characters shouldn't produce any output.

##### add your solution here
in;awe;b2b;3list
be;he;0;a;b

10) For the input file python.md, match all lines containing python irrespective of case, but not if it is part of code blocks that are bounded by triple backticks.

##### add your solution here
REPL is a good way to learn PYTHON for beginners.
python comes loaded with awesome methods. Enjoy learning pYtHoN.

info For the rest of the exercises, use the recursive_matching directory that was created in an earlier chapter. Source the recursive.sh script if you haven't created this directory yet.

# the 'recursive.sh' script is present in the 'exercises' directory
$ source recursive.sh

11) List all files not containing blue. Hidden files should also be considered.

##### add your solution here
substitute.sh
backups/dot_files/.bash_aliases
backups/dot_files/.inputrc
projects/shell/hello.sh
projects/python/hello.py

12) List all the files in the backups directory, including links and hidden files.

##### add your solution here
backups/text/pat.txt
backups/color list.txt
backups/dot_files/.inputrc
backups/dot_files/.bash_aliases

13) What does the -uuu option mean?

14) Display lines containing a word ending with e. Search only among the sh file type and the output should not have line number or filename prefixes.

##### add your solution here
sed -i 's/search/replace/g' **/*.txt

15) List files other than hidden files and file types sh and py. Links should be considered for listing.

##### add your solution here
backups/text/pat.txt
backups/color list.txt
colors_2.txt
sample_file.txt
colors_1

16) List all files not containing a . character in their names. Ignore links.

##### add your solution here
colors_1

17) What does ** mean when used with the -g option?

18) Search recursively and list the names of files that contain Hello or blue. Symbolic links should be searched as well. Do not search within python or backups directories.

##### add your solution here
colors_2.txt
sample_file.txt
colors_1
projects/shell/hello.sh

19) Match lines containing Hello or red only from files in the current hierarchy, i.e. don't search recursively. Symbolic links should be searched as well.

##### add your solution here
colors_2.txt
5:red

sample_file.txt
1:Hello World

20) Search recursively for files containing blue, yellow and teal anywhere in the file.

##### add your solution here
colors_1