ripgrep
ripgrep
is a very popular alternative to the grep
command. Editors like Visual Studio Code are using ripgrep
to power their search and replace features. The major selling point is its default behavior for recursive search, parallel processing and speed. ripgrep
doesn't aim to be compatible with POSIX or GNU grep
and there are various differences in terms of features, option names, output style, regular expressions and so on.
Project links
See Feature comparison of ack, ag, git-grep, GNU grep and ripgrep for an overview of features among various
grep
implementations. See also benchmark among grep implementations.
Installation
See ripgrep: installation for details on various methods and platforms. Instructions shown below is for Debian-like distributions.
# link shown here on two lines as it is too long
# visit using the first part to get latest version
$ link='https://github.com/BurntSushi/ripgrep/releases/'
$ link="$link"'download/13.0.0/ripgrep_13.0.0_amd64.deb'
$ wget "$link"
$ sudo gdebi ripgrep_13.0.0_amd64.deb
# note that the installed command name is rg, not ripgrep
$ rg --version
ripgrep 13.0.0 (rev 7ec2fd51ba)
-SIMD -AVX (compiled)
+SIMD -AVX (runtime)
Command line text processing with Rust tools
Earlier versions of this book discussed ripgrep
from the basics, just like GNU grep
. That led to a lot of repetitive details that were very similar to GNU grep
. This chapter will now cover only notable features and differences.
You can still access the earlier version from my work-in-progress Command line text processing with Rust tools ebook.
Default behavior differences
Here are some notable differences in behavior between ripgrep
and GNU grep
when they are invoked without any options:
- Regular expressions flavor is provided by the regex crate
- to put it roughly, this provides more features compared to BRE/ERE but less compared to PCRE
- Line number prefix and color options are enabled by default
- Blank line separates matching lines from different files
- Filename is added as a prefix line above the matching lines instead of a prefix for each matching line
- Recursive search is on by default for directories provided as an argument (current directory if input source is not specified). In addition,
- files and directories that match rules specified by ignore files like
.gitignore
are not searched - hidden files and directories are ignored
- binary files (determined by the presence of the ASCII NUL character) are ignored, but a matching line is displayed if found before encountering the NUL character along with a warning
- files and directories that match rules specified by ignore files like
Options overview
It is always a good idea to know where to find the documentation. From command line, you can use man rg
for the manual and rg -h
for a list of all the options. See also ripgrep: User guide.
This section will cover some of the options provided by ripgrep
with examples. As mentioned earlier, the focus will be on differences compared to GNU grep
. So, options like -F
, -f
, -i
, -o
, -v
, -w
, -x
, -m
, -q
, -b
, -A
, -B
and -C
won't be discussed as they behave the same as GNU grep
. Regular expressions will be covered in a later section.
The example_files directory has all the files used in the examples.
Line number
As mentioned earlier, line number prefix is enabled by default. However, if the output is redirected or if the input is being read from stdin
, this option won't be on by default. You can override the default behavior by using -n
to always add the line number prefix and -N
to turn off the numbering.
# default behavior
$ rg 'day' ip.txt
1:it is a warm and cozy day
$ printf 'apple\nbanana\ncherry' | rg 'an'
banana
# using options explicitly
$ rg -N 'day' ip.txt
it is a warm and cozy day
$ printf 'apple\nbanana\ncherry' | rg -n 'an'
2:banana
Here are some examples with output redirection:
# saving output to a file
$ rg 'to' ip.txt > out.txt
$ cat out.txt
listen to what I say
There are so many delights to cherish
$ rm out.txt
# passing output to another command
$ rg 'to' ip.txt | rg 'many'
There are so many delights to cherish
# use options explicitly if required
$ rg -n 'to' ip.txt | rg 'many'
6:There are so many delights to cherish
Count
Unlike GNU grep
, the -c
will not display files that don't have a match. You can add the --include-zero
option to display files without matches as well.
$ rg -c 'to' ip.txt search.txt
ip.txt:2
$ rg -c --include-zero 'to' ip.txt search.txt
search.txt:0
ip.txt:2
When -o
is combined with -c
, you'll get the total count of matches. Unlike GNU grep
, you don't have to use another command like wc
. You can also use --count-matches
instead of the -co
combination.
$ rg -co 'an' ip.txt
6
$ rg --count-matches 'an' ip.txt
6
Get filename instead of matching lines
Similar to GNU grep
, you can use -l
or --files-with-matches
to get filenames when a match is found. But the -L
option in ripgrep
is used to follow links, so only the long option --files-without-match
is available to get filenames when a match is not found.
$ rg --files-without-match 'to' ip.txt search.txt
search.txt
Filename prefix for matching lines
Filename prefix is automatically added for recursive search and multiple file arguments. You can use -H
to always show the prefix and -I
to suppress it.
# filename prefix automatically added for multiple file arguments
$ rg -N 'say' ip.txt search.txt
search.txt
say
ip.txt
listen to what I say
$ rg -NI 'say' ip.txt search.txt
say
listen to what I say
# single file search
$ rg -N 'play' ip.txt
go play in the park
$ rg -NH 'play' ip.txt
ip.txt
go play in the park
Use the --no-heading
option to get filename prefix for each matching line. This will also remove the newline separation between multiple files. When output is redirected, --no-heading
option will be automatically active.
$ rg -N --no-heading 'say' ip.txt search.txt
search.txt:say
ip.txt:listen to what I say
$ rg 'say' ip.txt search.txt | cat -
search.txt:say
ip.txt:listen to what I say
# add -I to suppress the filename prefix
$ rg -NI --no-heading 'say' ip.txt search.txt
say
listen to what I say
Field separator
By default, :
is used to separate prefixes like filename and line numbers. You can use the --field-match-separator
option to customize this separator.
$ rg --field-match-separator ')' 'the' ip.txt
3)go play in the park
4)come back before the sky turns dark
9)Try them all before you perish
$ rg --no-heading --field-match-separator ';' 'par' ip.txt pets.txt
pets.txt;2;I like parrots
ip.txt;3;go play in the park
Colored output
The --color
option works similar to the one seen earlier with GNU grep
.
The --colors
(note the plural form) option is useful to customize colors and style for matching text, line numbers, etc. A common usage is to highlight multiple terms in different colors. See manual for more details.
Context matching
The options for context matching are very similar to GNU grep
. The customization options are named differently: --context-separator
and --no-context-separator
. Also, escape sequences like \t
, \n
, etc can be used as part of the separator.
Unlike
GNU grep
, using0
as the context number will never add a separator in the output.
$ seq 29 | rg --context-separator '=====' -A1 '3'
3
4
=====
13
14
=====
23
24
$ seq 29 | rg --no-context-separator -A1 '3'
3
4
13
14
23
24
By default, -
is used to separate the fields such as filename and line number prefix for context lines. You can use the --field-context-separator
option to customize this separator.
$ rg --no-heading -H -A1 'play' ip.txt
ip.txt:3:go play in the park
ip.txt-4-come back before the sky turns dark
$ rg --no-heading -H --field-context-separator ')' -A1 'play' ip.txt
ip.txt:3:go play in the park
ip.txt)4)come back before the sky turns dark
Scripting options
You can use the -q
option to suppress stdout
and --no-messages
to suppress stderr
.
# when file doesn't exist
$ rg 'in' xyz.txt
xyz.txt: No such file or directory (os error 2)
$ rg --no-messages 'in' xyz.txt
$ echo $?
2
# some errors will require explicit redirection
$ rg --no-messages 'a(' ip.txt
regex parse error:
a(
^
error: unclosed group
$ rg --no-messages 'a(' ip.txt 2> /dev/null
$ echo $?
2
Substitution
The -r
option will help you perform substitution operations. Here's an example:
# 'day' is the search pattern
# 'morning' is the replacement string
$ rg 'day' -r 'morning' ip.txt
1:it is a warm and cozy morning
Using rg --passthru -N 'search' -r 'replace'
is very similar to how you can use the command sed 's/search/replace/g'
for substitution. Some advantages with ripgrep
include fixed string matching, recursive search (and speed benefit due to parallel processing), etc.
# replace 'and' with '&'
$ rg --passthru -N 'and' -r '&' ip.txt
it is a warm & cozy day
listen to what I say
go play in the park
come back before the sky turns dark
There are so many delights to cherish
Apple, Banana & Cherry
Bread, Butter & Jelly
Try them all before you perish
Multiline matching
The -U
option will allow you to match across multiple lines. Here's an example:
$ rg -U 'y\ng' ip.txt
2:listen to what I say
3:go play in the park
See my blog post Multiline fixed string search and replace with CLI tools for more examples with the
-U
option.
NUL separator
The --null-data
option helps to process data that use the ASCII NUL character as the separator.
$ printf 'cred\nteal\0a2\0spared' | rg --null-data 'red' | sed 's/\x0/\n---\n/g'
cred
teal
---
spared
---
ripgrep regex
From regex crate:
Its syntax is similar to Perl-style regular expressions, but lacks a few features like look around and backreferences. In exchange, all searches execute in linear time with respect to the size of the regular expression and search text.
By default, rg
treats the search term as a regular expression. You can use the following options to alter the default behavior:
-F
option will cause the search patterns to be treated literally-P
option will enable Perl Compatible Regular Expression (PCRE) instead of regex crate--engine=auto
option will dynamically use PCRE when needed
This section will cover syntax and features that are different from the BRE/ERE flavor seen earlier. PCRE will be discussed later in a separate section.
String vs line anchors
\A
restricts the match to the start of string and \z
restricts the match to the end of string. You'll also need the -U
multiline option to use string anchors.
# start of the line vs start of the string
$ printf 'hi-hello\ntop-spot\n' | rg -o '^\w+'
hi
top
$ printf 'hi-hello\ntop-spot\n' | rg -Uo '\A\w+'
hi
# end of the line vs end of the string
$ printf 'hi-hello\ntop-spot\n' | rg -o '\w+$'
hello
spot
# note that you need to match \n as well (if present) for \z
$ printf 'hi-hello\ntop-spot\n' | rg -Uo '\w+\n\z'
spot
Alternation precedence
The alternative which matches earliest in the input gets higher precedence. Left-to-right precedence if there are alternatives that match from the same starting index.
# alternative which matches earliest gets higher precedence
$ echo 'best years' | rg 'year|years' -r 'X'
best Xs
$ echo 'best years' | rg 'years|year' -r 'X'
best X
# left to right precedence if alternatives match from the same index
$ printf 'spared PARTY PaReNt' | rg -io 'par|pare|spare'
spare
PAR
PaR
# workaround is to sort alternations based on length, longest first
$ printf 'spared PARTY PaReNt' | rg -io 'spare|pare|par'
spare
PAR
PaRe
The dot metacharacter
The dot metacharacter matches any character except newline. You can set the s
modifier to enable .
to match the newline character as well. Modifiers will be discussed in more detail later.
# here '.' will not match newline characters
$ printf 'blue green\nteal brown' | rg -Uo 'g.*n'
green
$ printf 'blue green\nteal brown' | rg -Uo '(?s)g.*n'
green
teal brown
Greedy Quantifiers
The *
, +
, ?
and {m,n}
quantifiers are similar to those in BRE/ERE but there are a few differences too. The {m,n}
quantifiers can include whitespace characters inside {}
. Also, the {,n}
version isn't allowed.
$ echo 'abc ac adc abbc xabbbcz bbb bc abbbbbc' | rg -o 'ab{1, 4}c'
abc
abbc
abbbc
$ echo 'abc ac adc abbc xabbbcz bbb bc abbbbbc' | rg -o 'ab{,4}c'
regex parse error:
ab{,4}c
^
error: repetition quantifier expects a valid decimal
Greedy quantifiers will try to match as much as possible and give back characters if it can help match the overall pattern, which is similar to the behavior in BRE/ERE. One key difference is that precedence is left-to-right instead of longest match wins.
$ echo 'fig123312apple' | rg -o 'g[123]+(12apple)?'
g123312
Non-greedy quantifiers
These quantifiers will try to match as minimally as possible. Appending a ?
to greedy quantifiers makes them non-greedy.
$ echo 'foot' | rg 'f.??o' -r 'X'
Xot
# overall pattern has to be satisfied as well
$ echo 'frost' | rg 'f.??o' -r 'X'
Xst
Character classes
In addition to \w
, \s
and their opposites, you can also use \d
to match digit characters. Use \D
for non-digit characters. Also, these escapes can be used inside []
too.
$ echo 'Sample123string42with777numbers' | rg '\d+' -r ':'
Sample:string:with:numbers
$ echo 'Sample123string42with777numbers' | rg '\D+' -r ':'
:123:42:777:
$ echo 'tea sea-(pit sit);lean bean' | rg -o '[\w\s]+'
tea sea
pit sit
lean bean
Named character sets are supported as well. One additional feature is that you can use [:^name:]
to negate that particular set alone.
# delete all non-punctuation characters as well as the '*' character
$ echo "Hi. How *are* you?" | rg '[[:^punct:]*]+' -r ''
.?
Character class metacharacters can be matched literally by specific placement or by using \
to escape them. These are all similar to those seen in BRE/ERE, except that [
has to be always escaped for a literal match.
Set operations
These operators can be applied inside character class between sets. Mostly used to get intersection or difference between two sets, where one/both of them is a character range or a predefined character set. To aid in such definitions, you can use []
in nested fashion.
# intersection of lowercase alphabets and non-vowel characters
# can also use set difference: rg -ow '[a-z--aeiou]+'
$ echo 'tryst glyph pity why' | rg -ow '[a-z&&[^aeiou]]+'
tryst
glyph
why
# symmetric difference, [[a-l]~~[g-z]] is same as [a-fm-z]
$ echo 'gets eat top sigh' | rg -ow '[[a-l]~~[g-z]]+'
eat
top
# remove all punctuation characters except . ! and ?
$ para='"Hi", there! How *are* you? All fine here.'
$ echo "$para" | rg '[[:punct:]--[.!?]]+' -r ''
Hi there! How are you? All fine here.
Backreferences
The syntax is $N
where N
is the capture group you want. Leftmost (
in the regular expression is $1
, next one is $2
and so on. By default, $0
will give entire matched portion. Use ${N}
to avoid ambiguity between backreference and other characters.
# remove square brackets that surround digit characters
$ echo '[52] apples [and] [31] mangoes' | rg '\[(\d+)]' -r '$1'
52 apples [and] 31 mangoes
# add something around the matched strings
$ echo '52 apples and 31 mangoes' | rg '\d+' -r '(${0}4)'
(524) apples and (314) mangoes
Use $$
to represent $
literally in the replacement section. This is only needed for ambiguous cases.
$ echo 'a b a' | rg 'a' -r '$$x'
$x b $x
$ echo 'a b a' | rg 'a' -r '$${a}'
${a} b ${a}
# no ambiguity here, so $$ not needed
$ echo '100' | rg '^' -r '$'
$100
Backreferences aren't allowed in the search pattern. Use PCRE flavor if needed.
$ echo 'fort effort' | rg -ow '\w*(\w)\1\w*' regex parse error: \w*(\w)\1\w* ^^ error: backreferences are not supported Consider enabling PCRE2 with the --pcre2 flag, which can handle backreferences and look-around.
Non-capturing groups
You can use non-capturing groups (?:pattern)
to avoid keeping a track of groups not needed for backreferencing.
# the first group is needed to apply quantifier, not backreferencing
$ echo '1,2,3,4,5,6,7' | rg '^(([^,]+,){3})([^,]+)' -r '$1($3)'
1,2,3,(4),5,6,7
# you can use non-capturing groups in such cases
$ echo '1,2,3,4,5,6,7' | rg '^((?:[^,]+,){3})([^,]+)' -r '$1($2)'
1,2,3,(4),5,6,7
Named capture groups
The syntax is (?P<name>pattern)
to define a named capture group, useful for readability purposes. Use $name
to backreference such groups ($N
can also be used). Use ${name}
to avoid ambiguity between backreference and other characters.
$ echo 'good,bad 42,24' | rg '(?P<fw>\w+),(?P<sw>\w+)' -r '$sw,$fw'
bad,good 24,42
$ row='today,2008-24-03,food,2012-12-08,nice,5632'
$ echo "$row" | rg '(?P<dd>-\d{2})(?P<mm>-\d{2})' -r '$mm$dd'
today,2008-03-24,food,2012-08-12,nice,5632
Extract matches with surrounding conditions
Using backreferences in combination with -o
and -r
options will allow to extract matches that should also satisfy some surrounding conditions.
# extract digits that follow =
$ echo 'apple=42, fig=314, banana:512' | rg -o '=(\d+)' -r '$1'
42
314
# extract digits only if it is preceded by - and followed by ; or :
$ echo '42 apple-5, fig3; x-83, y-20: f12' | rg -o '\-(\d+)[:;]' -r '$1'
20
$ s='cat scatter cater scat concatenate catastrophic catapult duplicate'
# extract 3rd occurrence of 'cat' followed by optional lowercase letters
$ echo "$s" | rg -o '^(?:.*?cat.*?){2}(cat[a-z]*)' -r '$1'
cater
# extract occurrences at multiples of 3
$ echo "$s" | rg -o '(?:.*?cat.*?){2}(cat[a-z]*)' -r '$1'
cater
catastrophic
Modifiers
Modifier | Description |
---|---|
i | case sensitivity |
m | multiline for line anchors (enabled by default for -U option) |
s | matching newline with . metacharacter |
x | readable pattern with whitespace and comments |
u | unicode |
To apply modifiers selectively, specify them inside a special grouping syntax. This will override the modifiers applied to entire pattern, if any. The syntax variations are:
(?modifiers:pattern)
will apply modifiers only for this portion(?-modifiers:pattern)
will negate modifiers only for this portion(?modifiers-modifiers:pattern)
will apply and negate particular modifiers only for this portion(?modifiers)
when pattern is not given, modifiers (including negation) will be applied from this point onwards
# same as: rg -i 'cat' -r '[$0]'
$ echo 'Cat cOnCaT scatter cut' | rg '(?i)cat' -r '[$0]'
[Cat] cOn[CaT] s[cat]ter cut
# override -i option
$ printf 'Cat\ncOnCaT\nscatter\ncut' | rg -i '(?-i)cat'
scatter
# same as: rg -i '(?-i:Cat)[a-z]*\b' or rg 'Cat(?i)[a-z]*\b'
$ echo 'Cat SCatTeR CATER cAts' | rg 'Cat(?i:[a-z]*)\b' -r '[$0]'
[Cat] S[CatTeR] CATER cAts
# multiple modifiers can be used together
# 'm' is on by default for -U option
$ printf 'Cat\ncOnCaT\nscatter\nCater' | rg -Uo '(?is)on.*^cat'
OnCaT
scatter
Cat
The x
modifier allows you to use literal unescaped whitespaces for readability purposes and add comments after an unescaped #
character.
$ echo 'fox,cat,dog,parrot' | rg -o '(?x) ( ,[^,]+ ){2}$ #last 2 columns'
,dog,parrot
# need to escape whitespaces or use them inside [] to match literally
$ echo 'a cat and a dog' | rg '(?x)t a'
$ echo 'a cat and a dog' | rg '(?x)t\ a'
a cat and a dog
$ echo 'foo a#b 123' | rg -o '(?x)a#.'
a
$ echo 'foo a#b 123' | rg -o '(?x)a\#.'
a#b
Unicode
Similar to named character classes and escapes, the \p{}
construct offers various predefined sets to work with Unicode strings. See regular-expressions: Unicode for more details. See -E
option regarding encoding support.
# all consecutive letters
# note that {} can be omitted for single characters
$ echo 'fox:αλεπού,eagle:αετός' | rg '\p{L}+' -r '($0)'
(fox):(αλεπού),(eagle):(αετός)
# extract all consecutive Greek letters
$ echo 'fox:αλεπού,eagle:αετός' | rg -o '\p{Greek}+'
αλεπού
αετός
# escapes like \d, \w, \s are unicode aware
$ echo 'φοο12,βτ_4,bat' | rg '\w+' -r '[$0]'
[φοο12],[βτ_4],[bat]
# can be disabled by using the 'u' modifier
$ echo 'φοο12,βτ_4,bat' | rg '(?-u)\w+' -r '[$0]'
φοο[12],βτ[_4],[bat]
# extract all characters other than letters, \PL can also be used
$ echo 'φοο12,βτ_4,bat' | rg -o '\P{L}+'
12,
_4,
Characters can be specified in the hexadecimal \x{}
format as well.
# {} are optional if only two hexadecimal characters are needed
$ echo 'a cat and a dog' | rg 't\x20a'
a cat and a dog
$ echo 'fox:αλεπού,eagle:αετός' | rg -o '[\x61-\x7a]+'
fox
eagle
$ echo 'fox:αλεπού,eagle:αετός' | rg -o '[\x{3b1}-\x{3bb}]+'
αλε
αε
Perl Compatible Regular Expressions
Use -P
option to enable Perl Compatible Regular Expressions (PCRE) instead of the default regex. Both GNU grep
and ripgrep
use the PCRE2 version of the library, so most of the pattern matching features will work the same way.
One significant difference is that ripgrep
provides substitution via the -r
option. And there are a few subtle differences, like the -f
and -e
options, empty matches, etc.
# empty match handling
$ echo '1a42z' | grep -oP '[a-z]*'
a
z
$ echo '1a42z' | rg -oP '[a-z]*'
a
z
$ printf 'sub\nbit' | grep -P -f- five_words.txt
grep: the -P option only supports a single pattern
$ printf 'sub\nbit' | rg -P -f- five_words.txt
2:subtle
4:exhibit
$ grep -P -e 'sub' -e 'bit' five_words.txt
grep: the -P option only supports a single pattern
$ rg -P -e 'sub' -e 'bit' five_words.txt
2:subtle
4:exhibit
Here are some examples where you might need the -P
option over the default regex features. See the Perl Compatible Regular Expressions chapter for more examples.
# lookarounds is a major feature not supported by the regex crate
# words containing all lowercase vowels in any order
$ rg -NP '(?=.*a)(?=.*e)(?=.*i)(?=.*o).*u' five_words.txt
sequoia
questionable
equation
# same as: rg -o '^(?:.*?cat.*?){2}(cat[a-z]*)' -r '$1'
$ s='cat scatter cater scat concatenate catastrophic catapult duplicate'
$ echo "$s" | rg -oP '^(.*?cat.*?){2}\Kcat[a-z]*'
cater
# match if 'go' is not there between 'at' and 'par'
$ echo 'fox,cat,dog,parrot' | rg -qP 'at((?!go).)*par' && echo 'match found'
match found
# backreference in the search pattern
# remove any number of consecutive duplicate words that are separated by a space
$ echo 'aa a a a 42 f_1 f_1 f_13.14' | rg -P '\b(\w+)( \1)+\b' -r '$1'
aa a 42 f_1 f_13.14
# mix regex and literal matching
$ expr='(a^b)'
$ echo 'f*(2-a/b) - 3*(a^b)-42' | rg -oP '\S*\Q'"$expr"'\E\S*'
3*(a^b)-42
If you wish to use default regex and switch to PCRE when needed, use the --engine=auto
option.
# using a feature not present normally
$ echo '123-87-593 42 apple-12-345' | rg -o '\G\d+-?'
regex parse error:
\G\d+-?
^^
error: unrecognized escape sequence
# automatically switch to PCRE
# all digits and optional hyphen combo from the start of string
$ echo '123-87-593 42 apple-12-345' | rg -o --engine=auto '\G\d+-?'
123-
87-
593
See my blog post Search and replace tricks with ripgrep for more examples.
Recursive search
This section will discuss the recursive features provided by ripgrep
and related options.
Sample directory
For sample files and directories used in this section, go to the example_files
directory and source the grep.sh
script.
$ source grep.sh
$ tree -a
.
├── backups
│ ├── color list.txt
│ └── dot_files
│ ├── .bash_aliases
│ └── .inputrc
├── colors_1
├── colors_2
├── .hidden
└── projects
├── dot_files -> ../backups/dot_files
├── python
│ └── hello.py
└── shell
└── hello.sh
6 directories, 8 files
Default behavior
As mentioned earlier, ripgrep
will search the current working directory recursively if no path is given. Here's an example:
$ rg 'blue'
backups/color list.txt
3:blue
colors_2
1:blue
colors_1
2:light blue
Some files will not be searched by default. These are files matched by rules specified by ignore files (such as .gitignore
), hidden files and binary files. Symbolic links found while traversing a directory are also ignored by default. You can use the --files
option to list the files that would be searched:
# in this example, only the hidden files are absent
# there are no ignore or binary files
# you can use 'find -type f' to get the full list of files
$ rg --files
backups/color list.txt
colors_2
colors_1
projects/shell/hello.sh
projects/python/hello.py
Here's an example of passing files and directories as arguments. In this case, the current directory won't be searched.
$ rg --files projects colors_1
colors_1
projects/shell/hello.sh
projects/python/hello.py
Ignore files
The presence of a .git
directory (current or parent directories) would mark .gitignore
to be used for ignoring. You can use the --no-require-git
option to enable such ignore rules even for a non-git directory. For illustration purposes, an empty .git
directory would be created here instead of an actual git
project. In addition to .gitignore
, filenames like .ignore
and .rgignore
are also used for determining files to ignore. For more details, refer to the manual as well as the ripgrep: user guide. Here's an example to show .gitignore
in action:
$ mkdir .git
$ echo 'color*' > .gitignore
$ rg --files
projects/shell/hello.sh
projects/python/hello.py
You can use the --no-ignore
option to disable the default pruning of ignore files.
$ rg --no-ignore --files
backups/color list.txt
colors_2
colors_1
projects/shell/hello.sh
projects/python/hello.py
Delete the .git
folder and .gitignore
file as they will hinder examples to be presented next.
$ rm -r .git .gitignore
Hidden files
Use the --hidden
or -.
option to search hidden files as well.
$ rg -l 'blue'
backups/color list.txt
colors_2
colors_1
# same as: rg -. -l 'blue'
$ rg --hidden -l 'blue'
backups/color list.txt
colors_2
colors_1
.hidden
-u option
As a shortcut, you can use:
-u
to indicate--no-ignore
-uu
to indicate--no-ignore --hidden
-uuu
to indicate--no-ignore --hidden --binary
With rg -uuu
you can match the default behavior of the grep -r
command.
Follow links
Use the -L
option to search symbolic links that are found while traversing a directory. Here's an example:
$ rg --hidden -l 'pwd'
backups/dot_files/.bash_aliases
# dot_files is a symbolic link
$ stat -c '%N' projects/dot_files
'projects/dot_files' -> '../backups/dot_files'
# -L option enables searching symbolic links
$ rg --hidden -lL 'pwd'
projects/dot_files/.bash_aliases
backups/dot_files/.bash_aliases
NUL separator for filenames
The -0
option will use the ASCII NUL character as the separator for file paths in the output. This is helpful to avoid issues due to shell metacharacters in the filenames.
# error due to 'backups/color list.txt' having a shell metacharacter
$ rg -l 'blue' | xargs rg -l 'teal'
backups/color: No such file or directory (os error 2)
list.txt: No such file or directory (os error 2)
colors_1
# NUL separator to the rescue
$ rg -l0 'blue' | xargs -r0 rg -l 'teal'
colors_1
Predefined file types
The -t
option provides a handy way to search files based on their extension. Use rg --type-list
to see all the available types and their glob patterns.
# both 'md' and 'markdown' match the same file types
$ rg --type-list | rg 'markdown'
markdown: *.markdown, *.md, *.mdown, *.mkdn
md: *.markdown, *.md, *.mdown, *.mkdn
$ rg --type-list | rg '^c:'
c: *.[chH], *.[chH].in, *.cats
Here are some examples featuring the -t
option:
# python and shell files
$ rg -t 'py' -t 'sh' --files
projects/shell/hello.sh
projects/python/hello.py
# files ending with .txt
$ rg -t 'txt' --files
backups/color list.txt
You can use the -T
option to invert the selection.
# other than files ending with .txt
$ rg -T 'txt' --files
colors_2
colors_1
projects/shell/hello.sh
projects/python/hello.py
Glob pattern matching
The -t
option helps you search based on already defined types. The -g
option allows you to define your own glob pattern for matching the filenames. If /
is not present in the glob provided, files will be matched against the basename only, not the entire path.
# files ending with '.sh' or '.py'
$ rg -g '*.{sh,py}' --files
projects/shell/hello.sh
projects/python/hello.py
# files having 'color' in their name
$ rg -g '*color*' --files
backups/color list.txt
colors_2
colors_1
Using !
as the first character in the glob pattern will negate the matching. For example, -g '!*.py'
will match other than files ending with .py
.
# files not having 'color' in their name
$ rg -g '!*color*' --files
projects/shell/hello.sh
projects/python/hello.py
You can apply file type and glob based matching multiple times:
$ rg -g '*color*' -g '!*1*' --files
backups/color list.txt
colors_2
$ rg -T 'txt' -g '!*.sh' --files
colors_2
colors_1
projects/python/hello.py
The -g
option uses the .gitignore
rules for pattern matching (which differs from shell globbing rules). See git documentation: gitignore pattern format for more details. The **
pattern serves as a placeholder for zero or more levels of directories.
# path (not just basename) containing 'b' or 'y'
$ rg -g '**/*[by]*/**' --files
backups/color list.txt
backups/dot_files/.inputrc
backups/dot_files/.bash_aliases
projects/python/hello.py
# * instead of ** will match only a single level
$ rg -g '*/*[by]*/**' --files
projects/python/hello.py
$ rg -g '**/*[by]*/*' --files
backups/color list.txt
projects/python/hello.py
Use the
-iglob
option to match filenames case insensitively.
Limit traversal levels
The --max-depth
option will help you limit the recursion depth. For example, with 1
as the value, the search won't descend into sub-directories.
# exclude all sub-directories
# same as: rg -g '!*/' --files
$ rg --max-depth 1 --files
colors_2
colors_1
Debug and other options
The --debug
and --trace
(more detailed debug) options can be used for debugging purposes, for example to know why a file is being ignored.
There are many more options to customize the search experience. For example, the --type-add
option allows you to define your own type, the --max-depth
option controls the depth of traversal and so on.
See ripgrep user guide: configuration for examples and details on how to maintain them in a file.
Speed comparison
ripgrep
automatically makes use of parallel resources to provide quicker results. GNU grep
would need external tools like xargs
and parallel
for such cases.
A sample comparison is shown below using the directory that was previously mentioned in the Parallel execution section.
# assumes 'linux-4.19' as the current working directory
# my machine can run four processes in parallel
$ time grep -rw 'user' > ../f1
real 0m0.886s
$ time rg -uuu -w 'user' > ../f2
real 0m0.287s
$ diff -sq <(sort ../f1) <(sort ../f2)
Files /dev/fd/63 and /dev/fd/62 are identical
# clean up
$ rm ../f[12]
Lot of factors like file size, file encoding, line size, sparse or dense matches, hardware features, etc will affect the performance. ripgrep
provides options like -j
, --dfa-size-limit
and --mmap
for tuning performance related settings.
See Benchmark with other grep implementations by the author of
ripgrep
command for a methodological analysis and insights.
ripgrep-all
Quoting from the GitHub repo:
rga
is a line-oriented search tool that allows you to look for a regex in a multitude of file types.rga
wraps the awesomeripgrep
and enables it to search in pdf, docx, sqlite, jpg, movie subtitles (mkv, mp4), etc.
The main attraction is pairing file types and relevant tools to enable text searching. rga
also has a handy caching feature that speeds up the search for subsequent usages on the same input.
Summary
ripgrep
is an excellent alternative to GNU grep
. If you are working with large code bases, I'd definitely recommend ripgrep
for its performance and customization options. There are interesting features in the pipeline too, for example ngram indexing support.
Exercises
Would be a good idea to first redo all the exercises using rg
from all the previous chapters. Some exercises will require reading the manual, as those options aren't covered in this book.
The exercises directory has all the files used in this section.
1) Which option will change the line separator from \n
to \r\n
?
# no output
$ rg -cx '' dos.txt
##### add your solution here
4
2) Default behavior of ripgrep
changes depending on whether the output is redirected or not. Use appropriate option(s) to filter lines containing are
from the sample.txt
and patterns.txt
input files and pipe the output to tr 'a-z' 'A-Z'
to get results as shown below.
##### add your solution here
PATTERNS.TXT
12:CARE
15:SCARE
SAMPLE.TXT
4:HOW ARE YOU
3) Replace all occurrences of ].*[
with _
for the input file regex_terms.txt
.
##### add your solution here
^[c-k].*\W$
ly.
[A-Z_0-9]
4) For the input file nul_separated
, use the ASCII NUL character as the line separator and display lines containing fig
. Perform additional transformation to convert ASCII NUL characters, if any, to the newline character.
##### add your solution here
apple
fig
mango
icecream
5) For the input file nul_separated
, replace the ASCII NUL character with a newline character, followed by ---
and another newline character.
##### add your solution here
apple
fig
mango
icecream
---
how are you
have a nice day
---
dragon unicorn centaur
6) Extract all whole words from the sample.txt
input file. However, do not extract words if they contain any character present in the ignore
shell variable.
$ ignore='aety'
##### add your solution here
World
Hi
How
do
Much
Adios
$ ignore='eosW'
##### add your solution here
Hi
it
it
banana
papaya
Much
7) How would you represent a $
character literally when using the -r
option?
8) From the patterns.txt
input file, extract from car
at the start of a line to the very next occurrence of book
or lie
in the file.
##### add your solution here
care
4*5]
a huge discarded pile of book
car
eden
rested replie
9) From the pcre.txt
input file, extract from the second field to the second last field from rows having at least two columns considering ;
as the delimiter. For example, b;c
should be extracted from a;b;c;d
and a line containing less than two ;
characters shouldn't produce any output.
##### add your solution here
in;awe;b2b;3list
be;he;0;a;b
10) For the input file python.md
, match all lines containing python
irrespective of case, but not if it is part of code blocks that are bounded by triple backticks.
##### add your solution here
REPL is a good way to learn PYTHON for beginners.
python comes loaded with awesome methods. Enjoy learning pYtHoN.
For the rest of the exercises, use the
recursive_matching
directory that was created in an earlier chapter. Source therecursive.sh
script if you haven't created this directory yet.# the 'recursive.sh' script is present in the 'exercises' directory $ source recursive.sh
11) List all files not containing blue
. Hidden files should also be considered.
##### add your solution here
substitute.sh
backups/dot_files/.bash_aliases
backups/dot_files/.inputrc
projects/shell/hello.sh
projects/python/hello.py
12) List all the files in the backups
directory, including links and hidden files.
##### add your solution here
backups/text/pat.txt
backups/color list.txt
backups/dot_files/.inputrc
backups/dot_files/.bash_aliases
13) What does the -uuu
option mean?
14) Display lines containing a word ending with e
. Search only among the sh
file type and the output should not have line number or filename prefixes.
##### add your solution here
sed -i 's/search/replace/g' **/*.txt
15) List files other than hidden files and file types sh
and py
. Links should be considered for listing.
##### add your solution here
backups/text/pat.txt
backups/color list.txt
colors_2.txt
sample_file.txt
colors_1
16) List all files not containing a .
character in their names. Ignore links.
##### add your solution here
colors_1
17) What does **
mean when used with the -g
option?
18) Search recursively and list the names of files that contain Hello
or blue
. Symbolic links should be searched as well. Do not search within python
or backups
directories.
##### add your solution here
colors_2.txt
sample_file.txt
colors_1
projects/shell/hello.sh
19) Match lines containing Hello
or red
only from files in the current hierarchy, i.e. don't search recursively. Symbolic links should be searched as well.
##### add your solution here
colors_2.txt
5:red
sample_file.txt
1:Hello World
20) Search recursively for files containing blue
, yellow
and teal
anywhere in the file.
##### add your solution here
colors_1