Recursive search
This chapter will cover recursive search options and ways to filter the files to be searched. Shell globs and the find
command are also discussed to show alternate methods. You'll also learn how to pass the files filtered by grep
to other commands for further processing.
The example_files directory has the script used to create the sample directory for this chapter.
Sample directory
For sample files and directories used in this chapter, go to the example_files directory and source the grep.sh script.
$ source grep.sh
$ tree -a
.
├── backups
│   ├── color list.txt
│   └── dot_files
│       ├── .bash_aliases
│       └── .inputrc
├── colors_1
├── colors_2
├── .hidden
└── projects
    ├── dot_files -> ../backups/dot_files
    ├── python
    │   └── hello.py
    └── shell
        └── hello.sh

6 directories, 8 files
Recursive options
From man grep:
-r, --recursive
Read all files under each directory, recursively, following
symbolic links only if they are on the command line. Note
that if no file operand is given, grep searches the working
directory. This is equivalent to the -d recurse option.
-R, --dereference-recursive
Read all files under each directory, recursively. Follow
all symbolic links, unlike -r.
-r and -R will work as if the -H option was specified as well, even if there is only one file found during the recursive search. Hidden files are included by default.
When the above options are used, any directory in the argument list will be searched recursively. By default, the current directory will be used if there's no path specified. Here are some basic examples:
# current directory is the default path to be searched recursively
# show all matching lines containing 'clear'
$ grep -r 'clear'
.hidden:clear blue sky
backups/dot_files/.bash_aliases:alias c=clear
# without filename prefix
$ grep -rh 'clear'
clear blue sky
alias c=clear
# list of files containing 'blue'
$ grep -rl 'blue'
.hidden
colors_1
colors_2
backups/color list.txt
# list of files NOT containing 'blue'
$ grep -rL 'blue'
projects/python/hello.py
projects/shell/hello.sh
backups/dot_files/.bash_aliases
backups/dot_files/.inputrc
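To see the -H behavior and the handling of hidden files without the sample directory, here's a minimal sketch. It assumes GNU grep and a POSIX shell; the temporary directory and the .secret filename are made up for illustration.

```shell
# throwaway directory holding a single hidden file
tmp=$(mktemp -d)
printf 'hello\n' > "$tmp/.secret"

# only one file matches, yet -r still adds the filename prefix,
# and the hidden file is searched without any extra option
out_r=$(grep -r 'hello' "$tmp")
echo "$out_r"

# -h suppresses the prefix
out_h=$(grep -rh 'hello' "$tmp")
echo "$out_h"
```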
If links are provided as part of the argument list, grep will perform a search within that path even if only the -r option is used. The -R option will follow links even when they are not part of the argument list.
# -r will not follow links
$ grep -rl 'pwd'
backups/dot_files/.bash_aliases
# link provided as an argument will be searched even with -r
$ grep -rl 'pwd' backups projects/dot_files
backups/dot_files/.bash_aliases
projects/dot_files/.bash_aliases
# -R will automatically follow links
$ grep -Rl 'pwd'
projects/dot_files/.bash_aliases
backups/dot_files/.bash_aliases
Customize search path
By default, the recursive search options -r and -R will include hidden files as well. There are situations, such as version-controlled directories, where specific paths should be ignored or all the files mentioned in a specific file should be ignored. To aid in such custom searches, four options are available:
Option | Description |
---|---|
--include=GLOB | search only files that match GLOB (a file pattern) |
--exclude=GLOB | skip files that match GLOB |
--exclude-from=FILE | skip files that match any file pattern from FILE |
--exclude-dir=GLOB | skip directories that match GLOB |
GLOB here refers to wildcard patterns (also known as globs) used by the shell to expand filenames. These are NOT the same as regular expressions. When recursive options are used, the GLOB applies only to the basename of a file or directory, not the entire path. For more information about globs, see this mywiki.wooledge article.
Here are some basic examples:
# without filtering
$ grep -rl 'blue'
.hidden
colors_1
colors_2
backups/color list.txt
# search only filenames ending with '.txt'
$ grep -rl --include='*.txt' 'blue'
backups/color list.txt
# exclude filenames ending with '.txt' or starting with '.hi'
$ printf '*.txt\n.hi*' | grep -rl --exclude-from=- 'blue'
colors_1
colors_2
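The basename rule is easy to trip over when a GLOB contains a path. Here's a minimal sketch, assuming GNU grep; the notes and old directory names are hypothetical. The sort calls are there only because the order of recursive results depends on the filesystem.

```shell
# throwaway tree: notes/todo.txt and notes/old/todo.txt
tmp=$(mktemp -d)
mkdir -p "$tmp/notes/old"
printf 'buy milk\n' > "$tmp/notes/todo.txt"
printf 'buy milk\n' > "$tmp/notes/old/todo.txt"
cd "$tmp" || exit 1

# the GLOB is compared with the basename 'todo.txt', so both files match
by_name=$(grep -rl --include='todo.txt' 'milk' | sort)
echo "$by_name"

# a GLOB containing a path never matches a basename, so nothing is searched
by_path=$(grep -rl --include='old/todo.txt' 'milk')
echo "by_path: [$by_path]"

# --exclude-dir also works on basenames: any directory named 'old' is skipped
no_old=$(grep -rl --exclude-dir='old' 'milk')
echo "$no_old"
```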
Each of these options can be used multiple times to narrow your search.
# excluding 'backups' directory and hidden files
$ grep -rl --exclude-dir='backups' --exclude='.*' 'blue'
colors_1
colors_2
# allow only filenames ending with '.txt' or starting with '.hi'
$ grep -rl --include='*.txt' --include='.hi*' 'blue'
.hidden
backups/color list.txt
If you mix --include and --exclude options, their order of declaration matters.
# here, exclude gets countered by the include option
$ grep -rl --exclude='*.sh' --include='*ll*' 'He'
projects/python/hello.py
projects/shell/hello.sh
# files ending with '.sh' are excluded as expected
$ grep -rl --include='*ll*' --exclude='*.sh' 'He'
projects/python/hello.py
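The GNU grep manual describes the rule behind this: if contradictory --include and --exclude options are given, the last matching one wins, and if none of them match, a file is included unless the first such option is --include. Here's a minimal sketch reproducing the above behavior in a throwaway directory, assuming a recent GNU grep (older versions may behave differently). The filenames are made up.

```shell
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf 'Hello\n' > hello.py
printf 'Hello\n' > hello.sh

# both options match hello.sh, the later --include wins, so it is searched
later_include=$(grep -rl --exclude='*.sh' --include='hello*' 'He' | sort)

# both options match hello.sh, the later --exclude wins, so it is skipped
later_exclude=$(grep -rl --include='hello*' --exclude='*.sh' 'He' | sort)

printf '%s\n' "$later_include" '---' "$later_exclude"
```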
As mentioned earlier, these options can be used even when recursive search isn't active.
$ grep -l --exclude='*.sh' 'He' projects/*/*
projects/python/hello.py
$ grep -l --include='*.sh' 'He' projects/*/*
projects/shell/hello.sh
extglob and globstar
Modern versions of shells like bash and zsh provide advanced wildcard matching. These can be used instead of the -r and -R options in some cases. See my blog posts on extended globs and globstar for more details on these shell options.
# same as: grep -rl --include='*.txt' --include='*.py' --include='*.sh' 'r'
# to include hidden files, 'dotglob' shell option should be set as well
$ shopt -s extglob globstar
$ grep -l 'r' **/*.@(txt|py|sh)
backups/color list.txt
projects/python/hello.py
In the above example, ** indicates that you need recursive matching from that point onwards. @(pattern-list) helps to provide alternate patterns to be matched, with common parts outside this grouping.
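Here's a minimal sketch of the dotglob note from the comment in the previous example, assuming bash 4.0 or later (globstar was added in 4.0) and GNU grep. bash -O option enables a shopt option for that shell instance only; the filenames are hypothetical.

```shell
tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir sub
printf 'blue\n' > a.txt
printf 'blue\n' > .hidden.txt
printf 'blue\n' > sub/b.txt

# globstar alone: ** recurses, but hidden names are not matched
without_dotglob=$(bash -O globstar -c "grep -l 'blue' **/*.txt")

# globstar + dotglob: hidden files are included as well
with_dotglob=$(bash -O globstar -O dotglob -c "grep -l 'blue' **/*.txt")

printf '%s\n' "$without_dotglob" '---' "$with_dotglob"
```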
Wildcard matching doesn't distinguish between directories and files. So, you might have to use -d skip to prevent grep from treating directories as input files to be searched. Here's an example:
$ printf '%s\n' **/*py*
projects/python
projects/python/hello.py
$ grep -l 'on' **/*py*
grep: projects/python: Is a directory
projects/python/hello.py
$ grep -d skip -l 'on' **/*py*
projects/python/hello.py
find command
The find command is even more versatile than recursive options and advanced wildcard matching. Apart from searching based on filename, it has provisions to match based on file properties like size and time.
# files (including hidden ones) with size less than 25 bytes
# '-type f' helps to match only files
# -L option tells find to follow links
$ find -L -type f -size -25c
./projects/python/hello.py
./projects/shell/hello.sh
./.hidden
./backups/color list.txt
# apply 'grep' only for the files filtered by the find command
$ find -L -type f -size -25c -exec grep 'e$' {} +
./backups/color list.txt:blue
See find chapter from my Computing from the Command Line ebook for more details about this command.
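One thing to watch with -exec ... {} +: grep only sees the files that find passes to it, so if just one file survives the filters, the filename prefix disappears. Here's a minimal sketch, assuming GNU find and grep, with made-up filenames. The -H option forces the prefix.

```shell
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf 'teal\n' > only.txt
printf 'ignored\n' > other.log

# only one file passes the -name filter, so grep gets a single file
# and prints the matching line without a filename prefix
plain=$(find . -type f -name '*.txt' -exec grep 'teal' {} +)

# -H forces the prefix regardless of how many files find passes
with_name=$(find . -type f -name '*.txt' -exec grep -H 'teal' {} +)

printf '%s\n' "$plain" "$with_name"
```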
Piping filenames
Suppose a command gives a list of filenames and you want to pass this list as input arguments to another command. What would you do? One solution is to use the xargs command. Here's a basic example (assuming filenames won't conflict with shell metacharacters):
# an example command producing a list of filenames
$ grep -rl 'clear'
.hidden
backups/dot_files/.bash_aliases
# same as: head -n1 .hidden backups/dot_files/.bash_aliases
$ grep -rl 'clear' | xargs head -n1
==> .hidden <==
ghost
==> backups/dot_files/.bash_aliases <==
alias p=pwd
Characters like space, newline, semicolon, etc. are special to the shell, and xargs splits its input on blanks and newlines by default. You have to properly quote filenames containing such characters. Or, where applicable, you can use a list of filenames separated by the ASCII NUL character (since filenames cannot contain the NUL character). You can use grep -Z to separate the output with NUL and xargs -0 to treat the input as NUL separated. Here's an example:
# consider this command that generates a list of filenames
$ grep -rl 'blue'
.hidden
colors_1
colors_2
backups/color list.txt
# example to show issues due to filenames containing shell metacharacters
# 'backups/color list.txt' is treated as two different files
$ grep -rl 'blue' | xargs grep -l 'teal'
colors_1
grep: backups/color: No such file or directory
grep: list.txt: No such file or directory
# use 'grep -Z' + 'xargs -0' combo for a robust solution
# match files containing both 'blue' and 'teal'
$ grep -rlZ 'blue' | xargs -0 grep -l 'teal'
colors_1
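grep -Z isn't the only source of NUL separated lists. find has a -print0 action that pairs with xargs -0 in the same way. Here's a minimal sketch, assuming GNU find, xargs and grep, with hypothetical filenames:

```shell
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf 'blue\nteal\n' > 'color list.txt'
printf 'blue\n' > colors.txt

# find -print0 emits NUL-terminated names and xargs -0 splits only on NUL,
# so 'color list.txt' reaches grep as a single argument
result=$(find . -type f -name '*.txt' -print0 | xargs -0 grep -l 'teal')
echo "$result"
```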
Note that the command passed to xargs cannot be a shell alias or function, since xargs runs the command directly. So, if you had aliased grep to grep --color=auto, don't be surprised if the output isn't colorized. See unix.stackexchange: have xargs use alias instead of binary for details and workarounds.
You can use xargs -r to avoid running the command when the filename list doesn't have any non-blank character (i.e. when the list is effectively empty).
# there's no file containing 'violet'
# so, xargs doesn't get any filename, but grep is still run
$ grep -rlZ 'violet' | xargs -0 grep -L 'brown'
(standard input)
# using -r option avoids running the command in such cases
$ grep -rlZ 'violet' | xargs -r0 grep -L 'brown'
Do not use xargs -P to combine the output of parallel runs unless you know how to manage output buffers and thus prevent mangled results. The parallel command would be a better option. See unix.stackexchange: xargs vs parallel for more details. See also unix.stackexchange: when to use xargs.
Summary
Having recursive options when there is already a find command seems unnecessary, but in my opinion, these options are highly convenient. Some cases may require falling back to shell globs or find, or even a combination of these methods. Modern tools like ripgrep provide recursive search by default, with out-of-the-box features like ignoring hidden files, respecting .gitignore rules, parallel execution and so on.
Exercises
Use the recursive.sh script from the exercises directory for this section. Unless otherwise mentioned, assume you need to use the -r option instead of the -R option.
# change to the 'exercises' directory and source the 'recursive.sh' script
$ source recursive.sh
$ tree -a
.
├── backups
│   ├── color list.txt
│   ├── dot_files
│   │   ├── .bash_aliases
│   │   └── .inputrc
│   └── text
│       └── pat.txt -> ../../../patterns.txt
├── colors_1
├── colors_2.txt
├── .hidden
├── projects
│   ├── python
│   │   └── hello.py
│   └── shell
│       └── hello.sh
├── sample_file.txt -> ../sample.txt
└── substitute.sh

6 directories, 11 files
1) Search recursively and display the lines containing ello. Output should not have filename prefix.
##### add your solution here
print("Hello, Python!")
echo "Hello, Bash!"
yellow
yellow
2) Search recursively and list the names of files containing blue or on or a double quote character. Match all of these terms only at the end of a line.
##### add your solution here
projects/shell/hello.sh
colors_1
colors_2.txt
backups/dot_files/.inputrc
backups/color list.txt
3) Search recursively and list the names of files containing blue, but do not search within the backups directory.
##### add your solution here
.hidden
colors_1
colors_2.txt
4) Search recursively within the backups directory and list the names of files containing red. Symbolic links found in this directory should be searched as well.
##### add your solution here
backups/color list.txt
backups/text/pat.txt
5) Search recursively and list the names of files that do not contain greeting or blue. Symbolic links should be searched as well.
##### add your solution here
projects/shell/hello.sh
substitute.sh
sample_file.txt
backups/dot_files/.bash_aliases
backups/dot_files/.inputrc
6) Search for files containing red or ello recursively, but do not list the file if it also contains greeting.
##### add your solution here
projects/shell/hello.sh
colors_1
colors_2.txt
7) Search recursively only within filenames ending with .txt and display the names of files containing red. Symbolic links should be searched as well.
##### add your solution here
colors_2.txt
backups/color list.txt
backups/text/pat.txt
8) Search recursively only within filenames ending with .txt but not if the name has a space character. Display the names of files containing red. Symbolic links should be searched as well.
##### add your solution here
colors_2.txt
backups/text/pat.txt
9) Which option will you use if you have a file with a list of glob patterns to identify filenames to be excluded?
10) Does the glob pattern provided to include and exclude options match only the basename or the entire file path? Assume that recursive search is active.
11) How would you tell grep to avoid treating directory arguments as input files to be searched?
12) Use a combination of find and grep commands to display lines containing a whole word Hi only for symbolic links.
##### add your solution here
./sample_file.txt:Hi there
./backups/text/pat.txt:Hi there(greeting). Nice day(a(b)
13) Search recursively and list the names of files that contain Hello or blue. Symbolic links should be searched as well. Do not search within python or backups directories.
##### add your solution here
projects/shell/hello.sh
.hidden
colors_1
sample_file.txt
colors_2.txt
14) Search recursively only within filenames ending with .txt and count the total number of lines containing car or blue or a digit character. Symbolic links should be searched as well.
##### add your solution here
21
15) Display lines containing Hello or red only from files in the current hierarchy, i.e. don't search recursively. Symbolic links should be searched as well.
##### add your solution here
colors_2.txt:red
sample_file.txt:Hello World
16) Search recursively for files containing blue as well as yellow anywhere in the file, but do not list the file if it also contains teal.
##### add your solution here
colors_2.txt