Recursive search

This chapter covers recursive search options and ways to filter the files to be searched. Shell globs and the find command are also discussed as alternative methods. You'll also learn how to pass the files filtered by grep to other commands for further processing.

info The example_files directory has the script used to create the sample directory for this chapter.

Sample directory

For sample files and directories used in this chapter, go to the example_files directory and source the grep.sh script.

$ source grep.sh

$ tree -a
.
├── backups
│   ├── color list.txt
│   └── dot_files
│       ├── .bash_aliases
│       └── .inputrc
├── colors_1
├── colors_2
├── .hidden
└── projects
    ├── dot_files -> ../backups/dot_files
    ├── python
    │   └── hello.py
    └── shell
        └── hello.sh

6 directories, 8 files

Recursive options

From man grep:

-r, --recursive
      Read all files under each directory, recursively,  following
      symbolic  links  only if they are on the command line.  Note
      that if no file operand is given, grep searches the  working
      directory.  This is equivalent to the -d recurse option.

-R, --dereference-recursive
      Read  all  files  under each directory, recursively.  Follow
      all symbolic links, unlike -r.

info -r and -R will work as if -H option was specified as well, even if there is only one file found during the recursive search. Hidden files are included by default.

When the above options are used, any directory in the argument list will be searched recursively. By default, the current directory will be used if there's no path specified. Here are some basic examples:

# current directory is the default path to be searched recursively
# show all matching lines containing 'clear'
$ grep -r 'clear'
.hidden:clear blue sky
backups/dot_files/.bash_aliases:alias c=clear

# without filename prefix
$ grep -rh 'clear'
clear blue sky
alias c=clear

# list of files containing 'blue'
$ grep -rl 'blue'
.hidden
colors_1
colors_2
backups/color list.txt

# list of files NOT containing 'blue'
$ grep -rL 'blue'
projects/python/hello.py
projects/shell/hello.sh
backups/dot_files/.bash_aliases
backups/dot_files/.inputrc
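
The implied -H behavior mentioned earlier can be seen in isolation with a small sketch. It assumes GNU grep and uses a throwaway directory created with mktemp; the directory and the notes.txt file are hypothetical and not part of the sample files.

```shell
# scratch directory with a single file (hypothetical names)
dir=$(mktemp -d)
printf 'clear blue sky\n' > "$dir/notes.txt"

# single file operand, no recursion: no filename prefix
plain=$(cd "$dir" && grep 'clear' notes.txt)

# recursive search: prefix is added even though only one file is searched
recursive=$(cd "$dir" && grep -r 'clear')

# -h suppresses the prefix again
no_prefix=$(cd "$dir" && grep -rh 'clear')

printf '%s\n' "$plain" "$recursive" "$no_prefix"
rm -r "$dir"
```

Only the second result carries the notes.txt: prefix.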

If links are provided as part of the argument list, grep will perform a search within that path even if only the -r option is used. The -R option will follow links even when they are not part of the argument list.

# -r will not follow links
$ grep -rl 'pwd'
backups/dot_files/.bash_aliases

# link provided as an argument will be searched even with -r
$ grep -rl 'pwd' backups projects/dot_files
backups/dot_files/.bash_aliases
projects/dot_files/.bash_aliases

# -R will automatically follow links
$ grep -Rl 'pwd'
projects/dot_files/.bash_aliases
backups/dot_files/.bash_aliases

Customize search path

By default, the recursive search options -r and -R will include hidden files as well. There are situations, such as version controlled directories, where specific paths should be ignored or all the files mentioned in a specific file should be ignored. To aid in such custom searches, four options are available:

Option                 Description
--include=GLOB         search only files that match GLOB (a file pattern)
--exclude=GLOB         skip files that match GLOB
--exclude-from=FILE    skip files that match any file pattern from FILE
--exclude-dir=GLOB     skip directories that match GLOB

info GLOB here refers to wildcard patterns (also known as globs) used by the shell to expand filenames. These are NOT the same as regular expressions. When recursive options are used, the GLOB applies only to the basename of a file or directory, not the entire path. For more information about globs, see this mywiki.wooledge article.
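
The basename rule can be checked with a short sketch. It assumes GNU grep and a scratch tree created with mktemp; the backups/colors.txt path is a hypothetical name chosen for illustration.

```shell
# scratch tree (hypothetical names): one .txt file inside a subdirectory
dir=$(mktemp -d)
mkdir "$dir/backups"
printf 'blue\n' > "$dir/backups/colors.txt"

# the glob is compared with the basename 'colors.txt', so this matches
by_basename=$(cd "$dir" && grep -rl --include='*.txt' 'blue')

# a glob with a directory component cannot match a basename,
# so no file is selected during the recursive search
# '|| true' keeps the non-zero exit status of grep from stopping a script
by_path=$(cd "$dir" && grep -rl --include='backups/*.txt' 'blue' || true)

printf 'basename glob: %s\n' "$by_basename"
printf 'path glob: %s\n' "${by_path:-<no match>}"
rm -r "$dir"
```

To restrict a recursive search to a particular directory, pass that directory as an argument or use --exclude-dir for the others, instead of putting a path inside the GLOB.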

Here are some basic examples:

# without filtering
$ grep -rl 'blue'
.hidden
colors_1
colors_2
backups/color list.txt

# search only filenames ending with '.txt'
$ grep -rl --include='*.txt' 'blue'
backups/color list.txt

# exclude filenames ending with '.txt' or starting with '.hi'
$ printf '*.txt\n.hi*' | grep -rl --exclude-from=- 'blue'
colors_1
colors_2
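
The --exclude-from option can also read the patterns from a regular file, one glob per line. Here is a minimal sketch, assuming GNU grep; the scratch directory, the sample files and the pattern file are all hypothetical.

```shell
# scratch tree and pattern file (all names are hypothetical)
dir=$(mktemp -d)
patterns=$(mktemp)
printf 'blue\n' > "$dir/colors_1"
printf 'blue\n' > "$dir/list.txt"
printf 'blue\n' > "$dir/.hidden"

# one glob per line: skip '*.txt' files and names starting with '.hi'
printf '%s\n' '*.txt' '.hi*' > "$patterns"

# the pattern file lives outside the searched tree,
# so it isn't searched itself
kept=$(cd "$dir" && grep -rl --exclude-from="$patterns" 'blue')

printf '%s\n' "$kept"
rm -r "$dir" "$patterns"
```

Keeping the globs in a file is handy when the same exclusion list is reused across several searches.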

Each of these options can be used multiple times to narrow your search.

# excluding 'backups' directory and hidden files
$ grep -rl --exclude-dir='backups' --exclude='.*' 'blue'
colors_1
colors_2

# allow only filenames ending with '.txt' or starting with '.hi'
$ grep -rl --include='*.txt' --include='.hi*' 'blue'
.hidden
backups/color list.txt

If you mix --include and --exclude options, their order of declaration matters. When a filename matches both kinds of GLOB, the last matching option wins.

# here, exclude gets countered by the include option
$ grep -rl --exclude='*.sh' --include='*ll*' 'He'
projects/python/hello.py
projects/shell/hello.sh

# files ending with '.sh' are excluded as expected
$ grep -rl --include='*ll*' --exclude='*.sh' 'He'
projects/python/hello.py

info As mentioned earlier, these options can be used even when recursive search isn't active.

$ grep -l --exclude='*.sh' 'He' projects/*/*
projects/python/hello.py

$ grep -l --include='*.sh' 'He' projects/*/*
projects/shell/hello.sh

extglob and globstar

Modern versions of shells like bash and zsh provide advanced wildcard matching. These can be used instead of -r and -R options for some cases. See my blog posts on extended globs and globstar for more details on these shell options.

# same as: grep -rl --include='*.txt' --include='*.py' --include='*.sh' 'r'
# to include hidden files, 'dotglob' shell option should be set as well 
$ shopt -s extglob globstar
$ grep -l 'r' **/*.@(txt|py|sh)
backups/color list.txt
projects/python/hello.py

In the above example, ** indicates that you need recursive matching from that point onwards. @(pattern-list) helps to provide alternate patterns to be matched, with common parts outside this grouping.

Wildcard matching doesn't distinguish between directories and files. So, you might have to use -d skip to prevent grep from treating directories as input files to be searched. Here's an example:

$ printf '%s\n' **/*py*
projects/python
projects/python/hello.py

$ grep -l 'on' **/*py*
grep: projects/python: Is a directory
projects/python/hello.py

$ grep -d skip -l 'on' **/*py*
projects/python/hello.py

find command

The find command is even more versatile than recursive options and advanced wildcard matching. Apart from searching based on filename, it has provisions to match based on file properties like size and time.

# files (including hidden ones) with size less than 25 bytes
# '-type f' helps to match only files
# -L option tells find to follow links
$ find -L -type f -size -25c
./projects/python/hello.py
./projects/shell/hello.sh
./.hidden
./backups/color list.txt

# apply 'grep' only for the files filtered by the find command
$ find -L -type f -size -25c -exec grep 'e$' {} +
./backups/color list.txt:blue
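
Filtering by time works along the same lines. The sketch below is an illustration on a scratch directory (hypothetical file names), where one file is backdated with touch -t so that the -mtime test has something to reject.

```shell
# scratch directory (hypothetical names)
dir=$(mktemp -d)
printf 'blue\n' > "$dir/recent.txt"
printf 'blue\n' > "$dir/old.txt"

# backdate one file to the year 2000 using POSIX 'touch -t'
touch -t 200001010000 "$dir/old.txt"

# '-mtime -1' selects files modified within the last 24 hours
# grep is then applied only to those files
recent=$(cd "$dir" && find . -type f -mtime -1 -exec grep -l 'blue' {} +)

printf '%s\n' "$recent"
rm -r "$dir"
```

Both files contain blue, but only the recently modified one is passed to grep.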

info See find chapter from my Computing from the Command Line ebook for more details about this command.

Piping filenames

Suppose a command gives a list of filenames and you want to pass this list as input arguments to another command. What would you do? One solution is to use the xargs command. Here's a basic example (assuming filenames won't conflict with shell metacharacters):

# an example command producing a list of filenames
$ grep -rl 'clear'
.hidden
backups/dot_files/.bash_aliases

# same as: head -n1 .hidden backups/dot_files/.bash_aliases
$ grep -rl 'clear' | xargs head -n1
==> .hidden <==
ghost

==> backups/dot_files/.bash_aliases <==
alias p=pwd

Characters like space, newline, semicolon, etc. are special to the shell. You have to properly quote filenames containing such metacharacters. Or, where applicable, you can use a list of filenames separated by the ASCII NUL character (since filenames cannot have the NUL character). You can use grep -Z to separate the output with NUL and xargs -0 to treat the input as NUL separated. Here's an example:

# consider this command that generates a list of filenames
$ grep -rl 'blue'
.hidden
colors_1
colors_2
backups/color list.txt

# example to show issues due to filenames containing shell metacharacters
# 'backups/color list.txt' is treated as two different files
$ grep -rl 'blue' | xargs grep -l 'teal'
colors_1
grep: backups/color: No such file or directory
grep: list.txt: No such file or directory

# use 'grep -Z' + 'xargs -0' combo for a robust solution
# match files containing both 'blue' and 'teal'
$ grep -rlZ 'blue' | xargs -0 grep -l 'teal'
colors_1
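
The same NUL-separated approach applies when the list of filenames comes from find instead of grep. This is a minimal sketch on a scratch directory with hypothetical files, assuming a find that supports -print0 (GNU and BSD versions do).

```shell
# scratch directory; one filename contains a space (hypothetical names)
dir=$(mktemp -d)
printf 'blue\nteal\n' > "$dir/color list.txt"
printf 'blue\n' > "$dir/colors_1"

# -print0 terminates each filename with NUL instead of newline
# xargs -0 splits on NUL, so the space in the filename is preserved
both=$(cd "$dir" && find . -type f -print0 | xargs -0 grep -l 'teal')

printf '%s\n' "$both"
rm -r "$dir"
```

The filename with a space reaches grep as a single argument.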

Note that the command passed to xargs doesn't accept custom-made aliases and functions. So, if you had aliased grep to grep --color=auto, don't be surprised if the output isn't colorized. See unix.stackexchange: have xargs use alias instead of binary for details and workarounds.
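
The following sketch illustrates the limitation with a shell function and shows one possible workaround, which is not necessarily the one recommended in the linked discussion. The mygrep_fn function and the scratch file are hypothetical names.

```shell
# scratch directory with one sample file (hypothetical names)
dir=$(mktemp -d)
printf 'teal\nblue\n' > "$dir/colors_1"

# hypothetical helper function, known only to the current shell
mygrep_fn() { grep -n "$@"; }

# xargs looks for an executable named 'mygrep_fn' and fails to find one
if printf '%s\0' "$dir/colors_1" | xargs -0 mygrep_fn 'teal' 2>/dev/null; then
    fn_visible=yes
else
    fn_visible=no
fi

# one workaround: spell out the command and its options inside 'sh -c'
# the trailing 'sh' fills $0, the filenames arrive as "$@"
via_sh=$(printf '%s\0' "$dir/colors_1" | xargs -0 sh -c 'grep -n "teal" "$@"' sh)

printf 'function visible to xargs: %s\n' "$fn_visible"
printf '%s\n' "$via_sh"
rm -r "$dir"
```

In the simplest cases, writing the options directly after xargs (for example, xargs grep --color=auto) avoids the need for an alias altogether.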

info You can use xargs -r to avoid running the command when the filename list doesn't have any non-blank character (i.e. when the list is effectively empty).

# there's no file containing 'violet'
# so, xargs doesn't get any filename, but grep is still run
$ grep -rlZ 'violet' | xargs -0 grep -L 'brown'
(standard input)

# using -r option avoids running the command in such cases
$ grep -rlZ 'violet' | xargs -r0 grep -L 'brown'

warning Do not use xargs -P to combine the output of parallel runs, unless you know how to manage output buffers and thus prevent mangled results. The parallel command would be a better option. See unix.stackexchange: xargs vs parallel for more details. See also unix.stackexchange: when to use xargs.

Summary

Having recursive options when the find command already exists may seem unnecessary, but in my opinion, these options are highly convenient. Some cases may require falling back to shell globs or find or even a combination of these methods. Modern tools like ripgrep provide a default recursive search behavior, with out-of-the-box features like ignoring hidden files, respecting .gitignore rules, parallel execution and so on.

Exercises

info Use the recursive.sh script from the exercises directory for this section. Unless otherwise mentioned, assume you need to use the -r option instead of the -R option.

# change to the 'exercises' directory and source the 'recursive.sh' script
$ source recursive.sh

$ tree -a
.
├── backups
│   ├── color list.txt
│   ├── dot_files
│   │   ├── .bash_aliases
│   │   └── .inputrc
│   └── text
│       └── pat.txt -> ../../../patterns.txt
├── colors_1
├── colors_2.txt
├── .hidden
├── projects
│   ├── python
│   │   └── hello.py
│   └── shell
│       └── hello.sh
├── sample_file.txt -> ../sample.txt
└── substitute.sh

6 directories, 11 files

1) Search recursively and display the lines containing ello. Output should not have filename prefix.

##### add your solution here
    print("Hello, Python!")
echo "Hello, Bash!"
yellow
yellow

2) Search recursively and list the names of files containing blue or on or a double quote character. Match all of these terms only at the end of a line.

##### add your solution here
projects/shell/hello.sh
colors_1
colors_2.txt
backups/dot_files/.inputrc
backups/color list.txt

3) Search recursively and list the names of files containing blue, but do not search within the backups directory.

##### add your solution here
.hidden
colors_1
colors_2.txt

4) Search recursively within the backups directory and list the names of files containing red. Symbolic links found in this directory should be searched as well.

##### add your solution here
backups/color list.txt
backups/text/pat.txt

5) Search recursively and list the names of files that do not contain greeting or blue. Symbolic links should be searched as well.

##### add your solution here
projects/shell/hello.sh
substitute.sh
sample_file.txt
backups/dot_files/.bash_aliases
backups/dot_files/.inputrc

6) Search for files containing red or ello recursively, but do not list the file if it also contains greeting.

##### add your solution here
projects/shell/hello.sh
colors_1
colors_2.txt

7) Search recursively only within filenames ending with .txt and display the names of files containing red. Symbolic links should be searched as well.

##### add your solution here
colors_2.txt
backups/color list.txt
backups/text/pat.txt

8) Search recursively only within filenames ending with .txt but not if the name has a space character. Display the names of files containing red. Symbolic links should be searched as well.

##### add your solution here
colors_2.txt
backups/text/pat.txt

9) Which option will you use if you have a file with a list of glob patterns to identify filenames to be excluded?

10) Does the glob pattern provided to include and exclude options match only the basename or the entire file path? Assume that recursive search is active.

11) How would you tell grep to avoid treating directory arguments as input files to be searched?

12) Use a combination of find and grep commands to display lines containing a whole word Hi only for symbolic links.

##### add your solution here
./sample_file.txt:Hi there
./backups/text/pat.txt:Hi there(greeting). Nice day(a(b)

13) Search recursively and list the names of files that contain Hello or blue. Symbolic links should be searched as well. Do not search within python or backups directories.

##### add your solution here
projects/shell/hello.sh
.hidden
colors_1
sample_file.txt
colors_2.txt

14) Search recursively only within filenames ending with .txt and count the total number of lines containing car or blue or a digit character. Symbolic links should be searched as well.

##### add your solution here
21

15) Display lines containing Hello or red only from files in the current hierarchy, i.e. don't search recursively. Symbolic links should be searched as well.

##### add your solution here
colors_2.txt:red
sample_file.txt:Hello World

16) Search recursively for files containing blue as well as yellow anywhere in the file, but do not list the file if it also contains teal.

##### add your solution here
colors_2.txt