Searching Files and Filenames

This chapter will show how to search file contents based on literal strings or regular expressions. After that, you'll learn how to locate files based on their names and other properties like size, last modified timestamp and so on.

info The example_files directory has the scripts used in this chapter.


grep

Quoting from Wikipedia:

grep is a command-line utility for searching plain-text data sets for lines that match a regular expression. Its name comes from the ed command g/re/p (globally search a regular expression and print), which has the same effect.

The grep command has lots and lots of features, so much so that I wrote a book with hundreds of examples and exercises. The most common usage is filtering lines from the input using a regular expression (regexp).

Common options

Commonly used options are listed below. Examples will be discussed in later sections.

  • --color=auto highlight the matching portions, filenames, line numbers, etc using colors
  • -i ignore case
  • -v print only the non-matching lines
  • -n prefix line numbers for output lines
  • -c display only the count of output lines
  • -l print only the names of files that contain a match for the given expression
  • -L print only the names of files that do not contain a match
  • -w match pattern only as whole words
  • -x match pattern only as whole lines
  • -F interpret pattern as a fixed string (i.e. not as a regular expression)
  • -o print only the matching portions
  • -A N print the matching line and N number of lines after the matched line
  • -B N print the matching line and N number of lines before the matched line
  • -C N print the matching line and N number of lines before and after the matched line
  • -m N print a maximum of N matching lines
  • -q no standard output, quit immediately if match found, useful in scripts
  • -s suppress error messages, useful in scripts
  • -r recursively search all files in the specified input folders (by default searches the current directory)
  • -R like -r, but follows symbolic links as well
  • -h do not prefix filename for matching lines (default behavior for single input file)
  • -H prefix filename for matching lines (default behavior for multiple input files)

The following examples would all be suited for the -F option as these do not use regular expressions. grep is smart enough to do the right thing in such cases.

# lines containing 'an'
$ printf 'apple\nbanana\nmango\nfig\ntango\n' | grep 'an'
banana
mango
tango

# case insensitive matching
$ printf 'Cat\ncut\ncOnCaT\nfour cats\n' | grep -i 'cat'
Cat
cOnCaT
four cats

# match only whole words
$ printf 'par value\nheir apparent\ntar-par' | grep -w 'par'
par value
tar-par

# count empty lines
$ printf 'hi\n\nhello\n\n\n\nbye\n' | grep -cx ''
4

# print the matching line as well as two lines after
$ printf 'red\nblue\ngreen\nbrown\nyellow' | grep -A2 'blue'
blue
green
brown
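
Options such as -v, -n, -m, -o and -q from the list above aren't covered by the examples so far. Here's a small sketch, assuming GNU grep. The sample input and the variable names are made up for illustration.

```shell
# made-up sample input, reused by every command below
fruits=$(printf 'apple\nbanana\nmango\nfig\ntango')

# -v inverts the match: only lines NOT containing 'an'
inverted=$(printf '%s\n' "$fruits" | grep -v 'an')
printf '%s\n' "$inverted"

# -n prefixes each matching line with its line number
numbered=$(printf '%s\n' "$fruits" | grep -n 'an')
printf '%s\n' "$numbered"

# -m1 stops after the first matching line
first=$(printf '%s\n' "$fruits" | grep -m1 'an')
printf '%s\n' "$first"

# -o prints every matched portion on a line of its own
# 'banana' contains 'an' twice, so counting the output lines gives 4
match_count=$(printf '%s\n' "$fruits" | grep -o 'an' | grep -c '')
echo "$match_count"

# -q produces no output, the exit status tells whether a match was found
if printf '%s\n' "$fruits" | grep -q 'fig'; then
    status='found'
else
    status='missing'
fi
echo "$status"
```

The -q form is the one you'd typically use in a script condition, since only the exit status matters there.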

Here's an example where the line numbers and matched portions are highlighted in color:

[Figure: example with --color option]

Regular Expressions

By default, grep treats the search pattern as Basic Regular Expression (BRE). Here are the various options related to regexp:

  • -G option can be used to specify explicitly that BRE is needed
  • -E option will enable Extended Regular Expression (ERE)
    • in GNU grep, BRE and ERE only differ in how metacharacters are specified, no difference in features
  • -F option will cause the search patterns to be treated literally
  • -P if available, this option will enable Perl Compatible Regular Expression (PCRE)

The following reference is for Extended Regular Expressions.

Anchors


  • ^ restricts the match to the start of the string
  • $ restricts the match to the end of the string
  • \< restricts the match to the start of word
  • \> restricts the match to the end of word
  • \b restricts the match to both the start/end of words
  • \B matches wherever \b doesn't match

Dot metacharacter and Quantifiers

  • . match any character, including the newline character
  • ? match 0 or 1 times
  • * match 0 or more times
  • + match 1 or more times
  • {m,n} match m to n times
  • {m,} match at least m times
  • {,n} match up to n times (including 0 times)
  • {n} match exactly n times

Character classes

  • [set123] match any of these characters once
  • [^set123] match except any of these characters once
  • [3-7AM-X] range of characters from 3 to 7, A, another range from M to X
  • \w similar to [a-zA-Z0-9_] for matching word characters
  • \s similar to [ \t\n\r\f\v] for matching whitespace characters
  • \W match non-word characters
  • \S match non-whitespace characters
  • [[:digit:]] similar to [0-9]
  • [[:alnum:]_] similar to \w

Alternation and Grouping

  • pat1|pat2|pat3 match pat1 or pat2 or pat3
  • () group patterns, a(b|c)d is same as abd|acd
    • also serves as a capture group
  • \N backreference, gives the matched portion of the Nth capture group
    • \1 backreference to the first capture group
    • \2 backreference to the second capture group and so on up to \9

Quoting from the manual for BRE vs ERE differences:

In basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).
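
To see the difference described in the quote, here's a minimal sketch that runs the same pattern in both flavors. It assumes GNU grep, and the sample input is made up for illustration.

```shell
# made-up sample input: three plain strings and one with literal ( | ) characters
input=$(printf 'abd\nacd\nad\na(b|c)d')

# ERE: ( | ) act as metacharacters without any backslash
ere=$(printf '%s\n' "$input" | grep -E 'a(b|c)d')

# BRE: the same metacharacters need a backslash
bre=$(printf '%s\n' "$input" | grep 'a\(b\|c\)d')

# BRE without backslashes: ( | ) are matched literally
literal=$(printf '%s\n' "$input" | grep 'a(b|c)d')

printf 'ERE:\n%s\n' "$ere"
printf 'BRE with backslashes:\n%s\n' "$bre"
printf 'BRE without backslashes:\n%s\n' "$literal"
```

The first two commands give abd and acd, whereas the last one matches only the line with the literal parentheses and pipe.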

Regexp examples

# lines ending with 'ar'
$ printf 'spared no one\npar\nspar\ndare' | grep 'ar$'

# extract 'part' or 'parrot' or 'parent' case insensitively
$ echo 'par apartment PARROT parent' | grep -ioE 'par(en|ro)?t'

# extract quoted text
$ echo 'I like "mango" and "guava"' | grep -oE '"[^"]+"'

# 8 character lines having the same 3 lowercase letters at the start and end
$ grep -xE '([a-z]{3})..\1' /usr/share/dict/words

Line comparisons between files

The -f and -x options can be combined to get the common lines between two files or the difference when -v is used as well. Add -F if you want to treat the search strings literally (recall that regexp is the default).

# change to the 'scripts' directory and source the '' script
$ source

# common lines between two files
$ grep -Fxf colors_1 colors_2

# lines present in colors_2 but not in colors_1
$ grep -Fvxf colors_1 colors_2
dark green

# lines present in colors_1 but not in colors_2
$ grep -Fvxf colors_2 colors_1
light blue
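
If you want to try these comparisons without the chapter's sample files, here's a self-contained sketch. The list_1 and list_2 files and their contents are made up for illustration and are created in a temporary directory.

```shell
# temporary directory with two made-up files
tmp=$(mktemp -d)
printf 'teal\nblue\nbrown\nyellow\n' > "$tmp/list_1"
printf 'light blue\nyellow\ndark green\nteal\n' > "$tmp/list_2"

# lines common to both files (reported in the order found in list_2)
common=$(grep -Fxf "$tmp/list_1" "$tmp/list_2")

# lines present in list_2 but not in list_1
only_2=$(grep -Fvxf "$tmp/list_1" "$tmp/list_2")

# lines present in list_1 but not in list_2
only_1=$(grep -Fvxf "$tmp/list_2" "$tmp/list_1")

# without -x, 'blue' from list_1 also matches within 'light blue'
no_x=$(grep -Ff "$tmp/list_1" "$tmp/list_2")

printf 'common:\n%s\n' "$common"
printf 'only in list_2:\n%s\n' "$only_2"
printf 'only in list_1:\n%s\n' "$only_1"
printf 'without -x:\n%s\n' "$no_x"

rm -r "$tmp"
```

The last command shows why -x matters here: partial matches would otherwise be reported as common lines.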

Perl Compatible Regular Expression

PCRE has many advanced features compared to BRE/ERE. Here are some examples:

# numbers >= 100, uses possessive quantifiers
$ echo '0501 035 154 12 26 98234' | grep -oP '0*+\d{3,}'

# extract digits only if preceded by =
$ echo '100 apple=42, fig=314 red:255' | grep -oP '=\K\d+'

# all digits and optional hyphen combo from the start of the line
$ echo '123-87-593 42 fig 314-12-111' | grep -oP '\G\d+-?'

# all whole words except 'bat' and 'map'
$ echo 'car2 bat cod map combat' | grep -oP '\b(bat|map)\b(*SKIP)(*F)|\w+'

See man pcrepattern or PCRE online manual for documentation.
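
Lookarounds are another PCRE feature that isn't available in BRE/ERE. Here's a minimal sketch, assuming your grep was built with -P support. The sample input is made up for illustration.

```shell
# made-up sample input
line='fig=42 12 apples 7 pears'

# positive lookahead: digits only if followed by ' apples'
# the lookahead text isn't part of the matched portion
before_apples=$(echo "$line" | grep -oP '\d+(?= apples)')
echo "$before_apples"

# negative lookbehind: whole numbers that aren't preceded by '='
not_assigned=$(echo "$line" | grep -oP '(?<!=)\b\d+')
printf '%s\n' "$not_assigned"
```

The first command extracts 12 and the second one extracts 12 and 7, skipping 42 since it comes right after =.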

Recursive search

You can use the -r option to search recursively within the specified directories. By default, the current directory will be searched. Use -R if you want symbolic links found within the input directories to be followed as well. You do not need the -R option for specifying symbolic links as arguments.

Here are some basic examples. Recursive search will work as if -H option was specified as well, even if only one file was matched. Also, hidden files are included by default.

# change to the 'scripts' directory and source the '' script
$ source
$ ls -AF
backups/  colors_1  colors_2  .hidden  projects/

# recursively search in the 'backups' directory
$ grep -r 'clear' backups
backups/dot_files/.bash_aliases:alias c=clear
# add the -h option to prevent filename prefix in the output
$ grep -rh 'clear' backups
alias c=clear

# by default, the current directory is used for recursive search
$ grep -rl 'clear'

You can further prune the files to be searched using the include/exclude options. Note that these options will work even if recursive search is not active.

  • --include=GLOB search only files that match GLOB
  • --exclude=GLOB skip files that match GLOB
  • --exclude-from=FILE skip files that match any file pattern from FILE
  • --exclude-dir=GLOB skip directories that match GLOB

# default recursive search
$ grep -r 'Hello'
projects/python/"Hello, Python!")
projects/shell/ "Hello, Bash!"

# limit the search to only filenames ending with '.py'
$ grep -r --include='*.py' 'Hello'
projects/python/"Hello, Python!")

# in some cases you can just use shell globs instead of recursive grep
$ shopt -s globstar
$ grep -H 'Hello' **/*.py
projects/python/"Hello, Python!")
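
Here's a self-contained sketch of the include/exclude options, assuming GNU grep. The src and build directories and the greet.* files are made up for illustration and are created in a temporary directory.

```shell
# temporary directory with a made-up project layout
tmp=$(mktemp -d)
mkdir "$tmp/src" "$tmp/build"
echo 'print("Hello")' > "$tmp/src/greet.py"
echo 'echo "Hello"'   > "$tmp/src/greet.sh"
echo '# Hello cache'  > "$tmp/build/greet.py"

# search only files whose names end with '.py'
only_py=$(cd "$tmp" && grep -rl --include='*.py' 'Hello' | sort)

# skip the 'build' directory entirely
no_build=$(cd "$tmp" && grep -rl --exclude-dir='build' 'Hello' | sort)

# both restrictions combined
src_py=$(cd "$tmp" && grep -rl --include='*.py' --exclude-dir='build' 'Hello')

printf 'include *.py:\n%s\n' "$only_py"
printf 'exclude build:\n%s\n' "$no_build"
printf 'combined:\n%s\n' "$src_py"

rm -r "$tmp"
```

The sort is there only to make the listing order predictable, since the traversal order of a recursive search isn't guaranteed.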

info ripgrep is a recommended alternative to GNU grep with a highly optimized regexp engine, parallel search, ignoring files based on .gitignore and so on.

grep and xargs

You can use the shell | operator to pass the output of a command as input to another command. Suppose a command gives you a list of filenames and you want to pass this list as input arguments to another command, what would you do? One solution is to use the xargs command. Here's a basic example (assuming filenames won't conflict with shell metacharacters):

# an example command producing a list of filenames
$ grep -rl 'clear'

# same as: head -n1 .hidden backups/dot_files/.bash_aliases
$ grep -rl 'clear' | xargs head -n1
==> .hidden <==

==> backups/dot_files/.bash_aliases <==
alias p=pwd

Characters like space, newline, semicolon, etc are special to the shell. So, filenames containing these characters have to be properly quoted. Or, where applicable, you can use a list of filenames separated by the ASCII NUL character (since filenames cannot have the NUL character). You can use grep -Z to separate the output with NUL and xargs -0 to treat the input as NUL separated. Here's an example:

# consider this command that generates a list of filenames
$ grep -rl 'blue'
backups/color list.txt

# example to show issues due to filenames containing shell metacharacters
# 'backups/color list.txt' is treated as two different files
$ grep -rl 'blue' | xargs grep -l 'teal'
grep: backups/color: No such file or directory
grep: list.txt: No such file or directory

# use 'grep -Z' + 'xargs -0' combo for a robust solution
# match files containing both 'blue' and 'teal'
$ grep -rlZ 'blue' | xargs -0 grep -l 'teal'
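
The following sketch reproduces the issue and the fix in a temporary directory, assuming GNU grep and xargs. The filenames are made up for illustration.

```shell
# temporary directory with one filename that contains a space
tmp=$(mktemp -d)
printf 'blue\nteal\n' > "$tmp/color list.txt"
printf 'blue\nred\n'  > "$tmp/plain.txt"

# newline separated names: xargs splits 'color list.txt' at the space
# error messages are discarded and the failing exit status is ignored
broken=$(cd "$tmp" && grep -rl 'blue' | xargs grep -l 'teal' 2>/dev/null) || true

# NUL separated names: each filename is passed intact
robust=$(cd "$tmp" && grep -rlZ 'blue' | xargs -0 grep -l 'teal')

printf 'newline separated: [%s]\n' "$broken"
printf 'NUL separated:     [%s]\n' "$robust"

rm -r "$tmp"
```

Only the NUL separated version finds the file that contains both words.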

Note that the command passed to xargs doesn't accept custom made aliases and functions. So, if you had aliased grep to grep --color=auto, don't be surprised if the output isn't colorized. See unix.stackexchange: have xargs use alias instead of binary for details and workarounds.

info You can use xargs -r to avoid running the command when the filename list doesn't have any non-blank character (i.e. when the list is empty).

# there's no file containing 'violet'
# so, xargs doesn't get any filename, but grep is still run
$ grep -rlZ 'violet' | xargs -0 grep -L 'brown'
(standard input)

# using the -r option avoids running the command in such cases
$ grep -rlZ 'violet' | xargs -r0 grep -L 'brown'

warning Do not use xargs -P to combine the output of parallel runs, as you are likely to get a mangled result. The parallel command would be a better option. See unix.stackexchange: xargs vs parallel for more details. See also unix.stackexchange: when to use xargs.


find

The find command has comprehensive features to filter files and directories based on their name, size, timestamp and so on. And more importantly, find helps you to perform actions on such filtered files.


By default, you'll get every entry (including hidden ones) in the current directory and sub-directories when you use find without any options or paths. To search within specific paths, they should be immediately mentioned after find, i.e. before any options.

# change to the 'scripts' directory and source the '' script
$ source
$ ls -F
backups/*  ip.txt     report.log  todos/
errors.log*           projects/  scripts@

$ cd projects
# same as: find .
$ find

$ cd ..
$ find todos

info Note that symbolic links won't be followed by default. You can use the -L option for such cases.

To match filenames based on particular criteria, you can use wildcards or regular expressions. For wildcards, you can use the -name option or the case-insensitive version -iname. These will match only the basename, so you'll get a warning if you use / as part of the pattern. You can use -path and -ipath if you need to include / as well in the pattern. Unlike grep, the glob pattern is matched against the entire basename (as there are no start/end anchors in globs).

# filenames ending with '.log'
# 'find .' indicates the current working directory (CWD) as the path to search
$ find . -name '*.log'

# match filenames containing 'ip' case-insensitively
# note the use of '*' on both sides of 'ip' to match the whole filename
# . is optional when CWD is the only path to search
$ find -iname '*ip*'

# names containing 'k' within the 'backups' and 'todos' directories
$ find backups todos -name '*k*'

You can use the -not (or !) operator to invert the matching condition:

# same as: find todos ! -name '*[A-Z]*'
$ find todos -not -name '*[A-Z]*'
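
Here's a self-contained sketch of these name-based filters, assuming GNU find. The directory layout and the filenames are made up for illustration and are created in a temporary directory.

```shell
# temporary directory with a made-up layout
tmp=$(mktemp -d)
mkdir "$tmp/todos" "$tmp/backups"
touch "$tmp/report.log" "$tmp/ip.txt" "$tmp/todos/TRIP.txt" "$tmp/backups/bookmarks.html"

# glob has to match the whole basename
logs=$(cd "$tmp" && find . -name '*.log')

# no wildcards, so only a file named exactly 'ip' would match
exact=$(cd "$tmp" && find . -name 'ip')

# case-insensitive version
ip_names=$(cd "$tmp" && find . -iname '*ip*' | sort)

# -path is needed when the pattern includes '/'
in_todos=$(cd "$tmp" && find . -path './todos/*')

# regular files whose names don't end with '.txt'
not_txt=$(cd "$tmp" && find . -type f -not -name '*.txt' | sort)

printf '%s\n' "$logs" "[$exact]" "$ip_names" "$in_todos" "$not_txt"

rm -r "$tmp"
```

Note that -name 'ip' gives no result here, since ip.txt doesn't match the pattern as a whole.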

You can use the -regex and -iregex (case-insensitive) options to match filenames based on regular expressions. In this case, the pattern will match the entire path, so / can be used without requiring special options. The default regexp flavor is emacs which you can change by using the -regextype option.

# filename containing only uppercase letters and file extension is '.txt'
# note the use of '.*/' to match the entire file path
$ find -regex '.*/[A-Z]+\.txt'

# here 'egrep' flavor is being used
# filename starting and ending with the same word character (case-insensitive)
# and file extension is '.txt'
$ find -regextype egrep -iregex '.*/(\w).*\1\.txt'
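
The whole-path behavior of -regex is easy to trip over, so here's a small sketch to illustrate it, assuming GNU find. The notes directory and its files are made up for illustration.

```shell
# temporary directory with made-up files
tmp=$(mktemp -d)
mkdir "$tmp/notes"
touch "$tmp/notes/INDEX.txt" "$tmp/notes/TODO.txt" "$tmp/notes/todo2.txt"

# the regexp has to match the entire path like './notes/TODO.txt'
# so nothing is matched without the leading '.*/'
partial=$(cd "$tmp" && find . -regex '[A-Z]+\.txt')

# with '.*/' accounting for the directory portion of the path
full=$(cd "$tmp" && find . -regex '.*/[A-Z]+\.txt' | sort)

# same idea with the 'egrep' flavor: lowercase letters followed by a digit
egrep_style=$(cd "$tmp" && find . -regextype egrep -regex '.*/[a-z]+[0-9]\.txt')

printf 'without prefix: [%s]\n' "$partial"
printf 'with prefix:\n%s\n' "$full"
printf 'egrep flavor: %s\n' "$egrep_style"

rm -r "$tmp"
```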

File type

The -type option helps to filter files based on their types like regular file, directory, symbolic link, etc.

# regular files
$ find projects -type f

# regular files that are hidden as well
$ find -type f -name '.*'

# directories
$ find projects -type d

# symbolic links
$ find -type l

info You can use , to separate multiple file types. For example, -type f,l will match both regular files and symbolic links.

$ find -type f,l -name '*ip*'


The path being searched is considered as depth 0, files within the search path are at depth 1, files within a sub-directory are at depth 2 and so on. The -maxdepth and -mindepth options shown below are global options, so they should be specified before other kinds of options like -type, -name, etc.

The -maxdepth option restricts the search to the specified maximum depth:

# non-hidden regular files only in the current directory
# sub-directories will not be checked
# -not -name '.*' can also be used instead of -name '[^.]*'
$ find -maxdepth 1 -type f -name '[^.]*'

The -mindepth option specifies the minimum depth:

# recall that path being searched is considered as depth 0
# and contents within the search path are at depth 1
$ find -mindepth 1 -maxdepth 1 -type d

$ find -mindepth 3 -type f
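
Here's a sketch that makes the depth levels visible, assuming GNU find. The nested directories and filenames are made up for illustration.

```shell
# temporary directory with one file at each depth level
tmp=$(mktemp -d)
mkdir -p "$tmp/a/b"
touch "$tmp/top.txt" "$tmp/a/mid.txt" "$tmp/a/b/deep.txt"

# depth 0 is the search path itself
depth0=$(cd "$tmp" && find . -maxdepth 0)

# files directly within the search path
depth1=$(cd "$tmp" && find . -maxdepth 1 -type f)

# files at the exact depth of 2
depth2=$(cd "$tmp" && find . -mindepth 2 -maxdepth 2 -type f)

# files at depth 3 or more
depth3=$(cd "$tmp" && find . -mindepth 3 -type f)

printf '%s\n' "$depth0" "$depth1" "$depth2" "$depth3"

rm -r "$tmp"
```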


Consider the following file properties:

  • a accessed
  • c status changed
  • m modified

The above prefixes need to be combined with time (based on 24 hour periods) or min (based on minutes) options. For example, the -mtime (24 hour) option checks for the last modified timestamp and -amin (minute) checks for the last accessed timestamp. These options accept a number (integer or fractional) argument, that can be further prefixed by the + or - symbols. Here are some examples:

# modified less than 24 hours ago
$ find -maxdepth 1 -type f -mtime 0

# accessed between 24 to 48 hours ago
$ find -maxdepth 1 -type f -atime 1
# accessed within the last 24 hours
$ find -maxdepth 1 -type f -atime -1
# accessed within the last 48 hours
$ find -maxdepth 1 -type f -atime -2

# modified more than 20 days back
$ find -maxdepth 1 -type f -mtime +20

info The -daystart qualifier will measure time only from the beginning of the day. For example, -daystart -mtime 1 will check the files that were modified yesterday.
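
To experiment with these options, you can create files with known timestamps. Here's a sketch that assumes GNU find and the GNU touch -d option for setting the modification time. The filenames are made up for illustration.

```shell
# temporary directory with files of different ages
tmp=$(mktemp -d)
touch "$tmp/fresh.txt"
touch -d '5 days ago'  "$tmp/week.txt"
touch -d '40 days ago' "$tmp/old.txt"

# modified less than 24 hours ago
recent=$(cd "$tmp" && find . -type f -mtime 0)

# modified more than 20 days back
older=$(cd "$tmp" && find . -type f -mtime +20)

# conditions can be combined: more than 2 days but less than 10 days back
between=$(cd "$tmp" && find . -type f -mtime +2 -mtime -10)

# same idea in minutes: modified within the last 5 minutes
minutes=$(cd "$tmp" && find . -type f -mmin -5)

printf '%s\n' "$recent" "$older" "$between" "$minutes"

rm -r "$tmp"
```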


You can use the -size option to filter based on file sizes. By default, the number argument will be considered as 512-byte blocks. You can use the suffix c to specify the size in bytes. The suffixes k (kilo), M (mega) and G (giga) are calculated in powers of 1024.

# greater than 10 * 1024 bytes
$ find -type f -size +10k

# greater than 9 bytes and less than 50 bytes
$ find -type f -size +9c -size -50c

# exactly 10 bytes
$ find -type f -size 10c

info You can also use the -empty option instead of -size 0.
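
Here's a sketch with files of known sizes, assuming GNU find and head. The filenames are made up for illustration. It also shows that sizes are rounded up to the given unit before the comparison, which can be surprising with units other than c.

```shell
# temporary directory with files of 0, 5, 20 and 2048 bytes
tmp=$(mktemp -d)
: > "$tmp/empty.txt"
printf 'hello' > "$tmp/small.txt"
head -c 20 /dev/zero   > "$tmp/medium.bin"
head -c 2048 /dev/zero > "$tmp/big.bin"

# greater than 9 bytes and less than 50 bytes
mid=$(cd "$tmp" && find . -type f -size +9c -size -50c)

# more than one 1024-byte unit
big=$(cd "$tmp" && find . -type f -size +1k)

# sizes are rounded up to the unit, so both 5 and 20 bytes count as 1k
one_k=$(cd "$tmp" && find . -type f -size 1k | sort)

# empty files
empty=$(cd "$tmp" && find . -type f -empty)

printf '%s\n' "$mid" "$big" "$one_k" "$empty"

rm -r "$tmp"
```

Because of this rounding, byte units are the safer choice when you need precise thresholds.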

Acting on matched files

The -exec option helps you pass the matching files to another command. You can choose to execute the command once for every file (by using \;) or just once for all the matching files (by using +). However, if there are too many files to fit in a single command line, find will use as many command invocations as necessary. The ; character is escaped since it is a shell metacharacter (you can also quote it as an alternative to escaping).

You need to use {} to represent the files passed as arguments to the command being executed. Here are some examples:

# count the number of bytes for each matching file
# wc is called separately for each matching file
$ find -type f -size +9k -exec wc -c {} \;
1234567 ./report.log
54321 ./errors.log

# here, both matching files are passed together to the wc command
$ find -type f -size +9k -exec wc -c {} +
1234567 ./report.log
  54321 ./errors.log
1288888 total
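
One way to see the difference between the two forms is to let the executed command report how many arguments it received. Here's a minimal sketch, with made-up filenames in a temporary directory.

```shell
# temporary directory with three made-up files
tmp=$(mktemp -d)
touch "$tmp/a.txt" "$tmp/b.txt" "$tmp/c.txt"

# '\;' form: the command runs once per file, so it always gets 1 argument
per_file=$(find "$tmp" -type f -exec sh -c 'echo "$# argument(s)"' sh {} \;)

# '+' form: all three files are passed to a single invocation
batched=$(find "$tmp" -type f -exec sh -c 'echo "$# argument(s)"' sh {} +)

printf 'with \\; :\n%s\n' "$per_file"
printf 'with + :\n%s\n' "$batched"

rm -r "$tmp"
```

The first form reports 1 argument(s) three times, whereas the second one reports 3 argument(s) once.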

As mentioned in the Managing Files and Directories chapter, the -t option for cp and mv commands will help you specify the target directory before the source files. Here's an example:

$ mkdir rc_files
$ find backups/dot_files -type f -exec cp -t rc_files {} +

$ find rc_files -type f

$ rm -r rc_files

info You can use the -delete option instead of calling the rm command to delete the matching files. However, it cannot remove non-empty directories and there are other gotchas to be considered. See the manual for more details.

Multiple criteria

Filenames can be matched against multiple criteria such as -name, -size, -mtime, etc. You can use operators between them and group them within \( and \) to construct complex expressions.

  • -a or -and or absence of an operator means both expressions have to be satisfied
    • second expression won't be evaluated if the first one is false
  • -o or -or means either of the expressions have to be satisfied
    • second expression won't be evaluated if the first one is true
  • -not inverts the result of the expression
    • you can also use ! but that might need escaping or quoting depending on the shell

# names containing both 'x' and 'ip' in any order (case-insensitive)
$ find -iname '*x*' -iname '*ip*'

# names containing 'sc' or size greater than 10k
$ find -name '*sc*' -or -size +10k

# except filenames containing 'o' or 'r' or 'txt'
$ find -type f -not \( -name '*[or]*' -or -name '*txt*' \)
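
Grouping matters because the implicit -a operator binds tighter than -o. Here's a sketch to illustrate, assuming GNU find. The names are made up, including a directory whose name ends with .log.

```shell
# temporary directory with a file of each kind and a directory named like a log file
tmp=$(mktemp -d)
mkdir "$tmp/old.log"
touch "$tmp/notes.txt" "$tmp/run.log"

# interpreted as: (regular file named *.txt) OR (anything named *.log)
ungrouped=$(cd "$tmp" && find . -type f -name '*.txt' -o -name '*.log' | sort)

# interpreted as: regular file AND (named *.txt OR named *.log)
grouped=$(cd "$tmp" && find . -type f \( -name '*.txt' -o -name '*.log' \) | sort)

printf 'ungrouped:\n%s\n' "$ungrouped"
printf 'grouped:\n%s\n' "$grouped"

rm -r "$tmp"
```

The ungrouped version lists the old.log directory as well, since -type f applies only to the first -name condition.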


The -prune option is helpful when you want to prevent find from descending into specific directories. By default, find will traverse all the files even if the given conditions will result in throwing away those results from the output. So, using -prune not only helps in speeding up the process, it could also help in cases where trying to access a file within the exclusion path would've resulted in an error.

# regular files ending with '.log'
$ find -type f -name '*.log'

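# exclude the 'backups' directory
# note the use of -path when '/' is needed in the pattern
# -prune has to be applied to the directory itself, and the
# -o alternative then handles all the other entries
$ find -path './backups' -prune -o -type f -name '*.log' -print

Using -path '*/.git' -prune -o before the other conditions (with an explicit -print at the end) can be handy when dealing with Git based version control projects.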

find and xargs

Similar to the grep -Z and xargs -0 combination seen earlier, you can use the find -print0 and xargs -0 combination. The -exec option is sufficient for most use cases, but xargs -P (or the parallel command) can be handy if you need parallel execution for performance reasons.

Here's an example of passing filtered files to sed (stream editor, will be discussed in the Multipurpose Text Processing Tools chapter):

$ find -name '*.log'

# for the filtered files, replace all occurrences of 'apple' with 'fig'
# 'sed -i' will edit the files inplace, so no output on the terminal
$ find -name '*.log' -print0 | xargs -r0 -n2 -P2 sed -i 's/apple/fig/g'

In the above example, -P2 is used to allow xargs to run two processes at a time (default is one process). You can use -P0 to allow xargs to launch as many processes as possible. The -n2 option is used to limit the number of file arguments passed to each sed call to 2, otherwise xargs is likely to pass as many arguments as possible and thus reduce/negate the effect of parallelism. Note that the values used for -n and -P in the above illustration are arbitrary. You'll have to fine-tune them for your particular use case.
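
Here's a self-contained version of the above idea that you can run safely, assuming GNU find, xargs and sed. The files are made up for illustration and are created in a temporary directory, with one filename containing a space to show why the NUL separator is used.

```shell
# temporary directory with three log files and one unrelated file
tmp=$(mktemp -d)
echo 'apple pie'   > "$tmp/one.log"
echo 'apple juice' > "$tmp/two.log"
echo 'apple tart'  > "$tmp/with space.log"
echo 'apple jam'   > "$tmp/keep.txt"

# at most 2 files per sed call, at most 2 sed processes at a time
find "$tmp" -name '*.log' -print0 | xargs -r0 -n2 -P2 sed -i 's/apple/fig/g'

# number of files that now contain 'fig'
changed=$(grep -rl 'fig' "$tmp" | wc -l)
# the file that wasn't selected by find is unchanged
untouched=$(cat "$tmp/keep.txt")

echo "files changed: $changed"
echo "keep.txt: $untouched"

rm -r "$tmp"
```

Since sed -i writes to separate files, the parallel runs here don't produce output that could get interleaved on the terminal.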


locate

locate is a faster alternative to the find command for searching files by name. It is based on a database, which gets updated by a cron job. So, newer files may not be present in the results unless you update the database. Use this command if it is available in your distro (for example, sudo apt install mlocate on Debian-like systems) and you remember some part of the filename. It is very useful if you have to search the entire filesystem, in which case the find command will take a very long time compared to locate.

Here are some examples:

  • locate 'power' print path of filenames containing power in the whole filesystem
    • implicitly, locate would change the string to *power* as no globbing characters are present in the string specified
  • locate -b '\power.log' print path matching the string power.log exactly at the end of the path
    • /home/learnbyexample/power.log matches
    • /home/learnbyexample/lowpower.log will not match since there are other characters at the start of the filename
    • use of \ prevents the search string from implicitly being replaced by *power.log*
  • locate -b '\proj_adder' the -b option is also handy to print only the matching directory name, otherwise every file under that folder would also be displayed

info See also unix.stackexchange: pros and cons of find and locate.


Exercises

info For grep exercises, use the example_files/text_files directory for input files, unless otherwise specified.

info For find exercises, use the script, unless otherwise specified.

1) Display lines containing an from the input files blocks.txt, ip.txt and uniform.txt. Show the results with and without filename prefix.

# ???
ip.txt:light orange

# ???
light orange

2) Display lines containing the whole word he from the sample.txt input file.

# ???
14) He he he

3) Match only whole lines containing car irrespective of case. The matching lines should be displayed with line number prefix as well.

$ printf 'car\nscared\ntar car par\nCar\n' | grep # ???

4) Display all lines from purchases.txt except those that contain tea.

# ???
washing powder

5) Display all lines from sample.txt that contain do but not it.

# ???
13) Much ado about nothing

6) For the input file sample.txt, filter lines containing do and also display the line that comes after such a matching line.

# ???
 6) Just do-it
 7) Believe it
13) Much ado about nothing
14) He he he

7) For the input file sample.txt, filter lines containing are or he as whole words as well as the line that comes before such a matching line. Go through info grep or the online manual and use appropriate options such that there's no separator between the groups of matching lines in the output.

# ???
 3) Hi there
 4) How are you
13) Much ado about nothing
14) He he he

8) Extract all pairs of () with/without text inside them, provided they do not contain () characters inside.

$ echo 'I got (12) apples' | grep # ???

$ echo '((2 +3)*5)=25 and (4.3/2*()' | grep # ???
(2 +3)

9) For the given input, match all lines that start with den or end with ly.

$ lines='reply\n1 dentist\n2 lonely\neden\nfly away\ndent\n'

$ printf '%b' "$lines" | grep # ???
2 lonely

10) Extract words starting with s and containing both e and t in any order.

$ words='sequoia subtle exhibit sets tests sit store_2'

$ echo "$words" | grep # ???

11) Extract all whole words having the same first and last word character.

$ echo 'oreo not a _oh_ pip RoaR took 22 Pop' | grep # ???

12) Match all input lines containing *[5] literally.

$ printf '4*5]\n(9-2)*[5]\n[5]*3\nr*[5\n' | grep # ???

13) Match whole lines that start with hand and are immediately followed by s or y or le or no further character.

$ lines='handed\nhand\nhandy\nunhand\nhands\nhandle\nhandss\n'

$ printf '%b' "$lines" | grep # ???

14) Input lines have three or more fields separated by a , delimiter. Extract from the second field to the second last field. In other words, extract fields other than the first and last.

$ printf 'apple,fig,cherry\ncat,dog,bat\n' | grep # ???

$ echo 'dragon,42,unicorn,3.14,shapeshifter\n' | grep # ???

15) Recursively search for files containing ello.

# change to the 'scripts' directory and source the '' script
$ source

# ???

16) Search for files containing blue recursively, but do not search within the backups directory.

# change to the 'scripts' directory and source the '' script
$ source

# ???

17) Search for files containing blue recursively, but not if the file also contains teal.

# change to the 'scripts' directory and source the '' script
$ source

# ???
backups/color list.txt

18) Find all regular files within the backups directory.

# change to the 'scripts' directory and source the '' script
$ source

# ???

19) Find all regular files whose extension starts with p or s or v.

# ???

20) Find all regular files whose names do not have the lowercase letters g to l.

# ???

21) Find all regular files whose path has at least one directory name starting with p or d.

# ???

22) Find all directories whose name contains b or d.

# ???

23) Find all hidden directories.

# ???

24) Find all regular files at the exact depth of 2.

# ???

25) What's the difference between find -mtime and find -atime? And, what is the time period these options work with?

26) Find all empty regular files.

# ???

27) Create a directory named filtered_files. Then, copy all regular files that are greater than 1 byte in size but whose names don't end with .log to this directory.

# ???
$ ls -A filtered_files
.hidden  ip.txt

28) Find all hidden files, but not if they are part of the filtered_files directory created earlier.

# ???

29) Delete the filtered_files directory created earlier. Then, go through the find manual and figure out how to list only executable files.

# ???

30) List at least one use case for piping the find output to the xargs command instead of using the find -exec option.

31) How does the locate command work faster than the equivalent find command?