Miscellaneous options

This chapter covers some of the options not discussed so far.

The example_files directory has all the files used in the examples.

Suppress stdout

While writing scripts, sometimes you just need to know whether a file contains the pattern and act based on the exit status of the command. Instead of the usual workarounds like redirecting output to /dev/null, you can use the -q option. This avoids printing anything on stdout and also provides a speed benefit, as grep stops processing as soon as the given condition is satisfied. Check out my ch command line tool for a practical case study.

$ cat find.md
The find command is more versatile than recursive options and
and extended globs. Apart from searching based on filename, it
has provisions to match based on the the file characteristics
like size and time.

$ grep -wE '(\w+) \1' find.md
has provisions to match based on the the file characteristics
$ grep -qwE '(\w+) \1' find.md
$ echo $?
0

$ grep -q 'xyz' find.md
$ echo $?
1

$ grep -qwE '(\w+) \1' find.md && echo 'Repeated words found!'
Repeated words found!
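One common scripting use of -q is an idempotent append: add a line to a file only if it isn't already present. The following is a minimal sketch rather than a definitive recipe. The function name append_once and the temporary file are made up for this illustration, and the -x and -F options are used so that the whole line is matched literally.

```shell
#!/bin/bash
# append_once LINE FILE: hypothetical helper, adds LINE to FILE only if missing
append_once() {
    local line=$1 file=$2
    # -q: no output, only the exit status matters
    # -x: match the whole line, -F: treat the line as a fixed string
    # '--' guards against lines that start with a hyphen
    if ! grep -qxF -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

tmp=$(mktemp)
printf 'alpha\nbeta\n' > "$tmp"
append_once 'beta' "$tmp"     # already present, nothing is added
append_once 'gamma' "$tmp"    # missing, so it gets appended
append_once 'gamma' "$tmp"    # second call is a no-op
line_count=$(wc -l < "$tmp")
last_line=$(tail -n1 "$tmp")
echo "$line_count $last_line"
rm -f "$tmp"
```

Since grep stops at the first matching line, the check stays cheap even for large files.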

Suppress stderr

The -s option suppresses error messages about nonexistent or unreadable files, which would otherwise be written to the stderr stream.

# when file doesn't exist
$ grep 'in' xyz.txt
grep: xyz.txt: No such file or directory
$ grep -s 'in' xyz.txt
$ echo $?
2

# when sufficient permission is not available
$ touch new.txt
$ chmod -r new.txt
$ grep 'rose' new.txt
grep: new.txt: Permission denied
$ grep -s 'rose' new.txt
$ echo $?
2

$ rm -f new.txt

Errors regarding regular expressions and invalid options are still written to the stderr stream even when the -s option is used.

$ grep -sE 'a(' find.md
grep: Unmatched ( or \(

$ grep -sE 'a(' find.md 2> /dev/null
$ echo $?
2
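In scripts, -s pairs well with -q: the command stays silent and the exit status alone tells you whether the pattern matched (0), didn't match (1), or an error such as a missing file occurred (2). Here's a minimal sketch, where check_file is a made-up helper name and the input file is created just for the illustration:

```shell
#!/bin/bash
# check_file PATTERN FILE: hypothetical helper that reports based on exit status
check_file() {
    # -q suppresses stdout, -s suppresses missing/unreadable file errors
    grep -qs -- "$1" "$2"
    case $? in
        0) echo "match" ;;
        1) echo "no match" ;;
        *) echo "error" ;;
    esac
}

tmp=$(mktemp)
printf 'red rose\n' > "$tmp"
r1=$(check_file 'rose' "$tmp")
r2=$(check_file 'xyz' "$tmp")
r3=$(check_file 'rose' "$tmp.does-not-exist")
printf '%s\n' "$r1" "$r2" "$r3"
rm -f "$tmp"
```

Keeping the error case separate from the no-match case avoids treating a mistyped filename as "pattern not found".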

Multiline matching

If the input file is small enough to fit in memory, the -z option comes in handy to match across multiple lines. This assumes that the input doesn't contain the NUL character, so that the entire file is read as a single string. The -z option is similar to the -0 option for xargs: it causes grep to separate input based on the NUL character (instead of the newline character).

# note that each match in the output will end with \0
$ grep -zowE '(\w+)\s+\1' find.md | od -c
0000000   a   n   d  \n   a   n   d  \0   t   h   e       t   h   e  \0
0000020

# replace the NUL characters for further processing
$ grep -zowE '(\w+)\s+\1' find.md | tr '\0' '\n'
and
and
the the
$ grep -zowE '(\w+)\s+\1' find.md | sed 's/\x0/\n---\n/g'
and
and
---
the the
---

If the input contains the NUL character and -z is used, the whole file will not be read at once. Instead, grep will process chunks of data using the NUL character as the separator.

# with -z, \0 marks the different 'lines'
$ printf 'dark red\nteal\0a2\0spared' | grep -z 'red' | sed 's/\x0/\n---\n/g'
dark red
teal
---
spared
---
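Instead of converting the NUL characters with tr or sed, a Bash loop can consume the records directly, which keeps matches that contain newlines intact. The following is a sketch that assumes Bash (read -d '' and arrays are not available in plain sh). The sample input and variable names are made up for the illustration.

```shell
#!/bin/bash
tmp=$(mktemp)
printf 'one and\nand two the the end\n' > "$tmp"

matches=()
# -d '' makes read stop at NUL; IFS= and -r keep each record unmodified
while IFS= read -r -d '' rec; do
    matches+=("$rec")
done < <(grep -zowE '(\w+)\s+\1' "$tmp")

count=${#matches[@]}
echo "number of matches: $count"
rm -f "$tmp"
```

Each array element holds one complete match, so a match that spans two lines stays a single item instead of being split at the newline.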

Byte offset

Sometimes you also want to know where the patterns you are searching for are located in the file. The -b option prefixes each matching line with the byte offset of the start of that line, counted from the beginning of the input (the first byte has offset 0).

# offset for the starting line of each match
$ grep -b 'is' find.md
0:The find command is more versatile than recursive options and
125:has provisions to match based on the the file characteristics

$ grep -b 'it' find.md
62:and extended globs. Apart from searching based on filename, it

With the -o option, you'll get the location of matching portions instead of lines.

$ grep -ob 'art\b' find.md
84:art
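The reported offset can be passed to other tools that work with byte positions. As a sketch (the sample file is created on the fly and the variable names are arbitrary), the following extracts the offset from the -ob output and uses tail -c and head -c to read back the bytes at that position. Note that tail -c +N counts from 1 whereas grep offsets start from 0, hence the adjustment.

```shell
#!/bin/bash
tmp=$(mktemp)
printf 'The find command\nApart from that\n' > "$tmp"

# first line of the -ob output looks like 'OFFSET:MATCH'; keep only the offset
offset=$(grep -ob 'art\b' "$tmp" | head -n1 | cut -d: -f1)

# tail -c +N starts at byte N (1-based), grep offsets are 0-based
bytes=$(tail -c "+$((offset + 1))" "$tmp" | head -c 3)
echo "$offset:$bytes"
rm -f "$tmp"
```

Reading the bytes back this way is a quick check that the offset refers to the whole input and not to a position within the matching line.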

You can use alternatives like the awk command to get the offset within each line instead of the offset from the start of the input. Here's an example:

# output shows the line number and offset for the start of matching portion
# note that the offset starts with 1 for the first byte
$ awk 'match($0, /is/){print NR, RSTART, $0}' OFS=: find.md
1:18:The find command is more versatile than recursive options and
3:9:has provisions to match based on the the file characteristics

# or, you can use the ripgrep command (discussed later)
$ rg --column 'is' find.md
1:18:The find command is more versatile than recursive options and
3:9:has provisions to match based on the the file characteristics

Naming stdin

The --label option lets you customize the string used to represent standard input in the output (the default is '(standard input)').

$ echo 'red and blue' | grep -c 'and' - find.md
(standard input):1
find.md:3

$ echo 'red and blue' | grep --label='stdin' -c 'and' - find.md
stdin:1
find.md:3
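A typical use is searching data that reaches grep through a pipe, where the original filename would otherwise be lost. The sketch below pipes each file through tr (standing in for any preprocessing command, such as a decompressor) and uses --label together with -H so that the output still shows which file a match came from. The directory and filenames are made up for the illustration.

```shell
#!/bin/bash
dir=$(mktemp -d)
printf 'RED AND BLUE\n' > "$dir/a.txt"
printf 'green\nsand\n' > "$dir/b.txt"

result=$(
    for f in "$dir"/*.txt; do
        # -H forces the filename prefix even though stdin is the only input
        # ${f##*/} strips the directory part to use as the label
        tr 'A-Z' 'a-z' < "$f" | grep -H --label="${f##*/}" 'and'
    done
)
printf '%s\n' "$result"
rm -rf "$dir"
```

Without -H, grep wouldn't print any prefix here since there is only one input per invocation, and the label would go unused.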

Topics not covered

The following options haven't been discussed in this book:

Option	Description
--binary-files, -a, -I	how to deal with binary data
-d, -D	how to deal with directory, device, FIFO or socket as input
-U	how to deal with files on MS-DOS and MS-Windows platforms
--line-buffered	useful for processing continuous stream
-T	align output with prefixes (ex: -H, -b) when input has Tab characters

Another topic not covered in this book is handling environment variables like GREP_COLORS.

Summary

A few more options were covered in this chapter. I wish I had known about the -s and -q options for script usage in my early years at work, instead of trying to mess with redirections (which itself was a topic I struggled with).

Exercises

The exercises directory has all the files used in this section.

1) What do the -q and -s options do?

2) For the input file sample.txt, extract from the first occurrence of Just to the last occurrence of it. These terms can occur across different lines. Perform an additional transformation to convert ASCII NUL characters, if any, to the newline character.

##### add your solution here
Just do-it
Believe it

3) For the input file nul_separated, use the ASCII NUL character as the line separator and display lines starting with a. Perform an additional transformation to convert ASCII NUL characters, if any, to the newline character.

##### add your solution here
apple
fig
mango
icecream

4) Read about the --line-buffered option from the manual (read this link too) and see it in action with code shown below:

$ for i in {1..5}; do seq 12; sleep 1; done | grep '[1-489]' | grep -v '0'

# '> ' is secondary prompt (PS2), not part of the command
$ for i in {1..5}; do seq 12; sleep 1; done | \
> grep --line-buffered '[1-489]' | grep -v '0'

5) Write a Bash script find_digits.sh that loops over filenames passed as arguments. For each file, search for the presence of a digit character and display the results in the format shown below.

$ bash find_digits.sh sample.txt patterns.txt regex_terms.txt
sample.txt: digit characters not found
patterns.txt: found digit characters
regex_terms.txt: found digit characters

$ bash find_digits.sh terms.txt lines.txt
terms.txt: found digit characters
lines.txt: digit characters not found

6) For the input file sample.txt, display lines containing he prefixed with the byte location of the matching lines.

##### add your solution here
13:Hi there
102:He he he

7) What does the --label option do?

CLI text processing with GNU grep and ripgrep