Miscellaneous options

This chapter discusses a few useful options that haven't been covered so far.

Files used in examples are available chapter-wise from the learn_gnugrep_ripgrep repo. The directory for this chapter is miscellaneous.

Scripting options

When writing scripts, sometimes you just need to know whether a file contains the pattern and act based on the exit status of the command. Instead of the usual workarounds like redirecting output to /dev/null, you can use the -q option. This avoids printing anything on stdout and also provides a speed benefit, since grep stops processing as soon as the given condition is satisfied. Check out my ch command line tool for a practical case study of this option.

$ cat find.md
The find command is more versatile than recursive options and
and extended globs. Apart from searching based on filename, it
has provisions to match based on the the file characteristics
like size and time.

$ grep -wE '(\w+) \1' find.md
has provisions to match based on the the file characteristics
$ grep -qwE '(\w+) \1' find.md
$ echo $?
0
$ grep -q 'xyz' find.md
$ echo $?
1
$ grep -qwE '(\w+) \1' find.md && echo 'Repeated words found!'
Repeated words found!
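
In a script, the exit status from -q is usually consumed directly by an if statement rather than inspected via $?. Here is a minimal sketch of that pattern. The function name has_repeated_words and the temporary sample file are assumptions made up for illustration; they are not part of the example directory.

```shell
# hypothetical helper: succeeds if the file has a word repeated back to back
has_repeated_words() {
    grep -qwE '(\w+) \1' "$1"
}

# sample input created only for this illustration
sample=$(mktemp)
printf 'has provisions to match based on the the file\nlike size and time.\n' > "$sample"

# the if statement branches on the exit status of grep, nothing is printed by -q
if has_repeated_words "$sample"; then
    result='Repeated words found!'
else
    result='No repeated words'
fi
echo "$result"

rm -f "$sample"
```

Wrapping the check in a function keeps the calling code readable, and since -q stops at the first match, the check stays cheap even for large files.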

The -s option suppresses the error messages about nonexistent or unreadable files that would otherwise be written to stderr.

$ # when file doesn't exist
$ grep 'in' xyz.txt
grep: xyz.txt: No such file or directory
$ grep -s 'in' xyz.txt
$ echo $?
2

$ # when sufficient permission is not available
$ touch foo.txt
$ chmod -r foo.txt
$ grep 'rose' foo.txt
grep: foo.txt: Permission denied
$ grep -s 'rose' foo.txt
$ echo $?
2

Errors regarding regular expressions and invalid options will be on stderr even when the -s option is used.

$ grep -sE 'a(' find.md
grep: Unmatched ( or \(

$ grep -sE 'a(' find.md 2> /dev/null
$ echo $?
2
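
Since -q silences stdout and -s silences file-related errors, scripts often combine them and rely on the exit status alone. The following is a minimal sketch, assuming GNU grep's documented exit statuses (0 for a match, 1 for no match, 2 for an error). The function name check_file and the file names are hypothetical.

```shell
# hypothetical helper: report what happened based only on the exit status
# -e and -- guard against patterns and file names that start with a hyphen
check_file() {
    grep -qs -e "$1" -- "$2"
    case $? in
        0) echo 'match' ;;
        1) echo 'no match' ;;
        *) echo 'error' ;;
    esac
}

# sample input created only for this illustration
sample=$(mktemp)
printf 'rose is red\n' > "$sample"

check_file 'rose' "$sample"           # match
check_file 'xyz' "$sample"            # no match
check_file 'rose' "$sample.missing"   # error (file doesn't exist)

rm -f "$sample"
```

Keep in mind that with -q, GNU grep exits with status 0 as soon as a match is found, even if an error was detected for some other input file.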

Multiline matching

If the input is small enough to fit in memory, the -z option comes in handy for matching across multiple lines. This assumes that the input doesn't contain the NUL character, so the entire file is read as a single string. The -z option is similar to the -0 option of xargs: it causes grep to separate input based on the NUL character instead of the newline character.

$ # note that each match ends with \0
$ grep -zowE '(\w+)\s+\1' find.md | od -c
0000000   a   n   d  \n   a   n   d  \0   t   h   e       t   h   e  \0
0000020

$ # handy sed one-liner transformation for nicely formatted output
$ grep -zowE '(\w+)\s+\1' find.md | sed 's/\x0/\n---\n/g'
and
and
---
the the
---

If the input contains the NUL character and -z is used, then the whole file will not be read at once. Instead, grep will process chunks of data using the NUL character as the separator.

$ # with -z, \0 marks the different 'lines'
$ printf 'dark red\nteal\0a2\0spared' | grep -z 'red' | sed 's/\x0/\n---\n/g'
dark red
teal
---
spared
---

$ # always remember that -z will add \0 to each result
$ # just like \n is added for normal newline separated usage
$ printf 'dark red\nteal\0a2\0spared' | grep -z 'red' | od -c
0000000   d   a   r   k       r   e   d  \n   t   e   a   l  \0   s   p
0000020   a   r   e   d  \0
0000025
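
Because -z both reads and writes NUL-separated records, it fits between other tools that speak the same convention, such as find -print0 and xargs -0. The sketch below is only an illustration of that idea. The scratch directory and the file names in it are made up, and it assumes find and xargs implementations that support -print0 and -0 (GNU and BSD versions do).

```shell
# scratch directory and file names created only for this illustration
dir=$(mktemp -d)
touch "$dir/report 1.txt" "$dir/notes.md" "$dir/report 2.txt"

# find emits NUL-terminated paths, grep -z filters them as records,
# and xargs -0 splits on the NUL characters that grep -z writes back
matched=$(cd "$dir" && find . -type f -print0 | grep -z 'report' |
          xargs -0 -n1 basename | sort)
echo "$matched"

rm -r "$dir"
```

File names containing spaces (or even newlines) pass through this pipeline intact, which wouldn't be the case with newline-separated records.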

Byte offset

Sometimes you also want to know where the patterns you are searching for are located in the file. The -b option gives the byte offset (starting with 0 for the first byte) of each matching line, or of each matching portion if -o is also used.

$ # offset for the start of each matching line
$ grep -b 'is' find.md
0:The find command is more versatile than recursive options and
125:has provisions to match based on the the file characteristics
$ grep -b 'it' find.md
62:and extended globs. Apart from searching based on filename, it

$ # offset for start of matching portion instead of line
$ grep -ob 'art\b' find.md
84:art

$ # use awk to get offset line-wise instead of location in entire input file
$ # output here has line number and offset for start of matching portion
$ awk '/is/{print NR, index($0, "is")-1}' find.md
1 17
3 8
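
One way to sanity check a reported offset is to feed it back to a tool that reads from a byte position. Here is a minimal sketch using tail and head. The sample file and variable names are assumptions for illustration, and head -c is assumed to be available (it is in GNU and BSD coreutils, though not required by POSIX).

```shell
# sample input created only for this illustration
sample=$(mktemp)
printf 'dark red\nteal\nspared\n' > "$sample"

# -o with -b prints offset:match, cut keeps only the offset
# 'dark red\n' occupies bytes 0 to 8, so 'teal' starts at offset 9
offset=$(grep -ob -m1 'teal' "$sample" | cut -d: -f1)
echo "$offset"

# tail -c +K counts from 1, so add 1 to the 0-based offset from grep
extracted=$(tail -c +"$((offset + 1))" "$sample" | head -c 4)
echo "$extracted"

rm -f "$sample"
```

The off-by-one adjustment is the main thing to watch for: grep -b counts from 0, whereas tail -c +K treats the first byte as 1.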

--label

This option allows you to customize the string used to indicate standard input as the source of the data that was processed.

$ echo 'red and blue' | grep -c 'and' - find.md
(standard input):1
find.md:3

$ echo 'red and blue' | grep --label='stdin' -c 'and' - find.md
stdin:1
find.md:3
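
A common use of --label is to name the original file when its contents reach grep through a pipe after some transformation, for example decompression. The sketch below assumes gzip is installed. The directory and file names are made up for illustration.

```shell
# scratch directory and sample file created only for this illustration
dir=$(mktemp -d)
printf 'red and blue\nteal\n' > "$dir/colors.txt"
gzip "$dir/colors.txt"      # replaces the file with colors.txt.gz

# -H forces the file name prefix even for a single input,
# --label supplies the name to show in place of (standard input)
out=$(gzip -cd "$dir/colors.txt.gz" | grep --label='colors.txt' -H 'and')
echo "$out"                 # colors.txt:red and blue

rm -r "$dir"
```

Without -H (or options like -c with multiple inputs, as in the example above), no prefix is printed for a single input, so the label alone has no visible effect.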

Options not covered

| Option | Description |
| --- | --- |
| --binary-files, -a, -I | how to deal with binary input |
| -d, -D | how to deal with directory, device, FIFO or socket as input |
| -u, -U | how to deal with files on MS-DOS and MS-Windows platforms |
| --line-buffered | useful for processing continuous stream |
| -T | align output with prefixes (ex: -H, -b) when input has Tab characters |

Summary

A few more options were covered in this chapter. I wish I had known about the -s and -q options for script usage early in my career, instead of trying to mess with redirections (which was another topic I struggled with). Environment variable settings like locale and color are another topic not covered in this book.

Exercises

a) Use the correct binary option to get the output shown for the second command below:

$ printf 'hi there\0good day\n' | grep 'good'
Binary file (standard input) matches

$ printf 'hi there\0good day\n' | grep ##### add your solution here
hi theregood day

b) Read about --line-buffered from the manual (also this link) and see it in action with the code below:

$ for i in {1..5}; do seq 12; sleep 1; done | grep '[1-489]' | grep -v '0'

$ for i in {1..5}; do seq 12; sleep 1; done | \
> grep --line-buffered '[1-489]' | grep -v '0'

c) Consider non-binary input having multiple lines of text. Display Match if input starts with a number and Nope if it doesn't.

$ printf 'oh\n42' | grep ##### add your solution here
Nope
$ printf '2a\nhi' | grep ##### add your solution here
Match