CLI tip 30: extract only the matching portions
The grep command provides the -o option to extract only the matching portions. Here are some examples using the BRE/ERE regexp flavors:
# whole words made up of lowercase letters and digits only
$ s='coat Bin food Apple (tar12) best fig_42'
$ echo "$s" | grep -owE '[a-z0-9]+'
coat
food
tar12
best
# extract characters from the start of string based on a delimiter
$ echo 'apple:123:banana:cherry' | grep -o '^[^:]*'
apple
# sequence of characters surrounded by double quotes
$ echo 'I like "mango" and "guava"' | grep -oE '"[^"]+"'
"mango"
"guava"
# whole words that have at least one consecutive repeated character
$ s='effort flee facade oddball rat tool'
$ echo "$s" | grep -owE '\w*(\w)\1\w*'
effort
flee
oddball
tool
And here are some examples with the PCRE flavor:
# numbers >= 100, even if there are leading zeros
# same as: grep -owE '0*[1-9][0-9]{2,}'
$ echo '0501 035 154 12 26 98234' | grep -woP '0*+\d{3,}'
0501
154
98234
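The + after 0* in the above example makes the quantifier possessive, so the zeros it consumes are never given back. As a rough sketch of why that matters here (assuming GNU grep built with PCRE support), compare it with the plain greedy version, where 0* can backtrack and let \d{3,} match 035:

```shell
s='0501 035 154 12 26 98234'

# greedy 0* can backtrack, so 035 is matched as a three digit sequence
echo "$s" | grep -woP '0*\d{3,}'
# 0501
# 035
# 154
# 98234

# possessive 0*+ never gives back the zeros it consumed, so 035 is skipped
echo "$s" | grep -woP '0*+\d{3,}'
# 0501
# 154
# 98234
```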
# extract digits only if they are preceded by - and not followed by ,
$ s='42 apple-5, fig3; x-83, y-20: f12'
$ echo "$s" | grep -oP '(?<=-)\d++(?!,)'
20
# extract digits that follow =
$ echo 'apple=42, fig=314' | grep -oP '=\K\d+'
42
314
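For this particular input, a lookbehind gives the same result as \K. The sketch below (assuming GNU grep with PCRE support, and using a made-up second input with spaces around =) also shows a case where \K is handy: the prefix before \K can be of variable length, which lookbehind generally doesn't allow in PCRE.

```shell
s='apple=42, fig=314'

# lookbehind version, same output as '=\K\d+' for this input
echo "$s" | grep -oP '(?<==)\d+'
# 42
# 314

# \K allows a variable length prefix, such as optional spaces after =
t='apple = 42, fig=  314'
echo "$t" | grep -oP '=\s*\K\d+'
# 42
# 314
```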
# all runs of digits with an optional trailing hyphen, from the start of string
$ echo '123-87-593 42 apple-12-345' | grep -oP '\G\d+-?'
123-
87-
593
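To see what \G contributes in the above example, here's a small comparison (again assuming GNU grep with PCRE support). Without \G, matching can resume anywhere in the line. With \G, each match has to start exactly where the previous one ended, so extraction stops at the first space.

```shell
s='123-87-593 42 apple-12-345'

# without \G, matches are found anywhere in the line
echo "$s" | grep -oP '\d+-?'
# 123-
# 87-
# 593
# 42
# 12-
# 345

# with \G, each match must start where the previous one ended
echo "$s" | grep -oP '\G\d+-?'
# 123-
# 87-
# 593
```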
# all words except those surrounded by double quotes
$ s='I like2 "mango" and "guava"'
$ echo "$s" | grep -oP '"[^"]+"(*SKIP)(*F)|\w+'
I
like2
and
Use ripgrep if you want to add text around the matching portions or need to work with multiple capture groups. Here's an example:
$ echo 'apple=42, fig=314' | rg -o '(\w+)=(\d+)' -r '$2:$1'
42:apple
314:fig
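If ripgrep isn't available, one possible workaround is to extract the matches with grep -o and then rearrange them with sed. This is only a sketch and assumes GNU grep and GNU sed, since \w is a GNU extension in both:

```shell
s='apple=42, fig=314'

# step 1: grep -o puts each key=value pair on its own line
# step 2: sed swaps the two captured groups
echo "$s" | grep -oE '\w+=[0-9]+' | sed -E 's/(\w+)=([0-9]+)/\2:\1/'
# 42:apple
# 314:fig
```

The two-step pipeline is more verbose than rg -r, but it keeps the extraction and the rearranging as separate, easy to check steps.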
See my CLI text processing with GNU grep and ripgrep ebook if you are interested in learning about the GNU grep and ripgrep commands in more detail.