CLI tip 29: define fields using FPAT in GNU awk

In awk, the FS variable allows you to define the input field separator. In contrast, FPAT (field pattern) allows you to define what should the fields be made up of.

$ s='Sample123string42with777numbers'
# one or more consecutive digits
$ echo "$s" | awk -v FPAT='[0-9]+' '{print $2}'
42

$ s='coat Bin food tar12 best Apple fig_42'
# whole words made up of lowercase alphabets and digits only
$ echo "$s" | awk -v FPAT='\\<[a-z0-9]+\\>' -v OFS=, '{$1=$1} 1'
coat,food,tar12,best

$ s='items: "apple" and "mango"'
# get the first double quoted item
$ echo "$s" | awk -v FPAT='"[^"]+"' '{print $1}'
"apple"

FPAT is often used for CSV input where fields can contain embedded delimiter characters. For example, a field content "fox,42" when , is the delimiter.

$ s='eagle,"fox,42",bee,frog'

# simply using , as separator isn't sufficient
$ echo "$s" | awk -F, '{print $2}'
"fox

For such simpler CSV input, FPAT helps to define fields as starting and ending with double quotes or containing non-comma characters.

# * is used instead of + to allow empty fields
$ echo "$s" | awk -v FPAT='"[^"]*"|[^,]*' '{print $2}'
"fox,42"

warning The above will not work for all kinds of CSV files, for example if fields contain escaped double quotes, newline characters, etc. See stackoverflow: What's the most robust way to efficiently parse CSV using awk? for such cases. You could also use other programming languages such as Perl, Python, Ruby, etc which come with standard CSV parsing libraries or have easy access to third party solutions. There are also specialized command line tools such as xsv.

Video demo:

info See also my CLI text processing with GNU awk ebook.