Multipurpose Text Processing Tools
Many CLI text processing tools have been in existence for about half a century, and newer tools are still being written to solve ever-expanding text processing problems. Just knowing that a particular tool exists, or searching for one before attempting to write your own solution, can be a time saver. Also, popular tools are likely to be optimized for speed, hardened against bugs due to wide usage, discussed on forums, and so on.
grep
was already covered in the Searching Files and Filenames chapter. In addition, sed
, awk
and perl
are essential tools to solve a wide variety of text processing problems from the command line. In this chapter you'll learn field processing, use regular expressions for search and replace requirements, perform operations based on multiple lines and files, etc.
The examples presented in this chapter only cover some of the functionalities. I've written separate books to cover these tools with more detailed explanations, examples and exercises. See https://learnbyexample.github.io/books/ for links to these books.
The example_files directory has the sample input files used in this chapter.
sed
The command name sed
is derived from stream editor. Here, stream refers to the data being passed via shell pipes. Thus, the command's primary functionality is to act as a text editor for stdin data with stdout as the output target. You can also edit file input and save the changes back to the same file if needed.
Substitution
sed
has various commands to manipulate text input. The substitute command is the most commonly used, whose syntax is s/REGEXP/REPLACEMENT/FLAGS
. Here are some basic examples:
# for each input line, change only the first ',' to '-'
$ printf '1,2,3,4\na,b,c,d\n' | sed 's/,/-/'
1-2,3,4
a-b,c,d
# change all matches by adding the 'g' flag
$ printf '1,2,3,4\na,b,c,d\n' | sed 's/,/-/g'
1-2-3-4
a-b-c-d
Here's an example with file input:
$ cat greeting.txt
Hi there
Have a nice day
# change 'day' to 'weekend'
$ sed 's/day/weekend/g' greeting.txt
Hi there
Have a nice weekend
What if you want to issue multiple substitute commands (or use several other sed
commands)? The method will depend on the commands being used. Here's an example where you can either use the -e
option or separate the commands with a ;
character.
# change all occurrences of 'day' to 'weekend'
# add '.' to the end of each line
$ sed 's/day/weekend/g; s/$/./' greeting.txt
Hi there.
Have a nice weekend.
# same thing with the -e option
$ sed -e 's/day/weekend/g' -e 's/$/./' greeting.txt
Hi there.
Have a nice weekend.
Inplace editing
You can use the -i
option for inplace editing. Pass an argument to this option to save the original input as a backup.
$ cat ip.txt
deep blue
light orange
blue delight
# output from sed is written back to 'ip.txt'
# original file is preserved in 'ip.txt.bkp'
$ sed -i.bkp 's/blue/green/g' ip.txt
$ cat ip.txt
deep green
light orange
green delight
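If you don't need a backup, GNU sed also accepts -i without a suffix. Here's a sketch (the file name is just for illustration):

```shell
# caution: -i without a suffix modifies the file with no backup
$ printf 'deep blue\nblue delight\n' > colors.txt
$ sed -i 's/blue/red/g' colors.txt
$ cat colors.txt
deep red
red delight
```

Note that this form is specific to GNU sed; BSD sed requires an explicit (possibly empty) suffix argument.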
Filtering features
The sed
command also has features to filter lines based on a search pattern like grep
. And you can apply other sed
commands for these filtered lines as needed.
# the -n option disables automatic printing
# the 'p' command prints the contents of the pattern space
# same as: grep 'at'
$ printf 'sea\neat\ndrop\n' | sed -n '/at/p'
eat
# the 'd' command deletes the matching lines
# same as: grep -v 'at'
$ printf 'sea\neat\ndrop\n' | sed '/at/d'
sea
drop
# change commas to hyphens only if the input line contains '2'
$ printf '1,2,3,4\na,b,c,d\n' | sed '/2/ s/,/-/g'
1-2-3-4
a,b,c,d
# change commas to hyphens if the input line does NOT contain '2'
$ printf '1,2,3,4\na,b,c,d\n' | sed '/2/! s/,/-/g'
1,2,3,4
a-b-c-d
You can use the q
and Q
commands to quit sed
once a matching line is found:
# quit after a line containing 'st' is found
$ printf 'apple\nsea\neast\ndust' | sed '/st/q'
apple
sea
east
# the matching line won't be printed in this case
$ printf 'apple\nsea\neast\ndust' | sed '/st/Q'
apple
sea
Apart from regexp, filtering can also be done based on line numbers, address ranges, etc.
# perform substitution only for the second line
# use '$' instead of a number to indicate the last input line
$ printf 'gates\nnot\nused\n' | sed '2 s/t/*/g'
gates
no*
used
# address range example, same as: sed -n '3,8!p'
# you can also use regexp to construct address ranges
$ seq 15 24 | sed '3,8d'
15
16
23
24
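Address ranges can also be constructed using regexps, as mentioned in the comment above. Here's a quick sketch:

```shell
# delete from a line matching 'no' to the next line matching 'us'
$ printf 'gates\nnot\nused\nlast\n' | sed '/no/,/us/d'
gates
last
```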
If you need to issue multiple commands for filtered lines, you can group those commands within {}
characters. Here's an example:
# for lines containing 'e', replace 's' with '*' and 't' with '='
# note that the second line isn't changed as there's no 'e'
$ printf 'gates\nnot\nused\n' | sed '/e/{s/s/*/g; s/t/=/g}'
ga=e*
not
u*ed
Regexp substitution
Here are some regexp based substitution examples. The -E
option enables ERE (default is BRE). Most of the syntax discussed in the Regular Expressions section for the grep
command applies for sed
as well.
# replace all sequences of non-digit characters with '-'
$ echo 'Sample123string42with777numbers' | sed -E 's/[^0-9]+/-/g'
-123-42-777-
# replace numbers >= 100 which can have optional leading zeros
$ echo '0501 035 154 12 26 98234' | sed -E 's/\b0*[1-9][0-9]{2,}\b/X/g'
X 035 X 12 26 X
# reduce \\ to single \ and delete if it is a single \
$ echo '\[\] and \\w and \[a-zA-Z0-9\_\]' | sed -E 's/(\\?)\\/\1/g'
[] and \w and [a-zA-Z0-9_]
# remove two or more duplicate words that are separated by a space character
# \b prevents false matches like 'the theatre', 'sand and stone' etc
$ echo 'aa a a a 42 f_1 f_1 f_13.14' | sed -E 's/\b(\w+)( \1)+\b/\1/g'
aa a 42 f_1 f_13.14
# & backreferences the matched portion
# \u changes the next character to uppercase
$ echo 'hello there. how are you?' | sed 's/\b\w/\u&/g'
Hello There. How Are You?
# replace only the third matching occurrence
$ echo 'apple:123:banana:fig' | sed 's/:/-/3'
apple:123:banana-fig
# change all ':' to ',' only from the second occurrence
$ echo 'apple:123:banana:fig' | sed 's/:/,/2g'
apple:123,banana,fig
The /
character is idiomatically used as the regexp delimiter. But any character other than \
and the newline character can be used instead. This helps to avoid or reduce the need for escaping delimiter characters.
$ echo '/home/learnbyexample/reports' | sed 's#/home/learnbyexample/#~/#'
~/reports
$ echo 'home path is:' | sed 's,$, '"$HOME"','
home path is: /home/learnbyexample
Further Reading
- My ebook CLI text processing with GNU sed
- See also my blog post GNU BRE/ERE cheatsheet
- unix.stackexchange: common search and replace examples with sed and other tools
awk
awk
is a programming language and widely used for text processing tasks from the command line. awk
provides filtering capabilities like those supported by the grep
and sed
commands, along with some more nifty features. And similar to many command line utilities, awk
can accept input from both stdin
and files.
Regexp filtering
To make it easier to use programming features from the command line, there are several shortcuts, for example:

- awk '/regexp/' is a shortcut for awk '$0 ~ /regexp/{print $0}'
- awk '!/regexp/' is a shortcut for awk '$0 !~ /regexp/{print $0}'
# same as: grep 'at' and sed -n '/at/p'
$ printf 'gate\napple\nwhat\nkite\n' | awk '/at/'
gate
what
# same as: grep -v 'e' and sed -n '/e/!p'
$ printf 'gate\napple\nwhat\nkite\n' | awk '!/e/'
what
# lines containing 'e' followed by zero or more characters and then 'y'
$ awk '/e.*y/' greeting.txt
Have a nice day
Awk special variables
Brief descriptions for some of the special variables are given below:

- $0: contains the input record content
- $1: first field
- $2: second field and so on
- FS: input field separator
- OFS: output field separator
- NF: number of fields
- RS: input record separator
- ORS: output record separator
- NR: number of records (i.e. line number) for the entire input
- FNR: number of records per file
Default field processing
awk
automatically splits input into fields based on sequences of one or more space, tab, or newline characters. In addition, any of these three characters at the start or end of input gets trimmed and won't be part of the field contents. The fields are accessible using $N
where N
is the field number you need. You can also pass an expression instead of a numeric literal to specify the required field number.
Here are some examples:
$ cat table.txt
brown bread mat hair 42
blue cake mug shirt -7
yellow banana window shoes 3.14
# print the second field of each input line
$ awk '{print $2}' table.txt
bread
cake
banana
# print lines only if the last field is a negative number
$ awk '$NF<0' table.txt
blue cake mug shirt -7
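As mentioned above, the field number can also come from an expression. For example, NF-1 gives the second last field:

```shell
# print the second last field of each line
$ awk '{print $(NF-1)}' table.txt
hair
shirt
shoes
```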
Here's an example of applying a substitution operation for a particular field.
# delete lowercase vowels only from the first field
# gsub() is like the sed substitution command with the 'g' flag
# use sub() if you need to change only the first match
# 1 is a true condition, and thus prints the contents of $0
$ awk '{gsub(/[aeiou]/, "", $1)} 1' table.txt
brwn bread mat hair 42
bl cake mug shirt -7
yllw banana window shoes 3.14
Condition and Action
The examples so far have used a few different ways to construct a typical awk
one-liner. If you haven't yet grasped the syntax, this generic structure might help:
awk 'cond1{action1} cond2{action2} ... condN{actionN}'
If a condition isn't provided, the action is always executed. Within a block, you can provide multiple statements separated by a semicolon character. If the action isn't provided, then by default the contents of $0
are printed if the condition evaluates to true. Idiomatically, 1
is used to denote a true
condition in one-liners as a shortcut to print the contents of $0
(as seen in an earlier example). When the action isn't present, you can use a semicolon to terminate the condition and start another condX{actionX}
snippet.
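Here's an illustration of multiple condition-action pairs in a single one-liner (the sample numbers are just for illustration):

```shell
# first block numbers every line; second block flags negative numbers
$ printf '10\n-3\n7\n' | awk '{print NR": "$0} $0<0{print "negative!"}'
1: 10
2: -3
negative!
3: 7
```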
You can use a BEGIN{}
block when you need to execute something before the input is read and an END{}
block to execute something after all of the input has been processed.
$ seq 2 | awk 'BEGIN{print "---"} 1; END{print "%%%"}'
---
1
2
%%%
Regexp field processing
As seen earlier, awk
automatically splits input into fields (based on space/tab/newline characters) which are accessible using $N
where N
is the field number you need. You can use the -F
option or assign the FS
variable to set a regexp based input field separator. Use the OFS
variable to set the output field separator.
$ echo 'goal:amazing:whistle:kwality' | awk -F: '{print $1}'
goal
# one or more alphabets will be considered as the input field separator
$ echo 'Sample123string42with777numbers' | awk -F'[a-zA-Z]+' '{print $2}'
123
$ s='Sample123string42with777numbers'
# -v option helps you set a value for the given variable
$ echo "$s" | awk -F'[0-9]+' -v OFS=, '{print $1, $(NF-1)}'
Sample,with
The FS
variable allows you to define the input field separator. In contrast, FPAT
(field pattern) allows you to define what the fields should be made up of.
# lowercase whole words starting with 'b'
$ awk -v FPAT='\\<b[a-z]*\\>' -v OFS=, '{$1=$1} 1' table.txt
brown,bread
blue
banana
# fields enclosed within double quotes or made up of non-comma characters
$ s='eagle,"fox,42",bee,frog'
$ echo "$s" | awk -v FPAT='"[^"]*"|[^,]*' '{print $2}'
"fox,42"
Record separators
By default, newline is used as the input and output record separators. You can change them using the RS
and ORS
variables.
# print records containing 'i' as well as 't'
$ printf 'Sample123string42with777numbers' | awk -v RS='[0-9]+' '/i/ && /t/'
string
with
# empty RS is paragraph mode, uses two or more newlines as the separator
$ printf 'apple\nbanana\nfig\n\n\n123\n456' | awk -v RS= 'NR==1'
apple
banana
fig
# change ORS depending on some condition
$ seq 9 | awk '{ORS = NR%3 ? "-" : "\n"} 1'
1-2-3
4-5-6
7-8-9
State machines
The condX{actionX}
shortcut makes it easy to code state machines concisely. This is useful to solve problems that depend on the contents of multiple records.
Here's an example that prints the matching line as well as the c
lines that follow it:
# same as: grep --no-group-separator -A1 'blue'
# print matching line as well as the one that follows it
$ printf 'red\nblue\ngreen\nteal\n' | awk -v c=1 '/blue/{n=c+1} n && n--'
blue
green
# print matching line as well as two lines that follow
$ printf 'red\nblue\ngreen\nteal\n' | awk -v c=2 '/blue/{n=c+1} n && n--'
blue
green
teal
Consider the following input file that has records bounded by distinct markers (lines containing start
and end
):
$ cat uniform.txt
mango
icecream
--start 1--
1234
6789
**end 1**
how are you
have a nice day
--start 2--
a
b
c
**end 2**
par,far,mar,tar
Here are some examples of processing such bounded records:
# same as: sed -n '/start/,/end/p' uniform.txt
$ awk '/start/{f=1} f; /end/{f=0}' uniform.txt
--start 1--
1234
6789
**end 1**
--start 2--
a
b
c
**end 2**
# you can re-arrange and invert the conditions to create other combinations
# for example, exclude the ending match
$ awk '/start/{f=1} /end/{f=0} f' uniform.txt
--start 1--
1234
6789
--start 2--
a
b
c
Here's an example of printing two consecutive records only if the first record contains ar
and the second one contains nice
:
$ awk 'p ~ /ar/ && /nice/{print p ORS $0} {p=$0}' uniform.txt
how are you
have a nice day
Two files processing
This section focuses on solving problems which depend upon the contents of two or more files. These are usually based on comparing records and fields. These two files will be used in the examples to follow:
$ paste c1.txt c2.txt
Blue Black
Brown Blue
Orange Green
Purple Orange
Red Pink
Teal Red
White White
The key features used to find common lines between two files:

- For two files as input, NR==FNR will be true only when the first file is being processed
- FNR is the record number like NR, but resets for each input file
- next will skip the rest of the code and fetch the next record
- a[$0] by itself is a valid statement; it creates an uninitialized element in array a with $0 as the key (if the key doesn't exist yet)
- $0 in a checks if the given string ($0 here) exists as a key in the array a
# common lines, same as: grep -Fxf c1.txt c2.txt
$ awk 'NR==FNR{a[$0]; next} $0 in a' c1.txt c2.txt
Blue
Orange
Red
White
# lines present in c2.txt but not in c1.txt
$ awk 'NR==FNR{a[$0]; next} !($0 in a)' c1.txt c2.txt
Black
Green
Pink
Note that the
NR==FNR
logic will fail if the first file is empty. See this unix.stackexchange thread for workarounds.
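One such workaround (a sketch, not the only approach) is to assign a variable between the two file arguments instead of relying on NR==FNR. This uses the same c1.txt and c2.txt files as above:

```shell
# 'second' is 0 while c1.txt is read, 1 afterwards
# works correctly even if c1.txt is empty
$ awk '!second{a[$0]; next} $0 in a' c1.txt second=1 c2.txt
Blue
Orange
Red
White
```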
Removing duplicates
awk '!a[$0]++'
is one of the most famous awk
one-liners. It eliminates line based duplicates while retaining the input order. The following example shows this feature in action along with an illustration of how the logic works.
$ cat purchases.txt
coffee
tea
washing powder
coffee
toothpaste
tea
soap
tea
$ awk '{print +a[$0] "\t" $0; a[$0]++}' purchases.txt
0 coffee
0 tea
0 washing powder
1 coffee
0 toothpaste
1 tea
0 soap
2 tea
# only those entries with zero in the first column will be retained
$ awk '!a[$0]++' purchases.txt
coffee
tea
washing powder
toothpaste
soap
Further Reading
- My ebook CLI text processing with GNU awk
- See also my blog post GNU BRE/ERE cheatsheet
- Online gawk manual
- My blog post CLI computation with GNU datamash
perl
Perl is a scripting language with plenty of builtin features and a strong ecosystem. Perl one-liners can be used for text processing, similar to grep
, sed
, awk
and more. And similar to many command line utilities, perl
can accept input from both stdin
and file arguments.
Basic one-liners
# print all lines containing 'at'
# same as: grep 'at' and sed -n '/at/p' and awk '/at/'
$ printf 'gate\napple\nwhat\nkite\n' | perl -ne 'print if /at/'
gate
what
# print all lines NOT containing 'e'
# same as: grep -v 'e' and sed -n '/e/!p' and awk '!/e/'
$ printf 'gate\napple\nwhat\nkite\n' | perl -ne 'print if !/e/'
what
The -e
option accepts code as a command line argument. Many shortcuts are available to reduce the amount of typing needed. In the above examples, a regular expression has been used to filter the input. When the input string isn't specified, the test is performed against the special variable $_
, which has the contents of the current input line. $_
is also the default argument for many functions like print
and length
. To summarize:
- /REGEXP/FLAGS is a shortcut for $_ =~ m/REGEXP/FLAGS
- !/REGEXP/FLAGS is a shortcut for $_ !~ m/REGEXP/FLAGS
In the examples below, the -p
option is used instead of -n
. This helps to automatically print the value of $_
after processing each input line.
# same as: sed 's/:/-/' and awk '{sub(/:/, "-")} 1'
$ printf '1:2:3:4\na:b:c:d\n' | perl -pe 's/:/-/'
1-2:3:4
a-b:c:d
# same as: sed 's/:/-/g' and awk '{gsub(/:/, "-")} 1'
$ printf '1:2:3:4\na:b:c:d\n' | perl -pe 's/:/-/g'
1-2-3-4
a-b-c-d
Similar to
sed
, you can use the-i
option for inplace editing.
Perl special variables
Brief descriptions for some of the special variables are given below:

- $_: contains the input record content
- @F: array containing the field contents (with the -a and -F options)
- $F[0]: first field
- $F[1]: second field and so on
- $F[-1]: last field
- $F[-2]: second last field and so on
- $#F: index of the last field
- $.: number of records (i.e. line number)
- $1: backreference to the first capture group
- $2: backreference to the second capture group and so on
- $&: backreference to the entire matched portion
You'll see examples using such variables in the sections to follow.
Auto split
Here are some examples based on specific fields rather than the entire line. The -a
option will cause the input line to be split based on whitespace, and the field contents can be accessed using the @F
special array variable. Leading and trailing whitespace will be stripped, so there's no possibility of empty fields.
$ cat table.txt
brown bread mat hair 42
blue cake mug shirt -7
yellow banana window shoes 3.14
# same as: awk '{print $2}' table.txt
$ perl -lane 'print $F[1]' table.txt
bread
cake
banana
# same as: awk '$NF<0' table.txt
$ perl -lane 'print if $F[-1] < 0' table.txt
blue cake mug shirt -7
# same as: awk '{gsub(/b/, "B", $1)} 1' table.txt
$ perl -lane '$F[0] =~ s/b/B/g; print "@F"' table.txt
Brown bread mat hair 42
Blue cake mug shirt -7
yellow banana window shoes 3.14
When you use an array within double quotes (like "@F"
in the example above), the fields will be printed with a space character in between. The join
function is one of the ways to print the contents of an array with a custom field separator. Here's an example:
# print contents of @F array with colon as the separator
$ perl -lane 'print join ":", @F' table.txt
brown:bread:mat:hair:42
blue:cake:mug:shirt:-7
yellow:banana:window:shoes:3.14
In the above examples, the
-l
option has been used to remove the record separator (which is newline by default) from the input line. The record separator thus removed is added back whenever the print function is used.
Regexp field separator
You can use the -F
option to specify a regexp pattern for input field separation.
$ echo 'apple,banana,cherry' | perl -F, -lane 'print $F[1]'
banana
$ s='Sample123string42with777numbers'
$ echo "$s" | perl -F'\d+' -lane 'print join ",", @F'
Sample,string,with,numbers
Powerful features
I reach for Perl over grep
, sed
and awk
when I need more powerful regexp features, or want to make use of the vast collection of builtin functions and libraries.
Here are some examples showing regexp features not present in BRE/ERE:
# reverse lowercase alphabets at the end of input lines
# the 'e' flag allows you to use Perl code in the replacement section
$ echo 'fig 42apples' | perl -pe 's/[a-z]+$/reverse $&/e'
fig 42selppa
# replace arithmetic expressions with their results
$ echo '42*10 200+100 22/7' | perl -pe 's|\d+[+/*-]\d+|$&|gee'
420 300 3.14285714285714
# exclude terms in the search pattern
$ s='orange apple appleseed'
$ echo "$s" | perl -pe 's#\bapple\b(*SKIP)(*F)|\w+#($&)#g'
(orange) apple (appleseed)
And here are some examples showing off builtin features:
# filter fields containing 'in' or 'it' or 'is'
$ s='goal:amazing:42:whistle:kwality:3.14'
$ echo "$s" | perl -F: -lane 'print join ":", grep {/i[nts]/} @F'
amazing:whistle:kwality
# sort numbers in ascending order
# use {$b <=> $a} for descending order
$ echo '23 756 -983 5' | perl -lane 'print join " ", sort {$a <=> $b} @F'
-983 5 23 756
# sort strings in ascending order
$ s='floor bat to dubious four'
$ echo "$s" | perl -lane 'print join ":", sort @F'
bat:dubious:floor:four:to
# unique fields, maintains input order of elements
# -M option helps you load modules
$ s='3,b,a,3,c,d,1,d,c,2,2,2,3,1,b'
$ echo "$s" | perl -MList::Util=uniq -F, -lane 'print join ",", uniq @F'
3,b,a,c,d,1,2
Further Reading
Exercises
Use the example_files/text_files directory for input files used in the following exercises.
1) Replace all occurrences of 0xA0
with 0x50
and 0xFF
with 0x7F
for the given input.
$ printf 'a1:0xA0, a2:0xA0A1\nb1:0xFF, b2:0xBE\n'
a1:0xA0, a2:0xA0A1
b1:0xFF, b2:0xBE
$ printf 'a1:0xA0, a2:0xA0A1\nb1:0xFF, b2:0xBE\n' | sed # ???
a1:0x50, a2:0x50A1
b1:0x7F, b2:0xBE
2) Remove only the third line from the given input.
$ seq 34 37 | # ???
34
35
37
3) For the input file sample.txt
, display all lines that contain it
but not do
.
# ???
7) Believe it
4) For the input file purchases.txt
, delete all lines containing tea
. Also, replace all occurrences of coffee
with milk
. Write back the changes to the input file itself. The original contents should get saved to purchases.txt.orig
. Afterwards, restore the contents from this backup file.
# make the changes
# ???
$ ls purchases*
purchases.txt purchases.txt.orig
$ cat purchases.txt
milk
washing powder
milk
toothpaste
soap
# restore the contents
# ???
$ ls purchases*
purchases.txt
$ cat purchases.txt
coffee
tea
washing powder
coffee
toothpaste
tea
soap
tea
5) For the input file sample.txt
, display all lines from the start of the file till the first occurrence of are
.
# ???
1) Hello World
2)
3) Hi there
4) How are you
6) Delete all groups of lines from a line containing start
to a line containing end
for the uniform.txt
input file.
# ???
mango
icecream
how are you
have a nice day
par,far,mar,tar
7) Replace all occurrences of 42
with [42]
unless it is at the edge of a word.
$ echo 'hi42bye nice421423 bad42 cool_4242a 42c' | sed # ???
hi[42]bye nice[42]1[42]3 bad42 cool_[42][42]a 42c
8) Replace all whole words with X
that start and end with the same word character.
$ echo 'oreo not a _oh_ pip RoaR took 22 Pop' | sed # ???
X not X X X X took X Pop
9) For the input file anchors.txt
, convert markdown anchors to hyperlinks as shown below.
$ cat anchors.txt
# <a name="regular-expressions"></a>Regular Expressions
## <a name="subexpression-calls"></a>Subexpression calls
## <a name="the-dot-meta-character"></a>The dot meta character
$ sed # ???
[Regular Expressions](#regular-expressions)
[Subexpression calls](#subexpression-calls)
[The dot meta character](#the-dot-meta-character)
10) Replace all occurrences of e
with 3
except the first two matches.
$ echo 'asset sets tests site' | sed # ???
asset sets t3sts sit3
$ echo 'sample item teem eel' | sed # ???
sample item t33m 33l
11) The below sample strings use ,
as the delimiter and the field values can be empty as well. Use sed
to replace only the third field with 42
.
$ echo 'lion,,ant,road,neon' | sed # ???
lion,,42,road,neon
$ echo ',,,' | sed # ???
,,42,
12) For the input file table.txt
, calculate and display the product of numbers in the last field of each line. Consider space as the field separator for this file.
$ cat table.txt
brown bread mat hair 42
blue cake mug shirt -7
yellow banana window shoes 3.14
# ???
-923.16
13) Extract the contents between ()
or )(
from each of the input lines. Assume that the ()
characters will be present only once every line.
$ printf 'apple(ice)pie\n(almond)pista\nyo)yoyo(yo\n'
apple(ice)pie
(almond)pista
yo)yoyo(yo
$ printf 'apple(ice)pie\n(almond)pista\nyo)yoyo(yo\n' | awk # ???
ice
almond
yoyo
14) For the input file scores.csv
, display the Name
and Physics
fields in the format shown below.
$ cat scores.csv
Name,Maths,Physics,Chemistry
Ith,100,100,100
Cy,97,98,95
Lin,78,83,80
# ???
Name:Physics
Ith:100
Cy:98
Lin:83
15) Extract and display the third and first words in the format shown below.
$ echo '%whole(Hello)--{doubt}==ado==' | # ???
doubt:whole
$ echo 'just,\joint*,concession_42<=nice' | # ???
concession_42:just
16) For the input file scores.csv
, add another column named GP which is calculated out of 100 by giving 50% weightage to Maths and 25% each for Physics and Chemistry.
$ awk # ???
Name,Maths,Physics,Chemistry,GP
Ith,100,100,100,100
Cy,97,98,95,96.75
Lin,78,83,80,79.75
17) From the para.txt
input file, display all paragraphs containing any digit character.
$ cat para.txt
hi there
how are you
2 apples
12 bananas
blue sky
yellow sun
brown earth
$ awk # ???
2 apples
12 bananas
18) Input has the ASCII NUL character as the record separator. Change it to dot and newline characters as shown below.
$ printf 'apple\npie\0banana\ncherry\0' | awk # ???
apple
pie.
banana
cherry.
19) For the input file sample.txt
, print a matching line containing do
only if you
is found two lines before. For example, if do
is found on line number 10 and the 8th line contains you
, then the 10th line should be printed.
# ???
6) Just do-it
20) For the input file blocks.txt
, extract contents from a line containing exactly %=%=
until but not including the next such line. The block to be extracted is indicated by the variable n
passed via the -v
option.
$ cat blocks.txt
%=%=
apple
banana
%=%=
brown
green
$ awk -v n=1 # ???
%=%=
apple
banana
$ awk -v n=2 # ???
%=%=
brown
green
21) Display lines present in c1.txt
but not in c2.txt
using the awk
command.
$ awk # ???
Brown
Purple
Teal
22) Display lines from scores.csv
by matching the first field based on a list of names from the names.txt
file.
$ printf 'Ith\nLin\n' > names.txt
$ awk # ???
Ith,100,100,100
Lin,78,83,80
$ rm names.txt
23) Retain only the first copy of duplicate lines from the duplicates.txt
input file. Use only the contents of the last field for determining duplicates.
$ cat duplicates.txt
brown,toy,bread,42
dark red,ruby,rose,111
blue,ruby,water,333
dark red,sky,rose,555
yellow,toy,flower,333
white,sky,bread,111
light red,purse,rose,333
# ???
brown,toy,bread,42
dark red,ruby,rose,111
blue,ruby,water,333
dark red,sky,rose,555
24) For the input file table.txt
, print input lines if the second field starts with b
. Construct solutions using awk
and perl
.
$ awk # ???
brown bread mat hair 42
yellow banana window shoes 3.14
$ perl # ???
brown bread mat hair 42
yellow banana window shoes 3.14
25) For the input file table.txt
, retain only the second last field. Write back the changes to the input file itself. The original contents should get saved to table.txt.bkp
. Afterwards, restore the contents from this backup file.
# make the changes
$ perl # ???
$ ls table*
table.txt table.txt.bkp
$ cat table.txt
hair
shirt
shoes
# restore the contents
# ???
$ ls table*
table.txt
$ cat table.txt
brown bread mat hair 42
blue cake mug shirt -7
yellow banana window shoes 3.14
26) Reverse the first field contents of table.txt
input file.
# ???
nworb bread mat hair 42
eulb cake mug shirt -7
wolley banana window shoes 3.14
27) Sort the given comma separated input lexicographically. Change the output field separator to a :
character.
$ ip='floor,bat,to,dubious,four'
$ echo "$ip" | perl # ???
bat:dubious:floor:four:to
28) Filter fields containing digit characters.
$ ip='5pearl 42 east 1337 raku_6 lion 3.14'
$ echo "$ip" | perl # ???
5pearl 42 1337 raku_6 3.14
29) The input shown below has several words ending with digit characters. Change the words containing test
to match the output shown below. That is, renumber the matching portions to 1
, 2
, etc. Words not containing test
should not be changed.
$ ip='test_12:test123\nanother_test_4,no_42\n'
$ printf '%b' "$ip"
test_12:test123
another_test_4,no_42
$ printf '%b' "$ip" | perl # ???
test_1:test2
another_test_3,no_42
30) For the input file table.txt
, change contents of the third field to all uppercase. Construct solutions using sed
, awk
and perl
.
$ sed # ???
brown bread MAT hair 42
blue cake MUG shirt -7
yellow banana WINDOW shoes 3.14
$ awk # ???
brown bread MAT hair 42
blue cake MUG shirt -7
yellow banana WINDOW shoes 3.14
$ perl # ???
brown bread MAT hair 42
blue cake MUG shirt -7
yellow banana WINDOW shoes 3.14