Multipurpose Text Processing Tools
Many CLI text processing tools have been in existence for about half a century, and newer tools are still being written to solve ever-expanding text processing problems. Just knowing that a particular tool exists, or searching for one before attempting to write your own solution, can be a time saver. Also, popular tools are likely to be optimized for speed, hardened against bugs due to wide usage, discussed on forums, and so on.
grep
was already covered in the Searching Files and Filenames chapter. In addition, sed
, awk
and perl
are essential tools to solve a wide variety of text processing problems from the command line. In this chapter you'll learn field processing, use regular expressions for search and replace requirements, perform operations based on multiple lines and files, etc.
The examples presented in this chapter only cover some of the functionalities. I've written separate books to cover these tools with more detailed explanations, examples and exercises. See https://learnbyexample.github.io/books/ for links to these books.
The example_files directory has the sample input files used in this chapter.
sed
The command name sed
is derived from stream editor. Here, stream refers to the data being passed via shell pipes. Thus, the command's primary functionality is to act as a text editor for stdin data with stdout as the output target. You can also edit file input and save the changes back to the same file if needed.
Substitution
sed
has various commands to manipulate text input. The substitute command is the most commonly used, whose syntax is s/REGEXP/REPLACEMENT/FLAGS
. Here are some basic examples:
# for each input line, change only the first ',' to '-'
$ printf '1,2,3,4\na,b,c,d\n' | sed 's/,/-/'
1-2,3,4
a-b,c,d
# change all matches by adding the 'g' flag
$ printf '1,2,3,4\na,b,c,d\n' | sed 's/,/-/g'
1-2-3-4
a-b-c-d
Here's an example with file input:
$ cat greeting.txt
Hi there
Have a nice day
# change 'day' to 'weekend'
$ sed 's/day/weekend/g' greeting.txt
Hi there
Have a nice weekend
What if you want to issue multiple substitute commands (or use several other sed
commands)? The method will depend on the commands being used. Here's an example where you can either use the -e
option or separate the commands with a ;
character.
# change all occurrences of 'day' to 'weekend'
# add '.' to the end of each line
$ sed 's/day/weekend/g; s/$/./' greeting.txt
Hi there.
Have a nice weekend.
# same thing with the -e option
$ sed -e 's/day/weekend/g' -e 's/$/./' greeting.txt
Hi there.
Have a nice weekend.
Inplace editing
You can use the -i
option for inplace editing. Pass an argument to this option to save the original input as a backup.
$ cat ip.txt
deep blue
light orange
blue delight
# output from sed is written back to 'ip.txt'
# original file is preserved in 'ip.txt.bkp'
$ sed -i.bkp 's/blue/green/g' ip.txt
$ cat ip.txt
deep green
light orange
green delight
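If you don't need a backup, GNU sed also accepts -i without a suffix. Here's a sketch (the file name is just for illustration):

```shell
# caution: -i without a suffix modifies the file with no backup
$ printf 'deep blue\nblue delight\n' > colors.txt
$ sed -i 's/blue/red/g' colors.txt
$ cat colors.txt
deep red
red delight
```

Note that this form is specific to GNU sed; BSD sed requires an explicit (possibly empty) suffix argument.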
Filtering features
The sed
command also has features to filter lines based on a search pattern like grep
. And you can apply other sed
commands for these filtered lines as needed.
# the -n option disables automatic printing
# the 'p' command prints the contents of the pattern space
# same as: grep 'at'
$ printf 'sea\neat\ndrop\n' | sed -n '/at/p'
eat
# the 'd' command deletes the matching lines
# same as: grep -v 'at'
$ printf 'sea\neat\ndrop\n' | sed '/at/d'
sea
drop
# change commas to hyphens only if the input line contains '2'
$ printf '1,2,3,4\na,b,c,d\n' | sed '/2/ s/,/-/g'
1-2-3-4
a,b,c,d
# change commas to hyphens if the input line does NOT contain '2'
$ printf '1,2,3,4\na,b,c,d\n' | sed '/2/! s/,/-/g'
1,2,3,4
a-b-c-d
You can use the q
and Q
commands to quit sed
once a matching line is found:
# quit after a line containing 'st' is found
$ printf 'apple\nsea\neast\ndust' | sed '/st/q'
apple
sea
east
# the matching line won't be printed in this case
$ printf 'apple\nsea\neast\ndust' | sed '/st/Q'
apple
sea
Apart from regexp, filtering can also be done based on line numbers, address ranges, etc.
# perform substitution only for the second line
# use '$' instead of a number to indicate the last input line
$ printf 'gates\nnot\nused\n' | sed '2 s/t/*/g'
gates
no*
used
# address range example, same as: sed -n '3,8!p'
# you can also use regexp to construct address ranges
$ seq 15 24 | sed '3,8d'
15
16
23
24
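Address ranges can also be constructed using regexps, as mentioned in the comment above. Here's a quick sketch:

```shell
# delete from a line matching 'no' to the next line matching 'us'
$ printf 'gates\nnot\nused\nlast\n' | sed '/no/,/us/d'
gates
last
```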
If you need to issue multiple commands for filtered lines, you can group those commands within {}
characters. Here's an example:
# for lines containing 'e', replace 's' with '*' and 't' with '='
# note that the second line isn't changed as there's no 'e'
$ printf 'gates\nnot\nused\n' | sed '/e/{s/s/*/g; s/t/=/g}'
ga=e*
not
u*ed
Regexp substitution
Here are some regexp based substitution examples. The -E
option enables ERE (default is BRE). Most of the syntax discussed in the Regular Expressions section for the grep
command applies for sed
as well.
# replace all sequences of non-digit characters with '-'
$ echo 'Sample123string42with777numbers' | sed -E 's/[^0-9]+/-/g'
-123-42-777-
# replace numbers >= 100 which can have optional leading zeros
$ echo '0501 035 154 12 26 98234' | sed -E 's/\b0*[1-9][0-9]{2,}\b/X/g'
X 035 X 12 26 X
# reduce \\ to single \ and delete if it is a single \
$ echo '\[\] and \\w and \[a-zA-Z0-9\_\]' | sed -E 's/(\\?)\\/\1/g'
[] and \w and [a-zA-Z0-9_]
# remove two or more duplicate words that are separated by a space character
# \b prevents false matches like 'the theatre', 'sand and stone' etc
$ echo 'aa a a a 42 f_1 f_1 f_13.14' | sed -E 's/\b(\w+)( \1)+\b/\1/g'
aa a 42 f_1 f_13.14
# & backreferences the matched portion
# \u changes the next character to uppercase
$ echo 'hello there. how are you?' | sed 's/\b\w/\u&/g'
Hello There. How Are You?
# replace only the third matching occurrence
$ echo 'apple:123:banana:fig' | sed 's/:/-/3'
apple:123:banana-fig
# change all ':' to ',' only from the second occurrence
$ echo 'apple:123:banana:fig' | sed 's/:/,/2g'
apple:123,banana,fig
The /
character is idiomatically used as the regexp delimiter. But any character other than \
and the newline character can be used instead. This helps to avoid or reduce the need for escaping delimiter characters.
$ echo '/home/learnbyexample/reports' | sed 's#/home/learnbyexample/#~/#'
~/reports
$ echo 'home path is:' | sed 's,$, '"$HOME"','
home path is: /home/learnbyexample
Further Reading
- My ebook CLI text processing with GNU sed
- See also my blog post GNU BRE/ERE cheatsheet
- unix.stackexchange: common search and replace examples with sed and other tools
awk
awk
is a programming language and widely used for text processing tasks from the command line. awk
provides filtering capabilities like those supported by the grep
and sed
commands, along with some more nifty features. And similar to many command line utilities, awk
can accept input from both stdin
and files.
Regexp filtering
To make it easier to use programming features from the command line, there are several shortcuts, for example:

- awk '/regexp/' is a shortcut for awk '$0 ~ /regexp/{print $0}'
- awk '!/regexp/' is a shortcut for awk '$0 !~ /regexp/{print $0}'
# same as: grep 'at' and sed -n '/at/p'
$ printf 'gate\napple\nwhat\nkite\n' | awk '/at/'
gate
what
# same as: grep -v 'e' and sed -n '/e/!p'
$ printf 'gate\napple\nwhat\nkite\n' | awk '!/e/'
what
# lines containing 'e' followed by zero or more characters and then 'y'
$ awk '/e.*y/' greeting.txt
Have a nice day
Awk special variables
Brief descriptions for some of the special variables are given below:

- $0: contains the input record content
- $1: first field
- $2: second field and so on
- FS: input field separator
- OFS: output field separator
- NF: number of fields
- RS: input record separator
- ORS: output record separator
- NR: number of records (i.e. line number) for the entire input
- FNR: number of records per file
Default field processing
awk
automatically splits input into fields based on sequences of one or more space, tab, or newline characters. In addition, any of these three characters at the start or end of input gets trimmed and won't be part of the field contents. The fields are accessible using $N
where N
is the field number you need. You can also pass an expression instead of a numeric literal to specify the required field number.
Here are some examples:
$ cat table.txt
brown bread mat hair 42
blue cake mug shirt -7
yellow banana window shoes 3.14
# print the second field of each input line
$ awk '{print $2}' table.txt
bread
cake
banana
# print lines only if the last field is a negative number
$ awk '$NF<0' table.txt
blue cake mug shirt -7
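As mentioned above, the field number can also come from an expression. For example, NF-1 gives the second last field:

```shell
# print the second last field of each line
$ awk '{print $(NF-1)}' table.txt
hair
shirt
shoes
```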
Here's an example of applying a substitution operation for a particular field.
# delete lowercase vowels only from the first field
# gsub() is like the sed substitution command with the 'g' flag
# use sub() if you need to change only the first match
# 1 is a true condition, and thus prints the contents of $0
$ awk '{gsub(/[aeiou]/, "", $1)} 1' table.txt
brwn bread mat hair 42
bl cake mug shirt -7
yllw banana window shoes 3.14
Condition and Action
The examples so far have used a few different ways to construct a typical awk
one-liner. If you haven't yet grasped the syntax, this generic structure might help:
awk 'cond1{action1} cond2{action2} ... condN{actionN}'
If a condition isn't provided, the action is always executed. Within a block, you can provide multiple statements separated by a semicolon character. If the action isn't provided, then by default the contents of $0
are printed if the condition evaluates to true. Idiomatically, 1
is used to denote a true
condition in one-liners as a shortcut to print the contents of $0
(as seen in an earlier example). When the action isn't present, you can use a semicolon to terminate the condition and start another condX{actionX}
snippet.
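Here's an illustration of multiple condition-action pairs in a single one-liner (the sample numbers are just for illustration):

```shell
# first block numbers every line; second block flags negative numbers
$ printf '10\n-3\n7\n' | awk '{print NR": "$0} $0<0{print "negative!"}'
1: 10
2: -3
negative!
3: 7
```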
You can use a BEGIN{}
block when you need to execute something before the input is read and an END{}
block to execute something after all of the input has been processed.
$ seq 2 | awk 'BEGIN{print "---"} 1; END{print "%%%"}'
---
1
2
%%%
Regexp field processing
As seen earlier, awk
automatically splits input into fields (based on space/tab/newline characters) which are accessible using $N
where N
is the field number you need. You can use the -F
option or assign the FS
variable to set a regexp based input field separator. Use the OFS
variable to set the output field separator.
$ echo 'goal:amazing:whistle:kwality' | awk -F: '{print $1}'
goal
# one or more alphabets will be considered as the input field separator
$ echo 'Sample123string42with777numbers' | awk -F'[a-zA-Z]+' '{print $2}'
123
$ s='Sample123string42with777numbers'
# -v option helps you set a value for the given variable
$ echo "$s" | awk -F'[0-9]+' -v OFS=, '{print $1, $(NF-1)}'
Sample,with
The FS
variable allows you to define the input field separator. In contrast, FPAT
(field pattern) allows you to define what the fields should be made up of.
# lowercase whole words starting with 'b'
$ awk -v FPAT='\\<b[a-z]*\\>' -v OFS=, '{$1=$1} 1' table.txt
brown,bread
blue
banana
# fields enclosed within double quotes or made up of non-comma characters
$ s='eagle,"fox,42",bee,frog'
$ echo "$s" | awk -v FPAT='"[^"]*"|[^,]*' '{print $2}'
"fox,42"
Record separators
By default, newline is used as the input and output record separators. You can change them using the RS
and ORS
variables.
# print records containing 'i' as well as 't'
$ printf 'Sample123string42with777numbers' | awk -v RS='[0-9]+' '/i/ && /t/'
string
with
# empty RS is paragraph mode, uses two or more newlines as the separator
$ printf 'apple\nbanana\nfig\n\n\n123\n456' | awk -v RS= 'NR==1'
apple
banana
fig
# change ORS depending on some condition
$ seq 9 | awk '{ORS = NR%3 ? "-" : "\n"} 1'
1-2-3
4-5-6
7-8-9
State machines
The condX{actionX}
shortcut makes it easy to code state machines concisely. This is useful to solve problems that depend on the contents of multiple records.
Here's an example that prints the matching line as well as the c
lines that follow it:
# same as: grep --no-group-separator -A1 'blue'
# print matching line as well as the one that follows it
$ printf 'red\nblue\ngreen\nteal\n' | awk -v c=1 '/blue/{n=c+1} n && n--'
blue
green
# print matching line as well as two lines that follow
$ printf 'red\nblue\ngreen\nteal\n' | awk -v c=2 '/blue/{n=c+1} n && n--'
blue
green
teal
Consider the following input file that has records bounded by distinct markers (lines containing start
and end
):
$ cat uniform.txt
mango
icecream
--start 1--
1234
6789
**end 1**
how are you
have a nice day
--start 2--
a
b
c
**end 2**
par,far,mar,tar
Here are some examples of processing such bounded records:
# same as: sed -n '/start/,/end/p' uniform.txt
$ awk '/start/{f=1} f; /end/{f=0}' uniform.txt
--start 1--
1234
6789
**end 1**
--start 2--
a
b
c
**end 2**
# you can re-arrange and invert the conditions to create other combinations
# for example, exclude the ending match
$ awk '/start/{f=1} /end/{f=0} f' uniform.txt
--start 1--
1234
6789
--start 2--
a
b
c
Here's an example of printing two consecutive records only if the first record contains ar
and the second one contains nice
:
$ awk 'p ~ /ar/ && /nice/{print p ORS $0} {p=$0}' uniform.txt
how are you
have a nice day
Two files processing
This section focuses on solving problems which depend upon the contents of two or more files. These are usually based on comparing records and fields. These two files will be used in the examples to follow:
$ paste c1.txt c2.txt
Blue Black
Brown Blue
Orange Green
Purple Orange
Red Pink
Teal Red
White White
The key features used to find common lines between two files:

- For two files as input, NR==FNR will be true only when the first file is being processed
- FNR is the record number like NR, but resets for each input file
- next will skip the rest of the code and fetch the next record
- a[$0] by itself is a valid statement; it creates an uninitialized element in array a with $0 as the key (if the key doesn't exist yet)
- $0 in a checks if the given string ($0 here) exists as a key in the array a
# common lines, same as: grep -Fxf c1.txt c2.txt
$ awk 'NR==FNR{a[$0]; next} $0 in a' c1.txt c2.txt
Blue
Orange
Red
White
# lines present in c2.txt but not in c1.txt
$ awk 'NR==FNR{a[$0]; next} !($0 in a)' c1.txt c2.txt
Black
Green
Pink
Note that the
NR==FNR
logic will fail if the first file is empty. See this unix.stackexchange thread for workarounds.
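One such workaround (a sketch, not the only approach) is to assign a variable between the two file arguments instead of relying on NR==FNR. This uses the same c1.txt and c2.txt files as above:

```shell
# 'second' is 0 while c1.txt is read, 1 afterwards
# works correctly even if c1.txt is empty
$ awk '!second{a[$0]; next} $0 in a' c1.txt second=1 c2.txt
Blue
Orange
Red
White
```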
Removing duplicates
awk '!a[$0]++'
is one of the most famous awk
one-liners. It eliminates line based duplicates while retaining the input order. The following example shows this feature in action along with an illustration of how the logic works.
$ cat purchases.txt
coffee
tea
washing powder
coffee
toothpaste
tea
soap
tea
$ awk '{print +a[$0] "\t" $0; a[$0]++}' purchases.txt
0 coffee
0 tea
0 washing powder
1 coffee
0 toothpaste
1 tea
0 soap
2 tea
# only those entries with zero in the first column will be retained
$ awk '!a[$0]++' purchases.txt
coffee
tea
washing powder
toothpaste
soap
Further Reading
- My ebook CLI text processing with GNU awk
- See also my blog post GNU BRE/ERE cheatsheet
- Online gawk manual
- My blog post CLI computation with GNU datamash
perl
Perl is a scripting language with plenty of builtin features and a strong ecosystem. Perl one-liners can be used for text processing, similar to grep
, sed
, awk
and more. And similar to many command line utilities, perl
can accept input from both stdin
and file arguments.
Basic one-liners
# print all lines containing 'at'
# same as: grep 'at' and sed -n '/at/p' and awk '/at/'
$ printf 'gate\napple\nwhat\nkite\n' | perl -ne 'print if /at/'
gate
what
# print all lines NOT containing 'e'
# same as: grep -v 'e' and sed -n '/e/!p' and awk '!/e/'
$ printf 'gate\napple\nwhat\nkite\n' | perl -ne 'print if !/e/'
what
The -e
option accepts code as a command line argument. Many shortcuts are available to reduce the amount of typing needed. In the above examples, a regular expression has been used to filter the input. When the input string isn't specified, the test is performed against the special variable $_
, which has the contents of the current input line. $_
is also the default argument for many functions like print
and length
. To summarize:
- /REGEXP/FLAGS is a shortcut for $_ =~ m/REGEXP/FLAGS
- !/REGEXP/FLAGS is a shortcut for $_ !~ m/REGEXP/FLAGS
In the examples below, the -p
option is used instead of -n
. This helps to automatically print the value of $_
after processing each input line.
# same as: sed 's/:/-/' and awk '{sub(/:/, "-")} 1'
$ printf '1:2:3:4\na:b:c:d\n' | perl -pe 's/:/-/'
1-2:3:4
a-b:c:d
# same as: sed 's/:/-/g' and awk '{gsub(/:/, "-")} 1'
$ printf '1:2:3:4\na:b:c:d\n' | perl -pe 's/:/-/g'
1-2-3-4
a-b-c-d
Similar to
sed
, you can use the-i
option for inplace editing.
Perl special variables
Brief descriptions for some of the special variables are given below:

- $_: contains the input record content
- @F: array containing the field contents (with the -a and -F options)
- $F[0]: first field
- $F[1]: second field and so on
- $F[-1]: last field
- $F[-2]: second last field and so on
- $#F: index of the last field
- $.: number of records (i.e. line number)
- $1: backreference to the first capture group
- $2: backreference to the second capture group and so on
- $&: backreference to the entire matched portion
You'll see examples using such variables in the sections to follow.
Auto split
Here are some examples based on specific fields rather than the entire line. The -a
option will cause the input line to be split based on whitespace, and the field contents can be accessed using the @F
special array variable. Leading and trailing whitespace will be stripped, so there's no possibility of empty fields.
$ cat table.txt
brown bread mat hair 42
blue cake mug shirt -7
yellow banana window shoes 3.14
# same as: awk '{print $2}' table.txt
$ perl -lane 'print $F[1]' table.txt
bread
cake
banana
# same as: awk '$NF<0' table.txt
$ perl -lane 'print if $F[-1] < 0' table.txt
blue cake mug shirt -7
# same as: awk '{gsub(/b/, "B", $1)} 1' table.txt
$ perl -lane '$F[0] =~ s/b/B/g; print "@F"' table.txt
Brown bread mat hair 42
Blue cake mug shirt -7
yellow banana window shoes 3.14
When you use an array within double quotes (like "@F"
in the example above), the fields will be printed with a space character in between. The join
function is one of the ways to print the contents of an array with a custom field separator. Here's an example:
# print contents of @F array with colon as the separator
$ perl -lane 'print join ":", @F' table.txt
brown:bread:mat:hair:42
blue:cake:mug:shirt:-7
yellow:banana:window:shoes:3.14
In the above examples, the
-l
option has been used to remove the record separator (which is newline by default) from the input line. The record separator thus removed is added back whenever the print function is used.
Regexp field separator
You can use the -F
option to specify a regexp pattern for input field separation.
$ echo 'apple,banana,cherry' | perl -F, -lane 'print $F[1]'
banana
$ s='Sample123string42with777numbers'
$ echo "$s" | perl -F'\d+' -lane 'print join ",", @F'
Sample,string,with,numbers
Powerful features
I reach for Perl over grep
, sed
and awk
when I need more powerful regexp features, or want to make use of the vast collection of builtin functions and libraries.
Here are some examples showing regexp features not present in BRE/ERE:
# reverse lowercase alphabets at the end of input lines
# the 'e' flag allows you to use Perl code in the replacement section
$ echo 'fig 42apples' | perl -pe 's/[a-z]+$/reverse $&/e'
fig 42selppa
# replace arithmetic expressions with their results
$ echo '42*10 200+100 22/7' | perl -pe 's|\d+[+/*-]\d+|$&|gee'
420 300 3.14285714285714
# exclude terms in the search pattern
$ s='orange apple appleseed'
$ echo "$s" | perl -pe 's#\bapple\b(*SKIP)(*F)|\w+#($&)#g'
(orange) apple (appleseed)
And here are some examples showing off builtin features:
# filter fields containing 'in' or 'it' or 'is'
$ s='goal:amazing:42:whistle:kwality:3.14'
$ echo "$s" | perl -F: -lane 'print join ":", grep {/i[nts]/} @F'
amazing:whistle:kwality
# sort numbers in ascending order
# use {$b <=> $a} for descending order
$ echo '23 756 -983 5' | perl -lane 'print join " ", sort {$a <=> $b} @F'
-983 5 23 756
# sort strings in ascending order
$ s='floor bat to dubious four'
$ echo "$s" | perl -lane 'print join ":", sort @F'
bat:dubious:floor:four:to
# unique fields, maintains input order of elements
# -M option helps you load modules
$ s='3,b,a,3,c,d,1,d,c,2,2,2,3,1,b'
$ echo "$s" | perl -MList::Util=uniq -F, -lane 'print join ",", uniq @F'
3,b,a,c,d,1,2
Further Reading
Exercises
Use the example_files/text_files directory for input files used in the following exercises.
1) Replace all occurrences of 0xA0
with 0x50
and 0xFF
with 0x7F
for the given input.
$ printf 'a1:0xA0, a2:0xA0A1\nb1:0xFF, b2:0xBE\n'
a1:0xA0, a2:0xA0A1
b1:0xFF, b2:0xBE
$ printf 'a1:0xA0, a2:0xA0A1\nb1:0xFF, b2:0xBE\n' | sed # ???
a1:0x50, a2:0x50A1
b1:0x7F, b2:0xBE
2) Remove only the third line from the given input.
$ seq 34 37 | # ???
34
35
37
3) For the input file sample.txt
, display all lines that contain it
but not do
.
# ???
7) Believe it
4) For the input file purchases.txt
, delete all lines containing tea
. Also, replace all occurrences of coffee
with milk
. Write back the changes to the input file itself. The original contents should get saved to purchases.txt.orig
. Afterwards, restore the contents from this backup file.
# make the changes
# ???
$ ls purchases*
purchases.txt purchases.txt.orig
$ cat purchases.txt
milk
washing powder
milk
toothpaste
soap
# restore the contents
# ???
$ ls purchases*
purchases.txt
$ cat purchases.txt
coffee
tea
washing powder
coffee
toothpaste
tea
soap
tea
5) For the input file sample.txt
, display all lines from the start of the file till the first occurrence of are
.
# ???
1) Hello World
2)
3) Hi there
4) How are you
6) Delete all groups of lines from a line containing start
to a line containing end
for the uniform.txt
input file.
# ???
mango
icecream
how are you
have a nice day
par,far,mar,tar
7) Replace all occurrences of 42
with [42]
unless it is at the edge of a word.
$ echo 'hi42bye nice421423 bad42 cool_4242a 42c' | sed # ???
hi[42]bye nice[42]1[42]3 bad42 cool_[42][42]a 42c
8) Replace all whole words with X
that start and end with the same word character.
$ echo 'oreo not a _oh_ pip RoaR took 22 Pop' | sed # ???
X not X X X X took X Pop
9) For the input file anchors.txt
, convert markdown anchors to hyperlinks as shown below.
$ cat anchors.txt
# <a name="regular-expressions"></a>Regular Expressions
## <a name="subexpression-calls"></a>Subexpression calls
## <a name="the-dot-meta-character"></a>The dot meta character
$ sed # ???
[Regular Expressions](#regular-expressions)
[Subexpression calls](#subexpression-calls)
[The dot meta character](#the-dot-meta-character)
10) Replace all occurrences of e
with 3
except the first two matches.
$ echo 'asset sets tests site' | sed # ???
asset sets t3sts sit3
$ echo 'sample item teem eel' | sed # ???
sample item t33m 33l
11) The below sample strings use ,
as the delimiter and the field values can be empty as well. Use sed
to replace only the third field with 42
.
$ echo 'lion,,ant,road,neon' | sed # ???
lion,,42,road,neon
$ echo ',,,' | sed # ???
,,42,
12) For the input file table.txt
, calculate and display the product of numbers in the last field of each line. Consider space as the field separator for this file.
$ cat table.txt
brown bread mat hair 42
blue cake mug shirt -7
yellow banana window shoes 3.14
# ???
-923.16
13) Extract the contents between ()
or )(
from each of the input lines. Assume that the ()
characters will be present only once every line.
$ printf 'apple(ice)pie\n(almond)pista\nyo)yoyo(yo\n'
apple(ice)pie
(almond)pista
yo)yoyo(yo
$ printf 'apple(ice)pie\n(almond)pista\nyo)yoyo(yo\n' | awk # ???
ice
almond
yoyo
14) For the input file scores.csv
, display the Name
and Physics
fields in the format shown below.
$ cat scores.csv
Name,Maths,Physics,Chemistry
Ith,100,100,100
Cy,97,98,95
Lin,78,83,80
# ???
Name:Physics
Ith:100
Cy:98
Lin:83
15) Extract and display the third and first words in the format shown below.
$ echo '%whole(Hello)--{doubt}==ado==' | # ???
doubt:whole
$ echo 'just,\joint*,concession_42<=nice' | # ???
concession_42:just
16) For the input file scores.csv
, add another column named GP which is calculated out of 100 by giving 50% weightage to Maths and 25% each for Physics and Chemistry.
$ awk # ???
Name,Maths,Physics,Chemistry,GP
Ith,100,100,100,100
Cy,97,98,95,96.75
Lin,78,83,80,79.75
17) From the para.txt
input file, display all paragraphs containing any digit character.
$ cat para.txt
hi there
how are you
2 apples
12 bananas
blue sky
yellow sun
brown earth
$ awk # ???
2 apples
12 bananas
18) Input has the ASCII NUL character as the record separator. Change it to dot and newline characters as shown below.
$ printf 'apple\npie\0banana\ncherry\0' | awk # ???
apple
pie.
banana
cherry.
19) For the input file sample.txt
, print a matching line containing do
only if you
is found two lines before. For example, if do
is found on line number 10 and the 8th line contains you
, then the 10th line should be printed.
# ???
6) Just do-it
20) For the input file blocks.txt
, extract contents from a line containing exactly %=%=
until but not including the next such line. The block to be extracted is indicated by the variable n
passed via the -v
option.
$ cat blocks.txt
%=%=
apple
banana
%=%=
brown
green
$ awk -v n=1 # ???
%=%=
apple
banana
$ awk -v n=2 # ???
%=%=
brown
green
21) Display lines present in c1.txt
but not in c2.txt
using the awk
command.
$ awk # ???
Brown
Purple
Teal
22) Display lines from scores.csv
by matching the first field based on a list of names from the names.txt
file.
$ printf 'Ith\nLin\n' > names.txt
$ awk # ???
Ith,100,100,100
Lin,78,83,80
$ rm names.txt
23) Retain only the first copy of duplicate lines from the duplicates.txt
input file. Use only the contents of the last field for determining duplicates.
$ cat duplicates.txt
brown,toy,bread,42
dark red,ruby,rose,111
blue,ruby,water,333
dark red,sky,rose,555
yellow,toy,flower,333
white,sky,bread,111
light red,purse,rose,333
# ???
brown,toy,bread,42
dark red,ruby,rose,111
blue,ruby,water,333
dark red,sky,rose,555
24) For the input file table.txt
, print input lines if the second field starts with b
. Construct solutions using awk
and perl
.
$ awk # ???
brown bread mat hair 42
yellow banana window shoes 3.14
$ perl # ???
brown bread mat hair 42
yellow banana window shoes 3.14
25) For the input file table.txt
, retain only the second last field. Write back the changes to the input file itself. The original contents should get saved to table.txt.bkp
. Afterwards, restore the contents from this backup file.
# make the changes
$ perl # ???
$ ls table*
table.txt table.txt.bkp
$ cat table.txt
hair
shirt
shoes
# restore the contents
# ???
$ ls table*
table.txt
$ cat table.txt
brown bread mat hair 42
blue cake mug shirt -7
yellow banana window shoes 3.14
26) Reverse the first field contents of table.txt
input file.
# ???
nworb bread mat hair 42
eulb cake mug shirt -7
wolley banana window shoes 3.14
27) Sort the given comma separated input lexicographically. Change the output field separator to a :
character.
$ ip='floor,bat,to,dubious,four'
$ echo "$ip" | perl # ???
bat:dubious:floor:four:to
28) Filter fields containing digit characters.
$ ip='5pearl 42 east 1337 raku_6 lion 3.14'
$ echo "$ip" | perl # ???
5pearl 42 1337 raku_6 3.14
29) The input shown below has several words ending with digit characters. Change the words containing test
to match the output shown below. That is, renumber the matching portions to 1
, 2
, etc. Words not containing test
should not be changed.
$ ip='test_12:test123\nanother_test_4,no_42\n'
$ printf '%b' "$ip"
test_12:test123
another_test_4,no_42
$ printf '%b' "$ip" | perl # ???
test_1:test2
another_test_3,no_42
30) For the input file table.txt
, change contents of the third field to all uppercase. Construct solutions using sed
, awk
and perl
.
$ sed # ???
brown bread MAT hair 42
blue cake MUG shirt -7
yellow banana WINDOW shoes 3.14
$ awk # ???
brown bread MAT hair 42
blue cake MUG shirt -7
yellow banana WINDOW shoes 3.14
$ perl # ???
brown bread MAT hair 42
blue cake MUG shirt -7
yellow banana WINDOW shoes 3.14