Processing multiple records

Often, you need to consider multiple lines at a time to make a decision, such as the paragraph mode examples seen earlier. Sometimes, you need to match a particular record and then get records surrounding the matched record. Solutions to these types of problems often take the form of state machines. See softwareengineering: FSM examples if you are not familiar with state machines.

Processing consecutive records

You might need a condition that should satisfy something for one record and something else for the very next record. There are many ways to tackle this problem. One possible solution is to use a variable to save the previous record and then create the required conditional expression using that variable and $_ which already has the current record content.

$ # match and print two consecutive records
$ # first record should contain 'as' and second record should contain 'not'
$ perl -ne 'print $p, $_ if /not/ && $p=~/as/; $p = $_' programming_quotes.txt
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it by Brian W. Kernighan

$ # same filtering as above, but print only the first record
$ perl -ne 'print $p if /not/ && $p=~/as/; $p = $_' programming_quotes.txt
Therefore, if you write the code as cleverly as possible, you are,

$ # same filtering as above, but print only the second record
$ perl -ne 'print if /not/ && $p=~/as/; $p = $_' programming_quotes.txt
by definition, not smart enough to debug it by Brian W. Kernighan

Context matching

Sometimes you want not just the matching records, but the records relative to the matches as well. For example, it could be to see the comments at the start of a function block that was matched while searching a program file. Or, it could be to see extended information from a log file while searching for a particular error message.

Consider this sample input file:

$ cat context.txt
blue
    toy
    flower
    sand stone
light blue
    flower
    sky
    water
language
    english
    hindi
    spanish
    tamil
programming language
    python
    kotlin
    ruby

Case 1: Here's an example that emulates grep --no-group-separator -A<n> functionality. The $n && $n-- trick used in the example below works like this:

  • If initially $n=2
    • 2 && 2 --> evaluates to true and $n becomes 1
    • 1 && 1 --> evaluates to true and $n becomes 0
    • 0 && --> evaluates to false and $n doesn't change
  • Note that when conditionals are connected with logical &&, the second expression will not be executed at all if the first one turns out to be false, because the overall result will always be false. The same applies to the logical || operator if the first expression evaluates to true. Such logical operators are also known as short-circuit operators. Thus, in the above case, $n-- won't be executed when $n is 0 on the left hand side. This prevents $n from going negative, and $n && $n-- will not become true again unless $n is re-assigned.
$ # same as: grep --no-group-separator -A1 'blue'
$ # print matching line as well as the one that follows it
$ perl -ne '$n=2 if /blue/; print if $n && $n--' context.txt
blue
    toy
light blue
    flower

$ # for overlapping cases, $n gets re-assigned before $n becomes 0
$ perl -ne '$n=2 if /toy|flower/; print if $n && $n--' context.txt
    toy
    flower
    sand stone
    flower
    sky

Once you've understood the above examples, the rest of the examples in this section should be easier to comprehend. They are all variations of the logic used above and re-arranged to solve the use case being discussed.

Case 2: Print n records after the matching record. This is similar to the previous case, except that the matching record isn't printed.

$ # print 2 lines after the matching line
$ perl -ne 'print if $n && $n--; $n=2 if /prog/' context.txt
    python
    kotlin

Case 3: Printing the nth record after the matching record.

$ # print only the 3rd line found after the matching line
$ # $n && !--$n will be true only when --$n yields 0
$ # overlapping cases won't work as $n gets re-assigned before going to 0
$ perl -ne 'print if $n && !--$n; $n=3 if /language/' context.txt
    spanish
    ruby

Case 4: Printing the matched record and n records before it.

$ # print matched record and 2 records before the match
$ perl -ne '$ip[$.]=$_; print @ip[$.-2 .. $.] if /stone/' context.txt
    toy
    flower
    sand stone

$ # this will work even if there are fewer than n records before a match
$ n=5 perl -ne '$i=$.-$ENV{n}; $i=0 if $i<0; $ip[$.]=$_;
                print @ip[$i .. $.] if /toy/' context.txt
blue
    toy

To prevent confusion with overlapping cases, you can add a separation line between the results.

$ n=2 perl -ne '$i=$.-$ENV{n}; $i=0 if $i<0; $ip[$.]=$_;
                if(/toy|flower/){print $s, @ip[$i .. $.]; $s="---\n"}' context.txt
blue
    toy
---
blue
    toy
    flower
---
    sand stone
light blue
    flower

Case 5: Print the nth record before the matching record.

$ n=2 perl -ne '$i=$.-$ENV{n}; $i=0 if $i<0; $ip[$.]=$_;
                print $ip[$i] if /language/' context.txt
    sky
    spanish

$ # if the count is small enough, you can save them in variables
$ # this one prints 2nd line before the matching line
$ perl -ne 'print $p2 if /toy|flower/; $p2=$p1; $p1=$_' context.txt
blue
    sand stone

You can also use the logic from Case 3 by applying tac twice. This avoids the need to use an array variable.

$ tac context.txt | perl -ne 'print if $n && !--$n; $n=2 if /language/' | tac
    sky
    spanish

Records bounded by distinct markers

This section will cover cases where the input file always contains the same number of starting and ending patterns, arranged in an alternating fashion. For example, there cannot be two starting patterns without an ending pattern between them, and vice versa. Zero or more records of text can appear inside such groups as well as in between the groups.

The sample file shown below will be used to illustrate examples in this section. For simplicity, assume that the starting pattern is marked by start and the ending pattern by end. They have also been given group numbers to make it easier to visualize the transformation between input and output for the commands discussed in this section.

$ cat uniform.txt
mango
icecream
--start 1--
1234
6789
**end 1**
how are you
have a nice day
--start 2--
a
b
c
**end 2**
par,far,mar,tar

Case 1: Processing all the groups of records based on the distinct markers, including the records matched by markers themselves. For simplicity, the below command will just print all such records.

$ perl -ne '$f=1 if /start/; print if $f; $f=0 if /end/' uniform.txt
--start 1--
1234
6789
**end 1**
--start 2--
a
b
c
**end 2**

Note: perl -ne 'print if /start/../end/' can be used as seen previously in the Range operator section. The state machine format is more flexible for the various cases that follow.

Case 2: Processing all the groups of records but excluding the records matched by markers themselves.

$ perl -ne '$f=0 if /end/; print "* $_" if $f; $f=1 if /start/' uniform.txt
* 1234
* 6789
* a
* b
* c

Case 3-4: Processing all the groups of records but excluding one of the markers.

$ perl -ne '$f=1 if /start/; $f=0 if /end/; print if $f' uniform.txt
--start 1--
1234
6789
--start 2--
a
b
c

$ perl -ne 'print if $f; $f=1 if /start/; $f=0 if /end/' uniform.txt
1234
6789
**end 1**
a
b
c
**end 2**

The next four cases are obtained by just using if !$f instead of if $f from the cases shown above.

Case 5: Processing all input records except the groups of records bound by the markers.

$ # same as: perl -ne 'print if !(/start/../end/)'
$ perl -ne '$f=1 if /start/; print if !$f; $f=0 if /end/' uniform.txt
mango
icecream
how are you
have a nice day
par,far,mar,tar

Case 6: Processing all input records except the groups of records between the markers.

$ perl -ne '$f=0 if /end/; print if !$f; $f=1 if /start/' uniform.txt
mango
icecream
--start 1--
**end 1**
how are you
have a nice day
--start 2--
**end 2**
par,far,mar,tar

Case 7-8: Similar to case 6, but include only one of the markers.

$ perl -ne 'print if !$f; $f=1 if /start/; $f=0 if /end/' uniform.txt
mango
icecream
--start 1--
how are you
have a nice day
--start 2--
par,far,mar,tar

$ perl -ne '$f=1 if /start/; $f=0 if /end/; print if !$f' uniform.txt
mango
icecream
**end 1**
how are you
have a nice day
**end 2**
par,far,mar,tar

Specific blocks

Instead of working with all the groups (or blocks) bound by the markers, this section will discuss how to choose blocks based on some additional criteria.

Here's how you can process only the first matching block. See also stackoverflow: copy pattern between range only once and stackoverflow: extract only first range.

$ perl -ne '$f=1 if /start/; print if $f; exit if /end/' uniform.txt
--start 1--
1234
6789
**end 1**
$ # use other tricks discussed in previous section as needed
$ perl -ne 'exit if /end/; print if $f; $f=1 if /start/' uniform.txt
1234
6789

Getting the last block alone involves a lot more work, unless you happen to know how many blocks are present in the input file.

$ # reverse input linewise, change the order of comparison, reverse again
$ # can't be used if record separator has to be something other than newline
$ tac uniform.txt | perl -ne '$f=1 if /end/; print if $f; exit if /start/' | tac
--start 2--
a
b
c
**end 2**

$ # or, save the blocks in a buffer and print the last one alone
$ perl -ne 'if(/start/){$f=1; $buf=$_; next}
            $buf .= $_ if $f;
            $f=0 if /end/;
            END{print $buf}' uniform.txt
--start 2--
a
b
c
**end 2**

Only the nth block.

$ seq 30 | perl -ne 'BEGIN{$n=2; $c=0} $c++ if /4/; if($c==$n){print; exit if /6/}'
14
15
16

All blocks greater than nth block.

$ seq 30 | perl -ne 'BEGIN{$n=1; $c=0} if(/4/){$f=1; $c++}
                     print if $f && $c>$n; $f=0 if /6/'
14
15
16
24
25
26

Excluding nth block.

$ seq 30 | perl -ne 'BEGIN{$n=2; $c=0} if(/4/){$f=1; $c++}
                     print if $f && $c!=$n; $f=0 if /6/'
4
5
6
24
25
26

All blocks, only if the block matches an additional condition.

$ # additional condition here is '15' as one of the lines in the block
$ seq 30 | perl -ne 'if(/4/){$f=1; $buf=$_; next}
                     $buf .= $_ if $f;
                     if(/6/){$f=0; print $buf if $buf=~/^15$/m}'
14
15
16

Broken blocks

Sometimes, you can have markers in random order and mixed in different ways. In such cases, to work with blocks without any other marker present in between them, the buffer approach comes in handy again.

$ cat broken.txt
qqqqqqqqqqqqqqqq
error 1
hi
error 2
1234
6789
state 1
bye
state 2
error 3
xyz
error 4
abcd
state 3
zzzzzzzzzzzzzzzz

$ perl -ne 'if(/error/){$f=1; $buf=$_; next}
            $buf .= $_ if $f;
            if(/state/){print $buf if $f; $f=0}' broken.txt
error 2
1234
6789
state 1
error 4
abcd
state 3

Summary

This chapter covered various examples of working with multiple records. State machines play an important role in deriving solutions for such cases. Knowing various corner cases is also crucial, otherwise a solution that works for one input may fail for others.

The next chapter will discuss use cases where you need to process a file based on the contents of another file.

Exercises

a) For the input file sample.txt, print a matching line containing do only if the previous line is empty and the line before that contains you.

##### add your solution here
Just do-it
Much ado about nothing

b) Print only the second matching line respectively for the search terms do and not for the input file sample.txt. Match these terms case insensitively.

$ # for reference, here's all the matches
$ grep -i 'do' sample.txt
Just do-it
No doubt you like it too
Much ado about nothing
$ grep -i 'not' sample.txt
Not a bit funny
Much ado about nothing

##### add your solution here
No doubt you like it too
Much ado about nothing

c) For the input file sample.txt, print the matching lines as well as n lines around them. The value for n is passed to the perl command as an environment variable.

$ # match a line containing 'are' or 'bit'
$ n=1 ##### add your solution here
Good day
How are you

Today is sunny
Not a bit funny
No doubt you like it too

$ # match a line containing 'World'
$ n=2 ##### add your solution here
Hello World

Good day

d) For the input file broken.txt, print all lines between the markers top and bottom. The first perl command shown below doesn't work because it is matching till end of file if second marker isn't found. Assume that the input file cannot have two top markers without a bottom marker appearing in between and vice-versa.

$ cat broken.txt
top
3.14
bottom
---
top
1234567890
bottom
top
Hi there
Have a nice day
Good bye

$ # wrong output
$ perl -ne '$f=0 if /bottom/; print if $f; $f=1 if /top/' broken.txt
3.14
1234567890
Hi there
Have a nice day
Good bye

$ # expected output
##### add your solution here
3.14
1234567890

e) For the input file concat.txt, extract contents from a line starting with %%% until but not including the next such line. The block to be extracted is indicated by the environment variable n.

$ cat concat.txt
%%% addr.txt
How are you
This game is good
Today %%% is sunny
%%% broken.txt
top %%%
1234567890
bottom
%%% sample.txt
Just %%% do-it
Believe it
%%% mixed_fs.txt
pink blue white yellow
car,mat,ball,basket

$ n=2 ##### add your solution here
%%% broken.txt
top %%%
1234567890
bottom

$ n=4 ##### add your solution here
%%% mixed_fs.txt
pink blue white yellow
car,mat,ball,basket

f) For the input file perl.md, replace all occurrences of perl (irrespective of case) with Perl. But, do not replace any matches between ```perl and ``` lines (perl in these markers shouldn't be replaced either).

##### add your solution here, redirect the output to 'out.md'

$ diff -sq out.md expected.md 
Files out.md and expected.md are identical

g) Print the last two lines for each of the input files ip.txt, sample.txt and table.txt. Also, add a separator between the results as shown below (note that the separator isn't present at the end of the output). Assume input files will have at least two lines.

##### add your solution here
12345
You are funny
---
Much ado about nothing
He he he
---
blue cake mug shirt -7
yellow banana window shoes 3.14