Processing lines bounded by distinct markers

Address ranges were introduced in an earlier chapter. This chapter covers a wide variety of use cases where you need to process a group of lines defined by a starting and an ending pattern. For some examples, other text processing commands are combined with sed to construct a simpler one-liner than a complex sed-only solution.

The example_files directory has all the files used in the examples.

Uniform markers

This section covers cases where the input file always contains the same number of starting and ending patterns, arranged in an alternating fashion. For example, two starting patterns cannot appear without an ending pattern between them, and vice versa. Lines of text inside and between such groups are optional.

The sample file shown below will be used to illustrate examples in this section. For simplicity, assume that the starting pattern is marked by start and the ending pattern by end. They have also been given group numbers to make it easier to analyze the output.

$ cat uniform.txt
mango
icecream
--start 1--
1234
6789
**end 1**
how are you
have a nice day
--start 2--
a
b
c
**end 2**
par,far,mar,tar
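
The alternating-marker assumption can be checked before relying on the commands in this section. The following is a minimal sketch rather than one of the original examples: it uses awk instead of sed, the function name check_uniform is hypothetical, and it assumes that no single line contains both markers.

```shell
# check_uniform is a hypothetical helper, not part of sed
# its exit status is 0 only if 'start' and 'end' lines strictly alternate
check_uniform() {
    awk '/start/ { if (inside) bad = 1; inside = 1 }
         /end/   { if (!inside) bad = 1; inside = 0 }
         END     { exit (bad || inside) }' "$1"
}

# recreate the sample file in a temporary directory
dir=$(mktemp -d)
cat > "$dir/uniform.txt" << 'EOF'
mango
icecream
--start 1--
1234
6789
**end 1**
how are you
have a nice day
--start 2--
a
b
c
**end 2**
par,far,mar,tar
EOF

if check_uniform "$dir/uniform.txt"; then
    uniform_status='uniform'
else
    uniform_status='not uniform'
fi
echo "$uniform_status"
```

A non-zero exit status means that a marker was repeated or left unpaired, in which case the techniques from the Broken groups section may be a better fit.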

Case 1: Processing all the groups of lines bounded by the distinct markers, including the lines matched by the markers themselves. For simplicity, the command below just prints all such lines. This use case was also covered in the Address range section.

$ sed -n '/start/,/end/p' uniform.txt
--start 1--
1234
6789
**end 1**
--start 2--
a
b
c
**end 2**

Case 2: Processing all the groups of lines, but excluding the lines matched by the markers themselves.

# recall that // represents the last REGEXP that was matched
$ sed -n '/start/,/end/{//! s/^/* /p}' uniform.txt
* 1234
* 6789
* a
* b
* c
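
If the empty regexp feels too implicit, the same filtering can be spelled out with both marker regexps. This is a sketch for comparison, not one of the original examples; the variable names are made up for illustration. The two forms agree for this sample file, but they can differ if a line inside a group happens to match the starting regexp.

```shell
# recreate the sample file in a temporary directory
dir=$(mktemp -d)
cat > "$dir/uniform.txt" << 'EOF'
mango
icecream
--start 1--
1234
6789
**end 1**
how are you
have a nice day
--start 2--
a
b
c
**end 2**
par,far,mar,tar
EOF

# form used in Case 2: // reuses the regexp that was last applied
implicit=$(sed -n '/start/,/end/{//! s/^/* /p}' "$dir/uniform.txt")

# explicit form: skip lines matching either marker regexp
explicit=$(sed -n '/start/,/end/{/start/!{/end/!s/^/* /p}}' "$dir/uniform.txt")

echo "$explicit"
```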

Case 3: Processing all the groups of lines, but excluding the ending marker.

$ sed -n '/start/,/end/{/end/!p}' uniform.txt
--start 1--
1234
6789
--start 2--
a
b
c

Case 4: Processing all the groups of lines, but excluding the starting marker.

$ sed -n '/start/,/end/{/start/!p}' uniform.txt
1234
6789
**end 1**
a
b
c
**end 2**

Case 5: Processing all input lines except the groups of lines bounded by the markers.

$ sed '/start/,/end/d; s/$/./' uniform.txt
mango.
icecream.
how are you.
have a nice day.
par,far,mar,tar.
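
As an alternative sketch for Case 5 (not one of the original examples), the ! operator can negate the address range itself, so that a command applies only to lines outside the groups. The variable names below are made up for illustration.

```shell
# recreate the sample file in a temporary directory
dir=$(mktemp -d)
cat > "$dir/uniform.txt" << 'EOF'
mango
icecream
--start 1--
1234
6789
**end 1**
how are you
have a nice day
--start 2--
a
b
c
**end 2**
par,far,mar,tar
EOF

# command from Case 5: delete the groups, then modify whatever is left
delete_form=$(sed '/start/,/end/d; s/$/./' "$dir/uniform.txt")

# negated range: substitute and print only the lines outside the groups
negate_form=$(sed -n '/start/,/end/!s/$/./p' "$dir/uniform.txt")

echo "$negate_form"
```

The d based form is likely easier to extend when several commands have to be applied to the remaining lines.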

Case 6: Processing all input lines except the lines between the markers. The markers themselves are retained.

$ sed '/start/,/end/{//!d}' uniform.txt
mango
icecream
--start 1--
**end 1**
how are you
have a nice day
--start 2--
**end 2**
par,far,mar,tar

Case 7: Similar to case 6, but the ending marker is deleted as well, so only the starting marker of each group is retained.

$ sed '/start/,/end/{/start/!d}' uniform.txt
mango
icecream
--start 1--
how are you
have a nice day
--start 2--
par,far,mar,tar

Case 8: Similar to case 6, but the starting marker is deleted as well, so only the ending marker of each group is retained.

$ sed '/start/,/end/{/end/!d}' uniform.txt
mango
icecream
**end 1**
how are you
have a nice day
**end 2**
par,far,mar,tar

Extracting the first or last group

The same sample input file from the previous section will be used for this section's examples as well. The task is to extract only the first or the very last group of lines defined by the markers.

To get the first block, apply the q command when the ending marker is matched.

$ sed -n '/start/,/end/{p; /end/q}' uniform.txt
--start 1--
1234
6789
**end 1**

# use other tricks discussed in the previous section as needed
$ sed -n '/start/,/end/{//!p; /end/q}' uniform.txt
1234
6789

To get the last block, reverse the input linewise, change the order of address range, get the first block, and then reverse linewise again.

$ tac uniform.txt | sed -n '/end/,/start/{p; /start/q}' | tac
--start 2--
a
b
c
**end 2**
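
The marker-excluding trick from the previous section carries over to the reversed input as well. Here is a minimal sketch, not one of the original examples, that extracts only the lines between the markers of the last block. The variable name last_block is made up for illustration, and tac is assumed to be available (it is part of GNU coreutils).

```shell
# recreate the sample file in a temporary directory
dir=$(mktemp -d)
cat > "$dir/uniform.txt" << 'EOF'
mango
icecream
--start 1--
1234
6789
**end 1**
how are you
have a nice day
--start 2--
a
b
c
**end 2**
par,far,mar,tar
EOF

# on reversed input, the range begins at 'end' and finishes at 'start'
# //!p skips both marker lines, /start/q stops after the first reversed block
last_block=$(tac "$dir/uniform.txt" | sed -n '/end/,/start/{//!p; /start/q}' | tac)

echo "$last_block"
```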

Broken groups

Sometimes, the starting and ending markers aren't present uniformly in pairs. For example, consider a log file which can have multiple warning messages followed by an error message as shown below.

$ cat log.txt
foo baz 123
--> warning 1
a,b,c,d
42
--> warning 2
x,y,z
--> warning 3
4,3,1
==> error 1
hi bye

With error lines as the ending marker, there are two possibilities for the starting marker: the first warning message, which gets all the warnings, or only the last warning message that occurs before the error.

$ sed -n '/warning/,/error/p' log.txt
--> warning 1
a,b,c,d
42
--> warning 2
x,y,z
--> warning 3
4,3,1
==> error 1

$ tac log.txt | sed -n '/error/,/warning/p' | tac
--> warning 3
4,3,1
==> error 1

If both the starting and ending markers can occur multiple times, then awk or perl would be better suited.
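
As a rough illustration of the awk route, the sketch below works on a made-up log (log2.txt is hypothetical and not part of the example files) in which both markers repeat. It assumes the goal is the smallest block each time: from the last warning up to the first error that follows it. Lines are buffered from the most recent warning and printed only when an error is reached.

```shell
# hypothetical input where both warnings and errors repeat
dir=$(mktemp -d)
cat > "$dir/log2.txt" << 'EOF'
--> warning 1
a,b,c,d
--> warning 2
x,y,z
==> error 1
hi bye
==> error 2
--> warning 3
4,3,1
==> error 3
EOF

# a warning restarts the buffer, other lines are appended while buffering
# an error prints the buffer and stops buffering until the next warning
blocks=$(awk '/warning/ { buf = $0; f = 1; next }
              f { buf = buf ORS $0 }
              /error/ && f { print buf; f = 0 }' "$dir/log2.txt")

echo "$blocks"
```

With this input, the error 2 line is skipped because no warning precedes it after error 1 has closed the previous block.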

Summary

This chapter didn't introduce any new features, but rather dealt with a variety of use cases that need the address range filter. Some of them required using other commands to make the solution simpler. The next chapter will discuss various gotchas that you may encounter while using sed and a few tricks to get better performance. After that, there's another chapter with resource links for further reading. I hope you found sed to be an interesting and useful tool to learn. Happy coding!

Exercises

The exercises directory has all the files used in this section.

1) The blocks.txt file uses %=%= to separate groups of lines. Display the last such group.

##### add your solution here
%=%=
hi
hello there
bye

2) The code.txt file has code snippets that are surrounded by whole lines containing %%Code: python%% and %%end%%. The end of such snippets is always followed by an empty line. Assume that there will always be at least one line of code between the markers. Delete all such code snippets as well as the empty line that follows.

$ sed ##### add your solution here
H1: Introduction

REPL is a good way to learn Python for beginners.

H2: String methods

Python comes loaded with awesome methods.
Enjoy learning Python.

3) The code.txt file has code snippets that are surrounded by whole lines containing %%Code: python%% and %%end%%. Display the lines between such markers only for the first block.

$ sed ##### add your solution here
>>> 3 + 7
10
>>> 22 / 7
3.142857142857143
>>> 22 // 7
3

4) The input file broken.txt starts with a line containing top followed by some content before a line containing bottom is found. Blocks of lines bounded by these two markers repeat, except that the last block is missing the bottom marker. The first sed command shown below doesn't work because it matches till the end of the file due to the missing marker. Correct this command to get the expected output shown below.

$ cat broken.txt
--top--
3.14
[bottom]
--top--
1234567890
[bottom]
--top--
Hi there
Have a nice day
Good bye

# wrong output
$ sed -n '/top/,/bottom/ {//!p}' broken.txt
3.14
1234567890
Hi there
Have a nice day
Good bye

# expected output
##### add your solution here
3.14
1234567890

5) For the input file addr.txt, replace the lines occurring between the markers How and 12345 with the contents of the file hex.txt.

$ sed ##### add your solution here
Hello World
How are you
start address: 0xA0, func1 address: 0xA0
end address: 0xFF, func2 address: 0xB0
12345
You are funny

CLI text processing with GNU sed