Selective editing

By default, sed acts on the entire input. Often, you want to act only upon specific portions of the input. To that end, sed has features to filter lines, similar to tools like grep, head and tail. sed can replicate most of grep's filtering features without too much fuss, and it has additional features like line number based filtering, selecting lines between two patterns, relative addressing, etc. If you are familiar with functional programming, you would have come across the map, filter, reduce paradigm. A typical task with sed involves filtering a subset of the input and then modifying (mapping) the selected lines. Sometimes, the subset is the entire input, as seen in the examples of previous chapters.

A tool optimized for a particular functionality should be preferred where possible. grep, head and tail will typically perform better than sed for equivalent line filtering solutions.

The example_files directory has all the files used in the examples.

REGEXP filtering

As seen earlier, the syntax for the substitute command is s/REGEXP/REPLACEMENT/FLAGS. The /REGEXP/FLAGS portion can be used as a conditional expression to allow commands to execute only for the lines matching the pattern.

# change commas to hyphens only if the input line contains '2'
# space between the filter and the command is optional
$ printf '1,2,3,4\na,b,c,d\n' | sed '/2/ s/,/-/g'
1-2-3-4
a,b,c,d

Use /REGEXP/FLAGS! to act upon lines other than the matching ones.

# change commas to hyphens if the input line does NOT contain '2'
# space around ! is optional
$ printf '1,2,3,4\na,b,c,d\n' | sed '/2/! s/,/-/g'
1,2,3,4
a-b-c-d

/REGEXP/ is one of the ways to define a filter, termed an address in the manual. Others will be covered later in this chapter.

Regular expressions will be discussed later. In this chapter, the examples with /REGEXP/ filtering will use only fixed strings (exact string comparison).

Delete command

To delete the filtered lines, use the d command. Recall that all input lines are printed by default.

# same as: grep -v 'at'
$ printf 'sea\neat\ndrop\n' | sed '/at/d'
sea
drop

To get the default grep filtering, use the !d combination. Sometimes, negative logic can get confusing to use. It boils down to personal preference, similar to choosing between if and unless conditionals in programming languages.

# same as: grep 'at'
$ printf 'sea\neat\ndrop\n' | sed '/at/!d'
eat

Using an address is optional. So, for example, sed '!d' file would be equivalent to the cat file command.

# same as: cat greeting.txt
$ sed '!d' greeting.txt
Hi there
Have a nice day
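
As a quick sanity check, the sketch below compares both forms of the d command against their grep counterparts on the same sample input. The variable names are made up for this illustration.

```shell
input='sea
eat
drop'

# lines that do NOT contain 'at'
deleted=$(printf '%s\n' "$input" | sed '/at/d')
inverted=$(printf '%s\n' "$input" | grep -v 'at')

# lines that contain 'at'
kept=$(printf '%s\n' "$input" | sed '/at/!d')
matched=$(printf '%s\n' "$input" | grep 'at')

[ "$deleted" = "$inverted" ] && echo 'd gives the same lines as grep -v'
[ "$kept" = "$matched" ] && echo '!d gives the same lines as grep'
```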

Print command

To print the filtered lines, use the p command. But, recall that all input lines are printed by default. So, this command is typically used in combination with the -n option, which turns off the default printing.

$ cat rhymes.txt
it is a warm and cozy day
listen to what I say
go play in the park
come back before the sky turns dark

There are so many delights to cherish
Apple, Banana and Cherry
Bread, Butter and Jelly
Try them all before you perish

# same as: grep 'warm' rhymes.txt
$ sed -n '/warm/p' rhymes.txt
it is a warm and cozy day

# same as: grep 'n t' rhymes.txt
$ sed -n '/n t/p' rhymes.txt
listen to what I say
go play in the park

The substitute command provides p as a flag. In such a case, the modified line would be printed only if the substitution succeeded.

$ sed -n 's/warm/cool/gp' rhymes.txt
it is a cool and cozy day

# filter + substitution + p combination
$ sed -n '/the/ s/ark/ARK/gp' rhymes.txt
go play in the pARK
come back before the sky turns dARK

Using !p with the -n option will be equivalent to using the d command.

# same as: sed '/at/d'
$ printf 'sea\neat\ndrop\n' | sed -n '/at/!p'
sea
drop

Here's an example of using the p command without the -n option.

# duplicate every line
$ seq 2 | sed 'p'
1
1
2
2
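
If a filter is added to p while -n is not in use, only the matching lines get duplicated, since every line is still printed once by default. A small sketch with the same sample input used earlier (the variable name is just for illustration):

```shell
# only the line containing 'at' appears twice
dup=$(printf 'sea\neat\ndrop\n' | sed '/at/p')
printf '%s\n' "$dup"
# sea
# eat
# eat
# drop
```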

Quit commands

The q command causes sed to exit immediately. Remaining commands and input lines will not be processed.

# quits after an input line containing 'say' is found
$ sed '/say/q' rhymes.txt
it is a warm and cozy day
listen to what I say

The Q command is similar to q but won't print the matching line.

# matching line won't be printed
$ sed '/say/Q' rhymes.txt
it is a warm and cozy day

Use tac to get all lines starting from the last occurrence of the search string in the entire file.

$ tac rhymes.txt | sed '/an/q' | tac
Bread, Butter and Jelly
Try them all before you perish

You can optionally provide an exit status (from 0 to 255) along with the quit commands.

$ printf 'sea\neat\ndrop\n' | sed '/at/q2'
sea
eat
$ echo $?
2

$ printf 'sea\neat\ndrop\n' | sed '/at/Q3'
sea
$ echo $?
3
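
One possible use of the exit status is to let a shell script find out whether a match was found. The sketch below assumes an arbitrary status value of 5 and uses -n with Q so that nothing is printed. The `|| var=$?` form is only there to capture the status safely in scripts that abort on errors.

```shell
# exit status is 5 if a line containing 'at' is found, 0 otherwise
found=0
printf 'sea\neat\ndrop\n' | sed -n '/at/Q5' || found=$?

missing=0
printf 'sea\ndrop\n' | sed -n '/at/Q5' || missing=$?

echo "found: $found, missing: $missing"
# found: 5, missing: 0
```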

Be careful if you want to use q or Q commands with multiple files, as sed will stop even if there are other files remaining to be processed. You could use a mixed address range as a workaround. See also unix.stackexchange: applying q to multiple files.
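
Here's a rough sketch of this caveat and one possible workaround. It assumes GNU sed's -s (--separate) option, which treats each input file separately, combined with the 0,/REGEXP/ address range covered later in this chapter. The two sample files are created only for this illustration.

```shell
# two small sample files in a temporary directory
tmpdir=$(mktemp -d)
printf 'a1\nb1\nc1\n' > "$tmpdir/f1.txt"
printf 'a2\nb2\nc2\n' > "$tmpdir/f2.txt"

# q stops at the first match, so f2.txt is never read
quit_all=$(sed '/b/q' "$tmpdir/f1.txt" "$tmpdir/f2.txt")
printf '%s\n' "$quit_all"
# a1
# b1

# with -s, the address range is applied to each file separately
per_file=$(sed -sn '0,/b/p' "$tmpdir/f1.txt" "$tmpdir/f2.txt")
printf '%s\n' "$per_file"
# a1
# b1
# a2
# b2

rm -r "$tmpdir"
```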

Multiple commands

Commands seen so far can be specified more than once by separating them using ; or using the -e option multiple times. See sed manual: Multiple commands syntax for more details.

# print all input lines as well as the modified lines
$ printf 'sea\neat\ndrop\n' | sed -n -e 'p' -e 's/at/AT/p'
sea
eat
eAT
drop

# equivalent command to the above example using ; instead of -e
# space around ; is optional
$ printf 'sea\neat\ndrop\n' | sed -n 'p; s/at/AT/p'
sea
eat
eAT
drop
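
Commands are executed in the given order and each one works on whatever the previous commands left behind. The following sketch (sample words chosen arbitrarily) shows that swapping the order of two substitutions can change the result.

```shell
# the second substitution acts on the output of the first one
chained=$(printf 'cat\ndog\n' | sed 's/cat/dog/; s/dog/cow/')
printf '%s\n' "$chained"
# cow
# cow

# reversing the order changes the result
reversed=$(printf 'cat\ndog\n' | sed 's/dog/cow/; s/cat/dog/')
printf '%s\n' "$reversed"
# dog
# cow
```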

You can also separate the commands using a literal newline character. If many lines are needed, it is better to use a sed script instead.

# here, each command is separated by a literal newline character
# similar to $ representing the primary prompt PS1,
# > represents the secondary prompt PS2
$ sed -n '
> /the/ s/ark/ARK/gp
> s/warm/cool/gp
> s/Bread/Cake/gp
> ' rhymes.txt
it is a cool and cozy day
go play in the pARK
come back before the sky turns dARK
Cake, Butter and Jelly

Do not use multiple commands to construct a conditional OR of multiple search strings, as lines matching more than one of them will be duplicated in the output, as shown below. Use the alternation feature of regular expressions for such cases.

$ sed -ne '/play/p' -e '/ark/p' rhymes.txt
go play in the park
go play in the park
come back before the sky turns dark

To execute multiple commands for a common filter, use {} to group the commands. You can also nest them if needed.

# spaces around {} are optional
$ printf 'gates\nnot\nused\n' | sed '/e/{s/s/*/g; s/t/*/g}'
ga*e*
not
u*ed

$ sed -n '/the/{s/for/FOR/gp; /play/{p; s/park/PARK/gp}}' rhymes.txt
go play in the park
go play in the PARK
come back beFORe the sky turns dark
Try them all beFORe you perish

Command grouping is an easy way to construct conditional AND of multiple search strings.

# same as: grep 'ark' rhymes.txt | grep 'play'
$ sed -n '/ark/{/play/p}' rhymes.txt
go play in the park

# same as: grep 'the' rhymes.txt | grep 'for' | grep 'urn'
$ sed -n '/the/{/for/{/urn/p}}' rhymes.txt
come back before the sky turns dark

# same as: grep 'for' rhymes.txt | grep -v 'sky'
$ sed -n '/for/{/sky/!p}' rhymes.txt
Try them all before you perish
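
Grouping combined with negation can also express a conditional OR without duplicating lines: delete a line only when it matches neither of the search strings. This is just one possible sketch using the features seen so far. Three lines from rhymes.txt are supplied via printf to keep the example self-contained.

```shell
# delete a line only if it contains neither 'play' nor 'ark'
# lines matching either (or both) are printed exactly once
either=$(printf '%s\n' 'listen to what I say' \
                       'go play in the park' \
                       'come back before the sky turns dark' |
         sed '/play/!{/ark/!d}')
printf '%s\n' "$either"
# go play in the park
# come back before the sky turns dark
```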

Other solutions using the alternation feature of regular expressions and sed's control structures will be discussed later.

Line addressing

Line numbers can also be used as a filtering criterion.

# here, 3 represents the address for the print command
# same as: head -n3 rhymes.txt | tail -n1
# same as: sed '3!d'
$ sed -n '3p' rhymes.txt
go play in the park

# print the 2nd and 6th lines
$ sed -n '2p; 6p' rhymes.txt
listen to what I say
There are so many delights to cherish

# apply substitution only for the 2nd line
$ printf 'gates\nnot\nused\n' | sed '2 s/t/*/g'
gates
no*
used

As a special case, $ indicates the last line of the input.

# same as: tail -n1 rhymes.txt
$ sed -n '$p' rhymes.txt
Try them all before you perish
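
The 1 and $ addresses pair well with the d command for tasks like dropping a header or a trailer line. The sample data below is hypothetical and the tail/head equivalents mentioned in the comments assume the GNU versions of those tools.

```shell
# hypothetical CSV data with a header line
data='name,qty
apple,3
fig,7'

# delete the first line, same as: tail -n +2
no_header=$(printf '%s\n' "$data" | sed '1d')
printf '%s\n' "$no_header"
# apple,3
# fig,7

# delete the last line, same as: head -n -1
no_last=$(printf '%s\n' "$data" | sed '$d')
printf '%s\n' "$no_last"
# name,qty
# apple,3
```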

For large input files, use the q command to avoid processing unnecessary input lines.

$ seq 3542 4623452 | sed -n '2452{p; q}'
5993
$ seq 3542 4623452 | sed -n '250p; 2452{p; q}'
3791
5993

# here is a sample time comparison
$ time seq 3542 4623452 | sed -n '2452{p; q}' > f1
real    0m0.005s
$ time seq 3542 4623452 | sed -n '2452p' > f2
real    0m0.121s
$ rm f1 f2

The head command can be mimicked using line number addressing and the q command.

# same as: seq 23 45 | head -n5
$ seq 23 45 | sed '5q'
23
24
25
26
27

Print only the line number

The = command will display the line numbers of matching lines.

# gives both the line number and matching lines
$ grep -n 'the' rhymes.txt
3:go play in the park
4:come back before the sky turns dark
9:Try them all before you perish

# gives only the line number of matching lines
# note the use of the -n option to avoid default printing
$ sed -n '/the/=' rhymes.txt
3
4
9

If needed, the matching line can also be printed. But there will be a newline character between the matching line and the line number.

$ sed -n '/what/{=; p}' rhymes.txt
2
listen to what I say

$ sed -n '/what/{p; =}' rhymes.txt
listen to what I say
2
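
Combining = with the $ address gives the total number of input lines, somewhat like wc -l. A minimal sketch (variable names are only for illustration):

```shell
# $ selects the last line and = prints its line number
total=$(seq 12 | sed -n '$=')
echo "$total"
# 12

# empty input has no last line, so nothing is printed (wc -l would print 0)
empty=$(printf '' | sed -n '$=')
echo "[$empty]"
# []
```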

Address range

So far, filtering has been based on a specific line number or lines matching the given REGEXP pattern. An address range lets you define a starting address and an ending address separated by a comma.

# note that all the matching ranges are printed
$ sed -n '/to/,/pl/p' rhymes.txt
listen to what I say
go play in the park
There are so many delights to cherish
Apple, Banana and Cherry

# same as: sed -n '3,8!p'
$ seq 15 24 | sed '3,8d'
15
16
23
24
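
An address range can be paired with the q command to stop reading input once the range is done, similar to the early quit idea shown in the Line addressing section. The input here is kept small, but the same approach applies to large inputs.

```shell
# print lines 3 to 5 and stop reading input after the 5th line
subset=$(seq 20 | sed -n '3,5p; 5q')
printf '%s\n' "$subset"
# 3
# 4
# 5
```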

Line numbers and REGEXP filtering can be mixed.

$ sed -n '6,/utter/p' rhymes.txt
There are so many delights to cherish
Apple, Banana and Cherry
Bread, Butter and Jelly

# same as: sed '/play/Q' rhymes.txt
# inefficient, but this will work for multiple file inputs
$ sed '/play/,$d' rhymes.txt
it is a warm and cozy day
listen to what I say

If the second filtering condition doesn't match, lines starting from the first condition to the last line of the input will be matched.

# there's a line containing 'Banana' but the matching pair isn't found
$ sed -n '/Banana/,/XYZ/p' rhymes.txt
Apple, Banana and Cherry
Bread, Butter and Jelly
Try them all before you perish

The second address will always be used as a filtering condition only from the line that comes after the line that satisfied the first address. For example, if the same search pattern is used for both the addresses, there'll be at least two lines in output (assuming there are lines in the input after the first matching line).

$ sed -n '/w/,/w/p' rhymes.txt
it is a warm and cozy day
listen to what I say

# there's no line containing 'Cherry' after the 7th line
# so, rest of the file gets printed
$ sed -n '7,/Cherry/p' rhymes.txt
Apple, Banana and Cherry
Bread, Butter and Jelly
Try them all before you perish
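
A related detail from the GNU sed manual: if the second address is a line number less than or equal to the line matching the first address, only that one line is matched. A small sketch:

```shell
# the end line number is already behind the start, so only line 3 is matched
single=$(seq 5 | sed -n '3,2p')
echo "$single"
# 3

# same idea with a REGEXP start: '4' is on the 4th line, which is past line 2
regex_start=$(seq 5 | sed -n '/4/,2p')
echo "$regex_start"
# 4
```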

As a special case, the first address can be 0 if the second one is a REGEXP filter. This allows the search pattern to be matched against the first line of the file.

# same as: sed '/cozy/q'
# inefficient, but this will work for multiple file inputs
$ sed -n '0,/cozy/p' rhymes.txt
it is a warm and cozy day

# same as: sed '/say/q'
$ sed -n '0,/say/p' rhymes.txt
it is a warm and cozy day
listen to what I say
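
The difference between 0 and 1 as the starting address shows up only when the first line itself matches the REGEXP. The sample input below is made up so that the first line contains the search string.

```shell
# sample input where the very first line contains 'cozy'
sample='cozy day
warm tea
cozy night
the end'

# with 1 as the start, the search for 'cozy' begins from the 2nd line
from_one=$(printf '%s\n' "$sample" | sed -n '1,/cozy/p')
printf '%s\n' "$from_one"
# cozy day
# warm tea
# cozy night

# with 0 as the start, the 1st line itself can end the range
from_zero=$(printf '%s\n' "$sample" | sed -n '0,/cozy/p')
printf '%s\n' "$from_zero"
# cozy day
```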

Relative addressing

The grep command has an option -A that allows you to view lines that come after the matching lines. The sed command provides a similar feature when you prefix a + character to the number used in the second address. One difference compared to grep is that the context lines won't trigger a fresh matching of the first address.

# match a line containing 'the' and display the next line as well
# won't be same as: grep -A1 --no-group-separator 'the'
$ sed -n '/the/,+1p' rhymes.txt
go play in the park
come back before the sky turns dark
Try them all before you perish

# the first address can be a line number too
# helpful when it is programmatically constructed in a script
$ sed -n '6,+2p' rhymes.txt
There are so many delights to cherish
Apple, Banana and Cherry
Bread, Butter and Jelly
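
To make the difference with grep -A concrete, here's a sketch with a made-up input where a context line also contains the search string.

```shell
# 'a2' is a context line for 'a1' and also contains 'a'
lines='a1
a2
b
a3
c'

# sed: a2 only ends the first range, so 'b' is not printed
sed_out=$(printf '%s\n' "$lines" | sed -n '/a/,+1p')
printf '%s\n' "$sed_out"
# a1
# a2
# a3
# c

# grep: a2 is a match of its own, so its context line 'b' is printed too
grep_out=$(printf '%s\n' "$lines" | grep -A1 'a')
printf '%s\n' "$grep_out"
# a1
# a2
# b
# a3
# c
```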

You can construct an arithmetic progression with start and step values separated by the ~ symbol. i~j will filter lines numbered i+0j, i+1j, i+2j, i+3j, etc. So, 1~2 means all odd numbered lines and 5~3 means 5th, 8th, 11th, etc.

# print even numbered lines
$ seq 10 | sed -n '2~2p'
2
4
6
8
10

# delete lines numbered 2+0*4, 2+1*4, 2+2*4, etc (2, 6, 10, etc)
$ seq 7 | sed '2~4d'
1
3
4
5
7
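
As per the GNU sed manual, the start value can be zero, in which case sed operates as if it were equal to the step value. The sketch below also shows that such an address works with commands other than p and d.

```shell
# 0~3 is treated like 3~3, selecting lines 3, 6, 9, etc
multiples=$(seq 10 | sed -n '0~3p')
printf '%s\n' "$multiples"
# 3
# 6
# 9

# change the comma to a hyphen only for the odd numbered lines
odd_only=$(printf 'x,y\nx,y\nx,y\n' | sed '1~2 s/,/-/')
printf '%s\n' "$odd_only"
# x-y
# x,y
# x-y
```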

If i,~j is used (note the ,) then the meaning changes completely. After the start address, the next line number that is a multiple of j will mark the end address. The start address can be REGEXP based filtering as well.

# here, the next multiple of 4 is the 4th line
$ seq 10 | sed -n '2,~4p'
2
3
4

# here, the next multiple of 4 is the 8th line
$ seq 10 | sed -n '5,~4p'
5
6
7
8

# line matching 'many' is the 6th line, next multiple of 3 is the 9th line
$ sed -n '/many/,~3p' rhymes.txt
There are so many delights to cherish
Apple, Banana and Cherry
Bread, Butter and Jelly
Try them all before you perish

n and N commands

So far, the commands used have all been processing only one line at a time. The address range option provides the ability to act upon a group of lines, but the commands still operate one line at a time for that group. There are cases when you want a command to handle a string that contains multiple lines. As mentioned in the preface, this book will not cover advanced commands related to multiline processing and I highly recommend using awk or perl for such scenarios. However, this section will introduce two commands n and N which are relatively easier to use and will be seen in the coming chapters as well.

This is also a good place to get to know more details about how sed works. Quoting from sed manual: How sed Works:

sed maintains two data buffers: the active pattern space, and the auxiliary hold space. Both are initially empty.
sed operates by performing the following cycle on each line of input: first, sed reads one line from the input stream, removes any trailing newline, and places it in the pattern space. Then commands are executed; each command can have an address associated to it: addresses are a kind of condition code, and a command is only executed if the condition is verified before the command is to be executed.
When the end of the script is reached, unless the -n option is in use, the contents of pattern space are printed out to the output stream, adding back the trailing newline if it was removed. Then the next cycle starts for the next input line.

The pattern space buffer has contained only a single line of input in all the examples seen so far. By using the n and N commands, you can change the contents of the pattern space and use commands to act upon the entire contents of this data buffer. For example, you can perform substitution on two or more lines at once.
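
The cycle described in the quoted text can be observed with commands seen so far. In this small sketch, p prints the pattern space as it is at that point, while the automatic print at the end of the cycle shows the contents after the substitution.

```shell
# p prints the pattern space before the substitution is applied
# the automatic print at the end of the cycle shows the modified contents
cycle=$(printf 'sea\neat\n' | sed 'p; s/ea/EA/')
printf '%s\n' "$cycle"
# sea
# sEA
# eat
# EAt
```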

First up, the n command. Quoting from sed manual: Often-Used Commands:

If auto-print is not disabled, print the pattern space, then, regardless, replace the pattern space with the next line of input. If there is no more input then sed exits without processing any more commands.

# same as: sed -n '2~2p'
# n will replace pattern space with the next line of input
# as -n option is used, the replaced line won't be printed
# the p command then prints the new line
$ seq 10 | sed -n 'n; p'
2
4
6
8
10

# if a line contains 't', replace pattern space with the next line
# substitute all 't' with 'TTT' for the new line thus fetched
# note that 't' wasn't substituted in the line that got replaced
# replaced pattern space gets printed as -n option is NOT used here
$ printf 'gates\nnot\nused\n' | sed '/t/{n; s/t/TTT/g}'
gates
noTTT
used
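
With the -n option, the same idea can be used to display only the line that comes after a matching line. The input below is a made-up sample.

```shell
# print only the line that comes after a line containing 'key'
after_key=$(printf 'key\nvalue1\nother\nkey\nvalue2\n' | sed -n '/key/{n; p}')
printf '%s\n' "$after_key"
# value1
# value2
```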

Next, the N command. Quoting from sed manual: Less Frequently-Used Commands:

Add a newline to the pattern space, then append the next line of input to the pattern space. If there is no more input then sed exits without processing any more commands.
When -z is used, a zero byte (the ascii 'NUL' character) is added between the lines (instead of a new line).

# append the next line to the pattern space
# and then replace newline character with a colon character
$ seq 7 | sed 'N; s/\n/:/'
1:2
3:4
5:6
7

# if line contains 'at', the next line gets appended to the pattern space
# then the substitution is performed on the two lines in the buffer
$ printf 'gates\nnot\nused\n' | sed '/at/{N; s/s\nnot/d/}'
gated
used

See also sed manual: N command on the last line. Escape sequences like \n will be discussed in detail later.

See grymoire: sed tutorial if you wish to explore the data buffers in detail and learn about the various multiline commands.

Cheatsheet and summary

Note	Description
`ADDR cmd`	Execute cmd only if the input line satisfies the ADDR condition
	`ADDR` can be REGEXP or line number or a combination of them
`/at/d`	delete all lines satisfying the given REGEXP
`/at/!d`	don't delete lines matching the given REGEXP
`/twice/p`	print all lines based on the given REGEXP
	as print is the default action, usually `p` is paired with `-n`
`/not/ s/in/out/gp`	substitute only if line matches the given REGEXP
	and print only if the substitution succeeds
`/if/q`	quit immediately after printing the current pattern space
	further input files, if any, won't be processed
`/if/Q`	quit immediately without printing the current pattern space
`/at/q2`	both `q` and `Q` can additionally use `0-255` as the exit code
`-e 'cmd1' -e 'cmd2'`	execute multiple commands one after the other
`cmd1; cmd2`	execute multiple commands one after the other
	note that not all commands can be constructed this way
	commands can also be separated by a literal newline character
`ADDR {cmds}`	group one or more commands to be executed for given ADDR
	groups can be nested as well
	ex: `/in/{/not/{/you/p}}` conditional AND of 3 REGEXPs
`2p`	line addressing, print only the 2nd line
`$`	special address to indicate the last line of input
`2452{p; q}`	quit early to avoid processing unnecessary lines
`/not/=`	print line number instead of the matching line
`ADDR1,ADDR2`	start and end addresses to operate upon
	if ADDR2 doesn't match, lines till end of the file get processed
`/are/,/by/p`	print all groups of lines matching the REGEXPs
`3,8d`	delete lines numbered 3 to 8
`5,/use/p`	line number and REGEXP can be mixed
`0,/not/p`	inefficient equivalent of `/not/q` but works for multiple files
`ADDR,+N`	all lines matching the ADDR and `N` lines after
`i~j`	arithmetic progression with `i` as start and `j` as step
`ADDR,~j`	next multiple of `j` after the line matching the ADDR
pattern space	active data buffer, commands work on this content
`n`	if `-n` option isn't used, pattern space gets printed
	and then pattern space is replaced with the next line of input
	exit without executing other commands if there's no more input
`N`	add newline (or NUL for `-z`) to the pattern space
	and then append the next line of input
	exit without executing other commands if there's no more input

This chapter introduced the filtering capabilities of sed and how it can be combined with sed commands to process only lines of interest instead of the entire input contents. Filtering can be specified using a REGEXP, line number or a combination of them. You also learnt various ways to compose multiple sed commands. In the next chapter, you will learn syntax and features of regular expressions as supported by the sed command.

Exercises

The exercises directory has all the files used in this section.

1) For the given input, display all the lines except the third one.

$ seq 34 37 | sed ##### add your solution here
34
35
37

2) Display only the fourth, fifth, sixth and seventh lines for the given input.

$ seq 65 78 | sed ##### add your solution here
68
69
70
71

3) For the input file addr.txt, replace all occurrences of are with are not and is with is not only from line number 4 till the end of file. Also, only the lines that were changed should be displayed in the output.

$ cat addr.txt
Hello World
How are you
This game is good
Today is sunny
12345
You are funny

$ sed ##### add your solution here
Today is not sunny
You are not funny

4) Use sed to get the output shown below for the given input. You'll have to first understand the input to output transformation logic and then use commands introduced in this chapter to construct a solution.

$ seq 15 | sed ##### add your solution here
2
4
7
9
12
14

5) For the input file addr.txt, display all lines from the start of the file till the first occurrence of is.

$ sed ##### add your solution here
Hello World
How are you
This game is good

6) For the input file addr.txt, display all lines that contain is but not good.

$ sed ##### add your solution here
Today is sunny

7) n and N commands will not execute further commands if there are no more input lines to fetch. Correct the command shown below to get the expected output.

# wrong output
$ seq 11 | sed 'N; N; s/\n/-/g'
1-2-3
4-5-6
7-8-9
10
11

# expected output
$ seq 11 | sed ##### add your solution here
1-2-3
4-5-6
7-8-9
10-11

8) For the input file addr.txt, add line numbers in the format as shown below.

$ sed ##### add your solution here
1
Hello World
2
How are you
3
This game is good
4
Today is sunny
5
12345
6
You are funny

9) For the input file addr.txt, print all lines that contain are and the line that comes after, if any.

$ sed ##### add your solution here
How are you
This game is good
You are funny

Bonus: For the above input file, will sed -n '/is/,+1 p' addr.txt produce identical results as grep -A1 'is' addr.txt? If not, why?

10) Print all lines if their line numbers follow the sequence 1, 15, 29, 43, etc but not if the line contains 4 in it.

$ seq 32 100 | sed ##### add your solution here
32
60
88

11) For the input file sample.txt, display from the start of the file till the first occurrence of are, excluding the matching line.

$ cat sample.txt
Hello World

Hi there
How are you

Just do-it
Believe it

banana
papaya
mango

Much ado about nothing
He he he
Adios amigo

$ sed ##### add your solution here
Hello World

Hi there

12) For the input file sample.txt, display from the last occurrence of do till the end of the file.

##### add your solution here
Much ado about nothing
He he he
Adios amigo

13) For the input file sample.txt, display from the 9th line till a line containing go.

$ sed ##### add your solution here
banana
papaya
mango

14) For the input file sample.txt, display from a line containing it till the next line number that is divisible by 3.

$ sed ##### add your solution here
Just do-it
Believe it

banana

15) Display only the odd numbered lines from addr.txt.

$ sed ##### add your solution here
Hello World
This game is good
12345

CLI text processing with GNU sed