Selective editing

By default, sed acts on the entire input content. Often, you only want to act upon specific portions of the input. To that end, sed has features to filter lines, similar to tools like grep, head and tail. sed can replicate most of grep's filtering features without too much fuss. It also has features like line number based filtering, selecting lines between two patterns, relative addressing, etc., which aren't possible with grep. If you are familiar with functional programming, you would have come across the map, filter, reduce paradigm. A typical task with sed involves filtering a subset of the input and then modifying (mapping) it. Sometimes, the subset is the entire input, as seen in the examples of previous chapters.

Note: A tool optimized for a particular functionality should be preferred where possible. For equivalent line filtering solutions, grep, head and tail would perform better than sed.

For some of the examples, equivalent commands will also be shown as comments for learning purposes.

Conditional execution

As seen earlier, the syntax for the substitute command is s/REGEXP/REPLACEMENT/FLAGS. The /REGEXP/FLAGS portion can also be used on its own as a conditional expression, so that a command executes only for the lines matching the pattern.

$ # change commas to hyphens only if the input line contains '2'
$ # space between the filter and the command is optional
$ printf '1,2,3,4\na,b,c,d\n' | sed '/2/ s/,/-/g'
1-2-3-4
a,b,c,d

Use /REGEXP/FLAGS! to act upon lines other than the matching ones.

$ # change commas to hyphens if the input line does NOT contain '2'
$ # space around ! is optional
$ printf '1,2,3,4\na,b,c,d\n' | sed '/2/! s/,/-/g'
1,2,3,4
a-b-c-d

/REGEXP/ is one of the ways to define a filter in sed, termed an address in the manual. Other kinds of addresses are covered in later sections of this chapter.
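
The FLAGS part of an address is not exercised in the examples above. As a minimal sketch, assuming GNU sed (where the I flag on an address makes the match case insensitive) and a made-up sample input, the filter below ignores case while the substitution changes only the first comma:

```shell
# Sample input is hypothetical; the I flag on an address is a GNU sed extension.
input='Hi,there
high,low
bye,now'

# Filter lines containing 'hi' in any case, then change the first comma only.
flagged=$(printf '%s\n' "$input" | sed '/hi/I s/,/-/')
printf '%s\n' "$flagged"
```

The first two lines become Hi-there and high-low, while bye,now is left alone because it doesn't satisfy the address.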

Delete command

To delete the filtered lines, use the d command. Recall that all input lines are printed by default.

$ # same as: grep -v 'at'
$ printf 'sea\neat\ndrop\n' | sed '/at/d'
sea
drop

To get the default grep filtering, use the !d combination. Negative logic can sometimes be confusing to use. It boils down to personal preference, similar to choosing between if and unless conditionals in programming languages.

$ # same as: grep 'at'
$ printf 'sea\neat\ndrop\n' | sed '/at/!d'
eat

Note: Using an address is optional. So, for example, sed '!d' file would be equivalent to the cat file command.
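
The equivalences mentioned in the comments above can be checked mechanically. The following is only a sketch with a made-up three line input; it compares the output of d and !d against grep -v and grep respectively:

```shell
# Made-up sample input, reused for each comparison.
input='sea
eat
drop'

# Lines NOT containing 'at': sed with d versus grep -v.
sed_del=$(printf '%s\n' "$input" | sed '/at/d')
grep_inv=$(printf '%s\n' "$input" | grep -v 'at')

# Lines containing 'at': sed with !d versus plain grep.
sed_keep=$(printf '%s\n' "$input" | sed '/at/!d')
grep_keep=$(printf '%s\n' "$input" | grep 'at')

[ "$sed_del" = "$grep_inv" ] && echo 'd gives the same lines as grep -v'
[ "$sed_keep" = "$grep_keep" ] && echo '!d gives the same lines as grep'
```

Both messages are printed for this input, since each pair of commands selects the same lines.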

Print command

To print the filtered lines, use the p command. Recall, however, that all input lines are printed by default. So, this command is typically used in combination with the -n command line option, which turns off the default printing.

$ cat programming_quotes.txt
Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it by Brian W. Kernighan

Some people, when confronted with a problem, think - I know, I will
use regular expressions. Now they have two problems by Jamie Zawinski

A language that does not affect the way you think about programming,
is not worth knowing by Alan Perlis

There are 2 hard problems in computer science: cache invalidation,
naming things, and off-by-1 errors by Leon Bambrick

$ # same as: grep 'twice' programming_quotes.txt
$ sed -n '/twice/p' programming_quotes.txt
Debugging is twice as hard as writing the code in the first place.
$ # same as: grep 'e th' programming_quotes.txt
$ sed -n '/e th/p' programming_quotes.txt
Therefore, if you write the code as cleverly as possible, you are,
A language that does not affect the way you think about programming,

The substitute command provides p as a flag. In such a case, the modified line would be printed only if the substitution succeeded.

$ # same as: grep '1' programming_quotes.txt | sed 's/1/one/g'
$ sed -n 's/1/one/gp' programming_quotes.txt
naming things, and off-by-one errors by Leon Bambrick

$ # filter + substitution + p combination
$ # same as: grep 'not' programming_quotes.txt | sed 's/in/**/g'
$ sed -n '/not/ s/in/**/gp' programming_quotes.txt
by def**ition, not smart enough to debug it by Brian W. Kernighan
A language that does not affect the way you th**k about programm**g,
is not worth know**g by Alan Perlis
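
To see why the p flag is normally paired with -n, here is a small sketch (the input is the same made-up sea/eat/drop sample used earlier) where -n is left out on purpose:

```shell
# Without -n, a line with a successful substitution is printed twice:
# once because of the p flag and once by the default printing.
dup=$(printf 'sea\neat\ndrop\n' | sed 's/at/AT/p')
printf '%s\n' "$dup"
```

The output has four lines: sea, eAT, eAT and drop. Lines where the substitution didn't succeed are printed only once.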

Using !p with the -n option is equivalent to using the d command.

$ # same as: sed '/at/d'
$ printf 'sea\neat\ndrop\n' | sed -n '/at/!p'
sea
drop

Here's an example of using the p command without the -n option.

$ # duplicate every line
$ seq 2 | sed 'p'
1
1
2
2

Quit commands

The q command makes sed exit immediately after printing the current pattern space (unless -n is used), without processing any further input.

$ # quits after an input line containing 'if' is found
$ sed '/if/q' programming_quotes.txt
Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,

The Q command is similar to q, but it won't print the matching line.

$ # matching line won't be printed
$ sed '/if/Q' programming_quotes.txt
Debugging is twice as hard as writing the code in the first place.

Use tac to get all the lines starting from the last occurrence of the search string in the file.

$ tac programming_quotes.txt | sed '/not/q' | tac
is not worth knowing by Alan Perlis

There are 2 hard problems in computer science: cache invalidation,
naming things, and off-by-1 errors by Leon Bambrick

You can optionally provide an exit status (from 0 to 255) along with the quit commands.

$ printf 'sea\neat\ndrop\n' | sed '/at/q2'
sea
eat
$ echo $?
2

$ printf 'sea\neat\ndrop\n' | sed '/at/Q3'
sea
$ echo $?
3
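
One possible use of the exit status is to let a shell script branch on whether a match was found. This is only a sketch: the status value 2 is an arbitrary choice and the inputs are made up.

```shell
# -n suppresses all output, so only the exit status is of interest.
# 2 is an arbitrary status picked for this sketch.
if printf 'sea\neat\ndrop\n' | sed -n '/at/q2'; then
    found=0
else
    found=$?
fi

# No line contains 'at' here, so sed reaches the end of input and exits normally.
if printf 'sea\ndrop\n' | sed -n '/at/q2'; then
    missing=0
else
    missing=$?
fi

echo "with match: $found, without match: $missing"
```

This prints with match: 2, without match: 0. Using if/else to capture the status keeps the sketch safe to run even in scripts that abort on a non-zero exit status.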

Warning: Be careful if you want to use the q or Q commands with multiple files, as sed will stop even if there are other files left to process. You could use a mixed address range as a workaround. See also unix.stackexchange: applying q to multiple files.

Multiple commands

Multiple commands can be specified by separating them with ; or by using the -e command line option more than once. See sed manual: Multiple commands syntax for more details.

$ # print all input lines as well as modified lines
$ printf 'sea\neat\ndrop\n' | sed -n -e 'p' -e 's/at/AT/p'
sea
eat
eAT
drop

$ # equivalent command to above example using ; instead of -e
$ # space around ; is optional
$ printf 'sea\neat\ndrop\n' | sed -n 'p; s/at/AT/p'
sea
eat
eAT
drop

Another way is to separate the commands using a literal newline character. If more than 2-3 lines are needed, it is better to use a sed script instead.

$ # here, each command is separated by a literal newline character
$ # > at the start of line indicates continuation of a multiline shell command
$ sed -n '
> /not/ s/in/**/gp
> s/1/one/gp
> s/2/two/gp
> ' programming_quotes.txt
by def**ition, not smart enough to debug it by Brian W. Kernighan
A language that does not affect the way you th**k about programm**g,
is not worth know**g by Alan Perlis
There are two hard problems in computer science: cache invalidation,
naming things, and off-by-one errors by Leon Bambrick

Warning: Do not use multiple commands to construct a conditional OR of multiple search strings, as you might get duplicated lines in the output. For example, check what output you get for the sed -ne '/use/p' -e '/two/p' programming_quotes.txt command. You can use the alternation feature of regular expressions for such cases.
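
A small sketch of this pitfall, using a made-up input in which the first line contains both search strings. The alternation shown here relies on the -E option (extended regular expressions), which is used as an assumption in this sketch and not explained in this chapter:

```shell
# Made-up input: the first line matches both search strings.
input='use two
use
two
none'

# Two separate p commands: a line matching both is printed twice.
dup=$(printf '%s\n' "$input" | sed -n -e '/use/p' -e '/two/p')

# Alternation with -E: each matching line is printed at most once.
alt=$(printf '%s\n' "$input" | sed -nE '/use|two/p')

printf '%s\n' "$dup"
echo '---'
printf '%s\n' "$alt"
```

The first command prints use two twice, followed by use and two. The second one prints each of those three lines exactly once.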

To execute multiple commands for a common filter, use {} to group the commands. You can also nest them if needed.

$ # same as: sed -n 'p; s/at/AT/p'
$ printf 'sea\neat\ndrop\n' | sed '/at/{p; s/at/AT/}'
sea
eat
eAT
drop

$ # spaces around {} are optional
$ printf 'gates\nnot\nused\n' | sed '/e/{s/s/*/g; s/t/*/g}'
ga*e*
not
u*ed

Command grouping is an easy way to construct conditional AND of multiple search strings.

$ # same as: grep 'in' programming_quotes.txt | grep 'not'
$ sed -n '/in/{/not/p}' programming_quotes.txt
by definition, not smart enough to debug it by Brian W. Kernighan
A language that does not affect the way you think about programming,
is not worth knowing by Alan Perlis

$ # same as: grep 'in' programming_quotes.txt | grep 'not' | grep 'you'
$ sed -n '/in/{/not/{/you/p}}' programming_quotes.txt
A language that does not affect the way you think about programming,

$ # same as: grep 'not' programming_quotes.txt | grep -v 'you'
$ sed -n '/not/{/you/!p}' programming_quotes.txt
by definition, not smart enough to debug it by Brian W. Kernighan
is not worth knowing by Alan Perlis

Other solutions using alternation feature of regular expressions and sed's control structures will be discussed later.

Line addressing

Line numbers can also be used as a filtering criterion.

$ # here, 3 represents the address for the print command
$ # same as: head -n3 programming_quotes.txt | tail -n1
$ # also same as: sed '3!d' programming_quotes.txt
$ sed -n '3p' programming_quotes.txt
by definition, not smart enough to debug it by Brian W. Kernighan

$ # print 2nd and 5th line
$ sed -n '2p; 5p' programming_quotes.txt
Therefore, if you write the code as cleverly as possible, you are,
Some people, when confronted with a problem, think - I know, I will

$ # substitution only on 2nd line
$ printf 'gates\nnot\nused\n' | sed '2 s/t/*/g'
gates
no*
used

As a special case, $ indicates the last line of the input.

$ # same as: tail -n1 programming_quotes.txt
$ sed -n '$p' programming_quotes.txt
naming things, and off-by-1 errors by Leon Bambrick
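
The $ address works with other commands and with ! as well. A short sketch using seq for input (note the single quotes, which stop the shell from treating $ specially):

```shell
# Delete only the last line.
all_but_last=$(seq 5 | sed '$d')

# Delete everything except the last line (same as: seq 5 | tail -n1).
only_last=$(seq 5 | sed '$!d')

printf '%s\n' "$all_but_last"
echo '---'
printf '%s\n' "$only_last"
```

The first command gives the numbers 1 to 4 and the second one gives only 5.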

For large input files, use the q command to avoid processing unnecessary input lines.

$ seq 3542 4623452 | sed -n '2452{p; q}'
5993
$ seq 3542 4623452 | sed -n '250p; 2452{p; q}'
3791
5993

$ # here is a sample time comparison
$ time seq 3542 4623452 | sed -n '2452{p; q}' > f1
real    0m0.003s
$ time seq 3542 4623452 | sed -n '2452p' > f2
real    0m0.140s

The head command can be mimicked using line addressing and the q command.

$ # same as: seq 23 45 | head -n5
$ seq 23 45 | sed '5q'
23
24
25
26
27

The = command will display the line numbers of matching lines.

$ # gives both line number and matching line
$ grep -n 'not' programming_quotes.txt
3:by definition, not smart enough to debug it by Brian W. Kernighan
8:A language that does not affect the way you think about programming,
9:is not worth knowing by Alan Perlis

$ # gives only line number of matching line
$ # note the use of -n option to avoid default printing
$ sed -n '/not/=' programming_quotes.txt
3
8
9

If needed, the matching line can also be printed. However, the line number and the matching line will be on separate lines.

$ sed -n '/off/{=; p}' programming_quotes.txt
12
naming things, and off-by-1 errors by Leon Bambrick

$ sed -n '/off/{p; =}' programming_quotes.txt
naming things, and off-by-1 errors by Leon Bambrick
12
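
Combining = with the $ address gives the line number of the last line, which is the total number of input lines. A minimal sketch, using seq to generate the input:

```shell
# Line number of the last line, i.e. the number of input lines
# (compare with: seq 23 45 | wc -l).
count=$(seq 23 45 | sed -n '$=')
echo "$count"
```

Here the result is 23, since seq 23 45 produces 23 lines.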

Address range

So far, filtering has been based on a specific line number or on lines matching the given /REGEXP/FLAGS pattern. An address range gives the ability to define a starting address and an ending address, separated by a comma.

$ # note that all the matching ranges are printed
$ sed -n '/are/,/by/p' programming_quotes.txt
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it by Brian W. Kernighan
There are 2 hard problems in computer science: cache invalidation,
naming things, and off-by-1 errors by Leon Bambrick

$ # same as: sed -n '3,8!p'
$ seq 15 24 | sed '3,8d'
15
16
23
24
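
A common use of address ranges is extracting a block delimited by marker lines. In the sketch below, BEGIN and END are made-up markers. The second command combines the range with the command grouping shown earlier to drop the marker lines themselves:

```shell
# BEGIN and END are hypothetical markers for this sketch.
input='a
BEGIN
b
c
END
d'

# The range includes both boundary lines.
with_markers=$(printf '%s\n' "$input" | sed -n '/BEGIN/,/END/p')

# Nested groups skip the boundary lines and keep only what lies between them.
inner=$(printf '%s\n' "$input" | sed -n '/BEGIN/,/END/{/BEGIN/!{/END/!p}}')

printf '%s\n' "$with_markers"
echo '---'
printf '%s\n' "$inner"
```

The first command prints BEGIN, b, c and END. The second one prints only b and c.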

Line numbers and string matching can be mixed.

$ sed -n '5,/use/p' programming_quotes.txt
Some people, when confronted with a problem, think - I know, I will
use regular expressions. Now they have two problems by Jamie Zawinski

$ # same as: sed '/smart/Q'
$ # inefficient, but this will work for multiple file inputs
$ sed '/smart/,$d' programming_quotes.txt
Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,

If the second filtering condition doesn't match, lines starting from the first condition to the last line of the input will be matched.

$ # there's a line containing 'affect' but no line containing 'XYZ' after it
$ sed -n '/affect/,/XYZ/p' programming_quotes.txt
A language that does not affect the way you think about programming,
is not worth knowing by Alan Perlis

There are 2 hard problems in computer science: cache invalidation,
naming things, and off-by-1 errors by Leon Bambrick

The second address will always be used as a filtering condition only from the line that comes after the line that satisfied the first address. For example, if the same search pattern is used for both the addresses, there'll be at least two lines in output (provided there are lines in the input after the first matching line).

$ # there's no line containing 'worth' after the 9th line
$ # so, rest of the file gets matched
$ sed -n '9,/worth/p' programming_quotes.txt
is not worth knowing by Alan Perlis

There are 2 hard problems in computer science: cache invalidation,
naming things, and off-by-1 errors by Leon Bambrick
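
Here is a sketch of the case where the same search pattern is used for both addresses. The input is made up, with x acting as both the starting and the ending marker:

```shell
# Lines 2, 4 and 6 of this made-up input are 'x'.
# Line 2 starts a range and line 4 ends it, since the end address
# is checked only from the line after the starting line.
# Line 6 starts a new range, which runs till the end of input
# because no later line contains 'x'.
same=$(printf '1\nx\n2\nx\n3\nx\n4\n' | sed -n '/x/,/x/p')
printf '%s\n' "$same"
```

The output is x, 2, x, x and 4. The line containing 3 falls between the two ranges and is not printed.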

As a special case, the first address can be 0 if the second one is a search pattern. This allows the search pattern to be matched against the first line of the file.

$ # same as: sed '/in/q'
$ # inefficient, but this will work for multiple file inputs
$ sed -n '0,/in/p' programming_quotes.txt
Debugging is twice as hard as writing the code in the first place.

$ # same as: sed '/not/q'
$ sed -n '0,/not/p' programming_quotes.txt
Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it by Brian W. Kernighan
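
The difference between 0 and 1 as the first address shows up only when the very first line matches the search pattern. A minimal sketch with a made-up input (the 0 address is a GNU sed feature):

```shell
# The first and the third lines contain 'in'.
input='in
out
in'

# With 1 as the start, /in/ is checked only from the 2nd line onwards,
# so the range ends at the 3rd line.
from_one=$(printf '%s\n' "$input" | sed -n '1,/in/p')

# With 0 as the start, /in/ is checked against the 1st line too,
# so the range ends right there.
from_zero=$(printf '%s\n' "$input" | sed -n '0,/in/p')

printf '%s\n' "$from_one"
echo '---'
printf '%s\n' "$from_zero"
```

The first command prints all three lines, whereas the second one prints only the first line.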

Relative addressing

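Prefixing + to a number used as the second address gives relative filtering: the range covers the line matching the first address and that many lines after it. This is similar to using grep -A<num> --no-group-separator, but grep will start a new group if a line matches within the context lines, whereas sed won't.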

$ # line matching 'not' and 2 lines after
$ # won't be same as: grep -A2 --no-group-separator 'not'
$ sed -n '/not/,+2p' programming_quotes.txt
by definition, not smart enough to debug it by Brian W. Kernighan

Some people, when confronted with a problem, think - I know, I will
A language that does not affect the way you think about programming,
is not worth knowing by Alan Perlis

$ # the first address can be a line number too
$ # helpful when it is programmatically constructed in a script
$ sed -n '5,+1p' programming_quotes.txt
Some people, when confronted with a problem, think - I know, I will
use regular expressions. Now they have two problems by Jamie Zawinski
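
The difference from grep -A is easiest to see when a match occurs within the context lines. This is a sketch with a made-up input that has two consecutive matching lines:

```shell
# Lines 2 and 3 of this made-up input contain 'x'.
input='a
x
x
b
c'

# sed: the range starts at line 2 and ends at line 3.
# The match on line 3 doesn't start a new range.
sed_out=$(printf '%s\n' "$input" | sed -n '/x/,+1p')

# grep: the match on line 3 gets its own line of trailing context.
grep_out=$(printf '%s\n' "$input" | grep -A1 'x')

printf '%s\n' "$sed_out"
echo '---'
printf '%s\n' "$grep_out"
```

sed prints the two x lines only, while grep prints the two x lines followed by b.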

You can construct an arithmetic progression with start and step values separated by the ~ symbol. i~j will filter lines numbered i+0j, i+1j, i+2j, i+3j, etc. So, 1~2 means all odd numbered lines and 5~3 means 5th, 8th, 11th, etc.

$ # print even numbered lines
$ seq 10 | sed -n '2~2p'
2
4
6
8
10

$ # delete lines numbered 2+0*4, 2+1*4, 2+2*4, etc
$ seq 7 | sed '2~4d'
1
3
4
5
7
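
The two progressions mentioned in the description above, 1~2 and 5~3, can be tried out as follows. This is a sketch assuming GNU sed, as the ~ step syntax is a GNU extension:

```shell
# Odd numbered lines: 1, 3, 5, etc.
odd=$(seq 10 | sed -n '1~2p')

# Every 3rd line starting from the 5th: 5, 8, 11, etc.
from_five=$(seq 12 | sed -n '5~3p')

printf '%s\n' "$odd"
echo '---'
printf '%s\n' "$from_five"
```

The first command gives 1, 3, 5, 7 and 9. The second one gives 5, 8 and 11.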

If i,~j is used (note the ,) then the meaning changes completely. After the start address, the closest line number which is a multiple of j will mark the end address. The start address can be specified using search pattern as well.

$ # here, closest multiple of 4 is 4th line
$ seq 10 | sed -n '2,~4p'
2
3
4
$ # here, closest multiple of 4 is 8th line
$ seq 10 | sed -n '5,~4p'
5
6
7
8

$ # line matching on 'regular' is 6th line, so ending is 9th line
$ sed -n '/regular/,~3p' programming_quotes.txt
use regular expressions. Now they have two problems by Jamie Zawinski

A language that does not affect the way you think about programming,
is not worth knowing by Alan Perlis

n and N commands

So far, the commands used have all processed only one line at a time. An address range provides the ability to act upon a group of lines, but the commands still operate on one line at a time within that group. There are cases when you want a command to handle a string that contains multiple lines. As mentioned in the preface, this book will not cover advanced commands related to multiline processing, and I highly recommend using awk or perl for such scenarios. However, this section will introduce two commands, n and N, which are relatively easy to use and will be seen in the coming chapters as well.

This is also a good place to give more details about how sed works. Quoting from sed manual: How sed Works:

sed maintains two data buffers: the active pattern space, and the auxiliary hold space. Both are initially empty.
sed operates by performing the following cycle on each line of input: first, sed reads one line from the input stream, removes any trailing newline, and places it in the pattern space. Then commands are executed; each command can have an address associated to it: addresses are a kind of condition code, and a command is only executed if the condition is verified before the command is to be executed.
When the end of the script is reached, unless the -n option is in use, the contents of pattern space are printed out to the output stream, adding back the trailing newline if it was removed. Then the next cycle starts for the next input line.

The pattern space buffer has contained only a single line of input in all the examples seen so far. By using the n and N commands, you can change the contents of the pattern space and use commands to act upon the entire contents of this data buffer. For example, you can perform substitution on two or more lines at once.

First up, the n command. Quoting from sed manual: Often-Used Commands:

If auto-print is not disabled, print the pattern space, then, regardless, replace the pattern space with the next line of input. If there is no more input then sed exits without processing any more commands.

$ # same as: sed -n '2~2p'
$ # n will replace pattern space with the next line of input
$ # as -n option is used, the replaced line won't be printed
$ # then the new line is printed as p command is used
$ seq 10 | sed -n 'n; p'
2
4
6
8
10

$ # if line contains 't', replace pattern space with the next line
$ # substitute all 't' with 'TTT' for the new line thus fetched
$ # note that 't' wasn't substituted in the line that got replaced
$ # replaced pattern space gets printed as -n option is NOT used here
$ printf 'gates\nnot\nused\n' | sed '/t/{n; s/t/TTT/g}'
gates
noTTT
used
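
As a counterpart to the sed -n 'n; p' example, here is a sketch where the default printing is left on and the line fetched by n is deleted instead:

```shell
# n prints the current line (as -n isn't used) and fetches the next line.
# d then deletes the fetched line, so only odd numbered lines remain.
# same as: seq 6 | sed -n '1~2p'
odd=$(seq 6 | sed 'n; d')
printf '%s\n' "$odd"
```

The output is 1, 3 and 5. An even number of input lines is used here on purpose, so that n always has a next line to fetch.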

Next, the N command. Quoting from sed manual: Less Frequently-Used Commands:

Add a newline to the pattern space, then append the next line of input to the pattern space. If there is no more input then sed exits without processing any more commands.
When -z is used, a zero byte (the ascii ‘NUL’ character) is added between the lines (instead of a new line).

$ # append the next line to the pattern space
$ # and then replace newline character with colon character
$ seq 7 | sed 'N; s/\n/:/'
1:2
3:4
5:6
7

$ # if line contains 'at', the next line gets appended to the pattern space
$ # then the substitution is performed on the two lines in the buffer
$ printf 'gates\nnot\nused\n' | sed '/at/{N; s/s\nnot/d/}'
gated
used
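
Another sketch of N, this time joining a label line with the value on the line that follows it. The input format (labels ending with a colon) is made up for illustration:

```shell
# Hypothetical input: each label ending with ':' is followed by its value.
input='name:
alice
age:
30'

# If a line ends with ':', append the next line to the pattern space
# and replace the newline between them with a space.
joined=$(printf '%s\n' "$input" | sed '/:$/{N; s/\n/ /}')
printf '%s\n' "$joined"
```

This gives name: alice and age: 30 as the two output lines.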

Note: See also sed manual: N command on the last line. Escape sequences like \n will be discussed in detail later.

Note: See grymoire: sed tutorial if you wish to explore the data buffers in detail and learn about the various multiline commands.

Cheatsheet and summary

Note                  Description
ADDR cmd              Execute cmd only if input line satisfies the ADDR condition
                      ADDR can be REGEXP or line number or a combination of them
/at/d                 delete all lines based on the given REGEXP
/at/!d                don't delete lines matching the given REGEXP
/twice/p              print all lines based on the given REGEXP
                      as print is default action, usually p is paired with -n option
/not/ s/in/**/gp      substitute only if line matches given REGEXP
                      and print only if the substitution succeeds
/if/q                 quit immediately after printing current pattern space
                      further input files, if any, won't be processed
/if/Q                 quit immediately without printing current pattern space
/at/q2                both q and Q can additionally use 0-255 as exit code
-e 'cmd1' -e 'cmd2'   execute multiple commands one after the other
cmd1; cmd2            execute multiple commands one after the other
                      note that not all commands can be constructed this way
                      commands can also be separated by literal newline character
ADDR {cmds}           group one or more commands to be executed for given ADDR
                      groups can be nested as well
                      ex: /in/{/not/{/you/p}} conditional AND of 3 REGEXPs
2p                    line addressing, print only 2nd line
$                     special address to indicate last line of input
2452{p; q}            quit early to avoid processing unnecessary lines
/not/=                print line number instead of matching line
ADDR1,ADDR2           start and end addresses to operate upon
                      if ADDR2 doesn't match, lines till end of file get processed
/are/,/by/p           print all groups of lines matching the REGEXPs
3,8d                  delete lines numbered 3 to 8
5,/use/p              line number and REGEXP can be mixed
0,/not/p              inefficient equivalent of /not/q but works for multiple files
ADDR,+N               all lines matching the ADDR and N lines after
i~j                   arithmetic progression with i as start and j as step
ADDR,~j               closest multiple of j w.r.t. line matching the ADDR
pattern space         active data buffer, commands work on this content
n                     if -n option isn't used, pattern space gets printed
                      and then pattern space is replaced with the next line of input
                      exit without executing other commands if there's no more input
N                     add newline (or NUL for -z) to the pattern space
                      and then append next line of input
                      exit without executing other commands if there's no more input

This chapter introduced the filtering capabilities of sed and how they can be combined with sed commands to process only the lines of interest instead of the entire input. Filtering can be specified using a REGEXP, a line number or a combination of them. You also learnt various ways to compose multiple sed commands. In the next chapter, you will learn the syntax and features of regular expressions as implemented in sed.

Exercises

a) Remove only the third line of given input.

$ seq 34 37 | sed ##### add your solution here
34
35
37

b) Display only fourth, fifth, sixth and seventh lines for the given input.

$ seq 65 78 | sed ##### add your solution here
68
69
70
71

c) For the input file addr.txt, replace all occurrences of are with are not and is with is not only from line number 4 till end of file. Also, only the lines that were changed should be displayed in the output.

$ cat addr.txt
Hello World
How are you
This game is good
Today is sunny
12345
You are funny

$ sed ##### add your solution here
Today is not sunny
You are not funny

d) Use sed to get the output shown below for the given input. You'll have to first understand the logic behind input to output transformation and then use commands introduced in this chapter to construct a solution.

$ seq 15 | sed ##### add your solution here
2
4
7
9
12
14

e) For the input file addr.txt, display all lines from start of the file till the first occurrence of game.

$ sed ##### add your solution here
Hello World
How are you
This game is good

f) For the input file addr.txt, display all lines that contain is but not good.

$ sed ##### add your solution here
Today is sunny

g) See Gotchas and Tricks chapter and correct the command to get the output as shown below.

$ # wrong output
$ seq 11 | sed 'N; N; s/\n/-/g'
1-2-3
4-5-6
7-8-9
10
11

$ # expected output
$ seq 11 | sed ##### add your solution here
1-2-3
4-5-6
7-8-9
10-11

h) For the input file addr.txt, add line numbers in the format as shown below.

$ sed ##### add your solution here
1
Hello World
2
How are you
3
This game is good
4
Today is sunny
5
12345
6
You are funny

i) For the input file addr.txt, print all lines that contain are and the line that comes after such a line, if any.

$ sed ##### add your solution here
How are you
This game is good
You are funny

Bonus: For the above input file, will sed -n '/is/,+1 p' addr.txt produce identical results as grep -A1 'is' addr.txt? If not, why?

j) Print all lines if their line numbers follow the sequence 1, 15, 29, 43, etc but not if the line contains 4 in it.

$ seq 32 100 | sed ##### add your solution here
32
60
88