Control structures

sed supports two types of branching commands for constructing control structures. These commands (and other advanced features not discussed in this book) allow you to emulate a wide range of features that are common in programming languages. This chapter will show basic examples and you'll find some more use cases in a later chapter.

See also catonmat: A proof that Unix utility sed is Turing complete.

The example_files directory has all the files used in the examples.

Branch commands

Command	Description
`b label`	unconditionally branch to the specified `label`
`b`	skip rest of the commands and start next cycle
`t label`	branch to the specified `label` on successful substitution
`t`	on successful substitution, skip rest of the commands and start next cycle
`T label`	branch to the specified `label` if substitution fails
`T`	if substitution fails, skip rest of the commands and start next cycle

A label is defined by placing :label before a command, where label is the name that branching commands use to refer to that location. Note that for the t command, any successful substitution since the last input line was read or a conditional branch was taken will count. So, you could have a few failed substitutions and a single successful substitution in any order and the branch will still be taken. Similarly, the T command will branch only if there has been no successful substitution since the last input line was read or a conditional branch was taken.
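
The rule for t can be seen with a small sketch, assuming GNU sed. The sample input, the ` (unchanged)` suffix and the label name r are made up for illustration.

```shell
# the first s fails on every line, the second one succeeds only for 'cat'
# t is taken if either of them succeeded, so the third s is skipped for 'cat'
out1=$(printf 'cat\ndog\n' | sed 's/x/y/; s/a/A/; t; s/$/ (unchanged)/')
printf '%s\n' "$out1"
# cAt
# dog (unchanged)

# taking a conditional branch resets the flag
# 't r' jumps to the label r, so the later t sees no successful substitution
out2=$(echo 'abc' | sed '
s/a/A/
t r
:r
s/z/Z/
t
s/$/!/
')
printf '%s\n' "$out2"
# Abc!
```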

if-then-else

The branching commands can be used to construct control structures like if-then-else. For example, consider an input file with a column of numbers, where the task is to change positive numbers to negative and vice versa. If a line starts with the - character, you need to delete that character. Else, you need to insert - at the start of the line to convert the positive number to negative. Both b and t commands can be used here as shown below.

$ cat nums.txt
3.14
-20000
-51
4567

# empty REGEXP section will reuse last REGEXP match, in this case /^-/
# also note the use of ; after {} command grouping
# 2nd substitute command will execute only if a line doesn't start with -
$ sed '/^-/{s///; b}; s/^/-/' nums.txt
-3.14
20000
51
-4567

# t command will come into play only if the 1st substitute command succeeds
# in which case the 2nd substitute command won't be executed
$ sed '/^-/ s///; t; s/^/-/' nums.txt
-3.14
20000
51
-4567
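
When each arm needs more than one command, an explicit label can make the if-then-else shape easier to see. The following is only a sketch of that layout: the label name neg is arbitrary and the contents of nums.txt are fed through printf so that the snippet is self-contained.

```shell
out=$(printf '3.14\n-20000\n-51\n4567\n' | sed '
# if: the line starts with -, jump to the then-part
/^-/b neg
# else-part: add a leading -, then skip the then-part
s/^/-/
b
# then-part: remove the leading -
:neg
s/^-//
')
printf '%s\n' "$out"
# -3.14
# 20000
# 51
# -4567
```

Writing one command per line avoids relying on ; as a terminator after labels, which should also keep the script usable with sed implementations other than GNU sed.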

The T command will branch only if there has been no successful substitution since the last input line was read or a conditional branch was taken. In other words, the commands after the T command will be executed only if there has been at least one successful substitution.

# 2nd substitution will work only if the 1st one succeeds
# same as: sed '/o/{s//-/g; s/d/*/g}'
$ printf 'good\nbad\n' | sed 's/o/-/g; T; s/d/*/g'
g--*
bad

# append will work if any of the substitutions succeed
$ printf 'fig\ngood\nbad\nspeed\n' | sed 's/o/-/g; s/a/%/g; T; a ----'
fig
g--d
----
b%d
----
speed

loop

When branching commands are used without label arguments, sed will skip the rest of the script and start processing the next line from the input. By marking a command location with a label, you can branch to that particular location when required. In such cases, sed is still processing the current pattern space.
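
As a rough illustration of this difference (assuming GNU sed, with made-up input and an arbitrary label x), compare t with and without a label:

```shell
# without a label, t ends the cycle after the first successful substitution
# so the trailing s command is skipped for 'aaa'
plain=$(printf 'aaa\nxyz\n' | sed 's/a/b/; t; s/$/!/')
printf '%s\n' "$plain"
# baa
# xyz!

# with a label, t jumps back and the same pattern space is processed again
# once s/a/b/ fails, control falls through to the trailing s command
looped=$(printf 'aaa\nxyz\n' | sed ':x s/a/b/; tx; s/$/!/')
printf '%s\n' "$looped"
# bbb!
# xyz!
```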

The below example replaces all consecutive digit characters from the start of line with * characters. :a labels the substitute command as a and ta would branch to this label if the substitute command succeeds. Effectively, you get a looping mechanism to replace the current line as long as the substitute condition is satisfied.

# same as: perl -pe 's/\G\d/*/g'
# first, * is matched 0 times followed by the digit 1
# next, * is matched 1 time followed by the digit 2
# then, * is matched 2 times followed by the digit 3
# and so on until the space character breaks the loop
$ echo '12345 hello42' | sed -E ':a s/^(\**)[0-9]/\1*/; ta'
***** hello42

# here, the x character breaks the loop
$ echo '123x45 hello42' | sed -E ':a s/^(\**)[0-9]/\1*/; ta'
***x45 hello42
# no change as the input didn't start with a number
$ echo 'hi 12345 hello42' | sed -E ':a s/^(\**)[0-9]/\1*/; ta'
hi 12345 hello42

To debug a loop, or to understand it better as a beginner, unroll the loop and test the command one step at a time. For the above example, try sed -E 's/^(\**)[0-9]/\1*/' followed by sed -E 's/^(\**)[0-9]/\1*/; s//\1*/' and so on.
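
Here is what the first three unrolled steps look like for the earlier input, assuming GNU sed. Each extra s//\1*/ reuses the previous regexp and corresponds to one more trip around the loop.

```shell
# one iteration
step1=$(echo '12345 hello42' | sed -E 's/^(\**)[0-9]/\1*/')
printf '%s\n' "$step1"
# *2345 hello42

# two iterations
step2=$(echo '12345 hello42' | sed -E 's/^(\**)[0-9]/\1*/; s//\1*/')
printf '%s\n' "$step2"
# **345 hello42

# three iterations
step3=$(echo '12345 hello42' | sed -E 's/^(\**)[0-9]/\1*/; s//\1*/; s//\1*/')
printf '%s\n' "$step3"
# ***45 hello42
```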

Space between : and the label is optional. Similarly, space between the branch command and the target label is optional.

Here's an example for field processing. awk and perl are better suited for field processing, but in some cases sed might be convenient because the rest of the text processing is already being done in sed.

# replace space with underscore only for the 2nd column
# [^,]*, captures first column (comma is the field separator)
# [^ ,]* matches non-space and non-comma characters
# end of line or another comma will break the loop
$ echo 'he be me,1 2 3 4,nice slice' | sed -E ':b s/^([^,]*,[^ ,]*) /\1_/; tb'
he be me,1_2_3_4,nice slice
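
The same idea can be stretched to other columns by skipping a fixed number of fields first. This is only a sketch, assuming GNU sed with -E: the {2} quantifier and the choice of the 3rd column are illustrative and not part of the original example.

```shell
# replace space with underscore only for the 3rd column
# ([^,]*,){2} skips the first two comma separated fields
out=$(echo 'he be me,1 2 3 4,nice slice' | sed -E ':b s/^(([^,]*,){2}[^ ,]*) /\1_/; tb')
printf '%s\n' "$out"
# he be me,1 2 3 4,nice_slice
```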

The looping construct also helps to emulate certain advanced regular expression features not available in sed like lookarounds (see stackoverflow: regex faq for details on lookarounds). See my blog post Emulating regexp lookarounds in GNU sed for more examples.

# replace empty fields with NA
# simple replacement won't work for the ,,, case
$ echo '1,,,two,,3' | sed 's/,,/,NA,/g'
1,NA,,two,NA,3
# looping to the rescue
$ echo '1,,,two,,3' | sed -E ':c s/,,/,NA,/g; tc'
1,NA,NA,two,NA,3
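
For this particular pattern, a single pass with the g flag leaves at most two adjacent commas behind, so a second pass appears to be enough. The sketch below (assuming GNU sed) compares that unrolled form with the loop; the loop remains the safer choice when the number of passes isn't obvious.

```shell
# second s command reuses the previous regexp
two_pass=$(echo '1,,,two,,3' | sed 's/,,/,NA,/g; s//,NA,/g')
printf '%s\n' "$two_pass"
# 1,NA,NA,two,NA,3

# a longer run of empty fields, comparing both approaches
unrolled=$(echo 'a,,,,,b' | sed 's/,,/,NA,/g; s//,NA,/g')
looped=$(echo 'a,,,,,b' | sed ':c s/,,/,NA,/g; tc')
printf '%s\n%s\n' "$unrolled" "$looped"
# a,NA,NA,NA,NA,b
# a,NA,NA,NA,NA,b
```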

The below example has a similar solution to the previous one, but the problem statement is different and cannot be solved using lookarounds. Here, performing the substitution results in an output string that will again match the search pattern.

# deleting 'fin' results in 'cofing' which can again match 'fin'
$ echo 'coffining' | sed 's/fin//'
cofing
# add more s commands if the number of times to substitute is known
$ echo 'coffining' | sed 's/fin//; s///'
cog
# use loop when the count isn't known
$ echo 'coffining' | sed ':d s/fin//; td'
cog

Cheatsheet and summary

Command	Description
`b label`	unconditionally branch to the specified `label`
`b`	skip rest of the commands and start next cycle
`t label`	branch to the specified `label` on successful substitution
`t`	on successful substitution, skip rest of the commands and start next cycle
`T label`	branch to the specified `label` if substitution fails
`T`	if substitution fails, skip rest of the commands and start next cycle

This chapter introduced branching commands that can be used to emulate programming features like if-else and loops. These are handy for certain cases, especially when combined with the filtering features of sed. Speaking of filtering features, the next chapter will focus entirely on using address ranges for various use cases.

Exercises

The exercises directory has all the files used in this section.

1) Using the input file para.txt, create a file named markers.txt with all lines that contain start or end (matched case insensitively) and a file named rest.txt with the remaining lines.

$ sed ##### add your solution here

$ cat markers.txt
good start
Start working on that
to, do not let it end
start and try to
finish the End

$ cat rest.txt
project you always wanted
hi there
bye

2) For the input file addr.txt:

if a line contains e, surround all consecutive repeated characters with {} as well as uppercase those characters
else, if a line contains u, surround all uppercase letters in that line with []

# note that H in the second line and Y in the last line aren't modified
$ sed ##### add your solution here
He{LL}o World
How are you
This game is g{OO}d
[T]oday is sunny
12345
You are fu{NN}y

3) The nums.txt file uses a space character as the field separator. The first field of some lines has one or more numbers separated by the - character. Surround such numbers in the first field with [] as shown below.

$ cat nums.txt
123-87-593 42-3 fig
apple 42-42-42 1000 banana 4-3
53783-0913 hi 3 4-2
1000 guava mango

$ sed ##### add your solution here
[123]-[87]-[593] 42-3 fig
apple 42-42-42 1000 banana 4-3
[53783]-[0913] hi 3 4-2
[1000] guava mango

4) Convert the contents of headers.txt such that it matches the content of anchors.txt. The input file headers.txt contains one header per line, starting with one or more # characters followed by a space character and then followed by the heading. You have to convert such headings into anchor tags as shown by the contents of anchors.txt. Save the output in out.txt.

$ cat headers.txt
# Regular Expressions
## Subexpression calls
## The dot meta character
$ cat anchors.txt
# <a name="regular-expressions"></a>Regular Expressions
## <a name="subexpression-calls"></a>Subexpression calls
## <a name="the-dot-meta-character"></a>The dot meta character

$ sed ##### add your solution here
$ diff -s out.txt anchors.txt
Files out.txt and anchors.txt are identical

5) What is the difference between t and T commands?

6) The blocks.txt file uses %=%= to separate groups of lines. Display the first such group.

$ cat blocks.txt
%=%=
apple
banana
%=%=
3.14
42
1000
%=%=
brown
green
%=%=
hi
hello there
bye

$ sed ##### add your solution here
%=%=
apple
banana

CLI text processing with GNU sed