Control structures

sed supports two types of branching commands that help to construct control structures. These commands (and other advanced features not discussed in this book) allow you to emulate a wide range of features that are common in programming languages. This chapter will show basic examples and you'll find some more use cases in a later chapter.

See also catonmat: A proof that Unix utility sed is Turing complete

Branch commands

Command    Description
b label    unconditionally branch to specified label
b          skip rest of the commands and start next cycle
t label    branch to specified label on successful substitution
t          on successful substitution, skip rest of the commands and start next cycle
T label    branch to specified label if substitution fails
T          if substitution fails, skip rest of the commands and start next cycle

A label is defined by prefixing a command with :label, where label is the name that branching commands elsewhere in the script use to refer to that location. Note that for the t command, any successful substitution since the last input line was read or since the last conditional branch was taken will count. So, you could have a few failed substitutions and a single successful substitution in any order and the branch will still be taken. Similarly, the T command will branch only if there has been no successful substitution since the last input line was read or since the last conditional branch was taken.
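
As a minimal sketch of the rule described above (assuming GNU sed), the hypothetical helper mark_changed below runs two substitutions and then uses t with a label. The label name done and the " (unchanged)" suffix are made up for illustration. The script is spread over multiple lines so that comments can sit on their own lines.

```shell
# Hypothetical helper: append a marker only to lines where
# neither of the two substitutions succeeded.
mark_changed() {
    sed '
        s/o/-/g
        s/a/%/g
        # jump to the label if at least one substitution above succeeded
        t done
        # reached only when both substitutions failed
        s/$/ (unchanged)/
        :done
    '
}

printf 'good\nbad\nneed\n' | mark_changed
# g--d
# b%d
# need (unchanged)
```

For the first line only the first substitution succeeds and for the second line only the second one does, yet the branch is taken in both cases.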

if-then-else

The branching commands can be used to construct control structures like if-then-else. For example, consider an input file containing numbers in a single column, where the task is to change positive numbers to negative and vice versa. If the line starts with the - character, you need to delete that character and move on to the next input line. Else, you need to insert - at the start of the line to convert positive numbers to negative. Both b and t commands can be used here as shown below.

$ cat nums.txt
3.14
-20000
-51
4567

$ # empty REGEXP section will reuse last REGEXP match, in this case /^-/
$ # also note the use of ; after {} command grouping
$ # 2nd substitute command will execute only if line doesn't start with -
$ sed '/^-/{s///; b}; s/^/-/' nums.txt
-3.14
20000
51
-4567

$ # t command will come into play if the 1st substitute command succeeds
$ # and thus skip the 2nd substitute command
$ sed '/^-/ s///; t; s/^/-/' nums.txt
-3.14
20000
51
-4567
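
The two commands above use b and t without a label. The same if-then-else can also be sketched with an explicit label, which is the b label form listed in the table. This is only an illustration, assuming GNU sed. The function name flip_sign_b and the label name end are hypothetical.

```shell
# Hypothetical helper: flip the sign of each number using "b label".
flip_sign_b() {
    sed '
        /^-/{
            # "if" branch: remove the leading minus sign
            s///
            # jump over the "else" branch
            b end
        }
        # "else" branch: reached only if the line did not start with -
        s/^/-/
        :end
    '
}

printf '3.14\n-20000\n-51\n4567\n' | flip_sign_b
# -3.14
# 20000
# 51
# -4567
```

Unlike a plain b, commands placed after the :end label would run for both branches, which can be handy when some common processing has to follow the if-then-else.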

The T command will branch only if there has been no successful substitution since the last input line was read or since the last conditional branch was taken. To put it another way, the commands after the T branch will be executed only if there has been at least one successful substitution.

$ # 2nd substitution will work only if 1st one succeeds
$ # same as: sed '/o/{s//-/g; s/d/*/g}'
$ printf 'good\nbad\n' | sed 's/o/-/g; T; s/d/*/g'
g--*
bad

$ # append will work if any of the substitutions succeeds
$ printf 'good\nbad\nneed\n' | sed 's/o/-/g; s/a/%/g; T; a ----'
g--d
----
b%d
----
need
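
The T label form from the table can be sketched in a similar way. The example below revisits the sign flipping task, assuming GNU sed (T is a GNU extension). The function name flip_sign_T and the label name add are hypothetical.

```shell
# Hypothetical helper: flip the sign of each number using "T label".
flip_sign_T() {
    sed '
        # try to remove a leading minus sign
        s/^-//
        # substitution failed: jump to the "else" part
        T add
        # substitution succeeded: skip the rest and start the next cycle
        b
        :add
        s/^/-/
    '
}

printf '3.14\n-20000\n-51\n4567\n' | flip_sign_T
# -3.14
# 20000
# 51
# -4567
```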

loop

Without labels, branching commands will skip the rest of the commands and then start processing the next line from the input. By marking a command location with a label, you can branch to that particular location when required. In this case, you'll still be processing the current pattern space.

The below example replaces all consecutive digit characters from start of line with * character. :a marks the substitute command with label named a and ta would branch to label a if the substitute command succeeds. Effectively, you get a looping mechanism to replace the current line as long as the substitute condition is satisfied.

$ # same as: perl -pe 's/\G\d/*/g'
$ # first, * is matched 0 times followed by the digit 1
$ # next, * is matched 1 time followed by the digit 2
$ # then, * is matched 2 times followed by the digit 3
$ # and so on until the space character breaks the loop
$ echo '12345 hello42' | sed -E ':a s/^(\**)[0-9]/\1*/; ta'
***** hello42

$ # here, the x character breaks the loop
$ echo '123x45 hello42' | sed -E ':a s/^(\**)[0-9]/\1*/; ta'
***x45 hello42
$ # no change as the input didn't start with a number
$ echo 'hi 12345 hello42' | sed -E ':a s/^(\**)[0-9]/\1*/; ta'
hi 12345 hello42

For debugging purposes, which also helps beginners to understand this command better, unroll the loop and test the command. For the above example, try sed -E 's/^(\**)[0-9]/\1*/' followed by sed -E 's/^(\**)[0-9]/\1*/; s//\1*/' and so on.
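
Here's a small sketch of that unrolling, assuming GNU sed. The shell variable step is a hypothetical convenience that holds one iteration of the loop body, so each command line simply repeats it one more time.

```shell
# One iteration of the loop body, stored in a (hypothetical) variable.
step='s/^(\**)[0-9]/\1*/'

echo '12345 hello42' | sed -E "$step"
# *2345 hello42
echo '12345 hello42' | sed -E "$step; $step"
# **345 hello42
echo '12345 hello42' | sed -E "$step; $step; $step"
# ***45 hello42
```

Each extra copy of the substitution converts one more leading digit, which is what every pass of the :a ... ta loop does.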

The space between : and the label name is optional. Similarly, the space between a branch command and its target label is optional.

Here's an example for field processing. awk and perl are better suited for field processing, but sed might be convenient in some cases, for example when the rest of the text processing is already done with sed.

$ # replace space with underscore only in the 2nd column
$ # [^,]*, captures first column delimited by comma character
$ # [^ ,]* matches non-space and non-comma characters
$ # end of line or another comma will break the loop
$ echo 'he be me,1 2 3 4,nice slice' | sed -E ':b s/^([^,]*,[^ ,]*) /\1_/; tb'
he be me,1_2_3_4,nice slice

The looping construct also helps to emulate certain advanced regular expression features that sed lacks, such as lookarounds (see stackoverflow: regex faq for details on lookarounds).

$ # replace empty fields with NA
$ # simple replacement won't work for ,,, case
$ echo '1,,,two,,3' | sed 's/,,/,NA,/g'
1,NA,,two,NA,3
$ # looping to the rescue
$ echo '1,,,two,,3' | sed -E ':c s/,,/,NA,/g; tc'
1,NA,NA,two,NA,3

See my blog post Emulating regexp lookarounds in GNU sed for more examples.

The example below has a solution similar to the previous one, but the problem statement is different and cannot be solved using lookarounds. Here, performing the substitution produces an output string that matches the search pattern again.

$ # deleting 'fin' results in 'cofing' which can again match 'fin'
$ echo 'coffining' | sed 's/fin//'
cofing
$ # add more s commands if number of times to substitute is known
$ echo 'coffining' | sed 's/fin//; s///'
cog
$ # use loop when the count isn't known
$ echo 'coffining' | sed ':d s/fin//; td'
cog
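
As another illustration of the same idea, the sketch below removes empty pairs of parentheses until none are left. Removing an inner () can expose a new () formed by the surrounding characters, so a single s command with the g flag isn't enough. The function name strip_empty_parens and the label name again are hypothetical, and GNU sed is assumed.

```shell
# Hypothetical helper: repeatedly delete empty "()" pairs.
# Without -E, unescaped ( and ) match themselves literally.
strip_empty_parens() {
    sed '
        :again
        s/()//g
        # loop as long as the substitution keeps succeeding
        t again
    '
}

echo 'a(()())b' | sed 's/()//g'
# a()b
echo 'a(()())b' | strip_empty_parens
# ab
```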

Cheatsheet and summary

Note       Description
b label    unconditionally branch to specified label
b          skip rest of the commands and start next cycle
t label    branch to specified label on successful substitution
t          on successful substitution, skip rest of the commands and start next cycle
T label    branch to specified label if substitution fails
T          if substitution fails, skip rest of the commands and start next cycle

This chapter introduced branching commands that can be used to emulate programming features like if-else and loops. These are handy for certain cases, especially when combined with the filtering features of sed. Speaking of filtering features, the next chapter will focus entirely on using address ranges for various use cases.

Exercises

a) Using the input file para.txt, create a file named markers.txt with all lines that contain start or end (matched case insensitively) and a file named rest.txt with the rest of the lines.

$ sed ##### add your solution here
$ cat markers.txt 
good start
Start working on that
to, do not let it end
start and try to
finish the End
$ cat rest.txt 
project you always wanted
hi there
bye

b) For the input file addr.txt:

  • if line contains e, surround all consecutive repeated characters with {} as well as uppercase those characters
  • if line doesn't contain e but contains u, surround all uppercase letters in that line with []

$ # note that H in second line and Y in last line aren't modified
$ sed ##### add your solution here
He{LL}o World
How are you
This game is g{OO}d
[T]oday is sunny
12345
You are fu{NN}y

c) The sample strings given below have multiple fields separated by a space. The first field has numbers separated by the - character. Surround these numbers in the first field with []

$ echo '123-87-593 42-3 foo' | sed ##### add your solution here
[123]-[87]-[593] 42-3 foo

$ echo '53783-0913 hi 3 4-2' | sed ##### add your solution here
[53783]-[0913] hi 3 4-2

d) Convert the contents of headers.txt such that it matches the content of anchors.txt. The input file headers.txt contains one header per line, starting with one or more # characters followed by a space character and then the heading. You have to convert this heading into an anchor tag as shown by the contents of anchors.txt.

$ cat headers.txt
# Regular Expressions
## Subexpression calls
## The dot meta character
$ cat anchors.txt
# <a name="regular-expressions"></a>Regular Expressions
## <a name="subexpression-calls"></a>Subexpression calls
## <a name="the-dot-meta-character"></a>The dot meta character

$ sed ##### add your solution here headers.txt > out.txt
$ diff -s out.txt anchors.txt
Files out.txt and anchors.txt are identical