Flags

Just as options change the default behavior of shell commands, flags change aspects of regular expressions. Some flags, such as g and p, have already been discussed. For the sake of completeness, they are covered again in this chapter. In regular expression parlance, flags are also known as modifiers.

info The example_files directory has all the files used in the examples.

Case insensitive matching

The I flag matches a pattern case insensitively.

# match 'cat' case sensitively
$ printf 'Cat\ncOnCaT\nscatter\ncot\n' | sed -n '/cat/p'
scatter

# match 'cat' case insensitively
# note that a command (ex: p) cannot be used before flags
$ printf 'Cat\ncOnCaT\nscatter\ncot\n' | sed -n '/cat/Ip'
Cat
cOnCaT
scatter

# match 'cat' case insensitively and replace it with 'dog'
$ printf 'Cat\ncOnCaT\nscatter\ncot\n' | sed 's/cat/dog/I'
dog
cOndog
sdogter
cot

info Usually i is used for such purposes, grep -i for example. But i is a command (discussed in the append, change, insert chapter), so /REGEXP/i cannot be used. The substitute command allows both i and I, but I is recommended for consistency.
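
As a small sketch of the note above (assuming GNU sed, since the I flag is a GNU extension), the I flag works with commands other than p, and the substitute command accepts the lowercase i as well:

```shell
# delete lines that match 'cat' case insensitively
printf 'Cat\ncOnCaT\nscatter\ncot\n' | sed '/cat/Id'
# output:
# cot

# lowercase 'i' is accepted by the substitute command
printf 'Cat\ncOnCaT\n' | sed 's/cat/dog/i'
# output:
# dog
# cOndog
```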

Changing case in the replacement section

This section is presented in this chapter to complement the I flag. sed provides escape sequences to change the case of replacement strings, which might include backreferences, shell variables, etc.

Sequence    Description
\E          indicates the end of case conversion
\l          convert the next character to lowercase
\u          convert the next character to uppercase
\L          convert the following characters to lowercase (overridden by \U or \E)
\U          convert the following characters to uppercase (overridden by \L or \E)

First up, changing case of only the immediate next character after the escape sequence.

# match only the first character of a word
# use & to backreference the matched character
# \u would then change it to uppercase
$ echo 'hello there. how are you?' | sed 's/\b\w/\u&/g'
Hello There. How Are You?

# change the first character of a word to lowercase
$ echo 'HELLO THERE. HOW ARE YOU?' | sed 's/\b\w/\l&/g'
hELLO tHERE. hOW aRE yOU?

# match lowercase followed by underscore followed by lowercase
# delete the underscore and convert the 2nd lowercase to uppercase
$ echo '_fig aug_price next_line' | sed -E 's/([a-z])_([a-z])/\1\u\2/g'
_fig augPrice nextLine

Next, changing case of multiple characters at a time.

# change all letters to lowercase
$ echo 'HaVE a nICe dAy' | sed 's/.*/\L&/'
have a nice day
# change all letters to uppercase
$ echo 'HaVE a nICe dAy' | sed 's/.*/\U&/'
HAVE A NICE DAY

# \E will stop further conversion
$ echo 'fig_ aug_price next_line' | sed -E 's/([a-z]+)(_[a-z]+)/\U\1\E\2/g'
fig_ AUG_price NEXT_line
# \L or \U will override any existing conversion
$ echo 'HeLLo:bYe gOoD:beTTEr' | sed -E 's/([a-z]+)(:[a-z]+)/\L\1\U\2/Ig'
hello:BYE good:BETTER

Finally, examples where escapes are used next to each other.

# uppercase first character of a word
# and lowercase rest of the word characters
# note the order of escapes used, \u\L won't work
$ echo 'HeLLo:bYe gOoD:beTTEr' | sed -E 's/[a-z]+/\L\u&/Ig'
Hello:Bye Good:Better

# lowercase first character of a word
# and uppercase rest of the word characters
$ echo 'HeLLo:bYe gOoD:beTTEr' | sed -E 's/[a-z]+/\U\l&/Ig'
hELLO:bYE gOOD:bETTER
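
The case conversion escapes also apply to text that comes from a shell variable, as mentioned at the start of this section. Here is a minimal sketch, assuming GNU sed; the variable name is hypothetical and used only for illustration:

```shell
# hypothetical variable used only for this illustration
name='wOrLd'

# double quotes let the shell expand $name before sed sees the script
# \L\u then capitalizes the expanded text, & is left unchanged
echo 'hello' | sed "s/.*/& \L\u$name/"
# output:
# hello World
```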

Global replace

By default, the substitute command will replace only the first occurrence of matching portions. Use the g flag to replace all the matches.

# for each input line, change only the first ',' to '-'
$ printf '1,2,3,4\na,b,c,d\n' | sed 's/,/-/'
1-2,3,4
a-b,c,d

# change all matches by adding the 'g' flag
$ printf '1,2,3,4\na,b,c,d\n' | sed 's/,/-/g'
1-2-3-4
a-b-c-d

Replace specific occurrences

A number provided as a flag will cause only the Nth match to be replaced.

$ s='apple:banana:cherry:fig:mango'

# replace only the second occurrence
$ echo "$s" | sed 's/:/---/2'
apple:banana---cherry:fig:mango
$ echo "$s" | sed -E 's/[^:]+/"&"/2'
apple:"banana":cherry:fig:mango

# replace only the third occurrence, and so on
$ echo "$s" | sed 's/:/---/3'
apple:banana:cherry---fig:mango
$ echo "$s" | sed -E 's/[^:]+/"&"/3'
apple:banana:"cherry":fig:mango

Use a combination of capture groups and quantifiers to replace the Nth match from the end of the line.

$ s='car,art,pot,tap,urn,ray,ear'

# replace the last occurrence
# can also use sed -E 's/,([^,]*)$/[]\1/'
$ echo "$s" | sed -E 's/(.*),/\1[]/'
car,art,pot,tap,urn,ray[]ear

# replace last but one
$ echo "$s" | sed -E 's/(.*),(.*,)/\1[]\2/'
car,art,pot,tap,urn[]ray,ear

# generic version, where {N} refers to last but N
$ echo "$s" | sed -E 's/(.*),((.*,){3})/\1[]\2/'
car,art,pot[]tap,urn,ray,ear

warning See unix.stackexchange: Why doesn't this sed command replace the 3rd-to-last "and"? for a bug related to the use of word boundaries in the ((pat){N}) generic case.
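
Another way to count from the end of the line is to reverse the line, use the number flag, and reverse the line again. This is only a sketch: it assumes the rev command (part of util-linux) is available, and it suits simple cases where the search and replacement strings are easy to write in reversed form.

```shell
s='car,art,pot,tap,urn,ray,ear'

# reverse the line, replace the 4th ',' and then reverse the line again
# the replacement '[]' has to be written in reverse as ']['
echo "$s" | rev | sed 's/,/][/4' | rev
# output:
# car,art,pot[]tap,urn,ray,ear
```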

A combination of a number and the g flag replaces all matches except the first N-1 occurrences. In other words, all matches starting from the Nth occurrence will be replaced.

$ s='apple:banana:cherry:fig:mango'

# replace all matches except the first occurrence
$ echo "$s" | sed -E 's/:/---/2g'
apple:banana---cherry---fig---mango

# replace all matches except the first three occurrences
$ echo "$s" | sed -E 's/:/---/4g'
apple:banana:cherry:fig---mango

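If multiple Nth occurrences are to be replaced, use descending order. Each substitution changes the line, so working from the highest number down leaves the earlier positions unaffected and keeps the numbers easy to read.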

$ s='car,art,pot,tap,urn,ray,ear'

# replace the second and third occurrences
# same as: sed 's/,/[]/2; s/,/[]/2'
$ echo "$s" | sed 's/,/[]/3; s/,/[]/2'
car,art[]pot[]tap,urn,ray,ear

# replace the second, third and fifth occurrences
$ echo "$s" | sed 's/,/[]/5; s/,/[]/3; s/,/[]/2'
car,art[]pot[]tap,urn[]ray,ear
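
To see why the order matters, here is a short comparison (a sketch using the same sample string). With ascending order, the first substitution removes a comma, so the number in the second command no longer refers to the position in the original line.

```shell
s='car,art,pot,tap,urn,ray,ear'

# descending order: numbers refer to positions in the original line
echo "$s" | sed 's/,/[]/3; s/,/[]/2'
# output:
# car,art[]pot[]tap,urn,ray,ear

# ascending order: the second command changes what was
# originally the fourth occurrence
echo "$s" | sed 's/,/[]/2; s/,/[]/3'
# output:
# car,art[]pot,tap[]urn,ray,ear
```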

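Print flag

The p flag was already introduced in the Selective editing chapter. It prints the pattern space only if the substitution succeeds, and is typically used along with the -n option.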

# no output if no substitution
$ echo 'hi there. have a nice day' | sed -n 's/xyz/XYZ/p'

# modified line is printed when the substitution succeeds
$ echo 'hi there. have a nice day' | sed -n 's/\bh/H/pg'
Hi there. Have a nice day
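
As a quick sketch of what happens when the -n option is left out: the modified line shows up twice, once due to the p flag and once due to the default printing of the pattern space.

```shell
# modified line is printed twice, unmodified line is printed once
printf 'hi there\nbye\n' | sed 's/hi/HI/p'
# output:
# HI there
# HI there
# bye
```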

Write to a file

The w flag writes the contents of the pattern space to the specified filename. This flag applies to both the filtering and substitution commands. You might wonder why not simply use shell redirection. Since sed allows multiple commands, the w flag can be applied selectively, can write to multiple files, and so on.

# space between w and the filename is optional
# same as: sed -n 's/3/three/p' > 3.txt
$ seq 20 | sed -n 's/3/three/w 3.txt'
$ cat 3.txt
three
1three

# do not use -n if output should be displayed as well as written to a file
$ printf '1,2,3,4\na,b,c,d\n' | sed 's/,/:/gw cols.txt'
1:2:3:4
a:b:c:d
$ cat cols.txt
1:2:3:4
a:b:c:d

For multiple output files, use -e to separate the commands. Don't use ; between the commands as it will be interpreted as part of the filename!

$ seq 20 | sed -n -e 's/5/five/w 5.txt' -e 's/7/seven/w 7.txt'
$ cat 5.txt
five
1five
$ cat 7.txt
seven
1seven
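
The w flag can also be used with a REGEXP address alone, and a newline ends the filename just as a separate -e option does. Here is a minimal sketch, assuming GNU sed; the temporary directory and filenames are chosen only for this illustration.

```shell
# temporary directory so that the current directory isn't cluttered
tmpdir=$(mktemp -d)

# 'w' used with REGEXP addresses instead of the substitute command
# a newline ends the filename, so each command goes on its own line
seq 20 | sed -n "/3/w $tmpdir/3.txt
/7/w $tmpdir/7.txt"

cat "$tmpdir/3.txt"
# output:
# 3
# 13

cat "$tmpdir/7.txt"
# output:
# 7
# 17

rm -r "$tmpdir"
```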

There are two predefined filenames:

  • /dev/stdout to represent the stdout stream
  • /dev/stderr to represent the stderr stream

# in-place editing as well as display changes on stdout
# 3.txt was created at the start of this section
$ sed -i 's/three/3/w /dev/stdout' 3.txt
3
13
$ cat 3.txt
3
13

Executing external commands

The e flag lets you use the output of a shell command. The external command can be based on the pattern space contents or provided as an argument. Quoting from the manual:

This command allows one to pipe input from a shell command into pattern space. Without parameters, the e command executes the command that is found in pattern space and replaces the pattern space with the output; a trailing newline is suppressed.

If a parameter is specified, instead, the e command interprets it as a command and sends its output to the output stream. The command can run across multiple lines, all but the last ending with a back-slash.

In both cases, the results are undefined if the command to be executed contains a NUL character.

First, examples with the substitution command.

# sample input
$ printf 'Date:\nreplace this line\n'
Date:
replace this line

# replace entire line with the output of a shell command
$ printf 'Date:\nreplace this line\n' | sed 's/^replace.*/date/e'
Date:
Monday 29 May 2023 04:09:24 PM IST

If the p flag is used as well, order is important. Quoting from the manual:

when both the p and e options are specified, the relative ordering of the two produces very different results. In general, ep (evaluate then print) is what you want, but operating the other way round can be useful for debugging. For this reason, the current version of GNU sed interprets specially the presence of p options both before and after e, printing the pattern space before and after evaluation, while in general flags for the s command show their effect just once. This behavior, although documented, might change in future versions.

$ printf 'Date:\nreplace this line\n' | sed -n 's/^replace.*/date/ep'
Monday 29 May 2023 04:10:20 PM IST

$ printf 'Date:\nreplace this line\n' | sed -n 's/^replace.*/date/pe'
date

If only a portion of the line is replaced, the complete modified line after the substitution will get executed as a shell command.

# after substitution, the command that gets executed is 'seq 5'
$ echo 'xyz 5' | sed 's/xyz/seq/e'
1
2
3
4
5
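
Backreferences can be used to construct the command as well. The following is a sketch, assuming GNU sed and a POSIX shell for the arithmetic expansion. Since the pattern space is executed as shell code, use this only with input you trust.

```shell
# after substitution, the pattern space is: echo $((2 + 3))
# which is then executed as a shell command
echo 'sum 2 3' | sed -E 's/sum ([0-9]+) ([0-9]+)/echo $((\1 + \2))/e'
# output:
# 5
```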

Next, examples with filtering alone.

# execute entire matching line as a shell command
# replaces the matching line with the output of the command
$ printf 'date\ndate -I\n' | sed '/date/e'
Monday 29 May 2023 04:12:01 PM IST
2023-05-29
$ printf 'date\ndate -I\n' | sed '2e'
date
2023-05-29

Here's an example where the command is provided as an argument. In such cases, the command's output is inserted before the matching line.

$ printf 'show\nexample\n' | sed '/am/e seq 2'
show
1
2
example

Multiline mode

The m (or M) flag will change the behavior of ^, $ and . metacharacters. This comes into play only if there are multiple lines in the pattern space to operate with, for example when the N command is used.

When the m flag is used, the . metacharacter will not match the newline character.

# without 'm' flag . will match the newline character
$ printf 'Hi there\nHave a Nice Day\n' | sed 'N; s/H.*e/X/'
X Day

# with 'm' flag . will not match across lines
$ printf 'Hi there\nHave a Nice Day\n' | sed 'N; s/H.*e/X/gm'
X
X Day

The ^ and $ anchors will match every line's start and end locations when the m flag is used.

# without 'm' flag line anchors will match once for the whole string
$ printf 'Hi there\nHave a Nice Day\n' | sed 'N; s/^/* /g'
* Hi there
Have a Nice Day
$ printf 'Hi there\nHave a Nice Day\n' | sed 'N; s/$/./g'
Hi there
Have a Nice Day.

# with 'm' flag line anchors will work for every line
$ printf 'Hi there\nHave a Nice Day\n' | sed 'N; s/^/* /gm'
* Hi there
* Have a Nice Day
$ printf 'Hi there\nHave a Nice Day\n' | sed 'N; s/$/./gm'
Hi there.
Have a Nice Day.
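
Here is a short sketch (assuming GNU sed) showing that M behaves the same as m, and that the anchors match at every line when more than two lines are in the pattern space.

```shell
# 'M' works the same as 'm'
printf 'Hi there\nHave a Nice Day\n' | sed 'N; s/^/* /gM'
# output:
# * Hi there
# * Have a Nice Day

# three lines in the pattern space
printf '1\n2\n3\n' | sed 'N; N; s/$/;/gm'
# output:
# 1;
# 2;
# 3;
```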

The \` and \' anchors will always match the start and end of the entire string, irrespective of single or multiline mode.

# similar to \A start of string anchor found in other regexp flavors
$ printf 'Hi there\nHave a Nice Day\n' | sed 'N; s/\`/* /gm'
* Hi there
Have a Nice Day

# similar to \Z end of string anchor found in other regexp flavors
# note the use of double quotes
# with single quotes, it will be: sed 'N; s/\'\''/./gm'
$ printf 'Hi there\nHave a Nice Day\n' | sed "N; s/\'/./gm"
Hi there
Have a Nice Day.

Usually, regular expression implementations have separate flags to control the behavior of the . metacharacter and the line anchors. Having a single flag restricts flexibility. As an example, you cannot make . match across lines if the m flag is active. You'll have to resort to some creative alternatives in such cases, as shown below.

# \w|\W or .|\n can also be used
# recall that sed doesn't allow character set sequences inside []
$ printf 'Hi there\nHave a Nice Day\n' | sed -E 'N; s/H(\s|\S)*e/X/m'
X Day

# this one doesn't use alternation
$ printf 'Hi there\nHave a Nice Day\n' | sed -E 'N; s/H(.*\n.*)*e/X/m'
X Day

Cheatsheet and summary

Note          Description
flag          changes default behavior of filtering and substitution commands
I             match case insensitively for REGEXP address
i or I        match case insensitively for the substitution command
\E            indicates end of case conversion in the replacement section
\l            convert the next character to lowercase
\u            convert the next character to uppercase
\L            convert the following characters to lowercase (overridden by \U or \E)
\U            convert the following characters to uppercase (overridden by \L or \E)
g             replace all occurrences instead of just the first match
N             a number will cause only the Nth match to be replaced
p             print only if the substitution succeeds (assuming -n is active)
w filename    write contents of the pattern space to the given filename
              whenever the REGEXP address matches or substitution succeeds
e             executes contents of the pattern space as a shell command
              and replaces the pattern space with the command output
              if argument is passed, executes that external command
              and inserts the output before matching lines
m or M        multiline mode flag
              . will not match the newline character
              ^ and $ will match every line's start and end locations
\`            always match the start of string irrespective of the multiline flag
\'            always match the end of string irrespective of the multiline flag

This chapter showed how flags can be used for extra functionality. Some of the flags interact with the shell as well. In the next chapter, you'll learn how to incorporate shell variables and command outputs when you need to dynamically construct a sed command.

Exercises

info The exercises directory has all the files used in this section.

1) For the input file para.txt, remove all groups of lines marked with a line beginning with start and a line ending with end. Match both these markers case insensitively.

$ cat para.txt
good start
Start working on that
project you always wanted
to, do not let it end
hi there
start and try to
finish the End
bye

$ sed ##### add your solution here
good start
hi there
bye

2) The headers.txt file contains one header per line, starting with one or more # characters followed by one or more whitespace characters and then some words. Convert such lines to the corresponding output as shown below.

$ cat headers.txt
# Regular Expressions
## Subexpression calls
## The dot meta character

$ sed ##### add your solution here
regular-expressions
subexpression-calls
the-dot-meta-character

3) Using para.txt, create a file named five.txt with lines that contain a whole word of length 5 and a file named seven.txt with lines that contain a whole word of length 7.

$ sed ##### add your solution here

$ cat five.txt
good start
Start working on that
hi there
start and try to

$ cat seven.txt
Start working on that
project you always wanted

4) The given sample strings have fields separated by , and field values can be empty as well. Use sed to replace the third field with 42.

$ echo 'lion,,ant,road,neon' | sed ##### add your solution here
lion,,42,road,neon

$ echo ',,,' | sed ##### add your solution here
,,42,

5) Replace all occurrences of e with 3 except the first two matches.

$ echo 'asset sets tests site' | sed ##### add your solution here
asset sets t3sts sit3

$ echo 'sample item teem eel' | sed ##### add your solution here
sample item t33m 33l

6) For the input file addr.txt, replace all input lines with the number of characters in those lines. wc -L is one of the ways to get the length of a line as shown below. Assume that the input file doesn't have single or double quote characters.

# note that newline character isn't counted, which is preferable here
$ echo "Hello World" | wc -L
11

$ sed ##### add your solution here
11
11
17
14
5
13

7) For the input file para.txt, assume that it'll always have lines in multiples of 4. Use sed commands such that there are 4 lines at a time in the pattern space. Then, delete from start till end provided start is matched only at the start of a line. Also, match these two keywords case insensitively.

$ sed ##### add your solution here
good start

hi there

bye

8) For the input file patterns.txt, replace the third ar from the end of the line (that is, the last but two) with X. Display only the modified lines.

$ sed ##### add your solution here
par car tX far Cart
pXt cart mart

9) Display lines from sample.txt that satisfy both of these conditions:

  • he matched irrespective of case
  • either World or Hi matched case sensitively

$ sed ##### add your solution here
Hello World
Hi there

10) For the input file patterns.txt, surround all hexadecimal sequences that have a minimum of four characters with []. Match 0x as an optional prefix, which shouldn't be counted when determining the length. Match the characters case insensitively, and the sequences shouldn't be surrounded by other word characters. Display only the modified lines.

$ sed ##### add your solution here
"should not match [0XdeadBEEF]"
Hi42Bye nice1423 [bad42]
took 0xbad 22 [0x0ff1ce]
eqn2 = pressure*3+42/5-[14256]