Flags
Just like options change the default behavior of shell commands, flags are used to change aspects of regular expressions. Some of the flags like g and p have already been discussed. For completeness, they will be discussed again in this chapter. In regular expression parlance, flags are also known as modifiers.
Case insensitive matching
The I flag allows matching a pattern case insensitively.
$ # match 'cat' case sensitively
$ printf 'Cat\ncOnCaT\nscatter\ncot\n' | sed -n '/cat/p'
scatter
$ # match 'cat' case insensitively
$ # note that command p cannot be used before flag I
$ printf 'Cat\ncOnCaT\nscatter\ncot\n' | sed -n '/cat/Ip'
Cat
cOnCaT
scatter
$ # match 'cat' case insensitively and replace it with 'dog'
$ printf 'Cat\ncOnCaT\nscatter\ncot\n' | sed 's/cat/dog/I'
dog
cOndog
sdogter
cot
Usually i is used for such purposes, grep -i for example. But i is a command (discussed in the append, change, insert chapter) in sed, so /REGEXP/i cannot be used. The substitute command does allow both i and I to be used, but I is recommended for consistency.
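A minimal sketch of the last point, assuming GNU sed: the same substitution is run once with i and once with I. The helper name replace_cat is made up for this illustration.

```shell
# hypothetical helper: replace the first 'cat' on each line, ignoring case
# $1 is the flag to test, either i or I (GNU sed assumed)
replace_cat() {
    printf 'Cat\ncOnCaT\n' | sed "s/cat/dog/$1"
}

replace_cat i
replace_cat I
# both calls print:
# dog
# cOndog
```

Both flags behave the same for substitution, so sticking to I keeps the address and substitution forms consistent.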
Changing case in replacement section
This section isn't actually about flags, but is presented in this chapter to complement the I flag. sed provides escape sequences to change the case of replacement strings, which might include backreferences, shell variables, etc.
Escape Sequence | Description |
---|---|
\E | indicates end of case conversion |
\l | convert next character to lowercase |
\u | convert next character to uppercase |
\L | convert following characters to lowercase, unless \U or \E is used |
\U | convert following characters to uppercase, unless \L or \E is used |
First up, changing case of only the immediate next character after the escape sequence.
$ # match only first character of word using word boundary
$ # use & to backreference the matched character
$ # \u would then change it to uppercase
$ echo 'hello there. how are you?' | sed 's/\b\w/\u&/g'
Hello There. How Are You?
$ # change first character of word to lowercase
$ echo 'HELLO THERE. HOW ARE YOU?' | sed 's/\b\w/\l&/g'
hELLO tHERE. hOW aRE yOU?
$ # match lowercase followed by underscore followed by lowercase
$ # delete underscore and convert 2nd lowercase to uppercase
$ echo '_foo aug_price next_line' | sed -E 's/([a-z])_([a-z])/\1\u\2/g'
_foo augPrice nextLine
Next, changing case of multiple characters at a time.
$ # change all alphabets to lowercase
$ echo 'HaVE a nICe dAy' | sed 's/.*/\L&/'
have a nice day
$ # change all alphabets to uppercase
$ echo 'HaVE a nICe dAy' | sed 's/.*/\U&/'
HAVE A NICE DAY
$ # \E will stop further conversion
$ echo '_foo aug_price next_line' | sed -E 's/([a-z]+)(_[a-z]+)/\U\1\E\2/g'
_foo AUG_price NEXT_line
$ # \L or \U will override any existing conversion
$ echo 'HeLLo:bYe gOoD:beTTEr' | sed -E 's/([a-z]+)(:[a-z]+)/\L\1\U\2/Ig'
hello:BYE good:BETTER
Finally, examples where escapes can be used next to each other.
$ # uppercase first character of a word
$ # and lowercase rest of the word characters
$ # note the order of escapes used, \u\L won't work
$ echo 'HeLLo:bYe gOoD:beTTEr' | sed -E 's/[a-z]+/\L\u&/Ig'
Hello:Bye Good:Better
$ # lowercase first character of a word
$ # and uppercase rest of the word characters
$ echo 'HeLLo:bYe gOoD:beTTEr' | sed -E 's/[a-z]+/\U\l&/Ig'
hELLO:bYE gOOD:bETTER
Global replace
As seen earlier, by default the substitute command will replace only the first occurrence of the search pattern. Use the g flag to replace all the matches.
$ # change only first ',' to '-'
$ printf '1,2,3,4\na,b,c,d\n' | sed 's/,/-/'
1-2,3,4
a-b,c,d
$ # change all matches by adding 'g' flag
$ printf '1,2,3,4\na,b,c,d\n' | sed 's/,/-/g'
1-2-3-4
a-b-c-d
Replace specific occurrences
A number provided as a flag will cause only the Nth match to be replaced.
$ # default substitution replaces first occurrence
$ echo 'foo:123:bar:baz' | sed 's/:/-/'
foo-123:bar:baz
$ echo 'foo:123:bar:baz' | sed -E 's/[^:]+/"&"/'
"foo":123:bar:baz
$ # replace second occurrence
$ echo 'foo:123:bar:baz' | sed 's/:/-/2'
foo:123-bar:baz
$ echo 'foo:123:bar:baz' | sed -E 's/[^:]+/"&"/2'
foo:"123":bar:baz
$ # replace third occurrence and so on
$ echo 'foo:123:bar:baz' | sed 's/:/-/3'
foo:123:bar-baz
$ echo 'foo:123:bar:baz' | sed -E 's/[^:]+/"&"/3'
foo:123:"bar":baz
Quantifiers can be used to replace the Nth match from the end of the line.
$ # replacing last occurrence
$ # can also use sed -E 's/:([^:]*)$/[]\1/'
$ echo '456:foo:123:bar:789:baz' | sed -E 's/(.*):/\1[]/'
456:foo:123:bar:789[]baz
$ # replacing last but one
$ echo '456:foo:123:bar:789:baz' | sed -E 's/(.*):(.*:)/\1[]\2/'
456:foo:123:bar[]789:baz
$ # generic version, where {N} refers to last but N
$ echo '456:foo:123:bar:789:baz' | sed -E 's/(.*):((.*:){2})/\1[]\2/'
456:foo:123[]bar:789:baz
See unix.stackexchange: Why doesn't this sed command replace the 3rd-to-last "and"? for a bug related to use of word boundaries in the ((pat){N}) generic case.
A combination of a number and the g flag will replace all matches except the first N-1 occurrences. In other words, all matches starting from the Nth occurrence will be replaced.
$ # replace all except the first occurrence
$ echo '456:foo:123:bar:789:baz' | sed -E 's/:/[]/2g'
456:foo[]123[]bar[]789[]baz
$ # replace all except the first three occurrences
$ echo '456:foo:123:bar:789:baz' | sed -E 's/:/[]/4g'
456:foo:123:bar[]789[]baz
If multiple Nth occurrences are to be replaced, use descending order for readability.
$ # replace second and third occurrences
$ # note the numbers used
$ echo '456:foo:123:bar:789:baz' | sed 's/:/[]/2; s/:/[]/2'
456:foo[]123[]bar:789:baz
$ # better way is to use descending order
$ echo '456:foo:123:bar:789:baz' | sed 's/:/[]/3; s/:/[]/2'
456:foo[]123[]bar:789:baz
$ # replace second, third and fifth occurrences
$ echo '456:foo:123:bar:789:baz' | sed 's/:/[]/5; s/:/[]/3; s/:/[]/2'
456:foo[]123[]bar:789[]baz
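To see why ascending order needs adjusted numbers, the sketch below (GNU sed assumed, variable names are illustrative only) applies the two substitutions one at a time. Once the second ':' is replaced, the original third ':' becomes the second remaining one.

```shell
# GNU sed assumed; each stage holds the line after one substitution
s='456:foo:123:bar:789:baz'

# replace the second ':' first
step1=$(echo "$s" | sed 's/:/[]/2')
echo "$step1"
# 456:foo[]123:bar:789:baz

# the original third ':' is now the second one, so 2 is used again
step2=$(echo "$step1" | sed 's/:/[]/2')
echo "$step2"
# 456:foo[]123[]bar:789:baz
```

With descending order, earlier occurrences are untouched when each substitution runs, so the numbers match the original positions.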
Print flag
This flag was already introduced in the Selective editing chapter.
$ # no output if no substitution
$ echo 'hi there. have a nice day' | sed -n 's/xyz/XYZ/p'
$ # modified line is displayed if substitution succeeds
$ echo 'hi there. have a nice day' | sed -n 's/\bh/H/pg'
Hi there. Have a nice day
Write to a file
The w flag allows redirecting contents to a specified filename instead of the default stdout. This flag applies to both filtering and substitution commands. You might wonder why not simply use shell redirection? As sed allows multiple commands, the w flag can be used selectively, can write to multiple files and so on.
$ # space between w and filename is optional
$ # same as: sed -n 's/3/three/p' > 3.txt
$ seq 20 | sed -n 's/3/three/w 3.txt'
$ cat 3.txt
three
1three
$ # do not use -n if output should be displayed as well as written to file
$ printf '1,2,3,4\na,b,c,d\n' | sed 's/,/:/gw cols.txt'
1:2:3:4
a:b:c:d
$ cat cols.txt
1:2:3:4
a:b:c:d
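The examples above use w with substitution only. As a rough sketch of the filtering case and of selective use (GNU sed assumed; the temporary directory and the file names seven.txt and odd.txt are placeholders chosen for this illustration):

```shell
# GNU sed assumed; file names below are placeholders for illustration
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# 'w' after an address: save matching lines, print nothing because of -n
seq 20 | sed -n '/7/w seven.txt'
cat seven.txt
# 7
# 17

# selective use: odd numbers go to a file, even numbers go to stdout
seq 6 | sed -n -e '/[135]/w odd.txt' -e '/[246]/p'
# 2
# 4
# 6
cat odd.txt
# 1
# 3
# 5
```

The second command is the kind of split that a single shell redirection cannot express, since only some of the lines are diverted to the file.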
For multiple output files, use -e for each file. Don't use ; between commands as that will be interpreted as part of the filename!
$ seq 20 | sed -n -e 's/5/five/w 5.txt' -e 's/7/seven/w 7.txt'
$ cat 5.txt
five
1five
$ cat 7.txt
seven
1seven
There are two predefined filenames:
/dev/stdout to write to stdout
/dev/stderr to write to stderr
$ # in-place editing as well as display changes on stdout
$ sed -i 's/three/3/w /dev/stdout' 3.txt
3
13
$ cat 3.txt
3
13
Executing external commands
The e flag allows using the output of a shell command. The external command can be based on the pattern space contents or provided as an argument. Quoting from the manual:
This command allows one to pipe input from a shell command into pattern space. Without parameters, the e command executes the command that is found in pattern space and replaces the pattern space with the output; a trailing newline is suppressed.
If a parameter is specified, instead, the e command interprets it as a command and sends its output to the output stream. The command can run across multiple lines, all but the last ending with a back-slash.
In both cases, the results are undefined if the command to be executed contains a NUL character.
First, examples with substitution command.
$ # sample input
$ printf 'Date:\nreplace this line\n'
Date:
replace this line
$ # replacing entire line with output of shell command
$ printf 'Date:\nreplace this line\n' | sed 's/^replace.*/date/e'
Date:
Wed Aug 14 11:39:39 IST 2019
If the p flag is used as well, order is important. Quoting from the manual:
when both the p and e options are specified, the relative ordering of the two produces very different results. In general, ep (evaluate then print) is what you want, but operating the other way round can be useful for debugging. For this reason, the current version of GNU sed interprets specially the presence of p options both before and after e, printing the pattern space before and after evaluation, while in general flags for the s command show their effect just once. This behavior, although documented, might change in future versions.
$ printf 'Date:\nreplace this line\n' | sed -n 's/^replace.*/date/ep'
Wed Aug 14 11:42:48 IST 2019
$ printf 'Date:\nreplace this line\n' | sed -n 's/^replace.*/date/pe'
date
If only a portion of the line is replaced, the complete modified line after substitution will get executed as a shell command.
$ # after substitution, the command that gets executed is 'seq 5'
$ echo 'xyz 5' | sed 's/xyz/seq/e'
1
2
3
4
5
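Since the whole pattern space is executed, backreferences can help build the command. The sketch below assumes GNU sed and that the e flag runs the command through a POSIX shell (so that $((...)) arithmetic is available); the input format 'add 2 3' is invented for this illustration.

```shell
# GNU sed assumed; the e flag hands the pattern space to the shell
# after substitution, the pattern space is: echo $((2 + 3))
result=$(echo 'add 2 3' | sed -E 's/add ([0-9]+) ([0-9]+)/echo $((\1 + \2))/e')
echo "$result"
# 5
```

Take care with untrusted input here, as anything left in the pattern space becomes part of the executed command.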
Next, examples with filtering alone.
$ # execute entire matching line as a shell command
$ # replaces the matching line with output of the command
$ printf 'date\ndate -I\n' | sed '/date/e'
Wed Aug 14 11:51:06 IST 2019
2019-08-14
$ printf 'date\ndate -I\n' | sed '2e'
date
2019-08-14
$ # command provided as argument, output is inserted before matching line
$ printf 'show\nexample\n' | sed '/am/e seq 2'
show
1
2
example
Multiline mode
The m (or M) flag will change the behavior of the ^, $ and . metacharacters. This comes into play only if there are multiple lines in the pattern space to operate with, for example when the N command is used.
If the m flag is used, the . metacharacter will not match the newline character.
$ # without 'm' flag . will match newline character
$ printf 'Hi there\nHave a Nice Day\n' | sed 'N; s/H.*e/X/'
X Day
$ # with 'm' flag . will not match across lines
$ printf 'Hi there\nHave a Nice Day\n' | sed 'N; s/H.*e/X/gm'
X
X Day
The ^ and $ anchors will match every line's start and end locations when the m flag is used.
$ # without 'm' flag line anchors will match once for whole string
$ printf 'Hi there\nHave a Nice Day\n' | sed 'N; s/^/* /g'
* Hi there
Have a Nice Day
$ printf 'Hi there\nHave a Nice Day\n' | sed 'N; s/$/./g'
Hi there
Have a Nice Day.
$ # with 'm' flag line anchors will work for every line
$ printf 'Hi there\nHave a Nice Day\n' | sed 'N; s/^/* /gm'
* Hi there
* Have a Nice Day
$ printf 'Hi there\nHave a Nice Day\n' | sed 'N; s/$/./gm'
Hi there.
Have a Nice Day.
The \` and \' anchors will always match the start and end of the entire string, irrespective of single or multiline mode.
$ # similar to \A start of string anchor found in other implementations
$ printf 'Hi there\nHave a Nice Day\n' | sed 'N; s/\`/* /gm'
* Hi there
Have a Nice Day
$ # similar to \Z end of string anchor found in other implementations
$ # note the use of double quotes
$ # with single quotes, it will be: sed 'N; s/\'\''/./gm'
$ printf 'Hi there\nHave a Nice Day\n' | sed "N; s/\'/./gm"
Hi there
Have a Nice Day.
Usually, regular expression implementations have separate flags to control the behavior of the . metacharacter and line anchors. Having a single flag restricts flexibility. As an example, you cannot make . match across lines if the m flag is used in sed. You'll have to resort to some creative alternatives in such cases as shown below.
$ # \w|\W or .|\n can also be used
$ # recall that sed doesn't allow character set sequences inside []
$ printf 'Hi there\nHave a Nice Day\n' | sed -E 'N; s/H(\s|\S)*e/X/m'
X Day
$ # this one doesn't use alternation
$ printf 'Hi there\nHave a Nice Day\n' | sed -E 'N; s/H(.*\n.*)*e/X/m'
X Day
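The .|\n alternative mentioned in the comment above can be sketched as follows (GNU sed assumed; the variable name out is only for illustration). With the m flag active, . stops at the newline, so the explicit \n branch is what lets the match cross lines.

```shell
# GNU sed assumed; same input as the examples above
out=$(printf 'Hi there\nHave a Nice Day\n' | sed -E 'N; s/H(.|\n)*e/X/m')
echo "$out"
# X Day
```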
Cheatsheet and summary
Note | Description |
---|---|
flag | changes default behavior of REGEXP |
I | match case insensitively for REGEXP address |
i or I | match case insensitively for substitution command |
\E | indicates end of case conversion in replacement section |
\l | convert next character to lowercase |
\u | convert next character to uppercase |
\L | convert following characters to lowercase, unless \U or \E is used |
\U | convert following characters to uppercase, unless \L or \E is used |
g | replace all occurrences instead of just the first match |
N | a number will cause only the Nth match to be replaced |
p | prints line only if substitution succeeds (assuming -n is active) |
w filename | write contents of pattern space to given filename whenever the REGEXP address matches or substitution succeeds |
e | executes contents of pattern space as shell command and replaces the pattern space with command output; if argument is passed, executes that external command and inserts output before matching lines |
m or M | multiline mode flag; . will not match the newline character; ^ and $ will match every line's start and end locations |
\` | always match the start of string irrespective of m flag |
\' | always match the end of string irrespective of m flag |
This chapter showed how flags can be used for extra functionality. Some of the flags interact with the shell as well. In the next chapter, you'll learn how to incorporate shell variables and command outputs to dynamically construct a sed command.
Exercises
a) For the input file para.txt, remove all groups of lines marked with a line beginning with start and a line ending with end. Match both these markers case insensitively.
$ cat para.txt
good start
Start working on that
project you always wanted
to, do not let it end
hi there
start and try to
finish the End
bye
$ sed ##### add your solution here
good start
hi there
bye
b) The given sample input below starts with one or more # characters followed by one or more whitespace characters and then some words. Convert such strings to the corresponding output as shown below.
$ echo '# Regular Expressions' | sed ##### add your solution here
regular-expressions
$ echo '## Compiling regular expressions' | sed ##### add your solution here
compiling-regular-expressions
c) Using the input file para.txt, create a file named five.txt with all lines that contain a whole word of length 5 and a file named six.txt with all lines that contain a whole word of length 6.
$ sed ##### add your solution here
$ cat five.txt
good start
Start working on that
hi there
start and try to
$ cat six.txt
project you always wanted
finish the End
d) Given sample strings have fields separated by , where field values can be empty as well. Use sed to replace the third field with 42.
$ echo 'lion,,ant,road,neon' | sed ##### add your solution here
lion,,42,road,neon
$ echo ',,,' | sed ##### add your solution here
,,42,
e) Replace all occurrences of e with 3 except the first two matches.
$ echo 'asset sets tests site' | sed ##### add your solution here
asset sets t3sts sit3
$ echo 'sample item teem eel' | sed ##### add your solution here
sample item t33m 33l
f) For the input file addr.txt, replace all input lines with the number of characters in those lines. wc -L is one of the ways to get the length of a line as shown below.
$ # note that newline character isn't counted, which is preferable here
$ echo "Hello World" | wc -L
11
$ sed ##### add your solution here
11
11
17
14
5
13
g) For the input file para.txt, assume that it'll always have lines in multiples of 4. Use sed commands such that there are 4 lines at a time in the pattern space. Then, delete from start till end provided start is matched only at the start of a line. Also, match these two keywords case insensitively.
$ sed ##### add your solution here
good start
hi there
bye
h) For the given strings, replace the fourth from last occurrence of so with X. Only print the lines which are changed by the substitution.
$ printf 'so and so also sow and soup\n' | sed ##### add your solution here
so and X also sow and soup
$ printf 'sososososososo\nso and so\n' | sed ##### add your solution here
sososoXsososo
i) Display all lines that satisfy both of these conditions:
professor matched irrespective of case
quip or this matched case sensitively
Input is a file downloaded from internet as shown below.
$ wget https://www.gutenberg.org/files/345/old/345.txt -O dracula.txt
$ sed ##### add your solution here
equipment of a professor of the healing craft. When we were shown in,
should be. I could see that the Professor had carried out in this room,
"Not up to this moment, Professor," she said impulsively, "but up to
and sprang at us. But by this time the Professor had gained his feet,
this time the Professor had to ask her questions, and to ask them pretty