Shell substitutions

So far, the sed commands have been constructed statically: every detail was known in advance, such as which line numbers to act upon, the search REGEXP, the replacement string and so on. When it comes to automation and scripting, you'd often need to construct commands dynamically based on user input, file contents, etc. Sometimes, the output of a shell command is needed as part of the replacement string. This chapter will discuss how to incorporate shell variables and command output to compose a sed command dynamically. As mentioned before, this book assumes bash as the shell being used.

info As an example, see my repo ch: command help for a practical shell script, where commands are constructed dynamically.

Variable substitution

The characters you type on the command line are first interpreted by the shell before the command is executed. Wildcards are expanded, pipes and redirections are set up, double quotes are interpolated and so on. For some use cases, it is simply easier to use double quotes instead of single quotes for the script passed to the sed command. That way, shell variables get substituted with their values.

info See wooledge: Quotes and unix.stackexchange: Why does my shell script choke on whitespace or other special characters? for details about various quoting mechanisms in bash and when quoting is needed.

$ start=5; step=1
$ sed -n "${start},+${step}p" programming_quotes.txt
Some people, when confronted with a problem, think - I know, I will
use regular expressions. Now they have two problems by Jamie Zawinski

$ step=4
$ sed -n "${start},+${step}p" programming_quotes.txt
Some people, when confronted with a problem, think - I know, I will
use regular expressions. Now they have two problems by Jamie Zawinski

A language that does not affect the way you think about programming,
is not worth knowing by Alan Perlis
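
When values such as start and step come from outside the script, one option is to check that they are plain numbers before interpolating them. The sketch below assumes a hypothetical helper named show_lines and uses seq output in place of programming_quotes.txt, so treat it as an illustration rather than a definitive implementation.

```shell
# show_lines is a hypothetical helper, not something provided by sed or bash
# usage: show_lines START STEP < input
show_lines() {
    local start=$1 step=$2
    # reject empty values and anything that contains a non-digit character
    case $start in ''|*[!0-9]*) echo 'start must be a number' >&2; return 1 ;; esac
    case $step in ''|*[!0-9]*) echo 'step must be a number' >&2; return 1 ;; esac
    # double quotes are fine here since both values are known to be digits
    sed -n "${start},+${step}p"
}

# print line 5 and the one line after it
seq 10 | show_lines 5 1
```

With the digit check in place, a value like '5p;d' is rejected before it ever reaches sed.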

But if the shell variables can contain arbitrary strings instead of just numbers, it is strongly recommended to use double quotes only where they are needed. Otherwise, ordinary characters that are part of the sed command may get interpreted by the shell because of the double quotes. bash allows unquoted, single quoted and double quoted strings to be concatenated by simply placing them next to each other.

$ # ! is special within double quotes
$ # !d got expanded to 'date -Is' from my history and hence the error
$ word='at'
$ printf 'sea\neat\ndrop\n' | sed "/${word}/!d"
printf 'sea\neat\ndrop\n' | sed "/${word}/date -Is"
sed: -e expression #1, char 6: extra characters after command

$ # use double quotes only for variable substitution
$ # and single quotes for everything else
$ # the command is concatenation of '/' and "${word}" and '/!d'
$ printf 'sea\neat\ndrop\n' | sed '/'"${word}"'/!d'
eat
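
To see what sed actually receives, the concatenated pieces can be printed instead of being passed to sed. This is only an inspection sketch, and the script variable name is an assumption made for illustration.

```shell
word='at'
# the three adjacent quoted pieces are joined by the shell into one string
script='/'"${word}"'/!d'
# prints: /at/!d
printf '%s\n' "$script"

# the same string used as the sed script, prints: eat
printf 'sea\neat\ndrop\n' | sed "$script"
```

Printing the string first is a cheap way to debug a dynamically constructed command before running it.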

After you've properly separated the single and double quoted portions, you need to take care of a few more things to robustly construct a dynamic command. First, you'll have to ensure that the shell variable is properly preprocessed to avoid conflict with whichever delimiter is being used for search or substitution operations.

info See wooledge: Parameter Expansion for details about the bash feature used in the example below.

$ # error because '/' inside HOME value conflicts with '/' as delimiter
$ echo 'home path is:' | sed 's/$/ '"${HOME}"'/'
sed: -e expression #1, char 7: unknown option to `s'
$ # using a different delimiter will help in this particular case
$ echo 'home path is:' | sed 's|$| '"${HOME}"'|'
home path is: /home/learnbyexample

$ # but you may not have the luxury of choosing a delimiter
$ # in such cases, escape all delimiter characters before variable substitution
$ home=${HOME//\//\\/}
$ echo 'home path is:' | sed 's/$/ '"${home}"'/'
home path is: /home/learnbyexample
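
The same parameter expansion works for any value with several delimiter characters. Here is a small sketch that prints the escaped intermediate value as well; the dir and esc variables and their contents are made up for illustration.

```shell
# dir is a made-up value for illustration
dir='/usr/local/bin'
# ${var//pattern/replacement} replaces every / with \/
esc=${dir//\//\\/}
# prints: \/usr\/local\/bin
printf '%s\n' "$esc"

# prints: install path: /usr/local/bin
printf 'install path:\n' | sed 's/$/ '"${esc}"'/'
```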

warning If the variable value is obtained from an external source, such as user input, then you need to worry about security too. See unix.stackexchange: security consideration when using shell substitution for more details.
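
To make the risk concrete, here is a harmless sketch of how an unescaped value can smuggle extra sed commands into the script. The user value is invented for illustration, and the escaping step simply prefixes \, & and / with a backslash; it is not a complete defense for every possible input.

```shell
# invented input that closes the s command early and appends more commands
user='bar/; s/$/ INJECTED/; s/x/y'

# unescaped: sed sees three separate s commands
# prints: bar INJECTED
printf 'foo\n' | sed 's/foo/'"${user}"'/'

# escaped: \, & and / get a backslash prefix, so the value stays literal
safe=$(printf '%s' "$user" | sed 's#[\&/]#\\&#g')
# prints: bar/; s/$/ INJECTED/; s/x/y
printf 'foo\n' | sed 's/foo/'"${safe}"'/'
```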

Escaping metacharacters

Next, you have to properly escape all the metacharacters depending upon whether the variable is used as search or replacement string. This is needed only if the content of the variable has to be treated literally. Here's an example to illustrate the issue with one metacharacter.

$ c='&'
$ # & will backreference entire matched portion
$ echo 'a and b and c' | sed 's/and/['"${c}"']/g'
a [and] b [and] c

$ # escape all occurrences of & to insert it literally
$ c1=${c//&/\\&}
$ echo 'a and b and c' | sed 's/and/['"${c1}"']/g'
a [&] b [&] c

Typically, you'd need to escape \, & and the delimiter for variables used in the replacement section. For the search section, the characters to be escaped will depend upon whether you are using BRE or ERE.

$ # replacement string
$ r='a/b&c\d'
$ r=$(printf '%s' "$r" | sed 's#[\&/]#\\&#g')

$ # ERE version for search string
$ s='{[(\ta^b/d).*+?^$|]}'
$ s=$(printf '%s' "$s" | sed 's#[{[()^$*?+.\|/]#\\&#g')
$ echo 'f*{[(\ta^b/d).*+?^$|]} - 3' | sed -E 's/'"$s"'/'"$r"'/g'
f*a/b&c\d - 3

$ # BRE version for search string
$ s='{[(\ta^b/d).*+?^$|]}'
$ s=$(printf '%s' "$s" | sed 's#[[^$*.\/]#\\&#g')
$ echo 'f*{[(\ta^b/d).*+?^$|]} - 3' | sed 's/'"$s"'/'"$r"'/g'
f*a/b&c\d - 3
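
The escaping commands above can be wrapped into small functions so that they don't have to be retyped. This is a minimal sketch: the function names are hypothetical, only the BRE case is covered, and values containing newlines are not handled.

```shell
# hypothetical helper names, built from the escaping commands shown above
esc_search_bre() { printf '%s' "$1" | sed 's#[[^$*.\/]#\\&#g'; }
esc_replace() { printf '%s' "$1" | sed 's#[\&/]#\\&#g'; }

# literal_sub SEARCH REPLACE: treat both arguments literally, BRE mode
literal_sub() {
    local s r
    s=$(esc_search_bre "$1")
    r=$(esc_replace "$2")
    sed 's/'"$s"'/'"$r"'/g'
}

# only the literal text a.b is replaced, axb and a*b are left alone
# prints: x&y/z axb a*b
printf 'a.b axb a*b\n' | literal_sub 'a.b' 'x&y/z'
```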

For a more detailed analysis on escaping the metacharacters, refer to these wonderful Q&A threads.

Command substitution

This section shows examples of using the output of a shell command as part of a sed command. All the precautions seen in the previous sections apply here too.

info See also wooledge: Why is $() preferred over backticks?

$ # note that the trailing newline character of command output gets stripped
$ echo 'today is date.' | sed 's/date/'"$(date -I)"'/'
today is 2019-08-23.
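
Since the date output changes from day to day, here is a similar sketch with a deterministic command. The sample data and the msg variable are made up for illustration; grep -c '' is used to count the input lines.

```shell
# the command substitution yields 3 (the trailing newline is stripped)
msg=$(echo 'there are N items.' |
      sed 's/N/'"$(printf 'apple\nbanana\ncherry\n' | grep -c '')"'/')
# prints: there are 3 items.
printf '%s\n' "$msg"
```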

$ # need to change delimiter where possible
$ printf 'f1.txt\nf2.txt\n' | sed 's|^|'"$(pwd)"'/|'
/home/learnbyexample/f1.txt
/home/learnbyexample/f2.txt
$ # or preprocess if delimiter cannot be changed for other reasons
$ p=$(pwd | sed 's|/|\\/|g')
$ printf 'f1.txt\nf2.txt\n' | sed 's/^/'"${p}"'\//'
/home/learnbyexample/f1.txt
/home/learnbyexample/f2.txt

warning Multiline command output cannot be substituted in this manner, as the substitute command doesn't allow literal newlines in the replacement section unless they are escaped.

$ printf 'a\n[x]\nb\n' | sed 's/x/'"$(seq 3)"'/'
sed: -e expression #1, char 5: unterminated `s' command
$ # prefix literal newlines with \ except the last newline
$ printf 'a\n[x]\nb\n' | sed 's/x/'"$(seq 3 | sed '$!s/$/\\/' )"'/'
a
[1
2
3]
b
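
If the multiline text can also contain replacement metacharacters, both escaping steps can be applied in a single sed invocation. A minimal sketch, assuming / is the delimiter; the text and esc variables and their contents are made up for illustration.

```shell
# made-up multiline text containing / and &
text=$(printf 'a/b\nc&d')

# first escape \, & and /, then add a trailing \ to every line except the last
esc=$(printf '%s\n' "$text" | sed -e 's#[\&/]#\\&#g' -e '$!s/$/\\/')

# x is replaced with the two lines of text, inserted literally
printf '1\n[x]\n2\n' | sed 's/x/'"${esc}"'/'
```

The order of the two expressions matters: escaping the metacharacters first ensures that the backslash added at the end of a line isn't itself escaped.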

Cheatsheet and summary

Note                             Description
sed -n "${start},+${step}p"      dynamically construct sed command
                                 in above example, start and step are shell variables
                                 their values get substituted before sed is executed
sed "/${word}/!d"                entire command in double quotes is risky
                                 within double quotes, $, \, ! and ` are special
sed '/'"${word}"'/!d'            use double quotes only where needed
                                 and variable contents have to be preprocessed to prevent
                                 clashing with sed metacharacters and security issues
                                 if you don't control the variable contents
sed 's#[\&/]#\\&#g'              escape metacharacters for replacement section
sed '$!s/$/\\/'                  escape literal newlines for replacement section
sed 's#[{[()^$*?+.\|/]#\\&#g'    escape metacharacters for search section, ERE
sed 's#[[^$*.\/]#\\&#g'          escape metacharacters for search section, BRE
sed 's/date/'"$(date -I)"'/'     example for command substitution
                                 command output's final newline character gets stripped
                                 other literal newlines, if any, have to be escaped

This chapter covered some of the ways to construct a sed command dynamically. Like most things in programming, 90% of the cases are relatively easy to accomplish, but the other 10% can get significantly complicated. Dealing with the clash between shell and sed metacharacters is a mess and I'd even suggest looking for alternatives such as perl to reduce the complexity. The next chapter will cover some more command line options.

Exercises

a) Replace #expr# with the value of the usr_ip shell variable. Assume that this variable can only contain the metacharacters shown in the sample below.

$ usr_ip='c = (a/b) && (x-5)'
$ mod_ip=$(echo "$usr_ip" | sed ##### add your solution here)
$ echo 'Expression: #expr#' | sed ##### add your solution here
Expression: c = (a/b) && (x-5)

b) Repeat the previous exercise, but this time with command substitution instead of using a temporary variable.

$ usr_ip='c = (a/b/y) && (x-5)'
$ echo 'Expression: #expr#' | sed ##### add your solution here
Expression: c = (a/b/y) && (x-5)