Gotchas and Tricks

  1. Use single quotes to enclose sed commands on the command line to avoid potential conflict with shell metacharacters. This case applies when the command doesn't need variable or command substitution.
$ # space is a shell metacharacter, hence the error
$ echo 'a sunny day' | sed s/sunny day/cloudy day/
sed: -e expression #1, char 7: unterminated `s' command
$ # shell treats characters inside single quotes literally
$ echo 'a sunny day' | sed 's/sunny day/cloudy evening/'
a cloudy evening
  1. On the other hand, beginners often do not realize the difference between single and double quotes and expect shell substitutions to work from within single quotes. See wooledge: Quotes and unix.stackexchange: Why does my shell script choke on whitespace or other special characters? for details about various quoting mechanisms.
$ # $USER won't get expanded within single quotes
$ echo 'User name: ' | sed 's/$/$USER/'
User name: $USER

$ # use double quotes for such cases
$ echo 'User name: ' | sed "s/$/$USER/"
User name: learnbyexample
  1. When shell substitution is needed, surrounding the entire command with double quotes may lead to issues due to conflicts between sed and bash special characters. So, use double quotes only for the portion of the command where they are required.
$ # ! is one of the special shell characters within double quotes
$ word='at'
$ printf 'sea\neat\ndrop\n' | sed "/${word}/!d"
printf 'sea\neat\ndrop\n' | sed "/${word}/date -Is"
sed: -e expression #1, char 6: extra characters after command

$ # works correctly when only the required portion is double quoted
$ printf 'sea\neat\ndrop\n' | sed '/'"${word}"'/!d'
eat
  1. Another gotcha when applying variable or command substitution is the conflict between sed metacharacters and the value of the substituted string. See also stackoverflow: Is it possible to escape regex metacharacters reliably with sed and unix.stackexchange: security consideration when using shell substitution.
$ # variable being substituted cannot have the delimiter character
$ printf 'home\n' | sed 's/$/: '"$HOME"'/'
sed: -e expression #1, char 8: unknown option to `s'

$ # use a different delimiter that won't conflict with variable value
$ printf 'home\n' | sed 's|$|: '"$HOME"'|'
home: /home/learnbyexample
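
If changing the delimiter is not enough, for example when the value may contain any character, one option is to escape the value before substituting it. Here's a minimal sketch along the lines of the stackoverflow thread mentioned above. The function names escape_search and escape_replace are made up for this illustration, and the sketch assumes GNU sed, BRE (no -E option), / as the delimiter and a value without newline characters.

```shell
# hypothetical helper: escape BRE metacharacters and / for use as search text
escape_search() { printf '%s\n' "$1" | sed 's:[][\\/.^$*]:\\&:g'; }
# hypothetical helper: escape \, / and & for use as replacement text
escape_replace() { printf '%s\n' "$1" | sed 's:[\\/&]:\\&:g'; }

search='[4]*'
replace='/home/user & co'
# both values are now treated literally by the s command
out=$(printf '2.3/[4]*6\n' | sed "s/$(escape_search "$search")/$(escape_replace "$replace")/")
printf '%s\n' "$out"
```

The helpers use : as their own delimiter so that / can be listed inside the bracket expression without any escaping.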
  1. You can specify command line options after filename arguments. Useful if you forgot some option(s) and want to edit the previous command from history.
$ printf 'boat\nsite\nfoot\n' > temp.txt
$ # no output, as + is not special with default BRE
$ sed -n '/[aeo]+t/p' temp.txt

$ # pressing up arrow will bring up the last command from history
$ # then you can add the option needed at the end of the command
$ sed -n '/[aeo]+t/p' temp.txt -E
boat
foot

As a corollary, if a filename starts with -, you need to either escape it (for example, by prefixing ./ to the filename) or use -- as an option to indicate that no more options will be used. The -- feature is not unique to the sed command. It is applicable to many other commands as well and is typically used when filenames are obtained from another source or expanded by shell globs such as *.txt.

$ echo 'hi hello' > -dash.txt
$ sed 's/hi/HI/' -dash.txt
sed: invalid option -- 'd'

$ sed -- 's/hi/HI/' -dash.txt
HI hello

$ # clean up temporary file
$ rm -- -dash.txt
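
Prefixing the filename with ./ is another way to handle such files, since the argument then no longer starts with -. Here's a small sketch, run against a scratch directory created with mktemp so that no existing file is touched (the variable names are only for illustration).

```shell
# create a scratch directory to hold the sample file
tmpdir=$(mktemp -d)
echo 'hi hello' > "$tmpdir/-dash.txt"

# ./-dash.txt is treated as a filename since it doesn't start with -
out=$(cd "$tmpdir" && sed 's/hi/HI/' ./-dash.txt)
printf '%s\n' "$out"

# ./*.txt guards every filename produced by the glob in the same way
out_glob=$(cd "$tmpdir" && sed -n '1p' ./*.txt)
printf '%s\n' "$out_glob"

# clean up the scratch directory
rm -r -- "$tmpdir"
```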
  1. Your command might not work and/or produce weird output if the input file has dos-style line endings.
$ # substitution doesn't work here because of dos style line ending
$ printf 'hi there\r\ngood day\r\n' | sed -E 's/\w+$/123/'
hi there
good day
$ # matching \r optionally is one way to solve this issue
$ # that way, it'll work for both \r\n and \n line endings
$ printf 'hi there\r\ngood day\r\n' | sed -E 's/\w+(\r?)$/123\1/'
hi 123
good 123

$ # swapping every two columns, works well with \n line ending
$ printf 'good,bad,42,24\n' | sed -E 's/([^,]+),([^,]+)/\2,\1/g'
bad,good,24,42
$ # output gets mangled with \r\n line ending
$ printf 'good,bad,42,24\r\n' | sed -E 's/([^,]+),([^,]+)/\2,\1/g'
,42,good,24

I use these bash functions (as part of .bashrc configuration) to easily switch between dos and unix style line endings. Some Linux distributions may come with these commands installed by default. See also stackoverflow: Why does my tool output overwrite itself and how do I fix it?

unix2dos() { sed -i 's/$/\r/' "$@" ; }
dos2unix() { sed -i 's/\r$//' "$@" ; }
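
If modifying the input file is not desirable, another option is to strip the carriage return as the first command of the sed script itself. As a sketch, here's the column swapping example from earlier in this item with that extra step (GNU sed is assumed for \r support).

```shell
# remove the trailing \r first, then swap every two columns
out=$(printf 'good,bad,42,24\r\n' | sed -E 's/\r$//; s/([^,]+),([^,]+)/\2,\1/g')
printf '%s\n' "$out"
# → bad,good,24,42
```

Note that the output will have unix style line endings with this approach.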
  1. Unlike grep, sed will not add a newline if the last line of input didn't have one.
$ # grep added a newline even though 'drop' doesn't end with newline
$ printf 'sea\neat\ndrop' | grep -v 'at'
sea
drop
$ # sed will not do so
$ # note how the prompt appears after 'drop'
$ printf 'sea\neat\ndrop' | sed '/at/d'
sea
drop$ 
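
If the missing newline is a problem for later stages of a pipeline, a commonly used idiom is $a\ which appends empty text after the last line. The sketch below assumes GNU sed, where this has the side effect of adding the final newline only when it is missing. Other implementations may behave differently.

```shell
# $a\ appends empty text after the last line
# wc -l counts newline characters, so both cases report three lines
missing=$(printf 'sea\neat\ndrop' | sed '$a\' | wc -l)
present=$(printf 'sea\neat\ndrop\n' | sed '$a\' | wc -l)
echo "$missing $present"
```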
  1. Use the -e option with commands like a/c/i/r/R when command grouping is also required, as these commands treat everything up to the end of the line as their argument.
$ # } gets treated as part of argument for append command, hence the error
$ seq 3 | sed '2{s/^/*/; a hi}'
sed: -e expression #1, char 0: unmatched `{'

$ # } now used with -e, but -e is still missing for first half of command
$ seq 3 | sed '2{s/^/*/; a hi' -e '}'
sed: -e expression #1, char 1: unexpected `}'

$ # -e now properly used for both portions of the command
$ seq 3 | sed -e '2{s/^/*/; a hi' -e '}'
1
*2
hi
3
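
Another option, assuming the script is allowed to span multiple lines, is to end the a command with a literal newline inside the quotes instead of splitting the script across -e options. A minimal sketch:

```shell
# the newline terminates the text for the 'a' command
# so } on the next line is parsed as the end of the group
out=$(seq 3 | sed '2{s/^/*/; a hi
}')
printf '%s\n' "$out"
```

This gives the same four lines of output as the -e version.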
  1. Longest match wins. See also the Longest match wins section.
$ s='food land bark sand band cue combat'
$ # this will always match from first 'foo' to last 'ba'
$ echo "$s" | sed 's/foo.*ba/X/'
Xt
$ # if you need to match from first 'foo' to first 'ba', then
$ # use a tool which supports non-greedy quantifiers
$ echo "$s" | perl -pe 's/foo.*?ba/X/'
Xrk sand band cue combat

For certain cases, a character class can help in matching only the relevant characters. And in some cases, adding more qualifiers instead of just .* can help. See stackoverflow: How to replace everything until the first occurrence for an example.

$ echo '{52} apples and {31} mangoes' | sed 's/{.*}/42/g'
42 mangoes
$ echo '{52} apples and {31} mangoes' | sed 's/{[^}]*}/42/g'
42 apples and 42 mangoes
  1. Beware of empty matches when using the * quantifier.
$ # * matches zero or more times
$ echo '42,,,,,hello,bye,,,hi' | sed 's/,*/,/g'
,4,2,h,e,l,l,o,b,y,e,h,i,
$ # + matches one or more times
$ echo '42,,,,,hello,bye,,,hi' | sed -E 's/,+/,/g'
42,hello,bye,hi
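
If the -E option cannot be used, the same effect is possible with BRE by requiring at least one comma before the * quantifier. A short sketch:

```shell
# ,,* means one comma followed by zero or more commas
out=$(echo '42,,,,,hello,bye,,,hi' | sed 's/,,*/,/g')
printf '%s\n' "$out"
# → 42,hello,bye,hi
```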
  1. BRE vs ERE syntax could get confusing for beginners. Quoting from the manual:

In GNU sed, the only difference between basic and extended regular expressions is in the behavior of a few special characters: ?, +, parentheses, braces ({}), and |.

$ # no match as + is not special with default BRE
$ echo '52 apples and 31234 mangoes' | sed 's/[0-9]+/[&]/g'
52 apples and 31234 mangoes
$ # so, either use \+ with BRE or use + with ERE
$ echo '52 apples and 31234 mangoes' | sed 's/[0-9]\+/[&]/g'
[52] apples and [31234] mangoes

$ # the reverse is also common, use of escapes when not required
$ echo 'get {} set' | sed 's/\{\}/[]/'
sed: -e expression #1, char 10: Invalid preceding regular expression
$ echo 'get {} set' | sed 's/{}/[]/'
get [] set
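
For comparison, here's a sketch of the same two substitutions with the -E option, where the roles are reversed: + is a quantifier without the escape, and \{ and \} match the brace characters literally.

```shell
# + works as a quantifier with ERE
out1=$(echo '52 apples and 31234 mangoes' | sed -E 's/[0-9]+/[&]/g')
printf '%s\n' "$out1"
# → [52] apples and [31234] mangoes

# escaped braces match literally with ERE
out2=$(echo 'get {} set' | sed -E 's/\{\}/[]/')
printf '%s\n' "$out2"
# → get [] set
```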
  1. Online tools like regex101 and debuggex can be very useful for beginners to regular expressions, especially for debugging purposes. However, their popularity has led to users trying out their pattern on these sites and expecting them to work as is for command line tools like grep, sed and awk. The issue arises when features like non-greedy and lookarounds are used as they wouldn't work with BRE/ERE. See also unix.stackexchange: Why does my regular expression work in X but not in Y?
$ echo '1,,,two,,3' | sed -E 's/,\K(?=,)/NA/g'
sed: -e expression #1, char 15: Invalid preceding regular expression
$ echo '1,,,two,,3' | perl -pe 's/,\K(?=,)/NA/g'
1,NA,NA,two,NA,3

$ # \d is not available as character set escape sequence
$ # will match 'd' instead
$ echo '52 apples and 31234 mangoes' | sed -E 's/\d+/[&]/g'
52 apples an[d] 31234 mangoes
$ echo '52 apples and 31234 mangoes' | perl -pe 's/\d+/[$&]/g'
[52] apples and [31234] mangoes
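
In some cases, such features can be emulated with what sed already offers. As a sketch (one possible approach, not the only one), the lookahead example above can be handled with a loop that keeps filling an empty field until no ,, sequence remains.

```shell
# repeat the substitution as long as it succeeds
# separate -e options make sure the label name ends where intended
out=$(echo '1,,,two,,3' | sed -e ':a' -e 's/,,/,NA,/' -e 'ta')
printf '%s\n' "$out"
# → 1,NA,NA,two,NA,3
```

The loop is needed because a single s/,,/,NA,/g pass cannot reuse a comma that was already consumed by the previous match.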
  1. If you are facing issues with end of line matching, it is often due to dos-style line endings (discussed earlier in this chapter) or whitespace characters at the end of the line.
$ # there's no visual clue to indicate whitespace characters at end of line
$ printf 'food bark \n1234 6789\t\n'
food bark 
1234 6789	
$ # no match
$ printf 'food bark \n1234 6789\t\n' | sed -E 's/\w+$/xyz/'
food bark 
1234 6789	

$ # cat command has options to indicate end of line, tabs, etc
$ printf 'food bark \n1234 6789\t\n' | cat -A
food bark $
1234 6789^I$
$ # works now, as whitespace characters are matched too
$ printf 'food bark \n1234 6789\t\n' | sed -E 's/\w+\s*$/xyz/'
food xyz
1234 xyz
  1. The word boundary \b matches both start and end of word locations, whereas \< and \> match exactly the start and end of word locations respectively. This leads to cases where you have to choose which of these word boundaries to use depending on the results desired. Consider I have 12, he has 2! as sample text, shown below as an image with vertical bars marking the word boundaries. The last character ! doesn't have an end of word boundary as it is not a word character.

[Image: word boundary]

$ # \b matches both start and end of word boundaries
$ # the first match here used starting boundary of 'I' and 'have'
$ echo 'I have 12, he has 2!' | sed 's/\b..\b/[&]/g'
[I ]have [12][, ][he] has[ 2]!

$ # \< and \> only match the start and end word boundaries respectively
$ echo 'I have 12, he has 2!' | sed 's/\<..\>/[&]/g'
I have [12], [he] has 2!

Here's another example to show the difference between the two types of word boundaries.

$ # add something to both start/end of word
$ echo 'hi log_42 12b' | sed 's/\b/:/g'
:hi: :log_42: :12b:

$ # add something only at start of word
$ echo 'hi log_42 12b' | sed 's/\</:/g'
:hi :log_42 :12b

$ # add something only at end of word
$ echo 'hi log_42 12b' | sed 's/\>/:/g'
hi: log_42: 12b:
  1. For some cases, you could simplify and improve readability of a substitution command by adding a filter condition instead of using substitution only.
$ # insert 'Error: ' at start of line if the line contains '42'
$ # also, remove all other starting whitespaces for such lines
$ printf '1423\n214\n   425\n' | sed -E 's/^\s*(.*42)/Error: \1/'
Error: 1423
214
Error: 425

$ # simpler and readable
$ # also note that -E is no longer required
$ printf '1423\n214\n   425\n' | sed '/42/ s/^\s*/Error: /'
Error: 1423
214
Error: 425
  1. Both 1 and $ will match as an address if the input file has only one line of data.
$ printf '3.14\nhi\n42\n' | sed '1 s/^/start: /; $ s/$/ :end/'
start: 3.14
hi
42 :end
$ echo '3.14' | sed '1 s/^/start: /; $ s/$/ :end/'
start: 3.14 :end

$ # you could use control structures as a workaround
$ # this will not work for ending address if input has only one line
$ echo '3.14' | sed '1{s/^/start: /; b}; $ s/$/ :end/'
start: 3.14
$ # this will not work for starting address if input has only one line
$ echo '3.14' | sed '${s/$/ :end/; b}; 1 s/^/start: /'
3.14 :end
  1. The n and N commands will not execute further commands if there are no more input lines to fetch.
$ # last line matched the filtering condition
$ # but substitution didn't work for last line as there's no more input
$ printf 'red\nblue\ncredible\n' | sed '/red/{N; s/e.*e/2/}'
r2
credible

$ # $!N will avoid executing N command for last line of input
$ printf 'red\nblue\ncredible\n' | sed '/red/{$!N; s/e.*e/2/}'
r2
cr2
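
The n command behaves similarly. In the sketch below (GNU sed assumed, without POSIXLY_CORRECT set), d deletes every line fetched by n. For the last line, n has nothing to fetch, so the pattern space is printed and d is never executed.

```shell
# 2 is fetched by n and deleted by d
# 3 is printed as is, because n cannot fetch another line
out=$(seq 3 | sed 'n; d')
printf '%s\n' "$out"
```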
  1. Changing the locale to ASCII (assuming the default is not an ASCII locale) can give a significant speed boost.
$ # time shown is best result from multiple runs
$ # speed benefit will vary depending on computing resources, input, etc
$ time sed -nE '/^([a-d][r-z]){3}$/p' /usr/share/dict/words > f1
real    0m0.022s

$ # LC_ALL=C will give ASCII locale, active only for this command
$ time LC_ALL=C sed -nE '/^([a-d][r-z]){3}$/p' /usr/share/dict/words > f2
real    0m0.012s

$ # check that results are same for both versions of the command
$ diff -s f1 f2
Files f1 and f2 are identical

Here's another example.

$ time sed -nE '/^([a-z]..)\1$/p' /usr/share/dict/words > f1
real    0m0.049s

$ time LC_ALL=C sed -nE '/^([a-z]..)\1$/p' /usr/share/dict/words > f2
real    0m0.029s

$ # clean up temporary files
$ rm f[12]
  1. ripgrep (command name rg) is primarily used as an alternative to grep but also supports search and replace functionality. It has more regular expression features than BRE/ERE, supports Unicode, multiline and fixed string matching, and is generally faster than sed. sed 's/search/replace/g' file is similar to rg --passthru -N 'search' -r 'replace' file. There are plenty of features that make rg worth learning, even though it supports substitution in a limited fashion compared to sed (no in-place support, no address filtering, no control structures, etc). See my book on GNU GREP and RIPGREP for more details.
$ # same as: sed 's/e/E/g' greeting.txt
$ # --passthru is needed to print lines which didn't match the pattern
$ rg --passthru -N 'e' -r 'E' greeting.txt
Hi thErE
HavE a nicE day

$ # non-greedy quantifier
$ s='food land bark sand band cue combat'
$ echo "$s" | rg --passthru 'foo.*?ba' -r 'X'
Xrk sand band cue combat

$ # Multiline search and replacement
$ printf '42\nHi there\nHave a Nice Day' | rg --passthru -U '(?s)the.*ice' -r ''
42
Hi  Day

$ # easily handle fixed strings, this one replaces [4]* with 2
$ printf '2.3/[4]*6\nfoo\n5.3-[4]*9\n' | rg --passthru -F '[4]*' -r '2'
2.3/26
foo
5.3-29

$ # unicode support
$ echo 'fox:αλεπού,eagle:αετός' | rg '\p{L}+' -r '($0)'
(fox):(αλεπού),(eagle):(αετός)

$ # -P option enables PCRE2 if you need even more advanced features
$ echo 'car bat cod map' | rg -P '(bat|map)(*SKIP)(*F)|\w+' -r '[$0]'
[car] bat [cod] map
  1. Quoting from sed-bin: POSIX sed to C translator:

This project allows to translate sed to C to be able to compile the result and generate a binary that will have the exact same behavior as the original sed script

It could help in debugging a complex sed script, obfuscation, better speed, etc.