Escaping metacharacters

You have seen a few metacharacters and escape sequences that help to compose a RegExp literal. There's also the / character, which is used as the delimiter for RegExp literals. This chapter will discuss how to remove the special meaning of such constructs. You'll also learn how to take care of special characters when building a RegExp object from normal strings.

Escaping with \

To match the metacharacters literally, i.e. to remove their special meaning, prefix those characters with a \ character. To indicate a literal \ character, use \\.

// even though ^ isn't at the start here, it is still treated as an anchor, so it won't be matched literally
> /b^2/.test('a^2 + b^2 - C*3')
< false
// escaping will work
> /b\^2/.test('a^2 + b^2 - C*3')
< true

> '(a*b) + c'.replace(/\(|\)/g, '')
< "a*b + c"

> '\\learn\\by\\example'.replace(/\\/g, '/')
< "/learn/by/example"
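Here's a small sketch that applies the \ prefix to each metacharacter in turn and checks that the resulting pattern matches that character literally. The metachars list is an illustrative assumption rather than an official list. Since the patterns are built with the RegExp constructor, '\\' in the string literal supplies the single backslash that the regexp sees.

```javascript
// An illustrative list of JavaScript regexp metacharacters
const metachars = ['.', '*', '+', '?', '^', '$', '(', ')', '[', ']', '{', '}', '|', '\\'];

for (const ch of metachars) {
  // '\\' + ch is a two-character string: a backslash followed by the metacharacter
  const pat = new RegExp('\\' + ch);
  // the escaped pattern should find ch in 'a<ch>b' but not in plain 'ab'
  console.log(pat.source, pat.test(`a${ch}b`), pat.test('ab'));
}

// Without the escape, . is a metacharacter that matches any character
console.log(/./.test('ab'), /\./.test('ab')); // true false
```

Every line printed inside the loop should show true followed by false.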

Dynamically escaping metacharacters

When you are defining the regexp yourself, you can manually escape the metacharacters where needed. However, if you have strings obtained from elsewhere and need to match their contents literally, you'll have to escape all the metacharacters while constructing the regexp. The solution, of course, is to use regular expressions! Many programming languages provide a built-in method for such cases. JavaScript lacked one for a long time (a RegExp.escape method was only recently standardized in ES2025 and isn't available in every environment yet), but the MDN Regular Expressions guide has it covered with the function shown below.

> function escapeRegExp(string) {
    return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
  }

There are many things in the above regexp that you haven't learnt yet. They'll be discussed in coming chapters. For now, it is enough to know that this function will automatically escape all the metacharacters. Examples are shown below.

// sample input on which regexp will be applied
> let eqn = 'f*(a^b) - 3*(a^b)'
// sample string obtained from elsewhere which needs to be matched literally
> const usr_str = '(a^b)'

// case 1: replace all matches
// escaping metacharacters using 'escapeRegExp' function
> const pat = new RegExp(escapeRegExp(usr_str), 'g')
> pat
< /\(a\^b\)/g
> eqn.replace(pat, 'c')
< "f*c - 3*c"

// case 2: replace only at the end of string
> eqn.replace(new RegExp(escapeRegExp(usr_str) + '$'), 'c')
< "f*(a^b) - 3*c"
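To check that escapeRegExp handles every character it targets, the sketch below feeds it a string made of exactly those characters and confirms that the resulting regexp matches that string only in the original order. The function is repeated from above so the block runs on its own; the special and pat names are just for illustration.

```javascript
// escapeRegExp as given above (from the MDN Regular Expressions guide)
function escapeRegExp(string) {
  return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); // $& is the whole match
}

// A string containing every character the function escapes
const special = '.*+?^${}()|[]\\';

const escaped = escapeRegExp(special);
console.log(escaped); // \.\*\+\?\^\$\{\}\(\)\|\[\]\\

// The escaped pattern matches the original text literally...
const pat = new RegExp(escaped);
console.log(pat.test(`x${special}y`)); // true
// ...but not the same characters in a different order
console.log(pat.test('\\][)(}{$^?+*.')); // false
```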

info Note that the / delimiter character isn't escaped in the above function. Use [.*+?^${}()|[\]\\\/] to escape the delimiter as well.
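The RegExp constructor doesn't need / to be escaped, since there's no literal delimiter for it to clash with. The sketch below shows both the original function and a variant with / added to the character class, which can be handy if the escaped text is meant to be pasted into a RegExp literal. The name escapeRegExpWithSlash is hypothetical.

```javascript
function escapeRegExp(string) {
  return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical variant: same character class plus an escaped / delimiter
function escapeRegExpWithSlash(string) {
  return string.replace(/[.*+?^${}()|[\]\\\/]/g, '\\$&');
}

const frac = '(a/b)';

console.log(escapeRegExp(frac));          // \(a/b\)
console.log(escapeRegExpWithSlash(frac)); // \(a\/b\)

// Both versions work with the constructor, because no / delimiter is involved
const p1 = new RegExp(escapeRegExp(frac));
const p2 = new RegExp(escapeRegExpWithSlash(frac));
console.log(p1.test('3-(a/b)+1'), p2.test('3-(a/b)+1')); // true true
```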

Dynamically building alternation

Examples in the previous chapter showed cases where a single regexp can contain multiple patterns combined using the | metacharacter. Often, you have an array of strings and you need to match any of their contents literally. To do so, you need to escape all the metacharacters before combining the strings with the | metacharacter. The function shown below uses the escapeRegExp function introduced in the previous section.

> function unionRegExp(arr) {
    return arr.map(w => escapeRegExp(w)).join('|')
  }

And here are some examples with the unionRegExp function used to construct the required regexp.

// here, order of alternation wouldn't matter
// and assume that other regexp features aren't needed
> let w1 = ['c^t', 'dog$', 'f|x']
> const p1 = new RegExp(unionRegExp(w1), 'g')
> p1
< /c\^t|dog\$|f\|x/g
> 'c^t dog$ bee parrot f|x'.replace(p1, 'mammal')
< "mammal mammal bee parrot mammal"

// here, alternation precedence rules need to be applied first
// and assume that the terms have to be matched as whole words
> let w2 = ['hand', 'handy', 'handful']
// sort by string length, longest first
> w2.sort((a, b) => b.length - a.length)
< ["handful", "handy", "hand"]
> const p2 = new RegExp(`\\b(${unionRegExp(w2)})\\b`, 'g')
> p2
< /\b(handful|handy|hand)\b/g
// note that 'hands' and 'handed' aren't replaced
> 'handful handed handy hands hand'.replace(p2, 'X')
< "X handed X hands X"
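The two steps used above (sort longest first, then wrap the alternation in \b anchors) can be bundled into one helper. Below is a sketch of a hypothetical wordUnionRegExp function. Unlike the example above, it sorts a copy so the caller's array isn't reordered in place. It assumes the terms start and end with word characters, since \b doesn't mark a word boundary next to punctuation in the usual way.

```javascript
function escapeRegExp(string) {
  return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function unionRegExp(arr) {
  return arr.map(w => escapeRegExp(w)).join('|');
}

// Hypothetical helper: whole-word alternation, longest terms tried first
function wordUnionRegExp(arr, flags = 'g') {
  // copy before sorting so the input array isn't mutated
  const sorted = [...arr].sort((a, b) => b.length - a.length);
  return new RegExp(`\\b(${unionRegExp(sorted)})\\b`, flags);
}

const words = ['hand', 'handy', 'handful'];
const p = wordUnionRegExp(words);
console.log(p);     // /\b(handful|handy|hand)\b/g
console.log(words); // [ 'hand', 'handy', 'handful' ]
console.log('handful handed handy hands hand'.replace(p, 'X')); // X handed X hands X
```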

info The XRegExp utility provides XRegExp.escape and XRegExp.union methods. The union method additionally allows a mix of strings and RegExp objects, and also takes care of renumbering backreferences.

source and flags properties

If you need the contents of a RegExp object, you can use the source and flags properties to get the pattern string and the flags respectively. These properties help you build a RegExp object using the contents of another RegExp object.

> const p3 = /\bpar\b/
> const p4 = new RegExp(p3.source + '|cat', 'g')

> p4
< /\bpar\b|cat/g
> console.log(p4.source)
< \bpar\b|cat
> p4.flags
< "g"

> 'cater cat concatenate par spare'.replace(p4, 'X')
< "Xer X conXenate X spare"
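The same idea extends to joining two existing RegExp objects. The sketch below uses a hypothetical combineRegExp function that joins both source strings with | and merges their flags. It assumes neither pattern uses backreferences, since numbered groups in the second pattern would shift (the renumbering problem mentioned for XRegExp.union). Also note that a flag such as i from either pattern ends up applying to the whole combined regexp.

```javascript
// Hypothetical helper: match either of two existing RegExp objects
function combineRegExp(r1, r2) {
  // | has the lowest precedence, so joining two complete patterns with it
  // means "r1 or r2" without extra grouping
  const source = `${r1.source}|${r2.source}`;
  // merge the flag letters, dropping duplicates
  const flags = [...new Set(r1.flags + r2.flags)].join('');
  return new RegExp(source, flags);
}

const p5 = combineRegExp(/\bpar\b/, /cat/g);
console.log(p5.source); // \bpar\b|cat
console.log(p5.flags);  // g
console.log('cater cat concatenate par spare'.replace(p5, 'X')); // Xer X conXenate X spare
```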

Escaping delimiter

Another character to keep track of for escaping is the / delimiter used to define RegExp literals. Alternatively, depending upon the pattern, you can use the RegExp constructor to avoid escaping it.

> let path = '/abc/123/foo/baz/ip.txt'

// this is known as 'leaning toothpick syndrome'
> path.replace(/^\/abc\/123\//, '~/')
< "~/foo/baz/ip.txt"

// using 'new RegExp' improves readability and can reduce typos
> path.replace(new RegExp(`^/abc/123/`), '~/')
< "~/foo/baz/ip.txt"
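When the prefix itself comes from elsewhere, the constructor approach combines naturally with escapeRegExp. The sketch below uses a hypothetical shortenPath helper. No / escaping is needed because the constructor has no delimiter, while other metacharacters in the prefix (such as .) still get escaped.

```javascript
function escapeRegExp(string) {
  return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: replace a leading directory prefix with '~/'
function shortenPath(path, prefix) {
  return path.replace(new RegExp('^' + escapeRegExp(prefix)), '~/');
}

const path = '/abc/123/foo/baz/ip.txt';
console.log(path.replace(/^\/abc\/123\//, '~/'));          // ~/foo/baz/ip.txt
console.log(shortenPath(path, '/abc/123/'));               // ~/foo/baz/ip.txt
// the prefix must be at the start of the string, so this one is unchanged
console.log(shortenPath('/x/abc/123/ip.txt', '/abc/123/')); // /x/abc/123/ip.txt
```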

Escape sequences

Certain characters like tab and newline can be expressed using escape sequences as \t and \n respectively. This is similar to how they are treated in normal string literals. However, \b is for word boundaries as seen earlier, whereas it stands for the backspace character in normal string literals. Additionally, there are several sequences that are specific to regexps.

The full list is given in the Using special characters section of the MDN documentation. These are \b \B \cX \d \D \f \n \p \P \r \s \S \t \uhhhh \u{hhhh} \u{hhhhh} \v \w \W \xhh \0. The \u{...} forms, \p and \P require the u flag.

> 'a\tb\tc'.replace(/\t/g, ':')
< "a:b:c"

> '1\n2\n3'.replace(/\n/g, ' ')
< "1 2 3"

// use \\ instead of \ when constructing regexp from string literals
> new RegExp('123\tabc')
< /123    abc/
> new RegExp('123\\tabc')
< /123\tabc/
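The difference matters most for \b, which means different things in a string literal and in a regexp. The sketch below compares a single and a doubled backslash, and also String.raw, a standard tag function that keeps backslashes exactly as typed so they don't need doubling.

```javascript
// In a string literal, '\b' is the backspace character (U+0008),
// so this pattern looks for a backspace followed by 'cat'
const backspaceCat = new RegExp('\bcat');
// Doubling the backslash passes \b through to the regexp: a word boundary
const boundaryCat = new RegExp('\\bcat');
// String.raw keeps the backslash as typed, so no doubling is needed
const rawCat = new RegExp(String.raw`\bcat`);

console.log(backspaceCat.test('a cat'));            // false
console.log(backspaceCat.test('a\bcat'));           // true
console.log(boundaryCat.test('a cat'));             // true
console.log(rawCat.source === boundaryCat.source);  // true
```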

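If an escape sequence is not defined, it will be treated as the character it escapes. This lenient behavior applies only when the u flag isn't used; with the u flag, an undefined escape like \e is a syntax error.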

// here \e is treated as e
> /\e/.test('hello')
< true
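Here's a short sketch contrasting the two behaviors. The u flag pattern is built with the RegExp constructor so that the error is raised at run time and can be caught; a /\e/u literal would instead stop the whole script from parsing.

```javascript
// Without flags, an undefined escape like \e quietly means 'e'
console.log(/\e/.test('hello')); // true

// With the u flag, undefined escapes are rejected
let threw = false;
try {
  new RegExp('\\e', 'u');
} catch (err) {
  threw = err instanceof SyntaxError;
}
console.log(threw); // true

// Escaped metacharacters are still fine under the u flag
console.log(new RegExp('\\^', 'u').test('a^2')); // true
```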

You can also represent a character using a hexadecimal escape of the format \xhh, where hh is exactly two hexadecimal characters. If you represent a metacharacter using such an escape, it will be matched literally instead of acting as a metacharacter. The Codepoints section will discuss escapes for Unicode characters.

// \x20 is space character
> 'h e l l o'.replace(/\x20/g, '')
< "hello"

// \x7c is '|' character
> '12|30'.replace(/2\x7c3/g, '5')
< "150"
> '12|30'.replace(/2|3/g, '5')
< "15|50"
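Since \xhh escapes are matched literally, they can serve as another way to neutralize a metacharacter when building a pattern from strings. The sketch below uses a hypothetical hexEscape helper; it only accepts character codes up to 0xFF, because \xhh can't express anything larger.

```javascript
// Hypothetical helper: write a single character as a \xhh escape
function hexEscape(ch) {
  const code = ch.charCodeAt(0);
  if (code > 0xff) throw new RangeError('\\xhh only covers codes up to 0xFF');
  return '\\x' + code.toString(16).padStart(2, '0');
}

console.log(hexEscape('|')); // \x7c
console.log(hexEscape(' ')); // \x20

// The escaped | is matched literally, not as alternation
const pat = new RegExp('2' + hexEscape('|') + '3', 'g');
console.log('12|30'.replace(pat, '5')); // 150
```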

info See ASCII code table for a handy cheatsheet with all the ASCII characters and their hexadecimal representation.

Cheatsheet and Summary

Note | Description
--- | ---
\ | prefix metacharacters with \ to match them literally
\\ | to match \ literally
source | property to convert a RegExp object to a string; helps to insert a RegExp inside another RegExp
flags | property to get the flags of a RegExp object
RegExp(`pat`) | helps to avoid or reduce escaping the / delimiter character
Alternation precedence | tie-breaker is left to right if matches have the same starting location; robust solution: sort the alternations based on length, longest first

Exercises

a) Transform the given input strings to the expected output using the same logic on both strings.

> let str1 = '(9-2)*5+qty/3'
> let str2 = '(qty+4)/2-(9-2)*5+pq/4'

> const pat1 =      // add your solution here
> str1.replace()        // add your solution here
< "35+qty/3"
> str2.replace()        // add your solution here
< "(qty+4)/2-35+pq/4"

b) Replace (4)\| with 2 only at the start or end of the given input strings.

> let s1 = '2.3/(4)\\|6 foo 5.3-(4)\\|'
> let s2 = '(4)\\|42 - (4)\\|3'
> let s3 = 'two - (4)\\|\n'

> const pat2 =      // add your solution here

> s1.replace()      // add your solution here
< "2.3/(4)\|6 foo 5.3-2"
> s2.replace()      // add your solution here
< "242 - (4)\|3"
> s3.replace()      // add your solution here
< "two - (4)\|
  "

c) Replace any matching item from the given array with X for the given input strings.

> let items = ['a.b', '3+n', 'x\\y\\z', 'qty||price', '{n}']

// add your solution here
> const pat3 =      // add your solution here

> '0a.bcd'.replace(pat3, 'X')
< "0Xcd"
> 'E{n}AMPLE'.replace(pat3, 'X')
< "EXAMPLE"
> '43+n2 ax\\y\\ze'.replace(pat3, 'X')
< "4X2 aXe"

d) Replace the backspace character \b with a single space character in the given input string.

> let ip = '123\b456'

> ip.replace()      // add your solution here
< "123 456"

e) Replace all occurrences of \e with e.

> let ip = 'th\\er\\e ar\\e common asp\\ects among th\\e alt\\ernations'

> ip.replace()      // add your solution here
< "there are common aspects among the alternations"

f) Replace any matching item from the array eqns with X in the given string ip. Match the items from eqns literally.

> let ip = '3-(a^b)+2*(a^b)-(a/b)+3'
> let eqns = ['(a^b)', '(a/b)', '(a^b)+2']

// add your solution here
> const pat4 =      // add your solution here

> ip.replace(pat4, 'X')
< "3-X*X-X+3"