Escaping metacharacters

This chapter will show how to match metacharacters literally. Examples will be discussed for both manually as well as programmatically constructed patterns. You'll also learn about escape sequences supported by regexp and how they differ from strings.

Escaping with backslash

You have seen a few metacharacters and escape sequences for composing regexp literals. To match the metacharacters literally, i.e. to remove their special meaning, prefix those characters with a \ (backslash) character. To indicate a literal \ character, use \\.

To spice up the examples a bit, block form has been used below to modify the matched portion of string with an expression. In later chapters, you'll see more ways to work directly with the matched portions.

# even though ^ is not being used as anchor, it won't be matched literally
>> 'a^2 + b^2 - C*3'.match?(/b^2/)
=> false
# escaping will work
>> 'a^2 + b^2 - C*3'.gsub(/(a|b)\^2/) { _1.upcase }
=> "A^2 + B^2 - C*3"

# match ( or ) literally
>> '(a*b) + c'.gsub(/\(|\)/, '')
=> "a*b + c"

>> '\learn\by\example'.gsub(/\\/, '/')
=> "/learn/by/example"

As emphasized earlier, regular expressions is just another tool to process text. Some examples and exercises presented in this book can be solved using normal string methods as well. It is a good practice to reason out whether regular expressions is needed for a given problem.

>> eqn = 'f*(a^b) - 3*(a^b)'

# straightforward search and replace, no need regexp shenanigans
>> eqn.gsub('(a^b)', 'c')
=> "f*c - 3*c"

Regexp.escape method

How to escape all the metacharacters when a regexp is constructed dynamically? Relax, the Regexp.escape method has got you covered. No need to manually take care of all the metacharacters or worry about changes in future versions.

>> eqn = 'f*(a^b) - 3*(a^b)'
>> expr = '(a^b)'

>> puts Regexp.escape(expr)
\(a\^b\)

# replace only at the end of string
>> eqn.sub(/#{Regexp.escape(expr)}\z/, 'c')
=> "f*(a^b) - 3*c"

The Regexp.union method automatically applies escaping for string arguments.

# array of strings, assume alternation precedence sorting isn't needed
>> terms = %w[a_42 (a^b) 2|3]

>> pat = Regexp.union(terms)
>> pat
=> /a_42|\(a\^b\)|2\|3/

>> 'ba_423 (a^b)c 2|3 a^b'.gsub(pat, 'X')
=> "bX3 Xc X a^b"

Regexp.union will also take care of mixing string and regexp patterns correctly. The grouping (?-mix: seen in the output below will be explained in the Modifiers chapter.

>> Regexp.union(/^cat|dog$/, 'a^b')
=> /(?-mix:^cat|dog$)|a\^b/

Escaping delimiter

Another character to keep track for escaping is the delimiter used to define the regexp literal. You can also use a different delimiter with %r to avoid or minimize escaping. You need not worry about unescaped delimiters inside #{} interpolation.

>> path = '/home/joe/report/sales/ip.txt'

# \/ is known as 'leaning toothpick syndrome'
>> path.sub(/\A\/home\/joe\//, '~/')
=> "~/report/sales/ip.txt"

# a different delimiter improves readability and reduces typos
>> path.sub(%r#\A/home/joe/#, '~/')
=> "~/report/sales/ip.txt"

Escape sequences

In regexp literals, characters like tab and newline can be expressed using escape sequences as \t and \n respectively. These are similar to how they are treated in normal string literals (see ruby-doc: Strings for details). However, escapes like \b (word boundary) and \s (see the Escape sequence sets section) are different for regexps. Also, octal escapes \nnn have to be three digits to avoid conflict with Backreferences.

>> "a\tb\tc".gsub(/\t/, ':')
=> "a:b:c"

>> "1\n2\n3".gsub(/\n/, ' ')
=> "1 2 3"

If an escape sequence is not defined, it'll match the character that is escaped. For example, \% will match % and not \ followed by %.
>> 'h%x'.match?(/h\%x/)
=> true
>> 'h\%x'.match?(/h\%x/)
=> false

>> 'hello'.match?(/\l/)
=> true

If you represent a metacharacter using escapes, it will be treated literally instead of its metacharacter feature.

# \x20 is space character in hexadecimal format
>> 'h e l l o'.gsub(/\x20/, '')
=> "hello"
# \053 is + character in octal format
>> 'a+b'.match?(/a\053b/)
=> true

# \x7c is '|' character
>> '12|30'.gsub(/2\x7c3/, '5')
=> "150"
>> '12|30'.gsub(/2|3/, '5')
=> "15|50"

See ASCII code table for a handy cheatsheet with all the ASCII characters and their hexadecimal representations.

The Codepoints and Unicode escapes section will discuss escapes for unicode characters.

Cheatsheet and Summary

Note	Description
`\`	prefix metacharacters with `\` to match them literally
`\\`	to match `\` literally
`Regexp.escape(s)`	automatically escape all metacharacters for string `s`
	`Regexp.union` also automatically escapes string arguments
`%r`	helps to avoid/reduce escaping the delimiter character
`\t`	escape sequences like those supported in string literals
	but escapes like `\b` and `\s` have different meaning in regexps
`\%`	undefined escapes will match the character it escapes
`\x7c`	will match `\|` literally
	instead of acting as an alternation metacharacter

Exercises

1) Transform the given input strings to the expected output using the same logic on both strings.

>> str1 = '(9-2)*5+qty/3-(9-2)*7'
>> str2 = '(qty+4)/2-(9-2)*5+pq/4'

>> str1.gsub()      ##### add your solution here
=> "35+qty/3-(9-2)*7"
>> str2.gsub()      ##### add your solution here
=> "(qty+4)/2-35+pq/4"

2) Replace (4)\| with 2 only at the start or end of the given input strings.

>> s1 = '2.3/(4)\|6 fig 5.3-(4)\|'
>> s2 = '(4)\|42 - (4)\|3'
>> s3 = "two - (4)\\|\n"

>> pat =        ##### add your solution here

>> s1.gsub(pat, '2')
=> "2.3/(4)\\|6 fig 5.3-2"
>> s2.gsub(pat, '2')
=> "242 - (4)\\|3"
>> s3.gsub(pat, '2')
=> "two - (4)\\|\n"

3) Replace any matching item from the given array with X for the given input strings. Match the elements from items literally. Assume no two elements of items will result in any matching conflict.

>> items = ['a.b', '3+n', 'x\y\z', 'qty||price', '{n}']

>> pat =        ##### add your solution here

>> '0a.bcd'.gsub(pat, 'X')
=> "0Xcd"
>> 'E{n}AMPLE'.gsub(pat, 'X')
=> "EXAMPLE"
>> '43+n2 ax\y\ze'.gsub(pat, 'X')
=> "4X2 aXe"

4) Replace the backspace character \b with a single space character for the given input string.

>> ip = "123\b456"
>> puts ip
12456

>> ip.gsub()        ##### add your solution here
=> "123 456"

5) Replace all occurrences of \o with o.

>> ip = 'there are c\omm\on aspects am\ong the alternati\ons'

>> ip.gsub()        ##### add your solution here
=> "there are common aspects among the alternations"

6) Replace any matching item from the array eqns with X for the given string ip. Match the items from eqns literally.

>> ip = '3-(a^b)+2*(a^b)-(a/b)+3'
>> eqns = %w[(a^b) (a/b) (a^b)+2]

>> pat =        ##### add your solution here

>> ip.gsub(pat, 'X')
=> "3-X*X-X+3"

Understanding Ruby Regexp