Anchors

In this chapter, you'll be learning about qualifying a pattern. Instead of matching anywhere in the input string, restrictions can be specified. For now, you'll see the ones that are already part of regexp features. In later chapters, you'll learn how to define custom rules.

These restrictions are made possible by assigning special meaning to certain characters and escape sequences. The characters with special meaning are known as metacharacters in regexp parlance. In case you need to match those characters literally, you need to escape them with a \ character (discussed in the Escaping metacharacters chapter).

String anchors

This restriction is about qualifying a regexp to match only at the start or end of an input string. These provide functionality similar to the string methods startsWith() and endsWith(). First up, the ^ metacharacter which restricts the matching to the start of string.

// ^ is placed as a prefix to the search term
> /^cat/.test('cater')
< true
> /^cat/.test('concatenation')
< false

> /^hi/.test('hi hello\ntop spot')
< true
> /^top/.test('hi hello\ntop spot')
< false

To restrict the matching to the end of string, the $ metacharacter is used.

// $ is placed as a suffix to the search term
> /are$/.test('spare')
< true
> /are$/.test('nearest')
< false

> let words = ['surrender', 'unicorn', 'newer', 'door', 'empty', 'eel', 'pest']
> words.filter(w => /er$/.test(w))
< ['surrender', 'newer']
> words.filter(w => /t$/.test(w))
< ['pest']

Combining both the start and end string anchors, you can restrict the matching to the whole string. The effect is similar to comparing strings using the == operator.

> /^cat$/.test('cat')
< true
> /^cat$/.test('cater')
< false

You can emulate string concatenation operations by using the anchors by themselves as a pattern.

// insert text at the start of a string
> 'live'.replace(/^/, 're')
< 'relive'
> 'send'.replace(/^/, 're')
< 'resend'

// appending text
> 'cat'.replace(/$/, 'er')
< 'cater'
> 'hack'.replace(/$/, 'er')
< 'hacker'

Line anchors

A string input may contain single or multiple lines. The characters \r, \n, \u2028 (line separator) and \u2029 (paragraph separator) are considered as line separators. When the m flag is used, the ^ and $ anchors will match the start and end of every line respectively.

// check if any line in the string starts with 'top'
> /^top/m.test('hi hello\ntop spot')
< true

// check if any line in the string ends with 'er'
> /er$/m.test('spare\npar\nera\ndare')
< false

// filter elements having lines ending with 'are'
> let elements = ['spare\ntool', 'par\n', 'dare', 'spared']
> elements.filter(e => /are$/m.test(e))
< ['spare\ntool', 'dare']

// check if any whole line in the string is 'par'
> /^par$/m.test('spare\npar\nera\ndare')
< true

Just like string anchors, you can use the line anchors by themselves as a pattern.

> let items = 'catapults\nconcatenate\ncat'

> console.log(items.replace(/^/gm, '* '))
< * catapults
  * concatenate
  * cat

> console.log(items.replace(/$/gm, '.'))
< catapults.
  concatenate.
  cat.

If there is a line separator character at the end of string, there is an additional start/end of line match after the separator.
// 'fig ' is inserted three times
> console.log('1\n2\n'.replace(/^/mg, 'fig '))
< fig 1
  fig 2
  fig 

> console.log('1\n2\n'.replace(/$/mg, ' apple'))
< 1 apple
  2 apple
   apple

If you are dealing with Windows OS based text files, you may have to convert \r\n line endings to \n first. Otherwise, you'll get end of line matches for both the \r and \n characters. You can also handle this case in regexp by making \r as an optional character with quantifiers (see the Greedy quantifiers section for examples).

Word anchors

The third type of restriction is word anchors. Alphabets (irrespective of case), digits and the underscore character qualify as word characters. You might wonder why there are digits and underscores as well, why not just alphabets? This comes from variable and function naming conventions — typically alphabets, digits and underscores are allowed. So, the definition is more oriented to programming languages than natural ones.

The escape sequence \b denotes a word boundary. This works for both the start and end of word anchoring. Start of word means either the character prior to the word is a non-word character or there is no character (start of string). Similarly, end of word means the character after the word is a non-word character or no character (end of string). This implies that you cannot have word boundary \b without a word character.

> let words = 'par spar apparent spare part'

// replace 'par' irrespective of where it occurs
> words.replace(/par/g, 'X')
< 'X sX apXent sXe Xt'
// replace 'par' only at the start of word
> words.replace(/\bpar/g, 'X')
< 'X spar apparent spare Xt'
// replace 'par' only at the end of word
> words.replace(/par\b/g, 'X')
< 'X sX apparent spare part'
// replace 'par' only if it is not part of another word
> words.replace(/\bpar\b/g, 'X')
< 'X spar apparent spare part'

Using word boundary as a pattern by itself can yield creative solutions:

// space separated words to double quoted csv
// note that the 'replace' method is used twice here
> let words = 'par spar apparent spare part'
> console.log(words.replace(/\b/g, '"').replace(/ /g, ','))
< "par","spar","apparent","spare","part"

// make a programming statement more readable
// shown for illustration purpose only, won't work for all cases
> 'output=num1+35*42/num2'.replace(/\b/g, ' ')
< ' output = num1 + 35 * 42 / num2 '
// excess space at the start/end of string can be trimmed off
// later you'll learn how to add a qualifier so that trim is not needed
> 'output=num1+35*42/num2'.replace(/\b/g, ' ').trim()
< 'output = num1 + 35 * 42 / num2'

Opposite Word Anchor

The word boundary has an opposite anchor too. \B matches wherever \b doesn't match. This duality will be seen with some other escape sequences too. Negative logic is handy in many text processing situations. But use it with care, you might end up matching things you didn't intend!

> let words = 'par spar apparent spare part'

// replace 'par' if it is not at the start of word
> words.replace(/\Bpar/g, 'X')
< 'par sX apXent sXe part'
// replace 'par' at the end of word but not the whole word 'par'
> words.replace(/\Bpar\b/g, 'X')
< 'par sX apparent spare part'
// replace 'par' if it is not at the end of word
> words.replace(/par\B/g, 'X')
< 'par spar apXent sXe Xt'
// replace 'par' if it is surrounded by word characters
> words.replace(/\Bpar\B/g, 'X')
< 'par spar apXent sXe part'

Here are some standalone pattern usage to compare and contrast the two word anchors.

> 'copper'.replace(/\b/g, ':')
< ':copper:'
> 'copper'.replace(/\B/g, ':')
< 'c:o:p:p:e:r'

> '-----hello-----'.replace(/\b/g, ' ')
< '----- hello -----'
> '-----hello-----'.replace(/\B/g, ' ')
< ' - - - - -h e l l o- - - - - '

Cheatsheet and Summary

Note	Description
metacharacter	characters with special meaning in regexp
`^`	restricts the match to the start of string
`$`	restricts the match to the end of string
`m`	flag to match the start/end of line with `^` and `$` anchors
	`\r`, `\n`, `\u2028` and `\u2029` are line separators
	DOS-style files use `\r\n`, may need special attention
`\b`	restricts the match to the start and end of words
	word characters: alphabets, digits, underscore
`\B`	matches wherever `\b` doesn't match

In this chapter, you've begun to see building blocks of regular expressions and how they can be used in interesting ways. But at the same time, regular expression is but another tool in the land of text processing. Often, you'd get simpler solution by combining regular expressions with other string methods and expressions. Practice, experience and imagination would help you construct creative solutions. In the coming chapters, you'll see examples for anchors in combination with other regexp features.

Exercises

1) Check if the given input strings contain is or the as whole words.

> let str1 = 'is; (this)'
> let str2 = "The food isn't good"
> let str3 = 'the2 cats'
> let str4 = 'switch on the light'

> const pat1 =      // add your solution here
> const pat2 =      // add your solution here

> pat1.test(str1) || pat2.test(str1)
< true
> pat1.test(str2) || pat2.test(str2)
< false
> pat1.test(str3) || pat2.test(str3)
< false
> pat1.test(str4) || pat2.test(str4)
< true

2) For the given input string, change only the whole word red to brown.

> let ip = 'bred red spread credible red;'

> ip.replace()       // add your solution here
< 'bred brown spread credible brown;'

3) For the given array, filter all elements that contain 42 surrounded by word characters.

> let items = ['hi42bye', 'nice1423', 'bad42', 'cool_42a', 'fake4b']

> items.filter(e => test(e))       // add your solution here
< ['hi42bye', 'nice1423', 'cool_42a']

4) For the given input array, filter all elements that start with den or end with ly.

> let items = ['lovely', '1\ndentist', '2 lonely', 'eden', 'fly\n', 'dent']

> items.filter(e => test(e) || test(e))        // add your solution here
< ['lovely', '2 lonely', 'dent']

5) For the given input string, change whole word mall to 1234 only if it is at the start of a line.

> let para = `(mall) call ball pall
ball fall wall tall
mall call ball pall
wall mall ball fall
mallet wallet malls
mall:call:ball:pall`

> console.log(para.replace())        // add your solution here
< (mall) call ball pall
  ball fall wall tall
  1234 call ball pall
  wall mall ball fall
  mallet wallet malls
  1234:call:ball:pall

6) For the given array, filter all elements having a line starting with den or ending with ly.

> let items = ['lovely', '1\ndentist', '2 lonely', 'eden', 'fly\nfar', 'dent']

> items.filter(e => test(e) || test(e))      // add your solution here
< ['lovely', '1\ndentist', '2 lonely', 'fly\nfar', 'dent']

7) For the given input array, filter all whole elements 12\nthree irrespective of case.

> let items = ['12\nthree\n', '12\nThree', '12\nthree\n4', '12\nthree']

> items.filter(e => test(e))     // add your solution here
< ['12\nThree', '12\nthree']

8) For the given input array, replace hand with X for all elements that start with hand followed by at least one word character.

> let items = ['handed', 'hand', 'handy', 'un-handed', 'handle', 'hand-2']

> items.map(w => w.replace())        // add your solution here
< ['Xed', 'hand', 'Xy', 'un-handed', 'Xle', 'hand-2']

9) For the given input array, filter all elements starting with h. Additionally, replace e with X for these filtered elements.

> let items = ['handed', 'hand', 'handy', 'unhanded', 'handle', 'hand-2']

> items.filter(w => test(w)).map(w => w.replace())        // add your solution here
< ['handXd', 'hand', 'handy', 'handlX', 'hand-2']

10) Why does the following code show false instead of true?

> /end$/.test('bend it\nand send\n')
< false

Understanding JavaScript RegExp