Anchors

In this chapter, you'll learn how to qualify a pattern. Instead of matching anywhere in the given string, you can restrict where a match is allowed. For now, you'll see the restrictions that are already part of regexp features. In later chapters, you'll learn how to define your own rules for restriction.

These restrictions are made possible by assigning special meaning to certain characters and escape sequences. Characters with special meaning are known as metacharacters in regexp parlance. To match such characters literally, escape them with a \ character (discussed in the Escaping metacharacters chapter).

String anchors

This restriction qualifies a regexp to match only at the start or end of an input string. The string anchors provide functionality similar to the string methods startsWith and endsWith. First up is the ^ metacharacter, which restricts the match to the start of the string.

// ^ is placed as a prefix to the pattern
> /^cat/.test('cater')
< true
> /^cat/.test('concatenation')
< false

> /^hi/.test('hi hello\ntop spot')
< true
> /^top/.test('hi hello\ntop spot')
< false

To restrict the match to the end of the string, the $ metacharacter is used.

// $ is placed as a suffix to the pattern
> /are$/.test('spare')
< true
> /are$/.test('nearest')
< false

> let words = ['surrender', 'unicorn', 'newer', 'door', 'empty', 'eel', 'pest']
> words.filter(w => /er$/.test(w))
< ["surrender", "newer"]
> words.filter(w => /t$/.test(w))
< ["pest"]
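The similarity to the startsWith and endsWith string methods mentioned earlier can be checked directly. The following is a minimal sketch; the names array and the variable names are made up for illustration.

```javascript
// Hypothetical sample data, reusing strings from the examples above
const names = ['cater', 'concatenation', 'spare', 'nearest'];

// ^ anchor compared with the startsWith string method
const startRegexp = names.filter(w => /^cat/.test(w));
const startMethod = names.filter(w => w.startsWith('cat'));
console.log(startRegexp);   // [ 'cater' ]
console.log(startMethod);   // [ 'cater' ]

// $ anchor compared with the endsWith string method
const endRegexp = names.filter(w => /are$/.test(w));
const endMethod = names.filter(w => w.endsWith('are'));
console.log(endRegexp);     // [ 'spare' ]
console.log(endMethod);     // [ 'spare' ]
```

For fixed strings like these, the string methods are usually the simpler choice. The anchors become more useful once the pattern involves other regexp features.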

By combining the start and end string anchors, you can restrict the match to the whole string. This is similar to comparing strings for equality using the == operator.

> /^cat$/.test('cat')
< true
> /^cat$/.test('cater')
< false
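As a quick check of the whole string restriction, here is a small sketch that compares the anchored regexp with an equality test. The candidates array is a made-up sample, and === is used here since it behaves the same as == when both operands are strings.

```javascript
// Hypothetical sample strings
const candidates = ['cat', 'cater', 'concat', 'Cat'];

// whole string match using both string anchors
const viaRegexp = candidates.filter(s => /^cat$/.test(s));
// plain string equality
const viaEquality = candidates.filter(s => s === 'cat');

console.log(viaRegexp);     // [ 'cat' ]
console.log(viaEquality);   // [ 'cat' ]
```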

The anchors can be used by themselves as a pattern. This helps to insert text at the start or end of a string, emulating string concatenation operations. This might not seem like a useful capability, but combined with other regexp features it becomes quite a handy tool.

> 'live'.replace(/^/, 're')
< "relive"
> 'send'.replace(/^/, 're')
< "resend"

> 'cat'.replace(/$/, 'er')
< "cater"
> 'hack'.replace(/$/, 'er')
< "hacker"

Line anchors

An input string may contain a single line or multiple lines. The \r, \n, \u2028 (line separator) and \u2029 (paragraph separator) characters are treated as line separators. When the m flag is used, the ^ and $ anchors match the start and end of every line respectively.

// check if any line in the string starts with 'top'
> /^top/m.test('hi hello\ntop spot')
< true

// check if any line in the string ends with 'er'
> /er$/m.test('spare\npar\nera\ndare')
< false

// check if any complete line in the string is 'par'
> /^par$/m.test('spare\npar\nera\ndare')
< true

Just like string anchors, you can use the line anchors by themselves as a pattern.

> let items = 'catapults\nconcatenate\ncat'

> console.log(items.replace(/^/gm, '* '))
< * catapults
  * concatenate
  * cat

> console.log(items.replace(/$/gm, '.'))
< catapults.
  concatenate.
  cat.
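For comparison, the same bullet-list output can be produced with plain string methods. This is only a sketch of one alternative, assuming \n is the sole line separator in the input. The itemList, bulletsRegexp and bulletsSplit names are made up for illustration.

```javascript
// Same sample text as above, under a hypothetical name
const itemList = 'catapults\nconcatenate\ncat';

// line anchor used with the g and m flags
const bulletsRegexp = itemList.replace(/^/gm, '* ');

// split into lines, prefix each line, then join them back
const bulletsSplit = itemList.split('\n').map(line => '* ' + line).join('\n');

console.log(bulletsRegexp === bulletsSplit);   // true
console.log(bulletsSplit);
```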

Warning: If there is a line separator character at the end of the string, there is an additional start/end of line match after the separator.

// 'foo ' is inserted three times
> '1\n2\n'.replace(/^/mg, 'foo ')
< "foo 1
  foo 2
  foo "

> '1\n2\n'.replace(/$/mg, ' baz')
< "1 baz
  2 baz
   baz"

Warning: If you are dealing with text files created on Windows, you may have to convert \r\n line endings to \n first. Otherwise, you'll get end of line matches for both the \r and \n characters. You can also handle this case within the regexp by making \r an optional character with quantifiers (see Greedy quantifiers section).
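Here is a minimal sketch of the issue and two possible workarounds. The dosText sample is made up, and splitting with /\r?\n/ is just one way to apply the optional \r idea. The ? quantifier used there is explained in the Greedy quantifiers section.

```javascript
// Hypothetical input with Windows-style \r\n line endings
const dosText = 'spare\r\npar\r\ndare';

// with the m flag, $ matches before \r and also before \n
const marked = dosText.replace(/$/gm, '.');
console.log(JSON.stringify(marked));    // "spare.\r.\npar.\r.\ndare."

// workaround 1: convert \r\n to \n before applying the line anchor
const unixText = dosText.replace(/\r\n/g, '\n');
const fixed = unixText.replace(/$/gm, '.');
console.log(JSON.stringify(fixed));     // "spare.\npar.\ndare."

// workaround 2: treat \r as optional while splitting into lines
const lines = dosText.split(/\r?\n/);
console.log(lines);                     // [ 'spare', 'par', 'dare' ]
```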

Word anchors

The third type of restriction is word anchors. Letters (irrespective of case), digits and the underscore character qualify as word characters. You might wonder why digits and underscores are included, and not just letters. This comes from variable and function naming conventions, which typically allow letters, digits and underscores. So, the definition is oriented more towards programming languages than natural ones.

The escape sequence \b denotes a word boundary. It works for both start of word and end of word anchoring. Start of word means that either the character before the word is a non-word character or there is no character (start of string). Similarly, end of word means that the character after the word is a non-word character or there is no character (end of string). This implies that you cannot have a word boundary \b without a word character.

> let sample = 'par spar apparent spare part'

// replace 'par' irrespective of where it occurs
> sample.replace(/par/g, 'X')
< "X sX apXent sXe Xt"
// replace 'par' only at the start of word
> sample.replace(/\bpar/g, 'X')
< "X spar apparent spare Xt"
// replace 'par' only at the end of word
> sample.replace(/par\b/g, 'X')
< "X sX apparent spare part"
// replace 'par' only if it is not part of another word
> sample.replace(/\bpar\b/g, 'X')
< "X spar apparent spare part"
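Since digits and underscores count as word characters, \b does not match between par and a digit or underscore that follows it. The sketch below illustrates this, along with the point that \b needs at least one word character. The ids array is a made-up sample.

```javascript
// Hypothetical sample strings
const ids = ['par', 'par_2', 'par2', 'par-2', 'par 2'];

// '_' and '2' are word characters, so 'par_2' and 'par2' have
// no word boundary right after 'par'
const wholeWord = ids.filter(s => /\bpar\b/.test(s));
console.log(wholeWord);             // [ 'par', 'par-2', 'par 2' ]

// no word character in the input means no word boundary at all
const dashesOnly = /\b/.test('-----');
const emptyInput = /\b/.test('');
console.log(dashesOnly);            // false
console.log(emptyInput);            // false
```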

You can get a lot more creative by using the word boundary as a pattern by itself.

// space separated words to double quoted csv
// note that 'replace' method is used twice here
> let sample = 'par spar apparent spare part'
> console.log(sample.replace(/\b/g, '"').replace(/ /g, ','))
< "par","spar","apparent","spare","part"

// make a programming statement more readable
// shown for illustration purpose only, won't work for all cases
> 'foo_baz=num1+35*42/num2'.replace(/\b/g, ' ')
< " foo_baz = num1 + 35 * 42 / num2 "
// excess space at start/end of string can be trimmed off
// later you'll learn how to add a qualifier so that trim is not needed
> 'foo_baz=num1+35*42/num2'.replace(/\b/g, ' ').trim()
< "foo_baz = num1 + 35 * 42 / num2"

The word boundary has an opposite anchor too. \B matches wherever \b doesn't match. This duality will be seen later with some other escape sequences too.

> let sample = 'par spar apparent spare part'

// replace 'par' if it is not start of word
> sample.replace(/\Bpar/g, 'X')
< "par sX apXent sXe part"
// replace 'par' at the end of word but not whole word 'par'
> sample.replace(/\Bpar\b/g, 'X')
< "par sX apparent spare part"
// replace 'par' if it is not end of word
> sample.replace(/par\B/g, 'X')
< "par spar apXent sXe Xt"
// replace 'par' if it is surrounded by word characters
> sample.replace(/\Bpar\B/g, 'X')
< "par spar apXent sXe part"

Here's some standalone pattern usage to compare and contrast the two word anchors.

> 'copper'.replace(/\b/g, ':')
< ":copper:"
> 'copper'.replace(/\B/g, ':')
< "c:o:p:p:e:r"

> '-----hello-----'.replace(/\b/g, ' ')
< "----- hello -----"
> '-----hello-----'.replace(/\B/g, ' ')
< " - - - - -h e l l o- - - - - "

Warning: Negative logic is handy in many text processing situations. But use it with care as you might end up matching things you didn't intend!

Cheatsheet and Summary

| Note | Description |
|------|-------------|
| metacharacter | characters with special meaning in regexp |
| ^ | restricts the match to the start of string |
| $ | restricts the match to the end of string |
| m | flag to match the start/end of line with ^ and $ anchors |
| | \r, \n, \u2028 and \u2029 are line separators |
| | dos-style files use \r\n, may need special attention |
| \b | restricts the match to the start/end of words |
| | word characters: letters, digits, underscore |
| \B | matches wherever \b doesn't match |

In this chapter, you've begun to see the building blocks of regular expressions and how they can be used in interesting ways. At the same time, regular expressions are just one more tool in the land of text processing. Often, you'll get a simpler solution by combining regular expressions with normal string methods. Practice, experience and imagination will help you construct creative solutions. In the coming chapters, you'll see more applications of anchors in combination with other regexp features.

Exercises

a) Check if the given input strings contain 'is' or 'the' as whole words.

> let str1 = 'is; (this)'
> let str2 = "The food isn't good"
> let str3 = 'the2 cats'
> let str4 = 'switch on the light'

> const pat1 =      // add your solution here
> const pat2 =      // add your solution here

> pat1.test(str1) || pat2.test(str1)
< true
> pat1.test(str2) || pat2.test(str2)
< false
> pat1.test(str3) || pat2.test(str3)
< false
> pat1.test(str4) || pat2.test(str4)
< true

b) For the given input string, change only the whole word 'red' to 'brown'.

> let ip = 'bred red spread credible red;'

> ip.replace()       // add your solution here
< "bred brown spread credible brown;"

c) For the given array, filter all elements that contain '42' surrounded by word characters.

> let items = ['hi42bye', 'nice1423', 'bad42', 'cool_42a', 'fake4b']

> items.filter(e => test(e))       // add your solution here
< ["hi42bye", "nice1423", "cool_42a"]

d) For the given input array, filter all elements that start with 'den' or end with 'ly'.

> let items = ['lovely', '1\ndentist', '2 lonely', 'eden', 'fly\n', 'dent']

> items.filter(e => test(e) || test(e))        // add your solution here
< ["lovely", "2 lonely", "dent"]

e) For the given input string, change the whole word 'mall' to '1234' only if it is at the start of a line.

> let para = `ball fall wall tall
mall call ball pall
wall mall ball fall
mallet wallet malls`

> console.log(para.replace())        // add your solution here
< ball fall wall tall
  1234 call ball pall
  wall mall ball fall
  mallet wallet malls

f) For the given array, filter all elements having a line starting with 'den' or ending with 'ly'.

> let items = ['lovely', '1\ndentist', '2 lonely', 'eden', 'fly\nfar', 'dent']

> items.filter(e => test(e) || test(e))      // add your solution here
< ["lovely", "1\ndentist", "2 lonely", "fly\nfar", "dent"]

g) For the given input array, filter all elements that match '12\nthree' as a whole, irrespective of case.

> let items = ['12\nthree\n', '12\nThree', '12\nthree\n4', '12\nthree']

> items.filter(e => test(e))     // add your solution here
< ["12\nThree", "12\nthree"]

h) For the given input array, replace 'hand' with 'X' for all words that start with 'hand' followed by at least one word character.

> let items = ['handed', 'hand', 'handy', 'unhanded', 'handle', 'hand-2']

> items.map(w => w.replace())        // add your solution here
< ["Xed", "hand", "Xy", "unhanded", "Xle", "hand-2"]

i) For the given input array, filter all elements starting with 'h'. Additionally, replace 'e' with 'X' for these filtered elements.

> let items = ['handed', 'hand', 'handy', 'unhanded', 'handle', 'hand-2']

> items.filter(w => test(w)).map(w => w.replace())        // add your solution here
< ["handXd", "hand", "handy", "handlX", "hand-2"]

j) Why does the following code show false instead of true?

> /end$/.test('bend it\nand send\n')
< false