Working with matched portions

You have already seen a few features that can match varying text. In this chapter, you'll learn how to extract and work with those matching portions. Three new methods are introduced. You'll also learn a few tricks like using functions and dictionaries in the replacement section of the replace() method.

match() method

The match() method can be used in different ways. When the g flag isn't used and the regexp succeeds, you get an array object containing various details of the first matching portion.

// note that the 'g' flag isn't used
> 'too soon a song snatch'.match(/so+n/)
< ['soon', index: 4, input: 'too soon a song snatch', groups: undefined]

// to get only the matching portion
> 'too soon a song snatch'.match(/so+n/)[0]
< 'soon'

// a non-regexp argument is implicitly converted using: new RegExp(object)
> 'too soon a song snatch'.match('so+n')
< ['soon', index: 4, input: 'too soon a song snatch', groups: undefined]

The index property gives the starting location of the matched portion. The input property gives the input string on which the match() method was used. If the given regexp fails, the output is null and not an empty array. The groups property will be discussed in the Named capture groups section. See MDN: match for more details and examples.

> let s1 = 'cat and dog'

> s1.match(/dog/).index
< 8
> s1.match(/dog/).input
< 'cat and dog'

> s1.match(/xyz/)
< null
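Since a failed match gives null, accessing [0] or .index directly on the result throws a TypeError. Here is a minimal sketch of one way to guard against that, assuming an environment that supports optional chaining (?.) and nullish coalescing (??). The helper name firstMatch is hypothetical and not part of this chapter.

```javascript
// hypothetical helper: returns the first matched portion or a fallback string
function firstMatch(str, pat, fallback = '') {
    // match() returns null on failure, so ?.[0] evaluates to undefined
    // and ?? then supplies the fallback value
    return str.match(pat)?.[0] ?? fallback
}

const hit = firstMatch('cat and dog', /dog/)            // 'dog'
const miss = firstMatch('cat and dog', /xyz/)           // ''
const custom = firstMatch('cat and dog', /xyz/, 'n/a')  // 'n/a'
```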

search() method

The search() method gives the index of the first matching portion. If the regexp fails, it returns -1 as the output.

// same as: match(/dog/).index
> 'cat and dog'.search(/dog/)
< 8

// cannot use match(/xyz/).index here
// as 'match' returns 'null' when the regexp doesn't match
> 'cat and dog'.search(/xyz/)
< -1
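To make the relationship between the two methods concrete, here is a rough sketch that reproduces the search() result using match(). The function name indexViaMatch is made up for illustration, and the sketch assumes the regexp doesn't have the g flag (with the g flag, match() returns an array without the index property).

```javascript
// hypothetical helper that mimics search() using match()
// assumes 'pat' does not use the 'g' flag
function indexViaMatch(str, pat) {
    const m = str.match(pat)
    // translate the null result of a failed match into -1
    return m === null ? -1 : m.index
}

const found = indexViaMatch('cat and dog', /dog/)     // 8
const notFound = indexViaMatch('cat and dog', /xyz/)  // -1
```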

Capture groups

The () grouping is also known as a capture group. It has multiple uses, one of which is the ability to work with matched portions of those groups. When capture groups are used with the match() method, the matched portions of those groups will also be part of the array output. The first element is always the entire matched portion followed by text captured by groups (if they are present).

> let motivation = 'improve yourself.'

> motivation.match(/pr.*our/)
< ['prove your', index: 2, input: 'improve yourself.', groups: undefined]

// retrieving the entire matched portion
> motivation.match(/pr.*our/)[0]
< 'prove your'

Here's an example with capture groups. The leftmost ( will get group number 1, second leftmost ( will get group number 2 and so on.

> let purchase = 'coffee:100g tea:250g sugar:75g chocolate:50g'

// there are three capture groups used here
> let m = purchase.match(/:(.*?)g.*?:(.*?)g.*?chocolate:(.*?)g/)
> m
< [':100g tea:250g sugar:75g chocolate:50g', '100', '250', '50', index: 6,
   input: 'coffee:100g tea:250g sugar:75g chocolate:50g', groups: undefined]

// capture group portions
> m[1]
< '100'
> m[3]
< '50'
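Instead of indexing into the result, you can also name the captured portions using array destructuring. The sketch below reuses the purchase example; the variable names are chosen for illustration only. Note that destructuring throws a TypeError if the regexp fails, since match() returns null in that case.

```javascript
const purchase = 'coffee:100g tea:250g sugar:75g chocolate:50g'
const pat = /:(.*?)g.*?:(.*?)g.*?chocolate:(.*?)g/

// the leading comma skips index 0 (the entire matched portion)
const [, coffee, tea, chocolate] = purchase.match(pat)
// coffee is '100', tea is '250' and chocolate is '50'

// captured portions are strings, so convert them before doing arithmetic
const total = Number(coffee) + Number(tea) + Number(chocolate)
// total is 400
```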

d flag

You can use the d flag to get both the starting and ending locations of the matching portions. The ending location is exclusive, i.e. it is the starting index plus the length of the matching portion. Here's an example:

// note the addition of the 'indices' property
> 'awesome'.match(/so/d)
< ['so', index: 3, input: 'awesome', groups: undefined, indices: Array(1)]

// start and end+1 location of the entire matching portion
> 'awesome'.match(/so/d).indices[0]
< [3, 5]

And here's an example when capture groups are used as well:

> 'coffee:100g tea:250g'.match(/:(.*?)g/d)
< [':100g', '100', index: 6, input: 'coffee:100g tea:250g',
   groups: undefined, indices: Array(2)]

// locations for the entire match
> 'coffee:100g tea:250g'.match(/:(.*?)g/d).indices[0]
< [6, 11]

// locations for the first capture group
> 'coffee:100g tea:250g'.match(/:(.*?)g/d).indices[1]
< [7, 10]
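Because the ending location is exclusive, each pair from the indices property can be passed as is to the slice() string method. Here is a small sketch, assuming a JavaScript engine that supports the d flag; the replacement value '500' is arbitrary and used only for illustration.

```javascript
const s = 'coffee:100g tea:250g'
const m = s.match(/:(.*?)g/d)

// indices[1] holds the start and end (exclusive) of the first capture group
const [start, end] = m.indices[1]
const viaSlice = s.slice(start, end)
// viaSlice is '100', same as m[1]

// replace only the captured span and keep the rest of the input intact
const updated = s.slice(0, start) + '500' + s.slice(end)
// updated is 'coffee:500g tea:250g'
```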

Getting all the matched portions

The match() method returns all the matched portions when the g flag is used. Capture group portions and the three properties won't be part of the output.

> 'too soon a song snatch'.match(/so*n/g)
< ['soon', 'son', 'sn']

> 'too soon a song snatch'.match(/so+n/g)
< ['soon', 'son']

> 'PAR spar apparent SpArE part pare'.match(/\bs?pare?\b/ig)
< ['PAR', 'spar', 'SpArE', 'pare']

// entire matching portion is returned even if capture groups are used
> 'par spar apparent spare part'.match(/\bs?par(e|t)\b/g)
< ['spare', 'part']

This is useful for debugging as well, for example, to see the potential matches before using the replace() method.

> 'green:3.14:teal::brown:oh!:blue'.match(/:.*:/g)
< [':3.14:teal::brown:oh!:']

> 'green:3.14:teal::brown:oh!:blue'.match(/:.*?:/g)
< [':3.14:', '::', ':oh!:']
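Keep in mind that match() gives null even with the g flag when there are no matches, so code that expects an array needs a fallback. Here is a minimal sketch of counting matches, assuming support for the ?? operator. The helper name countMatches is hypothetical.

```javascript
// hypothetical helper: number of matches for a regexp that uses the 'g' flag
function countMatches(str, pat) {
    // match() gives null (not an empty array) when nothing matches
    return (str.match(pat) ?? []).length
}

const three = countMatches('too soon a song snatch', /so*n/g)  // 3
const zero = countMatches('too soon a song snatch', /xyz/g)    // 0
```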

matchAll() method

If you need the capture group portions and properties for every match with the g flag active, use the matchAll() method. An iterator will be returned as the output.

> 'song too soon snatch'.matchAll(/so*n/g)
< RegExpStringIterator {}

// convert the iterator result to an array of arrays
> let arr = [...'song too soon snatch'.matchAll(/so*n/g)]
> arr
< (3) [Array(1), Array(1), Array(1)]
  0: ['son', index: 0, input: 'song too soon snatch', groups: undefined]
  1: ['soon', index: 9, input: 'song too soon snatch', groups: undefined]
  2: ['sn', index: 14, input: 'song too soon snatch', groups: undefined]
  length: 3
  [[Prototype]]: Array(0)

// get array with details for the first match
> arr[0]
< ['son', index: 0, input: 'song too soon snatch', groups: undefined]
// get starting index for the second match
> arr[1].index
< 9
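If you only need to visit each match once, the iterator can be consumed directly with a for...of loop, without building an intermediate array. A short sketch using the same input string (the report variable is just for illustration):

```javascript
const text = 'song too soon snatch'
const report = []

// each 'm' has the same shape as the match() output without the 'g' flag
for (const m of text.matchAll(/so*n/g)) {
    report.push(`${m[0]} at ${m.index}`)
}
// report is ['son at 0', 'soon at 9', 'sn at 14']
```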

You can also use Array.from() to convert the iterator to an array object. This allows you to provide a mapping function as the second argument.

// same as: match(/so*n/g)
> Array.from('song too soon snatch'.matchAll(/so*n/g), m => m[0])
< ['son', 'soon', 'sn']
// get starting index for each match
> Array.from('song too soon snatch'.matchAll(/so*n/g), m => m.index)
< [0, 9, 14]

// get only the capture group portions as an array for each match
> Array.from('2023/04,1986/Mar,'.matchAll(/(.*?)\/(.*?),/g), m => m.slice(1))
< (2) [Array(2), Array(2)]
  0: (2) ['2023', '04']
  1: (2) ['1986', 'Mar']
  length: 2
  [[Prototype]]: Array(0)

Before the introduction of the matchAll() method, the exec() method had to be used. See MDN: exec for details and examples.
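For comparison, here is a rough sketch of the exec() based loop. With the g flag, each exec() call resumes from the regexp's lastIndex property and returns null after the last match. The variable names are illustrative only.

```javascript
const pat = /so*n/g
const str = 'song too soon snatch'
const details = []

let m
// keep calling exec() until it signals that there are no more matches
while ((m = pat.exec(str)) !== null) {
    details.push([m[0], m.index])
}
// details is [['son', 0], ['soon', 9], ['sn', 14]]
```

This pattern never matches an empty string. A pattern that can match empty text needs extra care with lastIndex to avoid an infinite loop, which is one reason matchAll() is more convenient.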

split() with capture groups

Capture groups affect the split() method as well. If the pattern used to split contains capture groups, the portions matched by those groups will also be a part of the output array.

// without capture group
> '31111111111251111426'.split(/1*4?2/)
< ['3', '5', '6']

// to include the matching portions of the pattern as well in the output
> '31111111111251111426'.split(/(1*4?2)/)
< ['3', '11111111112', '5', '111142', '6']

If part of the pattern is outside a capture group, the text thus matched won't be in the output. If a capture group didn't participate, it will be represented by undefined in the output array.

// here 4?2 is outside the capture group, so that portion won't be in the output
> '31111111111251111426'.split(/(1*)4?2/)
< ['3', '1111111111', '5', '1111', '6']

// multiple capture groups example
// note that the portion matched by b+ isn't present in the output
> '3.14aabccc42'.split(/(a+)b+(c+)/)
< ['3.14', 'aa', 'ccc', '42']

// here (4)? matches zero times on the first occasion
> '31111111111251111426'.split(/(1*)(4)?2/)
< ['3', '1111111111', undefined, '5', '1111', '4', '6']
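If the undefined entries are not wanted, one option is to filter them out after splitting. A small sketch, reusing the last example:

```javascript
const parts = '31111111111251111426'.split(/(1*)(4)?2/)
// parts is ['3', '1111111111', undefined, '5', '1111', '4', '6']

// drop the placeholders for groups that didn't participate
// empty strings, if any, are retained
const cleaned = parts.filter(p => p !== undefined)
// cleaned is ['3', '1111111111', '5', '1111', '4', '6']
```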

Use of capture groups and the optional limit argument can help you partition an input string into three parts:

portion before the first match
portion matched by the pattern itself
portion after the pattern

// use the 's' flag as well if needed
> '3.14aabccc42abc88'.split(/(a+b+c+)(.*)/, 3)
< ['3.14', 'aabccc', '42abc88']
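When the pattern doesn't match at all, split() returns an array with the entire input string as the only element. Here is a sketch of a partition helper that accounts for that case using destructuring defaults. The function name partition is hypothetical, and it assumes the given regexp has exactly two capture groups, with (.*) as the second one.

```javascript
// hypothetical helper, 'pat' must contain exactly two capture groups:
// the first for the separator and the second, (.*), for everything after it
function partition(str, pat) {
    // if 'pat' doesn't match, split() returns [str] and the defaults apply
    const [before, matched = '', after = ''] = str.split(pat, 3)
    return [before, matched, after]
}

const p1 = partition('3.14aabccc42abc88', /(a+b+c+)(.*)/)
// p1 is ['3.14', 'aabccc', '42abc88']
const p2 = partition('3.14', /(a+b+c+)(.*)/)
// p2 is ['3.14', '', '']
```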

Using functions in the replacement section

Sometimes, a simple replacement string isn't enough and you need to do some processing on the matched portion. For such cases, you can use functions in the replacement section. The arguments available to the function are similar to the details provided by the match() method. The first one is the entire matched portion. If capture groups are used, the portions matched by those groups will be next. Then comes the index of the matched portion and finally the input string. Depending on the complexity, you can use fully defined functions or arrow function expressions.

> function titleCase(m) {
      return m[0].toUpperCase() + m.slice(1).toLowerCase()
  }

// only the function name is enough as the second argument
// the matched portion details will be passed automatically to the function
// in this example, 'titleCase' is using only the entire matched portion
> 'aBc ac ADC aBbBC'.replace(/a.*?c/ig, titleCase)
< 'Abc Ac Adc Abbbc'

// can also use arrow function expressions for simple cases
> 'abc ac adc abbbc'.replace(/ab*c/g, m => m.toUpperCase())
< 'ABC AC adc ABBBC'

// \d will be covered later
// for now, it is enough to know that it will match all the digit characters
> '1 42 317'.replace(/\d+/g, m => m*2)
< '2 84 634'

Here's an example with capture groups. See also MDN: replace for more details.

> function titleCase(m, g1, g2) {
      return g1.toUpperCase() + g2.toLowerCase()
  }
> 'aBc ac ADC aBbBC'.replace(/(a)(.*?c)/ig, titleCase)
< 'Abc Ac Adc Abbbc'
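The examples so far used only the matched portions. The following sketch also looks at the index argument, which comes after the capture groups. The input string and variable names are chosen for illustration.

```javascript
const ip = 'coffee:100g tea:250g'

// arguments: entire match, first capture group, index of the match, input string
const calls = []
const unchanged = ip.replace(/:(.*?)g/g, (m, g1, offset, input) => {
    calls.push([m, g1, offset])
    return m    // returning the match itself leaves the string as is
})
// calls is [[':100g', '100', 6], [':250g', '250', 15]]

// using only the capture group: double each quantity
const doubled = ip.replace(/:(.*?)g/g, (m, g1) => `:${g1 * 2}g`)
// doubled is 'coffee:200g tea:500g'
```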

Using dictionary in the replacement section

Sometimes, the functionality you need in the replacement section can be simplified by using a dictionary. The matched portion acts as the key to get the corresponding value from the dictionary.

// one to one mappings
> let h = { '1': 'one', '2': 'two', '4': 'four' }

> '9234012'.replace(/1|2|4/g, k => h[k])
< '9two3four0onetwo'

// if the matched text doesn't exist as a key, the default value will be used
// recall that \d matches all the digit characters
> '9234012'.replace(/\d/g, k => k in h ? h[k] : 'X')
< 'XtwoXfourXonetwo'
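One caveat with plain objects is that the in operator also reports inherited properties such as toString. If the matched text could collide with such names, a Map is one possible alternative. This is a sketch, assuming support for the ?? operator and Object.hasOwn(); the variable names are illustrative.

```javascript
const words = new Map([['1', 'one'], ['2', 'two'], ['4', 'four']])

// get() gives undefined for a missing key, so ?? supplies the default value
const spelled = '9234012'.replace(/\d/g, k => words.get(k) ?? 'X')
// spelled is 'XtwoXfourXonetwo'

// pitfall with plain objects: 'in' also sees inherited properties
const h = { '1': 'one' }
const inherited = 'toString' in h           // true, though h has no such own key
const own = Object.hasOwn(h, 'toString')    // false
```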

For swapping two or more portions without using intermediate results, using a dictionary is recommended.

> let swap = { 'cat': 'tiger', 'tiger': 'cat' }

> 'cat tiger dog tiger cat'.replace(/cat|tiger/g, k => swap[k])
< 'tiger cat dog cat tiger'

For a dictionary that has many entries and is likely to change during development, building the alternation list manually is not a good choice. Also, recall that as per precedence rules, the longest string should come first. The unionRegExp() function, introduced in the Dynamically building alternation section, is helpful here.

> let d = { 'hand': 1, 'handy': 2, 'handful': 3, 'a^b': 4 }

// sort the keys to handle precedence rules
// add anchors if needed
> const p = unionRegExp(Object.keys(d).sort((a, b) => b.length - a.length))
> console.log(p)
< handful|handy|hand|a\^b

> 'handful hand pin handy (a^b)'.replace(new RegExp(p, 'g'), k => d[k])
< '3 1 pin 2 (4)'
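The unionRegExp() function is defined in an earlier section. To make the above example self-contained, here is a rough stand-in that escapes the regexp metacharacters in each string and joins them with |. The names escapeMeta and unionSource are hypothetical, and this sketch is not necessarily identical to the original definition.

```javascript
// hypothetical stand-ins for helpers defined elsewhere
// escapes characters that have special meaning in a regexp
function escapeMeta(s) {
    return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}
// joins the escaped strings with '|' and returns the pattern as a string
function unionSource(arr) {
    return arr.map(escapeMeta).join('|')
}

const d = { 'hand': 1, 'handy': 2, 'handful': 3, 'a^b': 4 }
// longest keys first, as per the precedence rules
const keys = Object.keys(d).sort((a, b) => b.length - a.length)
const src = unionSource(keys)
// src holds the text: handful|handy|hand|a\^b

const out = 'handful hand pin handy (a^b)'.replace(new RegExp(src, 'g'), k => d[k])
// out is '3 1 pin 2 (4)'
```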

Cheatsheet and Summary

Note	Description
`m = s.match(/pat/)`	assuming the `g` flag isn't used and regexp succeeds,
	returns an array with the matched portion and 3 properties
	`index` property gives the starting location of the match
	`input` property gives the input string `s`
	`groups` property gives dictionary of named capture groups
`m[0]`	for the above case, gives the entire matched portion
`m[1]`	matched portion of the first capture group
`m[2]`	matched portion of the second capture group and so on
`d`	flag to get the starting and ending locations
	of the matching portions via the `indices` property
`s.match(/pat/g)`	returns only the matched portions, no properties
	capture groups don't affect the output
	`match` returns `null` if the regexp fails
`s.matchAll(/pat/g)`	returns an iterator containing details for
	each matched portion and its properties
	use `[...]` or `Array.from()` to convert to an array
	`Array.from()` also allows a mapping function
`s.replace(/pat/, func)`	you can use a function to provide the replacement string
	details of each matched portion get passed as arguments
	similarly, a dictionary can be used for replacement
`s.search(/pat/)`	gives the starting location of the first match if regexp succeeds
	`-1` if regexp fails
`split()`	capture groups affect the `split()` method too
	text matched by the groups will be part of the output
	portion matched by pattern outside groups won't be in output
	group that didn't match will be represented by `undefined`

This chapter introduced the match() and matchAll() methods, which allow you to work with various matching portions of the input string. The search() method is handy if you just need the starting location of the first match. You can use the d flag if both the starting and ending locations are needed. The replace() method allows you to use a function in the replacement section, which helps to process the matching portions before being used as the replacement string. You can also use a dictionary to provide the replacement string based on the matched portion as keys. You learnt about capture groups and you'll see even more uses of groupings in the coming chapters.

Exercises

1) For the given strings, extract the matching portion from the first is to the last t.

> let str1 = 'What is the biggest fruit you have seen?'
> let str2 = 'Your mission is to read and practice consistently'

> const pat1 =      // add your solution here

// add your solution here for str1
< 'is the biggest fruit'
// add your solution here for str2
< 'ission is to read and practice consistent'

2) Find the starting index of the first occurrence of is or the or was or to for the given input strings. Assume that there will be at least one match for each input string.

> let s1 = 'match after the last newline character'
> let s2 = 'and then you want to test'
> let s3 = 'this is good bye then'
> let s4 = 'who was there to see?'

> const pat2 =      // add your solution here

// add your solution here for s1
< 12
// add your solution here for s2
< 4
// add your solution here for s3
< 2
// add your solution here for s4
< 4

3) Find the starting index of the last occurrence of is or the or was or to for the given input strings. Assume that there will be at least one match for each input string.

> let s1 = 'match after the last newline character'
> let s2 = 'and then you want to test'
> let s3 = 'this is good bye then'
> let s4 = 'who was there to see?'

> const pat3 =      // add your solution here

// add your solution here for s1
< 12
// add your solution here for s2
< 18
// add your solution here for s3
< 17
// add your solution here for s4
< 14

4) The given input string contains : exactly once. Extract all characters after the : as output.

> let ip = 'fruits:apple, mango, guava, blueberry'

// add your solution here
< 'apple, mango, guava, blueberry'

5) Extract all words between ( and ) from the given input string as an array (including the parentheses). Assume that the input will not contain any broken parentheses.

> let ip = 'another (way) to reuse (portion) matched (by) capture groups'

// add your solution here
< ['(way)', '(portion)', '(by)']

6) Extract all occurrences of < up to the next occurrence of >, provided there is at least one character in between < and >.

> let ip = 'a<apple> 1<> b<bye> 2<> c<cat>'

// add your solution here
< ['<apple>', '<> b<bye>', '<> c<cat>']

7) Use matchAll() to get the output as shown below for the given input strings. Note the characters used in the input strings carefully.

> let row1 = '-2,5 4,+3 +42,-53 4356246,-357532354 '
> let row2 = '1.32,-3.14 634,5.63 63.3e3,9907809345343.235 '

> const pat4 =      // add your solution here

// add your solution here for row1
< (4) [Array(2), Array(2), Array(2), Array(2)]
  0: (2) ['-2', '5']
  1: (2) ['4', '+3']
  2: (2) ['+42', '-53']
  3: (2) ['4356246', '-357532354']
  length: 4
  [[Prototype]]: Array(0)

// add your solution here for row2
< (3) [Array(2), Array(2), Array(2)]
  0: (2) ['1.32', '-3.14']
  1: (2) ['634', '5.63']
  2: (2) ['63.3e3', '9907809345343.235']
  length: 3
  [[Prototype]]: Array(0)

8) This is an extension to the previous question. Sum each pair of numbers that are separated by a comma.

For row1, find the sum of integers. For example, sum of -2 and 5 is 3.
For row2, find the sum of floating-point numbers. For example, sum of 1.32 and -3.14 is -1.82.

> let row1 = '-2,5 4,+3 +42,-53 4356246,-357532354 '
> let row2 = '1.32,-3.14 634,5.63 63.3e3,9907809345343.235 '

// should be same as the previous question
> const pat5 =      // add your solution here

// add your solution here for row1
< [3, 7, -11, -353176108]

// add your solution here for row2
< [-1.82, 639.63, 9907809408643.234]

9) Use the split() method to get the output as shown below.

> let ip = '42:no-output;1000:car-tr:u-ck;SQEX49801'

// add your solution here
< ['42', 'output', '1000', 'tr:u-ck', 'SQEX49801']

10) Write a string function that changes the given input to alternate case. The first letter should be changed to lowercase, the next one to uppercase, then lowercase and so on. Characters other than letters should be left alone and should not affect the case alternation.

> function aLtErNaTeCaSe(ip) {
      // add your solution here
  }

> aLtErNaTeCaSe('HI THERE!')
< 'hI tHeRe!'
> aLtErNaTeCaSe('good morning')
< 'gOoD mOrNiNg'
> aLtErNaTeCaSe('Sample123string42with777numbers')
< 'sAmPlE123sTrInG42wItH777nUmBeRs'

11) Replace all occurrences of par with spar, spare with extra and park with garden.

> let s1 = 'apartment has a park'
> let s2 = 'do you have a spare cable'
> let s3 = 'write a parser'

> let d1 =          // add your solution here
> const pat6 =      // add your solution here

> s1.replace(pat6, k => d1[k])
< 'aspartment has a garden'
> s2.replace(pat6, k => d1[k])
< 'do you have a extra cable'
> s3.replace(pat6, k => d1[k])
< 'write a sparser'

12) Name the flag and property you can use with the match() method to get both the starting and ending locations of the matched portions.

Understanding JavaScript RegExp