Working with matched portions

Having seen a few regexp features that can match varying text, you'll learn how to extract and work with those matching portions in this chapter. Three new methods (match, search and matchAll) are introduced. You'll also learn a few tricks like using functions and dictionaries in the replacement section of the replace method.

match method

The match method can be used in different ways. When the g flag isn't used and the regexp succeeds, you get an array object containing various details of the first matching portion.

// note that 'g' flag isn't used
> 'abc ac adc abbbc'.match(/ab*c/)
< ["abc", index: 0, input: "abc ac adc abbbc", groups: undefined]

// to get only the matching portion
> 'abc ac adc abbbc'.match(/ab*c/)[0]
< "abc"

// non-regexp object will get processed as: RegExp(object)
> 'abc ac adc abbbc'.match('ab*c')
< ["abc", index: 0, input: "abc ac adc abbbc", groups: undefined]

The index property gives the starting location of the matched portion. The input property gives the input string on which the match method was used. If the given regexp fails, the output is null and not an empty array. The groups property will be discussed in the Named capture groups section. See MDN: match for more details and examples.

> let s1 = 'cat and dog'

> s1.match(/dog/).index
< 8
> s1.match(/dog/).input
< "cat and dog"

> s1.match(/xyz/)
< null
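
Since match returns null when the regexp fails, indexing the result directly would throw a TypeError. Here's a minimal sketch of one way to guard against that, assuming your environment supports optional chaining (?.) and the nullish coalescing operator (??). The variable names are made up for this illustration.

```javascript
const text = 'cat and dog'

// optional chaining gives undefined instead of throwing a TypeError
const found = text.match(/dog/)?.[0]
const missing = text.match(/xyz/)?.[0]

// nullish coalescing supplies a fallback value when there is no match
const fallback = text.match(/xyz/)?.[0] ?? 'no match'

console.log(found)      // dog
console.log(missing)    // undefined
console.log(fallback)   // no match
```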

search method

The search method gives the index of the first matching portion. If the regexp fails, it returns -1 as output.

// same as: match(/dog/).index
> 'cat and dog'.search(/dog/)
< 8
// cannot use match(/xyz/).index here
// as 'match' returns 'null' if regexp doesn't match
> 'cat and dog'.search(/xyz/)
< -1
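
As a small sketch of putting the return value to use, the helper below slices the input from the first match onwards and checks for -1 before doing so. The name fromFirstMatch is hypothetical and not part of any library.

```javascript
// hypothetical helper: returns the input starting from the first match,
// or an empty string if the regexp doesn't match at all
function fromFirstMatch(s, re) {
    const idx = s.search(re)
    return idx === -1 ? '' : s.slice(idx)
}

console.log(fromFirstMatch('cat and dog', /and/))   // and dog
console.log(fromFirstMatch('cat and dog', /xyz/))   // (empty string)
```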

Capture groups

The regexp grouping inside () is also known as a capture group. It has multiple uses, one of which is the ability to work with matched portions of those groups. When capture groups are used with match method, the matched portions of those groups will also be part of the array output. The first element is always the entire matched portion followed by portions of capture groups. The leftmost ( will get group number 1, second leftmost ( will get group number 2 and so on.

// there are two capture groups used here
> 'abc ac adc abbbc'.match(/a(.*)d(.*a)/)
< ["abc ac adc a", "bc ac a", "c a", index: 0,
   input: "abc ac adc abbbc", groups: undefined]

// entire matched portion
> 'abc ac adc abbbc'.match(/a(.*)d(.*a)/)[0]
< "abc ac adc a"

// capture group portions
> 'abc ac adc abbbc'.match(/a(.*)d(.*a)/)[1]
< "bc ac a"
> 'abc ac adc abbbc'.match(/a(.*)d(.*a)/)[2]
< "c a"
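
If you need several of these portions at once, array destructuring can save you from running the same match repeatedly. A short sketch (the variable names are only for illustration):

```javascript
const m = 'abc ac adc abbbc'.match(/a(.*)d(.*a)/)

// first element is the entire match, followed by the capture groups
const [whole, g1, g2] = m

console.log(whole)   // abc ac adc a
console.log(g1)      // bc ac a
console.log(g2)      // c a
```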

Getting all matched portions

The match method returns all the matched portions when the g flag is used. Capture group portions and the three properties won't be part of the output.

> 'abc ac adc abbbc'.match(/ab*c/g)
< ["abc", "ac", "abbbc"]

> 'abc ac adc abbbc'.match(/ab+c/g)
< ["abc", "abbbc"]

> 'par spar apparent spare part'.match(/\bs?pare?\b/g)
< ["par", "spar", "spare"]

// entire matching portion is returned even if capture group is used
> 'par spar apparent spare part'.match(/\bs?par(e|t)\b/g)
< ["spare", "part"]

It is a useful method for debugging purposes as well, for example to see what is going on under the hood before using replace method.

> 'that is quite a fabricated tale'.match(/t.*a/g)
< ["that is quite a fabricated ta"]
> 'that is quite a fabricated tale'.match(/t.*?a/g)
< ["tha", "t is quite a", "ted ta"]
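
Note that match returns null when nothing matches, even with the g flag. Here's a minimal sketch of counting matches with that in mind, assuming support for the ?? operator. The helper name countMatches is hypothetical.

```javascript
// hypothetical helper: number of non-overlapping matches
// assumes 're' has the 'g' flag set
function countMatches(s, re) {
    // match returns null (not an empty array) when nothing matches
    return (s.match(re) ?? []).length
}

console.log(countMatches('abc ac adc abbbc', /ab*c/g))   // 3
console.log(countMatches('abc ac adc abbbc', /xyz/g))    // 0
```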

matchAll method

If you need capture group portions and properties for every match with g flag active, use the matchAll method. The return value is an iterator.

> 'abc ac adc abbbc'.matchAll(/ab*c/g)
< RegExpStringIterator {}

// convert the iterator result to array of arrays
> let arr = [...'abc ac adc abbbc'.matchAll(/ab*c/g)]
> arr
< (3) [Array(1), Array(1), Array(1)]
  0: ["abc", index: 0, input: "abc ac adc abbbc", groups: undefined]
  1: ["ac", index: 4, input: "abc ac adc abbbc", groups: undefined]
  2: ["abbbc", index: 11, input: "abc ac adc abbbc", groups: undefined]
  length: 3
  __proto__: Array(0)

// get array with details for first match
> arr[0]
< ["abc", index: 0, input: "abc ac adc abbbc", groups: undefined]
// get index for second match
> arr[1].index
< 4
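
Since the return value is an iterator, you can loop over it directly with for...of without building an array first. A small sketch (the lines array is only there to collect the output):

```javascript
const lines = []

// iterate over the matches directly, no conversion to array needed
for (const m of 'abc ac adc abbbc'.matchAll(/ab*c/g)) {
    lines.push(`${m[0]} found at index ${m.index}`)
}

console.log(lines.join('\n'))
// abc found at index 0
// ac found at index 4
// abbbc found at index 11
```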

You can also use Array.from() to convert the iterator to array object. Array.from allows you to provide a mapping function as second argument.

// same as: match(/ab*c/g)
> Array.from('abc ac adc abbbc'.matchAll(/ab*c/g), m => m[0])
< ["abc", "ac", "abbbc"]
// get index for each match
> Array.from('abc ac adc abbbc'.matchAll(/ab*c/g), m => m.index)
< [0, 4, 11]

// get only capture group portions as an array for each match
> Array.from('xx:yyy x: x:yy :y'.matchAll(/(x*):(y*)/g), m => m.slice(1))
< (4) [Array(2), Array(2), Array(2), Array(2)]
  0: (2) ["xx", "yyy"]
  1: (2) ["x", ""]
  2: (2) ["x", "yy"]
  3: (2) ["", "y"]
  length: 4
  __proto__: Array(0)

Before the introduction of the matchAll method, the exec method had to be used. See MDN: exec for details and examples.
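
As a rough sketch of that older approach, the loop below calls exec repeatedly on a regexp with the g flag set. Each call resumes from the lastIndex position of the regexp object and returns null when there are no more matches. The variable names are made up, and this simple loop assumes the pattern never matches an empty string.

```javascript
const ip = 'abc ac adc abbbc'
const re = /ab*c/g    // 'g' flag is needed so that lastIndex advances
const details = []
let m

// exec returns null once there are no more matches
while ((m = re.exec(ip)) !== null) {
    details.push([m[0], m.index])
}

console.log(details)   // [["abc", 0], ["ac", 4], ["abbbc", 11]]
```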

split with capture groups

Capture groups affect the split method as well. If the pattern used to split contains capture groups, the portions matched by those groups will also be a part of the output array.

// without capture group
> '31111111111251111426'.split(/1*4?2/)
< ["3", "5", "6"]

// to include the matching portions of the pattern as well in the output
> '31111111111251111426'.split(/(1*4?2)/)
< ["3", "11111111112", "5", "111142", "6"]

If part of the pattern is outside a capture group, the text thus matched won't be in the output. If a capture group didn't participate, it will be represented by undefined in the output array.

// here 4?2 is outside capture group, so that portion won't be in output
> '31111111111251111426'.split(/(1*)4?2/)
< ["3", "1111111111", "5", "1111", "6"]

// multiple capture groups example
// note that the portion matched by b+ isn't present in the output
> '3.14aabccc42'.split(/(a+)b+(c+)/)
< ["3.14", "aa", "ccc", "42"]

// here (4)? matches zero times on the first occasion
> '31111111111251111426'.split(/(1*)(4)?2/)
< ["3", "1111111111", undefined, "5", "1111", "4", "6"]

Use of capture groups and optional limit argument can help you partition an input string into three parts:

  • portion before the first match
  • portion matched by the pattern itself
  • portion after the pattern

// use 's' flag as well if needed
> '3.14aabccc42abc88'.split(/(a+b+c+)(.*)/, 3)
< ["3.14", "aabccc", "42abc88"]
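
The same idea can be wrapped in a reusable function. The sketch below is only an illustration: partition is a hypothetical name, and it assumes the regexp passed in has no capture groups of its own, since those would add extra elements to the output.

```javascript
// hypothetical helper: split 's' around the first match of 're'
// 're' is expected to be a RegExp without capture groups of its own
function partition(s, re) {
    // wrap the pattern in a capture group and add (.*) for the remainder
    // the 's' flag lets . match newline characters as well
    const flags = re.flags.includes('s') ? re.flags : re.flags + 's'
    const wrapped = new RegExp(`(${re.source})(.*)`, flags)
    return s.split(wrapped, 3)
}

console.log(partition('3.14aabccc42abc88', /a+b+c+/))
// ["3.14", "aabccc", "42abc88"]
```

If the regexp doesn't match, the result is an array with the input string as its only element.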

Using function in replacement section

Sometimes, a simple replacement string isn't enough and you need to do some processing on the matched portion. For such cases, you can use a function in the replacement section. The arguments available to the function are similar to the details provided by the match method. The first one is the entire matched portion. If capture groups are used, the portions matched by those groups come next. Then comes the index of the matched portion and finally the input string. Depending on the complexity, you can use a fully defined function or an arrow function expression.

> function titleCase(m) {
      return m[0].toUpperCase() + m.slice(1).toLowerCase()
  }

// only function name is enough as second argument
// the matched portion details will be passed automatically to the function
// in this example, 'titleCase' is using only the entire matched portion
> 'aBc ac ADC aBbBC'.replace(/a.*?c/ig, titleCase)
< "Abc Ac Adc Abbbc"

// can also use arrow function expressions for simple cases
> 'abc ac adc abbbc'.replace(/ab*c/g, m => m.toUpperCase())
< "ABC AC adc ABBBC"

// \d will be covered later
// for now, it is enough to know that it will match all digit characters
> '1 42 317'.replace(/\d+/g, m => m*2)
< "2 84 634"

Here's an example with capture groups. See also MDN: replace for more details.

> function titleCase(m, g1, g2) {
      return g1.toUpperCase() + g2.toLowerCase()
  }
> 'aBc ac ADC aBbBC'.replace(/(a)(.*?c)/ig, titleCase)
< "Abc Ac Adc Abbbc"
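
The index argument hasn't been used in the examples so far. Here's a short sketch showing where it lands in the argument list, with and without a capture group. The parameter names are arbitrary, only their order matters.

```javascript
// no capture groups: arguments are (match, index, input)
const a = 'cat and dog'.replace(/cat|dog/g, (m, idx) => `${m}@${idx}`)
console.log(a)   // cat@0 and dog@8

// one capture group: arguments are (match, group1, index, input)
const b = 'abc ac adc abbbc'.replace(/a(b*)c/g,
                                     (m, g1, idx) => `${idx}:${g1.length}`)
console.log(b)   // 0:1 4:0 adc 11:3
```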

Using dictionary in replacement section

Sometimes, the functionality you need in the replacement section can be simplified to using a dictionary. The matched portion acts as the key to get the corresponding value from the dictionary.

// one to one mappings
> let h = { '1': 'one', '2': 'two', '4': 'four' }

> '9234012'.replace(/1|2|4/g, k => h[k])
< "9two3four0onetwo"

// providing a default value if the matched text doesn't exist as a key
> '9234012'.replace(/\d/g, k => k in h ? h[k] : 'X')
< "XtwoXfourXonetwo"
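
One caveat with the in operator is that it also finds inherited properties such as constructor and toString. That doesn't matter for digit keys, but it can for arbitrary words. A possible workaround, shown here as a sketch with a hypothetical lookup helper, is to check only the object's own keys.

```javascript
const h = { '1': 'one', '2': 'two', '4': 'four' }

// 'in' also finds inherited properties such as 'constructor'
console.log('constructor' in h)   // true

// checking only the object's own keys avoids such accidental hits
const lookup = k => Object.prototype.hasOwnProperty.call(h, k) ? h[k] : 'X'

console.log('9234012'.replace(/\d/g, lookup))
// XtwoXfourXonetwo
console.log('constructor 1'.replace(/constructor|\d/g, lookup))
// X one
```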

For swapping two or more strings without using an intermediate result, using a dictionary is recommended.

> let swap = { 'cat': 'tiger', 'tiger': 'cat' }

> 'cat tiger dog tiger cat'.replace(/cat|tiger/g, k => swap[k])
< "tiger cat dog cat tiger"

For a dictionary that has many entries and is likely to undergo changes during development, building the alternation list manually is not a good choice. Also, recall that as per the precedence rules, longer strings should come first. The unionRegExp function, introduced in the Dynamically building alternation section, is helpful here.

> let d = { 'hand': 1, 'handy': 2, 'handful': 3, 'a^b': 4 }

> const p = unionRegExp(Object.keys(d).sort((a, b) => b.length - a.length))
> console.log(p)
< handful|handy|hand|a\^b
> 'handful hand pin handy (a^b)'.replace(new RegExp(p, 'g'), k => d[k])
< "3 1 pin 2 (4)"
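
The unionRegExp function is defined in another section and isn't repeated here. If you want to run the above example on its own, the following stand-in is consistent with the output shown. It escapes regexp metacharacters in each string and joins them with the | metacharacter. Treat it as an assumption about the helper's behavior rather than its actual definition. The escapeRegExp name is likewise only illustrative.

```javascript
// assumed stand-in for the helper from the other section:
// escapes regexp metacharacters in each string, then joins them with |
function escapeRegExp(string) {
    return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

function unionRegExp(arr) {
    return arr.map(w => escapeRegExp(w)).join('|')
}

const d = { 'hand': 1, 'handy': 2, 'handful': 3, 'a^b': 4 }

// longer keys first, so that 'hand' doesn't win over 'handy' or 'handful'
const p = unionRegExp(Object.keys(d).sort((a, b) => b.length - a.length))
console.log(p)     // handful|handy|hand|a\^b

const out = 'handful hand pin handy (a^b)'.replace(new RegExp(p, 'g'), k => d[k])
console.log(out)   // 3 1 pin 2 (4)
```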

Cheatsheet and Summary

Note                      Description
m = s.match(/pat/)        assuming g flag isn't used and regexp succeeds,
                          returns an array with matched portion and 3 properties
                          index property gives the starting location of the match
                          input property gives the input string s
                          groups property gives dictionary of named capture groups
m[0]                      for above case, gives entire matched portion
m[1]                      matched portion of first capture group
m[2]                      matched portion of second capture group and so on
s.match(/pat/g)           returns only the matched portions, no properties
                          capture group doesn't affect the output
                          match returns null if regexp fails
s.matchAll(/pat/g)        returns an iterator containing details for
                          each matched portion and its properties
                          use [...] or Array.from to convert to array
                          Array.from also allows mapping function
s.replace(/pat/, func)    you can use a function to provide replacement string
                          each matched portion details gets passed as arguments
                          similarly, dictionary can be used for replacement
s.search(/pat/)           gives starting location of first match if regexp succeeds
                          -1 if regexp fails

This chapter introduced the match and matchAll methods, which allow you to work with various matching portions of the input string. The search method is handy if you just need the starting location of the first match. The replace method allows you to use a function as the replacement, which helps to process the matching portions before they are used as the replacement string. You can also use a dictionary to provide the replacement string, with the matched portion as the key. You learnt about capture groups and you'll see even more uses of groupings in the coming chapters.

Exercises

a) For the given strings, extract the matching portion from first is to last t

> let str1 = 'What is the biggest fruit you have seen?'
> let str2 = 'Your mission is to read and practice consistently'

> const pat1 =      // add your solution here

// add your solution here for str1
< "is the biggest fruit"
// add your solution here for str2
< "ission is to read and practice consistent"

b) Find the starting index of first occurrence of is or the or was or to for the given input strings. Assume that there will be at least one match for each input string.

> let s1 = 'match after the last newline character'
> let s2 = 'and then you want to test'
> let s3 = 'this is good bye then'
> let s4 = 'who was there to see?'

> const pat2 =      // add your solution here

// add your solution here for s1
< 12
// add your solution here for s2
< 4
// add your solution here for s3
< 2
// add your solution here for s4
< 4

c) Find the starting index of last occurrence of is or the or was or to for the given input strings. Assume that there will be at least one match for each input string.

> let s1 = 'match after the last newline character'
> let s2 = 'and then you want to test'
> let s3 = 'this is good bye then'
> let s4 = 'who was there to see?'

> const pat3 =      // add your solution here

// add your solution here for s1
< 12
// add your solution here for s2
< 18
// add your solution here for s3
< 17
// add your solution here for s4
< 14

d) The given input string contains : exactly once. Extract all characters after the : as output.

> let ip = 'fruits:apple, mango, guava, blueberry'

// add your solution here
< "apple, mango, guava, blueberry"

e) Extract all words between ( and ) from the given input string as an array (including the parentheses). Assume that the input will not contain any broken parentheses.

> let ip = 'another (way) to reuse (portion) matched (by) capture groups'

// add your solution here
< ["(way)", "(portion)", "(by)"]

f) Extract all occurrences of < up to next occurrence of >, provided there is at least one character in between < and >.

> let ip = 'a<apple> 1<> b<bye> 2<> c<cat>'

// add your solution here
< ["<apple>", "<> b<bye>", "<> c<cat>"]

g) Use matchAll to get the output as shown below for the given input strings. Note the characters used in the input strings carefully.

> let row1 = '-2,5 4,+3 +42,-53 4356246,-357532354 '
> let row2 = '1.32,-3.14 634,5.63 63.3e3,9907809345343.235 '

> const pat4 =      // add your solution here

// add your solution here for row1
< (4) [Array(2), Array(2), Array(2), Array(2)]
  0: (2) ["-2", "5"]
  1: (2) ["4", "+3"]
  2: (2) ["+42", "-53"]
  3: (2) ["4356246", "-357532354"]
  length: 4
  __proto__: Array(0)

// add your solution here for row2
< (3) [Array(2), Array(2), Array(2)]
  0: (2) ["1.32", "-3.14"]
  1: (2) ["634", "5.63"]
  2: (2) ["63.3e3", "9907809345343.235"]
  length: 3
  __proto__: Array(0)

h) This is an extension to previous question. Sum each pair of numbers that are separated by a comma.

> let row1 = '-2,5 4,+3 +42,-53 4356246,-357532354 '
> let row2 = '1.32,-3.14 634,5.63 63.3e3,9907809345343.235 '

// should be same as previous question
> const pat5 =      // add your solution here

// add your solution here for row1
< [3, 7, -11, -353176108]

// add your solution here for row2
< [-1.82, 639.63, 9907809408643.234]

i) Use split method to get the output as shown below.

> let ip = '42:no-output;1000:car-truck;SQEX49801'

// add your solution here
< ["42", "output", "1000", "truck", "SQEX49801"]

j) Write a string function that changes the given input to alternate case. The first letter should be changed to lowercase, the next one to uppercase, then lowercase and so on. Characters other than letters should be left alone and should not affect the case changing.

> function aLtErNaTeCaSe(ip) {
      // add your solution here
  }

> aLtErNaTeCaSe('HI THERE!')
< "hI tHeRe!"
> aLtErNaTeCaSe('good morning')
< "gOoD mOrNiNg"
> aLtErNaTeCaSe('Sample123string42with777numbers')
< "sAmPlE123sTrInG42wItH777nUmBeRs"

k) Replace the string par with spar, spare with extra and park with garden

> let s1 = 'apartment has a park'
> let s2 = 'do you have a spare cable'
> let s3 = 'write a parser'

> let d1 =          // add your solution here
> const pat6 =      // add your solution here

> s1.replace(pat6, k => d1[k])
< "aspartment has a garden"
> s2.replace(pat6, k => d1[k])
< "do you have a extra cable"
> s3.replace(pat6, k => d1[k])
< "write a sparser"