Dot metacharacter and Quantifiers
This chapter introduces the dot metacharacter and metacharacters related to quantifiers. Similar to the repeat() string method, quantifiers allow you to repeat a portion of the regular expression pattern, making it more compact and readable. Quantifiers also provide a way to specify a range of repetition. This range can be bounded or unbounded. Combined with the dot metacharacter (and alternation if needed), quantifiers allow you to construct conditional AND logic between patterns.
Dot metacharacter
The dot metacharacter serves as a placeholder to match any character except the \r, \n, \u2028 (line separator) and \u2029 (paragraph separator) characters. These are the same characters seen earlier in the Line anchors section.
// matches character 'c', any character and then character 't'
> 'tac tin c.t abc;tuv acute'.replace(/c.t/g, 'X')
< 'taXin X abXuv aXe'
// matches character 'r', any two characters and then character 'd'
> 'breadth markedly reported overrides'.replace(/r..d/g, 'X')
< 'bXth maXly repoX oveXes'
// matches character '2', any character and then character '3'
> '42\t35'.replace(/2.3/, '8')
< '485'
The s flag section will show how to include the line separators as well. The Character class chapter will discuss how to define your own custom placeholder for a limited set of characters.
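As a quick check of the exclusions listed above, the following sketch tests the dot against each of the four line terminator characters and against a tab. The variable names are illustrative only and are not part of the original examples.

```javascript
// the four characters that the dot does not match by default
const terminators = ['\r', '\n', '\u2028', '\u2029'];

// place each character between 'a' and 'b' and test against /a.b/
const dotMatchesTerminator = terminators.map(ch => /a.b/.test(`a${ch}b`));
console.log(dotMatchesTerminator); // [ false, false, false, false ]

// any other character, such as a tab, is matched
const dotMatchesTab = /a.b/.test('a\tb');
console.log(dotMatchesTab); // true
```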
Some characters like g̈ have more than one codepoint (numerical value of a character). You'll need to use multiple . metacharacters to match such characters (equal to the number of codepoints).
> 'cag̈ed'.replace(/a.e/, 'o')
< 'cag̈ed'
> 'cag̈ed'.replace(/a..e/, 'o')
< 'cod'
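To see where the count of two comes from, here is a small sketch that writes the character with an explicit escape. It assumes the character above is the letter g followed by U+0308 (combining diaeresis); the variable names are made up for this illustration.

```javascript
// assumption: the character is 'g' followed by U+0308 (combining diaeresis)
const word = 'cag\u0308ed';

// spreading a string iterates over its codepoints
const codepoints = [...'g\u0308'].length;
console.log(codepoints); // 2

// one dot covers only one of the two codepoints, so nothing is replaced
const oneDot = word.replace(/a.e/, 'o');
console.log(oneDot === word); // true

// two dots cover both codepoints
const twoDots = word.replace(/a..e/, 'o');
console.log(twoDots); // cod
```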
split() method
This chapter will additionally use the split() method to illustrate examples. The split() method separates the string based on the given regexp (or string) and returns an array of strings.
> 'apple-85-mango-70'.split(/-/)
< ['apple', '85', 'mango', '70']
// use the optional 'limit' argument to specify max no. of output elements
> 'apple-85-mango-70'.split(/-/, 2)
< ['apple', '85']
// example with the dot metacharacter
> 'bus:3:car:-:van'.split(/:.:/)
< ['bus', 'car', 'van']
See the split() with capture groups section for details of how capture groups affect the output of the split() method.
Greedy quantifiers
Quantifiers help you repeat a portion of the regexp. They can be applied to literal characters, groupings and other features that you'll learn later. Apart from the ability to specify the exact quantity and bounded ranges, these can also match unbounded varying quantities. If the input string can satisfy a pattern with varying quantities in multiple ways, you can choose between two types of quantifiers to narrow down the possibilities. This section covers the greedy type of quantifiers.
First up, the ? metacharacter, which quantifies a character or group to match 0 or 1 times. In other words, that portion becomes optional. This leads to a terser regexp compared to alternation and grouping.
// same as: /ear|ar/g
> 'far feat flare fear'.replace(/e?ar/g, 'X')
< 'fX feat flXe fX'
// same as: /\bpar(t|)\b/g
> 'par spare part party'.replace(/\bpart?\b/g, 'X')
< 'X spare X party'
// same as: /\b(re.d|red)\b/
> ['red', 'ready', 're;d', 'redo', 'reed'].filter(w => /\bre.?d\b/.test(w))
< ['red', 're;d', 'reed']
// same as: /part|parrot/g
> 'par part parrot parent'.replace(/par(ro)?t/g, 'X')
< 'par X X parent'
// same as: /part|parrot|parent/g
> 'par part parrot parent'.replace(/par(en|ro)?t/g, 'X')
< 'par X X X'
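The "same as" comments above can be checked directly. This sketch runs the quantifier form and the spelled-out alternation on the same input and compares the results; the variable names are illustrative only.

```javascript
const text = 'far feat flare fear';
const withQuantifier = text.replace(/e?ar/g, 'X');
const withAlternation = text.replace(/ear|ar/g, 'X');
console.log(withQuantifier);                     // fX feat flXe fX
console.log(withQuantifier === withAlternation); // true

const words = 'par part parrot parent';
const grouped = words.replace(/par(en|ro)?t/g, 'X');
const spelledOut = words.replace(/part|parrot|parent/g, 'X');
console.log(grouped);                // par X X X
console.log(grouped === spelledOut); // true
```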
The * metacharacter quantifies a character or group to match 0 or more times. There is no upper bound.
// match 't' followed by zero or more of 'a' followed by 'r'
> 'tr tear tare steer sitaara'.replace(/ta*r/g, 'X')
< 'X tear Xe steer siXa'
// match 't' followed by zero or more of 'e' or 'a' followed by 'r'
> 'tr tear tare steer sitaara'.replace(/t(e|a)*r/g, 'X')
< 'X X Xe sX siXa'
// match zero or more of '1' followed by '2'
> '3111111111125111142'.replace(/1*2/g, 'X')
< '3X511114X'
Here are some more examples with the split() method.
// last element is empty because there is nothing after '2' at the end of string
> '3111111111125111142'.split(/1*2/)
< ['3', '511114', '']
// note how '25' and '42' get split, there is '1' zero times in between them
> '3111111111125111142'.split(/1*/)
< ['3', '2', '5', '4', '2']
The + metacharacter quantifies a character or group to match 1 or more times. Similar to the * quantifier, there is no upper bound. More importantly, this doesn't have surprises like matching empty strings.
> 'tr tear tare steer sitaara'.replace(/ta+r/g, 'X')
< 'tr tear Xe steer siXa'
> 'tr tear tare steer sitaara'.replace(/t(e|a)+r/g, 'X')
< 'tr X Xe sX siXa'
> '3111111111125111142'.replace(/1+2/g, 'X')
< '3X5111142'
> '3111111111125111142'.split(/1+/)
< ['3', '25', '42']
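The remark about empty strings is easier to see on a tiny input. The sketch below uses the match() method with the g flag (assumed here only as a way to list every matched portion) to compare what * and + actually match in 'a1b'.

```javascript
const input = 'a1b';

// '*' also succeeds wherever there are zero occurrences of '1',
// so empty matches show up before 'a', before 'b' and at the end of the string
const starMatches = input.match(/1*/g);
console.log(starMatches); // [ '', '1', '', '' ]

// '+' needs at least one '1', so there are no empty matches
const plusMatches = input.match(/1+/g);
console.log(plusMatches); // [ '1' ]

// the same difference is visible with replace()
const starReplaced = input.replace(/1*/g, 'X');
const plusReplaced = input.replace(/1+/g, 'X');
console.log(starReplaced); // XaXXbX
console.log(plusReplaced); // aXb
```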
You can specify a range of integer numbers, both bounded and unbounded, using the {} metacharacters. There are three ways to use this quantifier as shown below.
Quantifier | Description |
---|---|
{m,n} | match m to n times |
{m,} | match at least m times |
{n} | match exactly n times |
> let repeats = ['abc', 'ac', 'abbc', 'xabbbcz', 'bc', 'abbbbbc']
> repeats.filter(w => /ab{1,4}c/.test(w))
< ['abc', 'abbc', 'xabbbcz']
> repeats.filter(w => /ab{0,2}c/.test(w))
< ['abc', 'ac', 'abbc']
> repeats.filter(w => /ab{3,}c/.test(w))
< ['xabbbcz', 'abbbbbc']
> repeats.filter(w => /ab{3}c/.test(w))
< ['xabbbcz']
The {} metacharacters have to be escaped to match them literally. However, unlike the () metacharacters, these have a lot more leeway. For example, escaping { alone is enough, or if it doesn't conform strictly to any of the forms listed above, escaping is not needed at all.
> 'a{5} = 10'.replace(/a\{5}/g, 'a{6}')
< 'a{6} = 10'
> 'report_{a,b}.txt'.replace(/_{a,b}/g, '-{c,d}')
< 'report-{c,d}.txt'
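For contrast, this sketch shows what happens when the braces are left unescaped in a form that is a valid quantifier. It only covers regexps without extra flags, as in the examples above, and the variable names are illustrative.

```javascript
const text = 'a{5} = 10';

// unescaped, {5} is a quantifier: the regexp looks for five 'a' characters
const asQuantifier = /a{5}/.test(text);
const quantifierHit = /a{5}/.test('aaaaa');
console.log(asQuantifier, quantifierHit); // false true

// escaping '{' alone is enough to match the braces literally
const escaped = /a\{5}/.test(text);
console.log(escaped); // true

// {a,b} is not one of the quantifier forms, so it is matched literally
const notAQuantifier = /_{a,b}/.test('report_{a,b}.txt');
console.log(notAQuantifier); // true
```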
AND Conditional
Next up, how to construct AND conditional using the dot metacharacter and quantifiers.
// match 'Error' followed by zero or more characters followed by 'valid'
> /Error.*valid/.test('Error: not a valid input')
< true
> /Error.*valid/.test('Error: key not found')
< false
To allow matching in any order, you'll have to bring in alternation as well. That is somewhat manageable for 2 or 3 patterns. See the AND conditional with lookarounds section for an easier approach.
> /cat.*dog|dog.*cat/.test('cat and dog')
< true
> /cat.*dog|dog.*cat/.test('dog and cat')
< true
// if you just need a boolean result, this would be a scalable approach
> let patterns = [/cat/, /dog/]
> patterns.every(p => p.test('cat and dog'))
< true
> patterns.every(p => p.test('dog and cat'))
< true
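The every() approach can be wrapped in a small function when the same check is needed in several places. The helper name containsAll and the sample patterns below are hypothetical, chosen only for this sketch.

```javascript
// hypothetical helper: true only if every pattern matches somewhere in text
function containsAll(text, patterns) {
    return patterns.every(p => p.test(text));
}

const animals = [/cat/, /dog/, /parrot/];

const allPresent = containsAll('dog, parrot and cat', animals);
console.log(allPresent); // true

// 'parrot' is missing here
const oneMissing = containsAll('cat and dog', animals);
console.log(oneMissing); // false
```

The patterns in this sketch deliberately avoid the g flag. With that flag, test() remembers where the previous match ended (via lastIndex), which can change the result when the same regexp object is reused.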
What does greedy mean?
When you are using the ? quantifier, how does JavaScript decide to match 0 or 1 times, if both quantities can satisfy the regexp? For example, consider the expression 'foot'.replace(/f.?o/, 'X'): should foo be replaced or fo? It will always replace foo, because these are greedy quantifiers, i.e. they try to match as much as possible.
> 'foot'.replace(/f.?o/, 'X')
< 'Xt'
// a more practical example
// prefix '<' with '\' if it is not already prefixed
// both '<' and '\<' will get replaced with '\<'
> console.log('table < fig \\< bat < cake'.replace(/\\?</g, '\\<'))
< table \< fig \< bat \< cake
// say goodbye to /handful|handy|hand/ shenanigans
> 'hand handy handful'.replace(/hand(y|ful)?/g, 'X')
< 'X X X'
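To inspect what the greedy quantifier actually consumed, the matched text can be extracted instead of replaced. This sketch assumes the match() method for that purpose and also shows why the order of alternatives matters in the version without a quantifier.

```javascript
// the optional dot is used because the overall regexp still succeeds with it
const greedyOptional = 'foot'.match(/f.?o/)[0];
console.log(greedyOptional); // foo

// the optional group takes the suffix whenever it is available
const withQuantifier = 'hand handy handful'.match(/hand(y|ful)?/g);
console.log(withQuantifier); // [ 'hand', 'handy', 'handful' ]

// with alternation, the shortest alternative listed first wins every time
const shortFirst = 'hand handy handful'.match(/hand|handy|handful/g);
console.log(shortFirst); // [ 'hand', 'hand', 'hand' ]
```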
But wait, how did the /Error.*valid/ example work? Shouldn't .* consume all the characters after Error? Good question. The regexp engine actually does consume all the characters. Then realizing that the regexp fails, it gives back one character from the end of string and checks again if the overall regexp is satisfied. This process is repeated until a match is found or failure is confirmed. In regular expression parlance, this is known as backtracking.
> let sentence = 'that is quite a fabricated tale'
// t.*a will always match from the first 't' to the last 'a'
// which implies that there cannot be more than one match for such patterns
> sentence.replace(/t.*a/, 'X')
< 'Xle'
> 'star'.replace(/t.*a/, 'X')
< 'sXr'
// matching first 't' to last 'a' for t.*a won't work for these cases
// so, the regexp engine backtracks until the overall regexp can be matched
> sentence.replace(/t.*a.*q.*f/, 'X')
< 'Xabricated tale'
> sentence.replace(/t.*a.*u/, 'X')
< 'Xite a fabricated tale'
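Looking at the matched portions themselves makes the effect of backtracking more visible. The sketch below uses the match() method (an assumption of this example, used only to extract the text that was matched) on the same sentence.

```javascript
const sentence = 'that is quite a fabricated tale';

// no backtracking pressure from later parts: first 't' to the last 'a'
const wide = sentence.match(/t.*a/)[0];
console.log(wide); // that is quite a fabricated ta

// each .* has to give characters back until 'q' and 'f' can still be matched
const upToF = sentence.match(/t.*a.*q.*f/)[0];
console.log(upToF); // that is quite a f

// the only 'u' is in 'quite', so the match has to end there
const upToU = sentence.match(/t.*a.*u/)[0];
console.log(upToU); // that is qu
```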
Backtracking can become significantly time consuming for certain corner cases. Or even catastrophic — see cloudflare: Details of the Cloudflare outage on July 2, 2019 for an example. See this post for more examples and workarounds.
Non-greedy quantifiers
As the name implies, these quantifiers will try to match as minimally as possible. They are also known as lazy or reluctant quantifiers. Appending a ? to greedy quantifiers makes them non-greedy.
> 'foot'.replace(/f.??o/, 'X')
< 'Xot'
> 'frost'.replace(/f.??o/, 'X')
< 'Xst'
> '123456789'.replace(/.{2,5}?/, 'X')
< 'X3456789'
Like greedy quantifiers, lazy quantifiers will try to satisfy the overall regexp. For example, .*? will first start with an empty match and then move forward one character at a time until a match is found.
// greedy will match from the first ':' to the last ':'
> 'green:3.14:teal::brown:oh!:blue'.split(/:.*:/)
< ['green', 'blue']
// non-greedy will match from ':' to the very next ':'
> 'green:3.14:teal::brown:oh!:blue'.split(/:.*?:/)
< ['green', 'teal', 'brown', 'blue']
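A side-by-side comparison of the matched text may help here. This sketch assumes the match() method to extract the matched portion, and the last case shows a lazy quantifier expanding beyond its minimum because the rest of the regexp requires it.

```javascript
const digits = '123456789';

const greedy = digits.match(/.{2,5}/)[0];
const lazy = digits.match(/.{2,5}?/)[0];
console.log(greedy); // 12345
console.log(lazy);   // 12

// lazy still grows as needed: no match is possible starting from '1',
// so the match starts at '2' and expands to five characters to reach '7'
const lazyExpanded = digits.match(/.{2,5}?7/)[0];
console.log(lazyExpanded); // 234567

const fields = 'green:3.14:teal::brown:oh!:blue';
const greedyColons = fields.match(/:.*:/)[0];
const lazyColons = fields.match(/:.*?:/)[0];
console.log(greedyColons); // :3.14:teal::brown:oh!:
console.log(lazyColons);   // :3.14:
```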
s flag
Use the s flag to allow the . metacharacter to match \r, \n and line/paragraph separator characters as well.
// by default, the . metacharacter doesn't match the line separators
> console.log('Hi there\nHave a Nice Day'.replace(/the.*ice/, 'X'))
< Hi there
Have a Nice Day
// 's' flag allows line separators to be matched as well
> console.log('Hi there\nHave a Nice Day'.replace(/the.*ice/s, 'X'))
< Hi X Day
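The effect of the flag can be verified for all four characters at once. This sketch also reads the dotAll property of the regexp object, which reports whether the s flag is set; the variable names are illustrative.

```javascript
const terminators = ['\r', '\n', '\u2028', '\u2029'];

// without the flag, none of the four characters is matched by the dot
const withoutFlag = terminators.map(ch => /a.b/.test(`a${ch}b`));
console.log(withoutFlag); // [ false, false, false, false ]

// with the flag, all four are matched
const withFlag = terminators.map(ch => /a.b/s.test(`a${ch}b`));
console.log(withFlag); // [ true, true, true, true ]

// the dotAll property reflects the presence of the s flag
const flagSet = /a.b/s.dotAll;
const flagUnset = /a.b/.dotAll;
console.log(flagSet, flagUnset); // true false
```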
Cheatsheet and Summary
Note | Description |
---|---|
. | match any character except the line separators |
s | flag to match line separators as well with the . metacharacter |
greedy | match as much as possible |
? | greedy quantifier, match 0 or 1 times |
* | greedy quantifier, match 0 or more times |
+ | greedy quantifier, match 1 or more times |
{m,n} | greedy quantifier, match m to n times |
{m,} | greedy quantifier, match at least m times |
{n} | greedy quantifier, match exactly n times |
pat1.*pat2 | any number of characters between pat1 and pat2 |
pat1.*pat2|pat2.*pat1 | match both pat1 and pat2 in any order |
non-greedy | append ? to greedy quantifier, match as minimally as possible |
s.split(/pat/) | split a string based on regexps |
This chapter introduced the concept of specifying a placeholder instead of fixed strings. When combined with quantifiers, you've seen a glimpse of how a simple regexp can match wide ranges of text. In the coming chapters, you'll learn how to create your own restricted set of placeholder characters.
Exercises
Use the s flag for these exercises depending upon the contents of the input strings.
1) Replace 42//5 or 42/5 with 8 for the given input.
> let ip = 'a+42//5-c pressure*3+42/5-14256'
// add your solution here
< 'a+8-c pressure*3+8-14256'
2) For the array items, filter all elements starting with hand and ending immediately with at most one more character or le.
> let items = ['handed', 'hand', 'handled', 'handy', 'unhand', 'hands', 'handle']
// add your solution here
< ['hand', 'handy', 'hands', 'handle']
3) Use the split() method to get the output as shown for the given input strings.
> let eqn1 = 'a+42//5-c'
> let eqn2 = 'pressure*3+42/5-14256'
> let eqn3 = 'r*42-5/3+42///5-42/53+a'
> const pat1 = // add your solution here
> eqn1.split(pat1)
< ['a+', '-c']
> eqn2.split(pat1)
< ['pressure*3+', '-14256']
> eqn3.split(pat1)
< ['r*42-5/3+42///5-', '3+a']
4) For the given input strings, remove everything from the first occurrence of i till the end of the string.
> let s1 = 'remove the special meaning of such constructs'
> let s2 = 'characters while constructing'
> let s3 = 'input output'
> const pat2 = // add your solution here
> s1.replace(pat2, '')
< 'remove the spec'
> s2.replace(pat2, '')
< 'characters wh'
> s3.replace(pat2, '')
< ''
5) For the given strings, construct a regexp to get the output as shown.
> let str1 = 'a+b(addition)'
> let str2 = 'a/b(division) + c%d(#modulo)'
> let str3 = 'Hi there(greeting). Nice day(a(b)'
> const remove_parentheses = // add your solution here
> str1.replace(remove_parentheses, '')
< 'a+b'
> str2.replace(remove_parentheses, '')
< 'a/b + c%d'
> str3.replace(remove_parentheses, '')
< 'Hi there. Nice day'
6) Correct the given regexp to get the expected output.
> let words = 'plink incoming tint winter in caution sentient'
// wrong output
> const w1 = /int|in|ion|ing|inco|inter|ink/g
> words.replace(w1, 'X')
"plXk XcomXg tX wXer X cautX sentient"
// expected output
> const w2 = // add your solution here
> words.replace(w2, 'X')
"plX XmX tX wX X cautX sentient"
7) For the given greedy quantifiers, what would be the equivalent form using the {m,n} representation?
? is same as
* is same as
+ is same as
8) (a*|b*) is same as (a|b)*: true or false?
9) For the given input strings, remove everything from the first occurrence of test (irrespective of case) till the end of the string, provided test isn't at the end of the string.
> let s1 = 'this is a Test'
> let s2 = 'always test your regexp for corner\ncases'
> let s3 = 'a TEST of skill tests?'
> let pat3 = // add your solution here
> s1.replace(pat3, '')
< 'this is a Test'
> s2.replace(pat3, '')
< 'always '
> s3.replace(pat3, '')
< 'a '
10) For the input array words, filter all elements starting with s and containing e and t in any order.
> let words = ['sequoia', 'subtle', 'exhibit', 'a set', 'sets', 'tests', 'site']
// add your solution here
< ['subtle', 'sets', 'site']
11) For the input array words, remove all elements having less than 6 characters.
> let words = ['sequoia', 'subtle', 'exhibit', 'asset', 'sets', 'tests', 'site']
// add your solution here
< ['sequoia', 'subtle', 'exhibit']
12) For the input array words, filter all elements starting with s or t and having a maximum of 6 characters.
> let words = ['sequoia', 'subtle', 'exhibit', 'asset', 'sets', 't set', 'site']
// add your solution here
< ['subtle', 'sets', 't set', 'site']
13) Delete from the string start if it is at the beginning of a line up to the next occurrence of the string end at the end of a line. Match these keywords irrespective of case.
> let para = `good start
start working on that
project you always wanted
to, do not let it end
hi there
start and end the end
42
Start and try to
finish the End
bye`
> const mpat = // add your solution here
> console.log(para.replace(mpat, ''))
< good start
hi there
42
bye
14) Can you reason out why this code results in the output shown? The aim was to remove all <characters> patterns but not the <> ones. The expected result was 'a 1<> b 2<> c'.
> let ip = 'a<apple> 1<> b<bye> 2<> c<cat>'
> ip.replace(/<.+?>/g, '')
< 'a 1 2'
15) Use the split() method to get the output as shown below for the given input strings.
> let s1 = 'go there :: this :: that'
> let s2 = 'a::b :: c::d e::f :: 4::5'
> let s3 = '42:: hi::bye::see :: carefully'
> const pat4 = // add your solution here
> s1.split() // add your solution here
< ['go there', 'this :: that']
> s2.split() // add your solution here
< ['a::b', 'c::d e::f :: 4::5']
> s3.split() // add your solution here
< ['42:: hi::bye::see', 'carefully']