Modifiers

Just like options change the default behavior of command-line tools, modifiers are used to change aspects of regexp behavior. They can be applied to the entire regexp or to just a particular portion, and both forms can be mixed as well. The cryptic output of the Regexp.union method when one of the arguments is a regexp will also be explained in this chapter. In regular expression parlance, modifiers are also known as flags.

Modifiers already seen will be discussed again in this chapter for the sake of completeness. You'll also see how to combine multiple modifiers.

i modifier

First up is the i modifier, which ignores case while matching letters.

>> 'A Cat' =~ /cat/
=> nil
>> 'A Cat' =~ /cat/i
=> 2

>> 'Cat cot CATER ScUtTLe'.scan(/c.t/i)
=> ["Cat", "cot", "CAT", "cUt"]

# same as: /[a-zA-Z]+/
# can also use: /[A-Z]+/i
>> 'Sample123string42with777numbers'.scan(/[a-z]+/i)
=> ["Sample", "string", "with", "numbers"]
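When the pattern text is only available as a string, the same flag can be supplied through Regexp.new instead of a literal. The following is a minimal sketch, assuming the pattern arrives as a string; the variable names are made up for illustration.

```ruby
# hypothetical pattern text, for example read from user input
pat_str = 'c.t'

# Regexp::IGNORECASE is the constant counterpart of the i modifier
pat = Regexp.new(pat_str, Regexp::IGNORECASE)

matches = 'Cat cot CATER ScUtTLe'.scan(pat)
p matches          # same result as scanning with /c.t/i
p pat.casefold?    # true when the i modifier is in effect
```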

m modifier

Use the m modifier to allow the . metacharacter to match newline characters as well.

# by default, the . metacharacter doesn't match newlines
>> "Hi there\nHave a Nice Day".sub(/the.*ice/, 'X')
=> "Hi there\nHave a Nice Day"

# m modifier will allow the newline character to be matched as well
>> "Hi there\nHave a Nice Day".sub(/the.*ice/m, 'X')
=> "Hi X Day"

# multiple modifiers can be specified next to each other
>> "Hi there\nHave a Nice Day".sub(/the.*day/im, 'Bye')
=> "Hi Bye"
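Readers coming from other regexp flavors may expect m to change the line anchors. In Ruby, the m modifier only affects the . metacharacter, while ^ and $ match at line boundaries regardless. A small sketch to check this, reusing the sample string from above (the variable names are illustrative only):

```ruby
s = "Hi there\nHave a Nice Day"

# ^ matches at the start of every line, with or without the m modifier
line_starts = s.scan(/^\w+/)
line_starts_m = s.scan(/^\w+/m)

# only the dot changes: it can cross the newline when m is used
no_m = s[/there.Have/]
with_m = s[/there.Have/m]

p line_starts      # ["Hi", "Have"]
p line_starts_m    # ["Hi", "Have"]
p no_m             # nil
p with_m           # "there\nHave"
```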

o modifier

The o modifier restricts the #{} interpolations inside a regexp definition to be performed only once, even if it is inside a loop. As an alternative, you could assign the regexp definition to a variable before the loop and use that variable within the loop, without needing the o modifier.

>> words = %w[car bike bus auto train plane]

# as 'o' modifier is used, expression inside #{} will be evaluated only once
# and not calculated again and again every iteration
>> n = 2
?> for w in words
?>     puts w if w.match?(/\A\w{#{2**n}}\z/o)
>> end
bike
auto

# here, the expression result is not a constant, so don't use the 'o' modifier
# with 'o' modifier, there'll be no match because #{n} will be '1' always
>> n = 1
?> for w in words
?>     puts w if w.match?(/\A\w{#{n}}\z/)
?>     n += 1
>> end
bus
auto
train
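The variable-based alternative to the o modifier could look like the sketch below. It assumes the interpolated value does not change during the loop; the names pat and matched are chosen here for illustration.

```ruby
words = %w[car bike bus auto train plane]
n = 2

# the interpolation happens once, when the regexp object is created
pat = /\A\w{#{2**n}}\z/

# the same regexp object is reused for every word
matched = words.select { |w| w.match?(pat) }
p matched    # ["bike", "auto"]
```

Since the regexp is an ordinary object stored in a variable, it can be rebuilt explicitly whenever n changes, which the o modifier does not allow.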

x modifier

The x modifier is another provision, like named capture groups, that helps add clarity to regexp definitions. This modifier allows you to use literal whitespace for alignment and to add comments after the # character, so that a regexp can be spread over multiple lines with comments.

# same as: pat = /\A((?:[^,]+,){3})([^,]+)/
>> pat = /\A(                 # group-1, captures the first 3 columns
              (?:[^,]+,){3}   # non-capturing group to get the 3 columns
            )
            ([^,]+)           # group-2, captures the 4th column
         /x

>> '1,2,3,4,5,6,7'.sub(pat, '\1(\2)')
=> "1,2,3,(4),5,6,7"

As whitespace and the # character have special meaning when the x modifier is used, they have to be escaped or represented by backslash escape sequences to match them literally. See ruby-doc: Extended Mode for more details.

>> 'cat and dog'.match?(/t a/x)
=> false
>> 'cat and dog'.match?(/t\ a/x)
=> true
>> 'cat and dog'.match?(/t[ ]a/x)
=> true
>> 'cat and dog'.match?(/t\x20a/x)
=> true

>> 'apple a#b 123'[/a#b/x]
=> "a"
>> 'apple a#b 123'[/a\#b/x]
=> "a#b"
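The x modifier combines well with named capture groups. Here is a minimal sketch using a made-up date format (the pattern, group names and sample text are illustrative, not taken from the examples above). The spaces around the hyphens are ignored because of the x modifier.

```ruby
date_pat = /
  (?<year>\d{4})  -   # four-digit year, then a literal hyphen
  (?<month>\d{2}) -   # two-digit month, then a literal hyphen
  (?<day>\d{2})       # two-digit day
/x

# compact form of the same pattern, for comparison
compact = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/

text = 'released on 2024-03-15'
md = date_pat.match(text)

p md[:year]                  # "2024"
p md[:month]                 # "03"
p md[:day]                   # "15"
p md[0] == compact.match(text)[0]    # true
```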

Inline comments

Comments can also be added using the (?#comment) special group. This is independent of the x modifier.

>> pat = /\A((?:[^,]+,){3})(?#3-cols)([^,]+)(?#4th-col)/

>> '1,2,3,4,5,6,7'.sub(pat, '\1(\2)')
=> "1,2,3,(4),5,6,7"

Inline modifiers

To apply modifiers to specific portions of a regexp, specify them inside a special grouping syntax. This will override the modifiers applied to the entire regexp definition, if any. The syntax variations are:

(?modifiers:pat) will apply modifiers only for this regexp portion
(?-modifiers:pat) will negate modifiers only for this regexp portion
(?modifiers-modifiers:pat) will apply and negate particular modifiers only for this regexp portion
(?modifiers) when :pat is not used within the grouping, modifiers (including negation) will be applied from this point onwards

In these ways, modifiers can be applied precisely where they are needed. And as can be observed from the examples below, these groupings do not act as capture groups.

# case-insensitive only for the 'cat' portion
>> 'Cat scatter CATER cAts'.scan(/(?i:cat)[a-z]*\b/)
=> ["Cat", "catter", "cAts"]
# same thing by overriding the overall modifier
>> 'Cat scatter CATER cAts'.scan(/cat(?-i)[a-z]*\b/i)
=> ["Cat", "catter", "cAts"]

# case-sensitive only for 'Cat'
>> 'Cat SCatTeR CATER cAts'.scan(/(?-i:Cat)[a-z]*\b/i)
=> ["Cat", "CatTeR"]
# same thing without the overall modifier
>> 'Cat SCatTeR CATER cAts'.scan(/Cat(?i)[a-z]*\b/)
=> ["Cat", "CatTeR"]
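The (?modifiers-modifiers:pat) form and the non-capturing behavior can be checked with a short sketch like the following. The sample strings and variable names are assumptions made for illustration.

```ruby
s = "Cat\nDOG"

# i is switched on and m is switched off inside the group,
# so the dot cannot match the newline even though m is set overall
mixed = s.match?(/(?i-m:cat.dog)/m)

# with only i applied inside the group, the overall m still affects the dot
only_i = s.match?(/(?i:cat.dog)/m)

# the modifier group is non-capturing: only the explicit () is captured
caps = 'CATer'.match(/(?i:cat)([a-z]*)/).captures

p mixed     # false
p only_i    # true
p caps      # ["er"]
```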

So, now you should be able to decode the output of the Regexp.union method when one of the arguments is a regexp.

>> Regexp.union(/^cat/i, '123')
=> /(?i-mx:^cat)|123/

>> Regexp.union(/cat/, 'a^b', /the.*ice/im)
=> /(?-mix:cat)|a\^b|(?mi-x:the.*ice)/
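The same notation is what Regexp#to_s returns for a single regexp, which is how each regexp argument keeps its own modifiers inside the union. A small sketch to verify this, using a made-up second argument 'dog':

```ruby
# Regexp#to_s gives the inline modifier form of a regexp
as_group = /^cat/i.to_s

pat = Regexp.union(/^cat/i, 'dog')

p as_group                 # "(?i-mx:^cat)"
p pat.source               # "(?i-mx:^cat)|dog"

# the i modifier stays attached to the first alternative only
p pat.match?('CATER')      # true
p pat.match?('hot DOG')    # false
p pat.match?('hot dog')    # true
```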

Cheatsheet and Summary

Note	Description
`i`	modifier to ignore case
`m`	allow the `.` metacharacter to match newline characters too
`o`	interpolate `#{}` inside a regexp only once
`x`	allows literal whitespace to be used for alignment
	and comments to be added after the `#` character
	escape spaces and `#` if needed as part of the actual regexp
`(?#comment)`	another way to add comments, not a modifier
`(?modifiers:pat)`	apply modifiers only for `pat`
`(?-modifiers:pat)`	negate modifiers only for `pat`
`(?modifiers-modifiers:pat)`	apply and negate modifiers only for `pat`
`(?modifiers)`	modifiers will be applied from this point onwards

This chapter showed some of the modifiers that can be used to change the default behavior of regexps. It also covered two more special groupings: inline comments and inline modifiers.

Exercises

1) Remove from the first occurrence of hat to the last occurrence of it for the given input strings. Match these markers case insensitively.

>> s1 = "But Cool THAT\nsee What okay\nwow quite"
>> s2 = 'it this hat is sliced HIT.'

>> pat =        ##### add your solution here

>> s1.sub(pat, '')
=> "But Cool Te"
>> s2.sub(pat, '')
=> "it this ."

2) Delete from the word start, when it occurs at the beginning of a line, up to the next occurrence of the word end at the end of a line. Match these keywords irrespective of case.

'> para = %q{good start
'> start working on that
'> project you always wanted
'> to, do not let it end
'> hi there
'> start and end the end
'> 42
'> Start and try to
'> finish the End
>> bye}

>> pat =        ##### add your solution here

>> puts para.gsub(pat, '')
good start

hi there

42

bye

3) For the given markdown file, replace all occurrences of the string ruby (irrespective of case) with the string Ruby. However, any match within code blocks that start with the whole line ```ruby and end with the whole line ``` shouldn't be replaced. Assume that the input file is small enough to fit in memory.

Refer to the exercises folder for input files required to solve this exercise.

>> ip_str = File.read('sample.md')
>> pat =        ##### add your solution here

>> File.open('sample_mod.md', 'w') do |f|
?>   ip_str.split(pat).each_with_index do |s, i|
?>     f.write(i.odd? ? s : s.gsub(/ruby/i) { $&.capitalize })
>>   end
>> end

>> File.read('sample_mod.md') == File.read('expected.md')
=> true

4) Write a string method that changes the letters of the given input to alternate case, starting with lowercase. Characters other than letters are left as they are and do not affect the alternation.

?> def aLtErNaTe_CaSe(ip_str)
##### add your solution here
>> end

>> aLtErNaTe_CaSe('HI THERE!')
=> "hI tHeRe!"
>> aLtErNaTe_CaSe('good morning')
=> "gOoD mOrNiNg"
>> aLtErNaTe_CaSe('Sample123string42with777numbers')
=> "sAmPlE123sTrInG42wItH777nUmBeRs"

5) For the given input strings, match only if all three of these conditions are satisfied:

This, case sensitively
nice and cool (two separate words), each case insensitively

>> s1 = 'This is nice and Cool'
>> s2 = 'Nice and cool this is'
>> s3 = 'What is so nice and cool about This?'
>> s4 = 'nice,cool,This'
>> s5 = 'not nice This?'
>> s6 = 'This is not cool'

>> pat =        ##### add your solution here

>> s1.match?(pat)
=> true
>> s2.match?(pat)
=> false
>> s3.match?(pat)
=> true
>> s4.match?(pat)
=> true
>> s5.match?(pat)
=> false
>> s6.match?(pat)
=> false

6) For the given input strings, match if the string begins with Th and also contains a line that starts with There.

>> s1 = "There there\nHave a cookie"
>> s2 = "This is a mess\nYeah?\nThereeeee"
>> s3 = "Oh\nThere goes the fun"
>> s4 = 'This is not\ngood\nno There'

>> pat =        ##### add your solution here

>> s1.match?(pat)
=> true
>> s2.match?(pat)
=> true
>> s3.match?(pat)
=> false
>> s4.match?(pat)
=> false

Understanding Ruby Regexp