Exercise solutions

Solutions for the exercises in Exercises.md are presented here.

Regexp introduction

1) Check whether the given strings contain 0xB0. Display a boolean result as shown below.

>> line1 = 'start address: 0xA0, func1 address: 0xC0'
>> line2 = 'end address: 0xFF, func2 address: 0xB0'

>> line1.match?(/0xB0/)
=> false
>> line2.match?(/0xB0/)
=> true

2) Check if the given input strings contain two irrespective of case.

>> s1 = 'Their artwork is exceptional'
>> s2 = 'one plus tw0 is not three'
>> s3 = 'TRUSTWORTHY'

>> pat1 = /two/i

>> pat1.match?(s1)
=> true
>> pat1.match?(s2)
=> false
>> pat1.match?(s3)
=> true

3) Replace all occurrences of 5 with five for the given string.

>> ip = 'They ate 5 apples and 5 oranges'

>> ip.gsub(/5/, 'five')
=> "They ate five apples and five oranges"

4) Replace only the first occurrence of 5 with five for the given string.

>> ip = 'They ate 5 apples and 5 oranges'

>> ip.sub(/5/, 'five')
=> "They ate five apples and 5 oranges"

5) For the given array, filter all elements that do not contain e.

>> items = %w[goal new user sit eat dinner]

>> items.grep_v(/e/)
=> ["goal", "sit"]

6) Replace all occurrences of note irrespective of case with X.

>> ip = 'This note should not be NoTeD'

>> ip.gsub(/note/i, 'X')
=> "This X should not be XD"

7) For the given input string, print all lines NOT containing the string 2.

'> purchases = %q{items qty
'> apple 24
'> mango 50
'> guava 42
'> onion 31
>> water 10}

>> num = /2/

>> puts purchases.each_line.grep_v(num)
items qty
mango 50
onion 31
water 10

8) For the given array, filter all elements that contain either a or w.

>> items = %w[goal new user sit eat dinner]

>> items.filter { |e| e.match?(/a/) || e.match?(/w/) }
=> ["goal", "new", "eat"]

9) For the given array, filter all elements that contain both e and n.

>> items = %w[goal new user sit eat dinner]

>> items.filter { |e| e.match?(/e/) && e.match?(/n/) }
=> ["new", "dinner"]

10) For the given string, replace 0xA0 with 0x7F and 0xC0 with 0x1F.

>> ip = 'start address: 0xA0, func1 address: 0xC0'

>> ip.gsub(/0xA0/, '0x7F').gsub(/0xC0/, '0x1F')
=> "start address: 0x7F, func1 address: 0x1F"

11) Find the starting index of the first occurrence of is for the given input string.

>> ip = 'match this after the history lesson'

>> ip =~ /is/
=> 8

Anchors

1) Check if the given strings start with be.

>> line1 = 'be nice'
>> line2 = '"best!"'
>> line3 = 'better?'
>> line4 = 'oh no\nbear spotted'

>> pat = /\Abe/

>> pat.match?(line1)
=> true
>> pat.match?(line2)
=> false
>> pat.match?(line3)
=> true
>> pat.match?(line4)
=> false

2) For the given input string, change only the whole word red to brown.

>> words = 'bred red spread credible red.'

>> words.gsub(/\bred\b/, 'brown')
=> "bred brown spread credible brown."

3) For the given input array, filter elements that contain 42 surrounded by word characters.

>> items = ['hi42bye', 'nice1423', 'bad42', 'cool_42a', '42fake', '_42_']

>> items.grep(/\B42\B/)
=> ["hi42bye", "nice1423", "cool_42a", "_42_"]

4) For the given input array, filter elements that start with den or end with ly.

>> items = ['lovely', "1\ndentist", '2 lonely', 'eden', "fly\n", 'dent']

>> items.filter { |e| e.match?(/\Aden/) || e.match?(/ly\z/) }
=> ["lovely", "2 lonely", "dent"]

5) For the given input string, change whole word mall to 1234 only if it is at the start of a line.

'> para = %q{(mall) call ball pall
'> ball fall wall tall
'> mall call ball pall
'> wall mall ball fall
'> mallet wallet malls
>> mall:call:ball:pall}

>> puts para.gsub(/^mall\b/, '1234')
(mall) call ball pall
ball fall wall tall
1234 call ball pall
wall mall ball fall
mallet wallet malls
1234:call:ball:pall

6) For the given array, filter elements having a line starting with den or ending with ly.

>> items = ['lovely', "1\ndentist", '2 lonely', 'eden', "fly\nfar", 'dent']

>> items.filter { |e| e.match?(/^den/) || e.match?(/ly$/) }
=> ["lovely", "1\ndentist", "2 lonely", "fly\nfar", "dent"]

7) For the given input array, filter all elements that exactly match 12\nthree, irrespective of case.

>> items = ["12\nthree\n", "12\nThree", "12\nthree\n4", "12\nthree"]

>> items.grep(/\A12\nthree\z/i)
=> ["12\nThree", "12\nthree"]

8) For the given input array, replace hand with X for all elements that start with hand followed by at least one word character.

>> items = %w[handed hand handy unhanded handle hand-2]

>> items.map { _1.sub(/\bhand\B/, 'X') }
=> ["Xed", "hand", "Xy", "unhanded", "Xle", "hand-2"]

9) For the given input array, filter all elements starting with h. Additionally, replace e with X for these filtered elements.

>> items = %w[handed hand handy unhanded handle hand-2]

>> items.filter_map { |e| e.gsub(/e/, 'X') if e.match?(/\Ah/) }
=> ["handXd", "hand", "handy", "handlX", "hand-2"]

Alternation and Grouping

1) For the given input array, filter all elements that start with den or end with ly.

>> items = ['lovely', "1\ndentist", '2 lonely', 'eden', "fly\n", 'dent']

>> items.grep(/\Aden|ly\z/)
=> ["lovely", "2 lonely", "dent"]

2) For the given array, filter elements having a line starting with den or ending with ly.

>> items = ['lovely', "1\ndentist", '2 lonely', 'eden', "fly\nfar", 'dent']

>> items.grep(/^den|ly$/)
=> ["lovely", "1\ndentist", "2 lonely", "fly\nfar", "dent"]

3) For the given strings, replace all occurrences of removed or reed or received or refused with X.

>> s1 = 'creed refuse removed read'
>> s2 = 'refused reed redo received'

>> pat = /re(mov|ceiv|fus|)ed/

>> s1.gsub(pat, 'X')
=> "cX refuse X read"
>> s2.gsub(pat, 'X')
=> "X X redo X"

4) For the given strings, replace all matches from the array words with A.

>> s1 = 'plate full of slate'
>> s2 = "slated for later, don't be late"
>> words = %w[late later slated]

>> pat = Regexp.union(words.sort_by { |w| -w.length })

>> s1.gsub(pat, 'A')
=> "pA full of sA"
>> s2.gsub(pat, 'A')
=> "A for A, don't be A"

5) Filter all whole elements from the input array items that exactly match any of the elements present in the array words.

>> items = ['slate', 'later', 'plate', 'late', 'slates', 'slated ']
>> words = %w[late later slated]

>> pat = Regexp.union(words.sort_by { |w| -w.length })
>> pat = /\A(#{pat.source})\z/

>> items.grep(pat)
=> ["later", "late"]

Escaping metacharacters

1) Transform the given input strings to the expected output using the same logic on both strings.

>> str1 = '(9-2)*5+qty/3-(9-2)*7'
>> str2 = '(qty+4)/2-(9-2)*5+pq/4'

>> str1.gsub('(9-2)*5', '35')
=> "35+qty/3-(9-2)*7"
>> str2.gsub('(9-2)*5', '35')
=> "(qty+4)/2-35+pq/4"

2) Replace (4)\| with 2 only at the start or end of the given input strings.

>> s1 = '2.3/(4)\|6 fig 5.3-(4)\|'
>> s2 = '(4)\|42 - (4)\|3'
>> s3 = "two - (4)\\|\n"

>> pat = /\A\(4\)\\\||\(4\)\\\|\z/

>> s1.gsub(pat, '2')
=> "2.3/(4)\\|6 fig 5.3-2"
>> s2.gsub(pat, '2')
=> "242 - (4)\\|3"
>> s3.gsub(pat, '2')
=> "two - (4)\\|\n"

3) Replace any matching item from the given array with X for the given input strings. Match the elements from items literally. Assume no two elements of items will result in any matching conflict.

>> items = ['a.b', '3+n', 'x\y\z', 'qty||price', '{n}']

>> pat = Regexp.union(items)

>> '0a.bcd'.gsub(pat, 'X')
=> "0Xcd"
>> 'E{n}AMPLE'.gsub(pat, 'X')
=> "EXAMPLE"
>> '43+n2 ax\y\ze'.gsub(pat, 'X')
=> "4X2 aXe"

4) Replace the backspace character \b with a single space character for the given input string.

>> ip = "123\b456"
>> puts ip
12456

>> ip.gsub(/\x08/, ' ')
=> "123 456"

5) Replace all occurrences of \o with o.

>> ip = 'there are c\omm\on aspects am\ong the alternati\ons'

>> ip.gsub(/\\o/, 'o')
=> "there are common aspects among the alternations"

6) Replace any matching item from the array eqns with X for the given string ip. Match the items from eqns literally.

>> ip = '3-(a^b)+2*(a^b)-(a/b)+3'
>> eqns = %w[(a^b) (a/b) (a^b)+2]

>> pat = Regexp.union(eqns.sort_by { |w| -w.length })

>> ip.gsub(pat, 'X')
=> "3-X*X-X+3"

Dot metacharacter and Quantifiers

Since the . metacharacter doesn't match newline characters by default, assume that the input strings in the following exercises will not contain newline characters.
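The newline restriction mentioned above can be checked directly. This is a minimal sketch: the sample string is made up for illustration, and the m modifier shown here is Ruby's way of allowing . to match newline characters as well.

```ruby
# By default, the dot metacharacter does not match a newline character
default_dot = "a\nb".match?(/a.b/)

# The m modifier lets the dot match newline characters too
multiline_dot = "a\nb".match?(/a.b/m)

p default_dot    # prints false
p multiline_dot  # prints true
```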

1) Replace 42//5 or 42/5 with 8 for the given input.

>> ip = 'a+42//5-c pressure*3+42/5-14256'

>> ip.gsub(%r{42//?5}, '8')
=> "a+8-c pressure*3+8-14256"

2) For the array items, filter all elements starting with hand and ending immediately with at most one more character or le.

>> items = %w[handed hand handled handy unhand hands handle]

>> items.grep(/\Ahand(.|le)?\z/)
=> ["hand", "handy", "hands", "handle"]

3) Use the split method to get the output as shown for the given input strings.

>> eqn1 = 'a+42//5-c'
>> eqn2 = 'pressure*3+42/5-14256'
>> eqn3 = 'r*42-5/3+42///5-42/53+a'

>> pat = %r{42//?5}

>> eqn1.split(pat)
=> ["a+", "-c"]
>> eqn2.split(pat)
=> ["pressure*3+", "-14256"]
>> eqn3.split(pat)
=> ["r*42-5/3+42///5-", "3+a"]

4) For the given input strings, remove everything from the first occurrence of i till the end of the string.

>> s1 = 'remove the special meaning of such constructs'
>> s2 = 'characters while constructing'
>> s3 = 'input output'

>> pat = /i.*/

>> s1.sub(pat, '')
=> "remove the spec"
>> s2.sub(pat, '')
=> "characters wh"
>> s3.sub(pat, '')
=> ""

5) For the given strings, construct a regexp to get the output as shown below.

>> str1 = 'a+b(addition)'
>> str2 = 'a/b(division) + c%d(#modulo)'
>> str3 = 'Hi there(greeting). Nice day(a(b)'

>> remove_parentheses = /\(.*?\)/

>> str1.gsub(remove_parentheses, '')
=> "a+b"
>> str2.gsub(remove_parentheses, '')
=> "a/b + c%d"
>> str3.gsub(remove_parentheses, '')
=> "Hi there. Nice day"

6) Correct the given regexp to get the expected output.

>> words = 'plink incoming tint winter in caution sentient'

# wrong output
>> change = /int|in|ion|ing|inco|inter|ink/
>> words.gsub(change, 'X')
=> "plXk XcomXg tX wXer X cautX sentient"

# expected output
>> change = /in(ter|co|t|g|k)?|ion/
>> words.gsub(change, 'X')
=> "plX XmX tX wX X cautX sentient"

7) For the given greedy quantifiers, what would be the equivalent form using the {m,n} representation?

? is the same as {,1} (or {0,1})
* is the same as {0,}
+ is the same as {1,}
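One way to convince yourself of these equivalences is to run each pair of regexps on the same inputs and compare the first match. The sample strings below are illustrative assumptions; agreement on a handful of inputs is a sanity check, not a proof.

```ruby
# Each greedy quantifier paired with its {m,n} form
pairs = {
  /a?c/ => /a{,1}c/,
  /a*c/ => /a{0,}c/,
  /a+c/ => /a{1,}c/
}

# Illustrative inputs with zero to three a characters before c
samples = ['c', 'ac', 'aac', 'aaac', 'xyz']

# For every pair, the first match (or nil) should be identical on every sample
same = pairs.map { |short, long| samples.all? { |s| s[short] == s[long] } }
p same  # prints [true, true, true]
```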

8) (a*|b*) is the same as (a|b)*. True or false?

False. (a*|b*) matches only a run of one repeated character, such as a, aaa, bb or bbbbbbbb (or the empty string), whereas (a|b)* can also match mixed sequences like ababbba.
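Anchoring both regexps with \A and \z makes the difference easy to see, since each one then has to account for the whole string. A small sketch using the sample sequences from the answer above:

```ruby
alt_of_runs = /\A(a*|b*)\z/  # only a's, or only b's
run_of_alt  = /\A(a|b)*\z/   # any mix of a and b

samples = %w[a aaa bb bbbbbbbb ababbba]

first  = samples.map { |s| alt_of_runs.match?(s) }
second = samples.map { |s| run_of_alt.match?(s) }

p first   # prints [true, true, true, true, false]
p second  # prints [true, true, true, true, true]
```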

9) For the given input strings, remove everything from the first occurrence of test (irrespective of case) till the end of the string, provided test isn't at the end of the string.

>> s1 = 'this is a Test'
>> s2 = 'always test your RE for corner cases'
>> s3 = 'a TEST of skill tests?'

>> pat = /test.+/i

>> s1.sub(pat, '')
=> "this is a Test"
>> s2.sub(pat, '')
=> "always "
>> s3.sub(pat, '')
=> "a "

10) For the input array words, filter all elements starting with s and containing e and t in any order.

>> words = ['sequoia', 'subtle', 'exhibit', 'a set', 'sets', 'tests', 'site']

>> words.grep(/\As.*(e.*t|t.*e)/)
=> ["subtle", "sets", "site"]

11) For the input array words, remove all elements having fewer than 6 characters.

>> words = %w[sequoia subtle exhibit asset sets tests site]

>> words.grep(/.{6,}/)
=> ["sequoia", "subtle", "exhibit"]

12) For the input array words, filter all elements starting with s or t and having a maximum of 6 characters.

>> words = ['sequoia', 'subtle', 'exhibit', 'asset', 'sets', 't set', 'site']

>> words.grep(/\A(s|t).{,5}\z/)
=> ["subtle", "sets", "t set", "site"]

13) Can you reason out why this code results in the output shown? The aim was to remove all <characters> patterns but not the <> ones. The expected result was 'a 1<> b 2<> c'.

The .+ quantifier after < has to match at least one character, so <> can never satisfy <.+?> by itself. After matching the < that occurs after 1 and 2 in the given input string, the . consumes the > right next to it, and the lazy quantifier then extends the match up to the next occurrence of the > character (the ones after bye and cat respectively). To solve such cases, you need to use character classes (discussed in a later chapter) to specify which particular set of characters should be matched by the + quantifier (instead of the . metacharacter).

>> ip = 'a<apple> 1<> b<bye> 2<> c<cat>'

>> ip.gsub(/<.+?>/, '')
=> "a 1 2"
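Using scan instead of gsub on the same input shows exactly what each match consumed, which confirms the reasoning above. The character class version is included only to show the intended result; [^>]+ is one possible fix, not the only one.

```ruby
ip = 'a<apple> 1<> b<bye> 2<> c<cat>'

# scan returns the portions that <.+?> actually matched
lazy_dot = ip.scan(/<.+?>/)
p lazy_dot  # prints ["<apple>", "<> b<bye>", "<> c<cat>"]

# With [^>]+ the characters between < and > can never include >
fixed = ip.gsub(/<[^>]+>/, '')
p fixed  # prints "a 1<> b 2<> c"
```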

14) Use the split method to get the output as shown below for the given input strings.

>> s1 = 'go there  ::   this :: that'
>> s2 = 'a::b :: c::d e::f :: 4::5'
>> s3 = '42:: hi::bye::see :: carefully'

>> pat = / +:: +/

>> s1.split(pat, 2)
=> ["go there", "this :: that"]
>> s2.split(pat, 2)
=> ["a::b", "c::d e::f :: 4::5"]
>> s3.split(pat, 2)
=> ["42:: hi::bye::see", "carefully"]

15) For the given input strings, match if the string starts with optional space characters followed by at least two # characters.

>> s1 = '   ## header2'
>> s2 = '#### header4'
>> s3 = '# comment'
>> s4 = 'normal string'
>> s5 = 'nope ## not this'

>> pat = /\A *\#{2,}/

>> s1.match?(pat)
=> true
>> s2.match?(pat)
=> true
>> s3.match?(pat)
=> false
>> s4.match?(pat)
=> false
>> s5.match?(pat)
=> false

16) Modify the given regular expression such that it gives the expected results.

>> s1 = 'appleabcabcabcapricot'
>> s2 = 'bananabcabcabcdelicious'

# wrong output
>> pat = /(abc)+a/
>> pat.match?(s1)
=> true
>> pat.match?(s2)
=> true

# expected output
# 'abc' shouldn't be considered when trying to match 'a' at the end
>> pat = /(abc)++a/
>> pat.match?(s1)
=> true
>> pat.match?(s2)
=> false
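A possessive quantifier behaves like an atomic group placed around the greedy quantified portion. The sketch below rewrites the solution with (?>...) instead of ++ to show that the results are the same for both input strings.

```ruby
s1 = 'appleabcabcabcapricot'
s2 = 'bananabcabcabcdelicious'

# Once (abc)+ inside the atomic group has matched, it won't give back
# a repetition to help the trailing 'a' match
atomic = /(?>(abc)+)a/

r1 = atomic.match?(s1)
r2 = atomic.match?(s2)

p r1  # prints true
p r2  # prints false
```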

Working with matched portions

1) For the given strings, extract the matching portion from the first is to the last t.

>> str1 = 'This the biggest fruit you have seen?'
>> str2 = 'Your mission is to read and practice consistently'

>> pat = /is.*t/

>> str1[pat]
=> "is the biggest fruit"
>> str2[pat]
=> "ission is to read and practice consistent"

2) Find the starting index of the first occurrence of is or the or was or to for the given input strings.

>> s1 = 'match after the last newline character'
>> s2 = 'and then you want to test'
>> s3 = 'this is good bye then'
>> s4 = 'who was there to see?'

>> pat = /is|the|was|to/

>> s1 =~ pat
=> 12
>> s2 =~ pat
=> 4
>> s3 =~ pat
=> 2
>> s4 =~ pat
=> 4

3) Find the starting index of the last occurrence of is or the or was or to for the given input strings.

>> s1 = 'match after the last newline character'
>> s2 = 'and then you want to test'
>> s3 = 'this is good bye then'
>> s4 = 'who was there to see?'

>> pat = /.*(is|the|was|to)/

>> s1.match(pat).begin(1)
=> 12
>> s2.match(pat).begin(1)
=> 18
>> s3.match(pat).begin(1)
=> 17
>> s4.match(pat).begin(1)
=> 14

4) Extract everything after the : character, which occurs only once in the input.

>> ip = 'fruits:apple, mango, guava, blueberry'

# can also use: ip[/:(.*)/, 1]
# can also use: ip.sub(/.*:/, '')
>> ip.match(/:(.*)/)[1]
=> "apple, mango, guava, blueberry"

5) The given input strings contain some text followed by - followed by a number. Replace that number with its log value using Math.log().

>> s1 = 'first-3.14'
>> s2 = 'next-123'

>> pat = /-(.+)/

>> s1.sub(pat) { "-#{Math.log($1.to_f)}" }
=> "first-1.144222799920162"
>> s2.sub(pat) { "-#{Math.log($1.to_f)}" }
=> "next-4.812184355372417"

6) Replace all occurrences of par with spar, spare with extra and park with garden for the given input strings.

>> str1 = 'apartment has a park'
>> str2 = 'do you have a spare cable'
>> str3 = 'write a parser'

>> pat = /park?|spare/
>> h = { 'par' => 'spar', 'spare' => 'extra', 'park' => 'garden' }

>> str1.gsub(pat, h)
=> "aspartment has a garden"
>> str2.gsub(pat, h)
=> "do you have a extra cable"
>> str3.gsub(pat, h)
=> "write a sparser"

7) Extract all words between ( and ) from the given input string as an array. Assume that the input will not contain any broken parentheses.

>> ip = 'another (way) to reuse (portion) matched (by) capture groups'

# as nested array
>> ip.scan(/\((.*?)\)/)
=> [["way"], ["portion"], ["by"]]

# as array of strings
>> ip.gsub(/\((.*?)\)/).map { $1 }
=> ["way", "portion", "by"]
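Another way to get a flat array is to post-process the nested result of scan. When the regexp has a single capture group, scan returns one single-element array per match, so flatten is enough. A short sketch:

```ruby
ip = 'another (way) to reuse (portion) matched (by) capture groups'

# scan gives [["way"], ["portion"], ["by"]]; flatten removes the nesting
words = ip.scan(/\((.*?)\)/).flatten
p words  # prints ["way", "portion", "by"]
```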

8) Extract all occurrences of < up to the next occurrence of >, provided there is at least one character in between < and >.

>> ip = 'a<apple> 1<> b<bye> 2<> c<cat>'

>> ip.scan(/<.+?>/)
=> ["<apple>", "<> b<bye>", "<> c<cat>"]

9) Use scan to get the output as shown below for the given input strings. Note the characters used in the input strings carefully.

>> row1 = '-2,5 4,+3 +42,-53 4356246,-357532354 '
>> row2 = '1.32,-3.14 634,5.63 63.3e3,9907809345343.235 '

>> pat = /(.+?),(.+?) /

>> row1.scan(pat)
=> [["-2", "5"], ["4", "+3"], ["+42", "-53"], ["4356246", "-357532354"]]
>> row2.scan(pat)
=> [["1.32", "-3.14"], ["634", "5.63"], ["63.3e3", "9907809345343.235"]]

10) This is an extension to the previous question.

For row1, find the sum of integers of each array element. For example, sum of -2 and 5 is 3.
For row2, find the sum of floating-point numbers of each array element. For example, sum of 1.32 and -3.14 is -1.82.

>> row1 = '-2,5 4,+3 +42,-53 4356246,-357532354 '
>> row2 = '1.32,-3.14 634,5.63 63.3e3,9907809345343.235 '

# should be same as the previous question
>> pat = /(.+?),(.+?) /

>> row1.scan(pat).map { |a, b| a.to_i + b.to_i }
=> [3, 7, -11, -353176108]

>> row2.scan(pat).map { |a, b| a.to_f + b.to_f }
=> [-1.82, 639.63, 9907809408643.234]

11) Use the split method to get the output as shown below.

>> ip = '42:no-output;1000:car-tr:u-ck;SQEX49801'

>> ip.split(/:.+?-(.+?);/)
=> ["42", "output", "1000", "tr:u-ck", "SQEX49801"]

12) Convert the comma separated strings to corresponding hash objects as shown below. Note that the input strings have an extra , at the end.

>> row1 = 'name:rohan,maths:75,phy:89,'
>> row2 = 'name:rose,maths:88,phy:92,'

>> pat = /(.+?):(.+?),/

>> row1.scan(pat).to_h
=> {"name"=>"rohan", "maths"=>"75", "phy"=>"89"}
>> row2.scan(pat).to_h
=> {"name"=>"rose", "maths"=>"88", "phy"=>"92"}

Character class

1) For the array items, filter all elements starting with hand and ending immediately with s or y or le.

>> items = %w[-handy hand handy unhand hands hand-icy handle]

>> items.grep(/\Ahand([sy]|le)\z/)
=> ["handy", "hands", "handle"]

2) Replace all whole words reed or read or red with X.

>> ip = 'redo red credible :read: rod reed'

>> ip.gsub(/\bre[ae]?d\b/, 'X')
=> "redo X credible :X: rod X"

3) For the array words, filter all elements containing e or i followed by l or n. Note that the order mentioned should be followed.

>> words = %w[surrender unicorn newer door empty eel pest]

>> words.grep(/[ei].*[ln]/)
=> ["surrender", "unicorn", "eel"]

4) For the array words, filter all elements containing e or i and l or n in any order.

>> words = %w[surrender unicorn newer door empty eel pest]

>> words.grep(/[ei].*[ln]|[ln].*[ei]/)
=> ["surrender", "unicorn", "newer", "eel"]

5) Convert the comma separated strings to corresponding hash objects as shown below.

>> row1 = 'name:rohan,maths:75,phy:89'
>> row2 = 'name:rose,maths:88,phy:92'

>> pat = /([^:]+):([^,]+),?/

>> row1.scan(pat).to_h
=> {"name"=>"rohan", "maths"=>"75", "phy"=>"89"}
>> row2.scan(pat).to_h
=> {"name"=>"rose", "maths"=>"88", "phy"=>"92"}

6) Delete from ( to the next occurrence of ) unless they contain parentheses characters in between.

>> str1 = 'def factorial()'
>> str2 = 'a/b(division) + c%d(#modulo) - (e+(j/k-3)*4)'
>> str3 = 'Hi there(greeting). Nice day(a(b)'

>> remove_parentheses = /\([^()]*\)/

>> str1.gsub(remove_parentheses, '')
=> "def factorial"
>> str2.gsub(remove_parentheses, '')
=> "a/b + c%d - (e+*4)"
>> str3.gsub(remove_parentheses, '')
=> "Hi there. Nice day(a"

7) For the array words, filter all elements not starting with e or p or u.

>> words = %w[surrender unicorn newer door empty eel (pest)]

>> words.grep(/\A[^epu]/)
=> ["surrender", "newer", "door", "(pest)"]

8) For the array words, filter all elements not containing u or w or ee or -.

>> words = %w[p-t you tea heel owe new reed ear]

>> words.grep_v(/[uw-]|ee/)
=> ["tea", "ear"]

9) The given input strings contain fields separated by , and fields can be empty too. Replace the last three fields with WHTSZ323.

>> row1 = '(2),kite,12,,D,C,,'
>> row2 = 'hi,bye,sun,moon'

>> pat = /(,[^,]*){3}\z/

>> row1.sub(pat, ',WHTSZ323')
=> "(2),kite,12,,D,WHTSZ323"
>> row2.sub(pat, ',WHTSZ323')
=> "hi,WHTSZ323"

10) Split the given strings based on consecutive sequences of digit or whitespace characters.

>> str1 = "lion \t Ink32onion Nice"
>> str2 = "**1\f2\n3star\t7 77\r**"

>> pat = /[\d\s]+/

>> str1.split(pat)
=> ["lion", "Ink", "onion", "Nice"]
>> str2.split(pat)
=> ["**", "star", "**"]

11) Delete all occurrences of the sequence <characters> where characters is one or more non > characters and cannot be empty.

>> ip = 'a<apple> 1<> b<bye> 2<> c<cat>'

>> ip.gsub(/<[^>]+>/, '')
=> "a 1<> b 2<> c"

12) \b[a-z](on|no)[a-z]\b is the same as \b[a-z][on]{2}[a-z]\b. True or False? Sample input lines shown below might help to understand the differences, if any.

False. [on]{2} will also match oo and nn, so words like mood and inns are matched only by the second regexp.

>> puts "known\nmood\nknow\npony\ninns"
known
mood
know
pony
inns
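Running both regexps against the sample words makes the difference concrete: the alternation accepts only on or no, while [on]{2} accepts any two-character combination of o and n.

```ruby
lines = %w[known mood know pony inns]

alternation = /\b[a-z](on|no)[a-z]\b/
char_class  = /\b[a-z][on]{2}[a-z]\b/

alt_matches   = lines.grep(alternation)
class_matches = lines.grep(char_class)

p alt_matches    # prints ["know", "pony"]
p class_matches  # prints ["mood", "know", "pony", "inns"]
```

Note that known isn't matched by either regexp, since the \b anchors require the whole word to be exactly four characters long.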

13) For the given array, filter elements containing any number sequence greater than 624.

>> items = ['h0000432ab', 'car00625', '42_624 0512', '96 foo1234baz 3.14 2']

>> items.filter { _1.gsub(/\d+/).any? { $&.to_i > 624 } }
=> ["car00625", "96 foo1234baz 3.14 2"]
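The gsub enumerator with $& works, but the same filter can also be written with scan, which returns every digit sequence as a string. A sketch of that alternative:

```ruby
items = ['h0000432ab', 'car00625', '42_624 0512', '96 foo1234baz 3.14 2']

# to_i ignores leading zeros, so '00625' becomes 625
big = items.filter { |s| s.scan(/\d+/).any? { |n| n.to_i > 624 } }
p big  # prints ["car00625", "96 foo1234baz 3.14 2"]
```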

14) Count the maximum depth of nested braces for the given strings. Unbalanced or wrongly ordered braces should return -1. Note that this will require a mix of regular expressions and Ruby code.

?> def max_nested_braces(ip)
?>   ip = ip.dup  # gsub! modifies its receiver, so work on a copy of the input
?>   cnt = 0
?>   cnt += 1 while ip.gsub!(/\{[^{}]*\}/, '')
?>   ip.match?(/[{}]/) ? -1 : cnt
>> end

>> max_nested_braces('a*b')
=> 0
>> max_nested_braces('}a+b{')
=> -1
>> max_nested_braces('a*b+{}')
=> 1
>> max_nested_braces('{{a+2}*{b+c}+e}')
=> 2
>> max_nested_braces('{{a+2}*{b+{c*d}}+e}')
=> 3
>> max_nested_braces("{{a+2}*{\n{b+{c*d}}+e*d}}")
=> 4
>> max_nested_braces('a*{b+c*{e*3.14}}}')
=> -1

15) By default, the split method will split on whitespace and remove empty strings from the result. Which regexp based method would you use to replicate this functionality?

>> ip = " \t\r  so  pole\t\t\t\n\nlit in to \r\n\v\f  "

>> ip.split
=> ["so", "pole", "lit", "in", "to"]

>> ip.scan(/\S+/)
=> ["so", "pole", "lit", "in", "to"]
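It might seem that split(/\s+/) would also work, but leading whitespace in the input produces an empty first field. Trailing empty fields are removed by default, so only the leading one shows up. A sketch comparing the two approaches:

```ruby
ip = " \t\r  so  pole\t\t\t\n\nlit in to \r\n\v\f  "

# The leading whitespace results in an empty string as the first element
split_ws = ip.split(/\s+/)
p split_ws  # prints ["", "so", "pole", "lit", "in", "to"]

# scan(/\S+/) collects only the non-whitespace runs
scan_ws = ip.scan(/\S+/)
p scan_ws  # prints ["so", "pole", "lit", "in", "to"]
```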

16) Convert the given input string to two different arrays as shown below. You can optimize the regexp based on characters present in the input string.

>> ip = "price_42 roast^\t\n^-ice==cat\neast"

>> ip.split(/\W+/)
=> ["price_42", "roast", "ice", "cat", "east"]

>> ip.split(/(\W+)/)
=> ["price_42", " ", "roast", "^\t\n^-", "ice", "==", "cat", "\n", "east"]

17) Filter all elements whose first non-whitespace character is not a # character. Any element made up of only whitespace characters should be ignored as well.

>> items = ['    #comment', "\t\napple #42", '#oops', 'sure', 'no#1', "\t\r\f"]

# can also use: items.grep(/\A\s*[^#\s]/)
>> items.grep(/\A\s*+[^#]/)
=> ["\t\napple #42", "sure", "no#1"]

18) Extract all whole words for the given input strings. However, based on user input ignore, do not match words if they contain any character present in the ignore variable. Assume that ignore variable will not contain any regexp metacharacters.

>> s1 = 'match after the last newline character'
>> s2 = 'and then you want to test'

>> ignore = 'aty'
>> pat = /\b[\w&&[^#{ignore}]]+\b/
>> s1.scan(pat)
=> ["newline"]
>> s2.scan(pat)
=> []

>> ignore = 'esw'
>> pat = /\b[\w&&[^#{ignore}]]+\b/
>> s1.scan(pat)
=> ["match"]
>> s2.scan(pat)
=> ["and", "you", "to"]

19) Filter all whole elements with optional whitespaces at the start followed by three to five non-digit characters. Whitespaces at the start should not be part of the calculation for non-digit characters.

>> items = ["\t \ncat", 'goal', ' oh', 'he-he', 'goal2', 'ok ', 'sparrow']

>> items.grep(/\A\s*+\D{3,5}\z/)
=> ["\t \ncat", "goal", "he-he", "ok "]

20) Modify the given regexp such that it gives the expected result.

>> ip = '( S:12 E:5 S:4 and E:123 ok S:100 & E:10 S:1 - E:2 S:42 E:43 )'

# wrong output
>> ip.scan(/S:\d+.*?E:\d{2,}/)
=> ["S:12 E:5 S:4 and E:123", "S:100 & E:10", "S:1 - E:2 S:42 E:43"]

# expected output
>> ip.scan(/(?>S:\d+.*?E:)\d{2,}/)
=> ["S:4 and E:123", "S:100 & E:10", "S:42 E:43"]

Groupings and backreferences

1) Replace the space character that occurs after a word ending with a or r with a newline character.

>> ip = 'area not a _a2_ roar took 22'

>> puts ip.gsub(/([ar]) /, "\\1\n")
area
not a
_a2_ roar
took 22

2) Add [] around words starting with s and containing e and t in any order.

>> ip = 'sequoia subtle exhibit asset sets2 tests si_te'

>> ip.gsub(/\bs\w*(t\w*e|e\w*t)\w*/, '[\0]')
=> "sequoia [subtle] exhibit asset [sets2] tests [si_te]"

3) Replace all whole words with X that start and end with the same word character (irrespective of case). A single-character word should be replaced with X too, as it satisfies the stated condition.

>> ip = 'oreo not a _a2_ Roar took 22'

# can also use: ip.gsub(/\b(\w|(\w)\w*\2)\b/i, 'X')
>> ip.gsub(/\b(\w)(\w*\1)?\b/i, 'X')
=> "X not X X X took X"

4) Convert the given markdown headers to corresponding anchor tags. Consider the input to start with one or more # characters followed by space and word characters. The name attribute is constructed by converting the header to lowercase and replacing spaces with hyphens. Can you do it without using a capture group?

>> header1 = '# Regular Expressions'
>> header2 = '## Named capture groups'

>> anchor = /\w.*/

>> header1.sub(anchor) { "<a name='#{$&.downcase.tr(' ', '-')}'></a>#{$&}" }
=> "# <a name='regular-expressions'></a>Regular Expressions"
>> header2.sub(anchor) { "<a name='#{$&.downcase.tr(' ', '-')}'></a>#{$&}" }
=> "## <a name='named-capture-groups'></a>Named capture groups"

5) Convert the given markdown anchors to corresponding hyperlinks.

>> anchor1 = "# <a name='regular-expressions'></a>Regular Expressions"
>> anchor2 = "## <a name='subexpression-calls'></a>Subexpression calls"

>> hyperlink = %r{[^']+'([^']+)'></a>(.+)}

>> anchor1.sub(hyperlink, '[\2](#\1)')
=> "[Regular Expressions](#regular-expressions)"
>> anchor2.sub(hyperlink, '[\2](#\1)')
=> "[Subexpression calls](#subexpression-calls)"

6) Count the number of whole words that have at least two occurrences of consecutive repeated alphabets. For example, words like stillness and Committee should be counted but not words like root or readable or rotational.

'> ip = %q{oppressed abandon accommodation bloodless
'> carelessness committed apparition innkeeper
'> occasionally afforded embarrassment foolishness
'> depended successfully succeeded
>> possession cleanliness suppress}

# can also use: ip.scan(/\b\w*(\w)\1\w*(\w)\2\w*\b/).size
>> ip.scan(/\b(\w*(\w)\2){2}\w*\b/).size
=> 13
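To see which words are being counted, one option is to split the input into words and test each word with a regexp that needs two doubled word characters. The two_doubles regexp below is an illustrative variation on the commented alternative above, not the only way to write it.

```ruby
ip = %q{oppressed abandon accommodation bloodless
carelessness committed apparition innkeeper
occasionally afforded embarrassment foolishness
depended successfully succeeded
possession cleanliness suppress}

# A doubled word character, followed later by another doubled word character
two_doubles = /(\w)\1\w*(\w)\2/
matched = ip.split.select { |w| w.match?(two_doubles) }

p matched       # the qualifying words, in input order
p matched.size  # prints 13
```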

7) For the given input string, replace all occurrences of digit sequences with only the unique non-repeating sequence. For example, 232323 should be changed to 23 and 897897 should be changed to 897. If there are no repeats (for example 1234) or if the repeats end prematurely (for example 12121), it should not be changed.

>> ip = '1234 2323 453545354535 9339 11 60260260'

>> ip.gsub(/\b(\d+)\1+\b/, '\1')
=> "1234 23 4535 9339 1 60260260"

8) Replace sequences made up of words separated by : or . by the first word of the sequence. Such sequences will end when : or . is not followed by a word character.

>> ip = 'wow:Good:2_two.five: hi-2 bye kite.777:water.'

>> ip.gsub(/([:.]\w*)+/, '')
=> "wow hi-2 bye kite"

9) Replace sequences made up of words separated by : or . by the last word of the sequence. Such sequences will end when : or . is not followed by a word character.

>> ip = 'wow:Good:2_two.five: hi-2 bye kite.777:water.'

>> ip.gsub(/((\w+)[:.])+/, '\2')
=> "five hi-2 bye water"

10) Split the given input string on one or more repeated sequence of cat.

>> ip = 'firecatlioncatcatcatbearcatcatparrot'

>> ip.split(/(?:cat)+/)
=> ["fire", "lion", "bear", "parrot"]

11) For the given input string, find all occurrences of digit sequences with at least one repeating sequence. For example, 232323 and 897897. If the repeats end prematurely, for example 12121, it should not be matched.

>> ip = '1234 2323 453545354535 9339 11 60260260'

>> pat = /\b(\d+)\1+\b/

# entire sequences in the output
>> ip.gsub(pat).map { $& }
=> ["2323", "453545354535", "11"]

# only the unique sequence in the output
>> ip.gsub(pat).map { $1 }
=> ["23", "4535", "1"]

12) Convert the comma separated strings to corresponding hash objects as shown below. The keys are name, maths and phy for the three fields in the input strings.

>> row1 = 'rohan,75,89'
>> row2 = 'rose,88,92'

>> pat = /(?<name>[^,]+),(?<maths>[^,]+),(?<phy>[^,]+)/

>> row1.match(pat).named_captures
=> {"name"=>"rohan", "maths"=>"75", "phy"=>"89"}
>> row2.match(pat).named_captures
=> {"name"=>"rose", "maths"=>"88", "phy"=>"92"}

13) Surround all whole words with (). Additionally, if the whole word is imp or ant, delete them. Can you do it with just a single substitution?

>> ip = 'tiger imp goat eagle ant important'

>> ip.gsub(/\b(?:imp|ant|(\w+))\b/, '(\1)')
=> "(tiger) () (goat) (eagle) () (important)"

14) Filter all elements that contain a sequence of lowercase alphabets followed by - followed by digits. They can be optionally surrounded by {{ and }}. Any partial match shouldn't be part of the output.

>> ip = %w[{{apple-150}} {{mango2-100}} {{cherry-200 grape-87 {{go-to}}]

>> ip.grep(/\A({{)?[a-z]+-\d+(?(1)}})\z/)
=> ["{{apple-150}}", "grape-87"]

15) Extract all hexadecimal character sequences, with 0x optional prefix. Match the characters case insensitively, and the sequences shouldn't be surrounded by other word characters.

>> str1 = '128A foo 0xfe32 34 0xbar'
>> str2 = '0XDEADBEEF place 0x0ff1ce bad'

>> hex_seq = /\b(?:0x)?\h+\b/i

>> str1.scan(hex_seq)
=> ["128A", "0xfe32", "34"]
>> str2.scan(hex_seq)
=> ["0XDEADBEEF", "0x0ff1ce", "bad"]

16) Replace sequences made up of words separated by : or . by the first/last word of the sequence and the separator. Such sequences will end when : or . is not followed by a word character.

>> ip = 'wow:Good:2_two.five: hi-2 bye kite.777:water.'

# first word of the sequence
>> ip.gsub(/((\w+[:.]))\g<2>+/, '\1')
=> "wow: hi-2 bye kite."

# last word of the sequence
>> ip.gsub(/(\w+[:.])\g<1>+/, '\1')
=> "five: hi-2 bye water."

17) For the given input strings, extract if followed by any number of nested parentheses. Assume that there will be only one such pattern per input string.

>> ip1 = 'for (((i*3)+2)/6) if(3-(k*3+4)/12-(r+2/3)) while()'
>> ip2 = 'if+while if(a(b)c(d(e(f)1)2)3) for(i=1)'

>> pat = /if(\((?:[^()]++|\g<1>)++\))/

>> ip1[pat]
=> "if(3-(k*3+4)/12-(r+2/3))"
>> ip2[pat]
=> "if(a(b)c(d(e(f)1)2)3)"

18) The given input string has sequences made up of words separated by : or . and such sequences will end when : or . is not followed by a word character. For all such sequences, display only the last word followed by - followed by the first word.

>> ip = 'wow:Good:2_two.five: hi-2 bye kite.777:water.'

>> ip.scan(/(\w+)[:.](?:(\w+)[:.])+/).map { "#{_2}-#{_1}" }
=> ["five-wow", "water-kite"]

Lookarounds

Please use lookarounds for solving the following exercises even if you can do it without lookarounds, unless lookarounds cannot be used (for example, cases that need variable length lookbehinds).
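The variable length limitation can be observed by building such a regexp at runtime. This sketch assumes a Ruby version whose regexp engine requires fixed length lookbehinds (a quantifier such as * inside the lookbehind makes it variable length). The price string is made up for illustration. \K, which also appears in some of the solutions below, works as a substitute because it only discards the text matched so far.

```ruby
# Regexp.new raises RegexpError at runtime instead of a parse-time SyntaxError
begin
  Regexp.new('(?<=price: *)\d+')
  lookbehind_ok = true
rescue RegexpError => e
  lookbehind_ok = false
  puts "RegexpError: #{e.message}"
end

# \K drops 'price:' and the spaces from the final matched portion
after_k = 'price:   42'.scan(/price: *\K\d+/)
p after_k  # prints ["42"]
```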

1) Replace all whole words with X unless they are preceded by a ( character.

>> ip = '(apple) guava berry) apple (mango) (grape'

>> ip.gsub(/(?<!\()\b\w+/, 'X')
=> "(apple) X X) X (mango) (grape"

2) Replace all whole words with X unless they are followed by a ) character.

>> ip = '(apple) guava berry) apple (mango) (grape'

>> ip.gsub(/\w+\b(?!\))/, 'X')
=> "(apple) X berry) X (mango) (X"

3) Replace all whole words with X unless they are preceded by ( or followed by ) characters.

>> ip = '(apple) guava berry) apple (mango) (grape'

>> ip.gsub(/(?<!\()\b\w+\b(?!\))/, 'X')
=> "(apple) X berry) X (mango) (grape"

4) Extract all whole words that do not end with e or n.

>> ip = 'a_t row on Urn e note Dust n end a2-e|u'

>> ip.scan(/\b\w+\b(?<![en])/)
=> ["a_t", "row", "Dust", "end", "a2", "u"]

5) Extract all whole words that do not start with a or d or n.

>> ip = 'a_t row on Urn e note Dust n end a2-e|u'

>> ip.scan(/(?![adn])\b\w+\b/)
=> ["row", "on", "Urn", "e", "Dust", "end", "e", "u"]

6) Extract all whole words only if they are followed by : or , or -.

>> ip = 'Poke,on=-=so_good:ink.to/is(vast)ever2-sit'

>> ip.scan(/\w+(?=[:,-])/)
=> ["Poke", "so_good", "ever2"]

7) Extract all whole words only if they are preceded by = or / or -.

>> ip = 'Poke,on=-=so_good:ink.to/is(vast)ever2-sit'

# can also use: ip.scan(%r{[=/-]\K\w+})
>> ip.scan(%r{(?<=[=/-])\w+})
=> ["so_good", "is", "sit"]

8) Extract all whole words only if they are preceded by = or : and followed by : or . characters.

>> ip = 'Poke,on=-=so_good:ink.to/is(vast)ever2-sit'

# can also use: ip.scan(/[=:]\K\w+(?=[:.])/)
>> ip.scan(/(?<=[=:])\w+(?=[:.])/)
=> ["so_good", "ink"]

9) Extract all whole words only if they are preceded by = or : or . or ( or - and not followed by . or /.

>> ip = 'Poke,on=-=so_good:ink.to/is(vast)ever2-sit'

# can also use: ip.scan(%r{[=:.(-]\K\w+\b(?![/.])})
>> ip.scan(%r{(?<=[=:.(-])\w+\b(?![/.])})
=> ["so_good", "vast", "sit"]

10) Remove the leading and trailing whitespace characters from all the individual fields, where , is the field separator.

>> csv1 = " comma  ,separated ,values \t\r "
>> csv2 = 'good bad,nice  ice  , 42 , ,   stall   small'

>> remove_whitespace = /(?<![^,])\s+|\s+(?![^,])/

>> csv1.gsub(remove_whitespace, '')
=> "comma,separated,values"
>> csv2.gsub(remove_whitespace, '')
=> "good bad,nice  ice,42,,stall   small"
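The double negation (?<![^,]) reads as "not preceded by a character other than ,". That condition also holds at the start of the string, where there is no preceding character at all. The lookahead (?![^,]) works the same way at the end. A small comparison with the positive form (?<=,) on a made-up string shows the difference:

```ruby
s = ' a, b'

# (?<![^,]) succeeds after a comma AND at the start of the string
p s.gsub(/(?<![^,])\s+/, '')  # "a,b"

# (?<=,) needs an actual comma, so the leading space survives
p s.gsub(/(?<=,)\s+/, '')     # " a,b"
```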

11) Filter elements that satisfy all of these rules:

should have at least two letters
should have at least three digits
should have at least one special character among % or * or # or $
should not end with a whitespace character

>> pwds = ['hunter2', 'F2H3u%9', "*X3Yz3.14\t", 'r2_d2_42', 'A $B C1234']

>> rule_chk = /(?=(.*[a-zA-Z]){2})(?=(.*\d){3})(?!.*\s\z).*[%*#$]/

>> pwds.grep(rule_chk)
=> ["F2H3u%9", "A $B C1234"]
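Each lookahead above checks one rule from the match position without consuming anything, so all the rules can be stacked. As a plain-Ruby cross-check, testing the same four rules individually with String#count and match? picks the same passwords:

```ruby
pwds = ['hunter2', 'F2H3u%9', "*X3Yz3.14\t", 'r2_d2_42', 'A $B C1234']

ok = pwds.select do |pw|
  pw.count('a-zA-Z') >= 2 &&  # at least two letters
    pw.count('0-9') >= 3 &&   # at least three digits
    pw.match?(/[$%*#]/) &&    # at least one of % * # $
    !pw.match?(/\s\z/)        # must not end with whitespace
end
p ok  # ["F2H3u%9", "A $B C1234"]
```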

12) For the given string, surround all whole words with {} except for whole words par and cat and apple.

>> ip = 'part; cat {super} rest_42 par scatter apple spar'

>> ip.gsub(/\b(?!(?:par|cat|apple)\b)\w+/, '{\0}')
=> "{part}; cat {{super}} {rest_42} par {scatter} apple {spar}"

13) Extract the integer portion of floating-point numbers for the given string. Integers and numbers ending with . and no further digits should not be considered.

>> ip = '12 ab32.4 go 5 2. 46.42 5'

>> ip.scan(/\d+(?=\.\d)/)
=> ["32", "46"]

14) For the given input strings, extract all overlapping two-character sequences.

>> s1 = 'apple'
>> s2 = '1.2-3:4'

>> pat = /.(?=(.))/

>> s1.gsub(pat).map { $& + $1 }
=> ["ap", "pp", "pl", "le"]
>> s2.gsub(pat).map { $& + $1 }
=> ["1.", ".2", "2-", "-3", "3:", ":4"]
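Capturing inside a lookahead is what makes overlapping results possible. The lookahead doesn't consume the second character, so the next match attempt can start there. Below is the same idea written with scan, followed by a non-regexp cross-check using each_cons:

```ruby
s1 = 'apple'

# zero-width match at every position; group 1 grabs the next two characters
p s1.scan(/(?=(..))/).flatten        # ["ap", "pp", "pl", "le"]

# non-regexp equivalent: sliding window of two characters
p s1.chars.each_cons(2).map(&:join)  # ["ap", "pp", "pl", "le"]
```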

15) The given input strings contain fields separated by the : character. Delete : and the last field if there is a digit character anywhere before the last field.

>> s1 = '42:cat'
>> s2 = 'twelve:a2b'
>> s3 = 'we:be:he:0:a:b:bother'
>> s4 = 'apple:banana-42:cherry:'
>> s5 = 'dragon:unicorn:centaur'

>> pat = /(\d.*):.*/

>> s1.sub(pat, '\1')
=> "42"
>> s2.sub(pat, '\1')
=> "twelve:a2b"
>> s3.sub(pat, '\1')
=> "we:be:he:0:a:b"
>> s4.sub(pat, '\1')
=> "apple:banana-42:cherry"
>> s5.sub(pat, '\1')
=> "dragon:unicorn:centaur"

16) Extract all whole words unless they are preceded by : or <=> or ---- or #.

>> ip = '::very--at<=>row|in.a_b#b2c=>lion----east'

>> ip.scan(/(?<![:#]|<=>|-{4})\b\w+/)
=> ["at", "in", "a_b", "lion"]

17) Match strings if they contain qty followed by price, but not if there is any whitespace character or the string error between them.

>> str1 = '23,qty,price,42'
>> str2 = 'qty price,oh'
>> str3 = '3.14,qty,6,errors,9,price,3'
>> str4 = "42\nqty-6,apple-56,price-234,error"
>> str5 = '4,price,3.14,qty,4'
>> str6 = '(qtyprice) (hi-there)'

# can also use: neg = /qty((?!\s|error).)*price/
>> neg = /qty(?~\s|error)price/

>> str1.match?(neg)
=> true
>> str2.match?(neg)
=> false
>> str3.match?(neg)
=> false
>> str4.match?(neg)
=> true
>> str5.match?(neg)
=> false
>> str6.match?(neg)
=> true
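The commented alternative, qty((?!\s|error).)*price, is a tempered dot. Each character is consumed only after checking that neither whitespace nor error starts at that position. A quick check confirms that it agrees with the absence operator (?~...) on the six sample strings:

```ruby
strs = ['23,qty,price,42', 'qty price,oh', '3.14,qty,6,errors,9,price,3',
        "42\nqty-6,apple-56,price-234,error", '4,price,3.14,qty,4',
        '(qtyprice) (hi-there)']

absence  = /qty(?~\s|error)price/
tempered = /qty((?!\s|error).)*price/

p strs.map { |s| s.match?(absence) }   # [true, false, false, true, false, true]
p strs.map { |s| s.match?(tempered) }  # [true, false, false, true, false, true]
```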

18) Can you reason out why the following regular expressions behave differently?

\b matches both start-of-word and end-of-word locations. In the example below, \b..\b doesn't necessarily mean that the first \b will match only a start-of-word location and the second \b only an end-of-word location. They can be any combination! For example, I followed by a space in the input string uses the start-of-word location for both conditions. Similarly, a space followed by 2 uses the end-of-word location for both conditions.

In contrast, the negative lookaround version ensures that the two matched characters are neither preceded nor followed by a word character. Such assertions are always satisfied at the start and at the end of the string, respectively, whereas \b depends on the presence of word characters. For example, the ! at the end of the input string satisfies the lookahead assertion but not the word boundary.

>> ip = 'I have 12, he has 2!'

>> ip.gsub(/\b..\b/, '{\0}')
=> "{I }have {12}{, }{he} has{ 2}!"

>> ip.gsub(/(?<!\w)..(?!\w)/, '{\0}')
=> "I have {12}, {he} has {2!}"
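To see the positions discussed above, the sketch below lists every offset where \b matches in the sample string. It then compares \b with a negative lookahead right after the final !:

```ruby
ip = 'I have 12, he has 2!'

# offsets where \b matches; start and end of word are not distinguished
p ip.gsub(/\b/).map { $~.begin(0) }
# [0, 1, 2, 6, 7, 9, 11, 13, 14, 17, 18, 19]

# the end of string after '!' is not a word boundary, but (?!\w) is satisfied there
p ip.match?(/!\b/)      # false
p ip.match?(/!(?!\w)/)  # true
```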

19) The given input strings have fields separated by the : character. Assume that each string has a minimum of two fields and cannot have empty fields. Extract all fields, but stop if a field with a digit character is found.

>> row1 = 'vast:a2b2:ride:in:awe:b2b:3list:end'
>> row2 = 'um:no:low:3e:s4w:seer'
>> row3 = 'oh100:apple:banana:fig'
>> row4 = 'Dragon:Unicorn:Wizard-Healer'

>> pat = /\G([^\d:]+)(?::|\z)/

>> row1.gsub(pat).map { $1 }
=> ["vast"]
>> row2.gsub(pat).map { $1 }
=> ["um", "no", "low"]
>> row3.gsub(pat).map { $1 }
=> []
>> row4.gsub(pat).map { $1 }
=> ["Dragon", "Unicorn", "Wizard-Healer"]
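\G is what makes the extraction stop. It only matches where the previous match ended, or at the start of the string for the first attempt. Once a field with a digit breaks the chain, no later field can match. Without \G, scan picks up later fields, and even fragments of fields that contain digits:

```ruby
row1 = 'vast:a2b2:ride:in:awe:b2b:3list:end'

# same pattern minus the \G anchor
p row1.scan(/([^\d:]+)(?::|\z)/).flatten
# ["vast", "ride", "in", "awe", "b", "list", "end"]
```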

20) The given input strings have fields separated by the : character. Extract all fields only after a field containing a digit character is found. Assume that each string has a minimum of two fields and cannot have empty fields.

>> row1 = 'vast:a2b2:ride:in:awe:b2b:3list:end'
>> row2 = 'um:no:low:3e:s4w:seer'
>> row3 = 'oh100:apple:banana:fig'
>> row4 = 'Dragon:Unicorn:Wizard-Healer'

>> pat = /(?:\d[^:]*|\G):\K[^:]+/

>> row1.scan(pat)
=> ["ride", "in", "awe", "b2b", "3list", "end"]
>> row2.scan(pat)
=> ["s4w", "seer"]
>> row3.scan(pat)
=> ["apple", "banana", "fig"]
>> row4.scan(pat)
=> []
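Exercises 19 and 20 are two halves of the same split. A non-regexp cross-check using split, take_while and find_index can be handy when testing the two patterns. The variable names here are illustrative.

```ruby
rows = ['vast:a2b2:ride:in:awe:b2b:3list:end', 'um:no:low:3e:s4w:seer',
        'oh100:apple:banana:fig', 'Dragon:Unicorn:Wizard-Healer']

results = rows.map do |row|
  fields = row.split(':')
  # exercise 19: fields before the first one containing a digit
  before = fields.take_while { |f| !f.match?(/\d/) }
  # exercise 20: fields after the first one containing a digit
  idx = fields.find_index { |f| f.match?(/\d/) }
  after = idx ? fields.drop(idx + 1) : []
  [before, after]
end

results.each { |r| p r }
```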

21) The given input string has comma-separated fields, and some of them can occur more than once. For the duplicated fields, retain only the rightmost one. Assume that there are no empty fields.

>> row = '421,cat,2425,42,5,cat,6,6,42,61,6,6,scat,6,6,4,Cat,425,4'

>> row.gsub(/(?<![^,])([^,]+),(?=.*(?<![^,])\1(?![^,]))/, '')
=> "421,2425,5,cat,42,61,scat,6,Cat,425,4"
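Here is a quick non-regexp cross-check. Reverse the fields, keep the first occurrence of each with uniq, and reverse back. This retains exactly the rightmost copy of every field, and like the regexp it is case sensitive (cat and Cat are different fields):

```ruby
row = '421,cat,2425,42,5,cat,6,6,42,61,6,6,scat,6,6,4,Cat,425,4'

p row.split(',').reverse.uniq.reverse.join(',')
# "421,2425,5,cat,42,61,scat,6,Cat,425,4"
```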

Modifiers

1) For the given input strings, remove everything from the first occurrence of the string hat to the last occurrence of the string it. Match these markers case insensitively.

>> s1 = "But Cool THAT\nsee What okay\nwow quite"
>> s2 = 'it this hat is sliced HIT.'

>> pat = /hat.*it/im

>> s1.sub(pat, '')
=> "But Cool Te"
>> s2.sub(pat, '')
=> "it this ."
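In Ruby, the m modifier is what lets . match the newline character. Other flavors call this single-line or dotall mode. If you drop it from the pattern, s1 is left untouched, because no single line contains hat followed later by it:

```ruby
s1 = "But Cool THAT\nsee What okay\nwow quite"

p s1.sub(/hat.*it/im, '')  # "But Cool Te"
# without m, . stops at \n, so there is no match at all
p s1.sub(/hat.*it/i, '')   # "But Cool THAT\nsee What okay\nwow quite"
```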

2) Delete from the string start, if it is at the beginning of a line, up to the next occurrence of the string end at the end of a line. Match these keywords irrespective of case.

'> para = %q{good start
'> start working on that
'> project you always wanted
'> to, do not let it end
'> hi there
'> start and end the end
'> 42
'> Start and try to
'> finish the End
>> bye}

>> pat = /^start.*?end$/im

>> puts para.gsub(pat, '')
good start

hi there

42

bye

3) For the given markdown file, replace all occurrences of the string ruby (irrespective of case) with the string Ruby. However, any match within code blocks that start with the whole line ```ruby and end with the whole line ``` shouldn't be replaced. Assume that the input file is small enough to fit in memory.

Refer to the exercises folder for input files required to solve this exercise.

>> ip_str = File.read('sample.md')
>> pat = /(^```ruby$.*?^```$)/m

>> File.open('sample_mod.md', 'w') do |f|
?>   ip_str.split(pat).each_with_index do |s, i|
?>     f.write(i.odd? ? s : s.gsub(/ruby/i) { $&.capitalize })
>>   end
>> end

>> File.read('sample_mod.md') == File.read('expected.md')
=> true
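The i.odd? check works because split keeps the text matched by capture groups in its result. The separators (here, the ruby code blocks) therefore always sit at odd indices, even when the string starts with one. The illustration below uses digits as stand-in separators:

```ruby
# captured separators are interleaved with the surrounding pieces
p 'a1b2c'.split(/(\d)/)  # ["a", "1", "b", "2", "c"]

# a leading separator produces an empty first piece, keeping separators at odd indices
p '1a2b'.split(/(\d)/)   # ["", "1", "a", "2", "b"]
```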

4) Write a method that changes the given input to alternate case, starting with lowercase.

?> def aLtErNaTe_CaSe(ip_str)
?>   b = true
?>   return ip_str.gsub(/[a-z]/i) { (b = !b) ? $&.upcase : $&.downcase }
>> end

>> aLtErNaTe_CaSe('HI THERE!')
=> "hI tHeRe!"
>> aLtErNaTe_CaSe('good morning')
=> "gOoD mOrNiNg"
>> aLtErNaTe_CaSe('Sample123string42with777numbers')
=> "sAmPlE123sTrInG42wItH777nUmBeRs"

5) For the given input strings, match if all three of these terms are present, in any order:

This case sensitively
nice and cool case insensitively

>> s1 = 'This is nice and Cool'
>> s2 = 'Nice and cool this is'
>> s3 = 'What is so nice and cool about This?'
>> s4 = 'nice,cool,This'
>> s5 = 'not nice This?'
>> s6 = 'This is not cool'

>> pat = /(?=.*nice)(?=.*cool)(?-i:.*This)/i

>> s1.match?(pat)
=> true
>> s2.match?(pat)
=> false
>> s3.match?(pat)
=> true
>> s4.match?(pat)
=> true
>> s5.match?(pat)
=> false
>> s6.match?(pat)
=> false
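(?-i:...) switches off case-insensitive matching only inside that group, while the /i flag still applies to the rest of the pattern. Conversely, (?i:...) turns case-insensitivity on locally. Here is a short sketch with made-up strings:

```ruby
# flag /i applies everywhere except inside (?-i:...)
p 'NICE This'.match?(/nice (?-i:This)/i)  # true
p 'NICE THIS'.match?(/nice (?-i:This)/i)  # false

# (?i:...) enables case-insensitivity for just that part
p 'nice COOL'.match?(/nice (?i:cool)/)    # true
p 'NICE cool'.match?(/nice (?i:cool)/)    # false
```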

6) For the given input strings, match if the string begins with Th and also contains a line that starts with There.

>> s1 = "There there\nHave a cookie"
>> s2 = "This is a mess\nYeah?\nThereeeee"
>> s3 = "Oh\nThere goes the fun"
>> s4 = "This is not\ngood\nno There"

>> pat = /\A(?=Th)(?m:.*^There)/

>> s1.match?(pat)
=> true
>> s2.match?(pat)
=> true
>> s3.match?(pat)
=> false
>> s4.match?(pat)
=> false

Unicode

1) Output true or false depending on whether the input string is made up of ASCII characters only. Consider the input to be non-empty strings, and any character that isn't part of the 7-bit ASCII set should give false.

>> str1 = '123—456'
>> str2 = 'good fοοd'
>> str3 = 'happy learning!'

# can also use: !str1.match?(/[^\u{00}-\u{7f}]/)
>> str1.ascii_only?
=> false
>> str2.ascii_only?
=> false
>> str3.ascii_only?
=> true

2) Retain only punctuation characters for the given strings (generated from codepoints). Use the Unicode definition of punctuation to solve this exercise.

>> s1 = (0..0x7f).to_a.pack('U*')
>> s2 = (0x80..0xff).to_a.pack('U*')
>> s3 = (0x2600..0x27eb).to_a.pack('U*')

>> pat = /\p{^P}/

>> s1.gsub(pat, '')
=> "!\"#%&'()*,-./:;?@[\\]_{}"
>> s2.gsub(pat, '')
=> "¡§«¶·»¿"
>> s3.gsub(pat, '')
=> "❨❩❪❫❬❭❮❯❰❱❲❳❴❵⟅⟆⟦⟧⟨⟩⟪⟫"
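\p{^P} is the negated form of \p{P} (Unicode punctuation), and \P{P} is another way to write the same negation. Characters such as $ and + belong to the Unicode symbol categories rather than punctuation, which is why they are missing from the first result above. A small check on a made-up string:

```ruby
s = 'a!b$c+d?'

p s.gsub(/\p{^P}/, '')  # "!?"
p s.gsub(/\P{P}/, '')   # "!?"
p s.scan(/\p{P}/).join  # "!?"
```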

3) Explore the following Q&A threads.

Understanding Ruby Regexp