Regexp introduction

In this chapter, you'll get to know how to declare and use regexps. For some examples, the equivalent normal string method is also shown for comparison. The main focus will be to get you comfortable with syntax and text processing examples. Three methods will be introduced in this chapter. The match? method to search if the input contains a string and the sub and gsub methods to substitute a portion of the input with something else. Regular expression features will be covered from the next chapter onwards.

This book will use the terms regular expressions and regexp interchangeably.

Regexp documentation

It is always a good idea to know where to find the documentation. Visit ruby-doc: Regexp for information on Regexp class, available methods, syntax, features, examples and more. Here's a quote from an older version of the documentation:

Regular expressions (regexps) are patterns which describe the contents of a string. They're used for testing whether a string contains a given pattern, or extracting the portions that match. They are created with the /pat/ and %r{pat} literals or the Regexp.new constructor.

match? method

First up, a simple example to test whether a string is part of another string or not. Normally, you'd use the include? method and pass a string as an argument. For regular expressions, use the match? method and enclose the search string within // delimiters (regexp literal).

>> sentence = 'This is a sample string'

# check if 'sentence' contains the given search string
>> sentence.include?('is')
=> true
>> sentence.include?('z')
=> false

# check if 'sentence' matches the pattern described by the regexp argument
>> sentence.match?(/is/)
=> true
>> sentence.match?(/z/)
=> false

# unlike the include? method, match? is commutative
>> /ring/.match?(sentence)
=> true

The match? method accepts an optional second argument which specifies the index to start searching from.

>> sentence = 'This is a sample string'

>> sentence.match?(/is/, 2)
=> true
>> sentence.match?(/is/, 6)
=> false

Regexp modifiers

Some of the regular expressions functionality is enabled by passing modifiers, represented by an alphabet character. Modifiers are similar to command-line options, for example grep -i will perform case insensitive matching. These will be discussed in detail in the Modifiers chapter. Here's an example for the i modifier.

>> sentence = 'This is a sample string'

>> sentence.match?(/this/)
=> false

# 'i' modifier enables case insensitive matching
>> sentence.match?(/this/i)
=> true

Regexp literal reuse and interpolation

The regexp literal can be saved in a variable. This helps to improve code clarity, pass around as method arguments, enable reuse, etc.

>> pet = /dog/i
>> pet
=> /dog/i

>> 'They bought a Dog'.match?(pet)
=> true
>> 'A cat crossed their path'.match?(pet)
=> false

Similar to double quoted string literals, you can use interpolation and escape sequences within a regexp literal. See ruby-doc: Strings for syntax details on string escape sequences. Regexp literals have their own special escapes, which will be discussed in the Escape sequences section.

>> "cat\tdog".match?(/\t/)
=> true
>> "cat\tdog".match?(/\a/)
=> false

>> greeting = 'hi'
>> /#{greeting} there/
=> /hi there/
>> /#{greeting.upcase} there/
=> /HI there/
>> /#{2**4} apples/
=> /16 apples/

sub and gsub methods

For search and replace requirements, use the sub or gsub methods. The sub method will replace only the first occurrence of the match, whereas gsub will replace all the occurrences. The regexp pattern to match against the input string has to be passed as the first argument. The second argument specifies the string to replace the portions matched by the pattern.

>> greeting = 'Have a nice weekend'

# replace the first occurrence of 'e' with 'E'
>> greeting.sub(/e/, 'E')
=> "HavE a nice weekend"

# replace all occurrences of 'e' with 'E'
>> greeting.gsub(/e/, 'E')
=> "HavE a nicE wEEkEnd"

Use the sub! and gsub! methods for in-place substitution.

>> word = 'cater'

# this will return a string object, won't modify the 'word' variable
>> word.sub(/cat/, 'wag')
=> "wager"
>> word
=> "cater"

# this will modify the 'word' variable itself
>> word.sub!(/cat/, 'wag')
=> "wager"
>> word
=> "wager"

Regexp operators

Ruby also provides operators for regexp matching.

=~ match operator returns the index of the first match and nil if a match is not found
!~ match operator returns true if the input string doesn't contain the given regexp and false otherwise
=== match operator returns true or false similar to the match? method

>> sentence = 'This is a sample string'

# can also use: /is/ =~ sentence
>> sentence =~ /is/
=> 2
>> sentence =~ /z/
=> nil

# can also use: /z/ !~ sentence
>> sentence !~ /z/
=> true
>> sentence !~ /is/
=> false

Just like the match? method, both =~ and !~ can be used in a conditional statement.

>> sentence = 'This is a sample string'

>> puts 'hi' if sentence =~ /is/
hi

>> puts 'oh' if sentence !~ /z/
oh

The === operator comes in handy with Enumerable methods like grep, grep_v, all?, any?, etc.

>> sentence = 'This is a sample string'

# regexp literal has to be on LHS and input string on RHS
>> /is/ === sentence
=> true
>> /z/ === sentence
=> false

>> words = %w[cat attempt tattle]
>> words.grep(/tt/)
=> ["attempt", "tattle"]
>> words.all?(/at/)
=> true
>> words.none?(/temp/)
=> false

A key difference from the match? method is that these operators will also set regexp related global variables.

Cheatsheet and Summary

Note	Description
ruby-doc: Regexp	Ruby Regexp documentation
Onigmo doc	Onigmo library documentation
`/pat/` or `%r{pat}`	regexp literal
	interpolation and escape sequences can also be used
`var = /pat/`	save regexp literals in a variable
`/pat1#{expr}pat2/`	use the result of an expression to build regexps
`s.match?(/pat/)`	check if string `s` matches the pattern `/pat/`
	returns `true` or `false`
`s.match?(/pat/, 3)`	optional 2nd argument changes the starting index of search
`/pat/i`	modifier `i` matches alphabets case insensitively
`s.sub(/pat/, 'replace')`	search and replace the first matching occurrence
	use `gsub` to replace all occurrences
	use `sub!` and `gsub!` for in-place substitutions
`s =~ /pat/` or `/pat/ =~ s`	returns the index of the first match or `nil`
`s !~ /pat/` or `/pat/ !~ s`	returns `true` if no match, `false` otherwise
`/pat/ === s`	returns `true` or `false` similar to `match?`
	these operators will also set regexp global variables

This chapter introduced the Regexp class and the methods match?, sub and gsub were discussed. You also learnt how to save and reuse regexp literals, how to specify modifiers and how to use regexp operators.

You might wonder why there are so many ways to test a matching condition with regexps. The most common approach is to use the match? method in a conditional statement. If you need the position of match, use the =~ operator or the index method. The === operator is usually relevant in Enumerable methods. Usage of global variables will be covered in later chapters. The =~ and !~ operators are also prevalent in command-line usage (see my Ruby One-Liners Guide for examples).

The next section has exercises to test your understanding of the concepts introduced in this chapter. Please do solve them before moving on to the next chapter.

Exercises

Try to solve the exercises in every chapter using only the features discussed until that chapter. Some of the exercises will be easier to solve with techniques presented in the later chapters, but the aim of these exercises is to explore the features presented so far.

All the exercises are also collated together in one place at Exercises.md. For solutions, see Exercise_solutions.md.

1) Check whether the given strings contain 0xB0. Display a boolean result as shown below.

>> line1 = 'start address: 0xA0, func1 address: 0xC0'
>> line2 = 'end address: 0xFF, func2 address: 0xB0'

>> line1.match?()       ##### add your solution here
=> false
>> line2.match?()       ##### add your solution here
=> true

2) Check if the given input strings contain two irrespective of case.

>> s1 = 'Their artwork is exceptional'
>> s2 = 'one plus tw0 is not three'
>> s3 = 'TRUSTWORTHY'

>> pat1 = //        ##### add your solution here

>> pat1.match?(s1)
=> true
>> pat1.match?(s2)
=> false
>> pat1.match?(s3)
=> true

3) Replace all occurrences of 5 with five for the given string.

>> ip = 'They ate 5 apples and 5 oranges'

>> ip.gsub(//, 'five')      ##### add your solution here
=> "They ate five apples and five oranges"

4) Replace only the first occurrence of 5 with five for the given string.

>> ip = 'They ate 5 apples and 5 oranges'

>> ip.sub(//, 'five')       ##### add your solution here
=> "They ate five apples and 5 oranges"

5) For the given array, filter all elements that do not contain e.

>> items = %w[goal new user sit eat dinner]

>> items.grep_v(//)     ##### add your solution here
=> ["goal", "sit"]

6) Replace all occurrences of note irrespective of case with X.

>> ip = 'This note should not be NoTeD'

>> ip.gsub(//, 'X')     ##### add your solution here
=> "This X should not be XD"

7) For the given input string, print all lines NOT containing the string 2.

'> purchases = %q{items qty
'> apple 24
'> mango 50
'> guava 42
'> onion 31
>> water 10}

>> num = //     ##### add your solution here

>> puts purchases.each_line.grep_v(num)
items qty
mango 50
onion 31
water 10

8) For the given array, filter all elements that contain either a or w.

>> items = %w[goal new user sit eat dinner]

>> items.filter { }     ##### add your solution here
=> ["goal", "new", "eat"]

9) For the given array, filter all elements that contain both e and n.

>> items = %w[goal new user sit eat dinner]

>> items.filter { }     ##### add your solution here
=> ["new", "dinner"]

10) For the given string, replace 0xA0 with 0x7F and 0xC0 with 0x1F.

>> ip = 'start address: 0xA0, func1 address: 0xC0'

##### add your solution here
=> "start address: 0x7F, func1 address: 0x1F"

11) Find the starting index of the first occurrence of is for the given input string.

>> ip = 'match this after the history lesson'

##### add your solution here
=> 8

Understanding Ruby Regexp