Regexp introduction

In this chapter, you'll learn how to declare and use regexps. For some examples, the equivalent normal string method is shown for comparison. Regular expression features will be covered from the next chapter onwards; the main focus here is to get you comfortable with the syntax and with text processing examples. Three methods will be introduced in this chapter: the match? method to check whether the input contains a match, and the sub and gsub methods to substitute a portion of the input with something else.

Note: This book will use the terms regular expressions and regexp interchangeably.

Regexp documentation

It is always a good idea to know where to find the documentation. Visit ruby-doc: Regexp for information on the Regexp class, available methods, syntax, features, examples and more. Here's a quote:

Regular expressions (regexps) are patterns which describe the contents of a string. They're used for testing whether a string contains a given pattern, or extracting the portions that match. They are created with the /pat/ and %r{pat} literals or the Regexp.new constructor.
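
As a quick sketch of the three forms mentioned in the quote, the following builds the same pattern in each way and checks that they behave alike. The variable names are only for illustration.

```ruby
# three ways to create the same regexp (variable names are illustrative)
literal     = /is/
percent_r   = %r{is}
constructed = Regexp.new('is')

patterns = [literal, percent_r, constructed]

# all three hold the same pattern text
p patterns.map(&:source)
# => ["is", "is", "is"]

# and all three match the same input
sentence = 'This is a sample string'
p patterns.all? { |pat| sentence.match?(pat) }
# => true

# %r{} is convenient when the pattern itself contains forward slashes
p '/usr/bin/ruby'.match?(%r{/usr/bin})    # => true
p '/usr/bin/ruby'.match?(/\/usr\/bin/)    # => true
```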

match? method

First up, a simple example to test whether a string is part of another string. Normally, you'd use the include? method and pass a string as the argument. For regular expressions, use the match? method and enclose the search string within // delimiters (a regexp literal).

>> sentence = 'This is a sample string'

# check if 'sentence' contains the given string argument
>> sentence.include?('is')
=> true
>> sentence.include?('z')
=> false

# check if 'sentence' matches the pattern as described by the regexp argument
>> sentence.match?(/is/)
=> true
>> sentence.match?(/z/)
=> false

The match? method accepts an optional second argument which specifies the index to start searching from.

>> sentence = 'This is a sample string'

>> sentence.match?(/is/, 2)
=> true
>> sentence.match?(/is/, 6)
=> false
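
One possible use of the starting index, shown here as a rough sketch, is to check whether a pattern occurs again after its first match. The repeated? helper is a made-up name for this illustration and not part of Ruby. It relies on the index method to find where the first match starts.

```ruby
# hypothetical helper: true if 'pattern' matches at two different positions
def repeated?(text, pattern)
  first = text.index(pattern)      # index of the first match, or nil
  return false if first.nil?
  # resume the search one character after the start of the first match
  text.match?(pattern, first + 1)
end

sentence = 'This is a sample string'

p repeated?(sentence, /is/)        # => true
p repeated?(sentence, /sample/)    # => false
p repeated?(sentence, /z/)         # => false
```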

Some regular expression functionality is enabled by passing modifiers, each represented by a single letter. If you have used the command line, modifiers are similar to command options; for example, grep -i performs case insensitive matching. Modifiers will be discussed in detail in the Modifiers chapter. Here's an example of the i modifier.

>> sentence = 'This is a sample string'

>> sentence.match?(/this/)
=> false
# 'i' is a modifier to enable case insensitive matching
>> sentence.match?(/this/i)
=> true
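
For comparison, here's a small sketch that puts the i modifier next to a string-only approach, and also shows the same option passed to Regexp.new. The variable names are only for illustration.

```ruby
sentence = 'This is a sample string'

# regexp literal with the i modifier
p sentence.match?(/this/i)              # => true

# string-only alternative: normalize the case before searching
p sentence.downcase.include?('this')    # => true

# Regexp.new accepts the same option as a constant
pat = Regexp.new('this', Regexp::IGNORECASE)
p pat                                   # => /this/i
p sentence.match?(pat)                  # => true
```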

Regexp literal reuse and interpolation

A regexp literal can be saved in a variable. This helps to improve code clarity, lets you pass the regexp around as a method argument, enables reuse, and so on.

>> pet = /dog/i
>> pet
=> /dog/i

>> 'They bought a Dog'.match?(pet)
=> true
>> 'A cat crossed their path'.match?(pet)
=> false
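
To illustrate passing a regexp as a method argument, here's a minimal sketch. The count_matching helper and the sample strings are invented for this example.

```ruby
# hypothetical helper: counts the strings that match the given regexp
def count_matching(strings, pattern)
  strings.count { |s| s.match?(pattern) }
end

pet = /dog/i
lines = ['They bought a Dog', 'A cat crossed their path', 'Hotdog stand']

# the same method works with a saved regexp or an inline literal
p count_matching(lines, pet)       # => 2
p count_matching(lines, /cat/)     # => 1
```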

Similar to double quoted string literals, you can use interpolation and escape sequences in a regexp literal. See ruby-doc: Strings for syntax details on string escape sequences. Regexp literals have their own special escapes, which will be discussed in the Escape sequences section.

>> "cat\tdog".match?(/\t/)
=> true
>> "cat\tdog".match?(/\a/)
=> false

>> greeting = 'hi'
>> /#{greeting} there/
=> /hi there/
>> /#{greeting.upcase} there/
=> /HI there/
>> /#{2**4} apples/
=> /16 apples/
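
Interpolation is handy when part of the pattern is only known at run time. The sketch below assumes a made-up helper named greeting_for that builds a regexp from its argument.

```ruby
# hypothetical helper: builds a case insensitive regexp around 'name'
def greeting_for(name)
  /hi #{name}/i
end

pat = greeting_for('ruby')
p pat                                # => /hi ruby/i
p 'Hi Ruby, welcome'.match?(pat)     # => true
p 'Bye Ruby'.match?(pat)             # => false

# one regexp per word, built with interpolation
plurals = %w[cat dog].map { |w| /#{w}s/ }
p plurals                            # => [/cats/, /dogs/]
```

Note that the interpolated text becomes part of the pattern, so any characters that are special in a regexp keep their special meaning.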

sub and gsub methods

For search and replace, use the sub or gsub methods. The sub method replaces only the first occurrence of the match, whereas gsub replaces all the occurrences. The regexp pattern to match against the input string has to be passed as the first argument. The second argument specifies the string which will replace the portions matched by the pattern.

>> greeting = 'Have a nice weekend'

# replace first occurrence of 'e' with 'E'
>> greeting.sub(/e/, 'E')
=> "HavE a nice weekend"
# replace all occurrences of 'e' with 'E'
>> greeting.gsub(/e/, 'E')
=> "HavE a nicE wEEkEnd"
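
For comparison with normal string usage, here's a short sketch: sub and gsub also accept a plain string as the first argument, and modifiers such as i work with substitution too. The second input string is made up for this example.

```ruby
greeting = 'Have a nice weekend'

# a plain string can be passed instead of a regexp
p greeting.sub('e', 'E')             # => "HavE a nice weekend"
p greeting.gsub('e', 'E')            # => "HavE a nicE wEEkEnd"

# the i modifier applies to substitution as well
p 'Cat cat CAT'.gsub(/cat/i, 'dog')  # => "dog dog dog"

# the original string is left untouched
p greeting                           # => "Have a nice weekend"
```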

Use the sub! and gsub! methods for in-place substitution.

>> word = 'cater'

# this returns a new string object and doesn't modify the 'word' variable
>> word.sub(/cat/, 'wag')
=> "wager"
>> word
=> "cater"

# this will modify 'word' variable itself
>> word.sub!(/cat/, 'wag')
=> "wager"
>> word
=> "wager"
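
One detail worth keeping in mind: sub! and gsub! return nil when nothing was substituted. Here's a minimal sketch of that behavior, using the same sample word.

```ruby
# unary + gives a mutable string even if string literals are frozen
word = +'cater'

p word.sub!(/cat/, 'wag')    # => "wager"
# 'cat' is no longer present, so nothing is replaced
p word.sub!(/cat/, 'wag')    # => nil
p word                       # => "wager"

# because of the nil return, test the result instead of chaining on it
if word.gsub!(/z/, 'Z')
  puts 'changed'
else
  puts 'no change'
end
# prints: no change
```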

Regexp operators

Ruby also provides operators for regexp matching.

  • =~ match operator returns the index of the first match, or nil if no match is found
  • !~ match operator returns true if the string doesn't match the given regexp and false otherwise
  • === match operator returns true or false, similar to the match? method

>> sentence = 'This is a sample string'

# can also use: /is/ =~ sentence
>> sentence =~ /is/
=> 2
>> sentence =~ /z/
=> nil

# can also use: /z/ !~ sentence
>> sentence !~ /z/
=> true
>> sentence !~ /is/
=> false

Just like the match? method, both =~ and !~ can be used in a conditional statement.

>> sentence = 'This is a sample string'

>> puts 'hi' if sentence =~ /is/
hi

>> puts 'oh' if sentence !~ /z/
oh
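
A match at the very start of the string gives index 0, which is still truthy in Ruby, so such conditionals work as expected. Here's a small sketch with a made-up input string, along with the index method for when only the position is needed.

```ruby
text = 'is this it?'

# the match starts at index 0
p(text =~ /is/)                      # => 0

# 0 is truthy in Ruby, so the condition is satisfied
puts 'found' if text =~ /is/         # prints: found

# nil (no match) is falsy
puts 'missing' unless text =~ /z/    # prints: missing

# String#index also returns the position, with an optional starting index
p text.index(/is/)                   # => 0
p text.index(/is/, 1)                # => 5
```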

The === operator comes in handy with Enumerable methods like grep, grep_v, all?, any?, etc.

>> sentence = 'This is a sample string'

# regexp literal has to be on LHS and input string on RHS
>> /is/ === sentence
=> true
>> /z/ === sentence
=> false

>> words = %w[cat attempt tattle]
>> words.grep(/tt/)
=> ["attempt", "tattle"]
>> words.all?(/at/)
=> true
>> words.none?(/temp/)
=> false
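
The case statement also compares with ===, so regexps can be used directly in when clauses. The sketch below uses a hypothetical classify helper and a made-up word list to show this alongside grep_v and any?.

```ruby
# hypothetical helper: classifies a word using regexps in 'when' clauses
def classify(word)
  case word
  when /tt/ then 'double t'
  when /at/ then 'has at'
  else 'other'
  end
end

words = %w[cat attempt tattle dog]

p words.map { |w| classify(w) }
# => ["has at", "double t", "double t", "other"]

# elements that do NOT match the regexp
p words.grep_v(/tt/)     # => ["cat", "dog"]

# true if at least one element matches
p words.any?(/og/)       # => true
```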

Note: A key difference from the match? method is that these operators also set regexp related global variables.
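
As a brief sketch of this difference, the example below inspects $~, the global variable holding details of the last match. The local variable names are only for illustration.

```ruby
text = 'This is a sample string'

# =~ records details about the match in $~
text =~ /This/
after_first = $~[0]
p after_first        # => "This"

# match? leaves the previously recorded match untouched
text.match?(/sample/)
after_match_p = $~[0]
p after_match_p      # => "This"

# another =~ updates it
text =~ /sample/
after_second = $~[0]
p after_second       # => "sample"
```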

Cheatsheet and Summary

Note                        Description
ruby-doc: Regexp            Ruby Regexp documentation
Onigmo doc                  Onigmo library documentation
/pat/ or %r{pat}            regexp literal
                            interpolation and escape sequences can also be used
var = /pat/                 save regexp literal in a variable
/pat1#{expr}pat2/           use result of an expression to build regexp
s.match?(/pat/)             check if string s matches the pattern /pat/
                            returns true or false
s.match?(/pat/, 3)          optional 2nd argument changes starting index of search
/pat/i                      modifier i matches alphabets case insensitively
s.sub(/pat/, 'replace')     search and replace first matching occurrence
                            use gsub to replace all occurrences
                            use sub! and gsub! for in-place substitution
s =~ /pat/ or /pat/ =~ s    returns index of first match or nil
s !~ /pat/ or /pat/ !~ s    returns true if no match or false
/pat/ === s                 returns true or false similar to match?
                            these operators will also set regexp global variables

This chapter introduced the Regexp class and discussed the match?, sub and gsub methods. You also learnt how to save and reuse regexp literals, how to specify modifiers and how to use regexp operators.

You might wonder why there are so many ways to test a matching condition with regexps. The most common approach is to use the match? method in a conditional statement. If you need the position of the match, use the =~ operator or the index method. The === operator is usually relevant in Enumerable methods. Usage of global variables will be covered in later chapters. The =~ and !~ operators are also prevalent in command line usage (see my Ruby one liners tutorial for examples).

The next section has exercises to test your understanding of the concepts introduced in this chapter. Please do solve them before moving on to the next chapter.

Exercises

Note: Refer to the exercises folder for input files required to solve the exercises.

Note: All the exercises are also collated together in one place at Exercises.md. For solutions, see Exercise_solutions.md.

a) Check whether the given strings contain 0xB0. Display a boolean result as shown below.

>> line1 = 'start address: 0xA0, func1 address: 0xC0'
>> line2 = 'end address: 0xFF, func2 address: 0xB0'

>> line1.match?()       ##### add your solution here
=> false
>> line2.match?()       ##### add your solution here
=> true

b) For the given input file, print all lines containing the string two.

# note that expected output shown here is wrapped to fit pdf width
>> filename = 'programming_quotes.txt'

>> word =       ##### add your solution here

>> puts File.foreach(filename).grep(word)
"Some people, when confronted with a problem, think - I know, I'll use regular
expressions. Now they have two problems" by Jamie Zawinski
"So much complexity in software comes from trying to make one thing do two
things" by Ryan Singer

c) Replace all occurrences of 5 with five for the given string.

>> ip = 'They ate 5 apples and 5 oranges'

>> ip.gsub(//, 'five')      ##### add your solution here
=> "They ate five apples and five oranges"

d) Replace first occurrence of 5 with five for the given string.

>> ip = 'They ate 5 apples and 5 oranges'

>> ip.sub(//, 'five')       ##### add your solution here
=> "They ate five apples and 5 oranges"

e) For the given array, filter all elements that do not contain e.

>> items = %w[goal new user sit eat dinner]

>> items.grep_v(//)     ##### add your solution here
=> ["goal", "sit"]

f) Replace all occurrences of note irrespective of case with X.

>> ip = 'This note should not be NoTeD'

>> ip.gsub(//, 'X')     ##### add your solution here
=> "This X should not be XD"

g) For the given input string, print all lines NOT containing the string 2.

>> purchases = %q{items qty
'> apple 24
'> mango 50
'> guava 42
'> onion 31
'> water 10}

>> num = //     ##### add your solution here

>> puts purchases.each_line.grep_v(num)
items qty
mango 50
onion 31
water 10

h) For the given array, filter all elements that contain either a or w.

>> items = %w[goal new user sit eat dinner]

>> items.filter { }     ##### add your solution here
=> ["goal", "new", "eat"]

i) For the given array, filter all elements that contain both e and n.

>> items = %w[goal new user sit eat dinner]

>> items.filter { }     ##### add your solution here
=> ["new", "dinner"]

j) For the given string, replace 0xA0 with 0x7F and 0xC0 with 0x1F.

>> ip = 'start address: 0xA0, func1 address: 0xC0'

##### add your solution here
=> "start address: 0x7F, func1 address: 0x1F"

k) Find the starting index of the first occurrence of is for the given input string.

>> ip = 'match this after the history lesson'

##### add your solution here
=> 8