Regexp introduction
In this chapter, you'll learn how to declare and use regexps. For some examples, the equivalent normal string method is shown for comparison. Regular expression features will be covered from the next chapter onwards; the main focus here is to get you comfortable with the syntax and with text processing examples. Three methods are introduced in this chapter: the match? method to check whether the input contains a given pattern, and the sub and gsub methods to substitute a portion of the input with something else.
This book will use the terms regular expressions and regexp interchangeably.
Regexp documentation
It is always a good idea to know where to find the documentation. Visit ruby-doc: Regexp for information on the Regexp class, available methods, syntax, features, examples and more. Here's a quote:
Regular expressions (regexps) are patterns which describe the contents of a string. They're used for testing whether a string contains a given pattern, or extracting the portions that match. They are created with the /pat/ and %r{pat} literals or the Regexp.new constructor.
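As a minimal sketch of the three forms mentioned in the quote, the snippet below builds the same pattern in each way and compares the results. The variable names are made up for illustration.

```ruby
# three ways to create the same regexp (variable names are illustrative)
literal = /cat/
percent = %r{cat}
built   = Regexp.new('cat')

p literal == percent   # true
p literal == built     # true

# the %r{} form is handy when the pattern itself contains forward slashes
path = %r{/usr/bin}
p '/usr/bin/ruby'.match?(path)   # true
```

The rest of this chapter sticks to the /pat/ literal form.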
match? method
First up, a simple example to test whether a string is part of another string or not. Normally, you'd use the include? method and pass a string as argument. For regular expressions, use the match? method and enclose the search string within // delimiters (regexp literal).
>> sentence = 'This is a sample string'
# check if 'sentence' contains the given string argument
>> sentence.include?('is')
=> true
>> sentence.include?('z')
=> false
# check if 'sentence' matches the pattern as described by the regexp argument
>> sentence.match?(/is/)
=> true
>> sentence.match?(/z/)
=> false
The match? method accepts an optional second argument which specifies the index to start searching from.
>> sentence = 'This is a sample string'
>> sentence.match?(/is/, 2)
=> true
>> sentence.match?(/is/, 6)
=> false
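To see why the result changes between the two calls above, it helps to note where the matches are. In this sample string, 'is' occurs at index 2 (inside 'This') and at index 5 (the word 'is'). Here's a small sketch, written as a plain script instead of an irb session:

```ruby
sentence = 'This is a sample string'

# searching from index 3 skips the match inside 'This',
# but the word 'is' at index 5 is still ahead
p sentence.match?(/is/, 3)   # true
p sentence.match?(/is/, 5)   # true

# no 'is' is left from index 6 onwards
p sentence.match?(/is/, 6)   # false
```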
Some regular expression functionality is enabled by passing modifiers, each represented by a single letter. If you have used the command line, modifiers are similar to command options; for example, grep -i performs case insensitive matching. Modifiers will be discussed in detail in the Modifiers chapter. Here's an example for the i modifier.
>> sentence = 'This is a sample string'
>> sentence.match?(/this/)
=> false
# 'i' is a modifier to enable case insensitive matching
>> sentence.match?(/this/i)
=> true
Regexp literal reuse and interpolation
The regexp literal can be saved in a variable. This helps to improve code clarity, lets you pass the regexp around as a method argument, enables reuse, and so on.
>> pet = /dog/i
>> pet
=> /dog/i
>> 'They bought a Dog'.match?(pet)
=> true
>> 'A cat crossed their path'.match?(pet)
=> false
Similar to double quoted string literals, you can use interpolation and escape sequences in a regexp literal. See ruby-doc: Strings for syntax details on string escape sequences. Regexp literals have their own special escapes, which will be discussed in the Escape sequences section.
>> "cat\tdog".match?(/\t/)
=> true
>> "cat\tdog".match?(/\a/)
=> false
>> greeting = 'hi'
>> /#{greeting} there/
=> /hi there/
>> /#{greeting.upcase} there/
=> /HI there/
>> /#{2**4} apples/
=> /16 apples/
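Interpolation, modifiers and variables can be combined. The following sketch (variable names are hypothetical) builds a case insensitive regexp from a string and then reuses it with match?:

```ruby
# build a regexp from a variable and add the 'i' modifier
word = 'sample'
pattern = /#{word} str/i
p pattern                                     # /sample str/i

# the saved regexp can be reused like any other object
p 'This is a SAMPLE string'.match?(pattern)   # true
p 'This is a sample'.match?(pattern)          # false
```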
sub and gsub methods
For search and replace, use the sub or gsub methods. The sub method will replace only the first occurrence of the match, whereas gsub will replace all the occurrences. The regexp pattern to match against the input string has to be passed as the first argument. The second argument specifies the string which will replace the portions matched by the pattern.
>> greeting = 'Have a nice weekend'
# replace first occurrence of 'e' with 'E'
>> greeting.sub(/e/, 'E')
=> "HavE a nice weekend"
# replace all occurrences of 'e' with 'E'
>> greeting.gsub(/e/, 'E')
=> "HavE a nicE wEEkEnd"
Use the sub! and gsub! methods for in-place substitution.
>> word = 'cater'
# this will return a string object, won't modify 'word' variable
>> word.sub(/cat/, 'wag')
=> "wager"
>> word
=> "cater"
# this will modify 'word' variable itself
>> word.sub!(/cat/, 'wag')
=> "wager"
>> word
=> "wager"
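One detail worth knowing about the in-place versions: sub! and gsub! return nil when no substitution was made, unlike sub and gsub, which always return a string. A short sketch of that behavior (dup is used here only so that the example also works when string literals are frozen):

```ruby
word = 'cater'.dup

# no match, so nothing is replaced and nil is returned
result = word.sub!(/dog/, 'wag')
p result   # nil
p word     # "cater"

# a successful substitution returns the modified string
p word.gsub!(/e/, 'E')   # "catEr"
p word                   # "catEr"
```

So, be careful when using the return value of sub! or gsub! directly, for example in a method chain.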
Regexp operators
Ruby also provides operators for regexp matching.
- =~ match operator returns index of the first match and nil if match is not found
- !~ match operator returns true if string doesn't contain the given regexp and false otherwise
- === match operator returns true or false similar to the match? method
>> sentence = 'This is a sample string'
# can also use: /is/ =~ sentence
>> sentence =~ /is/
=> 2
>> sentence =~ /z/
=> nil
# can also use: /z/ !~ sentence
>> sentence !~ /z/
=> true
>> sentence !~ /is/
=> false
Just like the match? method, both =~ and !~ can be used in a conditional statement.
>> sentence = 'This is a sample string'
>> puts 'hi' if sentence =~ /is/
hi
>> puts 'oh' if sentence !~ /z/
oh
The === operator comes in handy with Enumerable methods like grep, grep_v, all?, any?, etc.
>> sentence = 'This is a sample string'
# regexp literal has to be on LHS and input string on RHS
>> /is/ === sentence
=> true
>> /z/ === sentence
=> false
>> words = %w[cat attempt tattle]
>> words.grep(/tt/)
=> ["attempt", "tattle"]
>> words.all?(/at/)
=> true
>> words.none?(/temp/)
=> false
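The grep_v and any? methods mentioned earlier follow the same idea. Here's a brief sketch using the same array, written as a plain script:

```ruby
words = %w[cat attempt tattle]

# grep_v keeps the elements that do NOT match the regexp
p words.grep_v(/tt/)   # ["cat"]

# any? checks whether at least one element matches
p words.any?(/pt/)     # true
p words.any?(/z/)      # false
```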
A key difference from the match? method is that these operators will also set regexp related global variables.
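As a quick preview of that difference, the sketch below inspects $~, the global variable that holds information about the most recent match. This is only an illustration and it assumes no earlier match has been performed in the same scope; global variables are covered properly in later chapters.

```ruby
sentence = 'This is a sample string'

# match? leaves the regexp global variables untouched
sentence.match?(/sample/)
after_match_p = $~
p after_match_p        # nil

# the =~ operator sets them
sentence =~ /sample/
after_operator = $~
p after_operator       # #<MatchData "sample">
p after_operator[0]    # "sample"
```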
Cheatsheet and Summary
Note | Description |
---|---|
ruby-doc: Regexp | Ruby Regexp documentation |
Onigmo doc | Onigmo library documentation |
/pat/ or %r{pat} | regexp literal; interpolation and escape sequences can also be used |
var = /pat/ | save regexp literal in a variable |
/pat1#{expr}pat2/ | use result of an expression to build regexp |
s.match?(/pat/) | check if string s matches the pattern /pat/; returns true or false |
s.match?(/pat/, 3) | optional 2nd argument changes starting index of search |
/pat/i | modifier i matches alphabets case insensitively |
s.sub(/pat/, 'replace') | search and replace first matching occurrence; use gsub to replace all occurrences; use sub! and gsub! for in-place substitution |
s =~ /pat/ or /pat/ =~ s | returns index of first match or nil |
s !~ /pat/ or /pat/ !~ s | returns true if no match or false |
/pat/ === s | returns true or false similar to match?; these operators will also set regexp global variables |
This chapter introduced the Regexp class and discussed the match?, sub and gsub methods. You also learnt how to save and reuse regexp literals, how to specify modifiers and how to use regexp operators.
You might wonder why there are so many ways to test a matching condition with regexps. The most common approach is to use the match? method in a conditional statement. If you need the position of the match, use the =~ operator or the index method. The === operator is usually relevant in Enumerable methods. Usage of global variables will be covered in later chapters. The =~ and !~ operators are also prevalent in command line usage (see my Ruby one liners tutorial for examples).
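The index method hasn't been shown in this chapter yet, so here's a small sketch comparing it with the =~ operator. Like match?, the index method accepts an optional second argument to specify the starting position.

```ruby
sentence = 'This is a sample string'

# both return the position of the first match
p sentence =~ /is/          # 2
p sentence.index(/is/)      # 2

# start searching from index 3 instead of the beginning
p sentence.index(/is/, 3)   # 5

# nil is returned when there is no match
p sentence.index(/z/)       # nil
```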
The next section has exercises to test your understanding of the concepts introduced in this chapter. Please do solve them before moving on to the next chapter.
Exercises
Refer to exercises folder for input files required to solve the exercises.
All the exercises are also collated together in one place at Exercises.md. For solutions, see Exercise_solutions.md.
a) Check whether the given strings contain 0xB0. Display a boolean result as shown below.
>> line1 = 'start address: 0xA0, func1 address: 0xC0'
>> line2 = 'end address: 0xFF, func2 address: 0xB0'
>> line1.match?() ##### add your solution here
=> false
>> line2.match?() ##### add your solution here
=> true
b) For the given input file, print all lines containing the string two.
# note that expected output shown here is wrapped to fit pdf width
>> filename = 'programming_quotes.txt'
>> word = ##### add your solution here
>> puts File.foreach(filename).grep(word)
"Some people, when confronted with a problem, think - I know, I'll use regular
expressions. Now they have two problems" by Jamie Zawinski
"So much complexity in software comes from trying to make one thing do two
things" by Ryan Singer
c) Replace all occurrences of 5 with five for the given string.
>> ip = 'They ate 5 apples and 5 oranges'
>> ip.gsub(//, 'five') ##### add your solution here
=> "They ate five apples and five oranges"
d) Replace the first occurrence of 5 with five for the given string.
>> ip = 'They ate 5 apples and 5 oranges'
>> ip.sub(//, 'five') ##### add your solution here
=> "They ate five apples and 5 oranges"
e) For the given array, filter all elements that do not contain e.
>> items = %w[goal new user sit eat dinner]
>> items.grep_v(//) ##### add your solution here
=> ["goal", "sit"]
f) Replace all occurrences of note irrespective of case with X.
>> ip = 'This note should not be NoTeD'
>> ip.gsub(//, 'X') ##### add your solution here
=> "This X should not be XD"
g) For the given input string, print all lines NOT containing the string 2.
>> purchases = %q{items qty
'> apple 24
'> mango 50
'> guava 42
'> onion 31
'> water 10}
>> num = // ##### add your solution here
>> puts purchases.each_line.grep_v(num)
items qty
mango 50
onion 31
water 10
h) For the given array, filter all elements that contain either a or w.
>> items = %w[goal new user sit eat dinner]
>> items.filter { } ##### add your solution here
=> ["goal", "new", "eat"]
i) For the given array, filter all elements that contain both e and n.
>> items = %w[goal new user sit eat dinner]
>> items.filter { } ##### add your solution here
=> ["new", "dinner"]
j) For the given string, replace 0xA0 with 0x7F and 0xC0 with 0x1F.
>> ip = 'start address: 0xA0, func1 address: 0xC0'
##### add your solution here
=> "start address: 0x7F, func1 address: 0x1F"
k) Find the starting index of the first occurrence of is for the given input string.
>> ip = 'match this after the history lesson'
##### add your solution here
=> 8