Regular expressions are a versatile tool for text processing. You'll find them included in the standard library of most programming languages that are used for scripting purposes. If not, you can usually find a third-party library. The syntax and features of regular expressions vary from language to language. Python's syntax is similar to that of the Perl language, but there are significant feature differences.
The str class comes loaded with a variety of methods to deal with text. So, what's so special about regular expressions and why would you need them? For learning and understanding purposes, one can view regular expressions as a mini programming language in itself, specialized for text processing. Parts of a regular expression can be saved for future use, analogous to variables and functions. There are ways to perform AND, OR, NOT conditionals. There are also operations similar to the range function, the string repetition operator and so on.
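As a rough preview of that analogy, here is a minimal sketch using Python's built-in re module. The sample strings and variable names are made up for illustration, and the syntax used here is explained in later chapters.

```python
import re

# OR: alternation matches either alternative
has_pet = re.search(r'cat|dog', 'hot dog stand') is not None

# AND: both words must be present, in any order (uses lookaheads)
both = re.compile(r'(?=.*cat)(?=.*dog)')
both_present = both.search('dog chases cat') is not None
one_missing = both.search('dog chases ball') is not None

# NOT: runs of characters that are not digits
non_digits = re.findall(r'[^0-9]+', 'ab12cd')

# Repetition, similar in spirit to the string expression 'ab' * 3
repeated = re.fullmatch(r'(?:ab){3}', 'ababab') is not None

# Saving a part for reuse: \1 refers back to whatever group 1 matched
doubled = re.search(r'([a-z])\1', 'balloon')[0]

print(has_pet, both_present, one_missing)  # True True False
print(non_digits, repeated, doubled)       # ['ab', 'cd'] True ll
```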
Here are some common use cases.
- Sanitizing a string to ensure that it satisfies a known set of rules. For example, to check if a given string matches password rules.
- Filtering or extracting portions of text at an abstract level, such as letters, numbers, punctuation and so on.
- Qualified string replacement. For example, replacing only at the start or the end of a string, only whole words, or only based on surrounding text.
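The three use cases above could look something like the following sketch. The naming rule, the helper `is_valid_name` and the sample strings are hypothetical, chosen only to keep the example short.

```python
import re

# Rule check (hypothetical rule: 3 to 10 lowercase letters or digits)
def is_valid_name(name):
    return re.fullmatch(r'[a-z0-9]{3,10}', name) is not None

# Extraction by character type
numbers = re.findall(r'[0-9]+', 'order 66, aisle 7')   # ['66', '7']
punctuation = re.findall(r'[^\w\s]', 'Hi, there!')     # [',', '!']

# Qualified replacement
# only whole words: 'concat' is left untouched
whole_words = re.sub(r'\bcat\b', 'dog', 'cat concat cat.')  # 'dog concat dog.'
# only at the start of the string
at_start = re.sub(r'^cat', 'dog', 'cat cat')                # 'dog cat'

print(is_valid_name('user42'), is_valid_name('Ab'))  # True False
```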
You are likely to be familiar with graphical search and replace tools, like the one from LibreOffice Writer in the screenshot shown below. Match case, Whole words only, Replace and Replace All are some of the basic features supported by regular expressions.
Another real world use case is password validation. The screenshot below is from the GitHub sign up page. Performing multiple checks like string length and the type of characters allowed is another core feature of regular expressions.
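To give a feel for combining such checks in one pattern, here is a minimal sketch. The rule (at least 8 characters, with at least one lowercase letter and one digit) is an assumption made for illustration and is not taken from any actual sign up page. The function name `looks_valid` is likewise hypothetical.

```python
import re

# Assumed rule: 8 or more characters, at least one lowercase letter and one digit
# Each (?=...) lookahead checks one condition without consuming characters
rule = re.compile(r'(?=.*[a-z])(?=.*[0-9]).{8,}')

def looks_valid(password):
    # fullmatch requires the entire string to satisfy the pattern
    return rule.fullmatch(password) is not None

print(looks_valid('hunter2hunter'))  # True
print(looks_valid('password'))       # False, no digit
print(looks_valid('12345678'))       # False, no lowercase letter
print(looks_valid('a1'))             # False, too short
```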
Here are some articles on regular expressions that cover their history and the types of problems they are suited for.
- The true power of regular expressions — it also includes a nice explanation of what regular means in this context
- softwareengineering: Is it a must for every programmer to learn regular expressions?
- softwareengineering: When you should NOT use Regular Expressions?
- codinghorror: Now You Have Two Problems
- wikipedia: Regular expression — this article includes discussion on regular expressions as a formal language as well as details on various implementations
The book introduces concepts one by one, and the exercises at the end of each chapter require only the features introduced up to that chapter. Each concept is accompanied by multiple examples to cover various angles of usage and corner cases. As mentioned before, follow along with the illustrations by typing out the code snippets manually. It is important to understand both the nature of the sample input string as well as the actual programming command used. There are two interlude chapters that give an overview of useful external resources, and some more resources are collated in the final chapter.
- re introduction
- Alternation and Grouping
- Escaping metacharacters
- Dot metacharacter and Quantifiers
- Interlude: Tools for debugging and visualization
- Working with matched portions
- Character class
- Groupings and backreferences
- Interlude: Common tasks
- regex module
- Further Reading
By the end of the book, you should be comfortable with both writing and reading regular expressions, know how to debug them and know when to avoid them.