Interlude: Common tasks
Tasks like matching phone numbers, IP addresses, dates, etc. are so common that you can often find them collected as a library. This chapter shows some examples for the CommonRegex module. The re module documentation also has a section on tasks like writing a tokenizer, see docs.python: tokenizer. See also Awesome Regex: Collections.
CommonRegex
You can either install commonregex as a module or go through commonregex.py and choose the regular expression you need. There are several ways to use the patterns; see CommonRegex: Usage for details. Here's an example for matching IP addresses:
>>> from commonregex import ip
>>> data = 'hello 255.21.255.22 okay'
>>> ip.findall(data)
['255.21.255.22']
If you check ip.pattern, you'll find (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) repeated without any other restriction. So, this will not prevent partial matches. You'll have to add those conditions yourself.
>>> data = '23.14.2.4.2 255.21.255.22 567.12.2.1'
# wrong matches
>>> ip.findall(data)
['23.14.2.4', '255.21.255.22', '67.12.2.1']
# corrected usage
>>> [e for e in data.split() if ip.fullmatch(e)]
['255.21.255.22']
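Another way to add those conditions is to build them into the pattern with lookarounds, so that findall works directly on free-form text. The following is a minimal sketch using only the re module. It assumes an IP pattern made of the octet expression quoted above, joined by literal dots; strict_ip is a hypothetical name and not part of CommonRegex.

```python
import re

# Octet expression quoted in the text above (matches 0 to 255).
octet = r'(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'

# Hypothetical stricter pattern (not part of CommonRegex): four octets
# that are neither preceded nor followed by a digit or a dot.
strict_ip = re.compile(rf'(?<![\d.])(?:{octet}\.){{3}}{octet}(?![\d.])')

data = '23.14.2.4.2 255.21.255.22 567.12.2.1'
matches = strict_ip.findall(data)
print(matches)
```

For the sample string, this gives the same result as the split and fullmatch approach. One tradeoff of this sketch: an address immediately followed by a sentence-ending period is rejected too, so adjust the lookarounds to suit your input.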
Summary
Some patterns are quite complex and not easy to build and validate from scratch. Libraries like CommonRegex help reduce the time and effort needed for such common tasks. However, you still need to test the solution against your own use cases. See also stackoverflow: validating email addresses.