Interlude: Common tasks

Tasks like matching phone numbers, ip addresses, dates, etc are so common that you can often find them collected as a library. This chapter shows some examples for CommonRegex. re module documentation also has a section on tasks like docs.python: tokenizer. See also Awesome Regex: Collections.

CommonRegex

You can either install commonregex as a module or go through commonregex.py and choose the regular expression you need. There are several ways to use the patterns, see CommonRegex: Usage for details.

>>> from commonregex import ip

>>> data = 'hello 255.21.255.22 okay'
>>> ip.findall(data)
['255.21.255.22']

If you check ip.pattern, you'll find (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) repeated without any other restriction. So, this will not prevent partial matches. You'll have to add those conditions yourself.

>>> data = '23.14.2.4.2 255.21.255.22 567.12.2.1'

# wrong matches
>>> ip.findall(data)
['23.14.2.4', '255.21.255.22', '67.12.2.1']

# corrected usage
>>> [e for e in data.split() if ip.fullmatch(e)]
['255.21.255.22']

Summary

Some patterns are quite complex and not easy to build and validate from scratch. Libraries like CommonRegex are helpful to reduce your time and effort needed for commonly known tasks. However, you do need to test the solution for your use case. See also stackoverflow: validating email addresses.