Python tip 26: atomic grouping
Up to and including Python 3.10, you had to use alternatives like the third-party regex module for possessive quantifiers and atomic grouping. The re module supports these features from Python 3.11 onwards.
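If your code has to run on several Python versions, you could probe for the feature before relying on it. This is only a minimal sketch: the helper name supports_atomic_groups is made up for illustration, and it assumes that older versions reject the (?>...) syntax with re.error at compile time.

```python
import re
import sys

def supports_atomic_groups():
    """Return True if the re module can compile an atomic group."""
    try:
        re.compile(r'(?>a)')
    except re.error:
        # assumption: versions before 3.11 raise re.error for '(?>'
        return False
    return True

print(sys.version_info[:2], supports_atomic_groups())
```

When the check fails, falling back to the third-party regex module is one option.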
Greedy and non-greedy quantifiers will backtrack to help the overall pattern succeed. An atomic group prevents this: once the group has matched, the regex engine will not go back into it to try a different match. The syntax for an atomic group is (?>pat), where pat is the pattern you want to safeguard from further backtracking. You can think of it as a special group that is isolated from the other parts of the regular expression.
Here's an example with a greedy quantifier:
>>> import re
>>> numbers = '42 314 001 12 00984'
# 0* is greedy and the (?>) grouping prevents backtracking
# same as: re.findall(r'0*+\d{3,}', numbers)
>>> re.findall(r'(?>0*)\d{3,}', numbers)
['314', '00984']
Here's an example with a non-greedy quantifier:
>>> ip = 'fig::mango::pineapple::guava::apples::orange'
# this matches from the first '::' to the first occurrence of '::apple'
>>> re.search(r'::.*?::apple', ip)[0]
'::mango::pineapple::guava::apple'
# '(?>::.*?::)' will match only from '::' to the very next '::'
# '::mango::' fails because 'apple' isn't found afterwards
# similarly '::pineapple::' fails
# '::guava::' succeeds because it is followed by 'apple'
>>> re.search(r'(?>::.*?::)apple', ip)[0]
'::guava::apple'
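The attempts described in the comments can be traced with a short sketch. It uses a capture group inside a lookahead (not part of the original example) to list every '::...::' candidate along with whether 'apple' comes right after it. This part works on older Python versions too, since no atomic group is involved.

```python
import re

ip = 'fig::mango::pineapple::guava::apples::orange'

# the lookahead captures each '::...::' candidate, including overlapping ones
trace = []
for m in re.finditer(r'(?=(::.*?::))', ip):
    segment = m[1]
    # check whether 'apple' starts right where the candidate ends
    followed_by_apple = ip.startswith('apple', m.end(1))
    trace.append((segment, followed_by_apple))

for segment, ok in trace:
    print(segment, ok)
# ::mango:: False
# ::pineapple:: False
# ::guava:: True
# ::apples:: False
```

The first candidate marked True is '::guava::', which is where the atomic group version finds its match.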
See also my 100 Page Python Intro and Understanding Python re(gex)? ebooks.