Python tip 32: positive lookarounds
Lookarounds let you create custom anchors and add conditions within a regex definition. These assertions are also known as zero-width patterns because, like anchors, they add restrictions without becoming part of the matched portions. Negative lookarounds were discussed in this post. The syntax for positive lookarounds is shown below:
(?=pat): positive lookahead assertion
(?<=pat): positive lookbehind assertion
Here are some examples:
>>> import re
>>> s = '42 apple-5, fig3; x-83, y-20: f12'
# extract digits only if they are followed by ,
# note that end of string doesn't qualify as this is a positive assertion
>>> re.findall(r'\d+(?=,)', s)
['5', '83']
# extract digits only if they are preceded by - and followed by ; or :
>>> re.findall(r'(?<=-)\d+(?=[:;])', s)
['20']
# replace 'par' as long as 'part' occurs as a whole word later in the line
>>> re.sub(r'par(?=.*\bpart\b)', r'[\g<0>]', 'par spare part party')
'[par] s[par]e part party'
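To make the zero-width behavior concrete, here is a small sketch that compares a lookahead with a pattern that consumes the same character. The variable names are chosen only for illustration; the input string is the one used above.

```python
import re

s = '42 apple-5, fig3; x-83, y-20: f12'

# consuming the comma makes it part of the matched portion
with_comma = re.findall(r'\d+,', s)
print(with_comma)    # ['5,', '83,']

# the lookahead only checks for the comma, so it is left out of the match
lookahead = re.findall(r'\d+(?=,)', s)
print(lookahead)     # ['5', '83']

# a capture group gives the same result here with findall,
# but the comma is still consumed by the overall match
captured = re.findall(r'(\d+),', s)
print(captured)      # ['5', '83']

# the characters tested by the lookarounds sit just outside the match
m = re.search(r'(?<=-)\d+(?=[:;])', s)
before, matched, after = s[m.start() - 1], m.group(), s[m.end()]
print(before, matched, after)    # - 20 :
```

Since the asserted characters are not consumed, they remain available to the rest of the pattern or to the next match.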
With lookbehind assertions (both positive and negative), the pattern used for the assertion cannot match a variable length of text. A fixed-length quantifier is allowed. Alternations of different lengths are not allowed, even if each individual alternative is of fixed length.
>>> s = 'pore42 tar3 dare7 care5'
# not allowed
>>> re.findall(r'(?<=tar|dare)\d+', s)
re.error: look-behind requires fixed-width pattern
# workaround for r'(?<!tar|dare)\d+'
>>> re.findall(r'(?<!tar)(?<!dare)\d+', s)
['42', '5']
# workaround for r'(?<=tar|dare)\d+'
>>> re.findall(r'(?:(?<=tar)|(?<=dare))\d+', s)
['3', '7']
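As a complement to the cases above, the following sketch shows lookbehind patterns that the `re` module does accept (same-length alternatives and a fixed-length quantifier) along with one it rejects. The `lookbehind_any` function is a hypothetical helper written for this illustration, not part of `re`; it simply generates the alternation workaround shown above from a list of literal words.

```python
import re

s = 'pore42 tar3 dare7 care5'

# alternatives of the same length are accepted
same_len = re.findall(r'(?<=tar|are)\d+', s)
print(same_len)       # ['3', '7', '5']

# a fixed-length quantifier is accepted too
fixed_quant = re.findall(r'(?<=[a-z]{4})\d+', s)
print(fixed_quant)    # ['42', '7', '5']

# a variable-length quantifier is rejected when the pattern is compiled
try:
    re.compile(r'(?<=[a-z]+)\d+')
except re.error as e:
    msg = str(e)
    print(msg)

def lookbehind_any(words):
    """Hypothetical helper: build one lookbehind per literal word
    and join them as alternatives, so the lengths may differ."""
    return '(?:' + '|'.join(f'(?<={re.escape(w)})' for w in words) + ')'

pat = lookbehind_any(['tar', 'dare']) + r'\d+'
generated = re.findall(pat, s)
print(generated)      # ['3', '7']
```

Generating the pattern keeps the workaround readable when the list of words is long or comes from data.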
The third-party regex module (https://pypi.org/project/regex/) offers advanced features like variable-length lookbehinds, subexpression calls, etc.
See also my 100 Page Python Intro and Understanding Python re(gex)? ebooks.