Similar to a(b+c)d = abd+acd in maths, you get a(b|c)d = abd|acd in regular expressions. However, you'll have to be careful if quantifiers are involved.
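The quantifier-free identity can be checked with a quick sketch. The sample strings and variable names below are made up for illustration, and `re.fullmatch` is used so that only whole-string matches count:

```python
import re

# Hypothetical sample strings chosen for this sketch
words = ['abd', 'acd', 'abcd', 'ad', 'abd!']

grouped = re.compile(r'a(b|c)d')    # factored form
expanded = re.compile(r'abd|acd')   # "multiplied out" form

# Keep only the strings that each pattern matches in full
matches_grouped = [w for w in words if grouped.fullmatch(w)]
matches_expanded = [w for w in words if expanded.fullmatch(w)]

print(matches_grouped)
print(matches_expanded)
print(matches_grouped == matches_expanded)
```

Both forms accept exactly the same strings from this list, since no quantifier is applied to the group. Things change once a quantifier enters the picture.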

For example, (a*|b*) isn't the same as (a|b)*. Can you reason out why? Here's a railroad diagram to help you out:

[Railroad diagram: regexp grouping with quantifiers gotcha]


The difference is that (a*|b*) only matches sequences made up of a single repeated letter, like a, bb, aaaaaa, etc. But (a|b)* can match mixed sequences like ababbba too. Both also match the empty string, since * allows zero repetitions. You can also simplify (a|b)* to [ab]* because every alternative is a single character in this particular example.

Here's an illustration using Python:

>>> import re

>>> test = ['aa', 'abbaba', 'aaabbb', 'bbbbb', 'abc']

>>> [s for s in test if re.fullmatch(r'(a*|b*)', s)]
['aa', 'bbbbb']

>>> [s for s in test if re.fullmatch(r'(a|b)*', s)]
['aa', 'abbaba', 'aaabbb', 'bbbbb']
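The [ab]* simplification can be checked the same way. This is a minimal sketch that reuses the test list from above with an extra empty string added (an addition made here, not part of the original list) to show how each pattern treats zero repetitions:

```python
import re

# Same strings as the earlier example, plus '' (added for this sketch)
test = ['aa', 'abbaba', 'aaabbb', 'bbbbb', 'abc', '']

# Grouped alternation versus the equivalent character class
with_group = [s for s in test if re.fullmatch(r'(a|b)*', s)]
with_class = [s for s in test if re.fullmatch(r'[ab]*', s)]

# Quantifier inside each alternative: same-letter sequences only
same_letter = [s for s in test if re.fullmatch(r'(a*|b*)', s)]

print(with_group == with_class)
print(with_class)
print(same_letter)
```

The first two lists are identical, and only 'abc' is rejected by them. The third list keeps just 'aa', 'bbbbb' and the empty string.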

Want to learn regular expressions from the basics with plenty of examples and exercises? I've written regexp ebooks for Python, JavaScript, Ruby and CLI tools.