# Regexp gotcha 1: grouping common portions

Similar to `a(b+c)d = abd+acd` in maths, you get `a(b|c)d = abd|acd` in regular expressions. However, you'll have to be careful if quantifiers are involved.
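As a quick sanity check of that equivalence, here's a minimal sketch that compares the grouped and expanded patterns on a few strings. The sample list and variable names are made up for illustration:

```python
import re

# grouped form and its "multiplied out" equivalent
grouped = re.compile(r'a(b|c)d')
expanded = re.compile(r'abd|acd')

# hypothetical sample inputs, chosen to include near misses
samples = ['abd', 'acd', 'ad', 'abcd', 'abdacd']

# both patterns should agree on every sample
for s in samples:
    assert bool(grouped.fullmatch(s)) == bool(expanded.fullmatch(s))

print([s for s in samples if grouped.fullmatch(s)])
# ['abd', 'acd']
```

Agreement on a handful of samples isn't a proof, but it's a cheap way to catch a bad refactoring of a pattern.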

For example, `(a*|b*)` isn't the same as `(a|b)*`. Can you reason out why? Here's a railroad diagram to help you out:

Credit: debuggex.com

The difference is that `(a*|b*)` only matches sequences made up of a single repeated letter, like `a`, `bb`, `aaaaaa`, etc. But `(a|b)*` can match mixed sequences like `ababbba` too. You can also simplify `(a|b)*` to `[ab]*` since it is just single-character alternation in this particular example.

Here's an illustration using Python:

```
>>> import re
>>> test = ['aa', 'abbaba', 'aaabbb', 'bbbbb', 'abc']
>>> [s for s in test if re.fullmatch(r'(a*|b*)', s)]
['aa', 'bbbbb']
>>> [s for s in test if re.fullmatch(r'(a|b)*', s)]
['aa', 'abbaba', 'aaabbb', 'bbbbb']
```
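To go beyond a handful of test strings, you could compare the patterns exhaustively over every short string from a small alphabet. This is only a sketch: the `matched` helper, the `abc` alphabet and the length limit of 4 are arbitrary choices for illustration.

```python
import re
from itertools import product

def matched(pattern, max_len=4, alphabet='abc'):
    """Return every string of length 0..max_len that the pattern fully matches."""
    rx = re.compile(pattern)
    return {
        ''.join(chars)
        for n in range(max_len + 1)
        for chars in product(alphabet, repeat=n)
        if rx.fullmatch(''.join(chars))
    }

same_letter = matched(r'(a*|b*)')
mixed = matched(r'(a|b)*')
char_class = matched(r'[ab]*')

# (a|b)* and [ab]* accept exactly the same strings
assert mixed == char_class
# (a*|b*) accepts a strict subset of them
assert same_letter < mixed

# shortest strings that only (a|b)* accepts
print(sorted(mixed - same_letter, key=lambda s: (len(s), s))[:4])
# ['ab', 'ba', 'aab', 'aba']
```

Note that all three patterns match the empty string, so the two sets differ only in the strings that mix both letters.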
