Regular Expressions
This chapter will discuss regular expressions (regexp) and related features in detail. As discussed in earlier chapters:
- `/searchpattern` search the given pattern in the forward direction
- `?searchpattern` search the given pattern in the backward direction
- `:range s/searchpattern/replacestring/flags` search and replace
    - `:s` is short for the `:substitute` command
    - the delimiter after the `replacestring` portion is optional if you are not using flags
Documentation links:
- :h usr_27.txt for search commands and patterns
- :h pattern-searches for the reference manual on patterns and search commands
- :h :substitute for the reference manual on the `:substitute` command
Recall that you need to add the `/` prefix for built-in help on regular expressions, :h /^ for example.
Flags
- `g` replace all occurrences within a matching line
    - by default, only the first matching portion will be replaced
- `c` ask for confirmation before each replacement
- `i` ignore case for `searchpattern`
- `I` don't ignore case for `searchpattern`
These flags are applicable for the substitute command but not the / or ? searches. Flags can also be combined, for example:
- `s/cat/Dog/gi` replace every occurrence of `cat` with `Dog`
    - Case is ignored, so `Cat`, `cAt`, `CAT`, etc will also be replaced
    - Note that `i` doesn't affect the case of the replacement string
See :h s_flags for a complete list of flags and more details about them.
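To see the flags side by side, here is a minimal sketch that runs the substitute command on a scratch buffer. The sample text is made up for illustration, and the expected output assumes the `ignorecase` and `gdefault` options are at their default (off) values.

```vim
" open a scratch buffer so that no existing text is modified
new
call setline(1, ['cat cat Cat', 'cat cat Cat', 'cat cat Cat'])
" no flags: only the first match on line 1 is replaced
1s/cat/Dog/
" g flag: every case sensitive match on line 2 is replaced
2s/cat/Dog/g
" g and i flags combined: case is ignored on line 3
3s/cat/Dog/gi
echo getline(1, 3)
" ['Dog cat Cat', 'Dog Dog Cat', 'Dog Dog Dog']
bwipeout!
```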
Anchors
By default, regexp will match anywhere in the text. You can use line and word anchors to specify additional restrictions regarding the position of matches. These restrictions are made possible by assigning special meaning to certain characters and escape sequences. The characters with special meaning are known as metacharacters in regular expressions parlance. In case you need to match those characters literally, you need to escape them with a \ character (discussed in the Escaping metacharacters section later in this chapter).
- `^` restricts the match to the start-of-line
    - `^This` matches `This is a sample` but not `Do This`
- `$` restricts the match to the end-of-line
    - `)$` matches `apple (5)` but not `def greeting():`
- `^$` match empty lines
- `\<pattern` restricts the match to the start of a word
    - word characters include alphabets, digits and underscore
    - `\<his` matches `his` or `to-his` or `history` but not `this` or `_hist`
- `pattern\>` restricts the match to the end of a word
    - `his\>` matches `his` or `to-his` or `this` but not `history` or `_hist`
- `\<pattern\>` restricts the match between the start of a word and end of a word
    - `\<his\>` matches `his` or `to-his` but not `this` or `history` or `_hist`
End-of-line can be `\r` (carriage return), `\n` (newline) or `\r\n` depending on your operating system and the `fileformat` setting.
See :h pattern-atoms for more details.
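The anchor examples above can be checked from Vim script with the `=~#` operator, which does a case sensitive regexp match and gives `1` on success and `0` otherwise. This is only a sketch, assuming the default `iskeyword` setting (so `-` is not a word character).

```vim
" line anchors
echo 'This is a sample' =~# '^This'
" 1
echo 'Do This' =~# '^This'
" 0
echo 'apple (5)' =~# ')$'
" 1

" word anchors
echo 'to-his' =~# '\<his\>'
" 1
echo 'history' =~# '\<his\>'
" 0
echo 'this' =~# 'his\>'
" 1
echo '_hist' =~# '\<his'
" 0
```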
Dot metacharacter
- `.` match any single character other than end-of-line
    - `c.t` matches `cat` or `cot` or `c2t` or `c^t` or `c.t` or `c;t` but not `cant` or `act` or `sit`
- `\_.` match any single character, including end-of-line
As seen above, matching the end-of-line character requires special attention, which is why examples and descriptions in this chapter assume you are operating line wise unless otherwise mentioned. You'll later see how `\_` is used in many more places to include end-of-line in the matches.
Greedy Quantifiers
Quantifiers can be applied to literal characters, the dot metacharacter, groups, backreferences and character classes. Basic examples are shown below, more will be discussed in the sections to follow.
- `*` match zero or more times
    - `abc*` matches `ab` or `abc` or `abccc` or `abcccccc` but not `bc`
    - `Error.*valid` matches `Error: invalid input` but not `valid Error`
    - `s/a.*b/X/` replaces `table bottle bus` with `tXus`
- `\+` match one or more times
    - `abc\+` matches `abc` or `abccc` but not `ab` or `bc`
- `\?` match zero or one times
    - `\=` can also be used, helpful if you are searching backwards with the `?` command
    - `abc\?` matches `ab` or `abc`. This will match `abccc` or `abcccccc` as well, but only the `abc` portion
    - `s/abc\?/X/` replaces `abcc` with `Xc`
- `\{m,n}` match `m` to `n` times (inclusive)
    - `ab\{1,4}c` matches `abc` or `abbc` or `xabbbcz` but not `ac` or `abbbbbc`
    - if you are familiar with BRE, you can also use `\{m,n\}` (ending brace is escaped)
- `\{m,}` match at least `m` times
    - `ab\{3,}c` matches `xabbbcz` or `abbbbbc` but not `ac` or `abc` or `abbc`
- `\{,n}` match up to `n` times (including `0` times)
    - `ab\{,2}c` matches `abc` or `ac` or `abbc` but not `xabbbcz` or `abbbbbc`
- `\{n}` match exactly `n` times
    - `ab\{3}c` matches `xabbbcz` but not `abbc` or `abbbbbc`
Greedy quantifiers will consume as much as possible, provided the overall pattern is also matched. That's how the `Error.*valid` example worked. If `.*` had consumed everything after `Error`, there wouldn't be any more characters left to match `valid`. How the regexp engine handles matching a varying number of characters depends on the implementation details (backtracking, NFA, etc).
See :h pattern-overview for more details.
If you are familiar with other regular expression flavors like Perl, Python, etc, you'd be surprised by the use of `\` in the above examples. If you use the `\v` very magic modifier (discussed later in this chapter), the `\` won't be needed.
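The greedy quantifier examples can be tried without touching a buffer by using the `substitute()` function, which accepts the same pattern syntax as the `s` command. The following is a small sketch; the last two input strings are made up for illustration.

```vim
" the fourth argument is empty, so only the first match is replaced
echo substitute('table bottle bus', 'a.*b', 'X', '')
" tXus
echo substitute('abcc', 'abc\?', 'X', '')
" Xc

" the 'g' flag replaces all the matches
echo substitute('ac abc abbbc abbbbbc', 'ab\{1,4}c', 'X', 'g')
" ac X X abbbbbc
echo substitute('ac abc abbbc', 'ab\{,2}c', 'X', 'g')
" X X abbbc
```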
Non-greedy Quantifiers
Non-greedy quantifiers match as minimally as possible, provided the overall pattern is also matched.
- `\{-}` match zero or more times as minimally as possible
    - `s/t.\{-}a/X/g` replaces `that is quite a fabricated tale` with `XX fabricaXle`
        - the matching portions are `tha`, `t is quite a` and `ted ta`
    - `s/t.*a/X/g` replaces `that is quite a fabricated tale` with `Xle` since `*` is greedy
- `\{-m,n}` match `m` to `n` times as minimally as possible
    - `m` or `n` can be left out as seen in the previous section
    - `s/.\{-2,5}/X/` replaces `123456789` with `X3456789` (here `.` matched 2 times)
    - `s/.\{-2,5}6/X/` replaces `123456789` with `X789` (here `.` matched 5 times)
See :h pattern-overview and stackoverflow: non-greedy matching for more details.
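Here is a quick sketch contrasting greedy and non-greedy matching on the same input, using the `substitute()` function as a stand-in for the `s` command.

```vim
" non-greedy: each match stops at the nearest 'a'
echo substitute('that is quite a fabricated tale', 't.\{-}a', 'X', 'g')
" XX fabricaXle

" greedy: a single match extends to the last 'a'
echo substitute('that is quite a fabricated tale', 't.*a', 'X', 'g')
" Xle

" the lower bound is preferred unless more characters are needed
echo substitute('123456789', '.\{-2,5}', 'X', '')
" X3456789
echo substitute('123456789', '.\{-2,5}6', 'X', '')
" X789
```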
Character Classes
To create a custom placeholder for a limited set of characters, you can enclose them inside the [] metacharacters. Character classes have their own versions of metacharacters and provide special predefined sets for common use cases.
- `[aeiou]` match any lowercase vowel character
- `[^aeiou]` match any character other than lowercase vowels
- `[a-d]` match any of `a` or `b` or `c` or `d`
    - the range metacharacter `-` can be applied between any two characters
- `\a` match any alphabet character `[a-zA-Z]`
- `\A` match other than alphabets `[^a-zA-Z]`
- `\l` match lowercase alphabets `[a-z]`
- `\L` match other than lowercase alphabets `[^a-z]`
- `\u` match uppercase alphabets `[A-Z]`
- `\U` match other than uppercase alphabets `[^A-Z]`
- `\d` match any digit character `[0-9]`
- `\D` match other than digits `[^0-9]`
- `\o` match any octal character `[0-7]`
- `\O` match other than octals `[^0-7]`
- `\x` match any hexadecimal character `[0-9a-fA-F]`
- `\X` match other than hexadecimals `[^0-9a-fA-F]`
- `\h` match alphabets and underscore `[a-zA-Z_]`
- `\H` match other than alphabets and underscore `[^a-zA-Z_]`
- `\w` match any word character (alphabets, digits, underscore) `[a-zA-Z0-9_]`
    - this definition is same as seen earlier with word boundaries
- `\W` match other than word characters `[^a-zA-Z0-9_]`
- `\s` match space and tab characters `[ \t]`
- `\S` match other than space and tab characters `[^ \t]`
Here are some examples with character classes:
- `c[ou]t` matches `cot` or `cut`
- `\<[ot][on]\>` matches `oo` or `on` or `to` or `tn` as whole words only
- `^[on]\{2,}$` matches `no` or `non` or `noon` or `on` etc as whole lines only
- `s/"[^"]\+"/X/g` replaces `"mango" and "(guava)"` with `X and X`
- `s/\d\+/-/g` replaces `Sample123string777numbers` with `Sample-string-numbers`
- `s/\<0*[1-9]\d\{2,}\>/X/g` replaces `0501 035 26 98234` with `X 035 26 X` (numbers >=100 with optional leading zeros)
- `s/\W\+/ /g` replaces `load2;err_msg--\ant` with `load2 err_msg ant`
To include the end-of-line character, use `\_` instead of `\` for any of the above escape sequences. For example, `\_s` will help you match across lines. Similarly, use `\_[]` for bracketed classes.
The above escape sequences do not have special meaning within bracketed classes. For example, `[\d\s]` will only match `\` or `d` or `s`. You can use named character sets in such scenarios. For example, `[[:digit:][:blank:]]` to match digits or space or tab characters. See :h :alnum: for full list and more details.
The predefined sets are also better in terms of performance compared to bracketed versions. And there are more such sets than the ones discussed above. See :h character-classes for more details.
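The substitution examples from this section can be reproduced with the `substitute()` function, as sketched below. Single quoted strings are used so that the backslashes reach the regexp engine unchanged.

```vim
" negated bracketed class: everything between a pair of double quotes
echo substitute('"mango" and "(guava)"', '"[^"]\+"', 'X', 'g')
" X and X

" predefined set for digits
echo substitute('Sample123string777numbers', '\d\+', '-', 'g')
" Sample-string-numbers

" numbers >= 100 with optional leading zeros
echo substitute('0501 035 26 98234', '\<0*[1-9]\d\{2,}\>', 'X', 'g')
" X 035 26 X

" squeeze runs of non-word characters into a single space
echo substitute('load2;err_msg--\ant', '\W\+', ' ', 'g')
" load2 err_msg ant
```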
Alternation and Grouping
Alternation helps you to match multiple terms and they can have their own anchors as well (since each alternative is a regexp pattern). Often, there are some common things among the regular expression alternatives. In such cases, you can group them using a pair of parentheses metacharacters. Similar to a(b+c)d = abd+acd in maths, you get a(b|c)d = abd|acd in regular expressions.
- `\|` match either of the specified patterns
    - `min\|max` matches `min` or `max`
    - `one\|two\|three` matches `one` or `two` or `three`
    - `\<par\>\|er$` matches the whole word `par` or a line ending with `er`
- `\(pattern\)` group a pattern to apply quantifiers, create a terser regexp by taking out common elements, etc
    - `a\(123\|456\)b` is equivalent to `a123b\|a456b`
    - `hand\(y\|ful\)` matches `handy` or `handful`
    - `hand\(y\|ful\)\?` matches `hand` or `handy` or `handful`
    - `\(to\)\+` matches `to` or `toto` or `tototo` and so on
    - `re\(leas\|ceiv\)\?ed` matches `reed` or `released` or `received`
There can be tricky situations when using alternation. Say, you want to match are or spared — which one should get precedence? The bigger word spared or the substring are inside it or based on something else? The alternative which matches earliest in the input gets precedence, irrespective of the order of the alternatives.
- `s/are\|spared/X/g` replaces `rare spared area` with `rX X Xa`
    - `s/spared\|are/X/g` will also give the same result

In case of matches starting from the same location, for example `spa` and `spared`, the leftmost alternative gets precedence. Sort by longest term first if you don't want shorter terms to take precedence.

- `s/spa\|spared/**/g` replaces `spared spare` with `**red **re`
- `s/spared\|spa/**/g` replaces `spared spare` with `** **re`
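The precedence rules can be observed with the `substitute()` function. This is a minimal sketch; the `hand handy handful` input is made up for illustration.

```vim
" the alternative that matches earliest in the input wins,
" irrespective of the order of the alternatives
echo substitute('rare spared area', 'are\|spared', 'X', 'g')
" rX X Xa
echo substitute('rare spared area', 'spared\|are', 'X', 'g')
" rX X Xa

" same starting location: the leftmost alternative wins
echo substitute('spared spare', 'spa\|spared', '**', 'g')
" **red **re
echo substitute('spared spare', 'spared\|spa', '**', 'g')
" ** **re

" grouping with an optional suffix
echo substitute('hand handy handful', 'hand\(y\|ful\)\?', 'X', 'g')
" X X X
```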
Backreference
The groupings seen in the previous section are also known as capture groups. The string captured by these groups can be referred to later using a backreference `\N` where `N` is the capture group you want. Backreferences can be used in both search and replacement sections.
- `\(pattern\)` capture group for later use via backreferences
- `\%(pattern\)` non-capturing group
- leftmost group is `1`, second leftmost group is `2` and so on (maximum `9` groups)
- `\1` backreference to the first capture group
- `\2` backreference to the second capture group
- `\9` backreference to the ninth capture group
- `&` or `\0` backreference to the entire matched portion
Here are some examples:
- `\(\a\)\1` matches two consecutive repeated alphabets like `ee`, `TT`, `pp` and so on
    - recall that `\a` refers to `[a-zA-Z]`
- `\(\a\)\1\+` matches two or more consecutive repeated alphabets like `ee`, `ttttt`, `PPPPPPPP` and so on
- `s/\d\+/(&)/g` replaces `52 apples 31 mangoes` with `(52) apples (31) mangoes` (surround digits with parentheses)
- `s/\(\w\+\),\(\w\+\)/\2,\1/g` replaces `good,bad 42,24` with `bad,good 24,42` (swap words separated by comma)
- `s/\(_\)\?_/\1/g` replaces `_fig __123__ _bat_` with `fig _123_ bat` (reduce `__` to `_` and delete if it is a single `_`)
- `s/\(\d\+\)\%(abc\)\+\(\d\+\)/\2:\1/` replaces `12abcabcabc24` with `24:12` (match digits separated by one or more `abc` sequences, swap the numbers with `:` as the separator)
    - note the use of non-capturing group for `abc` since it isn't needed later
    - `s/\(\d\+\)\(abc\)\+\(\d\+\)/\3:\1/` does the same if only capturing groups are used
Referring to the text matched by a capture group with a quantifier will give only the last match, not the entire match. Use a capture group around the grouping and quantifier together to get the entire matching portion. In such cases, the inner grouping is an ideal candidate to use non-capturing group.
- `s/a \(\d\{3}\)\+/b (\1)/` replaces `a 123456789` with `b (789)`
    - `a 4839235` will be replaced with `b (923)5`
- `s/a \(\%(\d\{3}\)\+\)/b (\1)/` replaces `a 123456789` with `b (123456789)`
    - `a 4839235` will be replaced with `b (483923)5`
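Backreferences work the same way in the `substitute()` function, so the examples above can be sketched as follows.

```vim
" & refers to the entire matched portion
echo substitute('52 apples 31 mangoes', '\d\+', '(&)', 'g')
" (52) apples (31) mangoes

" swap the two captured words
echo substitute('good,bad 42,24', '\(\w\+\),\(\w\+\)', '\2,\1', 'g')
" bad,good 24,42

" the non-capturing group doesn't get a number
echo substitute('12abcabcabc24', '\(\d\+\)\%(abc\)\+\(\d\+\)', '\2:\1', '')
" 24:12

" a quantified capture group gives only the last repetition
echo substitute('a 123456789', 'a \(\d\{3}\)\+', 'b (\1)', '')
" b (789)

" wrap the group and the quantifier to capture everything
echo substitute('a 123456789', 'a \(\%(\d\{3}\)\+\)', 'b (\1)', '')
" b (123456789)
```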
Lookarounds
Lookarounds help to create custom anchors and add conditions within the searchpattern. These assertions are also known as zero-width patterns because they add restrictions similar to anchors and are not part of the matched portions.
Vim's syntax is different from that usually found in programming languages like Perl, Python and JavaScript. The syntax starting with `\@` is always added as a suffix to the pattern atom used in the assertion. For example, `(?!\d)` and `(?<=pat.*)` in other languages are specified as `\d\@!` and `\(pat.*\)\@<=` respectively in Vim.
- `\@!` negative lookahead assertion
    - `ice\d\@!` matches `ice` as long as it is not immediately followed by a digit character, for example `ice` or `iced!` or `icet5` or `ice.123` but not `ice42` or `ice123`
    - `s/ice\d\@!/X/g` replaces `iceiceice2` with `XXice2`
    - `s/par\(.*\<par\>\)\@!/X/g` replaces `par` with `X` as long as whole word `par` is not present later in the line, for example `parse and par and sparse` is converted to `parse and X and sXse`
    - `at\(\(go\)\@!.\)*par` matches `cat,dog,parrot` but not `cat,god,parrot` (i.e. match `at` followed by `par` as long as `go` isn't present in between, this is an example of negating a grouping)
- `\@<!` negative lookbehind assertion
    - `_\@<!ice` matches `ice` as long as it is not immediately preceded by a `_` character, for example `ice` or `_(ice)` or `42ice` but not `_ice`
    - `\(cat.*\)\@<!dog` matches `dog` as long as `cat` is not present earlier in the line, for example `fox,parrot,dog,cat` but not `fox,cat,dog,parrot`
- `\@=` positive lookahead assertion
    - `ice\d\@=` matches `ice` as long as it is immediately followed by a digit character, for example `ice42` or `ice123` but not `ice` or `iced!` or `icet5` or `ice.123`
    - `s/ice\d\@=/X/g` replaces `ice ice_2 ice2 iced` with `ice ice_2 X2 iced`
- `\@<=` positive lookbehind assertion
    - `_\@<=ice` matches `ice` as long as it is immediately preceded by a `_` character, for example `_ice` or `(_ice)` but not `ice` or `_(ice)` or `42ice`
You can also specify the number of bytes to search for lookbehind patterns. This can significantly speed up the matching process. You have to specify the number between the `@` and `<` characters. For example, `_\@1<=ice` will look back only one byte before `ice` for matching purposes. `\(cat.*\)\@10<!dog` will look back only ten bytes before `dog` to check the given assertion.
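The following sketch runs a few of the lookaround examples through the `substitute()` function. The `ice _ice 42ice` input is made up for illustration.

```vim
" negative lookahead
echo substitute('iceiceice2', 'ice\d\@!', 'X', 'g')
" XXice2
echo substitute('parse and par and sparse', 'par\(.*\<par\>\)\@!', 'X', 'g')
" parse and X and sXse

" positive lookahead
echo substitute('ice ice_2 ice2 iced', 'ice\d\@=', 'X', 'g')
" ice ice_2 X2 iced

" negative lookbehind
echo substitute('ice _ice 42ice', '_\@<!ice', 'X', 'g')
" X _ice 42X

" positive lookbehind, limited to looking back one byte
echo substitute('ice _ice 42ice', '_\@1<=ice', 'X', 'g')
" ice _X 42ice
```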
Atomic Grouping
As discussed earlier, both greedy and non-greedy quantifiers will try to satisfy the overall pattern by varying the amount of characters matched by the quantifiers. You can use atomic grouping to safeguard a pattern from further backtracking. Similar to lookarounds, you need to use \@> as a suffix, for example \(pattern\)\@>.
- `s/\(0*\)\@>\d\{3,\}/(&)/g` replaces only numbers >= 100 irrespective of any number of leading zeros, for example `0501 035 154` is converted to `(0501) 035 (154)`
    - `\(0*\)\@>` matches the `0` character zero or more times, but it will not give up this portion to satisfy the overall pattern
    - `s/0*\d\{3,\}/(&)/g` replaces `0501 035 154` with `(0501) (035) (154)` (here `035` is matched because `0*` will match zero times to satisfy the overall pattern)
- `s/\(::.\{-}::\)\@>par//` replaces `fig::1::spar::2::par::3` with `fig::1::spar::3`
    - `\(::.\{-}::\)\@>` will match only from `::` to the very next `::`
    - `s/::.\{-}::par//` replaces `fig::1::spar::2::par::3` with `fig::3` (matches from the first `::` to the first occurrence of `::par`)
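Here is a small sketch comparing each atomic grouping example with its plain counterpart, using the `substitute()` function in place of the `s` command.

```vim
" atomic group: the leading zeros are not given back
echo substitute('0501 035 154', '\(0*\)\@>\d\{3,}', '(&)', 'g')
" (0501) 035 (154)

" plain quantifier: 0* backtracks to zero matches for 035
echo substitute('0501 035 154', '0*\d\{3,}', '(&)', 'g')
" (0501) (035) (154)

" atomic group: the non-greedy portion cannot expand past the next ::
echo substitute('fig::1::spar::2::par::3', '\(::.\{-}::\)\@>par', '', '')
" fig::1::spar::3

" plain non-greedy: expands until ::par is found
echo substitute('fig::1::spar::2::par::3', '::.\{-}::par', '', '')
" fig::3
```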
Set start and end of the match
Some of the positive lookbehind and lookahead usage can be replaced with \zs and \ze respectively.
- `\zs` set the start of the match (portion before `\zs` won't be part of the match)
    - `s/\<\w\zs\w*\W*//g` replaces `sea eat car rat eel tea` with `secret`
    - same as `s/\(\<\w\)\@<=\w*\W*//g` or `s/\(\<\w\)\w*\W*/\1/g`
- `\ze` set the end of the match (portion after `\ze` won't be part of the match)
    - `s/ice\ze\d/X/g` replaces `ice ice_2 ice2 iced` with `ice ice_2 X2 iced`
    - same as `s/ice\d\@=/X/g` or `s/ice\(\d\)/X\1/g`
As per :h \zs and :h \ze, these "Can be used multiple times, the last one encountered in a matching branch is used."
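As a quick check, the sketch below applies `\zs` and `\ze` via the `substitute()` function and compares them with the capture group alternatives mentioned above.

```vim
" keep only the first character of each word
echo substitute('sea eat car rat eel tea', '\<\w\zs\w*\W*', '', 'g')
" secret

" same result with a capture group and a backreference
echo substitute('sea eat car rat eel tea', '\(\<\w\)\w*\W*', '\1', 'g')
" secret

" the digit has to be present, but it isn't part of the match
echo substitute('ice ice_2 ice2 iced', 'ice\ze\d', 'X', 'g')
" ice ice_2 X2 iced
echo substitute('ice ice_2 ice2 iced', 'ice\(\d\)', 'X\1', 'g')
" ice ice_2 X2 iced
```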
Magic modifiers
These escape sequences change certain aspects of the syntax and behavior of the search pattern that comes after such a modifier. You can use multiple such modifiers as needed for particular sections of the pattern.
Magic and nomagic
- `\m` magic mode (this is the default setting)
- `\M` nomagic mode
    - `.`, `*` and `~` are no longer metacharacters (compared to magic mode)
    - `\.`, `\*` and `\~` will make them behave as metacharacters
    - `^` and `$` would still behave as metacharacters
    - `\Ma.b` matches only `a.b`
    - `\Ma\.b` matches `a.b` as well as `a=b` or `a<b` or `acb` etc
Very magic
The default syntax of Vim regexp has only a few metacharacters like ., *, ^ and $. If you are familiar with regexp usage in programming languages such as Perl, Python and JavaScript, you can use \v to get a similar syntax in Vim. This will allow the use of more metacharacters such as (), {}, +, ? and so on without having to prefix them with a \ metacharacter. From :h magic documentation:
Use of `\v` means that after it, all ASCII characters except `0-9`, `a-z`, `A-Z` and `_` have special meaning

- `\v<his>` matches `his` or `to-his` but not `this` or `history` or `_hist`
- `a<b.*\v<end>` matches `c=a<b #end` but not `c=a<b #bending`
    - note that `\v` is used after `a<b` to avoid having to escape the first `<`
- `\vone|two|three` matches `one` or `two` or `three`
- `\vabc+` matches `abc` or `abccc` but not `ab` or `bc`
- `s/\vabc?/X/` replaces `abcc` with `Xc`
- `s/\vt.{-}a/X/g` replaces `that is quite a fabricated tale` with `XX fabricaXle`
- `\vab{3}c` matches `xabbbcz` but not `abbc` or `abbbbbc`
- `s/\v(\w+),(\w+)/\2,\1/g` replaces `good,bad 42,24` with `bad,good 24,42`
    - compare this to the default mode: `s/\(\w\+\),\(\w\+\)/\2,\1/g`
Very nomagic
From :h magic documentation:
Use of `\V` means that after it, only a backslash and terminating character (usually `/` or `?`) have special meaning

- `\V^.*{}$` matches `^.*{}$` literally
- `\V^.*{}$\.\*abcd` matches `^.*{}$` literally only if `abcd` is found later in the line
    - `\V^.*{}$\m.*abcd` can also be used
- `\V\^This` matches `This is a sample` but not `Do This`
- `\V)\$` matches `apple (5)` but not `def greeting():`
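The effect of the magic modifiers can be sketched with the `=~#` operator (case sensitive regexp match, `1` on success) and the `substitute()` function. Both treat the pattern as if the `magic` option is set, so the modifiers are the only thing changing the syntax here.

```vim
" very magic: no backslashes needed for groups and quantifiers
echo substitute('good,bad 42,24', '\v(\w+),(\w+)', '\2,\1', 'g')
" bad,good 24,42
echo substitute('that is quite a fabricated tale', '\vt.{-}a', 'X', 'g')
" XX fabricaXle

" switching to very magic in the middle of a pattern
echo 'c=a<b #end' =~# 'a<b.*\v<end>'
" 1
echo 'c=a<b #bending' =~# 'a<b.*\v<end>'
" 0

" nomagic: the dot is literal unless escaped
echo 'a=b' =~# '\Ma.b'
" 0
echo 'a=b' =~# '\Ma\.b'
" 1

" very nomagic: the end-of-line anchor needs a backslash
echo 'apple (5)' =~# '\V)\$'
" 1
```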
Case sensitivity
These will override flags and settings, if any. Unlike the magic modifiers, you cannot apply \c or \C for a specific portion of the pattern.
- `\c` case insensitive search
    - `\cthis` matches `this` or `This` or `THIs` and so on
    - `th\cis` or `this\c` and so on will also result in the same behavior
- `\C` case sensitive search
    - `\Cthis` match exactly `this` but not `This` or `THIs` and so on
    - `th\Cis` or `this\C` and so on will also result in the same behavior
- `s/\Ccat/dog/gi` replaces `cat Cat CAT` with `dog Cat CAT` since the `i` flag gets overridden
Changing Case
These can be used in the replacement section:
- `\u` Uppercases the next character
- `\U` UPPERCASES the following characters
- `\l` lowercases the next character
- `\L` lowercases the following characters
- `\e` or `\E` will end further case changes
- `\L` or `\U` will also override any existing conversion
Examples:
- `s/\<\l/\u&/g` replaces `hello. how are you?` with `Hello. How Are You?`
    - recall that `\l` in the search section is equivalent to `[a-z]`
- `s/\<\L/\l&/g` replaces `HELLO. HOW ARE YOU?` with `hELLO. hOW aRE yOU?`
    - recall that `\L` in the search section is equivalent to `[^a-z]`, which includes the uppercase alphabets matched here
- `s/\v(\l)_(\l)/\1\u\2/g` replaces `aug_price next_line` with `augPrice nextLine`
- `s/.*/\L&/` replaces `HaVE a nICe dAy` with `have a nice day`
- `s/\a\+/\u\L&/g` replaces `HeLLo:bYe gOoD:beTTEr` with `Hello:Bye Good:Better`
    - `s/\a\+/\L\u&/g` can also be used in this case
- `s/\v(\a+)(:\a+)/\L\1\U\2/g` replaces `Hi:bYe gOoD:baD` with `hi:BYE good:BAD`
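The case conversion sequences are also recognized in the replacement argument of the `substitute()` function. Here is a minimal sketch, assuming the `ignorecase` option is off (the default).

```vim
" uppercase the first character of each word
echo substitute('hello. how are you?', '\<\l', '\u&', 'g')
" Hello. How Are You?

" snake_case to camelCase
echo substitute('aug_price next_line', '\v(\l)_(\l)', '\1\u\2', 'g')
" augPrice nextLine

" combining \u and \L
echo substitute('HeLLo:bYe gOoD:beTTEr', '\a\+', '\u\L&', 'g')
" Hello:Bye Good:Better

" \U overrides the earlier \L conversion
echo substitute('Hi:bYe gOoD:baD', '\v(\a+)(:\a+)', '\L\1\U\2', 'g')
" hi:BYE good:BAD
```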
Alternate delimiters
From :h substitute documentation:
Instead of the `/` which surrounds the pattern and replacement string, you can use any other single-byte character, but not an alphanumeric character, `\`, `"` or `|`. This is useful if you want to include a `/` in the search pattern or replacement string.
- `s#/home/learnbyexample/#\~/#` replaces `/home/learnbyexample/reports` with `~/reports`
    - compare this with `s/\/home\/learnbyexample\//\~\//`
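To try the alternate delimiter on actual buffer text, here is a short sketch that uses a scratch buffer so that no existing content is modified.

```vim
" open a scratch buffer with a single line of sample text
new
call setline(1, '/home/learnbyexample/reports')
" # is used as the delimiter, so / doesn't need escaping
s#/home/learnbyexample/#\~/#
echo getline(1)
" ~/reports
bwipeout!
```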
Escape sequences
Certain characters like tab, carriage return, newline, etc have escape sequences to represent them. Additionally, any character can be represented using their codepoint value in decimal, octal and hexadecimal formats. Unlike character set escape sequences like \w, these can be used inside character classes as well. If the escape sequences behave differently in searchpattern and replacestring portions, they'll be highlighted in the descriptions below.
- `\t` tab character
- `\b` backspace character
- `\r` matches carriage return for `searchpattern`, produces newline for `replacestring`
- `\n` matches end-of-line for `searchpattern`, produces ASCII NUL for `replacestring`
    - `\n` can also match `\r` or `\r\n` (where `\r` is carriage return) depending upon the `fileformat` setting
- `\%d` matches character specified by decimal digits
    - `\%d39` matches the single quote character
- `\%o` matches character specified by octal digits
    - `\%o47` matches the single quote character
- `\%x` matches character specified by hexadecimal digits (max 2 digits)
    - `\%x27` matches the single quote character
- `\%u` matches character specified by hexadecimal digits (max 4 digits)
- `\%U` matches character specified by hexadecimal digits (max 8 digits)
Using `\%` sequences to insert characters in `replacestring` hasn't been implemented yet. See vi.stackexchange: Replace with hex character for workarounds.
See ASCII code table for a handy cheatsheet with all the ASCII characters and conversion tables. See codepoints for Unicode characters.
Escaping metacharacters
To match the metacharacters literally (including character class metacharacters like -), i.e. to remove their special meaning, prefix those characters with a \ (backslash) character. To indicate a literal \ character, use \\. Depending on the pattern, you can also use a different magic modifier to reduce the need for escaping. Assume default magicness for the below examples unless otherwise specified.
- `^` and `$` do not require escaping if they are used out of position
    - `b^2` matches `a^2 + b^2 - C*3`
    - `$4` matches `this ebook is priced $40`
    - `\^super` matches `^superscript` (you need the `\` here since `^` is at the customary position)
- `[` and `]` do not require escaping if only one of them is used
    - `b[1` matches `ab[123`
    - `42]` matches `xyz42] =`
    - `b\[123]` or `b[123\]` matches `ab[123] = d`
- `[` in the substitute command requires careful consideration
    - `s/b[1/X/` replaces `b[1/X/` with nothing
    - `s/b\[1/X/` replaces `ab[123` with `aX23`
- `\Va*b.c` or `a\*b\.c` matches `a*b.c`
- `&` in the replacement section requires escaping to represent it literally
    - `s/and/\&/` replaces `apple and mango` with `apple & mango`
The following can be used to match character class metacharacters literally in addition to escaping them with a \ character:
- `-` can be specified at the start or end of the list, for example `[-0-5]` and `[a-z-]`
- `^` should be other than the first character, for example `[+a^.]`
- `]` should be the first character, for example `[]a-z]` and `[^]a]`
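A few of the escaping rules can be verified with the `substitute()` function, as sketched below. The last two input strings are made up for illustration.

```vim
" ^ is literal when it isn't at the start of the pattern
echo substitute('a^2 + b^2 - C*3', 'b^2', 'X', '')
" a^2 + X - C*3

" escape [ to match it literally
echo substitute('ab[123', 'b\[1', 'X', '')
" aX23

" \& in the replacement section gives a literal &
echo substitute('apple and mango', 'and', '\&', '')
" apple & mango

" very nomagic avoids escaping * and .
echo substitute('a*b.c axb-c', '\Va*b.c', 'X', 'g')
" X axb-c

" ] first, ^ not first and - last are all literal inside []
echo substitute('a-b^c]d', '[]^-]', ':', 'g')
" a:b:c:d
```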
Replacement expressions
- `\=` when `replacestring` starts with `\=`, it is treated as an expression
- `s/date:\zs/\=strftime("%Y-%m-%d")/` appends the current date
    - for example, changes `date:` to `date:2024-06-25`
- `s/\d\+/\=submatch(0)*2/g` multiplies matching numbers by 2
    - for example, changes `4 and 10` to `8 and 20`
    - `submatch()` function is similar to backreferences, `0` gives the entire matched string, `1` refers to the first capture group and so on
- `s/\(.*\)\zs/\=" = " . eval(submatch(1))/` appends result of an expression
    - for example, changes `10 * 2 - 3` to `10 * 2 - 3 = 17`
    - `.` is the string concatenation operator
    - `eval()` here executes the contents of the first capture group as an expression
- `s/"[^"]\+"/\=substitute(submatch(0), '[aeiou]', '\u&', 'g')/g` affects vowels only inside double quotes
    - for example, changes `"mango" and "guava"` to `"mAngO" and "gUAvA"`
    - `substitute()` function works similarly to the `s` command
    - first argument is the text to work on
    - second argument is similar to `searchpattern`
    - third argument is similar to `replacestring`
    - fourth argument is flags, use an empty string if not required
    - see :h substitute() for more details and differences compared to the `s` command
- `perldo s/\d+/$&*2/ge` changes `4 and 10` to `8 and 20`
    - useful if you are familiar with Perl regular expressions and the `perl` interface is available with your Vim installation
    - note that the default range is `1,$` (the `s` command works only on the current line by default)
    - see :h perldo for restrictions and more details
See :h usr_41.txt for details about Vim script.
See :h sub-replace-expression for more details.
See also stackoverflow: find all occurrences and replace with user input.
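The `substitute()` function also accepts a replacement that starts with `\=`, so two of the expression examples from this section can be sketched without a buffer.

```vim
" submatch(0) is the entire match, multiplied by 2 here
echo substitute('4 and 10', '\d\+', '\=submatch(0)*2', 'g')
" 8 and 20

" evaluate the captured text and append the result
echo substitute('10 * 2 - 3', '\(.*\)\zs', '\=" = " . eval(submatch(1))', '')
" 10 * 2 - 3 = 17
```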
Miscellaneous
- `\%V` match inside the visual area only
    - `s/\%V10/20/g` replaces `10` with `20` only inside the visual area
    - without `\%V`, the replacement would happen anywhere on the lines covered by the visual selection
- `\%[set]` match zero or more of these characters in the same order, as much as possible
    - `spa\%[red]` matches `spa` or `spar` or `spare` or `spared` (longest match wins)
    - same as `\vspa(red|re|r)?` or `\vspa(red?|r)?` and so on
    - `ap\%[[pt]ly]` matches `ap` or `app` or `appl` or `apply` or `apt` or `aptl` or `aptly`
- `\_^` and `\_$` restrict the match to start-of-line and end-of-line respectively, useful for multiline patterns
- `\%^` and `\%$` restrict the match to start-of-file and end-of-file respectively
- `~` represents the last replacement string
    - `s/apple/banana/` followed by `/~` will search for `banana`
    - `s/apple/banana/` followed by `s/fig/(~)/` will use `(banana)` as the replacement string
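Here is a brief sketch of the `\%[]` optional sequence using the `substitute()` function. The input strings are made up for illustration.

```vim
" as many of r, e, d (in that order) as possible are consumed
echo substitute('spa spar spare spared sparse', 'spa\%[red]', 'X', 'g')
" X X X X Xse

" a bracketed class can be one of the optional items
echo substitute('ap app apply apt aptly', 'ap\%[[pt]ly]', 'X', 'g')
" X X X X X
```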
Further Reading
- vi.stackexchange: How to find and replace in Vim without having to type the original word? — lots of tips and tricks
- vi.stackexchange: How to replace each match with incrementing counter?
- vi.stackexchange: What is the rationale for \r and \n meaning different things in s command? and stackoverflow: Why is \r a newline for Vim?
- stackoverflow: What does this regex mean?