Python Regular Expressions (re Module) – A Practical Guide

Discover how to harness the power of regular expressions in Python with the re module. Step‑by‑step examples illustrate common patterns, matching techniques, and advanced functions.

What Is a Regular Expression?

A regular expression (RegEx) is a sequence of characters that defines a search pattern. For example, the pattern ^a...s$ matches any five‑letter string that starts with a and ends with s.

^a...s$

Patterns can be used to match against strings. The following table demonstrates how the pattern behaves with different inputs:

Expression	String	Matched?
`^a...s$`	`abs`	No match
	`alias`	Match
	`abyss`	Match
	`Alias`	No match
	`An abacus`	No match

Python’s re module provides the tools you need to work with RegEx. Here’s a quick example that uses re.match() to check a pattern against a string:

import re

pattern = '^a...s$'
test_string = 'abyss'
result = re.match(pattern, test_string)

if result:
    print("Search successful.")
else:
    print("Search unsuccessful.")

re.match() checks for the pattern only at the beginning of the string. If the pattern matches there, it returns a match object; otherwise it returns None.
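As a quick check, the rows of the table above can be reproduced with a short loop. This is only a sketch: the names inputs and table_results are illustrative, not part of the re module.

```python
import re

pattern = r'^a...s$'
# Input strings taken from the table above
inputs = ['abs', 'alias', 'abyss', 'Alias', 'An abacus']

# re.match() returns a match object or None, so bool() gives True/False
table_results = {s: bool(re.match(pattern, s)) for s in inputs}

for s, matched in table_results.items():
    print(f"{s!r}: {'Match' if matched else 'No match'}")
```

Only alias and abyss should be reported as matches, because the pattern is case-sensitive and requires exactly five characters.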

Specifying Patterns with Metacharacters

Metacharacters are special symbols that the regex engine interprets in a unique way. Below is a quick reference for the most common metacharacters:

[] . ^ $ * + ? {} () \ |

Square Brackets: `[]`

Square brackets define a set of characters to match. For instance, [abc] matches any single occurrence of a, b, or c.

Expression	String	Matched?
`[abc]`	`a`	1 match
	`ac`	2 matches
	`Hey Jude`	No match
	`abc de ca`	5 matches

Ranges can be expressed with a hyphen, e.g., [a-e] equals [abcde]. To invert a set, place a caret ^ after the opening bracket, e.g., [^abc] matches any character except a, b, or c.
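A minimal sketch of sets, ranges, and negated sets, using re.findall() (covered later in this guide) on one of the strings from the table. The variable names are illustrative.

```python
import re

text = 'abc de ca'

in_set = re.findall(r'[abc]', text)     # any of a, b, c
in_range = re.findall(r'[a-e]', text)   # same as [abcde]
negated = re.findall(r'[^abc]', text)   # anything except a, b, c

print(len(in_set), in_set)      # 5 ['a', 'b', 'c', 'c', 'a']
print(len(in_range), in_range)  # 7 ['a', 'b', 'c', 'd', 'e', 'c', 'a']
print(negated)                  # [' ', 'd', 'e', ' ']
```

Note that the negated set also matches the spaces, since a space is a character other than a, b, or c.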

Period: `.`

The dot matches any single character except a newline.

Expression	String	Matched?
`..`	`a`	No match
	`ac`	1 match
	`acd`	1 match
	`acde`	2 matches (contains 4 characters)
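The table above can be reproduced with re.findall(). The sketch also shows the newline exception and, as an aside not covered elsewhere in this guide, the re.DOTALL flag, which lets the dot match newlines as well.

```python
import re

# Each string from the table, searched with the two-character pattern `..`
dot_results = {s: re.findall(r'..', s) for s in ['a', 'ac', 'acd', 'acde']}
print(dot_results)
# {'a': [], 'ac': ['ac'], 'acd': ['ac'], 'acde': ['ac', 'de']}

# By default the dot skips the newline character
default_dot = re.findall(r'.', 'a\nb')
# With re.DOTALL the newline is matched too
dotall_dot = re.findall(r'.', 'a\nb', flags=re.DOTALL)
print(default_dot, dotall_dot)  # ['a', 'b'] ['a', '\n', 'b']
```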

Caret: `^` and Dollar: `$`

The caret asserts the start of a string; the dollar sign asserts the end. For example, ^a matches any string beginning with a, while a$ matches any string ending with a.

Expression	String	Matched?
`^a`	`a`	1 match
	`abc`	1 match
	`bac`	No match
`^ab`	`abc`	1 match
`^ab`	`acb`	No match (starts with `a` but not followed by `b`)

Expression	String	Matched?
`a$`	`a`	1 match
	`formula`	1 match
	`cab`	No match
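Both anchor tables can be checked with re.search(), which scans the whole string, so any restriction on position comes from the anchors themselves. The dictionary names are illustrative.

```python
import re

# ^a: the string must begin with 'a'
starts_with_a = {s: bool(re.search(r'^a', s)) for s in ['a', 'abc', 'bac']}
# a$: the string must end with 'a'
ends_with_a = {s: bool(re.search(r'a$', s)) for s in ['a', 'formula', 'cab']}

print(starts_with_a)  # {'a': True, 'abc': True, 'bac': False}
print(ends_with_a)    # {'a': True, 'formula': True, 'cab': False}
```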

Quantifiers: `*`, `+`, `?`, `{n,m}`

These symbols control how many times the preceding element should appear.

* – zero or more times.
+ – one or more times.
? – zero or one time.
{n,m} – at least n and at most m times.

Examples:

Expression	String	Matched?
`ma*n`	`mn`	1 match
	`man`	1 match
	`maaan`	1 match
	`main`	No match (`a` is not followed by `n`)
	`woman`	1 match

Expression	String	Matched?
`ma+n`	`mn`	No match (no `a`)
	`man`	1 match
	`maaan`	1 match
	`main`	No match (a is not followed by n)
	`woman`	1 match

Expression	String	Matched?
`ma?n`	`mn`	1 match
	`man`	1 match
	`maaan`	No match (more than one `a`)
	`main`	No match (a is not followed by n)
	`woman`	1 match

Expression	String	Matched?
`a{2,3}`	`abc dat`	No match
	`abc daat`	1 match (at `daat`)
	`aabc daaat`	2 matches (at `aabc` and `daaat`)
	`aabc daaaat`	2 matches (at `aabc` and `daaaat`)

Commonly used pattern: [0-9]{2,4} matches two to four consecutive digits.

Expression	String	Matched?
`[0-9]{2,4}`	`ab123csde`	1 match (`123`)
	`12 and 345673`	3 matches (`12`, `3456`, `73`)
	`1 and 2`	No match
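The quantifier tables above can be verified in a few lines. This sketch assumes re.search() for the yes/no rows and re.findall() for the rows that count matches; the variable names are illustrative.

```python
import re

words = ['mn', 'man', 'maaan', 'main', 'woman']

# For each pattern, collect the words in which it is found
quantifier_hits = {
    pattern: [w for w in words if re.search(pattern, w)]
    for pattern in (r'ma*n', r'ma+n', r'ma?n')
}
for pattern, hits in quantifier_hits.items():
    print(pattern, hits)
# ma*n ['mn', 'man', 'maaan', 'woman']
# ma+n ['man', 'maaan', 'woman']
# ma?n ['mn', 'man', 'woman']

# {n,m} is greedy: four a's yield one run of three, and the leftover a is too short
repeated_a = re.findall(r'a{2,3}', 'aabc daaaat')
digit_runs = re.findall(r'[0-9]{2,4}', '12 and 345673')
print(repeated_a)  # ['aa', 'aaa']
print(digit_runs)  # ['12', '3456', '73']
```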

Alternation: `|`

The vertical bar implements logical OR. For example, a|b matches any string containing a or b.

Expression	String	Matched?
`a|b`	`cde`	No match
	`ade`	1 match (`a`)
	`acdbea`	3 matches (`a`, `b`, `a`)
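A short sketch of alternation with re.findall(), using the strings from the table plus one word-level example that is purely illustrative.

```python
import re

alt_results = {s: re.findall(r'a|b', s) for s in ['cde', 'ade', 'acdbea']}
print(alt_results)
# {'cde': [], 'ade': ['a'], 'acdbea': ['a', 'b', 'a']}

# Alternation works on whole sub-patterns, not just single characters
pets = re.findall(r'cat|dog', 'hotdog and cat')
print(pets)  # ['dog', 'cat']
```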

Grouping: `()`

Parentheses group sub‑patterns and capture matched substrings. Example: (a|b|c)xz matches any string that contains a, b, or c followed by xz.

Expression	String	Matched?
`(a|b|c)xz`	`ab xz`	No match
	`abxz`	1 match (`bxz`)
	`axz cabxz`	2 matches (`axz` and `bxz`)
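One detail worth seeing in code: when a pattern contains a capturing group, re.findall() returns the captured text rather than the whole match. The sketch below uses re.finditer() to recover the full matches; the variable names are illustrative.

```python
import re

pattern = r'(a|b|c)xz'
text = 'axz cabxz'

# group(0) is the entire match, regardless of any groups in the pattern
full_matches = [m.group(0) for m in re.finditer(pattern, text)]
# With one capturing group, findall() returns only what the group captured
captured = re.findall(pattern, text)

print(full_matches)  # ['axz', 'bxz']
print(captured)      # ['a', 'b']
```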

Escaping: `\`

Use a backslash to treat a metacharacter as a literal. For instance, \$a matches the literal sequence $a rather than interpreting $ as the end‑of‑string anchor.
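A small sketch of the difference between an escaped and an unescaped dollar sign. The sample text is made up for illustration; re.escape() is a helper in the re module that adds the backslashes for you.

```python
import re

text = 'price: $a, total: $b'

# Escaped: \$ is a literal dollar sign
literal = re.findall(r'\$a', text)
# Unescaped: $ anchors at the end of the string, so nothing can follow it
unescaped = re.findall(r'$a', text)
# re.escape() builds the escaped pattern from a plain string
escaped_pattern = re.escape('$a')

print(literal)          # ['$a']
print(unescaped)        # []
print(escaped_pattern)  # \$a
```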

Special Sequences

Special sequences simplify common patterns. Below is a reference list with illustrative examples.

Start of String: `\A`

Expression	String	Matched?
`\Athe`	`the sun`	Match
`\Athe`	`In the sun`	No match

Word Boundary: `\b`

Expression	String	Matched?
`\bfoo`	`football`	Match
	`a football`	Match
	`afootball`	No match
`foo\b`	`the foo`	Match
	`the afoo test`	Match
	`the afootest`	No match

Non‑Word Boundary: `\B`

Expression	String	Matched?
`\Bfoo`	`football`	No match
	`a football`	No match
	`afootball`	Match
`foo\B`	`the foo`	No match
	`the afoo test`	No match
	`the afootest`	Match
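The two boundary tables are mirror images of each other, which a short check makes visible. This sketch covers the `\bfoo` and `\Bfoo` rows; the list names are illustrative.

```python
import re

samples = ['football', 'a football', 'afootball']

# \bfoo: 'foo' must begin at a word boundary
at_boundary = [bool(re.search(r'\bfoo', s)) for s in samples]
# \Bfoo: 'foo' must NOT begin at a word boundary
not_at_boundary = [bool(re.search(r'\Bfoo', s)) for s in samples]

print(at_boundary)      # [True, True, False]
print(not_at_boundary)  # [False, False, True]
```

The raw-string prefix matters here: without it, Python would read '\b' as a backspace character before the regex engine ever sees it.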

Digit: `\d` / Non‑Digit: `\D`

Expression	String	Matched?
`\d`	`12abc3`	3 matches (`1`, `2`, `3`)
`\d`	`Python`	No match
`\D`	`1ab34"50`	3 matches (`a`, `b`, `"`)
`\D`	`1345`	No match

Whitespace: `\s` / Non‑Whitespace: `\S`

Expression	String	Matched?
`\s`	`Python RegEx`	1 match
`\s`	`PythonRegEx`	No match
`\S`	`a b`	2 matches (`a` and `b`)
`\S`	`   ` (spaces only)	No match

Word Character: `\w` / Non‑Word: `\W`

Expression	String	Matched?
`\w`	`12&": ;c`	3 matches (`1`, `2`, `c`)
`\w`	`"%> !`	No match
`\W`	`1a2%c`	1 match (`%`)
`\W`	`Python`	No match
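The digit, whitespace, and word-character tables above can be checked together with re.findall(). This is a sketch; the variable names are illustrative.

```python
import re

digits = re.findall(r'\d', '12abc3')
non_digits = re.findall(r'\D', '1ab34"50')
spaces = re.findall(r'\s', 'Python RegEx')
non_spaces = re.findall(r'\S', 'a b')
word_chars = re.findall(r'\w', '12&": ;c')
non_word_chars = re.findall(r'\W', '1a2%c')

print(digits, non_digits)          # ['1', '2', '3'] ['a', 'b', '"']
print(spaces, non_spaces)          # [' '] ['a', 'b']
print(word_chars, non_word_chars)  # ['1', '2', 'c'] ['%']
```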

End of String: `\Z`

Expression	String	Matched?
`Python\Z`	`I like Python`	1 match
	`I like Python Programming`	No match
	`Python is fun.`	No match
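A brief check of the table, plus one difference from `$` that is easy to miss: `$` also matches just before a trailing newline, while `\Z` matches only at the very end of the string. The variable names are illustrative.

```python
import re

samples = ['I like Python', 'I like Python Programming', 'Python is fun.']
end_results = {s: bool(re.search(r'Python\Z', s)) for s in samples}
print(end_results)
# {'I like Python': True, 'I like Python Programming': False, 'Python is fun.': False}

# A string that ends with a newline
with_newline = 'I like Python\n'
dollar_hit = bool(re.search(r'Python$', with_newline))
z_hit = bool(re.search(r'Python\Z', with_newline))
print(dollar_hit, z_hit)  # True False
```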

Tip: Use an online regex tester like regex101 to craft and debug patterns quickly.

Using RegEx in Python

Python’s re module offers a rich set of functions and constants for regex operations. Import it with:

import re

Below are the most common utilities and how to use them.

re.findall()

Returns a list of all non‑overlapping matches in a string.

# Extract all numbers from a string
import re

string = 'hello 12 hi 89. Howdy 34'
pattern = r'\d+'  # raw string, so \d reaches the regex engine unchanged

result = re.findall(pattern, string)
print(result)
# Output: ['12', '89', '34']

When no match is found, an empty list is returned.

re.split()

Splits a string at each point where the pattern matches, returning a list of substrings.

import re

string = 'Twelve:12 Eighty nine:89.'
pattern = r'\d+'

result = re.split(pattern, string)
print(result)
# Output: ['Twelve:', ' Eighty nine:', '.']

Use the optional maxsplit argument to limit the number of splits. A value of 0 (the default) splits at every match.

import re

string = 'Twelve:12 Eighty nine:89 Nine:9.'
pattern = r'\d+'

# Split only at the first occurrence
result = re.split(pattern, string, maxsplit=1)
print(result)
# Output: ['Twelve:', ' Eighty nine:89 Nine:9.']

re.sub()

Replaces all occurrences of a pattern with a replacement string.

# Remove all whitespace characters
import re

string = 'abc 12\nde 23 \n f45 6'
pattern = r'\s+'
replace = ''

new_string = re.sub(pattern, replace, string)
print(new_string)
# Output: abc12de23f456

To limit replacements, provide a count argument. A value of 0 replaces every match.

new_string = re.sub(r'\s+', replace, string, count=1)
print(new_string)
# Output (only the first space is removed; both newlines remain):
# abc12
# de 23
#  f45 6

re.subn()

Like re.sub(), but returns a tuple containing the new string and the number of substitutions performed.

new_string = re.subn(pattern, replace, string)
print(new_string)
# Output: ('abc12de23f456', 5)

re.search()

Finds the first location where the pattern matches.

import re

string = "Python is fun"
match = re.search(r'\APython', string)

if match:
    print("pattern found inside the string")
else:
    print("pattern not found")
# Output: pattern found inside the string

Match Object

When a match is found, re.search() returns a Match object. Common methods and attributes include:

group() – the matched substring.
start() / end() – indices of the match.
span() – a tuple of (start, end).
groups() – all captured groups.

import re

string = '39801 356, 2102 1111'
# Three digits, a space, then two digits
pattern = r'(\d{3}) (\d{2})'

match = re.search(pattern, string)

if match:
    print(match.group())
    print(match.group(1))
    print(match.group(2))
    print(match.groups())
    print(match.start(), match.end(), match.span())
else:
    print("pattern not found")
# Output:
# 801 35
# 801
# 35
# ('801', '35')
# 2 8 (2, 8)

Using Raw Strings (prefix `r`)

Prefixing a string literal with r treats backslashes as literal characters, preventing accidental escape sequences. This is especially useful in regex patterns.

import re

string = '\n and \r are escape sequences.'
result = re.findall(r'[\n\r]', string)
print(result)
# Output: ['\n', '\r']
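The example above works with or without the r prefix, so the sketch below uses a case where the prefix changes the result. In a normal string literal, '\b' is the backspace character, so the regex engine never receives the word-boundary sequence. The sample text is illustrative.

```python
import re

text = 'a football'

# Raw string: the regex engine receives backslash + b (word boundary)
raw_match = re.search(r'\bfoo', text)
# Normal string: Python turns '\b' into a backspace character first
plain_match = re.search('\bfoo', text)

print(len(r'\bfoo'), len('\bfoo'))  # 5 4
print(raw_match is not None)        # True
print(plain_match is not None)      # False
```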

For deeper exploration of the re module, refer to the official Python documentation.
