
Python Regular Expressions (re Module) – A Practical Guide

Discover how to harness the power of regular expressions in Python with the re module. Step‑by‑step examples illustrate common patterns, matching techniques, and advanced functions.

What Is a Regular Expression?

A regular expression (RegEx) is a sequence of characters that defines a search pattern. For example, the pattern ^a...s$ matches any five-character string that starts with a and ends with s.

^a...s$

Patterns can be used to match against strings. The following table demonstrates how the pattern behaves with different inputs:

Expression   String      Matched?
^a...s$      abs         No match
             alias       Match
             abyss       Match
             Alias       No match
             An abacus   No match

Python’s re module provides the tools you need to work with RegEx. Here’s a quick example that uses re.match() to check a pattern against a string:

import re

pattern = '^a...s$'
test_string = 'abyss'
result = re.match(pattern, test_string)

if result:
    print("Search successful.")
else:
    print("Search unsuccessful.")

re.match() tries the pattern only at the beginning of the string. If the pattern matches there, it returns a match object; otherwise it returns None.
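As a rough illustration (the variable names are only for this sketch), the snippet below contrasts re.match() with re.search(), which is covered later, and replays the rows of the table above.

```python
import re

text = 'abyss'

# re.match() only tries the pattern at index 0, so 's' is not found
at_start = re.match('s', text)

# re.search() scans the whole string and returns the first match
anywhere = re.search('s', text)

print(at_start)          # None
print(anywhere.span())   # (3, 4)

# Replay the ^a...s$ table rows
outcomes = {}
for s in ['abs', 'alias', 'abyss', 'Alias', 'An abacus']:
    outcomes[s] = 'Match' if re.match(r'^a...s$', s) else 'No match'
print(outcomes)
```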


Specifying Patterns with Metacharacters

Metacharacters are special symbols that the regex engine interprets in a unique way. Below is a quick reference for the most common metacharacters:

[] . ^ $ * + ? {} () \ |


Square Brackets: []

Square brackets define a set of characters to match. For instance, [abc] matches any single occurrence of a, b, or c.

Expression   String      Matched?
[abc]        a           1 match
             ac          2 matches
             Hey Jude    No match
             abc de ca   5 matches

Ranges can be expressed with a hyphen, e.g., [a-e] equals [abcde]. To invert a set, place a caret ^ after the opening bracket, e.g., [^abc] matches any character except a, b, or c.
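A short sketch of sets, ranges, and negated sets, using re.findall() (explained later in this guide) to list every character each set matches. The sample strings are taken from or modeled on the table above.

```python
import re

# [abc]: any single a, b, or c
in_set = re.findall(r'[abc]', 'abc de ca')
print(in_set)      # ['a', 'b', 'c', 'c', 'a']

# [a-e]: a range, equivalent to [abcde]
in_range = re.findall(r'[a-e]', 'Hey Jude')
print(in_range)    # ['e', 'd', 'e']

# [^abc]: any character except a, b, or c (the space counts too)
negated = re.findall(r'[^abc]', 'abc de')
print(negated)     # [' ', 'd', 'e']
```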


Period: .

The dot matches any single character except a newline.

Expression   String   Matched?
..           a        No match
             ac       1 match
             acd      1 match
             acde     2 matches (contains 4 characters)
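To see that the dot skips newlines, the sketch below compares the default behavior with the re.DOTALL flag. re.DOTALL is a standard re constant that the text above does not mention; it is brought in here only for the comparison.

```python
import re

# '..' consumes two characters per match, so 'acde' gives two matches
pairs = re.findall(r'..', 'acde')
print(pairs)       # ['ac', 'de']

# By default '.' does not match a newline
plain = re.findall(r'a.b', 'a\nb')
print(plain)       # []

# With re.DOTALL, '.' matches a newline as well
dotall = re.findall(r'a.b', 'a\nb', flags=re.DOTALL)
print(dotall)      # ['a\nb']
```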

Caret: ^ and Dollar: $

The caret asserts the start of a string; the dollar sign asserts the end. For example, ^a matches any string beginning with a, while a$ matches any string ending with a.

Expression   String   Matched?
^a           a        1 match
             abc      1 match
             bac      No match
^ab          abc      1 match
             acb      No match (starts with a but not followed by b)

Expression   String    Matched?
a$           a         1 match
             formula   1 match
             cab       No match
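The anchor tables can be checked with re.search(), which scans the whole string, so whether a row matches depends only on ^ and $. A minimal sketch:

```python
import re

# ^a: the string must start with 'a'
starts_with_a = [s for s in ['a', 'abc', 'bac'] if re.search(r'^a', s)]
print(starts_with_a)   # ['a', 'abc']

# a$: the string must end with 'a'
ends_with_a = [s for s in ['a', 'formula', 'cab'] if re.search(r'a$', s)]
print(ends_with_a)     # ['a', 'formula']
```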

Quantifiers: *, +, ?, {n,m}

These symbols control how many times the preceding element may appear: * allows zero or more occurrences, + one or more, ? zero or one, and {n,m} at least n and at most m.

Examples:

Expression   String   Matched?
ma*n         mn       1 match
             man      1 match
             maaan    1 match
             main     No match (a is not followed by n)
             woman    1 match

Expression   String   Matched?
ma+n         mn       No match (no a)
             man      1 match
             maaan    1 match
             main     No match (a is not followed by n)
             woman    1 match

Expression   String   Matched?
ma?n         mn       1 match
             man      1 match
             maaan    No match (more than one a)
             main     No match (a is not followed by n)
             woman    1 match

Expression   String        Matched?
a{2,3}       abc dat       No match
             abc daat      1 match (aa in daat)
             aabc daaat    2 matches (aa in aabc, aaa in daaat)
             aabc daaaat   2 matches (aa in aabc, aaa in daaaat)
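The sketch below replays the quantifier tables with re.search() and re.findall(). The dictionary is only a convenient way to check the rows, not part of the re API.

```python
import re

words = ['mn', 'man', 'maaan', 'main', 'woman']

# Which words contain a match for each quantifier pattern?
matched = {
    pattern: [w for w in words if re.search(pattern, w)]
    for pattern in (r'ma*n', r'ma+n', r'ma?n')
}
for pattern, hits in matched.items():
    print(pattern, hits)

# {2,3} is greedy: it takes three a's when it can, otherwise two
runs = re.findall(r'a{2,3}', 'aabc daaaat')
print(runs)   # ['aa', 'aaa']
```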

Commonly used pattern: [0-9]{2,4} matches two to four consecutive digits.

Expression   String          Matched?
[0-9]{2,4}   ab123csde       1 match (123)
             12 and 345673   3 matches (12, 3456, 73)
             1 and 2         No match
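With re.findall(), the middle row of the table returns the three matched substrings. Because {2,4} is greedy, the six-digit run 345673 is split into 3456 and 73:

```python
import re

numbers = re.findall(r'[0-9]{2,4}', '12 and 345673')
print(numbers)      # ['12', '3456', '73']

# A single digit is too short for {2,4}, so nothing matches here
none_found = re.findall(r'[0-9]{2,4}', '1 and 2')
print(none_found)   # []
```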

Alternation: |

The vertical bar implements logical OR. For example, a|b matches any string containing a or b.

Expression   String   Matched?
a|b          cde      No match
             ade      1 match (a)
             acdbea   3 matches (a, b, a)
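A minimal sketch of alternation with re.findall(). The second pattern is a made-up example showing that each alternative can be longer than one character.

```python
import re

# a|b matches either alternative at each position
found = re.findall(r'a|b', 'acdbea')
print(found)     # ['a', 'b', 'a']

# Alternatives can be whole words
animals = re.findall(r'cat|dog', 'cat, bird, dog')
print(animals)   # ['cat', 'dog']
```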

Grouping: ()

Parentheses group sub‑patterns and capture matched substrings. Example: (a|b|c)xz matches any string that contains a, b, or c followed by xz.

Expression   String      Matched?
(a|b|c)xz    ab xz       No match
             abxz        1 match (bxz)
             axz cabxz   2 matches (axz and bxz)
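One behavior worth knowing when grouping: if a pattern contains a capturing group, re.findall() returns the text captured by the group rather than the whole match. re.finditer(), which yields Match objects, still gives access to the full match. A sketch using the last table row:

```python
import re

text = 'axz cabxz'
pattern = r'(a|b|c)xz'

# findall() returns only what the group captured
captured = re.findall(pattern, text)
print(captured)   # ['a', 'b']

# group(0) on each Match object is the entire match
whole = [m.group(0) for m in re.finditer(pattern, text)]
print(whole)      # ['axz', 'bxz']
```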

Escaping: \

Use a backslash to treat a metacharacter as a literal. For instance, \$a matches the literal sequence $a rather than interpreting $ as the end‑of‑string anchor.
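A minimal sketch of escaping. The second half uses re.escape(), a standard helper that backslash-escapes the special characters in a string so it can be matched literally; the sample strings are made up for illustration.

```python
import re

# \$ is a literal dollar sign here, not the end-of-string anchor
dollars = re.findall(r'\$a', 'cost: $a or $b')
print(dollars)         # ['$a']

# re.escape() escapes metacharacters such as + and ?
literal = re.escape('1+1=2?')
print(literal)         # 1\+1=2\?

found = re.search(literal, 'Is 1+1=2?')
print(found.group())   # 1+1=2?
```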


Special Sequences

Special sequences simplify common patterns. Below is a reference list with illustrative examples.

Start of String: \A

\A matches only at the very start of the string.

Expression   String       Matched?
\Athe        the sun      Match
             In the sun   No match

Word Boundary: \b

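\b matches at a word boundary: the position between a word character (letter, digit, or underscore) and a non-word character, or the start or end of the string. \bfoo requires foo to begin a word; foo\b requires it to end one.

Expression   String          Matched?
\bfoo        football        Match
             a football      Match
             afootball       No match
foo\b        the foo         Match
             the afoo test   Match
             the afootest    No match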

Non‑Word Boundary: \B

\B is the opposite of \b: it matches only at a position that is not a word boundary.

Expression   String          Matched?
\Bfoo        football        No match
             a football      No match
             afootball       Match
foo\B        the foo         No match
             the afoo test   No match
             the afootest    Match
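The \b and \B tables can be reproduced side by side. The patterns are written as raw strings (r'...'), which are explained at the end of this guide. Each dictionary maps a test string to a pair: (matches the \b pattern, matches the \B pattern).

```python
import re

# \bfoo: 'foo' starts a word; \Bfoo: 'foo' does not start a word
prefix = {t: (bool(re.search(r'\bfoo', t)), bool(re.search(r'\Bfoo', t)))
          for t in ['football', 'a football', 'afootball']}

# foo\b: 'foo' ends a word; foo\B: 'foo' does not end a word
suffix = {t: (bool(re.search(r'foo\b', t)), bool(re.search(r'foo\B', t)))
          for t in ['the foo', 'the afoo test', 'the afootest']}

print(prefix)
print(suffix)
```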

Digit: \d / Non‑Digit: \D

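\d matches any decimal digit (equivalent to [0-9] for ASCII input); \D matches any character that is not a digit.

Expression   String     Matched?
\d           12abc3     3 matches (1, 2, 3)
             Python     No match
\D           1ab34"50   3 matches (a, b, ")
             1345       No match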

Whitespace: \s / Non‑Whitespace: \S

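\s matches any whitespace character, such as a space, tab, or newline; \S matches any character that is not whitespace.

Expression   String                Matched?
\s           Python RegEx          1 match
             PythonRegEx           No match
\S           a b                   2 matches (a, b)
             "   " (spaces only)   No match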

Word Character: \w / Non‑Word: \W

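\w matches any word character: a letter, digit, or underscore (equivalent to [a-zA-Z0-9_] for ASCII input); \W matches any non-word character.

Expression   String     Matched?
\w           12&": ;c   3 matches (1, 2, c)
             "%> !      No match
\W           1a2%c      1 match (%)
             Python     No match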
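The character-class tables can be replayed in one loop with re.findall(); the number of items in each list equals the match count in the tables above. A sketch:

```python
import re

samples = [
    (r'\d', '12abc3'),
    (r'\D', '1ab34"50'),
    (r'\s', 'Python RegEx'),
    (r'\S', 'a b'),
    (r'\w', '12&": ;c'),
    (r'\W', '1a2%c'),
]

# Collect every single-character match for each special sequence
found = {pattern: re.findall(pattern, text) for pattern, text in samples}
for pattern, hits in found.items():
    print(pattern, hits)
```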

End of String: \Z

\Z matches only at the very end of the string.

Expression   String                      Matched?
Python\Z     I like Python               1 match
             I like Python Programming   No match
             Python is fun.              No match
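A sketch of \A and \Z. The last lines compare \Z with $: in Python, $ also matches just before a trailing newline, whereas \Z matches only at the absolute end of the string.

```python
import re

print(bool(re.search(r'\Athe', 'the sun')))                       # True
print(bool(re.search(r'\Athe', 'In the sun')))                    # False
print(bool(re.search(r'Python\Z', 'I like Python')))              # True
print(bool(re.search(r'Python\Z', 'I like Python Programming')))  # False

# $ tolerates a trailing newline; \Z does not
dollar = bool(re.search(r'Python$', 'I like Python\n'))
end_z = bool(re.search(r'Python\Z', 'I like Python\n'))
print(dollar, end_z)   # True False
```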

Tip: Use an online regex tester like regex101 to craft and debug patterns quickly.


Using RegEx in Python

Python’s re module offers a rich set of functions and constants for regex operations. Import it with:

import re

Below are the most common utilities and how to use them.


re.findall()

Returns a list of all non‑overlapping matches in a string.

# Extract all numbers from a string
import re

string = 'hello 12 hi 89. Howdy 34'
pattern = r'\d+'

result = re.findall(pattern, string)
print(result)
# Output: ['12', '89', '34']

When no match is found, an empty list is returned.
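When the pattern contains more than one capturing group, re.findall() returns a list of tuples, one tuple per match. A small sketch with a made-up key=value string:

```python
import re

text = 'width=12 height=89'

# Two groups per match, so each list item is a (key, value) tuple
pairs = re.findall(r'(\w+)=(\d+)', text)
print(pairs)   # [('width', '12'), ('height', '89')]

# The tuples convert naturally into a dict of integers
sizes = {key: int(value) for key, value in pairs}
print(sizes)   # {'width': 12, 'height': 89}
```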


re.split()

Splits a string at each point where the pattern matches, returning a list of substrings.

import re

string = 'Twelve:12 Eighty nine:89.'
pattern = r'\d+'

result = re.split(pattern, string)
print(result)
# Output: ['Twelve:', ' Eighty nine:', '.']

Use the optional maxsplit argument to limit the number of splits. A value of 0 (the default) splits at every match.

import re

string = 'Twelve:12 Eighty nine:89 Nine:9.'
pattern = r'\d+'

# Split only at the first occurrence
result = re.split(pattern, string, maxsplit=1)
print(result)
# Output: ['Twelve:', ' Eighty nine:89 Nine:9.']
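If the pattern contains capturing parentheses, re.split() also includes the captured separators in the result. A sketch reusing the first split example:

```python
import re

string = 'Twelve:12 Eighty nine:89.'

# The group around \d+ keeps each number in the output list
parts = re.split(r'(\d+)', string)
print(parts)   # ['Twelve:', '12', ' Eighty nine:', '89', '.']
```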

re.sub()

Replaces all occurrences of a pattern with a replacement string.

# Remove all whitespace characters
import re

string = 'abc 12\nde 23 \n f45 6'
pattern = r'\s+'
replace = ''

new_string = re.sub(pattern, replace, string)
print(new_string)
# Output: abc12de23f456

To limit replacements, provide a count argument. A value of 0 replaces every match.

new_string = re.sub(r'\s+', replace, string, count=1)
print(new_string)
# Output:
# abc12
# de 23
#  f45 6
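The replacement need not be a fixed string. It can refer back to captured groups with \1, \2, and so on, or it can be a function that receives each Match object and returns the replacement text. A sketch with made-up input strings; the helper name double is hypothetical:

```python
import re

# \2 and \1 refer to the second and first captured groups
swapped = re.sub(r'(\w+):(\d+)', r'\2:\1', 'Twelve:12 Nine:9')
print(swapped)   # 12:Twelve 9:Nine

# A function replacement: double every number found
def double(match):
    return str(int(match.group()) * 2)

doubled = re.sub(r'\d+', double, 'hello 12 hi 89')
print(doubled)   # hello 24 hi 178
```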

re.subn()

Like re.sub(), but returns a tuple containing the new string and the number of substitutions performed.

new_string = re.subn(pattern, replace, string)
print(new_string)
# Output: ('abc12de23f456', 5)

re.search()

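Scans the string and returns a Match object for the first location where the pattern matches, or None if the pattern does not occur anywhere. Unlike re.match(), the match may start at any position.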

import re

string = "Python is fun"
match = re.search(r'\APython', string)

if match:
    print("pattern found inside the string")
else:
    print("pattern not found")
# Output: pattern found inside the string
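re.search() returns None when nothing matches, and it accepts an optional flags argument. The sketch below uses re.IGNORECASE, one of the standard re flags not otherwise covered in this guide:

```python
import re

string = "Python is fun"

# \A pins the pattern to the start, so 'fun' is not found there
missing = re.search(r'\Afun', string)
print(missing)         # None

# re.IGNORECASE lets a lowercase pattern match 'Python'
found = re.search(r'\Apython', string, flags=re.IGNORECASE)
print(found.group())   # Python
```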

Match Object

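When a match is found, re.search() returns a Match object. The example below uses its most common methods: group() returns the whole match (or a single group when given a group number), groups() returns all captured groups as a tuple, and start(), end(), and span() report where the match sits in the string.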

import re

string = '39801 356, 2102 1111'
pattern = r'(\d{3}) (\d{2})'

match = re.search(pattern, string)

if match:
    print(match.group())
    print(match.group(1))
    print(match.group(2))
    print(match.groups())
    print(match.start(), match.end(), match.span())
else:
    print("pattern not found")
# Output:
# 801 35
# 801
# 35
# ('801', '35')
# 2 8 (2, 8)
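re.search() stops at the first match. To get a Match object for every match, re.finditer() can be used; applied to the same string, it also finds a second match that re.search() never reached:

```python
import re

string = '39801 356, 2102 1111'
pattern = r'(\d{3}) (\d{2})'

# finditer() yields one Match object per non-overlapping match
spans = [(m.group(), m.span()) for m in re.finditer(pattern, string)]
for text, span in spans:
    print(text, span)
# 801 35 (2, 8)
# 102 11 (12, 18)
```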

Using Raw Strings (prefix r)

Prefixing a string literal with r treats backslashes as literal characters, preventing accidental escape sequences. This is especially useful in regex patterns.

import re

string = '\n and \r are escape sequences.'
result = re.findall(r'[\n\r]', string)
print(result)
# Output: ['\n', '\r']
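Why the r prefix matters: Python processes escape sequences in ordinary string literals before the regex engine sees them. '\b' is a backspace character in a normal string but a word boundary in a raw-string pattern, so the two calls below behave differently. The sample sentence is made up.

```python
import re

# In an ordinary literal, '\b' is one backspace character (\x08)
print(len('\b'), len(r'\b'))   # 1 2

# Raw string: \b reaches the regex engine as a word boundary
raw_hits = re.findall(r'\bcat\b', 'a cat sat')
print(raw_hits)     # ['cat']

# Ordinary string: the pattern looks for backspace characters instead
plain_hits = re.findall('\bcat\b', 'a cat sat')
print(plain_hits)   # []
```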

For deeper exploration of the re module, refer to the official Python documentation.
