Regular Expression (Regex)
A regular expression (regex) is a search pattern for text. Instead of searching for an exact word, you describe a pattern: “find anything that looks like an email address” or “find all lines that start with a date.” It is like a wildcard search on steroids. Regex is supported by virtually every programming language, most text editors, and most command-line text tools. It is powerful but notoriously hard to read.
A regular expression (regex/regexp) is a sequence of characters that defines a search pattern, used for pattern matching within strings. Regex is implemented in virtually every programming language and text processing tool.
Core syntax:
| Pattern | Matches | Example |
|---|---|---|
| . | Any single character (except newline by default) | a.c matches “abc”, “a1c” |
| * | Zero or more of preceding | ab*c matches “ac”, “abc”, “abbc” |
| + | One or more of preceding | ab+c matches “abc”, “abbc” (not “ac”) |
| ? | Zero or one of preceding | colou?r matches “color”, “colour” |
| ^ | Start of string (start of each line with the m flag) | ^Error matches lines starting with “Error” |
| $ | End of string (end of each line with the m flag) | \.log$ matches strings ending in “.log” |
| [abc] | Character class | [aeiou] matches any vowel |
| [^abc] | Negated class | [^0-9] matches non-digits |
| \d | Digit (0-9) | \d{3} matches “123” |
| \w | Word character (A-Z, a-z, 0-9, _) | \w+ matches “hello_world” |
| \s | Whitespace | \s+ matches spaces, tabs, newlines |
| (...) | Capture group | (\d{4})-(\d{2}) captures year and month |
| {n,m} | Quantity range | \d{1,3} matches 1-3 digits |
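A few of the patterns in the table can be checked directly. The following is a minimal sketch using Python's built-in re module; the sample strings are made up for illustration, and other engines may differ in small details.

```python
import re

candidates = ["ac", "abc", "abbc"]

# Quantifiers: * allows zero b's, + requires at least one.
star_hits = [s for s in candidates if re.fullmatch(r"ab*c", s)]
plus_hits = [s for s in candidates if re.fullmatch(r"ab+c", s)]

# Negated character class: keep only the non-digits.
non_digits = re.findall(r"[^0-9]", "a1b2")

# Capture groups: pull year and month out of a date-like string.
# The input text is a hypothetical example.
match = re.search(r"(\d{4})-(\d{2})", "released 2024-03-15")
year, month = match.groups()

print(star_hits, plus_hits, non_digits, year, month)
```

re.fullmatch is used for the quantifier checks so that the whole string must fit the pattern, which mirrors how the table's examples are phrased.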
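Flags: g (global: find all matches, not just the first), i (case-insensitive), m (multiline: ^ and $ match at line boundaries), s (dotall: . also matches newlines). These letters follow the Perl/JavaScript convention; other engines expose the same behavior under different names.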
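As an illustration of how the flags change matching, here is a short sketch in Python, where the flags are spelled re.IGNORECASE, re.MULTILINE, and re.DOTALL. The log text is invented for the example.

```python
import re

log = "info: started\nError: disk full\nerror: retrying"

# i (case-insensitive): matches both "Error" and "error".
case_insensitive = re.findall(r"error", log, flags=re.IGNORECASE)

# m (multiline): ^ anchors at the start of every line, not only the string.
first_words_default = re.findall(r"^\w+", log)
first_words_multiline = re.findall(r"^\w+", log, flags=re.MULTILINE)

# s (dotall): . also matches newlines, so the match can span lines.
spans_default = re.search(r"started.*full", log)
spans_dotall = re.search(r"started.*full", log, flags=re.DOTALL)

# g (global) has no flag in Python: re.findall and re.finditer already
# return every match, while re.search returns only the first.
```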
Performance warning: certain patterns cause catastrophic backtracking (exponential time complexity). Avoid nested quantifiers like (a+)+. Use atomic groups or possessive quantifiers where supported. Libraries like RE2 (Google) guarantee linear-time matching.
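One way to avoid the problem is to rewrite a nested quantifier as a single one that accepts the same strings. The sketch below, in Python, compares (a+)+ with the flat equivalent a+ on a handful of short inputs; it deliberately does not run the nested pattern on long input, since a backtracking engine can take exponential time on something like thirty a's followed by a b.

```python
import re

# Nested quantifiers: many ways to split a run of a's between the inner
# and outer +, which is what triggers catastrophic backtracking.
nested = re.compile(r"^(a+)+$")

# Flat equivalent: accepts exactly the same strings with one quantifier.
flat = re.compile(r"^a+$")

samples = ["a", "aaaa", "", "aab", "b"]
results = {s: (bool(nested.match(s)), bool(flat.match(s))) for s in samples}

# Both patterns give the same answer on every sample.
agree = all(n == f for n, f in results.values())
print(results, agree)
```

The rewrite only works when the two patterns really describe the same set of strings; where that is not possible, atomic groups, possessive quantifiers, or a linear-time engine are the alternatives mentioned above.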
Regex in practice
# Find IP addresses in a log file
$ grep -oE '\b([0-9]{1,3}\.){3}[0-9]{1,3}\b' /var/log/auth.log
# Extract email addresses from text
$ grep -oP '[\w.+-]+@[\w-]+\.[\w.]+' contacts.txt
# Validate an IPv4 address (strict)
$ echo "192.168.1.1" | grep -P '^((25[0-5]|2[0-4]\d|1?\d\d?)\.){3}(25[0-5]|2[0-4]\d|1?\d\d?)$'
# Replace text in files (sed)
$ sed -i 's|http://|https://|g' config.yaml
# Find failed SSH logins with username
$ grep -P 'Failed password for (\w+) from' /var/log/auth.log
# Count unique source IPs in nginx access log
$ grep -oP '\d+\.\d+\.\d+\.\d+' access.log | sort -u | wc -l
Regex is an essential tool for every IT professional. Log analysis (grep through millions of lines for error patterns), input validation (verify email, IP, phone number formats), text transformation (sed, awk for bulk config changes), and security monitoring (SIEM detection rules) all rely on regex. In web development, regex validates form inputs and parses URLs. In CI/CD, regex patterns trigger pipelines on specific branch names or file paths.
The common complaint is readability: a production regex like ^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$ is valid but impenetrable. Always add comments explaining complex patterns. Tools like regex101.com provide interactive testing and explanation of any pattern.
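Many engines offer a verbose or extended mode that ignores whitespace in the pattern and allows inline comments. As a sketch, here is the IPv4 pattern quoted above written with Python's re.VERBOSE flag; the name IPV4 and the test addresses are illustrative choices.

```python
import re

IPV4 = re.compile(r"""
    ^
    (?:
        (?: 25[0-5]        # 250-255
          | 2[0-4]\d       # 200-249
          | [01]?\d\d?     # 0-199, leading zeros allowed
        )
        \.                 # literal dot after each of the first three octets
    ){3}
    (?: 25[0-5] | 2[0-4]\d | [01]?\d\d? )   # final octet, same alternatives
    $
""", re.VERBOSE)

valid = [a for a in ["192.168.1.1", "255.255.255.255", "0.0.0.0"] if IPV4.match(a)]
invalid = [a for a in ["256.1.1.1", "1.2.3", "1.2.3.4.5"] if not IPV4.match(a)]
print(valid, invalid)
```

The pattern is the same as the one-line version; only the layout and comments differ, which makes each alternative's numeric range visible at a glance.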