What Are Regular Expressions
Regular expressions — commonly called regex or regexp — are patterns used to match character combinations in strings. Think of them as a powerful search language: instead of searching for an exact word, you describe a pattern of characters, and the regex engine finds all strings that fit that pattern.
Regex is deeply embedded in nearly every programming language and developer tool. You will find it in JavaScript, Python, Java, C#, PHP, Ruby, Go, shell scripts with grep and sed, text editors like VS Code and Sublime Text, databases, and even spreadsheet formulas. Once you learn regex, you carry a superpower that works everywhere.
The syntax might look intimidating at first — a typical regex like ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$ can seem like line noise. But regex is built from a small set of building blocks that fit together logically. This tutorial will walk you through each one.
Regex Fundamentals: The Building Blocks
Most regex patterns are built from three categories of tokens. Once you understand these, you can read and write most everyday patterns.
1. Literal Characters
These are the simplest building blocks: they match exactly themselves. The regex cat matches the literal substring "cat" — nothing more, nothing less. It will match inside "catalog", "scatter", and "wildcat". Case matters: cat does not match "Cat" unless you enable case-insensitive mode (the i flag).
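As a minimal sketch in Python (the word list is taken from the paragraph above, and the variable names are only illustrative), literal matching and the case-insensitive flag could be checked like this:

```python
import re

words = ["catalog", "scatter", "wildcat", "Cat"]

# Case-sensitive search: "cat" is found anywhere inside the string.
case_sensitive = [w for w in words if re.search(r"cat", w)]

# re.IGNORECASE is Python's spelling of the i flag.
case_insensitive = [w for w in words if re.search(r"cat", w, re.IGNORECASE)]

print(case_sensitive)
print(case_insensitive)
```

Only the second list includes "Cat", because the flag makes the literal characters match regardless of case.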
2. Metacharacters
Metacharacters have special meaning in regex and do not match themselves. The metacharacters you need to know are:
| Metacharacter | Name | Matches |
|---|---|---|
| . | Dot | Any single character except newline |
| ^ | Caret | Start of string (or start of line in multiline mode) |
| $ | Dollar | End of string (or end of line in multiline mode) |
| * | Star | Zero or more of the preceding token |
| + | Plus | One or more of the preceding token |
| ? | Question mark | Zero or one of the preceding token (makes it optional) |
| {n,m} | Curly braces | Between n and m occurrences of the preceding token |
| \| | Pipe | Alternation (logical OR): matches the left or right side |
| ( ) | Parentheses | Grouping and capturing |
| [ ] | Square brackets | Character class: matches any one character inside |
| \ | Backslash | Escapes a metacharacter to match it literally |
If you need to match a literal metacharacter, escape it with a backslash. For example, \. matches a literal period, and \+ matches a literal plus sign.
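The effect of escaping can be seen in a short Python sketch. The sample strings are invented for illustration; `re.escape` is the standard-library helper that escapes metacharacters in arbitrary text for you.

```python
import re

text = "Total: 3.50 (1+1 offer)"

# Unescaped, the dot matches any character, so "3.50" also matches "3x50".
loose = re.search(r"3.50", "3x50")

# Escaped, the dot matches only a literal period.
strict = re.search(r"3\.50", "3x50")

# re.escape backslash-escapes metacharacters in arbitrary text,
# so "1+1" is searched for literally instead of as "one or more 1s, then 1".
literal_pattern = re.escape("1+1")
found = re.search(literal_pattern, text)

print(loose is not None, strict is None)
print(found.group(0))
```

Using `re.escape` is usually safer than escaping by hand when the text to match comes from user input.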
3. Shorthand Character Classes
Shorthand classes save typing for common character groupings:
\d → [0-9] Any digit
\D → [^0-9] Any non-digit
\w → [a-zA-Z0-9_] Word character (letters, digits, underscore)
\W → [^a-zA-Z0-9_] Non-word character
\s → [ \t\n\r\f\v] Whitespace (space, tab, newline, etc.)
\S → [^ \t\n\r\f\v] Non-whitespace
These are the backbone of real-world patterns. For example, \d{3}-\d{3}-\d{4} matches a US phone number format like "555-123-4567".
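A small Python sketch of the shorthand classes in use follows; the sample text is made up. One caveat worth labeling: for Python `str` patterns, `\d`, `\w`, and `\s` also match Unicode digits, letters, and whitespace by default, and passing `re.ASCII` restricts them to the ASCII ranges listed above.

```python
import re

text = "Call 555-123-4567 or 555-987-6543 before 5pm."

# \d{3}-\d{3}-\d{4}: three digits, hyphen, three digits, hyphen, four digits.
phone_pattern = re.compile(r"\d{3}-\d{3}-\d{4}")
phones = phone_pattern.findall(text)

# \w+ pulls out runs of word characters (letters, digits, underscore).
words = re.findall(r"\w+", "snake_case and CamelCase42")

# \s+ splits on any run of whitespace: spaces, tabs, or newlines.
fields = re.split(r"\s+", "a  b\tc\nd")

print(phones)
print(words)
print(fields)
```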
Practical Regex Patterns You Can Use Today
Learning regex theory is important, but nothing beats seeing how patterns solve real problems. Here are battle-tested regex patterns for common validation tasks, with explanations of how each one works.
Email Validation
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
Breakdown:
^ → Start of the string.
[a-zA-Z0-9._%+-]+ → The local part (before the @): one or more letters, digits, dots, underscores, percent signs, plus signs, or hyphens.
@ → The literal at sign.
[a-zA-Z0-9.-]+ → The domain name: one or more letters, digits, dots, or hyphens.
\. → A literal dot separating the domain from the TLD.
[a-zA-Z]{2,} → The top-level domain: two or more letters (e.g., com, org, io, online).
$ → End of the string.
URL Matching
^https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)$
Breakdown:
^https?:\/\/ → Matches http:// or https://. The s? makes the "s" optional.
(www\.)? → Optionally matches www. at the start of the domain.
[-a-zA-Z0-9@:%._\+~#=]{1,256} → The domain name: 1 to 256 allowed characters.
\.[a-zA-Z0-9()]{1,6} → A dot followed by the TLD (1-6 characters). Longer TLDs such as .photography are rejected by this limit.
\b → A word boundary to ensure the TLD ends cleanly.
([-a-zA-Z0-9()@:%_\+.~#?&//=]*)$ → The optional path, query string, and fragment.
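To see which inputs the URL pattern accepts, here is a hedged Python sketch. The helper name `looks_like_url` and the sample URLs are hypothetical; the pattern itself is copied unchanged from this section.

```python
import re

# Pattern copied from the section above, split across two raw strings.
# Python's re module accepts the \/ escapes as literal slashes.
URL_PATTERN = re.compile(
    r"^https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}"
    r"\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)$"
)


def looks_like_url(candidate):
    """Return True if candidate matches the tutorial's URL pattern."""
    return URL_PATTERN.match(candidate) is not None


samples = [
    "https://www.example.com/path?query=1#frag",
    "http://example.org",
    "ftp://example.com",   # wrong scheme
    "https://localhost",   # no dot, so there is no TLD part to match
]
results = {s: looks_like_url(s) for s in samples}

for url, ok in results.items():
    print(url, ok)
```

A pattern like this checks shape only. For real parsing, a dedicated URL parser such as Python's `urllib.parse` is generally a better fit.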
Phone Number Matching (International)
^\+?[\d\s\-().]{7,15}$
Breakdown:
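^\+? → Optional leading plus sign for international prefix.
[\d\s\-().]{7,15} → Between 7 and 15 characters consisting of digits, spaces, hyphens, parentheses, or dots. The count includes separators, not just digits, so this covers formats like 555.123.4567, +1 555-123-4567, and +44-20-7946-0958, but rejects +1 (555) 123-4567, which has 16 characters after the plus sign.
$ → End of string.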
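The following Python sketch applies the phone pattern to a few sample inputs. The samples and the `digit_count` check are illustrative assumptions, not part of the original pattern.

```python
import re

PHONE_PATTERN = re.compile(r"^\+?[\d\s\-().]{7,15}$")

samples = {
    "555.123.4567": True,        # 12 characters
    "+44-20-7946-0958": True,    # 15 characters after the plus sign
    "12345": False,              # fewer than 7 characters
    "555-123-4567 x89": False,   # letters are not in the character class
}
results = {s: PHONE_PATTERN.match(s) is not None for s in samples}

# The {7,15} quantifier counts separators as well as digits.
# Stripping non-digits gives the number of actual digits for a stricter check.
digit_count = len(re.sub(r"\D", "", "+44-20-7946-0958"))

print(results)
print(digit_count)
```

Because the quantifier counts every character in the class, counting digits separately is one possible way to enforce a digit limit independently of formatting.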
Strong Password Validation
^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*()_+\-=\[\]{};':\"\\|,.<>\/?]).{8,}$
Breakdown:
^ → Start of string.
(?=.*[a-z]) → Positive lookahead: at least one lowercase letter exists somewhere ahead.
(?=.*[A-Z]) → Positive lookahead: at least one uppercase letter.
(?=.*\d) → Positive lookahead: at least one digit.
(?=.*[!@#$%^&*()_+\-=\[\]{};':\"\\|,.<>\/?]) → Positive lookahead: at least one special character.
.{8,} → Total length of 8 or more characters.
$ → End of string.
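Here is a hedged Python sketch of the password pattern in use. The function name `is_strong_password` and the sample passwords are hypothetical. The pattern is copied from above and written as a double-quoted raw string, where `\"` stays as backslash plus quote and the regex engine reads it as a literal quote.

```python
import re

# Each lookahead checks one requirement from the start of the string;
# .{8,} then enforces the minimum length.
PASSWORD_PATTERN = re.compile(
    r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)"
    r"(?=.*[!@#$%^&*()_+\-=\[\]{};':\"\\|,.<>\/?]).{8,}$"
)


def is_strong_password(candidate):
    """Return True if candidate satisfies every rule in the pattern."""
    return PASSWORD_PATTERN.match(candidate) is not None


checks = {
    "Passw0rd!": True,       # lowercase, uppercase, digit, special, 9 chars
    "password1!": False,     # no uppercase letter
    "Sh0rt!": False,         # only 6 characters
    "NoSpecial123": False,   # no special character
    "NoDigits!!": False,     # no digit
}
results = {pw: is_strong_password(pw) for pw in checks}
print(results)
```

Because all four lookaheads start from the same position, the order of the required characters in the password does not matter.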
Date Format Extraction (YYYY-MM-DD)
\b\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])\b
Breakdown:
\b → Word boundary.
\d{4} → Exactly four digits for the year.
- → Literal hyphen.
(0[1-9]|1[0-2]) → Month: 01-09 or 10-12.
- → Literal hyphen.
(0[1-9]|[12]\d|3[01]) → Day: 01-09, 10-29, or 30-31.
\b → Word boundary.
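The date pattern checks the shape of a date, not the calendar, so it accepts strings like 2023-02-31. Below is a minimal Python sketch, assuming a hypothetical helper `extract_dates` and an invented sample sentence, that extracts candidates with the regex and then filters them with `datetime.date`:

```python
import re
from datetime import date

DATE_PATTERN = re.compile(r"\b\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])\b")


def extract_dates(text):
    """Return (candidates, valid): regex matches, and those that are real dates."""
    # finditer + group(0) yields the whole match; findall would return
    # only the contents of the two capturing groups.
    candidates = [m.group(0) for m in DATE_PATTERN.finditer(text)]
    valid = []
    for candidate in candidates:
        year, month, day = map(int, candidate.split("-"))
        try:
            date(year, month, day)  # raises ValueError for dates like 02-31
        except ValueError:
            continue
        valid.append(candidate)
    return candidates, valid


sample = "Released 2024-02-29, patched 2024-13-01, retired 2023-02-31."
candidates, valid = extract_dates(sample)
print(candidates)
print(valid)
```

The regex rejects month 13 on its own, while the impossible day in February is only caught by the calendar check.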
Beyond the Basics: Advanced Techniques
Once you are comfortable with fundamental patterns, these techniques unlock significantly more expressive power.
Lookaheads and Lookbehinds
Lookaround assertions let you check for a pattern without consuming characters — the regex engine peeks ahead or behind without moving its current position in the string.
# Positive lookahead: match "q" only if followed by "u"
q(?=u) → matches "q" in "queen", not "q" in "Iraq"
# Negative lookahead: match "q" only if NOT followed by "u"
q(?!u) → matches "q" in "Iraq", not "q" in "queen"
# Positive lookbehind: match digits only if preceded by "$"
(?<=\$)\d+ → matches "100" in "$100", not "100" in "abc100"
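# Negative lookbehind: match digits NOT preceded by "$"
(?<!\$)\d+ → matches "100" in "abc100"; in "$100" it skips "1" but still matches "00"
# Add \d to the lookbehind to reject the whole number after "$"
(?<![\$\d])\d+ → matches "100" in "abc100", nothing in "$100"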
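The lookaround patterns above can be tried in Python with a short sketch like the one below. The sentence used for the price example is made up. Note that Python's `re` module only supports fixed-width lookbehinds, which these are.

```python
import re

# Lookaheads are zero-width: the match is just "q"; the "u" is not consumed.
q_before_u = re.search(r"q(?=u)", "queen")
q_in_iraq = re.search(r"q(?=u)", "Iraq")
q_not_before_u = re.search(r"q(?!u)", "Iraq")

# Positive lookbehind: digits that directly follow a dollar sign.
prices = re.findall(r"(?<=\$)\d+", "Paid $100 for 3 items, then $25")

# Negative lookbehind: digits that do not follow a dollar sign.
plain_number = re.search(r"(?<!\$)\d+", "abc100")

print(q_before_u.group(0), q_in_iraq, q_not_before_u.group(0))
print(prices)
print(plain_number.group(0))
```

The "3" in the sample sentence is skipped by the positive lookbehind because it is preceded by a space rather than a dollar sign.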
Non-Capturing Groups
By default, parentheses create capturing groups that store matched text for later use (accessible via $1, \1, or match.groups()). When you only need grouping for structural purposes — such as alternation — use (?: ) to avoid the overhead of capturing:
# Capturing group: stores "dog" or "cat" as $1
^(dog|cat) food$
# Non-capturing group: groups without storing
^(?:dog|cat) food$
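A quick Python sketch shows the practical difference between the two group types. The input strings are invented for illustration.

```python
import re

capturing = re.match(r"^(dog|cat) food$", "cat food")
non_capturing = re.match(r"^(?:dog|cat) food$", "cat food")

# Both patterns match, but only the capturing version stores the alternative.
print(capturing.groups())
print(non_capturing.groups())

# re.findall returns group contents when the pattern has a capturing group,
# and the whole match when it has none.
text = "dog food, cat food"
with_group = re.findall(r"(dog|cat) food", text)
without_group = re.findall(r"(?:dog|cat) food", text)
print(with_group)
print(without_group)
```

In Python this choice changes what `re.findall` returns, so it affects results as well as overhead.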
Greedy vs Lazy Quantifiers
By default, *, +, and {n,m} are greedy — they match as many characters as possible. Appending ? makes them lazy, matching as few characters as possible. This distinction is critical when the pattern appears multiple times in a string:
# Greedy: matches from first <p> to last </p>
<p>.*<\/p>
# Lazy: matches each <p>...</p> pair individually
<p>.*?<\/p>
# Input: "<p>First</p><p>Second</p>"
# Greedy matches: "<p>First</p><p>Second</p>" (one match)
# Lazy matches: "<p>First</p>" and "<p>Second</p>" (two matches)
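The same comparison can be run in Python as a small sketch, using the input from the example above. The `inner` variant with a capturing group is an added illustration.

```python
import re

html = "<p>First</p><p>Second</p>"

# Greedy: .* runs to the end, then backtracks to the last </p>.
greedy = re.findall(r"<p>.*</p>", html)

# Lazy: .*? stops at the first </p> it can reach.
lazy = re.findall(r"<p>.*?</p>", html)

# A capturing group around the lazy part returns only the inner text.
inner = re.findall(r"<p>(.*?)</p>", html)

print(greedy)
print(lazy)
print(inner)
```

This works for simple, flat snippets. For nested or real-world HTML, an HTML parser is usually a more reliable choice than regex.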
Regex in Your Favorite Languages
Regex syntax is largely portable, but each language has its own API for applying patterns. Here is how to use the email pattern from this tutorial in JavaScript, Python, Java, and the shell:
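// JavaScript (user@example.com is a placeholder address)
const emailRegex = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;
console.log(emailRegex.test("user@example.com")); // true
// The anchored pattern only matches a whole string; drop ^ and $ to search inside text
const emailInText = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/;
console.log("Contact: user@example.com".match(emailInText)[0]); // "user@example.com"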
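# Python (the example.com and example.org addresses are placeholders)
import re
email_regex = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
print(bool(re.match(email_regex, "user@example.com"))) # True
# Extract all emails from text (unanchored, because ^ and $ only match a whole string):
email_in_text = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
text = "Email alice@example.com and bob@example.org"
print(re.findall(email_in_text, text)) # ['alice@example.com', 'bob@example.org']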
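// Java (user@example.com is a placeholder address)
import java.util.regex.*;

public class EmailCheck {
    public static void main(String[] args) {
        Pattern emailRegex = Pattern.compile(
            "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"
        );
        Matcher m = emailRegex.matcher("user@example.com");
        System.out.println(m.matches()); // true
    }
}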
# Shell (grep with extended regex; user@example.com is a placeholder address)
echo "user@example.com" | grep -E '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
Conclusion
Regular expressions are one of the most valuable skills a developer can learn. A handful of patterns — character classes, quantifiers, anchors, and groups — cover the vast majority of real-world use cases. Start with simple patterns and build complexity incrementally, testing each step as you go. The patterns in this tutorial are production-ready and will serve you well across form validation, log parsing, data extraction, and search-and-replace workflows. Ready to test your own regex? Use our free online regex tester to write, test, and debug your regular expressions with real-time matching and explanation.