https://github.com/eslamalawy/regex
Regular Expression
https://github.com/eslamalawy/regex
regex regular-expressions summary tutorial
Last synced: 5 months ago
JSON representation
Regular Expression
- Host: GitHub
- URL: https://github.com/eslamalawy/regex
- Owner: eslamalawy
- Created: 2025-05-25T06:25:48.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2025-05-25T07:07:01.000Z (11 months ago)
- Last Synced: 2025-06-04T18:33:53.774Z (10 months ago)
- Topics: regex, regular-expressions, summary, tutorial
- Language: JavaScript
- Homepage: https://regex-eslam.netlify.app/
- Size: 34.2 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Regular Expressions Summary
After studying a complete regex course, I've created this guide along with a simple website featuring practical examples.
Check it out: [](https://regex-eslam.netlify.app/)
## 📚 Table of Contents
1. [Explicit Characters and Quantifiers](#1-explicit-characters-and-quantifiers)
2. [Collections, Character Ranges, and Negation](#2-collections-character-ranges-and-negation)
3. [Whitespace Characters and String Boundaries](#3-whitespace-characters-and-string-boundaries)
4. [Character Classes](#4-character-classes)
5. [Flags](#5-flags)
6. [Greedy vs. Lazy Quantifiers](#6-greedy-vs-lazy-quantifiers)
7. [Multi-Character Strings Quantifiers and Options](#7-multi-character-strings-quantifiers-and-options)
8. [Capture Groups](#8-capture-groups)
9. [Substitution (Replace)](#9-substitution-replace)
10. [Lookarounds](#10-lookarounds)
---
## 1. Explicit Characters and Quantifiers
### Basic Quantifiers
| Quantifier | Description | Behavior |
|------------|-------------|----------|
| `?` | Zero or one times | Matches the previous token between zero and one times (greedy) |
| `*` | Zero or more times | Matches the previous token between zero and unlimited times (greedy) |
| `+` | One or more times | Matches the previous token between one and unlimited times (greedy) |
**Note:** If no quantifier is specified, it matches exactly once.
### Special Characters
- **Period (`.`)** - Matches any character except line terminators (`\n`)
### Characters That Need Escaping
These characters have special meaning in regex and must be escaped with backslash (`\`) to match them literally:
```
+ ? . { } [ ] ( ) ^ $
```
**Examples:**
```regex
\+ \? \. \{ \} \[ \] \( \) \^ \$
```
### Curly Brace Quantifiers `{}`
| Syntax | Description |
|--------|-------------|
| `{x,y}` | Between x and y times (inclusive) |
| `{x}` | Exactly x times |
| `{x,}` | x or more times |
**Examples:**
- `{1,3}` - Between one and three times
- `{3}` - Exactly three times
- `{4,}` - Four or more times
---
## 2. Collections, Character Ranges, and Negation
### Collections
Square brackets `[]` match **one** of any specified characters:
```regex
[0123456789ABCDEF] # Matches any single hexadecimal digit
[0-9A-F]+ # One or more hexadecimal digits (using ranges)
```
**Hexadecimal Reference:**
- Base 16 numbering system
- Digits: 0-9, A(10), B(11), C(12), D(13), E(14), F(15)
### Character Ranges
Use hyphen (`-`) inside collections to create ranges:
```regex
[0-9] # Numbers 0 through 9
[A-Z] # Uppercase letters A through Z
[a-z] # Lowercase letters a through z
[0-9A-F] # Hexadecimal digits
```
### Negation
Use caret (`^`) as the **first character** inside collections for negation:
```regex
[^a-z4] # Anything BUT lowercase letters a-z or the number 4
[^\.?!] # Anything BUT period, question mark, or exclamation point
```
### Practical Example
```regex
/[A-Z][^\.?!]+[\.?!]/
```
**Breakdown:**
- `[A-Z]` - Exactly one capital letter
- `[^\.?!]+` - One or more characters that are NOT `.`, `?`, or `!`
- `[\.?!]` - Ends with one of: `.`, `?`, or `!`
**Note:** Period (`.`) needs escaping inside collections, but `?` and `!` don't.
---
## 3. Whitespace Characters and String Boundaries
### Whitespace Characters
Backslash (`\`) serves dual purposes:
1. **Escape special characters:** `\.` (literal period)
2. **Create special tokens:** `\t` (tab character)
| Token | Description |
|-------|-------------|
| `\t` | Tab character |
| `\n` | New line |
| `\r` | Carriage return |
| `\f` | Form feed |
| `\v` | Vertical tab |
| `\r\n` | Windows-style new line |
### Space Character
No backslash needed for regular spaces:
```regex
/ / # Single space
/ +/ # One or more spaces
/ +.*/ # One or more spaces followed by any characters
```
### String Boundaries
| Anchor | Description |
|--------|-------------|
| `^` | Start of string |
| `$` | End of string |
**Important:** These don't match characters, they indicate position.
```regex
/^ha$/ # Matches only "ha" (entire string)
/^ +.*/ # Must start with one or more spaces
```
### Caret (`^`) Double Duty
1. **String boundary:** `^` (start of string)
2. **Negation:** `[^246]` (not 2, 4, or 6)
### Complete Example
```regex
/^[A-Z][^\.?!]+[\.?!]$/
```
**Breakdown:**
- `^[A-Z]` - Must start with a capital letter
- `[^\.?!]+` - One or more characters that are not `.`, `?`, or `!`
- `[\.?!]$` - Must end with `.`, `?`, or `!`
---
## 4. Character Classes
### Basic Character Classes
| Class | Description | Equivalent |
|-------|-------------|------------|
| `.` | Any character except newline | - |
| `\s` | Any whitespace character | `[\r\n\t\v\f ]` |
| `\S` | Any non-whitespace character | `[^\r\n\t\v\f ]` |
| `\d` | Any digit | `[0-9]` |
| `\D` | Any non-digit | `[^0-9]` |
| `\w` | Any word character | `[0-9A-Za-z_]` |
| `\W` | Any non-word character | `[^0-9A-Za-z_]` |
### Word Boundaries
| Boundary | Description |
|----------|-------------|
| `\b` | Word boundary |
| `\B` | Not a word boundary |
### Word Boundary Examples
```regex
\bcat\b # Matches 'cat' in "a black cat" but not in "catatonic"
✅ it like this: \W(=space) \b \W(=line terminator)
\bcat # Matches 'cat' in "catfish"
✅ it like this: \W(=start of the line) \b \w(=f)
✅ or default : \W \b \W
cat\b # Matches 'cat' in "tomcat"
✅ it like this: \w(=m) \b \W(=line terminator)
✅ or default : \W \b \W
```
**Word boundary conditions:**
- `\b` occurs between
- `\W` and `\W` characters (default)
- `\W` and `\w` characters
- `\w` and `\W` characters
- `\B` occurs when both sides are a word character
- `\w` and `\w` characters
### Practical Examples
```regex
/^\s+/ # Find whitespace at beginning of string
/^\S+/ # Must start with non-whitespace characters
/^\d+/ # Must start with one or more digits
```
---
## 5. Flags
Flags modify how the regex engine interprets the pattern:
| Flag | Name | Description |
|------|------|-------------|
| `g` | Global | Match as many times as possible (not just once) |
| `m` | Multi-line | `^` and `$` match start/end of each line, not just string |
| `i` | Case-insensitive | Match both upper and lowercase letters |
| `s` | Single-line | `.` matches newline characters (treats string as single line) |
### Usage Examples
```regex
/pattern/g # Global matching
/pattern/i # Case-insensitive
/pattern/gim # Multiple flags combined
```
---
## 6. Greedy vs. Lazy Quantifiers
### Understanding Greedy vs. Lazy
- **Greedy:** Take everything you can and still match
- **Lazy:** Take as little as you can to still match
### Default Behavior
```regex
/gre*/ # Greedy by default - matches as much as possible
```
### Making Quantifiers Lazy
Add `?` after the quantifier:
| Greedy | Lazy | Description |
|--------|------|-------------|
| `*` | `*?` | Zero or more (lazy) |
| `+` | `+?` | One or more (lazy) |
| `?` | `??` | Zero or one (lazy) |
| `{n,m}` | `{n,m}?` | Between n and m (lazy) |
### Practical Applications
**Sentence matching comparison:**
```regex
# Using character negation (previous approach)
/^[A-Z][^\.?!]+[\.?!]$/
# Using lazy quantifiers
/^[A-Z].+?[\.?!]$/
```
**Lazy quantifier examples:**
- `.+?` - One or more characters (lazy)
- `.*?` - Zero or more characters (lazy)
Both will match minimal characters before the final punctuation.
---
## 7. Multi-Character Strings Quantifiers and Options
### Multi-Character Options
Use pipe (`|`) for "OR" logic with multi-character strings:
```regex
/kittens|foals|ducklings/ # Matches any of these three words
```
### Groups for Multi-Character Tokens
**Syntax:** Parentheses `()`
```regex
/I love (kittens|foals|ducklings)/ # "I love " + any of the three options
```
### Quantifiers with Groups
```regex
/(kittens)+/ # One or more occurrences of the string "kittens"
```
### Three Ways to Handle "kittens"
1. **No container:** `/kittens+/`
- Quantifier applies only to last character ('s')
- Matches: "kitten", "kittens", "kittenss", etc.
2. **Square brackets:** `/[kittens]+/`
- Quantifier applies to character collection
- Matches: one or more of any letters k, i, t, e, n, s
3. **Parentheses:** `/(kittens)+/`
- Quantifier applies to entire group
- Matches: "kittens", "kittenskittens", etc.
### Practical Example: Digital Clock (24-hour format)
**Requirements:**
- Hours: 0-23
- Minutes: 00-59
```regex
/(1?\d|2[0-3]):[0-5]\d/
```
**Breakdown:**
- `(1?\d|2[0-3])` - Hours group:
- `1?\d` - Optional 1 + any digit (0-19)
- `|` - OR
- `2[0-3]` - 2 + digits 0-3 (20-23)
- `:` - Literal colon
- `[0-5]\d` - Minutes: first digit 0-5, second digit 0-9 (00-59)
### Collection Inside Group
```regex
/(W[0O]W)+/ # One or more of "WOW" or "W0W"
```
---
## 8. Capture Groups
### Basic Capture Groups
Groups `()` not only organize patterns but also **capture** matched text for later use.
### File Extension Example
Extract filename without extension:
```regex
/(.+)\.(png|jpe?g|pdf)/
```
**Breakdown:**
- `(.+)` - **Group 1:** Captures filename (one or more characters)
- `\.` - Literal dot
- `(png|jpe?g|pdf)` - **Group 2:** Captures extension
- `png` OR `jp` + optional `e` + `g` OR `pdf`
### Non-Capturing Groups
When you need grouping for logic but don't want to capture:
**Syntax:** `(?:)`
```regex
/(.+)\.(?:png|jpe?g|pdf)/
```
- `(.+)` - **Group 1:** Captures filename
- `(?:png|jpe?g|pdf)` - Groups for OR logic but doesn't capture
### Numbered Group References
Reference captured groups later in the same regex:
**Syntax:** `\1`, `\2`, `\3`, etc.
```regex
/<(\w+)>.*?<\/\1>/gm
```
**HTML tag matching breakdown:**
- `<(\w+)>` - Opening tag, captures tag name in Group 1
- `.*?` - Lazy match of content
- `<\/\1>` - Closing tag, references Group 1 (same tag name)
### Named Capture Groups
**Syntax:** `(?)`
```regex
/(?.+)\.(?png|jpe?g|pdf)/
```
**Benefits:**
- More readable than numbers
- Self-documenting code
- Easier maintenance
**References:**
- In regex: `\k` (some platforms)
- In replacement: `$` or `${name}`
---
## 9. Substitution (Replace)
### Basic Replacement
Replace matched patterns with fixed strings or references to captured groups.
### Simple Fixed Replacement
```regex
/kitten|puppy|piglet|foal|fawn|duckling|chick/
```
**Replace with:** `"cutie"`
### Group References in Replacement
```regex
/(kitten|puppy|piglet|foal|fawn|duckling|chick)/gm
```
**Replace with:** `$1` (references Group 1)
### Remove Characters
```regex
/[*_]/g
```
**Replace with:** `""` (empty string)
### Using Functions in Replacements
Instead of simple strings, use functions for dynamic replacements:
#### 1. Basic Function with Match Parameter
```regex
/\b(java|javascript|html|css)\b/gim
```
```javascript
function (match) {
return match.toUpperCase();
}
```
#### 2. Function with Capture Groups
```regex
/(\d{3})-(\d{3})-(\d{4})/gm
```
```javascript
function (match, area, prefix, line) {
return `+1 (${area}) ${prefix}-${line}`;
}
```
**Parameters:**
- `match` - Full matched text
- `area`, `prefix`, `line` - Individual captured groups
#### 3. Named Capture Groups in Replacement
```regex
/(?\d{2})\/(?\d{2})\/(?\d{4})/gm
```
**Replace with:** `"Date: $-$-$"`
**Benefits:**
- More readable than positional references
- Self-documenting replacements
- Easier to maintain complex patterns
---
## 10. Lookarounds
Lookarounds specify conditions without capturing them as part of the match. They're similar to boundary tokens (`^`, `$`, `\b`, `\B`) - they're part of the regex but don't correspond to characters in the match.
### Types of Lookarounds
| Type | Syntax | Description |
|------|--------|-------------|
| Positive Lookahead | `(?=)` | Must be followed by |
| Negative Lookahead | `(?!)` | Must NOT be followed by |
| Positive Lookbehind | `(?<=)` | Must be preceded by |
| Negative Lookbehind | `(?