{"id":20493640,"url":"https://github.com/code-lucidal58/regex_tutorial","last_synced_at":"2025-10-17T20:51:31.204Z","repository":{"id":88633399,"uuid":"134375129","full_name":"code-lucidal58/regex_tutorial","owner":"code-lucidal58","description":"Cheatsheet on Regular Expressions Concepts","archived":false,"fork":false,"pushed_at":"2018-05-23T09:32:03.000Z","size":10,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-01-16T05:55:44.135Z","etag":null,"topics":["regex","regex-tutorial","regular-expressions","regular-expressions-tutorial"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/code-lucidal58.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-05-22T07:11:08.000Z","updated_at":"2024-04-13T09:24:39.000Z","dependencies_parsed_at":"2024-03-25T14:00:33.460Z","dependency_job_id":null,"html_url":"https://github.com/code-lucidal58/regex_tutorial","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/code-lucidal58%2Fregex_tutorial","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/code-lucidal58%2Fregex_tutorial/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/code-lucidal58%2Fregex_tutorial/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/code-lucidal58%2Fregex_tutorial/manifests","owner_url":"http
s://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/code-lucidal58","download_url":"https://codeload.github.com/code-lucidal58/regex_tutorial/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":242075255,"owners_count":20068224,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["regex","regex-tutorial","regular-expressions","regular-expressions-tutorial"],"created_at":"2024-11-15T17:36:02.845Z","updated_at":"2025-10-17T20:51:31.109Z","avatar_url":"https://github.com/code-lucidal58.png","language":null,"readme":"# Regex Tutorial\nShort Notes on Regular Expressions\n\nThe first thing to recognize when using regular expressions is that everything is essentially a character. Patterns are written to match a specific sequence of characters. Mostly plain ASCII is used, but Unicode characters can also be used to match international text. Patterns are case-sensitive by default.\u003cbr\u003e\n\nThe backslash (\\\\) is used to create metacharacters. Metacharacters are characters that have a special meaning for the regex engine.\n### digits\n**\\d** matches any single digit from 0 to 9.\nEach of the following strings matches this simple expression, because each contains at least one digit:\n```text\nABc234adf\nvar g=0\n```\n\n### wildcards\nJust as a joker in a deck of cards can represent any card, the dot (.) matches any single character, be it alphanumeric, symbol or whitespace. This metacharacter overrides the literal period. 
To match a literal period, escape it with a backslash: **\\\\.** \n\n### matching specific characters\nSpecific characters can be matched by listing them inside square brackets []. E.g. **[abc]** will match only a, b or c and nothing else.\n```text\nmatch \tcan \t\nmatch \tman \t\nmatch \tfan \t\nskip \tdan \t\nskip \tran \t\nskip \tpan\nCorresponding regex: [cmf]a\n```\n\n### excluding specific characters\nSpecific characters can be excluded using square brackets and the hat (^). E.g. **[^abc]** will match any character except the letters a, b, or c.\n```text\nmatch \thog \t\nmatch \tdog \t\nskip \tbog\nCorresponding regex: [^b]o\n```\n\n### character ranges\nTo denote a set of sequential characters, a character range can be defined inside square brackets. E.g. **[a-z]** matches any lowercase letter from a to z. A range can also be excluded, e.g. **[^n-p]**. Multiple character ranges can also be included in a single bracket set: **[A-Z0-9a-z]** matches any alphanumeric character.\n\n### catching repetitions\nCurly braces {} are used to denote how many repetitions of a character are required. E.g. **a{3}** denotes _a_ repeated exactly three times. A range of repetitions can also be declared. E.g. **a{1,3}** means _a_ will be matched at least once and no more than 3 times. This notation can be used with any character or metacharacter.\n```text\nmatch \twazzzzzup\nmatch \twazzzup\nskip \twazup\nCorresponding regex: z{3,6}\n```\n\n### Kleene star and Kleene plus\nThis is a powerful concept in regular expressions, with the ability to match an arbitrary number of characters. Let's understand the two terms through examples.\n**Kleene star**: Meaning 0 or more. \u003cbr\u003e\n**\\d*** =\u003e match any number of digits in the sequence, possibly zero.\n**Kleene plus**: Meaning 1 or more. 
\u003cbr\u003e\n**\\d+** =\u003e match any number of digits in the sequence, at least one.\u003cbr\u003e\u003cbr\u003e\nThese can be used with any character or metacharacter.\n```text\nmatch \taaaabcc\nmatch \taabbbbc\nmatch \taacc\nskip \ta\nCorresponding regex: c+\n```\n### optional characters\nThe question mark metacharacter (?) denotes **optionality**. It matches either zero or one of the preceding character or group. For example, the pattern **ab?c** will match either of the strings \"abc\" or \"ac\" because the b is considered optional. Like the dot metacharacter, the question mark is a special character and has to be escaped using a backslash, **\\\\?**, to match a plain question mark character in a string.\n```text\nmatch \t1 file found?\nmatch \t2 files found?\nmatch \t24 files found?\nskip \tNo files found.\nCorresponding regex: \\d+ files? found\\?\n```\n### whitespaces\nWhitespace characters include space(\\_), tab(\\t), newline(\\n) and carriage return(\\r). In addition to these, \\s matches any whitespace character.\n```text\nmatch \t1.   abc\nmatch \t2.\tabc\nmatch \t3.           abc\nskip \t4.abc\nCorresponding regex: \\d\\.\\s+abc\n```\n\n### starting and ending\nIt is best practice to write regular expressions that are as specific as possible so that false positives do not creep in. E.g. a search for 'success' in a file would also match 'Error: unsuccessful attempt'. To tighten patterns, the **hat (^)** and **dollar ($)** signs are used to mark the start and end of a line. ***Note***: This use of the hat sign is different from the one used earlier in this tutorial to exclude characters.\n```text\nmatch \tMission: successful\nskip \tLast Mission: unsuccessful\nskip \tNext Mission: successful upon capture of target\nCorresponding regex: ^Mission: successful$\n```\n\n### match groups\nRegular expressions allow information extraction for further processing. 
This is done by defining groups of characters and capturing them using the special parentheses metacharacters **(** and **)**. Any subpattern inside a pair of parentheses will be captured as a group. For example, **^(IMG\\d+\\\\.png)$** will capture and extract the full image filename, but if the extension is not required, the pattern will be **^(IMG\\d+)\\\\.png$**, which only captures the part before the period.\n```text\ncapture \tfile_record_transcript.pdf \t-\u003e file_record_transcript\ncapture \tfile_07241999.pdf -\u003e file_07241999\nskip \ttestfile_fake.pdf.tmp\nCorresponding regex: ^(file.+)\\.pdf$\n```\n\n### nested groups\nNested groups can be used to extract multiple layers of information. Using the previous example, the filename and the picture number can both be extracted using the same pattern by writing an expression like **^(IMG(\\d+))\\\\.png$**. The nested groups are read from left to right in the pattern, with the first capture group being the contents of the first parentheses group, etc.\n```text\ncapture \tJan 1987 -\u003e Jan 1987 1987\ncapture \tMay 1969 -\u003e\tMay 1969 1969\ncapture \tAug 2011 -\u003e\tAug 2011 2011\nCorresponding regex: (\\w+\\s(\\d+))\n```\n\n### conditionals\nThe **| (logical OR, a.k.a. the pipe)** is used to denote alternative sets of characters. For example, \"Buy more (milk|bread|juice)\" will match only the strings _Buy more milk_, _Buy more bread_, or _Buy more juice_. \n```text\nmatch \tI love cats\nmatch \tI love dogs\nskip \tI love logs\nskip \tI love cogs\nCorresponding regex: I love (cats|dogs)\n```\n\n### back referencing and other special characters\nBack referencing varies depending on the implementation. However, many systems allow captured groups to be referenced using **\\0** (usually the full matched text), **\\1** (group 1), **\\2** (group 2), etc. 
For example, **\"\\2-\\1\"** puts the second captured group first and the first captured group second.\nAdditionally, there is a special metacharacter \\b which matches the boundary between a word and a non-word character. It's most useful in capturing entire words (for example by using the pattern \\w+\\b).\n\n## Recapitulation\n\u003ctable\u003e\n  \u003ctr\u003e \u003ctd\u003eabc…\u003c/td\u003e\u003ctd\u003eLetters\u003c/td\u003e \u003c/tr\u003e\n  \u003ctr\u003e \u003ctd\u003e123…\u003c/td\u003e\u003ctd\u003eDigits\u003c/td\u003e \u003c/tr\u003e\n  \u003ctr\u003e \u003ctd\u003e\\d\u003c/td\u003e\u003ctd\u003eAny Digit\u003c/td\u003e \u003c/tr\u003e\n  \u003ctr\u003e \u003ctd\u003e\\D\u003c/td\u003e\u003ctd\u003eAny Non-digit character\u003c/td\u003e \u003c/tr\u003e\n  \u003ctr\u003e \u003ctd\u003e.\u003c/td\u003e\u003ctd\u003eAny Character\u003c/td\u003e \u003c/tr\u003e\n  \u003ctr\u003e \u003ctd\u003e\\.\u003c/td\u003e\u003ctd\u003ePeriod\u003c/td\u003e \u003c/tr\u003e\n  \u003ctr\u003e \u003ctd\u003e[abc]\u003c/td\u003e\u003ctd\u003eOnly a, b, or c\u003c/td\u003e \u003c/tr\u003e\n  \u003ctr\u003e \u003ctd\u003e[^abc]\u003c/td\u003e\u003ctd\u003eNot a, b, nor c\u003c/td\u003e \u003c/tr\u003e\n  \u003ctr\u003e \u003ctd\u003e[a-z]\u003c/td\u003e\u003ctd\u003eCharacters a to z\u003c/td\u003e \u003c/tr\u003e\n  \u003ctr\u003e \u003ctd\u003e[0-9]\u003c/td\u003e\u003ctd\u003eNumbers 0 to 9\u003c/td\u003e \u003c/tr\u003e\n  \u003ctr\u003e \u003ctd\u003e\\w\u003c/td\u003e\u003ctd\u003eAny Alphanumeric character\u003c/td\u003e \u003c/tr\u003e\n  \u003ctr\u003e \u003ctd\u003e\\W\u003c/td\u003e\u003ctd\u003eAny Non-alphanumeric character\u003c/td\u003e \u003c/tr\u003e\n  \u003ctr\u003e \u003ctd\u003e{m}\u003c/td\u003e\u003ctd\u003em Repetitions\u003c/td\u003e \u003c/tr\u003e \n  \u003ctr\u003e \u003ctd\u003e{m,n}\u003c/td\u003e\u003ctd\u003em to n Repetitions\u003c/td\u003e \u003c/tr\u003e \n  \u003ctr\u003e \u003ctd\u003e*\u003c/td\u003e\u003ctd\u003eZero or more 
repetitions\u003c/td\u003e \u003c/tr\u003e\n  \u003ctr\u003e \u003ctd\u003e+\u003c/td\u003e\u003ctd\u003eOne or more repetitions\u003c/td\u003e \u003c/tr\u003e \n  \u003ctr\u003e \u003ctd\u003e?\u003c/td\u003e\u003ctd\u003eOptional character\u003c/td\u003e \u003c/tr\u003e \n  \u003ctr\u003e \u003ctd\u003e\\s\u003c/td\u003e\u003ctd\u003eAny Whitespace\u003c/td\u003e \u003c/tr\u003e\n  \u003ctr\u003e \u003ctd\u003e\\S\u003c/td\u003e\u003ctd\u003eAny Non-whitespace character\u003c/td\u003e \u003c/tr\u003e\n  \u003ctr\u003e \u003ctd\u003e^…$\u003c/td\u003e\u003ctd\u003eStarts and ends\u003c/td\u003e \u003c/tr\u003e\n  \u003ctr\u003e \u003ctd\u003e(…)\u003c/td\u003e\u003ctd\u003eCapture Group\u003c/td\u003e \u003c/tr\u003e\n  \u003ctr\u003e \u003ctd\u003e(a(bc))\u003c/td\u003e\u003ctd\u003eCapture Sub-group\u003c/td\u003e \u003c/tr\u003e\n  \u003ctr\u003e \u003ctd\u003e(.*)\u003c/td\u003e\u003ctd\u003eCapture all\u003c/td\u003e \u003c/tr\u003e\n  \u003ctr\u003e \u003ctd\u003e(abc|def)\u003c/td\u003e\u003ctd\u003eMatches abc or def\u003c/td\u003e \u003c/tr\u003e\n\u003c/table\u003e\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcode-lucidal58%2Fregex_tutorial","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcode-lucidal58%2Fregex_tutorial","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcode-lucidal58%2Fregex_tutorial/lists"}