{"id":34039717,"url":"https://github.com/benouinirachid/patterns-finder","last_synced_at":"2026-04-09T04:04:19.501Z","repository":{"id":65371585,"uuid":"479858514","full_name":"benouinirachid/patterns-finder","owner":"benouinirachid","description":"Simple, Fast, Powerful and Easily extensible python package for extracting patterns from text, with over than 60 predefined Regular Expressions.","archived":false,"fork":false,"pushed_at":"2022-11-26T19:12:43.000Z","size":85,"stargazers_count":25,"open_issues_count":1,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2026-03-11T02:46:51.716Z","etag":null,"topics":["annotations","information-extraction","regular-expression","visualization"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/benouinirachid.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-04-09T22:34:37.000Z","updated_at":"2025-12-25T18:27:31.000Z","dependencies_parsed_at":"2023-01-23T00:16:06.981Z","dependency_job_id":null,"html_url":"https://github.com/benouinirachid/patterns-finder","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/benouinirachid/patterns-finder","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/benouinirachid%2Fpatterns-finder","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/benouinirachid%2Fpatterns-finder/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/benouinirachid%2Fpatterns-finder/releases","manifests_url":"https://repos.ecosyste.ms/api
/v1/hosts/GitHub/repositories/benouinirachid%2Fpatterns-finder/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/benouinirachid","download_url":"https://codeload.github.com/benouinirachid/patterns-finder/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/benouinirachid%2Fpatterns-finder/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31584820,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-08T14:31:17.711Z","status":"online","status_checked_at":"2026-04-09T02:00:06.848Z","response_time":112,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["annotations","information-extraction","regular-expression","visualization"],"created_at":"2025-12-13T21:46:49.646Z","updated_at":"2026-04-09T04:04:19.488Z","avatar_url":"https://github.com/benouinirachid.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# patterns-finder\n\nSimple, fast, powerful and easily extensible Python package for extracting patterns from text, with more than 60 predefined regular expressions.\n\nThis library offers the following capabilities:\n* A set of predefined patterns covering the most useful regexes.\n* Extend the patterns by adding user-defined regexes.\n* Find and extract patterns from text.\n* Pandas DataFrame support.\n* Sort the results of extraction.\n* Summarize the results of extraction.\n* Display 
extractions with visually rich text annotations.\n* Build complex extraction rules based on regex (planned for a future release).\n\n# Installation\n\nTo install the latest version of the patterns-finder library, use pip:\n\n```bash\npip install patterns-finder\n```\n\n# Usage\n\n## Find a pattern in the text\n\nImport a predefined pattern, such as `emoji` from `patterns_finder.patterns.web`, and call its `find` method to locate its matches in a text:\n\n```python\nfrom patterns_finder.patterns.web import emoji, url, email\n\nemoji.find(\"the quick #A52A2A 🦊 jumped 3 times over the lazy 🐶 \")\n# Output:\n# [(18, 19, 'EMOJI', '🦊'), (49, 50, 'EMOJI', '🐶')]\n\nurl.find(\"The lazy 🐶 has a website https://lazy.dog.com \")\n# Output:\n# [(25, 45, 'URL', 'https://lazy.dog.com')]\n\nemail.find(\"quick.brown@fox.com is the email of 🦊 \")\n# Output:\n# [(0, 19, 'EMAIL', 'quick.brown@fox.com')]\n\n```\n\nThe method `find` returns a list of tuples, each of the form:\n    \n    [(0, 19, 'EMAIL', 'quick.brown@fox.com')]\n      ^  ^       ^          ^ \n      |  |       |          |\n     Offset      |          └ Text matching the pattern\n      |  |       └ Label of the pattern\n      |  └ End index (exclusive)\n      └ Start index in the text\n\n## Find multiple patterns in the text\n\nTo search for several patterns at once, use the method `finder.patterns_in_text(text, patterns)`:\n\n```python\nfrom patterns_finder import finder\nfrom patterns_finder.patterns.web import emoji, url, color_hex\nfrom patterns_finder.patterns.number import integer\n\npatterns = [emoji, color_hex, integer]\ntext = \"the quick #A52A2A 🦊 jumped 3 times over the lazy 🐶 \"\nfinder.patterns_in_text(text, patterns)\n# Output:\n# [(18, 19, 'EMOJI', '🦊'),\n#  (49, 50, 'EMOJI', '🐶'),\n#  (10, 17, 'COLOR_HEX', '#A52A2A'),\n#  (12, 14, 'INTEGER', '52'),\n#  (15, 16, 'INTEGER', '2'),\n#  (27, 28, 'INTEGER', '3')]\n```\n\nMatches from different patterns may overlap: here, `INTEGER` also matches the digits inside the hex color `#A52A2A`.\n\n## Find user-defined patterns in the text\n\nTo define a new pattern, you can use any regex pattern that 
is supported by Python's `re` and `regex` packages. User-defined patterns can be written either as a plain string `'regex pattern'`, in which case the regex itself is used as the label, or as a tuple of strings `('regex pattern', 'label')`.\n\n```python\nfrom patterns_finder import finder\nfrom patterns_finder.patterns import web\n\npatterns = [web.emoji, \"quick|lazy\", (\"\\\\b[a-zA-Z]+\\\\b\", \"WORD\")]\ntext = \"the quick #A52A2A 🦊 jumped 3 times over the lazy 🐶 \"\nfinder.patterns_in_text(text, patterns)\n# Output:\n# [(18, 19, 'EMOJI', '🦊'),\n#  (49, 50, 'EMOJI', '🐶'),\n#  (4, 9, 'quick|lazy', 'quick'),\n#  (44, 48, 'quick|lazy', 'lazy'),\n#  (0, 3, 'WORD', 'the'),\n#  (4, 9, 'WORD', 'quick'),\n#  (20, 26, 'WORD', 'jumped'),\n#  (29, 34, 'WORD', 'times'),\n#  (35, 39, 'WORD', 'over'),\n#  (40, 43, 'WORD', 'the'),\n#  (44, 48, 'WORD', 'lazy')]\n```\n\n## Sort extracted patterns\n\nWithout `sort_by`, the examples above return results pattern by pattern, in the order the patterns are listed. The `sort_by` argument of `finder.patterns_in_text` reorders them according to one of the following options:\n- `sort_by=finder.START` sorts the results by the start index in the text.\n\n```python\npatterns = [web.emoji, color_hex, ('\\\\b[a-zA-Z]+\\\\b', 'WORD')]\nfinder.patterns_in_text(text, patterns, sort_by=finder.START)\n# Output:\n# [(0, 3, 'WORD', 'the'),\n#  (4, 9, 'WORD', 'quick'),\n#  (10, 17, 'COLOR_HEX', '#A52A2A'),\n#  (18, 19, 'EMOJI', '🦊'),\n#  (20, 26, 'WORD', 'jumped'),\n#  (29, 34, 'WORD', 'times'),\n#  (35, 39, 'WORD', 'over'),\n#  (40, 43, 'WORD', 'the'),\n#  (44, 48, 'WORD', 'lazy'),\n#  (49, 50, 'EMOJI', '🐶')]\n```\n\n- `sort_by=finder.END` sorts the results by the end index in the text.\n```python\nfinder.patterns_in_text(text, patterns, sort_by=finder.END)\n# Output:\n# [(0, 3, 'WORD', 'the'),\n#  (4, 9, 'WORD', 'quick'),\n#  (10, 17, 'COLOR_HEX', '#A52A2A'),\n#  (18, 19, 'EMOJI', '🦊'),\n#  (20, 26, 'WORD', 'jumped'),\n#  (29, 34, 'WORD', 'times'),\n#  (35, 39, 'WORD', 'over'),\n#  (40, 43, 'WORD', 'the'),\n#  (44, 48, 'WORD', 'lazy'),\n#  (49, 50, 'EMOJI', '🐶')]\n```\n\n- `sort_by=finder.LABEL` sorts the results by the pattern's 
label.\n```python\nfinder.patterns_in_text(text, patterns, sort_by=finder.LABEL)\n# Output:\n# [(10, 17, 'COLOR_HEX', '#A52A2A'),\n#  (18, 19, 'EMOJI', '🦊'),\n#  (49, 50, 'EMOJI', '🐶'),\n#  (0, 3, 'WORD', 'the'),\n#  (4, 9, 'WORD', 'quick'),\n#  (20, 26, 'WORD', 'jumped'),\n#  (29, 34, 'WORD', 'times'),\n#  (35, 39, 'WORD', 'over'),\n#  (40, 43, 'WORD', 'the'),\n#  (44, 48, 'WORD', 'lazy')]\n```\n\n- `sort_by=finder.TEXT` sorts the results by the extracted text.\n```python\nfinder.patterns_in_text(text, patterns, sort_by=finder.TEXT)\n# Output:\n# [(10, 17, 'COLOR_HEX', '#A52A2A'),\n#  (20, 26, 'WORD', 'jumped'),\n#  (44, 48, 'WORD', 'lazy'),\n#  (35, 39, 'WORD', 'over'),\n#  (4, 9, 'WORD', 'quick'),\n#  (0, 3, 'WORD', 'the'),\n#  (40, 43, 'WORD', 'the'),\n#  (29, 34, 'WORD', 'times'),\n#  (49, 50, 'EMOJI', '🐶'),\n#  (18, 19, 'EMOJI', '🦊')]\n```\n\n## Summarize extraction results\n\nThe `summary_type` argument selects the form of the output:\n- `summary_type=finder.NONE` returns a list with all details, without summarization.\n\n```python\npatterns = [color_hex, ('\\\\b[a-zA-Z]+\\\\b', 'WORD'), web.emoji]\nfinder.patterns_in_text(text, patterns, summary_type=finder.NONE)\n# Output:\n# [(10, 17, 'COLOR_HEX', '#A52A2A'),\n#  (0, 3, 'WORD', 'the'),\n#  (4, 9, 'WORD', 'quick'),\n#  (20, 26, 'WORD', 'jumped'),\n#  (29, 34, 'WORD', 'times'),\n#  (35, 39, 'WORD', 'over'),\n#  (40, 43, 'WORD', 'the'),\n#  (44, 48, 'WORD', 'lazy'),\n#  (18, 19, 'EMOJI', '🦊'),\n#  (49, 50, 'EMOJI', '🐶')]\n```\n\n- `summary_type=finder.LABEL_TEXT_OFFSET` returns a dictionary with pattern labels as keys and the corresponding offsets and text as values.
\n\n```python\nfinder.patterns_in_text(text, patterns, summary_type=finder.LABEL_TEXT_OFFSET)\n# Output:\n# {\n#  'COLOR_HEX': [[10, 17, '#A52A2A']],\n#  'WORD': [[0, 3, 'the'], [4, 9, 'quick'], [20, 26, 'jumped'], [29, 34, 'times'], [35, 39, 'over'], [40, 43, 'the'], [44, 48, 'lazy']],\n#  'EMOJI': [[18, 19, '🦊'], [49, 50, '🐶']]\n# }\n```\n\n- `summary_type=finder.LABEL_TEXT` returns a dictionary with pattern labels as keys and the corresponding text (without offsets) as values.\n\n```python\nfinder.patterns_in_text(text, patterns, summary_type=finder.LABEL_TEXT)\n# Output:\n# {\n#  'COLOR_HEX': ['#A52A2A'],\n#  'WORD': ['the', 'quick', 'jumped', 'times', 'over', 'the', 'lazy'],\n#  'EMOJI': ['🦊', '🐶']\n# }\n```\n\n- `summary_type=finder.TEXT_ONLY` returns a flat list containing only the extracted text.\n\n```python\nfinder.patterns_in_text(text, patterns, summary_type=finder.TEXT_ONLY)\n# Output:\n# ['#A52A2A', 'the', 'quick', 'jumped', 'times', 'over', 'the', 'lazy', '🦊', '🐶']\n```\n\n## Extract patterns from a Pandas DataFrame\n\nTo extract patterns from a Pandas DataFrame, use the method `finder.patterns_in_df(df, input_col, output_col, patterns, ...)`, which reads the text from the `input_col` column and writes the extractions to the `output_col` column.\n\n```python\nfrom patterns_finder import finder\nfrom patterns_finder.patterns import web\nimport pandas as pd\n\npatterns = [web.email, web.emoji, web.url]\n\ndf = pd.DataFrame(data={\n    'text': [\"the quick #A52A2A 🦊 jumped 3 times over the lazy 🐶\",\n             \"quick.brown@fox.com is the email of 🦊\",\n             \"The lazy 🐶 has a website https://lazy.dog.com\"],\n})\n\nfinder.patterns_in_df(df, \"text\", \"extraction\", patterns, summary_type=finder.LABEL_TEXT)\n# Output:\n# |    | text                                                 | extraction                                          |\n# |---:|:-----------------------------------------------------|:----------------------------------------------------|\n# |  0 | the quick #A52A2A 🦊 jumped 3 
times over the lazy 🐶 | {'EMOJI': ['🦊', '🐶']}                            |\n# |  1 | quick.brown@fox.com is the email of 🦊               | {'EMAIL': ['quick.brown@fox.com'], 'EMOJI': ['🦊']} |\n# |  2 | The lazy 🐶 has a website https://lazy.dog.com       | {'EMOJI': ['🐶'], 'URL': ['https://lazy.dog.com']}  |\n```\n\nThe method `finder.patterns_in_df` also accepts the `summary_type` and `sort_by` arguments.\n\n# List of all predefined patterns\n\nSeveral names repeat across modules (for example `generic`, `us`, `uk`) or shadow Python built-ins (`float`), so importing a module and using qualified names, as in `from patterns_finder.patterns import web` and `web.emoji`, avoids clashes.\n\n- Web\n```python\nfrom patterns_finder.patterns.web import email, url, uri, mailto, html_link, sql, color_hex, copyright, alphanumeric, emoji, username, quotation, ipv4, ipv6\n```\n\n- Phone\n```python\nfrom patterns_finder.patterns.phone import generic, uk, us\n```\n\n- Credit Cards\n```python\nfrom patterns_finder.patterns.credit_card import generic, visa, mastercard, discover, american_express\n```\n\n- Numbers\n```python\nfrom patterns_finder.patterns.number import integer, float, scientific, hexadecimal, percent, roman\n```\n\n- Currency\n```python\nfrom patterns_finder.patterns.currency import monetary, symbol, code, name\n```\n\n- Languages\n```python\nfrom patterns_finder.patterns.language import english, french, spanish, arabic, hebrew, turkish, russian, german, chinese, greek, japanese, hindi, bangali, armenian, swedish, portoguese, balinese, georgian\n```\n\n- Time and Date\n```python\nfrom patterns_finder.patterns.time_date import time, date, year\n```\n\n- Postal Code\n```python\nfrom patterns_finder.patterns.postal_code import us, canada, uk, france, spain, switzerland, brazilian\n```\n\n# Contact\n\nPlease email your questions or comments to [me](mailto:benouini.rachid@gmail.com).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbenouinirachid%2Fpatterns-finder","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbenouinirachid%2Fpatterns-finder","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbenouinirachid%2Fpatterns-finder/lists"}
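The README in the `readme` field above describes `find` as returning `(start, end, label, text)` tuples, and its `'quick|lazy'` example shows that a plain-string pattern is labelled with its own regex. As a rough illustration of that output format, here is a minimal sketch that uses only Python's standard `re` module. The helpers `find_matches` and `find_all` are hypothetical and not part of patterns-finder, and the hex-color regex is an illustrative assumption rather than the library's own `color_hex` pattern.

```python
import re
from typing import List, Tuple, Union

# One match in the README's format: (start, end, label, text).
Match = Tuple[int, int, str, str]
# A pattern is either a bare regex string or a (regex, label) pair.
PatternSpec = Union[str, Tuple[str, str]]


def find_matches(pattern: PatternSpec, text: str) -> List[Match]:
    """Hypothetical helper: return every match of one pattern as a tuple.

    A bare string is used as its own label, mirroring the
    'quick|lazy' example in the README.
    """
    if isinstance(pattern, tuple):
        regex, label = pattern
    else:
        regex, label = pattern, pattern
    # m.end() is exclusive, so text[start:end] equals the matched text.
    return [(m.start(), m.end(), label, m.group()) for m in re.finditer(regex, text)]


def find_all(patterns: List[PatternSpec], text: str) -> List[Match]:
    """Hypothetical helper: collect matches pattern by pattern, in list order."""
    results: List[Match] = []
    for pattern in patterns:
        results.extend(find_matches(pattern, text))
    return results


text = "the quick #A52A2A 🦊 jumped 3 times over the lazy 🐶 "
print(find_all(["quick|lazy", (r"#[0-9A-Fa-f]{6}\b", "COLOR_HEX")], text))
# [(4, 9, 'quick|lazy', 'quick'), (44, 48, 'quick|lazy', 'lazy'), (10, 17, 'COLOR_HEX', '#A52A2A')]
```

Because the end offset is exclusive, `text[start:end]` reproduces the matched text, which is consistent with the README's `(0, 19, 'EMAIL', 'quick.brown@fox.com')` example for a 19-character address.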
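The README's four `sort_by` options correspond to the four fields of each match tuple. A minimal sketch, assuming each option simply sorts on the matching tuple field, can rely on Python's stable `sorted`, so equal keys keep their extraction order, as the two `'the'` entries do in the README's `finder.TEXT` example. `START`, `END`, `LABEL`, `TEXT` and `sort_matches` below are hypothetical stand-ins, not the library's `finder` constants, and the library's own tie-breaking rule is not documented.

```python
from operator import itemgetter
from typing import List, Tuple

# One match in the README's format: (start, end, label, text).
Match = Tuple[int, int, str, str]

# Hypothetical sort keys: the index of the tuple field to sort on.
START, END, LABEL, TEXT = 0, 1, 2, 3


def sort_matches(matches: List[Match], sort_by: int) -> List[Match]:
    """Return a new list sorted on one tuple field.

    sorted() is stable, so matches with equal keys keep their input order.
    """
    return sorted(matches, key=itemgetter(sort_by))


matches = [
    (18, 19, "EMOJI", "🦊"),
    (49, 50, "EMOJI", "🐶"),
    (10, 17, "COLOR_HEX", "#A52A2A"),
    (0, 3, "WORD", "the"),
    (40, 43, "WORD", "the"),
]
for key in (START, LABEL, TEXT):
    print([m[3] for m in sort_matches(matches, key)])
```

Sorting on the raw string compares Unicode code points, which is why `'#A52A2A'` sorts before the words and the emoji sort last, matching the order shown in the README.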
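The README's four `summary_type` outputs can all be derived from a plain list of match tuples. This minimal sketch assumes labels become dictionary keys in order of first occurrence, which agrees with the README's example output. The mode strings and the `summarize` helper are hypothetical, not the library's `finder` constants.

```python
from typing import Dict, List, Tuple, Union

# One match in the README's format: (start, end, label, text).
Match = Tuple[int, int, str, str]

# Hypothetical mode names standing in for finder.NONE, finder.LABEL_TEXT_OFFSET, etc.
NONE = "none"
LABEL_TEXT_OFFSET = "label_text_offset"
LABEL_TEXT = "label_text"
TEXT_ONLY = "text_only"


def summarize(matches: List[Match], summary_type: str) -> Union[list, Dict[str, list]]:
    """Reshape match tuples into one of the four README output forms."""
    if summary_type == NONE:
        return list(matches)
    if summary_type == TEXT_ONLY:
        return [text for _, _, _, text in matches]
    if summary_type not in (LABEL_TEXT_OFFSET, LABEL_TEXT):
        raise ValueError(f"unknown summary_type: {summary_type!r}")
    with_offsets = summary_type == LABEL_TEXT_OFFSET
    # dicts preserve insertion order, so labels appear in order of first match.
    grouped: Dict[str, list] = {}
    for start, end, label, text in matches:
        grouped.setdefault(label, []).append([start, end, text] if with_offsets else text)
    return grouped


matches = [
    (10, 17, "COLOR_HEX", "#A52A2A"),
    (0, 3, "WORD", "the"),
    (4, 9, "WORD", "quick"),
    (18, 19, "EMOJI", "🦊"),
]
print(summarize(matches, LABEL_TEXT))
# {'COLOR_HEX': ['#A52A2A'], 'WORD': ['the', 'quick'], 'EMOJI': ['🦊']}
```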