{"id":15547221,"url":"https://github.com/radeklat/words-to-regular-expression","last_synced_at":"2026-03-16T02:34:07.989Z","repository":{"id":62588049,"uuid":"140072750","full_name":"radeklat/words-to-regular-expression","owner":"radeklat","description":"A command line tool and Python library for converting lists of strings into matching regular expressions (finite automata).","archived":false,"fork":false,"pushed_at":"2018-12-08T13:06:24.000Z","size":205,"stargazers_count":5,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"develop","last_synced_at":"2025-03-30T02:21:46.226Z","etag":null,"topics":["command-line-tool","matching-algorithm","python","python-library","regular-expression"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/radeklat.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-07-07T09:57:12.000Z","updated_at":"2022-09-27T03:14:42.000Z","dependencies_parsed_at":"2022-11-03T17:49:03.551Z","dependency_job_id":null,"html_url":"https://github.com/radeklat/words-to-regular-expression","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/radeklat%2Fwords-to-regular-expression","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/radeklat%2Fwords-to-regular-expression/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/radeklat%2Fwords-to-regular-expression/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/radeklat
%2Fwords-to-regular-expression/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/radeklat","download_url":"https://codeload.github.com/radeklat/words-to-regular-expression/tar.gz/refs/heads/develop","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250487960,"owners_count":21438690,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["command-line-tool","matching-algorithm","python","python-library","regular-expression"],"created_at":"2024-10-02T13:07:46.597Z","updated_at":"2026-03-16T02:34:02.955Z","avatar_url":"https://github.com/radeklat.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Master Build Status](https://travis-ci.org/radeklat/words-to-regular-expression.svg?branch=master)](https://travis-ci.org/radeklat/words-to-regular-expression)\n[![Develop Build Status](https://travis-ci.org/radeklat/words-to-regular-expression.svg?branch=develop)](https://travis-ci.org/radeklat/words-to-regular-expression)\n\nCompatible with Python 3.4+\n\n# Purpose\n\nThis library and command line tool compresses multiple strings into a single regular expression that can later be used to find/match these strings in a larger piece of text.\n\n# Installation\n\nAs simple as `pip install w2re`\n\n## Example use\n\nInput strings are: `is`, `in`, `it`, `if`, `the`, `than`\n\nAs a library:\n\n```python\nfrom w2re import iterable_to_regexp\n\niterable_to_regexp(['is', 'in', 'it', 'if', 'the', 'than'])\n```\n\n    '(?:i[fnst]|th(?:e|an))'\n 
   \nAs a command line tool:\n\n```bash\necho -e \"is\\nin\\nit\\nif\\nthe\\nthan\" | w2re\n```\n\n    (?:i[fnst]|th(?:e|an))\n\nThe input text is [The Zen of Python](https://www.python.org/dev/peps/pep-0020/#id3).\n\nCounting occurrences (note that this also counts matches inside longer words, e.g. `is` in `This`):\n\n```python\nfrom collections import Counter\nfrom re import findall\n\nfrom requests import get\nfrom w2re import iterable_to_regexp\n\nCounter(\n    findall(\n        iterable_to_regexp(['is', 'in', 'it', 'if', 'the', 'than']),\n        get('https://raw.githubusercontent.com/python/peps/master/pep-0020.txt').text\n    )\n).most_common()\n```\n\n    [('is', 15), ('it', 12), ('in', 11), ('than', 8), ('the', 7), ('if', 2)]\n\n# Features\n\n## Collapsing multiple strings from command line input\n\nThis is useful when you need to search for multiple strings and are not sure how to write the correct regexp (or, like me, are lazy and write libraries for it instead).\n\nTerminate your input with EOF (Ctrl+D on an empty line in Linux).\n\n```bash\nw2re\ni am searching for this\nand this\nand this as well\n```\n\n    (?:i\\ am\\ searching\\ for\\ this|and\\ this(?:\\ as\\ wel{2})?)\n\n## Collapsing of repeated sequences\n\n```bash\necho 'hahaha' | w2re\n```\n\n    (?:ha){3}\n\nUnfortunately, this does not produce a range yet. For example, `subsubsection`, `subsection` and `section` become `s(?:ection|ubs(?:ection|ubsection))` rather than the expected `(?:sub){0,2}section`.\n\n## Automatic escaping of regular expression special characters\n\n```bash\necho '* test: ...' 
| w2re\n```\n\n    \\*\\ test\\:\\ \\.{3}\n\n## Reading words from a file on the command line\n\n    w2re -i /usr/share/dict/words\n\n## Command line filter\n\n    head -n 10 /usr/share/dict/words | w2re\n    \n    A(?:\\'s|MD(?:\\'s)?|OL(?:\\'s)?|WS(?:\\'s)?|achen(?:\\'s)?)\n\n## Reading words from an iterable\n\n```python\nimport w2re\n\nw2re.iterable_to_regexp(['is', 'in', 'it', 'if', 'the', 'than'])\n```\n\n    '(?:i[fnst]|th(?:e|an))'\n\n## Reading words from a stream\n\n```python\nimport io\n\nimport w2re\n\nw2re.stream_to_regexp(io.StringIO('is\\nin\\nit\\nif\\nthe\\nthan'))\n```\n\n    '(?:i[fnst]|th(?:e|an))'\n\n## Multiple output formats\n\n### `w2re.PythonFormatter`\n\nA standard Python regular expression for the [re](https://docs.python.org/3/library/re.html) module. This is the default formatter for both the command line tool and the library.\n\n```python\nimport w2re\n\nw2re.iterable_to_regexp(['is', 'in', 'it', 'if', 'the', 'than'], w2re.PythonFormatter)\n```\n\n    '(?:i[fnst]|th(?:e|an))'\n\n### `w2re.PythonWordMatchFormatter`\n\nA standard Python regular expression for the [re](https://docs.python.org/3/library/re.html) module, suitable for matching whole words rather than arbitrary substrings. Unlike `PythonFormatter`, it won't match `Python` in `Pythonista`.\n\n```python\nimport w2re\n\nw2re.iterable_to_regexp(['is', 'in', 'it', 'if', 'the', 'than'], w2re.PythonWordMatchFormatter)\n```\n\n    '(?:\\\\W+|\\\\A)((?:i[fnst]|th(?:e|an)))(?=\\\\W+|\\\\Z)'\n\n### `w2re.BaseFormatter`\n\nBase class for implementing custom formatters. 
See the [w2re.formatters](https://github.com/radeklat/words-to-regular-expression/blob/develop/w2re/formatters.py) module.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fradeklat%2Fwords-to-regular-expression","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fradeklat%2Fwords-to-regular-expression","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fradeklat%2Fwords-to-regular-expression/lists"}