# retrie

[![build](https://img.shields.io/github/actions/workflow/status/ddelange/retrie/main.yml?branch=master&logo=github&cacheSeconds=86400)](https://github.com/ddelange/retrie/actions?query=branch%3Amaster)
[![codecov](https://img.shields.io/codecov/c/github/ddelange/retrie/master?logo=codecov&logoColor=white)](https://codecov.io/gh/ddelange/retrie)
[![pypi Version](https://img.shields.io/pypi/v/retrie.svg?logo=pypi&logoColor=white)](https://pypi.org/project/retrie/)
[![python](https://img.shields.io/pypi/pyversions/retrie.svg?logo=python&logoColor=white)](https://pypi.org/project/retrie/)
[![downloads](https://static.pepy.tech/badge/retrie)](https://pypistats.org/packages/retrie)
[![black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/python/black)


[retrie](https://github.com/ddelange/retrie) offers fast methods to match and replace (sequences of) strings based on efficient Trie-based regex unions.

#### Trie

Instead of matching against a simple regex union, which becomes slow for large sets of words because the regex engine tries each alternative in turn, a more efficient regex pattern can be compiled using a [Trie](https://en.wikipedia.org/wiki/Trie) structure, in which words with a common prefix share a branch:

```py
from retrie.trie import Trie


trie = Trie()

trie.add("abc", "foo", "abs")
assert trie.pattern() == "(?:ab[cs]|foo)"  # equivalent to but faster than "(?:abc|abs|foo)"

trie.add("absolute")
assert trie.pattern() == "(?:ab(?:c|s(?:olute)?)|foo)"

trie.add("abx")
assert trie.pattern() == "(?:ab(?:[cx]|s(?:olute)?)|foo)"

trie.add("abxy")
assert trie.pattern() == "(?:ab(?:c|s(?:olute)?|xy?)|foo)"
```

A `Trie` may be populated with zero or more strings at instantiation or via `Trie.add`, which supports method chaining. Two instances can be merged with the `+` (new instance) and `+=` (in-place update) operators. Instances compare equal if their underlying data dictionaries are equal.

```py
from retrie.trie import Trie

trie = Trie()
trie += Trie("abc")
assert (
    trie + Trie().add("foo")
    == Trie("abc", "foo")
    == Trie(*["abc", "foo"])
    == Trie().add(*["abc", "foo"])
    == Trie().add("abc", "foo")
    == Trie().add("abc").add("foo")
)
```


## Installation

This pure-Python, OS-independent package is available on [PyPI](https://pypi.org/project/retrie):

```sh
$ pip install retrie
```


## Usage

[![readthedocs](https://readthedocs.org/projects/retrie/badge/?version=latest)](https://retrie.readthedocs.io)

For documentation, see [retrie.readthedocs.io](https://retrie.readthedocs.io/en/stable/_code_reference/retrie.html).

The following objects are all subclasses of `retrie.retrie.Retrie`, which handles filling the Trie and compiling the corresponding regex pattern.


#### Blacklist

The `Blacklist` object can be used to filter out bad occurrences in a text or a sequence of strings:
```py
from retrie.retrie import Blacklist

# check out docstrings and methods
help(Blacklist)

blacklist = Blacklist(["abc", "foo", "abs"], match_substrings=False)
blacklist.compiled
# re.compile(r'(?<=\b)(?:ab[cs]|foo)(?=\b)', re.IGNORECASE|re.UNICODE)
assert not blacklist.is_blacklisted("a foobar")
assert tuple(blacklist.filter(("good", "abc", "foobar"))) == ("good", "foobar")
assert blacklist.cleanse_text("good abc foobar") == "good  foobar"

blacklist = Blacklist(["abc", "foo", "abs"], match_substrings=True)
blacklist.compiled
# re.compile(r'(?:ab[cs]|foo)', re.IGNORECASE|re.UNICODE)
assert blacklist.is_blacklisted("a foobar")
assert tuple(blacklist.filter(("good", "abc", "foobar"))) == ("good",)
assert blacklist.cleanse_text("good abc foobar") == "good  bar"
```


#### Whitelist

Similar methods are available for the `Whitelist` object:
```py
from retrie.retrie import Whitelist

# check out docstrings and methods
help(Whitelist)

whitelist = Whitelist(["abc", "foo", "abs"], match_substrings=False)
whitelist.compiled
# re.compile(r'(?<=\b)(?:ab[cs]|foo)(?=\b)', re.IGNORECASE|re.UNICODE)
assert not whitelist.is_whitelisted("a foobar")
assert tuple(whitelist.filter(("bad", "abc", "foobar"))) == ("abc",)
assert whitelist.cleanse_text("bad abc foobar") == "abc"

whitelist = Whitelist(["abc", "foo", "abs"], match_substrings=True)
whitelist.compiled
# re.compile(r'(?:ab[cs]|foo)', re.IGNORECASE|re.UNICODE)
assert whitelist.is_whitelisted("a foobar")
assert tuple(whitelist.filter(("bad", "abc", "foobar"))) == ("abc", "foobar")
assert whitelist.cleanse_text("bad abc foobar") == "abcfoo"
```


#### Replacer

The `Replacer` object performs a fast, single-pass search and replace, substituting each occurrence of a key of `replacement_mapping` with the corresponding value:
```py
from retrie.retrie import Replacer

# check out docstrings and methods
help(Replacer)

replacement_mapping = dict(zip(["abc", "foo", "abs"], ["new1", "new2", "new3"]))

replacer = Replacer(replacement_mapping, match_substrings=True)
replacer.compiled
# re.compile(r'(?:ab[cs]|foo)', re.IGNORECASE|re.UNICODE)
assert replacer.replace("ABS ...foo... foobar") == "new3 ...new2... new2bar"

replacer = Replacer(replacement_mapping, match_substrings=False)
replacer.compiled
# re.compile(r'\b(?:ab[cs]|foo)\b', re.IGNORECASE|re.UNICODE)
assert replacer.replace("ABS ...foo... foobar") == "new3 ...new2... foobar"

replacer = Replacer(replacement_mapping, match_substrings=False, re_flags=None)
replacer.compiled  # on py3, re.UNICODE is always enabled
# re.compile(r'\b(?:ab[cs]|foo)\b')
assert replacer.replace("ABS ...foo... foobar") == "ABS ...new2... foobar"

replacer = Replacer(replacement_mapping, match_substrings=False, word_boundary=" ")
replacer.compiled
# re.compile(r'(?<= )(?:ab[cs]|foo)(?= )', re.IGNORECASE|re.UNICODE)
assert replacer.replace(". ABS ...foo... foobar") == ". new3 ...foo... foobar"
```


## Development

[![gitmoji](https://img.shields.io/badge/gitmoji-%20%F0%9F%98%9C%20%F0%9F%98%8D-ffdd67)](https://github.com/carloscuesta/gitmoji-cli)
[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)](https://github.com/pre-commit/pre-commit)

Run `make help` for options such as installing for development, linting, and testing.
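The Trie section shows the output of `Trie.pattern()` but not how a trie turns into a regex. The following is a minimal, stdlib-only sketch of one way to do it, assuming a nested-dict trie in which the key `""` marks the end of a word. The helpers `build_trie`, `trie_pattern` and `words_to_pattern` are hypothetical illustrations, not retrie's API or its actual implementation.

```python
import re


def build_trie(words):
    """Build a nested-dict trie; the key "" marks that a word ends at a node."""
    root = {}
    for word in words:
        node = root
        for char in word:
            node = node.setdefault(char, {})
        node[""] = {}  # end-of-word marker
    return root


def trie_pattern(node):
    """Turn a trie node into a regex string, or None for a bare end-of-word leaf."""
    if "" in node and len(node) == 1:
        return None  # nothing left to match below this node
    word_ends_here = "" in node
    branches = []    # continuations longer than one character
    leaf_chars = []  # single-character continuations, merged into a class
    for char in sorted(k for k in node if k):
        sub = trie_pattern(node[char])
        if sub is None:
            leaf_chars.append(re.escape(char))
        else:
            branches.append(re.escape(char) + sub)
    only_leaves = not branches
    if leaf_chars:
        # Sibling branches start with distinct characters, so their order does
        # not change what matches; the character class goes first.
        if len(leaf_chars) == 1:
            branches.insert(0, leaf_chars[0])
        else:
            branches.insert(0, "[" + "".join(leaf_chars) + "]")
    if len(branches) == 1:
        result = branches[0]
    else:
        result = "(?:" + "|".join(branches) + ")"
    if word_ends_here:
        # A shorter word ends here, so the rest is optional; "?" is greedy,
        # so the longer word is still preferred when it is present.
        result = result + "?" if only_leaves else "(?:" + result + ")?"
    return result


def words_to_pattern(words):
    """Compile non-empty words into one trie-shaped regex union (sketch)."""
    words = [w for w in words if w]
    if not words:
        raise ValueError("need at least one non-empty word")
    return trie_pattern(build_trie(words))


print(words_to_pattern(["abc", "foo", "abs"]))  # (?:ab[cs]|foo)
```

For the four word sets used in the Trie section, this sketch happens to yield the same pattern strings shown there: single-character continuations collapse into a character class, and a `?` makes the remainder optional wherever a shorter word ends.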
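The Trie section also notes that instances merge with `+` and `+=` and compare equal when their data dictionaries are equal. On the same nested-dict representation, merging is a recursive dictionary union. This is a hypothetical sketch (`build_trie`, `merge_into` and `merged` are illustrative names), not retrie's actual code.

```python
import copy


def build_trie(words):
    """Nested-dict trie; the key "" marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for char in word:
            node = node.setdefault(char, {})
        node[""] = {}
    return root


def merge_into(target, source):
    """In-place merge of `source` into `target`, analogous to `+=`."""
    for char, child in source.items():
        # setdefault creates fresh dicts in `target`, so no node of `source`
        # is shared with the result.
        merge_into(target.setdefault(char, {}), child)
    return target


def merged(a, b):
    """Return a new trie holding the words of both, analogous to `+`."""
    return merge_into(copy.deepcopy(a), b)


trie = merge_into(build_trie([]), build_trie(["abc"]))
# Plain dict equality ignores insertion order, so the word order does not matter.
assert merged(trie, build_trie(["foo"])) == build_trie(["abc", "foo"]) == build_trie(["foo", "abc"])
```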
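To show what the Blacklist and Whitelist methods do with a compiled pattern, they can be mimicked with the standard `re` module alone. In this hypothetical `WordFilter` sketch, a plain escaped union (longest word first) stands in for retrie's Trie-based pattern and `\b` stands in for the word boundary; the method bodies are assumptions inferred from the expected outputs in the examples, not retrie's source.

```python
import re


class WordFilter:
    """Hypothetical stand-in for retrie's Blacklist/Whitelist (not its real code)."""

    def __init__(self, words, match_substrings=False, flags=re.IGNORECASE):
        # Longest first, so a word is tried before any of its prefixes.
        ordered = sorted(words, key=len, reverse=True)
        union = "(?:" + "|".join(re.escape(w) for w in ordered) + ")"
        if not match_substrings:
            union = r"\b" + union + r"\b"
        self.compiled = re.compile(union, flags)

    def matches(self, text):
        """True if any listed word occurs (is_blacklisted / is_whitelisted)."""
        return self.compiled.search(text) is not None

    def blacklist_filter(self, texts):
        """Keep only the strings that contain no listed word."""
        return (t for t in texts if not self.matches(t))

    def whitelist_filter(self, texts):
        """Keep only the strings that contain a listed word."""
        return (t for t in texts if self.matches(t))

    def remove_matches(self, text):
        """Blacklist-style cleansing: delete every match."""
        return self.compiled.sub("", text)

    def keep_matches(self, text):
        """Whitelist-style cleansing: keep only the matched text, concatenated."""
        return "".join(self.compiled.findall(text))
```

With the word list `["abc", "foo", "abs"]`, these methods reproduce the blacklist and whitelist results from the examples, for both `match_substrings=False` and `match_substrings=True`.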
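The single-pass replacement performed by `Replacer` can be approximated with `re.sub` and a callback that looks up each match in the mapping. The `make_replacer` helper below is hypothetical: it uses a plain longest-first union instead of a Trie pattern, and it assumes case-insensitive lookups via `str.casefold`, which may differ from how retrie resolves matches.

```python
import re


def make_replacer(mapping, match_substrings=True, flags=re.IGNORECASE):
    """Return a function that replaces every key of `mapping` in one pass (sketch)."""
    # Longest keys first, so a key is tried before any of its prefixes.
    keys = sorted(mapping, key=len, reverse=True)
    union = "(?:" + "|".join(re.escape(k) for k in keys) + ")"
    if not match_substrings:
        union = r"\b" + union + r"\b"
    compiled = re.compile(union, flags)

    # Under IGNORECASE the matched text may differ in case from its key, so
    # both sides are case-folded before the lookup (assumption of this sketch).
    if flags & re.IGNORECASE:
        normalize = str.casefold
    else:
        def normalize(s):
            return s
    lookup = {normalize(k): v for k, v in mapping.items()}

    def replace(text):
        # One scan over `text`: each match is swapped for its mapped value,
        # falling back to the matched text if no entry is found.
        return compiled.sub(lambda m: lookup.get(normalize(m.group(0)), m.group(0)), text)

    return replace


mapping = dict(zip(["abc", "foo", "abs"], ["new1", "new2", "new3"]))
print(make_replacer(mapping)("ABS ...foo... foobar"))  # new3 ...new2... new2bar
```

Because `re.sub` scans the input only once, a replacement value that contains another key is not replaced again: mapping `a` to `b` and `b` to `c` turns `"ab"` into `"bc"`, whereas chaining `str.replace` calls would give `"cc"`.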