{"id":30434296,"url":"https://github.com/spaceshaman/deckard","last_synced_at":"2025-08-22T23:32:43.303Z","repository":{"id":310315481,"uuid":"1035601650","full_name":"SpaceShaman/deckard","owner":"SpaceShaman","description":"Extract structured data from unstructured text — no AI, just regular expressions. 🔍","archived":false,"fork":false,"pushed_at":"2025-08-17T09:23:54.000Z","size":34,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-08-17T09:28:25.578Z","etag":null,"topics":["data-extraction","extract","extract-data","regex","regular-expression"],"latest_commit_sha":null,"homepage":"https://pypi.org/project/deckard","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SpaceShaman.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-08-10T18:36:48.000Z","updated_at":"2025-08-17T09:21:17.000Z","dependencies_parsed_at":"2025-08-17T09:28:33.522Z","dependency_job_id":"0e3cf108-40e5-45f8-8e93-6f6f66f75904","html_url":"https://github.com/SpaceShaman/deckard","commit_stats":null,"previous_names":["spaceshaman/deckard"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/SpaceShaman/deckard","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SpaceShaman%2Fdeckard","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SpaceShaman%2Fdeckard/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SpaceShaman%2Fdeckard/releases","manifests_url
":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SpaceShaman%2Fdeckard/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/SpaceShaman","download_url":"https://codeload.github.com/SpaceShaman/deckard/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SpaceShaman%2Fdeckard/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271717117,"owners_count":24808590,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-22T02:00:08.480Z","response_time":65,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-extraction","extract","extract-data","regex","regular-expression"],"created_at":"2025-08-22T23:32:38.740Z","updated_at":"2025-08-22T23:32:43.292Z","avatar_url":"https://github.com/SpaceShaman.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003eDeckard 🕵️‍♂️\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003eExtract structured data from unstructured text — no AI, just regular expressions. 
🔍\u003c/p\u003e\n\n[![GitHub License](https://img.shields.io/github/license/SpaceShaman/deckard)](https://github.com/SpaceShaman/deckard?tab=MIT-1-ov-file)\n[![Tests](https://img.shields.io/github/actions/workflow/status/SpaceShaman/deckard/release.yml?label=tests)](https://app.codecov.io/github/SpaceShaman/deckard)\n[![Codecov](https://img.shields.io/codecov/c/github/SpaceShaman/deckard)](https://app.codecov.io/github/SpaceShaman/deckard)\n[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/deckard)](https://pypi.org/project/deckard)\n[![PyPI - Version](https://img.shields.io/pypi/v/deckard)](https://pypi.org/project/deckard)\n[![Code style: black](https://img.shields.io/badge/code%20style-black-black)](https://github.com/psf/black)\n[![Linting: Ruff](https://img.shields.io/badge/linting-Ruff-black?logo=ruff\u0026logoColor=black)](https://github.com/astral-sh/ruff)\n[![Pytest](https://img.shields.io/badge/testing-Pytest-red?logo=pytest\u0026logoColor=red)](https://docs.pytest.org/)\n\nDeckard is a library of regular-expression patterns for extracting structured data (addresses, phone numbers, email addresses, etc.) and a small set of helper utilities that make using those patterns easier.\n\n\u003e [!IMPORTANT]\n\u003e Status: very early-stage project. Right now the repository contains mostly patterns for Poland. I am looking for contributors from around the world 🌍 — address formats, phone-number formats and other data representations differ by country, so the goal is to gather country-specific patterns for many regions.\n\n## Key features ✨\n\n- 🗂️ A collection of ready-to-use regex patterns organized by country (for example [`deckard/patterns/pl.py`](./deckard/patterns/pl.py)).\n- 📦 Universal patterns (e.g. 
email) live in [`deckard/patterns/standard.py`](./deckard/patterns/standard.py).\n- 🛠️ A small helper function `deckard.search` that combines multiple patterns and returns named-group matches ([deckard/main.py](./deckard/main.py)).\n\n## Installation ⚙️\n\nFrom PyPI:\n\n```bash\npip install deckard\n```\n\nEditable / local development install:\n\n```bash\npip install -e .\n```\n\n### For contributors — install dependencies with Poetry 🧑‍💻\n\nThis project uses Poetry to manage runtime and development dependencies.\n\n1. Install Poetry (see https://python-poetry.org for instructions).\n2. From the project root run:\n\n```bash\npoetry install\n```\n\nThis will create a virtual environment and install runtime and development dependencies (including `pytest`).\n\nTo run tests using Poetry:\n\n```bash\npoetry run pytest\n```\n\nOr start a shell in the created virtualenv and run tests directly (on Poetry 2.x, `poetry shell` requires the `poetry-plugin-shell` plugin):\n\n```bash\npoetry shell\npytest\n```\n\n## Quick usage 🧭\n\nExample using the current public API:\n\n```python\nfrom deckard import search\nfrom deckard.patterns import standard, pl\n\ntext = (\n    \"Hello, my email is spaceshaman@tuta.io and my phone number is \"\n    \"+48 792 321 321 and my address is ul. Tesotowa 12/6A, 66-700 Bielsko-Biała.\"\n)\n\nresult = search([standard.EMAIL, pl.MOBILE_PHONE, pl.ADDRESS], text)\n\n# result.groupdict() will return a dict of named groups, for example:\n# {\n#   'email': 'spaceshaman@tuta.io',\n#   'mobile_phone': '792 321 321',\n#   'street': 'ul. 
Tesotowa',\n#   'building': '12',\n#   'apartment': '6A',\n#   'zip_code': '66-700',\n#   'city': 'Bielsko-Biała'\n# }\n```\n\nThe `search` helper composes the provided patterns into a single regex (using lookaheads) and returns the first match as a `regex.Match` object (or `None` if nothing matched).\n\n## Repository layout\n\n- [`deckard/`](./deckard/) — library code\n  - [`deckard/main.py`](./deckard/main.py) — helper `search` function\n  - [`deckard/patterns/standard.py`](./deckard/patterns/standard.py) — universal patterns (e.g. `EMAIL`)\n  - [`deckard/patterns/pl.py`](./deckard/patterns/pl.py) — Poland-specific patterns (address, postal code, phone, etc.)\n- [`tests/`](./tests/) — unit tests\n\nExamples of existing tests:\n- [`tests/test_standard_patterns.py`](./tests/test_standard_patterns.py) — test for `standard.EMAIL`\n- [`tests/test_search_with_multiple_patterns.py`](./tests/test_search_with_multiple_patterns.py) — integration tests combining `standard.EMAIL` with patterns from `pl.py`\n- [`tests/pl/test_search_address_pl.py`](./tests/pl/test_search_address_pl.py) — tests for Polish address patterns\n\nEvery new pattern must come with tests. Pull requests without tests will not be accepted.\n\n## Contributing — how to add new patterns\n\n1. Create a new file under [`deckard/patterns/`](./deckard/patterns/) named by the country code, e.g. `us.py`, `de.py`, `fr.py`.\n2. Define constants (UPPERCASE) for each pattern, for example `MOBILE_PHONE`, `ADDRESS`, `ZIP_CODE`.\n3. Add tests under `tests/`. Use the existing Polish tests (e.g. `tests/test_search_with_multiple_patterns.py`) as a template. Provide normal and edge-case examples.\n4. In the PR description explain local rules (phone number format, postal code format, common street abbreviations, etc.).\n5. 
PRs without tests will not be accepted.\n\nTips 💡:\n- 🧾 Use clear, consistent named groups in regexes (`?P\u003cname\u003e`) so `groupdict()` returns a predictable structure.\n- 📝 Document complex patterns with comments and example inputs if necessary.\n\n## Discussion and roadmap 🚧\n\nThe project is not yet final — everything is open for discussion. Areas for contributors and discussion include:\n\n- 📋 Defining a minimal set of patterns every country should provide (email, phone, address, postal code, national ID where applicable).\n- 🔠 Standardizing group names (`street`, `building`, `apartment`, `zip_code`, `city`, `country`, `mobile_phone`, etc.).\n- ⚖️ Tools for validation and normalization of extracted values.\n- 🤖 Automating tests with sample documents in various languages.\n\nIf you want to help, open an issue or a PR — a short description of the local data format and one or two patterns with tests is a great place to start.\n\n## License 📄\n\nThis project is licensed under the MIT License. See the [LICENSE](./LICENSE) file for the full text.\n\n---\n\nThanks for your interest — please join the effort. Together we can build an international library of patterns to extract structured data from arbitrary text using robust regular expressions. 🚀\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fspaceshaman%2Fdeckard","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fspaceshaman%2Fdeckard","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fspaceshaman%2Fdeckard/lists"}