{"id":49225442,"url":"https://github.com/tonicai/textual-haystack","last_synced_at":"2026-04-24T07:01:58.358Z","repository":{"id":345534126,"uuid":"1186289845","full_name":"TonicAI/textual-haystack","owner":"TonicAI","description":"Tonic Textual integration for Haystack.  Document sanitization and entity extraction","archived":false,"fork":false,"pushed_at":"2026-03-19T13:53:53.000Z","size":122,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-03-20T06:28:34.279Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TonicAI.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-19T13:23:23.000Z","updated_at":"2026-03-19T13:53:10.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/TonicAI/textual-haystack","commit_stats":null,"previous_names":["tonicai/textual-haystack"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/TonicAI/textual-haystack","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TonicAI%2Ftextual-haystack","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TonicAI%2Ftextual-haystack/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TonicAI%2Ftextual-haystack/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TonicAI%2Ftextual-haystack/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/TonicAI","download_url":"https://codeload.github.com/TonicAI/textual-haystack/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TonicAI%2Ftextual-haystack/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32212808,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-24T03:15:14.334Z","status":"ssl_error","status_checked_at":"2026-04-24T03:15:11.608Z","response_time":64,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-04-24T07:01:57.115Z","updated_at":"2026-04-24T07:01:58.351Z","avatar_url":"https://github.com/TonicAI.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# textual-haystack\n\n[![PyPI version](https://img.shields.io/pypi/v/textual-haystack)](https://pypi.org/project/textual-haystack/)\n[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://github.com/tonicai/textual-haystack/blob/main/LICENSE)\n[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)\n\nPII detection, transformation, and entity extraction components for [Haystack](https://haystack.deepset.ai/), powered by [Tonic Textual](https://textual.tonic.ai).\n\nDetect sensitive data in documents, extract the raw entities for auditing or custom logic, or synthesize and tokenize PII before ingestion. Drop these components into any Haystack pipeline.\n\n## Installation\n\n```bash\npip install textual-haystack\n```\n\n## Components\n\n| Component | Purpose |\n|-----------|---------|\n| `TonicTextualEntityExtractor` | Extract PII entities with type, value, location, and confidence score |\n| `TonicTextualDocumentCleaner` | Synthesize or tokenize PII in document content |\n\n## Quick start\n\n```bash\nexport TONIC_TEXTUAL_API_KEY=\"your-api-key\"\n```\n\n### Entity extraction\n\n```python\nfrom haystack.dataclasses import Document\nfrom haystack_integrations.components.tonic_textual import (\n    TonicTextualEntityExtractor,\n)\n\nextractor = TonicTextualEntityExtractor()\nresult = extractor.run(\n    documents=[Document(content=\"My name is John Smith and my email is john@example.com\")]\n)\n\nfor entity in TonicTextualEntityExtractor.get_stored_annotations(result[\"documents\"][0]):\n    print(f\"{entity.entity}: {entity.text} (confidence: {entity.score:.2f})\")\n# NAME_GIVEN: John (confidence: 0.90)\n# NAME_FAMILY: Smith (confidence: 0.90)\n# EMAIL_ADDRESS: john@example.com (confidence: 0.95)\n```\n\n### Document cleaning\n\n```python\nfrom haystack.dataclasses import Document\nfrom haystack_integrations.components.tonic_textual import (\n    TonicTextualDocumentCleaner,\n)\n\n# Synthesize PII with realistic fakes\ncleaner = TonicTextualDocumentCleaner(generator_default=\"Synthesis\")\nresult = cleaner.run(\n    documents=[Document(content=\"Contact John Smith at john@example.com\")]\n)\nprint(result[\"documents\"][0].content)\n# \"Contact Maria Chen at maria.chen@gmail.com\"\n```\n\nPer-entity control — mix synthesis and tokenization per PII type:\n\n```python\ncleaner = TonicTextualDocumentCleaner(\n    generator_default=\"Off\",\n    generator_config={\n        \"NAME_GIVEN\": \"Synthesis\",\n        \"NAME_FAMILY\": \"Synthesis\",\n        \"EMAIL_ADDRESS\": \"Redaction\",\n    },\n)\n```\n\n### In a pipeline\n\n```python\nfrom haystack import Pipeline\nfrom haystack.dataclasses import Document\nfrom haystack_integrations.components.tonic_textual import (\n    TonicTextualDocumentCleaner,\n    TonicTextualEntityExtractor,\n)\n\npipeline = Pipeline()\npipeline.add_component(\"cleaner\", TonicTextualDocumentCleaner(generator_default=\"Synthesis\"))\npipeline.add_component(\"extractor\", TonicTextualEntityExtractor())\npipeline.connect(\"cleaner\", \"extractor\")\n\nresult = pipeline.run({\n    \"cleaner\": {\n        \"documents\": [\n            Document(content=\"Contact Jane Doe at jane@example.com\"),\n        ]\n    }\n})\n```\n\n## Configuration\n\n**Self-hosted deployment:**\n\n```python\nextractor = TonicTextualEntityExtractor(\n    base_url=\"https://textual.your-company.com\"\n)\n```\n\n**Explicit API key:**\n\n```python\nfrom haystack.utils.auth import Secret\n\nextractor = TonicTextualEntityExtractor(\n    api_key=Secret.from_token(\"your-api-key\")\n)\n```\n\n## Development\n\n```bash\n# install dependencies\nuv sync --group dev --group test --group lint --group typing\n\n# install pre-commit hooks (auto-runs ruff on each commit)\nuv tool install pre-commit\npre-commit install\n\n# run unit tests\nmake test\n\n# run integration tests (requires TONIC_TEXTUAL_API_KEY)\nmake integration_tests\n\n# lint \u0026 format\nmake lint\nmake format\n```\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftonicai%2Ftextual-haystack","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftonicai%2Ftextual-haystack","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftonicai%2Ftextual-haystack/lists"}