{"id":15014209,"url":"https://github.com/jenojp/extractacy","last_synced_at":"2025-07-05T18:33:02.441Z","repository":{"id":39789904,"uuid":"244012020","full_name":"jenojp/extractacy","owner":"jenojp","description":"Spacy pipeline object for extracting values that correspond to a named entity (e.g., birth dates, account numbers, laboratory results)","archived":false,"fork":false,"pushed_at":"2022-05-25T20:22:11.000Z","size":127,"stargazers_count":54,"open_issues_count":1,"forks_count":9,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-05-12T15:58:10.029Z","etag":null,"topics":["entity-extraction","entity-linking","ner","nlp","pattern-matching","spacy","spacy-extension","spacy-pipeline"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jenojp.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-02-29T17:33:12.000Z","updated_at":"2024-12-31T19:59:40.000Z","dependencies_parsed_at":"2022-09-01T22:31:54.023Z","dependency_job_id":null,"html_url":"https://github.com/jenojp/extractacy","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"purl":"pkg:github/jenojp/extractacy","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jenojp%2Fextractacy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jenojp%2Fextractacy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jenojp%2Fextractacy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jenojp%2Fextractacy/manifests","owner_url":"https:/
/repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jenojp","download_url":"https://codeload.github.com/jenojp/extractacy/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jenojp%2Fextractacy/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259910788,"owners_count":22930719,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["entity-extraction","entity-linking","ner","nlp","pattern-matching","spacy","spacy-extension","spacy-pipeline"],"created_at":"2024-09-24T19:45:19.549Z","updated_at":"2025-07-05T18:33:02.422Z","avatar_url":"https://github.com/jenojp.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"\n\u003cp align=\"center\"\u003e\u003cimg width=\"40%\" src=\"docs/icon.png\" /\u003e\u003c/p\u003e\n\n# extractacy - pattern extraction and named entity linking for spaCy\n[![Build Status](https://dev.azure.com/jenopizzaro/extractacy/_apis/build/status/jenojp.extractacy?branchName=master)](https://dev.azure.com/jenopizzaro/extractacy/_build/latest?definitionId=3\u0026branchName=master) [![Built with spaCy](https://img.shields.io/badge/made%20with%20❤%20and-spaCy-09a3d5.svg)](https://spacy.io) [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg?style=flat-square)](https://github.com/ambv/black) ![pypi Version](https://img.shields.io/pypi/v/extractacy.svg?style=flat-square) [![DOI](https://zenodo.org/badge/244012020.svg)](https://zenodo.org/badge/latestdoi/244012020)\n\nspaCy pipeline object for extracting values that 
correspond to a named entity (e.g., birth dates, account numbers, or laboratory results)\n\n## Installation and usage\nInstall the library.\n```bash\npip install extractacy\n```\n\nImport the library and spaCy.\n```python\nimport spacy\nfrom spacy.pipeline import EntityRuler\nfrom extractacy.extract import ValueExtractor\n```\n\nLoad a spaCy language model. Set up an EntityRuler for the example.\n\n```python\nnlp = spacy.load(\"en_core_web_sm\")\n# Set up entity ruler\nruler = nlp.add_pipe(\"entity_ruler\")\npatterns = [\n    {\"label\": \"TEMP_READING\", \"pattern\": [{\"LOWER\": \"temperature\"}]},\n    {\"label\": \"TEMP_READING\", \"pattern\": [{\"LOWER\": \"temp\"}]},\n    {\n        \"label\": \"DISCHARGE_DATE\",\n        \"pattern\": [{\"LOWER\": \"discharge\"}, {\"LOWER\": \"date\"}],\n    },\n    \n]\nruler.add_patterns(patterns)\n```\n\nDefine which entities you would like to link patterns to. Each entity needs 3 things:\n1) patterns to search for (list). This relies on [spaCy token matching syntax](https://spacy.io/usage/rule-based-matching#matcher).\n2) n_tokens to search around a named entity (`int` or `sent`)\n3) direction (`right`, `left`, `both`)\n\n```python\n# Define ent_patterns for value extraction\nent_patterns = {\n    \"DISCHARGE_DATE\": {\"patterns\": [[{\"SHAPE\": \"dd/dd/dddd\"}],[{\"SHAPE\": \"dd/d/dddd\"}]],\"n\": 2, \"direction\": \"right\"},\n    \"TEMP_READING\": {\"patterns\": [[\n                        {\"LIKE_NUM\": True},\n                        {\"LOWER\": {\"IN\": [\"f\", \"c\", \"fahrenheit\", \"celsius\", \"centigrade\", \"degrees\"]}\n                        },\n                    ]\n                ],\n                \"n\": \"sent\",\n                \"direction\": \"both\"\n        },\n}\n```\n\nAdd ValueExtractor to the spaCy processing pipeline.\n\n```python\nnlp.add_pipe(\"valext\", config={\"ent_patterns\":ent_patterns}, last=True)\n\ndoc = nlp(\"Discharge Date: 11/15/2008. 
Patient had temp reading of 102.6 degrees.\")\nfor e in doc.ents:\n    if e._.value_extract:\n        print(e.text, e.label_, e._.value_extract)\n        \n## Discharge Date DISCHARGE_DATE 11/15/2008\n## temp reading TEMP_READING 102.6 degrees\n```\n\n## Contributing\n[contributing](https://github.com/jenojp/negspacy/blob/master/CONTRIBUTING.md)\n\n## Authors\n* Jeno Pizarro\n\n## License\n[license](https://github.com/jenojp/extractacy/blob/master/LICENSE)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjenojp%2Fextractacy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjenojp%2Fextractacy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjenojp%2Fextractacy/lists"}