{"id":15014161,"url":"https://github.com/yash1994/dframcy","last_synced_at":"2025-04-12T06:05:11.484Z","repository":{"id":62567958,"uuid":"211251685","full_name":"yash1994/dframcy","owner":"yash1994","description":"Dataframe Integration with spaCy.","archived":false,"fork":false,"pushed_at":"2021-03-12T05:01:59.000Z","size":183,"stargazers_count":103,"open_issues_count":0,"forks_count":4,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-03-26T01:24:26.576Z","etag":null,"topics":["dataframe","pandas-dataframe","python3","spacy","spacy-extension","spacy-pipeline"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/yash1994.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-09-27T06:32:38.000Z","updated_at":"2024-12-31T19:50:06.000Z","dependencies_parsed_at":"2022-11-03T16:30:41.296Z","dependency_job_id":null,"html_url":"https://github.com/yash1994/dframcy","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yash1994%2Fdframcy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yash1994%2Fdframcy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yash1994%2Fdframcy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yash1994%2Fdframcy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/yash1994","download_url":"https://codeload.github.com/yash1994/dframcy/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247615482,"owners_count":20967182,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dataframe","pandas-dataframe","python3","spacy","spacy-extension","spacy-pipeline"],"created_at":"2024-09-24T19:45:16.606Z","updated_at":"2025-04-12T06:05:11.449Z","avatar_url":"https://github.com/yash1994.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# DframCy\n\n[![Package Version](https://img.shields.io/pypi/v/dframcy.svg?\u0026service=github)](https://pypi.python.org/pypi/dframcy/)\n[![Python 3.6](https://img.shields.io/badge/python-3.6-blue.svg)](https://www.python.org/downloads/release/python-360/)\n[![Build Status](https://travis-ci.org/yash1994/dframcy.svg?branch=master)](https://travis-ci.org/yash1994/dframcy)\n[![codecov](https://codecov.io/gh/yash1994/dframcy/branch/master/graph/badge.svg)](https://codecov.io/gh/yash1994/dframcy)\n[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/ambv/black)\n\nDframCy is a lightweight utility module that integrates pandas DataFrames with spaCy's linguistic annotation tasks. It provides clean APIs to convert spaCy's linguistic annotations and Matcher, PhraseMatcher, and DependencyMatcher results into pandas DataFrames, without any changes to spaCy's underlying APIs. Earlier versions also supported training and evaluation of NLP pipelines from CSV/XLSX/XLS files (see [Command Line Interface](#command-line-interface)).\n\n## Getting Started\n\nDframCy is easy to install. You just need the following:\n\n### Requirements\n\n* Python 3.6 or later\n* Pandas\n* spaCy \u003e= 3.0.0\n\nYou also need to download a spaCy language model (the examples below use `en_core_web_sm`):\n\n```bash\npython -m spacy download en_core_web_sm\n```\n\nFor more information, refer to [Models \u0026 Languages](https://spacy.io/usage/models).\n\n### Installation\n\nThis package can be installed from [PyPI](https://pypi.org/project/dframcy/) by running:\n\n```bash\npip install dframcy\n```\n\nTo build from source:\n\n```bash\ngit clone https://github.com/yash1994/dframcy.git\ncd dframcy\npython setup.py install\n```\n\n## Usage\n\n### Linguistic Annotations\n\nGet linguistic annotations as a dataframe. For the available annotations (dataframe column names), refer to the [spaCy Token API](https://spacy.io/api/token) documentation.\n\n```python\nimport spacy\nfrom dframcy import DframCy\n\nnlp = spacy.load(\"en_core_web_sm\")\n\ndframcy = DframCy(nlp)\ndoc = dframcy.nlp(u\"Apple is looking at buying U.K. startup for $1 billion\")\n\n# default columns: [\"id\", \"text\", \"start\", \"end\", \"pos_\", \"tag_\", \"dep_\", \"head\", \"ent_type_\"]\nannotation_dataframe = dframcy.to_dataframe(doc)\n\n# you can also pass column names (spaCy's linguistic annotation attributes)\nannotation_dataframe = dframcy.to_dataframe(doc, columns=[\"text\", \"lemma_\", \"lower_\", \"is_punct\"])\n\n# return entities in a separate dataframe\ntoken_annotation_dataframe, entity_dataframe = dframcy.to_dataframe(doc, separate_entity_dframe=True)\n\n# custom attributes can also be included\nfrom spacy.tokens import Token\nfruit_getter = lambda token: token.text in (\"apple\", \"pear\", \"banana\")\nToken.set_extension(\"is_fruit\", getter=fruit_getter)\ndoc = dframcy.nlp(u\"I have an apple\")\n\nannotation_dataframe = dframcy.to_dataframe(doc, custom_attributes=[\"is_fruit\"])\n```\n\n### Rule-Based Matching\n\n```python\n# Token-based Matching\nimport spacy\n\nnlp = spacy.load(\"en_core_web_sm\")\n\nfrom dframcy.matcher import DframCyMatcher, DframCyPhraseMatcher, DframCyDependencyMatcher\ndframcy_matcher = DframCyMatcher(nlp)\npattern = [{\"LOWER\": \"hello\"}, {\"IS_PUNCT\": True}, {\"LOWER\": \"world\"}]\ndframcy_matcher.add(\"HelloWorld\", [pattern])\ndoc = dframcy_matcher.nlp(\"Hello, world! Hello world!\")\nmatches_dataframe = dframcy_matcher(doc)\n\n# Phrase Matching\ndframcy_phrase_matcher = DframCyPhraseMatcher(nlp)\nterms = [u\"Barack Obama\", u\"Angela Merkel\", u\"Washington, D.C.\"]\npatterns = [dframcy_phrase_matcher.nlp.make_doc(text) for text in terms]\ndframcy_phrase_matcher.add(\"TerminologyList\", patterns)\ndoc = dframcy_phrase_matcher.nlp(u\"German Chancellor Angela Merkel and US President Barack Obama \"\n                                u\"converse in the Oval Office inside the White House in Washington, D.C.\")\nphrase_matches_dataframe = dframcy_phrase_matcher(doc)\n\n# Dependency Matching\ndframcy_dependency_matcher = DframCyDependencyMatcher(nlp)\npattern = [{\"RIGHT_ID\": \"founded_id\", \"RIGHT_ATTRS\": {\"ORTH\": \"founded\"}}]\ndframcy_dependency_matcher.add(\"FOUNDED\", [pattern])\ndoc = dframcy_dependency_matcher.nlp(u\"Bill Gates founded Microsoft. And Elon Musk founded SpaceX\")\ndependency_matches_dataframe = dframcy_dependency_matcher(doc)\n```\n\n### Command Line Interface\n\nDframCy provides a command-line interface for converting a plain text file into linguistic annotations in CSV/JSON format. Previous versions of DframCy also provided CLI utilities for training and evaluating spaCy models from CSV/XLS files. Since the [v3](https://spacy.io/usage/v3) release, spaCy's training pipeline has become much more flexible and robust, so these utilities were dropped rather than adding an extra DframCy step just for format conversion (CSV/XLS to [spaCy’s binary format](https://spacy.io/api/data-formats#binary-training)).\n\n```bash\n# convert a plain text file to linguistic annotations in CSV format\ndframcy dframe -i plain_text.txt -o annotations.csv -f csv\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyash1994%2Fdframcy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyash1994%2Fdframcy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyash1994%2Fdframcy/lists"}