{"id":15284141,"url":"https://github.com/ub-mannheim/spacyopentapioca","last_synced_at":"2025-08-20T08:32:16.817Z","repository":{"id":46227402,"uuid":"404670326","full_name":"UB-Mannheim/spacyopentapioca","owner":"UB-Mannheim","description":"A spaCy wrapper of OpenTapioca for named entity linking on Wikidata","archived":false,"fork":false,"pushed_at":"2023-04-01T09:04:57.000Z","size":1615,"stargazers_count":92,"open_issues_count":2,"forks_count":8,"subscribers_count":8,"default_branch":"main","last_synced_at":"2024-12-19T09:07:07.022Z","etag":null,"topics":["entity-linking","named-entity-linking","spacy","spacy-extensions","spacy-pipeline","wikidata"],"latest_commit_sha":null,"homepage":"https://ub-mannheim.github.io/spacyopentapioca","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/UB-Mannheim.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-09-09T09:57:45.000Z","updated_at":"2024-12-02T18:35:45.000Z","dependencies_parsed_at":"2022-09-19T07:31:57.342Z","dependency_job_id":null,"html_url":"https://github.com/UB-Mannheim/spacyopentapioca","commit_stats":null,"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UB-Mannheim%2Fspacyopentapioca","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UB-Mannheim%2Fspacyopentapioca/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UB-Mannheim%2Fspacyopentapioca/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UB-Mannheim%2Fspacyopentapioca/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/UB-Mannheim","download_url":"https://codeload.github.com/UB-Mannheim/spacyopentapioca/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":230408170,"owners_count":18220974,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["entity-linking","named-entity-linking","spacy","spacy-extensions","spacy-pipeline","wikidata"],"created_at":"2024-09-30T14:50:04.665Z","updated_at":"2024-12-19T09:07:10.387Z","avatar_url":"https://github.com/UB-Mannheim.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"
# spaCyOpenTapioca

[![PyPI version](https://badge.fury.io/py/spacyopentapioca.svg)](https://badge.fury.io/py/spacyopentapioca) <a href="https://ub-mannheim.github.io/spacyopentapioca"><img src="https://img.shields.io/badge/docs-JB-green.svg"/></a> <a href="https://mybinder.org/v2/gh/UB-Mannheim/spacyopentapioca/main?urlpath=tree/docs/docs/demo.ipynb"><img src="https://img.shields.io/badge/launch-binder-blue.svg"/></a>

A [spaCy](https://spacy.io) wrapper of [OpenTapioca](https://opentapioca.org) for named entity linking on Wikidata.

## Table of contents
* [Installation](#installation)
* [How to use](#how-to-use)
* [Local OpenTapioca](#local-opentapioca)
* [Visualization](#vizualization)

## Installation

```shell
pip install spacyopentapioca
```

or install from source:
```shell
git clone https://github.com/UB-Mannheim/spacyopentapioca
cd spacyopentapioca/
pip install .
```

## How to use

After installation, the OpenTapioca pipeline component can be used on its own, without any other pipeline components:
```python
import spacy

nlp = spacy.blank("en")
nlp.add_pipe('opentapioca')  # the component queries the OpenTapioca API
doc = nlp("Christian Drosten works in Germany.")
for span in doc.ents:
    print((span.text, span.kb_id_, span.label_, span._.description, span._.score))
```
```shell
('Christian Drosten', 'Q1079331', 'PERSON', 'German virologist and university teacher', 3.6533377082098895)
('Germany', 'Q183', 'LOC', 'sovereign state in Central Europe', 2.1099332471902863)
```

The entity types and aliases are also available (here, the first five aliases of each entity):
```python
for span in doc.ents:
    print((span._.types, span._.aliases[0:5]))
```
```shell
({'Q43229': False, 'Q618123': False, 'Q5': True, 'P2427': False, 'P1566': False, 'P496': True}, ['كريستيان دروستين', 'Крістіан Дростен', 'Christian Heinrich Maria Drosten', 'کریستین دروستن', '크리스티안 드로스텐'])
({'Q43229': True, 'Q618123': True, 'Q5': False, 'P2427': False, 'P1566': True, 'P496': False}, ['IJalimani', 'R. F. A.', 'Alemania', '도이칠란트', 'Germaniya'])
```

The Wikidata QIDs are also attached to the individual tokens:
```python
for token in doc:
    print((token.text, token.ent_kb_id_))
```
```shell
('Christian', 'Q1079331')
('Drosten', 'Q1079331')
('works', '')
('in', '')
('Germany', 'Q183')
('.', '')
```

The raw response of the OpenTapioca API can be accessed through the `Doc` and `Span` objects:
```python
raw_annotations1 = doc._.annotations  # raw annotations for the whole document
raw_annotations2 = [span._.annotations for span in doc.ents]  # raw annotation of each entity
```

Partial metadata about the response returned by the OpenTapioca API is stored in:
```python
doc._.metadata
```

The complete list of span extensions is:
```python
span._.annotations
span._.description
span._.aliases
span._.rank
span._.score
span._.types
span._.label
span._.extra_aliases
span._.nb_sitelinks
span._.nb_statements
```

Note that spaCyOpenTapioca slightly post-processes the entities that appear in `doc.ents`.

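
The span extensions listed above can be collected into plain dictionaries, for example to export the linking results as JSON. The following is a minimal sketch under stated assumptions: `entity_record` is a hypothetical helper, not part of spaCyOpenTapioca, that reads only attributes shown in this README, keeps the keys of `span._.types` whose value is `True`, and builds a Wikidata link from the QID with the `https://www.wikidata.org/entity/` prefix used in the displaCy example. Because it only relies on attribute access, it is exercised here on a stand-in object that copies the values printed above for "Germany", so it runs without calling the OpenTapioca API. With a loaded pipeline, `[entity_record(span) for span in doc.ents]` would give one record per linked entity.

```python
import json
from types import SimpleNamespace

# URL prefix taken from the kb_url in this README's displaCy example
WIKIDATA_ENTITY_URL = "https://www.wikidata.org/entity/"


def entity_record(span):
    """Flatten one linked entity into a JSON-serializable dict.

    Hypothetical helper (not part of spaCyOpenTapioca). It accepts a spaCy
    Span produced by the 'opentapioca' component, or any object exposing
    the same attributes.
    """
    # Keep only the IDs whose flag is True in span._.types
    active_types = sorted(key for key, flag in span._.types.items() if flag)
    return {
        "text": span.text,
        "qid": span.kb_id_,
        "url": WIKIDATA_ENTITY_URL + span.kb_id_,
        "label": span.label_,
        "description": span._.description,
        "score": span._.score,
        "types": active_types,
    }


# Offline stand-in that mimics the "Germany" span printed above
germany = SimpleNamespace(
    text="Germany",
    kb_id_="Q183",
    label_="LOC",
    _=SimpleNamespace(
        description="sovereign state in Central Europe",
        score=2.1099332471902863,
        types={"Q43229": True, "Q618123": True, "Q5": False,
               "P2427": False, "P1566": True, "P496": False},
    ),
)
print(json.dumps(entity_record(germany), indent=2))
```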
All entities returned by OpenTapioca can be found in `doc.spans['all_entities_opentapioca']`.

### Batching

`nlp.pipe(List[str])` sends batched, asynchronous requests to the OpenTapioca API:
```python
import spacy

nlp = spacy.blank("en")
nlp.add_pipe('opentapioca')
docs = nlp.pipe(
    [
        "Christian Drosten works in Germany.",
        "Momofuku Ando was born in Japan.",
    ]
)
for doc in docs:
    for span in doc.ents:
        print((span.text, span.kb_id_, span.label_, span._.description, span._.score))
```
```shell
('Christian Drosten', 'Q1079331', 'PERSON', 'German virologist and university teacher', 3.6533377082098895)
('Germany', 'Q183', 'LOC', 'sovereign state in Central Europe', 2.1099332471902863)
('Momofuku Ando', 'Q317858', 'PERSON', 'Taiwanese-Japanese businessman', 3.6012208212234302)
('Japan', 'Q17', 'LOC', 'sovereign state in East Asia, situated on an archipelago of five main and over 6,800 smaller islands', 2.349944834167907)
```

## Local OpenTapioca

If OpenTapioca is deployed locally, pass the URL of that OpenTapioca API in the component config (the URL below is only a placeholder):
```python
import spacy

# Placeholder: replace with the API URL of your own OpenTapioca deployment
OpenTapiocaAPI = "http://localhost:8000/api/annotate"

nlp = spacy.blank("en")
nlp.add_pipe('opentapioca', config={"url": OpenTapiocaAPI})
doc = nlp("Christian Drosten works in Germany.")
```

<a name="vizualization"></a>
## Visualization

NEL visualization was added to spaCy in [pull request 9199](https://github.com/explosion/spaCy/pull/9199) for [issue 9129](https://github.com/explosion/spaCy/issues/9129). It is supported by spaCy >= 3.1.4.

To render the linked entities, use displaCy's manual mode:
```python
import spacy

nlp = spacy.blank("en")
nlp.add_pipe('opentapioca')
doc = nlp("Christian Drosten works\n in Charité, Germany.")
# Manual mode takes plain dicts; "kb_id" and "kb_url" add the Wikidata links
params = {"text": doc.text,
          "ents": [{"start": ent.start_char,
                    "end": ent.end_char,
                    "label": ent.label_,
                    "kb_id": ent.kb_id_,
                    "kb_url": "https://www.wikidata.org/entity/" + ent.kb_id_}
                   for ent in doc.ents],
          "title": None}
spacy.displacy.serve(params, style="ent", manual=True)
```
The visualizer is then served at http://0.0.0.0:5000.

![NEL visualization of the linked entities](https://github.com/UB-Mannheim/spacyopentapioca/blob/main/images/nel_vizualization.png)

In a Jupyter notebook, use `spacy.displacy.render` instead of `spacy.displacy.serve`.
","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fub-mannheim%2Fspacyopentapioca","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fub-mannheim%2Fspacyopentapioca","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fub-mannheim%2Fspacyopentapioca/lists"}