{"id":16857938,"url":"https://github.com/wetneb/pynif","last_synced_at":"2025-03-22T06:31:13.516Z","repository":{"id":47304676,"uuid":"166817082","full_name":"wetneb/pynif","owner":"wetneb","description":"A small Python library for NLP Interchange Format (NIF)  for NER(D) systems","archived":false,"fork":false,"pushed_at":"2023-02-09T16:03:26.000Z","size":63,"stargazers_count":19,"open_issues_count":2,"forks_count":5,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-20T04:14:57.727Z","etag":null,"topics":["entity-linking","gerbil","named-entity-recognition","nif","nlp","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/wetneb.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2019-01-21T13:16:29.000Z","updated_at":"2023-11-28T18:47:33.000Z","dependencies_parsed_at":"2022-09-14T05:00:44.130Z","dependency_job_id":null,"html_url":"https://github.com/wetneb/pynif","commit_stats":{"total_commits":56,"total_committers":3,"mean_commits":"18.666666666666668","dds":0.0892857142857143,"last_synced_commit":"6b6489e18b3e669e32b588a61557762959430a3a"},"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wetneb%2Fpynif","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wetneb%2Fpynif/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wetneb%2Fpynif/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wetneb%2Fpynif/manifests","owner_url":"https
://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/wetneb","download_url":"https://codeload.github.com/wetneb/pynif/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244918500,"owners_count":20531682,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["entity-linking","gerbil","named-entity-recognition","nif","nlp","python"],"created_at":"2024-10-13T14:10:45.878Z","updated_at":"2025-03-22T06:31:12.081Z","avatar_url":"https://github.com/wetneb.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# pynif [![Python tests](https://github.com/wetneb/pynif/actions/workflows/ci.yml/badge.svg)](https://github.com/wetneb/pynif/actions/workflows/ci.yml) [![Coverage Status](https://coveralls.io/repos/github/wetneb/pynif/badge.svg?branch=master)](https://coveralls.io/github/wetneb/pynif?branch=master) [![PyPI version](https://img.shields.io/pypi/v/pynif.svg)](https://pypi.org/project/pynif/)\n\nThe [NLP Interchange Format (NIF)](http://persistence.uni-leipzig.org/nlp2rdf/) is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations. It offers a standard representation of annotated texts for tasks such as [Named Entity Recognition](https://en.wikipedia.org/wiki/Named-entity_recognition) or [Entity Linking](https://en.wikipedia.org/wiki/Entity_linking). 
It is used by [GERBIL](https://github.com/dice-group/gerbil) to run reproducible evaluations of annotators.\n\nThis Python library can be used to serialize and deserialize annotated corpora in NIF.\n\n## Documentation\n\n[NIF Documentation](http://persistence.uni-leipzig.org/nlp2rdf/)\n\n## Supported NIF versions\n\nNIF 2.1, serialized in [any of the formats supported by rdflib](https://rdflib.readthedocs.io/en/stable/plugin_parsers.html)\n\n## Overview\n\nThis library revolves around three core classes:\n * a `NIFContext` is a document (a string);\n * a `NIFPhrase` is the annotation of a snippet of text (usually a phrase) in a document;\n * a `NIFCollection` is a set of documents, which constitutes a collection.\n\nIn NIF, each of these objects is identified by a URI, and their attributes and relations are encoded by RDF triples between these URIs.\nThis library abstracts away the encoding by letting you manipulate collections, contexts and phrases as plain Python objects.\n\n## Quick start\n\nInstall pynif with `pip install pynif`.\n\n0) Import and create a collection\n\n```python\nfrom pynif import NIFCollection\n\ncollection = NIFCollection(uri=\"http://freme-project.eu\")\n```\n\n1) Create a context\n\n```python\ncontext = collection.add_context(\n    uri=\"http://freme-project.eu/doc32\",\n    mention=\"Diego Maradona is from Argentina.\")\n\n```\n\n2) Create entries for the entities\n\n```python\ncontext.add_phrase(\n    beginIndex=0,\n    endIndex=14,\n    taClassRef=['http://dbpedia.org/ontology/SportsManager', 'http://dbpedia.org/ontology/Person', 'http://nerd.eurecom.fr/ontology#Person'],\n    score=0.9869992701528016,\n    annotator='http://freme-project.eu/tools/freme-ner',\n    taIdentRef='http://dbpedia.org/resource/Diego_Maradona',\n    taMsClassRef='http://dbpedia.org/ontology/SoccerManager')\n\ncontext.add_phrase(\n    beginIndex=23,\n    endIndex=32,\n    taClassRef=['http://dbpedia.org/ontology/PopulatedPlace', 
'http://nerd.eurecom.fr/ontology#Location',\n    'http://dbpedia.org/ontology/Place'],\n    score=0.9804963628413852,\n    annotator='http://freme-project.eu/tools/freme-ner',\n    taMsClassRef='http://dbpedia.org/resource/Argentina')\n```\n\n3) Finally, serialize the collection in the format you need\n\n```python\ngenerated_nif = collection.dumps(format='turtle')\nprint(generated_nif)\n```\n\nYou will obtain the NIF representation as a string:\n\n```turtle\n@prefix xsd: \u003chttp://www.w3.org/2001/XMLSchema#\u003e .\n@prefix nif: \u003chttp://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#\u003e .\n@prefix itsrdf: \u003chttp://www.w3.org/2005/11/its/rdf#\u003e .\n@prefix dcterms: \u003chttp://purl.org/dc/terms/\u003e .\n\n\u003chttp://freme-project.eu\u003e a nif:ContextCollection ;\n    nif:hasContext \u003chttp://freme-project.eu/doc32\u003e ;\n    dcterms:conformsTo \u003chttp://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core/2.1\u003e .\n\n\u003chttp://freme-project.eu/doc32\u003e a nif:Context,\n        nif:OffsetBasedString ;\n    nif:beginIndex \"0\"^^xsd:nonNegativeInteger ;\n    nif:endIndex \"33\"^^xsd:nonNegativeInteger ;\n    nif:isString \"Diego Maradona is from Argentina.\" .\n\n\u003chttp://freme-project.eu/doc32#offset_0_14\u003e a nif:OffsetBasedString,\n        nif:Phrase ;\n    nif:anchorOf \"Diego Maradona\" ;\n    nif:beginIndex \"0\"^^xsd:nonNegativeInteger ;\n    nif:endIndex \"14\"^^xsd:nonNegativeInteger ;\n    nif:referenceContext \u003chttp://freme-project.eu/doc32\u003e ;\n    nif:taMsClassRef \u003chttp://dbpedia.org/ontology/SoccerManager\u003e ;\n    itsrdf:taAnnotatorsRef \u003chttp://freme-project.eu/tools/freme-ner\u003e ;\n    itsrdf:taClassRef \u003chttp://dbpedia.org/ontology/Person\u003e,\n        \u003chttp://dbpedia.org/ontology/SportsManager\u003e,\n        \u003chttp://nerd.eurecom.fr/ontology#Person\u003e ;\n    itsrdf:taConfidence 9.869993e-01 ;\n    itsrdf:taIdentRef 
\u003chttp://dbpedia.org/resource/Diego_Maradona\u003e .\n\n\u003chttp://freme-project.eu/doc32#offset_23_32\u003e a nif:OffsetBasedString,\n        nif:Phrase ;\n    nif:anchorOf \"Argentina\" ;\n    nif:beginIndex \"23\"^^xsd:nonNegativeInteger ;\n    nif:endIndex \"32\"^^xsd:nonNegativeInteger ;\n    nif:referenceContext \u003chttp://freme-project.eu/doc32\u003e ;\n    nif:taMsClassRef \u003chttp://dbpedia.org/resource/Argentina\u003e ;\n    itsrdf:taAnnotatorsRef \u003chttp://freme-project.eu/tools/freme-ner\u003e ;\n    itsrdf:taClassRef \u003chttp://dbpedia.org/ontology/Place\u003e,\n        \u003chttp://dbpedia.org/ontology/PopulatedPlace\u003e,\n        \u003chttp://nerd.eurecom.fr/ontology#Location\u003e ;\n    itsrdf:taConfidence 9.804964e-01 .\n    \n```\n\n4) You can then parse it back:\n\n```python\nparsed_collection = NIFCollection.loads(generated_nif, format='turtle')\n\nfor context in parsed_collection.contexts:\n    for phrase in context.phrases:\n        print(phrase)\n```\n\n## Supported NIF OffsetBasedString\n\nA context can be represented by an OffsetBasedString URI or a ContextHashBasedString URI. The ContextHashBasedString URI format is discussed on page 4 of the paper Linked-Data Aware URI Schemes for Referencing Text Fragments (https://doi.org/10.1007/978-3-642-33876-2_17).\n\n**URIs formatted as ContextHashBasedString must be provided manually: pynif does not currently generate them for you.**\n\nTo use ContextHashBasedString URIs, you must always provide them yourself when creating contexts and phrases.\nTo provide a ContextHashBasedString URI instead of an OffsetBasedString URI, set the ``is_hash_based_uri`` parameter to ``True`` (by default, ``is_hash_based_uri`` is ``False`` and pynif works with ``nif:OffsetBasedString``). 
See the following examples:\n\n```py\ncontext = NIFContext(\n    uri='http://freme-project.eu#hash_0_33_cf35b7e267d05b7ca8aba0651641050b_Diego%20Maradona%20is%20fr',\n    is_hash_based_uri = True,\n    mention=\"Diego Maradona is from Argentina.\")\n\ncontext.add_phrase(\n    uri='http://freme-project.eu#hash_19_33_158118325b076b079d3969108872d855_Diego%20Maradona%20is%20fr',\n    is_hash_based_uri = True,\n    beginIndex=0,\n    endIndex=14,\n    score=0.9869992701528016,\n    taClassRef=['http://dbpedia.org/ontology/SportsManager', \n        'http://dbpedia.org/ontology/Person', \n        'http://nerd.eurecom.fr/ontology#Person'],\n    annotator='http://freme-project.eu/tools/freme-ner',\n    taIdentRef='http://dbpedia.org/resource/Diego_Maradona',\n    taMsClassRef='http://dbpedia.org/ontology/SoccerManager')\n```\n\nThe output in TURTLE format:\n\n```python\ngenerated_nif = context.dumps(format='turtle')\nprint(generated_nif)\n```\n```TURTLE\n@prefix xsd:   \u003chttp://www.w3.org/2001/XMLSchema#\u003e .\n@prefix itsrdf: \u003chttp://www.w3.org/2005/11/its/rdf#\u003e .\n@prefix nif:   \u003chttp://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#\u003e .\n                \n\u003chttp://freme-project.eu#hash_0_33_cf35b7e267d05b7ca8aba0651641050b_Diego%20Maradona%20is%20fr\u003e\n    a nif:ContextHashBasedString , nif:Context ;\n    nif:beginIndex  \"0\"^^xsd:nonNegativeInteger ;\n    nif:endIndex    \"33\"^^xsd:nonNegativeInteger ;\n    nif:isString    \"Diego Maradona is from Argentina.\" .\n\n\u003chttp://freme-project.eu#hash_19_33_158118325b076b079d3969108872d855_Diego%20Maradona%20is%20fr\u003e\n    a nif:ContextHashBasedString, nif:Phrase ;\n    nif:anchorOf \"Diego Maradona\" ;\n    nif:beginIndex \"0\"^^xsd:nonNegativeInteger ;\n    nif:endIndex \"14\"^^xsd:nonNegativeInteger ;\n    nif:referenceContext \u003chttp://freme-project.eu#hash_0_33_cf35b7e267d05b7ca8aba0651641050b_Diego%20Maradona%20is%20fr\u003e ;\n    nif:taMsClassRef 
\u003chttp://dbpedia.org/ontology/SoccerManager\u003e ;\n    itsrdf:taAnnotatorsRef \u003chttp://freme-project.eu/tools/freme-ner\u003e ;\n    itsrdf:taClassRef \u003chttp://dbpedia.org/ontology/Person\u003e, \n        \u003chttp://dbpedia.org/ontology/SportsManager\u003e, \n        \u003chttp://nerd.eurecom.fr/ontology#Person\u003e ;\n    itsrdf:taConfidence 9.869993e-01 ;\n    itsrdf:taIdentRef \u003chttp://dbpedia.org/resource/Diego_Maradona\u003e .\n```\n\n## Issues\n\nIf you have any problems with or questions about this library, please contact us through a [GitHub issue](https://github.com/wetneb/pynif/issues).\n\n## Releasing a new version\n\nMake sure the version in `setup.py` is up to date, create and upload a git tag, and then:\n\n```\npython -m build --sdist\npython -m build --wheel\npython -m twine upload dist/*\n```\n\nIncrement the version in `setup.py`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwetneb%2Fpynif","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwetneb%2Fpynif","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwetneb%2Fpynif/lists"}