{"id":15284137,"url":"https://github.com/dcavar/py-json-nlp","last_synced_at":"2025-04-12T23:21:02.977Z","repository":{"id":62581168,"uuid":"177855382","full_name":"dcavar/Py-JSON-NLP","owner":"dcavar","description":"Python module for JSON-NLP","archived":false,"fork":false,"pushed_at":"2020-07-04T00:27:41.000Z","size":110,"stargazers_count":8,"open_issues_count":1,"forks_count":6,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-04-10T22:41:19.172Z","etag":null,"topics":["conll","conll-u","flair","json","natural-language-processing","nlp","nltk","polyglot","python3","spacy"],"latest_commit_sha":null,"homepage":"http://nlp-lab.org/pyjsonnlp/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dcavar.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-03-26T19:27:38.000Z","updated_at":"2020-07-04T00:27:43.000Z","dependencies_parsed_at":"2022-11-03T21:34:16.909Z","dependency_job_id":null,"html_url":"https://github.com/dcavar/Py-JSON-NLP","commit_stats":null,"previous_names":[],"tags_count":9,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dcavar%2FPy-JSON-NLP","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dcavar%2FPy-JSON-NLP/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dcavar%2FPy-JSON-NLP/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dcavar%2FPy-JSON-NLP/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dcavar","download_url":"https://codeload.github.com/dcavar/Py-JSON-NLP/tar.gz/r
efs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248644091,"owners_count":21138556,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["conll","conll-u","flair","json","natural-language-processing","nlp","nltk","polyglot","python3","spacy"],"created_at":"2024-09-30T14:50:03.074Z","updated_at":"2025-04-12T23:21:02.955Z","avatar_url":"https://github.com/dcavar.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Python JSON-NLP Module\n\n(C) 2020 by [Semiring Inc.]\n\nContributions from [Damir Cavar], [Oren Baldinger], [Maanvitha Gongalla], [Anurag Kumar], Murali Kammili, and others during 2019.\n\nBrought to you by the [NLP-Lab.org]. New Maintainer since 2020 is [Semiring Inc.].\n\nThis new version now is 0.6 and it is no longer compatible with version 0.2.33. If you use the old JSON-NLP standard in your code, make sure you require version 0.2.33 of *pyjsonnlp*. This new version is compatible with the newest version of [Go JSON-NLP].\n\n\n\n## Introduction\n\nThere is a growing number of Natural Language Processing (NLP) tools, modules, pipelines. There does not seem to be any standard for the output format. Here we are focusing on a standard for the output format syntax. Some future version of [JSON-NLP] might address the output semantics as well.\n\n[JSON-NLP] is a standard for the most important outputs NLP pipelines and components can generate. 
The relevant documentation can be found in the [JSON-NLP] GitHub repo and on its website at the [NLP-Lab] and [Semiring Inc.].\n\nThe Python [JSON-NLP] module contains general mapping functions for [JSON-NLP] to [CoNLL-U], a validator for the generated output, an NLP pipeline interface (for [Flair], [spaCy], [NLTK], [Polyglot], [Xrenner], etc.), and various utility functions.\n\nThere is a [Java JSON-NLP](https://github.com/dcavar/J-JSON-NLP) Maven module as well, and there are wrappers for numerous popular NLP pipelines and tools linked from the [NLP-Lab.org] website.\n\n\n## Installation\n\nFor more details, see [JSON-NLP].\n\nThis module is a wrapper for outputs from different NLP pipelines and modules into a standardized [JSON-NLP] format.\n\nTo install this package, run the following command:\n\n    pip install pyjsonnlp\n\nYou might have to use *pip3* on some systems.\n\n\n## Validation\n\n[JSON-NLP] is based on a schema, maintained by [NLP-Lab.org] and [Semiring Inc.], to comprehensively and concisely represent linguistic annotations. 
\n\nWe provide a validator to help ensure that generated JSON validates against the schema:\n\n    result = MyPipeline().process(text=\"I am a sentence\")\n    assert pyjsonnlp.validation.is_valid(result)\n\n\n## Conversion\n\nTo enable interoperability with other annotation formats, we support conversions between them.\nNote that conversion can be lossy if the two formats do not support the same depth of annotation.\nCurrently we have a [CoNLL-U] to [JSON-NLP] converter that covers most annotations:\n\n    pyjsonnlp.conversion.parse_conllu(conllu_text)\n    \nTo convert in the other direction:\n\n    pyjsonnlp.conversion.to_conllu(jsonnlp)\n\n\n## Pipeline\n\n[JSON-NLP] provides a simple `Pipeline` interface that should be implemented for embedding into a microservice:\n    \n    from collections import OrderedDict\n\n    class MockPipeline(pyjsonnlp.pipeline.Pipeline):\n        @staticmethod\n        def process(text='', coreferences=False, constituents=False, dependencies=False, expressions=False,\n                    **kwargs) -\u003e OrderedDict: \n            return OrderedDict()\n            \nThe provided keyword arguments should be used to toggle processing components on or off within the method.        \n            \nIf you have deployed a `Pipeline` as a microservice (see below), we provide a local endpoint for a remotely \ndeployed `Pipeline` via the `RemotePipeline` class:\n\n    import json\n\n    pipeline = pyjsonnlp.pipeline.RemotePipeline('localhost', port=9000)\n    # print() has no `spacing` argument; pretty-print the result with json.dumps instead\n    print(json.dumps(pipeline.process(text='I am a sentence', dependencies=True, something='else'), indent=2))\n\n\n## Microservice\n\nThe [JSON-NLP] Microservice class is only available in older versions of this module. 
Version 0.2.x is implemented as a Microservice with a pre-built implementation of [Flask].\n\n    from pyjsonnlp.microservices.flask_server import FlaskMicroservice\n\n    app = FlaskMicroservice(__name__, MyPipeline(), base_route='/')\n \nWe recommend creating a `server.py` with the `FlaskMicroservice` class, which extends the [Flask] app. A corresponding WSGI file would contain:\n\n    from mypipeline.server import app as application\n    \nTo disable a pipeline component (such as phrase structure parsing), add\n\n    application.constituents = False\n    \nThe full list of properties that can be disabled or enabled is:\n- constituents\n- dependencies\n- coreference\n- expressions\n\nThe microservice exposes the following URIs:\n- /constituents\n- /dependencies\n- /coreference\n- /expressions\n- /token_list\n\nThese URIs are shortcuts to disable the other components of the parse. In all cases, `tokenList` will be included in the `JSON-NLP` output. An example URL is:\n\n    http://localhost:5000/dependencies?text=I am a sentence\n\nText is provided to the microservice with the `text` parameter, via either `GET` or `POST`. If you pass `url` as a parameter, the microservice will scrape that URL and process the text of the website.\n\nOther parameters specific to your pipeline implementation can be passed as well:\n\n    http://localhost:5000?lang=en\u0026constituents=0\u0026text=I am a sentence.\n\nThe current version 0.6 or newer does not support the [Flask]-based RESTful Microservice infrastructure. 
It is a pure [JSON-NLP] data structure, processor and converter.\n\n\n\n[Damir Cavar]: https://www.linkedin.com/in/damircavar/ \"Damir Cavar\"\n[Oren Baldinger]: https://oren.baldinger.me/ \"Oren Baldinger\"\n[Anurag Kumar]: https://github.com/anuragkumar95/ \"Anurag Kumar\"\n[Maanvitha Gongalla]: https://maanvithag.github.io/MaanvithaGongalla/\n[NLP-Lab.org]: http://nlp-lab.org/ \"NLP-Lab.org\"\n[JSON-NLP]: https://github.com/SemiringInc/JSON-NLP \"JSON-NLP\"\n[Flair]: https://github.com/zalandoresearch/flair \"Flair\"\n[spaCy]: https://spacy.io/ \"spaCy\"\n[NLTK]: http://nltk.org/ \"Natural Language Processing Toolkit\"\n[Polyglot]: https://github.com/aboSamoor/polyglot \"Polyglot\" \n[Xrenner]: https://github.com/amir-zeldes/xrenner \"Xrenner\"\n[CoNLL-U]: https://universaldependencies.org/format.html \"CoNLL-U\"\n[Semiring Inc.]: https://semiring.com/ \"Semiring Inc.\"\n[Go JSON-NLP]: https://github.com/SemiringInc/GoJSONNLP \"Go JSON-NLP\"\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdcavar%2Fpy-json-nlp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdcavar%2Fpy-json-nlp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdcavar%2Fpy-json-nlp/lists"}