{"id":13754372,"url":"https://github.com/Lynten/stanford-corenlp","last_synced_at":"2025-05-09T22:32:12.877Z","repository":{"id":37664691,"uuid":"92424780","full_name":"Lynten/stanford-corenlp","owner":"Lynten","description":"Python wrapper for Stanford CoreNLP.","archived":false,"fork":false,"pushed_at":"2021-12-07T12:22:56.000Z","size":62,"stargazers_count":922,"open_issues_count":73,"forks_count":199,"subscribers_count":24,"default_branch":"master","last_synced_at":"2025-04-23T05:48:29.027Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Lynten.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-05-25T16:53:03.000Z","updated_at":"2025-03-10T07:37:28.000Z","dependencies_parsed_at":"2022-07-12T16:42:39.117Z","dependency_job_id":null,"html_url":"https://github.com/Lynten/stanford-corenlp","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Lynten%2Fstanford-corenlp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Lynten%2Fstanford-corenlp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Lynten%2Fstanford-corenlp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Lynten%2Fstanford-corenlp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Lynten","download_url":"https://codeload.github.com/Lynten/stanford-corenlp/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github"
,"repositories_count":253335977,"owners_count":21892765,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T09:01:57.232Z","updated_at":"2025-05-09T22:32:07.866Z","avatar_url":"https://github.com/Lynten.png","language":"Python","funding_links":[],"categories":["Entity recognition (NER), intent recognition, slot filling"],"sub_categories":["Other: text generation, text dialogue"],"readme":"## stanfordcorenlp\n[![PyPI](https://img.shields.io/pypi/v/stanfordcorenlp.svg)]()\n[![GitHub release](https://img.shields.io/github/release/Lynten/stanford-corenlp.svg)]()\n[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/stanfordcorenlp.svg)]()\n\n\n`stanfordcorenlp` is a Python wrapper for [Stanford CoreNLP](https://stanfordnlp.github.io/CoreNLP/). 
It provides a simple API for text processing tasks such as Tokenization, Part of Speech Tagging, Named Entity Recognition, Constituency Parsing, Dependency Parsing, and more.\n\n## Prerequisites\nJava 1.8+ (check with the command `java -version`) ([Download Page](http://www.oracle.com/technetwork/cn/java/javase/downloads/jdk8-downloads-2133151-zhs.html))\n\nStanford CoreNLP ([Download Page](https://stanfordnlp.github.io/CoreNLP/history.html))\n\n| stanfordcorenlp Version | CoreNLP Version |\n| --- | --- |\n| v3.7.0.1, v3.7.0.2 | CoreNLP 3.7.0 |\n| v3.8.0.1 | CoreNLP 3.8.0 |\n| v3.9.1.1 | CoreNLP 3.9.1 |\n\n## Installation\n\n`pip install stanfordcorenlp`\n\n## Example\n### Simple Usage\n```python\n# Simple usage\nfrom stanfordcorenlp import StanfordCoreNLP\n\nnlp = StanfordCoreNLP(r'G:\\JavaLibraries\\stanford-corenlp-full-2018-02-27')\n\nsentence = 'Guangdong University of Foreign Studies is located in Guangzhou.'\nprint('Tokenize:', nlp.word_tokenize(sentence))\nprint('Part of Speech:', nlp.pos_tag(sentence))\nprint('Named Entities:', nlp.ner(sentence))\nprint('Constituency Parsing:', nlp.parse(sentence))\nprint('Dependency Parsing:', nlp.dependency_parse(sentence))\n\nnlp.close() # Do not forget to close! 
The backend server consumes a lot of memory.\n```\n\nOutput format:\n```python\n# Tokenize\n[u'Guangdong', u'University', u'of', u'Foreign', u'Studies', u'is', u'located', u'in', u'Guangzhou', u'.']\n\n# Part of Speech\n[(u'Guangdong', u'NNP'), (u'University', u'NNP'), (u'of', u'IN'), (u'Foreign', u'NNP'), (u'Studies', u'NNPS'), (u'is', u'VBZ'), (u'located', u'JJ'), (u'in', u'IN'), (u'Guangzhou', u'NNP'), (u'.', u'.')]\n\n# Named Entities\n[(u'Guangdong', u'ORGANIZATION'), (u'University', u'ORGANIZATION'), (u'of', u'ORGANIZATION'), (u'Foreign', u'ORGANIZATION'), (u'Studies', u'ORGANIZATION'), (u'is', u'O'), (u'located', u'O'), (u'in', u'O'), (u'Guangzhou', u'LOCATION'), (u'.', u'O')]\n\n# Constituency Parsing\n(ROOT\n  (S\n    (NP\n      (NP (NNP Guangdong) (NNP University))\n      (PP (IN of)\n        (NP (NNP Foreign) (NNPS Studies))))\n    (VP (VBZ is)\n      (ADJP (JJ located)\n        (PP (IN in)\n          (NP (NNP Guangzhou)))))\n    (. .)))\n\n# Dependency Parsing\n[(u'ROOT', 0, 7), (u'compound', 2, 1), (u'nsubjpass', 7, 2), (u'case', 5, 3), (u'compound', 5, 4), (u'nmod', 2, 5), (u'auxpass', 7, 6), (u'case', 9, 8), (u'nmod', 7, 9), (u'punct', 7, 10)]\n\n```\n\n### Other Human Languages Support\nNote: you must download an additional model file and place it in the `.../stanford-corenlp-full-2018-02-27` folder. For example, to process Chinese, download the `stanford-chinese-corenlp-2018-02-27-models.jar` file.\n```python\n# _*_coding:utf-8_*_\nfrom stanfordcorenlp import StanfordCoreNLP\n\n# Other human languages support, e.g. Chinese\nsentence = '清华大学位于北京。'\n\nwith StanfordCoreNLP(r'G:\\JavaLibraries\\stanford-corenlp-full-2018-02-27', lang='zh') as nlp:\n    print(nlp.word_tokenize(sentence))\n    print(nlp.pos_tag(sentence))\n    print(nlp.ner(sentence))\n    print(nlp.parse(sentence))\n    print(nlp.dependency_parse(sentence))\n```\n\n### General Stanford CoreNLP API\nBecause the general API loads all the models, which requires more memory, initialize the server with a larger memory allocation. 
8GB is recommended.\n\n```python\n# General JSON output\nnlp = StanfordCoreNLP(r'path_to_corenlp', memory='8g')\nprint(nlp.annotate(sentence))\nnlp.close()\n```\nYou can specify properties:\n\n- `annotators`: `tokenize, ssplit, pos, lemma, ner, parse, depparse, dcoref` ([See Detail](https://stanfordnlp.github.io/CoreNLP/annotators.html))\n\n- `pipelineLanguage`: `en, zh, ar, fr, de, es` (English, Chinese, Arabic, French, German, Spanish) ([See Annotator Support Detail](https://stanfordnlp.github.io/CoreNLP/human-languages.html)) \n\n- `outputFormat`: `json, xml, text`\n```python\ntext = 'Guangdong University of Foreign Studies is located in Guangzhou. ' \\\n       'GDUFS is active in a full range of international cooperation and exchanges in education. '\n\nprops = {'annotators': 'tokenize,ssplit,pos', 'pipelineLanguage': 'en', 'outputFormat': 'xml'}\nprint(nlp.annotate(text, properties=props))\nnlp.close()\n```\n\n\n### Use an Existing Server\nStart a [CoreNLP Server](https://stanfordnlp.github.io/CoreNLP/corenlp-server.html) with the command:\n```\njava -mx4g -cp \"*\" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000\n```\nThen connect to it:\n```python\n# Use an existing server\nnlp = StanfordCoreNLP('http://localhost', port=9000)\n```\n\n## Debug\n```python\nimport logging\nfrom stanfordcorenlp import StanfordCoreNLP\n\n# Debug the wrapper\nnlp = StanfordCoreNLP(r'path_or_host', logging_level=logging.DEBUG)\n\n# Check more info from the CoreNLP Server \nnlp = StanfordCoreNLP(r'path_or_host', quiet=False, logging_level=logging.DEBUG)\nnlp.close()\n```\n\n## Build\n\nWe use `setuptools` to package our project. 
You can build from the latest source code with the following command:\n```\n$ python setup.py bdist_wheel --universal\n```\n\nThe `.whl` file is written to the `dist` directory.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FLynten%2Fstanford-corenlp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FLynten%2Fstanford-corenlp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FLynten%2Fstanford-corenlp/lists"}