{"id":13741247,"url":"https://github.com/dasmith/stanford-corenlp-python","last_synced_at":"2026-03-11T23:01:25.882Z","repository":{"id":138386922,"uuid":"1415467","full_name":"dasmith/stanford-corenlp-python","owner":"dasmith","description":"Python wrapper for Stanford CoreNLP tools v3.4.1","archived":false,"fork":false,"pushed_at":"2018-03-14T10:34:59.000Z","size":193295,"stargazers_count":610,"open_issues_count":47,"forks_count":227,"subscribers_count":34,"default_branch":"master","last_synced_at":"2025-12-21T00:56:44.918Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dasmith.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2011-02-26T18:20:51.000Z","updated_at":"2025-12-06T14:39:58.000Z","dependencies_parsed_at":"2023-04-30T11:00:17.728Z","dependency_job_id":null,"html_url":"https://github.com/dasmith/stanford-corenlp-python","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/dasmith/stanford-corenlp-python","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dasmith%2Fstanford-corenlp-python","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dasmith%2Fstanford-corenlp-python/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dasmith%2Fstanford-corenlp-python/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dasmith%2Fstanford-corenlp-python/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub
/owners/dasmith","download_url":"https://codeload.github.com/dasmith/stanford-corenlp-python/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dasmith%2Fstanford-corenlp-python/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30406400,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-11T22:36:59.286Z","status":"ssl_error","status_checked_at":"2026-03-11T22:36:57.544Z","response_time":84,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T04:00:57.175Z","updated_at":"2026-03-11T23:01:25.858Z","avatar_url":"https://github.com/dasmith.png","language":"Python","readme":"# Python interface to Stanford Core NLP tools v3.4.1\n\nThis is a Python wrapper for Stanford University's NLP group's Java-based [CoreNLP tools](http://nlp.stanford.edu/software/corenlp.shtml).  It can either be imported as a module or run as a JSON-RPC server. 
Because it uses many large trained models (requiring 3GB RAM on 64-bit machines and usually a few minutes loading time), most applications will probably want to run it as a server.\n\n\n   * Python interface to Stanford CoreNLP tools: tagging, phrase-structure parsing, dependency parsing, [named-entity recognition](http://en.wikipedia.org/wiki/Named-entity_recognition), and [coreference resolution](http://en.wikipedia.org/wiki/Coreference).\n   * Runs a JSON-RPC server that wraps the Java process and outputs JSON.\n   * Outputs parse trees which can be used by [nltk](http://nltk.googlecode.com/svn/trunk/doc/howto/tree.html).\n\n\nIt depends on [pexpect](http://www.noah.org/wiki/pexpect) and includes and uses code from [jsonrpc](http://www.simple-is-better.org/rpc/) and [python-progressbar](http://code.google.com/p/python-progressbar/).\n\nIt runs the Stanford CoreNLP jar in a separate process, communicates with the Java process through its command-line interface, and makes assumptions about the output of the parser in order to parse it into a Python dict object and transfer it using JSON.  The parser will break if the output changes significantly, but it has been tested on **Core NLP tools version 3.4.1** released 2014-08-27.\n\n## Download and Usage\n\nTo use this program you must [download](http://nlp.stanford.edu/software/corenlp.shtml#Download) and unpack the compressed file containing Stanford's CoreNLP package.  By default, `corenlp.py` looks for the Stanford Core NLP folder as a subdirectory of where the script is being run.  
In other words:\n\n\tsudo pip install pexpect unidecode\n\tgit clone git://github.com/dasmith/stanford-corenlp-python.git\n\tcd stanford-corenlp-python\n\twget http://nlp.stanford.edu/software/stanford-corenlp-full-2014-08-27.zip\n\tunzip stanford-corenlp-full-2014-08-27.zip\n\nThen launch the server:\n\n    python corenlp.py\n\nOptionally, you can specify a host or port:\n\n    python corenlp.py -H 0.0.0.0 -p 3456\n\nThat will run a public JSON-RPC server on port 3456.\n\nAssuming you are running on port 8080, the code in `client.py` shows an example parse: \n\n    import jsonrpc\n    from simplejson import loads\n    server = jsonrpc.ServerProxy(jsonrpc.JsonRpc20(),\n                                 jsonrpc.TransportTcpIp(addr=(\"127.0.0.1\", 8080)))\n\n    result = loads(server.parse(\"Hello world!  It is so beautiful.\"))\n    print \"Result\", result\n\nThat returns a dictionary containing the keys `sentences` and `coref`. The key `sentences` contains a list of dictionaries, one per sentence; each contains `parsetree`, `text`, `tuples` (the dependencies), and `words` (information about parts of speech, recognized named entities, and so on):\n\n\t{u'sentences': [{u'parsetree': u'(ROOT (S (VP (NP (INTJ (UH Hello)) (NP (NN world)))) (. 
!)))',\n\t                 u'text': u'Hello world!',\n\t                 u'tuples': [[u'dep', u'world', u'Hello'],\n\t                             [u'root', u'ROOT', u'world']],\n\t                 u'words': [[u'Hello',\n\t                             {u'CharacterOffsetBegin': u'0',\n\t                              u'CharacterOffsetEnd': u'5',\n\t                              u'Lemma': u'hello',\n\t                              u'NamedEntityTag': u'O',\n\t                              u'PartOfSpeech': u'UH'}],\n\t                            [u'world',\n\t                             {u'CharacterOffsetBegin': u'6',\n\t                              u'CharacterOffsetEnd': u'11',\n\t                              u'Lemma': u'world',\n\t                              u'NamedEntityTag': u'O',\n\t                              u'PartOfSpeech': u'NN'}],\n\t                            [u'!',\n\t                             {u'CharacterOffsetBegin': u'11',\n\t                              u'CharacterOffsetEnd': u'12',\n\t                              u'Lemma': u'!',\n\t                              u'NamedEntityTag': u'O',\n\t                              u'PartOfSpeech': u'.'}]]},\n\t                {u'parsetree': u'(ROOT (S (NP (PRP It)) (VP (VBZ is) (ADJP (RB so) (JJ beautiful))) (. 
.)))',\n\t                 u'text': u'It is so beautiful.',\n\t                 u'tuples': [[u'nsubj', u'beautiful', u'It'],\n\t                             [u'cop', u'beautiful', u'is'],\n\t                             [u'advmod', u'beautiful', u'so'],\n\t                             [u'root', u'ROOT', u'beautiful']],\n\t                 u'words': [[u'It',\n\t                             {u'CharacterOffsetBegin': u'14',\n\t                              u'CharacterOffsetEnd': u'16',\n\t                              u'Lemma': u'it',\n\t                              u'NamedEntityTag': u'O',\n\t                              u'PartOfSpeech': u'PRP'}],\n\t                            [u'is',\n\t                             {u'CharacterOffsetBegin': u'17',\n\t                              u'CharacterOffsetEnd': u'19',\n\t                              u'Lemma': u'be',\n\t                              u'NamedEntityTag': u'O',\n\t                              u'PartOfSpeech': u'VBZ'}],\n\t                            [u'so',\n\t                             {u'CharacterOffsetBegin': u'20',\n\t                              u'CharacterOffsetEnd': u'22',\n\t                              u'Lemma': u'so',\n\t                              u'NamedEntityTag': u'O',\n\t                              u'PartOfSpeech': u'RB'}],\n\t                            [u'beautiful',\n\t                             {u'CharacterOffsetBegin': u'23',\n\t                              u'CharacterOffsetEnd': u'32',\n\t                              u'Lemma': u'beautiful',\n\t                              u'NamedEntityTag': u'O',\n\t                              u'PartOfSpeech': u'JJ'}],\n\t                            [u'.',\n\t                             {u'CharacterOffsetBegin': u'32',\n\t                              u'CharacterOffsetEnd': u'33',\n\t                              u'Lemma': u'.',\n\t                              u'NamedEntityTag': u'O',\n\t                              u'PartOfSpeech': 
u'.'}]]}],\n\tu'coref': [[[[u'It', 1, 0, 0, 1], [u'Hello world', 0, 1, 0, 2]]]]}\n    \nTo use it in a regular script (useful for debugging), load the module instead:\n\n    from corenlp import *\n    corenlp = StanfordCoreNLP()  # wait a few minutes...\n    corenlp.parse(\"Parse this sentence.\")\n\nThe `StanfordCoreNLP()` constructor takes an optional argument `corenlp_path` which specifies the path to the jar files.  The default value is `StanfordCoreNLP(corenlp_path=\"./stanford-corenlp-full-2014-08-27/\")`.\n\n## Coreference Resolution\n\nThe library supports [coreference resolution](http://en.wikipedia.org/wiki/Coreference), which means pronouns can be \"dereferenced.\"  If an entry in the `coref` list is `[u'Hello world', 0, 1, 0, 2]`, the numbers mean:\n\n  * 0 = The reference appears in the 0th sentence (e.g. \"Hello world\")\n  * 1 = The token at index 1, \"world\", is the [headword](http://en.wikipedia.org/wiki/Head_%28linguistics%29) of the mention\n  * 0 = 'Hello world' begins at the 0th token in the sentence\n  * 2 = 'Hello world' ends before the 2nd token in the sentence.\n\n\u003c!--\n\n\n## Adding WordNet\n\nNote: wordnet doesn't seem to be supported using this approach.  Looks like you'll need Java.\n\nDownload WordNet-3.0 Prolog:  http://wordnetcode.princeton.edu/3.0/WNprolog-3.0.tar.gz\ntar xvfz WNprolog-3.0.tar.gz \n\n--\u003e\n\n\n## Questions \n\n**Stanford CoreNLP tools require a large amount of free memory**.  Java 5+ uses about 50% more RAM on 64-bit machines than 32-bit machines.  
32-bit machine users can lower the memory requirements by changing `-Xmx3g` to `-Xmx2g` or even less.\nIf pexpect times out while loading models, check to make sure you have enough memory and can run the server alone without your kernel killing the Java process:\n\n\tjava -cp stanford-corenlp-3.4.1.jar:stanford-corenlp-3.4.1-models.jar:xom.jar:joda-time.jar -Xmx3g edu.stanford.nlp.pipeline.StanfordCoreNLP -props default.properties\n\nYou can reach me, Dustin Smith, by sending a message on GitHub or through email (contact information is available [on my webpage](http://web.media.mit.edu/~dustin)).\n\n\n## License \u0026 Contributors\n\nThis is free and open source software and has benefited from the contribution and feedback of others.  Like Stanford's CoreNLP tools, it is covered under the [GNU General Public License v2+](http://www.gnu.org/licenses/gpl-2.0.html), which in short means that modifications to this program must maintain the same free and open source distribution policy.\n\nI gratefully welcome bug fixes and new features.  If you have forked this repository, please submit a [pull request](https://help.github.com/articles/using-pull-requests/) so others can benefit from your contributions.  This project has already benefited from contributions from these members of the open source community:\n\n  * [Emilio Monti](https://github.com/emilmont)\n  * [Justin Cheng](https://github.com/jcccf) \n  * Abhaya Agarwal\n\n*Thank you!*\n\n## Related Projects\n\nMaintainers of the Core NLP library at Stanford keep an [updated list of wrappers and extensions](http://nlp.stanford.edu/software/corenlp.shtml#Extensions).  
See Brendan O'Connor's [stanford_corenlp_pywrapper](https://github.com/brendano/stanford_corenlp_pywrapper) for a different approach more suited to batch processing.\n","funding_links":[],"categories":["Software","Python"],"sub_categories":["Utilities","General-Purpose Machine Learning"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdasmith%2Fstanford-corenlp-python","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdasmith%2Fstanford-corenlp-python","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdasmith%2Fstanford-corenlp-python/lists"}