{"id":35015663,"url":"https://github.com/mdda/corenlp-java-server","last_synced_at":"2025-12-27T05:19:31.385Z","repository":{"id":32359741,"uuid":"35935679","full_name":"mdda/corenlp-java-server","owner":"mdda","description":"Simple Java REST API wrapper for the Stanford CoreNLP parser","archived":false,"fork":false,"pushed_at":"2015-06-23T10:14:58.000Z","size":304,"stargazers_count":4,"open_issues_count":0,"forks_count":2,"subscribers_count":3,"default_branch":"master","last_synced_at":"2023-03-11T07:58:07.391Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mdda.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-05-20T08:28:53.000Z","updated_at":"2018-11-08T15:23:35.000Z","dependencies_parsed_at":"2022-09-14T02:02:24.417Z","dependency_job_id":null,"html_url":"https://github.com/mdda/corenlp-java-server","commit_stats":null,"previous_names":[],"tags_count":null,"template":null,"template_full_name":null,"purl":"pkg:github/mdda/corenlp-java-server","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mdda%2Fcorenlp-java-server","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mdda%2Fcorenlp-java-server/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mdda%2Fcorenlp-java-server/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mdda%2Fcorenlp-java-server/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mdda","download_url":"https://codeload.github.com/mdda/corenlp-java-server/tar.gz/refs/heads/master
","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mdda%2Fcorenlp-java-server/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28072870,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-12-27T02:00:05.897Z","response_time":58,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-12-27T05:19:29.794Z","updated_at":"2025-12-27T05:19:31.377Z","avatar_url":"https://github.com/mdda.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# corenlp-java-server\nSimple Java REST API wrapper for the Stanford CoreNLP parser\n\n## Rationale\n\nIn order to 'play' with [CoreNLP](http://nlp.stanford.edu/software/corenlp.shtml) \neffectively, it makes sense to leave it running as a standalone process, \nand query it using a simple REST API.\n\nOne issue with the existing wrappers in other languages is that they \ndon't seem to offer much control over the pipeline being used.  
Often they\njust launch a command-line instance of CoreNLP, and 'what you get is what you get'.\n\nTherefore, this project is written in Java, so that the pipeline can\nbe customized per call, rather than per restart.\n\nHowever, rather than get *involved* with writing extensively in Java, this project\ntakes the simplest approach possible to getting the REST server up-and-running\n(using [SparkJava](http://sparkjava.com/documentation.html), \nnot to be confused with [Apache-Spark](https://spark.apache.org/)).\n\n## Goal\n\nIt should be simple to add to this project to create additional routes\nas required, and still be able to 'natively' control the way in which CoreNLP\ninteracts with the text you supply.\n\nThe sample routes include a ```/ner``` POST route (with content data provided \nin POSTed JSON), that allows the processing of more than one \ndocument (data in ```doc:[]```) by the current pipeline, \nand also the (cached) usage of pipelines defined by ```props:{}```.\n\n## Running\n\nThis project is built using ```sbt``` - which I prefer due to my \nScala tendencies.  Note, however, that it contains no Scala code \nitself.  
If necessary, it should be easy to create a ```pom.xml``` file \nfor maven, etc.\n\nTo compile (which will take some time initially, since the CoreNLP \nclass files must be downloaded, and they are ~150Mb) just :\n```\nsbt\n### This is now an interactive session\n\u003e compile\n\n### or, for on-file-change recompilation:\n\u003e ~ compile\n\n### run on the default port:\n\u003e run\n### or (specifying the port configuration explicitly):\n\u003e run -port 4567\n```\n\nRunning from the command line directly is: \n```\nsbt \"run -port 4567\"\n```\n\nThe server should be visible at [http://localhost:4567/ping](http://localhost:4567/ping).\n\nAnd an example parse via [http://localhost:4567/test?txt=\"This is a test of the Stanford parser.\"](http://localhost:4567/test?txt=\"This is a test of the Stanford parser.\"),\nwhere you may need to replace the spaces in the txt string with ```%20```.\n\nThere is also a ```POST``` endpoint example given in ```Main.java```, \nwhereby the ```txt``` parameter can be specified in the HTTP body.  
This can be \naccessed as follows :\n\n```\ncurl -X POST http://localhost:4567/ner \\\n  -d '{\"doc\":[\"Jack and Jill went up the hill.\"],\n       \"props\":{\"annotators\":\"tokenize, ssplit, pos, lemma, ner, parse\"}\n      }'\n```\n\nInitial runs indicate that the parse speed is of the order of 10 sentences a second, and the parser will automatically make use of all the cores of your machine, if you let it.\n\n## Requirements\n\nNeeds ```java``` and ```sbt``` (unless someone else wants to suggest a ```pom.xml```).\n\nOn Fedora, these two tools can be installed with : \n\n```\nyum install java-1.8.0-openjdk-devel sbt\n```\n\nIn ```Main.java``` the code makes use of Java 8 closures, but it shouldn't be hard to \nre-do this to be compilable under previous Java versions.\n\n\n## Additional NER models : \n\nThese can be downloaded from Stanford if you need them specifically (the NER\nworks just fine using the models already included, though) :\n\n```\nwget http://nlp.stanford.edu/software/conll.closed.iob2.crf.ser.gz    #  97Mb\nwget http://nlp.stanford.edu/software/conll.distsim.iob2.crf.ser.gz   # 115Mb\n```\n\nThese can be downloaded into (for instance) this repo's base directory, \nand accessed by using the property : \n\n```\n\"ner.model\":\"conll.closed.iob2.crf.ser.gz\"   ## IOB2 output (different from regular NER models)\n\"ner.model\":\"conll.distsim.iob2.crf.ser.gz\"  ## java.lang.OutOfMemoryError: GC overhead limit exceeded\n```\n\n\n## License\n\nSince this embeds the GPL2(+) CoreNLP project (https://github.com/stanfordnlp/CoreNLP), \nall modifications and extensions will also be GPL2(+).\n\nNote that simply using this as a REST API service doesn't mean a client \nusing the HTTP API must be GPL.  
If that were Stanford's intention, \nthen the CoreNLP project itself would be (for instance) AGPL licensed.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmdda%2Fcorenlp-java-server","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmdda%2Fcorenlp-java-server","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmdda%2Fcorenlp-java-server/lists"}