https://github.com/brooksian/sparkpipelinesparknlp
Build & Convert a Spark NLP Pipeline to PMML
- Host: GitHub
- URL: https://github.com/brooksian/sparkpipelinesparknlp
- Owner: BrooksIan
- License: gpl-3.0
- Created: 2018-08-02T15:58:32.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2018-09-04T15:37:39.000Z (over 6 years ago)
- Last Synced: 2023-07-19T18:49:09.608Z (over 1 year ago)
- Topics: corenlp, nlp, pmml, spark, zeppelin-notebook
- Size: 35.2 KB
- Stars: 2
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# SparkPipelineSparkNLP
Build & Convert a Spark NLP Pipeline to PMML

## Spark NLP Pipeline on Tweets
**Language**: Scala
**Requirements**:
- [HDP 2.6.X]
- Spark 2.x

**Author**: Ian Brooks \
**Follow**: [LinkedIn - Ian Brooks PhD](https://www.linkedin.com/in/ianrbrooksphd/) \
**HCC Article**: [Link](https://community.hortonworks.com/articles/208569/build-and-convert-a-spark-nlp-pipeline-into-pmml-i.html)

Instructions:
1. Follow this [tutorial](https://community.hortonworks.com/articles/1282/sample-hdfnifi-flow-to-push-tweets-into-solrbanana.html) to build the Solr collection 'tweets'.
2. Upload the notebook (JSON file) to Apache Zeppelin.
3. Match your Spark version to a compatible version of the Spark-Solr connector. The version list is available [here](https://github.com/lucidworks/spark-solr).
4. Review the [spark-corenlp API](https://github.com/databricks/spark-corenlp), which provides a Spark wrapper for the [Stanford CoreNLP](https://stanfordnlp.github.io/CoreNLP/) library.
5. In the Stanford CoreNLP download (http://nlp.stanford.edu/software/stanford-corenlp-full-2018-02-27.zip), find stanford-corenlp-*-models.jar and copy it to the /tmp directory. Then, in Zeppelin's interpreter configuration for Spark, include the following artifact: /tmp/stanford-corenlp-full-2018-02-27/stanford-corenlp-3.9.1-models.jar
6. Review the JPMML-SparkML and JPMML-Model libraries, found at https://github.com/jpmml/jpmml-sparkml and https://github.com/jpmml/jpmml-model
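
Steps 3 to 6 can be sketched roughly as follows. This is a minimal, untested outline and not the notebook's actual code: the ZooKeeper address, the Solr text field name `text_t`, the output path, the binary labelling rule, and the choice of pipeline stages are all illustrative assumptions. It relies on the `solr` data source from spark-solr, the `sentiment` function from spark-corenlp, and `PMMLBuilder` from JPMML-SparkML; whether a given stage converts to PMML depends on the JPMML-SparkML release that matches your Spark version.

```scala
import java.io.File

import com.databricks.spark.corenlp.functions.sentiment
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{CountVectorizer, RegexTokenizer}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, when}
import org.jpmml.sparkml.PMMLBuilder

object TweetSentimentToPmml {
  // Hypothetical values: adjust to your cluster and Solr schema.
  val zkHost = "localhost:2181/solr"
  val textField = "text_t"
  val pmmlPath = "/tmp/tweet-sentiment.pmml"

  def run(spark: SparkSession): File = {
    // Step 3: read the 'tweets' collection through the spark-solr data source.
    val tweets = spark.read.format("solr")
      .option("zkhost", zkHost)
      .option("collection", "tweets")
      .load()
      .select(col(textField).as("text"))
      .filter(col("text").isNotNull)

    // Step 4: score each tweet with CoreNLP sentiment (0 = strong negative,
    // 4 = strong positive). Collapsing it to a binary label is an
    // assumption made only to keep the example small.
    val labeled = tweets.withColumn(
      "label",
      when(sentiment(col("text")) > 2, 1.0).otherwise(0.0))

    // A plain Spark ML pipeline: tokenize, count terms, classify.
    val tokenizer = new RegexTokenizer()
      .setInputCol("text")
      .setOutputCol("words")
      .setPattern("\\s+")
    val vectorizer = new CountVectorizer()
      .setInputCol("words")
      .setOutputCol("features")
    val classifier = new LogisticRegression()
      .setLabelCol("label")
      .setFeaturesCol("features")

    val model = new Pipeline()
      .setStages(Array(tokenizer, vectorizer, classifier))
      .fit(labeled)

    // Step 6: convert the fitted pipeline to PMML and write it to disk.
    new PMMLBuilder(labeled.schema, model).buildFile(new File(pmmlPath))
  }
}
```

In a Zeppelin Spark paragraph, where `spark` is already defined, this could be invoked as `TweetSentimentToPmml.run(spark)`. The CoreNLP models jar from step 5 must be on the interpreter's classpath for the `sentiment` call to work.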