https://github.com/brooksian/sparkpipelinesparknlp

Build & Convert a Spark NLP Pipeline to PMML

corenlp nlp pmml spark zeppelin-notebook


Host: GitHub
URL: https://github.com/brooksian/sparkpipelinesparknlp
Owner: BrooksIan
License: gpl-3.0
Created: 2018-08-02T15:58:32.000Z (over 6 years ago)
Default Branch: master
Last Pushed: 2018-09-04T15:37:39.000Z (over 6 years ago)
Last Synced: 2025-01-19T16:16:01.469Z (3 months ago)
Topics: corenlp, nlp, pmml, spark, zeppelin-notebook
Size: 35.2 KB
Stars: 2
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# SparkPipelineSparkNLP
Build & Convert a Spark NLP Pipeline to PMML

## Spark NLP Pipeline on Tweets

**Language**: Scala
**Requirements**:
- HDP 2.6.X
- Spark 2.x

**Author**: Ian Brooks \
**Follow**: [LinkedIn - Ian Brooks PhD](https://www.linkedin.com/in/ianrbrooksphd/) \
**HCC Article**: [Link](https://community.hortonworks.com/articles/208569/build-and-convert-a-spark-nlp-pipeline-into-pmml-i.html)

Instructions:
1. Please follow this [tutorial](https://community.hortonworks.com/articles/1282/sample-hdfnifi-flow-to-push-tweets-into-solrbanana.html) to build the Solr collection 'tweets'

2. Upload the notebook (JSON File) to Apache Zeppelin

3. Match the version of the spark-solr connector to your version of Spark. The compatibility list is in the [spark-solr repository](https://github.com/lucidworks/spark-solr).

4. Review the [spark-corenlp API](https://github.com/databricks/spark-corenlp), which provides a Spark wrapper around the [Stanford CoreNLP](https://stanfordnlp.github.io/CoreNLP/) library.

5. In the Stanford CoreNLP download (http://nlp.stanford.edu/software/stanford-corenlp-full-2018-02-27.zip), find the stanford-corenlp-*-models.jar and copy it to the /tmp directory. In Zeppelin's interpreter configuration for Spark, include the following artifact: /tmp/stanford-corenlp-full-2018-02-27/stanford-corenlp-3.9.1-models.jar

6. Review the JPMML-SparkML and JPMML-Model libraries: https://github.com/jpmml/jpmml-sparkml and https://github.com/jpmml/jpmml-model
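
To give a rough idea of how the pieces in the steps above fit together, here is a minimal sketch of a Zeppelin `%spark` paragraph. It is not the notebook's actual code. The ZooKeeper address, the Solr field name `text_t`, the choice of pipeline stages, and the output path are all assumptions for illustration. It also assumes the spark-solr, spark-corenlp, CoreNLP models, and JPMML-SparkML jars are already on the interpreter classpath, and that your JPMML-SparkML release provides `PMMLBuilder` (older releases may expose a different entry point, so check the release that matches your Spark version).

```scala
// Sketch only: run inside a Zeppelin %spark paragraph, where `spark` is the
// SparkSession that Zeppelin provides.
import java.io.File

import com.databricks.spark.corenlp.functions.sentiment
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{CountVectorizer, RegexTokenizer}
import org.apache.spark.sql.functions.{col, length, trim}
import org.jpmml.sparkml.PMMLBuilder

// Hypothetical values: replace with the ZooKeeper connect string of your
// Solr cluster and the field that holds the tweet text in your collection.
val zkHost = "localhost:2181/solr"
val textField = "text_t"

// Read the 'tweets' collection through the spark-solr data source and keep
// only non-empty tweet text.
val tweets = spark.read.format("solr")
  .option("zkhost", zkHost)
  .option("collection", "tweets")
  .load()
  .select(col(textField).as("text"))
  .na.drop()
  .filter(length(trim(col("text"))) > 0)

// Use the CoreNLP sentiment score (0 = strong negative, 4 = strong positive)
// as the training label. This is where the models jar is needed.
val labeled = tweets.withColumn("label", sentiment(col("text")).cast("double"))

// A small Spark ML pipeline built from stages JPMML-SparkML lists as
// supported. The stage choice here is an assumption, not the notebook's.
val tokenizer = new RegexTokenizer().setInputCol("text").setOutputCol("tokens")
val vectorizer = new CountVectorizer().setInputCol("tokens").setOutputCol("features")
val classifier = new LogisticRegression().setLabelCol("label").setFeaturesCol("features")

val pipeline = new Pipeline().setStages(Array(tokenizer, vectorizer, classifier))
val model = pipeline.fit(labeled)

// Convert the fitted pipeline to PMML and write it to a file
// (the output path is hypothetical).
val pmmlFile = new PMMLBuilder(labeled.schema, model)
  .buildFile(new File("/tmp/tweet-sentiment.pmml"))
```

The CoreNLP annotators are applied as column functions before the pipeline is fitted, so only the Spark ML stages end up in the PMML document. The sentiment labels are used purely as training input.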
