https://github.com/brooksian/twittersentimentsparkcorenlp

Twitter Sentiment Analysis Using Spark CoreNLP

nlp-machine-learning spark sparksql zeppelin-notebook

# TwitterSentimentSparkCoreNLP

## Spark Core NLP on Tweets

**Language**: Scala
**Requirements**:
- HDP 2.6.X
- Spark 2.x

**Author**: Ian Brooks \
**Follow**: [LinkedIn - Ian Brooks PhD](https://www.linkedin.com/in/ianrbrooksphd/) \
**HCC Article**: [Link](https://community.hortonworks.com/articles/192368/spark-core-nlp-in-apache-zeppelin.html)

Instructions:
1. Follow this [tutorial](https://community.hortonworks.com/articles/1282/sample-hdfnifi-flow-to-push-tweets-into-solrbanana.html) to build the Solr collection `tweets`.

2. Upload the notebook (JSON file) to Apache Zeppelin.

3. Match your Spark version to a compatible version of the Spark-Solr connector. The version compatibility list is available [here](https://github.com/lucidworks/spark-solr).

4. Review the [API](https://github.com/databricks/spark-corenlp) of Spark CoreNLP, which provides a Spark wrapper around the [Stanford CoreNLP](https://stanfordnlp.github.io/CoreNLP/) library.

5. In the Stanford CoreNLP download found at http://nlp.stanford.edu/software/stanford-corenlp-full-2018-02-27.zip, find the `stanford-corenlp-*-models.jar` file and copy it to the `/tmp` directory. Then, in Zeppelin's interpreter configuration for Spark, include the following artifact: `/tmp/stanford-corenlp-full-2018-02-27/stanford-corenlp-3.9.1-models.jar`
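
Once the interpreter is configured, the sentiment scoring exposed by the Spark CoreNLP API (step 4) can be tried in a Zeppelin paragraph. The following is a minimal sketch, not the notebook's actual code. It assumes the spark-corenlp package and the CoreNLP models JAR are on the Spark interpreter's classpath, relies on the `spark` session that Zeppelin predefines, and uses two made-up sample sentences in place of the Solr `tweets` collection. The `ssplit` and `sentiment` functions come from `com.databricks.spark.corenlp.functions`; the column names are illustrative.

```scala
// Run in a Zeppelin %spark paragraph, where `spark` (a SparkSession) is predefined.
import org.apache.spark.sql.functions.{col, explode}
import com.databricks.spark.corenlp.functions.{sentiment, ssplit}
import spark.implicits._

// Hypothetical sample rows standing in for documents from the Solr 'tweets' collection.
val tweets = Seq(
  (1, "I love this new phone. The battery lasts all day."),
  (2, "The flight was delayed again. Terrible service.")
).toDF("id", "text")

// Split each text into sentences, then score every sentence individually.
val scored = tweets
  .select(col("id"), explode(ssplit(col("text"))).as("sentence"))
  .select(col("id"), col("sentence"), sentiment(col("sentence")).as("sentiment"))

scored.show(truncate = false)
```

spark-corenlp documents `sentiment` as scoring a single sentence on a scale from 0 (strong negative) to 4 (strong positive), which is why the text is split with `ssplit` before scoring.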