https://github.com/m-molaei/twitter-sentiment-analysis-using-apache-spark-

Sentiment analysis using deep learning models and FastText embedding on Apache Spark
apache-cassandra apache-spark big-data fasttext fasttext-embeddings mongodb pyspark rdd sentiment-analysis sentiment140-dataset spark

Host: GitHub
URL: https://github.com/m-molaei/twitter-sentiment-analysis-using-apache-spark-
Owner: m-molaei
Created: 2022-08-25T13:59:50.000Z (almost 3 years ago)
Default Branch: main
Last Pushed: 2022-08-26T08:41:29.000Z (almost 3 years ago)
Last Synced: 2025-03-14T07:45:55.974Z (4 months ago)
Topics: apache-cassandra, apache-spark, big-data, fasttext, fasttext-embeddings, mongodb, pyspark, rdd, sentiment-analysis, sentiment140-dataset, spark
Language: Jupyter Notebook
Homepage:
Size: 43 KB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Twitter-Sentiment-Analysis-using-Apache-Spark
### Sentiment analysis using deep learning models and FastText embeddings on Apache Spark

I implemented a sentiment analysis model for Twitter data using Apache Spark. I used FastText embeddings and deep learning models (LSTM, GRU, and CNN) with the Analytics Zoo library. This work also includes a DataFrame-based pre-processing framework that performs much better than RDD-based architectures in terms of processing time and the volume of data that can be processed.
In addition, I used MongoDB and Apache Cassandra as the model's databases and compared them to Apache Spark's own file storage and retrieval system.

We also published an article introducing the DataFrame-based pre-processing framework, which is available here:
https://jad.shahroodut.ac.ir/article_2394.html

I hope this will be useful for you ;)

## Code Explanation

1. Importing libraries (you will probably need to install some of them, such as [`Analytics Zoo`](https://analytics-zoo.readthedocs.io/en/latest/doc/UserGuide/python.html) and [`findspark`](https://github.com/minrk/findspark))
2. Initializing the Apache Spark cluster
3. Importing and reading the Sentiment140 dataset with pandas (you will need to change the dataset's path)
4. Importing FastText embeddings with gensim
5. Pre-processing tweets, including cleansing, tokenizing, padding, and vectorizing (this step is implemented in two ways: RDD-based and DataFrame-based)
6. Configuring Apache Cassandra and MongoDB on Apache Spark
7. Sentiment analysis models
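
To make step 5 more concrete, here is a minimal plain-Python sketch of the four pre-processing stages (cleansing, tokenizing, padding, vectorizing). It is not the code from the notebook: the function names, the regular expressions, `MAX_LEN`, and the tiny `TOY_EMBEDDINGS` table are all assumptions chosen for illustration. In the notebook the vectors come from FastText via gensim and the same kind of logic runs on Spark RDDs or DataFrames.

```python
import re

# Hypothetical settings for illustration only (not taken from the notebook).
MAX_LEN = 6      # fixed sequence length after padding/truncation
EMBED_DIM = 3    # toy size; real FastText vectors are much larger

# Toy stand-in for a FastText lookup table.
TOY_EMBEDDINGS = {
    "good": [0.9, 0.1, 0.0],
    "bad": [-0.8, 0.2, 0.1],
    "movie": [0.0, 0.5, 0.5],
}
ZERO = [0.0] * EMBED_DIM  # used for padding and out-of-vocabulary tokens


def clean(tweet):
    """Lowercase the tweet and strip URLs, @mentions, and non-letter characters."""
    tweet = tweet.lower()
    tweet = re.sub(r"https?://\S+|www\.\S+", " ", tweet)  # URLs
    tweet = re.sub(r"@\w+", " ", tweet)                   # @mentions
    tweet = re.sub(r"[^a-z\s]", " ", tweet)               # digits, punctuation, '#'
    return re.sub(r"\s+", " ", tweet).strip()             # collapse whitespace


def tokenize(text):
    """Split cleaned text on whitespace."""
    return text.split()


def pad(tokens, max_len=MAX_LEN, pad_token="<pad>"):
    """Truncate to max_len, then right-pad with pad_token."""
    tokens = tokens[:max_len]
    return tokens + [pad_token] * (max_len - len(tokens))


def vectorize(tokens, embeddings=TOY_EMBEDDINGS):
    """Map each token to its vector; unknown and pad tokens become zeros."""
    return [list(embeddings.get(token, ZERO)) for token in tokens]


def preprocess(tweet):
    """Full pipeline: one tweet in, a MAX_LEN x EMBED_DIM matrix out."""
    return vectorize(pad(tokenize(clean(tweet))))


matrix = preprocess("@bob Good movie!! http://t.co/x")
print(len(matrix), len(matrix[0]))
```

Each stage is a pure function of a single tweet, so under these assumptions the same functions could be applied per record in an RDD `map` or wrapped for use on a DataFrame column.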
