https://github.com/drisskhattabi6/real-time-twitter-sentiment-analysis
This repo contains a Big Data project about "Real Time Twitter Sentiment Analysis via Kafka, Spark Streaming, MongoDB and Django Dashboard".
- Host: GitHub
- URL: https://github.com/drisskhattabi6/real-time-twitter-sentiment-analysis
- Owner: drisskhattabi6
- Created: 2024-05-19T13:23:42.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-05-19T13:56:22.000Z (over 1 year ago)
- Last Synced: 2024-05-19T14:35:13.336Z (over 1 year ago)
- Language: Python
- Size: 1.1 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Big Data Project: Real-Time Twitter Sentiment Analysis Using Kafka, Spark (MLLib & Streaming), MongoDB and Django.
## Overview
This repository contains a Big Data project focused on real-time sentiment analysis of Twitter data (classification of tweets). The project leverages various technologies to collect, process, analyze, and visualize sentiment data from tweets in real-time.
## Project Architecture
The project is built using the following components:
- **Apache Kafka**: Used for real-time data ingestion from the Twitter dataset.
- **Spark Streaming**: Processes the streaming data from Kafka to perform sentiment analysis.
- **MongoDB**: Stores the processed sentiment data.
- **Django**: Serves as the web framework for building a real-time dashboard to visualize the sentiment analysis results.
- **Chart.js** & **matplotlib**: Used for plotting.

This is the project plan:
## Features
- **Real-time Data Ingestion**: Streams tweets from the Twitter dataset through Kafka to simulate a live feed.
- **Stream Processing**: Utilizes Spark Streaming to process and analyze the data in real-time.
- **Sentiment Analysis**: Classifies tweets into different sentiment categories (positive, negative, neutral) using natural language processing (NLP) techniques.
- **Data Storage**: Stores the sentiment analysis results in MongoDB for persistence.
- **Visualization**: Provides a real-time dashboard built with Django to visualize the sentiment trends and insights.
## Data description:
This project uses a dataset (twitter_training.csv and twitter_validation.csv) both to train the PySpark model and to simulate live tweets through Kafka. Each line of the training file "twitter_training.csv" represents a tweet; the file contains over 74,682 lines.
The features and their data types are:
- Tweet ID: int
- Entity: string
- Sentiment: string (Target)
- Tweet content: string

The validation file “twitter_validation.csv” contains 998 lines (tweets) with the same features as “twitter_training.csv”.
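
The four-column layout described above can be read with Python's standard `csv` module. The following is only a minimal sketch, not the loading code used in this repository: the `Tweet` class, the `parse_tweets` helper, and the sample rows are hypothetical, and it assumes each row holds exactly the four fields in the order listed. Rows without a numeric ID (such as a header row, if one exists) are skipped.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Tweet:
    tweet_id: int
    entity: str
    sentiment: str  # target label
    content: str


def parse_tweets(lines):
    """Parse CSV lines with columns: Tweet ID, Entity, Sentiment, Tweet content.

    Rows that do not have exactly four fields or a numeric ID are skipped.
    """
    tweets = []
    for row in csv.reader(lines):
        if len(row) != 4 or not row[0].strip().isdigit():
            continue
        tweets.append(Tweet(int(row[0]), row[1], row[2], row[3]))
    return tweets


# Hypothetical sample rows, made up for illustration (not taken from the dataset).
SAMPLE = io.StringIO(
    '1,ExampleGame,Positive,"I love this game, it is great"\n'
    '2,ExampleBrand,Negative,The update broke everything\n'
    'not-a-row\n'
)

tweets = parse_tweets(SAMPLE)
print(len(tweets), tweets[0].sentiment)
```

To read the real file, an open file handle (for example `open("twitter_training.csv", newline="", encoding="utf-8")`) can be passed in place of `SAMPLE`.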
This is the Data Source:
https://www.kaggle.com/datasets/jp797498e/twitter-entity-sentiment-analysis
## Repository Structure
- **Django-Dashboard**: contains the Django dashboard application.
- **Kafka-PySpark**: contains the Kafka producer and the PySpark streaming job (Kafka consumer).
- **ML PySpark Model**: contains the trained model, the Jupyter notebook, and the datasets.
- **zk-single-kafka-single.yml**: Docker Compose file used to download and run Apache Kafka in Docker.
- **bigdataproject rapport**: a brief report about the project (in French).
## Getting Started
### Prerequisites
To run this project, you will need the following installed on your system:
- Docker (for running Kafka)
- Python 3.x
- Apache Kafka
- Apache Spark (PySpark for python)
- MongoDB
- Django
### Installation
1. **Clone the repository**:
```bash
git clone https://github.com/drisskhattabi6/Real-Time-Twitter-Sentiment-Analysis.git
cd Real-Time-Twitter-Sentiment-Analysis
```
2. **Install Docker Desktop**
3. **Set up Kafka**:
- Download and start Apache Kafka in Docker using:
```bash
docker-compose -f zk-single-kafka-single.yml up -d
```
5. **Set up MongoDB**:
- Download and install MongoDB.
- Installing **MongoDBCompass** is also recommended: it lets you visualize the data and makes working with MongoDB easier.
6. **Install Python dependencies**:
- Install PySpark, PyMongo, Django, etc.:
```bash
pip install -r requirements.txt
```
### Running the Project
Note: MongoDB must be running for both the Kafka and Spark Streaming application and the Django Dashboard application.
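
Before launching either application, it can help to confirm that the services it depends on are accepting connections. The snippet below is a minimal sketch using only the standard library; `port_is_open` and `SERVICES` are hypothetical names, and the endpoints are assumptions: 27017 is MongoDB's default port, and 9092 is the Kafka bootstrap port used in the commands of this README. Adjust them if your setup differs.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Assumed endpoints: MongoDB's default port and the Kafka bootstrap port
# that appears in the kafka-topics commands of this README.
SERVICES = {"MongoDB": ("localhost", 27017), "Kafka": ("localhost", 9092)}

if __name__ == "__main__":
    for name, (host, port) in SERVICES.items():
        status = "reachable" if port_is_open(host, port) else "NOT reachable"
        print(f"{name} at {host}:{port} is {status}")
```

A successful TCP connection only shows that something is listening on the port; it does not verify that the service is healthy.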
- **Start MongoDB**:
- using command line :
```bash
sudo systemctl start mongod
```
- then use **MongoDBCompass** (recommended).
#### Running the Kafka and Spark Streaming application:
1. **Change the directory to the application**:
```bash
cd Kafka-PySpark
```
2. **Start Kafka in Docker**:
- using command line :
```bash
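# Replace <kafka-container> with the name or ID of the running Kafka container (see `docker ps`)
docker exec -it <kafka-container> /bin/bash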
```
- or using Docker Desktop:
4. **Create the `twitter` Kafka topic and verify it** (ZooKeeper and the broker must already be running):
```bash
kafka-topics --create --topic twitter --bootstrap-server localhost:9092
kafka-topics --describe --topic twitter --bootstrap-server localhost:9092
```
5. **Run the Kafka producer app**:
```bash
py producer-validation-tweets.py
```
6. **Run the PySpark streaming (Kafka consumer) app**:
```bash
py consumer-pyspark.py
```
This is a screenshot of MongoDBCompass after running the Kafka and Spark Streaming application:
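
The producer and consumer steps above amount to replaying dataset rows as messages and decoding them on the other side. The sketch below illustrates that idea with the standard library only; it is not the code in `producer-validation-tweets.py` or `consumer-pyspark.py`. The JSON message schema, the function names, and the sample rows are assumptions, and `send` is a hypothetical stand-in for whatever publish call the Kafka client provides.

```python
import json
import time


def encode_tweet(tweet_id, entity, content):
    """Serialize one tweet as UTF-8 JSON bytes (an assumed message format)."""
    payload = {"id": tweet_id, "entity": entity, "content": content}
    return json.dumps(payload).encode("utf-8")


def decode_tweet(message):
    """Inverse of encode_tweet, as a consumer would apply it."""
    return json.loads(message.decode("utf-8"))


def replay(rows, send, delay_seconds=1.0):
    """Pass each encoded row to `send`, pausing between rows to mimic a live feed.

    `send` is a hypothetical stand-in for a Kafka client's publish call.
    """
    count = 0
    for tweet_id, entity, content in rows:
        send(encode_tweet(tweet_id, entity, content))
        count += 1
        time.sleep(delay_seconds)
    return count


# Made-up rows for illustration; a plain list stands in for the Kafka topic.
rows = [
    (1, "ExampleGame", "great match today"),
    (2, "ExampleBrand", "app keeps crashing"),
]
topic = []
sent = replay(rows, topic.append, delay_seconds=0)
print(sent, decode_tweet(topic[0])["content"])
```

Keeping the publish call behind a `send` parameter makes the replay logic easy to test without a running broker.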

#### Running the Django Dashboard application:
1. **Change the directory to the application**:
```bash
cd Django-Dashboard
```
2. **Collect static files into the static folder**:
```bash
python manage.py collectstatic
```
3. **Run the Django server**:
```bash
python manage.py runserver
```
4. **Access the Dashboard**:
Open your web browser and go to `http://127.0.0.1:8000` to view the real-time sentiment analysis dashboard.
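
The dashboard summarizes the stored tweets by label. As a rough illustration of how such proportions could be derived from a list of documents, here is a minimal sketch; it is not the dashboard's actual code, and `label_rates`, the field name `"label"`, and the sample documents are all assumptions.

```python
from collections import Counter


def label_rates(documents, label_key="label"):
    """Return {label: fraction of documents}, suitable for a pie or bar chart."""
    counts = Counter(doc[label_key] for doc in documents if label_key in doc)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


# Made-up documents; the "label" field name is an assumption about the stored schema.
docs = [
    {"text": "great match today", "label": "Positive"},
    {"text": "app keeps crashing", "label": "Negative"},
    {"text": "love the new update", "label": "Positive"},
    {"text": "it is fine I guess", "label": "Neutral"},
]
rates = label_rates(docs)
print(rates)
```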

## More information:
- The Django Dashboard reads its data from the MongoDB database.
- Users can classify their own text at `http://127.0.0.1:8000/classify`.
- The Dashboard includes a table of tweets with their labels.
- The Dashboard shows three statistics or plots: label rates, a pie plot, and a bar plot.
## Team:
- [Khattabi Idriss](https://github.com/drisskhattabi6)
- [Boufarhi Ayman](https://github.com/aymanboufarhi)
- [Abdelali IBN TABET](https://github.com/abd-ibn)
## Supervised By:
- Prof. **Yasyn El Yusufi**
---
Abdelmalek Essaadi University - Faculty of Sciences and Technology of Tangier
- Master: Artificial Intelligence and Data Science
- Module: Big Data
---
- By following the above instructions, you should be able to set up and run the real-time Twitter sentiment analysis project on your local machine. Happy coding!
- Feel free to explore the project and customize it according to your requirements. If you encounter any issues or have any questions, don't hesitate to reach out!