https://github.com/xgagandeep/twitter-sentiment-analysis
A dashboard to perform sentiment analysis and visualize the factors that affect people's sentiments.
- Host: GitHub
- URL: https://github.com/xgagandeep/twitter-sentiment-analysis
- Owner: xgagandeep
- Created: 2023-08-12T04:52:01.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-09-11T05:35:59.000Z (2 months ago)
- Last Synced: 2024-09-12T01:00:51.730Z (2 months ago)
- Language: Python
- Homepage:
- Size: 7.33 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Twitter Sentiment Analysis using BERT and TensorFlow
This repository provides a complete workflow for sentiment analysis on Twitter data: a BERT model fine-tuned with PyTorch for binary classification, and an interactive Streamlit app for visualization. The project includes three core components:
1. **Data Preprocessing (`Preprocessing.py`)**: Cleans and processes the tweet dataset.
2. **Model Training (`Model_training.py`)**: Trains a BERT-based model for binary sentiment classification.
3. **Sentiment Analysis and Visualization (`model.py`)**: Performs real-time sentiment analysis using a pre-trained model and visualizes the results with interactive charts in a Streamlit web application.
![image](https://github.com/user-attachments/assets/25a83c9c-bcb3-4f45-b225-f77dbe3711fc)
## Table of Contents
- [Installation](#installation)
- [Dataset](#dataset)
- [Data Preprocessing](#data-preprocessing)
- [Model Training](#model-training)
- [Sentiment Analysis & Visualization](#sentiment-analysis--visualization)
- [Project Structure](#project-structure)
## Installation
To run this project, you need to install the required dependencies. First, clone this repository:
```bash
git clone https://github.com/xgagandeep/twitter-sentiment-analysis.git
cd twitter-sentiment-analysis
```
Install the necessary Python packages:
```bash
pip install -r requirements.txt
```
## Dataset
The dataset used for this project consists of tweets stored in `tweets1.csv`. Each tweet includes the following fields:
- `target`: Sentiment labels (0 for negative, 4 for positive; converted to 1 during preprocessing)
- `id`: Unique tweet ID
- `date`: Timestamp of the tweet
- `flag`: Quality flag
- `user`: Username of the tweet author
- `text`: The actual tweet content

After preprocessing, the dataset contains the following columns:
- `target`: Sentiment labels (0 for negative, 1 for positive)
- `user`: Username of the tweet author
- `text`: The actual tweet content
- `DayofWeek`: Day of the week when the tweet was posted
- `date`: Reformatted date of the tweet
## Data Preprocessing
The `Preprocessing.py` script handles data cleaning and preparation:
1. Load the dataset from `tweets1.csv`.
2. Drop unnecessary columns (`id`, `flag`).
3. Randomly sample 100,000 rows.
4. Extract date information (day of the week, month, and year).
5. Convert the positive sentiment label from `4` to `1`.

To run the script:
```bash
python Preprocessing.py
```
This generates the processed dataset `tweets100kfinalf.csv`.
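
The preprocessing steps above can be sketched with the standard library as follows. This is a minimal illustration under stated assumptions, not the contents of `Preprocessing.py`: the header-less CSV layout, the Latin-1 encoding, the raw timestamp format (`Mon Apr 06 22:19:45 PDT 2009`), the ISO output date format, the fixed seed, and the names `load_rows` and `preprocess` are all hypothetical.

```python
import csv
import random
from datetime import datetime

# Assumed raw layout: no header row, columns in the order listed in the Dataset section
RAW_COLUMNS = ["target", "id", "date", "flag", "user", "text"]
# Assumed raw timestamp, e.g. "Mon Apr 06 22:19:45 PDT 2009", with the zone token removed
RAW_FORMAT = "%a %b %d %H:%M:%S %Y"
# 0 (negative) stays 0, 4 (positive) becomes 1
LABEL_MAP = {0: 0, 4: 1}


def load_rows(path):
    """Step 1: read the raw CSV into a list of dicts (the encoding is an assumption)."""
    with open(path, newline="", encoding="latin-1") as f:
        return list(csv.DictReader(f, fieldnames=RAW_COLUMNS))


def preprocess(rows, sample_size=100_000, seed=42):
    """Steps 2-5 applied to raw row dicts; returns the processed rows."""
    # Step 3: random sample, skipped when there are fewer rows than requested
    if len(rows) > sample_size:
        rows = random.Random(seed).sample(rows, sample_size)
    processed = []
    for row in rows:
        # Step 4: drop the timezone token (strptime cannot parse "PDT" portably)
        parts = row["date"].split()
        stamp = datetime.strptime(" ".join(parts[:4] + parts[5:]), RAW_FORMAT)
        processed.append({
            # Step 5: remap labels; an unexpected label raises KeyError
            "target": LABEL_MAP[int(row["target"])],
            "user": row["user"],
            "text": row["text"],
            "DayofWeek": stamp.strftime("%A"),
            # Hypothetical output format: ISO year-month-day (keeps month and year)
            "date": stamp.strftime("%Y-%m-%d"),
        })
    # Step 2 is implicit: `id` and `flag` are never copied to the output
    return processed
```

For example, `preprocess(load_rows("tweets1.csv"))` would return up to 100,000 processed rows. If the script is pandas-based, the same steps map onto `DataFrame.drop(columns=["id", "flag"])`, `DataFrame.sample(n=100_000)`, and `Series.replace(4, 1)`; the README does not say which library it uses.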
## Model Training
The `Model_training.py` script trains a BERT-based sentiment classification model using the processed dataset:
1. Load the pre-trained BERT tokenizer and model.
2. Split the dataset into training (80%) and testing (20%) sets.
3. Tokenize the tweet text and encode it for BERT.
4. Train the model for 3 epochs.
5. Save the trained model to `model50000.pt`.

To run the model training:
```bash
python Model_training.py
```
The trained model will be saved as `model50000.pt`.
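
Step 2 of the training workflow amounts to a shuffled 80/20 split. The sketch below assumes a seeded shuffle; the README does not say whether `Model_training.py` shuffles, stratifies by label, or calls a library helper such as scikit-learn's `train_test_split(..., test_size=0.2)`. The function name `split_80_20` is hypothetical.

```python
import random


def split_80_20(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows with a fixed seed and return (train, test) lists."""
    shuffled = list(rows)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    # The first n_test items form the test set; the remainder is the training set
    return shuffled[n_test:], shuffled[:n_test]
```

The fixed seed makes the split reproducible across runs. Passing `stratify=` to scikit-learn's `train_test_split` would additionally keep the positive/negative ratio identical in both sets.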
## Sentiment Analysis & Visualization
The `model.py` script runs the pre-trained sentiment analysis model and generates real-time interactive visualizations using Streamlit. It provides the following features:
- **Text Sentiment Analysis**: Enter a tweet and predict its sentiment (positive or negative).
- **Sentiment Timeline**: A line chart visualizing the trend of sentiments over time.
- **Analysis on Weekdays**: A bar chart displaying sentiment distributions across weekdays.
- **Word Cloud**: Displays word clouds of negative and positive tweets for the selected day of the week.
- **Top 10 Users**: Pie charts showing users with the highest number of negative and positive tweets.
![image](https://github.com/user-attachments/assets/9b99b39f-eafc-486d-b2e6-87a35d565833)
![image](https://github.com/user-attachments/assets/b68882ba-7433-4cd5-88bd-4f2422047da3)

To run the web app locally:
```bash
streamlit run model.py
```
The app will be accessible at `http://localhost:8501`.
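
Each dashboard chart reduces to a simple aggregation over the processed columns (`target`, `user`, `text`, `DayofWeek`, `date`). The stdlib sketch below shows one way to compute that chart data. The function names are hypothetical, labels are assumed to be the integers 0 and 1, dates are assumed to be sortable strings such as `2009-04-06`, and the charting code `model.py` actually uses is not documented here.

```python
from collections import Counter, defaultdict


def sentiment_timeline(rows):
    """Sentiment Timeline: (date, negative_count, positive_count), sorted by date."""
    counts = defaultdict(lambda: [0, 0])  # index 0 = negative, index 1 = positive
    for row in rows:
        counts[row["date"]][row["target"]] += 1
    return sorted((date, neg, pos) for date, (neg, pos) in counts.items())


def weekday_distribution(rows):
    """Analysis on Weekdays: map each weekday to a Counter of labels."""
    dist = defaultdict(Counter)
    for row in rows:
        dist[row["DayofWeek"]][row["target"]] += 1
    return dict(dist)


def top_users(rows, target, n=10):
    """Top 10 Users: the n users with the most tweets carrying the given label."""
    return Counter(r["user"] for r in rows if r["target"] == target).most_common(n)


def word_frequencies(rows, target, weekday):
    """Word Cloud input: lowercase word counts for one label on one weekday.

    Splits on whitespace only; no stop-word or punctuation removal is applied.
    """
    words = Counter()
    for r in rows:
        if r["target"] == target and r["DayofWeek"] == weekday:
            words.update(w.lower() for w in r["text"].split())
    return words
```

In a Streamlit app, results like these could be passed to `st.line_chart` or `st.bar_chart`, and the word counts to `WordCloud().generate_from_frequencies` from the `wordcloud` package; whether `model.py` uses those exact calls is not stated.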
## Project Structure
```
.
├── Preprocessing.py # Data preprocessing script
├── Model_training.py # Model training script
├── model.py # Streamlit app for sentiment analysis and visualization
├── tweets1.csv # Original dataset (not included: too large to upload to GitHub)
├── tweets100kfinalf.csv # Processed dataset
├── model50000.pt # Trained model, generated by Model_training.py (not included: too large to upload to GitHub)
└── requirements.txt # Python package dependencies
```