https://github.com/justsecret123/twitter-sentiment-analysis

A sentiment analysis model trained with Kaggle GPU on 1.6M examples, used to make inferences on 220k tweets about Messi and draw insights from their results.
Host: GitHub
URL: https://github.com/justsecret123/twitter-sentiment-analysis
Owner: Justsecret123
License: gpl-3.0
Created: 2021-09-21T19:08:46.000Z (almost 5 years ago)
Default Branch: main
Last Pushed: 2022-05-22T10:21:17.000Z (about 4 years ago)
Last Synced: 2025-03-21T13:44:16.147Z (over 1 year ago)
Topics: classification, data-analysis, data-science, deep-learning, deep-neural-networks, docker, glove-embeddings, kaggle, lstm, lstm-neural-networks, machine-learning, natural-language-processing, nlp, python, rnn, scikit-learn, sentiment-analysis, sentiment-classification, tensorflow, word-embeddings
Language: Jupyter Notebook
Homepage:
Size: 24.2 MB
Stars: 2
Watchers: 1
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md


# Twitter-sentiment-analysis ![Language_support](https://img.shields.io/pypi/pyversions/Tensorflow) ![Last_commit](https://img.shields.io/github/last-commit/JustSecret123/Human-pose-estimation) ![Workflow](https://img.shields.io/github/workflow/status/JustSecret123/Human-pose-estimation/Pylint/main) ![Tensorflow_version](https://img.shields.io/badge/Tensorflow%20version-2.6.2-orange)

A sentiment analysis model trained on a Kaggle GPU with the Sentiment140 dataset (1.6 million tweets).

> **Deployed on my personal Docker Hub repository:** [*Click here*](https://hub.docker.com/repository/docker/ibrahimserouis/my-tensorflow-models)

> **Kaggle notebook link:** [Kaggle notebook](https://www.kaggle.com/ibrahimserouis99/twitter-sentiment-analysis)



  



# Dataset (Sentiment140+GloVe)

- Train/test split : 90% / 10% 

- Size : 1.6M samples 

- Link : [Dataset](https://www.kaggle.com/ibrahimserouis99/twitter-sentiment-analysis-and-word-embeddings)
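
The 90% / 10% split above works out to 1,440,000 training samples and 160,000 test samples. The sketch below illustrates that arithmetic with a shuffled index split. It uses only the standard library, so it is not the code from the training notebook (which relies on Scikit-learn for splitting); the function name `split_indices` and the seed are assumptions made for illustration.

```python
import random


def split_indices(n_samples, test_fraction=0.10, seed=42):
    """Shuffle sample indices and split them into (train, test) lists.

    Hypothetical helper: the seed and the function name are illustrative only.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # deterministic shuffle for a given seed
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


# Sentiment140 contains 1.6M labelled tweets.
train_idx, test_idx = split_indices(1_600_000)
print(len(train_idx), len(test_idx))
```
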

# Model

- Model type : Sequential, RNN, Binary classification

- Optimizer : Adam

- Loss function : Binary cross entropy 

- Outputs : Sentiment score in the range [0, 1]

- Threshold (fine-tuned) : score >= 0.625 -> "Positive", score < 0.625 -> "Negative"

- Best validation accuracy : 83%

- F1-score :  0.8340

- Version : 4
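
The threshold rule in the list above can be sketched as a small helper that maps a sentiment score to a label. Only the 0.625 cut-off comes from this README; the function name `label_from_score` and the range check are assumptions.

```python
POSITIVE_THRESHOLD = 0.625  # fine-tuned cut-off quoted in this README


def label_from_score(score, threshold=POSITIVE_THRESHOLD):
    """Map a sentiment score in [0, 1] to "Positive" or "Negative".

    Hypothetical helper, not taken from the repository's scripts.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    return "Positive" if score >= threshold else "Negative"


# A score exactly at the threshold counts as positive.
labels = [label_from_score(s) for s in (0.10, 0.624, 0.625, 0.93)]
print(labels)
```

Because the cut-off sits above 0.5, scores between 0.5 and 0.625 are labelled "Negative" even though the model leans slightly positive on them.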

| Metric | Score |
|--------|-------|
| Precision | **Negative:** 0.84; **Positive:** 0.82 |
| Recall | **Negative:** 0.82; **Positive:** 0.84 |
| F1-score | **Negative:** 0.83; **Positive:** 0.83 |
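
As a quick consistency check, the per-class F1-scores in the table follow from the reported precision and recall through the harmonic mean, F1 = 2PR / (P + R). The snippet below recomputes them from the rounded table values only, so it is a sanity check rather than a re-evaluation of the model; the names are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the table above.
reported = {"Negative": (0.84, 0.82), "Positive": (0.82, 0.84)}

f1_by_class = {
    label: round(f1_score(p, r), 2) for label, (p, r) in reported.items()
}
print(f1_by_class)
```
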

# Training 

- Training epochs : 50 planned; training stopped after 22 epochs because of early stopping (patience = 10)

- Training environment : Kaggle GPU
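
To illustrate how a patience of 10 can end a 50-epoch run at epoch 22, here is a minimal, framework-free sketch of the patience rule. It assumes validation loss is the monitored quantity (the README does not say which metric was used) and runs on a synthetic loss curve, so it mirrors the general logic of an early-stopping callback such as `tf.keras.callbacks.EarlyStopping`, not the notebook's actual training history.

```python
def early_stopping_epoch(val_losses, patience=10):
    """Return the 1-based epoch at which training stops.

    Training stops once `patience` consecutive epochs fail to improve on the
    best validation loss seen so far. If that never happens, the last epoch
    is returned.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)


# Synthetic curve (assumption): the loss improves for 12 epochs, then degrades.
val_losses = [1.0 - 0.05 * e for e in range(12)] + [0.5 + 0.01 * e for e in range(38)]

stop_epoch = early_stopping_epoch(val_losses, patience=10)
print(stop_epoch)
```

Under this rule, stopping at epoch 22 with a patience of 10 means the last improvement happened at epoch 12.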

## Architecture

![Model_architecture](Screenshots/Model%20architecture.png)

# Inferences (with TensorFlow Serving REST API)

![Inference example](Screenshots/Inference%20example.PNG)
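
For readers who cannot view the screenshot, the sketch below shows how a prediction request to a TensorFlow Serving REST endpoint is generally shaped: a POST to `/v1/models/<model_name>:predict` with an `instances` array in the JSON body. The model name `sentiment_model`, the host, and the token-id input are assumptions; the payload the deployed model actually expects is defined in the repository's test script.

```python
import json
import urllib.request


def build_predict_request(instances, model_name="sentiment_model",
                          host="localhost", port=8501):
    """Build (but do not send) a TensorFlow Serving REST predict request.

    `model_name`, `host` and `port` are placeholders; 8501 is the default
    REST port of TensorFlow Serving.
    """
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(instances, **kwargs):
    """Send the request and return the "predictions" field of the response."""
    request = build_predict_request(instances, **kwargs)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read())["predictions"]


# Assumed input: one already-tokenized, padded tweet (made-up token ids).
request = build_predict_request([[12, 48, 7, 0, 0]])
print(request.full_url)
```

Calling `predict(...)` requires a running TensorFlow Serving container; building the request does not.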

# Some results using Power BI + Python

## Positive tweets

![Positives](Results/positive_messi.gif)

## Negative tweets 

![Negatives](Results/negative_messi.gif)

## Data by country (when available)

![Country](Results/country_messi.gif)

# Useful scripts and notebooks

## Notebooks 

> [Training notebook](Notebook/twitter-sentiment-analysis.ipynb)

> [How inferences were made on our dataset](Notebook/custom-nlp-classifier-on-football-tweets.ipynb)

> [Data cleaning notebook](Notebook/data-cleaning-messi-and-ronaldo-tweets.ipynb)

> [Data exploration notebook](Notebook/explore-tweets-about-messi-and-ronaldo.ipynb)

## Scripts

> [Link to the TensorFlow Serving script](Scripts/test_the_model.py)

> **There's also a useful script (command line runner) that converts .h5 models to the TF SavedModel format:** [here](Scripts/h5_to_savedmodel.py)

> ![Args](Screenshots/clr_args.PNG)

# Data collection (tweets about Messi and Ronaldo)

- Collected using the Twitter API 

- Scripts for searching and saving 100*n tweets containing a keyword : [Tweets about Messi](Scripts/search_n_times_100_messi_tweets.py) & [Tweets about Ronaldo](Scripts/search_n_times_100_ronaldo_tweets.py)

> **NOTE: Running these scripts requires a Twitter developer account and a bearer token, either stored in a text file whose path is set manually in the code or exported as an environment variable.**
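
A minimal sketch of the two ways of supplying the bearer token described in the note, plus the shape of a paginated search request. It assumes the Twitter API v2 recent search endpoint, which returns at most 100 tweets per page (in line with the 100*n batches); the README does not say which endpoint the scripts call, and the environment variable name `BEARER_TOKEN` and all helper names are hypothetical.

```python
import os
from pathlib import Path
from urllib.parse import urlencode

# Assumed endpoint: Twitter API v2 recent search.
SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"


def load_bearer_token(env_var="BEARER_TOKEN", token_file=None):
    """Read the token from an environment variable, else from a text file."""
    token = os.environ.get(env_var)
    if token:
        return token.strip()
    if token_file is not None:
        return Path(token_file).read_text(encoding="utf-8").strip()
    raise RuntimeError(f"Set {env_var} or pass the path of a token file")


def auth_headers(token):
    """Authorization header expected for bearer-token authentication."""
    return {"Authorization": f"Bearer {token}"}


def build_search_url(keyword, next_token=None):
    """URL for one page of up to 100 tweets; pass next_token for later pages."""
    params = {"query": keyword, "max_results": 100}
    if next_token:
        params["next_token"] = next_token
    return f"{SEARCH_URL}?{urlencode(params)}"


url = build_search_url("Messi")
print(url)
```

Fetching n pages and following the `next_token` from each response yields up to 100*n tweets for the keyword.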

# Libraries

- **Deep Learning Framework :** Tensorflow 2.6 or higher 

- **Data visualization :** Pandas, Seaborn, Matplotlib

- **Regular expressions builder :** re 

- **NLP library :** NLTK

- **Train/test splitting, classification_report :** Scikit-learn
