https://github.com/justsecret123/twitter-sentiment-analysis
A sentiment analysis model trained with Kaggle GPU on 1.6M examples, used to make inferences on 220k tweets about Messi and draw insights from their results.
- Host: GitHub
- URL: https://github.com/justsecret123/twitter-sentiment-analysis
- Owner: Justsecret123
- License: gpl-3.0
- Created: 2021-09-21T19:08:46.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2022-05-22T10:21:17.000Z (about 4 years ago)
- Last Synced: 2025-03-21T13:44:16.147Z (about 1 year ago)
- Topics: classification, data-analysis, data-science, deep-learning, deep-neural-networks, docker, glove-embeddings, kaggle, lstm, lstm-neural-networks, machine-learning, natural-language-processing, nlp, python, rnn, scikit-learn, sentiment-analysis, sentiment-classification, tensorflow, word-embeddings
- Language: Jupyter Notebook
- Homepage:
- Size: 24.2 MB
- Stars: 2
- Watchers: 1
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
# Twitter-sentiment-analysis    
A sentiment analysis model trained on a Kaggle GPU with the Sentiment140 dataset (1.6 million tweets).
> **Deployed on my personal Docker Hub repository:** [*Click here*](https://hub.docker.com/repository/docker/ibrahimserouis/my-tensorflow-models)
>
> **Kaggle notebook link:** [Kaggle notebook](https://www.kaggle.com/ibrahimserouis99/twitter-sentiment-analysis)
# Dataset (Sentiment140+GloVe)
- Train/test split : 90% / 10%
- Size : 1.6M samples
- Link : [Dataset](https://www.kaggle.com/ibrahimserouis99/twitter-sentiment-analysis-and-word-embeddings)
# Model
- Model type : Sequential, RNN, Binary classification
- Optimizer : Adam
- Loss function : Binary cross entropy
- Outputs : Sentiment score in [0, 1]
- Thresholds (fine-tuned) : score >= 0.625 -> "Positive", score < 0.625 -> "Negative"
- Best validation accuracy : 83%
- F1-score : 0.8340
- Version : 4

| Metric | Negative | Positive |
|--------|----------|----------|
| Precision | 0.84 | 0.82 |
| Recall | 0.82 | 0.84 |
| F1-score | 0.83 | 0.83 |
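
The fine-tuned threshold listed in the model summary can be applied to a raw score as in the minimal sketch below. The function name `label_sentiment` and the sample scores are illustrative assumptions, not taken from the project's scripts.

```python
POSITIVE_THRESHOLD = 0.625  # fine-tuned threshold from the model summary


def label_sentiment(score: float, threshold: float = POSITIVE_THRESHOLD) -> str:
    """Map a sentiment score in [0, 1] to a class label (hypothetical helper)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    # Scores at or above the threshold count as positive, everything else as negative.
    return "Positive" if score >= threshold else "Negative"


# Illustrative scores only, not real model outputs.
for sample_score in (0.10, 0.624, 0.625, 0.97):
    print(f"{sample_score:.3f} -> {label_sentiment(sample_score)}")
```

Because the threshold sits above 0.5, a tweet needs a fairly confident score before it is labelled "Positive".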
# Training
- Training epochs : 50 configured, but training stopped after 22 epochs because of early stopping (patience = 10)
- Training environment : Kaggle GPU
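
To illustrate how a patience of 10 can end a 50-epoch run at epoch 22, here is a minimal, framework-free sketch of patience-based early stopping. It assumes the monitored quantity is the validation loss (the README does not say which metric was monitored), and the loss values are made up for illustration.

```python
def stopping_epoch(val_losses, patience=10):
    """Return the 1-based epoch at which training stops, or None if it never stops early."""
    best = float("inf")
    wait = 0  # number of consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Hypothetical curve: the loss improves for 12 epochs, then plateaus at a worse value.
hypothetical_losses = [1.0 - 0.05 * i for i in range(12)] + [0.6] * 38
print(stopping_epoch(hypothetical_losses, patience=10))
```

Under this rule, stopping at epoch 22 with a patience of 10 would mean the last improvement happened around epoch 12.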
## Architecture

# Inferences (with TensorFlow Serving REST API)
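
As a rough sketch of what a call to the TensorFlow Serving REST API looks like, the snippet below builds a `:predict` request using only the standard library. The model name `sentiment_model`, the host, and the shape of `instances` (padded token-id sequences) are assumptions made for illustration; the actual values depend on how the model was exported and served, and `Scripts/test_the_model.py` remains the reference.

```python
import json
import urllib.request


def build_predict_request(instances, model_name="sentiment_model",
                          host="localhost", port=8501, version=None):
    """Build a POST request for TensorFlow Serving's REST predict endpoint."""
    path = f"/v1/models/{model_name}"
    if version is not None:
        path += f"/versions/{version}"
    url = f"http://{host}:{port}{path}:predict"
    # TensorFlow Serving accepts the row format: {"instances": [...]}.
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def predict(instances, **kwargs):
    """Send the request and return the "predictions" field of the JSON response."""
    request = build_predict_request(instances, **kwargs)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["predictions"]


# Building the request does not touch the network; the token ids are placeholders.
example = build_predict_request([[12, 7, 431, 0, 0]], version=4)
print(example.full_url)
```

Calling `predict(...)` only works while a TensorFlow Serving container is running and exposing its REST port (8501 by default).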

# Some results using Power BI + Python
## Positive tweets

## Negative tweets

## Data by country (when available)

# Useful scripts and notebooks
## Notebooks
> [Training notebook](Notebook/twitter-sentiment-analysis.ipynb)
> [How inferences were made on our dataset](Notebook/custom-nlp-classifier-on-football-tweets.ipynb)
> [Data cleaning notebook](Notebook/data-cleaning-messi-and-ronaldo-tweets.ipynb)
> [Data exploration notebook](Notebook/explore-tweets-about-messi-and-ronaldo.ipynb)
## Scripts
> [Link to the TensorFlow Serving script](Scripts/test_the_model.py)
>
> **There's also a useful command-line script that converts .h5 models to the TF SavedModel format:** [here](Scripts/h5_to_savedmodel.py)
# Data collection (tweets about Messi and Ronaldo)
- Collected using the Twitter API
- Scripts that search for and save 100*n tweets containing a keyword (n searches of 100 tweets each) : [Tweets about Messi](Scripts/search_n_times_100_messi_tweets.py) & [Tweets about Ronaldo](Scripts/search_n_times_100_ronaldo_tweets.py)
> **NOTE: Running these scripts requires a Twitter developer account and a bearer token, either stored in a text file whose path is set manually in the code or exported as an environment variable.**
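
The two ways of supplying the token described in the note can be handled by one small loader, sketched below. The environment variable name `BEARER_TOKEN` and the helper names are assumptions for illustration; the collection scripts may use different ones.

```python
import os
from pathlib import Path


def load_bearer_token(env_var="BEARER_TOKEN", token_file=None):
    """Return the token from the environment, falling back to a text file."""
    token = os.environ.get(env_var, "").strip()
    if token:
        return token
    if token_file is not None:
        token = Path(token_file).read_text(encoding="utf-8").strip()
        if token:
            return token
    raise RuntimeError(
        f"No bearer token found: set {env_var} or pass the path to a token file"
    )


def auth_headers(token):
    """Build the Authorization header used for bearer-token authentication."""
    return {"Authorization": f"Bearer {token}"}
```

Reading the environment variable first keeps the token out of the source tree; the file path is only a fallback.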
# Libraries
- **Deep Learning Framework :** TensorFlow 2.6 or higher
- **Data manipulation and visualization :** Pandas, Seaborn, Matplotlib
- **Regular expressions :** re (Python standard library)
- **NLP library :** NLTK
- **Train/test splitting, classification_report :** Scikit-learn