https://github.com/ihabbendidi/sentiment_embeddings
A scientific benchmark and comparison of the performance of sentiment analysis models in NLP on small to medium datasets
- Host: GitHub
- URL: https://github.com/ihabbendidi/sentiment_embeddings
- Owner: IhabBendidi
- License: mit
- Created: 2020-12-07T13:59:09.000Z (over 5 years ago)
- Default Branch: main
- Last Pushed: 2020-12-14T05:31:32.000Z (over 5 years ago)
- Last Synced: 2025-08-16T13:01:15.566Z (8 months ago)
- Topics: 3d-visualization, benchmark, bert, colab, doc2vec, embedding-evaluation, keras, logistic-regression, lstm, nlp, notebook, python, pytorch, sentiment-analysis, sentiment-embeddings, textblob, twitter-data, visualization, word2vec
- Language: Jupyter Notebook
- Size: 54 MB
- Stars: 13
- Watchers: 1
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Sentiment Analysis Benchmark
## A scientific benchmark and comparison of the performance of sentiment analysis models in NLP on small to medium datasets
[Open in Colab](https://colab.research.google.com/github/IhabBendidi/sentiment_embeddings/blob/main/sentiment_embeddings.ipynb)
[License: MIT](https://github.com/IhabBendidi/sentiment_embeddings/blob/master/LICENSE)
**Authors :** *Ihab Bendidi*, *Yousra Bourkiche*, *Clément Siegrist*, *Kaouter Berrahal*
In general, documents with similar sentiments should lie close to each other in the embedding feature space. How well a model's embeddings group documents by sentiment can therefore serve as an additional way to judge the performance of sentiment analysis models.
In this work, we benchmark recent sentiment analysis models, reproduce their results, and compare their performance against baseline methods.
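The idea that same-sentiment documents cluster in embedding space can be checked numerically. The sketch below is only an illustration and is not taken from the notebook: the function names, the toy 3-dimensional vectors, and the labels are all hypothetical. It compares the mean cosine similarity of document pairs that share a label with that of pairs that do not; a larger gap suggests the embeddings separate sentiments better.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def sentiment_separation(embeddings, labels):
    """Return (mean same-label similarity, mean different-label similarity).

    Assumes at least one same-label pair and one different-label pair.
    """
    same, different = [], []
    for i, j in combinations(range(len(embeddings)), 2):
        sim = cosine_similarity(embeddings[i], embeddings[j])
        if labels[i] == labels[j]:
            same.append(sim)
        else:
            different.append(sim)
    return sum(same) / len(same), sum(different) / len(different)


# Hypothetical toy data: made-up document embeddings and sentiment labels.
toy_embeddings = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.0],
    [0.0, 0.8, 0.2],
]
toy_labels = ["positive", "positive", "negative", "negative"]

within, between = sentiment_separation(toy_embeddings, toy_labels)
print(f"within-class: {within:.3f}, between-class: {between:.3f}")
```

With real models, `toy_embeddings` would be replaced by the document vectors each model produces (for example from doc2vec or a BERT encoder), which makes the same comparison usable across models.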
## Outline
The work is presented in a Jupyter notebook, which you can find [here](https://github.com/IhabBendidi/sentiment_embeddings/blob/main/sentiment_embeddings.ipynb), or open in Colab [here](https://colab.research.google.com/github/IhabBendidi/sentiment_embeddings/blob/main/sentiment_embeddings.ipynb).
**I - Processing & Exploratory Data Analysis**
- *Understanding the data*
- *Text Preprocessing*
**II - Sentiment classification models**
- *BERT model*
- *LSTM recurrent model*
- *Baseline method: TextBlob*
**III - Document Embeddings**
- *Training doc2vec*
- *Doc2vec sentiment classifier*
**IV - Model performance visualisation**
- *BERT model*
- *LSTM model*
- *Logistic regression model*
- *TextBlob*
You can also find a `.pdf` report with the code [here](https://github.com/IhabBendidi/sentiment_embeddings/blob/main/sentiment_embeddings.pdf).
### Installation
This was tested on Ubuntu 20.04 with Python 3.7, and is expected to run on other platforms and Python 3 versions.
Before running the notebook, install the dependencies by running the following in a terminal:
```
pip install -r requirements.txt
```
On Google Colab, you need to upload the `requirements.txt` file and the `tweets.csv` dataset to your Colab session before running the notebook.