https://github.com/ihabbendidi/sentiment_embeddings

A scientific benchmark and comparison of the performance of sentiment analysis models in NLP on small to medium datasets

Host: GitHub
URL: https://github.com/ihabbendidi/sentiment_embeddings
Owner: IhabBendidi
License: mit
Created: 2020-12-07T13:59:09.000Z (over 5 years ago)
Default Branch: main
Last Pushed: 2020-12-14T05:31:32.000Z (over 5 years ago)
Last Synced: 2025-08-16T13:01:15.566Z (11 months ago)
Topics: 3d-visualization, benchmark, bert, colab, doc2vec, embedding-evaluation, keras, logistic-regression, lstm, nlp, notebook, python, pytorch, sentiment-analysis, sentiment-embeddings, textblob, twitter-data, visualization, word2vec
Language: Jupyter Notebook
Homepage:
Size: 54 MB
Stars: 13
Watchers: 1
Forks: 3
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# Sentiment Analysis Benchmark
## A scientific benchmark and comparison of the performance of sentiment analysis models in NLP on small to medium datasets

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/IhabBendidi/sentiment_embeddings/blob/main/sentiment_embeddings.ipynb)
[![GitHub license](https://img.shields.io/github/license/IhabBendidi/sentiment_embeddings.svg)](https://github.com/IhabBendidi/sentiment_embeddings/blob/main/LICENSE)

**Authors:** *Ihab Bendidi*, *Yousra Bourkiche*, *Clément Siegrist*, *Kaouter Berrahal*

Documents that express similar sentiments tend to lie close to each other in the embedding feature space. How well a model's embeddings group documents by sentiment can therefore serve as another way to judge the performance of sentiment analysis models.
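To make this idea concrete, the snippet below scores how well a set of embeddings separates sentiments by comparing the average cosine similarity of same-label document pairs with that of different-label pairs. It is a minimal sketch under our own assumptions: `sentiment_separation` is a hypothetical helper, not a function from the notebook, and the toy 2-D vectors stand in for real document embeddings such as doc2vec vectors.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def sentiment_separation(embeddings, labels):
    """Mean same-label cosine similarity minus mean different-label similarity.

    Hypothetical helper. A larger value means documents sharing a sentiment
    label sit closer together than documents with different labels.
    Requires at least one same-label pair and one different-label pair.
    """
    within, between = [], []
    for i, j in combinations(range(len(embeddings)), 2):
        sim = cosine(embeddings[i], embeddings[j])
        (within if labels[i] == labels[j] else between).append(sim)
    return sum(within) / len(within) - sum(between) / len(between)


# Toy "embeddings": two positive and two negative documents.
vectors = [[1.0, 0.1], [0.9, 0.2], [-1.0, 0.1], [-0.8, -0.1]]
labels = ["pos", "pos", "neg", "neg"]
print(round(sentiment_separation(vectors, labels), 3))
```

Cosine similarity is used here because document embeddings are usually compared by direction rather than magnitude; a clustering score such as the silhouette coefficient would be a reasonable alternative.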

In this work, we benchmark recent sentiment analysis models, reproduce their results, and compare their performance against baseline methods.
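A comparison against a baseline can be sketched with two standard classification metrics, accuracy and macro-averaged F1. This is an illustration under our own assumptions: the exact metrics used in the notebook are not stated here, `compare` and the other helpers are hypothetical, and the predictions in the example are made up rather than results from this benchmark.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


def compare(y_true, predictions, baseline):
    """Score every model and report its accuracy gain over the baseline model."""
    base_acc = accuracy(y_true, predictions[baseline])
    report = {}
    for name, y_pred in predictions.items():
        acc = accuracy(y_true, y_pred)
        report[name] = {
            "accuracy": acc,
            "macro_f1": macro_f1(y_true, y_pred),
            "gain_vs_baseline": acc - base_acc,
        }
    return report


# Made-up labels and predictions, for illustration only.
gold = ["pos", "neg", "neu", "pos"]
predictions = {
    "textblob": ["pos", "pos", "neu", "neg"],
    "lstm": ["pos", "neg", "neu", "neg"],
}
for model, scores in compare(gold, predictions, baseline="textblob").items():
    print(model, scores)
```

Macro-F1 is reported alongside accuracy because it weights every sentiment class equally, which matters when small datasets are imbalanced between positive, negative and neutral examples.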

## Outline

The work below was carried out in a Jupyter notebook, which you can find [here](https://github.com/IhabBendidi/sentiment_embeddings/blob/main/sentiment_embeddings.ipynb), or open in Colab [here](https://colab.research.google.com/github/IhabBendidi/sentiment_embeddings/blob/main/sentiment_embeddings.ipynb).

**I - Processing & Exploratory Data Analysis**
- *Understanding the data*
- *Text Preprocessing*

**II - Sentiment classification models**
- *BERT model*
- *LSTM recurrent model*
- *Baseline method: TextBlob*

**III - Document Embeddings**
- *Training doc2vec*
- *Doc2vec sentiment classifier*

**IV - Model performance visualisation**
- *BERT model*
- *LSTM model*
- *Logreg model*
- *TextBlob*
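As an example of the kind of step covered by *Text Preprocessing*, the sketch below normalizes a raw tweet into lowercase word tokens. The specific cleaning rules (dropping links, `@` mentions, the `#` sign, digits and punctuation) are our assumptions for illustration, not the notebook's documented pipeline, and `preprocess_tweet` is a hypothetical name.

```python
import re

# Hypothetical cleaning rules for tweets; adjust to the dataset at hand.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")
SPACES_RE = re.compile(r"\s+")


def preprocess_tweet(text):
    """Normalize a raw tweet into a list of lowercase word tokens."""
    text = text.lower()
    text = URL_RE.sub(" ", text)         # drop links
    text = MENTION_RE.sub(" ", text)     # drop @user mentions
    text = text.replace("#", " ")        # keep hashtag words, drop the '#'
    text = NON_LETTER_RE.sub(" ", text)  # drop digits, punctuation and emoji
    return SPACES_RE.sub(" ", text).strip().split()


print(preprocess_tweet("Loving the new update!!! @dev_team https://t.co/xyz #happy"))
# ['loving', 'the', 'new', 'update', 'happy']
```

Rules this aggressive also split contractions ("don't" becomes "don", "t") and discard emoji that may carry sentiment, so they may suit the word-level models better than BERT, which ships with its own tokenizer.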

A `.pdf` report including the code is also available [here](https://github.com/IhabBendidi/sentiment_embeddings/blob/main/sentiment_embeddings.pdf).

### Installation

The code was tested on Ubuntu 20.04 with Python 3.7. It should also run on other operating systems and Python 3 versions, although only this setup was tested.

Before running the notebook, install the dependencies by running the following in a terminal:

```
pip install -r requirements.txt
```

On Google Colab, you need to upload the `requirements.txt` file and the `tweets.csv` dataset to your Colab session.
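A quick way to confirm that both files are in place before running the notebook is sketched below. `missing_files` is a hypothetical helper, not part of the repository, and it assumes the files are expected in the current working directory.

```python
from pathlib import Path

# Files the notebook needs in the working directory (assumed location).
REQUIRED_FILES = ("requirements.txt", "tweets.csv")


def missing_files(directory=".", required=REQUIRED_FILES):
    """Return the required files that are not present in `directory`."""
    return [name for name in required if not (Path(directory) / name).is_file()]


missing = missing_files()
if missing:
    # In a Colab session, google.colab.files.upload() opens a browser
    # dialog for uploading local files into the working directory.
    print("Upload these files before running the notebook:", ", ".join(missing))
```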
