Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/safwanshamsir99/sentiment-analysis

Trained over 60,000 IMDB rating to categorize positive and negative review
https://github.com/safwanshamsir99/sentiment-analysis

Last synced: 9 days ago
JSON representation

Trained over 60,000 IMDB rating to categorize positive and negative review

Awesome Lists containing this project

README

        

python
spyder
![GitHub](https://img.shields.io/badge/github-%23121011.svg?style=for-the-badge&logo=github&logoColor=white)
![NumPy](https://img.shields.io/badge/numpy-%23013243.svg?style=for-the-badge&logo=numpy&logoColor=white)
![Pandas](https://img.shields.io/badge/pandas-%23150458.svg?style=for-the-badge&logo=pandas&logoColor=white)
![scikit-learn](https://img.shields.io/badge/scikit--learn-%23F7931E.svg?style=for-the-badge&logo=scikit-learn&logoColor=white)
tf

![model_loss](static/imdb.png)

# Predictive classification model using Natural Language Processing (NLP) for IMDB movie rating.
Using deep learning model to train over 49,000 IMDB rating dataset to categorize either the review is positive and negative.

## Description
1. The project's objective is to categorize the IMDB movies rating.
2. The IMDB movie reviews contain enormous amount of data, which can be used to predict whether the movie review is a negative or positive review.
3. The dataset contains anomalies such as HTML tags (removed using RegEx), lowercase/uppercase, and duplicates data.
4. The method used for the deep learning model are word embedding, LSTM and Bidirectional.
5. Several method can be used to improve the model such as lemmatization, stemming, CNN, n-grams, etc.

### Deep learning model images
![model_architecture](static/model.png)

## Results

### Training loss & Validation loss:

![model_loss](static/loss.png)

### Training accuracy & Validation accuracy:

![model_accuracy](static/accuracy.png)

### Model score:

![model_score](static/score_sentiment.PNG)

## Discussion
1. The model achieved 84% accuracy during training.
2. Both recall and f1 score report 85%.
3. However, the model starts to overfit after 2nd epochs. Early stopping can be used to prevent overfitting. The dropout data can be increased to control overfitting.

## Credits:
Shout out to @Ankit152 for the IMDB Dataset. Check out the dataset by clicking the link below. :smile:
### Dataset link
[IMDB-Sentiment-Analysis](https://github.com/Ankit152/IMDB-sentiment-analysis)