https://github.com/safwanshamsir99/sentiment-analysis

Trained on over 60,000 IMDB ratings to categorize positive and negative reviews

bi-lstm-model classification nlp sentiment-analysis

Host: GitHub
URL: https://github.com/safwanshamsir99/sentiment-analysis
Owner: safwanshamsir99
Created: 2022-06-17T03:07:10.000Z (about 4 years ago)
Default Branch: main
Last Pushed: 2023-01-24T14:59:14.000Z (over 3 years ago)
Last Synced: 2025-02-07T11:53:42.326Z (over 1 year ago)
Topics: bi-lstm-model, classification, nlp, sentiment-analysis
Language: Python
Homepage:
Size: 11.9 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

![GitHub](https://img.shields.io/badge/github-%23121011.svg?style=for-the-badge&logo=github&logoColor=white)

![NumPy](https://img.shields.io/badge/numpy-%23013243.svg?style=for-the-badge&logo=numpy&logoColor=white)

![Pandas](https://img.shields.io/badge/pandas-%23150458.svg?style=for-the-badge&logo=pandas&logoColor=white)

![scikit-learn](https://img.shields.io/badge/scikit--learn-%23F7931E.svg?style=for-the-badge&logo=scikit-learn&logoColor=white)



![imdb](static/imdb.png)

# Predictive classification model using Natural Language Processing (NLP) for IMDB movie reviews

A deep learning model trained on over 49,000 IMDB movie reviews to classify whether each review is positive or negative.

## Description

1. The project's objective is to classify IMDB movie reviews by sentiment.

2. The IMDB movie reviews contain an enormous amount of data, which can be used to predict whether a movie review is positive or negative.

3. The dataset contains anomalies such as HTML tags (removed using RegEx), mixed lowercase/uppercase text, and duplicate entries.

4. The deep learning model combines a word embedding layer with a bidirectional LSTM (Bi-LSTM).

5. Several methods could be used to improve the model, such as lemmatization, stemming, CNN layers, n-grams, etc.
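The cleaning steps in item 3 can be sketched with pandas and a regular expression. This is a minimal sketch, not the repository's code: the column names `review` and `sentiment`, the helper name `clean_reviews`, and the exact tag-matching regex are assumptions, since the README does not show them.

```python
import pandas as pd

# Assumption: the raw data has a "review" text column and a "sentiment"
# label column; adjust the names to match the actual CSV.
HTML_TAG = r"<.*?>"


def clean_reviews(df: pd.DataFrame) -> pd.DataFrame:
    """Strip HTML tags, normalise case and whitespace, then drop duplicate reviews."""
    out = df.copy()
    # Replace tags such as <br /> with a space so adjacent words stay separate.
    out["review"] = out["review"].str.replace(HTML_TAG, " ", regex=True)
    # Lowercase so "GREAT" and "great" are treated the same, then collapse whitespace.
    out["review"] = out["review"].str.lower().str.split().str.join(" ")
    # Remove reviews that are identical after cleaning.
    return out.drop_duplicates(subset="review").reset_index(drop=True)


raw = pd.DataFrame({
    "review": [
        "A <br /><br />GREAT movie!",
        "A great movie!",
        "Terrible plot.<br />Skip it.",
    ],
    "sentiment": ["positive", "positive", "negative"],
})
clean = clean_reviews(raw)
print(clean["review"].tolist())
```

Cleaning before deduplication matters here: the first two raw reviews differ only in markup and case, so they are only recognised as duplicates after tags are removed and text is lowercased.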

### Deep learning model architecture

![model_architecture](static/model.png)
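The layer stack listed in the description (word embedding followed by a bidirectional LSTM) can be illustrated with a framework-free NumPy forward pass. This is a hypothetical sketch for intuition only: it is untrained, the single sigmoid output unit is an assumption for binary classification, and the layer sizes and helper names (`run_lstm`, `predict_positive`) are made up. The project's actual model, shown in the image above, may use different layers and sizes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration only; the README does not state them.
VOCAB_SIZE, EMBED_DIM, HIDDEN = 50, 8, 16


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_lstm(input_dim, hidden):
    # One weight matrix covering the four gates (input, forget, candidate, output).
    return {
        "W": rng.normal(0, 0.1, (input_dim + hidden, 4 * hidden)),
        "b": np.zeros(4 * hidden),
    }


def run_lstm(params, xs, hidden):
    """Run an LSTM over xs of shape (T, input_dim) and return the final hidden state."""
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x in xs:
        z = np.concatenate([x, h]) @ params["W"] + params["b"]
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)  # update the cell state
        h = o * np.tanh(c)          # expose part of it as the hidden state
    return h


embedding = rng.normal(0, 0.1, (VOCAB_SIZE, EMBED_DIM))
forward_lstm = init_lstm(EMBED_DIM, HIDDEN)
backward_lstm = init_lstm(EMBED_DIM, HIDDEN)
w_out = rng.normal(0, 0.1, 2 * HIDDEN)
b_out = 0.0


def predict_positive(token_ids):
    """Embedding -> Bi-LSTM -> sigmoid: probability that a review is positive."""
    xs = embedding[token_ids]                          # (T, EMBED_DIM)
    h_fwd = run_lstm(forward_lstm, xs, HIDDEN)         # reads left to right
    h_bwd = run_lstm(backward_lstm, xs[::-1], HIDDEN)  # reads right to left
    features = np.concatenate([h_fwd, h_bwd])          # (2 * HIDDEN,)
    return float(sigmoid(features @ w_out + b_out))


# Token ids are arbitrary; with untrained weights the probability is not meaningful.
p = predict_positive(np.array([3, 17, 42, 5]))
print(f"P(positive) = {p:.3f}")
```

The "bidirectional" part is simply a second LSTM that reads the reversed sequence; concatenating both final states lets the classifier see context from either end of the review.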

## Results

### Training loss & validation loss

![model_loss](static/loss.png)

### Training accuracy & validation accuracy

![model_accuracy](static/accuracy.png)

### Model score

![model_score](static/score_sentiment.PNG)
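A score table like the one above can be produced with scikit-learn, which the project lists among its tools. Whether the author used `classification_report` specifically is an assumption; the labels and probabilities below are toy values for illustration, not the project's predictions, and the 0.5 threshold assumes a sigmoid output.

```python
from sklearn.metrics import accuracy_score, classification_report, f1_score, recall_score

# Toy data only (1 = positive, 0 = negative); NOT results from this project.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1, 0.7, 0.3]

# Convert sigmoid probabilities to class labels with a 0.5 threshold.
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]

print(classification_report(y_true, y_pred, target_names=["negative", "positive"]))

acc = accuracy_score(y_true, y_pred)
rec = recall_score(y_true, y_pred)  # recall of the positive class
f1 = f1_score(y_true, y_pred)       # F1 of the positive class
print(acc, rec, f1)
```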

## Discussion

1. The model achieved 84% accuracy during training. 

2. Both recall and F1 score are 85%.

3. However, the model starts to overfit after the 2nd epoch. Early stopping can be used to prevent overfitting, and the dropout rate can be increased to control it.
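Patience-based early stopping, as suggested in item 3, can be sketched in plain Python. The helper `early_stopping_epoch` and the loss values are hypothetical. If the model is built with Keras (an assumption; the README does not name the framework), `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=..., restore_best_weights=True)` provides the same behavior during `fit`.

```python
def early_stopping_epoch(val_losses, patience=1, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    Training stops once the validation loss has failed to improve by more
    than min_delta for `patience` consecutive epochs; the weights saved at
    best_epoch would then be restored.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0  # new best: reset patience
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch


# Hypothetical validation losses that bottom out at epoch 2, then rise (overfitting).
val_losses = [0.42, 0.35, 0.37, 0.41, 0.46]
stop, best = early_stopping_epoch(val_losses, patience=2)
print(f"stop after epoch {stop}, restore weights from epoch {best}")
```

A patience above 1 tolerates a single noisy epoch before stopping, while restoring the best weights means the extra epochs cost only time, not model quality.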

## Credits

Shout out to @Ankit152 for the IMDB Dataset. Check out the dataset by clicking the link below. :smile:

### Dataset link

[IMDB-Sentiment-Analysis](https://github.com/Ankit152/IMDB-sentiment-analysis)
