Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/skullick/movie-review-rating
Movie review binary sentiment classification using RandomForest and TF-IDF
jupyter-notebook random-forest sentiment-analysis sklearn tf-idf
Last synced: 12 days ago
- Host: GitHub
- URL: https://github.com/skullick/movie-review-rating
- Owner: skullick
- Created: 2024-08-28T02:31:32.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2024-08-28T02:47:17.000Z (3 months ago)
- Last Synced: 2024-11-03T04:02:49.574Z (12 days ago)
- Topics: jupyter-notebook, random-forest, sentiment-analysis, sklearn, tf-idf
- Language: Jupyter Notebook
- Homepage:
- Size: 30.3 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Movie Review Rating
This project is a sentiment analysis application that uses a Random Forest classifier to predict the sentiment (positive or negative) of movie reviews from the IMDB dataset. The project includes data preprocessing, feature extraction using TF-IDF vectorization, model training, hyperparameter tuning, and model evaluation.
## Overview
The aim of this project is to develop a machine learning model capable of identifying the sentiment expressed in movie reviews as either positive or negative. This can help businesses and individuals understand customer feedback and make informed decisions based on public sentiment.
## Dataset
The project uses the [IMDB Movie Reviews dataset](https://ai.stanford.edu/~amaas/data/sentiment/), which consists of 50,000 movie reviews labeled as either 'positive' or 'negative'. The dataset is split evenly into 25,000 training and 25,000 testing samples.
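A minimal sketch of reading the reviews from disk, assuming the archive from the link above has been extracted to a local `aclImdb/` directory whose `train/` and `test/` folders each contain `pos/` and `neg/` subfolders of `.txt` files. The function name `load_split` and the local path are illustrative and not taken from the repository, whose notebook may load the data differently.

```python
from pathlib import Path


def load_split(root, split):
    """Read one split ('train' or 'test') of an extracted aclImdb directory.

    Returns (texts, labels), where each label is 'positive' or 'negative'.
    """
    texts, labels = [], []
    for folder, label in (("pos", "positive"), ("neg", "negative")):
        # Sort the paths so the ordering is reproducible across runs.
        for path in sorted((Path(root) / split / folder).glob("*.txt")):
            texts.append(path.read_text(encoding="utf-8"))
            labels.append(label)
    return texts, labels
```

For example, `train_texts, train_labels = load_split("aclImdb", "train")` would return the training reviews with their labels, provided the directory exists locally.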
## Features
- **Data Preprocessing**: Removes duplicates, cleans HTML tags, removes stopwords...
- **TF-IDF Vectorization**: Converts text data into numerical feature vectors using TF-IDF.
- **Model Training**: Trains a model for binary sentiment classification using RandomForestClassifier.
- **Hyperparameter Tuning**: Optimizes model parameters using GridSearchCV.
- **Model Evaluation**: Reports performance metrics and visualizes the results with a confusion matrix and a feature importance plot.