https://github.com/chintan45/amazon-product-review-classification

A simple classifier to predict the sentiments of Amazon product reviews.

machine-learning nlp nltk sentiment-classification

Last synced: 12 days ago

Host: GitHub
URL: https://github.com/chintan45/amazon-product-review-classification
Owner: Chintan45
Created: 2024-04-09T06:02:59.000Z (9 months ago)
Default Branch: main
Last Pushed: 2024-04-09T07:16:34.000Z (9 months ago)
Last Synced: 2024-11-06T03:46:17.282Z (2 months ago)
Topics: machine-learning, nlp, nltk, sentiment-classification
Language: Jupyter Notebook
Homepage:
Size: 22 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Amazon Product Review Classification
- This project develops a machine learning classifier that predicts the sentiment of Amazon product reviews.
- Five algorithms are compared: Naive Bayes, SVM, Logistic Regression, XGBoost, and Random Forest. The best model, SVM, reaches 89% accuracy and an 87% F1-score.
- A TF-IDF vectorizer extracts features from the review text so that large volumes of text can be classified.
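
To make the TF-IDF step concrete, here is a minimal sketch using scikit-learn's `TfidfVectorizer`. The three reviews are invented for illustration and the vectorizer runs with default settings; the notebook's actual parameters are not stated in this README, so treat the configuration as an assumption.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy reviews, invented for illustration only (not from the dataset).
reviews = [
    "great shoes, very comfortable",
    "great price but poor quality",
    "poor fit, returned the shoes",
]

# Default settings are an assumption; the project may tune these.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(reviews)  # sparse matrix, one row per review

vocab = vectorizer.vocabulary_  # maps each term to its column index
idf = vectorizer.idf_           # inverse document frequency per column

print("matrix shape:", X.shape)
# A term that appears in more reviews gets a lower IDF weight.
print("idf(great):", idf[vocab["great"]])
print("idf(comfortable):", idf[vocab["comfortable"]])
```

Terms shared by many reviews ("great") are down-weighted relative to terms that occur in only one review ("comfortable"), which is what lets a linear classifier focus on the more distinctive words.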

### Project Overview

- **Objective**: To accurately classify Amazon product reviews using machine learning algorithms.
- **Algorithms Used**: Naive Bayes, SVM, Logistic Regression, XGBoost, and Random Forest.
- **Feature Extraction**: TF-IDF vectorization.
- **Validation Method**: 5-fold cross-validation.
- **Dataset**: 220,909 Amazon product reviews ([AMAZON_FASHION_sample.json.gz](/AMAZON_FASHION_sample.json.gz)).
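
The setup listed above (TF-IDF features, several classifiers, 5-fold cross-validation) could be wired together roughly as follows. This is a sketch under assumptions, not the notebook's code: the toy reviews and labels are invented, `LinearSVC` stands in for "SVM" because the kernel is not stated, all hyperparameters are defaults or arbitrary, and XGBoost is left out only to keep the example dependent on scikit-learn alone.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented toy data: 10 positive and 10 negative reviews.
positive = [
    "love these shoes", "great quality and fit", "very comfortable dress",
    "excellent value", "perfect size, love it", "beautiful and well made",
    "great color, fits well", "super soft and comfortable",
    "happy with this purchase", "excellent quality, highly recommend",
]
negative = [
    "poor quality, fell apart", "terrible fit", "too small and cheap",
    "broke after one day", "bad stitching, returned it",
    "cheap material, very disappointed", "awful smell and poor fit",
    "not worth the money", "terrible quality, do not buy",
    "disappointed, returned it",
]
texts = positive + negative
labels = [1] * len(positive) + [0] * len(negative)

# Model choices and settings are assumptions for illustration.
models = {
    "Naive Bayes": MultinomialNB(),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": LinearSVC(),
}

results = {}
for name, model in models.items():
    # Putting the vectorizer inside the pipeline refits TF-IDF on each
    # training fold, so no vocabulary leaks in from the held-out fold.
    pipeline = make_pipeline(TfidfVectorizer(), model)
    scores = cross_validate(pipeline, texts, labels, cv=5,
                            scoring=("accuracy", "f1"))
    results[name] = {
        "accuracy": scores["test_accuracy"].mean(),
        "f1": scores["test_f1"].mean(),
    }

for name, metrics in results.items():
    print(f"{name}: accuracy={metrics['accuracy']:.2f}, f1={metrics['f1']:.2f}")
```

Scores on a toy set this small say nothing about the real dataset; the point is only the shape of the evaluation loop.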

### Technologies Used

- **Python**: The primary programming language used for model development.
- **Jupyter Notebook**: For documenting the development process and performing data analysis.
- **Scikit-learn**: Utilized for model training, feature extraction, and evaluation.
- **XGBoost**: For implementing the XGBoost algorithm.
- **Pandas & NumPy**: For data manipulation and numerical calculations.
- **NLTK**: For text preprocessing and natural language processing tasks such as tokenization, stemming, lemmatization, and parsing.
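
As an illustration of the NLTK preprocessing mentioned above, the sketch below tokenizes and stems a review. It deliberately uses only NLTK components that need no corpus download (`RegexpTokenizer`, `PorterStemmer`). The stop-word set and the `preprocess` helper are hypothetical, and the notebook's real preprocessing steps may differ.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import RegexpTokenizer

# Hypothetical, deliberately tiny stop-word set. NLTK ships a fuller list
# in nltk.corpus.stopwords, but that requires a one-time corpus download.
STOP_WORDS = {"i", "in", "the", "these", "a", "an", "and", "is", "it", "this"}

tokenizer = RegexpTokenizer(r"[a-z]+")  # keep runs of lowercase letters
stemmer = PorterStemmer()


def preprocess(text):
    """Lowercase, tokenize, drop stop words, and stem each remaining token."""
    tokens = tokenizer.tokenize(text.lower())
    return [stemmer.stem(tok) for tok in tokens if tok not in STOP_WORDS]


print(preprocess("I loved running in these shoes!"))
```

Lemmatization with `WordNetLemmatizer` would follow the same pattern but needs the WordNet corpus installed, so it is omitted here.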

### Algorithm Performance

| Algorithm | Accuracy (%) | F1-Score (%) |
|--------------------|--------------|--------------|
| Naive Bayes | 88 | 83 |
| Random Forest | 88 | 82 |
| XGBoost | 88 | 85 |
| Logistic Regression | 89 | 86 |
| SVM | 89 | 87 |
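
For reference, the two metrics in the table can be computed from predictions as sketched below. The README does not say how F1 was averaged across classes, so this example assumes a binary F1 on a single positive class; the labels and the `accuracy_and_f1` helper are made up for illustration.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) where F1 is computed for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        return accuracy, 0.0
    return accuracy, 2 * precision * recall / (precision + recall)


# Made-up labels: 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.2f}, f1={f1:.2f}")  # accuracy=0.75, f1=0.75
```

Accuracy and F1 can diverge when classes are imbalanced, which is one reason a results table like the one above reports both.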