Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/chintan45/amazon-product-review-classification
A simple classifier to predict the sentiments of Amazon product reviews.
machine-learning nlp nltk sentiment-classification
- Host: GitHub
- URL: https://github.com/chintan45/amazon-product-review-classification
- Owner: Chintan45
- Created: 2024-04-09T06:02:59.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2024-04-09T07:16:34.000Z (9 months ago)
- Last Synced: 2024-11-06T03:46:17.282Z (2 months ago)
- Topics: machine-learning, nlp, nltk, sentiment-classification
- Language: Jupyter Notebook
- Homepage:
- Size: 22 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Amazon Product Review Classification
- This project develops a machine learning classifier that predicts the sentiment of Amazon product reviews.
- Five algorithms were compared: Naive Bayes, SVM, Logistic Regression, XGBoost, and Random Forest. The best performer, SVM, reaches 89% accuracy and an 87% F1-score.
- A TF-IDF vectorizer handles feature extraction, converting the review text into numeric features for the classifiers.
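
The TF-IDF and classifier combination described above can be sketched with scikit-learn as follows. This is a minimal illustration, not the notebook's actual code: the ten toy reviews, the labels, and the choice of `LinearSVC` with default settings are all assumptions made for the example. It also applies the 5-fold cross-validation listed in the project overview.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-in data (hypothetical); the project itself uses Amazon review text.
reviews = [
    "love this dress, fits perfectly",
    "great quality and fast shipping",
    "excellent shoes, very comfortable",
    "beautiful color, highly recommend",
    "perfect fit, will buy again",
    "terrible quality, fell apart quickly",
    "too small and the fabric feels cheap",
    "awful, the zipper broke on day one",
    "poor stitching, very disappointed",
    "bad fit, returned it immediately",
]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # 1 = positive, 0 = negative

# Keeping the vectorizer inside the pipeline means TF-IDF is refit on each
# training fold, so held-out reviews never influence the vocabulary or weights.
model = make_pipeline(TfidfVectorizer(), LinearSVC())

# 5-fold cross-validation; for classifiers scikit-learn stratifies the folds.
scores = cross_val_score(model, reviews, labels, cv=5, scoring="accuracy")
print("per-fold accuracy:", scores)

# Fit on all data and classify a new, unseen review.
model.fit(reviews, labels)
predictions = model.predict(["very comfortable and great quality"])
print("prediction:", predictions)
```

Swapping `LinearSVC()` for another estimator such as `MultinomialNB()` or `LogisticRegression()` leaves the rest of the pipeline unchanged, which is one way to compare several algorithms on identical features.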
### Project Overview
- **Objective**: To accurately classify Amazon product reviews using machine learning algorithms.
- **Algorithms Used**: Naive Bayes, SVM, Logistic Regression, XGBoost, and Random Forest.
- **Feature Extraction**: Utilized TF-IDF.
- **Validation Method**: 5-fold cross-validation.
- **Dataset**: 220,909 Amazon product reviews ([Link](/AMAZON_FASHION_sample.json.gz)).
### Technologies Used
- **Python**: The primary programming language used for model development.
- **Jupyter Notebook**: For documenting the development process and performing data analysis.
- **Scikit-learn**: Utilized for model training, feature extraction, and evaluation.
- **XGBoost**: For implementing the XGBoost algorithm.
- **Pandas & NumPy**: For data manipulation and numerical calculations.
- **NLTK**: For text preprocessing and natural language processing tasks such as tokenization, stemming, lemmatization, and parsing.
### Algorithm Performance
| Algorithm | Accuracy (%) | F1-Score (%) |
|--------------------|--------------|--------------|
| Naive Bayes | 88 | 83 |
| Random Forest | 88 | 82 |
| XGBoost | 88 | 85 |
| Logistic Regression| 89 | 86 |
| SVM | 89 | 87 |
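
Several rows in the table share the same accuracy but differ in F1-score. The sketch below shows how that can happen by computing both metrics from scratch on made-up predictions. The README does not state which F1 averaging was used, so the binary, positive-class F1 here is an assumption, and `accuracy_and_f1` is a hypothetical helper rather than a function from the project.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1 for the `positive` class) for two label lists."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, f1


# Made-up labels: 4 positive and 6 negative reviews.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
pred_a = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # misses two positives
pred_b = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]  # one miss, one false alarm

acc_a, f1_a = accuracy_and_f1(y_true, pred_a)
acc_b, f1_b = accuracy_and_f1(y_true, pred_b)
print(f"A: accuracy={acc_a:.2f}, F1={f1_a:.2f}")
print(f"B: accuracy={acc_b:.2f}, F1={f1_b:.2f}")
```

Both sets of predictions get 8 of 10 reviews right, yet the second one recovers more of the positive class and so earns a higher F1. On imbalanced review data, F1 therefore separates models that accuracy alone scores as equal.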