https://github.com/dimitrivavoulisportfolio/aws-serverless-nlp-sentiment-4m-product-reviews
This is a production ready DistilBERT Sentiment Analysis model for product reviews designed to work as a low cost market research tool with the nuiance of an actual market researcher.
https://github.com/dimitrivavoulisportfolio/aws-serverless-nlp-sentiment-4m-product-reviews
aws distilbert distilbert-model large-dataset market-research nlp nlp-machine-learning product-reviews sentiment-analysis serverless
Last synced: 2 months ago
JSON representation
This is a production ready DistilBERT Sentiment Analysis model for product reviews designed to work as a low cost market research tool with the nuiance of an actual market researcher.
- Host: GitHub
- URL: https://github.com/dimitrivavoulisportfolio/aws-serverless-nlp-sentiment-4m-product-reviews
- Owner: DimitriVavoulisPortfolio
- License: apache-2.0
- Created: 2024-09-16T15:37:10.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2024-09-18T20:20:54.000Z (9 months ago)
- Last Synced: 2025-02-10T00:19:13.740Z (4 months ago)
- Topics: aws, distilbert, distilbert-model, large-dataset, market-research, nlp, nlp-machine-learning, product-reviews, sentiment-analysis, serverless
- Language: Python
- Homepage:
- Size: 1.92 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Sentiment Analysis Model for Product Reviews
## Project Overview
This project is a production-ready DistilBERT Sentiment Analysis model for product reviews desgined as a low cost market research tool with the nuiance of an actual market researcher, it was trained with a dataset of over 4 million Amazon product reviews to work for a wide variety of products in a highly scalable manner. This project comes with designs for a serverless live demo in AWS.
### Key Features
- Pre-trained DistilBERT model for sentiment analysis on product reviews
- Designed for low compute costs to maximize scalability
- Conversion from PyTorch to TensorFlow
- Data processing and tokenization pipeline
- Comprehensive testing of both original and converted models
- Designs of an AWS serverless architecture for a live demo with future deployment(TBD)## Project Structure
1. **1-Data processing**: Script for data preparation
2. **2-CSV files(Production test)**: CSV file used for production testing(Full dataset https://www.kaggle.com/datasets/bittlingmayer/amazonreviews)
3. **3-Tokenization and Training**: Tokenization process and initial model training
4. **4-distilbert_sentiment_model**: Original PyTorch DistilBERT model
5. **5-Testing model**: Scripts for testing the PyTorch model, both a write in and a CSV input script
6. **6-PyTorch2TensorFlow**: Conversion script from PyTorch to TensorFlow
7. **7-tensorflow_model**: Converted TensorFlow model
8. **8-Testing converted model**: Scripts for testing the TensorFlow model, includes a write in one, and 2 CSV input ones, one for testing the model, one for production usage## Documentation
- **MODEL.md**: https://github.com/DimitriVavoulisPortfolio/aws-serverless-nlp-sentiment-4M-product-reviews/blob/main/MODEL.md
- **PROCESS.md**: https://github.com/DimitriVavoulisPortfolio/aws-serverless-nlp-sentiment-4M-product-reviews/blob/main/PROCESS.md
- **AWS-IMPLEMENTATION-AND-COST-REPORT.md**: https://github.com/DimitriVavoulisPortfolio/aws-serverless-nlp-sentiment-4M-product-reviews/blob/main/AWS-IMPLEMENTATION-WITH-COST-REPORT.md## Model Performance
- **4M-dataset-sentiment_distribution.png**: Visualization of sentiment distribution in the dataset
- **400K-dataset-testing.png**: Results of model testing on a 400K sample dataset
- **Accuracy: 0.9973**
- **Precision: 0.9968**
- **Recall: 0.9979**
- **F1: 0.9973**
## Quick Start Guide1. Clone the repository:
```
git clone https://github.com/DimitriVavoulisPortfolio/aws-serverless-nlp-sentiment-4M-product-reviews.git
cd aws-serverless-nlp-sentiment-4M-product-reviews
```2. Install dependencies:
```
pip install tensorflow transformers torch pandas numpy scikit-learn onnx onnx-tf
```3. To test the PyTorch model:
```
python 5-Testing model\sentiment-analysis-testing-script-write-in.py
```4. To test the TensorFlow model:
```
python 8-Testing converted model\tensorflow-model-test-script.py
```
DISCLAIMER: The paths for the models have to be put in by the user## Future Work
- Implement AWS deployment
- Create API for real-time sentiment prediction
- Optimize model performance and size## License
This project is licensed under the Apache-2.0 license - see the [LICENSE](LICENSE) file for details.
## Contact
For any questions or feedback, please open an issue in this repository or contact [Dimitri Vavoulis](mailto:[email protected]).