https://github.com/genaray/ml.shopanalytics
A minimalist Python & cloud ML project that trains on Amazon sales & review data to recommend optimal prices/discounts to boost ratings/sales and surface actionable visual insights. Powered end-to-end by AWS CloudFront, S3, ALB & Fargate and Svelte.
ai aws aws-alb aws-cloudfront aws-ecs aws-fargate aws-s3 cicd devops machine-learning python scikit-learn terraform
- Host: GitHub
- URL: https://github.com/genaray/ml.shopanalytics
- Owner: genaray
- Created: 2025-08-05T10:46:46.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2025-08-06T13:45:39.000Z (11 months ago)
- Last Synced: 2026-03-30T17:48:37.789Z (3 months ago)
- Topics: ai, aws, aws-alb, aws-cloudfront, aws-ecs, aws-fargate, aws-s3, cicd, devops, machine-learning, python, scikit-learn, terraform
- Language: Svelte
- Homepage:
- Size: 2.84 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Shop Analytics
An e-commerce analytics platform that combines machine learning, a modern web frontend, and cloud-native AWS infrastructure to provide predictive discounting insights and product recommendations.
## Overview
Shop Analytics is a full-stack application that analyzes Amazon product data to provide intelligent insights for e-commerce optimization. The platform features:
- **Predictive Discounting**: AI-powered recommendations for optimal discount percentages
- **Product Similarity**: Machine learning-based product recommendations
- **Real-time Analytics**: Interactive dashboard with live data visualization
- **Cloud-Native Architecture**: Scalable AWS-based infrastructure


### Technologies Included
**Backend:**
- **FastAPI** - Modern Python web framework for building APIs
- **scikit-learn** - Machine learning library for predictive models
- **pandas & numpy** - Data manipulation and numerical computing
- **uvicorn** - ASGI server for FastAPI
**Frontend:**
- **SvelteKit** - Full-stack web framework
- **TypeScript** - Type-safe JavaScript
- **Tailwind CSS** - Utility-first CSS framework
- **shadcn/ui** - Modern component library
- **TanStack Table** - Powerful data table component
- **Chart.js** - Interactive charts and visualizations
**Infrastructure:**
- **AWS ECS Fargate** - Containerized application hosting
- **AWS S3** - Static file storage and hosting
- **AWS CloudFront** - Global content delivery network
- **AWS ECR** - Container image registry
- **Terraform** - Infrastructure as Code
- **GitHub Actions** - CI/CD pipeline
## Prerequisites
- **Python 3.12+**
- **Node.js 20+**
- **AWS CLI** (for deployment)
- **Terraform 1.5+** (for infrastructure)
- **Docker** (for containerization)
## Setup & Build
### Clone
```bash
git clone https://github.com/genaray/ml.shopanalytics.git
cd ml.shopanalytics
```
### Configuration
1. **Backend Configuration**
```bash
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
```
2. **Frontend Configuration**
```bash
cd frontend
npm install
```
3. **AWS Configuration** (for deployment)
```bash
aws configure
# Enter your AWS Access Key ID, Secret Access Key, and region
```
### Build
1. **Train ML Models**
```bash
# Preprocess data
python src/data/preprocess.py
# Train predictive discounting model
python src/predictive_discounting/predictive_discounting.py
# Train similarity recommendation model
python src/similarity_recommendation/similarity_recommendation.py
```
2. **Build Frontend**
```bash
cd frontend
npm run build
```
### Run
#### Local Development
1. **Backend**
```bash
# From project root
python run_api.py
# Or with uvicorn directly
uvicorn src.app:app --reload --host 0.0.0.0 --port 8000
```
2. **Frontend**
```bash
cd frontend
npm run dev
```
3. **Access the application**
- Frontend: http://localhost:5173
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
#### Via CI/CD
The application is automatically deployed to AWS when changes are pushed to the `main` branch. The GitHub Actions workflow:
1. **Trains ML Models** - Preprocesses data and trains predictive models
2. **Builds & Pushes Images** - Creates Docker images and pushes to ECR
3. **Deploys Infrastructure** - Uses Terraform to manage AWS resources
4. **Updates Services** - Deploys new versions to ECS Fargate
## Endpoints
### Health Check
- `GET /health` - Application health status
### Products
- `GET /api/v1/products` - List products with pagination and search
- `GET /api/v1/products/{product_id}` - Get specific product details
### Predictive Discounting
- `POST /api/v1/predictive-discounting/predict-discount` - Get discount recommendations
**Request Body:**
```json
{
"product_category": "Electronics",
"product_price_actual": 299.99,
"product_rating_avg": 4.5,
"product_description": "High-quality wireless headphones"
}
```
**Response:**
```json
{
"best_discount_pct": 0.15,
"best_predicted_rating_count": 1250,
"confidence_score": 0.87
}
```
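As a rough illustration, the endpoint can be called from Python using only the standard library. This is a minimal client sketch, not part of the project: the function names `build_predict_request` and `predict_discount` are hypothetical, and the base URL assumes the backend is running locally as described under "Local Development".

```python
import json
import urllib.request

# Assumption: the backend runs locally on port 8000 (see "Local Development").
BASE_URL = "http://localhost:8000"
PREDICT_PATH = "/api/v1/predictive-discounting/predict-discount"


def build_predict_request(payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the POST request for the predict-discount endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url + PREDICT_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_discount(payload: dict, base_url: str = BASE_URL, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response body into a dict."""
    request = build_predict_request(payload, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Same fields as the documented request body.
example_payload = {
    "product_category": "Electronics",
    "product_price_actual": 299.99,
    "product_rating_avg": 4.5,
    "product_description": "High-quality wireless headphones",
}
```

With the backend running, `predict_discount(example_payload)` should return a dict with the keys shown in the response above, such as `best_discount_pct`.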
### Similarity Recommendations
- `POST /api/v1/similarity/find-similar` - Find similar products
**Request Body:**
```json
{
"product_name": "Wireless Headphones",
"product_category": "Electronics",
"product_price_actual": 299.99,
"product_discount_pct": 0.1,
"product_rating_avg": 4.5,
"product_rating_count": 1200,
"product_description": "Premium wireless headphones",
"n_recommendations": 5
}
```
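A client may want to check the payload before sending it. The sketch below mirrors the field names of the request body in a dataclass. The class name `SimilarityRequest` and all validation rules are assumptions made for illustration (for example, that discounts are fractions between 0 and 1 and ratings use a 0 to 5 scale); the service's own validation is not documented here.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class SimilarityRequest:
    """Client-side mirror of the find-similar request body (hypothetical helper)."""

    product_name: str
    product_category: str
    product_price_actual: float
    product_discount_pct: float
    product_rating_avg: float
    product_rating_count: int
    product_description: str
    n_recommendations: int = 5

    def __post_init__(self) -> None:
        if self.product_price_actual < 0:
            raise ValueError("product_price_actual must be non-negative")
        # Assumed: discounts are fractions (0.1 means 10 %), as the sample value suggests.
        if not 0.0 <= self.product_discount_pct <= 1.0:
            raise ValueError("product_discount_pct must be between 0 and 1")
        # Assumed: ratings use a 0 to 5 star scale.
        if not 0.0 <= self.product_rating_avg <= 5.0:
            raise ValueError("product_rating_avg must be between 0 and 5")
        if self.product_rating_count < 0:
            raise ValueError("product_rating_count must be non-negative")
        if self.n_recommendations < 1:
            raise ValueError("n_recommendations must be at least 1")

    def to_json(self) -> str:
        """Serialize to the JSON body expected by the endpoint."""
        return json.dumps(asdict(self))


request_body = SimilarityRequest(
    product_name="Wireless Headphones",
    product_category="Electronics",
    product_price_actual=299.99,
    product_discount_pct=0.1,
    product_rating_avg=4.5,
    product_rating_count=1200,
    product_description="Premium wireless headphones",
).to_json()
```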
## ML
### Technologies Used
- **scikit-learn** - Primary ML framework
- **Random Forest Regressor** - For predictive discounting
- **TF-IDF Vectorization** - Text feature extraction
- **Custom Transformers** - Feature engineering and preprocessing
- **Joblib** - Model serialization and caching
### Predictive Discounting Model
The predictive discounting system uses a machine learning pipeline with the following stages:
1. **Feature Engineering**
- Text processing of product descriptions using TF-IDF
- Category encoding with OneHotEncoder
- Price and rating normalization
- Custom transformers for category splitting and weight scaling
2. **Model Architecture**
- Random Forest Regressor for robust predictions
- Pipeline-based approach for consistent preprocessing
- k-Nearest Neighbors with MultiLabelBinarizer for similarity recommendations
3. **Training Process**
- Uses historical [Amazon product data](https://www.kaggle.com/datasets/karkavelrajaj/amazon-sales-dataset)
- Predicts optimal discount percentages based on product characteristics
- Estimates expected rating count improvements
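The stages above could be wired together roughly as follows. This is a simplified sketch under several assumptions, not the project's actual code: the training rows are made up, the project's custom transformers are omitted, and the column name `product_discount_pct` is borrowed from the similarity request body. In particular, choosing the discount by sweeping candidate values and keeping the one with the highest predicted rating count is only one plausible reading of how `best_discount_pct` and `best_predicted_rating_count` are derived.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny made-up training frame; the real project trains on the Kaggle dataset.
train = pd.DataFrame({
    "product_category": ["Electronics", "Electronics", "Home", "Home", "Toys", "Toys"],
    "product_description": [
        "wireless headphones with noise cancelling",
        "bluetooth speaker with deep bass",
        "stainless steel kitchen blender",
        "cotton bed sheets queen size",
        "wooden building blocks for kids",
        "remote control racing car",
    ],
    "product_price_actual": [299.99, 79.99, 89.99, 39.99, 24.99, 49.99],
    "product_rating_avg": [4.5, 4.2, 4.1, 4.0, 4.7, 3.9],
    "product_discount_pct": [0.10, 0.25, 0.05, 0.30, 0.00, 0.20],
    "product_rating_count": [1200, 950, 300, 410, 150, 220],
})
features = train.drop(columns="product_rating_count")

preprocess = ColumnTransformer([
    # A scalar column name hands the vectorizer a 1-D series of strings.
    ("text", TfidfVectorizer(), "product_description"),
    ("category", OneHotEncoder(handle_unknown="ignore"), ["product_category"]),
    ("numeric", StandardScaler(),
     ["product_price_actual", "product_rating_avg", "product_discount_pct"]),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("regressor", RandomForestRegressor(n_estimators=50, random_state=0)),
])
model.fit(features, train["product_rating_count"])

# Hypothetical candidate grid: 0 % to 50 % in 5 % steps.
CANDIDATE_DISCOUNTS = np.linspace(0.0, 0.5, 11)


def best_discount(product: dict) -> tuple[float, float]:
    """Assumed selection rule: keep the discount with the highest predicted rating count."""
    grid = pd.DataFrame(
        [{**product, "product_discount_pct": d} for d in CANDIDATE_DISCOUNTS]
    )[features.columns]  # same column order as during training
    predicted = model.predict(grid)
    best = int(np.argmax(predicted))
    return float(CANDIDATE_DISCOUNTS[best]), float(predicted[best])


discount, rating_count = best_discount({
    "product_category": "Electronics",
    "product_description": "High-quality wireless headphones",
    "product_price_actual": 299.99,
    "product_rating_avg": 4.5,
})
```

Keeping preprocessing and the regressor in one `Pipeline` means the same transformations are applied at training and prediction time, which is the consistency benefit the list above refers to.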
### Similarity Recommendation Model
The similarity system provides product recommendations in three stages:
1. **Feature Extraction**
- Multi-label binarization for categories
- Text similarity using TF-IDF
- Numerical feature scaling
2. **Similarity Calculation**
- Cosine similarity for text features
- Euclidean distance for numerical features
- Weighted combination of multiple similarity metrics
3. **Recommendation Engine**
- Finds products with similar characteristics
- Ranks by similarity score
- Returns top N recommendations
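A weighted combination of these metrics might look like the sketch below. The catalog entries, the shortened field names, the weights, and the `1 / (1 + distance)` conversion from Euclidean distance to a similarity are all assumptions chosen for illustration; the project's actual weighting is not documented here.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, euclidean_distances
from sklearn.preprocessing import MultiLabelBinarizer, StandardScaler

# Made-up catalog used only for this sketch.
catalog = [
    {"name": "Wireless Headphones", "categories": ["Electronics", "Audio"],
     "description": "premium wireless headphones with noise cancelling",
     "price": 299.99, "rating": 4.5},
    {"name": "Wireless Earbuds", "categories": ["Electronics", "Audio"],
     "description": "compact wireless earbuds with noise cancelling",
     "price": 249.99, "rating": 4.4},
    {"name": "Kitchen Blender", "categories": ["Home", "Kitchen"],
     "description": "powerful blender for smoothies and soups",
     "price": 89.99, "rating": 4.1},
    {"name": "Building Blocks", "categories": ["Toys"],
     "description": "colorful building blocks for kids",
     "price": 24.99, "rating": 4.7},
]


def top_n_similar(query_index: int, n: int = 2,
                  weights: tuple[float, float, float] = (0.5, 0.3, 0.2)):
    """Rank catalog items by a weighted mix of text, category and numeric similarity."""
    w_text, w_cat, w_num = weights  # hypothetical weights

    # Feature extraction: TF-IDF text, multi-label categories, scaled numbers.
    text = TfidfVectorizer().fit_transform([p["description"] for p in catalog])
    cats = MultiLabelBinarizer().fit_transform([p["categories"] for p in catalog])
    nums = StandardScaler().fit_transform([[p["price"], p["rating"]] for p in catalog])

    q = slice(query_index, query_index + 1)  # keep the query as a 2-D row
    text_sim = cosine_similarity(text[q], text).ravel()
    cat_sim = cosine_similarity(cats[q], cats).ravel()
    # Assumed mapping of Euclidean distance to a similarity in (0, 1].
    num_sim = 1.0 / (1.0 + euclidean_distances(nums[q], nums).ravel())

    score = w_text * text_sim + w_cat * cat_sim + w_num * num_sim
    score[query_index] = -np.inf  # never recommend the query product itself
    order = np.argsort(score)[::-1][:n]
    return [(catalog[i]["name"], float(score[i])) for i in order]


recommendations = top_n_similar(query_index=0, n=2)
```

In this toy catalog the earbuds rank first for the headphones query because they share description terms, both categories, and a similar price and rating.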
## Architecture
### AWS Infrastructure
The application is deployed on AWS using a modern, scalable architecture:
#### Compute Layer
- **ECS Fargate** - Serverless container orchestration
- **Application Load Balancer** - Traffic distribution and SSL termination
- **Auto Scaling** - Automatic scaling based on demand
#### Storage Layer
- **S3** - Static frontend hosting and data storage
- **ECR** - Container image registry
- **CloudWatch Logs** - Centralized logging
#### Network Layer
- **CloudFront** - Global CDN for frontend and API
- **Route 53** - DNS management
- **VPC** - Network isolation and security
#### Security
- **IAM Roles** - Least privilege access control
- **Security Groups** - Network-level security
- **WAF** - Web application firewall (optional)
### Application Architecture
```
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Frontend │ │ API Gateway │ │ Backend │
│ (SvelteKit) │◄──►│ (CloudFront) │◄──►│ (FastAPI) │
└─────────────────┘ └─────────────────┘ └─────────────────┘
│ │ │
│ │ │
▼ ▼ ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ S3 Bucket │ │ ALB │ │ ML Models │
│ (Static Host) │ │ (Load Bal.) │ │ (Joblib) │
└─────────────────┘ └─────────────────┘ └─────────────────┘
```
### Data Flow
1. **User Request** → CloudFront → ALB → ECS Fargate
2. **API Processing** → FastAPI → ML Models → Response
3. **Static Assets** → S3 → CloudFront → User
## Outlook & Improvements
### Possible Enhancements
1. **Advanced ML Features**
- Real-time model retraining with new data
- A/B testing framework for discount strategies
- Personalized recommendations based on user behavior
- Time-series analysis for seasonal trends
2. **Performance Optimizations**
- Redis caching for frequently accessed data
- Database integration (PostgreSQL/RDS)
- GraphQL API for more efficient data fetching
- CDN optimization for global performance
3. **User Experience**
- Real-time notifications for price changes
- Advanced filtering and sorting options
- Export functionality for reports
- Mobile-responsive design improvements
4. **Infrastructure Enhancements**
- Multi-region deployment for better latency
- Blue-green deployment strategy
- Enhanced monitoring and alerting
- Cost optimization and resource management
- SageMaker for ML training
5. **Analytics & Reporting**
- Advanced dashboard with more metrics
- Custom report generation
- Data visualization improvements
- Integration with external analytics tools
### Technical Debt
- Implement comprehensive unit and integration tests
- Add API rate limiting and authentication
- Improve error handling and logging
- Optimize ML model performance and accuracy
- Enhance security measures and compliance
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Support
For support and questions, please open an issue in the GitHub repository or contact the development team.