https://github.com/genaray/ml.shopanalytics

A minimalist Python & cloud ML project that trains on Amazon sales & review data to recommend optimal prices/discounts to boost ratings/sales and surface actionable visual insights. Powered end-to-end by AWS CloudFront, S3, ALB & Fargate and Svelte.

ai aws aws-alb aws-cloudfront aws-ecs aws-fargate aws-s3 cicd devops machine-learning python scikit-learn terraform


# Shop Analytics

An e-commerce analytics platform that combines machine learning, modern web technologies, and cloud-native infrastructure to provide predictive discounting insights and product recommendations.

## Overview

Shop Analytics is a full-stack application that analyzes Amazon product data to provide intelligent insights for e-commerce optimization. The platform features:

- **Predictive Discounting**: AI-powered recommendations for optimal discount percentages
- **Product Similarity**: Machine learning-based product recommendations
- **Real-time Analytics**: Interactive dashboard with live data visualization
- **Cloud-Native Architecture**: Scalable AWS-based infrastructure

![The products dashboard](images/products_dashboard.png)
![The products insights](images/products_insights.png)

### Technologies Included

**Backend:**
- **FastAPI** - Modern Python web framework for building APIs
- **scikit-learn** - Machine learning library for predictive models
- **pandas & numpy** - Data manipulation and numerical computing
- **uvicorn** - ASGI server for FastAPI

**Frontend:**
- **SvelteKit** - Full-stack web framework
- **TypeScript** - Type-safe JavaScript
- **Tailwind CSS** - Utility-first CSS framework
- **shadcn/ui** - Modern component library
- **TanStack Table** - Powerful data table component
- **Chart.js** - Interactive charts and visualizations

**Infrastructure:**
- **AWS ECS Fargate** - Containerized application hosting
- **AWS S3** - Static file storage and hosting
- **AWS CloudFront** - Global content delivery network
- **AWS ECR** - Container image registry
- **Terraform** - Infrastructure as Code
- **GitHub Actions** - CI/CD pipeline

## Prerequisites

- **Python 3.12+**
- **Node.js 20+**
- **AWS CLI** (for deployment)
- **Terraform 1.5+** (for infrastructure)
- **Docker** (for containerization)

## Setup & Build

### Clone

```bash
git clone https://github.com/your-username/ML.ShopAnalytics.git
cd ML.ShopAnalytics
```

### Configuration

1. **Backend Configuration**
```bash
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

2. **Frontend Configuration**
```bash
cd frontend
npm install
```

3. **AWS Configuration** (for deployment)
```bash
aws configure
# Enter your AWS Access Key ID, Secret Access Key, and region
```

### Build

1. **Train ML Models**
```bash
# Preprocess data
python src/data/preprocess.py

# Train predictive discounting model
python src/predictive_discounting/predictive_discounting.py

# Train similarity recommendation model
python src/similarity_recommendation/similarity_recommendation.py
```

2. **Build Frontend**
```bash
cd frontend
npm run build
```

### Run

#### Local Development

1. **Backend**
```bash
# From project root
python run_api.py
# Or with uvicorn directly
uvicorn src.app:app --reload --host 0.0.0.0 --port 8000
```

2. **Frontend**
```bash
cd frontend
npm run dev
```

3. **Access the application**
- Frontend: http://localhost:5173
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs

#### Via CI/CD

The application is automatically deployed to AWS when changes are pushed to the `main` branch. The GitHub Actions workflow:

1. **Trains ML Models** - Preprocesses data and trains predictive models
2. **Builds & Pushes Images** - Creates Docker images and pushes to ECR
3. **Deploys Infrastructure** - Uses Terraform to manage AWS resources
4. **Updates Services** - Deploys new versions to ECS Fargate

## Endpoints

### Health Check
- `GET /health` - Application health status

### Products
- `GET /api/v1/products` - List products with pagination and search
- `GET /api/v1/products/{product_id}` - Get specific product details
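
The pagination and search behavior of the list endpoint is not specified here, so the following is only a minimal sketch of one common approach: a case-insensitive substring filter on the product name, followed by slicing a single page. The parameter names `search`, `page`, and `page_size`, the `Page` container, and the `list_products` helper are all assumptions made for illustration, not the project's actual query interface.

```python
from dataclasses import dataclass


@dataclass
class Page:
    items: list[dict]
    total: int       # number of matches before pagination
    page: int        # 1-based page index
    page_size: int


def list_products(products: list[dict], search: str = "", page: int = 1,
                  page_size: int = 20) -> Page:
    """Filter by a case-insensitive substring of the name, then slice one page."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    needle = search.strip().lower()
    matches = [p for p in products if needle in p["product_name"].lower()]
    start = (page - 1) * page_size
    return Page(matches[start:start + page_size], len(matches), page, page_size)


# Hypothetical catalog rows, invented for this example
catalog = [
    {"product_id": "A1", "product_name": "Wireless Headphones"},
    {"product_id": "A2", "product_name": "Wired Headphones"},
    {"product_id": "A3", "product_name": "USB-C Cable"},
]
result = list_products(catalog, search="headphones", page=1, page_size=1)
print(result.total, [p["product_id"] for p in result.items])
```

Returning the total match count alongside the page lets a table component render page controls without a second request.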

### Predictive Discounting
- `POST /api/v1/predictive-discounting/predict-discount` - Get discount recommendations

**Request Body:**
```json
{
  "product_category": "Electronics",
  "product_price_actual": 299.99,
  "product_rating_avg": 4.5,
  "product_description": "High-quality wireless headphones"
}
```

**Response:**
```json
{
  "best_discount_pct": 0.15,
  "best_predicted_rating_count": 1250,
  "confidence_score": 0.87
}
```
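
As a rough illustration of calling this endpoint from Python, the sketch below builds the POST request with the standard library only. The base URL is the local development address from the Run section; the helper names `build_predict_request` and `predict_discount` are hypothetical. Sending the request requires the backend to be running, so the example only constructs and inspects it.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # local dev address; adjust for a deployed API


def build_predict_request(category: str, price: float, rating: float,
                          description: str) -> urllib.request.Request:
    """Build the POST request using the documented request body fields."""
    payload = {
        "product_category": category,
        "product_price_actual": price,
        "product_rating_avg": rating,
        "product_description": description,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/predictive-discounting/predict-discount",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_discount(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (needs a running backend)."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


request = build_predict_request("Electronics", 299.99, 4.5,
                                "High-quality wireless headphones")
print(request.get_method(), request.full_url)
```

With the API running locally, `predict_discount(request)` would return a dictionary with the response fields shown above.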

### Similarity Recommendations
- `POST /api/v1/similarity/find-similar` - Find similar products

**Request Body:**
```json
{
  "product_name": "Wireless Headphones",
  "product_category": "Electronics",
  "product_price_actual": 299.99,
  "product_discount_pct": 0.1,
  "product_rating_avg": 4.5,
  "product_rating_count": 1200,
  "product_description": "Premium wireless headphones",
  "n_recommendations": 5
}
```

## ML

### Technologies Used

- **scikit-learn** - Primary ML framework
- **Random Forest Regressor** - For predictive discounting
- **TF-IDF Vectorization** - Text feature extraction
- **Custom Transformers** - Feature engineering and preprocessing
- **Joblib** - Model serialization and caching

### Predictive Discounting Model

The predictive discounting system uses a machine learning pipeline that:

1. **Feature Engineering**
- Text processing of product descriptions using TF-IDF
- Category encoding with OneHotEncoder
- Price and rating normalization
- Custom transformers for category splitting and weight scaling

2. **Model Architecture**
- Random Forest Regressor for robust predictions
- Pipeline-based approach for consistent preprocessing
- k-nearest neighbors with MultiLabelBinarizer for similarity recommendations

3. **Training Process**
- Uses historical [Amazon product data](https://www.kaggle.com/datasets/karkavelrajaj/amazon-sales-dataset)
- Predicts optimal discount percentages based on product characteristics
- Estimates expected rating count improvements
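
The pipeline described above could look roughly like the following scikit-learn sketch. It is not the project's actual training code: the custom transformers are omitted, the training frame is synthetic, and the way the best discount is chosen (scoring a grid of candidate discounts and keeping the one with the highest predicted rating count) is an assumption inferred from the response fields `best_discount_pct` and `best_predicted_rating_count`. The `best_discount` helper and the `FEATURES` list are hypothetical names.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

NUMERIC = ["product_price_actual", "product_rating_avg", "product_discount_pct"]
FEATURES = ["product_description", "product_category", *NUMERIC]

preprocess = ColumnTransformer([
    # TfidfVectorizer needs a 1-D column of strings, so the column is named, not listed
    ("text", TfidfVectorizer(), "product_description"),
    ("category", OneHotEncoder(handle_unknown="ignore"), ["product_category"]),
    ("numeric", StandardScaler(), NUMERIC),
])
model = Pipeline([
    ("preprocess", preprocess),
    ("forest", RandomForestRegressor(n_estimators=50, random_state=0)),
])

# Tiny synthetic training frame, invented for illustration only
train = pd.DataFrame({
    "product_description": ["wireless headphones", "usb cable", "bluetooth speaker",
                            "hdmi cable", "gaming headset", "phone charger"],
    "product_category": ["Electronics", "Accessories", "Electronics",
                         "Accessories", "Electronics", "Accessories"],
    "product_price_actual": [299.99, 9.99, 89.0, 14.5, 120.0, 19.99],
    "product_rating_avg": [4.5, 4.1, 4.3, 3.9, 4.4, 4.0],
    "product_discount_pct": [0.10, 0.50, 0.25, 0.40, 0.15, 0.30],
})[FEATURES]
rating_count = [1200, 5400, 2300, 3100, 1500, 2800]  # target: number of ratings
model.fit(train, rating_count)


def best_discount(product: dict, candidates: np.ndarray) -> tuple[float, float]:
    """Score every candidate discount for one product and return the best one."""
    rows = pd.DataFrame(
        [{**product, "product_discount_pct": float(d)} for d in candidates]
    )[FEATURES]
    predicted = model.predict(rows)
    best = int(np.argmax(predicted))
    return float(candidates[best]), float(predicted[best])


product = {"product_description": "High-quality wireless headphones",
           "product_category": "Electronics",
           "product_price_actual": 299.99, "product_rating_avg": 4.5}
candidates = np.linspace(0.0, 0.7, 15)
discount, count = best_discount(product, candidates)
print(f"best discount {discount:.2f}, predicted rating count {count:.0f}")
```

Keeping preprocessing and the regressor in one `Pipeline` means the same transformations run at training and prediction time, which is the consistency benefit the list above refers to.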

### Similarity Recommendation Model

The similarity system provides product recommendations by:

1. **Feature Extraction**
- Multi-label binarization for categories
- Text similarity using TF-IDF
- Numerical feature scaling

2. **Similarity Calculation**
- Cosine similarity for text features
- Euclidean distance for numerical features
- Weighted combination of multiple similarity metrics

3. **Recommendation Engine**
- Finds products with similar characteristics
- Ranks by similarity score
- Returns top N recommendations
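
To make the weighted combination concrete, here is a small standard-library sketch. It deliberately simplifies the approach described above: plain term counts stand in for TF-IDF, the numeric features are assumed to be scaled already, and the weights `0.5 / 0.3 / 0.2` as well as the record layout (`name`, `description`, `categories`, `numeric`) are invented for illustration.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity(query: dict, other: dict,
               w_text: float = 0.5, w_cat: float = 0.3, w_num: float = 0.2) -> float:
    """Weighted sum of text, category, and numeric similarity, each in [0, 1]."""
    text = cosine(Counter(query["description"].lower().split()),
                  Counter(other["description"].lower().split()))
    # Category lists act as multi-hot vectors (the role of multi-label binarization)
    cat = cosine(Counter(query["categories"]), Counter(other["categories"]))
    # Euclidean distance on scaled numeric features, mapped into (0, 1]
    num = 1.0 / (1.0 + math.dist(query["numeric"], other["numeric"]))
    return w_text * text + w_cat * cat + w_num * num


def top_n(query: dict, catalog: list[dict], n: int) -> list[tuple[str, float]]:
    """Rank catalog entries by similarity to the query and keep the best n."""
    scored = [(p["name"], similarity(query, p)) for p in catalog]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]


# Hypothetical records; "numeric" holds already-scaled price and rating values
query = {"name": "Wireless Headphones", "description": "premium wireless headphones",
         "categories": ["Electronics", "Headphones"], "numeric": [1.0, 0.9]}
catalog = [
    {"name": "Braided USB Cable", "description": "braided usb cable",
     "categories": ["Electronics", "Cables"], "numeric": [0.1, 0.7]},
    {"name": "Budget Wireless Headphones", "description": "budget wireless headphones",
     "categories": ["Electronics", "Headphones"], "numeric": [0.4, 0.8]},
]
ranked = top_n(query, catalog, n=2)
print([name for name, _ in ranked])
```

Mapping the Euclidean distance through `1 / (1 + d)` puts it on the same "higher is more similar" scale as the cosine terms, so the three parts can be added with simple weights.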

## Architecture

### AWS Infrastructure

The application is deployed on AWS using a modern, scalable architecture:

#### Compute Layer
- **ECS Fargate** - Serverless container orchestration
- **Application Load Balancer** - Traffic distribution and SSL termination
- **Auto Scaling** - Automatic scaling based on demand

#### Storage Layer
- **S3** - Static frontend hosting and data storage
- **ECR** - Container image registry
- **CloudWatch Logs** - Centralized logging

#### Network Layer
- **CloudFront** - Global CDN for frontend and API
- **Route 53** - DNS management
- **VPC** - Network isolation and security

#### Security
- **IAM Roles** - Least privilege access control
- **Security Groups** - Network-level security
- **WAF** - Web application firewall (optional)

### Application Architecture

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│    Frontend     │    │   API Gateway   │    │     Backend     │
│   (SvelteKit)   │◄──►│  (CloudFront)   │◄──►│    (FastAPI)    │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         │                      │                      │
         ▼                      ▼                      ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│    S3 Bucket    │    │       ALB       │    │    ML Models    │
│  (Static Host)  │    │   (Load Bal.)   │    │    (Joblib)     │
└─────────────────┘    └─────────────────┘    └─────────────────┘
```

### Data Flow

1. **User Request** → CloudFront → ALB → ECS Fargate
2. **API Processing** → FastAPI → ML Models → Response
3. **Static Assets** → S3 → CloudFront → User

## Outlook & Improvements

### Possible Enhancements

1. **Advanced ML Features**
- Real-time model retraining with new data
- A/B testing framework for discount strategies
- Personalized recommendations based on user behavior
- Time-series analysis for seasonal trends

2. **Performance Optimizations**
- Redis caching for frequently accessed data
- Database integration (PostgreSQL/RDS)
- GraphQL API for more efficient data fetching
- CDN optimization for global performance

3. **User Experience**
- Real-time notifications for price changes
- Advanced filtering and sorting options
- Export functionality for reports
- Mobile-responsive design improvements

4. **Infrastructure Enhancements**
- Multi-region deployment for better latency
- Blue-green deployment strategy
- Enhanced monitoring and alerting
- Cost optimization and resource management
- SageMaker for ML training

5. **Analytics & Reporting**
- Advanced dashboard with more metrics
- Custom report generation
- Data visualization improvements
- Integration with external analytics tools

### Technical Debt

- Implement comprehensive unit and integration tests
- Add API rate limiting and authentication
- Improve error handling and logging
- Optimize ML model performance and accuracy
- Enhance security measures and compliance

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Support

For support and questions, please open an issue in the GitHub repository or contact the development team.