{"id":30151877,"url":"https://github.com/genaray/ml.shopanalytics","last_synced_at":"2026-04-11T04:32:47.272Z","repository":{"id":309187294,"uuid":"1032464282","full_name":"genaray/ML.ShopAnalytics","owner":"genaray","description":"A minimalist Python \u0026 cloud ML project that trains on Amazon sales \u0026 review data to recommend optimal prices/discounts to boost ratings/sales and surface actionable visual insights. Powered end-to-end by AWS CloudFront, S3, ALB \u0026 Fargate and Svelte.","archived":false,"fork":false,"pushed_at":"2025-08-06T13:45:39.000Z","size":2979,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-03-30T17:48:37.789Z","etag":null,"topics":["ai","aws","aws-alb","aws-cloudfront","aws-ecs","aws-fargate","aws-s3","cicd","devops","machine-learning","python","scikit-learn","terraform"],"latest_commit_sha":null,"homepage":"","language":"Svelte","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/genaray.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-08-05T10:46:46.000Z","updated_at":"2025-08-06T13:52:33.000Z","dependencies_parsed_at":"2025-08-10T13:08:27.676Z","dependency_job_id":null,"html_url":"https://github.com/genaray/ML.ShopAnalytics","commit_stats":null,"previous_names":["genaray/ml.shopanalytics"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/genaray/ML.ShopAnalytics","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/genaray%2FML.ShopAnalytics","tags_url":"https:/
/repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/genaray%2FML.ShopAnalytics/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/genaray%2FML.ShopAnalytics/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/genaray%2FML.ShopAnalytics/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/genaray","download_url":"https://codeload.github.com/genaray/ML.ShopAnalytics/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/genaray%2FML.ShopAnalytics/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31669115,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-10T17:19:37.612Z","status":"online","status_checked_at":"2026-04-11T02:00:05.776Z","response_time":54,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","aws","aws-alb","aws-cloudfront","aws-ecs","aws-fargate","aws-s3","cicd","devops","machine-learning","python","scikit-learn","terraform"],"created_at":"2025-08-11T11:07:52.753Z","updated_at":"2026-04-11T04:32:47.245Z","avatar_url":"https://github.com/genaray.png","language":"Svelte","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Shop Analytics\n\nA comprehensive e-commerce analytics platform that combines machine learning, modern web technologies, and cloud-native AWS infrastructure to provide predictive discounting insights and product 
recommendations.\n\n## Overview\n\nShop Analytics is a full-stack application that analyzes Amazon product sales and review data to recommend discounts and find similar products. The platform features:\n\n- **Predictive Discounting**: AI-powered recommendations for optimal discount percentages\n- **Product Similarity**: Machine learning-based product recommendations\n- **Interactive Analytics**: Dashboard with product tables and charts\n- **Cloud-Native Architecture**: Scalable AWS-based infrastructure\n\n![The products dashboard](images/products_dashboard.png)\n![The products insights](images/products_insights.png)\n\n### Technologies Included\n\n**Backend:**\n- **FastAPI** - Modern Python web framework for building APIs\n- **scikit-learn** - Machine learning library for predictive models\n- **pandas \u0026 numpy** - Data manipulation and numerical computing\n- **uvicorn** - ASGI server for FastAPI\n\n**Frontend:**\n- **SvelteKit** - Full-stack web framework\n- **TypeScript** - Type-safe JavaScript\n- **Tailwind CSS** - Utility-first CSS framework\n- **shadcn/ui** - Modern component library\n- **TanStack Table** - Powerful data table component\n- **Chart.js** - Interactive charts and visualizations\n\n**Infrastructure:**\n- **AWS ECS Fargate** - Containerized application hosting\n- **AWS S3** - Static file storage and hosting\n- **AWS CloudFront** - Global content delivery network\n- **AWS Application Load Balancer (ALB)** - Routes API traffic to the Fargate service\n- **AWS ECR** - Container image registry\n- **Terraform** - Infrastructure as Code\n- **GitHub Actions** - CI/CD pipeline\n\n## Prerequisites\n\n- **Python 3.12+**\n- **Node.js 20+**\n- **AWS CLI** (for deployment)\n- **Terraform 1.5+** (for infrastructure)\n- **Docker** (for containerization)\n\n## Setup \u0026 Build\n\n### Clone\n\n```bash\ngit clone https://github.com/genaray/ML.ShopAnalytics.git\ncd ML.ShopAnalytics\n```\n\n### Configuration\n\n1. 
**Backend Configuration**\n   ```bash\n   # Create virtual environment\n   python -m venv .venv\n   source .venv/bin/activate  # On Windows: .venv\\Scripts\\activate\n   \n   # Install dependencies\n   pip install -r requirements.txt\n   ```\n\n2. **Frontend Configuration**\n   ```bash\n   cd frontend\n   npm install\n   ```\n\n3. **AWS Configuration** (for deployment)\n   ```bash\n   aws configure\n   # Enter your AWS Access Key ID, Secret Access Key, and region\n   ```\n\n### Build\n\n1. **Train ML Models**\n   ```bash\n   # Preprocess data\n   python src/data/preprocess.py\n   \n   # Train predictive discounting model\n   python src/predictive_discounting/predictive_discounting.py\n   \n   # Train similarity recommendation model\n   python src/similarity_recommendation/similarity_recommendation.py\n   ```\n\n2. **Build Frontend**\n   ```bash\n   cd frontend\n   npm run build\n   ```\n\n### Run\n\n#### Local Development\n\n1. **Backend**\n   ```bash\n   # From project root\n   python run_api.py\n   # Or with uvicorn directly\n   uvicorn src.app:app --reload --host 0.0.0.0 --port 8000\n   ```\n\n2. **Frontend**\n   ```bash\n   cd frontend\n   npm run dev\n   ```\n\n3. **Access the application**\n   - Frontend: http://localhost:5173\n   - Backend API: http://localhost:8000\n   - API Documentation: http://localhost:8000/docs\n\n#### Via CI/CD\n\nThe application is automatically deployed to AWS when changes are pushed to the `main` branch. The GitHub Actions workflow:\n\n1. **Trains ML Models** - Preprocesses data and trains predictive models\n2. **Builds \u0026 Pushes Images** - Creates Docker images and pushes to ECR\n3. **Deploys Infrastructure** - Uses Terraform to manage AWS resources\n4. 
**Updates Services** - Deploys new versions to ECS Fargate\n\n## Endpoints\n\n### Health Check\n- `GET /health` - Application health status\n\n### Products\n- `GET /api/v1/products` - List products with pagination and search\n- `GET /api/v1/products/{product_id}` - Get specific product details\n\n### Predictive Discounting\n- `POST /api/v1/predictive-discounting/predict-discount` - Get a recommended discount for a product\n\n**Request Body:**\n```json\n{\n  \"product_category\": \"Electronics\",\n  \"product_price_actual\": 299.99,\n  \"product_rating_avg\": 4.5,\n  \"product_description\": \"High-quality wireless headphones\"\n}\n```\n\n**Response:**\n```json\n{\n  \"best_discount_pct\": 0.15,\n  \"best_predicted_rating_count\": 1250,\n  \"confidence_score\": 0.87\n}\n```\n\n### Similarity Recommendations\n- `POST /api/v1/similarity/find-similar` - Find similar products\n\n**Request Body:**\n```json\n{\n  \"product_name\": \"Wireless Headphones\",\n  \"product_category\": \"Electronics\",\n  \"product_price_actual\": 299.99,\n  \"product_discount_pct\": 0.1,\n  \"product_rating_avg\": 4.5,\n  \"product_rating_count\": 1200,\n  \"product_description\": \"Premium wireless headphones\",\n  \"n_recommendations\": 5\n}\n```\n\n## ML\n\n### Technologies Used\n\n- **scikit-learn** - Primary ML framework\n- **Random Forest Regressor** - For predictive discounting\n- **k-Nearest Neighbors** - For similarity recommendations\n- **TF-IDF Vectorization** - Text feature extraction\n- **Custom Transformers** - Feature engineering and preprocessing\n- **Joblib** - Model serialization and caching\n\n### Predictive Discounting Model\n\nThe predictive discounting system is built around a scikit-learn pipeline with the following parts:\n\n1. **Feature Engineering**\n   - Text processing of product descriptions using TF-IDF\n   - Category encoding with OneHotEncoder\n   - Price and rating normalization\n   - Custom transformers for category splitting and weight scaling\n\n2. 
**Model Architecture**\n   - Random Forest Regressor for robust predictions\n   - Pipeline-based approach for consistent preprocessing\n\n3. **Training Process**\n   - Uses historical [Amazon product data](https://www.kaggle.com/datasets/karkavelrajaj/amazon-sales-dataset)\n   - Predicts optimal discount percentages based on product characteristics\n   - Estimates expected rating count improvements\n\n### Similarity Recommendation Model\n\nThe similarity system builds recommendations in three stages:\n\n1. **Feature Extraction**\n   - Multi-label binarization of categories (`MultiLabelBinarizer`)\n   - TF-IDF vectorization of text fields\n   - Numerical feature scaling\n\n2. **Similarity Calculation**\n   - Cosine similarity for text features\n   - Euclidean distance for numerical features\n   - Weighted combination of multiple similarity metrics\n\n3. **Recommendation Engine**\n   - Finds products with similar characteristics using k-nearest neighbors (k-NN)\n   - Ranks by similarity score\n   - Returns top N recommendations\n\n## Architecture\n\n### AWS Infrastructure\n\nThe application is deployed on AWS using the following services:\n\n#### Compute Layer\n- **ECS Fargate** - Serverless container orchestration\n- **Application Load Balancer** - Traffic distribution and SSL termination\n- **Auto Scaling** - Automatic scaling based on demand\n\n#### Storage Layer\n- **S3** - Static frontend hosting and data storage\n- **ECR** - Container image registry\n- **CloudWatch Logs** - Centralized logging\n\n#### Network Layer\n- **CloudFront** - Global CDN for frontend and API\n- **Route 53** - DNS management\n- **VPC** - Network isolation and security\n\n#### Security\n- **IAM Roles** - Least privilege access control\n- **Security Groups** - Network-level security\n- **WAF** - Web application firewall (optional)\n\n### Application Architecture\n\n```\n┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐\n│   Frontend      │    │   API 
Gateway   │    │   Backend       │\n│   (SvelteKit)   │◄──►│   (CloudFront)  │◄──►│   (FastAPI)     │\n└─────────────────┘    └─────────────────┘    └─────────────────┘\n         │                       │                       │\n         │                       │                       │\n         ▼                       ▼                       ▼\n┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐\n│   S3 Bucket     │    │   ALB           │    │   ML Models     │\n│   (Static Host) │    │   (Load Bal.)   │    │   (Joblib)      │\n└─────────────────┘    └─────────────────┘    └─────────────────┘\n```\n\n### Data Flow\n\n1. **User Request** → CloudFront → ALB → ECS Fargate\n2. **API Processing** → FastAPI → ML Models → Response\n3. **Static Assets** → S3 → CloudFront → User\n\n## Outlook \u0026 Improvements\n\n### Possible Enhancements\n\n1. **Advanced ML Features**\n   - Real-time model retraining with new data\n   - A/B testing framework for discount strategies\n   - Personalized recommendations based on user behavior\n   - Time-series analysis for seasonal trends\n\n2. **Performance Optimizations**\n   - Redis caching for frequently accessed data\n   - Database integration (PostgreSQL/RDS)\n   - GraphQL API for more efficient data fetching\n   - CDN optimization for global performance\n\n3. **User Experience**\n   - Real-time notifications for price changes\n   - Advanced filtering and sorting options\n   - Export functionality for reports\n   - Mobile-responsive design improvements\n\n4. **Infrastructure Enhancements**\n   - Multi-region deployment for better latency\n   - Blue-green deployment strategy\n   - Enhanced monitoring and alerting\n   - Cost optimization and resource management\n   - Amazon SageMaker for model training\n\n5. 
**Analytics \u0026 Reporting**\n   - Advanced dashboard with more metrics\n   - Custom report generation\n   - Data visualization improvements\n   - Integration with external analytics tools\n\n### Technical Debt\n\n- Implement comprehensive unit and integration tests\n- Add API rate limiting and authentication\n- Improve error handling and logging\n- Optimize ML model performance and accuracy\n- Enhance security measures and compliance\n\n## License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n## Support\n\nFor support and questions, please open an issue in the GitHub repository or contact the development team. ","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgenaray%2Fml.shopanalytics","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgenaray%2Fml.shopanalytics","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgenaray%2Fml.shopanalytics/lists"}