{"id":31287866,"url":"https://github.com/zaidshaikh987/auto-insight","last_synced_at":"2026-04-10T11:31:23.954Z","repository":{"id":316049420,"uuid":"1061737182","full_name":"zaidshaikh987/Auto-Insight","owner":"zaidshaikh987","description":"A cutting-edge real-time AI-powered data analysis and machine learning platform that delivers instant insights through live progress updates, background processing, and intelligent automation. Built with modern web technologies and enterprise-grade infrastructure.","archived":false,"fork":false,"pushed_at":"2025-09-22T10:38:33.000Z","size":184,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-22T12:14:58.095Z","etag":null,"topics":["automl","celery","docker","fastapi","flaml","grafana","minio","mlops","monitoring","postresql","prometheus","python","react","redis","typescript","websockets"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zaidshaikh987.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-22T10:10:41.000Z","updated_at":"2025-09-22T10:49:41.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/zaidshaikh987/Auto-Insight","commit_stats":null,"previous_names":["zaidshaikh987/auto-insight"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/zaidshaikh987/Auto-Insight","reposit
ory_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zaidshaikh987%2FAuto-Insight","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zaidshaikh987%2FAuto-Insight/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zaidshaikh987%2FAuto-Insight/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zaidshaikh987%2FAuto-Insight/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zaidshaikh987","download_url":"https://codeload.github.com/zaidshaikh987/Auto-Insight/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zaidshaikh987%2FAuto-Insight/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":276738494,"owners_count":25695923,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-24T02:00:09.776Z","response_time":97,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["automl","celery","docker","fastapi","flaml","grafana","minio","mlops","monitoring","postresql","prometheus","python","react","redis","typescript","websockets"],"created_at":"2025-09-24T11:27:29.016Z","updated_at":"2026-04-10T11:31:23.947Z","avatar_url":"https://github.com/zaidshaikh987.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Auto-Insights Platform 🚀\n\nA cutting-edge 
**real-time** AI-powered data analysis and machine learning platform that delivers **instant insights** through live progress updates, background processing, and intelligent automation. Built with modern web technologies and enterprise-grade infrastructure.\n\n![Auto-Insights](https://img.shields.io/badge/status-production--ready-brightgreen)\n![Python](https://img.shields.io/badge/python-3.11-blue)\n![React](https://img.shields.io/badge/react-18.2.0-61DAFB)\n![FastAPI](https://img.shields.io/badge/fastapi-0.104.1-009688)\n![Docker](https://img.shields.io/badge/docker-ready-2496ED)\n![TypeScript](https://img.shields.io/badge/typescript-5.x-3178C6)\n![Vite](https://img.shields.io/badge/vite-4.5-646CFF)\n![TailwindCSS](https://img.shields.io/badge/tailwindcss-3.3-38B2AC)\n![Uvicorn](https://img.shields.io/badge/uvicorn-ready-0E7C86)\n![WebSockets](https://img.shields.io/badge/websockets-12.0-FF69B4)\n![Celery](https://img.shields.io/badge/celery-5.3.4-success)\n![Redis](https://img.shields.io/badge/redis-7.0-DC382D)\n![PostgreSQL](https://img.shields.io/badge/postgresql-15-336791)\n![MinIO](https://img.shields.io/badge/minio-ready-FC5A50)\n![Prometheus](https://img.shields.io/badge/prometheus-ready-E6522C)\n![Grafana](https://img.shields.io/badge/grafana-ready-F46800)\n![FLAML](https://img.shields.io/badge/flaml-2.1.1-00A8E8)\n![scikit-learn](https://img.shields.io/badge/scikit--learn-1.3.2-F7931E)\n![Pandas](https://img.shields.io/badge/pandas-2.1.4-150458)\n![NumPy](https://img.shields.io/badge/numpy-1.24-013243)\n![ESLint](https://img.shields.io/badge/eslint-configured-4B32C3)\n![Prettier](https://img.shields.io/badge/prettier-configured-F7B93E)\n![Black](https://img.shields.io/badge/black-formatter-000000)\n![License: MIT](https://img.shields.io/badge/license-MIT-blue)\n\n\u003cimg width=\"1909\" height=\"892\" alt=\"image\" src=\"https://github.com/user-attachments/assets/61e4c80c-0ceb-4ca6-8d98-87cc63cc10da\" /\u003e\n\u003cimg width=\"1897\" height=\"894\" 
alt=\"image\" src=\"https://github.com/user-attachments/assets/e355c8c9-7266-42ee-ac3b-8b4808f36b7c\" /\u003e\n\u003cimg width=\"1905\" height=\"901\" alt=\"image\" src=\"https://github.com/user-attachments/assets/be9ec279-7f1c-47d8-a9f0-1b40bb66112d\" /\u003e\n\u003cimg width=\"1892\" height=\"895\" alt=\"image\" src=\"https://github.com/user-attachments/assets/a520d094-5795-4019-b858-ed0dc5013231\" /\u003e\n\u003cimg width=\"1877\" height=\"895\" alt=\"image\" src=\"https://github.com/user-attachments/assets/aeb81c87-d347-477d-b465-2210a9d61771\" /\u003e\n\u003cimg width=\"1873\" height=\"888\" alt=\"image\" src=\"https://github.com/user-attachments/assets/c4f669d4-a3ec-438b-8d68-00db94688e6b\" /\u003e\n\u003cimg width=\"1874\" height=\"889\" alt=\"image\" src=\"https://github.com/user-attachments/assets/d31fb247-105b-4755-8902-105bfdbd49ab\" /\u003e\n\u003cimg width=\"1879\" height=\"888\" alt=\"image\" src=\"https://github.com/user-attachments/assets/51ffa7e2-b0de-4b7e-9e75-8e2a46c6c753\" /\u003e\n\u003cimg width=\"1886\" height=\"879\" alt=\"image\" src=\"https://github.com/user-attachments/assets/60b2b62e-7bbc-479a-a143-d7efd5cb3fb5\" /\u003e\n\u003cimg width=\"1878\" height=\"885\" alt=\"image\" src=\"https://github.com/user-attachments/assets/132ee674-5d19-484c-9c9f-bd58957e52be\" /\u003e\n\u003cimg width=\"1862\" height=\"872\" alt=\"image\" src=\"https://github.com/user-attachments/assets/2f43b5d4-397e-4cdc-8860-50d86eb6cde7\" /\u003e\n\u003cimg width=\"1878\" height=\"886\" alt=\"image\" src=\"https://github.com/user-attachments/assets/e361a93b-dea7-48f0-9ccb-669b0a58fc7b\" /\u003e\n\u003cimg width=\"1889\" height=\"882\" alt=\"image\" src=\"https://github.com/user-attachments/assets/9ee8ee9b-a856-4b0e-98df-cc7ddedd6a97\" /\u003e\n\n---\n\n## 🌟 Key Features\n\n### 🔴 **Real-Time Processing**\n- **Live EDA Analysis**: 11-step comprehensive data analysis with step-by-step progress updates\n- **Real-Time AutoML**: Automated machine learning with live 
training progress and model performance tracking\n- **WebSocket Connections**: Instant progress updates and real-time notifications\n- **Background Job Processing**: Asynchronous task execution with Celery and Redis\n- **Live Progress Tracking**: Detailed progress bars and status updates for all operations\n\n### 🤖 **AI-Powered Intelligence**\n- **Automated EDA**: Comprehensive exploratory data analysis with statistical insights\n- **Smart AutoML**: Automated model selection and hyperparameter tuning using FLAML\n- **Model Explainability**: SHAP, LIME, and permutation importance for model interpretability\n- **Gemini AI Integration**: Natural language explanations and business insights\n- **Multi-Modal Support**: Tabular, Computer Vision, NLP, and Time Series data\n\n### 🎨 **Modern User Experience**\n- **Responsive Web UI**: React + TypeScript + Tailwind CSS with dark/light themes\n- **Real-Time Dashboards**: Live metrics, activity monitoring, and interactive visualizations\n- **Drag \u0026 Drop Interface**: Intuitive file upload and data management\n- **Interactive Visualizations**: Plotly.js and Recharts for data exploration\n- **Mobile Optimized**: Fully responsive design for all devices\n\n---\n\n## 🏗️ Architecture Overview\n\n```\n┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐\n│   Frontend      │    │    Backend      │    │   Infrastructure│\n│   (React + TS)  │◄──►│   (FastAPI)     │◄──►│   (Docker)      │\n│                 │    │                 │    │                 │\n│ • Real-time UI  │    │ • REST APIs     │    │ • PostgreSQL    │\n│ • WebSocket     │    │ • Background     │    │ • Redis         │\n│ • Visualizations│    │   Jobs          │    │ • MinIO         │\n└─────────────────┘    └─────────────────┘    └─────────────────┘\n                              │\n                              ▼\n                       ┌─────────────────┐\n                       │   AI/ML Stack   │\n                       │                 │\n         
              │ • FLAML AutoML  │\n                       │ • SHAP/LIME     │\n                       │ • Gemini AI     │\n                       └─────────────────┘\n```\n\n---\n\n## 🛠️ Technology Stack\n\n### **Frontend**\n- **Framework**: React 18.2.0 with TypeScript\n- **Build Tool**: Vite 4.5.0\n- **Styling**: Tailwind CSS 3.3.5\n- **Charts**: Plotly.js 2.27.0, Recharts 2.8.0\n- **Routing**: React Router DOM 6.20.1\n- **State Management**: Zustand 4.4.7\n- **UI Components**: Headless UI, Heroicons\n\n### **Backend**\n- **Framework**: FastAPI 0.104.1 with async support\n- **Server**: Uvicorn with WebSocket support\n- **Data Processing**: Pandas 2.1.4, NumPy 1.24.3\n- **Machine Learning**: Scikit-learn 1.3.2, FLAML 2.1.1\n- **Model Explainability**: SHAP 0.43.0, LIME 0.2.0.1\n- **AI Integration**: Google Generative AI 0.3.2\n- **Task Queue**: Celery 5.3.4 with Redis 5.0.1\n- **WebSockets**: WebSockets 12.0 for real-time updates\n\n### **Infrastructure \u0026 DevOps**\n- **Containerization**: Docker \u0026 Docker Compose\n- **Database**: PostgreSQL with SQLAlchemy 2.0.23\n- **Object Storage**: MinIO 7.2.0\n- **Message Broker**: Redis 7.0 (Alpine)\n- **Monitoring**: Prometheus + Grafana\n- **Task Monitoring**: Flower (Celery dashboard)\n- **Load Balancing**: Nginx (production ready)\n\n---\n\n## 🚀 Quick Start Guide\n\n### **Prerequisites**\n- **Docker \u0026 Docker Compose** (v20.10+)\n- **Git** for version control\n- **Google Gemini API Key** for AI explanations\n\n### **Installation \u0026 Setup**\n\n1. **Clone the Repository**\n   ```bash\n   git clone \u003crepository-url\u003e\n   cd auto-insights\n   ```\n\n2. 
**Environment Configuration**\n   ```bash\n   # Copy environment template\n   cp .env.example .env\n\n   # Edit .env file with your configuration\n   nano .env  # or use your preferred editor\n   ```\n\n   **Required Environment Variables:**\n   ```env\n   # AI Integration\n   GEMINI_API_KEY=your_google_gemini_api_key_here\n\n   # Database\n   DATABASE_URL=postgresql://user:password@localhost:5432/auto_insights\n\n   # Object Storage\n   MINIO_ENDPOINT=localhost:9000\n   MINIO_ACCESS_KEY=minioadmin\n   MINIO_SECRET_KEY=minioadmin\n\n   # Redis\n   REDIS_URL=redis://localhost:6379/0\n   ```\n\n3. **Launch the Platform**\n   ```bash\n   # Start all services (recommended)\n   ./start.sh\n\n   # Or use Docker Compose directly\n   docker-compose up -d\n   ```\n\n4. **Verify Installation**\n   ```bash\n   # Check service status\n   docker-compose ps\n\n   # Validate platform functionality\n   python validate_platform.py\n   ```\n\n5. **Access Applications**\n   - **Main Application**: http://localhost:3000\n   - **API Documentation**: http://localhost:8000/docs\n   - **Interactive API Docs**: http://localhost:8000/redoc\n   - **Task Monitoring**: http://localhost:5555\n   - **MinIO Console**: http://localhost:9001\n   - **Grafana Dashboard**: http://localhost:3001\n   - **Prometheus Metrics**: http://localhost:9090\n\n---\n\n## 📊 Real-Time Features Deep Dive\n\n### **Live EDA Analysis**\nThe platform performs comprehensive exploratory data analysis with real-time progress updates:\n\n1. **Data Loading \u0026 Validation** (5%)\n2. **Basic Statistics** (15%)\n3. **Missing Values Analysis** (25%)\n4. **Distribution Analysis** (35%)\n5. **Correlation Analysis** (45%)\n6. **Feature Importance** (55%)\n7. **Outlier Detection** (65%)\n8. **Data Quality Report** (75%)\n9. **Visualization Generation** (85%)\n10. **Summary \u0026 Recommendations** (95%)\n11. 
**Complete** (100%)\n\n### **Real-Time AutoML Training**\nAutomated machine learning with live progress tracking:\n\n- **Algorithm Selection**: Automatic model selection from 20+ algorithms\n- **Hyperparameter Tuning**: Intelligent parameter optimization\n- **Cross-Validation**: Real-time CV score updates\n- **Model Comparison**: Live leaderboard updates\n- **Performance Metrics**: Instant accuracy, precision, recall tracking\n\n### **WebSocket Communication**\nReal-time updates via WebSocket connections:\n\n```typescript\n// Frontend WebSocket integration\nconst ws = new WebSocket(`ws://localhost:8000/ws/job/${jobId}`);\n\nws.onmessage = (event) =\u003e {\n  const data = JSON.parse(event.data);\n  console.log('Progress:', data.progress, '%');\n  console.log('Status:', data.status);\n  console.log('Message:', data.message);\n};\n```\n\n---\n\n## 🔧 Development Workflow\n\n### **Frontend Development**\n```bash\ncd frontend\n\n# Install dependencies\nnpm install\n\n# Start development server\nnpm run dev\n\n# Build for production\nnpm run build\n\n# Preview production build\nnpm run preview\n\n# Run linting\nnpm run lint\n\n# Type checking\nnpm run type-check\n```\n\n### **Backend Development**\n```bash\ncd backend\n\n# Install Python dependencies\npip install -r requirements.txt\n\n# Start development server\nuvicorn main:app --reload --host 0.0.0.0 --port 8000\n\n# Run with background task support\ncelery -A celery_app.celery_app worker --loglevel=info\n\n# Run Redis for local development\nredis-server\n```\n\n### **Full Development Stack**\n```bash\n# Terminal 1: Frontend\ncd frontend \u0026\u0026 npm run dev\n\n# Terminal 2: Backend\ncd backend \u0026\u0026 uvicorn main:app --reload\n\n# Terminal 3: Redis\nredis-server\n\n# Terminal 4: Celery Worker\ncd backend \u0026\u0026 celery -A celery_app.celery_app worker --loglevel=info\n```\n\n---\n\n## 🧪 Testing \u0026 Validation\n\n### **Platform Validation**\n```bash\n# Comprehensive platform validation\npython 
validate_platform.py\n```\n\n### **Real-Time Feature Testing**\n```bash\n# Install test dependencies\npip install websockets requests pandas\n\n# Run comprehensive real-time test\npython test_realtime.py\n```\n\n### **Load Testing**\n```bash\n# API load testing\npython load_test.py\n\n# WebSocket stress testing\npython websocket_test.py\n```\n\n---\n\n## 📈 Monitoring \u0026 Observability\n\n### **Application Monitoring**\n- **Prometheus**: System metrics collection\n- **Grafana**: Real-time dashboards and alerting\n- **Custom Metrics**: Business KPIs and ML model performance\n\n### **Log Management**\n- **Structured Logging**: JSON formatted logs with correlation IDs\n- **Log Aggregation**: Centralized logging with ELK stack ready\n- **Error Tracking**: Comprehensive error handling and reporting\n\n### **Performance Monitoring**\n- **Real-time Metrics**: CPU, memory, disk usage\n- **Application Metrics**: Response times, throughput, error rates\n- **ML Metrics**: Model accuracy, training time, prediction latency\n\n---\n\n## 🔒 Security \u0026 Best Practices\n\n### **Security Features**\n- **CORS Protection**: Configured for production domains\n- **Input Validation**: Pydantic models for all API inputs\n- **SQL Injection Protection**: Parameterized queries\n- **XSS Protection**: Input sanitization and validation\n- **CSRF Protection**: Token-based authentication ready\n\n### **Data Protection**\n- **Encrypted Storage**: Database and object storage encryption\n- **Secure APIs**: HTTPS enforcement in production\n- **Access Control**: Role-based permissions ready\n- **Audit Logging**: Complete activity tracking\n\n---\n\n## 🚀 Deployment Options\n\n### **Production Deployment**\n```bash\n# Build production images\ndocker-compose -f docker-compose.prod.yml up -d\n\n# Or use the production startup script\n./deploy.sh\n```\n\n### **Cloud Deployment**\n- **AWS**: ECS Fargate with RDS and ElastiCache\n- **Google Cloud**: Cloud Run with Cloud SQL and Memorystore\n- **Azure**: 
Container Instances with Azure Database and Redis Cache\n\n### **Scaling Considerations**\n- **Horizontal Scaling**: Multiple backend instances behind a load balancer\n- **Database Scaling**: Read replicas and connection pooling\n- **Celery Scaling**: Multiple worker nodes\n- **Caching**: Redis clustering for high availability\n\n---\n\n## 📚 API Documentation\n\n### **Core Endpoints**\n\n#### **Project Management**\n- `GET /api/projects` - List all projects\n- `POST /api/projects` - Create new project\n- `GET /api/projects/{id}` - Get project details\n- `PUT /api/projects/{id}` - Update project\n- `DELETE /api/projects/{id}` - Delete project\n\n#### **Data Management**\n- `POST /api/projects/{id}/upload` - Upload dataset\n- `GET /api/projects/{id}/datasets` - List datasets\n- `GET /api/projects/{id}/datasets/{dataset_id}` - Get dataset info\n- `DELETE /api/projects/{id}/datasets/{dataset_id}` - Delete dataset\n\n#### **Real-Time Analysis**\n- `POST /api/eda/analyze` - Start EDA analysis with WebSocket\n- `GET /api/eda/{project_id}/{dataset_id}/report` - Get EDA results\n- `POST /api/automl/train` - Start AutoML training with WebSocket\n- `GET /api/automl/{project_id}/leaderboard` - Get model leaderboard\n- `GET /api/automl/{project_id}/models/{model_id}` - Get specific model\n\n#### **WebSocket Endpoints**\n- `ws://localhost:8000/ws/job/{job_id}` - Real-time job progress\n\n### **Response Format**\n```json\n{\n  \"job_id\": \"uuid-string\",\n  \"status\": \"running|completed|failed\",\n  \"progress\": 75.5,\n  \"message\": \"Processing step 8/11: Data quality report\",\n  \"data\": {\n    \"results\": \"...\",\n    \"metrics\": {...}\n  },\n  \"websocket_url\": \"ws://localhost:8000/ws/job/uuid-string\"\n}\n```\n\n---\n\n## 🤝 Contributing\n\n### **Development Setup**\n1. Fork the repository\n2. Create a feature branch\n3. Make your changes\n4. Add tests for new functionality\n5. Ensure all tests pass\n6. 
Submit a pull request\n\n### **Code Style**\n- **Python**: PEP 8 with Black formatting\n- **TypeScript**: ESLint + Prettier\n- **Git Hooks**: Pre-commit hooks for code quality\n\n### **Testing Standards**\n- Unit tests for all new features\n- Integration tests for API endpoints\n- End-to-end tests for critical workflows\n- Performance benchmarks for ML components\n\n---\n\n## 📄 License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n---\n\n## 🆘 Support \u0026 Troubleshooting\n\n### **Common Issues**\n\n**Problem**: WebSocket connections failing\n```bash\n# Solution: Check Redis and Celery services\ndocker-compose logs redis\ndocker-compose logs celery_worker\n```\n\n**Problem**: ML models not training\n```bash\n# Solution: Verify Python dependencies\ndocker-compose exec backend pip list | grep -E \"(pandas|scikit-learn|flaml)\"\n```\n\n**Problem**: File uploads failing\n```bash\n# Solution: Check MinIO service and permissions\ndocker-compose logs minio\n```\n\n---\n\n## 🙏 Acknowledgments\n\n- **Google Gemini AI** for natural language explanations\n- **FLAML** for automated machine learning\n- **FastAPI** for the robust backend framework\n- **React** ecosystem for the modern frontend\n- **Open Source Community** for all the amazing tools and libraries\n\n---\n\n**Built with ❤️ for data scientists, ML engineers, and business analysts who need instant insights from their data.**\n\n---\n\n**⭐ Star this repository if you find it useful!**\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzaidshaikh987%2Fauto-insight","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzaidshaikh987%2Fauto-insight","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzaidshaikh987%2Fauto-insight/lists"}