https://github.com/zaidshaikh987/auto-insight

# Auto-Insights Platform 🚀

A cutting-edge **real-time** AI-powered data analysis and machine learning platform that delivers **instant insights** through live progress updates, background processing, and intelligent automation. Built with modern web technologies and enterprise-grade infrastructure.

![Auto-Insights](https://img.shields.io/badge/status-production--ready-brightgreen)
![Python](https://img.shields.io/badge/python-3.11-blue)
![React](https://img.shields.io/badge/react-18.2.0-61DAFB)
![FastAPI](https://img.shields.io/badge/fastapi-0.104.1-009688)
![Docker](https://img.shields.io/badge/docker-ready-2496ED)
![TypeScript](https://img.shields.io/badge/typescript-5.x-3178C6)
![Vite](https://img.shields.io/badge/vite-4.5-646CFF)
![TailwindCSS](https://img.shields.io/badge/tailwindcss-3.3-38B2AC)
![Uvicorn](https://img.shields.io/badge/uvicorn-ready-0E7C86)
![WebSockets](https://img.shields.io/badge/websockets-12.0-FF69B4)
![Celery](https://img.shields.io/badge/celery-5.3.4-success)
![Redis](https://img.shields.io/badge/redis-7.0-DC382D)
![PostgreSQL](https://img.shields.io/badge/postgresql-15-336791)
![MinIO](https://img.shields.io/badge/minio-ready-FC5A50)
![Prometheus](https://img.shields.io/badge/prometheus-ready-E6522C)
![Grafana](https://img.shields.io/badge/grafana-ready-F46800)
![FLAML](https://img.shields.io/badge/flaml-2.1.1-00A8E8)
![scikit-learn](https://img.shields.io/badge/scikit--learn-1.3.2-F7931E)
![Pandas](https://img.shields.io/badge/pandas-2.1.4-150458)
![NumPy](https://img.shields.io/badge/numpy-1.24-013243)
![ESLint](https://img.shields.io/badge/eslint-configured-4B32C3)
![Prettier](https://img.shields.io/badge/prettier-configured-F7B93E)
![Black](https://img.shields.io/badge/black-formatter-000000)
![License: MIT](https://img.shields.io/badge/license-MIT-blue)

---

## 🌟 Key Features

### 🔴 **Real-Time Processing**
- **Live EDA Analysis**: 11-step comprehensive data analysis with step-by-step progress updates
- **Real-Time AutoML**: Automated machine learning with live training progress and model performance tracking
- **WebSocket Connections**: Instant progress updates and real-time notifications
- **Background Job Processing**: Asynchronous task execution with Celery and Redis
- **Live Progress Tracking**: Detailed progress bars and status updates for all operations

### 🤖 **AI-Powered Intelligence**
- **Automated EDA**: Comprehensive exploratory data analysis with statistical insights
- **Smart AutoML**: Automated model selection and hyperparameter tuning using FLAML
- **Model Explainability**: SHAP, LIME, and permutation importance for model interpretability
- **Gemini AI Integration**: Natural language explanations and business insights
- **Multi-Modal Support**: Tabular, Computer Vision, NLP, and Time Series data

### 🎨 **Modern User Experience**
- **Responsive Web UI**: React + TypeScript + Tailwind CSS with dark/light themes
- **Real-Time Dashboards**: Live metrics, activity monitoring, and interactive visualizations
- **Drag & Drop Interface**: Intuitive file upload and data management
- **Interactive Visualizations**: Plotly.js and Recharts for data exploration
- **Mobile Optimized**: Fully responsive design for all devices

---

## 🏗️ Architecture Overview

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│    Frontend     │    │     Backend     │    │ Infrastructure  │
│  (React + TS)   │◄──►│    (FastAPI)    │◄──►│    (Docker)     │
│                 │    │                 │    │                 │
│ • Real-time UI  │    │ • REST APIs     │    │ • PostgreSQL    │
│ • WebSocket     │    │ • Background    │    │ • Redis         │
│ • Visualizations│    │   Jobs          │    │ • MinIO         │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                │
                                ▼
                       ┌─────────────────┐
                       │   AI/ML Stack   │
                       │                 │
                       │ • FLAML AutoML  │
                       │ • SHAP/LIME     │
                       │ • Gemini AI     │
                       └─────────────────┘
```

---

## 🛠️ Technology Stack

### **Frontend**
- **Framework**: React 18.2.0 with TypeScript
- **Build Tool**: Vite 4.5.0
- **Styling**: Tailwind CSS 3.3.5
- **Charts**: Plotly.js 2.27.0, Recharts 2.8.0
- **Routing**: React Router DOM 6.20.1
- **State Management**: Zustand 4.4.7
- **UI Components**: Headless UI, Heroicons

### **Backend**
- **Framework**: FastAPI 0.104.1 with async support
- **Server**: Uvicorn with WebSocket support
- **Data Processing**: Pandas 2.1.4, NumPy 1.24.3
- **Machine Learning**: Scikit-learn 1.3.2, FLAML 2.1.1
- **Model Explainability**: SHAP 0.43.0, LIME 0.2.0.1
- **AI Integration**: Google Generative AI 0.3.2
- **Task Queue**: Celery 5.3.4 with Redis 5.0.1
- **WebSockets**: WebSockets 12.0 for real-time updates

### **Infrastructure & DevOps**
- **Containerization**: Docker & Docker Compose
- **Database**: PostgreSQL with SQLAlchemy 2.0.23
- **Object Storage**: MinIO 7.2.0
- **Message Broker**: Redis 7.0 (Alpine)
- **Monitoring**: Prometheus + Grafana
- **Task Monitoring**: Flower (Celery dashboard)
- **Load Balancing**: Nginx (production ready)

---

## 🚀 Quick Start Guide

### **Prerequisites**
- **Docker & Docker Compose** (v20.10+)
- **Git** for version control
- **Google Gemini API Key** for AI explanations

### **Installation & Setup**

1. **Clone the Repository**
```bash
git clone https://github.com/zaidshaikh987/auto-insight
cd auto-insight
```

2. **Environment Configuration**
```bash
# Copy environment template
cp .env.example .env

# Edit .env file with your configuration
nano .env # or use your preferred editor
```

**Required Environment Variables:**
```env
# AI Integration
GEMINI_API_KEY=your_google_gemini_api_key_here

# Database
DATABASE_URL=postgresql://user:password@localhost:5432/auto_insights

# Object Storage
MINIO_ENDPOINT=localhost:9000
MINIO_ACCESS_KEY=minioadmin
MINIO_SECRET_KEY=minioadmin

# Redis
REDIS_URL=redis://localhost:6379/0
```

3. **Launch the Platform**
```bash
# Start all services (recommended)
./start.sh

# Or use Docker Compose directly
docker-compose up -d
```

4. **Verify Installation**
```bash
# Check service status
docker-compose ps

# Validate platform functionality
python validate_platform.py
```

5. **Access Applications**
- **Main Application**: http://localhost:3000
- **Interactive API Docs (Swagger UI)**: http://localhost:8000/docs
- **API Reference (ReDoc)**: http://localhost:8000/redoc
- **Task Monitoring**: http://localhost:5555
- **MinIO Console**: http://localhost:9001
- **Grafana Dashboard**: http://localhost:3001
- **Prometheus Metrics**: http://localhost:9090
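
The environment variables from step 2 can be checked once at startup so that a missing value fails immediately rather than in the middle of a job. The following is a minimal sketch, not the project's actual configuration code: the `Settings` dataclass and the `load_settings` helper are hypothetical names introduced for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Variable names taken from the .env example in step 2.
REQUIRED_VARS = (
    "GEMINI_API_KEY",
    "DATABASE_URL",
    "MINIO_ENDPOINT",
    "MINIO_ACCESS_KEY",
    "MINIO_SECRET_KEY",
    "REDIS_URL",
)


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the platform configuration."""
    gemini_api_key: str
    database_url: str
    minio_endpoint: str
    minio_access_key: str
    minio_secret_key: str
    redis_url: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read the required variables, failing fast if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    # Field names are the lower-cased variable names.
    return Settings(**{name.lower(): env[name] for name in REQUIRED_VARS})
```

Calling `load_settings()` with no argument reads from `os.environ`; passing a plain dictionary makes the function easy to exercise in tests.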

---

## 📊 Real-Time Features Deep Dive

### **Live EDA Analysis**
The platform performs comprehensive exploratory data analysis with real-time progress updates:

1. **Data Loading & Validation** (5%)
2. **Basic Statistics** (15%)
3. **Missing Values Analysis** (25%)
4. **Distribution Analysis** (35%)
5. **Correlation Analysis** (45%)
6. **Feature Importance** (55%)
7. **Outlier Detection** (65%)
8. **Data Quality Report** (75%)
9. **Visualization Generation** (85%)
10. **Summary & Recommendations** (95%)
11. **Complete** (100%)
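
The step list above maps naturally onto a lookup table from step to cumulative progress. The sketch below is illustrative only: `EDA_STEPS` and `progress_update` are hypothetical names, and the payload shape is an assumption modeled on the `status`, `progress`, and `message` fields used elsewhere in this README.

```python
from typing import Dict, List, Tuple, Union

# (step name, cumulative progress in percent), as listed above.
EDA_STEPS: List[Tuple[str, int]] = [
    ("Data Loading & Validation", 5),
    ("Basic Statistics", 15),
    ("Missing Values Analysis", 25),
    ("Distribution Analysis", 35),
    ("Correlation Analysis", 45),
    ("Feature Importance", 55),
    ("Outlier Detection", 65),
    ("Data Quality Report", 75),
    ("Visualization Generation", 85),
    ("Summary & Recommendations", 95),
    ("Complete", 100),
]


def progress_update(step_index: int) -> Dict[str, Union[str, int]]:
    """Build a progress payload for a 1-based step index."""
    if not 1 <= step_index <= len(EDA_STEPS):
        raise ValueError(f"step_index must be between 1 and {len(EDA_STEPS)}")
    name, percent = EDA_STEPS[step_index - 1]
    return {
        # Only the final step marks the job as finished.
        "status": "completed" if percent == 100 else "running",
        "progress": percent,
        "message": f"Processing step {step_index}/{len(EDA_STEPS)}: {name}",
    }
```

A background worker could emit `progress_update(n)` after finishing step `n`, which keeps the percentages in one place instead of scattering them through the analysis code.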

### **Real-Time AutoML Training**
Automated machine learning with live progress tracking:

- **Algorithm Selection**: Automatic model selection from 20+ algorithms
- **Hyperparameter Tuning**: Intelligent parameter optimization
- **Cross-Validation**: Real-time CV score updates
- **Model Comparison**: Live leaderboard updates
- **Performance Metrics**: Instant accuracy, precision, recall tracking
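
To make the "live leaderboard" idea concrete, here is a small in-memory sketch. It is not the platform's implementation: the `Leaderboard` class is hypothetical, and it assumes that a higher cross-validation score is better.

```python
from typing import Dict, List, Tuple


class Leaderboard:
    """Hypothetical leaderboard that keeps each model's best CV score."""

    def __init__(self) -> None:
        self._best: Dict[str, float] = {}

    def report(self, model_name: str, cv_score: float) -> None:
        """Record a trial result, keeping only the best score per model."""
        current = self._best.get(model_name)
        if current is None or cv_score > current:
            self._best[model_name] = cv_score

    def ranking(self) -> List[Tuple[str, float]]:
        """Return (model, score) pairs, highest score first."""
        return sorted(self._best.items(), key=lambda item: item[1], reverse=True)
```

Each completed trial would call `report(...)`, and the current `ranking()` could then be pushed to the UI after every update.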

### **WebSocket Communication**
Real-time updates via WebSocket connections:

```typescript
// Frontend WebSocket integration
const ws = new WebSocket(`ws://localhost:8000/ws/job/${jobId}`);

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  console.log('Progress:', data.progress, '%');
  console.log('Status:', data.status);
  console.log('Message:', data.message);
};
```
```
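
The same messages can be consumed from Python, for example in a test script. The sketch below covers only the decoding step and leaves out the transport so that it runs without a server. `JobProgress` and `parse_progress` are hypothetical names, and the set of terminal statuses is an assumption based on the `running|completed|failed` values shown in the API response format.

```python
import json
from dataclasses import dataclass

# Statuses after which no further updates are expected (assumption).
TERMINAL_STATUSES = frozenset({"completed", "failed"})


@dataclass(frozen=True)
class JobProgress:
    """One decoded progress message."""
    progress: float
    status: str
    message: str

    @property
    def is_terminal(self) -> bool:
        """True once the job has finished, successfully or not."""
        return self.status in TERMINAL_STATUSES


def parse_progress(raw: str) -> JobProgress:
    """Decode one WebSocket text frame into a JobProgress."""
    data = json.loads(raw)
    return JobProgress(
        progress=float(data["progress"]),
        status=str(data["status"]),
        message=str(data.get("message", "")),
    )
```

A client loop would call `parse_progress` on each received frame and stop reading once `is_terminal` is true.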

---

## 🔧 Development Workflow

### **Frontend Development**
```bash
cd frontend

# Install dependencies
npm install

# Start development server
npm run dev

# Build for production
npm run build

# Preview production build
npm run preview

# Run linting
npm run lint

# Type checking
npm run type-check
```

### **Backend Development**
```bash
cd backend

# Install Python dependencies
pip install -r requirements.txt

# Start development server
uvicorn main:app --reload --host 0.0.0.0 --port 8000

# Start a Celery worker for background tasks (requires a running Redis)
celery -A celery_app.celery_app worker --loglevel=info

# Run Redis for local development
redis-server
```

### **Full Development Stack**
```bash
# Terminal 1: Frontend
cd frontend && npm run dev

# Terminal 2: Backend
cd backend && uvicorn main:app --reload

# Terminal 3: Redis
redis-server

# Terminal 4: Celery Worker
cd backend && celery -A celery_app.celery_app worker --loglevel=info
```

---

## 🧪 Testing & Validation

### **Platform Validation**
```bash
# Comprehensive platform validation
python validate_platform.py
```

### **Real-Time Feature Testing**
```bash
# Install test dependencies
pip install websockets requests pandas

# Run comprehensive real-time test
python test_realtime.py
```

### **Load Testing**
```bash
# API load testing
python load_test.py

# WebSocket stress testing
python websocket_test.py
```

---

## 📈 Monitoring & Observability

### **Application Monitoring**
- **Prometheus**: System metrics collection
- **Grafana**: Real-time dashboards and alerting
- **Custom Metrics**: Business KPIs and ML model performance

### **Log Management**
- **Structured Logging**: JSON formatted logs with correlation IDs
- **Log Aggregation**: Centralized logging, ready for an ELK stack
- **Error Tracking**: Comprehensive error handling and reporting
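
As an illustration of JSON logs with correlation IDs, the following sketch uses only the standard library. The field names, the `auto_insights` logger name, and the helpers `JsonFormatter`, `get_logger`, and `new_correlation_id` are assumptions, not the project's actual logging setup.

```python
import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Set by callers through the `extra` argument; None if absent.
            "correlation_id": getattr(record, "correlation_id", None),
        }
        return json.dumps(payload)


def get_logger(name: str = "auto_insights") -> logging.Logger:
    """Return a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid adding duplicate handlers
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger


def new_correlation_id() -> str:
    """Generate an ID that ties together all log lines of one request or job."""
    return uuid.uuid4().hex
```

Typical usage is `get_logger().info("EDA started", extra={"correlation_id": new_correlation_id()})`, reusing the same ID for every log line that belongs to one job.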

### **Performance Monitoring**
- **Real-time Metrics**: CPU, memory, disk usage
- **Application Metrics**: Response times, throughput, error rates
- **ML Metrics**: Model accuracy, training time, prediction latency

---

## 🔒 Security & Best Practices

### **Security Features**
- **CORS Protection**: Configured for production domains
- **Input Validation**: Pydantic models for all API inputs
- **SQL Injection Protection**: Parameterized queries
- **XSS Protection**: Input sanitization and validation
- **CSRF Protection**: Token-based authentication ready

### **Data Protection**
- **Encrypted Storage**: Database and object storage encryption
- **Secure APIs**: HTTPS enforcement in production
- **Access Control**: Role-based permissions ready
- **Audit Logging**: Complete activity tracking

---

## 🚀 Deployment Options

### **Production Deployment**
```bash
# Start the production stack (add --build to rebuild the images first)
docker-compose -f docker-compose.prod.yml up -d

# Or use the production startup script
./deploy.sh
```

### **Cloud Deployment**
- **AWS**: ECS Fargate with RDS and ElastiCache
- **Google Cloud**: Cloud Run with Cloud SQL and Memorystore
- **Azure**: Container Instances with Azure Database and Redis Cache

### **Scaling Considerations**
- **Horizontal Scaling**: Multiple backend instances behind load balancer
- **Database Scaling**: Read replicas and connection pooling
- **Celery Scaling**: Multiple worker nodes
- **Caching**: Redis clustering for high availability

---

## 📚 API Documentation

### **Core Endpoints**

#### **Project Management**
- `GET /api/projects` - List all projects
- `POST /api/projects` - Create new project
- `GET /api/projects/{id}` - Get project details
- `PUT /api/projects/{id}` - Update project
- `DELETE /api/projects/{id}` - Delete project

#### **Data Management**
- `POST /api/projects/{id}/upload` - Upload dataset
- `GET /api/projects/{id}/datasets` - List datasets
- `GET /api/projects/{id}/datasets/{dataset_id}` - Get dataset info
- `DELETE /api/projects/{id}/datasets/{dataset_id}` - Delete dataset

#### **Real-Time Analysis**
- `POST /api/eda/analyze` - Start EDA analysis (progress is streamed over WebSocket)
- `GET /api/eda/{project_id}/{dataset_id}/report` - Get EDA results
- `POST /api/automl/train` - Start AutoML training (progress is streamed over WebSocket)
- `GET /api/automl/{project_id}/leaderboard` - Get model leaderboard
- `GET /api/automl/{project_id}/models/{model_id}` - Get specific model

#### **WebSocket Endpoints**
- `ws://localhost:8000/ws/job/{job_id}` - Real-time job progress
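
A client can assemble the paths listed above with a small helper. This is a sketch under stated assumptions: the `ApiUrls` class is hypothetical, the base URLs are the local defaults from this README, and percent-encoding the path parameters is a precaution rather than a documented requirement of the API.

```python
from dataclasses import dataclass
from urllib.parse import quote


def _seg(value: str) -> str:
    """Percent-encode one path segment."""
    return quote(str(value), safe="")


@dataclass(frozen=True)
class ApiUrls:
    """Hypothetical URL builder for the endpoints listed above."""
    http_base: str = "http://localhost:8000"
    ws_base: str = "ws://localhost:8000"

    def project(self, project_id: str) -> str:
        return f"{self.http_base}/api/projects/{_seg(project_id)}"

    def upload(self, project_id: str) -> str:
        return f"{self.project(project_id)}/upload"

    def eda_report(self, project_id: str, dataset_id: str) -> str:
        return f"{self.http_base}/api/eda/{_seg(project_id)}/{_seg(dataset_id)}/report"

    def leaderboard(self, project_id: str) -> str:
        return f"{self.http_base}/api/automl/{_seg(project_id)}/leaderboard"

    def job_socket(self, job_id: str) -> str:
        return f"{self.ws_base}/ws/job/{_seg(job_id)}"
```

Keeping the base URLs as fields makes it straightforward to point the same client at a non-local deployment.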

### **Response Format**
```json
{
  "job_id": "uuid-string",
  "status": "running|completed|failed",
  "progress": 75.5,
  "message": "Processing step 8/11: Data Quality Report",
  "data": {
    "results": "...",
    "metrics": {...}
  },
  "websocket_url": "ws://localhost:8000/ws/job/uuid-string"
}
```
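
A client may want to sanity-check a response before acting on it. The sketch below is one possible check, not part of the platform: `validate_job_response` is a hypothetical helper, and both the choice of required fields and the 0 to 100 progress range are assumptions drawn from the example above.

```python
from typing import Any, Dict, List

ALLOWED_STATUSES = frozenset({"running", "completed", "failed"})
# Fields treated as mandatory for this sketch (assumption).
REQUIRED_FIELDS = ("job_id", "status", "progress", "message")


def validate_job_response(payload: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in payload]

    status = payload.get("status")
    if status is not None and status not in ALLOWED_STATUSES:
        problems.append(f"unknown status: {status}")

    progress = payload.get("progress")
    if progress is not None:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(progress, bool) or not isinstance(progress, (int, float)):
            problems.append("progress is not a number")
        elif not 0 <= progress <= 100:
            problems.append("progress is outside 0-100")
    return problems
```

Returning every problem at once, rather than raising on the first one, tends to make malformed responses easier to diagnose.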

---

## 🤝 Contributing

### **Development Setup**
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests for new functionality
5. Ensure all tests pass
6. Submit a pull request

### **Code Style**
- **Python**: PEP 8 with Black formatting
- **TypeScript**: ESLint + Prettier
- **Git Hooks**: Pre-commit hooks for code quality

### **Testing Standards**
- Unit tests for all new features
- Integration tests for API endpoints
- End-to-end tests for critical workflows
- Performance benchmarks for ML components

---

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

---

## 🆘 Support & Troubleshooting

### **Common Issues**

**Problem**: WebSocket connections failing
```bash
# Solution: Check Redis and Celery services
docker-compose logs redis
docker-compose logs celery_worker
```

**Problem**: ML models not training
```bash
# Solution: Verify Python dependencies
docker-compose exec backend pip list | grep -E "(pandas|scikit-learn|flaml)"
```

**Problem**: File uploads failing
```bash
# Solution: Check MinIO service and permissions
docker-compose logs minio
```

---

## 🙏 Acknowledgments

- **Google Gemini AI** for natural language explanations
- **FLAML** for automated machine learning
- **FastAPI** for the robust backend framework
- **React** ecosystem for the modern frontend
- **Open Source Community** for all the amazing tools and libraries

---

**Built with ❤️ for data scientists, ML engineers, and business analysts who need instant insights from their data.**

---

**⭐ Star this repository if you find it useful!**