https://github.com/pyjeebz/helios
Predict
- Host: GitHub
- URL: https://github.com/pyjeebz/helios
- Owner: pyjeebz
- Created: 2025-12-31T02:22:59.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2026-01-02T21:52:58.000Z (6 months ago)
- Last Synced: 2026-01-04T14:56:27.097Z (6 months ago)
- Language: HTML
- Size: 890 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Helios
**Predictive Infrastructure Intelligence Platform**
[CI](https://github.com/pyjeebz/helios/actions/workflows/ci.yml)
[Release](https://github.com/pyjeebz/helios/actions/workflows/release.yml)
[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0)
Helios uses machine learning to forecast infrastructure demand and provide proactive scaling recommendations, reducing reactive incidents and optimizing resource utilization.
## Quick Deploy
**One-click deployment options:**
[Deploy to Render](https://render.com/deploy?repo=https://github.com/pyjeebz/helios)
**Docker:**
```bash
docker run -d -p 8000:8000 ghcr.io/pyjeebz/helios/inference:latest
```
**Kubernetes (Helm):**
```bash
helm repo add helios https://pyjeebz.github.io/helios
helm install helios helios/helios
```
**Python SDK:**
```bash
pip install helios-sdk helios-agent
```
---
## Overview
Helios analyzes real-time metrics from your infrastructure, predicts future resource demands, detects anomalies, and provides actionable scaling recommendations before problems occur.
### Key Features
| Feature | Description |
|---------|-------------|
| **Traffic Forecasting** | Predict CPU, memory, and request rates up to 1 hour ahead |
| **Anomaly Detection** | Real-time detection of unusual patterns using XGBoost |
| **Scaling Recommendations** | Actionable advice for replica counts and resource limits |
| **Multi-Cloud Support** | Works on GKE, EKS, AKS, or any Kubernetes cluster |
| **Prometheus Native** | Exposes metrics in Prometheus format for easy integration |
---
## Architecture
```
┌───────────────────────────────────────────────────────────────────────────┐
│                              HELIOS PLATFORM                              │
├───────────────────────────────────────────────────────────────────────────┤
│                                                                           │
│  ┌──────────────────┐    ┌─────────────────┐    ┌─────────────────────┐   │
│  │ Metrics Adapter  │───▶│   ML Pipeline   │───▶│  Inference Service  │   │
│  │   (pluggable)    │    │   (training)    │    │      (FastAPI)      │   │
│  │ • GCP            │    │                 │    │  /predict           │   │
│  │ • AWS            │    │ • Baseline      │    │  /detect            │   │
│  │ • Azure          │    │ • Prophet       │    │  /recommend         │   │
│  │ • Prometheus     │    │ • XGBoost       │    │  /metrics           │   │
│  └──────────────────┘    └─────────────────┘    └──────────┬──────────┘   │
│                                                            │              │
│         ┌────────────────┬────────────────┬────────────────┤              │
│         │                │                │                │              │
│         ▼                ▼                ▼                ▼              │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐   ┌───────────┐        │
│  │ Prometheus  │  │ Alertmanager│  │    KEDA     │   │  Grafana  │        │
│  │ (scraping)  │  │  (alerts)   │  │ (autoscale) │   │(dashboard)│        │
│  └─────────────┘  └─────────────┘  └─────────────┘   └───────────┘        │
│                                                                           │
└───────────────────────────────────────────────────────────────────────────┘
```
---
## Project Structure
```
helios/
├── agent/                          # Helios Agent (metrics collection)
│   ├── src/helios_agent/
│   │   ├── agent.py                # Main agent orchestrator
│   │   ├── cli.py                  # Command-line interface
│   │   ├── config.py               # Configuration loader
│   │   ├── client.py               # Helios API client
│   │   └── sources/                # Pluggable metric sources
│   │       ├── base.py             # MetricsSource interface
│   │       ├── registry.py         # Source registration
│   │       ├── system.py           # Local system metrics
│   │       ├── prometheus.py       # Prometheus backend
│   │       ├── datadog.py          # Datadog backend
│   │       ├── cloudwatch.py       # AWS CloudWatch
│   │       ├── azure_monitor.py    # Azure Monitor
│   │       └── gcp_monitoring.py   # GCP Cloud Monitoring
│   └── pyproject.toml
│
├── ml/                             # Machine Learning
│   ├── config.py                   # Configuration
│   ├── train.py                    # Training pipeline
│   ├── pipeline/                   # Data processing
│   │   ├── data_fetcher.py         # Cloud metrics fetcher
│   │   └── feature_engineering.py
│   ├── models/                     # ML models
│   │   ├── baseline.py             # Moving Average + Trend
│   │   ├── prophet_model.py        # Prophet forecasting
│   │   └── xgboost_anomaly.py      # XGBoost anomaly detection
│   └── inference/                  # Inference service (Phase 5)
│       ├── app.py                  # FastAPI application
│       └── ...
│
├── infra/                          # Infrastructure
│   ├── terraform/                  # IaC for GCP/AWS/Azure
│   │   └── gcp/
│   │       ├── main.tf
│   │       └── modules/
│   │           ├── gke/            # Kubernetes cluster
│   │           ├── cloudsql/       # PostgreSQL database
│   │           ├── redis/          # Memorystore cache
│   │           └── ...
│   └── kubernetes/                 # K8s manifests
│       ├── saleor/                 # Demo application
│       ├── locust/                 # Load testing
│       ├── monitoring/             # Prometheus + Grafana
│       └── helios-inference/       # ML inference service
│
├── loadtest/                       # Load testing
│   └── locustfiles/                # Locust scenarios
│       ├── locustfile.py           # Main test file
│       └── personas/               # User personas
│
└── docs/                           # Documentation
    └── architecture/
        └── ARCHITECTURE.md
```
---
## Quick Start
### Prerequisites
- Python 3.11+
- Kubernetes cluster (GKE, EKS, AKS, or local)
- kubectl configured
- Terraform (for infrastructure provisioning)
- Google Cloud SDK (for GCP deployment)
### 1. Clone and Setup
```bash
git clone https://github.com/pyjeebz/helios.git
cd helios
python -m venv .venv
source .venv/bin/activate # or .venv\Scripts\activate on Windows
pip install -r ml/requirements.txt
```
### 2. Configure Your Project
Use the setup script to configure Helios with your GCP project:
```bash
# Linux/Mac
./scripts/setup.sh YOUR_GCP_PROJECT_ID
# Windows PowerShell
.\scripts\setup.ps1 -ProjectId YOUR_GCP_PROJECT_ID
```
This will:
- Create a `.env` file with your project configuration
- Update Kubernetes manifests with your project ID
- Set up all required environment variables
Alternatively, configure manually:
```bash
# Copy the example env file
cp .env.example .env
# Edit .env with your values
# GCP_PROJECT_ID=your-project-id
# GCP_REGION=us-central1
```
### 3. Authenticate with Cloud Provider
```bash
# For GCP
gcloud config set project YOUR_GCP_PROJECT_ID
gcloud auth application-default login
# For AWS
export AWS_PROFILE=your-profile
# For Azure
az login
```
### 4. Train Models
```bash
cd ml
python train.py
```
### 5. Deploy Infrastructure
```bash
cd infra/terraform/gcp
terraform init
terraform apply
```
### 6. Deploy Inference Service
```bash
kubectl apply -k infra/kubernetes/helios-inference/
```
---
## Helios Agent
The Helios Agent is a unified metrics collection daemon that can pull metrics from multiple backends and forward them to the Helios platform.
### Installation
```bash
# Base installation (system metrics + Prometheus)
pip install helios-agent
# With specific backends
pip install helios-agent[datadog] # + Datadog support
pip install helios-agent[aws] # + AWS CloudWatch
pip install helios-agent[azure] # + Azure Monitor
pip install helios-agent[gcp] # + GCP Cloud Monitoring
pip install helios-agent[all] # All backends
```
### Supported Metrics Sources
| Source | Description | Requirements |
|--------|-------------|--------------|
| `system` | Local CPU, memory, disk, network via psutil | Built-in |
| `prometheus` | Query any Prometheus server | Built-in |
| `datadog` | Pull metrics from Datadog API | `pip install helios-agent[datadog]` |
| `cloudwatch` | AWS CloudWatch metrics | `pip install helios-agent[aws]` |
| `azure_monitor` | Azure Monitor metrics | `pip install helios-agent[azure]` |
| `gcp_monitoring` | Google Cloud Monitoring | `pip install helios-agent[gcp]` |
### Quick Start
```bash
# Generate a configuration file
helios-agent init
# List available metric sources
helios-agent sources
# Test configured sources
helios-agent test
# Run the agent (continuous collection)
helios-agent run
# Check agent status
helios-agent status
```
### Configuration
Create a `helios-agent.yaml` file:
```yaml
agent:
  collection_interval: 60  # seconds
  batch_size: 100
  log_level: INFO
sources:
  # Local system metrics (always recommended)
  - type: system
    enabled: true
    config:
      collect_cpu: true
      collect_memory: true
      collect_disk: true
      collect_network: true
  # Prometheus server
  - type: prometheus
    enabled: true
    config:
      url: http://prometheus:9090
      queries:
        - name: container_cpu
          query: rate(container_cpu_usage_seconds_total[5m])
        - name: container_memory
          query: container_memory_usage_bytes
  # AWS CloudWatch
  - type: cloudwatch
    enabled: false
    config:
      region: us-east-1
      namespace: AWS/EC2
      metrics:
        - CPUUtilization
        - NetworkIn
        - NetworkOut
  # Datadog
  - type: datadog
    enabled: false
    config:
      api_key: ${DATADOG_API_KEY}
      app_key: ${DATADOG_APP_KEY}
      queries:
        - "avg:system.cpu.user{*}"
helios:
  endpoint: http://localhost:8080
  api_key: ${HELIOS_API_KEY}
```
### Environment Variables
| Variable | Description | Default |
|----------|-------------|---------|
| `HELIOS_CONFIG_FILE` | Path to config file | `./helios-agent.yaml` |
| `HELIOS_ENDPOINT` | Helios API endpoint | `http://localhost:8080` |
| `HELIOS_API_KEY` | API key for authentication | - |
| `DATADOG_API_KEY` | Datadog API key | - |
| `DATADOG_APP_KEY` | Datadog application key | - |
| `AWS_REGION` | AWS region for CloudWatch | - |
### CLI Commands
```bash
# Initialize configuration
helios-agent init [--output FILE]
# Run agent with custom config
helios-agent run --config /path/to/config.yaml
# List registered source plugins
helios-agent sources
# Test all configured sources
helios-agent test
# Show agent status
helios-agent status
# Run single collection (for testing)
helios-agent run --once
```
### Creating Custom Sources
You can create custom metric sources by implementing the `MetricsSource` interface:
```python
# CollectionResult is assumed to be exported alongside MetricSample.
from helios_agent.sources import (
    CollectionResult,
    MetricSample,
    MetricsSource,
    register_source,
)


@register_source("custom")
class CustomSource(MetricsSource):
    """Custom metrics source."""

    async def initialize(self) -> None:
        # Connect to your data source
        pass

    async def collect(self) -> CollectionResult:
        # Fetch and return metrics
        samples = [
            MetricSample(
                name="custom_metric",
                value=42.0,
                labels={"env": "prod"},
            )
        ]
        return CollectionResult(samples=samples)

    async def health_check(self) -> bool:
        return True

    async def close(self) -> None:
        # Cleanup
        pass
```
---
## ML Models
### Model Performance (167 data points, 24-hour training)
| Model | MAE | MAPE | Notes |
|-------|-----|------|-------|
| **Baseline (MA+Trend)** | 2.5M | 2.6% | Simple, fast, reliable |
| **Prophet** | 25M | 21.1% | Better with more data, seasonality |
| **XGBoost Anomaly** | - | - | 0.69% anomaly rate (1 anomaly detected) |
### Feature Engineering
- **Lag features**: 1, 3, 6, 12 periods
- **Rolling statistics**: mean, std, min, max (windows: 3, 6, 12)
- **Time features**: hour, day_of_week, is_weekend, sin/cos encoding
- **Percent changes**: 1, 3 periods
---
## API Endpoints
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Service health check |
| `/models` | GET | List loaded models |
| `/predict` | POST | Forecast future metrics |
| `/detect` | POST | Anomaly detection scoring |
| `/recommend` | POST | Scaling recommendations |
| `/metrics` | GET | Prometheus metrics |
### Example: Get Predictions
```bash
curl -X POST http://helios-inference:8080/predict \
-H "Content-Type: application/json" \
-d '{
"metrics": {
"cpu_utilization": 0.45,
"memory_utilization": 0.62,
"db_connections": 15
},
"periods": 12
}'
```
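The same request can be issued from Python using only the standard library. The sketch below mirrors the payload of the `curl` example. The function names are hypothetical, and because the response schema is not documented here, `predict` simply returns the parsed JSON unchanged.

```python
import json
import urllib.request


def build_predict_request(
    base_url: str, metrics: dict[str, float], periods: int
) -> urllib.request.Request:
    """Build a POST /predict request with the same body as the curl example."""
    body = json.dumps({"metrics": metrics, "periods": periods}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(base_url: str, metrics: dict[str, float], periods: int, timeout: float = 10.0):
    """Send the request and return the decoded JSON response as-is."""
    request = build_predict_request(base_url, metrics, periods)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Building the request does not touch the network; call predict() to send it.
request = build_predict_request(
    "http://helios-inference:8080",
    {"cpu_utilization": 0.45, "memory_utilization": 0.62, "db_connections": 15},
    periods=12,
)
print(request.get_method(), request.full_url)
```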
### Example: Detect Anomalies
```bash
curl -X POST http://helios-inference:8080/detect \
-H "Content-Type: application/json" \
-d '{
"metrics": {
"cpu_utilization": 0.95,
"memory_utilization": 0.88,
"db_connections": 150
}
}'
```
---
## Configuration
### Environment Variables
| Variable | Description | Default |
|----------|-------------|---------|
| `GCP_PROJECT_ID` | GCP project (for Cloud Monitoring) | - |
| `METRICS_LOOKBACK_HOURS` | Hours of historical data | 24 |
| `AGGREGATION_INTERVAL` | Metric aggregation (minutes) | 5 |
| `ANOMALY_THRESHOLD_SIGMA` | Standard deviations for anomaly | 2.5 |
### Scaling Thresholds
```yaml
scaling:
  cpu_scale_up_threshold: 0.80
  cpu_scale_down_threshold: 0.20
  memory_warning_threshold: 0.85
  min_replicas: 1
  max_replicas: 10
```
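One way to read these thresholds is as inputs to a simple replica recommendation. The sketch below is an assumption about how such a rule could work, not the logic behind `/recommend`: it scales up proportionally when predicted CPU exceeds the upper threshold, removes one replica below the lower threshold, and clamps the result to the configured bounds. `ScalingConfig` and `recommend_replicas` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingConfig:
    """Mirrors the keys of the `scaling` block above."""

    cpu_scale_up_threshold: float = 0.80
    cpu_scale_down_threshold: float = 0.20
    memory_warning_threshold: float = 0.85
    min_replicas: int = 1
    max_replicas: int = 10


def recommend_replicas(
    current: int, predicted_cpu: float, cfg: ScalingConfig = ScalingConfig()
) -> int:
    """Return a replica count for the predicted per-replica CPU utilization."""
    if predicted_cpu > cfg.cpu_scale_up_threshold:
        # Add enough replicas to bring utilization back to the threshold.
        desired = math.ceil(current * predicted_cpu / cfg.cpu_scale_up_threshold)
    elif predicted_cpu < cfg.cpu_scale_down_threshold:
        desired = current - 1  # scale down conservatively, one step at a time
    else:
        desired = current
    return max(cfg.min_replicas, min(cfg.max_replicas, desired))


def memory_warning(predicted_memory: float, cfg: ScalingConfig = ScalingConfig()) -> bool:
    """True when predicted memory utilization crosses the warning threshold."""
    return predicted_memory > cfg.memory_warning_threshold


print(recommend_replicas(4, 0.95), recommend_replicas(3, 0.50), memory_warning(0.88))
```

Scaling down one replica at a time is a deliberately cautious choice in this sketch; a real policy might also add cooldown periods to avoid flapping.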
---
## Roadmap
### Completed
- [x] **Phase 1**: Infrastructure setup (Terraform + GKE)
- [x] **Phase 2**: Demo application (Saleor e-commerce)
- [x] **Phase 3**: Observability (Prometheus + Grafana)
- [x] **Phase 4**: ML pipeline (Baseline, Prophet, XGBoost)
### In Progress
- [ ] **Phase 5**: Inference service & auto-scaling integration
  - [ ] FastAPI inference service
  - [ ] Real-time scoring loop
  - [ ] KEDA predictive autoscaling
  - [ ] Grafana dashboards
  - [ ] Alertmanager integration
### Future
- [ ] **Phase 6**: Multi-cloud adapters (AWS, Azure)
- [ ] **Phase 7**: Deep learning models (LSTM, Transformer)
- [ ] **Phase 8**: Kubernetes operator
---
## Testing
### Run Unit Tests
```bash
cd ml
pytest tests/
```
### Run Load Tests
```bash
# Deploy Locust to cluster
kubectl apply -k infra/kubernetes/locust/base
# Port-forward to UI
kubectl port-forward -n loadtest svc/locust-master 8089:8089
# Start test via API
curl -X POST http://localhost:8089/swarm \
-d "user_count=100&spawn_rate=10&host=http://saleor-api.saleor.svc"
```
---
## License
Apache 2.0 - See [LICENSE](LICENSE) for details.
---
## Contributing
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing`)
3. Commit changes (`git commit -m 'Add amazing feature'`)
4. Push to branch (`git push origin feature/amazing`)
5. Open a Pull Request