{"id":50285556,"url":"https://github.com/benfradjselim/ruptura","last_synced_at":"2026-05-28T02:00:39.668Z","repository":{"id":345635186,"uuid":"1185831602","full_name":"benfradjselim/ruptura","owner":"benfradjselim","description":"Predictive failure detection engine for cloud-native infrastructure. Rupture Index™ detects divergence hours early — adaptive ensemble of 5 models, 8 composite signals, automated K8s remediation.","archived":false,"fork":false,"pushed_at":"2026-05-20T19:51:23.000Z","size":48264,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-21T01:11:15.399Z","etag":null,"topics":["cloud-native","go","kubernetes","mlops","observability","opentelemetry","predictive-analytics","prometheus","rupture-detection","sre"],"latest_commit_sha":null,"homepage":"https://benfradjselim.github.io/ruptura/","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/benfradjselim.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":"GOVERNANCE.md","roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":"MAINTAINERS.md","copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-19T01:37:25.000Z","updated_at":"2026-05-20T19:51:27.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/benfradjselim/ruptura","commit_stats":null,"previous_names":["benfradjselim/mlops_crew_automation","benfradjselim/ruptura"],"tags_count":47,"template":false,"template_full_name":null,"purl":"pkg:github/benfradjselim/ruptura","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/benfradjselim%2Fruptura","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/benfradjselim%2Fruptura/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/benfradjselim%2Fruptura/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/benfradjselim%2Fruptura/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/benfradjselim","download_url":"https://codeload.github.com/benfradjselim/ruptura/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/benfradjselim%2Fruptura/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33590884,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-28T02:00:06.440Z","response_time":99,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cloud-native","go","kubernetes","mlops","observability","opentelemetry","predictive-analytics","prometheus","rupture-detection","sre"],"created_at":"2026-05-28T02:00:29.961Z","updated_at":"2026-05-28T02:00:39.659Z","avatar_url":"https://github.com/benfradjselim.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Ruptura\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/logo/ruptura-icon-256.png\" alt=\"Ruptura\" width=\"120\" /\u003e\n\u003c/p\u003e\n\n**The Predictive Action Layer for Cloud-Native Infrastructure.**\n\nRuptura detects workload ruptures before they cause outages — using the Fused Rupture Index™, 10 composite KPI signals with adaptive per-workload baselines, and an action engine that responds automatically with safety gates.\n\n→ **[Technical documentation \u0026 quickstart](workdir/README.md)**\n→ **[Website \u0026 full docs](https://benfradjselim.github.io/ruptura/)**\n→ **[API Specification](docs/openapi.yaml)**\n→ **[Live dashboard](http://185.229.225.115:31469/)**\n\n---\n\n## Project Status\n\n| Version | Date | Status |\n|---------|------|--------|\n| v7.0.4 | 2026-05-15 | ✅ Released — OTLP NodePort 31470 exposed, workload simulator |\n| v7.0.3 | 2026-05-15 | ✅ Released — JSON crash fix, real PNG logo, topology overhaul, health scores per workload |\n| v7.0.2 | 2026-05-15 | ✅ Released — 10-signal bars, light/dark mode, dataflow stats, all backend APIs wired |\n| v7.0.1 | 2026-05-15 | ✅ Released — ruptura-ui pod, logo, calibrating state, Settings \u0026 Alerts pages |\n| v7.0.0 | 2026-05-15 | ✅ Released — v7 architecture: separate UI pod, SSE, k8s metadata, node health |\n| v6.8.13 | 2026-05-13 | ✅ Released — log/trace ingest counters, Live Data Flow, ruptura-ctl v1.0.0 |\n\n**Operator:**\n\n| Version | Date | Status |\n|---------|------|--------|\n| ruptura-operator v0.6.9 | 2026-05-07 | 🔄 Submitted to Red Hat OperatorHub |\n| ruptura-operator v0.6.8 | 2026-05-07 | ✅ Merged into OperatorHub community-operators |\n\n**Active branch:** `v7` · **Module:** `github.com/benfradjselim/ruptura`\n\n---\n\n## v7 Architecture\n\nv7 ships as **two separate Kubernetes pods** behind a shared Helm chart:\n\n```\n┌────────────────────────────────────────────────────────────┐\n│                      ruptura-system                         │\n│                                                             │\n│  ┌───────────────────────┐    ┌────────────────────────┐   │\n│  │    ruptura-engine     │    │      ruptura-ui         │   │\n│  │    (Go binary)        │    │  (Svelte 4 + nginx)     │   │\n│  │                       │    │                          │   │\n│  │  :8080  REST API      │◄───│  nginx proxies /api/    │   │\n│  │  :4317  OTLP ingest   │    │  injects Bearer token   │   │\n│  │                       │    │  :80   dashboard UI      │   │\n│  └───────────────────────┘    └────────────────────────┘   │\n│          NodePort 31468               NodePort 31469         │\n│          NodePort 31470 (OTLP)                               │\n└────────────────────────────────────────────────────────────┘\n```\n\n| Port | Purpose |\n|------|---------|\n| 31468 | Engine REST API (`/api/v2/*`) |\n| 31469 | Svelte dashboard |\n| 31470 | OTLP ingest (`/api/v2/write`, `/otlp/v1/metrics`, `/otlp/v1/logs`, `/otlp/v1/traces`) |\n\n---\n\n## How Ruptura Works\n\n### 1 — Telemetry ingestion (port 31470)\n\n```\nPrometheus remote-write  →  /api/v2/write          →  metric pipeline\nOTLP metrics             →  /otlp/v1/metrics        →  metric pipeline\nOTLP logs                →  /otlp/v1/logs           →  burst detector → logR\nOTLP traces              →  /otlp/v1/traces         →  topology graph + traceR\n```\n\nWorkload identity is derived from OTLP resource attributes (`k8s.deployment.name`, `k8s.namespace.name`). Metrics via `/api/v2/write` must include a `host` label set to `namespace/Kind/name`.\n\n### 2 — Signal computation (10 KPIs)\n\nEvery 15 seconds, the analyzer computes 10 composite KPI signals per workload:\n\n| Signal | Measures |\n|--------|----------|\n| **Stress** | CPU + latency burst |\n| **Fatigue** | Cumulative baseline deviation (long-term wear) |\n| **Mood** | Log error/warn sentiment ratio |\n| **Pressure** | Memory + disk saturation |\n| **Humidity** | Forecast variance — how predictable behavior is |\n| **Contagion** | Error propagation from upstream services |\n| **Resilience** | Recovery speed after spikes |\n| **Entropy** | Internal signal disorder |\n| **Velocity** | Request rate acceleration |\n| **Throughput** | Data volume processed per cycle |\n\nEach signal carries `value`, `state` (ok / warning / critical), and `trend` (rising / falling / stable).\n\n### 3 — Fused Rupture Index (FusedR)\n\n```\nFusedR = weighted_average(metricR, logR, traceR)\n```\n\n| FusedR | State | Default action |\n|--------|-------|----------------|\n| \u003c 1.5 | Stable / Elevated | None |\n| 1.5 – 3.0 | Warning | Tier-3 — human alert |\n| 3.0 – 5.0 | Critical | Tier-2 — suggested action |\n| ≥ 5.0 | Emergency | Tier-1 — automated action |\n\n### 4 — Adaptive 5-model ensemble\n\n| Model | Strength |\n|-------|----------|\n| CA-ILR (dual-scale) | O(1) update, detects acceleration |\n| ARIMA | Stationary series with trends |\n| Holt-Winters | Seasonal / periodic patterns |\n| MAD | Outlier-robust |\n| EWMA | Fast reaction to recent shifts |\n\nModels are re-weighted every 60s based on actual prediction error — no config needed.\n\n### 5 — HealthScore forecast\n\nProjects HealthScore +15 and +30 minutes forward. `critical_eta_minutes` appears on the card when the projected score is heading toward critical — shows \"⚠ Critical in ~12m\" in the UI.\n\n### 6 — Rupture fingerprinting \u0026 pattern matching\n\nAt every confirmed rupture (FusedR ≥ 3.0), an 11-dimensional KPI vector is stored. Future queries run cosine similarity — a match ≥ 0.85 surfaces as `pattern_match` with the prior resolution note. Operators can immediately apply a known fix.\n\n### 7 — Business signals\n\n| Signal | Description |\n|--------|-------------|\n| `slo_burn_velocity` | Error budget burn rate (multiples) |\n| `blast_radius` | Downstream service count |\n| `recovery_debt` | Near-miss count in 7 days |\n\n### 8 — SSE live event stream\n\n`GET /api/v2/events` — Server-Sent Events. Every rupture and recovery fires in real time. The Fleet dashboard shows a live rupture counter that updates without polling.\n\n### 9 — Action engine\n\n```\nFusedR ≥ threshold → safety gates → K8s (scale / restart / cordon) or webhook\nPOST /api/v2/actions/emergency-stop   →  halt all pending actions immediately\n```\n\n### 10 — Narrative explain\n\n```\nGET /api/v2/explain/{id}/narrative\n→ \"payment-api fatigue 0.81 + contagion wave from payment-db pushed\n   FusedR from 1.8 to 4.2 in 18 minutes. Cascade rupture, not an isolated spike.\"\n\nGET /api/v2/explain/{id}/formula\n→ raw KPI breakdown\n```\n\n---\n\n## Dashboard (v7)\n\nSvelte 4 SPA with nginx reverse proxy — light/dark mode toggle, full SSE integration.\n\n| View | What you see |\n|------|-------------|\n| **Fleet** | Workload grid. Per-card: health ring (actual KPI value), 10 signal mini-bars, calibration progress, rupture warning |\n| **Fleet → Signals** | 10 KPI cells, PatternMatch warning, BusinessSignals, explain panel |\n| **Fleet → History** | Time-series — toggle any of 12 signals; Chart.js |\n| **Fleet → Forecast** | HealthScore projection chart |\n| **Fleet → Predictions** | Per-metric ensemble predictions |\n| **Fleet → Events** | SSE live rupture/recovery log |\n| **Fleet → Logs** | Last 200 log lines for the workload |\n| **Fleet → Actions** | Approve / reject Tier-2 actions |\n| **Fleet → Kubernetes** | Pod list, replicas, resources, labels |\n| **Topology** | Service dependency graph from OTLP traces. Click node → health bar + FusedR. Click edge → call rate, error rate, P99 latency |\n| **Engine** | Runtime stats, analyzer state, ingest rates, cumulative data flow, BadgerDB storage |\n| **Alerts** | Active / resolved alert feed |\n| **Nodes** | K8s node health — CPU, memory, disk pressure |\n| **Settings** | Data sources, Ingest Stats (live totals), preferences |\n\n---\n\n## Install\n\n**Kubernetes (Helm — OCI):**\n\n```bash\nhelm install ruptura oci://ghcr.io/benfradjselim/charts/ruptura \\\n  --namespace ruptura-system \\\n  --create-namespace \\\n  --set apiKey=$(openssl rand -hex 32)\n\n# Dashboard:   http://\u003cnode-ip\u003e:31469/\n# Engine API:  http://\u003cnode-ip\u003e:31468/api/v2/health\n# OTLP ingest: http://\u003cnode-ip\u003e:31470/api/v2/write\n```\n\n**Inject synthetic workloads immediately:**\n\n```bash\npython3 scripts/simulate.py\n# Sends 5 workloads every 5s:\n#   gateway        — stable/healthy\n#   order-service  — slow-burn CPU stress (45→90% over 10 min)\n#   payment-api    — error bursts every 2 min (8→43%)\n#   cache-worker   — traffic spikes every 3 min (1200 req/s)\n#   ml-inference   — noisy/calibrating new workload\n```\n\n**Send OTLP metrics directly:**\n\n```bash\n# Remote-write format\ncurl -X POST http://\u003cnode-ip\u003e:31470/api/v2/write \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"timeseries\": [{\n      \"Labels\": [\n        {\"Name\": \"__name__\",   \"Value\": \"cpu_percent\"},\n        {\"Name\": \"host\",       \"Value\": \"default/Deployment/my-app\"},\n        {\"Name\": \"namespace\",  \"Value\": \"default\"},\n        {\"Name\": \"deployment\", \"Value\": \"my-app\"}\n      ],\n      \"Samples\": [{\"Value\": 72.5, \"Timestamp\": '$(date +%s%3N)'}]\n    }]\n  }'\n\n# OTLP JSON\ncurl -X POST http://\u003cnode-ip\u003e:31470/otlp/v1/metrics \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"resourceMetrics\":[{\"resource\":{\"attributes\":[\n    {\"key\":\"k8s.deployment.name\",\"value\":{\"stringValue\":\"my-app\"}},\n    {\"key\":\"k8s.namespace.name\",\"value\":{\"stringValue\":\"default\"}}\n  ]},\"scopeMetrics\":[{\"metrics\":[\n    {\"name\":\"process.cpu.utilization\",\"gauge\":{\"dataPoints\":[\n      {\"timeUnixNano\":\"'$(date +%s%N)'\",\"asDouble\":0.72}\n    ]}}\n  ]}]}]}'\n```\n\n---\n\n## Repository Layout\n\n```\nworkdir/                  Go source (v7.0.4)\n  cmd/ruptura/            Engine binary\n  cmd/ruptura-ctl/        CLI tool\n  internal/\n    analyzer/             10-signal KPI computation + calibration\n    api/                  REST API (44 endpoints)\n    actions/              Action engine + K8s actuator + safety gates\n    correlator/           Burst detector + topology builder\n    explain/              Narrative engine + fingerprinting\n    fusion/               FusedR compositor (metricR + logR + traceR)\n    history/              Time-series history manager\n    ingest/               OTLP + Prometheus remote-write receivers\n    pipeline/             5-model anomaly ensemble\n    predictor/            HealthScore forecast\n    storage/              BadgerDB + TTL GC\n\nui/                       Svelte 4 dashboard (v7.0.4)\n  src/routes/             Fleet, Map, Engine, Alerts, Nodes, Settings\n  src/components/         WorkloadCard, TopologyMap, NavBar, modals\n\nhelm/                     Helm chart (OCI: ghcr.io/benfradjselim/charts/ruptura)\nscripts/\n  simulate.py             Workload simulator (5 behavioral profiles)\noperator/                 Kubernetes operator (ruptura-operator v0.6.9)\n```\n\n---\n\n## Roadmap\n\n```\nv7.0.4  ✅  OTLP NodePort 31470 · workload simulator\nv7.0.3  ✅  JSON crash fix · PNG logo · topology edge click · per-workload health scores\nv7.0.2  ✅  10-signal bars · light/dark mode · dataflow stats · all backend APIs wired\nv7.0.1  ✅  ruptura-ui pod · calibration per workload · Settings + Alerts pages\nv7.0.0  ✅  Separate UI pod · SSE stream · k8s metadata · node health view\nv6.8.x  ✅  Log/trace counters · BadgerDB GC · embedded dashboard (pre-v7)\n\nv7.1.0  ⏳  SLO config UI · dashboard layout customization · multi-tenant namespaces\nv7.2.0  ⏳  Python SDK v2 · Grafana data source plugin\n```\n\n---\n\n## CNCF\n\nRuptura targets alignment with CNCF sandbox criteria: Apache 2.0 license, open governance ([GOVERNANCE.md](GOVERNANCE.md)), documented security policy ([SECURITY.md](SECURITY.md)), public roadmap.\n\n---\n\n## License\n\nApache 2.0\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbenfradjselim%2Fruptura","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbenfradjselim%2Fruptura","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbenfradjselim%2Fruptura/lists"}