{"id":50454107,"url":"https://github.com/mizcausevic-dev/slo-budget-tracker","last_synced_at":"2026-06-01T01:05:39.122Z","repository":{"id":358120883,"uuid":"1239229466","full_name":"mizcausevic-dev/slo-budget-tracker","owner":"mizcausevic-dev","description":"SLO + error-budget tracker for Python services. FastAPI middleware, Prometheus exporter, multi-window burn-rate alerts. Part of the Platform Reliability Stack.","archived":false,"fork":false,"pushed_at":"2026-05-15T19:28:01.000Z","size":25,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-15T22:18:54.612Z","etag":null,"topics":["asgi","burn-rate","error-budget","fastapi","monitoring","prometheus","python","reliability","slo","sre"],"latest_commit_sha":null,"homepage":"https://kineticgain.com/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mizcausevic-dev.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-14T22:22:37.000Z","updated_at":"2026-05-15T19:28:04.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/mizcausevic-dev/slo-budget-tracker","commit_stats":null,"previous_names":["mizcausevic-dev/slo-budget-tracker"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/mizcausevic-dev/slo-budget-tracker","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mizcausevic-dev%2Fslo-budget-tr
acker","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mizcausevic-dev%2Fslo-budget-tracker/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mizcausevic-dev%2Fslo-budget-tracker/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mizcausevic-dev%2Fslo-budget-tracker/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mizcausevic-dev","download_url":"https://codeload.github.com/mizcausevic-dev/slo-budget-tracker/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mizcausevic-dev%2Fslo-budget-tracker/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33755379,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-31T02:00:06.040Z","response_time":95,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["asgi","burn-rate","error-budget","fastapi","monitoring","prometheus","python","reliability","slo","sre"],"created_at":"2026-06-01T01:05:39.064Z","updated_at":"2026-06-01T01:05:39.114Z","avatar_url":"https://github.com/mizcausevic-dev.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 
slo-budget-tracker\n\n[![CI](https://github.com/mizcausevic-dev/slo-budget-tracker/actions/workflows/ci.yml/badge.svg)](https://github.com/mizcausevic-dev/slo-budget-tracker/actions/workflows/ci.yml)\n[![Python](https://img.shields.io/badge/python-3.11%20%7C%203.12%20%7C%203.13-blue)](https://www.python.org/)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)\n\n**SLO + error-budget tracker for Python services** — drop-in FastAPI middleware, Prometheus exporter, and a small standalone library you can wire into any ASGI app or background worker.\n\nBuilt around the math in the [Google SRE Workbook](https://sre.google/workbook/alerting-on-slos/): one rolling window, multi-window burn-rate alerts (defaults to 1h + 6h at burn rate ≥ 14.4), and an explicit error-budget remaining gauge so dashboards stop lying about reliability.\n\n---\n\n## Why\n\nMost \"SLO dashboards\" you find in the wild conflate _availability_ with _uptime_ and surface neither error budget nor burn rate. You can't tell, at a glance, whether the freshly deployed service is **burning the next 30 days of error budget in the next 30 minutes**. This library makes that visible by default.\n\nTwo things matter:\n\n1. **Error budget remaining** — a `[1.0 → ≤0]` ratio on every dashboard.\n2. **Burn rate** — `(1 − actual_success_ratio) / (1 − target)`, sampled at short windows so fast-burn incidents page before the budget is spent.\n\n---\n\n## Install\n\n```bash\npip install slo-budget-tracker\n# or, with the FastAPI extras:\npip install \"slo-budget-tracker[fastapi]\"\n```\n\nPython 3.11+. 
Single runtime dep: `prometheus-client`.\n\n---\n\n## Quick start — standalone library\n\n```python\nfrom slo_budget_tracker import SLODefinition, SLOTracker\n\ntracker = SLOTracker(\n    SLODefinition(\n        name=\"availability\",\n        target=0.999,                # three nines\n        window_seconds=30 * 24 * 3600,  # 30-day rolling window\n        burn_rate_windows=(3600, 21600),  # alert on 1h and 6h\n        burn_rate_threshold=14.4,         # SRE workbook fast-burn page\n    )\n)\n\n# Hot path — O(1)\ntracker.record_success()\ntracker.record_failure()\n\nsnap = tracker.snapshot()\nprint(f\"success ratio: {snap.success_ratio:.4f}\")\nprint(f\"budget left:   {snap.error_budget_remaining:.2%}\")\nprint(f\"burn rate:     {snap.burn_rate:.2f}\")\n\nif snap.is_budget_exhausted:\n    print(\"Freeze deploys.\")\n\nfor alert in tracker.check_burn_rate():\n    print(f\"FAST BURN over {alert.window_seconds}s: {alert.burn_rate:.1f}x budget\")\n```\n\n---\n\n## FastAPI middleware\n\n`SLOMiddleware` auto-classifies every HTTP response — by default 5xx and unhandled exceptions are failures, everything else is a success. 
Override with your own classifier when 4xx (or specific routes) should burn budget.\n\n```python\nfrom fastapi import FastAPI\nfrom fastapi.responses import Response\nfrom slo_budget_tracker import (\n    PrometheusExporter,\n    SLODefinition,\n    SLOMiddleware,\n    SLORegistry,\n)\n\nregistry = SLORegistry()\nregistry.define(SLODefinition(name=\"availability\", target=0.999))\nregistry.define(SLODefinition(name=\"freshness\",    target=0.99))\n\napp = FastAPI()\napp.add_middleware(SLOMiddleware, registry=registry, slo_name=\"availability\")\n\nexporter = PrometheusExporter(registry)\n\n\n@app.get(\"/metrics\")\nasync def metrics() -\u003e Response:\n    body, content_type = exporter.render()\n    return Response(content=body, media_type=content_type)\n\n\n@app.get(\"/slo\")\nasync def slo_snapshot() -\u003e dict[str, object]:\n    return {\"slos\": [s.__dict__ for s in registry.snapshot_all()]}\n```\n\nPoint your Prometheus scrape at `/metrics` and you get:\n\n```\nslo_target{slo=\"availability\"} 0.999\nslo_success_ratio{slo=\"availability\"} 0.9991\nslo_error_budget_remaining{slo=\"availability\"} 0.42\nslo_burn_rate{slo=\"availability\",window_seconds=\"3600\"} 2.1\nslo_burn_rate{slo=\"availability\",window_seconds=\"21600\"} 0.8\nslo_breached{slo=\"availability\"} 0.0\n```\n\n---\n\n## Custom classification\n\nDefault: anything `\u003c 500` and no exception is a success. Want 4xx to burn budget? 
Pass `classify=`:\n\n```python\napp.add_middleware(\n    SLOMiddleware,\n    registry=registry,\n    slo_name=\"availability\",\n    classify=lambda status, exc: exc is None and status \u003c 400,\n)\n```\n\nThe classifier receives `(status_code, exception_or_None)` and returns `True` for success.\n\n---\n\n## API surface\n\n| Object             | Purpose                                                            |\n| ------------------ | ------------------------------------------------------------------ |\n| `SLODefinition`    | Frozen dataclass: name, target, window, burn-rate windows + threshold. Validates at construction. |\n| `SLOTracker`       | Records observations, computes snapshots and burn-rate alerts.     |\n| `SLORegistry`      | Holds many named trackers; supports `snapshot_all()` and `check_burn_rates()`. |\n| `SLOMiddleware`    | ASGI middleware that auto-records HTTP outcomes against a tracker. |\n| `PrometheusExporter` | Renders the registry as Prometheus text format on demand.        |\n| `Observation`      | `(timestamp, success)` event.                                      |\n| `SLOSnapshot`      | Point-in-time view: ratios, failures, budget remaining, burn rate. |\n| `BurnRateAlert`    | One short window has crossed the configured threshold.             |\n| `BurnRateSample`   | One short-window measurement attached to a snapshot.               |\n\n---\n\n## Burn-rate math\n\n```\nerror_budget   = (1 - target) * total_requests_in_window\nbudget_used    = failures_in_window\nremaining_pct  = (error_budget - budget_used) / error_budget\n\nburn_rate(short_window) = (1 - success_ratio(short_window)) / (1 - target)\n```\n\nA `burn_rate == 1.0` means the service is failing at exactly the rate the SLO allows. `burn_rate == 14.4` means the next 30-day budget is being eaten in ~2 days. 
The default threshold of `14.4` follows the [SRE Workbook fast-burn page](https://sre.google/workbook/alerting-on-slos/#5-multiwindow-multi-burn-rate-alerts).\n\n---\n\n## Storage backends\n\nThe default `InMemoryStore` keeps a thread-safe deque trimmed to the window. For services handling more than ~100 rps you'll want a sampling or bucketed backend — wire one in by passing `store=` to `SLOTracker`. The protocol is small:\n\n```python\nfrom typing import Protocol\n\n\nclass ObservationStore(Protocol):\n    def record(self, observation: Observation) -\u003e None: ...\n    def window(self, now: float, seconds: int) -\u003e list[Observation]: ...\n    def trim(self, before: float) -\u003e None: ...\n    def __len__(self) -\u003e int: ...\n```\n\nA Redis sorted-set backend is on the roadmap (`ZADD`/`ZREMRANGEBYSCORE`); contributions welcome.\n\n---\n\n## Tests\n\n```bash\npip install -e \".[dev]\"\nruff check src tests \u0026\u0026 ruff format --check src tests\nmypy src\npytest -v\n```\n\nThe CI matrix runs Python 3.11 / 3.12 / 3.13.\n\n---\n\n## Related work in this ecosystem\n\nThis is part of the [**Platform Reliability Stack**](https://github.com/mizcausevic-dev) — small, focused libraries that compose into a production reliability story:\n\n- **[procurement-decision-api](https://github.com/mizcausevic-dev/procurement-decision-api)** — drafts AI Procurement Decision Cards from vendor Suite documents.\n- **reliability-toolkit-rs** — async rate-limit + circuit-breaker + retry + bulkhead in Rust _(coming next)_.\n- More at [kineticgain.com](https://kineticgain.com/).\n\n---\n\n## License\n\nMIT. See [LICENSE](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmizcausevic-dev%2Fslo-budget-tracker","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmizcausevic-dev%2Fslo-budget-tracker","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmizcausevic-dev%2Fslo-budget-tracker/lists"}