{"id":35591307,"url":"https://github.com/rsionnach/nthlayer","last_synced_at":"2026-01-18T15:04:21.214Z","repository":{"id":327617956,"uuid":"1104878764","full_name":"rsionnach/nthlayer","owner":"rsionnach","description":"Generate the complete reliability stack from a service spec in 5 minutes. Dashboards, alerts, SLOs, PagerDuty - zero toil.","archived":false,"fork":false,"pushed_at":"2026-01-12T22:03:43.000Z","size":15136,"stargazers_count":13,"open_issues_count":30,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-01-13T01:57:23.971Z","etag":null,"topics":["alerts","devops","grafana","monitoring","observability","pagerduty","prometheus","python","slo","sre"],"latest_commit_sha":null,"homepage":"https://rsionnach.github.io/nthlayer/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rsionnach.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2025-11-26T20:23:55.000Z","updated_at":"2026-01-12T22:03:03.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/rsionnach/nthlayer","commit_stats":null,"previous_names":["rsionnach/nthlayer"],"tags_count":11,"template":false,"template_full_name":null,"purl":"pkg:github/rsionnach/nthlayer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rsionnach%2Fnthlayer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rsionnach%2Fnthlayer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rsionnach%2Fnthlayer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rsionnach%2Fnthlayer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rsionnach","download_url":"https://codeload.github.com/rsionnach/nthlayer/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rsionnach%2Fnthlayer/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28399850,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-13T14:36:09.778Z","status":"ssl_error","status_checked_at":"2026-01-13T14:35:19.697Z","response_time":56,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["alerts","devops","grafana","monitoring","observability","pagerduty","prometheus","python","slo","sre"],"created_at":"2026-01-04T23:15:44.079Z","updated_at":"2026-01-16T03:47:37.870Z","avatar_url":"https://github.com/rsionnach.png","language":"Python","funding_links":[],"categories":["13. SLOs and SLIs Tools","Incident Management / Incident Response / IT Alerting / On-Call","9. Processing and Analyze and Act"],"sub_categories":["Container Orchestration","Alerts"],"readme":"\u003cdiv align=\"center\"\u003e\n  \u003ca href=\"https://github.com/rsionnach/nthlayer\"\u003e\n    \u003cimg src=\"presentations/public/nthlayer_dark_banner.png\" alt=\"NthLayer\" width=\"400\"\u003e\n  \u003c/a\u003e\n\n  \u003cbr\u003e\u003cbr\u003e\n\n  \u003cimg src=\"demo/vhs/nthlayer-apply.gif\" alt=\"nthlayer apply demo\" width=\"700\"\u003e\n\u003c/div\u003e\n\n# NthLayer\n\n### The Missing Layer of Reliability\n\n**Reliability requirements as code.**\n\n[![Status: Alpha](https://img.shields.io/badge/Status-Alpha-orange?style=for-the-badge)](https://github.com/rsionnach/nthlayer)\n[![PyPI](https://img.shields.io/pypi/v/nthlayer?style=for-the-badge\u0026logo=pypi\u0026logoColor=white)](https://pypi.org/project/nthlayer/)\n[![License: MIT](https://img.shields.io/badge/License-MIT-green?style=for-the-badge)](LICENSE.txt)\n[![Alert Rules](https://img.shields.io/badge/Alert_Rules-593+-red?style=for-the-badge\u0026logo=prometheus\u0026logoColor=white)](https://github.com/samber/awesome-prometheus-alerts)\n\nNthLayer lets you define what \"production-ready\" means for a service,\nthen generates, validates, and enforces those requirements automatically.\n\n**Define once. Generate everything. Block bad deploys.**\n\n---\n\n## The Problem\n\nFor every new service, teams are expected to:\n- Manually create dashboards\n- Hand-craft alerts and recording rules\n- Define SLOs and error budgets\n- Configure incident escalation\n- Decide if a service is \"ready\" for production\n\nThese decisions are usually made **after deployment**, enforced **inconsistently**, or revisited **only during incidents**.\n\n## The Solution\n\nNthLayer moves reliability left in the delivery lifecycle:\n\n```\n┌─────────────────────────────────────────────────────────────────────────────┐\n│ service.yaml → generate → lint → verify → check-deploy → deploy            │\n│                   ↓         ↓       ↓           ↓                          │\n│               artifacts   valid?  metrics?  budget ok?                     │\n│                                                                            │\n│ \"Is this production-ready?\" - answered BEFORE deployment                   │\n└─────────────────────────────────────────────────────────────────────────────┘\n```\n\n```bash\n# In your Tekton/GitHub Actions pipeline:\nnthlayer apply service.yaml --lint    # Generate + validate PromQL syntax\nnthlayer verify service.yaml          # Verify declared metrics exist\nnthlayer check-deploy service.yaml    # Check error budget gate\n# Only if all pass: deploy to production\n```\n\nWorks with: **Tekton**, **GitHub Actions**, **GitLab CI**, **ArgoCD**, **Mimir/Cortex**\n\n---\n\n## 🚦 Shift Left Features\n\n| Command | What It Does | Pipeline Exit Code |\n|---------|--------------|-------------------|\n| `nthlayer verify` | Validates declared metrics exist in Prometheus | 1 if missing metrics |\n| `nthlayer check-deploy` | Checks error budget - blocks if exhausted | 2 if budget exhausted |\n| `nthlayer drift` | Detects reliability degradation trends over time | 1 warn, 2 critical |\n| `nthlayer apply --lint` | Validates PromQL syntax with pint | 1 if invalid queries |\n\n### Deployment Gate Example\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"demo/vhs/check-deploy-demo.gif\" alt=\"nthlayer check-deploy demo\" width=\"700\"\u003e\n\u003c/div\u003e\n\n---\n\n## ⚡ Quick Start\n\n```bash\npipx install nthlayer\n\nnthlayer apply service.yaml\n\n# Output: generated/payment-api/\n#   ├── dashboard.json       → Grafana\n#   ├── alerts.yaml          → Prometheus\n#   ├── slos.yaml            → OpenSLO\n#   └── recording-rules.yaml → Prometheus\n```\n\n---\n\n## What NthLayer Is\n\n- A **reliability specification** that defines production-readiness\n- A **compiler** from service intent to operational reality\n- A **CI/CD-native** way to standardize reliability across teams\n\nNthLayer integrates with existing tools (Prometheus, Grafana, PagerDuty) but operates **before** them - deciding what is allowed to reach production.\n\n## What NthLayer Is Not\n\n- Not a service catalog\n- Not an observability platform\n- Not an incident management system\n- Not a runtime control plane\n\nNthLayer **complements** these systems by ensuring services meet reliability expectations before they are deployed.\n\n## Why NthLayer?\n\n| With NthLayer | Without NthLayer |\n|---------------|------------------|\n| Platform teams encode reliability standards **once** | Standards recreated per service |\n| Service teams inherit sane defaults **automatically** | Each team invents their own |\n| \"Is this production-ready?\" = **deterministic check** | \"Is this ready?\" = negotiated opinion |\n| Reliability is **enforced by default** | Reliability is **reactive and inconsistent** |\n\n---\n\n## 📥 What You Put In\n\n### 1. Service Spec (`service.yaml`)\n\n```yaml\n# Minimal example (5 lines)\nname: payment-api\ntier: critical\ntype: api\ndependencies:\n  - postgresql\n```\n\n### 2. Environment Variables (optional)\n\n```bash\n# 📟 PagerDuty - auto-create team, escalation policy, service\nexport PAGERDUTY_API_KEY=...\n\n# 📊 Grafana - auto-push dashboards\nexport NTHLAYER_GRAFANA_URL=...\nexport NTHLAYER_GRAFANA_API_KEY=...\nexport NTHLAYER_GRAFANA_ORG_ID=1              # Default: 1\n\n# 🔍 Prometheus - metric discovery for intent resolution\nexport NTHLAYER_PROMETHEUS_URL=...\nexport NTHLAYER_METRICS_USER=...              # If auth required\nexport NTHLAYER_METRICS_PASSWORD=...\n```\n\n---\n\n## 📤 What You Get Out\n\n| Output | File | Deploy To |\n|--------|------|-----------|\n| 📊 Dashboard | `generated/\u003cservice\u003e/dashboard.json` | Grafana |\n| 🚨 Alerts | `generated/\u003cservice\u003e/alerts.yaml` | Prometheus |\n| 🎯 SLOs | `generated/\u003cservice\u003e/slos.yaml` | OpenSLO-compatible |\n| ⚡ Recording Rules | `generated/\u003cservice\u003e/recording-rules.yaml` | Prometheus |\n| 📟 PagerDuty | Created via API | Team, escalation policy, service |\n\n---\n\n## 📊 SLO Portfolio\n\nTrack reliability across your entire organization:\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"demo/vhs/portfolio-demo.gif\" alt=\"nthlayer portfolio demo\" width=\"700\"\u003e\n\u003c/div\u003e\n\n```bash\nnthlayer portfolio              # Org-wide reliability view\nnthlayer portfolio --format json  # Machine-readable for dashboards\nnthlayer slo collect service.yaml  # Query current budget from Prometheus\n```\n\n---\n\n## 📝 Full Service Example\n\n```yaml\nname: payment-api\ntier: critical              # critical | standard | low\ntype: api                   # api | worker | stream\nteam: payments\n\nslos:\n  availability: 99.95       # Generates Prometheus alerts\n  latency_p99_ms: 200       # Generates histogram queries\n\ndependencies:\n  - postgresql              # Adds PostgreSQL panels\n  - redis                   # Adds Redis panels\n  - kubernetes              # Adds K8s pod metrics\n\npagerduty:\n  enabled: true\n  support_model: self       # self | shared | sre | business_hours\n```\n\n---\n\n## 💰 The Value\n\n### Generation: 20 hours → 5 minutes per service\n\n| Task | Manual Effort | With NthLayer |\n|------|---------------|---------------|\n| 🎯 Define SLOs \u0026 error budgets | 6 hours | Generated from tier |\n| 🚨 Research \u0026 configure alerts | 4 hours | 400+ battle-tested rules |\n| 📊 Build Grafana dashboards | 5 hours | 12-28 panels auto-generated |\n| 📟 PagerDuty escalation setup | 2 hours | Tier-based defaults |\n| 📋 Write recording rules | 3 hours | 20+ pre-computed metrics |\n\n### Validation: Catch issues before production\n\n| Problem | Without NthLayer | With NthLayer |\n|---------|------------------|---------------|\n| Missing metrics | Discover after deploy | `nthlayer verify` blocks promotion |\n| Invalid PromQL | Prometheus rejects rules | `--lint` catches in CI |\n| Policy violations | Manual review | `nthlayer validate-spec` enforces |\n| Exhausted budget | Deploy anyway, incident | `check-deploy` blocks risky deploys |\n\n### At Scale\n\n| Scale | Generation Saved | Incidents Prevented* |\n|-------|------------------|---------------------|\n| 🚀 50 services | 996 hours ($100K) | ~12/year |\n| 📈 200 services | 3,983 hours ($400K) | ~48/year |\n| 🏢 1,000 services | 19,917 hours ($2M) | ~240/year |\n\n\u003csub\u003e*Estimated based on 60% reduction in \"missing monitoring\" incidents. Value at $100/hr engineering cost.\u003c/sub\u003e\n\n---\n\n## 🧠 How It Works\n\n### Generation\n\n| Step | What Happens |\n|------|--------------|\n| 🎯 **Intent Resolution** | Maps \"availability SLO\" → best matching PromQL query |\n| 🔀 **Type Routing** | API services get HTTP metrics, workers get job metrics |\n| ⚡ **Tier Defaults** | Critical = 99.95% SLO + 5min escalation, Low = 99.5% + 60min |\n| 🏗️ **Technology Templates** | 23 built-in: PostgreSQL, Redis, Kafka, MongoDB, etc. |\n\n### CI/CD Pipeline\n\n```\n┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐\n│   Generate  │───▶│   Validate  │───▶│   Protect   │───▶│   Deploy    │\n│ nthlayer    │    │ --lint      │    │ check-deploy│    │ kubectl     │\n│ apply       │    │ verify      │    │             │    │ argocd      │\n└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘\n      │                  │                  │\n      ▼                  ▼                  ▼\n  artifacts         exit 1 if          exit 2 if\n  to git            invalid            budget exhausted\n```\n\nWorks with: **GitHub Actions**, **GitLab CI**, **ArgoCD**, **Tekton**, **Jenkins**\n\n---\n\n## 🛠️ CLI Commands\n\n### Generate\n\n```bash\nnthlayer init                   # Interactive service.yaml creation\nnthlayer plan service.yaml      # Preview what will be generated\nnthlayer apply service.yaml     # Generate all artifacts\nnthlayer apply --push           # Also push dashboard to Grafana\nnthlayer apply --push-ruler     # Push alerts to Mimir/Cortex Ruler API\n```\n\n### Validate\n\n```bash\nnthlayer apply --lint           # Validate PromQL syntax (pint)\nnthlayer validate-spec service.yaml  # Check against policies (OPA/Rego)\nnthlayer verify service.yaml    # Verify metrics exist in Prometheus\n```\n\n### Protect\n\n```bash\nnthlayer check-deploy service.yaml  # Check error budget gate (exit 2 = blocked)\nnthlayer drift service.yaml         # Analyze reliability drift trends\nnthlayer portfolio              # Org-wide SLO health\nnthlayer portfolio --drift      # Include drift analysis in portfolio\nnthlayer slo collect service.yaml   # Query current budget from Prometheus\n```\n\n---\n\n## 🔮 Coming Soon\n\n| Feature | Description | Status |\n|---------|-------------|--------|\n| 💰 **Error Budgets** | Track budget consumption, correlate with deploys | ✅ Done |\n| 📊 **SLO Portfolio** | Org-wide reliability view across all services | ✅ Done |\n| 🚦 **Deployment Gates** | Block deploys when error budget exhausted | ✅ Done |\n| ✅ **Contract Verification** | Verify declared metrics exist before promotion | ✅ Done |\n| 📉 **Drift Detection** | Detect reliability degradation trends, project budget exhaustion | ✅ Done |\n| 📝 **Loki Integration** | Generate LogQL alert rules, technology-specific log patterns | 🔨 Next |\n| 🤖 **AI Generation** | Conversational service.yaml creation via MCP | 📋 Planned |\n\n---\n\n## 📦 Installation\n\n```bash\n# Recommended\npipx install nthlayer\n\n# Or with pip\npip install nthlayer\n\n# Verify\nnthlayer --version\n```\n\n---\n\n## 🌐 Live Demo\n\nSee NthLayer in action with real Grafana dashboards and generated configs:\n\n[![Live Dashboards](https://img.shields.io/badge/Live-Dashboards-blue?logo=grafana\u0026style=for-the-badge)](https://nthlayer.grafana.net)\n[![Interactive Demo](https://img.shields.io/badge/Interactive-Demo-green?style=for-the-badge)](https://rsionnach.github.io/nthlayer/demo/)\n\n---\n\n## 📚 Documentation\n\n**[Full Documentation](https://rsionnach.github.io/nthlayer/)** - Comprehensive guides and reference.\n[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/rsionnach/nthlayer)\n\n| Quick Links | |\n|-------------|---|\n| 🚀 [Quick Start](https://rsionnach.github.io/nthlayer/getting-started/quick-start/) | Get running in 5 minutes |\n| 🔧 [Setup Wizard](https://rsionnach.github.io/nthlayer/commands/setup/) | Interactive configuration |\n| 📊 [SLO Portfolio](https://rsionnach.github.io/nthlayer/commands/portfolio/) | Org-wide reliability view |\n| 🔌 [18 Technologies](https://rsionnach.github.io/nthlayer/integrations/technologies/) | PostgreSQL, Redis, Kafka... |\n| 📖 [CLI Reference](https://rsionnach.github.io/nthlayer/reference/cli/) | All commands |\n| 🤝 [Contributing](CONTRIBUTING.md) | How to contribute |\n\n\u003cdetails\u003e\n\u003csummary\u003eBuild docs locally\u003c/summary\u003e\n\n```bash\nuv sync --extra docs\nuv run mkdocs serve  # Opens at http://localhost:8000\n```\n\u003c/details\u003e\n\n---\n\n## 🤝 Contributing\n\n```bash\n# Install uv (https://docs.astral.sh/uv/)\ncurl -LsSf https://astral.sh/uv/install.sh | sh\n\ngit clone https://github.com/rsionnach/nthlayer.git\ncd nthlayer\nmake setup    # Install deps, start services\nmake test     # Run tests\n```\n\nSee [CONTRIBUTING.md](CONTRIBUTING.md) for details.\n\n---\n\n## 📄 License\n\nMIT - See [LICENSE.txt](LICENSE.txt)\n\n---\n\n## 🙏 Acknowledgments\n\n### Core Dependencies\n- [grafana-foundation-sdk](https://github.com/grafana/grafana-foundation-sdk) - Dashboard generation SDK (Apache 2.0)\n- [awesome-prometheus-alerts](https://github.com/samber/awesome-prometheus-alerts) - 580+ battle-tested alert rules (CC BY 4.0)\n- [pint](https://github.com/cloudflare/pint) - PromQL linting and validation (Apache 2.0)\n- [conftest](https://github.com/open-policy-agent/conftest) / [OPA](https://github.com/open-policy-agent/opa) - Policy validation (Apache 2.0)\n- [PagerDuty Python SDK](https://github.com/PagerDuty/pdpyras) - Incident management integration (MIT)\n\n### Architecture Inspiration\n- [autograf](https://github.com/FUSAKLA/autograf) - Dynamic Prometheus metric discovery\n- [Sloth](https://github.com/slok/sloth) - SLO specification and burn rate calculations\n- [OpenSLO](https://github.com/openslo/openslo) - SLO specification standard\n\n### CLI \u0026 Documentation\n- [Rich](https://github.com/Textualize/rich) - Terminal formatting and styling (MIT)\n- [Questionary](https://github.com/tmbo/questionary) - Interactive CLI prompts (MIT)\n- [MkDocs Material](https://github.com/squidfunk/mkdocs-material) - Documentation theme (MIT)\n- [VHS](https://github.com/charmbracelet/vhs) - Terminal demo recordings (MIT)\n- [Nord Theme](https://www.nordtheme.com/) - Color palette inspiration (MIT)\n\n### Tooling\n- [Shields.io](https://shields.io/) - Badges\n- [Slidev](https://sli.dev/) - Presentation framework\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frsionnach%2Fnthlayer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frsionnach%2Fnthlayer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frsionnach%2Fnthlayer/lists"}