{"id":44716178,"url":"https://github.com/erezrozenbaum/pf9-mngt","last_synced_at":"2026-05-31T08:02:50.772Z","repository":{"id":336414320,"uuid":"1149551805","full_name":"erezrozenbaum/pf9-mngt","owner":"erezrozenbaum","description":"Open-source MSP and enterprise operations platform for Platform9/OpenStack, providing multi-cluster / multi-region management, identity federation, inventory intelligence, snapshot automation, restore orchestration, migration planning, governance, and operational analytics in one self-hosted control layer, Open-source MSP and enterprise operations ","archived":false,"fork":false,"pushed_at":"2026-05-26T11:05:26.000Z","size":39865,"stargazers_count":4,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2026-05-26T12:19:07.063Z","etag":null,"topics":["cloud-operations","devops","docker","fastapi","msp","openstack","platform9","postgresql","react"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/erezrozenbaum.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"docs/SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":"erezrozenbaum","buy_me_a_coffee":"erezrozenbaum"}},"created_at":"2026-02-04T08:43:26.000Z","updated_at":"2026-05-26T11:05:24.000Z","dependencies_parsed_at":"2026-04-16T08:02:14.374Z","dependency_job_id":null,"html_url":"https://github.com/erezrozenbaum/pf9-mngt","commit_stats":null,"prev
ious_names":["erezrozenbaum/pf9-mngt"],"tags_count":414,"template":false,"template_full_name":null,"purl":"pkg:github/erezrozenbaum/pf9-mngt","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/erezrozenbaum%2Fpf9-mngt","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/erezrozenbaum%2Fpf9-mngt/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/erezrozenbaum%2Fpf9-mngt/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/erezrozenbaum%2Fpf9-mngt/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/erezrozenbaum","download_url":"https://codeload.github.com/erezrozenbaum/pf9-mngt/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/erezrozenbaum%2Fpf9-mngt/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33723550,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-31T02:00:06.040Z","response_time":95,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cloud-operations","devops","docker","fastapi","msp","openstack","platform9","postgresql","react"],"created_at":"2026-02-15T14:01:05.739Z","updated_at":"2026-05-31T08:02:50.767Z","avatar_url":"https://github.com/erezrozenbaum.png","language":"Python","funding_links":["https://github.com/sponsors/erezr
ozenbaum","https://buymeacoffee.com/erezrozenbaum"],"categories":[],"sub_categories":[],"readme":"# PF9 Management — Day-2 Operations Control Plane\n\n\u003e **Platform9 solves provisioning. pf9-mngt solves Day-2 operations at scale.**\n\n**pf9-mngt** is a self-hosted operational control plane that **extends Platform9/OpenStack** with persistent inventory, automated recovery workflows, and governance capabilities. Built for teams responsible for what happens *after* Day-0 provisioning.\n\n*Platform9 handles infrastructure provisioning brilliantly. pf9-mngt handles what comes next — snapshot SLA enforcement, 3am VM restores under pressure, cross-tenant visibility at scale, and VMware migration planning.*\n\n**Works alongside Platform9 via its APIs — not a replacement, but an operational layer on top.**\n\n![Dashboard Overview](docs/images/dashboard-overview.png)\n\n![Architecture Overview](docs/images/Architecture.png)\n\n## ⚡ What You'll See in 60 Seconds\n• 🎯 **Multi-tenant dashboard** with live KPIs and health metrics  \n• 📊 **Snapshot compliance** across 3 demo tenants with SLA tracking  \n• 🔄 **VM restore workflow** with side-by-side validation  \n• 🗺️ **Migration planner** with RVTools import and risk assessment\n\n---\n\n## 🎯 Who This Is For\n\n- **🏢 MSPs** managing Platform9/OpenStack environments\n- **☁️ Cloud Providers** operating multi-tenant infrastructure  \n- **⚙️ DevOps Teams** requiring automated Day-2 operations\n\n*These operational challenges require purpose-built tooling beyond standard platform capabilities.*\n\n## 🚀 Quick Facts\n\n• **🏗️ 18-container microservices** — designed for production deployment  \n• **📈 670+ commits, actively evolving** — established codebase  \n• **✅ 626 passing tests** — comprehensive test coverage ([see tests/](tests/))  \n• **🔒 Kubernetes-native** — Helm charts + ArgoCD GitOps  \n• **🎮 Demo mode** — full product experience without Platform9  
\n\n[![Version](https://img.shields.io/badge/version-2.16.5-blue.svg)](CHANGELOG.md) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE) [![Docker](https://img.shields.io/badge/docker-ready-blue.svg)](https://www.docker.com/) [![Kubernetes](https://img.shields.io/badge/kubernetes-ready-green.svg)](https://kubernetes.io/)\n\n*Used to model real-world MSP Day-2 operational scenarios.*\n\n---\n\n## 🔄 The Day-2 Operations Reality\n\nProvisioning is not the hard part anymore. Running infrastructure at scale is.\n\nWhat actually breaks in real Platform9/OpenStack environments:\n- **Snapshot SLAs** across tenants — no native scheduler exists\n- **VM restore under pressure** — no native workflow; everything is manual reconstruction  \n- **Metadata ownership** — resource names, relationships, and topology live on the platform, not with you\n- **Cross-tenant visibility** at scale — the native UI is per-tenant, not operational-aggregate\n- **Customer self-service** — tenants need infrastructure status without you being a human API\n\n---\n\n## 🔄 Three Core Workflows\n\n### 1. 📸 Snapshot SLA Enforcement  \n**Policy Definition** → **Automated Execution** → **Compliance Monitoring** → **Alert Generation** → **Audit Reports**\n\n### 2. 🔄 VM Restore Under SLA Pressure  \n**Select Target VM** → **Dry-Run Validation** → **Execute Restore** → **Real-time Monitoring** → **Compliance Audit**\n\n### 3. 
🗺️ Migration Planning \u0026 Execution  \n**RVTools Import** → **Risk Assessment** → **Cohort Analysis** → **Wave Planning** → **PCD Deployment**\n\n---\n\n## 🏛️ Four Operational Pillars\n\nEverything in pf9-mngt is built around four operational concerns:\n\n| Pillar | What it covers |\n|--------|---------------|\n| 🔍 **Visibility** | Cross-tenant, multi-region inventory with drift detection, dependency graph, and historical tracking — metadata owned by you, not the platform |\n| 🔄 **Recovery** | Snapshot automation and full VM restore orchestration with two modes, dry-run validation, and SLA compliance; none of this is natively addressed in OpenStack |\n| ⚙️ **Operations** | Ticketing, 25 built-in runbooks, metering, chargeback, standardized governance workflows, and tenant self-service portal |\n| 🧠 **Intelligence** | AI Ops Copilot (plain-language queries against live infrastructure), Operational Intelligence Feed (capacity, waste, risk and anomaly engines), **Workload Right-Sizing** (idle + over-provisioned VM detection with flavor recommendations and savings estimates), SLA compliance tracking, QBR PDF generator, Account Manager Portfolio and Executive Health dashboards, revenue leakage detection |\n\n\u003e Everything else in the system — LDAP, multi-region, Kubernetes, export reports — supports one of these four pillars.\n\n---\n\n## 🎯 What This Actually Replaces\n\n| **Without pf9-mngt** | **With pf9-mngt** |\n|----------------------|-------------------|\n| Scripts that dump inventory to CSV, manually maintained | Persistent PostgreSQL inventory, 29 resource types, always current |\n| VM restore = manual reconstruction at 3am under SLA pressure | Fully automated restore — flavor, network, IPs, volumes, credentials |\n| No snapshot scheduler — custom cron per tenant, no SLA tracking | Policy-driven snapshot automation, cross-tenant, quota-aware, SLA-compliant |\n| Migration planning in spreadsheets — guesswork | End-to-end planner: RVTools → risk scoring → wave planning → PCD 
provisioning |\n| Separate ticketing tool + separate runbook wiki + separate billing exports | Built-in: tickets, 25 runbooks, metering, chargeback — one system |\n| Tenants call you for every status check — your team is the bottleneck | Tenant self-service portal: customers view their own VMs, snapshots, restores — scoped, isolated, MFA-protected |\n| Idle and over-provisioned VMs burning budget silently | Workload Right-Sizing: automated idle/over-provisioned VM detection, flavor recommendations, monthly savings estimates — surfaced for both admins and tenants |\n\n**Unified operational platform.**\n\n---\n## 🎯 What Makes It Different\n\nMost platforms solve provisioning. pf9-mngt solves **what happens after deployment** — snapshot SLAs, restore procedures, compliance reporting, capacity forecasting, and migration planning.\n\n**MSP Business Value:**\n- **SLA compliance tracking** per tier (Gold/Silver/Bronze) with automated breach detection\n- **QBR PDF generation** per customer with usage analytics and capacity planning\n- **Account Manager Portfolio dashboard** — per-tenant SLA status, vCPU usage, leakage alerts\n- **Executive Health dashboard** — fleet SLA gauge, MTTR, revenue leakage detection\n- **Revenue leakage detection** — identify underutilized resources and optimization opportunities\n- **Workload Right-Sizing** (v2.6.0) — automatically classify idle and over-provisioned VMs, recommend smaller flavors, and quantify estimated monthly savings; surfaces in both admin UI and tenant self-service portal with Snooze/Dismiss lifecycle management\n\nBuilt from real-world operational scenarios observed during Platform9 evaluation.\n\n---\n## 📊 Why This Matters\n\n| **Challenge** | **Native Platform9** | **pf9-mngt Solution** |\n|---------------|---------------------|----------------------|\n| Cross-tenant visibility | Per-tenant only | Centralized persistent inventory (29 resource types) |\n| Snapshot SLA enforcement | None built-in | Policy-driven, 
multi-tenant, audited |\n| VM restore workflow | Manual reconstruction | Full automation, two modes, dry-run validation |\n| Metadata ownership | Lives on the platform | Your PostgreSQL, always available |\n| Tenant self-service | You are the human API | MFA-protected portal, RLS-isolated, scoped to their projects |\n| VMware migration | No native tooling | End-to-end planner: RVTools → PCD provisioning |\n\n---\n\n## 🚀 Demo Mode\n\n⏱ **Setup time:** ~2–3 minutes  \n🧠 **No Platform9 required**  \n🎯 **Full product experience**  \n\n```powershell\ngit clone https://github.com/erezrozenbaum/pf9-mngt.git\ncd pf9-mngt\n.\\deployment.ps1  # Choose option 2 for Demo Mode\n```\n\n**Experience:** Dashboard + compliance tracking + VM restore + migration planning + chargeback  \n**Ready-to-use demo data** with tenants, VMs, snapshots, and SLA scenarios\n\n---\n\n## 🚀 Quick Start\n\n### 🐳 Complete Platform (Recommended)\n```powershell\ngit clone https://github.com/erezrozenbaum/pf9-mngt.git\ncd pf9-mngt\n.\\deployment.ps1  # Automated setup wizard\n\n# Access: http://localhost:5173 (UI) | http://localhost:8000 (API)\n```\n\n### ☁️ Kubernetes Production\n```bash\nhelm repo add pf9-mngt https://erezrozenbaum.github.io/pf9-mngt\nhelm install pf9-mngt pf9-mngt/pf9-mngt \\\n  --namespace pf9-mngt --create-namespace \\\n  -f k8s/helm/pf9-mngt/values.prod.yaml\n```\n\n---\n\n## 🏗️ Architecture\n\n**Production-ready microservices platform** with 18 specialized containers:\n\n| Service Type | Count | Examples | Stack |\n|--------------|-------|----------|-------|\n| **Core Services** | 6 | Frontend UI, Backend API, Database, Monitoring, Tenant Portal, Cache | React 19.2+, FastAPI, PostgreSQL 16 |\n| **Worker Services** | 9 | Snapshot, Backup, Metering, Search, Sync Workers | Python |\n| **Infrastructure** | 3 | Nginx, Redis, Queue Manager | Standard components |\n\n**What sets it apart:**\n- **Persistent inventory engine** — 29 resource types, independent of platform uptime (RVTools-equivalent for 
OpenStack)\n- **Snapshot automation engine** — quota-aware, cross-tenant, policy-driven scheduling\n- **VM restore system** — full automation of flavor, network, IPs, credentials, volumes\n- **Migration planning workbench** — from RVTools ingestion through PCD auto-provisioning\n\n**Tech Stack:** React 19.2+ / TypeScript / FastAPI / PostgreSQL 16 / Redis / Docker / Kubernetes  \n**Deployment Ready:** 593 tests, security scanning, observability, Kubernetes deployment  \n*Built to solve operational gaps identified during Platform9 evaluation.*\n\n---\n\n## 📸 Key Screens\n\n**Dashboard Overview** — Multi-tenant health metrics and live KPIs  \n![Landing Dashboard](docs/images/dashboard-overview.png)\n\n**Snapshot Compliance** — SLA tracking and automated remediation  \n![Snapshot Compliance](docs/images/snapshot-compliance-report.png)\n\n**Tenant Self-Service Portal** — Isolated MFA-protected interface  \n![Tenant Portal](docs/images/Tenant_portal.png)\n\n**Chargeback \u0026 Metering** — Multi-resource cost tracking  \n![Metering \u0026 Chargeback](docs/images/Metering_system.png)  \n\n---\n\n## 🎬 Video Walkthrough\n\n▶️ [**PF9 Management System — Full UI Walkthrough (15 min)**](https://www.youtube.com/watch?v=V0z5-HKVWts)\n\n---\n\n## 📚 Documentation\n\n| Document | Purpose |\n|----------|---------|\n| [Deployment Guide](docs/DEPLOYMENT_GUIDE.md) | Step-by-step setup instructions |\n| [Admin Guide](docs/ADMIN_GUIDE.md) | Day-to-day administration |\n| [Migration Planner Guide](docs/MIGRATION_PLANNER_GUIDE.md) | VMware → PCD migration planning, provisioning \u0026 handoff |\n| [Architecture](docs/ARCHITECTURE.md) | System design \u0026 data model |\n| [Kubernetes Guide](docs/KUBERNETES_GUIDE.md) | Helm charts \u0026 production deployment |\n| [Features Reference](docs/FEATURES_REFERENCE.md) | Complete technical deep-dive |\n\n---\n## 📋 Current Status \u0026 Maturity\n\n| Component | Status | Notes |\n|-----------|--------|---------|\n| **Demo Mode** | ✅ Fully available 
| Complete experience, no Platform9 required |\n| **Platform9 Integration** | ✅ Supported | Works via Platform9 APIs, tested against v6.0+ |\n| **Kubernetes Deployment** | ✅ Helm/ArgoCD ready | Production-ready manifests, observability included |\n| **Test Coverage** | ✅ 593 passing tests | API, integration, and UI tests ([see tests/](tests/)) |\n| **Production Usage** | ✅ Production-ready core | 593 tests, Kubernetes deployment, enterprise monitoring |\n| **Documentation** | ✅ Complete | 20+ guides covering deployment through operations |\n\n*Production-ready architecture; currently used in evaluation and laboratory environments.*\n\n---\n## 🤝 How This Complements Platform9\n\n**Platform9 excels at infrastructure provisioning.** pf9-mngt extends it with operational capabilities:\n\n| **Challenge** | **Platform9 Strength** | **pf9-mngt Extension** |\n|---------------|------------------------|------------------------|\n| Infrastructure deployment | ✅ Excellent provisioning APIs | Persistent inventory, 29 resource types, historical tracking |\n| Basic operations | ✅ Native OpenStack workflows | Automated snapshot scheduling, SLA compliance, audit trails |\n| VM management | ✅ Standard create/delete | Full restore automation, dry-run validation, side-by-side comparison |\n| Multi-tenancy | ✅ Keystone project isolation | Cross-tenant operational visibility, centralized governance |\n| Migration support | ✅ Standard OpenStack migration | End-to-end VMware migration planning: RVTools → PCD provisioning |\n| Operational workflow | ✅ Admin UI for infrastructure | Tenant self-service portal, ticketing, runbooks, chargeback |\n\n**Works alongside Platform9 via its APIs. Better together.**\n\n---\n\n## ❓ FAQ\n\n**Q: Does this replace the Platform9 UI?**  \nNo — it's a complementary operational layer. 
Platform9 handles provisioning, pf9-mngt handles Day-2 operations.\n\n**Q: Can I try without Platform9?**  \nYes — Demo Mode provides full functionality with sample data.\n\n**Q: Is this production-ready?**  \nDesigned for production deployment — 593 tests ([see tests/](tests/)), Kubernetes deployment, security scanning, observability.\n\n**Q: Minimum requirements?**  \nDocker host: 4GB RAM, 2 CPU cores, network access to Platform9 endpoints.\n\n---\n## 💰 MSP ROI Impact\n\n**For Service Providers, Every Feature Drives Revenue:**\n\n| **MSP Challenge** | **Revenue Impact** | **pf9-mngt Solution** |\n|-------------------|-------------------|----------------------|\n| **Revenue Leakage** | Lost $2-5K/month per client from untracked resources | Automated leakage detection with efficiency scoring |\n| **Manual Tenant Support** | $50-200/ticket for status checks and restores | Self-service tenant portal eliminates 80%+ of tickets |\n| **Compliance Penalties** | $10-50K per SLA breach incident | Automated SLA monitoring with proactive breach prevention |\n| **Migration Risk** | $25-100K+ in failed migration costs | End-to-end VMware migration planner with risk scoring |\n| **Billing Disputes** | Hours/month of manual reconciliation | Multi-currency chargeback system with audit trails |\n| **Executive Reporting** | Manual QBR preparation (4-8 hours/client) | Automated QBR PDF generation per customer |\n\n**Typical MSP ROI:** 300-500% within 6 months through reduced operational overhead and eliminated revenue leakage.\n\n*Estimates based on common MSP operational cost patterns; actual results depend on environment size and processes.*\n\n---\n## 🆕 Recent Highlights\n\n- **v2.12** — **KVM Node Log Viewer**: SSH-based log fetching from KVM nodes (`/var/log/pf9/`) via paramiko; `NODE_LOG_SOURCE=ssh` env var; credentials from `pf9-ssh-credentials` K8s secret; NetworkPolicy egress for port 22; PF9 hostagent log format timestamp parsing. 
(May 2026)\n- **v2.11** — **Platform Health \u0026 Prometheus Integration**: per-pod CPU/RAM sparklines, PVC utilisation bars, network receive rate via `GET /api/admin/platform/metrics`; KPI summary tiles; CLEA Automation page redesign. (May 2026)\n- **v2.10** — **Shared Internal Library**: `secret_helper`, `crypto_helper`, and `request_helpers` extracted into `shared/` package shared across `api/` and `tenant_portal/`; `secret_helper` security hardening for file permission checks. (May 2026)\n- **v2.9** — **Closed-Loop Event Automation (CLEA)**: policies map event types to runbooks with `auto` or `single_approval` modes; event bus triggers or queues executions automatically; policy CRUD API + admin UI tab. (May 2026)\n- **v2.8** — **Schema Consolidation**: all DDL moved from lazy `_ensure_tables()` calls into `db/init.sql` and versioned `db/migrate_*.sql` files; clean separation of fresh-install and upgrade paths. (May 2026)\n- **v2.7** — **Event Bus \u0026 Platform Health Endpoint**: `emit_event()` fire-and-forget writer to `operational_events`; `GET /api/admin/platform/health` exposes DB latency, Redis ping, pool stats, and worker last-run timestamps. (May 2026)\n- **v2.6** — **Workload Right-Sizing \u0026 Cost Waste Detection**: idle/over-provisioned VM classification, flavor recommendations, monthly savings estimates, Snooze/Dismiss lifecycle, tenant self-service resize requests, billing impact on all recommendation objects. (May 2026)\n- **v2.5** — **Circuit Breaker Observability**: breaker state, failure count, and time-until-reset surfaced in the region sync-status endpoint for live visibility of outbound Platform9 API health. (May 2026)\n- **v2.4** — **Notification Dead-Letter Queue**: failed email sends retried with exponential back-off (5 → 15 → 60 min); exhausted notifications marked `dead_lettered`; `GET /notifications/admin/retry-queue` for queue visibility. 
(May 2026)\n- **v2.3** — **Snapshot Chain Tracking \u0026 Health Score Configuration**: parent linkage, pre-delete guard, chain policy editor, Snapshot Chain Explorer UI; configurable health score weights; per-worker PostgreSQL least-privilege roles; SSRF guard on external integration URLs. (May 2026)\n- **v2.2** — **Copilot Agentic Execution**: \"Run it\" button triggers runbooks from the Copilot chat with per-user quotas, dry-run mode, risk-level badges, and full audit trail. (May 2026)\n- **v2.1** — **Tenant Notifications \u0026 MFA Enforcement**: 9-event-type subscriptions with email + webhook delivery; admin-configurable MFA enforcement per role. (May 2026)\n- **v1.97–v1.99** — **Platform Foundations**: PgBouncer connection pooling; tenant composite health scoring (0–100); Fernet key rotation CLI; append-only audit logs via PostgreSQL RLS; Copilot LLM key encryption at rest; GIN indexes on JSONB columns. (May 2026)\n\n📋 **Full version history → [CHANGELOG.md](CHANGELOG.md)**\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ferezrozenbaum%2Fpf9-mngt","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ferezrozenbaum%2Fpf9-mngt","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ferezrozenbaum%2Fpf9-mngt/lists"}