{"id":48510341,"url":"https://github.com/jonathan-vella/azure-postgresql-ha-aks-workshop","last_synced_at":"2026-04-07T17:32:33.808Z","repository":{"id":321673641,"uuid":"1085259629","full_name":"jonathan-vella/azure-postgresql-ha-aks-workshop","owner":"jonathan-vella","description":"A complete automation framework for deploying a highly available PostgreSQL database on Azure Kubernetes Service with Premium v2 storage, CloudNativePG operator, and PgBouncer connection pooling.","archived":false,"fork":false,"pushed_at":"2025-10-30T22:02:08.000Z","size":666,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-10-30T23:32:14.911Z","etag":null,"topics":["aks","azure","high-availability","kubernetes","microsoft","postgresql"],"latest_commit_sha":null,"homepage":"","language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jonathan-vella.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-10-28T19:49:11.000Z","updated_at":"2025-10-30T22:02:13.000Z","dependencies_parsed_at":"2025-10-30T23:32:16.485Z","dependency_job_id":"8c8a70a5-1f75-4eae-a8fb-7880633c311c","html_url":"https://github.com/jonathan-vella/azure-postgresql-ha-aks-workshop","commit_stats":null,"previous_names":["jonathan-vella/azure-postgresql-ha-aks-workshop"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/jonathan-vella/azure-postgresql-h
a-aks-workshop","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jonathan-vella%2Fazure-postgresql-ha-aks-workshop","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jonathan-vella%2Fazure-postgresql-ha-aks-workshop/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jonathan-vella%2Fazure-postgresql-ha-aks-workshop/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jonathan-vella%2Fazure-postgresql-ha-aks-workshop/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jonathan-vella","download_url":"https://codeload.github.com/jonathan-vella/azure-postgresql-ha-aks-workshop/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jonathan-vella%2Fazure-postgresql-ha-aks-workshop/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31522321,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-07T16:28:08.000Z","status":"ssl_error","status_checked_at":"2026-04-07T16:28:06.951Z","response_time":105,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aks","azure","high-availability","kubernetes","microsoft","postgresql"],"created_at":"2026-04-07T17:32:33.621Z","updated_at":"2026-04-07T17:32:33.744Z","avatar_url":"https://github.com/jonathan-vella.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🚀 Azure PostgreSQL HA on AKS Workshop\r\n\r\n**Version**: `v1.0.0` | **License**: MIT | **Status**: Lab \u0026 PoC Ready\r\n\r\nA complete automation framework for deploying a **highly available PostgreSQL database** on Azure Kubernetes Service with Premium v2 storage, CloudNativePG operator, and PgBouncer connection pooling.\r\n\r\n\u003e **⚠️ IMPORTANT: Lab and Proof-of-Concept Use Only**  \r\n\u003e This code is provided strictly for **lab environments and proof-of-concept purposes only**. It is not intended for production use. 
Additional hardening, security reviews, compliance validation, and operational procedures are required before considering any production deployment.\r\n\r\n[![Version](https://img.shields.io/badge/Version-v1.0.0-blue)](#) [![Status](https://img.shields.io/badge/Status-Lab%2FPoC-yellow)](#) [![License](https://img.shields.io/badge/License-MIT-green)](#) [![PostgreSQL](https://img.shields.io/badge/PostgreSQL-18.0-336791?logo=postgresql)](#) [![AKS](https://img.shields.io/badge/AKS-1.32-0078D4?logo=kubernetes)](#) [![CNPG](https://img.shields.io/badge/CloudNativePG-1.27.1-326CE5?logo=kubernetes)](#) [![Azure](https://img.shields.io/badge/Azure-CLI-0078D4?logo=microsoft-azure)](#) [![HA](https://img.shields.io/badge/HA-RPO:0_RTO:\u003c10s-success)](#) [![Performance](https://img.shields.io/badge/TPS-8K--10K-orange)](#)\r\n\r\n---\r\n\r\n## 🏗️ Architecture Overview\r\n\r\n![PostgreSQL HA on AKS with PgBouncer](images/aks-cnpg-pgbouncer-architecture-rw.png)\r\n\r\n### Architecture Diagram\r\n```mermaid\r\ngraph TB\r\n    subgraph \"Azure Subscription\"\r\n        subgraph \"Virtual Network (10.0.0.0/8)\"\r\n            subgraph \"AKS Cluster (1.32)\"\r\n                subgraph \"System Node Pool (2x D2s_v5)\"\r\n                    CNPG[\"CNPG Operator\u003cbr/\u003e(cnpg-system)\"]\r\n                    INF[\"Prometheus\u003cbr/\u003eMonitoring\"]\r\n                end\r\n                \r\n                subgraph \"Connection Pooling Layer\"\r\n                    PGB1[\"PgBouncer Pod 1\u003cbr/\u003eTransaction Mode\u003cbr/\u003e10K Connections\"]\r\n                    PGB2[\"PgBouncer Pod 2\u003cbr/\u003eTransaction Mode\u003cbr/\u003e10K Connections\"]\r\n                    PGB3[\"PgBouncer Pod 3\u003cbr/\u003eTransaction Mode\u003cbr/\u003e10K Connections\"]\r\n                end\r\n                \r\n                subgraph \"PostgreSQL Node Pool (3x E8as_v6)\"\r\n                    PG1[\"PostgreSQL Primary\u003cbr/\u003eInstance 1\u003cbr/\u003e200GB 
Data + WAL\u003cbr/\u003e40K IOPS\"]\r\n                    PG2[\"PostgreSQL Sync Replica\u003cbr/\u003eInstance 2 (Quorum)\u003cbr/\u003e200GB Data + WAL\u003cbr/\u003e40K IOPS\"]\r\n                    PG3[\"PostgreSQL Async Replica\u003cbr/\u003eInstance 3\u003cbr/\u003e200GB Data + WAL\u003cbr/\u003e40K IOPS\"]\r\n                end\r\n                \r\n                subgraph \"Kubernetes Services\"\r\n                    SVC_POOL_RW[\"Service: pg-primary-pooler-rw\u003cbr/\u003e(PgBouncer Read-Write)\u003cbr/\u003ePort 5432\"]\r\n                    SVC_POOL_RO[\"Service: pg-primary-pooler-ro\u003cbr/\u003e(PgBouncer Read-Only)\u003cbr/\u003ePort 5432\"]\r\n                    SVC_RW[\"Service: pg-primary-rw\u003cbr/\u003e(Direct Read-Write)\u003cbr/\u003ePort 5432\"]\r\n                    SVC_RO[\"Service: pg-primary-ro\u003cbr/\u003e(Direct Read-Only)\u003cbr/\u003ePort 5432\"]\r\n                end\r\n            end\r\n            \r\n            SVC_POOL_RW --\u003e PGB1 \u0026 PGB2 \u0026 PGB3\r\n            SVC_POOL_RO --\u003e PGB1 \u0026 PGB2 \u0026 PGB3\r\n            PGB1 \u0026 PGB2 \u0026 PGB3 -.-\u003e|Connection Pool| PG1\r\n            PGB1 \u0026 PGB2 \u0026 PGB3 -.-\u003e|Connection Pool| PG2 \u0026 PG3\r\n            PG1 ===|Sync Replication\u003cbr/\u003eRPO=0| PG2\r\n            PG1 ---|Async Replication| PG3\r\n            SVC_RW --\u003e PG1\r\n            SVC_RO --\u003e PG2 \u0026 PG3\r\n        end\r\n        \r\n        subgraph \"Storage \u0026 Backup\"\r\n            SA[\"Azure Storage Account\u003cbr/\u003e(ZRS)\u003cbr/\u003eBlob Backups\"]\r\n            LA[\"Log Analytics\u003cbr/\u003eWorkspace\"]\r\n        end\r\n        \r\n        subgraph \"Monitoring\"\r\n            GRAF[\"Azure Managed Grafana\u003cbr/\u003eInstance\"]\r\n            AMW[\"Azure Monitor\u003cbr/\u003eWorkspace\"]\r\n        end\r\n        \r\n        subgraph \"Network Security\"\r\n            NSG[\"Network Security Group\u003cbr/\u003e- 
Kubernetes API: 443\u003cbr/\u003e- PostgreSQL: 5432\"]\r\n            MI[\"Managed Identity\u003cbr/\u003e(Workload Identity)\"]\r\n        end\r\n        \r\n        PG1 \u0026 PG2 \u0026 PG3 --\u003e|WAL Archive + Backups| SA\r\n        CNPG \u0026 PG1 \u0026 PG2 \u0026 PG3 --\u003e|Metrics| AMW\r\n        AMW --\u003e GRAF\r\n        MI --\u003e|Auth to Storage| SA\r\n        NSG -.-\u003e|Security Rules| PG1 \u0026 PG2 \u0026 PG3\r\n    end\r\n    \r\n    style PG1 fill:#336791,stroke:#2d5a7b,color:#fff\r\n    style PG2 fill:#336791,stroke:#2d5a7b,color:#fff\r\n    style PG3 fill:#336791,stroke:#2d5a7b,color:#fff\r\n    style PGB1 fill:#47a8bd,stroke:#358a9c,color:#fff\r\n    style PGB2 fill:#47a8bd,stroke:#358a9c,color:#fff\r\n    style PGB3 fill:#47a8bd,stroke:#358a9c,color:#fff\r\n    style SA fill:#0078d4,stroke:#0062a3,color:#fff\r\n    style GRAF fill:#ff9830,stroke:#d67f1a,color:#fff\r\n    style AMW fill:#0078d4,stroke:#0062a3,color:#fff\r\n    style MI fill:#7fba00,stroke:#6d9b00,color:#fff\r\n    style NSG fill:#ff6b6b,stroke:#e63946,color:#fff\r\n```\r\n\r\n## ✨ Key Features\r\n\r\n### 🔧 Infrastructure \u0026 Deployment\r\n- **Full Automation**: Pure Azure CLI scripts following Microsoft reference implementation\r\n- **Separate Node Pools**: 2 system nodes (D2s_v5) + 3 user nodes (E8as_v6) for workload isolation\r\n- **Zone Redundancy**: Deployment across 3 Azure Availability Zones\r\n- **Premium Storage**: Premium SSD v2 with 40K IOPS \u0026 1,200 MB/s per disk (200 GiB)\r\n- **DevContainer Ready**: Pre-configured environment with all tools installed\r\n\r\n### 🛡️ High Availability \u0026 Reliability\r\n- **3-Node Cluster**: 1 primary + 1 quorum sync replica + 1 async replica\r\n- **Automatic Failover**: \u003c10 second RTO with zero data loss (RPO = 0)\r\n- **Data Durability**: Synchronous replication with remote_apply guarantee\r\n- **Connection Pooling**: 3 PgBouncer instances handling 10,000+ concurrent connections\r\n- **Health Monitoring**: 
Automated health checks with self-healing capabilities\r\n\r\n### 📊 Performance \u0026 Scalability\r\n- **Target Throughput**: Optimized for 8,000-10,000 TPS\r\n- **Dynamic Resources**: PostgreSQL parameters auto-calculate from memory allocation\r\n- **Efficient Pooling**: Transaction-mode pooling for optimal connection management\r\n- **Load Balancing**: Automatic read distribution across replicas\r\n\r\n### 🔐 Security \u0026 Compliance\r\n- **Workload Identity**: Federated credentials (no secrets in pods)\r\n- **Authentication**: SCRAM-SHA-256 password encryption\r\n- **Network Security**: NSGs, private networking, NAT Gateway\r\n- **Encryption**: At-rest and in-transit encryption\r\n- **RBAC**: Kubernetes role-based access control\r\n\r\n### 📈 Observability \u0026 Operations\r\n- **Grafana Dashboards**: Pre-built dashboard with 9 monitoring panels\r\n- **Prometheus Metrics**: Real-time cluster health and performance metrics\r\n- **Azure Monitor**: Centralized log aggregation and alerting\r\n- **CloudNativePG**: 1.27.1 operator for automated lifecycle management\r\n\r\n### 💾 Backup \u0026 Recovery\r\n- **Automated Backups**: WAL archiving + base backups to Azure Blob Storage\r\n- **7-Day Retention**: Configurable backup retention policies\r\n- **Point-in-Time Recovery**: PITR capability via WAL archives\r\n- **Geo-Redundancy**: Optional GRS for disaster recovery\r\n\r\n---\r\n\r\n## 🚀 Quick Start\r\n\r\n### Option A: Use DevContainer (Recommended, Tested, Validated) 🐳\r\n\r\nAll tools pre-installed in isolated container with auto-generated environment:\r\n\r\n```bash\r\n# Requirements: Docker Desktop + VS Code Remote - Containers extension\r\n# 1. Open project in VS Code\r\n# 2. Ctrl+Shift+P -\u003e \"Dev Containers: Reopen in Container\"\r\n# 3. Wait for build (2-5 min first time)\r\n# 4. .env file auto-generated with unique resource names\r\n# 5. 
Tools ready: az, kubectl, helm, jq, bc, psql, netcat, kubectl-cnpg (v1.27.1)\r\n```\r\n\r\n**Key Features**:\r\n- Auto-generates `.env` with unique suffix and resource names\r\n- CNPG kubectl plugin v1.27.1 pre-installed\r\n- PostgreSQL client (psql) for testing\r\n- Network tools (netcat) for connectivity testing\r\n- Calculator (bc) for pgbench metrics\r\n\r\nSee `.devcontainer/README.md` for detailed setup.\r\n\r\n### Option B: Local Installation (Not tested)\r\n\r\n**Prerequisites**:\r\n- Azure CLI (v2.56+), kubectl (v1.21+), Helm (v3.0+), jq, OpenSSL\r\n- Azure subscription with Owner or User Access Administrator role\r\n- Region with Premium v2 disk support\r\n\r\n### 1️⃣ Configure\r\n\r\n**Option A: DevContainer (Recommended)**\r\n```bash\r\n# .env is auto-generated when container starts\r\n# Contains unique suffix and all resource names\r\nsource .env\r\n\r\n# Verify configuration\r\necho \"Resource Group: $RESOURCE_GROUP_NAME\"\r\necho \"AKS Cluster: $AKS_PRIMARY_CLUSTER_NAME\"\r\n\r\n# Optional: Regenerate with new suffix\r\n./scripts/regenerate-env.sh\r\n```\r\n\r\n**Option B: Manual Setup**\r\n```bash\r\n# Clone repository\r\ngit clone \u003crepo-url\u003e\r\ncd azure-postgresql-ha-aks-workshop\r\n\r\n# Review and customize environment variables\r\ncode config/environment-variables.sh\r\n```\r\n\r\n### 2️⃣ Deploy\r\n\r\n**Using DevContainer**:\r\n```bash\r\n# Load auto-generated environment variables\r\nsource .env\r\n\r\n# Deploy all components (8 automated steps)\r\n./scripts/deploy-all.sh\r\n```\r\n\r\n**Using Manual Setup**:\r\n```bash\r\n# Load environment variables into current shell session\r\n# This makes all configuration values available to deployment scripts\r\nsource config/environment-variables.sh\r\n\r\n# Deploy all components (8 automated steps)\r\n./scripts/deploy-all.sh\r\n```\r\n\r\n\u003e **What does this do?** The `source` command loads all configuration variables (like resource names, regions, VM sizes) into your current terminal 
session. This allows the deployment scripts to access these values without hardcoding them. In DevContainer, `.env` is auto-generated with unique resource names; otherwise, use `config/environment-variables.sh`.\r\n\r\n### 3️⃣ Verify\r\n```bash\r\n# Get cluster credentials\r\naz aks get-credentials --resource-group \u003crg-name\u003e --name \u003ccluster-name\u003e\r\n\r\n# Check status\r\nkubectl cnpg status pg-primary -n cnpg-database\r\n\r\n# View pods\r\nkubectl get pods -n cnpg-database -l cnpg.io/cluster=pg-primary\r\n```\r\n\r\n### 4️⃣ Validate Deployment\r\n```bash\r\n# Run comprehensive cluster validation (in-cluster Kubernetes Job)\r\n./scripts/07a-run-cluster-validation.sh\r\n```\r\n\r\n**What gets validated:**\r\n- ✅ Primary and replica connectivity (100% pass rate)\r\n- ✅ PgBouncer connection pooling (3 instances)\r\n- ✅ Data write operations and replication consistency\r\n- ✅ Read-only service routing to replicas\r\n- ✅ Replication health and accessibility\r\n- ✅ Multi-connection concurrency testing\r\n- ⚡ Completes in ~7 seconds (in-cluster execution)\r\n\r\n### 5️⃣ Connect\r\n```bash\r\n# Option 1: Connect via PgBouncer (Recommended for Applications)\r\nkubectl port-forward svc/pg-primary-pooler-rw 5432:5432 -n cnpg-database \u0026\r\npsql -h localhost -U app -d appdb\r\n\r\n# Option 2: Direct connection to PostgreSQL\r\nkubectl port-forward svc/pg-primary-rw 5432:5432 -n cnpg-database \u0026\r\npsql -h localhost -U app -d appdb\r\n```\r\n\r\n**Why use PgBouncer?**\r\n- Handles 10,000+ concurrent connections efficiently\r\n- Reduces PostgreSQL connection overhead\r\n- Transaction-level pooling for optimal performance\r\n- Automatic load distribution across replicas\r\n\r\n---\r\n\r\n### 📋 Documentation\r\n\r\n| Document | Description |\r\n|----------|-------------|\r\n| 📖 [**SETUP_COMPLETE.md**](docs/SETUP_COMPLETE.md) | 👈 **START HERE** - Complete setup guide |\r\n| ⚡ [**QUICK_REFERENCE.md**](docs/QUICK_REFERENCE.md) | Command cheat sheet |\r\n| 💰 
[**COST_ESTIMATION.md**](docs/COST_ESTIMATION.md) | Hourly/monthly cost breakdown (~$2,873/month) |\r\n| 📊 [**GRAFANA_DASHBOARD_GUIDE.md**](docs/GRAFANA_DASHBOARD_GUIDE.md) | Dashboard usage and metrics |\r\n| 🔄 [**FAILOVER_TESTING.md**](docs/FAILOVER_TESTING.md) | High availability testing |\r\n| 🎯 [**CNPG_BEST_PRACTICES.md**](docs/CNPG_BEST_PRACTICES.md) | CloudNativePG 1.27 production best practices |\r\n\r\n### ⚙️ Configuration\r\n```\r\n.env                           - Auto-generated (DevContainer only, gitignored)\r\n    - Unique suffix for resource names\r\n    - All Azure resource names pre-configured\r\n    - Generated at devcontainer startup\r\n\r\nconfig/\r\n└── environment-variables.sh   - Bash environment configuration template\r\n    - Resource names with random suffix\r\n    - AKS settings (version, VM sizes)\r\n    - Storage configuration (IOPS, throughput)\r\n    - PostgreSQL parameters\r\n    - Auto-detect public IP for firewall\r\n```\r\n\r\n### 🚀 Deployment Scripts\r\n```\r\nscripts/\r\n├── deploy-all.sh                       - Master orchestration script (8 steps with logging)\r\n├── regenerate-env.sh                   - ⭐ Regenerate .env with new suffix (DevContainer)\r\n├── setup-prerequisites.sh              - ⭐ Install required tools (non-DevContainer)\r\n├── 02-create-infrastructure.sh         - Creates Azure resources (RG, AKS, Storage, Identity, Bastion, NAT Gateway, Container Insights)\r\n├── 03-configure-workload-identity.sh   - Sets up federated credentials\r\n├── 04-deploy-cnpg-operator.sh          - Installs CloudNativePG operator v1.27.1 via Helm\r\n├── 04a-install-barman-cloud-plugin.sh  - Installs Barman Cloud Plugin v0.8.0 for backup/restore\r\n├── 05-deploy-postgresql-cluster.sh     - Deploys PostgreSQL cluster + PgBouncer pooler + PodMonitor\r\n├── 06-configure-monitoring.sh          - Configures Azure Managed Grafana\r\n├── 06a-configure-azure-monitor-prometheus.sh - Configures Azure Monitor Managed Prometheus\r\n├── 
06b-import-grafana-dashboard.sh     - ⭐ Import Grafana dashboard automatically\r\n├── 07-display-connection-info.sh       - Displays connection endpoints and credentials\r\n└── 07a-run-cluster-validation.sh       - ⭐ In-cluster validation (100% pass rate, 7s execution)\r\n```\r\n\r\n### ⚙️ Kubernetes Reference\r\n```\r\nkubernetes/\r\n└── postgresql-cluster.yaml - Reference manifest (NOT used in deployment)\r\n    - See scripts/05-deploy-postgresql-cluster.sh for actual deployment\r\n    - Configuration values loaded from environment variables\r\n```\r\n\r\n### Repository Structure\r\n\r\n```\r\n📦 azure-postgresql-ha-aks-workshop/\r\n├── 📄 README.md                        # Main project documentation\r\n├── 📄 00_START_HERE.md                 # Quick start guide\r\n├── 📄 CONTRIBUTING.md                  # Contribution guidelines\r\n├── 📄 LICENSE                          # MIT License\r\n│\r\n├── 📂 config/                          # Configuration files\r\n│   └── environment-variables.sh        # Bash environment config\r\n│\r\n├── 📂 scripts/                         # Deployment automation\r\n│   ├── deploy-all.sh                   # Master orchestration (8 steps)\r\n│   ├── 02-create-infrastructure.sh     # Azure resources + Container Insights\r\n│   ├── 03-configure-workload-identity.sh\r\n│   ├── 04-deploy-cnpg-operator.sh\r\n│   ├── 04a-install-barman-cloud-plugin.sh\r\n│   ├── 05-deploy-postgresql-cluster.sh\r\n│   ├── 06-configure-monitoring.sh      # Grafana\r\n│   ├── 06a-configure-azure-monitor-prometheus.sh # Azure Monitor\r\n│   ├── 07-display-connection-info.sh\r\n│   └── 07a-run-cluster-validation.sh   # In-cluster validation\r\n│\r\n├── 📂 kubernetes/                      # K8s manifests\r\n│   ├── postgresql-cluster.yaml         # Reference manifest\r\n│   └── cluster-validation-job.yaml     # In-cluster validation Job\r\n│\r\n├── 📂 grafana/                         # Grafana dashboards\r\n│   └── grafana-cnpg-ha-dashboard.json  # PostgreSQL HA 
dashboard\r\n│\r\n├── 📂 docs/                            # Comprehensive documentation\r\n│   ├── README.md                       # Full technical guide\r\n│   ├── SETUP_COMPLETE.md               # 👈 Start here\r\n│   ├── QUICK_REFERENCE.md              # Command cheat sheet\r\n│   ├── COST_ESTIMATION.md              # Budget planning\r\n│   ├── PRE_DEPLOYMENT_CHECKLIST.md     # Pre-flight checks\r\n│   ├── AZURE_MONITORING_SETUP.md       # Monitoring setup\r\n│   ├── GRAFANA_DASHBOARD_GUIDE.md      # Dashboard usage\r\n│   ├── IMPORT_DASHBOARD_NOW.md         # Dashboard import\r\n│   ├── FAILOVER_TESTING.md             # HA testing\r\n│   └── VM_SETUP_GUIDE.md               # Load test VM\r\n│\r\n└── 📂 .github/\r\n    └── copilot-instructions.md         # AI assistant context\r\n```\r\n\r\n---\r\n\r\n## 🎓 How to Use This Project\r\n\r\n### Phase 1: Understanding (10 mins)\r\n1. Read `docs/SETUP_COMPLETE.md` - Overview and prerequisites\r\n2. Review `docs/QUICK_REFERENCE.md` - Command reference\r\n3. Check `docs/COST_ESTIMATION.md` - Budget planning\r\n4. Skim `docs/README.md` - Full capabilities\r\n\r\n### Phase 2: Preparation (15 mins)\r\n1. Verify prerequisites installed (az, kubectl, helm, jq)\r\n2. Update `config/environment-variables.sh`\r\n3. Change PostgreSQL password in environment variables\r\n4. Verify region support for Premium v2\r\n\r\n### Phase 3: Deployment (20 mins)\r\n1. Load environment: `source config/environment-variables.sh`\r\n2. Run `./scripts/deploy-all.sh`\r\n3. Monitor deployment progress (8 automated steps)\r\n4. Verify cluster health\r\n\r\n### Phase 4: Validation (10 mins)\r\n1. Check pods are running\r\n2. Test PostgreSQL connection\r\n3. Verify backups to storage\r\n4. Access Grafana dashboard\r\n5. Run pgbench test: `./scripts/08-test-pgbench.sh`\r\n\r\n### Phase 5: Operation (Ongoing)\r\n1. Monitor cluster metrics\r\n2. Test backup/restore\r\n3. Scale as needed\r\n4. 
Apply updates\r\n\r\n---\r\n\r\n## Connection Pooling with PgBouncer\r\n\r\n### Architecture\r\nThe deployment includes **3 PgBouncer instances** for high-availability connection pooling:\r\n\r\n| Component | Configuration |\r\n|-----------|---------------|\r\n| **Instances** | 3 pods with pod anti-affinity (different nodes) |\r\n| **Mode** | Transaction pooling (optimal for OLTP workloads) |\r\n| **Max Connections** | 10,000 client connections per instance |\r\n| **Pool Size** | 25 PostgreSQL connections per user/database |\r\n| **Total Capacity** | 30,000 concurrent client connections across all instances |\r\n\r\n### Services\r\n```bash\r\n# PgBouncer services (Recommended)\r\npg-primary-pooler-rw    # Read-write via connection pool\r\npg-primary-pooler-ro    # Read-only via connection pool\r\n\r\n# Direct PostgreSQL services\r\npg-primary-rw           # Direct read-write (no pooling)\r\npg-primary-ro           # Direct read-only (no pooling)\r\n```\r\n\r\n### When to Use PgBouncer\r\n✅ **Use PgBouncer for:**\r\n- Applications with many short-lived connections\r\n- Microservices architectures\r\n- Serverless workloads (Azure Functions, AWS Lambda)\r\n- Connection-heavy applications (10K+ connections)\r\n- High-availability workloads requiring connection efficiency\r\n\r\n⚠️ **Use direct connections for:**\r\n- Long-running analytical queries\r\n- Database administration tasks\r\n- Schema migrations\r\n- Backup/restore operations\r\n\r\n### Connection Examples\r\n```bash\r\n# Via PgBouncer (Applications)\r\npsql \"host=pg-primary-pooler-rw.cnpg-database.svc.cluster.local port=5432 dbname=appdb user=app\"\r\n\r\n# Direct (Admin tasks)\r\npsql \"host=pg-primary-rw.cnpg-database.svc.cluster.local port=5432 dbname=appdb user=app\"\r\n```\r\n\r\n---\r\n\r\n## 📊 What Gets Deployed\r\n\r\n### Azure Resources\r\n- ✅ Resource Group\r\n- ✅ Virtual Network (10.0.0.0/8)\r\n- ✅ Network Security Group\r\n- ✅ AKS Cluster (1.32)\r\n  - System node pool: 2 x Standard_D2s_v5\r\n  - 
Postgres node pool: 3 x Standard_E8as_v6\r\n- ✅ Managed Identity (Workload Identity)\r\n- ✅ Storage Account (StorageV2, ZRS)\r\n- ✅ Log Analytics Workspace\r\n- ✅ Managed Grafana Instance\r\n\r\n### Kubernetes Resources\r\n- ✅ CNPG Operator (cnpg-system namespace)\r\n- ✅ PostgreSQL Cluster (cnpg-database namespace)\r\n  - 3 PostgreSQL instances (48 GiB RAM, 6 vCPU each)\r\n  - 3 PgBouncer pooler instances (transaction mode, 10K max connections)\r\n  - 200 GiB data storage per instance\r\n  - Premium SSD v2 disks (40,000 IOPS, 1,200 MB/s per disk)\r\n  - Expected performance: 8,000-10,000 TPS sustained\r\n- ✅ StorageClass (managed-csi-premium-v2)\r\n- ✅ Services (pooler read-write, pooler read-only, direct read-write, direct read-only)\r\n- ✅ ConfigMaps \u0026 Secrets\r\n- ✅ PersistentVolumeClaims\r\n\r\n### Features Enabled\r\n- ✅ High Availability (automatic failover)\r\n- ✅ Zone Redundancy (across 3 AZs)\r\n- ✅ Workload Identity (secure auth)\r\n- ✅ Backup to Azure Storage\r\n- ✅ Point-in-Time Recovery (7 days)\r\n- ✅ WAL compression (lz4)\r\n- ✅ Monitoring (Prometheus + Grafana)\r\n- ✅ Health checks (automatic)\r\n\r\n---\r\n\r\n## 🔐 Security Features\r\n\r\n| Feature | Implementation |\r\n|---------|----------------|\r\n| **Authentication** | Workload Identity + SCRAM-SHA-256 |\r\n| **Network** | NSGs + Network Policies (Cilium) |\r\n| **Secrets** | No hardcoded secrets in pods |\r\n| **RBAC** | Kubernetes + Azure RBAC enabled |\r\n| **Encryption** | Storage encrypted at rest |\r\n| **Backups** | No public access, encrypted |\r\n| **Isolation** | Dedicated namespaces |\r\n\r\n---\r\n\r\n## 💾 Storage Options\r\n\r\n### Premium SSD v2 (Default - Optimized for High Performance)\r\n- **IOPS**: 40,000 per disk (configurable 3,000-80,000)\r\n- **Throughput**: 1,200 MB/s per disk (configurable 125-1,200 MB/s)\r\n- **Capacity**: 200 GiB per instance\r\n- **Benefits**: Excellent price-performance for high-TPS workloads (8-10K TPS)\r\n- **Regions**: swedencentral, 
westeurope, eastus, canadacentral, etc.\r\n\r\n### Premium SSD (Alternative)\r\n- **IOPS**: Fixed per disk size (lower than Premium v2)\r\n- **Throughput**: Fixed per disk size (lower than Premium v2)\r\n- **Benefits**: Widely available, proven performance\r\n- **Tradeoff**: Less cost-efficient and lower IOPS than Premium v2\r\n\r\n### Local NVMe (Ultra-High Performance - Future Migration)\r\n- **IOPS**: 400K+ per disk (Standard_L8s_v3)\r\n- **Throughput**: 2,000+ MB/s\r\n- **Benefits**: Sub-millisecond latency, 50K+ TPS capability\r\n- **Tradeoff**: Requires Azure Container Storage, higher cost\r\n- **Use Case**: Extreme transactional workloads (see Step 5 documentation)\r\n\r\n---\r\n\r\n## 🔧 Configuration Overview\r\n\r\n### Key Parameters to Adjust\r\n\r\n**In `config/environment-variables.sh`:**\r\n```bash\r\n# Azure settings\r\nPRIMARY_CLUSTER_REGION=\"swedencentral\"\r\nAKS_CLUSTER_VERSION=\"1.32\"\r\n\r\n# VM sizes (Standard_E8as_v6: 8 vCPU, 64 GiB RAM, AMD EPYC 9004 @ 3.7 GHz)\r\nSYSTEM_NODE_POOL_VMSKU=\"Standard_D2s_v5\"\r\nUSER_NODE_POOL_VMSKU=\"Standard_E8as_v6\"\r\n\r\n# Storage (Premium SSD v2 - Optimized for 10K TPS)\r\nDISK_IOPS=\"40000\"              # Provisioned IOPS (Premium SSD v2 max: 80,000)\r\nDISK_THROUGHPUT=\"1200\"         # Max Premium SSD v2 throughput (MB/s)\r\nPG_STORAGE_SIZE=\"200Gi\"        # Increased for better performance\r\n\r\n# PostgreSQL (Optimized for Standard_E8as_v6)\r\nPG_DATABASE_NAME=\"appdb\"\r\nPG_DATABASE_USER=\"app\"\r\nPG_DATABASE_PASSWORD=\"SecurePassword123!\"  # Change this!\r\nPG_MEMORY=\"48Gi\"               # 75% of 64 GiB available on E8as_v6\r\nPG_CPU=\"6\"                     # 75% of 8 vCPUs available on E8as_v6\r\n\r\n# CNPG Helm chart version (deploys operator 1.27.1)\r\nCNPG_VERSION=\"0.26.1\"\r\n```\r\n\r\n**All configuration is centralized in environment variables** - no need to edit multiple files.\r\n\r\n---\r\n\r\n## 📈 Monitoring \u0026 Observability\r\n\r\n### Azure Monitor\r\n- Application Insights integration\r\n- Container Insights 
(AKS logs)\r\n- Performance metrics\r\n\r\n### Prometheus + Grafana\r\n- PostgreSQL metrics via PodMonitor\r\n- Cluster health dashboards\r\n- Performance visualization\r\n- Alert capabilities\r\n\r\n### Key Metrics\r\n```\r\n# PostgreSQL Metrics\r\npg_up                                   # Database health\r\npg_stat_replication_lag_bytes            # Replication lag\r\npg_database_size_bytes                   # Database size\r\npg_wal_archive_status                    # Backup status\r\n\r\n# PgBouncer Metrics\r\npgbouncer_pools_cl_active               # Active client connections\r\npgbouncer_pools_sv_active               # Active server connections\r\npgbouncer_pools_maxwait                 # Connection pool wait time\r\npgbouncer_pools_cl_waiting              # Queued client connections\r\n\r\n# Infrastructure Metrics\r\nnode_memory_MemAvailable_bytes           # Node memory\r\n```\r\n\r\n---\r\n\r\n## 🚨 Critical Prerequisites\r\n\r\n### Tools\r\n- Azure CLI (v2.56+)\r\n- kubectl (v1.21+)\r\n- Helm (v3.0+)\r\n- jq (v1.5+)\r\n- OpenSSL (v3.3+)\r\n- Krew + CNPG plugin\r\n\r\n### Azure Requirements\r\n- Subscription with appropriate quota\r\n- Permissions: Owner or User Access Administrator\r\n- Region with Premium v2 support\r\n\r\n### Before Deployment\r\n- [ ] Change default passwords\r\n- [ ] Verify region support\r\n- [ ] Check subscription quota\r\n- [ ] Update managed identity references\r\n- [ ] Review cost implications\r\n\r\n---\r\n\r\n## ✅ Deployment Checklist\r\n\r\nBefore deployment:\r\n- [ ] Prerequisites installed\r\n- [ ] Configuration reviewed\r\n- [ ] Passwords changed\r\n- [ ] Region selected\r\n- [ ] Quota verified\r\n\r\nAfter deployment:\r\n- [ ] Cluster created\r\n- [ ] Pods running (3 PostgreSQL + 3 PgBouncer instances)\r\n- [ ] Storage provisioned\r\n- [ ] Backups to storage\r\n- [ ] Grafana accessible\r\n- [ ] Connection successful (both direct and pooled)\r\n\r\n---\r\n\r\n## 📞 Support \u0026 Troubleshooting\r\n\r\n### Quick 
Diagnostics\r\n```bash\r\n# Check operator\r\nkubectl logs -n cnpg-system deployment/cnpg-cloudnative-pg\r\n\r\n# Check cluster status\r\nkubectl cnpg status pg-primary -n cnpg-database\r\n\r\n# Check all pods (PostgreSQL + PgBouncer)\r\nkubectl get pods -n cnpg-database\r\n\r\n# Check PgBouncer logs (read-write pooler; pods are labelled with the Pooler name)\r\nkubectl logs -n cnpg-database -l cnpg.io/poolerName=pg-primary-pooler-rw\r\n\r\n# Check storage\r\nkubectl get pvc -n cnpg-database\r\n\r\n# Check backups\r\naz storage blob list --account-name \u003caccount\u003e --container-name backups\r\n\r\n# Test performance\r\n./scripts/08-test-pgbench.sh\r\n```\r\n\r\n### Common Issues\r\n1. **Pods stuck in Init**: Check PVC binding and storage quota\r\n2. **WAL archiving fails**: Verify managed identity permissions\r\n3. **Operator not deploying**: Check Helm repository and CRDs\r\n4. **Premium v2 unavailable**: Check region support\r\n\r\nSee `docs/README.md` for detailed troubleshooting.\r\n\r\n---\r\n\r\n## 📚 Learning Path\r\n\r\n1. **Understand the basics**\r\n   - Read: docs/SETUP_COMPLETE.md\r\n   - Review: docs/README.md\r\n\r\n2. **Explore configuration**\r\n   - Edit: config/environment-variables.sh\r\n   - Review: kubernetes/postgresql-cluster.yaml\r\n\r\n3. **Deploy to Azure**\r\n   - Run: scripts/deploy-all.sh\r\n   - Monitor: kubectl commands\r\n\r\n4. **Test operations**\r\n   - Connect to database\r\n   - Create backups\r\n   - Test failover\r\n   - Monitor metrics\r\n\r\n5. 
**Advanced topics**\r\n   - Scale cluster\r\n   - Update PostgreSQL\r\n   - Performance tuning\r\n   - Backup management\r\n\r\n---\r\n\r\n## 🎯 Success Criteria\r\n\r\nYour deployment is successful when:\r\n- ✅ 3 PostgreSQL pods running\r\n- ✅ 3 PgBouncer pooler pods running\r\n- ✅ Primary pod shows \"Primary\" status\r\n- ✅ Replica pods show \"Standby (sync)\"  \r\n- ✅ WAL archiving shows \"OK\"\r\n- ✅ Backups present in storage\r\n- ✅ Can connect via psql (both direct and pooled)\r\n- ✅ Grafana dashboard accessible\r\n- ✅ All PVCs bound and sized correctly\r\n- ✅ PgBouncer metrics showing active connections\r\n\r\n---\r\n\r\n## 🧪 Failover Testing\r\n\r\nAfter deployment, validate high availability with comprehensive failover tests:\r\n\r\n### Quick Start\r\n```bash\r\n# Navigate to failover testing\r\ncd scripts/failover-testing\r\n\r\n# Set PostgreSQL password\r\nexport PGPASSWORD=$(kubectl get secret pg-primary-app -n cnpg-database \\\r\n  -o jsonpath='{.data.password}' | base64 -d)\r\n\r\n# Run recommended scenario (PgBouncer + Simulated Failure)\r\n./scenario-2b-aks-pooler-simulated.sh\r\n```\r\n\r\n### Test Scenarios\r\n\r\n**Automated AKS Pod Scenarios** (ready to run):\r\n- `scenario-1a-aks-direct-manual.sh` - Direct PostgreSQL + Manual failover\r\n- `scenario-1b-aks-direct-simulated.sh` - Direct PostgreSQL + Simulated failure\r\n- `scenario-2a-aks-pooler-manual.sh` - PgBouncer + Manual failover ⭐\r\n- `scenario-2b-aks-pooler-simulated.sh` - PgBouncer + Simulated failure ⭐ **Recommended**\r\n\r\n**Azure VM External Client Scenarios** (requires VM setup):\r\n- See `docs/VM_SETUP_GUIDE.md` for Azure VM configuration\r\n- See `scripts/failover-testing/VM_SCENARIOS_REFERENCE.md` for external client testing\r\n\r\n### What Gets Tested\r\n- ✅ **RPO = 0** validation (zero data loss with synchronous replication)\r\n- ✅ **RTO \u003c 10s** measurement (recovery time objective)\r\n- ✅ **Connection resilience** (Direct vs PgBouncer comparison)\r\n- ✅ **Data 
consistency** (pre/post-failover transaction verification)\r\n- ✅ **Client reconnection** (automatic vs manual)\r\n- ✅ **Performance impact** (TPS and latency during failover)\r\n\r\n### Expected Results\r\n- **Target TPS**: 4,000-8,000 sustained (payment gateway workload)\r\n- **Failover Duration**: \u003c10 seconds (automatic promotion)\r\n- **Data Loss**: Zero (RPO=0 with synchronous replication)\r\n- **PgBouncer Advantage**: Transparent reconnection, \u003c1% error rate\r\n- **Direct Connection**: 5-10% error rate during failover window\r\n\r\n### Documentation\r\n- **Complete Guide**: [docs/FAILOVER_TESTING.md](docs/FAILOVER_TESTING.md)\r\n- **VM Setup**: [docs/VM_SETUP_GUIDE.md](docs/VM_SETUP_GUIDE.md)\r\n- **Quick Reference**: [scripts/failover-testing/README.md](scripts/failover-testing/README.md)\r\n\r\n---\r\n\r\n## 🔗 Important Links\r\n\r\n- **CloudNativePG**: https://cloudnative-pg.io/\r\n- **Azure AKS**: https://learn.microsoft.com/en-us/azure/aks/\r\n- **Premium v2 Disks**: https://learn.microsoft.com/en-us/azure/virtual-machines/disks-types\r\n- **Well-Architected Framework**: https://learn.microsoft.com/en-us/azure/architecture/framework/\r\n\r\n---\r\n\r\n## 📝 Version Information\r\n\r\n**Project Version**: `v1.0.0` (Semantic Versioning)  \r\n**Release Date**: October 2025  \r\n**AKS Version**: `1.32`  \r\n**Kubernetes Version**: `1.32`  \r\n**CNPG Operator**: `1.27.1`  \r\n**PostgreSQL**: `18.0`  \r\n**Status**: ✅ Lab \u0026 PoC Ready\r\n\r\n---\r\n\r\n**Ready to deploy?** Start with `docs/SETUP_COMPLETE.md` 🚀\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjonathan-vella%2Fazure-postgresql-ha-aks-workshop","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjonathan-vella%2Fazure-postgresql-ha-aks-workshop","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjonathan-vella%2Fazure-postgresql-ha-aks-workshop/lists"}