https://github.com/cloudon-one/kubelaunch-essentials

A preconfigured Kubernetes environment with Terragrunt-based automation, service mesh, and observability baked in—ready to deploy in minutes.
kubernetes platform-engineering terraform terragrunt

# KubeLaunch Essentials

Production-ready Kubernetes platform on AWS EKS with integrated security controls, GitOps automation, service mesh, and observability — deployed entirely via Infrastructure as Code.

---

## Table of Contents

- [Architecture](#architecture)
- [Component Matrix](#component-matrix)
- [Repository Structure](#repository-structure)
- [Quick Start](#quick-start)
- [Configuration](#configuration)
- [Security Hardening](#security-hardening)
- [CI/CD Integration](#cicd-integration)
- [Operations](#operations)
- [Troubleshooting](#troubleshooting)
- [Documentation](#documentation)
- [Contributing](#contributing)
- [License](#license)

---

## Architecture

```mermaid
graph TB
subgraph AWS["AWS Infrastructure"]
OIDC["GitHub OIDC"] --> S3["S3 State Backend<br/>KMS + DynamoDB"]
Lambda["Secrets Rotation<br/>Lambda"] --> SM["Secrets Manager"]
Audit["Security Audit<br/>CloudWatch"]
end

subgraph Core["1. Core Platform"]
Karpenter["Karpenter"] & ExDNS["External DNS"] & CertMgr["Cert Manager"] & ExtSec["External Secrets"]
end

subgraph Mesh["2. Service Mesh"]
Istio["Istio mTLS"] & Kong["Kong Gateway"] & Jaeger["Jaeger"]
end

subgraph Sec["3. Security"]
Kyverno["Kyverno"] & Falco["Falco"] & Velero["Velero"]
end

subgraph Obs["4. Observability"]
Loki["Loki Stack"] & Kubecost["Kubecost"] & Compliance["CIS Scanner"]
end

subgraph Tools["5. Platform Tools"]
ArgoCD["ArgoCD"] & Atlantis["Atlantis"] & Vault["Vault"] & Airflow["Airflow"]
end

SM --> ExtSec
ExtSec --> ArgoCD & Vault
Kyverno -.->|Policy| Tools & Mesh
Falco -.->|Monitor| Core
Velero -.->|Backup| Tools
```

**Deployment order**: Core Platform -> Service Mesh -> Security -> Observability -> Platform Tools. Destroy in reverse.

---

## Component Matrix

| Layer | Component | Version | Purpose |
|-------|-----------|---------|---------|
| **Core Platform** | Karpenter | v1.10.0 | Node auto-provisioning |
| | External DNS | - | DNS automation |
| | Cert Manager | v1.20.0 | Certificate lifecycle |
| | External Secrets | v2.2.0 | AWS Secrets sync |
| **Service Mesh** | Istio | v1.29.1 | mTLS, traffic management |
| | Kong Gateway | v3.9.1 | API gateway |
| | Jaeger | v2.16.0 | Distributed tracing |
| **Security** | Kyverno | v3.7.1 | Admission control (4 policies) |
| | Falco | v8.0.1 | Runtime threat detection (eBPF) |
| | Velero | v12.0.0 | Backup & disaster recovery |
| **Observability** | Loki Stack | v3.7.1 | Log aggregation |
| | Kubecost | v3.0.3 | FinOps / cost monitoring |
| | Compliance Scanner | v1.2.0 | CIS 1.8 benchmark scanning |
| **Platform Tools** | ArgoCD | v3.3.6 | GitOps deployment |
| | Atlantis | - | Terraform PR automation |
| | Vault | v1.21.4 | Secrets management |
| | Airflow | v3.1.8 | Workflow orchestration |
| **AWS Infra** | State Backend | - | S3 + DynamoDB + KMS |
| | GitHub OIDC | - | Federated CI/CD auth |
| | Secrets Rotation | - | Lambda auto-rotation |
| | Security Audit | - | CloudWatch monitoring |

---

## Repository Structure

```
.
├── aws-infrastructure/                # AWS foundation (Terraform, local modules)
│   ├── state-backend/                 # S3 + DynamoDB + KMS for state
│   ├── github-oidc/                   # GitHub Actions OIDC federation
│   ├── external-secrets-iam/          # IRSA roles for External Secrets
│   ├── secrets-rotation-lambda/       # Automated secrets rotation
│   └── security-audit-automation/     # CloudWatch security monitoring
│
├── k8s-platform-tools/                # Kubernetes platform (Terragrunt, remote modules)
│   ├── core-platform/                 # Karpenter, External DNS, Cert Manager, External Secrets
│   ├── service-mesh/                  # Istio, Kong, Jaeger
│   ├── security/                      # Kyverno, Falco, Velero
│   ├── observability/                 # Loki, Kubecost, Compliance Scanner
│   ├── platform-tools/                # ArgoCD, Atlantis, Vault, Airflow
│   ├── ci-cd-templates/               # Reusable GitHub Actions workflows
│   ├── github-actions-templates/      # Language-specific test coverage actions
│   ├── common.hcl                     # Shared Terragrunt config (state, provider, versions)
│   └── platform_vars.yaml             # Single source of truth for all config
│
└── .github/workflows/                 # OIDC-based CI/CD pipeline
```

---

## Quick Start

### Prerequisites

| Tool | Version | Purpose |
|------|---------|---------|
| Terraform | >= 1.12.2 | Infrastructure provisioning |
| Terragrunt | >= 1.0.0 | Configuration orchestration |
| AWS CLI | v2 | AWS authentication |
| kubectl | >= 1.28 | Cluster access |
| Helm | v3.x | Chart management |

### Deploy

```bash
# 1. Configure platform variables
cd k8s-platform-tools
cp platform_vars.yaml.example platform_vars.yaml # Edit with your values

# 2. Bootstrap AWS infrastructure
cd ../aws-infrastructure/state-backend && terraform init && terraform apply
cd ../github-oidc && terragrunt apply
cd ../external-secrets-iam && terragrunt apply

# 3. Deploy platform layers (in order)
cd ../../k8s-platform-tools/core-platform && terragrunt run -a -- apply
cd ../service-mesh && terragrunt run -a -- apply
cd ../security && terragrunt run -a -- apply
cd ../observability && terragrunt run -a -- apply
cd ../platform-tools && terragrunt run -a -- apply

# 4. Deploy operational security
cd ../../aws-infrastructure/security-audit-automation && terragrunt apply
cd ../secrets-rotation-lambda && terragrunt apply
```

### Destroy (reverse order)

```bash
cd k8s-platform-tools
terragrunt run -a --working-dir platform-tools -- destroy
terragrunt run -a --working-dir observability -- destroy
terragrunt run -a --working-dir security -- destroy
terragrunt run -a --working-dir service-mesh -- destroy
terragrunt run -a --working-dir core-platform -- destroy
```
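
The deploy and destroy sequences above can also be driven from a single list, so the destroy order is always the exact reverse of the deploy order. This is a minimal sketch, assuming it is run from `k8s-platform-tools/`; the names `layers_in_order`, `run_layers`, and the `DRY_RUN` variable are illustrative and not part of the repository.

```bash
# Layer directories in deployment order (taken from the Quick Start above).
LAYERS="core-platform service-mesh security observability platform-tools"

# Print one layer per line: deploy order for "apply", reversed for "destroy".
layers_in_order() {
  if [ "$1" = "destroy" ]; then
    reversed=""
    for layer in $LAYERS; do reversed="$layer $reversed"; done
    printf '%s\n' $reversed
  else
    printf '%s\n' $LAYERS
  fi
}

# Run the given action across all layers, stopping at the first failure.
# With DRY_RUN=1 the commands are only printed, not executed.
run_layers() {
  for layer in $(layers_in_order "$1"); do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "terragrunt run -a --working-dir $layer -- $1"
    else
      terragrunt run -a --working-dir "$layer" -- "$1" || return 1
    fi
  done
}
```

Running `DRY_RUN=1 run_layers destroy` first prints the commands so the order can be reviewed before anything is removed.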

---

## Configuration

All platform configuration lives in **`k8s-platform-tools/platform_vars.yaml`**, organized into the following sections:

| YAML Path | Components |
|-----------|------------|
| `Platform.Tools.<component>.inputs` | Core platform, service mesh, platform tools |
| `Platform.Security.<component>.inputs` | Kyverno, Falco, Velero |
| `Platform.Observability.<component>.inputs` | Compliance Scanner |
| `common.*` | Shared values (region, VPC, EKS, tags) |

**Key convention**: Component directory name must match the YAML key exactly (resolved via `basename(get_terragrunt_dir())`).
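
To check the convention for a component before running Terragrunt, the lookup path can be derived the same way. This is a minimal sketch: `yaml_path_for` is a hypothetical helper, it assumes the component key sits between the section name and `inputs`, and the directory used in the usage note is only an assumed layout.

```bash
# Derive the platform_vars.yaml lookup path for a component directory.
# $1: YAML section (Tools, Security, or Observability)
# $2: component directory (defaults to the current directory)
yaml_path_for() {
  # Mirrors basename(get_terragrunt_dir()): only the last path segment is used.
  component="$(basename "${2:-$PWD}")"
  echo "Platform.$1.$component.inputs"
}
```

For example, `yaml_path_for Tools ./platform-tools/argocd` prints the path whose key should exist in `platform_vars.yaml`; if the directory name and the key differ, the component's inputs are likely not being picked up.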

### Environment Selection

```bash
ENV=dev terragrunt apply # default
ENV=prod terragrunt apply # production
```
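
A small wrapper can make the default explicit and reject unexpected values before Terragrunt runs. This is a minimal sketch, assuming the valid environments are the `dev`, `qa`, and `prod` values offered by the CI workflow; `resolve_env` is a hypothetical helper, not something the repository provides.

```bash
# Print the effective environment, defaulting to dev when ENV is unset or empty.
# Returns non-zero for any value outside the assumed dev/qa/prod set.
resolve_env() {
  case "${ENV:-dev}" in
    dev|qa|prod) echo "${ENV:-dev}" ;;
    *) echo "unsupported ENV: $ENV" >&2; return 1 ;;
  esac
}
```

A typical use would be `env_name="$(resolve_env)" && ENV="$env_name" terragrunt apply`, which skips the apply when the value is not recognized.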

### Secrets Management

All sensitive values are stored in AWS Secrets Manager and referenced as:
```yaml
admin_password: "aws-secretsmanager:///dev/argocd/admin-password"
```

Secrets are synced to Kubernetes via External Secrets Operator with IRSA.

---

## Security Hardening

### Phase 1: Foundation

| Control | Implementation |
|---------|---------------|
| State encryption | S3 KMS + DynamoDB KMS with key rotation |
| State locking | DynamoDB lock table protected by a `prevent_destroy` lifecycle rule |
| CI/CD auth | GitHub OIDC federation (no long-lived keys) |
| Secrets access | IRSA least-privilege per component |

### Phase 2: Runtime

| Control | Implementation |
|---------|---------------|
| Admission control | Kyverno: approved registries, no `latest` tag, resource limits, security contexts |
| Threat detection | Falco eBPF: privileged containers, sensitive file access, C2 connections |
| Backup & DR | Velero: daily full, hourly critical, weekly maintenance with S3+KMS |
| Network policies | Default-deny with explicit allow for DNS, k8s API |

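The network-policy row can be illustrated with a standard Kubernetes `NetworkPolicy`. This is a minimal sketch, not the repository's actual manifest: the helper name `default_deny_manifest` and the policy name are hypothetical, and the allowance for the Kubernetes API is left out because it depends on the cluster's API server address.

```bash
# Print a default-deny NetworkPolicy for a namespace that still allows DNS egress.
# $1: target namespace
default_deny_manifest() {
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-allow-dns
  namespace: $1
spec:
  podSelector: {}          # empty selector: applies to every pod in the namespace
  policyTypes:
    - Ingress              # no ingress rules listed, so all ingress is denied
    - Egress
  egress:
    - ports:               # the only egress allowed is DNS on port 53
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
EOF
}
```

The output can be reviewed and then applied with `default_deny_manifest my-namespace | kubectl apply -f -`.
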
### Phase 3: Operational

| Control | Implementation |
|---------|---------------|
| Secrets rotation | Lambda-based monthly rotation with SNS notifications |
| Compliance scanning | Weekly CIS 1.8 benchmarks with S3 reports |
| Security monitoring | CloudWatch alarms: failed auth (>10/5min), privilege escalation, policy violations |
| Security dashboard | Centralized CloudWatch dashboard |

---

## CI/CD Integration

### Main Workflow (`.github/workflows/terragrunt-plan-apply-oidc.yaml`)

OIDC-authenticated pipeline with manual dispatch:

```
Workflow Dispatch → OIDC Auth → Init → Validate → Plan → [Approval] → Apply
```

| Feature | Detail |
|---------|--------|
| Authentication | AWS OIDC (no static credentials) |
| Environments | dev, qa, prod (selectable) |
| Approval | Required before apply via GitHub Issues |
| Artifacts | Plan output stored 30 days |
| Tools | Terraform 1.12.2, Terragrunt 1.0.0 |

### Reusable Templates (`k8s-platform-tools/ci-cd-templates/`)

| Template | Purpose |
|----------|---------|
| `terragrunt-plan-apply.yaml` | Full pipeline: TFSEC, Checkov, Infracost, drift detection |
| `reusable-docker-build.yaml` | Multi-platform Docker builds with Trivy scanning |
| `terragrunt-fmt-commit.yaml` | Auto-format with TFLint, PR creation |
| `get-env-func.yaml` | Branch-to-environment mapping |

---

## Operations

### Monitoring

```bash
# Security dashboard
cd aws-infrastructure/security-audit-automation && terragrunt output dashboard_url

# Compliance reports
aws s3 ls s3://--compliance-reports/

# Falco alerts
kubectl logs -n falco -l app.kubernetes.io/name=falco | grep CRITICAL

# Kyverno policy reports
kubectl get clusterpolicyreport -o yaml

# Velero backup status
velero backup get
```

### State Management

```bash
terragrunt state list # List resources
terragrunt state pull > backup.tfstate # Backup state
terragrunt force-unlock <LOCK_ID> # Unlock stuck state (lock ID is shown in the lock error)
```
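
Before forcing an unlock it can help to keep a dated copy of the state. This is a minimal sketch; `state_backup_name` and `backup_state` are hypothetical helpers, and the file naming scheme is an assumption rather than a repository convention.

```bash
# Build a timestamped backup filename from a directory name,
# in the form <directory>-<UTC timestamp>.tfstate
# $1: component directory (defaults to the current directory)
state_backup_name() {
  echo "$(basename "${1:-$PWD}")-$(date -u +%Y%m%dT%H%M%SZ).tfstate"
}

# Pull the current state into a timestamped file in the current directory.
backup_state() {
  file="$(state_backup_name)"
  terragrunt state pull > "$file" && echo "state saved to $file"
}
```

Running `backup_state` from a component directory writes the file next to the Terragrunt configuration; state files can contain sensitive values, so the copy should not be committed.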

---

## Troubleshooting

| Problem | Solution |
|---------|----------|
| State locked | `terragrunt force-unlock <LOCK_ID>` |
| Config not applied | Verify directory name matches `platform_vars.yaml` key |
| Module fetch fails | Check git access to `github.com/cloudon-one/k8s-platform-modules`, verify `ref=dev` exists |
| OIDC auth fails | `aws iam list-open-id-connect-providers` and check role trust policy |
| Policy blocks deploy | Set the Kyverno policy to Audit: `kubectl patch clusterpolicy <policy-name> --type merge -p '{"spec":{"validationFailureAction":"Audit"}}'` |
| Secrets not syncing | `kubectl describe externalsecret <name> -n <namespace>` |
| Dependency errors | Verify parent layer is deployed; check deployment order |

---

## Documentation

| Document | Description |
|----------|-------------|
| [Security Review](./SECURITY_REVIEW.md) | Initial security audit findings |
| [Security Implementation Plan](./SECURITY_IMPLEMENTATION_PLAN.md) | Complete security roadmap |
| [Phase 1 Deployment](./PHASE1_FOUNDATION_SECURITY_DEPLOYMENT.md) | Foundation security guide |
| [Phase 2 Deployment](./PHASE2_SECURITY_DEPLOYMENT.md) | Runtime security guide |
| [Phase 3 Deployment](./PHASE3_OPERATIONAL_SECURITY_DEPLOYMENT.md) | Operational security guide |
| [IaC Summary](./INFRASTRUCTURE_AS_CODE_SUMMARY.md) | Infrastructure as Code overview |

---

## Contributing

1. Fork the repository
2. Create feature branch (`git checkout -b feature/my-feature`)
3. Follow existing Terragrunt/Terraform patterns
4. Update `platform_vars.yaml` for configuration changes
5. Open a Pull Request

---

## License

MIT License - see [LICENSE](LICENSE) for details.

---

Built for production Kubernetes deployments