https://github.com/cloudon-one/kubelaunch-essentials
A preconfigured Kubernetes environment with Terragrunt-based automation, service mesh, and observability baked in—ready to deploy in minutes.
- Host: GitHub
- URL: https://github.com/cloudon-one/kubelaunch-essentials
- Owner: cloudon-one
- License: mit
- Created: 2025-01-12T09:33:50.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-01-21T20:35:29.000Z (over 1 year ago)
- Last Synced: 2025-03-20T05:43:38.431Z (over 1 year ago)
- Topics: kubernetes, platform-engineering, terraform, terragrunt
- Language: HCL
- Homepage: https://cloudon.work
- Size: 92.8 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# KubeLaunch Essentials
Production-ready Kubernetes platform on AWS EKS with integrated security controls, GitOps automation, service mesh, and observability — deployed entirely via Infrastructure as Code.
---
## Table of Contents
- [Architecture](#architecture)
- [Component Matrix](#component-matrix)
- [Repository Structure](#repository-structure)
- [Quick Start](#quick-start)
- [Configuration](#configuration)
- [Security Hardening](#security-hardening)
- [CI/CD Integration](#cicd-integration)
- [Operations](#operations)
- [Troubleshooting](#troubleshooting)
- [Documentation](#documentation)
- [Contributing](#contributing)
---
## Architecture
```mermaid
graph TB
subgraph AWS["AWS Infrastructure"]
OIDC["GitHub OIDC"] --> S3["S3 State Backend<br/>KMS + DynamoDB"]
Lambda["Secrets Rotation<br/>Lambda"] --> SM["Secrets Manager"]
Audit["Security Audit<br/>CloudWatch"]
end
subgraph Core["1. Core Platform"]
Karpenter["Karpenter"] & ExDNS["External DNS"] & CertMgr["Cert Manager"] & ExtSec["External Secrets"]
end
subgraph Mesh["2. Service Mesh"]
Istio["Istio mTLS"] & Kong["Kong Gateway"] & Jaeger["Jaeger"]
end
subgraph Sec["3. Security"]
Kyverno["Kyverno"] & Falco["Falco"] & Velero["Velero"]
end
subgraph Obs["4. Observability"]
Loki["Loki Stack"] & Kubecost["Kubecost"] & Compliance["CIS Scanner"]
end
subgraph Tools["5. Platform Tools"]
ArgoCD["ArgoCD"] & Atlantis["Atlantis"] & Vault["Vault"] & Airflow["Airflow"]
end
SM --> ExtSec
ExtSec --> ArgoCD & Vault
Kyverno -.->|Policy| Tools & Mesh
Falco -.->|Monitor| Core
Velero -.->|Backup| Tools
```
**Deployment order**: Core Platform -> Service Mesh -> Security -> Observability -> Platform Tools. Destroy in reverse.
---
## Component Matrix
| Layer | Component | Version | Purpose |
|-------|-----------|---------|---------|
| **Core Platform** | Karpenter | v1.10.0 | Node auto-provisioning |
| | External DNS | - | DNS automation |
| | Cert Manager | v1.20.0 | Certificate lifecycle |
| | External Secrets | v2.2.0 | AWS Secrets sync |
| **Service Mesh** | Istio | v1.29.1 | mTLS, traffic management |
| | Kong Gateway | v3.9.1 | API gateway |
| | Jaeger | v2.16.0 | Distributed tracing |
| **Security** | Kyverno | v3.7.1 | Admission control (4 policies) |
| | Falco | v8.0.1 | Runtime threat detection (eBPF) |
| | Velero | v12.0.0 | Backup & disaster recovery |
| **Observability** | Loki Stack | v3.7.1 | Log aggregation |
| | Kubecost | v3.0.3 | FinOps / cost monitoring |
| | Compliance Scanner | v1.2.0 | CIS 1.8 benchmark scanning |
| **Platform Tools** | ArgoCD | v3.3.6 | GitOps deployment |
| | Atlantis | - | Terraform PR automation |
| | Vault | v1.21.4 | Secrets management |
| | Airflow | v3.1.8 | Workflow orchestration |
| **AWS Infra** | State Backend | - | S3 + DynamoDB + KMS |
| | GitHub OIDC | - | Federated CI/CD auth |
| | Secrets Rotation | - | Lambda auto-rotation |
| | Security Audit | - | CloudWatch monitoring |
---
## Repository Structure
```
.
├── aws-infrastructure/ # AWS foundation (Terraform, local modules)
│ ├── state-backend/ # S3 + DynamoDB + KMS for state
│ ├── github-oidc/ # GitHub Actions OIDC federation
│ ├── external-secrets-iam/ # IRSA roles for External Secrets
│ ├── secrets-rotation-lambda/ # Automated secrets rotation
│ └── security-audit-automation/ # CloudWatch security monitoring
│
├── k8s-platform-tools/ # Kubernetes platform (Terragrunt, remote modules)
│ ├── core-platform/ # Karpenter, External DNS, Cert Manager, External Secrets
│ ├── service-mesh/ # Istio, Kong, Jaeger
│ ├── security/ # Kyverno, Falco, Velero
│ ├── observability/ # Loki, Kubecost, Compliance Scanner
│ ├── platform-tools/ # ArgoCD, Atlantis, Vault, Airflow
│ ├── ci-cd-templates/ # Reusable GitHub Actions workflows
│ ├── github-actions-templates/ # Language-specific test coverage actions
│ ├── common.hcl # Shared Terragrunt config (state, provider, versions)
│ └── platform_vars.yaml # Single source of truth for all config
│
└── .github/workflows/ # OIDC-based CI/CD pipeline
```
---
## Quick Start
### Prerequisites
| Tool | Version | Purpose |
|------|---------|---------|
| Terraform | >= 1.12.2 | Infrastructure provisioning |
| Terragrunt | >= 1.0.0 | Configuration orchestration |
| AWS CLI | v2 | AWS authentication |
| kubectl | >= 1.28 | Cluster access |
| Helm | v3.x | Chart management |
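The minimum versions above can be checked before a deploy. The following is a minimal preflight sketch, not part of the repository: `version_ge` and `require_tool` are hypothetical helper names, and the comparison assumes a `sort` that supports `-V` (GNU coreutils and recent BSD/macOS versions do).

```bash
#!/usr/bin/env bash
# Preflight sketch (hypothetical helpers, not shipped with the repository).

# version_ge A B: succeeds when version A is greater than or equal to version B.
# Relies on `sort -V` to order dotted version strings.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# require_tool NAME: fails with a message on stderr when NAME is not on PATH.
require_tool() {
  command -v "$1" >/dev/null 2>&1 || { echo "missing: $1" >&2; return 1; }
}

# Example usage (commented out so sourcing this file has no side effects):
#   for tool in terraform terragrunt aws kubectl helm; do require_tool "$tool"; done
#   version_ge "$tf_version" 1.12.2 || echo "Terraform >= 1.12.2 required" >&2
```

How `tf_version` is obtained is left open here; `terraform version -json` reports it in the `terraform_version` field.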
### Deploy
```bash
# 1. Configure platform variables
cd k8s-platform-tools
cp platform_vars.yaml.example platform_vars.yaml # Edit with your values
# 2. Bootstrap AWS infrastructure
cd ../aws-infrastructure/state-backend && terraform init && terraform apply
cd ../github-oidc && terragrunt apply
cd ../external-secrets-iam && terragrunt apply
# 3. Deploy platform layers (in order)
cd ../../k8s-platform-tools/core-platform && terragrunt run -a -- apply
cd ../service-mesh && terragrunt run -a -- apply
cd ../security && terragrunt run -a -- apply
cd ../observability && terragrunt run -a -- apply
cd ../platform-tools && terragrunt run -a -- apply
# 4. Deploy operational security
cd ../../aws-infrastructure/security-audit-automation && terragrunt apply
cd ../secrets-rotation-lambda && terragrunt apply
```
### Destroy (reverse order)
```bash
cd k8s-platform-tools
terragrunt run -a --working-dir platform-tools -- destroy
terragrunt run -a --working-dir observability -- destroy
terragrunt run -a --working-dir security -- destroy
terragrunt run -a --working-dir service-mesh -- destroy
terragrunt run -a --working-dir core-platform -- destroy
```
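The deploy and destroy sequences above differ only in direction, so they can be driven from a single ordered list. The sketch below is one possible wrapper, not a script from the repository: `layers_for` and `run_layers` are hypothetical names, and `DRY_RUN` is an assumed convention for printing commands instead of running them.

```bash
#!/usr/bin/env bash
# Layer directories under k8s-platform-tools/, listed in deployment order.
LAYERS="core-platform service-mesh security observability platform-tools"

# layers_for ACTION: prints the layers in deployment order,
# or in reverse order when ACTION is "destroy".
layers_for() {
  local layer reversed=""
  if [ "$1" = "destroy" ]; then
    for layer in $LAYERS; do reversed="$layer${reversed:+ $reversed}"; done
    echo "$reversed"
  else
    echo "$LAYERS"
  fi
}

# run_layers ACTION: runs terragrunt once per layer and stops at the first failure.
# With DRY_RUN=1 the commands are printed instead of executed.
run_layers() {
  local action="$1" layer
  for layer in $(layers_for "$action"); do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "terragrunt run -a --working-dir $layer -- $action"
    else
      terragrunt run -a --working-dir "$layer" -- "$action" || return 1
    fi
  done
}

# Usage, from k8s-platform-tools/:
#   DRY_RUN=1 run_layers apply
#   run_layers destroy
```

Stopping at the first failed layer keeps later layers from being applied on top of a broken dependency.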
---
## Configuration
All platform configuration lives in **`k8s-platform-tools/platform_vars.yaml`**, organized under four top-level paths:
| YAML Path | Components |
|-----------|------------|
| `Platform.Tools.<component>.inputs` | Core platform, service mesh, platform tools |
| `Platform.Security.<component>.inputs` | Kyverno, Falco, Velero |
| `Platform.Observability.<component>.inputs` | Compliance Scanner |
| `common.*` | Shared values (region, VPC, EKS, tags) |
**Key convention**: Component directory name must match the YAML key exactly (resolved via `basename(get_terragrunt_dir())`).
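Because a mismatched directory name fails silently (the component simply receives no inputs), a quick check can help. The sketch below is an illustration only: `has_component_key` is a hypothetical helper, and it assumes a plain-text match on `<name>:` is good enough, where a YAML-aware tool would be stricter.

```bash
#!/usr/bin/env bash
# has_component_key YAML_FILE COMPONENT_DIR
# Succeeds when the directory's basename appears as a mapping key in YAML_FILE.
# Assumption: a line-based text match is sufficient; it does not verify
# which section of the file the key sits under.
has_component_key() {
  local vars_file="$1" name
  name="$(basename "$2")"
  grep -Eq "^[[:space:]]*${name}:" "$vars_file"
}

# Usage, from k8s-platform-tools/ (commented out to avoid side effects):
#   for dir in core-platform/*/; do
#     has_component_key platform_vars.yaml "$dir" ||
#       echo "no YAML key for $(basename "$dir")" >&2
#   done
```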
### Environment Selection
```bash
ENV=dev terragrunt apply # default
ENV=prod terragrunt apply # production
```
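A small guard can make the default explicit and reject typos before Terragrunt runs. This is a sketch under assumptions: `resolve_env` is a hypothetical helper, and the allowed list (dev, qa, prod) is borrowed from the CI/CD section rather than confirmed against the Terragrunt configuration.

```bash
#!/usr/bin/env bash
# resolve_env: prints the target environment, defaulting to dev
# when ENV is unset or empty, and fails on anything outside the allowed list.
resolve_env() {
  local env="${ENV:-dev}"
  case "$env" in
    dev|qa|prod) echo "$env" ;;
    *) echo "unsupported ENV: $env" >&2; return 1 ;;
  esac
}

# Usage (commented out to avoid side effects):
#   env="$(resolve_env)" && ENV="$env" terragrunt apply
```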
### Secrets Management
All sensitive values are stored in AWS Secrets Manager and referenced as:
```yaml
admin_password: "aws-secretsmanager:///dev/argocd/admin-password"
```
Secrets are synced to Kubernetes via External Secrets Operator with IRSA.
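To confirm that a referenced secret exists before deploying, the reference can be reduced to a secret name and looked up with the AWS CLI. The sketch below is illustrative: `secret_id_from_ref` is a hypothetical helper, and it assumes everything after the `aws-secretsmanager://` scheme is the secret name, which may not match how the modules resolve references.

```bash
#!/usr/bin/env bash
# secret_id_from_ref REF: strips the "aws-secretsmanager://" scheme and prints the rest.
# Assumption: the remainder (for example "/dev/argocd/admin-password") is the secret name.
secret_id_from_ref() {
  local ref="$1" scheme="aws-secretsmanager://"
  case "$ref" in
    "$scheme"*) echo "${ref#"$scheme"}" ;;
    *) echo "not a Secrets Manager reference: $ref" >&2; return 1 ;;
  esac
}

# Usage (commented out; requires AWS credentials):
#   id="$(secret_id_from_ref 'aws-secretsmanager:///dev/argocd/admin-password')" &&
#     aws secretsmanager describe-secret --secret-id "$id"
```

`describe-secret` returns metadata only, so the check does not print the secret value.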
---
## Security Hardening
### Phase 1: Foundation
| Control | Implementation |
|---------|---------------|
| State encryption | S3 KMS + DynamoDB KMS with key rotation |
| State locking | DynamoDB with `prevent_destroy` lifecycle |
| CI/CD auth | GitHub OIDC federation (no long-lived keys) |
| Secrets access | IRSA least-privilege per component |
### Phase 2: Runtime
| Control | Implementation |
|---------|---------------|
| Admission control | Kyverno: approved registries, no `latest` tag, resource limits, security contexts |
| Threat detection | Falco eBPF: privileged containers, sensitive file access, C2 connections |
| Backup & DR | Velero: daily full, hourly critical, weekly maintenance with S3+KMS |
| Network policies | Default-deny with explicit allow for DNS, k8s API |
### Phase 3: Operational
| Control | Implementation |
|---------|---------------|
| Secrets rotation | Lambda-based monthly rotation with SNS notifications |
| Compliance scanning | Weekly CIS 1.8 benchmarks with S3 reports |
| Security monitoring | CloudWatch alarms: failed auth (>10/5min), privilege escalation, policy violations |
| Security dashboard | Centralized CloudWatch dashboard |
---
## CI/CD Integration
### Main Workflow (`.github/workflows/terragrunt-plan-apply-oidc.yaml`)
OIDC-authenticated pipeline with manual dispatch:
```
Workflow Dispatch → OIDC Auth → Init → Validate → Plan → [Approval] → Apply
```
| Feature | Detail |
|---------|--------|
| Authentication | AWS OIDC (no static credentials) |
| Environments | dev, qa, prod (selectable) |
| Approval | Required before apply via GitHub Issues |
| Artifacts | Plan output stored 30 days |
| Tools | Terraform 1.12.2, Terragrunt 1.0.0 |
### Reusable Templates (`k8s-platform-tools/ci-cd-templates/`)
| Template | Purpose |
|----------|---------|
| `terragrunt-plan-apply.yaml` | Full pipeline: tfsec, Checkov, Infracost, drift detection |
| `reusable-docker-build.yaml` | Multi-platform Docker builds with Trivy scanning |
| `terragrunt-fmt-commit.yaml` | Auto-format with TFLint, PR creation |
| `get-env-func.yaml` | Branch-to-environment mapping |
---
## Operations
### Monitoring
```bash
# Security dashboard
cd aws-infrastructure/security-audit-automation && terragrunt output dashboard_url
# Compliance reports
aws s3 ls s3://--compliance-reports/
# Falco alerts
kubectl logs -n falco -l app.kubernetes.io/name=falco | grep CRITICAL
# Kyverno policy reports
kubectl get clusterpolicyreport -o yaml
# Velero backup status
velero backup get
```
### State Management
```bash
terragrunt state list # List resources
terragrunt state pull > backup.tfstate # Backup state
terragrunt force-unlock <LOCK_ID>      # Unlock stuck state (lock ID is shown in the lock error)
```
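Taking a state backup before a forced unlock is a sensible precaution. The following sketch wraps the `state pull` command above with a timestamped file name; `backup_name` and `backup_state` are hypothetical helpers, and the naming scheme is an assumption.

```bash
#!/usr/bin/env bash
# backup_name PREFIX: prints a UTC-timestamped file name,
# in the form PREFIX-YYYYMMDDTHHMMSSZ.tfstate.
backup_name() {
  echo "$1-$(date -u +%Y%m%dT%H%M%SZ).tfstate"
}

# backup_state: pulls the current state into a timestamped file in the
# working directory, and removes the file again if the pull fails.
backup_state() {
  local out
  out="$(backup_name "$(basename "$PWD")")"
  if terragrunt state pull > "$out"; then
    echo "state saved to $out"
  else
    rm -f "$out"
    return 1
  fi
}

# Usage, from a component directory (commented out to avoid side effects):
#   backup_state && terragrunt force-unlock <LOCK_ID>
```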
---
## Troubleshooting
| Problem | Solution |
|---------|----------|
| State locked | `terragrunt force-unlock <LOCK_ID>` |
| Config not applied | Verify directory name matches `platform_vars.yaml` key |
| Module fetch fails | Check git access to `github.com/cloudon-one/k8s-platform-modules`, verify `ref=dev` exists |
| OIDC auth fails | `aws iam list-open-id-connect-providers` and check role trust policy |
| Policy blocks deploy | Set Kyverno to Audit: `kubectl patch clusterpolicy <policy-name> --type merge -p '{"spec":{"validationFailureAction":"Audit"}}'` |
| Secrets not syncing | `kubectl describe externalsecret <name> -n <namespace>` |
| Dependency errors | Verify parent layer is deployed; check deployment order |
---
## Documentation
| Document | Description |
|----------|-------------|
| [Security Review](./SECURITY_REVIEW.md) | Initial security audit findings |
| [Security Implementation Plan](./SECURITY_IMPLEMENTATION_PLAN.md) | Complete security roadmap |
| [Phase 1 Deployment](./PHASE1_FOUNDATION_SECURITY_DEPLOYMENT.md) | Foundation security guide |
| [Phase 2 Deployment](./PHASE2_SECURITY_DEPLOYMENT.md) | Runtime security guide |
| [Phase 3 Deployment](./PHASE3_OPERATIONAL_SECURITY_DEPLOYMENT.md) | Operational security guide |
| [IaC Summary](./INFRASTRUCTURE_AS_CODE_SUMMARY.md) | Infrastructure as Code overview |
---
## Contributing
1. Fork the repository
2. Create feature branch (`git checkout -b feature/my-feature`)
3. Follow existing Terragrunt/Terraform patterns
4. Update `platform_vars.yaml` for configuration changes
5. Open a Pull Request
---
## License
MIT License - see [LICENSE](LICENSE) for details.
---
Built for production Kubernetes deployments