https://github.com/zxkane/openhands-infra
An AWS CDK (TypeScript) infrastructure project for deploying OpenHands, an AI‑driven development platform, on AWS.
https://github.com/zxkane/openhands-infra
aws aws-cdk code-agent iac openhands
Last synced: 29 days ago
JSON representation
An AWS CDK (TypeScript) infrastructure project for deploying OpenHands, an AI‑driven development platform, on AWS.
- Host: GitHub
- URL: https://github.com/zxkane/openhands-infra
- Owner: zxkane
- License: apache-2.0
- Created: 2026-01-19T09:02:48.000Z (about 2 months ago)
- Default Branch: main
- Last Pushed: 2026-02-12T15:47:54.000Z (about 1 month ago)
- Last Synced: 2026-02-12T17:48:34.716Z (about 1 month ago)
- Topics: aws, aws-cdk, code-agent, iac, openhands
- Language: TypeScript
- Homepage: https://kane.mx/posts/2026/deploying-openhands-on-aws-with-cdk/
- Size: 621 KB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
- Security: security-check.sh
- Agents: AGENTS.md
Awesome Lists containing this project
README
# OpenHands AWS Infrastructure
AWS CDK TypeScript project for deploying [OpenHands](https://github.com/All-Hands-AI/OpenHands) - an AI-driven development platform.
## Architecture Overview
```
User → CloudFront (WAF+Lambda@Edge Auth) → HTTP Origin → ALB (origin verified) → EC2 m7g.xlarge Graviton (ASG)
│ ↓
└── Cognito (OAuth2, Managed Login v2) OpenHands Docker + Watchtower
↓
VPC Endpoints → Bedrock / CloudWatch Logs
↓
RDS Proxy → Aurora Serverless v2 PostgreSQL
Runtime Apps:
{port}-{convId}.runtime.{subdomain}.{domain} → CloudFront → Lambda@Edge → OpenResty → Docker Container
```
Key features:
- **CloudFront with Origin Verification**: ALB requires X-Origin-Verify header - direct access returns 403
- **Internet-facing ALB**: Required for WebSocket support (CloudFront VPC Origin doesn't support WebSocket)
- **Self-Healing Architecture**: Conversation history persists across EC2 instance replacements
- **Runtime Subdomain Routing**: User apps accessible via `{port}-{convId}.runtime.{subdomain}.{domain}` with proper in-app routing
## Features
- **Graviton (ARM64)**: Cost-optimized m7g.xlarge instances (~20% cheaper than x86)
- **AWS Bedrock**: LLM inference via IAM Role (no API keys)
- **Aurora Serverless v2**: PostgreSQL database with RDS Proxy for connection pooling and high availability
- **S3 Data Persistence**: Conversation events, settings stored in S3 (survives instance replacement)
- **EFS Workspace Persistence**: Sandbox workspaces stored in EFS under `/data/openhands` (survives instance replacement)
- **Self-Healing**: ASG with ELB health checks + persistent database = no data loss on instance replacement
- **Auto-updates**: Watchtower for Docker image updates
- **Security**: Cognito auth (30-day session), WAF, VPC Endpoints, private subnets, Secrets Manager
- **Monitoring**: CloudWatch Logs, Alarms, Dashboard
- **Backup**: AWS Backup with 14-day retention, Aurora automatic backups (35 days)
- **Runtime Subdomain**: Apps run at domain root with proper internal routing and cookie isolation
- **Sandbox AWS Access**: Optional feature enabling sandbox containers to access AWS services with scoped IAM credentials
## Prerequisites
- AWS CLI configured with appropriate credentials
- Node.js 20+ and npm
- Existing VPC with private subnets and NAT Gateway
- Existing Route 53 Hosted Zone
## Deployment
### 1. Install Dependencies
```bash
npm install
```
### 2. Configure Context
Required context parameters:
| Parameter | Description | Example |
|-----------|-------------|---------|
| `vpcId` | Existing VPC ID | `vpc-0123456789abcdef0` |
| `hostedZoneId` | Route 53 Hosted Zone ID | `Z0123456789ABCDEFGHIJ` |
| `domainName` | Domain name | `example.com` |
| `subDomain` | Subdomain for OpenHands | `openhands` |
| `region` | AWS region (optional, defaults to us-east-1) | `us-west-2` |
| `siteName` | Cognito managed login site name (optional) | `Openhands on AWS` |
| `authCallbackDomains` | Extra OAuth callback domains for shared Cognito client (optional; JSON array or comma-separated) | `["openhands.example.com","openhands.test.example.com"]` |
| `authDomainPrefixSuffix` | Suffix for Cognito domain prefix (optional; avoids collisions) | `shared` |
| `edgeStackSuffix` | Suffix for Edge stack name in us-east-1 (optional; enables multiple Edge stacks) | `my-project` |
| `sandboxAwsAccess` | Enable sandbox AWS access (optional, defaults to false) | `true` |
| `sandboxAwsPolicyFile` | Path to custom IAM policy JSON for sandbox (optional) | `config/sandbox-aws-policy.json` |
### 3. Bootstrap CDK (First Time Only)
CDK must be bootstrapped in both regions since Edge resources are deployed to us-east-1:
```bash
npx cdk bootstrap --region
npx cdk bootstrap --region us-east-1 # Required for Lambda@Edge and CloudFront
```
### 4. Create Sandbox Secret Key (First Time Only)
The sandbox secret key is used to encrypt secrets in conversation state, enabling session resume after sandbox restart. Create it before deployment:
```bash
aws secretsmanager create-secret \
--name openhands/sandbox-secret-key \
--secret-string "$(openssl rand -base64 32)" \
--region \
--description "OpenHands sandbox secret key for session encryption"
```
**Note**: This secret must exist in each region where you deploy. If deploying to multiple regions, create the secret in each region.
### 5. Deploy
```bash
# Deploy all stacks
npx cdk deploy --all \
--context vpcId= \
--context hostedZoneId= \
--context domainName= \
--context subDomain= \
--context region= \
--require-approval never
```
**Deployment Order** (handled automatically by CDK):
0. Auth (us-east-1) - shared Cognito (managed login branding + multi-domain callbacks)
1. Network (main region)
2. Monitoring (main region) - independent, creates S3 data bucket
3. Security (main region) - depends on Network, Monitoring
4. Database (main region) - depends on Network, Security
5. Compute (main region) - depends on Network, Security, Monitoring, Database
6. Edge (us-east-1) - depends on Compute
### 6. Access OpenHands
After deployment, access OpenHands at:
```
https://.
```
## Stack Structure
| Stack | Region | Description |
|-------|--------|-------------|
| `OpenHands-Auth` | us-east-1 | Cognito User Pool + Managed Login v2 branding |
| `OpenHands-Network` | Main | VPC import, VPC Endpoints |
| `OpenHands-Monitoring` | Main | CloudWatch Logs, Alarms, Dashboard, Backup, S3 Data Bucket |
| `OpenHands-Security` | Main | IAM Roles, Security Groups |
| `OpenHands-Database` | Main | Aurora Serverless v2 PostgreSQL with RDS Proxy |
| `OpenHands-Compute` | Main | EC2 ASG, Launch Template, Internal ALB |
| `OpenHands-Edge-*` | us-east-1 | Lambda@Edge, CloudFront (VPC Origin), WAF, Route 53 (per domain/environment) |
**Notes**:
- Cognito is provisioned in `OpenHands-Auth` so multiple Edge stacks can reuse a single user pool/client.
- To add another domain/environment, include it in `authCallbackDomains`, deploy `OpenHands-Auth`, then deploy a new `OpenHands-Edge-`.
- The Database stack is **required** for self-healing architecture - it persists conversation history across EC2 instance replacements.
## Multi-Domain Deployment
You can deploy multiple OpenHands instances on different domains, all sharing the same backend infrastructure (Compute, Database, etc.) but with separate CloudFront distributions and DNS records.
### Architecture
```
┌─────────────────────────────────┐
│ AuthStack (us-east-1) │
│ Shared Cognito User Pool │
│ - Multi-domain callbacks │
└─────────────────────────────────┘
│
┌───────────────────────────────┼───────────────────────────────┐
│ │ │
▼ ▼ ▼
┌─────────────────────────┐ ┌─────────────────────────┐ ┌─────────────────────────┐
│ EdgeStack-Domain1 │ │ EdgeStack-Domain2 │ │ EdgeStack-DomainN │
│ (us-east-1) │ │ (us-east-1) │ │ (us-east-1) │
│ - CloudFront │ │ - CloudFront │ │ - CloudFront │
│ - Lambda@Edge │ │ - Lambda@Edge │ │ - Lambda@Edge │
│ - WAF │ │ - WAF │ │ - WAF │
│ - Route 53 records │ │ - Route 53 records │ │ - Route 53 records │
│ - ACM Certificate │ │ - ACM Certificate │ │ - ACM Certificate │
└─────────────────────────┘ └─────────────────────────┘ └─────────────────────────┘
│ │ │
└───────────────────────────────┼───────────────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ ComputeStack (main region) │
│ - ALB with origin verification │
│ - EC2 ASG │
│ - SSM parameters in us-east-1 │
└─────────────────────────────────────┘
│
┌──────────────────┴──────────────────┐
▼ ▼
┌─────────────────────────┐ ┌─────────────────────────┐
│ DatabaseStack │ │ MonitoringStack │
│ Aurora PostgreSQL │ │ S3, CloudWatch │
└─────────────────────────┘ └─────────────────────────┘
```
### Step 1: Configure Shared Authentication
First, configure the Auth stack with all domains that will use it:
```bash
# Deploy Auth stack with all callback domains
npx cdk deploy OpenHands-Auth \
--context vpcId= \
--context hostedZoneId= \
--context domainName= \
--context subDomain=openhands \
--context region= \
--context authCallbackDomains='["openhands.domain1.com","openhands.domain2.com"]' \
--require-approval never
```
### Step 2: Deploy Backend Infrastructure
Deploy the shared backend infrastructure (only once):
```bash
npx cdk deploy OpenHands-Network OpenHands-Monitoring OpenHands-Security OpenHands-Database OpenHands-UserConfig OpenHands-Compute \
--context vpcId= \
--context hostedZoneId= \
--context domainName= \
--context subDomain=openhands \
--context region= \
--require-approval never
```
### Step 3: Deploy Edge Stacks for Each Domain
Deploy a separate Edge stack for each domain:
```bash
# Domain 1 (e.g., openhands.test.example.com)
npx cdk deploy OpenHands-Edge-Test \
--context vpcId= \
--context hostedZoneId= \
--context domainName=test.example.com \
--context subDomain=openhands \
--context region= \
--context edgeStackSuffix=Test \
--exclusively \
--require-approval never
# Domain 2 (e.g., openhands.prod.example.com)
npx cdk deploy OpenHands-Edge-Prod \
--context vpcId= \
--context hostedZoneId= \
--context domainName=prod.example.com \
--context subDomain=openhands \
--context region= \
--context edgeStackSuffix=Prod \
--exclusively \
--require-approval never
```
**Important**: Use `--exclusively` flag when deploying individual Edge stacks to avoid redeploying the backend stacks with different domain context.
### How It Works
1. **Shared Cognito**: All domains use the same Cognito User Pool. Users can log in with the same credentials on any domain.
2. **ALB Origin Verification**: The internet-facing ALB is protected by a custom `X-Origin-Verify` header. Direct access returns 403 - only CloudFront with the valid header can reach the ALB.
3. **SSM Parameter Sharing**: ComputeStack writes ALB DNS name and origin secret to SSM parameters in us-east-1. Edge stacks read from these parameters, avoiding CDK cross-region reference conflicts.
4. **Unique Resource Names**: Each Edge stack has unique CloudFront ResponseHeadersPolicy names (includes domain) to avoid naming conflicts.
### Adding a New Domain
To add a new domain to an existing deployment:
1. **Update Auth stack** with the new callback domain:
```bash
npx cdk deploy OpenHands-Auth \
--context authCallbackDomains='["existing.com","new-domain.com"]' \
...
```
2. **Deploy new Edge stack**:
```bash
npx cdk deploy OpenHands-Edge-NewDomain \
--context hostedZoneId= \
--context domainName=new-domain.com \
--context edgeStackSuffix=NewDomain \
--exclusively \
...
```
### Removing a Domain
To remove a domain:
1. **Delete the Edge stack**:
```bash
aws cloudformation delete-stack --stack-name OpenHands-Edge- --region us-east-1
```
2. **Optionally update Auth stack** to remove the callback domain (not required, but keeps config clean).
## Cost Estimate
### Base Infrastructure (~$375-420/month)
| Component | Monthly Cost (USD) | Usage Assumption |
|-----------|--------------------|------------------|
| EC2 m7g.xlarge Graviton | ~$112 | 730 hours (24/7) |
| EBS gp3 300GB | ~$30 | 300GB storage |
| Aurora Serverless v2 | ~$43-80 | 0.5-4 ACU (scales with usage) |
| RDS Proxy | ~$18 | 730 hours |
| S3 Data Bucket | ~$1-5 | Depends on usage |
| NAT Gateway | ~$0-35 | Depends on traffic |
| CloudFront | ~$85 | 1TB data transfer |
| ALB | ~$25 | 730 hours + LCUs |
| VPC Endpoints (8) | ~$50 | 8 endpoints × 730 hours |
| CloudWatch | ~$5 | Logs, metrics, alarms |
| Route 53 | ~$1 | 1 hosted zone + queries |
### Bedrock Usage (Variable)
Claude 4.5 models available on Amazon Bedrock:
| Model | Model ID | Input (per 1M) | Output (per 1M) |
|-------|----------|----------------|-----------------|
| Claude Opus 4.5 | `anthropic.claude-opus-4-5-20251101-v1:0` | $5 | $25 |
| Claude Sonnet 4.5 | `anthropic.claude-sonnet-4-5-20250929-v1:0` | $3 | $15 |
| Claude Haiku 4.5 | `anthropic.claude-haiku-4-5-20251001-v1:0` | $1 | $5 |
**Example**: 10M input + 2M output tokens/month with Claude Sonnet 4.5 ≈ $60/month
**Note**: Claude Sonnet 4.5 pricing increases for prompts >200K tokens ($6 input / $22.50 output per 1M).
## VPC Requirements
Your existing VPC must have:
- At least 2 private subnets in different AZs
- NAT Gateway for outbound internet access
- DNS hostnames enabled
## Data Persistence
OpenHands data is stored durably for self-healing across instance replacements:
| Data Type | Storage | Persistence |
|-----------|---------|-------------|
| Conversation Metadata | Aurora PostgreSQL | Permanent (via RDS Proxy) |
| Conversation Events | S3 | Permanent (survives instance replacement) |
| User Settings | S3 | Permanent |
| Workspace Files | EFS | Persistent (survives instance replacement) |
**Aurora Serverless v2 with RDS Proxy**:
- PostgreSQL 15.8 with RDS Proxy for connection pooling
- Password-based authentication via Secrets Manager (no token refresh needed)
- Serverless v2: 0.5-4 ACU (auto-scales with usage)
- 35-day automatic backup retention
- Storage encryption enabled
- CloudWatch logs export
- Removal policy: SNAPSHOT (creates final backup on deletion)
**RDS Proxy Benefits**:
- Connection pooling for efficient database connections
- Automatic failover handling
- No IAM token refresh required (uses stable passwords from Secrets Manager)
- Improved application availability during database maintenance
**S3 Bucket Configuration**:
- SSE-S3 encryption
- Versioning enabled (30-day retention for old versions)
- Block all public access
- Enforce SSL
- Removal policy: RETAIN (data preserved if stack is deleted)
### Conversation Resume (Self-Healing)
When EC2 instances are replaced (ASG scaling, CDK deployment, health check failure), sandbox Docker containers are terminated. OpenHands marks these conversations as `ARCHIVED` because the sandbox is missing. However, all conversation data is preserved:
| Data | Storage | Survives EC2 Replacement |
|------|---------|--------------------------|
| Conversation metadata | Aurora PostgreSQL | ✅ Yes |
| Conversation events/history | S3 | ✅ Yes |
| Workspace files | EFS (`/data/openhands`) | ✅ Yes |
**Auto-Resume Flow**:
```
User clicks archived conversation
↓
Frontend detects ARCHIVED status
↓
Calls POST /api/v1/app-conversations/{id}/resume
↓
Backend recreates sandbox container with:
- Same conversation ID
- user_id label (for authorization)
- EFS workspace mounted
↓
Page reloads → conversation is usable again
```
**What happens automatically**:
1. User navigates to an archived conversation
2. Frontend patch detects `status: ARCHIVED` and triggers resume
3. Backend creates a new sandbox container with the original workspace
4. Container gets `user_id` label for runtime URL authorization
5. Page reloads and conversation is ready for use
**Note**: The workspace files on EFS are preserved, so any code or files created in the previous session are still available after resume.
## Session Management
Cognito token validity configuration:
| Token Type | Validity | Description |
|------------|----------|-------------|
| Access Token | 1 hour | API access token |
| ID Token | 1 day | Identity token (stored in cookie) |
| Refresh Token | 30 days | Used to obtain new tokens |
Users stay logged in for up to 30 days without re-authentication.
## Runtime Subdomain Routing
When AI agents run applications (e.g., Flask, Node.js) inside the sandbox, they are accessible via dedicated runtime subdomains:
```
https://{port}-{convId}.runtime.{subdomain}.{domain}/
```
**Example**: `https://5000-abc123def456.runtime.openhands.example.com/`
### Benefits
| Feature | Benefit |
|---------|---------|
| Domain Root | Apps run at `/` - internal routes (e.g., `/add`, `/api`) work correctly |
| Cookie Isolation | Each runtime has isolated cookies (no cross-runtime cookie leakage) |
| Security Headers | X-Frame-Options, CSP, X-XSS-Protection automatically applied |
| No Authentication | Runtime subdomains bypass Cognito (apps are public within the conversation) |
### Architecture
```
User Browser
↓
https://5000-{convId}.runtime.openhands.example.com/
↓
CloudFront (matches *.runtime.* wildcard certificate)
↓
Lambda@Edge (viewer-request: parse subdomain, rewrite URI to /runtime/{convId}/{port}/...)
↓
ALB → EC2 → OpenResty
↓
Lua Docker Discovery (find container by convId, route to container IP:port)
↓
User App (Flask/Node.js/etc. inside sandbox container)
```
### Security Considerations
- **Cookie Isolation**: Cookies set by runtime apps are scoped to the exact subdomain only
- **No Domain Attribute**: Cookies don't leak to parent domain or other subdomains
- **SameSite=Strict**: Cross-site requests don't carry cookies
- **Security Headers**: Lambda@Edge origin-response adds protective headers
## Sandbox AWS Access
This optional feature enables AI agents running in sandbox containers to access AWS services (e.g., S3, DynamoDB, Lambda) using scoped IAM credentials.
### Enabling Sandbox AWS Access
```bash
npx cdk deploy --all \
--context sandboxAwsAccess=true \
--context sandboxAwsPolicyFile=config/sandbox-aws-policy.json \
--context vpcId= \
--context hostedZoneId= \
--context domainName= \
--context subDomain= \
--context region= \
--require-approval never
```
### ⚠️ IMPORTANT: Customize the Policy File
The default `config/sandbox-aws-policy.json` grants broad permissions (`Action: "*"`). **You MUST customize this file for your specific use case** to follow the principle of least privilege.
**Default policy (overly permissive - customize this!):**
```json
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowAllExceptDenied",
"Effect": "Allow",
"Action": "*",
"Resource": "*"
}
]
}
```
**Example: Purpose-built policy for S3 and DynamoDB only:**
```json
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowS3Access",
"Effect": "Allow",
"Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
"Resource": ["arn:aws:s3:::my-bucket", "arn:aws:s3:::my-bucket/*"]
},
{
"Sid": "AllowDynamoDB",
"Effect": "Allow",
"Action": ["dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:Query"],
"Resource": "arn:aws:dynamodb:*:*:table/my-table"
}
]
}
```
### Hardcoded Explicit Denies
Regardless of what you configure in the policy file, the following actions are **always denied** to prevent privilege escalation:
| Category | Denied Actions |
|----------|----------------|
| IAM Users | `iam:CreateUser`, `iam:DeleteUser`, `iam:CreateAccessKey` |
| IAM Policies | `iam:AttachUserPolicy`, `iam:PutUserPolicy`, `iam:PutRolePolicy` |
| IAM Roles | `iam:CreateRole`, `iam:DeleteRole`, `iam:AttachRolePolicy` |
| Account | `organizations:*`, `account:*`, `billing:*` |
| Role Assumption | `sts:AssumeRole` (prevents lateral movement) |
### How It Works
1. **IAM Role**: SecurityStack creates `OpenHandsSandboxRole` with your custom policy + explicit denies
2. **Credential Refresh**: A systemd timer on EC2 refreshes temporary credentials every 10 minutes (15-minute token lifetime)
3. **Credential Mount**: The credentials file is mounted read-only into sandbox containers at `/data/sandbox-credentials`
4. **AWS CLI**: The agent-server container includes AWS CLI v2, configured to use the mounted credentials
## Security
- EC2 instances in private subnets only
- All AWS service access via VPC Endpoints
- IAM Role with least privilege (Bedrock, Logs, SSM, S3, Secrets Manager)
- Database credentials stored in AWS Secrets Manager
- RDS Proxy with TLS-encrypted connections
- Cognito authentication for all requests (30-day session)
- WAF protection with rate limiting
- EBS, S3, and Aurora storage encryption enabled
## Useful Commands
```bash
# Build TypeScript
npm run build
# Watch for changes
npm run watch
# Run tests
npm run test
# Show diff before deploy
npx cdk diff --all --context ...
# Synthesize CloudFormation
npx cdk synth --all --context ...
# Destroy all stacks
npx cdk destroy --all --context ...
```
## Troubleshooting
### VPC Lookup Fails
Ensure the VPC exists and your AWS credentials have `ec2:DescribeVpcs` permission.
### Certificate Validation Pending
ACM certificates use DNS validation. Ensure the Hosted Zone is correctly configured.
### EC2 Instance Not Starting
Check CloudWatch Logs at `/openhands/application` for container startup errors.
## CI/CD
This project uses GitHub Actions for continuous integration:
| Workflow | Trigger | Description |
|----------|---------|-------------|
| **CI** | Push/PR to main, develop | Build TypeScript, run all tests (Jest + pytest) |
| **Security Scan** | Push/PR to main, daily | npm audit, Checkov, git-secrets, Semgrep SAST, cfn-lint |
### Running Tests Locally
```bash
# Run all tests (build + TypeScript + Python)
npm run test
# Run TypeScript tests only
npm run test:ts
# Run Python tests only (requires .venv)
npm run test:py
# Update snapshots after intentional changes
npm run test:ts -- -u
```
## License
This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.
This infrastructure project deploys [OpenHands](https://github.com/All-Hands-AI/OpenHands). See the [OpenHands License](https://github.com/All-Hands-AI/OpenHands/blob/main/LICENSE) for the main application.