https://github.com/zxkane/openhands-infra
An AWS CDK (TypeScript) infrastructure project for deploying OpenHands, an AI-driven development platform, on AWS.
aws aws-cdk code-agent iac openhands
- Host: GitHub
- URL: https://github.com/zxkane/openhands-infra
- Owner: zxkane
- License: apache-2.0
- Created: 2026-01-19T09:02:48.000Z
- Default Branch: main
- Last Pushed: 2026-02-12T15:47:54.000Z
- Last Synced: 2026-02-12T17:48:34.716Z
- Topics: aws, aws-cdk, code-agent, iac, openhands
- Language: TypeScript
- Homepage: https://kane.mx/posts/2026/deploying-openhands-on-aws-with-cdk/
- Size: 621 KB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
- Security: security-check.sh
- Agents: AGENTS.md
# OpenHands on AWS
### Self-host your AI coding agent: fully serverless, zero idle cost
Deploy [OpenHands](https://github.com/All-Hands-AI/OpenHands) on AWS with **production-grade infrastructure** in minutes.
ECS Fargate • Bedrock LLM • Per-conversation isolation • Self-healing architecture.
[**Getting Started**](#quick-start) · [**Architecture**](#architecture-overview) · [**Cost Estimate**](#cost-estimate) · [**Blog Post**](https://kane.mx/posts/2026/serverless-multi-tenant-openhands-on-aws/)
---
## Why This Project?
Running OpenHands locally is great for trying it out. Running it for a **team** or in **production** is a different story:
| Challenge | How This Project Solves It |
|-----------|---------------------------|
| **"I don't want to manage servers"** | Fully serverless: ECS Fargate, Aurora Serverless, no EC2 instances |
| **"Idle cost is too high"** | Sandboxes scale to zero when not in use; pay only for active conversations |
| **"Multi-user access control"** | Cognito authentication with 30-day sessions, per-user conversation isolation |
| **"My conversations disappear on restart"** | Self-healing: Aurora + S3 + EFS persist everything across Fargate task replacements |
| **"I need AWS access from the AI agent"** | Optional scoped IAM credentials for sandbox containers (least-privilege) |
| **"Setting up infra is painful"** | One `cdk deploy --all` command: 10 stacks deployed in the right order automatically |
## Key Features
- **Fully Serverless**: ECS Fargate (ARM64) for compute, Aurora Serverless v2 for the database, no instances to patch
- **Zero Idle Cost**: Sandbox containers spin up per conversation and stop automatically after an idle timeout
- **Per-Conversation Isolation**: Each sandbox gets a dedicated EFS access point; no cross-conversation access
- **Self-Healing Architecture**: Conversations resume seamlessly after Fargate task replacement (Aurora + S3 + EFS)
- **AWS Bedrock**: LLM inference via an IAM role, no API keys to manage
- **Multi-Domain Support**: Share one backend across multiple CloudFront distributions and domains
- **Enterprise Security**: Cognito auth, WAF, VPC Endpoints, private subnets, KMS encryption, Secrets Manager
- **Runtime Subdomain**: Agent-built apps accessible via `{port}-{convId}.runtime.{subdomain}.{domain}`
- **Observability**: CloudWatch Logs, Alarms, Container Insights, AWS Backup (14-day retention)
- **Warm Pool**: Pre-warmed sandbox tasks for instant conversation starts
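The zero-idle-cost behavior boils down to comparing each running sandbox's last activity with the idle timeout. The TypeScript sketch below illustrates that check; the `SandboxRecord` shape and the `findIdleSandboxes` helper are hypothetical and not taken from the project's orchestrator.

```typescript
// Hypothetical shape of a sandbox registry entry (not the project's actual schema).
interface SandboxRecord {
  conversationId: string;
  status: "RUNNING" | "STOPPED";
  lastActivityAt: number; // epoch milliseconds
}

// Returns the running sandboxes whose last activity is older than the idle timeout.
function findIdleSandboxes(
  records: SandboxRecord[],
  now: number,
  idleTimeoutMinutes: number,
): SandboxRecord[] {
  const cutoff = now - idleTimeoutMinutes * 60_000;
  return records.filter((r) => r.status === "RUNNING" && r.lastActivityAt < cutoff);
}
```

With the default 30-minute timeout, a sandbox last active 31 minutes ago would be selected for stopping, while one active 5 minutes ago would keep running.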
## Architecture Overview
```
User → CloudFront (WAF+Lambda@Edge Auth) → ALB (origin verified) → ECS Fargate (App + OpenResty)
           │                                                           │
           └── Cognito (OAuth2, Managed Login v2)                      ├── Cloud Map → Sandbox Fargate Tasks
                                                                       ├── VPC Endpoints → Bedrock / CloudWatch Logs
                                                                       └── RDS Proxy → Aurora Serverless v2 PostgreSQL

Sandbox Orchestration:
App → Orchestrator Lambda → DynamoDB Registry → Sandbox Fargate Tasks (per-conversation EFS isolation)

Runtime Apps:
{port}-{convId}.runtime.{subdomain}.{domain} → CloudFront → Lambda@Edge → OpenResty → Sandbox Fargate Task
```
> For a detailed architecture deep dive (10-stack breakdown, data flows, sandbox lifecycle), see [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md).
## Quick Start
### Prerequisites
- AWS CLI configured with appropriate credentials
- Node.js 22+ and npm
- Existing VPC with private subnets and NAT Gateway
- Existing Route 53 Hosted Zone
### 1. Install Dependencies
```bash
git clone https://github.com/zxkane/openhands-infra.git
cd openhands-infra
npm install
```
### 2. Bootstrap CDK (First Time Only)
```bash
npx cdk bootstrap --region <region>
npx cdk bootstrap --region us-east-1 # Required for Lambda@Edge and CloudFront
```
### 3. Create Sandbox Secret Key (First Time Only)
```bash
aws secretsmanager create-secret \
--name openhands/sandbox-secret-key \
--secret-string "$(openssl rand -base64 32)" \
--region <region> \
--description "OpenHands sandbox secret key for session encryption"
```
> **Note**: This secret must exist in each region where you deploy.
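If you would rather generate the key from a script than with `openssl`, the sketch below produces an equivalent value using Node.js's built-in `crypto` module: 32 random bytes, base64-encoded. It only creates the string; you would still store it with `aws secretsmanager create-secret` as shown above.

```typescript
import { randomBytes } from "node:crypto";

// Equivalent of `openssl rand -base64 32`: 32 random bytes, base64-encoded.
function generateSandboxSecret(): string {
  return randomBytes(32).toString("base64");
}

const secret = generateSandboxSecret();
console.log(secret.length); // 44 (base64 of 32 bytes, including one "=" of padding)
```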
### 4. Deploy
```bash
npx cdk deploy --all \
--context vpcId=<vpc-id> \
--context hostedZoneId=<hosted-zone-id> \
--context domainName=<domain-name> \
--context subDomain=<sub-domain> \
--context region=<region> \
--require-approval never
```
That's it! Access OpenHands at `https://<subDomain>.<domainName>`.
## Configuration
### All Context Parameters
| Parameter | Description | Example |
|-----------|-------------|---------|
| `vpcId` | Existing VPC ID | `vpc-0123456789abcdef0` |
| `hostedZoneId` | Route 53 Hosted Zone ID | `Z0123456789ABCDEFGHIJ` |
| `domainName` | Domain name | `example.com` |
| `subDomain` | Subdomain for OpenHands | `openhands` |
| `region` | AWS region (optional, defaults to us-east-1) | `us-west-2` |
| `siteName` | Cognito managed login site name (optional) | `Openhands on AWS` |
| `authCallbackDomains` | Extra OAuth callback domains for shared Cognito client (optional; JSON array or comma-separated) | `["openhands.example.com","openhands.test.example.com"]` |
| `authDomainPrefixSuffix` | Suffix for Cognito domain prefix (optional; avoids collisions) | `shared` |
| `edgeStackSuffix` | Suffix for Edge stack name in us-east-1 (optional; enables multiple Edge stacks) | `my-project` |
| `sandboxAwsAccess` | Enable sandbox AWS access (optional, defaults to false) | `true` |
| `sandboxAwsPolicyFile` | Path to custom IAM policy JSON for sandbox (optional) | `config/sandbox-aws-policy.json` |
| `skipS3Endpoint` | Skip S3 Gateway endpoint if VPC already has one (optional) | `true` |
| `warmPoolSize` | Number of pre-warmed sandbox Fargate tasks (optional, default: 2) | `3` |
| `idleTimeoutMinutes` | Minutes before idle sandbox is stopped (optional, default: 30, staging: 10) | `15` |
| `sandboxSociImageUri` | SOCI v2 image URI for Fargate lazy loading (optional, see AGENTS.md) | `<image-repo-uri>:tag-soci` |
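Because `authCallbackDomains` accepts either a JSON array or a comma-separated list, and context values passed with `--context` on the command line arrive as strings, both forms have to be normalized into a list of domains. A minimal sketch of such a parser follows; `parseCallbackDomains` is a hypothetical helper, not the project's actual code.

```typescript
// Hypothetical helper: normalizes the `authCallbackDomains` context value, which
// may be a JSON array string or a comma-separated list.
function parseCallbackDomains(raw: string | undefined): string[] {
  if (raw === undefined || raw.trim() === "") return [];
  const trimmed = raw.trim();
  let items: unknown[];
  if (trimmed.startsWith("[")) {
    const parsed: unknown = JSON.parse(trimmed);
    if (!Array.isArray(parsed)) throw new Error("authCallbackDomains must be a JSON array");
    items = parsed;
  } else {
    items = trimmed.split(",");
  }
  // Trim whitespace and drop empty entries (e.g. from a trailing comma).
  return items.map((d) => String(d).trim()).filter((d) => d.length > 0);
}
```

Both `'["openhands.example.com","openhands.test.example.com"]'` and `openhands.example.com,openhands.test.example.com` yield the same two-element list.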
## Stack Structure
The project deploys **10 stacks** with automatic dependency resolution:
| Stack | Region | Description |
|-------|--------|-------------|
| `OpenHands-Auth` | us-east-1 | Cognito User Pool + Managed Login v2 branding |
| `OpenHands-Network` | Main | VPC import, VPC Endpoints |
| `OpenHands-Monitoring` | Main | CloudWatch Logs, Alarms, S3 Data Bucket, Backup |
| `OpenHands-Security` | Main | IAM Roles, Security Groups, KMS key |
| `OpenHands-Database` | Main | Aurora Serverless v2 PostgreSQL with RDS Proxy |
| `OpenHands-UserConfig` | Main | User Configuration API Lambda (MCP, Secrets, Integrations) |
| `OpenHands-Cluster` | Main | Shared ECS Cluster + Cloud Map namespace |
| `OpenHands-Sandbox` | Main | Sandbox Fargate tasks, DynamoDB registry, Orchestrator Lambda |
| `OpenHands-Compute` | Main | Fargate services (App + OpenResty), ALB, EFS |
| `OpenHands-Edge-*` | us-east-1 | Lambda@Edge, CloudFront, WAF, Route 53 (per domain/environment) |
**Deployment Order** (handled automatically by CDK):
0. Auth → 1. Network → 2. Monitoring → 3. Security → 4. Database → 5. UserConfig → 6. Cluster → 7. Sandbox → 8. Compute → 9. Edge
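CDK derives this order from the dependencies between stacks rather than from a fixed list. The sketch below shows the idea with a depth-first topological sort; the dependency map is a hypothetical simplification for illustration, not the exact edges defined in this project's CDK app.

```typescript
// Hypothetical dependency map for illustration; the real edges come from the CDK app.
const dependsOn: Record<string, string[]> = {
  Auth: [],
  Network: [],
  Monitoring: ["Network"],
  Security: ["Network"],
  Database: ["Network", "Security"],
  UserConfig: ["Security"],
  Cluster: ["Network"],
  Sandbox: ["Cluster", "Security"],
  Compute: ["Database", "Sandbox", "Monitoring", "UserConfig"],
  Edge: ["Auth", "Compute"],
};

// Depth-first topological sort: every stack is emitted after all of its dependencies.
function deploymentOrder(graph: Record<string, string[]>): string[] {
  const order: string[] = [];
  const state = new Map<string, "visiting" | "done">();
  const visit = (name: string): void => {
    const s = state.get(name);
    if (s === "done") return;
    if (s === "visiting") throw new Error(`Dependency cycle at ${name}`);
    state.set(name, "visiting");
    for (const dep of graph[name] ?? []) visit(dep);
    state.set(name, "done");
    order.push(name);
  };
  Object.keys(graph).forEach(visit);
  return order;
}

console.log(deploymentOrder(dependsOn).join(" → "));
// Auth → Network → Monitoring → Security → Database → UserConfig → Cluster → Sandbox → Compute → Edge
```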
## Cost Estimate
### Base Infrastructure (~$250-350/month)
| Component | Monthly Cost (USD) | Notes |
|-----------|--------------------|-------|
| Fargate App Service (1 vCPU / 2 GB ARM64) | ~$30 | Auto-scales 1-3 |
| Fargate OpenResty Service (0.25 vCPU / 512 MB) | ~$8 | Auto-scales 1-3 |
| Fargate Sandbox Tasks | ~$0-50 | On-demand, per-conversation |
| Aurora Serverless v2 | ~$43-80 | 0.5-4 ACU |
| RDS Proxy | ~$18 | |
| CloudFront | ~$85 | 1TB data transfer |
| VPC Endpoints (10) | ~$60 | |
| ALB | ~$25 | |
| Other (EFS, S3, NAT, CW, R53, DDB) | ~$10-50 | Usage-dependent |
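To sanity-check the total, you can add up the low and high ends of each row. The sketch below does that with the figures copied from the table; single values are treated as ranges with equal bounds.

```typescript
// Monthly cost ranges in USD ([low, high]), copied from the table above.
const components: Record<string, [number, number]> = {
  "Fargate App Service": [30, 30],
  "Fargate OpenResty Service": [8, 8],
  "Fargate Sandbox Tasks": [0, 50],
  "Aurora Serverless v2": [43, 80],
  "RDS Proxy": [18, 18],
  CloudFront: [85, 85],
  "VPC Endpoints (10)": [60, 60],
  ALB: [25, 25],
  Other: [10, 50],
};

// Sums the lower and upper bounds independently.
function totalRange(items: Record<string, [number, number]>): [number, number] {
  let low = 0;
  let high = 0;
  for (const [lo, hi] of Object.values(items)) {
    low += lo;
    high += hi;
  }
  return [low, high];
}

console.log(totalRange(components)); // [ 279, 406 ]
```

With these inputs the rows sum to roughly $279-406 per month; the usage-driven rows (sandbox tasks, Aurora ACUs, and the "Other" bucket) account for the entire spread.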
### Bedrock LLM Cost (Variable)
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|-------|----------------------|------------------------|
| Claude Opus 4.5 | $5 | $25 |
| Claude Sonnet 4.5 | $3 | $15 |
| Claude Haiku 4.5 | $1 | $5 |
**Example**: 10M input + 2M output tokens/month with Claude Sonnet 4.5 = 10 × $3 + 2 × $15 = **$60/month**
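The example follows directly from the per-million-token prices in the table. The small helper below lets you plug in your own volumes; it covers only the input and output token prices listed here.

```typescript
// Per-million-token prices in USD, from the table above.
const pricing = {
  "Claude Opus 4.5": { input: 5, output: 25 },
  "Claude Sonnet 4.5": { input: 3, output: 15 },
  "Claude Haiku 4.5": { input: 1, output: 5 },
} as const;

type Model = keyof typeof pricing;

// cost = (input tokens / 1M) * input price + (output tokens / 1M) * output price
function monthlyBedrockCost(model: Model, inputTokens: number, outputTokens: number): number {
  const p = pricing[model];
  return (inputTokens / 1_000_000) * p.input + (outputTokens / 1_000_000) * p.output;
}

console.log(monthlyBedrockCost("Claude Sonnet 4.5", 10_000_000, 2_000_000)); // 60
```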
## Advanced Topics
### Multi-Domain Deployment
You can deploy multiple OpenHands instances on different domains, all sharing the same backend infrastructure.
### Architecture
```
                          ┌─────────────────────────────────┐
                          │      AuthStack (us-east-1)      │
                          │   Shared Cognito User Pool      │
                          │   - Multi-domain callbacks      │
                          └─────────────────────────────────┘
                                           │
             ┌─────────────────────────────┼─────────────────────────────┐
             │                             │                             │
             ▼                             ▼                             ▼
┌─────────────────────────┐   ┌─────────────────────────┐   ┌─────────────────────────┐
│   EdgeStack-Domain1     │   │   EdgeStack-Domain2     │   │   EdgeStack-DomainN     │
│   (us-east-1)           │   │   (us-east-1)           │   │   (us-east-1)           │
│   - CloudFront          │   │   - CloudFront          │   │   - CloudFront          │
│   - Lambda@Edge         │   │   - Lambda@Edge         │   │   - Lambda@Edge         │
│   - WAF                 │   │   - WAF                 │   │   - WAF                 │
│   - Route 53 records    │   │   - Route 53 records    │   │   - Route 53 records    │
│   - ACM Certificate     │   │   - ACM Certificate     │   │   - ACM Certificate     │
└─────────────────────────┘   └─────────────────────────┘   └─────────────────────────┘
             │                             │                             │
             └─────────────────────────────┼─────────────────────────────┘
                                           │
                                           ▼
                        ┌─────────────────────────────────────┐
                        │     ComputeStack (main region)      │
                        │  - ALB with origin verification     │
                        │  - Fargate services (App+OpenResty) │
                        │  - SSM parameters in us-east-1      │
                        └─────────────────────────────────────┘
                                           │
                        ┌──────────────────┴──────────────────┐
                        ▼                                     ▼
           ┌─────────────────────────┐           ┌─────────────────────────┐
           │    DatabaseStack        │           │    MonitoringStack      │
           │    Aurora PostgreSQL    │           │    S3, CloudWatch       │
           └─────────────────────────┘           └─────────────────────────┘
```
### Step 1: Configure Shared Authentication
```bash
npx cdk deploy OpenHands-Auth \
--context vpcId=<vpc-id> \
--context hostedZoneId=<hosted-zone-id> \
--context domainName=<domain-name> \
--context subDomain=openhands \
--context region=<region> \
--context authCallbackDomains='["openhands.domain1.com","openhands.domain2.com"]' \
--require-approval never
```
### Step 2: Deploy Backend Infrastructure
```bash
npx cdk deploy OpenHands-Network OpenHands-Monitoring OpenHands-Security \
OpenHands-Database OpenHands-UserConfig OpenHands-Cluster \
OpenHands-Sandbox OpenHands-Compute \
--context vpcId=<vpc-id> \
--context hostedZoneId=<hosted-zone-id> \
--context domainName=<domain-name> \
--context subDomain=openhands \
--context region=<region> \
--require-approval never
```
### Step 3: Deploy Edge Stacks for Each Domain
```bash
# Domain 1
npx cdk deploy OpenHands-Edge-Test \
--context vpcId=<vpc-id> \
--context hostedZoneId=<hosted-zone-id> \
--context domainName=test.example.com \
--context subDomain=openhands \
--context region=<region> \
--context edgeStackSuffix=Test \
--exclusively \
--require-approval never
# Domain 2
npx cdk deploy OpenHands-Edge-Prod \
--context vpcId=<vpc-id> \
--context hostedZoneId=<hosted-zone-id> \
--context domainName=prod.example.com \
--context subDomain=openhands \
--context region=<region> \
--context edgeStackSuffix=Prod \
--exclusively \
--require-approval never
```
**Important**: Use the `--exclusively` flag when deploying individual Edge stacks so that CDK does not also redeploy the backend stacks with a different domain context.
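When you manage several domains, it can help to generate the per-domain commands rather than edit them by hand. The sketch below is a hypothetical helper (`edgeDeployCommand` and its parameter names are illustrative) that reproduces the command shape shown above, including the `OpenHands-Edge-<suffix>` stack naming and the `--exclusively` flag.

```typescript
// Hypothetical helper: builds one `cdk deploy` command per domain for the Edge stacks.
interface EdgeTarget {
  suffix: string; // value for edgeStackSuffix, e.g. "Test"
  domainName: string; // e.g. "test.example.com"
}

interface CommonContext {
  vpcId: string;
  hostedZoneId: string;
  subDomain: string;
  region: string;
}

function edgeDeployCommand(target: EdgeTarget, common: CommonContext): string {
  return [
    "npx cdk deploy",
    `OpenHands-Edge-${target.suffix}`, // stack name follows the edgeStackSuffix
    `--context vpcId=${common.vpcId}`,
    `--context hostedZoneId=${common.hostedZoneId}`,
    `--context domainName=${target.domainName}`,
    `--context subDomain=${common.subDomain}`,
    `--context region=${common.region}`,
    `--context edgeStackSuffix=${target.suffix}`,
    "--exclusively", // deploy only this Edge stack, not its backend dependencies
    "--require-approval never",
  ].join(" ");
}
```

For example, mapping `edgeDeployCommand` over `[{ suffix: "Test", domainName: "test.example.com" }, { suffix: "Prod", domainName: "prod.example.com" }]` yields the two commands from Step 3.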
### Managing Domains
**Adding a new domain:**
1. Redeploy the Auth stack with the new domain added to `authCallbackDomains`
2. Deploy a new Edge stack with `--context edgeStackSuffix=<suffix> --exclusively`
**Removing a domain:**
1. `aws cloudformation delete-stack --stack-name OpenHands-Edge-<suffix> --region us-east-1`
2. Optionally, redeploy the Auth stack with the domain removed from `authCallbackDomains`
### Conversation Resume (Self-Healing)
When sandbox Fargate tasks stop (idle timeout, crash, or deployment), conversations become `ARCHIVED`. All data is preserved:
| Data | Storage | Survives Task Stop |
|------|---------|-------------------|
| Conversation metadata | Aurora PostgreSQL | ✅ |
| Conversation events/history | S3 | ✅ |
| Workspace files | EFS (per-conversation access point) | ✅ |
**Auto-Resume Flow:**
```
User clicks archived conversation
        ↓
Frontend detects ARCHIVED status
        ↓
Calls POST /api/v1/app-conversations/{id}/resume
        ↓
App → Orchestrator Lambda:
  - Creates new EFS access point for conversation
  - Registers new task definition with access point
  - Launches Fargate sandbox task
  - Updates DynamoDB registry
        ↓
Page reloads → conversation is usable again
```
Workspace files on EFS are preserved via the access point, so code and files from the previous session remain available after resume.
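The orchestrator steps in the resume flow can be sketched as a single async function. The `ResumeDeps` interface is a hypothetical stand-in for the EFS, ECS, and DynamoDB calls the Orchestrator Lambda makes; the sketch only captures the ordering of the steps, not the project's actual Lambda code.

```typescript
// Hypothetical interfaces standing in for the AWS calls made during resume.
interface ResumeDeps {
  createAccessPoint(conversationId: string): Promise<string>; // returns access point ID
  registerTaskDefinition(accessPointId: string): Promise<string>; // returns task definition ARN
  runTask(taskDefinitionArn: string): Promise<string>; // returns task ARN
  updateRegistry(conversationId: string, taskArn: string): Promise<void>;
}

// Runs the steps in the same order as the flow above; each step needs the previous result.
async function resumeConversation(conversationId: string, deps: ResumeDeps): Promise<string> {
  const accessPointId = await deps.createAccessPoint(conversationId);
  const taskDefinitionArn = await deps.registerTaskDefinition(accessPointId);
  const taskArn = await deps.runTask(taskDefinitionArn);
  await deps.updateRegistry(conversationId, taskArn);
  return taskArn;
}
```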
### Sandbox AWS Access
Enable AI agents in sandbox containers to access AWS services with scoped IAM credentials:
```bash
npx cdk deploy --all \
--context sandboxAwsAccess=true \
--context sandboxAwsPolicyFile=config/sandbox-aws-policy.json \
...
```
### Customize the Policy File
The default `config/sandbox-aws-policy.json` grants broad permissions. **Customize this for your use case!**
**Example: Purpose-built policy for S3 and DynamoDB only:**
```json
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowS3Access",
"Effect": "Allow",
"Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
"Resource": ["arn:aws:s3:::my-bucket", "arn:aws:s3:::my-bucket/*"]
},
{
"Sid": "AllowDynamoDB",
"Effect": "Allow",
"Action": ["dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:Query"],
"Resource": "arn:aws:dynamodb:*:*:table/my-table"
}
]
}
```
### Hardcoded Explicit Denies
These actions are **always denied** regardless of your policy:
| Category | Denied Actions |
|----------|----------------|
| IAM Users | `iam:CreateUser`, `iam:DeleteUser`, `iam:CreateAccessKey` |
| IAM Policies | `iam:AttachUserPolicy`, `iam:PutUserPolicy`, `iam:PutRolePolicy` |
| IAM Roles | `iam:CreateRole`, `iam:DeleteRole`, `iam:AttachRolePolicy` |
| Account | `organizations:*`, `account:*`, `billing:*` |
| Role Assumption | `sts:AssumeRole` (prevents lateral movement) |
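These denies hold regardless of the policy file because of how IAM evaluates policies: an explicit `Deny` always overrides an `Allow`, and any action that is not explicitly allowed is implicitly denied. The sketch below models only that rule, in deliberately simplified form (no resources, conditions, or permission boundaries; the `Statement` shape is trimmed down from real IAM JSON).

```typescript
type Effect = "Allow" | "Deny";

interface Statement {
  Effect: Effect;
  Action: string[];
}

// Converts an IAM-style action pattern ("s3:Get*", "organizations:*") to a RegExp.
// IAM action names are case-insensitive, hence the "i" flag.
function actionPattern(pattern: string): RegExp {
  const escaped = pattern
    .replace(/[.+^${}()|[\]\\]/g, "\\$&")
    .replace(/\*/g, ".*")
    .replace(/\?/g, ".");
  return new RegExp(`^${escaped}$`, "i");
}

function matches(statement: Statement, action: string): boolean {
  return statement.Action.some((p) => actionPattern(p).test(action));
}

// Simplified IAM logic: an explicit Deny always wins; otherwise an explicit Allow is required.
function isAllowed(statements: Statement[], action: string): boolean {
  if (statements.some((s) => s.Effect === "Deny" && matches(s, action))) return false;
  return statements.some((s) => s.Effect === "Allow" && matches(s, action));
}
```

Even a policy file that allows `"*"` cannot re-enable `iam:CreateUser` or `organizations:*` when a matching `Deny` statement is present.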
### Runtime Subdomain Routing
When AI agents run applications (e.g., Flask, Node.js) inside the sandbox, they are accessible via dedicated runtime subdomains:
```
https://{port}-{convId}.runtime.{subdomain}.{domain}/
```
**Example**: `https://5000-abc123def456.runtime.openhands.example.com/`
| Feature | Benefit |
|---------|---------|
| Domain Root | Apps run at `/` β internal routes work correctly |
| Cookie Isolation | Each runtime has isolated cookies |
| Security Headers | X-Frame-Options, CSP, X-XSS-Protection applied automatically |
| No Authentication | Runtime subdomains bypass Cognito (public within conversation) |
### Architecture
```
User Browser
     ↓
https://5000-{convId}.runtime.openhands.example.com/
     ↓
CloudFront (matches *.runtime.* wildcard certificate)
     ↓
Lambda@Edge (viewer-request: parse subdomain, rewrite URI)
     ↓
ALB → OpenResty → Sandbox Discovery (DynamoDB) → User App
```
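The Lambda@Edge step has to extract the port and conversation ID from the hostname. A hedged sketch of that parsing, assuming conversation IDs are lowercase alphanumeric (as in the `abc123def456` example); `parseRuntimeHost` is hypothetical and not the project's Lambda@Edge code.

```typescript
interface RuntimeTarget {
  port: number;
  conversationId: string;
}

// Hypothetical parser for hosts of the form {port}-{convId}.runtime.{subdomain}.{domain}.
// `baseDomain` is "{subdomain}.{domain}", e.g. "openhands.example.com".
function parseRuntimeHost(host: string, baseDomain: string): RuntimeTarget | null {
  const suffix = `.runtime.${baseDomain.toLowerCase()}`;
  const lower = host.toLowerCase(); // hostnames are case-insensitive
  if (!lower.endsWith(suffix)) return null;
  const label = lower.slice(0, -suffix.length);
  // Assumed label format: digits, a hyphen, then an alphanumeric conversation ID.
  const match = /^(\d{1,5})-([a-z0-9]+)$/.exec(label);
  if (!match) return null;
  const port = Number(match[1]);
  if (port < 1 || port > 65535) return null;
  return { port, conversationId: match[2] };
}
```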
### Data Persistence
| Data Type | Storage | Persistence |
|-----------|---------|-------------|
| Conversation Metadata | Aurora PostgreSQL | Permanent (via RDS Proxy) |
| Conversation Events | S3 | Permanent (survives task replacement) |
| User Settings / Secrets | S3 | Permanent (KMS envelope encryption) |
| Workspace Files | EFS | Persistent (per-conversation access points) |
| SDK Conversation Cache | EFS | Persistent (enables LLM context restoration) |
| Sandbox Registry | DynamoDB | Permanent (task state, user ownership) |
**Aurora Serverless v2**: PostgreSQL 15.8, RDS Proxy connection pooling, 0.5-4 ACU auto-scaling, 35-day backups.
**S3 Bucket**: SSE-S3 encryption, versioning (30-day retention), RETAIN removal policy.
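"KMS envelope encryption" means each secret is encrypted with its own freshly generated data key, and only that data key is encrypted under the KMS key. The sketch below illustrates the pattern locally with Node.js's `crypto` module (AES-256-GCM). The in-memory `masterKey` is a stand-in for the KMS key, an assumption made to keep the example self-contained; with KMS, wrapping and unwrapping the data key would be `GenerateDataKey` and `Decrypt` calls, and the master key would never leave KMS.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Sealed payload: data encrypted under a fresh data key, plus that data key
// encrypted ("wrapped") under the master key.
interface Envelope {
  wrappedKey: Buffer;
  ciphertext: Buffer;
}

// Output layout: 12-byte IV | 16-byte GCM auth tag | ciphertext
function aesGcmEncrypt(key: Buffer, plaintext: Buffer): Buffer {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const body = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), body]);
}

// Throws if the key is wrong or the data was tampered with (GCM authentication fails).
function aesGcmDecrypt(key: Buffer, blob: Buffer): Buffer {
  const decipher = createDecipheriv("aes-256-gcm", key, blob.subarray(0, 12));
  decipher.setAuthTag(blob.subarray(12, 28));
  return Buffer.concat([decipher.update(blob.subarray(28)), decipher.final()]);
}

function sealEnvelope(masterKey: Buffer, plaintext: string): Envelope {
  const dataKey = randomBytes(32); // new 256-bit data key per secret
  return {
    wrappedKey: aesGcmEncrypt(masterKey, dataKey),
    ciphertext: aesGcmEncrypt(dataKey, Buffer.from(plaintext, "utf8")),
  };
}

function openEnvelope(masterKey: Buffer, envelope: Envelope): string {
  const dataKey = aesGcmDecrypt(masterKey, envelope.wrappedKey);
  return aesGcmDecrypt(dataKey, envelope.ciphertext).toString("utf8");
}
```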
### Security
- Fargate tasks in private subnets only
- Per-conversation EFS isolation via access points
- All AWS service access via VPC Endpoints
- IAM Roles with least privilege per service
- Database credentials in Secrets Manager
- RDS Proxy with TLS-encrypted connections
- User secrets protected by KMS envelope encryption
- Cognito authentication (30-day sessions)
- Lambda@Edge header spoofing prevention
- WAF protection with rate limiting
- S3 and Aurora storage encryption
**Session Management:**
| Token Type | Validity | Description |
|------------|----------|-------------|
| Access Token | 1 hour | API access token |
| ID Token | 1 day | Identity token (stored in cookie) |
| Refresh Token | 30 days | Used to obtain new tokens |
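The 30-day session comes from the refresh token: expired access and ID tokens can be renewed with it until the refresh token itself expires. The sketch below computes which tokens are still valid at a given moment, using the lifetimes from the table; the helper names are hypothetical.

```typescript
// Token lifetimes from the table above, in milliseconds.
const HOUR = 60 * 60 * 1000;
const DAY = 24 * HOUR;
const lifetimes = { access: 1 * HOUR, id: 1 * DAY, refresh: 30 * DAY } as const;

type TokenType = keyof typeof lifetimes;

// Returns which tokens issued at `issuedAt` are still valid at `now`.
function validTokens(issuedAt: number, now: number): TokenType[] {
  return (Object.keys(lifetimes) as TokenType[]).filter((t) => now - issuedAt < lifetimes[t]);
}

// The session stays usable while the refresh token is valid.
function sessionActive(refreshIssuedAt: number, now: number): boolean {
  return now - refreshIssuedAt < lifetimes.refresh;
}
```

Two hours after login only the ID and refresh tokens are still valid, so the client would use the refresh token to obtain a new access token.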
## VPC Requirements
Your existing VPC must have:
- At least 2 private subnets in different AZs
- NAT Gateway for outbound internet access
- DNS hostnames enabled
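The subnet requirement can be checked before deploying. A minimal sketch, assuming you have already collected subnet details (for example from `aws ec2 describe-subnets`) into the hypothetical `SubnetInfo` shape; it checks only the two-AZ rule, not the NAT Gateway or DNS settings.

```typescript
// Hypothetical, pre-collected subnet details.
interface SubnetInfo {
  subnetId: string;
  availabilityZone: string;
  isPrivate: boolean;
}

// True if the private subnets span at least two distinct Availability Zones.
function meetsSubnetRequirement(subnets: SubnetInfo[]): boolean {
  const azs = new Set(subnets.filter((s) => s.isPrivate).map((s) => s.availabilityZone));
  return azs.size >= 2;
}
```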
## CI/CD
| Workflow | Trigger | Description |
|----------|---------|-------------|
| **CI** | Push/PR to main, develop | Build TypeScript, run all tests (Jest + pytest) |
| **Security Scan** | Push/PR to main, daily | npm audit, Checkov, git-secrets, Semgrep SAST, cfn-lint |
```bash
npm run test # Run all tests
npm run test:ts # TypeScript tests only
npm run test:py # Python tests only
npm run test:ts -- -u # Update snapshots
```
## Useful Commands
```bash
npm run build # Build TypeScript
npm run watch # Watch for changes
npx cdk diff --all # Show diff before deploy
npx cdk synth --all # Synthesize CloudFormation
npx cdk destroy --all # Destroy all stacks
```
## Troubleshooting
### Common Issues
**VPC Lookup Fails** β Ensure the VPC exists and your AWS credentials have `ec2:DescribeVpcs` permission.
**Certificate Validation Pending** β ACM certificates use DNS validation. Ensure the Hosted Zone is correctly configured.
**Fargate Task Not Starting** β Check CloudWatch Logs at `/openhands/application` for container startup errors. Check ECS service events for Fargate capacity issues.
## Contributing
Contributions are welcome! Please feel free to submit issues and pull requests.
### AI Agent Skills
This project uses [autonomous-dev-team](https://github.com/zxkane/autonomous-dev-team) skills for AI-assisted development with Claude Code, Kiro CLI, and Codex. Install after cloning:
```bash
npx skills add zxkane/autonomous-dev-team -s '*' -a claude-code -a kiro-cli -a codex -y
```
Or restore from the lock file:
```bash
npx skills experimental_install
```
These skills enforce TDD, git worktree isolation, PR workflows, and E2E testing. See `CLAUDE.md` for the full workflow.
## License
This project is licensed under the Apache License 2.0; see the [LICENSE](LICENSE) file for details.
This infrastructure project deploys [OpenHands](https://github.com/All-Hands-AI/OpenHands). See the [OpenHands License](https://github.com/All-Hands-AI/OpenHands/blob/main/LICENSE) for the main application.
---
**If this project helps you deploy OpenHands, consider giving it a star.**
Built with [AWS CDK](https://aws.amazon.com/cdk/) and [OpenHands](https://github.com/All-Hands-AI/OpenHands)