<div align="center">
  <img alt="AWS Runner Fleet logo" src="/assets/img.png" width="20%" height="20%">

  **AWS Runner Fleet**: ephemeral GitHub Actions runners on AWS
</div>


---

## Overview

**AWS Runner Fleet** provisions self-hosted GitHub Actions runners as **ephemeral ECS Fargate tasks**.
The control plane reacts to GitHub workflow events, launches a runner only when a job needs one, and stops it automatically once the job completes, providing **isolation, scalability, and cost efficiency**.


## Architecture
<div align="center">
    <img alt="arch" src="/assets/arch.png" width="60%" height="60%">
</div>

The system consists of two main components.

### 1. Control Plane

- A **Lambda function** behind **API Gateway** and **EventBridge** that:

  - Validates GitHub webhook signatures.
  - Requests short-lived runner registration tokens.
  - Starts ECS Fargate tasks for queued jobs.
  - Updates runner status in DynamoDB.
  - Optionally triggers CodeBuild to build runner images on the fly.

- A scheduled Lambda (the "Janitor") that periodically scans the runner table and:
  - Stops orphaned ECS tasks and marks their runners `OFFLINE`.
  - Marks runners stuck in `IMAGE_CREATING`, `STARTING`, or `WAITING_FOR_JOB` beyond a timeout as `FAILED` and cleans them up.

  Configure the Janitor with these Terraform variables:

  - `runner_ttl_seconds` (default: `7200`): global timeout, in seconds, after which any runner is cleaned up.
  - `janitor_schedule_expression` (default: `rate(5 minutes)`): EventBridge schedule expression for the Janitor.

### 2. ECS Fleet

An **ECS cluster** with:

* IAM roles for task execution and for the runner tasks.
* An ECR repository for runner images.
* CloudWatch logging.
* An optional CodeBuild project for dynamic image builds from `image:<base>` labels.

---

### Event Flow Diagram

```mermaid
sequenceDiagram
    participant GitHub
    participant API Gateway
    participant EventBridge
    participant Lambda (Control Plane)
    participant ECS Fargate
    participant DynamoDB
    participant CodeBuild

    GitHub->>API Gateway: workflow_job webhook
    API Gateway->>EventBridge: Forward event
    EventBridge->>Lambda (Control Plane): Trigger
    Lambda (Control Plane)->>GitHub: Request runner token
    alt Image label present
        Lambda (Control Plane)->>CodeBuild: Start image build
        CodeBuild-->>Lambda (Control Plane): Build complete
    end
    Lambda (Control Plane)->>ECS Fargate: Launch task with token
    ECS Fargate->>GitHub: Register as runner
    ECS Fargate-->>GitHub: Run job
    ECS Fargate->>EventBridge: Status update
    EventBridge->>Lambda (Control Plane): Trigger update
    Lambda (Control Plane)->>DynamoDB: Save status
```

### Janitor Cleanup Flow

```mermaid
sequenceDiagram
    participant EB as EventBridge
    participant J as Janitor Lambda
    participant D as DynamoDB
    participant E as ECS

    EB->>J: Scheduled trigger (rate/cron)
    J->>D: Scan runners table
    loop For each runner older than TTL
        alt Has task_id
            J->>E: stop_task(task_id)
            J->>D: Mark state (FAILED if active, else OFFLINE)
        else No task
            J->>D: Mark OFFLINE/FAILED accordingly
        end
    end
```

---

## Runner Lifecycle

- `STARTING`: runner record created in DynamoDB; awaiting image build or task launch.
- `IMAGE_CREATING`: CodeBuild is building a custom image for the requested `image:` label.
- `WAITING_FOR_JOB`: ECS task started and runner registered; waiting for a job to be assigned.
- `RUNNING`: the runner is executing a job.
- `OFFLINE`: the task has stopped and the runner is no longer available.
- `FAILED`: terminal error (image build failed, task launch failed, or TTL exceeded).

Each transition is persisted in DynamoDB in the `status` attribute, along with timestamps. The Janitor enforces the timeouts.

## Terraform Module

All infrastructure is defined in a single Terraform module composed of:

* `ecs-fleet`: ECS cluster, IAM roles, ECR repository, CloudWatch logs.
* `control-plane`: Lambda orchestration, EventBridge rules, API Gateway.
* `image-build-project` (optional): CodeBuild project for dynamic runner images.

### Core Variables

| Variable              | Description                                      |
| --------------------- | ------------------------------------------------ |
| `aws_region`          | AWS region for all resources                     |
| `github_pat`          | GitHub PAT used to register runners              |
| `github_repo`         | Repository (`owner/repo`) that owns the runners  |
| `webhook_secret`      | Secret for validating GitHub webhooks            |
| `subnet_ids`          | Subnets for the Fargate tasks                    |
| `security_groups`     | Security groups for the tasks                    |
| `runner_image_tag`    | Base tag for the runner Docker image             |
| `runner_class_sizes`  | Map of runner class sizes (`cpu`, `memory`)      |
| `event_bus_name`      | EventBridge bus name                             |
| `image_build_project` | (Optional) CodeBuild project for dynamic builds  |

**Outputs include:**

* Webhook URL for GitHub (`webhook_url`).
* DynamoDB table name.
* ECS, ECR, and IAM resource ARNs.

---

## Dynamic Images

If `image_build_project` is set, a job can request a custom base image:

```yaml
runs-on: [self-hosted, image:ubuntu:22.04]
```

The control plane will:

1. Trigger CodeBuild to build a runner image from [`runner/Dockerfile`](runner/Dockerfile) using the specified base image.
2. Push the image to ECR.
3. Launch a runner task with the built image.

Subsequent jobs that request the same base image reuse the existing image instead of rebuilding it.

### Labels

- Required base label: `self-hosted`
- Image selection: `image:<base>`, e.g. `image:ubuntu:22.04` or `image:ghcr.io/org/image:tag`
- Class sizing: `class:<name>`, e.g. `class:medium` (selects that class's CPU/memory overrides)

Examples:

```yaml
runs-on: [self-hosted, image:ubuntu:22.04, class:medium]
runs-on: [self-hosted, image:public.ecr.aws/docker/library/node:20, class:large]
```

### Class Sizes (SSM format)

CPU/memory overrides are stored as a JSON string in SSM Parameter Store (the module wires up the parameter ARN):

```json
{
  "small":  { "cpu": 512,  "memory": 1024 },
  "medium": { "cpu": 1024, "memory": 2048 },
  "large":  { "cpu": 2048, "memory": 4096 }
}
```

CPU is expressed in ECS CPU units (1024 = 1 vCPU) and memory in MiB.

---

## CLI Tool

The included CLI (`ecsrunner_cli.py`) lets you inspect and terminate runners and view class sizes.

First, set these environment variables:

```bash
export RUNNER_TABLE=<dynamodb_table_name>
export CLASS_SIZES_PARAM=<ssm_param_name>
```

### Examples

```bash
# List all runners
python ecsrunner_cli.py runners list

# Show runner details
python ecsrunner_cli.py runners details <runner_id>

# Terminate a runner by ID
python ecsrunner_cli.py runners terminate <runner_id>

# Show class sizes from SSM
python ecsrunner_cli.py list-class-sizes
```

---

## Getting Started

### 1. Install prerequisites

* [Terraform](https://developer.hashicorp.com/terraform/downloads)
* AWS CLI with credentials configured.
* Python 3.9+.

### 2. Define the Terraform module

Create `ecs-fleet.tf` with:

```hcl
module "ecs-fleet" {
  source = "./../.." # path to this module

  # Placeholders: don't commit real secrets; pass them in via sensitive variables instead.
  github_pat     = "ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
  webhook_secret = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
  github_repo    = "your-github-org/your-repo"

  # Assumes a VPC module and a security group defined elsewhere in your configuration.
  subnet_ids      = module.vpc.public_subnets
  security_groups = [aws_security_group.ecs_tasks_sg.id]

  image_build_project = "image_builder"

  runner_class_sizes = {
    small = {
      cpu    = 512
      memory = 1024
    }
    medium = {
      cpu    = 1024
      memory = 2048
    }
    large = {
      cpu    = 2048
      memory = 4096
    }
  }
}
```

### 3. Install Lambda dependencies

Install the control plane's Python dependencies into its source directory before deploying:

```bash
pip install -r lambda/control_plane/requirements.txt -t lambda/control_plane
```

### 4. Deploy infrastructure

```bash
terraform init
terraform apply
```

### 5. Configure the GitHub webhook

In the repository's webhook settings:

* Payload URL: the `webhook_url` output from Terraform (the endpoint accepts `POST /webhook`).
* Secret: the same value as `webhook_secret`, so that signatures can be validated.
* Events: `workflow_job` (listed as **Workflow jobs** in the GitHub UI).

---

## Example Workflow

```yaml
on: push

jobs:
  build:
    runs-on: [self-hosted, class:medium, image:ubuntu:22.04]
    steps:
      - uses: actions/checkout@v4
      - name: Install dependencies
        run: sudo apt-get update && sudo apt-get install -y make
      - name: Run tests
        run: make test
```
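## Sketch: Webhook Signature Validation

The control plane validates GitHub webhook signatures before acting on an event. GitHub signs each delivery with HMAC-SHA256 over the raw request body, keyed by the webhook secret. It sends the result in the `X-Hub-Signature-256` header as `sha256=<hexdigest>`. Below is a minimal sketch of that check, not the project's actual handler; `is_valid_signature` is a hypothetical name.

```python
import hashlib
import hmac
from typing import Optional


def is_valid_signature(secret: str, raw_body: bytes, signature_header: Optional[str]) -> bool:
    """Return True if X-Hub-Signature-256 matches an HMAC-SHA256 of the raw body."""
    prefix = "sha256="
    if not signature_header or not signature_header.startswith(prefix):
        return False
    # GitHub computes the digest over the exact payload bytes, keyed by the webhook secret.
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    received = signature_header[len(prefix):]
    # compare_digest runs in constant time, so timing does not reveal how much matched.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))
```

Verify against the body bytes exactly as received. With an API Gateway proxy integration the body may arrive base64-encoded (`isBase64Encoded`), and header names may be lowercased, so normalize both before calling the function.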
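## Sketch: Routing `workflow_job` Events

The control plane starts a Fargate task only for queued jobs that target this fleet. The sketch below shows that routing decision. It assumes the function receives the raw `workflow_job` webhook payload; in the deployed flow the payload arrives through EventBridge, so it may be nested inside the event's `detail`. The returned action names are hypothetical.

```python
from typing import Any

# Hypothetical action names used only in this sketch.
LAUNCH, MARK_RUNNING, MARK_DONE, IGNORE = "launch", "mark_running", "mark_done", "ignore"


def route_workflow_job(payload: dict[str, Any]) -> str:
    """Decide what the control plane should do with one workflow_job payload."""
    job = payload.get("workflow_job") or {}
    labels = job.get("labels") or []
    # Jobs without the self-hosted label are not meant for this fleet.
    if "self-hosted" not in labels:
        return IGNORE
    action = payload.get("action")
    if action == "queued":
        return LAUNCH        # start a Fargate task for this job
    if action == "in_progress":
        return MARK_RUNNING  # a runner picked up the job
    if action == "completed":
        return MARK_DONE     # job finished; the runner task is expected to stop
    return IGNORE            # e.g. "waiting" on an environment protection rule
```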
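## Sketch: Requesting a Runner Registration Token

Before launching a task, the control plane uses `github_pat` to request a short-lived registration token. For a repository-level runner, GitHub's REST endpoint is `POST /repos/{owner}/{repo}/actions/runners/registration-token`, and the JSON response includes `token` and `expires_at`. This stdlib-only sketch builds and sends that request. The function names are hypothetical, and the project may use a different HTTP client.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def build_registration_token_request(repo: str, pat: str) -> urllib.request.Request:
    """Build (but do not send) the request for a repository runner registration token."""
    owner, name = repo.split("/", 1)  # github_repo is "owner/repo"
    url = f"{GITHUB_API}/repos/{owner}/{name}/actions/runners/registration-token"
    return urllib.request.Request(
        url,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {pat}",
            "X-GitHub-Api-Version": "2022-11-28",
        },
    )


def fetch_registration_token(repo: str, pat: str, timeout: float = 10.0) -> str:
    """Send the request and return the short-lived token from the JSON response."""
    request = build_registration_token_request(repo, pat)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["token"]
```

Building the request separately from `urlopen` makes it easy to test without network access.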
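## Sketch: Janitor Cleanup Decision

The Janitor's per-runner decision can be written as a pure function. The sketch follows the cleanup diagram. A runner older than `runner_ttl_seconds` has its task stopped. It becomes `FAILED` if it was stuck in one of the states listed in the Control Plane section, and `OFFLINE` otherwise. The sketch assumes each DynamoDB item exposes a state, a creation time in epoch seconds, and an optional task ARN; these field names are hypothetical. Treating `RUNNING` as "not stuck" is also an assumption.

```python
import time
from dataclasses import dataclass
from typing import Optional

# States the README says the Janitor fails once they exceed the timeout.
STUCK_STATES = {"STARTING", "IMAGE_CREATING", "WAITING_FOR_JOB"}
TERMINAL_STATES = {"OFFLINE", "FAILED"}


@dataclass
class RunnerRecord:
    runner_id: str
    state: str
    created_at: float               # Unix epoch seconds (hypothetical attribute)
    task_arn: Optional[str] = None  # ECS task, if one was launched


@dataclass
class CleanupPlan:
    task_to_stop: Optional[str]     # pass to ecs.stop_task(...) when not None
    new_state: str                  # state to write back to DynamoDB


def plan_cleanup(runner: RunnerRecord, ttl_seconds: int = 7200,
                 now: Optional[float] = None) -> Optional[CleanupPlan]:
    """Return a cleanup plan for a runner older than the TTL, or None to leave it alone."""
    now = time.time() if now is None else now
    if runner.state in TERMINAL_STATES:
        return None                 # already cleaned up
    if now - runner.created_at <= ttl_seconds:
        return None                 # still within its allowed lifetime
    new_state = "FAILED" if runner.state in STUCK_STATES else "OFFLINE"
    return CleanupPlan(task_to_stop=runner.task_arn, new_state=new_state)
```

A caller would:

1. Scan the table and apply `plan_cleanup` to each item.
2. When a task is present, call boto3's `ecs.stop_task(cluster=..., task=plan.task_to_stop, reason=...)`.
3. Write `plan.new_state` back to DynamoDB.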
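## Sketch: Mapping ECS Task State Changes

In the event flow, ECS reports task status changes to EventBridge, which triggers the control plane again. ECS emits these as events with `source` `aws.ecs` and `detail-type` `ECS Task State Change`; the event's `detail` carries `taskArn` and `lastStatus`. This minimal sketch maps a stopped task to the `OFFLINE` runner state. Handling only `STOPPED` is a simplifying assumption, and finding the runner by task ARN is left to the caller.

```python
from typing import Any, Optional


def runner_update_from_ecs_event(event: dict[str, Any]) -> Optional[tuple[str, str]]:
    """Map an EventBridge "ECS Task State Change" event to (task_arn, runner_status)."""
    if event.get("source") != "aws.ecs" or event.get("detail-type") != "ECS Task State Change":
        return None
    detail = event.get("detail") or {}
    task_arn = detail.get("taskArn")
    if not task_arn:
        return None
    # In this sketch only a stopped task changes runner state; whether a job was
    # actually picked up is left to the workflow_job webhooks.
    if detail.get("lastStatus") == "STOPPED":
        return task_arn, "OFFLINE"
    return None
```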
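## Sketch: Persisting a Status Transition

Runner state transitions are persisted in DynamoDB under `status` along with timestamps. `status` is a DynamoDB reserved word, so an update expression must alias it through `ExpressionAttributeNames`. This sketch builds keyword arguments for the low-level `update_item` call. The `runner_id` key and `updated_at` attribute are hypothetical. The condition that refuses to overwrite `OFFLINE`/`FAILED` is an illustrative design choice, not documented behavior.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def build_status_update(table_name: str, runner_id: str, new_status: str,
                        now: Optional[datetime] = None) -> dict[str, Any]:
    """Build kwargs for a low-level DynamoDB update_item call that records a transition."""
    now = now or datetime.now(timezone.utc)
    return {
        "TableName": table_name,
        "Key": {"runner_id": {"S": runner_id}},  # hypothetical partition key
        # STATUS is a DynamoDB reserved word, so the attribute is aliased as #status.
        "UpdateExpression": "SET #status = :status, updated_at = :ts",
        # Never overwrite a terminal state with a late, out-of-order event.
        "ConditionExpression": "attribute_not_exists(#status) OR NOT (#status IN (:offline, :failed))",
        "ExpressionAttributeNames": {"#status": "status"},
        "ExpressionAttributeValues": {
            ":status": {"S": new_status},
            ":ts": {"S": now.isoformat()},
            ":offline": {"S": "OFFLINE"},
            ":failed": {"S": "FAILED"},
        },
    }
```

Pass the dict to a boto3 DynamoDB client as `client.update_item(**kwargs)`. If the condition fails, boto3 raises `ConditionalCheckFailedException`, which a handler can treat as "already terminal" and ignore.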
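## Sketch: Parsing `runs-on` Labels

Labels are read from the job's `runs-on` list. Image references contain colons of their own (`ubuntu:22.04`, `ghcr.io/org/image:tag`), so a parser should split each label on its first colon only. Below is a sketch with hypothetical names. Letting the last matching label win when one is repeated is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class RunnerRequest:
    image: Optional[str]       # base image from an `image:<base>` label
    size_class: Optional[str]  # class name from a `class:<name>` label


def parse_labels(labels: Iterable[str]) -> RunnerRequest:
    """Extract the image and class requested by a job's runs-on labels."""
    labels = list(labels)
    if "self-hosted" not in labels:
        raise ValueError("job does not target self-hosted runners")
    image: Optional[str] = None
    size_class: Optional[str] = None
    for label in labels:
        # Split on the first colon only: "image:ubuntu:22.04" -> ("image", "ubuntu:22.04").
        prefix, sep, value = label.partition(":")
        if not sep or not value:
            continue
        if prefix == "image":
            image = value
        elif prefix == "class":
            size_class = value
    return RunnerRequest(image=image, size_class=size_class)
```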
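## Sketch: Class Sizes and the `RunTask` Request

Once the class is known, its CPU and memory come from the SSM JSON and feed into the ECS `RunTask` call. `RunTask` takes task-level `cpu` and `memory` overrides as strings, in CPU units and MiB. In practice the JSON would be read with boto3's `ssm.get_parameter(Name=...)["Parameter"]["Value"]`. This sketch builds the kwargs for boto3's `ecs.run_task`. The `small` fallback class, the container name, and the `RUNNER_TOKEN` environment variable are hypothetical. `assignPublicIp` is set to `ENABLED` on the assumption that tasks run in public subnets, as in the Getting Started example.

```python
import json
from typing import Any, Optional


def resolve_class_size(class_sizes_json: str, size_class: Optional[str],
                       default_class: str = "small") -> dict[str, str]:
    """Look up cpu/memory for a class in the SSM JSON; ECS expects them as strings."""
    sizes = json.loads(class_sizes_json)
    name = size_class or default_class  # hypothetical fallback when no class: label is given
    if name not in sizes:
        raise KeyError(f"unknown runner class: {name!r}")
    return {"cpu": str(sizes[name]["cpu"]), "memory": str(sizes[name]["memory"])}


def build_run_task_kwargs(cluster: str, task_definition: str, subnets: list[str],
                          security_groups: list[str], container_name: str,
                          registration_token: str, size: dict[str, str]) -> dict[str, Any]:
    """Build kwargs for boto3's ecs.run_task(...) to start one Fargate runner."""
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "launchType": "FARGATE",
        "count": 1,
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": subnets,
                "securityGroups": security_groups,
                # Needed in public subnets without NAT to reach ECR and GitHub.
                "assignPublicIp": "ENABLED",
            }
        },
        "overrides": {
            # Task-level overrides; the pair must be a valid Fargate CPU/memory combination.
            "cpu": size["cpu"],
            "memory": size["memory"],
            "containerOverrides": [{
                "name": container_name,
                # RUNNER_TOKEN is a hypothetical variable read by the runner entrypoint.
                "environment": [{"name": "RUNNER_TOKEN", "value": registration_token}],
            }],
        },
    }
```

`containerOverrides` cannot change a container's image. A custom image built by CodeBuild therefore presumably needs its own task definition revision rather than an override.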
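## Sketch: Deterministic Tags for Reusing Built Images

Reusing a built image requires a stable mapping from the requested base image to an image tag. Docker tags allow at most 128 characters from letters, digits, `_`, `.`, and `-`, and may not start with `.` or `-`. The raw base image name (which contains `/` and `:`) therefore can't be used directly. One possible scheme is sketched below: a readable slug plus a short SHA-256 digest. The project's actual tagging scheme is not documented here, and `image_tag_for_base` is a hypothetical name.

```python
import hashlib
import re


def image_tag_for_base(base_image: str) -> str:
    """Derive a deterministic, tag-safe name for a runner image built from base_image."""
    # Readable part: lowercase, replace runs of disallowed characters with '-',
    # trim leading/trailing '-' or '.', and cap the length.
    slug = re.sub(r"[^a-z0-9_.-]+", "-", base_image.lower()).strip("-.")[:100]
    # A short digest of the exact input keeps tags distinct even if two bases slugify alike.
    digest = hashlib.sha256(base_image.encode("utf-8")).hexdigest()[:12]
    return f"{slug}-{digest}" if slug else digest
```

To check whether an image already exists, one option is boto3's `ecr.describe_images(repositoryName=..., imageIds=[{"imageTag": tag}])`, which raises `ImageNotFoundException` when the tag is missing.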