{"id":48654705,"url":"https://github.com/nashspence/gpu-service-manager","last_synced_at":"2026-04-10T09:01:02.520Z","repository":{"id":350401229,"uuid":"1206633304","full_name":"nashspence/gpu-service-manager","owner":"nashspence","description":"FastAPI service that serializes access to a single GPU host by leasing one Docker Compose stack at a time, with readiness checks, persistent state, and queued handoff.","archived":false,"fork":false,"pushed_at":"2026-04-10T07:22:58.000Z","size":35,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-10T08:25:15.088Z","etag":null,"topics":["docker","docker-compose","fastapi","gpu","gpu-scheduling","homelab","nvidia-gpu","python","queueing","resource-manager","self-hosted","service-orchestration"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nashspence.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-10T05:25:25.000Z","updated_at":"2026-04-10T07:24:57.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/nashspence/gpu-service-manager","commit_stats":null,"previous_names":["nashspence/nvidia-rtx-pro-4000-blackwell-manager"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/nashspence/gpu-service-manager","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nashspence%
2Fgpu-service-manager","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nashspence%2Fgpu-service-manager/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nashspence%2Fgpu-service-manager/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nashspence%2Fgpu-service-manager/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nashspence","download_url":"https://codeload.github.com/nashspence/gpu-service-manager/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nashspence%2Fgpu-service-manager/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31635969,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-10T07:40:12.752Z","status":"ssl_error","status_checked_at":"2026-04-10T07:40:11.664Z","response_time":98,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["docker","docker-compose","fastapi","gpu","gpu-scheduling","homelab","nvidia-gpu","python","queueing","resource-manager","self-hosted","service-orchestration"],"created_at":"2026-04-10T09:00:52.057Z","updated_at":"2026-04-10T09:01:02.513Z","avatar_url":"https://github.com/nashspence.png","language":"Python","readme":"# GPU Service Manager\n\n`gpu-service-manager` is a small FastAPI service that keeps a single GPU host predictable by allowing 
only one Docker Compose service stack to hold the GPU lease at a time.

The practical goal is simple: if you have one NVIDIA RTX Pro 4000 Blackwell and several heavy stacks that can each push VRAM usage hard, this gives you a clean way to run one known stack at a time and avoid accidental overlap and OOM churn.

## What It Does

- Discovers candidate service stacks under `services/<target>/`
- Starts exactly one target stack at a time with `docker compose up -d`
- Waits for one designated container healthcheck before declaring the stack ready
- Persists lease and queue state on disk
- Serializes access so callers cannot accidentally bring up multiple GPU-heavy stacks at once
- Supports queued handoff when the GPU is busy

## How It Works

Each managed target is a Docker Compose project. A client calls `POST /acquire` to request a target. If the GPU is idle, that target is started and a lease is issued. If the GPU is already leased, the caller can optionally join a priority queue.

When the active lease is released, the queue head gets a short claim window. Only that queued token can claim the GPU during that window. Fresh callers cannot skip the queue.

The manager enforces a single active stack. If a new target is acquired, any other managed targets are brought down before the new target is started.

## Service Contract

Each target must live in its own directory under `services/` and include one supported Compose filename:

- `docker-compose.yml`
- `docker-compose.yaml`
- `compose.yml`
- `compose.yaml`

Exactly one service in that Compose project must be marked as the readiness master:

```yaml
services:
  api:
    labels:
      gpu.healthcheck-master: "true"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://127.0.0.1:8080/healthz"]
      interval: 5s
      timeout: 3s
      retries: 20
```

That labeled container is the one inspected for Docker health.
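How that single readiness master might be chosen can be sketched in Python. This is a hypothetical model of the documented contract, not the manager's actual code; it assumes the Compose file has been parsed into a dict and that `labels` arrive as a mapping:

```python
# Hypothetical model of the documented readiness-master rule;
# not the manager's actual implementation.
LABEL_KEY = "gpu.healthcheck-master"   # HEALTHCHECK_MASTER_LABEL default
LABEL_VALUE = "true"                   # HEALTHCHECK_MASTER_VALUE default

def readiness_master(compose: dict) -> str:
    """Return the one service name labeled as readiness master.

    Raises ValueError for the documented failure modes: no labeled
    service, more than one, or a labeled service without a healthcheck.
    """
    services = compose.get("services", {})
    masters = [
        name
        for name, svc in services.items()
        # Simplified comparison; assumes labels are a str-to-str mapping.
        if str((svc.get("labels") or {}).get(LABEL_KEY)) == LABEL_VALUE
    ]
    if len(masters) != 1:
        raise ValueError("exactly one service must carry the label")
    name = masters[0]
    if "healthcheck" not in services[name]:
        raise ValueError(f"service {name!r} has no healthcheck")
    return name
```

For the `api` example above, this sketch would select `api`.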
Acquire fails if:

- no service has the label
- more than one service has the label
- the labeled service has no Docker `healthcheck`
- the labeled service becomes unhealthy or exits

## Repository Layout

```text
.
├── app.py
├── docker-compose.yml
├── requirements.txt
├── requirements-dev.txt
└── services/
    └── <target>/
        └── docker-compose.yml
```

The repository also includes `services/dummy-*` targets used for local integration and stress testing.

## Configuration

The manager uses these environment variables:

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| `GPU_HOST_SERVICES_DIR` | yes | none | Host path containing target Compose projects and an optional `.env` |
| `GPU_HOST_RUNTIME_DIR` | yes | none | Host path used for persisted lease and queue state |
| `GPU_SERVICES_DIR` | no | `/services` | Services path inside the manager container |
| `GPU_RUNTIME_DIR` | no | `/runtime` | Runtime state path inside the manager container |
| `GPU_ENV_FILE` | no | `/services/.env` | Optional env file passed to every `docker compose` invocation |
| `DEFAULT_WAIT_S` | no | `900` | Default readiness wait timeout for `acquire` |
| `DEFAULT_LEASE_TTL_S` | no | `1800` | Default lease lifetime |
| `QUEUE_CLAIM_WINDOW_S` | no | `10` | How long the queue head has to claim the GPU after release |
| `DOCKER_SOCK` | no | `/var/run/docker.sock` | Docker socket path |
| `HEALTHCHECK_MASTER_LABEL` | no | `gpu.healthcheck-master` | Label key used to choose the readiness master |
| `HEALTHCHECK_MASTER_VALUE` | no | `true` | Label value used to choose the readiness master |

If `${GPU_ENV_FILE}` exists, it is passed to every `docker compose` invocation with `--env-file`.

## Running It

The latest published container image is:

```text
ghcr.io/nashspence/gpu-service-manager:latest
```

For a normal deployment, use a minimal `docker-compose.yml` and `.env` like
this:

`docker-compose.yml`

```yaml
services:
  gpu-service-manager:
    image: ghcr.io/nashspence/gpu-service-manager:latest
    container_name: gpu-service-manager
    restart: unless-stopped
    ports:
      - "8080:8080"
    environment:
      GPU_HOST_SERVICES_DIR: ${GPU_HOST_SERVICES_DIR}
      GPU_HOST_RUNTIME_DIR: ${GPU_HOST_RUNTIME_DIR}
    volumes:
      - ${GPU_HOST_SERVICES_DIR}:/services
      - ${GPU_HOST_RUNTIME_DIR}:/runtime
      - /var/run/docker.sock:/var/run/docker.sock
```

`.env`

```dotenv
GPU_HOST_SERVICES_DIR=/opt/gpu-service-manager/services
GPU_HOST_RUNTIME_DIR=/opt/gpu-service-manager/runtime
```

Then start the manager:

```bash
docker compose up -d
```

This configuration runs the manager on port `8080` and mounts:

- `${GPU_HOST_SERVICES_DIR}` at `/services`
- `${GPU_HOST_RUNTIME_DIR}` at `/runtime`
- `/var/run/docker.sock`

For local development from this repository, you can still build and run the included top-level Compose file:

```bash
export GPU_HOST_SERVICES_DIR="$PWD/services"
export GPU_HOST_RUNTIME_DIR="$PWD/runtime"
docker compose up -d --build
```

## API

### `GET /healthz`

Simple manager liveness check.

### `GET /status`

Returns:

- current public lease state
- current queue state
- current running service status, if any
- discovered services

### `POST /acquire`

Acquire a target or refresh an existing lease.

Example:

```bash
curl -X POST http://localhost:8080/acquire \
  -H 'content-type: application/json' \
  -d '{"target":"my-stack","owner":"me"}'
```

Refresh an active lease:

```bash
curl -X POST http://localhost:8080/acquire \
  -H 'content-type: application/json' \
  -d '{"target":"my-stack","lease_token":"<active-token>"}'
```

Join the queue when the GPU is busy:

```bash
curl -X POST http://localhost:8080/acquire \
  -H 'content-type: application/json' \
  -d '{"target":"my-stack","owner":"batch-job","priority":100}'
```

Request body:

| Field | Required | Description |
| --- | --- | --- |
| `target` | yes | Service target directory name under `services/` |
| `owner` | no | Human-readable owner string |
| `lease_token` | no | Existing active lease token or queued token |
| `lease_ttl_s` | no | Lease TTL override |
| `wait_s` | no | Readiness timeout override |
| `wait_ready` | no | Wait for readiness before returning; defaults to `true` |
| `priority` | no | Queue priority when the GPU is busy |

Behavior:

- If idle, the target is started and a lease is granted.
- If the same active token is presented again for the same target, the lease is refreshed.
- If busy and `priority` is set, the caller is added to the queue.
- If busy and no queue priority is supplied, the call is rejected with `409`.
- If a queued token reaches the front after release, that token can claim the GPU during the claim window.

### `POST /release`

Release the active lease.

```bash
curl -X POST http://localhost:8080/release \
  -H 'content-type: application/json' \
  -d '{"lease_token":"<token>"}'
```

Force release without a token:

```bash
curl -X POST http://localhost:8080/release \
  -H 'content-type: application/json' \
  -d '{"force":true}'
```

Notes:

- Releasing removes the lease.
- Releasing does not immediately stop the current target stack.
- Acquiring a different target brings the other managed targets down before starting the new one.

## Queue Semantics

- Higher `priority` wins.
- Equal priority is FIFO.
- The queue head only gets a claim deadline after the active lease is released.
- While a queue head is waiting to claim, other callers cannot jump ahead.
- Queued ownership is preserved when that queued token later claims the GPU.

## State Files

Runtime state is stored under:

- `runtime/state.json`
- `runtime/lease.lock`

This lets
the manager survive restarts without losing lease and queue state.

## Development

Install dependencies:

```bash
python3 -m pip install -r requirements-dev.txt
```

Run the full test suite:

```bash
python3 -m pytest --cov=app --cov-report=term-missing
```

Run the live HTTP stress test only:

```bash
python3 -m pytest tests/test_gpu_service_manager_stress.py -q
```

## Test Coverage

The test suite exercises:

- service discovery across all supported Compose filenames
- happy-path acquire, status, refresh, and release
- readiness-master validation failures
- queue fairness and claim-window behavior
- live multi-worker stress with concurrent enqueue, status polling, bad-token retries, and claim races

## Limitations

- This manager coordinates only the Compose projects it knows about under `services/`.
- It does not stop unrelated containers running outside that set.
- It assumes Docker healthchecks are a reliable signal for stack readiness.
- It is designed for one managed GPU host, not for distributed scheduling across multiple machines.
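As a closing sketch, the acquire/release flow above can be driven from Python with only the standard library. The paths and request fields come from the API section; the base URL, the target name, and the response field name in the usage comment are assumptions:

```python
import json
import urllib.request

# Assumed manager address; adjust to your deployment.
BASE_URL = "http://localhost:8080"

def build_acquire_body(target, owner=None, priority=None, lease_token=None):
    """Assemble a /acquire request body, omitting unset optional fields."""
    body = {"target": target}
    if owner is not None:
        body["owner"] = owner
    if priority is not None:
        body["priority"] = priority
    if lease_token is not None:
        body["lease_token"] = lease_token
    return body

def post(path, body):
    """POST a JSON body to the manager and return the decoded response."""
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode(),
        headers={"content-type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example (requires a running manager; response field name is assumed):
# lease = post("/acquire", build_acquire_body("my-stack", owner="me", priority=100))
# ... do GPU work while the lease is held ...
# post("/release", {"lease_token": lease["lease_token"]})
```

Setting `priority` means the call queues behind an active lease instead of being rejected with `409`.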