https://github.com/nashspence/gpu-service-manager

FastAPI service that serializes access to a single GPU host by leasing one Docker Compose stack at a time, with readiness checks, persistent state, and queued handoff.
docker docker-compose fastapi gpu gpu-scheduling homelab nvidia-gpu python queueing resource-manager self-hosted service-orchestration

Host: GitHub
URL: https://github.com/nashspence/gpu-service-manager
Owner: nashspence
Created: 2026-04-10T05:25:25.000Z (19 days ago)
Default Branch: main
Last Pushed: 2026-04-10T07:22:58.000Z (19 days ago)
Last Synced: 2026-04-10T08:25:15.088Z (19 days ago)
Topics: docker, docker-compose, fastapi, gpu, gpu-scheduling, homelab, nvidia-gpu, python, queueing, resource-manager, self-hosted, service-orchestration
Language: Python
Homepage:
Size: 34.2 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# GPU Service Manager

`gpu-service-manager` is a small FastAPI service that keeps a single GPU host predictable by allowing only one Docker Compose service stack to hold the GPU lease at a time.

The practical goal is simple: if you have one NVIDIA RTX PRO 4000 Blackwell and several heavy stacks that can each push VRAM usage hard, this gives you a clean way to run one known stack at a time and avoid accidental overlap and OOM churn.

## What It Does

- Discovers candidate service stacks, one per directory under `services/`
- Starts exactly one target stack at a time with `docker compose up -d`
- Waits for the Docker healthcheck of one designated container to pass before declaring the stack ready
- Persists lease and queue state on disk
- Serializes access so callers cannot accidentally bring up multiple GPU-heavy stacks at once
- Supports queued handoff when the GPU is busy

## How It Works

Each managed target is a Docker Compose project. A client calls `POST /acquire` to request a target. If the GPU is idle, that target is started and a lease is issued. If the GPU is already leased, the caller can optionally join a priority queue.

When the active lease is released, the queue head gets a short claim window. Only that queued token can claim the GPU during that window. Fresh callers cannot skip the queue.

The manager enforces a single active stack. If a new target is acquired, any other managed targets are brought down before the new target is started.
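
The single-active-stack rule can be sketched as a command plan: bring every other managed target down, then start the requested one. This is a minimal illustration, not the manager's actual code. The function name `plan_switch` and the `compose_files` mapping are hypothetical, and the exact `docker compose` arguments the manager passes are an assumption.

```python
from pathlib import Path
from typing import Optional


def plan_switch(
    target: str,
    compose_files: dict[str, Path],
    env_file: Optional[Path] = None,
) -> list[list[str]]:
    """Return the docker compose commands for switching to `target`.

    `compose_files` maps each managed target name to its Compose file
    (hypothetical structure). Other targets are stopped before the
    requested target is started.
    """
    if target not in compose_files:
        raise KeyError(f"unknown target: {target}")

    def base(path: Path) -> list[str]:
        cmd = ["docker", "compose"]
        if env_file is not None:
            cmd += ["--env-file", str(env_file)]
        return cmd + ["-f", str(path)]

    # Bring down every managed target except the one being acquired.
    commands = [
        base(path) + ["down"]
        for name, path in sorted(compose_files.items())
        if name != target
    ]
    # Start the requested target last, detached.
    commands.append(base(compose_files[target]) + ["up", "-d"])
    return commands
```

Each returned list could be handed to `subprocess.run(cmd, check=True)` in order.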

## Service Contract

Each target must live in its own directory under `services/` and include one supported Compose filename:

- `docker-compose.yml`
- `docker-compose.yaml`
- `compose.yml`
- `compose.yaml`
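
Discovery over those filenames might look like the sketch below. The helper names are hypothetical, and the precedence when a directory holds more than one supported file is an assumption (here, the order of the list above).

```python
from pathlib import Path
from typing import Optional

# Supported filenames, checked in this order (precedence is an assumption).
COMPOSE_FILENAMES = (
    "docker-compose.yml",
    "docker-compose.yaml",
    "compose.yml",
    "compose.yaml",
)


def find_compose_file(target_dir: Path) -> Optional[Path]:
    """Return the first supported Compose file in `target_dir`, if any."""
    for name in COMPOSE_FILENAMES:
        candidate = target_dir / name
        if candidate.is_file():
            return candidate
    return None


def discover_targets(services_dir: Path) -> dict[str, Path]:
    """Map each target directory name to its Compose file."""
    targets: dict[str, Path] = {}
    for entry in sorted(services_dir.iterdir()):
        if not entry.is_dir():
            continue  # skips plain files such as an optional .env
        compose_file = find_compose_file(entry)
        if compose_file is not None:
            targets[entry.name] = compose_file
    return targets
```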

Exactly one service in that Compose project must be marked as the readiness master:

```yaml
services:
  api:
    labels:
      gpu.healthcheck-master: "true"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://127.0.0.1:8080/healthz"]
      interval: 5s
      timeout: 3s
      retries: 20
```

That labeled container is the one inspected for Docker health. Acquire fails if:

- no service has the label
- more than one service has the label
- the labeled service has no Docker `healthcheck`
- the labeled service becomes unhealthy or exits
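
The first three failure conditions are static and could be checked before any container starts. The sketch below assumes the Compose file has already been parsed into a plain mapping (for example by a YAML library); `find_readiness_master` and `ReadinessMasterError` are hypothetical names, not the manager's API. The runtime condition (unhealthy or exited container) is not covered here.

```python
from typing import Any, Mapping

MASTER_LABEL = "gpu.healthcheck-master"
MASTER_VALUE = "true"


class ReadinessMasterError(ValueError):
    """Raised when a Compose project breaks the readiness-master contract."""


def _labels_as_dict(labels: Any) -> dict[str, str]:
    # Compose accepts labels as a mapping or as a list of "key=value" strings.
    if isinstance(labels, Mapping):
        return {str(k): str(v) for k, v in labels.items()}
    result: dict[str, str] = {}
    for item in labels or []:
        key, _, value = str(item).partition("=")
        result[key] = value
    return result


def find_readiness_master(compose: Mapping[str, Any]) -> str:
    """Return the name of the single labeled service, or raise."""
    services = compose.get("services") or {}
    masters = [
        name
        for name, spec in services.items()
        if _labels_as_dict((spec or {}).get("labels")).get(MASTER_LABEL) == MASTER_VALUE
    ]
    if not masters:
        raise ReadinessMasterError("no service has the readiness label")
    if len(masters) > 1:
        raise ReadinessMasterError(f"multiple labeled services: {masters}")
    if not (services[masters[0]] or {}).get("healthcheck"):
        raise ReadinessMasterError(f"service {masters[0]!r} has no healthcheck")
    return masters[0]
```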

## Repository Layout

```text
.
├── app.py
├── docker-compose.yml
├── requirements.txt
├── requirements-dev.txt
└── services/
    └── /
        └── docker-compose.yml
```

The repository also includes `services/dummy-*` targets used for local integration and stress testing.

## Configuration

The manager uses these environment variables:

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| `GPU_HOST_SERVICES_DIR` | yes | none | Host path containing target Compose projects and optional `.env` |
| `GPU_HOST_RUNTIME_DIR` | yes | none | Host path used for persisted lease and queue state |
| `GPU_SERVICES_DIR` | no | `/services` | Services path inside the manager container |
| `GPU_RUNTIME_DIR` | no | `/runtime` | Runtime state path inside the manager container |
| `GPU_ENV_FILE` | no | `/services/.env` | Optional env file passed to every `docker compose` invocation |
| `DEFAULT_WAIT_S` | no | `900` | Default readiness wait timeout for `acquire` |
| `DEFAULT_LEASE_TTL_S` | no | `1800` | Default lease lifetime |
| `QUEUE_CLAIM_WINDOW_S` | no | `10` | How long the queue head has to claim the GPU after release |
| `DOCKER_SOCK` | no | `/var/run/docker.sock` | Docker socket path |
| `HEALTHCHECK_MASTER_LABEL` | no | `gpu.healthcheck-master` | Label key used to choose readiness master |
| `HEALTHCHECK_MASTER_VALUE` | no | `true` | Label value used to choose readiness master |

If `${GPU_ENV_FILE}` exists, it is passed to every `docker compose` invocation with `--env-file`.
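
As a rough illustration of how the table above maps to code, the sketch below reads a subset of the variables with their documented defaults and fails fast when a required one is missing. The `Settings` class and `load_settings` function are hypothetical, and parsing the timing values as integers is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    host_services_dir: str
    host_runtime_dir: str
    services_dir: str
    runtime_dir: str
    env_file: str
    default_wait_s: int
    default_lease_ttl_s: int
    queue_claim_window_s: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from an environment mapping, applying documented defaults."""
    required = ("GPU_HOST_SERVICES_DIR", "GPU_HOST_RUNTIME_DIR")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing required variables: " + ", ".join(missing))
    return Settings(
        host_services_dir=env["GPU_HOST_SERVICES_DIR"],
        host_runtime_dir=env["GPU_HOST_RUNTIME_DIR"],
        services_dir=env.get("GPU_SERVICES_DIR", "/services"),
        runtime_dir=env.get("GPU_RUNTIME_DIR", "/runtime"),
        env_file=env.get("GPU_ENV_FILE", "/services/.env"),
        # Integer parsing is an assumption; the README only lists the defaults.
        default_wait_s=int(env.get("DEFAULT_WAIT_S", "900")),
        default_lease_ttl_s=int(env.get("DEFAULT_LEASE_TTL_S", "1800")),
        queue_claim_window_s=int(env.get("QUEUE_CLAIM_WINDOW_S", "10")),
    )
```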

## Running It

The latest published container image is:

```text
ghcr.io/nashspence/gpu-service-manager:latest
```

For a normal deployment, use a minimal `docker-compose.yml` and `.env` like this:

`docker-compose.yml`

```yaml
services:
  gpu-service-manager:
    image: ghcr.io/nashspence/gpu-service-manager:latest
    container_name: gpu-service-manager
    restart: unless-stopped
    ports:
      - "8080:8080"
    environment:
      GPU_HOST_SERVICES_DIR: ${GPU_HOST_SERVICES_DIR}
      GPU_HOST_RUNTIME_DIR: ${GPU_HOST_RUNTIME_DIR}
    volumes:
      - ${GPU_HOST_SERVICES_DIR}:/services
      - ${GPU_HOST_RUNTIME_DIR}:/runtime
      - /var/run/docker.sock:/var/run/docker.sock
```

`.env`

```dotenv
GPU_HOST_SERVICES_DIR=/opt/gpu-service-manager/services
GPU_HOST_RUNTIME_DIR=/opt/gpu-service-manager/runtime
```

Then start the manager:

```bash
docker compose up -d
```

This configuration runs the manager on port `8080` and mounts:

- `${GPU_HOST_SERVICES_DIR}` at `/services`
- `${GPU_HOST_RUNTIME_DIR}` at `/runtime`
- `/var/run/docker.sock`

For local development from this repository, you can still build and run the included top-level Compose file:

```bash
export GPU_HOST_SERVICES_DIR="$PWD/services"
export GPU_HOST_RUNTIME_DIR="$PWD/runtime"
docker compose up -d --build
```

## API

### `GET /healthz`

Simple manager liveness check.

### `GET /status`

Returns:

- current public lease state
- current queue state
- current running service status, if any
- discovered services

### `POST /acquire`

Acquire a target or refresh an existing lease.

Example:

```bash
curl -X POST http://localhost:8080/acquire \
  -H 'content-type: application/json' \
  -d '{"target":"my-stack","owner":"me"}'
```

Refresh an active lease:

```bash
curl -X POST http://localhost:8080/acquire \
  -H 'content-type: application/json' \
  -d '{"target":"my-stack","lease_token":""}'
```

Join the queue when the GPU is busy:

```bash
curl -X POST http://localhost:8080/acquire \
  -H 'content-type: application/json' \
  -d '{"target":"my-stack","owner":"batch-job","priority":100}'
```

Request body:

| Field | Required | Description |
| --- | --- | --- |
| `target` | yes | Service target directory name under `services/` |
| `owner` | no | Human-readable owner string |
| `lease_token` | no | Existing active lease token or queued token |
| `lease_ttl_s` | no | Lease TTL override |
| `wait_s` | no | Readiness timeout override |
| `wait_ready` | no | Wait for readiness before returning, defaults to `true` |
| `priority` | no | Queue priority when the GPU is busy |

Behavior:

- If idle, the target is started and a lease is granted.
- If the same active token is presented again for the same target, the lease is refreshed.
- If busy and `priority` is set, the caller is added to the queue.
- If busy and no queue priority is supplied, the call is rejected with `409`.
- If a queued token reaches the front after release, that token can claim the GPU during the claim window.
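
The behavior rules can be read as one decision function. The sketch below is a simplified model, not the manager's implementation: `Snapshot` and `decide_acquire` are hypothetical names, the outcome strings are illustrative, and treating a fresh caller during an open claim window like a caller on a busy GPU is an assumption. Advancing the queue when a claim window expires is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Snapshot:
    """Hypothetical view of manager state at the time of a request."""
    active_token: Optional[str] = None
    active_target: Optional[str] = None
    head_token: Optional[str] = None        # token at the front of the queue
    claim_deadline: Optional[float] = None  # set only after a release


def decide_acquire(
    snap: Snapshot,
    target: str,
    lease_token: Optional[str],
    priority: Optional[int],
    now: float,
) -> str:
    """Return one of: grant, refresh, claim, queue, reject_409."""
    if snap.active_token is not None:
        # Busy: only the holder of the active lease may refresh it.
        if lease_token == snap.active_token and target == snap.active_target:
            return "refresh"
        return "queue" if priority is not None else "reject_409"

    claim_open = (
        snap.head_token is not None
        and snap.claim_deadline is not None
        and now < snap.claim_deadline
    )
    if claim_open:
        # Only the queue head may take the GPU during the claim window.
        if lease_token == snap.head_token:
            return "claim"
        return "queue" if priority is not None else "reject_409"

    return "grant"
```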

### `POST /release`

Release the active lease.

```bash
curl -X POST http://localhost:8080/release \
  -H 'content-type: application/json' \
  -d '{"lease_token":""}'
```

Force release without a token:

```bash
curl -X POST http://localhost:8080/release \
  -H 'content-type: application/json' \
  -d '{"force":true}'
```

Notes:

- Releasing removes the lease.
- Releasing does not immediately stop the current target stack.
- A different target acquisition will bring other managed targets down before starting the new one.
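
For callers that prefer Python over `curl`, the same release requests could be built with the standard library. This is a sketch: the helper names are hypothetical, the base URL assumes the manager is published on `localhost:8080` as in the examples above, and the assumption that responses are JSON objects is not confirmed by this README.

```python
import json
import urllib.request
from typing import Any, Optional

BASE_URL = "http://localhost:8080"  # assumption: same host and port as the curl examples


def build_post(path: str, payload: dict[str, Any], base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a JSON POST request without sending it."""
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )


def release_request(lease_token: Optional[str] = None, force: bool = False) -> urllib.request.Request:
    """Build a /release request from a lease token, the force flag, or both."""
    if lease_token is None and not force:
        raise ValueError("provide a lease_token or set force=True")
    payload: dict[str, Any] = {}
    if lease_token is not None:
        payload["lease_token"] = lease_token
    if force:
        payload["force"] = True
    return build_post("/release", payload)


def send(request: urllib.request.Request, timeout: float = 30.0) -> Any:
    """Send a request and decode the body as JSON (response shape is assumed)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A caller would typically wrap its GPU work in `try`/`finally` and call `send(release_request(lease_token=...))` in the `finally` branch so the lease is released even on failure.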

## Queue Semantics

- Higher `priority` wins.
- Equal priority is FIFO.
- The queue head only gets a claim deadline after the active lease is released.
- While a queue head is waiting to claim, other callers cannot jump ahead.
- Queued ownership is preserved when that queued token later claims the GPU.
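
These ordering rules fit a binary heap keyed on negated priority plus an arrival counter. The sketch below uses Python's `heapq`; the `LeaseQueue` class and its method names are hypothetical, and whether the manager uses a heap internally is not stated here.

```python
import heapq
import itertools
from typing import Optional


class LeaseQueue:
    """Higher priority first, FIFO among equal priorities."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        self._arrival = itertools.count()
        self._claim_deadline: Optional[float] = None

    def enqueue(self, token: str, priority: int) -> None:
        # heapq is a min-heap, so negate the priority; the arrival
        # counter breaks ties in first-in, first-out order.
        heapq.heappush(self._heap, (-priority, next(self._arrival), token))

    def head(self) -> Optional[str]:
        return self._heap[0][2] if self._heap else None

    def open_claim_window(self, now: float, window_s: float) -> None:
        # Called when the active lease is released.
        self._claim_deadline = now + window_s if self._heap else None

    def can_claim(self, token: str, now: float) -> bool:
        return (
            self._claim_deadline is not None
            and now < self._claim_deadline
            and token == self.head()
        )

    def pop_head(self) -> Optional[str]:
        self._claim_deadline = None
        return heapq.heappop(self._heap)[2] if self._heap else None
```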

## State Files

Runtime state is stored under:

- `runtime/state.json`
- `runtime/lease.lock`

This lets the manager survive restarts without losing its lease and queue state.
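
One common way to keep such a state file consistent across crashes is to serialize writers with a lock file and replace the JSON file atomically. The sketch below shows that pattern under stated assumptions: the function names and the state schema are hypothetical, `fcntl.flock` is Unix-only, and the manager's actual locking strategy is not described in this README.

```python
import fcntl
import json
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def state_lock(runtime_dir: Path):
    """Hold an exclusive advisory lock on lease.lock for the duration."""
    with open(runtime_dir / "lease.lock", "a") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


def load_state(runtime_dir: Path) -> dict:
    """Read state.json, or return an empty state (schema is an assumption)."""
    path = runtime_dir / "state.json"
    if not path.exists():
        return {"lease": None, "queue": []}
    return json.loads(path.read_text())


def save_state(runtime_dir: Path, state: dict) -> None:
    """Write to a temp file in the same directory, then rename over state.json."""
    fd, tmp_name = tempfile.mkstemp(dir=runtime_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
        handle.flush()
        os.fsync(handle.fileno())
    # os.replace is atomic when source and destination share a filesystem.
    os.replace(tmp_name, runtime_dir / "state.json")
```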

## Development

Install dependencies:

```bash
python3 -m pip install -r requirements-dev.txt
```

Run the full test suite:

```bash
python3 -m pytest --cov=app --cov-report=term-missing
```

Run the live HTTP stress test only:

```bash
python3 -m pytest tests/test_gpu_service_manager_stress.py -q
```

## Test Coverage

The test suite exercises:

- service discovery across all supported Compose filenames
- happy-path acquire, status, refresh, and release
- readiness master validation failures
- queue fairness and claim-window behavior
- live multi-worker stress with concurrent enqueue, status polling, bad-token retries, and claim races

## Limitations

- This manager coordinates only the Compose projects it knows about under `services/`.
- It does not stop unrelated containers running outside that set.
- It assumes Docker healthchecks are a reliable signal for stack readiness.
- It is designed for one managed GPU host, not distributed scheduling across multiple machines.
