{"id":49550499,"url":"https://github.com/durable-workflow/server","last_synced_at":"2026-07-15T21:01:20.938Z","repository":{"id":350804603,"uuid":"1208272134","full_name":"durable-workflow/server","owner":"durable-workflow","description":"Language-neutral workflow orchestration server for running durable workflows with external workers over HTTP. Supports task polling, schedules, namespaces, history export, role-scoped auth, and Docker/Compose/Kubernetes deployment, with persistence backed by SQLite, MySQL, or PostgreSQL.","archived":false,"fork":false,"pushed_at":"2026-07-14T01:18:55.000Z","size":7665,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-07-14T01:23:42.059Z","etag":null,"topics":["cronjob-scheduler","distributed-cron","distributed-systems","microservice-framework","microservice-orchestration","microservices-architecture","orchestrator","workflow-automation","workflow-engine","workflow-management","workflow-management-system","workflows"],"latest_commit_sha":null,"homepage":"https://durable-workflow.com","language":"PHP","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/durable-workflow.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-12T03:45:35.000Z","updated_at":"2026-07-14T01:18:59.000Z","dependencies_parsed_at":"2026-06-02T02:00:56.940Z","dependency_job_id":null,"html_url":"https://github.com/durable-workflow/server","commit_stats":nu
ll,"previous_names":["durable-workflow/server"],"tags_count":669,"template":false,"template_full_name":null,"purl":"pkg:github/durable-workflow/server","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/durable-workflow%2Fserver","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/durable-workflow%2Fserver/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/durable-workflow%2Fserver/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/durable-workflow%2Fserver/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/durable-workflow","download_url":"https://codeload.github.com/durable-workflow/server/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/durable-workflow%2Fserver/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":35520685,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-07-15T02:00:06.706Z","response_time":131,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cronjob-scheduler","distributed-cron","distributed-systems","microservice-framework","microservice-orchestration","microservices-architecture","orchestrator","workflow-automation","workflow-engine","workflow-management","workflow-management-system","workflows"],"created_at":"2026-05
-02T22:06:12.848Z","updated_at":"2026-07-15T21:01:20.929Z","avatar_url":"https://github.com/durable-workflow.png","language":"PHP","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Durable Workflow Server\n\nA standalone, language-neutral workflow orchestration server. Write durable workflows in any language. Built on the same engine as [Durable Workflow](https://github.com/durable-workflow/workflow).\n\n## Quick Start\n\n### Official Image + SQLite\n\nUse this path when you want to validate the published image without cloning the\nrepository or starting MySQL/Redis. The image defaults to SQLite, database\nqueues, and file cache; mount `/app/database` so the bootstrap command and API\nserver share the same SQLite file.\n\n```bash\nexport DW_SERVER_IMAGE=durableworkflow/server:0.2\nexport DW_AUTH_TOKEN=dev-token\ndocker volume create durable-workflow-sqlite\n\n# Bootstrap schema + default namespace once.\ndocker run --rm \\\n  -v durable-workflow-sqlite:/app/database \\\n  -e DW_AUTH_DRIVER=token \\\n  -e DW_AUTH_TOKEN=\"$DW_AUTH_TOKEN\" \\\n  \"$DW_SERVER_IMAGE\" server-bootstrap\n\n# Start the API server.\ndocker run --rm --name durable-workflow-server \\\n  -p 8080:8080 \\\n  -v durable-workflow-sqlite:/app/database \\\n  -e DW_AUTH_DRIVER=token \\\n  -e DW_AUTH_TOKEN=\"$DW_AUTH_TOKEN\" \\\n  \"$DW_SERVER_IMAGE\"\n\n# In a separate terminal, run the database queue worker that fires timers.\ndocker run --rm --name durable-workflow-queue-worker \\\n  -v durable-workflow-sqlite:/app/database \\\n  -e DW_AUTH_DRIVER=token \\\n  -e DW_AUTH_TOKEN=\"$DW_AUTH_TOKEN\" \\\n  \"$DW_SERVER_IMAGE\" php artisan queue:work database --sleep=1 --tries=3\n```\n\nIn another terminal:\n\n```bash\ncurl http://localhost:8080/api/health\ncurl http://localhost:8080/api/ready\ncurl -H \"Authorization: Bearer $DW_AUTH_TOKEN\" \\\n  http://localhost:8080/api/cluster/info\n\ncurl -X POST http://localhost:8080/api/worker/register \\\n  -H \"Authorization: Bearer 
$DW_AUTH_TOKEN\" \\\n  -H \"Content-Type: application/json\" \\\n  -H \"X-Namespace: default\" \\\n  -H \"X-Durable-Workflow-Protocol-Version: 1.13\" \\\n  -d '{\"worker_id\":\"quickstart-worker\",\"task_queue\":\"quickstart\",\"runtime\":\"python\"}'\n```\n\nUse Redis or another shared cache backend for multi-node deployments. The file\ncache default is intentionally scoped to the one-container SQLite quickstart.\n\n### Official Image + Compose\n\nUse this path when you want a source-free multi-container stack backed by MySQL\nand Redis. The same Compose file supports local development and single-node\nproduction; the difference is the environment you provide and the operational\ncare around persistence, backups, and upgrades.\n\nImage selection:\n\n- `DW_SERVER_TAG=0.2` pulls `durableworkflow/server:0.2` from Docker Hub.\n- `DW_SERVER_IMAGE=ghcr.io/durable-workflow/server:0.2` pulls the same release\n  line from GitHub Container Registry.\n- `DW_SERVER_IMAGE=durableworkflow/server@sha256:...` pins an exact image\n  digest for production change control.\n\n#### Local Development Compose\n\nThis recipe is for one developer machine or internal non-production testing. 
It\nuses the default MySQL/Redis volumes, exposes only the API port, and allows the\nsingle `DW_AUTH_TOKEN` compatibility token for quick verification.\n\n```bash\ncurl -fsSLO https://raw.githubusercontent.com/durable-workflow/server/main/docker-compose.published.yml\n\nexport DW_SERVER_TAG=0.2\nexport DW_AUTH_TOKEN=dev-token\n\ndocker compose -f docker-compose.published.yml up -d --wait\n```\n\nVerify health, readiness, cluster discovery, and worker registration:\n\n```bash\ncurl http://localhost:8080/api/health\ncurl http://localhost:8080/api/ready\ncurl -H \"Authorization: Bearer $DW_AUTH_TOKEN\" \\\n  http://localhost:8080/api/cluster/info\n\ncurl -X POST http://localhost:8080/api/worker/register \\\n  -H \"Authorization: Bearer $DW_AUTH_TOKEN\" \\\n  -H \"Content-Type: application/json\" \\\n  -H \"X-Namespace: default\" \\\n  -H \"X-Durable-Workflow-Protocol-Version: 1.13\" \\\n  -d '{\"worker_id\":\"compose-worker\",\"task_queue\":\"compose\",\"runtime\":\"python\"}'\n```\n\n#### Single-Node Production Compose\n\nThis recipe is for a small self-hosted deployment on one Docker host. 
It keeps\nMySQL, Redis, and server storage in named volumes, exposes only the API port,\nand expects role-scoped credentials plus an exact image tag or digest.\n\nCreate a production env file outside source control:\n\n```env\nDW_SERVER_IMAGE=durableworkflow/server:0.2\nSERVER_PORT=8080\nAPP_ENV=production\nAPP_DEBUG=false\n\nDB_DATABASE=durable_workflow\nDB_USERNAME=workflow\nDB_PASSWORD=replace-with-random-password\nDB_ROOT_PASSWORD=replace-with-random-root-password\n\nDW_AUTH_DRIVER=token\nDW_AUTH_BACKWARD_COMPATIBLE=false\nDW_WORKER_TOKEN=replace-with-worker-token\nDW_OPERATOR_TOKEN=replace-with-operator-token\nDW_ADMIN_TOKEN=replace-with-admin-token\n```\n\nStart the stack and run the same readiness checks:\n\n```bash\ndocker compose --env-file durable-workflow.prod.env \\\n  -f docker-compose.published.yml up -d --wait\n\ncurl http://localhost:8080/api/health\ncurl http://localhost:8080/api/ready\ncurl -H \"Authorization: Bearer $(grep '^DW_ADMIN_TOKEN=' durable-workflow.prod.env | cut -d= -f2-)\" \\\n  http://localhost:8080/api/cluster/info\n```\n\nRegister SDK workers with `DW_WORKER_TOKEN` and send operator traffic with\n`DW_OPERATOR_TOKEN`. Operator and admin credentials may also call\n`/api/worker/register` for diagnostic worker registration, while heartbeats,\ntask polling, and task completion remain worker-token endpoints. Put TLS,\nrequest logging, and public routing in a reverse proxy in front of the API\ncontainer; do not expose the MySQL or Redis services.\n\nPersistence and backups:\n\n- `mysql_data` is the durable workflow state. Back it up before every image\n  upgrade and on a regular schedule.\n- `redis_data` contains queue/cache state. 
Preserve it for graceful restarts;\n  MySQL remains the source of truth for workflow history.\n- Keep a copy of the exact env file and image reference with each backup so a\n  restore uses the same auth, database, and image contract.\n\nBackup and restore examples:\n\n```bash\ndocker compose --env-file durable-workflow.prod.env \\\n  -f docker-compose.published.yml exec -T mysql \\\n  sh -lc 'mysqldump -u\"$MYSQL_USER\" -p\"$MYSQL_PASSWORD\" \"$MYSQL_DATABASE\"' \\\n  \u003e durable-workflow-$(date +%Y%m%d%H%M%S).sql\n\ndocker compose --env-file durable-workflow.prod.env \\\n  -f docker-compose.published.yml exec -T mysql \\\n  sh -lc 'mysql -u\"$MYSQL_USER\" -p\"$MYSQL_PASSWORD\" \"$MYSQL_DATABASE\"' \\\n  \u003c durable-workflow-backup.sql\n```\n\nUpgrade order:\n\n1. Back up MySQL and record the current image reference.\n2. Change only `DW_SERVER_IMAGE` or `DW_SERVER_TAG` in the env file.\n3. Run `docker compose --env-file durable-workflow.prod.env -f docker-compose.published.yml pull`.\n4. Run `docker compose --env-file durable-workflow.prod.env -f docker-compose.published.yml up -d --wait`.\n5. Confirm `/api/ready`, `/api/cluster/info`, and worker registration before\n   shifting external traffic.\n\nThe image generates an internal runtime key automatically. Set `DW_SERVER_KEY`\nonly if your deployment needs that key to remain stable across container\nreplacement.\n\nThe published Compose smoke workflow runs this file in both `local` and\n`production` profiles for amd64 and arm64. The `local` profile validates the\nsingle-token development recipe; the `production` profile validates role-scoped\nworker/admin tokens with backward-compatible auth disabled.\n\n### Small Cluster Status\n\nSmall clustered deployments without Kubernetes are validated as a narrow public\nsupport boundary, not as a general HA promise. 
The current supported shape uses\nexternal MySQL or PostgreSQL plus 2 or 3 API nodes behind a stateless load\nbalancer, shared Redis, at least one server queue worker, and independently\nscaled external workers. The first contract requires exactly one scheduler or\nmaintenance runner. SQLite, Redis-less multi-node mode, duplicate schedulers,\nrolling upgrades, active/active multi-region, Helm-based Kubernetes deployments,\nand provider-specific failover semantics are not part of that first contract.\n\nThe CI harness in `docker-compose.small-cluster.yml` runs the MySQL and\nPostgreSQL variants with two API nodes, one bootstrap job, one Redis queue\nworker, one scheduler, shared Redis, load-balanced health/readiness/cluster-info\nchecks, external worker registration, and a workflow-task poll on one API node\nfollowed by completion on the other. The Phase 0 rationale and harness details\nlive in\n[`docs/small-cluster-validation.md`](docs/small-cluster-validation.md).\n\n### Multi-Region Status\n\nActive/passive multi-region with operator-driven regional failover is a\nself-serve contract. One region runs the validated single-region or\nsmall-cluster shape, and one standby region holds an asynchronously\nreplicated standby database, optional standby Redis, and idle API/worker\ncontainers. The singleton scheduler/maintenance runner runs in the active\nregion only; failover starts it in the promoted region after the database\nis promoted. Active/active multi-region, automatic regional failover, and\nsynchronous cross-region replication remain support-led. 
The contract,\noperator runbook, and rehearsal expectations live in\n[`docs/multi-region-validation.md`](docs/multi-region-validation.md).\n\n### Docker Compose\n\n```bash\n# Clone the repository\ngit clone https://github.com/durable-workflow/server.git\ncd server\n\n# Copy environment config\ncp .env.example .env\n\n# Start the server with all dependencies\ndocker compose up -d\n\n# Verify\ncurl http://localhost:8080/api/health\ncurl http://localhost:8080/api/ready\n```\n\nCompose runs a one-shot `bootstrap` service before the API and worker\ncontainers start. That service calls the image's `server-bootstrap` command,\nwhich runs migrations and seeds the default namespace.\nThe long-running `server`, `worker`, and `scheduler` services each pin\n`DW_SERVER_TOPOLOGY_SHAPE` and `DW_SERVER_PROCESS_CLASS` so\n`GET /api/cluster/info` reports the role class you actually launched during\nlocal split-role testing.\nThe local compose files pass `WORKFLOW_PACKAGE_REF=2.0.0-alpha.284`, matching\nthe Dockerfile fallback, so `docker compose up --build` works from a clean\ncheckout with Composer metadata aligned to the embedded workflow package.\nOverride `WORKFLOW_PACKAGE_SOURCE`, `WORKFLOW_PACKAGE_REF`, or\n`WORKFLOW_PACKAGE_COMMIT` if you need a different package remote, tag, or\ncommit guard during image builds.\n\n### Using the CLI\n\n```bash\n# Install the CLI from the public release channel\ncurl -fsSL https://durable-workflow.com/install.sh | VERSION=0.1.80 sh\nexport PATH=\"$HOME/.local/bin:$PATH\"\n\n# Start a workflow\ndw workflow start --type=my-workflow --input='{\"name\":\"world\"}'\n\n# List workflows\ndw workflow list\n\n# Check server health\ndw server health\n```\n\n## Getting Started: End-to-End Workflow\n\nThis walkthrough shows the full lifecycle using `curl` — start the server,\ncreate a workflow, poll for tasks, and complete them. 
Any HTTP client in any\nlanguage follows the same steps.\n\nSet role tokens for convenience (or set `DW_AUTH_DRIVER=none` in\n`.env` to skip auth during development). If you only configure the legacy\n`DW_AUTH_TOKEN`, use the same value for each variable below.\n\n```bash\nexport ADMIN_TOKEN=\"your-admin-token\"\nexport OPERATOR_TOKEN=\"your-operator-token\"\nexport WORKER_TOKEN=\"your-worker-token\"\nexport SERVER=\"http://localhost:8080\"\n```\n\n### 1. Check Server Health\n\n```bash\ncurl $SERVER/api/health\n```\n\n```json\n{\"status\":\"serving\",\"timestamp\":\"2026-04-13T12:00:00Z\"}\n```\n\n### 2. Create a Namespace (or Use the Default)\n\nThe bootstrap seeds a `default` namespace. To create a dedicated one:\n\n```bash\ncurl -X POST $SERVER/api/namespaces \\\n  -H \"Authorization: Bearer $ADMIN_TOKEN\" \\\n  -H \"Content-Type: application/json\" \\\n  -H \"X-Durable-Workflow-Control-Plane-Version: 2\" \\\n  -d '{\"name\":\"my-app\",\"description\":\"My application namespace\",\"retention_days\":30}'\n```\n\n### 3. Start a Workflow\n\n```bash\ncurl -X POST $SERVER/api/workflows \\\n  -H \"Authorization: Bearer $OPERATOR_TOKEN\" \\\n  -H \"Content-Type: application/json\" \\\n  -H \"X-Namespace: default\" \\\n  -H \"X-Durable-Workflow-Control-Plane-Version: 2\" \\\n  -d '{\n    \"workflow_id\": \"order-42\",\n    \"workflow_type\": \"orders.process\",\n    \"task_queue\": \"order-workers\",\n    \"input\": [\"order-42\", {\"rush\": true}],\n    \"execution_timeout_seconds\": 3600,\n    \"run_timeout_seconds\": 600\n  }'\n```\n\n```json\n{\n  \"workflow_id\": \"order-42\",\n  \"run_id\": \"abc123\",\n  \"workflow_type\": \"orders.process\",\n  \"status\": \"pending\",\n  \"outcome\": \"started_new\"\n}\n```\n\n### 4. 
Register a Worker\n\nBefore polling, register the worker with the server:\n\n```bash\ncurl -X POST $SERVER/api/worker/register \\\n  -H \"Authorization: Bearer $WORKER_TOKEN\" \\\n  -H \"Content-Type: application/json\" \\\n  -H \"X-Namespace: default\" \\\n  -H \"X-Durable-Workflow-Protocol-Version: 1.13\" \\\n  -d '{\n    \"worker_id\": \"worker-1\",\n    \"task_queue\": \"order-workers\",\n    \"runtime\": \"python\",\n    \"supported_workflow_types\": [\"orders.process\"],\n    \"workflow_definition_fingerprints\": {\n      \"orders.process\": \"sha256:...\"\n    }\n  }'\n```\n\nWhen a worker re-registers the same active `worker_id`, any advertised\nworkflow type must keep the same `workflow_definition_fingerprints` value. A\nchanged fingerprint is rejected with `workflow_definition_changed`; restart\nthe process with a new worker id before serving a changed workflow class.\nWorkers that omit fingerprints during re-registration cannot clear previously\nstored fingerprints for workflow types they still advertise; the server keeps\nthe stored value until a new worker id is used.\n\n### 5. 
Poll for Workflow Tasks\n\nThe server holds the connection open (long-poll) until a task is ready or\nthe timeout expires:\n\n```bash\ncurl -X POST $SERVER/api/worker/workflow-tasks/poll \\\n  -H \"Authorization: Bearer $WORKER_TOKEN\" \\\n  -H \"Content-Type: application/json\" \\\n  -H \"X-Namespace: default\" \\\n  -H \"X-Durable-Workflow-Protocol-Version: 1.13\" \\\n  -d '{\n    \"worker_id\": \"worker-1\",\n    \"task_queue\": \"order-workers\"\n  }'\n```\n\nThe response includes the task, its history events, and lease metadata:\n\n```json\n{\n  \"protocol_version\": \"1.13\",\n  \"task\": {\n    \"task_id\": \"task-xyz\",\n    \"workflow_id\": \"order-42\",\n    \"run_id\": \"abc123\",\n    \"workflow_type\": \"orders.process\",\n    \"workflow_task_attempt\": 1,\n    \"lease_owner\": \"worker-1\",\n    \"task_queue\": \"order-workers\",\n    \"history_events\": [\n      {\"sequence\": 1, \"event_type\": \"StartAccepted\", \"...\": \"...\"},\n      {\"sequence\": 2, \"event_type\": \"WorkflowStarted\", \"...\": \"...\"}\n    ]\n  }\n}\n```\n\n### 6. Complete a Workflow Task\n\nReplay history, execute logic, and return commands. 
To schedule an activity:\n\n```bash\ncurl -X POST $SERVER/api/worker/workflow-tasks/task-xyz/complete \\\n  -H \"Authorization: Bearer $WORKER_TOKEN\" \\\n  -H \"Content-Type: application/json\" \\\n  -H \"X-Namespace: default\" \\\n  -H \"X-Durable-Workflow-Protocol-Version: 1.13\" \\\n  -d '{\n    \"lease_owner\": \"worker-1\",\n    \"workflow_task_attempt\": 1,\n    \"commands\": [\n      {\n        \"type\": \"schedule_activity\",\n        \"activity_type\": \"orders.send-confirmation\",\n        \"task_queue\": \"order-workers\",\n        \"input\": [\"order-42\"]\n      }\n    ]\n  }'\n```\n\nTo complete the workflow (terminal command):\n\n```bash\ncurl -X POST $SERVER/api/worker/workflow-tasks/task-xyz/complete \\\n  -H \"Authorization: Bearer $WORKER_TOKEN\" \\\n  -H \"Content-Type: application/json\" \\\n  -H \"X-Namespace: default\" \\\n  -H \"X-Durable-Workflow-Protocol-Version: 1.13\" \\\n  -d '{\n    \"lease_owner\": \"worker-1\",\n    \"workflow_task_attempt\": 1,\n    \"commands\": [\n      {\n        \"type\": \"complete_workflow\",\n        \"result\": {\"status\": \"shipped\", \"tracking\": \"TRK-123\"}\n      }\n    ]\n  }'\n```\n\n### 7. 
Poll and Complete Activity Tasks\n\nIf the workflow scheduled activities, poll for them on the same (or different) queue:\n\n```bash\n# Poll\ncurl -X POST $SERVER/api/worker/activity-tasks/poll \\\n  -H \"Authorization: Bearer $WORKER_TOKEN\" \\\n  -H \"Content-Type: application/json\" \\\n  -H \"X-Namespace: default\" \\\n  -H \"X-Durable-Workflow-Protocol-Version: 1.13\" \\\n  -d '{\"worker_id\": \"worker-1\", \"task_queue\": \"order-workers\"}'\n\n# Complete (use task_id and activity_attempt_id from the poll response)\ncurl -X POST $SERVER/api/worker/activity-tasks/TASK_ID/complete \\\n  -H \"Authorization: Bearer $WORKER_TOKEN\" \\\n  -H \"Content-Type: application/json\" \\\n  -H \"X-Namespace: default\" \\\n  -H \"X-Durable-Workflow-Protocol-Version: 1.13\" \\\n  -d '{\n    \"activity_attempt_id\": \"ATTEMPT_ID\",\n    \"lease_owner\": \"worker-1\",\n    \"result\": \"confirmation-sent\"\n  }'\n```\n\n### 8. Check Workflow Status\n\n```bash\ncurl $SERVER/api/workflows/order-42 \\\n  -H \"Authorization: Bearer $OPERATOR_TOKEN\" \\\n  -H \"X-Namespace: default\" \\\n  -H \"X-Durable-Workflow-Control-Plane-Version: 2\"\n```\n\n### 9. 
View Event History\n\n```bash\ncurl \"$SERVER/api/workflows/order-42/runs/abc123/history\" \\\n  -H \"Authorization: Bearer $OPERATOR_TOKEN\" \\\n  -H \"X-Namespace: default\" \\\n  -H \"X-Durable-Workflow-Control-Plane-Version: 2\"\n```\n\n### Supported Workflow Task Commands\n\n| Command | Terminal | Description |\n|---------|----------|-------------|\n| `complete_workflow` | Yes | Complete workflow with a result |\n| `fail_workflow` | Yes | Fail workflow with an error |\n| `continue_as_new` | Yes | Continue as a new run |\n| `schedule_activity` | No | Schedule an activity for execution |\n| `start_timer` | No | Start a durable timer |\n| `start_child_workflow` | No | Start a child workflow |\n| `record_side_effect` | No | Record a non-deterministic value |\n| `record_version_marker` | No | Record a version marker |\n| `upsert_search_attributes` | No | Update search attributes |\n\nRetry and timeout fields are scoped to the command layer they control.\n`schedule_activity` accepts activity `retry_policy`,\n`start_to_close_timeout`, `schedule_to_start_timeout`,\n`schedule_to_close_timeout`, and `heartbeat_timeout`; heartbeat, start, and\nschedule-to-start budgets cannot exceed schedule-to-close or start-to-close\nwhere those outer budgets are present. `start_child_workflow` accepts child\nworkflow `retry_policy`, `execution_timeout_seconds`, and\n`run_timeout_seconds`; the run timeout cannot exceed the execution timeout.\n`non_retryable` is only a failure outcome flag on `fail_workflow` and\n`fail_update`. 
HTTP transport retry policy is configured by clients outside the\nworkflow-task command payload.\n\n## API Overview\n\n### System\n- `GET /api/health` — Health check plus a machine-readable topology summary for the current node\n- `GET /api/ready` — Bounded readiness check for migrations, configured queue storage, default namespace, cache, auth config, fail-closed backend/fleet admission, and the current node topology summary\n- `GET /api/cluster/info` — Bounded server compatibility and capability discovery used by SDK and CLI preflight\n- `GET /api/cluster/info?include=diagnostics` — Compatibility discovery plus explicit coordination health, fleet detail, task-repair diagnostics, and operator metrics\n- `GET /api/system/health` — Full rollout-safety health snapshot for the requested namespace, including check status, categories, routing-drain state, operator metrics, and structural limits\n- `GET /api/system/metrics` — Server metrics including bounded stuck workflow-task diagnostics\n- `GET /api/system/operator-metrics` — Full operator metrics snapshot (runs, tasks, backlog, repair, workers/fleet, backend, structural limits) for namespace-scoped rollout-safety coordination health\n- `GET /api/system/repair` — Task repair diagnostics\n- `POST /api/system/repair/pass` — Run task repair sweep\n- `GET /api/system/activity-timeouts` — Expired activity execution diagnostics\n- `POST /api/system/activity-timeouts/pass` — Enforce activity timeouts\n\n`POST /api/system/repair/pass` accepts optional `connection`, `queue`,\n`run_ids`, and `instance_id` filters. 
Set `respect_throttle=true` when a\ndedicated matching-role loop should skip a pass rather than duplicate work\nalready covered by another matching-role process holding the repair throttle.\n\n### Namespaces\n- `GET /api/namespaces` — List namespaces\n- `POST /api/namespaces` — Create namespace\n- `GET /api/namespaces/{name}` — Get namespace\n- `PUT /api/namespaces/{name}` — Update namespace\n- `DELETE /api/namespaces/{name}` — Delete namespace and clean up its runtime state\n- `PUT /api/namespaces/{name}/external-storage` — Configure external payload storage policy\n\nWhen a namespace enables external payload storage, the server resolves\n`{codec, external_storage}` payload envelopes on workflow start, signal, query,\nupdate, bridge-adapter, and activity result/failure ingress. The same policy is\nused while recording workflow starts, activity inputs/results, and workflow\noutputs, so oversized encoded payloads enter history as external references\ninstead of inline blobs. The `local` driver stores blobs below the configured\n`file://` URI or the namespace-scoped server storage path. S3, GCS, and Azure\npolicies are available when the policy includes `config.disk` naming a\nconfigured Laravel filesystem disk plus `config.bucket`; the server emits\nprovider URIs such as `s3://bucket/prefix/...` while using that disk for\nput/get/delete operations. Object-storage policies without a configured disk\nremain fail-closed so references are not silently accepted by a runtime that\ncannot dereference or delete them. 
History retention deletes referenced local\nand configured object-storage payload blobs before pruning an expired run, and\nleaves runs in place when a retained reference uses a provider this server\ncannot delete yet.\n\n### Workflows\n- `GET /api/workflows` — List workflows (with filters)\n- `POST /api/workflows` — Start a workflow\n- `GET /api/workflows/{id}` — Describe a workflow\n- `GET /api/workflows/{id}/runs` — List runs (continue-as-new chain)\n- `GET /api/workflows/{id}/runs/{runId}` — Describe a specific run\n- `GET /api/workflows/{id}/debug` — Bounded support diagnostic for the current run\n- `GET /api/workflows/{id}/runs/{runId}/debug` — Bounded support diagnostic for a specific run\n- `POST /api/workflows/{id}/signal/{name}` — Send a signal\n- `POST /api/workflows/{id}/query/{name}` — Execute a query\n- `POST /api/workflows/{id}/update/{name}` — Execute an update\n- `POST /api/workflows/{id}/cancel` — Request cancellation\n- `POST /api/workflows/{id}/terminate` — Terminate immediately\n- `POST /api/workflows/{id}/repair` — Request repair for retryable stuck state\n- `POST /api/workflows/{id}/archive` — Archive a closed workflow run\n\nInstance-targeted signal, query, update, cancel, terminate, repair, and archive\nroutes operate on the current run for the workflow id. Run-targeted command\nvariants under `/api/workflows/{id}/runs/{runId}/...` are available for\nsignal, query, update, cancel, terminate, repair, and archive; they only forward\nwhen `{runId}` is the current run and reject historical-run commands with `409`\nand `reason: \"historical_run_command_rejected\"`.\n\nWorkflow starts fail closed with `409` / `reason: \"task_queue_draining\"`\nwhen the requested task queue has been explicitly drained and no active worker\ncohort remains to claim new work. 
The response includes\n`routing_status`, worker counts, and `draining_build_ids` so operators can\ndistinguish \"wait for the active cohort to return\" from \"resume or replace the\ndrained build cohort first.\"\n\nWorkflow debug responses are capped support snapshots, not full run exports:\nthe server fetches at most 25 pending workflow tasks, 25 pending activities\nwith only each activity's current/latest attempt, and 10 recent failures. The\nlast history event includes only sequence, type, timestamp, and bounded payload\nmetadata by default; add `include_last_event_payload=true` to include at most a\n4 KiB JSON preview. Use the history endpoints when a full replay/debug archive\nis needed.\n\n### History\n- `GET /api/workflows/{id}/runs/{runId}/history` — Get event history\n- `GET /api/workflows/{id}/runs/{runId}/history/export` — Export replay bundle\n\n### External Payload Storage\n- `POST /api/storage/test` — Round-trip diagnostic for the selected namespace storage policy\n\nThe storage diagnostic writes, reads, verifies, and deletes small and large test\npayloads through the namespace's configured policy. It supports `local` and\nconfigured-disk `s3`, `gcs`, and `azure` policies; it returns\n`storage_driver_unavailable` when the namespace only stores provider metadata\nand the current server runtime has no filesystem disk configured for that\nprovider.\n\nEvery non-health, non-discovery control-plane endpoint must send\n`X-Durable-Workflow-Control-Plane-Version: 2` on the request. That\ncovers namespace, schedule, search-attribute, task-queue, worker-management,\nsystem, workflow, and history endpoints. Requests without that header or\nwith legacy `wait_policy` fields are rejected. Mutating requests with bodies\nmust use `Content-Type: application/json` or another `application/*+json` media\ntype; XML, form, and other body formats return a versioned 415 response before\ncontroller work. 
Workflow and history responses always return the same header.\nThe v2 canonical workflow command fields are\n`workflow_id`, `command_status`, `outcome`, plus `signal_name`, `query_name`,\nor `update_name` where applicable and, for updates, `wait_for`,\n`wait_timed_out`, and `wait_timeout_seconds`.\nValidation failures return HTTP 422 with `reason: validation_failed` plus both\n`errors` and `validation_errors`; workflow operation routes also project that\nreason and validation detail into the nested `control_plane` metadata. Current\nrun-targeted command routes project the URL `run_id` in the response and\n`control_plane.run_id` so clients can distinguish instance-level commands from\nexplicit selected-run commands.\n\nOnly `GET /api/health`, `GET /api/ready`, and `GET /api/cluster/info` are\nexempt. They are intentionally version-free so probes can check liveness and\nreadiness, and clients can discover the supported control-plane version before\nadopting it.\n\nWorkflow control-plane responses, including run-history listing responses, also\npublish a nested, independently versioned `control_plane.contract` boundary\nwith:\n- `schema: durable-workflow.v2.control-plane-response.contract`\n- `version: 1`\n- `legacy_field_policy: reject_non_canonical`\n- `legacy_fields`, `required_fields`, and `success_fields`\n\nClients can validate that nested contract separately from the outer\n`control_plane` envelope.\n\nHistory export responses are the exception inside the workflow route group:\n`GET /api/workflows/{id}/runs/{runId}/history/export` returns the replay bundle\nas-is so its integrity checksum and optional signature cover the exact artifact\nthe client receives.\n\nThe server also publishes the current request contract in\n`GET /api/cluster/info` under `control_plane.request_contract` with:\n- `schema: durable-workflow.v2.control-plane-request.contract`\n- `version: 1`\n- `operations`\n\nTreat that versioned manifest as the source of truth for canonical 
request\nvalues, rejected aliases, and removed fields such as start\n`duplicate_policy` and update `wait_for`. Clients should reject missing or\nunknown request-contract schema or version instead of silently guessing.\n\n`GET /api/cluster/info` also includes `client_compatibility`, whose\n`authority` is `protocol_manifests`. The top-level server `version` is build\nidentity only; CLI and SDK compatibility must be decided from\n`control_plane.version`, `control_plane.request_contract`, and, for workers,\n`worker_protocol.version`. Unknown, missing, or undiscoverable protocol\nmanifests should fail closed.\n\nThe same cluster-info response publishes `skew_refusal_matrix_contract`, a\nmachine-readable matrix for published-artifact compatibility checks. It names\nthe required CLI, Python SDK, PHP worker, and Waterline pairings, the compatible\nand intentionally skewed version classes, the allowed worker and Waterline\nclassifications, and the wire evidence a conformance run must capture before a\nskew result can pass. Smoke-only compatibility evidence is explicitly\nnon-passing under that contract. 
The manifest also publishes the host-runner\nhandoff for the full matrix, including focused coverage-gap findings for any\nunexecuted cell.\n\n### Worker Protocol\n- `POST /api/worker/register` — Register a worker\n- `POST /api/worker/heartbeat` — Worker fleet heartbeat (free task slots, basic process metrics)\n- `POST /api/worker/workflow-tasks/poll` — Long-poll for workflow tasks\n- `POST /api/worker/workflow-tasks/{id}/heartbeat` — Workflow task heartbeat\n- `POST /api/worker/workflow-tasks/{id}/complete` — Complete workflow task\n- `POST /api/worker/workflow-tasks/{id}/fail` — Fail workflow task\n- `POST /api/worker/activity-tasks/poll` — Long-poll for activity tasks\n- `POST /api/worker/activity-tasks/{id}/complete` — Complete activity task\n- `POST /api/worker/activity-tasks/{id}/fail` — Fail activity task\n- `POST /api/worker/activity-tasks/{id}/heartbeat` — Activity heartbeat\n\nWorker-fleet heartbeats accept optional `task_slots` (`workflow_available`,\n`activity_available`, `session_available`) and `process_metrics`\n(`cpu_percent`, `memory_bytes`, `process_uptime_seconds`, `process_id`,\n`host`, `process_started_at`) so operators can answer \"what workers are\npolling task queue X right now, what's their slot capacity, when did each last\ncheck in\" via `GET /api/workers`, the CLI `dw worker:list` /\n`dw worker:describe`, and the Waterline Worker Status view.\n`process_started_at` is a process identity value, alongside `host` and\n`process_id`, that lets the server distinguish a restarted process that reused\nthe same worker id and OS pid. 
The register and heartbeat acknowledgements\nadvertise the recommended cadence in `heartbeat_interval_seconds` (default\n10s, configurable via `DW_WORKER_HEARTBEAT_INTERVAL_SECONDS`); workers that\nmiss enough heartbeats fall out of the default `GET /api/workers` and\n`dw worker:list` active roster after `DW_WORKER_STALE_AFTER_SECONDS`; operators\ncan still ask for the expired diagnostic set with `status=stale`, and stale\nworkers stop being considered for query-task dispatch and routing-gate\nadmission.\n\nThe current server advertises worker protocol `1.13` by default. Worker-plane\nrequests should send the highest `X-Durable-Workflow-Protocol-Version` their SDK\nimplements, and worker-plane responses always echo the server's advertised\nversion in the same header plus the `protocol_version` body field.\nWorker-protocol compatibility is same-major with a worker minor less than or\nequal to the server minor: a server advertising `1.13` accepts worker requests\nfor `1.0` through `1.13`, including older `1.12` and `1.11` workers. 
Requests\nwith a missing, malformed, different-major, or higher-minor protocol version are\nrejected with `missing_protocol_version` or `unsupported_protocol_version` and a\n`supported_version` value that tells the worker what the server advertises.\nWorker-session routes and `worker_session` workflow commands additionally\nrequire request protocol `1.8` or newer; older same-major workers remain\naccepted for non-session work and receive `worker_sessions_unavailable` if they\ntry to use session features.\nWorker requests with bodies follow the same JSON media-type requirement as the\ncontrol plane and return a worker-protocol 415 response for XML, form, or other\nnon-JSON body formats.\nWorker registration, poll, heartbeat, complete, and fail responses all include\n`server_capabilities.supported_workflow_task_commands` so SDK workers can\nnegotiate whether the server only supports terminal workflow-task commands or\nthe expanded non-terminal command set. The same `server_capabilities` object\nalso advertises command-option support for activity retry policies, activity\ntimeouts, child workflow retry policies, child workflow timeouts, parent-close\npolicy, and non-retryable failures. SDK workers can therefore negotiate worker\nbehavior from either `GET /api/cluster/info` or any worker-plane response.\n\nLong-poll wake-ups use short-lived cache-backed signal keys plus periodic\nreprobes. Multi-node deployments therefore need a shared cache backend for\nprompt wake behavior; without one, correctness still comes from the periodic\ndatabase recheck, but wake latency will regress toward the forced recheck\ninterval.\n\nServer-owned cache keys and metric label sets are governed by the bounded-growth\npolicy in `config/dw-bounded-growth.php`; the human-readable inventory lives in\n`docs/bounded-growth.md`.\n\nDefault cluster discovery publishes the full bounded `topology` manifest. 
It\nfreezes the server's role vocabulary (`api_ingress`, `control_plane`, `matching`,\n`history_projection`, `scheduler`, `execution_plane`), the product's supported\ndeployment shapes (`embedded`, `standalone_server`,\n`split_control_execution`), the roles currently hosted by this node's\nconfigured process class, and the current execution mode. `execution_mode` is\n`remote_worker_protocol` in the default service-mode deployment and switches to\n`local_queue_worker` when `DW_MODE=embedded` routes workflow and activity task\nexecution through local Laravel queue workers. Set\n`DW_SERVER_TOPOLOGY_SHAPE` and `DW_SERVER_PROCESS_CLASS` when a deployment\nsplits control-plane, scheduler, matching, or execution work away from the\ndefault `server_http_node` so discovery reports the live node identity instead\nof a generic HTTP shape. The published Compose artifacts set these per service\nfor the supported `server`, `worker`, and `scheduler` nodes, so `GET /api/cluster/info`\nand local diagnostics report the same node class the operator actually\ndeployed. `topology.matching_role` adds the live matching-role\ndeployment knobs for that node: `queue_wake_enabled`, the matching-role\n`shape` (`in_worker` or `dedicated`), who owns the broad-poll wake\n(`worker_loop` or `dedicated_repair_pass`), the active `task_dispatch_mode`\n(`poll` or `queue`), the frozen `partition_primitives`\n(`connection`, `queue`, `compatibility`, `namespace`), and the durable\n`backpressure_model` (`lease_ownership`). 
The same manifest now also publishes\n`role_catalog` for the current node, the supported process-class assignments\nfor each topology, the durable-write authority boundary for every role, the\nsurface-by-surface authority map for durable tables, the expected degraded\nbehavior for each role failure domain, the scaling axis for each role, and the\nincremental migration steps from today's standalone shape to the split\ncontrol/execution topology.\n\nAuthenticated API routes now also fail closed against that advertised process\nclass. Nodes that do not host the server's current HTTP control surface return\n`503` with `reason: \"topology_role_unavailable\"` on role-gated routes instead\nof pretending to be interchangeable HTTP peers. `GET /api/cluster/info`,\n`/api/health`, and `/api/ready` stay available for discovery and liveness even\non scheduler-only, execution-only, or matching-only nodes. The unauthenticated\nhealth and readiness probes publish the current node's topology summary\n(`schema`, `version`, `current_shape`, `current_process_class`,\n`current_roles`, `execution_mode`, and `matching_role`) so operators can\nidentify split-role nodes without authenticating into the broader\n`/api/cluster/info` manifest.\n\nThose runtime-serving write and poll routes also fail closed on bootstrap\nblockers. If database connectivity or workflow-table migrations are not ready,\nthe workflow index, workflow start/mutation routes, bridge-adapter traffic, and\nworker-protocol traffic return `503` with `reason: \"workflow_v2_blocked\"` plus\n`blocked_by` and `remediation` instead of accepting traffic that depends on an\nincomplete rollout state. Run-scoped debug and history diagnostics stay\navailable so operators can inspect the broken node.\n\nThe explicit `GET /api/cluster/info?include=diagnostics` response includes a versioned\n`coordination_health` manifest for rollout-safety coordination risk. 
It\nsummarizes the current server-wide workflow v2 health status, warning and error\ncheck names, category counts, the normalized check list that already powers the\nbounded readiness gate, and a `routing_drains` summary that lists namespaces and task\nqueues with draining build-id cohorts. The manifest is intentionally\n`all_namespaces` scoped so it describes the server's fleet-wide coordination\nposture; use `GET /api/system/health` for the namespace-scoped\n`routing_drains` view.\n\nExplicit diagnostic discovery also publishes the full operator metrics snapshot at\n`operator_metrics`, scoped to the requested namespace (or\n`server.default_namespace` when no `X-Namespace` header is supplied). The\nsnapshot mirrors the shape served by `GET /api/system/operator-metrics` —\n`runs`, `tasks`, `backlog`, `repair`, `workers` (including the live `fleet`\ndetail), `backend`, `structural_limits`, and `repair_policy` — so the\nstandalone server's diagnostic discovery surface carries namespace-specific backlog,\nworker, and repair detail alongside the fleet-wide `coordination_health`\nmanifest. Use the dedicated `GET /api/system/operator-metrics` endpoint when\noperators need an admin-gated, control-plane-versioned read of the same\nsnapshot.\n\nDefault cluster discovery, liveness, readiness, and runtime bootstrap admission\nnever build this full snapshot. Their database and CPU work stays fixed as\nworkflow history and ready-task cardinality grow. Default discovery retains the\ncompatibility, capability, control-plane, worker-protocol, auth-composition,\nlimits, static contract manifests, and full topology fields required by public\nSDK and CLI preflight. 
Use\n`GET /api/system/health`, `GET /api/system/operator-metrics`, or\n`GET /api/cluster/info?include=diagnostics` when detailed backlog, projection,\nhistory, command-contract, and fleet diagnostics are required.\n\nThe activity-grade external execution surface is published from\n`GET /api/cluster/info` at\n`worker_protocol.external_execution_surface_contract`. That manifest is the\ncarrier-neutral umbrella for durable, bounded, external work: operator,\nplatform, and integration automation first, with script or agent handlers as\nsecondary consumers. It keeps workflow replay, ContinueAsNew, signal/update/query\nordering, and event-history interpretation inside real runtimes. A\nhuman-readable summary lives in `docs/contracts/external-execution-surface.md`.\nHandler mappings are config-first: set `DW_EXTERNAL_EXECUTOR_CONFIG_PATH` to a\n`durable-workflow.external-executor.config` JSON file and, when needed, set\n`DW_EXTERNAL_EXECUTOR_CONFIG_OVERLAY` to apply an environment overlay before\nserver validation. Cluster discovery publishes the config contract and redacted\nruntime diagnostics at `worker_protocol.external_executor_config_contract`.\nWhen a leased activity task matches a valid configured activity mapping by task\nqueue and activity type, the activity poll response includes a redacted\n`task.external_executor` mapping block with the handler, carrier target, auth\nreference, rollout metadata, and config schema version.\n\nThe first concrete invocable carrier contract is published at\n`worker_protocol.invocable_carrier_contract` with carrier type\n`invocable_http`. It is activity-task only: the target endpoint receives the\nexternal task input envelope over `POST` and must return the external task\nresult envelope. 
The server validates `invocable_http` carrier config\nfail-closed, including absolute HTTPS `url` targets, HTTP only for loopback\ndevelopment targets, no embedded URL credentials, `POST` method, bounded\n`timeout_seconds`, optional bounded `retry_policy`, and activity-only\ncapabilities, before mapping it onto pollable activity tasks. The carrier\nmust also resolve an `auth_ref` for non-loopback targets; only loopback HTTP\ndevelopment targets may omit auth. Effective auth remains redacted in cluster\ndiagnostics and activity poll responses.\nThe carrier retry policy is transport-only: it may repeat transient HTTP delivery before a\nresult is reported, while durable activity retry remains owned by the\nserver/runtime after complete/fail reporting.\nFor leased invocable mappings, `task.external_executor.dispatch` also exposes\nthe attempt-level diagnostics needed to reason about one handler call: content\ntypes, configured transport timeout, task deadline fields, idempotency key\nsource, normalized transport retry policy, durable retry authority, failure\nmapping, and the complete/fail result reporting paths.\n\nThe carrier-neutral external task input envelope is published from\n`GET /api/cluster/info` at `worker_protocol.external_task_input_contract`.\nThat manifest explicitly splits its scope: activity tasks are the\nactivity-grade external-execution handler input, while workflow tasks are\npublished for worker-protocol runtime compatibility and drift testing rather\nthan as generic external handler work. Both shapes freeze task identity,\nattempt, queue, handler, workflow/run context, lease metadata, deadlines where\nrelevant, payload metadata, idempotency keys, and versioning rules.\nShared JSON fixtures are embedded in the manifest as artifact objects with\nstable artifact names, media types, SHA-256 digests, and examples. 
A\nhuman-readable summary lives in `docs/contracts/external-task-input.md`.\n\nThe carrier-neutral external task result envelope is published from\n`GET /api/cluster/info` at `worker_protocol.external_task_result_contract`.\nThat manifest freezes success, structured failure, malformed output,\ncancellation, handler crash, decode failure, and unsupported payload outcomes.\nShared result fixtures use the same embedded artifact shape so CLI, SDK, and\nfuture carriers can validate parser behavior without repository-local fixture\npaths. A human-readable summary lives in\n`docs/contracts/external-task-result.md`.\n\nWithin worker protocol version `1.13`, `worker_protocol.version`,\n`server_capabilities.long_poll_timeout`, and\n`server_capabilities.supported_workflow_task_commands` are stable contract\nfields. The command-option booleans under `server_capabilities` are additive\nworker capability fields. Adding new workflow-task commands or optional\ncapability booleans is additive; removing or renaming a command or capability\nrequires a protocol version bump.\n\nWorkflow task polling returns a leased task plus `workflow_task_attempt`. Clients\nmust echo both `workflow_task_attempt` and `lease_owner` on workflow-task\n`heartbeat`, `complete`, and `fail` calls. Workflow-task completion supports\nnon-terminal commands such as `schedule_activity`, `start_timer`,\n`start_child_workflow`, `complete_update`, and `fail_update`, plus terminal\n`complete_workflow`, `fail_workflow`, and `continue_as_new` commands. Workers\nuse `complete_update` with `update_id` and an optional encoded `result`\nafter applying an accepted update, or `fail_update` with `update_id`,\n`message`, and optional exception metadata when the update handler fails. 
Poll\nresponses also expose stable resume\ncontext fields from the durable task payload: `workflow_wait_kind`,\n`open_wait_id`, `resume_source_kind`, `resume_source_id`,\n`workflow_update_id`, `workflow_signal_id`, `workflow_command_id`,\n`signal_name`, `signal_wait_id`, `activity_execution_id`,\n`activity_attempt_id`, `activity_type`,\n`child_call_id`, `child_workflow_run_id`, `workflow_sequence`,\n`workflow_event_type`, `timer_id`, `condition_wait_id`, `condition_key`, and\n`condition_definition_fingerprint`. Signal-backed tasks also expose\n`signal_arguments` as the same codec-tagged payload envelope used by workflow\nstart and activity inputs. Fields that do not apply to the leased task are\n`null`; pure timer resumes set\n`workflow_wait_kind: \"timer\"`, `open_wait_id: \"timer:{timer_id}\"`, and\n`timer_id` so SDK workers can apply timer-fired history directly. Update-backed\ntasks set\n`workflow_wait_kind: \"update\"` and `workflow_update_id` so SDK workers can tie\nthe task to the accepted update they are applying. Signal-backed tasks set\n`workflow_wait_kind: \"signal\"`, `workflow_signal_id`, `signal_name`, and\n`signal_wait_id` so SDK workers can tie the task to the accepted signal or\ntimer-backed signal wait they are applying, while activity-backed resume tasks\nset `workflow_wait_kind: \"activity\"` and `activity_execution_id` so workers can\napply completed or failed activity history without scanning the full event\nstream. Timer-backed condition resumes set `workflow_wait_kind: \"condition\"`,\n`condition_wait_id`, `condition_key`, and\n`condition_definition_fingerprint` when the original wait recorded them. If a\ncancel or terminate command closes the run while a workflow task\nis leased, the next workflow-task\n`history`, `heartbeat`, `complete`, or `fail` response returns the worker\nenvelope with `reason: \"run_closed\"`, `can_continue: false`,\n`cancel_requested: true`, and a concrete `stop_reason` such as `run_cancelled`\nor `run_terminated`. 
The response also includes `run_closed_reason` and\n`run_closed_at` from the durable run so external workers can log the exact\nclosure state that stopped their leased task.\n\nStart-boundary command ordering is part of the worker replay contract. When a\nsignal or update is accepted after the run is persisted but before the first\nworkflow task is polled, the server still records and returns `WorkflowStarted`\nbefore `SignalReceived` or `UpdateAccepted`. SDK workers can initialize workflow\nstate before applying command handlers during replay; commands sent before a\nworkflow ID is bound remain rejected as `instance_not_found`.\n\nActivity task polling returns a leased attempt identity. Clients must echo both\n`activity_attempt_id` and `lease_owner` on activity `complete`, `fail`, and\n`heartbeat` calls. When the activity execution has timeout deadlines configured,\nthe poll response includes a `deadlines` object with ISO-8601 timestamps for\n`schedule_to_start`, `start_to_close`, `schedule_to_close`, and/or `heartbeat`.\nWorkers should use these deadlines to self-cancel before the server enforces the\ntimeout. The server runs `activity:timeout-enforce` periodically to expire\nactivities that exceed their deadlines. 
Heartbeats accept `message`, `current`,\n`total`, `unit`, and `details` fields; the server normalizes them to the package\nheartbeat-progress contract before recording the heartbeat.\nWhen a run-level cancel or terminate command stops a leased activity task,\nheartbeat, complete, and fail responses include `run_closed_reason` and\n`run_closed_at` alongside `cancel_requested: true`.\n\n### Schedules\n- `GET /api/schedules` — List schedules with visibility filters and cursor paging\n- `POST /api/schedules` — Create schedule\n- `GET /api/schedules/{id}` — Describe schedule\n- `PUT /api/schedules/{id}` — Update schedule\n- `DELETE /api/schedules/{id}` — Delete schedule\n- `POST /api/schedules/{id}/pause` — Pause schedule\n- `POST /api/schedules/{id}/resume` — Resume schedule\n- `POST /api/schedules/{id}/trigger` — Trigger immediately\n- `POST /api/schedules/{id}/backfill` — Backfill missed runs\n\nSchedule listing is namespace-scoped and never returns deleted schedules. The\noptional `status` (`active` or `paused`) and `workflow_type` filters are exact\nmatches. `query` accepts equality predicates joined by `AND` for\n`ScheduleId`, `Status`, `WorkflowType`, `TaskQueue`, `Note`, and registered\nsearch-attribute names. String literals use quotes; number and boolean literals\nretain their JSON types. Other operators and fields are rejected instead of\nbeing ignored. All supplied filters combine with AND semantics.\n\n`page_size` defaults to 50 and accepts 1 through 200. Results use the stable\norder `created_at DESC, schedule_id ASC`. A non-null `next_page_token` is an\nopaque, signed keyset cursor; pass it back unchanged with the same namespace,\nstatus, workflow type, and visibility query. `page_size` may change between\npages. A null token terminates traversal. Malformed, cross-namespace,\nfilter-mismatched, and stale cursors return typed errors with `reason`, `field`,\n`errors`, and `last_safe_cursor` evidence. 
A cursor becomes stale when its\nanchor schedule is deleted or no longer matches the original filtered set;\nrestart the traversal without a token in that case.\n\n### Task Queues\n- `GET /api/task-queues` — List task queues\n- `GET /api/task-queues/{name}` — Task queue details, pollers, and recent add/dispatch flow\n\nTask queue responses include an `admission` object so operators can separate\nworker-local capacity from server-side queue and query-task admission limits. Workflow\nand activity entries report active worker count, configured slots from worker\nregistrations, leased and ready counts, available slots, optional server-side\nqueue and namespace active lease caps, optional queue and namespace per-minute\ndispatch caps, optional downstream budget-group dispatch caps, and a status such as\n`accepting`, `throttled`, `saturated`, `no_slots`, or `no_active_workers`. Set\n`DW_WORKFLOW_TASK_MAX_ACTIVE_LEASES_PER_QUEUE` and\n`DW_ACTIVITY_TASK_MAX_ACTIVE_LEASES_PER_QUEUE` to cap active leases per\nnamespace/task queue. Set `DW_WORKFLOW_TASK_MAX_ACTIVE_LEASES_PER_NAMESPACE`\nand `DW_ACTIVITY_TASK_MAX_ACTIVE_LEASES_PER_NAMESPACE` to cap active leases\nacross all task queues in a namespace. Set `DW_WORKFLOW_TASK_MAX_DISPATCHES_PER_MINUTE` and\n`DW_ACTIVITY_TASK_MAX_DISPATCHES_PER_MINUTE` to smooth downstream dispatch per\nnamespace/task queue. Set `DW_WORKFLOW_TASK_MAX_DISPATCHES_PER_MINUTE_PER_NAMESPACE`\nand `DW_ACTIVITY_TASK_MAX_DISPATCHES_PER_MINUTE_PER_NAMESPACE` to smooth\ntenant-wide dispatch across all queues in a namespace, or use\n`DW_TASK_QUEUE_ADMISSION_OVERRIDES` for exact queue and namespace overrides\nkeyed by `namespace:task_queue`, `namespace:*`, `task_queue`, or `*`. 
Override\nentries may set `workflow_tasks.max_active_leases`,\n`workflow_tasks.max_active_leases_per_namespace`,\n`workflow_tasks.max_dispatches_per_minute`,\n`workflow_tasks.max_dispatches_per_minute_per_namespace`,\n`workflow_tasks.dispatch_budget_group`,\n`workflow_tasks.max_dispatches_per_minute_per_budget_group`,\n`activity_tasks.max_active_leases`,\n`activity_tasks.max_active_leases_per_namespace`,\n`activity_tasks.max_dispatches_per_minute`,\n`activity_tasks.max_dispatches_per_minute_per_namespace`,\n`activity_tasks.dispatch_budget_group`, or\n`activity_tasks.max_dispatches_per_minute_per_budget_group`. Give several\nqueues the same `dispatch_budget_group` when they share a rate-limited\ndownstream dependency and should consume one namespace-scoped per-minute\nbudget without throttling every queue in the namespace. Query-task\nentries report `server.query_tasks.max_pending_per_queue`, approximate pending\ncount, remaining capacity, cache-lock support, and whether the queue is\n`accepting`, `full`, or `unavailable`.\n\n### Search Attributes\n- `GET /api/search-attributes` — List search attributes\n- `POST /api/search-attributes` — Register custom attribute\n- `DELETE /api/search-attributes/{name}` — Remove custom attribute\n\n## Authentication\n\nSet the `X-Namespace` header to target a specific namespace (defaults to `default`).\nRequests that name a namespace which is not registered receive a `404` with\n`reason: \"namespace_not_found\"`; register the namespace via\n`POST /api/namespaces` before directing traffic to it. 
The namespace\nadministration endpoints (`/api/namespaces/**`), cluster discovery\n(`/api/cluster/info`), and the unauthenticated `/api/health` and `/api/ready`\nprobes are exempt from this check.\n\n### Token Authentication\n\nFor production, prefer role-scoped tokens:\n\n```env\nDW_AUTH_DRIVER=token\nDW_WORKER_TOKEN=worker-secret\nDW_OPERATOR_TOKEN=operator-secret\nDW_ADMIN_TOKEN=admin-secret\n```\n\n`worker` tokens can call `/api/worker/*` and `/api/cluster/info`. `operator`\ntokens can call workflow, history, schedule, search-attribute, task-queue,\nworker-read, namespace-read, and diagnostic `/api/worker/register` endpoints.\n`admin` tokens can call admin operations such as `/api/system/*`, namespace\ncreate/update/delete, worker deletion, diagnostic `/api/worker/register`, and\ncan also use operator endpoints.\n\n```bash\ncurl -H \"Authorization: Bearer operator-secret\" \\\n     -H \"X-Durable-Workflow-Control-Plane-Version: 2\" \\\n     http://localhost:8080/api/workflows\n```\n\nExisting deployments can keep `DW_AUTH_TOKEN`. When no role tokens\nare configured, that legacy token keeps full API access. Once any role token is\nconfigured, the legacy token is treated as an admin token and no longer grants\nworker-plane access. Set `DW_AUTH_BACKWARD_COMPATIBLE=false` to\nrequire role-scoped credentials only.\n\nFor audit trails that need stable actor names rather than role labels, set\n`DW_PRINCIPAL_TOKENS` to a JSON object or array. 
Each entry maps one bearer\ntoken to a server-derived principal subject and role set; clients cannot\noverride these values with request payloads or headers.\n\n```env\nDW_AUTH_DRIVER=token\nDW_AUTH_BACKWARD_COMPATIBLE=false\nDW_PRINCIPAL_TOKENS='[{\"token\":\"alice-v1\",\"subject\":\"alice\",\"roles\":[\"operator\"],\"label\":\"Alice\"},{\"token\":\"alice-v2\",\"subject\":\"alice\",\"roles\":[\"operator\"],\"label\":\"Alice\"},{\"token\":\"bob-token\",\"subject\":\"bob\",\"roles\":[\"operator\"],\"label\":\"Bob\"},{\"token\":\"worker-token\",\"subject\":\"worker:principal-conformance\",\"roles\":[\"worker\"]}]'\n```\n\nThe same subject may appear on more than one token, which lets operators\nrotate credentials without changing the recorded principal identity.\n\n### Signature Authentication\n\nSignature auth supports the same role split with role-scoped HMAC keys:\n\n```env\nDW_AUTH_DRIVER=signature\nDW_WORKER_SIGNATURE_KEY=worker-hmac-key\nDW_OPERATOR_SIGNATURE_KEY=operator-hmac-key\nDW_ADMIN_SIGNATURE_KEY=admin-hmac-key\n```\n\n```bash\n# HMAC-SHA256 of the request body\ncurl -H \"X-Signature: COMPUTED_SIGNATURE\" \\\n     -H \"X-Durable-Workflow-Control-Plane-Version: 2\" \\\n     http://localhost:8080/api/workflows\n```\n\nThe legacy `DW_SIGNATURE_KEY` follows the same compatibility rule\nas the legacy bearer token.\n\nSet `DW_AUTH_DRIVER=none` to disable authentication (development only).\n\n### Custom Auth Providers\n\nSet `DW_AUTH_PROVIDER` to the fully-qualified class name of a Laravel\ncontainer-resolvable implementation of `App\\Contracts\\AuthProvider` to replace\nthe built-in token/signature provider without editing server middleware. 
The\nprovider returns an `App\\Auth\\Principal` from `authenticate(Request $request)`\nand receives each route authorization decision as\n`authorize(Principal $principal, string $action, array $resource): bool`.\n\nThe route resource includes `allowed_roles`, HTTP method/path, route name/URI,\nnormalized `requested_namespace`, `default_namespace`, route parameters,\n`operation_family`, `operation_name`, and stable identifier fields such as\n`workflow_id`, `run_id`, `signal_name`, `query_name`, `update_name`, `task_id`,\n`query_task_id`, `task_queue`, `worker_id`, `schedule_id`, and\n`search_attribute_name` when those identifiers are present on the route or in\nthe worker request body. This resource is built before namespace existence is\nvalidated, so tenant-aware providers can deny access by namespace or workflow\nresource without reparsing raw paths and without revealing whether a namespace\nexists. The authenticated principal is also recorded in workflow command\nattribution so signal/update/query history can show the subject, roles, tenant,\nand non-secret claims supplied by the provider. 
When `DW_AUTH_PROVIDER` is set,\n`/api/ready` verifies that the class resolves and implements `AuthProvider`;\nbuilt-in token or signature credentials are not required for readiness.\n\n## Deployment\n\n### Docker\n\n```bash\ndocker build -t durable-workflow-server .\nexport DW_AUTH_TOKEN=dev-token\ndocker volume create durable-workflow-sqlite\n\n# Bootstrap schema + default namespace once\ndocker run --rm \\\n  -v durable-workflow-sqlite:/app/database \\\n  -e DW_AUTH_DRIVER=token \\\n  -e DW_AUTH_TOKEN=\"$DW_AUTH_TOKEN\" \\\n  durable-workflow-server server-bootstrap\n\n# Start the API server\ndocker run --rm -p 8080:8080 \\\n  -v durable-workflow-sqlite:/app/database \\\n  -e DW_AUTH_DRIVER=token \\\n  -e DW_AUTH_TOKEN=\"$DW_AUTH_TOKEN\" \\\n  durable-workflow-server\n\n# In a separate terminal, run the database queue worker that fires timers\ndocker run --rm \\\n  -v durable-workflow-sqlite:/app/database \\\n  -e DW_AUTH_DRIVER=token \\\n  -e DW_AUTH_TOKEN=\"$DW_AUTH_TOKEN\" \\\n  durable-workflow-server php artisan queue:work database --sleep=1 --tries=3\n```\n\nThe Dockerfile clones the `durable-workflow/workflow` `2.0.0-alpha.284` tag\ninto the build by default and refreshes the Composer package metadata from that\nsource before installing production dependencies. Use\n`--build-arg WORKFLOW_PACKAGE_SOURCE=...`,\n`--build-arg WORKFLOW_PACKAGE_REF=...`, and\n`--build-arg WORKFLOW_PACKAGE_COMMIT=...` to point the image build at another\nremote or ref while requiring the resolved commit to match the supplied full\nSHA.\n\nThe production image defaults to `DB_CONNECTION=sqlite`,\n`DB_DATABASE=/app/database/database.sqlite`, `QUEUE_CONNECTION=database`, and\n`CACHE_STORE=file` so the plain Docker quickstart works without external\nservices. 
The entrypoint creates the SQLite file when a fresh volume is mounted.\n`server-bootstrap` creates the database queue schema, and the queue-worker\nprocess shown above consumes the persisted timer jobs.\nSQLite uses WAL journal mode and a 5000 ms busy timeout by default. The server\nalso serializes SQLite worker poll claim probes through the polling cache so\nconcurrent PHP/Python workers in the single-container quickstart do not race the\nsame file-backed writer lock. If worker poll endpoints still return\n`reason: backend_lock_pressure`, workers should retry with backoff; sustained\nmulti-worker deployments should use MySQL/PostgreSQL with Redis.\n\nThe standalone server image also reserves PHP request-worker capacity for\nhealth and control-plane routes. Empty workflow and activity worker long-polls\nacquire a short-lived wait slot before sleeping; once the node-local slot cap\nis reached, additional idle polls receive `Retry-After: 1` instead of entering\na tight empty-poll loop or holding another PHP server worker for the full poll\ntimeout. Published Python, Rust, PHP, and other supported workers receive a\nprotocol-compatible HTTP 200 empty response with the\n`long_poll_capacity_exhausted` reason and retry header. The server holds that\ncompatibility response for the advertised one-second cooldown because existing\nworker poll APIs do not all expose retry headers to their loops. This keeps\nworkers alive and bounds immediate empty repolling without admitting another\nfull long-poll wait. Idle query-task\npolls use a separate wait-slot budget, derived to one slot on the default\nstandalone image, so workflow/activity polls cannot starve live workflow queries\nacross the PHP and Python worker queues, and query-task polls cannot consume the\nrequest workers needed by the waiting query request and the worker's completion\ncallback. 
A\npoll that arrives after a query task is pending still claims it immediately\nbefore any wait slot or backpressure response is required. Size\n`PHP_CLI_SERVER_WORKERS` for expected\nconcurrent workflow, activity, and query workers when using the standalone\nserver.\n\nAcross Compose, plain Docker, and Kubernetes, the supported bootstrap contract\nis the same: run the image's `server-bootstrap` command once before starting the\nserver and worker processes. `/api/health` is a liveness check; `/api/ready`\nis the readiness check to gate workers and load balancers. After bootstrap it\nalso evaluates the bounded workflow v2 backend and fleet admission checks, so\n`DW_V2_FLEET_VALIDATION_MODE=fail` incompatibility and error-severity backend\nissues keep the server unready until corrected. Full rollout-safety and\noperator-metrics snapshots remain explicit diagnostic requests and are not\nexecuted by recurring liveness/readiness probes.\n\n### Dedicated Matching-Role Daemon\n\nBy default every queue worker also runs the in-worker matching-role wake on\nevery Looping event, which keeps the broad-poll repair sweep close to the\nworkers that consume tasks. This is the in-worker shape of the matching role\ndescribed in `vendor/durable-workflow/workflow/docs/architecture/task-matching.md`\nand is the right default for small deployments.\n\nLarger deployments can opt execution-only nodes out of the in-worker wake and\nrun the broad sweep as a dedicated process. 
Set\n`DW_V2_MATCHING_ROLE_QUEUE_WAKE=false` on the queue-worker pods or services so\nthey stop broad-polling the durable task table, and run a single dedicated\nmatching-role daemon alongside the cluster:\n\n```bash\ndocker run --rm --name durable-workflow-matching \\\n  --env-file .env \\\n  durable-workflow-server \\\n  php artisan workflow:v2:repair-pass --loop\n```\n\nFor Compose deployments, layer the\n[`docker-compose.dedicated-matching.yml`](docker-compose.dedicated-matching.yml)\noverride on top of `docker-compose.published.yml` to enable the same shape:\n\n```bash\ndocker compose \\\n  -f docker-compose.published.yml \\\n  -f docker-compose.dedicated-matching.yml \\\n  up\n```\n\nThe override sets `DW_V2_MATCHING_ROLE_QUEUE_WAKE=false` on the `server`,\n`worker`, `scheduler`, and `matching` services so every long-running process\nreports the dedicated repair pass as the broad-poll wake owner. It adds a\n`matching` service running `php artisan workflow:v2:repair-pass --loop` so the\nbroad sweep runs in a dedicated process operators can scale and supervise\nindependently of API ingress and execution workers. It also pins\n`DW_SERVER_TOPOLOGY_SHAPE=split_control_execution` on the `server`, `worker`,\n`scheduler`, and `matching` services, with `DW_SERVER_PROCESS_CLASS`\nrespectively set to `control_plane_node`, `execution_node`,\n`scheduler_node`, and `matching_node`. 
That lets the public HTTP service\nadvertise the split control-plane shape while execution, scheduler, and\nmatching nodes each report their own independent role class and a consistent\n`wake_owner=dedicated_repair_pass` contract.\n\nThe daemon respects the watchdog loop throttle on every iteration so multiple\ncooperating matching-role processes coexist without duplicating broad-poll\nwork, sleeps for `DW_V2_TASK_REPAIR_LOOP_THROTTLE_SECONDS` between iterations\n(override with `--sleep-seconds=N`), and traps `SIGTERM`/`SIGINT` for graceful\nshutdown so process supervisors (systemd, supervisord, Docker, Kubernetes)\ncan drain it cleanly between deployments.\n\nOperators can confirm which shape each node is running through the\noperator-metrics snapshot: the `matching_role` block on\n`GET /api/system/metrics` reports `queue_wake_enabled`, `shape` (`in_worker`\nor `dedicated`), the configured `task_dispatch_mode`, the frozen\n`partition_primitives`, and the durable `backpressure_model` per process. The\ncluster-topology manifest reuses the same matching-role contract and adds\n`wake_owner` so operators can see which process class owns the broad-poll wake.\n\n### Publishing Container Images\n\nThe `Release` workflow publishes multi-arch images to\nDocker Hub (`durableworkflow/server`) and GitHub Container Registry\n(`ghcr.io/durable-workflow/server`) when a server semver tag is pushed. 
The\nworkflow builds the server image with the public\n`durable-workflow/workflow:2.0.0-alpha.284` package and verifies that the tag\nresolves to commit `80bef5d9bf01f3282c088b59c433e46b8b146617` before the\nimage can be published.\n\nWhen a later server image needs a newer workflow package fix, publish the\nworkflow tag first, update both Workflow package pins in the release workflow,\nthen tag server:\n\n```bash\n# In the workflow repo, publish the package ref the server image must consume.\ngit tag 2.0.0-alpha.267 origin/v2\ngit push origin refs/tags/2.0.0-alpha.267\n\n# In the server repo, publish the Docker image tags.\ngit tag 0.2.0 origin/main\ngit push origin refs/tags/0.2.0\n```\n\nThe server tag push publishes the exact version plus the semver aliases\ngenerated by the release workflow, including `latest`, to both registries. After\nthe workflow finishes, verify the image provenance and runtime config before\nannouncing the release:\n\n```bash\ndocker pull durableworkflow/server:0.2.0\ndocker run --rm --entrypoint sh durableworkflow/server:0.2.0 -lc \\\n  'cat /app/.package-provenance \u0026\u0026 grep -n \"serializer\" /app/vendor/durable-workflow/workflow/src/config/workflows.php'\n\ndocker pull ghcr.io/durable-workflow/server:0.2.0\ndocker run --rm --entrypoint sh ghcr.io/durable-workflow/server:0.2.0 -lc \\\n  'cat /app/.package-provenance \u0026\u0026 grep -n \"serializer\" /app/vendor/durable-workflow/workflow/src/config/workflows.php'\n```\n\n### Kubernetes\n\nThe published Helm chart in [`k8s/helm/durable-workflow/`](k8s/helm/durable-workflow/)\nis the recommended self-serve path for Kubernetes deployments. The raw\nmanifests remain the inspectable low-level alternative for teams that\nintentionally do not want Helm in the rollout.\n\nBoth paths share the same external-persistence, singleton-scheduler, and\n`/api/ready` readiness contracts. 
Use Helm values, Kustomize overlays, or\ndirect patches for environment-specific names, images, registry secrets, and\nscaling policy.\n\nThe public manifests default to the pinned Docker Hub image\n`durableworkflow/server:0.2`. For production, patch every workload to the exact\nDocker Hub or GHCR tag or digest you intend to run before applying it. See\n[`k8s/README.md`](k8s/README.md) for the raw-manifest support boundary,\n[`docs/helm-validation.md`](docs/helm-validation.md) for the Helm contract and\nvalidation harness, and [`k8s/helm/durable-workflow/docs/UPGRADING.md`](k8s/helm/durable-workflow/docs/UPGRADING.md)\nfor chart upgrade steps.\n\nThe supported apply order is configuration first, migration second, and\nlong-running workloads last. The helper script enforces that order, deletes any\nprevious completed migration job so a new deploy runs bootstrap again, waits for\ncompletion, and only then applies the server, worker, scheduler, and disruption\nbudget manifests:\n\n```bash\nscripts/deploy-k8s.sh\n```\n\nBefore running it, create the externally managed credentials referenced by the\npod templates. 
Keep DB/Redis credentials out of `k8s/secret.yaml`; manage them\nwith your secret manager, External Secrets operator, or `kubectl`:\n\n```bash\n# Required by every pod template.\nkubectl apply -f k8s/namespace.yaml\nkubectl create secret generic durable-workflow-database \\\n  --namespace durable-workflow \\\n  --from-literal=DB_USERNAME=workflow \\\n  --from-literal=DB_PASSWORD='CHANGE_ME'\n\n# Optional; only create this when Redis requires auth.\nkubectl create secret generic durable-workflow-redis \\\n  --namespace durable-workflow \\\n  --from-literal=REDIS_USERNAME='\u003cusername\u003e' \\\n  --from-literal=REDIS_PASSWORD='\u003cpassword\u003e'\n\n# App config and app-level secrets only.\nkubectl apply -f k8s/secret.yaml\n\n# Manual equivalent of scripts/deploy-k8s.sh.\nkubectl apply -f k8s/migration-job.yaml\nkubectl -n durable-workflow wait --for=condition=complete --timeout=300s job/durable-workflow-migrate\n\nkubectl apply -f k8s/server-pdb.yaml\nkubectl apply -f k8s/server-deployment.yaml\nkubectl apply -f k8s/worker-deployment.yaml\nkubectl apply -f k8s/scheduler-cronjob.yaml\n```\n\nThe Deployment manifests omit `spec.replicas` so HorizontalPodAutoscalers and\noperator overlays own replica count. For static installs, set replicas in your\noverlay or with `kubectl scale`.\n\n### Configuration\n\nAll operator-facing configuration is via `DW_*` environment variables.\n`config/dw-contract.php` is the authoritative machine-checkable contract;\nCI (`tests/Unit/EnvContractTest.php`) diffs it against `.env.example`,\n`docker-compose.yml`, and `k8s/secret.yaml` so the three surfaces cannot\ndrift. The Docker entrypoint runs `php artisan env:audit` at boot and\nlogs a warning for any unknown `DW_*` variable and any legacy\n`WORKFLOW_*` / `ACTIVITY_*` name that still resolves.\n\nRules — every `DW_*` name is stable across minor versions. Additions are\nfine; renames require a major bump with the old name alias-honored for\none major. 
Set `DW_ENV_AUDIT_STRICT=1` to fail container boot when the\naudit finds drift.\n\n#### Environment variable reference\n\nThe full table below is generated from `config/dw-contract.php` and lists\nevery operator-facing variable the server honors.\n\n| `DW_*` name | Default | Description |\n| --- | --- | --- |\n| `DW_MODE` | `service` | Server mode: \"service\" (external workers poll) or \"embedded\" (local queue). |\n| `DW_SERVER_ID` | `gethostname()` | Unique identifier for this server instance. |\n| `DW_SERVER_KEY` | generated at container boot | Optional server-internal runtime key. |\n| `DW_DEFAULT_NAMESPACE` | `default` | Namespace used when a request omits the namespace header. |\n| `DW_TASK_DISPATCH_MODE` | (unset) | Override for `workflows.v2.task_dispatch_mode`. Set to `queue` to dispatch locally in service mode. |\n| `DW_EXTERNAL_EXECUTOR_CONFIG_PATH` | (unset) | Optional path to an external executor handler-mapping JSON config. |\n| `DW_EXTERNAL_EXECUTOR_CONFIG_OVERLAY` | (unset) | Optional named overlay to apply before validating the external executor config. |\n| `DW_AUTH_PROVIDER` | (unset) | Optional FQCN implementing `App\\Contracts\\AuthProvider`; unset uses the built-in driver. |\n| `DW_AUTH_DRIVER` | `token` | `none`, `token`, or `signature`. |\n| `DW_AUTH_TOKEN` | (unset) | Single shared bearer token (backward-compat credential). |\n| `DW_SIGNATURE_KEY` | (unset) | HMAC key used when `DW_AUTH_DRIVER=signature` and no role-scoped key is configured. |\n| `DW_WORKER_TOKEN` | (unset) | Bearer token for worker registration, polling, heartbeat, and completion. |\n| `DW_OPERATOR_TOKEN` | (unset) | Bearer token for the operator control plane and diagnostic worker registration; polling remains worker-only. |\n| `DW_ADMIN_TOKEN` | (unset) | Bearer token for the admin control plane and diagnostic worker registration; polling remains worker-only. 
|\n| `DW_PRINCIPAL_TOKENS` | (unset) | JSON token map for named bearer-token principals used by audit attribution. |\n| `DW_WORKER_SIGNATURE_KEY` | (unset) | Role-scoped HMAC key for worker registration, polling, heartbeat, and completion. |\n| `DW_OPERATOR_SIGNATURE_KEY` | (unset) | Role-scoped HMAC key for the operator control plane and diagnostic worker registration; polling remains worker-only. |\n| `DW_ADMIN_SIGNATURE_KEY` | (unset) | Role-scoped HMAC key for the admin control plane and diagnostic worker registration; polling remains worker-only. |\n| `DW_AUTH_BACKWARD_COMPATIBLE` | `true` | Honor `DW_AUTH_TOKEN` / `DW_SIGNATURE_KEY` as a fallback when role credentials are missing. |\n| `DW_TRUST_FORWARDED_ATTRIBUTION_HEADERS` | `false` | Accept forwarded caller/auth headers from a trusted gateway. |\n| `DW_CALLER_TYPE_HEADER` | `X-Workflow-Caller-Type` | Request header carrying the forwarded caller type. |\n| `DW_CALLER_LABEL_HEADER` | `X-Workflow-Caller-Label` | Request header carrying the forwarded caller label. |\n| `DW_AUTH_STATUS_HEADER` | `X-Workflow-Auth-Status` | Request header carrying the forwarded auth status. |\n| `DW_AUTH_METHOD_HEADER` | `X-Workflow-Auth-Method` | Request header carrying the forwarded auth method. |\n| `DW_WORKER_POLL_TIMEOUT` | `30` | Seconds the server holds a poll open. |\n| `DW_WORKER_POLL_INTERVAL_MS` | `1000` | Internal scan interval during an open poll. |\n| `DW_WORKER_POLL_SIGNAL_CHECK_INTERVAL_MS` | `100` | Wake-signal check interval during an open poll. |\n| `DW_POLLING_CACHE_PATH` | `storage/.../server-polling/\u003cAPP_ENV\u003e` | Directory for worker-poll coordination state. |\n| `DW_WAKE_SIGNAL_TTL_SECONDS` | `max(DW_WORKER_POLL_TIMEOUT + 5, 60)` | TTL for per-queue wake signals. |\n| `DW_WORKER_LONG_POLL_MAX_CONCURRENT` | (unset; derived for PHP CLI server) | Optional cap for concurrent held workflow/activity worker long-poll waits on this server node. 
Query-task polls use a separate wait budget so live workflow queries are not starved by idle workflow/activity waits. |\n| `DW_WORKER_LONG_POLL_RESERVED_HTTP_WORKERS` | `2` | PHP CLI server workers reserved for health and control-plane requests when deriving the workflow/activity long-poll wait cap. |\n| `DW_MAX_TASKS_PER_POLL` | `1` | Maximum tasks returned per poll. |\n| `DW_SQLITE_CLAIM_LOCK_TTL_SECONDS` | `10` | Seconds the SQLite quickstart backend holds the cache-backed worker poll claim gate before the lock expires. |\n| `DW_SQLITE_CLAIM_LOCK_WAIT_SECONDS` | `5` | Seconds SQLite worker poll claims wait for the cache-backed claim gate before returning backend lock pressure. |\n| `DW_WORKFLOW_TASK_MAX_ACTIVE_LEASES_PER_QUEUE` | (unset) | Optional server-side cap for active workflow-task leases per namespace/task queue. |\n| `DW_WORKFLOW_TASK_MAX_ACTIVE_LEASES_PER_NAMESPACE` | (unset) | Optional server-side cap for active workflow-task leases across all task queues in a namespace. |\n| `DW_WORKFLOW_TASK_MAX_DISPATCHES_PER_MINUTE` | (unset) | Optional server-side cap for workflow-task dispatches per minute per namespace/task queue. |\n| `DW_WORKFLOW_TASK_MAX_DISPATCHES_PER_MINUTE_PER_NAMESPACE` | (unset) | Optional server-side cap for workflow-task dispatches per minute across all task queues in a namespace. |\n| `DW_ACTIVITY_TASK_MAX_ACTIVE_LEASES_PER_QUEUE` | (unset) | Optional server-side cap for active activity-task leases per namespace/task queue. |\n| `DW_ACTIVITY_TASK_MAX_ACTIVE_LEASES_PER_NAMESPACE` | (unset) | Optional server-side cap for active activity-task leases across all task queues in a namespace. |\n| `DW_ACTIVITY_TASK_MAX_DISPATCHES_PER_MINUTE` | (unset) | Optional server-side cap for activity-task dispatches per minute per namespace/task queue. |\n| `DW_ACTIVITY_TASK_MAX_DISPATCHES_PER_MINUTE_PER_NAMESPACE` | (unset) | Optional server-side cap for activity-task dispatches per minute across all task queues in a namespace. 
|\n| `DW_TASK_QUEUE_ADMISSION_OVERRIDES` | `{}` | JSON overrides keyed by `namespace:task_queue`, `namespace:*`, `task_queue`, or `*` for workflow/activity active lease, dispatch-per-minute, namespace, and downstream budget-group caps. |\n| `DW_DUE_TIMER_RECOVERY_SCAN_LIMIT` | `5` | Max due service-mode timer tasks recovered per worker poll pass. |\n| `DW_EXPIRED_WORKFLOW_TASK_RECOVERY_SCAN_LIMIT` | `5` | Max expired workflow tasks recovered per pass. |\n| `DW_EXPIRED_WORKFLOW_TASK_RECOVERY_TTL_SECONDS` | `5` | Min seconds between expired-task recovery passes. |\n| `DW_WORKER_PROTOCOL_VERSION` | `WorkerProtocol::VERSION` | Override for the advertised worker protocol version. |\n| `DW_HISTORY_PAGE_SIZE_DEFAULT` | `DEFAULT_HISTORY_PAGE_SIZE` | Default page size for worker history reads. |\n| `DW_HISTORY_PAGE_SIZE_MAX` | `MAX_HISTORY_PAGE_SIZE` | Maximum page size honored for worker history reads. |\n| `DW_QUERY_TASK_TIMEOUT` | `min(max(DW_WORKER_POLL_TIMEOUT + 15, 40), 55)` | Seconds the control plane waits for a worker query response. The default covers one worker long-poll cycle plus dispatch grace, while the 55-second ceiling preserves time to return a structured failure before the standard client transport deadline. |\n| `DW_QUERY_TASK_LEASE_TIMEOUT` | `DW_WORKFLOW_TASK_TIMEOUT` | Configured lease timeout for ephemeral query tasks; when `DW_QUERY_TASK_TIMEOUT` is nonzero, effective leases are at least `DW_QUERY_TASK_TIMEOUT + 5` seconds. |\n| `DW_QUERY_TASK_TTL_SECONDS` | `180` | Configured retention floor for query-task result rows; effective retention is at least query timeout + effective lease + 60 seconds. |\n| `DW_QUERY_TASK_MAX_PENDING_PER_QUEUE` | `1024` | Max pending cache-backed query tasks per namespace/task queue before new queries are rejected. |\n| `DW_QUERY_TASK_POLL_MAX_CONCURRENT` | (unset; derived for PHP CLI server) | Optional cap for concurrent held idle query-task worker long-poll waits. 
Pending query tasks are still claimed immediately before an idle poll waits; the default standalone image derives one query-task wait slot. |\n| `DW_WORKFLOW_TASK_TIMEOUT` | `60` | Default workflow-task lease timeout (seconds). |\n| `DW_ACTIVITY_TASK_TIMEOUT` | `300` | Default activity-task lease timeout (seconds). |\n| `DW_WORKER_STALE_AFTER_SECONDS` | `max(DW_WORKER_HEARTBEAT_INTERVAL_SECONDS * 3, 30)` | Seconds before a worker heartbeat is considered stale. |\n| `DW_WORKER_HEARTBEAT_INTERVAL_SECONDS` | `10` | Cadence in seconds advertised to SDK workers for fleet heartbeats. |\n| `DW_MAX_HISTORY_EVENTS` | `50000` | Max history events per run before continue-as-new is enforced. |\n| `DW_HISTORY_RETENTION_DAYS` | `30` | Default retention for closed-run history (namespaces can override). |\n| `DW_MAX_PAYLOAD_BYTES` | `2097152` | Max serialized bytes for a single payload. |\n| `DW_MAX_MEMO_BYTES` | `262144` | Max serialized bytes for a workflow memo. |\n| `DW_MAX_SEARCH_ATTRIBUTES` | `100` | Max search attributes per workflow. |\n| `DW_MAX_PENDING_ACTIVITIES` | `2000` | Max pending activities per run. |\n| `DW_MAX_PENDING_CHILDREN` | `2000` | Max pending child workflows per run. |\n| `DW_COMPRESSION_ENABLED` | `true` | Enable gzip/deflate on JSON responses over the size threshold. |\n| `DW_EXPOSE_PACKAGE_PROVENANCE` | `false` | Include `package_provenance` in `/api/cluster/info` (admin-only). |\n| `DW_PACKAGE_PROVENANCE_PATH` | `\u003cbase_path\u003e/.package-provenance` | Path to the package provenance file written at Docker build time. |\n| `DW_ENV_AUDIT_STRICT` | `0` | When `1`, the entrypoint fails container boot on unknown/legacy DW vars. |\n| `DW_BOOTSTRAP_RETRIES` | `30` | Bootstrap attempts before the entrypoint gives up. |\n| `DW_BOOTSTRAP_DELAY_SECONDS` | `2` | Seconds between bootstrap attempts. 
|\n\nThe bundled `durable-workflow/workflow` package reads the same\n`DW_V2_*` prefix for operator controls; every entry below is resolved\ninside the package's `config/workflows.php` via\n`Workflow\\Support\\Env::dw` and falls back to its legacy\n`WORKFLOW_V2_*` counterpart the same way the server's own vars do.\n\n| `DW_*` name | Default | Description |\n| --- | --- | --- |\n| `DW_V2_NAMESPACE` | (unset) | Scope workflow instances to a namespace. Unset means the default, visible-to-every-consumer namespace. |\n| `DW_V2_CURRENT_COMPATIBILITY` | (unset) | Worker-compatibility marker this worker advertises (e.g. `build-2026-04-17`). |\n| `DW_V2_SUPPORTED_COMPATIBILITIES` | (unset) | Comma-separated marker list the worker accepts, or `*` for any. |\n| `DW_V2_COMPATIBILITY_NAMESPACE` | (unset) | Compatibility namespace for independent fleets sharing one database. |\n| `DW_V2_COMPATIBILITY_HEARTBEAT_TTL` | `30` | Seconds a worker-compatibility heartbeat remains valid. |\n| `DW_V2_PIN_TO_RECORDED_FINGERPRINT` | `true` | Resolve in-flight runs from the fingerprint recorded at WorkflowStarted. |\n| `DW_V2_CONTINUE_AS_NEW_EVENT_THRESHOLD` | `10000` | History event count at which the package signals continue-as-new. |\n| `DW_V2_CONTINUE_AS_NEW_SIZE_BYTES_THRESHOLD` | `5242880` | Serialized-history byte count at which the package signals continue-as-new. |\n| `DW_V2_HISTORY_EXPORT_SIGNING_KEY` | (unset) | Optional HMAC key authenticating history export archives. |\n| `DW_V2_HISTORY_EXPORT_SIGNING_KEY_ID` | (unset) | Optional key identifier recorded alongside signed exports. |\n| `DW_V2_UPDATE_WAIT_COMPLETION_TIMEOUT_SECONDS` | `10` | Seconds the server waits for an update to reach a terminal stage. |\n| `DW_V2_UPDATE_WAIT_POLL_INTERVAL_MS` | `50` | Milliseconds between update-stage polls. |\n| `DW_V2_GUARDRAILS_BOOT` | `warn` | Boot-time structural guardrail mode: `warn`, `fail`, or `silent`. 
|\n| `DW_V2_LIMIT_PENDING_ACTIVITIES` | `2000` | Package-level pending-activity ceiling per run. |\n| `DW_V2_LIMIT_PENDING_CHILDREN` | `1000` | Package-level pending-child ceiling per run. |\n| `DW_V2_LIMIT_PENDING_TIMERS` | `2000` | Package-level pending-timer ceiling per run. |\n| `DW_V2_LIMIT_PENDING_SIGNALS` | `5000` | Package-level pending-signal ceiling per run. |\n| `DW_V2_LIMIT_PENDING_UPDATES` | `500` | Package-level pending-update ceiling per run. |\n| `DW_V2_LIMIT_COMMAND_BATCH_SIZE` | `1000` | Maximum commands accepted per workflow-task completion. |\n| `DW_V2_LIMIT_PAYLOAD_SIZE_BYTES` | `2097152` | Package-level single-payload byte ceiling. |\n| `DW_V2_LIMIT_MEMO_SIZE_BYTES` | `262144` | Package-level memo byte ceiling. |\n| `DW_V2_LIMIT_SEARCH_ATTRIBUTE_SIZE_BYTES` | `40960` | Package-level search-attribute byte ceiling. |\n| `DW_V2_LIMIT_HISTORY_TRANSACTION_SIZE` | `5000` | Package-level history-transaction event ceiling. |\n| `DW_V2_LIMIT_WARNING_THRESHOLD_PERCENT` | `80` | Percent of a structural limit at which the package warns. |\n| `DW_V2_TASK_DISPATCH_MODE` | `queue` | Package-level workflow-task dispatch mode. Usually overridden by the server via `DW_TASK_DISPATCH_MODE`. |\n| `DW_V2_MATCHING_ROLE_QUEUE_WAKE` | `true` | Whether queue workers run the in-worker matching-role wake on every Looping event. Set to `false` to opt execution-only nodes out of the broad-poll wake when a dedicated `php artisan workflow:v2:repair-pass --loop` daemon owns the sweep. |\n| `DW_V2_TASK_REPAIR_REDISPATCH_AFTER_SECONDS` | `3` | Seconds before an orphaned workflow task is redispatched. |\n| `DW_V2_TASK_REPAIR_LOOP_THROTTLE_SECONDS` | `5` | Minimum seconds between successive task-repair passes. |\n| `DW_V2_TASK_REPAIR_SCAN_LIMIT` | `25` | Maximum tasks considered per task-repair pass. |\n| `DW_V2_TASK_REPAIR_FAILURE_BACKOFF_MAX_SECONDS` | `60` | Ceiling on task-repair failure backoff in seconds. 
|\n| `DW_V2_MULTI_NODE` | `false` | Declare multi-node deployment so cache backends are validated for cross-node coordination. |\n| `DW_V2_VALIDATE_CACHE_BACKEND` | `true` | Validate the long-poll cache backend at boot. |\n| `DW_V2_CACHE_VALIDATION_MODE` | `warn` | Cache-backend validation failure handling: `fail`, `warn`, or `silent`. |\n| `DW_V2_FLEET_VALIDATION_MODE` | `warn` | Fleet-compatibility validation handling: `warn` logs, `fail` blocks dispatch and fails closed when no compatible worker is available. |\n| `DW_SERIALIZER` | `avro` | Payload codec diagnostic input. Legacy values are surfaced by `workflow:v2:doctor`; new-run v2 payloads always resolve to Avro. |\n\nLegacy `WORKFLOW_*` / `WORKFLOW_V2_*` / `ACTIVITY_*` names remain\nhonored as fallbacks during the deprecation window so existing\ndeployments keep working; `env:audit` logs a rename hint at boot for\neach one it sees.\n\n### HTTP concurrency (PHP_CLI_SERVER_WORKERS)\n\nThe image's default CMD runs `php artisan serve --no-reload` with\n`PHP_CLI_SERVER_WORKERS=24`. The `--no-reload` flag is required for\nLaravel's built-in server to honour the worker count. Without it the\nserver logs `Unable to respect the PHP_CLI_SERVER_WORKERS environment\nvariable without the --no-reload flag` and falls back to a single\nthread, so one held long-poll connection blocks every other request\nuntil the poll returns.\n\nThe server derives a conservative idle long-poll budget from the worker\ncount. The published default also leaves request capacity for recurring\nliveness probes while eight workflow starts, eight worker polls, workflow\nlisting, readiness, and compatibility discovery are concurrent.\n
Raise the\nworker count further for larger polyglot or multi-worker deployments:\n\n```bash\ndocker run --rm -p 8080:8080 -e PHP_CLI_SERVER_WORKERS=32 \\\n  --env-file .env durable-workflow-server\n```\n\nFor production workloads the `php artisan serve` built-in server is a\nreasonable default but not the ceiling: FrankenPHP, RoadRunner, or an\nnginx/php-fpm pair are all valid replacements and only require\noverriding the container's `CMD`.\n\n## Writing Workers\n\nWorkers in any language connect to the server via HTTP. The protocol is simple:\n\n1. **Register** the worker with supported types\n2. **Poll** for tasks (long-poll, server holds connection)\n3. **Execute** the task locally\n4. **Complete** or **fail** the task back to the server\n5. **Heartbeat** for long-running activities\n\n### PHP (using the SDK)\n\n```php\nuse DurableWorkflow\\Client;\nuse DurableWorkflow\\Worker;\n\n$client = new Client('http://localhost:8080', token: 'WORKER_TOKEN');\n\n$worker = new Worker($client, taskQueue: 'default');\n$worker-\u003eregisterWorkflow(MyWorkflow::class);\n$worker-\u003eregisterActivity(MyActivity::class);\n$worker-\u003erun();\n```\n\n### Python\n\n```python\nimport asyncio\n\nfrom durable_workflow import Client, Worker, workflow, activity\n\n# MyWorkflow and my_activity are your own definitions (not shown here).\n\nasync def main():\n    client = Client(\"http://localhost:8080\", token=\"WORKER_TOKEN\", namespace=\"default\")\n\n    worker = Worker(\n        client,\n        task_queue=\"default\",\n        workflows=[MyWorkflow],\n        activities=[my_activity],\n    )\n    # worker.run() is awaited, so it has to run inside an event loop.\n    await worker.run()\n\nasyncio.run(main())\n```\n\n### Rust\n\nThe Rust SDK is developed and released independently in the\n[durable-workflow/sdk-rust](https://github.com/durable-workflow/sdk-rust)\nrepository. Installation and API guidance are available from the\n[Rust SDK documentation](https://rust.durable-workflow.com/).\n\n## License\n\nMIT\n\n## Public Boundary Checks\n\nThis is a public repository. Do not add private tracker names, workspace-only absolute paths, or loop/lane metadata to files or new commit metadata.\n
Run `scripts/check-public-boundary.sh` before publishing changes; CI runs the same scan on pushes and pull requests.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdurable-workflow%2Fserver","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdurable-workflow%2Fserver","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdurable-workflow%2Fserver/lists"}