# verbinal-compute

[![ci](https://github.com/szautkin/verbinal-execution/actions/workflows/ci.yml/badge.svg)](https://github.com/szautkin/verbinal-execution/actions/workflows/ci.yml)
[![license: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
[![code style: ruff](https://img.shields.io/badge/code%20style-ruff-261230.svg)](https://github.com/astral-sh/ruff)

A CANFAR/Skaha **contributed** interactive session image whose baked-in
entrypoint is a long-lived watcher. The watcher polls a directory under the
launching user's `/arc` home, executes short Python/bash snippets that an
external client (Verbinal) drops there as JSON request files, and writes JSON
result files back. **No shell, no inbound network — the only work channel is
files under `/arc`.**

It runs as a *contributed* session so it lands on the fast interactive pool
(Running in seconds) rather than the headless batch queue. Long/batch work is
out of scope — use headless jobs for that.

## How it runs on Skaha

A contributed session is, by the platform's contract, **a web app on port 5000
launched from `/skaha/startup.sh`**, declared with the
`ca.nrc.cadc.skaha.type="contributed"` label. This image satisfies that contract
*and* does the real work over files:

```
/skaha/startup.sh            ENTRYPOINT (contributed-session launch contract)
├── health_server.py  :5000  liveness web surface so the portal keeps the pod up
└── watcher.sh               the real work loop (file-drop executor)
```

`startup.sh` resolves config once, exports it, launches both processes, and
`wait -n`s on them; if either exits it tears down so the pod restarts.
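The pattern described above (start both processes, wait for either one to exit, tear down the other) can be approximated in Python. This is a sketch only and not the repo's `startup.sh`, which is a shell script. `supervise` is a hypothetical name, and it polls each child with `Popen.poll()` instead of using `wait -n`. It returns the exit code of whichever child exits first. An entrypoint could therefore call, for example, `sys.exit(supervise([["python3", "/opt/verbinal/health_server.py"], ["bash", "/opt/verbinal/watcher.sh"]]))`. Those in-image paths are assumed from the repo layout.

```python
import subprocess
import time


def supervise(cmds, poll_s=0.2, grace_s=5):
    """Start every command; when the first one exits, stop the others.

    Poll-based stand-in for the shell idiom `a & b & wait -n`.
    Returns the exit code of the first child to exit.
    """
    procs = [subprocess.Popen(cmd) for cmd in cmds]
    try:
        while True:
            for p in procs:
                rc = p.poll()  # None while the child is still running
                if rc is not None:
                    return rc
            time.sleep(poll_s)
    finally:
        # Tear down the survivors so the container exits and gets restarted.
        for p in procs:
            if p.poll() is None:
                p.terminate()  # SIGTERM first
        for p in procs:
            try:
                p.wait(timeout=grace_s)
            except subprocess.TimeoutExpired:
                p.kill()  # escalate to SIGKILL after the grace period
                p.wait()
```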

The execution mechanism is the **ENTRYPOINT** because Skaha ignores `cmd/args/env`
for contributed sessions — they are silently dropped.

Skaha provides the runtime (don't configure it): the session runs as the
launching CANFAR user (uid/gid from SSO; any `USER` directive is overridden),
auto-mounts the user's home at `/arc/home/<username>/`, provides `/scratch` for
ephemeral temporary files, and renews a session TTL.

## The file-drop contract

Default layout (relocatable via config, below):

```
$HOME/.verbinal/exec/
├── status.json     # heartbeat/readiness, written by the watcher
├── inbox/          # client writes request files here
├── out/            # watcher writes result files here
└── done/           # watcher moves processed requests here (audit/idempotency)
```

The watcher `mkdir -p`s the whole tree at startup. It is the source of truth
and does not assume the client created it.

**Request** (`inbox/<SAFE_ID>.json`):

```json
{ "id": "req-1", "language": "python", "code": "print(1+1)", "timeout_seconds": 120 }
```

- `id` — client-chosen, opaque, `[A-Za-z0-9._-]`, ≤128 chars. Sanitized before
  any path use (`/:?*<>|"\` → `_`).
  The result echoes the **original** id.
- `language` — `"python"` (→ `python3`) or `"bash"`; anything else → error.
- `code` — UTF-8 string, run from a staged file (never passed on the command line).
- `timeout_seconds` — int, clamped to `[1, timeout_ceiling_seconds]` (default ceiling 900).
- Unknown fields are ignored (forward-compat).

**Result** (`out/<SAFE_ID>.json`):

```json
{ "id":"req-1", "status":"ok", "exit_code":0,
  "stdout":"2\n", "stderr":"", "stdout_encoding":"utf8", "stderr_encoding":"utf8",
  "duration_ms":41, "truncated":false,
  "started_at":"2026-06-02T14:03:11Z", "finished_at":"2026-06-02T14:03:11Z" }
```

- `status` — `ok` (exit 0) / `error` (non-zero, or malformed/unsupported) /
  `timeout` (killed).
- `exit_code` — real exit code; `124` on timeout; `-1` on malformed/unsupported.
- `stdout`/`stderr` — each capped at 256 KiB (`output_cap_bytes`) and truncated
  to keep the **tail** (the end, where tracebacks live), with `truncated:true`.
- `stdout_encoding`/`stderr_encoding` — `"utf8"` (default) when the stream is
  valid UTF-8 (`json.dumps` escapes control characters; truncation prepends a
  `...[truncated N bytes]...` marker), or `"base64"` when the stream contains
  non-UTF-8 bytes, in which case the field holds the base64 of the
  (tail-truncated) raw bytes. This guarantees the result file is always valid
  JSON, so binary output can never produce an unparseable result that hangs the
  client.
  (An absent field means `utf8`, so the format stays backward-compatible.)
- A result is produced for **every** claimed request, including failures and
  timeouts, because the client blocks waiting for it.

**Writes.** The watcher publishes results atomically: write
`<name>.json.partial`, flush + fsync, verify it is non-empty, then `os.replace`
it to `<name>.json`.

**Claiming (no rename on the client side).** Verbinal's cavern/ARC layer has no
move/rename operation: it does a single HTTP `PUT` straight to
`inbox/<SAFE_ID>.json`, so a file can be observed mid-upload. The watcher
therefore **never claims a file until it parses as a complete request**
(structural check: a valid JSON object with `id`/`language`/`code`/`timeout_seconds`
of the right types). A file that doesn't parse yet is skipped and retried on the
next poll. To avoid hanging the client on a genuinely malformed request, a file
that stays unparseable **and byte-stable** (same size and mtime) past a short
grace window (`STABLE_GRACE_SEC`, 3 s) is then claimed and given an `error`
result. An unsupported language is a *complete* request, so it is claimed
immediately and errored. The claim itself is an atomic `mv` from `inbox/` to
`done/`.

**Crash recovery / idempotency:** names are deterministic from the id. If
`out/<SAFE_ID>.json` exists, the request is done and is never re-run. On boot
the watcher re-scans `inbox/` for requests without a result and also re-runs
anything left in `done/` without a result (a crash mid-execution).

## Configuration

One well-known JSON file (default `$HOME/.verbinal/config.json`) carries
settings under a top-level **`verbinal-execution`** key, so the same file can
hold config for other Verbinal components. The file is optional; every field
is optional and falls back to the documented default.
A malformed or missing config file silently yields the defaults.

> **No config file is required.** Verbinal v1 does not write
> `$HOME/.verbinal/config.json`; the watcher runs entirely on the defaults below,
> in particular `exec_dir = $HOME/.verbinal/exec`, a timeout ceiling of `900` s,
> and an output cap of `262144` bytes. The file exists only to relocate or tune
> the watcher later without rebuilding the image.

```json
{
  "verbinal-execution": {
    "exec_dir": "/arc/home/<user>/.verbinal/exec",
    "poll_interval_ms": 1000,
    "output_cap_bytes": 262144,
    "timeout_ceiling_seconds": 900,
    "mem_fraction": 0.75
  }
}
```

| Key | Default | Clamp | Meaning |
|-----|---------|-------|---------|
| `exec_dir` | `$HOME/.verbinal/exec` | — | Where the `inbox/out/done` tree lives. Relative paths resolve against `$HOME`; use this to put the channel on `/arc/projects/...` instead of home. |
| `poll_interval_ms` | `1000` | 100–60000 | Inbox poll cadence. The heartbeat refreshes at least this often (and at least every 2 s), staying fresher than the client's 3×-poll staleness bound. |
| `output_cap_bytes` | `262144` | 1024–1048576 | Per-stream stdout/stderr cap, in bytes, before truncation (the tail is kept). |
| `timeout_ceiling_seconds` | `900` | 1–3600 | Upper clamp on a request's `timeout_seconds`. |
| `mem_fraction` | `0.75` | 0.10–0.95 | Per-request address-space (`ulimit -v`) ceiling as a fraction of the session memory limit, so one request can't OOM-kill the pod. |
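The following is a minimal Python sketch of how the settings above could be resolved. It assumes the semantics stated in this section: a missing or malformed file means all defaults, each numeric field is clamped to its range, and a relative `exec_dir` is joined to `$HOME`. `load_config` and `LIMITS` are hypothetical names, not the API of the repo's `read_config.py`. Falling back to the default for a single wrong-typed or non-finite value is an assumption, since the text only specifies the whole-file case.

```python
import json
import math
import os

# key -> (default, lowest allowed, highest allowed), per the table above
LIMITS = {
    "poll_interval_ms": (1000, 100, 60000),
    "output_cap_bytes": (262144, 1024, 1048576),
    "timeout_ceiling_seconds": (900, 1, 3600),
    "mem_fraction": (0.75, 0.10, 0.95),
}


def load_config(home, path=None):
    """Resolve the verbinal-execution settings, falling back to defaults."""
    path = path or os.path.join(home, ".verbinal", "config.json")
    try:
        with open(path, encoding="utf-8") as f:
            section = json.load(f).get("verbinal-execution", {})
    except (OSError, ValueError, AttributeError):
        section = {}  # missing file, bad JSON, or a non-object top level
    if not isinstance(section, dict):
        section = {}

    cfg = {}
    for key, (default, lo, hi) in LIMITS.items():
        try:
            # Coerce to the default's type (int or float); reject NaN/inf.
            value = type(default)(section.get(key, default))
            if not math.isfinite(value):
                raise ValueError(key)
        except (TypeError, ValueError, OverflowError):
            value = default
        cfg[key] = min(max(value, lo), hi)

    exec_dir = section.get("exec_dir")
    if not isinstance(exec_dir, str) or not exec_dir:
        exec_dir = os.path.join(".verbinal", "exec")
    cfg["exec_dir"] = os.path.join(home, exec_dir)
    return cfg
```

`os.path.join` discards everything before an absolute component. An absolute `exec_dir` under `/arc/projects/...` is therefore used as-is, while a relative one such as `proj/exec` becomes `$HOME/proj/exec`.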

## Health / readiness

`status.json` (composed and published atomically by the watcher) is both a
heartbeat and a **live activity record**:

```json
{ "ready":true, "watcher_version":"1.0.0", "pid":1,
  "languages":["python","bash"], "poll_interval_ms":1000,
  "resolved_home":"/arc/home/<user>", "resolved_user":"<user>",
  "state":"processing", "processed_count":7,
  "current":{ "id":"req-9", "language":"python3", "started_at":"...Z" },
  "last_request":{ "id":"req-8", "status":"ok", "exit_code":0, "finished_at":"...Z" },
  "last_error":null,
  "heartbeat_at":"...Z", "started_at":"...Z" }
```

- `state` — `idle` (polling), `processing` (a request is running; see
  `current`), or `exiting` (after SIGTERM). `current`/`last_request`/`last_error`
  are `null` when not applicable.
- `resolved_home`/`resolved_user` — what the watcher resolved from `$HOME` and
  the `id` command. Verbinal builds its path from its CADC username, so it can
  assert these match and **fail loudly** instead of silently reading a different
  directory.
- The watcher is the **single writer**: it refreshes `heartbeat_at` on every
  poll *and* from within a running request, so the heartbeat never goes stale,
  even during a 900 s request. A genuinely hung watcher still correctly stops
  heartbeating, which an independent heartbeat thread would mask.

`GET http://<pod>:5000/` returns JSON reflecting that state (`state`,
`processed_count`, `current`, `last_request`, `last_error`), plus
**capability discovery** (`python_version` and the installed `packages` that
agent code can import) and the full `watcher_status`:

```json
{ "service":"verbinal-compute", "ready":true, "state":"processing",
  "processed_count":7, "current":{...}, "last_request":{...}, "last_error":null,
  "python_version":"3.11.x", "package_count":142,
  "packages":[
    {"name":"astropy","version":"6.1.0"}, {"name":"numpy","version":"1.26.4"}, ... ],
  "heartbeat_at":"...Z", "started_at":"...Z", "watcher_status":{...} }
```

`packages` is the authoritative list from the *same* interpreter that runs agent
code (`importlib.metadata` over the venv). It is computed **once at startup** and
cached, so listing it adds no per-request cost.

The endpoint returns HTTP 200 when the heartbeat is fresh (`ready:true`) or when
the watcher hasn't written status yet (still starting), and 503 once status
exists but the heartbeat has gone stale. This is the platform
liveness/observability surface only; it performs **no** execution.

## Security & isolation

This image runs arbitrary agent-supplied code **as the CANFAR user**, by design.
The blast radius equals what that user could do at a terminal; the watcher grants
no extra privilege. There is no inbound network and no listening port for the
work channel: the only channel is `/arc`. Each request is bounded by an enforced
wall-clock timeout (`timeout --signal=TERM --kill-after=5s`), `ulimit -t` (CPU
time), `ulimit -v` (address space, a fraction of session RAM), and the output
caps. `id` is treated as untrusted: it is sanitized for paths, never `eval`'d,
and never interpolated into a shell string. Code is always staged to a file, and
the file is run.

## Bundled Python packages (what agent code can import)

Rather than the multi-GB `skaha/astroml` base, this image uses a lean
`python:3.11-slim` base carrying the **star-ai-images science stack** (see
`requirements.txt`), installed into a venv that becomes the default `python3`.
Result: a **~1 GB image** (venv ≈ 870 MB) with the full stack — `numpy`,
`scipy`, `pandas`, `matplotlib`, `astropy`, `astroquery`, `photutils`,
`specutils`, `reproject`, `regions`, `fitsio`, `h5py`, `scikit-learn`,
`scikit-image`, `ipython`, `tqdm`, `pyyaml`, `requests`, and `canfar` — plus
their dependencies (~135 packages total).
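The health endpoint's `packages` list is described as coming from `importlib.metadata` in the agent's own interpreter, computed once and cached. The following is a minimal sketch of that idea. It assumes a plain scan of installed distributions, deduplicated by case-insensitive name. `installed_packages` and `capabilities` are hypothetical names, not `health_server.py`'s actual code.

```python
import functools
import importlib.metadata
import platform


@functools.lru_cache(maxsize=None)
def installed_packages():
    """(name, version) for every distribution this interpreter can see.

    Cached, so site-packages is scanned once per process, not per request.
    """
    seen = {}
    for dist in importlib.metadata.distributions():
        name = dist.metadata.get("Name")
        if name and name.lower() not in seen:
            seen[name.lower()] = (name, dist.metadata.get("Version"))
    return tuple(sorted(seen.values(), key=lambda nv: nv[0].lower()))


def capabilities():
    """Shape the cached list like the health endpoint's discovery fields."""
    pkgs = installed_packages()
    return {
        "python_version": platform.python_version(),
        "package_count": len(pkgs),
        "packages": [{"name": n, "version": v} for n, v in pkgs],
    }
```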

The notebook server (`jupyter`) from the star-ai list is intentionally omitted:
it is large and of no use in a headless session.

**The live, authoritative list is the `:5000` health endpoint's `packages`
field** (see *Health / readiness* above). It is the ground truth from the same
interpreter that runs agent code. Add or remove packages by editing
`requirements.txt` and rebuilding.

## Build & publish

Build for linux/amd64 only (an arm64 image fails at pull time). Published image:

```
images.canfar.net/private-test/verbinal-execution:0.0.1
```

Build a clean single-arch image and push (provenance/SBOM attestations off so
the registry gets a plain amd64 image):

```bash
docker build --platform linux/amd64 --provenance=false --sbom=false \
  -t images.canfar.net/private-test/verbinal-execution:0.0.1 .
docker push images.canfar.net/private-test/verbinal-execution:0.0.1
```

- **Base image** is the build arg `BASE_IMAGE`, defaulting to
  `python:3.11-slim-bookworm` (a multi-stage build compiles into a venv in a
  throwaway builder, so the toolchain never ships). Override it with
  `--build-arg BASE_IMAGE=...` to pin a different base if needed.
- The `ca.nrc.cadc.skaha.type="contributed"` label is required — without it,
  `POST /v1/session?type=contributed` returns HTTP 400.
  The image must be registered in Harbor with the contributed session type.
- The client reads and writes
  `https://ws-uv.canfar.net/arc/files/home/<user>/.verbinal/exec/{out,inbox}/<SAFE_ID>.json`;
  the image only ever sees these as plain files under the exec dir.

## Layout

```
verbinal-execution/             # repo root
├── Dockerfile                  # multi-stage: builder venv -> slim runtime
├── requirements.txt            # the bundled science stack
├── Makefile                    # build / test / push helpers
├── skaha/startup.sh            # contributed-session entrypoint (health + watcher)
├── opt/verbinal/
│   ├── watcher.sh              # the never-exiting file-drop executor
│   ├── parse_request.py        # validate/stage a request (JSON in)
│   ├── build_result.py         # build + atomically publish a result (JSON out)
│   ├── read_config.py          # resolve the verbinal-execution config
│   ├── status_writer.py        # compose + atomically publish status.json
│   └── health_server.py        # :5000 liveness + live-state + packages surface
├── test/
│   ├── checklist.sh            # §7 go/no-go unit checks (drives watcher.sh)
│   ├── integration.sh          # startup.sh + :5000 state + config relocation
│   └── imports.sh              # the bundled science stack actually imports
└── .github/workflows/ci.yml    # build image + run all three suites
```

## How the Verbinal team can check the image

**1. Inspect what's available without launching code — hit `:5000`.** From
inside the session (or via the Skaha-proxied URL), any GET returns the live
state plus the full package list. The slim base has no `curl`, so use Python:

```python
import json, urllib.request
d = json.loads(urllib.request.urlopen("http://127.0.0.1:5000/").read())
print(d["ready"], d["state"], d["python_version"], d["package_count"])
print({p["name"]: p["version"] for p in d["packages"]})   # e.g. astropy 7.2.0, numpy 2.4.6, canfar 1.3.5 ...
```

`ready:true` plus a fresh `heartbeat_at` means the watcher is up; `state` is
`idle`/`processing`/`exiting`; `current`/`last_request`/`processed_count` show
live activity; `packages` is exactly what snippets can import.

**2. End-to-end round-trip over the file channel** (what Verbinal actually
does): write a request and read the result. From inside the session this can be
a local write to the default exec dir:

```python
import json, os, time
req = {"id": "t1", "language": "python", "code": "print(1+1)", "timeout_seconds": 30}
inbox, out = (os.path.expanduser(f"~/.verbinal/exec/{d}/t1.json") for d in ("inbox", "out"))
with open(inbox, "w") as f:  # production: a single HTTP PUT to .../inbox/t1.json
    json.dump(req, f)
while not os.path.exists(out):  # poll; the watcher publishes results atomically
    time.sleep(1)
print(json.load(open(out)))  # per the contract: status "ok", stdout "2\n", ...
```

**3. Verify identity matches** before relying on the channel: compare
Verbinal's CADC-derived path against `resolved_home`/`resolved_user` in
`status.json` (or the health body's `watcher_status`); on a mismatch, fail
loudly.

## Testing locally (maintainers)

Build and run all three suites inside the image (they need GNU coreutils, which
the base provides). `-u 4321:4321` mirrors Skaha assigning an arbitrary,
non-root uid:

```bash
make build && make test
# equivalently:
docker buildx build --platform linux/amd64 --load -t verbinal-execution:dev .
for t in checklist integration imports; do
  docker run --rm -u 4321:4321 -v "$PWD":/src:ro --entrypoint bash \
    verbinal-execution:dev /src/test/$t.sh
done
```

The first build pulls the science stack (~100 s); after that, buildx layer
caching makes iteration fast. Editing `opt/verbinal/` only rebuilds the small
final layers, not the venv. (`BASE_IMAGE` must be a Python image, since the
builder stage creates a venv; the default is `python:3.11-slim-bookworm`.)

CI (`.github/workflows/ci.yml`) runs exactly these three suites on every push
and PR.
Current status: **checklist 37/37, integration 14/14, imports 19/19** on
the default lean image (linux/amd64, run as uid 4321).

## Final validation on CANFAR (go / no-go)

Launch as the real user via Skaha (`type=contributed`) and confirm that:

- the session stays Running for at least 10 min;
- `status.json` and the exec tree appear within seconds, with `ready:true` and
  an advancing `heartbeat_at`;
- `:5000` returns the package list;
- python and bash round-trips return the correct streams;
- a non-zero exit or a timeout does **not** kill the session;
- output larger than 256 KiB is truncated and the result is still valid JSON;
- binary output comes back base64-encoded;
- an unsupported language and malformed JSON both return `error`, and the loop
  continues;
- a relaunch on the same `/arc` home re-runs requests that have no result yet,
  but not requests that already have one.

## License

[MIT](LICENSE). See also [`CONTRIBUTING.md`](CONTRIBUTING.md),
[`SECURITY.md`](SECURITY.md), and [`CHANGELOG.md`](CHANGELOG.md).