{"id":50733742,"url":"https://github.com/synapt-dev/vorn-mat","last_synced_at":"2026-06-10T11:31:06.700Z","repository":{"id":359856383,"uuid":"1245607052","full_name":"synapt-dev/vorn-mat","owner":"synapt-dev","description":"Vorn: Residual Direction, Familial Eviction, and the Granularity Rescue Spectrum — paper + reproducible code, data, and figures (Zenodo DOI: 10.5281/zenodo.20519215)","archived":false,"fork":false,"pushed_at":"2026-06-03T12:54:03.000Z","size":41911,"stargazers_count":0,"open_issues_count":8,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-03T14:22:58.627Z","etag":null,"topics":["attention","eviction","kv-cache","language-models","long-context","machine-learning","nlp","reproducible-research"],"latest_commit_sha":null,"homepage":"https://synapt.dev/vorn-mat/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/synapt-dev.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-21T11:34:11.000Z","updated_at":"2026-06-03T03:48:15.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/synapt-dev/vorn-mat","commit_stats":null,"previous_names":["synapt-dev/vorn-mat"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/synapt-dev/vorn-mat","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/synapt-dev%2Fvorn-mat","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositorie
s/synapt-dev%2Fvorn-mat/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/synapt-dev%2Fvorn-mat/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/synapt-dev%2Fvorn-mat/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/synapt-dev","download_url":"https://codeload.github.com/synapt-dev/vorn-mat/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/synapt-dev%2Fvorn-mat/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34151271,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-10T02:00:07.152Z","response_time":89,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["attention","eviction","kv-cache","language-models","long-context","machine-learning","nlp","reproducible-research"],"created_at":"2026-06-10T11:31:06.643Z","updated_at":"2026-06-10T11:31:06.692Z","avatar_url":"https://github.com/synapt-dev.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Vorn-Mat\n\n[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.20519215.svg)](https://doi.org/10.5281/zenodo.20519215)\n\nReference implementation and result artifacts for the paper *Vorn: Residual Direction, Familial Eviction, and the Granularity Rescue Spectrum* (v1.1, June 2026).\n\nThis 
repository contains the prototype source, all released result artifacts, supplementary analysis scripts, and a runbook for reproducing the headline cells. The companion HuggingFace dataset at [`synapt/vorn-mat-cross-family-results`](https://huggingface.co/datasets/synapt/vorn-mat-cross-family-results) mirrors the result artifacts for citeable, discoverable access independent of this code repository.\n\n## What's in this repository\n\n- **`src/vorn_mat/`**: prototype source.\n  - `vorn.py`: residual-direction scoring at the canonical mid-depth layer (`L* = L // 2`) under a prefill-time cache selection contract with full-prompt visibility.\n  - `baselines/live_eviction.py`: token-level and sentence-level retention policies, plus TOVA-style and H2O-style attention-weight baselines under the same one-shot prefill contract.\n  - `plan.py`, `runner.py`, `remote_exec.py`: experiment plan dataclasses, result-envelope JSON schemas, and Modal job dispatch.\n  - `paired_stats.py`: exact paired McNemar tests over per-fixture observations preserved in each result row.\n- **`results/`**: 67 released JSON artifacts (each paired with a Markdown summary) covering the seven-family active claim panel (Mistral 7B v0.3, Llama 3.1 8B, Ministral 8B, Gemma 2 9B, Gemma 4 E4B-it, Qwen 2.5 7B, Qwen 3-NT 8B), the granularity rescue spectrum, the cross-task validation surface, two observational-boundary entries (Gemma 3 12B-pt, Qwen 3 30B-A3B), and the supporting probes documented in the paper. 
The cross-family finding is family-conditional: five families are channel-tolerant (Mistral, Llama 3.1, Ministral, Gemma 2, Qwen 2.5) and two families are attention-favoring at the shared b=1024 gate (Gemma 4 and Qwen 3-NT 8B).\n- **`scripts/`**: the Appendix A artifact-accounting recompute script (`appendix_a_recompute.py`) and the cross-family statistics script (`vorn_mat_cross_family_stats.py`) referenced in the paper.\n- **`examples/`**: Modal job harness for live-eviction experiments.\n- **`tests/`**: pytest suite covering plan, results, paired_stats.\n- **`docs/RUNBOOK.md`**: environment setup, smoke test, and headline-cell reproduction instructions.\n\n## Quickstart\n\nThe lightweight quickstart installs only the lightweight dev dependencies (`pytest`, `numpy`) and runs the 133 torch-free tests — the plan/result/paired-stats/orchestration layer that doesn't need a GPU:\n\n```bash\npython -m venv .venv \u0026\u0026 source .venv/bin/activate\npip install -e \".[dev]\"\npytest tests/ --ignore=tests/test_live_eviction_runner.py --ignore=tests/test_local_exec.py\n```\n\nFor the full 186-test suite (which exercises the live-eviction runner and the local-execution path through `transformers`), install the `[local]` extras as well. `[local]` pulls torch + transformers + accelerate + datasets + faiss-cpu + sentencepiece + huggingface_hub at the canonical pin set (~5 GB total; CPU-only is fine for running the tests, GPU is needed for the headline cells):\n\n```bash\npip install -e \".[dev,local]\"\npytest tests/\n```\n\nFor local validation against the bundled 5-case NIAH smoke fixture (requires sufficient RAM):\n\n```bash\npip install -e \".[local]\"\npython examples/run_local_vanilla.py --limit 5\n```\n\nFor Modal-backed reproduction of the headline cells in the paper, see `docs/RUNBOOK.md`.\n\n## Reproducibility substrate\n\nThe canonical reproduction path is a hash-locked Docker image built from the\nrepo-root `Dockerfile`. 
The base image is `nvidia/cuda:12.4.1-cudnn-runtime-ubuntu22.04`\nand the Python dependency closure is pinned by `requirements.lock` (generated\nvia `uv pip compile --generate-hashes`).\n\n```bash\ndocker build --platform linux/amd64 -t vorn-mat:canonical .\ndocker run --gpus all -v $(pwd):/app vorn-mat:canonical pytest tests/\n```\n\nThe `--platform linux/amd64` flag is required on macOS arm64 and other non-x86 hosts: the base image `nvidia/cuda:12.4.1-cudnn-runtime-ubuntu22.04` and the pinned torch wheel target Linux x86_64. Without the flag, Docker silently pulls an emulated image (or fails to find one) and reproduction drifts from the canonical Modal-run platform.\n\nThe Modal job entry-points (`examples/run_modal_*.py`) build the same image\nthrough `Image.from_dockerfile(...)`, so local Docker reproduction and Modal\nreproduction share an identical software stack.\n\n### Substrate improvement from failure: multi-token EOS support\n\nA 2026-05-23 Gemma 3 instruct rerun surfaced a harness defect in the\nhand-rolled live-eviction generation loops. Some chat-tuned models expose\nmultiple terminal token ids through `generation_config.eos_token_id`\n(for example `google/gemma-3-12b-it` uses both `\u003ceos\u003e` and\n`\u003cend_of_turn\u003e`). The earlier harness stopped only on the tokenizer's\nsingular `eos_token_id`, which could let the loop consume terminal special\ntokens and decode them away to an empty string. The live-eviction and\nstreaming loops now honor the full terminal-token set from\n`generation_config`.\n\nThe same patch also strengthens metadata isolation in the Modal wrappers by\noverwriting `metadata.model` with the request model id. 
This prevents stale\ncanonical-plan metadata from leaking into rerun artifacts when the request\nmodel differs from the default family.\n\n### Substrate improvement from failure: bf16 live generation stability\n\nThe same Gemma 3 rerun later exposed a deeper numerical-correctness defect:\nmanual live generation on CUDA float16 could emit NaN next-token logits.\nBecause the hand-rolled greedy loop did not suppress non-terminal pad tokens\nor treat NaN logits explicitly, the pad-mask path could surface NaN top\ncandidates and collapse into immediate blank / terminal behavior. Those\nblank constrained-row outputs were harness artifacts, not evidence about\nGemma 3 retrieval behavior.\n\nThe live harness now loads CUDA models in `bfloat16`, casts hidden-state and\nattention tensors back to `float32` before NumPy scoring boundaries, suppresses\nnon-terminal pad tokens during greedy selection, and raises explicitly if a\nrow contains only NaN logits. Live-eviction generation also emits structured\n`generation_step_*` and `token_step` diagnostic lines so token-level failures\nare visible in Modal logs.\n\nVerify-by-fruit anchor: a one-case Gemma 3 control rerun\n(`google/gemma-3-12b-it`, `sentence_vorn`, `B=8192`, `n=1`,\n`max_new_tokens=8`) produced prediction `9375710` with `correct=true`,\n`hit_rate=1.0`, and estimated cost `$0.1119` after the bf16 + cast fix.\n\nTwo-layer note: the Dockerfile installs the `vorn_mat` source via\n`pip install --no-deps -e /app` during image build to prime the editable\ninstall layer. At Modal run time the volume mount overlays the live source,\nso the in-container `vorn_mat` import points at whatever the Modal task\nmounts (not a stale build-time snapshot). 
Local `docker run -v $(pwd):/app`\ngets the same behavior.\n\nTo regenerate `requirements.lock` after an intentional pin change in\n`pyproject.toml`:\n\n```bash\nuv pip compile \\\n    --python-version 3.11 \\\n    --python-platform linux \\\n    --generate-hashes \\\n    pyproject.toml --extra local \\\n    -o requirements.lock\n```\n\nThe `--python-platform linux` flag is load-bearing: without it, the resolver\nruns against the host platform (macOS / Windows) and silently omits the\nLinux+CUDA transitive closure (`cuda-toolkit`, `nvidia-cublas`, etc.). The\nDocker image then fails at `pip install --require-hashes` because those\ntransitives are pulled at install-time but have no hash entries in the lock.\nAlways regenerate with the Linux target since that is what the Dockerfile\ninstalls into.\n\n### Reproducibility disclosure\n\nThe pin set in `pyproject.toml` and `requirements.lock` is a **best-guess\nreconstruction** based on raw-report timestamps and PyPI release chronology.\nThe original public canonical runs did not preserve:\n\n- a `pip freeze` / lockfile inside the prototype\n- a Modal image hash or image id in the result artifacts\n- a persisted `environment_versions` block in the result envelopes\n\nSo the pinned substrate above is the closest defensible reconstruction of the\ncanonical family-wave software stack, not a recovered exact lockfile. 
Two\ndocumented caveats follow from this:\n\n- the earliest May 13 / 14 seed reports likely ran on\n  `huggingface_hub==1.14.0` (since `1.15.0` had not released yet at those\n  timestamps)\n- some late May 20 budget-fill rows may have crossed into\n  `transformers==5.9.0` if Modal rebuilt the image after the `5.9.0` release\n  at `2026-05-20T14:50:45Z`\n\nGoing forward, every cell run on the pinned substrate captures\n`env_versions` (transformers / torch / accelerate / datasets / sentencepiece /\nhuggingface_hub / faiss-cpu) and CUDA peak-memory telemetry\n(`peak_memory_allocated_gb`, `peak_memory_reserved_gb`, `oom_near_miss`) into\nthe result envelope (`vorn_mat.results.RunResult`). This closes the\nprovenance hole for all post-2026-05-23 artifacts.\n\n## Resilience: per-case incremental persistence\n\nEvery baseline runner (`run_vanilla` / `run_vorn` / `run_live_eviction`)\naccepts an `on_case` callback that fires once per completed case with the\ncase's `CaseObservation`. The Modal and local wrappers wire this callback to\n`vorn_mat.results.append_observation`, which appends a single JSONL line +\n`fsync()` to a ledger file at `output_path.with_suffix(\".observations.jsonl\")`\n**before the next case runs**.\n\nWhy this matters: cell runs are minutes-to-an-hour long. Mid-run failures\n(server-side OOM, Modal container kill, network blip, manual kill, account\nGPU cap hit, laptop sleep) used to lose all completed cases because the\nsummary `RunResult` envelope only landed on disk after all cases finished.\nThe per-case ledger persists every completed case incrementally, so a mid-run\nkill at case N of 50 preserves cases 1..N on disk and only cases N+1..50 are\nlost.\n\nTo recover from a mid-run kill:\n- The summary file at `output_path` is missing or partial. Ignore it.\n- The ledger at `output_path.with_suffix(\".observations.jsonl\")` has all\n  completed cases. 
Reload with `vorn_mat.results.load_observations(path)`.\n- Re-run the cell with `case_offset_start` (or your wrapper's equivalent) to\n  resume from case N+1.\n\n### Modal-native parallel cell execution (Layer 5: fire-and-forget + collect)\n\n`examples/run_modal_cells_parallel.py` is the canonical entrypoint for firing\nmany cells in parallel. The local entrypoint defaults to fire-and-forget:\nit calls `orchestrate_wave.spawn(specs)`, persists the resulting `call_id`\nto `wave-state.json`, and exits within seconds. The cloud-side\n`orchestrate_wave` function (decorated with `@app.function(timeout=86400, ...)`)\nruns to completion independent of the spawning local client; it issues\n`binding.remote_fn.spawn()` per cell + per-handle `.get()` collection under\n`max_containers=10` on the cell function binding.\n\nResults are materialized by a separate `examples/collect_modal_wave.py`\nscript that retrieves the wave_report via\n`modal.FunctionCall.from_id(call_id).get()` and writes `reports.json` +\n`failures.json` to the output_dir recorded in `wave-state.json`. The collect\nstep is idempotent: re-running with the same wave-state.json returns the\non-disk artifacts without re-fetching.\n\nWhy this shape (Layer 5 substrate fix, 2026-05-26, second occurrence of the\nModal client-disconnect failure class): Layer 4's single `.remote()` call\ncorrectly protected the cloud function under `--detach`, but `.remote()`\nblocks the local entrypoint while it waits synchronously on the wave_report. If\nlocal disconnects mid-wait, `--detach` keeps the function running but the\nlocal artifact-writing step (reports.json/failures.json) is lost. 
Switching\ndispatch to `.spawn()` + on-disk `wave-state.json` + a separate collect\nscript gives the canonical Modal fire-and-forget pattern: local exits within\nseconds carrying only the call_id; cloud function runs independent of\nspawning client lifecycle; collect step materializes artifacts later.\n\nPer-cell exceptions surface as entries in the JSON-safe wave-report dict\nthat `orchestrate_wave` returns (`{\"reports\": [...], \"failures\": [...]}`),\nso partial-wave failures do not kill the whole batch. Each cell still\nbenefits from per-case persistence on the Modal Volume mount, so even\nwithin a failed cell, completed cases are preserved.\n\nThis pattern replaces, in successive layers:\n- (Layer 1) `pip_install` of loose-pin tuples at image-build time, which\n  could not reproduce the dependency closure that produced canonical results.\n- (Layer 2) user-side parallelism (ThreadPoolExecutor wrapping per-cell\n  `modal run`), which created N independent local-client lifecycles = N\n  independent disconnect-class failure points.\n- (Layer 3) local-entrypoint server-side fanout, which had ambiguous\n  protection semantics under `--detach`.\n- (Layer 4) cloud-side orchestrator via `.remote()`, which protected the\n  cloud function but required local to stay attached for the synchronous\n  wave_report return (the artifact-writing step).\n- (Layer 5) fire-and-forget `.spawn()` + persisted `call_id` + separate\n  idempotent collect; local exits in seconds; collect retrieves later.\n\n```bash\n# Build a cell spec JSON, then fire the wave (returns within seconds):\nmodal run --detach examples/run_modal_cells_parallel.py \\\n  --cell-spec-path .benchmarks/cell-specs.json\n# Output: call_id=fc_..., wave_state_path=.benchmarks/parallel-cells/wave-state.json\n\n# Walk away. Cloud runs to completion. 
Come back later and collect:\npython examples/collect_modal_wave.py \\\n  --wave-state-path .benchmarks/parallel-cells/wave-state.json\n# Output: cells_succeeded=N, cells_failed=M, output_dir=...\n\n# Aggregate per-cell results into per-family canonical artifacts:\npython examples/build_matrix_backfill_artifact.py --family mistral\n```\n\nLegacy sync mode (--wait flag preserves the prior Layer 4 behavior for\nshort waves / interactive dev):\n\n```bash\nmodal run examples/run_modal_cells_parallel.py \\\n  --cell-spec-path .benchmarks/cell-specs.json --wait\n```\n\n### Observability (Layer 4: Modal-visible progress logging)\n\nLong-running cells (30-60min A100 inference) used to go silent after the\nHuggingFace model-load phase. Modal captures stdout to the dashboard and\n`modal app logs` CLI, but cells emitted nothing during the silent inference\nphase, so from outside the local terminal there was no mid-cell progress\nvisibility.\n\nEvery baseline runner (`run_vanilla` / `run_vorn` / `run_live_eviction`)\naccepts a `progress_logger: Callable[[str], None] | None` keyword. When set,\nthe runner emits a line-based progress trace to Modal's stdout capture:\n\n- Once at start: `vorn-mat: dataset_loaded n_cases=N`\n- Per case: `vorn-mat: case I/N correct=true running_accuracy=0.XXX`\n- Once at end: `vorn-mat: complete n_cases=N hit_rate=0.XXX`\n\nThe Modal entry-points (`run_modal_*_niah`) default `progress_logger` to\n`vorn_mat.progress.default_progress_logger`, which prints with `flush=True`\nso Modal sees output immediately rather than buffered until container exit.\nModal auto-timestamps each line in the dashboard, so no local timestamping\nis added. 
To suppress emissions in non-Modal contexts (tests, library use),\npass `progress_logger=None`.\n\n## Reproducing the paper's headline numbers\n\nThe Appendix A v1.1 totals (67 artifacts / 446 counted rows / 342 with observations / 21,600 per-fixture observations / $119.59 / 49.46h) are reproducible from this repository's `results/` directory by running:\n\n```bash\npython scripts/appendix_a_recompute.py\n```\n\nThe script defines the explicit counting contract (which row-array fields are counted versus excluded, and why) and recomputes the totals against the released artifacts directly. The script handles the canonical result-envelope schemas plus the Phase 3 composed-artifact schemas (`phase1_cells`, `phase3_a100_cells`), top-level single-cell diagnostic artifacts, top-level list envelopes, `models[]`/`families[]` wrapper-descent for v0.2 extension-wave artifacts, and nested `row.result.observations[]` descent for rows that wrap a `result` sub-dict. `cells_by_family` (merged-view summary) and failure-list envelopes are excluded by design.\n\nThe paired McNemar p-values cited in the paper are recoverable from the per-fixture `observations[]` arrays in the claim-bearing result rows. 342 of the 446 counted rows carry observations. See Appendix A for the counting contract. For example, the Llama 3.1 vorn cross-task headline cell (sentence-vorn 92/200 versus token-vorn 52/200 at b=1024 on qa_2_4k, paired exact McNemar `p = 1.03e-08`) traces to `results/token-vorn-qa2-cross-task-2026-05-20.json`.\n\n## Citation\n\nIf you use this repository or the released artifacts in your work, please cite the paper:\n\n```bibtex\n@misc{penney2026vorn,\n  title={Vorn: Residual Direction, Familial Eviction, and the Granularity Rescue Spectrum},\n  author={Penney, L.},\n  year={2026},\n  doi={10.5281/zenodo.20519215},\n  howpublished={\\url{https://synapt.dev/vorn-mat/}},\n}\n```\n\n## License\n\nMIT. 
See `LICENSE`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsynapt-dev%2Fvorn-mat","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsynapt-dev%2Fvorn-mat","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsynapt-dev%2Fvorn-mat/lists"}