{"id":49307645,"url":"https://github.com/45ck/prompt-language","last_synced_at":"2026-04-26T10:03:28.882Z","repository":{"id":345175964,"uuid":"1184804355","full_name":"45ck/prompt-language","owner":"45ck","description":"Programmable runtime for Claude Code with persistent state, context, control flow, and verification.","archived":false,"fork":false,"pushed_at":"2026-04-24T11:29:22.000Z","size":13772,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-24T11:32:18.370Z","etag":null,"topics":["agent","ai","anthropic","automation","claude","claude-code","completion-gates","control-flow","dsl","llm","plugin","prompt-engineering","typescript","workflow"],"latest_commit_sha":null,"homepage":"https://github.com/45ck/prompt-language#readme","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/45ck.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":"docs/roadmap.md","authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null},"funding":{"github":["45ck"]}},"created_at":"2026-03-18T00:24:11.000Z","updated_at":"2026-04-24T11:29:26.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/45ck/prompt-language","commit_stats":null,"previous_names":["45ck/prompt-language"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/45ck/prompt-language","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/45ck%2Fprompt-language","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/45ck%2Fprompt-language/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/45ck%2Fprompt-language/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/45ck%2Fprompt-language/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/45ck","download_url":"https://codeload.github.com/45ck/prompt-language/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/45ck%2Fprompt-language/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32292960,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-26T09:34:17.070Z","status":"ssl_error","status_checked_at":"2026-04-26T09:34:00.993Z","response_time":129,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agent","ai","anthropic","automation","claude","claude-code","completion-gates","control-flow","dsl","llm","plugin","prompt-engineering","typescript","workflow"],"created_at":"2026-04-26T10:03:28.806Z","updated_at":"2026-04-26T10:03:28.876Z","avatar_url":"https://github.com/45ck.png","language":"TypeScript","funding_links":["https://github.com/sponsors/45ck"],"categories":[],"sub_categories":[],"readme":"# @45ck/prompt-language\n\nA verification-first supervision runtime for coding agents. It wraps supported harnesses such as Claude Code and Codex in a persistent state machine with deterministic control flow, verification gates, and state management.\n\n[![npm](https://img.shields.io/npm/v/@45ck/prompt-language)](https://www.npmjs.com/package/@45ck/prompt-language) [![CI](https://github.com/45ck/prompt-language/actions/workflows/quality.yml/badge.svg)](https://github.com/45ck/prompt-language/actions/workflows/quality.yml) [![license](https://img.shields.io/npm/l/@45ck/prompt-language)](LICENSE) [![node](https://img.shields.io/node/v/@45ck/prompt-language)](package.json) [![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](CONTRIBUTING.md) [![npm downloads](https://img.shields.io/npm/dm/@45ck/prompt-language)](https://www.npmjs.com/package/@45ck/prompt-language)\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/hero-flow.svg\" alt=\"prompt-language flow example\" width=\"720\"\u003e\n\u003c/p\u003e\n\n\u003e **In one line.** A thin orchestrator that turns flaky agents into deterministic pipelines by imposing loops, retries, and verification gates the AI cannot self-report past.\n\n## Why it matters — the evidence so far\n\n| Claim                                                                      | Measurement                                                                                                                                                                  | Source                                                                                                                 |\n| -------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------- |\n| PL lifts `qwen3-opencode:30b` above solo aider on 10 independent fixtures. | **6 wins, 0 losses, 3 ties** (H1–H10). Gate-enforced retry turned 7/10 → 10/10 (H2), 0/3 → 3/3 (H5), 0/4 → 4/4 (H8).                                                         | [experiments/aider-vs-pl/SCORECARD.md](experiments/aider-vs-pl/SCORECARD.md)                                           |\n| PL rescues at least one sub-30B model on a coding task.                    | **qwen3:8b on E-SMALL CSV: 5/11 pre-retry → 9/11 after one PL retry edit** (+4 assertions). N=1, replications queued.                                                        | [experiments/aider-vs-pl/rescue-viability/LIVE-NOTES.md](experiments/aider-vs-pl/rescue-viability/LIVE-NOTES.md)       |\n| PL can be used to develop PL itself.                                       | Opencode progress-detector bug diagnosed, patched, and shipped as commit `04367d2` with new Vitest coverage (14/14). The MVP case study for the Level B self-hosting ladder. | [experiments/aider-vs-pl/SELF-HOSTING-THEORY.md](experiments/aider-vs-pl/SELF-HOSTING-THEORY.md)                       |\n| PL does **not** rescue models below the literal-code-emission threshold.   | `gemma4-opencode:{e2b,e4b}` refuted: degenerate decoding (repetition traps, 1/11 on E-SMALL CSV solo and under PL).                                                          | [experiments/aider-vs-pl/LOCAL-MODEL-VIABILITY-FINDINGS.md](experiments/aider-vs-pl/LOCAL-MODEL-VIABILITY-FINDINGS.md) |\n\nFull journey: **[experiments/JOURNEY.md](experiments/JOURNEY.md)** · Capability showcase: **[experiments/POWER-OF-PL.md](experiments/POWER-OF-PL.md)** · Open work: **[experiments/STATUS.md](experiments/STATUS.md)**\n\n## How it works in three bullets\n\n- **Deterministic execution** -- loops, branches, variables, and retries run without AI involvement. The AI only activates at `prompt` nodes. ~85% of execution is deterministic; ~15% is AI.\n- **Verification gates** -- `done when: tests_pass` runs real commands and blocks completion until they pass. The AI cannot self-report \"done.\"\n- **Parallel agents** -- `spawn` launches child processes, `await` collects results, `race` picks the fastest. Variables flow between parent and children automatically.\n\n## Install\n\n```bash\nnpx @45ck/prompt-language\n```\n\nRequires [Claude Code](https://docs.anthropic.com/en/docs/claude-code/overview) and Node.js \u003e= 22. Also works with [Codex CLI](https://github.com/openai/codex), [Gemini CLI](https://github.com/google-gemini/gemini-cli), and other harnesses via `npx @45ck/prompt-language run --runner \u003cname\u003e`.\n\nFor the full Claude Code and Codex walkthrough, including install verification, meta-prompt toggles, skill-aware wrapping, and terminal screenshots, see [docs/guides/claude-code-and-codex.md](docs/guides/claude-code-and-codex.md).\n\n## Example\n\n```\nagents:\n  reviewer:\n    model: \"opus\"\n    skills: \"code-review\", \"security-review\"\n\nflow:\n  let spec = prompt \"Write a detailed technical spec for: ${goal}\"\n  let tasks = prompt \"Break this spec into numbered implementation tasks: ${spec}\"\n\n  while ask \"Are there remaining tasks or unresolved review findings?\" grounded-by \"npm test\" max 10\n    foreach task in ${tasks}\n      retry max 3\n        prompt: Implement ${task}. Follow the spec: ${spec}\n        run: npm test\n        if command_failed\n          prompt: Fix the failing tests for ${task}.\n        end\n      end\n    end\n\n    spawn \"review\" as reviewer\n      prompt: Review all changes against the spec. List any gaps, bugs, or missing edge cases as numbered tasks.\n    end\n    await \"review\" timeout 120\n\n    if ${review.findings} != \"none\"\n      let tasks = prompt \"Convert these review findings into implementation tasks: ${review.findings}\"\n    end\n  end\n\ndone when:\n  all(tests_pass, lint_pass)\n```\n\nThe outer `while ask` loop keeps iterating as long as there are unresolved tasks or review findings. Each iteration implements tasks with retry, then spawns a reviewer -- if the reviewer finds problems, those become the new task list and the loop continues. The flow only exits when the AI judges there's nothing left _and_ real `tests_pass` + `lint_pass` gates confirm it.\n\nInvoke this as a skill from your preferred harness -- Claude Code, Codex CLI, Gemini CLI -- or run it headless in CI with `npx @45ck/prompt-language ci`.\n\nMore examples: [docs/examples](docs/examples/index.md) | [Proof examples](examples/public/) | DSL cheatsheet: [docs/reference/dsl-cheatsheet.md](docs/reference/dsl-cheatsheet.md)\n\n## Features\n\n| Category          | Highlights                                                                                                |\n| ----------------- | --------------------------------------------------------------------------------------------------------- |\n| **Control flow**  | `if`/`else if`/`else`, `while`, `until`, `retry`, `foreach`, `try`/`catch`/`finally`, `break`, `continue` |\n| **Variables**     | `let x = \"literal\"` / `run \"cmd\"` / `prompt \"...\"`, `${x}` interpolation, lists, arithmetic               |\n| **Verification**  | `tests_pass`, `lint_pass`, `file_exists`, custom gates, `all()`/`any()` composition                       |\n| **Agents**        | Named `agents:` with model/skills/profile, `spawn`/`await`, `race`, `send`/`receive`                      |\n| **AI conditions** | `ask \"question\" grounded-by \"command\"` for subjective evaluation with real data                           |\n| **Resilience**    | Persistent state, compaction survival, `snapshot`/`rollback`, `import`/`include`                          |\n\n## Research program\n\nprompt-language is developed in public alongside a live research program. The product code and the research artifacts share this repository on purpose: every claim on this README traces to a specific measurement in `experiments/`.\n\n### Overall goal\n\nFind out **whether a thin, declarative supervisor layer above an existing coding-agent harness can reliably turn imperfect agents (especially cheap/local ones) into dependable pipelines** — without retraining models, without re-implementing harnesses, and without trusting the model's own self-assessment.\n\n### Core hypotheses\n\n| #               | Hypothesis                                                                                                                                    | How we prove or refute it                                                                                                                                                                                                                                                 |\n| --------------- | --------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| **H-LIFT**      | PL's deterministic control flow + verification gates lift a given model above its solo-harness pass rate on coding fixtures.                  | Paired solo-vs-PL arms at fixed model. **Status: Validated** at `qwen3-opencode:30b` on H1-H10 (6-0-3, [SCORECARD](experiments/aider-vs-pl/SCORECARD.md)).                                                                                                                |\n| **H-RESCUE**    | PL rescues lower-capability models — i.e. the lift is largest, not smallest, as model capability drops.                                       | Rescue-delta sweep R1-R10 across models (qwen3:8b, qwen3-opencode:30b) and PL intensities (lite / medium / full). **Status: First signal at 8B**, [LIVE-NOTES](experiments/aider-vs-pl/rescue-viability/LIVE-NOTES.md). Replications + ablation queued.                   |\n| **H-CEILING**   | PL's lift shrinks with task difficulty; at some point the orchestration cannot compensate for missing capability.                             | Difficulty ladder from E-SMALL → H1-H10 → H11 multi-file refactor. **Status: partially confirmed** — qwen3-opencode:30b at 30B drops to 2-3/12 on H11.                                                                                                                    |\n| **H-FLOOR**     | PL cannot rescue models below the literal-code-emission threshold.                                                                            | Tested on `gemma4-opencode:{e2b,e4b}`. **Status: confirmed** (1/11 solo = 1/11 PL; decoding traps are not fixable by orchestration). [Findings](experiments/aider-vs-pl/LOCAL-MODEL-VIABILITY-FINDINGS.md).                                                               |\n| **H-SELF-HOST** | PL can be used to author non-trivial changes to PL itself under authoritative gates.                                                          | Meta-factory program (A-E ladder). **Status: Level 1 evidence** — today's opencode patch was diagnosed, authored, tested, and shipped. Level B (PL authors a PL-src patch end-to-end) designed but not yet run. [Theory](experiments/aider-vs-pl/SELF-HOSTING-THEORY.md). |\n| **H-ARENA**     | A PL + local-model + task-tuned flow can rival a vanilla cloud-harness + frontier-model stack on real coding tasks at a fraction of the cost. | Head-to-head pilot HA-E1 (5 arms on H11 multi-file refactor, $5 API budget cap). **Status: planned, blocked on oracle-isolation runner.** [Plan](experiments/harness-arena-HA-E1-PLAN.md).                                                                                |\n\n### How we prove things (not just \"run and hope\")\n\n1. **Every claim must cite a fixture and a pass count.** No \"it seems to work.\" If it isn't in an oracle's exit code, it doesn't count.\n2. **Gates are executable shell, not AI self-assessment.** `done when: gate x: \u003ccmd\u003e` — the shell's exit code decides, not the model's text. PL hard-stops after 50 consecutive gate failures (`PLO-004`) rather than trust the model's \"done.\"\n3. **Infrastructure defects disqualify measurements.** Today's session surfaced four PL runtime bugs (two aider-runner P1s, one gate-evaluator P1, one concurrent-state risk). Each is filed as an open bead with a reproducer; rescue-viability claims are held weak until they close.\n4. **Replications over single runs.** The first rescue delta on 8B was +4 assertions but `N=1` with observed pre-retry variance of 5-8/11 on the same prompt. The claim is held at \"thin signal\" until three runs are logged and a solo control is measured.\n5. **Falsification stop conditions.** The rescue program has an explicit abandon point at Run 9 (R2-D, qwen3:8b solo on H8): after nine runs, either median rescue ≥ +3/11 AND one PL feature carries ≥ +2/11 (confirmed), or pl-lite ≥ pl-full (refuted: it's just decomposition, not universal wisdom), or ambiguous (publish honestly as \"boilerplate-smoother, not universal wisdom\"). No later experiment can resurrect a refuted thesis.\n6. **The research tree is the evidence tree.** `experiments/` is dated, scorecard-first, and commits to raw artifacts: fixtures, flows, run dirs, audit JSONL, verify outputs. [experiments/POWER-OF-PL.md §9](experiments/POWER-OF-PL.md) states what PL is **not** claiming, explicitly.\n\n### Experiment areas\n\n| Codename          | Dir                                                                                       | Charter                                                                       |\n| ----------------- | ----------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------- |\n| **ladder**        | [`experiments/aider-vs-pl/`](experiments/aider-vs-pl/)                                    | Rung-by-rung solo-vs-PL at one fixed local model (H1-H20).                    |\n| **rescue**        | [`experiments/aider-vs-pl/rescue-viability/`](experiments/aider-vs-pl/rescue-viability/)  | Does PL lift lower-capability models (R1-R10).                                |\n| **atlas**         | [`experiments/ecosystem-analysis/`](experiments/ecosystem-analysis/)                      | Position PL in the OSS coding-agent landscape.                                |\n| **forge**         | [`experiments/meta-factory/`](experiments/meta-factory/)                                  | Can PL develop PL (self-hosting).                                             |\n| **foundry**       | [`experiments/full-saas-factory/`](experiments/full-saas-factory/) et al.                 | End-to-end product-build factories.                                           |\n| **harness-arena** | [`experiments/harness-arena/`](experiments/harness-arena/)                                | Vanilla cloud harness + frontier model vs PL + local model + task-tuned flow. |\n| **crucible**      | [`experiments/bounded-feature-benchmark/`](experiments/bounded-feature-benchmark/) et al. | Narrow stress tests isolating one DSL primitive.                              |\n\nFull codename charter and evidence tiers: [experiments/EXPERIMENT-AREAS.md](experiments/EXPERIMENT-AREAS.md).\n\n### Current work-in-progress\n\nLive beads snapshot: [`experiments/STATUS.md`](experiments/STATUS.md). Two P1 bugs gate trust in further measurements:\n\n- [`prompt-7zyi`](experiments/aider-vs-pl/AIDER-P1-TRIAGE.md) — aider runner walks to parent git dir for path resolution.\n- [`prompt-0zn1`](experiments/STATUS.md#p1--blocking-measurement-integrity) — gate evaluator reports `file_exists` false when file exists on disk.\n\n### Honest limitations (read before citing)\n\n- All local measurements come from one Windows 11 / RX 7600 XT 16 GB host, one model family (qwen3 variants + one gemma probe), and mostly one runner (aider).\n- H11 phase-2 shows the PL lift _shrinks_ as task difficulty rises (solo 2/12 → PL 3/12).\n- The rescue-at-8B signal is N=1 with pending replications and no solo control arm yet.\n- PL does **not** make a small model smarter in a single turn. It adds retries, gates, decomposition, and structure. Capability below the \"literal correct syntax\" bar cannot be rescued ([H-FLOOR confirmed](experiments/aider-vs-pl/LOCAL-MODEL-VIABILITY-FINDINGS.md)).\n\n## CLI commands\n\n| Command                               | What it does                                 |\n| ------------------------------------- | -------------------------------------------- |\n| `npx @45ck/prompt-language`           | Install the runtime                          |\n| `npx @45ck/prompt-language status`    | Check installation                           |\n| `npx @45ck/prompt-language validate`  | Parse, lint, score, and preview a flow       |\n| `npx @45ck/prompt-language run`       | Execute a flow via Claude or headless runner |\n| `npx @45ck/prompt-language ci`        | Run a flow in headless CI mode               |\n| `npx @45ck/prompt-language watch`     | Live TUI flow monitor                        |\n| `npx @45ck/prompt-language init`      | Scaffold a starter flow                      |\n| `npx @45ck/prompt-language demo`      | Print an annotated example                   |\n| `npx @45ck/prompt-language uninstall` | Remove the runtime                           |\n\nFull CLI documentation: [docs/reference/cli-reference.md](docs/reference/cli-reference.md)\n\n## Documentation\n\n### Research and evidence (start here if evaluating PL)\n\n| Topic                    | Link                                                                                                       |\n| ------------------------ | ---------------------------------------------------------------------------------------------------------- |\n| Research journey         | [experiments/JOURNEY.md](experiments/JOURNEY.md)                                                           |\n| Capability showcase      | [experiments/POWER-OF-PL.md](experiments/POWER-OF-PL.md)                                                   |\n| Current work-in-progress | [experiments/STATUS.md](experiments/STATUS.md)                                                             |\n| Experiment areas index   | [experiments/README.md](experiments/README.md)                                                             |\n| Scorecard (H1-H10)       | [experiments/aider-vs-pl/SCORECARD.md](experiments/aider-vs-pl/SCORECARD.md)                               |\n| Rescue-viability roadmap | [experiments/aider-vs-pl/rescue-viability/ROADMAP.md](experiments/aider-vs-pl/rescue-viability/ROADMAP.md) |\n| Self-hosting theory      | [experiments/aider-vs-pl/SELF-HOSTING-THEORY.md](experiments/aider-vs-pl/SELF-HOSTING-THEORY.md)           |\n\n### Using PL as a runtime\n\n| Topic                   | Link                                                                         |\n| ----------------------- | ---------------------------------------------------------------------------- |\n| Getting started         | [docs/guides/getting-started.md](docs/guides/getting-started.md)             |\n| Claude Code and Codex   | [docs/guides/claude-code-and-codex.md](docs/guides/claude-code-and-codex.md) |\n| Language reference      | [docs/reference/index.md](docs/reference/index.md)                           |\n| DSL cheatsheet          | [docs/reference/dsl-cheatsheet.md](docs/reference/dsl-cheatsheet.md)         |\n| How the runtime works   | [docs/guides/guide.md](docs/guides/guide.md)                                 |\n| Architecture and design | [docs/architecture.md](docs/architecture.md)                                 |\n| Security model          | [docs/security.md](docs/security.md)                                         |\n| Examples                | [docs/examples/index.md](docs/examples/index.md)                             |\n| Proof examples          | [examples/public/](examples/public/)                                         |\n| Experiments (catalog)   | [docs/experiments.md](docs/experiments.md)                                   |\n| Troubleshooting         | [docs/operations/troubleshooting.md](docs/operations/troubleshooting.md)     |\n| Roadmap                 | [docs/roadmap.md](docs/roadmap.md)                                           |\n| Full doc index          | [docs/index.md](docs/index.md)                                               |\n\n## Tooling\n\n- **VS Code extension** -- syntax highlighting for `.flow`, `.prompt`, and inline flow blocks. Source in `vscode-extension/`.\n- **GitHub Actions** -- run flows in CI with [`45ck/prompt-language-action`](https://github.com/45ck/prompt-language-action).\n\n## Contributing\n\nSee [CONTRIBUTING.md](CONTRIBUTING.md).\n\n## License\n\nMIT. See [LICENSE](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F45ck%2Fprompt-language","html_url":"https://awesome.ecosyste.ms/projects/github.com%2F45ck%2Fprompt-language","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F45ck%2Fprompt-language/lists"}