https://github.com/martelogan/pi-autoclanker

Autoclanker pi extension for Bayesian Agent experiment loops

Host: GitHub
URL: https://github.com/martelogan/pi-autoclanker
Owner: martelogan
License: apache-2.0
Created: 2026-04-13T00:39:26.000Z (2 months ago)
Default Branch: main
Last Pushed: 2026-05-28T00:42:44.000Z (21 days ago)
Last Synced: 2026-05-28T02:22:01.924Z (21 days ago)
Language: TypeScript
Size: 1000 KB
Stars: 1
Watchers: 0
Forks: 0
Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
- Agents: AGENTS.md

Awesome Lists containing this project

awesome-pi-coding-agent - martelogan-pi-autoclanker

README

# pi-autoclanker

### TypeScript-native pi extension for autoclanker

[![Node](https://img.shields.io/badge/node-22%2B-339933?logo=node.js&logoColor=white)](https://nodejs.org/)
[![Interface](https://img.shields.io/badge/interface-pi%20extension-143D59)](#quick-start)
[![Backend](https://img.shields.io/badge/backend-autoclanker-214E34)](https://github.com/martelogan/autoclanker)

**[Install](#install)** ·
**[Quick start](#quick-start)** ·
**[Clankerbench](#clankerbench)** ·
**[Mental model](#mental-model)** ·
**[Live surfaces](#live-surfaces)** ·
**[Commands](#commands)** ·
**[Tools](#tools)** ·
**[Skills](#skills)** ·
**[Hooks](#hooks)** ·
**[Start simple](#start-simple)** ·
**[Optimization loop](#optimization-loop)** ·
**[Why this is different](#why-this-is-different)** ·
**[Files & output](#files--output)** ·
**[Developer](#developer)**

*Start from a rough optimization goal, explore short ideas or full plan files in parallel, keep the eval surface fixed, and let `autoclanker` drive the actual fit loop.*

`pi-autoclanker` is the thin pi layer for
[autoclanker](https://github.com/martelogan/autoclanker). It is meant to feel
simple from the pi side:

- take a goal and rough ideas,
- write a small resumable session surface into your project,
- shell out to `autoclanker` for preview, apply, ingest, fit, suggest, and
commit recommendation.

If you like the optimization flow of
[Autoresearch](https://github.com/karpathy/autoresearch) or
[cEvolve](https://github.com/jnormore/cevolve), this is the same
`idea -> explore -> rethink` routine, but supported by Bayesian typed priors and
the snapshot-eval outer loop harness provided by `autoclanker`, so plans can
stay explicit, be explored in parallel, and be judged against locked eval
feedback instead of dissolving into one long prompt thread.

## Install

You need two things:

1. `autoclanker` on your machine
2. the `pi-autoclanker` extension installed into pi

Install `autoclanker`:

```bash
uv tool install git+https://github.com/martelogan/autoclanker.git
# or: pip install git+https://github.com/martelogan/autoclanker.git
```

Install the extension:

```bash
pi install https://github.com/martelogan/pi-autoclanker
```

For local development instead of the published repo:

```bash
pi install /absolute/path/to/pi-autoclanker
```

## Quick start

Inside a real project:

```bash
/autoclanker start Improve parser throughput without losing context quality.
```

That is the shortest useful path to **initialize or resume the session**. It
materializes the project-local files, previews typed beliefs through
`autoclanker`, and refreshes the widget/dashboard. It does not by itself launch
an autonomous coding loop. After the session starts, give the agent normal
instructions to use the `autoclanker_*` tools, implement candidates, run the
fixed eval surface, ingest measurements, fit, suggest, and repeat until a
measured keep/reject/blocker result exists.

For an unattended long run, use the execution handoff instead:

```bash
/autoclanker run --overnight Improve parser throughput without losing context quality.
```

`run --overnight` initializes or resumes the same files, sets `runIntensity` to
`mega`, records an unattended execution policy, and returns a handoff prompt
that tells the supervising agent not to ask late clarification questions. After
startup, uncertainty should become assumptions, risks, pending comparison
queries, or proposal notes. See [`docs/HEADLESS_AGENT.md`](docs/HEADLESS_AGENT.md)
for non-Pi and enterprise/cloud supervisor usage.

If you do not provide a real eval command yet, `pi-autoclanker` can generate a
default checked-in `autoclanker.eval.sh` stub so the session starts immediately
and stays inspectable.

Rough ideas can go directly in the prompt. If you want a checked-in intake file
instead, store them in `autoclanker.ideas.json`:

```json
{
  "goal": "Improve parser throughput without losing context quality.",
  "ideas": [
    "Cache repeated matcher work.",
    { "id": "context_plan", "path": "plans/context-pair-plan.md" }
  ],
  "constraints": ["Keep output quality stable."]
}
```

Use plain strings for quick ideas, and point at the file directly when one idea
is already a checked-in markdown or text plan. This is one of the main harness
advantages: several rough ideas or fully-fledged plans can stay explicit, be
explored in parallel when practical, and be measured against the same locked
eval feedback instead of getting flattened into a single prompt history.
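
As an illustration of how mixed string and plan-file ideas might be read uniformly, here is a minimal TypeScript sketch. The type names, the `normalizeIdeas` helper, and the positional `idea_N` ids are assumptions made for this example; they are inferred from the intake example above and are not the extension's actual schema or API.

```typescript
// Hypothetical shapes inferred from the autoclanker.ideas.json example;
// the real schema may carry more fields.
type IdeaEntry = string | { id: string; path?: string; text?: string };

interface IdeasIntake {
  goal: string;
  ideas: IdeaEntry[];
  constraints?: string[];
}

interface NormalizedIdea {
  id: string;
  kind: "inline" | "plan_file";
  source: string; // the idea text, or the path to the plan file
}

// Turn mixed string/object ideas into one uniform list.
function normalizeIdeas(intake: IdeasIntake): NormalizedIdea[] {
  return intake.ideas.map((idea, index): NormalizedIdea => {
    if (typeof idea === "string") {
      // Plain strings get a positional id (an assumption of this sketch).
      return { id: `idea_${index + 1}`, kind: "inline", source: idea };
    }
    if (idea.path !== undefined) {
      // An object with a path points at a checked-in plan file.
      return { id: idea.id, kind: "plan_file", source: idea.path };
    }
    return { id: idea.id, kind: "inline", source: idea.text ?? "" };
  });
}
```

Keeping the plan path as the `source` rather than inlining the file contents mirrors the idea that a plan stays an explicit, separately reviewable artifact.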

That intake file is only a convenience surface, not the minimum required input.
`autoclanker.beliefs.json` remains the generated belief surface. When you are
ready for explicit multi-path comparison, add `autoclanker.frontier.json` and
use the frontier commands instead of burying alternative paths in prompt
history. `pi-autoclanker` keeps a concise local label for a plan-backed idea,
but passes a bounded canonicalization view to `autoclanker` while keeping file
path and digest provenance locally in `autoclanker.beliefs.json`.

Keep the starter file intentionally small. If you later want to capture risks,
pairwise preferences, or confidence hints, let
`/skill:autoclanker-advanced-beliefs` pull those out after the first preview
instead of expanding the starter file up front.

If you already want explicit early lanes, keep that as a later-stage
`pathways` shape instead of front-loading it into the first intake example. See
[`examples/parser-demo-expanded/autoclanker.ideas.json`](examples/parser-demo-expanded/autoclanker.ideas.json)
for the explicit-lane form and the expanded demo.

For domain work, do not rely on the package demo surface. Put the domain-local
typed surface and explicit candidate lanes directly in `autoclanker.ideas.json`
or `autoclanker.frontier.json`. A compact domain intake can carry:

```json
{
  "goal": "Improve a domain-specific hot path.",
  "surface_overlay": {
    "registry": {
      "domain.request_boundary": {
        "states": ["baseline", "precompute_settings"],
        "default_state": "baseline",
        "description": "Move repeated request setup work earlier.",
        "surface_kind": "mutation_family",
        "semantic_level": "strategy",
        "materializable": false,
        "origin": "idea_file"
      }
    }
  },
  "ideas": [{ "id": "settings_boundary", "text": "Precompute settings once." }],
  "pathways": [
    {
      "id": "settings_boundary",
      "idea_ids": ["settings_boundary"],
      "genotype": [
        {
          "gene_id": "domain.request_boundary",
          "state_id": "precompute_settings"
        }
      ]
    }
  ]
}
```

When a frontier has more than one lane, `ingest-eval` now requires an explicit
`--candidate-id` or unambiguous `--family-id`; this prevents measurements from
being attributed to a generic current workspace lane. That isolation is per
measurement, not per whole run: after each fit/suggest cycle the frontier can
keep, drop, split, or merge pathways and evaluate those merged candidates under
the same locked eval surface.
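
The binding rule can be pictured with a small TypeScript sketch. The `FrontierLane` and `IngestSelector` shapes and the `resolveIngestLane` helper are hypothetical names for illustration; only the rule itself (a multi-lane frontier needs a candidate id or an unambiguous family id) comes from the text above.

```typescript
// Hypothetical lane and selector shapes for illustration only.
interface FrontierLane {
  candidateId: string;
  familyId: string;
}

interface IngestSelector {
  candidateId?: string;
  familyId?: string;
}

// Pick the single lane a measurement belongs to, or refuse to guess.
function resolveIngestLane(
  frontier: FrontierLane[],
  selector: IngestSelector,
): FrontierLane {
  const byCandidate = selector.candidateId;
  const byFamily = selector.familyId;

  let matches = frontier;
  if (byCandidate !== undefined) {
    matches = frontier.filter((lane) => lane.candidateId === byCandidate);
  } else if (byFamily !== undefined) {
    matches = frontier.filter((lane) => lane.familyId === byFamily);
  }

  const [only, ...rest] = matches;
  if (only !== undefined && rest.length === 0) {
    return only; // exactly one lane: the attribution is unambiguous
  }
  if (matches.length === 0) {
    throw new Error("no lane matches the selector");
  }
  throw new Error(
    "ambiguous ingest target: pass a candidate id or an unambiguous family id",
  );
}
```

With a single-lane frontier an empty selector still resolves; with several lanes the same call throws instead of attributing the measurement to whichever lane happens to be current.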

If you want a guided setup instead of typing everything into a slash command,
start with:

```bash
/skill:autoclanker-create
```

For the LLM-assisted intake path, start with rough notes rather than trying to
hand-author Bayes syntax:

```text
I want to start a pi-autoclanker run from these rough optimization notes.
Ask at most three clarifying questions if needed, then create or update
autoclanker.ideas.json with:

- a short goal
- the rough ideas as strings, markdown-plan paths, or rich idea objects
- the fixed eval command or a checked-in autoclanker.eval.sh surface
- domain-local surface_overlay genes when this is not the package demo domain
- explicit pathways with genotype entries for the first lanes worth comparing

After writing the file, run autoclanker_preview_beliefs, show me the typed
belief/frontier preview, revise once if the lanes are wrong, then apply and
start the measured loop. For multi-candidate frontiers, bind every eval ingest
to a candidateId or unambiguous familyIds selector.
```

That discussion phase is intentionally lightweight. The LLM should help turn
loose markdown, JSON snippets, benchmark notes, and operator constraints into a
reviewable `autoclanker.ideas.json`; `pi-autoclanker` then keeps the resulting
beliefs, frontier, eval surface, progress snapshot, and history inspectable on
disk instead of leaving the setup hidden in chat. During long runs, use the
widget, `/autoclanker status`, and `autoclanker.progress.json` to see the active
command, current lane, iteration count, trust/eval state, and latest measured
summary.

Most runs should keep a finite `maxIterations` so weak searches converge and
summarize instead of spinning. When the intent is a supervised "go as far as
possible" run, use `/autoclanker run --overnight`, set
`runIntensity: "mega"` in `autoclanker.config.json` or
`autoclanker.ideas.json`, or pass `--run-intensity mega` / `--mega`. Mega mode
keeps the locked eval surface and candidate binding rules, but disables the
wrapper's max-iteration stop so the supervisor can continue until all valuable
lanes have a measured keep/reject/blocker result.
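
A minimal sketch of that stop rule, assuming a hypothetical `LoopBudget` shape and `shouldContinue` helper (neither is the wrapper's real API, and the non-mega intensity value `"standard"` used in the usage lines is a placeholder):

```typescript
// Hypothetical budget shape; "mega" is the only intensity value named in the text.
interface LoopBudget {
  runIntensity: string;
  maxIterations: number;
}

// Decide whether the loop should run another iteration.
function shouldContinue(
  budget: LoopBudget,
  completedIterations: number,
  unresolvedLanes: number,
): boolean {
  if (unresolvedLanes === 0) {
    return false; // every lane already has a measured keep/reject/blocker result
  }
  if (budget.runIntensity === "mega") {
    return true; // mega mode ignores the max-iteration stop
  }
  return completedIterations < budget.maxIterations;
}

const finite: LoopBudget = { runIntensity: "standard", maxIterations: 5 };
const mega: LoopBudget = { runIntensity: "mega", maxIterations: 5 };
const finiteContinues = shouldContinue(finite, 5, 2); // false: budget exhausted
const megaContinues = shouldContinue(mega, 5, 2); // true: lanes still unresolved
```

Note that only the iteration cap changes between the two modes; the locked eval surface and candidate binding rules are outside this function and apply either way.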

For a preseeded benchmark workspace, start from the directory that contains the
session files and point at the existing intake file:

```bash
/autoclanker start --ideas-input autoclanker.ideas.json
```

Then paste the workspace prompt, for example:

```text
Read README.md, autoclanker.md, autoclanker.ideas.json, and the benchmark
briefs first. Use the active pi-autoclanker session and its autoclanker_* tools.
Treat bash autoclanker.eval.sh as the fixed eval surface. Do not stop until you
have a measured keep/reject/blocker result.
```

For unattended runs, prefer:

```bash
/autoclanker run --overnight --ideas-input autoclanker.ideas.json
```

## Mental model

The beginner mental model should stay small:

![pi-autoclanker loop](docs/assets/pi-autoclanker-mental-model.svg)

- start from a direct goal or an optional `autoclanker.ideas.json`
- let `autoclanker` preview those ideas as typed beliefs
- evaluate one or more candidate lanes or pathways
- ingest, fit, and ask what to compare next

One vocabulary layer is enough to use the tool well:

![pi-autoclanker structure](docs/assets/pi-autoclanker-structure.svg)

- `optimization lever (gene)`: one explicit knob the upstream adapter exposes
- `setting (state)`: one concrete value of that lever
- `candidate lane` or `pathway`: one concrete combination being evaluated
- `frontier`: the explicit set of lanes under comparison
- `belief`: a claim about one setting, relation, risk, or preference

The engine learns over explicit candidate features and relations, not hidden
prompt state. Status may show backend names or comparison focus, but those are
evidence and debugging details, not required user inputs.
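
To make the vocabulary concrete, the sketch below models levers, settings, and pathways as TypeScript types and checks that a pathway only references registered settings. The field names follow the domain intake example earlier in this README; the `findGenotypeProblems` helper is a hypothetical illustration, not part of the extension.

```typescript
// One optimization lever (gene) and its allowed settings (states).
interface GeneSpec {
  states: string[];
  default_state: string;
}

type GeneRegistry = Record<string, GeneSpec>;

// One concrete setting chosen for one lever.
interface GenotypeEntry {
  gene_id: string;
  state_id: string;
}

// A candidate lane: one concrete combination being evaluated.
interface Pathway {
  id: string;
  genotype: GenotypeEntry[];
}

// Report genotype entries that point at unknown levers or settings.
function findGenotypeProblems(registry: GeneRegistry, pathway: Pathway): string[] {
  const problems: string[] = [];
  for (const entry of pathway.genotype) {
    const gene = registry[entry.gene_id];
    if (gene === undefined) {
      problems.push(`${pathway.id}: unknown gene ${entry.gene_id}`);
      continue;
    }
    if (gene.states.indexOf(entry.state_id) === -1) {
      problems.push(`${pathway.id}: ${entry.gene_id} has no state ${entry.state_id}`);
    }
  }
  return problems;
}
```

A frontier in this picture is simply an array of such pathways, which is what keeps the comparison explicit rather than implied by prompt history.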

See [`docs/MENTAL_MODEL.md`](docs/MENTAL_MODEL.md) for the fuller plain-language
version, including when advanced structure is worth adding and what beginners
can safely ignore.

## What’s included

| Surface | What it gives you |
| --- | --- |
| Extension | pi tools plus the `/autoclanker` command family |
| Skills | beginner creation, advanced belief authoring, and session review |
| Hooks | optional executable before/after eval sidecars for context refresh, notifications, and learnings |
| Local files | resumable checked-in session files plus an optional `autoclanker.ideas.json` intake file |
| Status surface | trust digest, backend choice, and next concrete comparison without digging through raw JSON |
| Clankerbench contracts | generic staged benchmark manifest, schema, TypeScript types, and provider example |
| Upstream artifacts | `.autoclanker//` JSON, reports, and charts from `autoclanker` |

The fastest way to understand the repo is to start with the examples:

- [`examples/targets/parser-quickstart`](examples/targets/parser-quickstart) for
a real packaged parser target and benchmark
- [`examples/minimal`](examples/minimal) for the smallest kickoff shape
- [`examples/parser-demo-expanded`](examples/parser-demo-expanded) for a fuller
worked session after the extension has already materialized local files

## Clankerbench

`clankerbench` is the generic benchmark framework contract that lets a project
describe a staged benchmark harness without putting project-specific logic into
the benchmark contract. It defines the stage vocabulary, manifest schema, and
provider boundary for flows like:

```text
bootstrap -> cohort -> materialize -> analyze -> spec
| |
v v
context -> eval -> compare
|
v
distill -> session
```

The current support is intentionally additive and contract-first:

- [`docs/CLANKERBENCH.md`](docs/CLANKERBENCH.md) describes the methodology and
provider boundary.
- [`schemas/clankerbench.pipeline.schema.json`](schemas/clankerbench.pipeline.schema.json)
defines the JSON manifest.
- [`src/clankerbench.ts`](src/clankerbench.ts) exports the stage constants,
TypeScript types, and lightweight validation helper.
- [`examples/clankerbench-mini`](examples/clankerbench-mini) shows the manifest
shape for a command-backed provider.

Project-specific harnesses should plug in behind provider commands or modules.
Any compatible outer-loop engine can then consume the declared artifacts,
context brief, guardrails, hooks, stop conditions, and eval command;
`clankerbench` explains how the benchmark surface was selected, prepared,
packaged, researched, and checked.

`pi-autoclanker command start` auto-detects `clankerbench.manifest.json` in the
workspace, or accepts `--clankerbench-manifest `. When present, the
manifest can seed the session goal, fixed eval command, guardrails,
max-iteration budget, research sources, and status/evidence paths while
project-specific benchmark logic remains behind provider commands.

## Live surfaces

The wrapper now keeps one shared live model and exposes it through four views:

- a compact widget that appears when the current cwd contains an autoclanker
session (i.e. `autoclanker.config.json`, `autoclanker.md`, or
`autoclanker.history.jsonl` is present at the project root). In directories
without an autoclanker session the widget stays hidden so pi boots silently
outside of optimization work; `/autoclanker` and the keyboard shortcuts
below still surface the widget on demand from anywhere.
- `Ctrl+X` or `Ctrl+Alt+X` for an expanded inline dashboard
- `Ctrl+Shift+X` or `Ctrl+Alt+Shift+X` for a fullscreen overlay
- `/autoclanker export` for the machine-readable bundle and, inside the
interactive extension, a browser dashboard that auto-refreshes while the
extension is driving work
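
The widget's visibility rule can be sketched as a pure check over the file names at the project root. The `hasAutoclankerSession` helper is a hypothetical name for illustration; the three marker files are the ones listed above.

```typescript
// Marker files that indicate an autoclanker session at the project root.
const SESSION_MARKERS = [
  "autoclanker.config.json",
  "autoclanker.md",
  "autoclanker.history.jsonl",
] as const;

// True when at least one marker file is present among the root entries.
function hasAutoclankerSession(rootEntries: readonly string[]): boolean {
  return SESSION_MARKERS.some((marker) => rootEntries.indexOf(marker) !== -1);
}

const inSession = hasAutoclankerSession(["package.json", "autoclanker.md"]);
const outsideSession = hasAutoclankerSession(["package.json", "README.md"]);
```

Here `inSession` is `true` and `outsideSession` is `false`; an `autoclanker.ideas.json` file alone would not count, since it is not one of the listed markers.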

Those views stay grounded in the same four plain-language briefs:

- `Prior Brief`: what the run started with and why those lanes exist
- `Run Brief`: what is being tested now, who leads, and what comparison comes next
- `Posterior Brief`: what the evidence changed after fit and suggest
- `Proposal Brief`: what is ready, blocked, deferred, or waiting for approval

The expanded views include both a frontier decision table and a proposal table.
That is where promoted lanes, pending merges, blocked lanes, and recommended or
approval-ready proposals stay visible while a long run continues.

The browser dashboard and widget stack do not add a second engine. They are
just richer views over the same local files and upstream `autoclanker`
artifacts. When upstream `autoclanker` exposes `session review-bundle`,
`pi-autoclanker` prefers that normalized review model and mirrors it through
status, export, the widget stack, lineage, trust, and next-action panels while
keeping a local-derived fallback for older sessions. Partial upstream review
data is merged with local frontier and artifact state rather than replacing it,
and the wrapper does not leave behind extra `dashboard_payload.json`-style
files by default.

During pi context compaction, the extension emits a deterministic summary from
`autoclanker.md`, beliefs, frontier, proposals, hooks, and recent local history
instead of relying on an LLM to rediscover the run state from chat. New turns
also receive a short active-session prompt pointer so the agent resumes from the
project-local files rather than from stale conversation memory.

## Commands

`pi-autoclanker` exposes one slash-command family:

| Command | Description |
| --- | --- |
| `/autoclanker run ` | Initialize or resume an unattended/headless execution handoff; use `--overnight` for long autonomous runs. |
| `/autoclanker start ` | Initialize or resume the project-local session from a goal; it does not launch autonomous coding by itself. |
| `/autoclanker resume` | Mark the current session active again without changing beliefs. |
| `/autoclanker status` | Summarize the current local session files plus upstream review, trust, lineage, and next-action state. |
| `/autoclanker frontier-status` | Show local frontier state plus upstream frontier summary. |
| `/autoclanker compare-frontier` | Persist or reuse `autoclanker.frontier.json` and compare explicit pathways upstream. |
| `/autoclanker merge-pathways` | Merge selected pathways into the local frontier file and re-rank them upstream. |
| `/autoclanker off` | Disable the current session without deleting resumable files. |
| `/autoclanker clear` | Delete local `pi-autoclanker` files and the upstream session root. |
| `/autoclanker export` | Export the current session bundle as machine-readable JSON, including the normalized review bundle; inside the interactive extension it also opens the browser dashboard. |

Useful examples:

```text
/autoclanker start Reduce API latency without hurting correctness.
/autoclanker run --overnight Reduce API latency without hurting correctness.
/autoclanker compare-frontier
/autoclanker status
/autoclanker export
/autoclanker off
```

## Tools

These are the extension tools available to pi:

| Tool | Description |
| --- | --- |
| `autoclanker_init_session` | Bootstrap local session files and upstream session state. |
| `autoclanker_session_status` | Read resumable local state and ask `autoclanker` for upstream status. |
| `autoclanker_frontier_status` | Read the local frontier file and ask `autoclanker` for upstream frontier status. |
| `autoclanker_preview_beliefs` | Preview or canonicalize rough ideas before apply. |
| `autoclanker_apply_beliefs` | Apply the current belief batch through `autoclanker`. |
| `autoclanker_ingest_eval` | Run optional eval hooks, execute the checked-in eval surface under the locked upstream eval contract, and ingest its JSON result. |
| `autoclanker_fit` | Fit the active upstream `autoclanker` session. |
| `autoclanker_suggest` | Request the next suggestion, optionally against an explicit candidate pool. |
| `autoclanker_compare_frontier` | Persist or reuse `autoclanker.frontier.json`, then compare explicit pathways through upstream `autoclanker`. |
| `autoclanker_merge_pathways` | Merge selected pathways into the local frontier file and ask upstream `autoclanker` to re-rank them. |
| `autoclanker_recommend_commit` | Ask `autoclanker` for a commit recommendation. |

The point of these tools is not to reimplement `autoclanker` in TypeScript. The
extension stays thin and inspectable, while `autoclanker` remains the Bayesian
source of truth.

## Skills

| Skill | Purpose |
| --- | --- |
| `autoclanker-create` | Start from a direct goal or optional `autoclanker.ideas.json`, write the local files, preview beliefs, and initialize the session. |
| `autoclanker-autonomous-supervisor` | Drive unattended or headless execution from the generated handoff without asking late clarification questions. |
| `autoclanker-advanced-beliefs` | Turn rough ideas into compact advanced JSON beliefs by starting with up to three high-yield follow-up questions per round when the beginner path is no longer enough. |
| `autoclanker-hooks` | Add optional `autoclanker.hooks/before-eval.sh` and `after-eval.sh` scripts for eval-adjacent side effects without turning hooks into a second optimizer. |
| `autoclanker-review` | Read the current session and summarize it through the Prior / Run / Posterior / Proposal briefs in plain language. |

The common flow is:

- use `autoclanker-create` first,
- keep rough ideas as plain strings at first,
- move to `autoclanker-advanced-beliefs` only when risks, relations, or
graph-structured priors actually matter.

## Hooks

`pi-autoclanker` also has a small lifecycle hook surface inspired by recent
[`pi-autoresearch`](https://github.com/martelogan/pi-autoresearch) workflow
improvements, adapted to autoclanker's eval-contract model:

```text
autoclanker.hooks/
  before-eval.sh   # optional, executable, runs before autoclanker.eval.sh
  after-eval.sh    # optional, executable, runs after upstream eval ingest
```

Each hook receives one JSON object on stdin with workspace, session, candidate,
frontier, and recent history context. `after-eval.sh` also receives the eval
JSON and upstream ingest result. Hook stdout/stderr are capped, returned from
`autoclanker_ingest_eval`, and recorded in `autoclanker.history.jsonl`.
Non-zero exits and timeouts are visible but do not by themselves fail the eval
lane.
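
The capping and non-fatal handling described above might look roughly like the following TypeScript sketch. The `HookRun` shape, the `OUTPUT_CAP` value, and the record fields are assumptions for illustration; the actual cap and the history record format are not specified here.

```typescript
// Hypothetical result of running one hook script.
interface HookRun {
  hook: "before-eval.sh" | "after-eval.sh";
  exitCode: number | null; // null when the process was killed
  timedOut: boolean;
  stdout: string;
  stderr: string;
}

const OUTPUT_CAP = 2000; // hypothetical character limit

// Truncate long hook output so history lines stay small.
function capOutput(text: string, limit: number = OUTPUT_CAP): string {
  if (text.length <= limit) {
    return text;
  }
  return text.slice(0, limit) + "\n[truncated]";
}

// Build one JSONL line for the local history log.
function toHistoryLine(run: HookRun): string {
  const status = run.timedOut
    ? "timeout"
    : run.exitCode === 0
      ? "ok"
      : "nonzero_exit";
  const record = {
    hook: run.hook,
    status,
    stdout: capOutput(run.stdout),
    stderr: capOutput(run.stderr),
    failsLane: false, // hook failures are visible but never fail the eval lane
  };
  return JSON.stringify(record);
}
```

Recording the status alongside capped output keeps a failing or slow hook inspectable without letting it change the measured result.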

Use hooks for side effects around evidence collection: pull fresh external
context before an expensive lane, append a learnings journal after an ingest,
notify on interesting results, or remind the agent that a lane is missing
notes. Do not use hooks to rewrite the fixed eval contract or hide candidate
selection outside the frontier.

For guided setup:

```bash
/skill:autoclanker-hooks
```

That skill ships reference scripts for frontier reminders, operator-provided
context lookup, anti-thrash nudges, idea rotation, learnings journals,
machine-readable evidence digests, and local macOS notifications.

## Start Simple

You do not need advanced Bayes JSON or a complex population file to begin. The
smallest useful input is still just a goal, a few rough ideas, and optional
constraints:

```text
goal: lower latency without reducing quality
rough ideas:
- cache repeated work
- try batch sizes 16 / 32 / 64
- reduce allocation churn
constraints:
- keep output quality stable
- keep the eval surface fixed while comparing paths
```

That is enough to preview beliefs and start a session. `autoclanker.beliefs.json`
can keep those as plain strings at first. Candidate-pool JSON, graph
directives, and advanced belief authoring stay opt-in until the search actually
needs them. If you prefer a reusable intake file, the same beginner shape fits
naturally in `autoclanker.ideas.json`, but direct prompt input stays the
default path.

## Optimization Loop

`pi-autoclanker` should feel easy to start from rough optimization ideas:

```text
rough ideas
  ↓ preview as typed beliefs
candidate lanes: [A] [B] [A+B]
  ↓ evaluate available lanes in parallel when practical
eval JSON per lane
  ↓ ingest -> fit
ranked candidates + influence notes + next query
  ↓ keep / merge / split / drop lanes
next era
```

That loop is the core product:

- start from rough ideas, not hand-authored Bayes syntax
- keep isolated paths and combined paths explicit instead of burying them in
prompt history
- evaluate candidates against a fixed checked-in `autoclanker.eval.sh` surface,
with the wrapper threading the locked upstream eval contract into that shell
before ingest and supporting explicit parallel lane exploration when you have
the workers to do it
- keep the locked contract and frontier summary visible in `status` and
`export`, so trust drift and lane counts stay inspectable
- keep the currently active objective backend, acquisition backend, and any
concrete lane-vs-lane follow-up query visible in the wrapper summary instead
of hiding them in upstream JSON
- use `fit`, `suggest`, and `recommend-commit` to decide whether to drop,
merge, split, or strengthen lanes in the next era

If you already like the
[Autoresearch](https://github.com/karpathy/autoresearch) or
[cEvolve](https://github.com/jnormore/cevolve) intuition, the important
difference is that `pi-autoclanker` can run that same search loop while also
recording typed beliefs, explicit relations, and machine-readable uncertainty.

An evolve-style epoch still maps cleanly:

```text
Era 0 lanes: [A], [B], [C], [A+B]
  -> evaluate available lanes in parallel
  -> rank, compare, and query the interesting differences
  -> keep / merge / split / drop lanes
Era 1 lanes: [A], [B+C], [A+B], [A+B+C], ...
```
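
The era transition above can be sketched in TypeScript by treating each lane as a list of idea ids. The `LaneDecision` type and the `nextEra` helper are hypothetical names for this example; in practice the keep/merge/drop decisions would come from the fit and suggest results, not from a hard-coded list.

```typescript
// A decision about one lane (or pair of lanes) at the end of an era.
type LaneDecision =
  | { kind: "keep"; lane: string[] }
  | { kind: "drop"; lane: string[] }
  | { kind: "merge"; left: string[]; right: string[] };

// Render a lane as a label such as "A+B".
function laneLabel(lane: readonly string[]): string {
  return [...lane].sort().join("+");
}

// Union two lanes without duplicating shared ideas.
function mergeLanes(left: readonly string[], right: readonly string[]): string[] {
  const merged: string[] = [];
  for (const id of [...left, ...right]) {
    if (merged.indexOf(id) === -1) {
      merged.push(id);
    }
  }
  return merged.sort();
}

// Apply the decisions and return the next era's lane labels.
function nextEra(decisions: LaneDecision[]): string[] {
  const labels: string[] = [];
  for (const decision of decisions) {
    let label: string | undefined;
    if (decision.kind === "keep") {
      label = laneLabel(decision.lane);
    } else if (decision.kind === "merge") {
      label = laneLabel(mergeLanes(decision.left, decision.right));
    }
    // "drop" contributes no lane to the next era.
    if (label !== undefined && labels.indexOf(label) === -1) {
      labels.push(label);
    }
  }
  return labels;
}

const eraOne = nextEra([
  { kind: "keep", lane: ["A"] },
  { kind: "merge", left: ["B"], right: ["C"] },
  { kind: "keep", lane: ["A", "B"] },
  { kind: "merge", left: ["A", "B"], right: ["C"] },
]);
// eraOne is ["A", "B+C", "A+B", "A+B+C"]
```

The point of the sketch is that every combined lane is an explicit value that can be ranked and compared, rather than a side effect of a long prompt thread.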

## Why This Is Different

`pi-autoclanker` should give pi users a thin, inspectable path into
`autoclanker`:

- gather rough optimization ideas from a user
- turn them into previewable `autoclanker` sessions
- help escalate rough ideas into advanced Bayes declarations when needed
- structure multiple candidate pathways explicitly so `autoclanker` can rank,
compare, and query them
- keep a resumable project-local session surface
- expose thin tools and skills rather than reimplementing the Bayesian engine

`pi-autoclanker` is not meant to compete with a loose planning chat on
free-form brainstorming alone. The value shows up when you want the exploration
to stay structured and comparable:

- rough ideas become inspectable belief batches instead of disappearing into
prompt history
- candidate lanes can stay explicit instead of getting buried inside a single
prompt thread
- `suggest` can evaluate an explicit candidate pool so several pathways can be
ranked and compared together
- advanced beliefs can express when pathways should reinforce, combine with, or
stay separate from each other through explicit priors and graph directives
- the checked-in eval shell stays fixed for the life of a session, so long
optimization loops cannot quietly rewrite that local eval surface mid-run
- `fit`, `suggest`, and `recommend-commit` keep the downstream reasoning
machine-readable through ranked candidates, follow-up queries, backend
selection, and influence summaries when upstream provides them

If you want the simplest mental model, treat `pi-autoclanker` as a strict
superset of an evolve-style workflow:

| Evolve-style intuition | `pi-autoclanker` equivalent |
| --- | --- |
| idea list | rough ideas and canonical belief batch |
| population | explicit candidate pool |
| crossover | explicit combined candidates or positive graph links |
| mutation | revised candidate variants or updated belief parameters |
| fitness run | eval shell -> ingest -> fit |
| rethink pass | revise beliefs, candidate lanes, or both for the next era |
| winner | ranked candidate plus commit recommendation |

What Bayes adds on top of that loop:

- ideas can reinforce each other, conflict, or stay intentionally separate
through explicit priors and graph directives
- confidence and risk can be encoded directly instead of staying implicit in
prompt prose
- follow-up queries can say what evidence would most reduce uncertainty next
- a candidate pool can emulate 1:1 evolution epochs, but the belief layer can
also explain why a combination should exist, not just whether it happened to
score well once

That is the main claim of the project: `autoclanker` should make
[Autoresearch](https://github.com/karpathy/autoresearch) or
[cEvolve](https://github.com/jnormore/cevolve)-style exploration easy to
reproduce, while also making the search space more inspectable, more expressive,
and easier to hand off honestly.

## Files & output

Every session keeps five always-present core files, plus four optional
surfaces: an explicit frontier file, a project-local proposal mirror, an ideas
intake file, and a hooks directory:

| File | Purpose |
| --- | --- |
| `autoclanker.md` | Human-readable summary of the current session state. |
| `autoclanker.config.json` | Wrapper config, including the upstream session root. |
| `autoclanker.beliefs.json` | Rough or advanced beliefs for the session. |
| `autoclanker.eval.sh` | The checked-in eval surface for this session. |
| `autoclanker.frontier.json` | Optional reviewable local frontier for multi-path runs. |
| `autoclanker.proposals.json` | Optional project-local mirror of the active session proposal ledger once proposal state exists. |
| `autoclanker.history.jsonl` | Local chronological wrapper log. |
| `autoclanker.ideas.json` | Optional user-authored intake file for a goal, ideas, constraints, and simple pathway seeds. |
| `autoclanker.hooks/` | Optional user-authored executable hooks around eval ingestion. |

Those files live at the project root. They are enough for local inspection and
lightweight handoff.

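A run has three layers: the wrapper-local files at the project root (the first
six entries below), the upstream summary and report bundle, and the deeper
upstream artifacts: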

- `autoclanker.md`: the wrapper-local summary at the project root
- `autoclanker.proposals.json`: the durable active-session proposal mirror when
proposal state exists
- `autoclanker.history.jsonl`: the local chronological log of what the wrapper
did
- `autoclanker.frontier.json`: the optional local frontier document for
explicit path comparison and merges
- `autoclanker.ideas.json`: the optional user-authored intake surface that can
seed beliefs and, when present, a first frontier draft
- `autoclanker.hooks/`: optional before/after eval scripts; hook output is
returned from ingest and logged locally
- `.autoclanker//RESULTS.md` plus the session PNGs: the upstream
summary and visual report bundle
- `.autoclanker//...`: the deeper upstream JSON and YAML artifacts when
you want posterior details, influence summaries, queries, or export,
including `observations.jsonl`, `posterior_summary.json`,
`influence_summary.json`, and `query.json`

That means the public story stays simple even though the underlying model is
stronger than a plain evolve loop. You can begin from plain strings,
read the summary, and only drop into the deeper artifact tree when you actually
need the extra structure.

The upstream session root, usually `.autoclanker//`, keeps the deeper
`autoclanker` artifacts:

- `RESULTS.md`
- `observations.jsonl`
- `posterior_summary.json`
- `influence_summary.json`
- `query.json`
- `belief_graph_prior.png`
- `belief_graph_posterior.png`

`pi-autoclanker` also snapshots the checked-in `autoclanker.eval.sh` surface at
session start, passes the locked upstream eval contract into that shell at
ingest time, and refuses eval ingest if the local eval file drifts during the
life of the session.
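
One way to picture the drift check is a content digest taken at session start and compared again before each ingest. The sketch below assumes SHA-256 over the file contents and uses hypothetical helper names (`lockEvalSurface`, `assertNoEvalDrift`); the wrapper's actual digest scheme and contract format may differ.

```typescript
import { createHash } from "node:crypto";

// Digest recorded when the session starts (hypothetical lock shape).
interface EvalLock {
  digest: string;
}

// Hash the eval script contents; SHA-256 is an assumption of this sketch.
function digestEvalSurface(contents: string): string {
  return createHash("sha256").update(contents, "utf8").digest("hex");
}

// Snapshot the eval surface at session start.
function lockEvalSurface(contents: string): EvalLock {
  return { digest: digestEvalSurface(contents) };
}

// Refuse to ingest when the eval file no longer matches the locked digest.
function assertNoEvalDrift(lock: EvalLock, currentContents: string): void {
  const current = digestEvalSurface(currentContents);
  if (current !== lock.digest) {
    throw new Error(
      `eval surface drifted: locked ${lock.digest.slice(0, 12)}, ` +
        `current ${current.slice(0, 12)}`,
    );
  }
}
```

Because the comparison is on file contents, even a whitespace-only edit to `autoclanker.eval.sh` would count as drift under this sketch.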

`/autoclanker status` also surfaces the wrapper-side trust and frontier state
directly:

- locked vs current eval-contract digest
- eval drift status
- last measured eval lease / stabilization summary when upstream has executed a
hardened eval
- objective backend and acquisition backend from the latest upstream status and
suggestion artifacts
- concrete comparison focus when upstream asks a candidate-vs-candidate or
family-vs-family follow-up question
- compared lane count
- frontier family count
- pending queries
- pending merge suggestions
- the normalized Prior / Run / Posterior / Proposal briefs
- a machine-readable dashboard model, proposal ledger mirror, evidence views,
and resume metadata

If you compare that with a lighter `cevolve`-style run directory:

| `cevolve`-style artifact | `autoclanker` equivalent or stronger |
| --- | --- |
| `config.json` | `autoclanker.config.json` |
| `ideas.json` | rough ideas plus `autoclanker.beliefs.json` |
| `population.json` | candidate pool plus ranked candidates and posterior state |
| `history.jsonl` | `autoclanker.history.jsonl` plus upstream `observations.jsonl` |
| `RESULTS.md` | upstream `.autoclanker//RESULTS.md` plus wrapper-local `autoclanker.md` |
| chart PNGs | upstream `convergence.png`, `candidate_rankings.png`, `belief_graph_prior.png`, and `belief_graph_posterior.png` |

The upstream session root now emits the small report bundle directly after
`fit`, `suggest`, or `recommend-commit`, and it can be refreshed explicitly
with `autoclanker session render-report`. The underlying JSON and YAML
artifacts are still there when you need the deeper Bayesian state.

The graph and report views are meant as evidence views, not as required inputs:

![pi-autoclanker evidence views](docs/assets/pi-autoclanker-evidence-views.svg)

- prior graph: what the session believed before evidence
- posterior graph: what still looks plausible after evals
- candidate rankings: which lanes look strongest right now
- convergence: whether new evals are still changing the picture

## Example demos

The shipped examples now separate the real runnable target from the wrapper-side
session tiers:

- [`examples/targets/parser-quickstart`](examples/targets/parser-quickstart):
packaged parser app, benchmark harness, eval shell, and candidate pool
- [`examples/minimal`](examples/minimal): smallest useful kickoff shape,
centered on direct prompt input or `autoclanker.ideas.json`, intended to be
used with the packaged parser target
- [`examples/parser-demo-expanded`](examples/parser-demo-expanded): fuller
worked session with `autoclanker.ideas.json`, `candidates.json`,
`autoclanker.proposals.json`, the four-brief summary, and a checked-in eval
surface for that same packaged target

Use `examples/targets/parser-quickstart` when you want to get your hands on a
real target immediately, even from a lean `autoclanker + pi-autoclanker`
install. Use `examples/minimal` to see what the wrapper can start from. Use
`examples/parser-demo-expanded` to see what the project looks like after
`pi-autoclanker` has already written the resumable files around that target.

## Developer

Main deterministic gate:

```bash
./bin/dev setup
./bin/dev check
```

Higher-confidence deterministic parity gate:

```bash
./bin/dev check-parity
```

Opt-in live gate:

```bash
./bin/dev check-live
```

Successful live runs record evidence under `.local/live-evidence/`.

The main contract sources are:

- [`AGENTS.md`](AGENTS.md)
- [`docs/SPEC.md`](docs/SPEC.md)
- [`docs/DESIGN.md`](docs/DESIGN.md)
- [`docs/COMPLIANCE_MATRIX.md`](docs/COMPLIANCE_MATRIX.md)
- [`tests/compliance_matrix.json`](tests/compliance_matrix.json)
- [`tests/parity_manifest.json`](tests/parity_manifest.json)
- [`tests/python_requirement_parity.test.ts`](tests/python_requirement_parity.test.ts)
- [`tests/python_behavior_parity.test.ts`](tests/python_behavior_parity.test.ts)
