Martin Loop — The control plane for AI coding agents.
https://github.com/keesan12/martin-loop

agentic-engineering ai-agent-runtime ai-coding-agents ai-governance ai-governance-ai-orchestration ai-governance-layer ai-infrastructure ai-observability ai-safety audit-trail budget-enforcement control-plane devtools governed-runtime llmops observability opentelemetry policy-as-code ralph-loop secure-by-default


MartinLoop

### Governed AI coding loops with budgets, verifier gates, rollback evidence, and receipts.

[![License: Apache-2.0](https://img.shields.io/badge/license-Apache--2.0-blue?logo=apache)](./LICENSE)
[![TypeScript](https://img.shields.io/badge/TypeScript-strict-3178c6?style=flat-square&logo=typescript&logoColor=white)](./tsconfig.base.json)
[![Node](https://img.shields.io/badge/node-%3E%3D20-3c873a?style=flat-square&logo=nodedotjs&logoColor=white)](#quick-start)
[![npm](https://img.shields.io/badge/npm-martin--loop-cc3534?style=flat-square&logo=npm&logoColor=white)](https://www.npmjs.com/package/martin-loop)

MartinLoop has been accepted into the NVIDIA Inception program.


**Your AI coding run estimated $2.40.**
**It kept retrying until the bill hit $65.**

47 attempts. No hard stop. No rollback. No audit trail. Nothing merged.



MartinLoop makes that failure visible, bounded, and reviewable.

> AI coding agents are useful. Unbounded retry loops are not.
>
> MartinLoop wraps Claude Code, Codex, and custom agent runs with budget caps, policy checks, verifier gates, rollback evidence, and inspectable JSONL run records.


MartinLoop CLI — governed agent run

---

## The Problem

A typical autonomous coding loop keeps attempting work until tests pass. Without a governance layer, that loop can keep spending, mutate files outside the intended scope, lose track of why it failed, and leave teams without a clean audit trail.

Autonomous coding loops are powerful, but the usual pattern is attempt, check, retry, repeat, with no strong answer to:

- What changed?
- What did it cost?
- Why was it allowed?
- Why did it stop?
- Can we inspect or resume it later?

MartinLoop governs the failure mode.

---

## The Solution

MartinLoop wraps AI coding loops with a governance layer.

It does not try to replace the agent pattern. It makes that pattern safe to run.

### What MartinLoop Does Today

| Capability | Current behavior |
|---|---|
| Budget governance | Enforces `maxUsd`, `softLimitUsd`, `maxIterations`, and `maxTokens`. Rejects attempts projected to exceed the remaining budget and exits on budget or iteration exhaustion, so the hard USD cap stops work before the next attempt breaches policy. |
| Verifier gate | A run only reaches `completed` when the adapter result and verifier state pass. Unsafe verifier commands are blocked before agent execution. |
| Failure taxonomy | Classifies failures across 11 current classes, including hallucination, test regression, scope creep, repo grounding failure, environment mismatch, and budget pressure. The taxonomy distinguishes real success from unsafe, invalid, or terminal behavior. |
| Safety leash | **Policy-as-code** checks that evaluate verifier commands, file scope, dependency or migration changes that require approval, and secret-like values in task text. |
| Context integrity | Scans user prompts and tool output for injection patterns (authority inversion, instruction override, identity redefinition) before any attempt is admitted. Aborts with human escalation on detection. |
| Red-Blue Testing | Adversarial probe suite that runs before a patch is accepted. Six deterministic probes detect assertion deletion, silent reverts, context poisoning, budget self-reporting, and grounding evasion. Three risk tiers: `baseline`, `high_risk`, and `release_critical` (adds a Haiku model call). A single block-severity finding rejects the patch. |
| Rollback evidence | Captures rollback boundaries and restore outcomes for repo-backed attempts when a persistence store is configured. |
| Context distillation | Carries a distilled summary of recent attempts and remaining constraints into subsequent attempts. |
| Run records | The CLI appends JSONL loop records under `~/.martin/runs/.jsonl`; lower-level stores can also persist contracts, ledgers, and attempt artifacts. |

The result is a runtime that can complete good work, refuse unsafe work, stop uneconomical work, and leave evidence behind.

---

## Ralph-Style Loops Need a Control Layer

The **Ralph Loop** is the failure mode where an AI coding agent keeps trying without knowing when it should stop.

The pattern is simple: attempt the task, run checks, retry on failure, repeat. The problem is not that the loop exists. The problem is that most implementations have no hard budget cap, no signed evidence layer, and no pre-execution control system. They know how to keep trying. They do **not** know when continuing is unsafe, uneconomical, or impossible.

MartinLoop solves the Ralph Loop problem by enforcing rules **before** damage happens:

- it stops the next attempt before budget overspend
- it classifies unsafe or invalid actions before execution
- it appends a structured JSONL audit record for every attempt
- it rolls back failed runs instead of leaving broken state behind
- it reduces runaway token growth with context distillation

If a Ralph-style loop has ever burned budget without producing a verified result, MartinLoop is designed to stop that failure mode before the next unsafe attempt runs.
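
The control flow described above can be sketched in a few lines. This is a minimal illustration, not MartinLoop's implementation: the `runGovernedLoop` function, the `AttemptResult` and `LoopOutcome` shapes, and the stop-reason strings are hypothetical names. Only the budget field names `maxUsd` and `maxIterations` come from this README.

```typescript
// Hypothetical shapes for illustration; only `maxUsd` and `maxIterations`
// are names taken from the README.
interface Budget {
  maxUsd: number;
  maxIterations: number;
}

interface AttemptResult {
  costUsd: number; // what the attempt actually cost
  verifierPassed: boolean; // did the verifier command succeed?
}

type StopReason = "completed" | "budget_exhausted" | "iterations_exhausted";

interface LoopOutcome {
  status: StopReason;
  attempts: number;
  spentUsd: number;
}

function runGovernedLoop(
  budget: Budget,
  estimateNextCostUsd: (attempt: number) => number,
  runAttempt: (attempt: number) => AttemptResult
): LoopOutcome {
  let spentUsd = 0;
  let attempts = 0;

  while (attempts < budget.maxIterations) {
    // Preflight: refuse the attempt if its projected cost would breach the cap.
    const projected = estimateNextCostUsd(attempts + 1);
    if (spentUsd + projected > budget.maxUsd) {
      return { status: "budget_exhausted", attempts, spentUsd };
    }

    attempts += 1;
    const result = runAttempt(attempts);
    spentUsd += result.costUsd;

    // Verifier gate: only a passing verifier ends the loop as completed.
    if (result.verifierPassed) {
      return { status: "completed", attempts, spentUsd };
    }
  }

  return { status: "iterations_exhausted", attempts, spentUsd };
}

// Usage: each attempt is estimated at $1.40 and never passes the verifier,
// so the loop is stopped before a third attempt can start.
const outcome = runGovernedLoop(
  { maxUsd: 3.0, maxIterations: 47 },
  () => 1.4,
  () => ({ costUsd: 1.4, verifierPassed: false })
);
console.log(outcome.status, outcome.attempts);
```

The key design point is that the budget check runs before the attempt, using a projected cost, rather than after the money has been spent.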


Martin vs Ralph — governed vs ungoverned agent loop

### How It Works — Five Layers

| Layer | What it does |
|---|---|
| **1. Task Contract** | Objective, verifier plan, repo root, allowed/denied paths, acceptance criteria, workspace, project, and budget. |
| **2. Policy & Budget** | Defaults from `martin.config.yaml`; CLI flags override. Budget preflight rejects attempts before execution. |
| **3. Agent Adapters** | Claude CLI, Codex CLI, direct-provider, and stub adapters normalize execution results into the core runtime contract. |
| **4. Safety & Verification** | Verifier commands, file scope, approval-boundary changes, secret-like values, and grounding determine whether work is kept. |
| **5. Persistence** | CLI writes JSONL records under `~/.martin/runs/`. Repo-backed runs can also persist contracts, ledgers, diffs, and rollback artifacts. |
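
As one illustration of the layer 4 checks, the sketch below scans task text for secret-like values. The patterns and the `findSecretLikeValues` helper are assumptions made for this example; the rule set MartinLoop actually applies is not listed in this README.

```typescript
// Illustrative patterns only; these are not MartinLoop's real rules.
const SECRET_PATTERNS: { name: string; pattern: RegExp }[] = [
  // AWS access key IDs start with "AKIA" followed by 16 uppercase letters or digits.
  { name: "aws-access-key-id", pattern: /\bAKIA[0-9A-Z]{16}\b/ },
  // Classic GitHub personal access tokens start with "ghp_".
  { name: "github-token", pattern: /\bghp_[A-Za-z0-9]{36}\b/ },
  // Generic "key = value" style credentials with a value of 8+ characters.
  {
    name: "inline-credential",
    pattern: /\b(?:api[_-]?key|secret|token|password)\s*[:=]\s*\S{8,}/i,
  },
];

// Returns the names of every pattern that matches the task text.
// The regexes carry no `g` flag, so `test` keeps no state between calls.
function findSecretLikeValues(taskText: string): string[] {
  return SECRET_PATTERNS.filter(({ pattern }) => pattern.test(taskText)).map(
    ({ name }) => name
  );
}

console.log(findSecretLikeValues("fix the auth regression"));
console.log(findSecretLikeValues("log in with password=hunter2hunter2"));
```

A runtime could treat a non-empty result as a reason to refuse the task before any agent attempt starts.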

---

## See It In Action

Same task, same starting state. MartinLoop completes in one verified attempt at `$2.30`. The uncontrolled loop retries four times, spends `$5.20`, and fails with no audit trail.

MartinLoop matters because it turns AI coding from an opaque experiment into something that can be governed, replayed, verified, and trusted.


Martin vs Ralph — governed vs ungoverned agent loop side-by-side benchmark comparison

Try the packaged demo locally:

```sh
npx martin-loop demo
cd martin-loop-demo
npm install
MARTIN_LIVE=false npx martin-loop run "Summarize the demo workspace and confirm the verifier is green" --verify "npm test"
```

Challenge page: [Can your AI coding agent finish this task under $3?](./docs/distribution/UNDER-3-CHALLENGE.md)

If the problem is familiar, star the repo so other builders can find the runtime before their next unbounded agent loop.

---

## Quick Start

```sh
npm install -g martin-loop
```

This installs the public `martin-loop` CLI package. This README is synced for `martin-loop@0.2.0`.

Want a safe sandbox first? Run `npx martin-loop demo` and MartinLoop will copy a disposable local workspace into `./martin-loop-demo`.

### Three-Minute First Value

Start with the local readiness check:

```sh
npx martin-loop doctor
```

Then run the no-spend proof path:

```sh
npx martin-loop demo
cd martin-loop-demo
npm install
MARTIN_LIVE=false npx martin-loop run "Summarize the demo workspace and confirm the verifier is green" --verify "npm test"
npx martin-loop dossier --latest
```

`dossier --latest` gives you the receipt-style follow-up: what happened, verifier evidence, rollback or artifact evidence, directional token and cost totals, and the next safe action.

### Public Package Surface

The public package surface is:

- Install target: `npm install martin-loop`
- CLI target: `npx martin-loop`
- SDK target: `import { MartinLoop } from "martin-loop"`
- MCP target: `npx -y @martinloop/mcp`

`martin-loop` and `@martinloop/mcp` are published separately. The root package is for CLI and SDK use; the MCP package is for MCP hosts.

### MCP server

`@martinloop/mcp@0.2.0` exposes ten stdio tools plus read-only resources, resource templates, and prompts. `martin_run` remains the only tool that can execute work; the newer cockpit tools are read-only review helpers for recent runs, dossiers, attempts, and verifier results.

Recommended first-use flow:

1. `martin_doctor`
2. `martin_preflight`
3. `martin_run`
4. `martin_list_runs`, `martin_run_dossier`, `martin_inspect`, or `martin_status`

### MCP install

Use the published MCP package directly:

- Codex: `codex mcp add martin-loop -- npx -y @martinloop/mcp`
- Claude Code macOS/Linux: `claude mcp add --transport stdio --scope user martin-loop -- npx -y @martinloop/mcp`
- Claude Code Windows PowerShell/cmd: `claude mcp add --transport stdio --scope user martin-loop -- cmd /c npx -y @martinloop/mcp`

If you just want to launch the server manually, the one-line command is:

```sh
npx -y @martinloop/mcp
```

### Run a governed task

```sh
npx martin-loop run "fix the auth regression" \
  --budget 3.00 \
  --verify "pnpm test"
```

You can also pass the objective explicitly:

```sh
npx martin-loop run --objective "fix the auth regression" --budget 3.00 --verify "pnpm test"
```

For a no-spend repo-local dry run, use the stub adapter:

```powershell
$env:MARTIN_LIVE='false'
npx martin-loop run --objective "Summarize the current runtime state" --verify "pnpm --filter @martin/core test"
Remove-Item Env:MARTIN_LIVE
```

### Inspect or resume runs

```sh
npx martin-loop inspect --file ~/.martin/runs/.jsonl
npx martin-loop resume
```

`inspect` prints a portfolio summary for records in the file. `resume` looks up a persisted loop record by ID under `~/.martin/runs/`.
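
To show what a summary over a JSONL run file can look like, here is a minimal sketch that parses JSONL text and aggregates it. The record schema is not documented in this README, so the `loopId`, `status`, and `costUsd` fields and the `parseJsonl` and `summarize` helpers are assumptions for illustration only.

```typescript
// Hypothetical record shape; the real MartinLoop schema may differ.
interface RunRecord {
  loopId: string;
  status: string;
  costUsd: number;
}

interface PortfolioSummary {
  totalRuns: number;
  totalCostUsd: number;
  byStatus: Record<string, number>;
}

// JSONL holds one JSON document per line; blank lines are skipped.
function parseJsonl(text: string): RunRecord[] {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as RunRecord);
}

// Counts runs per status and totals their cost.
function summarize(records: RunRecord[]): PortfolioSummary {
  const byStatus: Record<string, number> = {};
  let totalCostUsd = 0;
  for (const record of records) {
    byStatus[record.status] = (byStatus[record.status] ?? 0) + 1;
    totalCostUsd += record.costUsd;
  }
  return { totalRuns: records.length, totalCostUsd, byStatus };
}

// Usage with two made-up records.
const sample = [
  '{"loopId":"a","status":"completed","costUsd":2.3}',
  '{"loopId":"b","status":"failed","costUsd":3}',
].join("\n");
console.log(summarize(parseJsonl(sample)));
```

Because each line is an independent JSON document, records can be appended without rewriting the file, which suits an audit trail.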

For the richer operator view, use:

```sh
npx martin-loop dossier --latest
```

---

## CLI

```text
martin-loop run [options]
martin-loop doctor
martin-loop dossier (--latest | --loop-id | --file )

--objective The task to accomplish, or pass it as the first positional arg
--budget Hard cost cap in USD
--budget-usd Alias for --budget
--soft-limit-usd Soft budget threshold in USD
--verify Verifier command after each attempt
--max-iterations Maximum number of attempts
--max-tokens Maximum total token budget
--engine Adapter to use: claude (default) or codex
--model Override the adapter model
--cwd Repo root for the run
--allow-path Restrict agent writes to this path pattern; repeatable
--deny-path Block this path pattern; repeatable
--accept Add an acceptance criterion; repeatable
--config Path to a martin.config.yaml file
--workspace Workspace ID for the run record
--project Project ID for the run record
--metadata Attach metadata to the run record; repeatable
```
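
The `--allow-path` and `--deny-path` options imply a scope check on every file the agent wants to write. The sketch below shows one way such a check could work. It assumes simple directory-prefix matching with deny rules taking precedence; the pattern syntax MartinLoop actually accepts is not specified here, and `PathPolicy`, `normalize`, and `isWriteAllowed` are hypothetical names.

```typescript
interface PathPolicy {
  allow: string[]; // e.g. values passed via --allow-path
  deny: string[]; // e.g. values passed via --deny-path
}

// Normalize to forward slashes, drop "." segments, and resolve ".." so that
// "src/../secrets/key" cannot slip past a "src" allow rule.
// Returns null when the path climbs above the repo root.
function normalize(path: string): string | null {
  const out: string[] = [];
  for (const segment of path.replace(/\\/g, "/").split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") {
      if (out.length === 0) return null;
      out.pop();
    } else {
      out.push(segment);
    }
  }
  return out.join("/");
}

// True when `path` is the prefix itself or lives below it.
function isUnder(path: string, prefix: string): boolean {
  return path === prefix || path.startsWith(prefix + "/");
}

function isWriteAllowed(policy: PathPolicy, relativePath: string): boolean {
  const path = normalize(relativePath);
  if (path === null) return false;

  const clean = (list: string[]): string[] =>
    list.map(normalize).filter((p): p is string => p !== null);

  // Deny rules win over allow rules.
  if (clean(policy.deny).some((prefix) => isUnder(path, prefix))) return false;

  // Assumption: an empty allow list means no allow restriction.
  const allow = clean(policy.allow);
  return allow.length === 0 || allow.some((prefix) => isUnder(path, prefix));
}

const policy: PathPolicy = { allow: ["src"], deny: ["src/secrets"] };
console.log(isWriteAllowed(policy, "src/auth/login.ts"));
console.log(isWriteAllowed(policy, "src/secrets/key.pem"));
```

Normalizing before matching matters: without it, a path containing `..` could satisfy an allow rule textually while pointing somewhere else.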

The public CLI includes `doctor`, `demo`, `dossier`, `inspect`, and `resume`. `inspect` and `resume` remain useful compatibility views; `dossier` is the fastest way to review the latest run with receipt-style evidence.


MartinLoop CLI terminal output

---

## Policy File

Drop a `martin.config.yaml` in your repo root to set governance defaults:

```yaml
budget:
  maxUsd: 5.00
  softLimitUsd: 3.75
  maxIterations: 5
  maxTokens: 40000

governance:
  destructiveActionPolicy: approval
  telemetryDestination: local-only
  verifierRules:
    - pnpm test
```

CLI flags override config values when provided.
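
The precedence rule can be sketched as a small merge function. This is an illustration under assumptions, not the CLI's actual code: `applyCliOverrides` is a hypothetical helper, and mapping `--budget` to `maxUsd` and `--soft-limit-usd` to `softLimitUsd` is inferred from the option descriptions rather than confirmed.

```typescript
interface BudgetPolicy {
  maxUsd: number;
  softLimitUsd: number;
  maxIterations: number;
  maxTokens: number;
}

// Start from the config file values and overwrite only the fields
// for which a CLI flag was actually provided.
function applyCliOverrides(
  config: BudgetPolicy,
  flags: Partial<BudgetPolicy>
): BudgetPolicy {
  const resolved: BudgetPolicy = { ...config };
  for (const key of Object.keys(flags) as (keyof BudgetPolicy)[]) {
    const value = flags[key];
    // Skip explicit `undefined` so an unset flag never erases a config value.
    if (value !== undefined) resolved[key] = value;
  }
  return resolved;
}

// Values from the example martin.config.yaml above.
const fileConfig: BudgetPolicy = {
  maxUsd: 5.0,
  softLimitUsd: 3.75,
  maxIterations: 5,
  maxTokens: 40000,
};

// Equivalent of passing `--budget 3.00 --soft-limit-usd 2.25` (assumed mapping).
const effective = applyCliOverrides(fileConfig, {
  maxUsd: 3.0,
  softLimitUsd: 2.25,
  maxIterations: undefined,
});
console.log(effective);
```

A plain object spread would copy an explicit `undefined` over the config value, which is why the sketch checks each field. Note that lowering only the hard cap can leave the soft limit above it, so overriding both together is the safer habit.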

---

## TypeScript SDK

```sh
npm install martin-loop
```

```typescript
import {
  MartinLoop,
  createClaudeCliAdapter,
  createCodexCliAdapter,
  runMartin
} from "martin-loop";

const loop = new MartinLoop({
  adapter: createClaudeCliAdapter({ workingDirectory: process.cwd() }),
  defaults: {
    workspaceId: "my-workspace",
    projectId: "my-project",
    budget: {
      maxUsd: 3.00,
      softLimitUsd: 2.25,
      maxIterations: 3,
      maxTokens: 20_000
    }
  }
});

const result = await loop.run({
  task: {
    title: "Fix auth regression",
    objective: "Fix the failing auth regression tests",
    verificationPlan: ["pnpm test"],
    repoRoot: process.cwd()
  }
});

console.log(result.decision.status);
```

Use Codex instead of Claude by swapping adapters:

```typescript
const loop = new MartinLoop({
  adapter: createCodexCliAdapter({ workingDirectory: process.cwd() })
});
```

The lower-level `runMartin` function is also exported for callers that want to assemble the runtime input directly.

---

## Package Map

| Package or app | Role |
|---|---|
| `martin-loop` | Root public npm facade that vendors the runtime, CLI, adapters, and contracts into `dist/`. |
| `@martin/contracts` | Shared types for loops, policy, governance, budget, telemetry, and rollback. |
| `@martin/core` | Runtime controller, policy engine, safety leash, grounding, persistence, and rollback logic. |
| `@martin/adapters` | Claude CLI, Codex CLI, direct-provider, and stub adapter surfaces. |
| `@martin/cli` | CLI implementation for `run`, `demo`, `inspect`, and `resume`. |
| `@martinloop/mcp` | MCP server with governed execution plus read-only run review tools. |

Users install the root `martin-loop` package for the CLI and SDK, or the standalone `@martinloop/mcp` package for MCP hosts.

---

## Development

Requirements:

- Node 20+
- pnpm 10.x

```bash
git clone https://github.com/Keesan12/martin-loop.git
cd martin-loop
pnpm install

pnpm test
pnpm lint
pnpm build
```

Current RC gate commands:

```sh
pnpm oss:validate
pnpm public:smoke
pnpm rc:validate
pnpm release:matrix:local
pnpm --filter @martinloop/mcp verify:release
```

> **Caution:** This package is live on npm. Public releases should use the guarded GitHub Actions release workflow, with versioning and public copy verified before publishing.

Helpful docs:

- [OSS quickstart](./docs/oss/QUICKSTART.md)
- [OSS examples](./docs/oss/EXAMPLES.md)
- [Under-$3 benchmark challenge](./docs/distribution/UNDER-3-CHALLENGE.md)
- [Directory submission pack](./docs/distribution/DIRECTORY-SUBMISSIONS.md)
- [Integration outreach pack](./docs/distribution/INTEGRATION-OUTREACH.md)
- [Claude Code walkthrough](./docs/oss/CLAUDE-CODE-WALKTHROUGH.md)
- [Ralph-style loop safety guide](./docs/oss/RALPH-LOOP-SAFETY.md)
- [OSS surface overview](./docs/oss/README.md)

---

## Contributing

```sh
git checkout -b feat/your-feature
pnpm lint
pnpm test
git commit -m "feat: describe what you built"
git push -u origin feat/your-feature
```

Conventional commit prefixes: `feat:`, `fix:`, `chore:`, `docs:`, `refactor:`, and `test:`.

---

**⭐Give the repo a star⭐** if you think AI coding needs budgets, brakes, and receipts.

**Apache-2.0 licensed** · [martinloop.com](https://martinloop.com) · [keesan@martinloop.com](mailto:support@martinloop.com)

*"AI coding accountability: completes good work, refuses unsafe work, stops uneconomical work."*





