https://github.com/robhowley/pi-structured-return

Pi extension that turns noisy CLI output into compact structured results - fewer tokens, full logs preserved.
https://github.com/robhowley/pi-structured-return
pi pi-coding-agent
Last synced: 2 months ago
JSON representation
Pi extension that turns noisy CLI output into compact structured results - fewer tokens, full logs preserved.
Host: GitHub
URL: https://github.com/robhowley/pi-structured-return
Owner: robhowley
License: mit
Created: 2026-03-15T23:29:10.000Z (3 months ago)
Default Branch: main
Last Pushed: 2026-03-23T21:23:20.000Z (2 months ago)
Last Synced: 2026-03-24T19:50:17.380Z (2 months ago)
Topics: pi, pi-coding-agent
Language: TypeScript
Homepage:
Size: 244 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project

README

          # pi-structured-return

A [Pi](https://pi.dev/) extension that adds a `structured_return` tool alongside `bash`, returning compact parsed results with full logs — 60–95% fewer tokens without losing signal.

A failing test run, before and after:

**Raw pytest output (262 tokens):**

```

============================= test session starts ==============================

platform darwin -- Python 3.14.2, pytest-9.0.2

collecting ... collected 3 items

test_math.py::test_adds_two_numbers_correctly PASSED                     [ 33%]

test_math.py::test_multiplies_two_numbers_correctly FAILED               [ 66%]

test_math.py::test_does_not_divide_by_zero FAILED                        [100%]

=================================== FAILURES ===================================

____________________ test_multiplies_two_numbers_correctly _____________________

    def test_multiplies_two_numbers_correctly():

>       assert 3 * 4 == 99

E       assert (3 * 4) == 99

test_math.py:5: AssertionError

_________________________ test_does_not_divide_by_zero _________________________

    def test_does_not_divide_by_zero():

>       result = 1 / 0

                 ^^^^^

E       ZeroDivisionError: division by zero

test_math.py:8: ZeroDivisionError

=========================== short test summary info ============================

FAILED test_math.py::test_multiplies_two_numbers_correctly

FAILED test_math.py::test_does_not_divide_by_zero - ZeroDivisionError: ...

========================= 2 failed, 1 passed in 0.01s ==========================

```

**Structured result returned to the model (56 tokens):**

```

pytest test_math.py --junitxml=.tmp/report.xml → cwd: project

2 failed, 1 passed

  test_math.py:5  assert (3 * 4) == 99

  test_math.py:8  ZeroDivisionError: division by zero

```

262 → 56 tokens on a 3-test example. Real test suites are much larger — the reduction scales with them, saving thousands of tokens per run.

## Installation

```bash

pi install npm:@robhowley/pi-structured-return

```

## Design

`structured_return` is a separate tool, not a wrapper around `bash`. Intercepting `bash` to silently rewrite commands would override a primitive the model and platform both rely on. Pi's philosophy is to extend rather than obfuscate: features are built on top of the platform, not hidden inside it. A dedicated tool honors that. It adds to the available surface, keeps `bash` honest, and leaves the choice explicit. The skill guides the model toward it; nothing is hijacked to get there.

## Token reduction

Measured with `cl100k_base` (tiktoken). All benchmarks use tiny fixtures — reduction grows with real-world output.

### Test runners

Benchmark: 3 tests — 1 passing, 1 assertion failure, 1 unexpected error.

| Tool | Raw | Structured | Reduction | Notes |

|---|---|---|---|---|

| `mvn test` | 1063 | 86 | **92%** | build lifecycle noise with surefire stack traces per failure |

| `node --test` | 629 | 64 | **90%** | strips full stack traces, assertion internals, timing; preserves expected/actual |

| `npx ava` | 483 | 56 | **88%** | source snippets, diffs, full stack traces stripped; expected/actual preserved |

| `go test` | 394 | 48 | **88%** | stack traces, goroutine frames, panic recovery noise stripped; file:line + expected/actual preserved |

| `dotnet test` | 487 | 107 | **78%** | build header and VSTest output with per-failure stack traces |

| `npx vitest` | 348 | 75 | **78%** | source diff with inline arrows and ANSI color codes per failure |

| `python -m unittest` | 231 | 52 | **78%** | full tracebacks with source annotations; expected/actual from AssertionError |

| `cargo test` | 285 | 68 | **76%** | cargo progress + test binary output with panic traces per failure |

| `pytest` | 289 | 71 | **75%** | verbose output with source snippets and summary footer |

| `rspec` | 212 | 55 | **74%** | default output with backtrace |

| `gradle test` | 263 | 81 | **69%** | gradle console output with build lifecycle noise |

| `npx mocha` | 180 | 55 | **69%** | stack traces + assertion diff formatting; expected/actual preserved |

| `npx jest` | 309 | 99 | **68%** | source annotations with deep jest-circus stack traces per failure |

| `ruby` (minitest) | 168 | 59 | **65%** | default output with backtrace |

### Build tools and compilers

Benchmark: 1 file, 1–2 errors. Reduction scales with error count since raw output includes source snippets, caret indicators, and annotations per error.

| Tool | Raw | Structured | Reduction | Notes |

|---|---|---|---|---|

| `dotnet build` | 383 | 53 | **86%** | strips restore/timing noise, deduplicates repeated error lines, absolute paths relativized |

| `npx jsonlint` | 148 | 28 | **81%** | strips stack trace, source pointer line; preserves line number and expecting message |

| `tidy` | 233 | 51 | **78%** | strips remediation advice, accessibility tips, reformatted HTML output, Info lines |

| `cargo build` | 225 | 77 | **66%** | rustc error annotations with code spans and help text per error |

| `swiftc` | 161 | 58 | **64%** | source annotations with backtick markers deduplicated |

| `gcc` / `clang` | 109 | 77 | **29%** | strips source snippets, caret indicators, line numbers from gutter |

| `javac` | 79 | 66 | **16%** | strips source snippets, caret indicators; folds symbol/location into message |

### Linters and type checkers

Benchmark: 1 file, 1–2 violations. Reduction is a conservative lower bound — scales with file and error count since raw output repeats paths, source snippets, and annotations per violation.

| Tool | Raw | Structured | Reduction | Notes |

|---|---|---|---|---|

| `isort --check` | 143 | 29 | **80%** | strips diff hunks, absolute paths, timestamps; lists files with unsorted imports |

| `black --check` | 155 | 31 | **80%** | strips diff hunks, emoji, timestamps; lists files needing reformatting |

| `ruff check` | 107 | 52 | **51%** | source context + help text per error |

| `shellcheck` | 224 | 117 | **48%** | strips source snippets, carets, suggestions, wiki URLs |

| `npx htmlhint` | 174 | 92 | **47%** | strips ANSI codes, source evidence, rule descriptions, URLs |

| `vale` | 141 | 79 | **44%** | strips ANSI codes, Action/Span metadata, column-aligned formatting |

| `markdownlint` | 199 | 117 | **41%** | strips context quotes, URLs, fix info, error ranges |

| `pyright` | 100 | 59 | **41%** | strips version, timing, absolute paths; detail lines collapsed |

| `rubocop` | 149 | 90 | **40%** | strips source snippets, caret indicators, summary line |

| `tsc` | 107 | 72 | **33%** | vs `--pretty true` default; source snippets and underlines stripped |

| `stylelint` | 70 | 51 | **27%** | strips summary footer and fix hint |

| `pylint` | 141 | 120 | **15%** | strips header, score line, separator; scales with error count |

| `prettier --check` | 38 | 33 | **13%** | strips preamble, [warn] prefixes, footer hint; scales with file count |

| `hadolint` | 178 | 156 | **12%** | strips ANSI color codes and level labels; measured vs colored output |

| `eslint` | 64 | 59 | **8%** | already compact formatter |

| `mypy` | 75 | 72 | **4%** | mypy text is already compact; notes folded into parent errors |

### Security and audit

| Tool | Raw | Structured | Reduction | Notes |

|---|---|---|---|---|

| `bandit` | 402 | 99 | **75%** | strips source snippets, CWE URLs, run metrics, confidence labels |

| `npm audit` | 158 | 50 | **68%** | strips advisory URLs, fix instructions, CVSS vectors; advisory titles joined per package |

### Pipeline tools

dbt output is the noisiest tool in this repo relative to useful signal. Every run prints version info, adapter registration, project stats, concurrency settings, and per-node start/finish lines — all before any result.

The numbers below use 3–4 model toy examples; real projects run hundreds of models where the noise scales linearly and reduction compounds.

| Tool | Raw | Structured | Reduction | Notes |

|---|---|---|---|---|

| `dbt run` (success) | 428 | 20 | **95%** | version, adapter, concurrency, per-model start/finish — all noise on success |

| `dbt run` (failure) | 618 | 198 | **68%** | error messages, model paths, compiled code paths preserved |

| `dbt test` | 720 | 274 | **62%** | unit test diff tables preserved verbatim; preamble stripped |

| `dbt compile` | 775 | 683 | **12%** | compiled SQL is the signal and returned verbatim |

At 12 models, run failures hit 85% reduction. An 18-model DAG success: 1,645 → 20 tokens (99%).

### Already compact — use `bash` directly

Evaluated for structured parsing but raw output is already compact enough that a parser adds no reduction (or goes negative). Use `bash` instead of `structured_return` for these tools.

| Tool | Raw tokens | Format | Why no parser |

|---|---|---|---|

| `go build` | 85 | `file:line:col: message` | one line per error, no decoration |

| `flake8` | 75 | `file:line:col: CODE message` | no JSON without a plugin; text is already one line per violation |

| `yamllint` | 72 | `file:line:col level message (rule)` | filename printed once; one line per issue |

| `golangci-lint` | 59 | `file:line:col: message (linter)` | text output already minimal; JSON includes massive linter report |

| `go vet` | ~60 | `file:line:col: message` | same format as `go build` |

| `vulture` | 58 | `file:line: message (confidence%)` | single line per finding |

| `pydocstyle` | 48 | `file:line context + CODE: message` | two lines per issue; structured format would repeat file paths |

## How it works

1. The agent runs commands through `structured_return` when it would reduce noise and token usage.

2. Full output is captured and stored as a log.

3. A parser converts noisy CLI output into a compact structured result. If no parser matches, the last 200 lines and the log path are returned as a fallback.

4. The agent receives the structured result in context — signal only, no noise.

5. The full log is always available on disk for both the agent and humans to inspect.

6. Run `/sr-stats` to see how many tokens structured-return has saved — both in the current session and across all sessions.

Run `/sr-parsers` in a pi session to see all registered parsers with their match rules. Run `/sr-stats` to see token savings for the current session and lifetime.

## Extending with project-local parsers

Built-in parsers cover common tools. For everything else — internal CLIs, custom test runners, proprietary lint tools — add a `.pi/structured-return.json` to your project root.

**Why:** keeps token costs low for tools the built-ins don't know about, without forking the package.

**Two options:**

### 1. Re-use a built-in parser

Route a project-specific command to an existing parser. Use this when your tool's output already matches a supported format (e.g. a test runner that emits JUnit XML).

```json

// .pi/structured-return.json

{

  "parsers": [

    {

      "id": "acme-tests",

      "match": { "argvIncludes": ["acme", "test"] },

      "parseAs": "junit-xml"

    }

  ]

}

```

### 2. Write a custom parser

Point to a local `.ts` file for tools with unique output formats.

```json

// .pi/structured-return.json

{

  "parsers": [

    {

      "id": "foo-json",

      "match": { "argvIncludes": ["foo-cli", "check"] },

      "module": "parsers/foo-cli.js"

    }

  ]

}

```

```ts

// .pi/parsers/foo-cli.ts

import fs from "node:fs";

import type { RunContext } from "@robhowley/pi-structured-return/types";

export default {

  id: "foo-json",

  async parse(ctx: RunContext) {

    const data = JSON.parse(fs.readFileSync(ctx.stdoutPath, "utf8"));

    return {

      tool: "foo-cli",

      status: data.ok ? "pass" : "fail",

      summary: data.ok ? "passed" : `${data.errors.length} errors`,

      failures: data.errors.map((e, i) => ({ id: e.id ?? `error-${i}`, file: e.file, line: e.line, message: e.message })),

      logPath: ctx.logPath,

    };

  },

};

```

The parser receives a `RunContext` (command, argv, cwd, stdout/stderr paths, artifact paths, log path) and returns a `ParsedResult`. Match rules support `argvIncludes` (array of required tokens) or `regex` (tested against the full argv string).

## Structured result schema

Every parser returns the same shape. The model always knows where to look.

| Field | Type | Description |

|---|---|---|

| `tool` | `string` | Name of the tool that ran (`eslint`, `pytest`, etc.) |

| `exitCode` | `number` | Raw process exit code |

| `status` | `pass \| fail \| error` | Normalized outcome |

| `summary` | `string` | One-line human+model readable result (`3 failed, 12 passed`) |

| `cwd` | `string` | Working directory — anchor for resolving relative paths in failures |

| `failures` | `{ id, file?, line?, message?, rule? }[]` | Per-failure details with relative file paths |

| `logPath` | `string` | Path to full stdout+stderr log |

| `rawTail` | `string?` | Last 200 lines of log, included on fallback when no parser matched |
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/robhowley/pi-structured-return

Awesome Lists containing this project

README