https://github.com/plaited/acp-harness
CLI for agent evaluation. Capture trajectories, run trials with pass@k metrics, and score with polyglot graders (TypeScript, Python, any language).
- Host: GitHub
- URL: https://github.com/plaited/acp-harness
- Owner: plaited
- License: isc
- Created: 2026-01-15T18:07:11.000Z
- Default Branch: main
- Last Pushed: 2026-01-18T09:03:21.000Z
- Last Synced: 2026-01-18T16:58:54.647Z
- Topics: acp, agent-client-protocol, agent-evaluation, ai-agents, bun, cli, eval-harness, grader, jsonl, llm-evaluation, pass-at-k, trajectory-capture, typescript
- Language: TypeScript
- Size: 266 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
- Codeowners: .github/CODEOWNERS
# @plaited/acp
[npm version](https://www.npmjs.com/package/@plaited/acp)
[CI](https://github.com/plaited/acp-harness/actions/workflows/ci.yml)
[License: ISC](https://opensource.org/licenses/ISC)
Unified Agent Client Protocol (ACP) client and evaluation harness for TypeScript/Bun projects. Connect to ACP-compatible agents programmatically, capture full trajectories, and pipe them to downstream analysis tools.
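Downstream tools such as `jq` work well with JSON Lines (one JSON object per line). The sketch below shows that serialization step only. The `TrajectoryRecord` shape and its field names are hypothetical, chosen for illustration; they are not the harness's documented output schema, which the bundled plugin describes.

```typescript
// Hypothetical record shape for illustration; NOT the harness's documented output schema.
type TrajectoryRecord = {
  id: string
  prompt: string
  text: string
  toolCalls: string[]
}

// Serialize records as JSON Lines: one compact JSON object per line, each newline-terminated,
// so line-oriented tools can process the stream record by record.
const toJsonl = (records: TrajectoryRecord[]): string =>
  records.map((record) => `${JSON.stringify(record)}\n`).join('')

const output = toJsonl([
  { id: 'email-1', prompt: 'Create a function that validates email addresses', text: 'Done.', toolCalls: ['write_file'] },
  { id: 'email-2', prompt: 'Create a function that validates email addresses', text: 'Done.', toolCalls: [] },
])

// Print to stdout so the stream can be piped, e.g. into: jq -c 'select(.toolCalls | length > 0)'
console.log(output.trimEnd())
```

Keeping every record on a single line means a crash mid-run leaves all previously written records parseable.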
## Installation
```bash
bun add @plaited/acp
```
**Prerequisite:** Install an ACP adapter, for example Zed's adapter for Claude Code:
```bash
npm install -g @zed-industries/claude-code-acp
```
## Quick Start
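```typescript
import { createACPClient, createPrompt, summarizeResponse } from '@plaited/acp'

// Configure the adapter command to launch and the project directory the agent works in
const client = createACPClient({
  command: ['claude-code-acp'],
  cwd: '/path/to/project',
})

await client.connect()

try {
  const session = await client.createSession()

  // Send the prompt and wait for the turn to finish; `updates` holds the collected session updates
  const { updates } = await client.promptSync(
    session.id,
    createPrompt('Create a function that validates email addresses'),
  )

  // Condense the updates into the final text and the completed tool calls
  const summary = summarizeResponse(updates)
  console.log(summary.text, summary.completedToolCalls)
} finally {
  // Disconnect even if the prompt throws, so the adapter is not left running
  await client.disconnect()
}
```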
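The Quick Start sends one prompt once. Metrics like pass@k need the same task repeated several times. The sketch below shows one way to structure that loop; `runTrials` and `TrialResult` are hypothetical names, not part of `@plaited/acp`, and the `attempt` callback is a stand-in for your own code (for example, a fresh session, `promptSync`, and a grader of your choice).

```typescript
// Hypothetical helper (not part of @plaited/acp): run one task k times and record pass/fail.
type TrialResult = { trial: number; pass: boolean }

async function runTrials(
  k: number,
  attempt: (trial: number) => Promise<boolean>,
): Promise<TrialResult[]> {
  const results: TrialResult[] = []
  for (let trial = 0; trial < k; trial++) {
    // Await each trial before starting the next, so trials never run concurrently
    // against the same project directory.
    results.push({ trial, pass: await attempt(trial) })
  }
  return results
}

// Demo with a stand-in attempt that passes on even-numbered trials.
// In practice, `attempt` would open a session, call promptSync, and grade the summary.
runTrials(4, async (trial) => trial % 2 === 0).then((results) => {
  const passed = results.filter((result) => result.pass).length
  console.log(`${passed}/${results.length} trials passed`) // 2/4 trials passed
})
```

Running trials sequentially is slower but avoids agents overwriting each other's files; parallel runs would need a separate working directory per trial.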
## Recommended: Use the Bundled Plugin
This package includes an **eval-harness plugin** for AI-assisted evaluation development. The plugin provides:
- Complete API reference for `createACPClient` and helpers
- Harness CLI usage with all options and examples
- Output format schemas (summary and judge formats)
- LLM-as-judge evaluation templates
- Downstream integration patterns (Braintrust, jq, custom scorers)
- Docker execution guidance
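The plugin documents the harness's summary and judge output formats; those schemas are not reproduced here. As a rough illustration of a custom scorer that consumes JSONL, the sketch below assumes a hypothetical record with `id`, `output`, and `expected` fields and uses a toy substring check as the grader. Swap in your own record type and grading logic.

```typescript
// Hypothetical input/output shapes for illustration only.
type Result = { id: string; output: string; expected: string }
type Score = { id: string; pass: boolean; score: number }

// Parse JSON Lines text, skipping blank lines.
function parseJsonl<T>(text: string): T[] {
  return text
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line) as T)
}

// Toy grader: a result passes when the expected substring appears in the agent's output.
function scoreResult(result: Result): Score {
  const pass = result.output.includes(result.expected)
  return { id: result.id, pass, score: pass ? 1 : 0 }
}

const input = [
  '{"id":"a","output":"export const isEmail = (s: string) => /@/.test(s)","expected":"isEmail"}',
  '{"id":"b","output":"TODO","expected":"isEmail"}',
].join('\n')

// Emit scores as JSONL as well, so they can feed the next stage of a pipeline.
const scores = parseJsonl<Result>(input).map(scoreResult)
console.log(scores.map((s) => JSON.stringify(s)).join('\n'))
```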
### Install the Plugin
Install via the Plaited marketplace:
**Claude Code:**
```text
/plugin marketplace add plaited/marketplace
```
**Other AI coding agents:**
```bash
curl -fsSL https://raw.githubusercontent.com/plaited/marketplace/main/install.sh | bash -s -- --agent <agent> --plugin acp-harness
```
Supported agents: `gemini`, `copilot`, `cursor`, `opencode`, `amp`, `goose`, `factory`.
Once installed, the plugin activates automatically when you work on evaluation tasks. Ask your AI agent to help you:
- Set up evaluation prompts
- Configure the harness CLI
- Design scoring pipelines
- Integrate with Braintrust or custom analysis tools
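The project description advertises pass@k metrics. pass@k is the probability that at least one of k attempts at a task passes. Given n recorded trials of which c passed, a widely used unbiased estimator is `1 - C(n - c, k) / C(n, k)`. The sketch below is a standalone implementation of that estimator for use in a scoring pipeline; it is not taken from this harness, which may compute pass@k differently.

```typescript
// Unbiased pass@k estimator: 1 - C(n - c, k) / C(n, k)
// n = trials run, c = trials that passed, k = attempts allowed.
function passAtK(n: number, c: number, k: number): number {
  if (k < 1 || k > n || c < 0 || c > n) {
    throw new RangeError('require 1 <= k <= n and 0 <= c <= n')
  }
  // If fewer than k trials failed, every size-k sample contains a pass.
  if (n - c < k) return 1
  // C(n - c, k) / C(n, k) equals the product of (1 - k / i) for i = n - c + 1 .. n;
  // the running product avoids computing huge binomial coefficients.
  let allFail = 1
  for (let i = n - c + 1; i <= n; i++) {
    allFail *= 1 - k / i
  }
  return 1 - allFail
}

console.log(passAtK(10, 3, 1).toFixed(2)) // 0.30
console.log(passAtK(10, 3, 5).toFixed(2)) // 0.92
```

For k = 1 the estimator reduces to the plain pass rate c / n, which is a quick sanity check when wiring it into a pipeline.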
The plugin contains everything you need to build agent evaluations; use it as your primary reference.
## Development
```bash
bun install # Install dependencies
bun run check # Type check + lint + format
bun test # Run unit tests
bun run check:write # Auto-fix issues
```
## Requirements
- **Runtime:** Bun >= 1.2.9
- **ACP Adapter:** `@zed-industries/claude-code-acp` or another ACP-compatible adapter
- **API Key:** `ANTHROPIC_API_KEY` environment variable
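A failed run is cheaper to diagnose before it starts. The sketch below is a hypothetical preflight check (`meetsMinVersion` and `preflight` are illustrative names, not part of the package) covering the runtime version and API key requirements; it does not check for the adapter. It takes the environment and version as arguments so it stays runtime-agnostic; under Bun you could call it as `preflight(process.env, Bun.version)`.

```typescript
// Compare dotted versions numerically, ignoring any pre-release suffix (e.g. "-canary").
function meetsMinVersion(version: string, minimum: string): boolean {
  const parse = (v: string) =>
    v.replace(/-.*$/, '').split('.').map((part) => Number.parseInt(part, 10) || 0)
  const a = parse(version)
  const b = parse(minimum)
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    const diff = (a[i] ?? 0) - (b[i] ?? 0)
    if (diff !== 0) return diff > 0
  }
  return true
}

// Hypothetical preflight: returns a list of problems (empty when everything looks fine).
function preflight(env: Record<string, string | undefined>, bunVersion: string): string[] {
  const problems: string[] = []
  if (!meetsMinVersion(bunVersion, '1.2.9')) {
    problems.push(`Bun ${bunVersion} is older than the required 1.2.9`)
  }
  if (!env['ANTHROPIC_API_KEY']) {
    problems.push('ANTHROPIC_API_KEY is not set')
  }
  return problems
}

const problems = preflight({ ANTHROPIC_API_KEY: 'sk-test' }, '1.2.9')
console.log(problems.length === 0 ? 'ok' : problems.join('\n')) // ok
```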
## License
ISC © [Plaited Labs](https://github.com/plaited)