https://github.com/evaluation-context-protocol/ecp

ECP is a standardized interface for orchestrating, auditing, and enforcing authority limits in AI agent evaluations. It moves evaluation from "brittle Python scripts" to a deterministic infrastructure protocol.

Host: GitHub
URL: https://github.com/evaluation-context-protocol/ecp
Owner: evaluation-context-protocol
License: apache-2.0
Created: 2026-01-21T18:07:17.000Z (6 months ago)
Default Branch: main
Last Pushed: 2026-04-24T04:20:13.000Z (3 months ago)
Last Synced: 2026-04-24T06:23:14.321Z (3 months ago)
Topics: evaluation-metrics, evaluations, llm-evaluation, model-evaluation
Language: Python
Homepage: http://evaluationcontextprotocol.io/
Size: 737 KB
Stars: 8
Watchers: 0
Forks: 1
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE


# Evaluation Context Protocol (ECP)

![Status](https://img.shields.io/badge/Status-Experimental-orange)

![License](https://img.shields.io/badge/License-Apache%202.0-blue)

> Work in progress: this repository is actively evolving, and some concepts may change.

A lightweight protocol and reference runtime for evaluating agents on their public output, private reasoning, and tool usage. This repo contains:

- `sdk/` - Python SDK for implementing an ECP agent.

- `runtime/` - Python runtime (CLI) that runs manifests and grades results.

- `examples/` - Minimal framework demos (LangChain, LlamaIndex, CrewAI, PydanticAI).

- `spec/` - Protocol specification.

## Documentation

- Docs site: https://evaluationcontextprotocol.io/

- Quickstart: https://evaluationcontextprotocol.io/quickstart/

- Specification: https://evaluationcontextprotocol.io/spec/

- Docs deploy automatically from `main` via GitHub Actions.

## Quick Start

Create a virtual environment and install the `0.2.9` packages from PyPI, the version this repository is aligned with (the commands in this section use Windows PowerShell syntax):

```powershell
py -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install "ecp-runtime==0.2.9" "ecp-sdk[langchain]==0.2.9" langchain-openai
```

Run the example manifest:

```powershell
python -m ecp_runtime.cli run --manifest .\examples\langchain_demo\manifest.yaml
```

Generate an HTML report:

```powershell
python -m ecp_runtime.cli run --manifest .\examples\langchain_demo\manifest.yaml --report .\report.html
```

Print a JSON report (useful for CI tooling):

```powershell
python -m ecp_runtime.cli run --manifest .\examples\langchain_demo\manifest.yaml --json
```

Save a JSON report to a file:

```powershell
python -m ecp_runtime.cli run --manifest .\examples\langchain_demo\manifest.yaml --json-out .\report.json
```
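
For CI use, the saved report can be picked up from a script. The following is a minimal sketch, not part of the ECP tooling: it only reuses the `--manifest` and `--json-out` flags shown above, and the helper names `build_command` and `run_and_load` are hypothetical. Because this README does not document the report schema or the CLI's exit codes, the sketch assumes neither and simply returns whatever JSON was written.

```python
import json
import subprocess
import sys
from pathlib import Path
from typing import Any


def build_command(manifest: str, json_out: str) -> list[str]:
    """Build the CLI invocation shown above, using the current interpreter."""
    return [
        sys.executable, "-m", "ecp_runtime.cli", "run",
        "--manifest", manifest,
        "--json-out", json_out,
    ]


def run_and_load(manifest: str, json_out: str = "report.json") -> Any:
    """Run the evaluation, then load the JSON report it wrote."""
    # check=False: how the CLI maps grading failures to exit codes is not assumed here.
    subprocess.run(build_command(manifest, json_out), check=False)
    return json.loads(Path(json_out).read_text(encoding="utf-8"))
```

A CI job could call `run_and_load("examples/langchain_demo/manifest.yaml")` and inspect the returned object once the report fields it cares about are known.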

If your manifest uses `llm_judge`, set your OpenAI API key and the judge model:

```powershell
$env:OPENAI_API_KEY="your_key_here"
$env:ECP_LLM_JUDGE_MODEL="gpt-4o-mini"
```

The latest stable packages on PyPI are now `0.2.9`, and this repository is aligned with that release line.

Run the other demos:

```powershell
# CrewAI demo
pip install "ecp-sdk[crewai]==0.2.9" crewai
python -m ecp_runtime.cli run --manifest .\examples\crewai_demo\manifest.yaml

# PydanticAI demo
pip install "ecp-sdk[pydanticai]==0.2.9" pydantic-ai
python -m ecp_runtime.cli run --manifest .\examples\pydantic_ai_demo\manifest.yaml
```

## Example (LangChain Agent + Manifest)

Agent (LangChain `create_agent` + tool usage):

```python
from langchain.agents import create_agent
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

from ecp import serve
from ecp.adaptors.langchain import ECPLangChainAdapter


@tool
def calculator(expression: str) -> str:
    """Evaluate a basic arithmetic expression and return the result as text."""
    # Allow only digits, arithmetic operators, parentheses, and spaces.
    allowed = set("0123456789+-*/() ")
    if not expression or any(ch not in allowed for ch in expression):
        return "Invalid expression."
    try:
        # Builtins are disabled; the result is truncated to an integer.
        return str(int(eval(expression, {"__builtins__": {}})))
    except Exception:
        return "Invalid expression."


agent = create_agent(
    model=ChatOpenAI(model="gpt-3.5-turbo", temperature=0),
    tools=[calculator],
    system_prompt="Use the calculator tool for arithmetic.",
)


def to_messages(text: str):
    # Wrap the raw step input in the message format the agent expects.
    return {"messages": [{"role": "user", "content": text}]}


# Expose the agent to the ECP runtime.
serve(ECPLangChainAdapter(agent, name="MathBot", input_mapper=to_messages))
```

Manifest (runtime checks output + tool usage):

```yaml
manifest_version: "v1"
name: "LangChain Math Check"
target: "python agent.py"
scenarios:
  - name: "Ratio Word Problem"
    steps:
      - input: "Katy makes coffee using teaspoons of sugar and cups of water in the ratio of 7:13..."
        graders:
          - type: text_match
            field: public_output
            condition: contains
            value: "42"
          - type: tool_usage
            tool_name: "calculator"
            arguments: {}
```

Supported graders:

- `text_match` (`contains`, `equals`, `does_not_contain`, `regex`)

- `llm_judge` (requires `OPENAI_API_KEY`)

- `tool_usage` (name + argument subset match)
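
To make the matching semantics concrete, here is a rough sketch of how `text_match` and `tool_usage` could be evaluated. It is an illustration, not the runtime's implementation: the function names, the shape of a tool call (`{"name": ..., "arguments": {...}}`), case-sensitive comparison, and the use of `re.search` for `regex` are all assumptions.

```python
import re


def text_match(text: str, condition: str, value: str) -> bool:
    """Evaluate one text_match condition against an output string."""
    if condition == "contains":
        return value in text
    if condition == "equals":
        return text == value
    if condition == "does_not_contain":
        return value not in text
    if condition == "regex":
        # Assumption: the pattern may match anywhere in the text.
        return re.search(value, text) is not None
    raise ValueError(f"unknown condition: {condition!r}")


def tool_usage(tool_calls: list[dict], tool_name: str, arguments: dict) -> bool:
    """Pass if any call has the expected name and contains the expected arguments."""
    for call in tool_calls:
        if call.get("name") != tool_name:
            continue
        actual = call.get("arguments", {})
        # Subset match: every expected key must be present with an equal value.
        if all(key in actual and actual[key] == val for key, val in arguments.items()):
            return True
    return False


# Hypothetical tool call record, shaped as assumed above.
calls = [{"name": "calculator", "arguments": {"expression": "6*7"}}]
print(text_match("The answer is 42.", "contains", "42"))       # True
print(tool_usage(calls, "calculator", {}))                      # True
print(tool_usage(calls, "calculator", {"expression": "1+1"}))   # False
```

Under this reading, `arguments: {}` in a manifest checks only that the named tool was called, since an empty mapping is a subset of any arguments.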

Note: manifest validation is strict and fails fast on invalid grader configuration.
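
As an illustration of what fail-fast validation can look like, the sketch below raises on the first invalid grader entry before anything is run. The rules are inferred from the example manifest and the grader list above; the function names and the exact set of required keys are assumptions, and the runtime's real checks may differ.

```python
KNOWN_GRADERS = {"text_match", "llm_judge", "tool_usage"}
KNOWN_CONDITIONS = {"contains", "equals", "does_not_contain", "regex"}


def validate_grader(grader: dict) -> None:
    """Raise ValueError on the first problem found in one grader entry."""
    kind = grader.get("type")
    if kind not in KNOWN_GRADERS:
        raise ValueError(f"unknown grader type: {kind!r}")
    if kind == "text_match":
        condition = grader.get("condition")
        if condition not in KNOWN_CONDITIONS:
            raise ValueError(f"unknown text_match condition: {condition!r}")
        if "value" not in grader:
            raise ValueError("text_match requires 'value'")
    if kind == "tool_usage" and not grader.get("tool_name"):
        raise ValueError("tool_usage requires 'tool_name'")


def validate_manifest(manifest: dict) -> None:
    """Walk scenarios -> steps -> graders and stop at the first invalid grader."""
    for scenario in manifest.get("scenarios", []):
        for step in scenario.get("steps", []):
            for grader in step.get("graders", []):
                validate_grader(grader)


# A manifest already parsed into a dict, mirroring the example above.
manifest = {
    "scenarios": [{
        "name": "Ratio Word Problem",
        "steps": [{
            "input": "...",
            "graders": [
                {"type": "text_match", "field": "public_output",
                 "condition": "contains", "value": "42"},
                {"type": "tool_usage", "tool_name": "calculator", "arguments": {}},
            ],
        }],
    }],
}
validate_manifest(manifest)  # returns None; no exception means the graders look valid
```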

## ECP in 60 Seconds

ECP is JSON-RPC 2.0 over stdio. The runtime launches your agent process and calls:

- `agent/initialize`

- `agent/step`

- `agent/reset`

Your agent replies with a structured result containing:

- `public_output` (what the user sees)

- `private_thought` (for evaluators)

- `tool_calls` (actions taken)

See `spec/protocol.md` for the full protocol.
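
The exchange described above can be sketched from the agent's side as follows. This is a toy handler for illustration only: the three method names and the three result fields come from this README, while the `input` parameter key, the payloads returned for `agent/initialize` and `agent/reset`, and the one-message-per-line framing are assumptions. In practice the SDK's `serve` call is what provides the server loop.

```python
import json
import sys


def handle_message(line: str) -> str:
    """Turn one JSON-RPC 2.0 request (as a JSON string) into a response string."""
    request = json.loads(line)
    method = request.get("method")
    params = request.get("params") or {}

    if method == "agent/initialize":
        result = {"name": "EchoBot"}  # hypothetical payload
    elif method == "agent/step":
        text = str(params.get("input", ""))  # the "input" key is an assumption
        result = {
            "public_output": text.upper(),                # what the user sees
            "private_thought": "Upper-cased the input.",  # for evaluators
            "tool_calls": [],                             # no tools used here
        }
    elif method == "agent/reset":
        result = {}  # hypothetical payload
    else:
        # -32601 is the standard JSON-RPC 2.0 "Method not found" error code.
        error = {"code": -32601, "message": "Method not found"}
        return json.dumps({"jsonrpc": "2.0", "id": request.get("id"), "error": error})

    return json.dumps({"jsonrpc": "2.0", "id": request.get("id"), "result": result})


def serve_stdio() -> None:
    """Answer one request per input line until stdin closes (framing is assumed)."""
    for line in sys.stdin:
        if line.strip():
            sys.stdout.write(handle_message(line) + "\n")
            sys.stdout.flush()
```

Calling `handle_message` with an `agent/step` request whose params are `{"input": "hi"}` yields a response whose `result.public_output` is `"HI"`.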

## Repo Layout

- `sdk/python/src/ecp` - SDK decorators, adapters, and server loop

- `runtime/python/src/ecp_runtime` - CLI, runner, graders, reporting, and trend analysis

- `examples/` - Demo agents and manifests for LangChain, LlamaIndex, CrewAI, and PydanticAI

- `runtime/python/tests` - Runtime unit and CLI smoke tests

- `sdk/python/tests` - Adapter normalization tests

## Status

This project is evolving quickly. Expect changes between minor versions.
