https://github.com/evaluation-context-protocol/ecp
ECP is a standardized interface for orchestrating, auditing, and enforcing authority limits in AI Agent evaluations. It moves evaluation from "brittle Python scripts" to a deterministic infrastructure protocol.
evaluation-metrics evaluations llm-evaluation model-evaluation
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/evaluation-context-protocol/ecp
- Owner: evaluation-context-protocol
- License: apache-2.0
- Created: 2026-01-21T18:07:17.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2026-04-24T04:20:13.000Z (about 1 month ago)
- Last Synced: 2026-04-24T06:23:14.321Z (about 1 month ago)
- Topics: evaluation-metrics, evaluations, llm-evaluation, model-evaluation
- Language: Python
- Homepage: http://evaluationcontextprotocol.io/
- Size: 737 KB
- Stars: 8
- Watchers: 0
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Evaluation Context Protocol (ECP)


> Work in progress: this repository is actively evolving, and some concepts may change.

A lightweight protocol and reference runtime for evaluating agents on their public output, private reasoning, and tool usage. This repo contains:
- `sdk/` - Python SDK for implementing an ECP agent.
- `runtime/` - Python runtime (CLI) that runs manifests and grades results.
- `examples/` - Minimal framework demos (LangChain, LlamaIndex, CrewAI, PydanticAI).
- `spec/` - Protocol specification.
## Documentation
- Docs site: https://evaluationcontextprotocol.io/
- Quickstart: https://evaluationcontextprotocol.io/quickstart/
- Specification: https://evaluationcontextprotocol.io/spec/
- Docs deploy automatically from `main` via GitHub Actions.
## Quick Start
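Create a virtual environment and install the `0.2.9` packages from PyPI. The commands in this section use Windows PowerShell syntax; on macOS or Linux, create the environment with `python3 -m venv .venv`, activate it with `source .venv/bin/activate`, and write paths with forward slashes.
```powershell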
py -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install "ecp-runtime==0.2.9" "ecp-sdk[langchain]==0.2.9" langchain-openai
```
Run the example manifest:
```powershell
python -m ecp_runtime.cli run --manifest .\examples\langchain_demo\manifest.yaml
```
Generate an HTML report:
```powershell
python -m ecp_runtime.cli run --manifest .\examples\langchain_demo\manifest.yaml --report .\report.html
```
Print a JSON report (useful for CI tooling):
```powershell
python -m ecp_runtime.cli run --manifest .\examples\langchain_demo\manifest.yaml --json
```
Save a JSON report to a file:
```powershell
python -m ecp_runtime.cli run --manifest .\examples\langchain_demo\manifest.yaml --json-out .\report.json
```
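For a CI job, the same `--json-out` run can be wrapped in a small Python script. This is a minimal sketch that uses only the `run`, `--manifest`, and `--json-out` options shown above; the helper names `build_run_command` and `run_manifest` are hypothetical. Because the report's schema and the CLI's exit-code behavior on failing graders are not described in this README, the sketch returns the exit code and the parsed report without interpreting either.

```python
import json
import subprocess
import sys
from pathlib import Path
from typing import Any


def build_run_command(manifest: str, json_out: str) -> list[str]:
    """Build the `run --manifest ... --json-out ...` invocation from the Quick Start."""
    # sys.executable runs the CLI from the same virtual environment as this script.
    return [
        sys.executable, "-m", "ecp_runtime.cli", "run",
        "--manifest", manifest,
        "--json-out", json_out,
    ]


def run_manifest(manifest: str, json_out: str = "report.json") -> tuple[int, Any]:
    """Run the CLI, then load the JSON report if one was written.

    The report is returned as parsed JSON with no assumptions about its fields.
    """
    report_path = Path(json_out)
    report_path.unlink(missing_ok=True)  # avoid reading a stale report from an earlier run
    completed = subprocess.run(build_run_command(manifest, json_out), check=False)
    report = json.loads(report_path.read_text(encoding="utf-8")) if report_path.exists() else None
    return completed.returncode, report
```

For example, `code, report = run_manifest("examples/langchain_demo/manifest.yaml")` runs the LangChain demo; check how the CLI sets its exit code before using `code` as a pass/fail gate.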
If your manifest uses the `llm_judge` grader, set your OpenAI API key and the judge model:
```powershell
$env:OPENAI_API_KEY="your_key_here"
$env:ECP_LLM_JUDGE_MODEL="gpt-4o-mini"
```
The latest stable packages on PyPI are now `0.2.9`, and this repository is aligned with that release line.
Run the CrewAI and PydanticAI demos:
```powershell
pip install "ecp-sdk[crewai]==0.2.9" crewai
python -m ecp_runtime.cli run --manifest .\examples\crewai_demo\manifest.yaml
pip install "ecp-sdk[pydanticai]==0.2.9" pydantic-ai
python -m ecp_runtime.cli run --manifest .\examples\pydantic_ai_demo\manifest.yaml
```
## Example (LangChain Agent + Manifest)
Agent (LangChain `create_agent` + tool usage):
```python
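from langchain.agents import create_agent
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

from ecp import serve
from ecp.adaptors.langchain import ECPLangChainAdapter


@tool
def calculator(expression: str) -> str:
    """Evaluate an arithmetic expression using digits, + - * /, and parentheses."""
    allowed = set("0123456789+-*/() ")
    # Reject anything outside the whitelist before handing the string to eval().
    if not expression or any(ch not in allowed for ch in expression):
        return "Invalid expression."
    try:
        # No builtins are exposed to eval(); int() truncates fractional results (e.g. "7/2" -> "3").
        return str(int(eval(expression, {"__builtins__": {}})))
    except Exception:
        # Covers syntax errors, division by zero, and non-numeric results.
        return "Invalid expression."


agent = create_agent(
    model=ChatOpenAI(model="gpt-3.5-turbo", temperature=0),
    tools=[calculator],
    system_prompt="Use the calculator tool for arithmetic.",
)


def to_messages(text: str) -> dict:
    # Wrap the runtime's plain-text step input in the message format the agent expects.
    return {"messages": [{"role": "user", "content": text}]}


# Hand the agent to the ECP server loop; the runtime launches this script as the manifest's `target`.
serve(ECPLangChainAdapter(agent, name="MathBot", input_mapper=to_messages))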
```
Manifest (runtime checks output + tool usage):
```yaml
manifest_version: "v1"
name: "LangChain Math Check"
target: "python agent.py"
scenarios:
  - name: "Ratio Word Problem"
    steps:
      - input: "Katy makes coffee using teaspoons of sugar and cups of water in the ratio of 7:13..."
        graders:
          - type: text_match
            field: public_output
            condition: contains
            value: "42"
          - type: tool_usage
            tool_name: "calculator"
            arguments: {}
```
Supported graders:
- `text_match` (`contains`, `equals`, `does_not_contain`, `regex`)
- `llm_judge` (requires `OPENAI_API_KEY`)
- `tool_usage` (name + argument subset match)
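To make the grader options concrete, here is a hypothetical sketch of how the `text_match` conditions and the `tool_usage` subset match could be evaluated. It is not the runtime's implementation: the `{"name": ..., "arguments": {...}}` shape of a tool call, exact (untrimmed) comparison for `equals`, and `re.search` semantics for `regex` are all assumptions.

```python
import re
from typing import Any


def text_match(output: str, condition: str, value: str) -> bool:
    """Apply one text_match condition to an output string (hypothetical semantics)."""
    if condition == "contains":
        return value in output
    if condition == "equals":
        return output == value  # assumed: exact comparison, no whitespace trimming
    if condition == "does_not_contain":
        return value not in output
    if condition == "regex":
        return re.search(value, output) is not None  # assumed: match anywhere in the output
    raise ValueError(f"unknown text_match condition: {condition!r}")


def tool_usage(tool_calls: list[dict[str, Any]], tool_name: str, arguments: dict[str, Any]) -> bool:
    """Pass if some call to tool_name includes every expected argument with an equal value."""
    for call in tool_calls:
        if call.get("name") != tool_name:
            continue
        actual = call.get("arguments", {})  # assumed tool-call shape
        if all(key in actual and actual[key] == expected for key, expected in arguments.items()):
            return True
    return False
```

Under these assumptions, `arguments: {}` in the example manifest matches any call to `calculator`, because an empty argument subset is always satisfied.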
Note: manifest validation is strict and fails fast on invalid grader configuration.
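A fail-fast check in that spirit might walk the manifest and raise on the first bad grader. The sketch below is hypothetical: it assumes graders are nested under each step as in the example manifest, treats `field`, `condition`, and `value` as required for `text_match` and `tool_name` as required for `tool_usage`, and skips `llm_judge` because its configuration keys are not listed in this README.

```python
import re
from typing import Any

TEXT_MATCH_CONDITIONS = {"contains", "equals", "does_not_contain", "regex"}


def validate_grader(grader: dict[str, Any], where: str) -> None:
    """Raise ValueError on the first problem found in one grader config (hypothetical rules)."""
    kind = grader.get("type")
    if kind == "text_match":
        missing = {"field", "condition", "value"} - set(grader)
        if missing:
            raise ValueError(f"{where}: text_match missing {sorted(missing)}")
        if grader["condition"] not in TEXT_MATCH_CONDITIONS:
            raise ValueError(f"{where}: unknown condition {grader['condition']!r}")
        if grader["condition"] == "regex":
            try:
                re.compile(grader["value"])  # reject bad patterns before any agent runs
            except re.error as exc:
                raise ValueError(f"{where}: invalid regex: {exc}") from exc
    elif kind == "tool_usage":
        if not grader.get("tool_name"):
            raise ValueError(f"{where}: tool_usage needs tool_name")
        if not isinstance(grader.get("arguments", {}), dict):
            raise ValueError(f"{where}: tool_usage arguments must be a mapping")
    elif kind == "llm_judge":
        pass  # config keys not documented here, so nothing is checked
    else:
        raise ValueError(f"{where}: unknown grader type {kind!r}")


def validate_manifest(manifest: dict[str, Any]) -> None:
    """Check every grader in scenarios -> steps -> graders, stopping at the first error."""
    for s_i, scenario in enumerate(manifest.get("scenarios") or []):
        for st_i, step in enumerate(scenario.get("steps") or []):
            for g_i, grader in enumerate(step.get("graders") or []):
                validate_grader(grader, f"scenarios[{s_i}].steps[{st_i}].graders[{g_i}]")
```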
## ECP in 60 Seconds
ECP is JSON-RPC 2.0 over stdio. The runtime launches your agent process and calls:
- `agent/initialize`
- `agent/step`
- `agent/reset`
Your agent replies with a structured result containing:
- `public_output` (what the user sees)
- `private_thought` (for evaluators)
- `tool_calls` (actions taken)
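For intuition, the exchange can be sketched as a tiny stdio server. This is an illustration under stated assumptions, not the SDK's `serve()` loop: it assumes one JSON-RPC message per line, a named `input` parameter on `agent/step`, and a hypothetical echo agent called `EchoBot`. The error code `-32601` is the JSON-RPC 2.0 "Method not found" code.

```python
import json
import sys
from typing import Any


def handle_request(request: dict[str, Any], history: list[str]) -> dict[str, Any]:
    """Answer one ECP call with a JSON-RPC 2.0 response (hypothetical echo agent)."""
    method = request.get("method")
    params = request.get("params") or {}
    result: dict[str, Any]
    if method == "agent/initialize":
        result = {"name": "EchoBot"}  # hypothetical agent metadata
    elif method == "agent/step":
        text = str(params.get("input", ""))  # "input" is an assumed parameter name
        history.append(text)
        result = {
            "public_output": f"You said: {text}",    # what the user sees
            "private_thought": f"turn {len(history)}",  # visible to evaluators only
            "tool_calls": [],                        # this echo agent calls no tools
        }
    elif method == "agent/reset":
        history.clear()  # forget conversation state between scenarios
        result = {}
    else:
        return {"jsonrpc": "2.0", "id": request.get("id"),
                "error": {"code": -32601, "message": "Method not found"}}
    return {"jsonrpc": "2.0", "id": request.get("id"), "result": result}


def serve_stdio() -> None:
    """Read one JSON-RPC request per line from stdin and answer on stdout."""
    history: list[str] = []
    for line in sys.stdin:
        if not line.strip():
            continue
        request = json.loads(line)
        response = handle_request(request, history)
        if "id" in request:  # JSON-RPC notifications (no id) get no reply
            sys.stdout.write(json.dumps(response) + "\n")
            sys.stdout.flush()
```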
See `spec/protocol.md` for the full protocol.
## Repo Layout
- `sdk/python/src/ecp` - SDK decorators, adapters, and server loop
- `runtime/python/src/ecp_runtime` - CLI, runner, graders, reporting, and trend analysis
- `examples/` - Demo agents and manifests for LangChain, LlamaIndex, CrewAI, and PydanticAI
- `runtime/python/tests` - Runtime unit and CLI smoke tests
- `sdk/python/tests` - Adapter normalization tests
## Status
This project is evolving quickly. Expect changes between minor versions.