{"id":45721870,"url":"https://github.com/evaluation-context-protocol/ecp","last_synced_at":"2026-04-25T07:01:47.715Z","repository":{"id":339056619,"uuid":"1139251597","full_name":"evaluation-context-protocol/ecp","owner":"evaluation-context-protocol","description":"ECP is a standardized interface for orchestrating, auditing, and enforcing authority limits in AI Agent evaluations. It moves evaluation from \"brittle Python scripts\" to a deterministic infrastructure protocol","archived":false,"fork":false,"pushed_at":"2026-04-24T04:20:13.000Z","size":755,"stargazers_count":8,"open_issues_count":1,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-24T06:23:14.321Z","etag":null,"topics":["evaluation-metrics","evaluations","llm-evaluation","model-evaluation"],"latest_commit_sha":null,"homepage":"http://evaluationcontextprotocol.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/evaluation-context-protocol.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-01-21T18:07:17.000Z","updated_at":"2026-04-24T04:20:17.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/evaluation-context-protocol/ecp","commit_stats":null,"previous_names":["evaluation-context-protocol/ecp"],"tags_count":10,"template":false,"template_full_name":null,"purl":"pkg:github/evaluation-context-protocol/ecp","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/Git
Hub/repositories/evaluation-context-protocol%2Fecp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/evaluation-context-protocol%2Fecp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/evaluation-context-protocol%2Fecp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/evaluation-context-protocol%2Fecp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/evaluation-context-protocol","download_url":"https://codeload.github.com/evaluation-context-protocol/ecp/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/evaluation-context-protocol%2Fecp/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32253251,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-25T04:23:17.126Z","status":"ssl_error","status_checked_at":"2026-04-25T04:21:53.360Z","response_time":59,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["evaluation-metrics","evaluations","llm-evaluation","model-evaluation"],"created_at":"2026-02-25T06:02:28.535Z","updated_at":"2026-04-25T07:01:47.688Z","avatar_url":"https://github.com/evaluation-context-protocol.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Evaluation Context Protocol 
(ECP)\n\n![Status](https://img.shields.io/badge/Status-Experimental-orange)\n![License](https://img.shields.io/badge/License-Apache%202.0-blue)\n\n\u003e Work in progress: this repository is actively evolving, and some concepts may change.\n\nA lightweight protocol and reference runtime for evaluating agents with public output, private reasoning, and tool usage. This repo contains:\n\n- `sdk/` - Python SDK for implementing an ECP agent.\n- `runtime/` - Python runtime (CLI) that runs manifests and grades results.\n- `examples/` - Minimal framework demos (LangChain, LlamaIndex, CrewAI, PydanticAI).\n- `spec/` - Protocol specification.\n\n## Documentation\n\n- Docs site: https://evaluationcontextprotocol.io/\n- Quickstart: https://evaluationcontextprotocol.io/quickstart/\n- Specification: https://evaluationcontextprotocol.io/spec/\n- Docs deploy automatically from `main` via GitHub Actions.\n\n## Quick Start\n\nCreate a venv and install the current PyPI prerelease that matches the latest GitHub beta release (the commands in this section use Windows PowerShell syntax):\n\n```powershell\npy -m venv .venv\n.\\.venv\\Scripts\\Activate.ps1\npip install \"ecp-runtime==0.2.9\" \"ecp-sdk[langchain]==0.2.9\" langchain-openai\n```\n\nRun the example manifest:\n\n```powershell\npython -m ecp_runtime.cli run --manifest .\\examples\\langchain_demo\\manifest.yaml\n```\n\nGenerate an HTML report:\n\n```powershell\npython -m ecp_runtime.cli run --manifest .\\examples\\langchain_demo\\manifest.yaml --report .\\report.html\n```\n\nPrint a JSON report (useful for CI tooling):\n\n```powershell\npython -m ecp_runtime.cli run --manifest .\\examples\\langchain_demo\\manifest.yaml --json\n```\n\nSave a JSON report to a file:\n\n```powershell\npython -m ecp_runtime.cli run --manifest .\\examples\\langchain_demo\\manifest.yaml --json-out .\\report.json\n```\n\nIf your manifest uses `llm_judge`, set your key:\n\n```powershell\n$env:OPENAI_API_KEY=\"your_key_here\"\n$env:ECP_LLM_JUDGE_MODEL=\"gpt-4o-mini\"\n```\n\nThe latest stable packages on PyPI are now `0.2.9`, and this repository is 
aligned with that release line.\n\nRun the other demos:\n\n```powershell\npip install \"ecp-sdk[crewai]==0.2.9\" crewai\npython -m ecp_runtime.cli run --manifest .\\examples\\crewai_demo\\manifest.yaml\n\npip install \"ecp-sdk[pydanticai]==0.2.9\" pydantic-ai\npython -m ecp_runtime.cli run --manifest .\\examples\\pydantic_ai_demo\\manifest.yaml\n```\n\n## Example (LangChain Agent + Manifest)\n\nAgent (LangChain `create_agent` + tool usage):\n\n```python\nfrom langchain.agents import create_agent\nfrom langchain_openai import ChatOpenAI\nfrom langchain_core.tools import tool\nfrom ecp import serve\nfrom ecp.adaptors.langchain import ECPLangChainAdapter\n\n@tool\ndef calculator(expression: str) -\u003e str:\n    \"\"\"Evaluate a basic arithmetic expression and return the result as an integer string.\"\"\"\n    # Accept only digits, arithmetic operators, parentheses, and spaces.\n    allowed = set(\"0123456789+-*/() \")\n    if not expression or any(ch not in allowed for ch in expression):\n        return \"Invalid expression.\"\n    try:\n        # Evaluate with builtins disabled; int() truncates fractional results.\n        return str(int(eval(expression, {\"__builtins__\": {}})))\n    except Exception:\n        return \"Invalid expression.\"\n\nagent = create_agent(\n    model=ChatOpenAI(model=\"gpt-3.5-turbo\", temperature=0),\n    tools=[calculator],\n    system_prompt=\"Use the calculator tool for arithmetic.\"\n)\n\ndef to_messages(text: str):\n    # Wrap the raw step input in the messages payload the agent expects.\n    return {\"messages\": [{\"role\": \"user\", \"content\": text}]}\n\nserve(ECPLangChainAdapter(agent, name=\"MathBot\", input_mapper=to_messages))\n```\n\nManifest (runtime checks output + tool usage):\n\n```yaml\nmanifest_version: \"v1\"\nname: \"LangChain Math Check\"\ntarget: \"python agent.py\"\n\nscenarios:\n  - name: \"Ratio Word Problem\"\n    steps:\n      - input: \"Katy makes coffee using teaspoons of sugar and cups of water in the ratio of 7:13...\"\n        graders:\n          - type: text_match\n            field: public_output\n            condition: contains\n            value: \"42\"\n          - type: tool_usage\n            tool_name: \"calculator\"\n            arguments: {}\n```\n\nSupported graders:\n\n- `text_match` (`contains`, `equals`, 
`does_not_contain`, `regex`)\n- `llm_judge` (requires `OPENAI_API_KEY`)\n- `tool_usage` (name + argument subset match)\n\nNote: manifest validation is strict and fails fast on invalid grader configuration.\n\n## ECP in 60 Seconds\n\nECP is JSON-RPC 2.0 over stdio. The runtime launches your agent process and calls:\n\n- `agent/initialize`\n- `agent/step`\n- `agent/reset`\n\nYour agent replies with a structured result containing:\n\n- `public_output` (what the user sees)\n- `private_thought` (for evaluators)\n- `tool_calls` (actions taken)\n\nSee `spec/protocol.md` for the full protocol.\n\n## Repo Layout\n\n- `sdk/python/src/ecp` - SDK decorators, adapters, and server loop\n- `runtime/python/src/ecp_runtime` - CLI, runner, graders, reporting, and trend analysis\n- `examples/` - Demo agents and manifests for LangChain, LlamaIndex, CrewAI, and PydanticAI\n- `runtime/python/tests` - Runtime unit and CLI smoke tests\n- `sdk/python/tests` - Adapter normalization tests\n\n## Status\n\nThis project is evolving quickly. Expect changes between minor versions.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fevaluation-context-protocol%2Fecp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fevaluation-context-protocol%2Fecp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fevaluation-context-protocol%2Fecp/lists"}