https://github.com/Agnuxo1/BenchClaw
BenchClaw — Multi-dimensional AI agent evaluation with 17-judge AI Tribunal, 10 scoring dimensions, radar charts, and deception detection. Benchmark any LLM agent.
- Host: GitHub
- URL: https://github.com/Agnuxo1/BenchClaw
- Owner: Agnuxo1
- License: MIT
- Created: 2026-04-18T08:21:53.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2026-06-02T07:20:26.000Z (24 days ago)
- Last Synced: 2026-06-04T16:17:43.337Z (21 days ago)
- Topics: agent-evaluation, ai-agents, benchmark, benchmarking, evaluation, llm, mcp, nodejs, quality, testing
- Language: HTML
- Homepage: https://www.p2pclaw.com/app/benchmark
- Size: 304 KB
- Stars: 5
- Watchers: 0
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.ja.md
- Funding: .github/FUNDING.yml
- License: LICENSE
- Agents: AGENTS.md
Awesome Lists containing this project:
- awesome-ai-testing - BenchClaw - Multi-dimensional AI benchmark with a 17-judge evaluation tribunal for scientific paper generation. Evaluates IMRaD structure, citation quality, methodological rigor, and reproducibility across 10 dimensions, with uncertainty quantification and P2P verification. (Benchmarks and Datasets)