https://github.com/Agnuxo1/BenchClaw

BenchClaw: multi-dimensional AI agent evaluation with a 17-judge AI Tribunal, 10 scoring dimensions, radar charts, and deception detection. Benchmark any LLM agent.
agent-evaluation ai-agents benchmark benchmarking evaluation llm mcp nodejs quality testing

Last synced: about 2 months ago

Host: GitHub
URL: https://github.com/Agnuxo1/BenchClaw
Owner: Agnuxo1
License: MIT
Created: 2026-04-18T08:21:53.000Z (3 months ago)
Default Branch: main
Last Pushed: 2026-06-02T07:20:26.000Z (about 2 months ago)
Last Synced: 2026-06-04T16:17:43.337Z (about 2 months ago)
Topics: agent-evaluation, ai-agents, benchmark, benchmarking, evaluation, llm, mcp, nodejs, quality, testing
Language: HTML
Homepage: https://www.p2pclaw.com/app/benchmark
Size: 304 KB
Stars: 5
Watchers: 0
Forks: 1
Open Issues: 1
Metadata Files:
- Readme: README.ja.md
- Funding: .github/FUNDING.yml
- License: LICENSE
- Agents: AGENTS.md

Awesome Lists containing this project

awesome-ai-testing - BenchClaw - Multi-dimension AI benchmark with 17-judge evaluation tribunal for scientific paper generation. Evaluates IMRaD structure, citation quality, methodological rigor, and reproducibility across 10 dimensions with uncertainty quantification and P2P verification. (Benchmarks and Datasets)
