Projects in Awesome Lists tagged with evaluation-tools
A curated list of projects in awesome lists tagged with evaluation-tools.
https://github.com/jetbrains/teamcity-ai-agent-testing-demo
End-to-end TeamCity framework for running AI agents on SWE-Bench Lite. It spins up an isolated Docker image per task, extracts patches, scores them with the official harness, and aggregates success rates. The example agents are Junie and Google Gemini CLI.
agent-evaluation agentic-ai ai eval evaluation evaluation-framework evaluation-tools
Last synced: 18 Apr 2026