https://github.com/awslabs/agent-evaluation
A generative AI-powered framework for testing virtual agents.
- Host: GitHub
- URL: https://github.com/awslabs/agent-evaluation
- Owner: awslabs
- License: apache-2.0
- Created: 2024-03-19T19:58:26.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-03-31T17:33:09.000Z (6 months ago)
- Last Synced: 2025-09-04T14:33:49.186Z (about 1 month ago)
- Language: Python
- Homepage: https://awslabs.github.io/agent-evaluation/
- Size: 178 MB
- Stars: 288
- Watchers: 3
- Forks: 42
- Open Issues: 22
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project
- awesome_ai_agents - Agent-Evaluation - A generative AI-powered framework for testing virtual agents. (Building / Testing)
- awesome-ai-agents - Agent Evaluation - Benchmark for evaluating agent capabilities. (⚙️ Agent Operations / 📊 Evaluation)
README



[bandit](https://github.com/PyCQA/bandit) [black](https://github.com/psf/black) [mkdocs-material](https://squidfunk.github.io/mkdocs-material/)

# Agent Evaluation
Agent Evaluation is a generative AI-powered framework for testing virtual agents.
Internally, Agent Evaluation implements an LLM agent (the evaluator) that orchestrates conversations with your own agent (the target) and evaluates its responses throughout the conversation.
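As a rough mental model of that loop, the sketch below is purely illustrative Python and is not Agent Evaluation's internal API: the placeholder functions stand in for the evaluator's LLM calls and for the target agent under test.

```python
# Conceptual sketch only -- NOT Agent Evaluation's internal API.
# It mimics the loop described above: an evaluator drives a multi-turn
# conversation with a target agent, then judges the transcript.

def evaluator_next_turn(conversation: list[str], goal: str) -> str:
    # Placeholder for the evaluator LLM deciding what to say next.
    return f"(turn {len(conversation) // 2 + 1}) question working toward: {goal}"

def target_respond(prompt: str) -> str:
    # Placeholder for the agent under test (Bedrock, Q Business, custom, ...).
    return f"agent reply to: {prompt}"

def evaluator_judge(conversation: list[str], expected: str) -> bool:
    # Placeholder for the evaluator grading responses against expectations.
    return any(expected in turn for turn in conversation)

def run_test(goal: str, expected: str, max_turns: int = 3) -> bool:
    conversation: list[str] = []
    for _ in range(max_turns):
        prompt = evaluator_next_turn(conversation, goal)
        conversation.append(prompt)
        conversation.append(target_respond(prompt))
    return evaluator_judge(conversation, expected)

if __name__ == "__main__":
    print(run_test(goal="list the open claims", expected="agent reply"))
```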
## ✨ Key features
- Built-in support for popular AWS services including [Amazon Bedrock](https://aws.amazon.com/bedrock/), [Amazon Q Business](https://aws.amazon.com/q/business/), and [Amazon SageMaker](https://aws.amazon.com/sagemaker/). You can also [bring your own agent](https://awslabs.github.io/agent-evaluation/targets/custom_targets/) to test using Agent Evaluation (see the target sketch after this list).
- Orchestrate concurrent, multi-turn conversations with your agent while evaluating its responses.
- Define [hooks](https://awslabs.github.io/agent-evaluation/hooks/) to perform additional tasks such as integration testing (see the hook sketch after this list).
- Can be incorporated into CI/CD pipelines to shorten delivery time while maintaining the stability of agents in production environments.
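As referenced in the list above, the two sketches below suggest what a bring-your-own-agent target and a hook might look like. They are assumption-laden sketches modeled on the linked documentation pages; the import paths, class names, and method signatures (`BaseTarget`, `TargetResponse`, `Hook`, `pre_evaluate`, `post_evaluate`) should be verified against the current docs before use.

```python
# Hedged sketch of a custom target (bring your own agent). The import path
# and the TargetResponse shape are assumptions based on the custom-targets docs.
from agenteval.targets import BaseTarget, TargetResponse


class MyAgentTarget(BaseTarget):
    """Wraps a hypothetical agent so the evaluator can converse with it."""

    def invoke(self, prompt: str) -> TargetResponse:
        # Call your agent here (SDK, HTTP request, local function, ...) and
        # return its reply so the evaluator can assess it.
        reply = f"stub reply for: {prompt}"  # replace with a real agent call
        return TargetResponse(response=reply)
```

```python
# Hedged sketch of a hook for integration-test style checks. The method
# names and arguments are assumptions based on the hooks documentation.
from agenteval import Hook


class ClaimsIntegrationHook(Hook):
    def pre_evaluate(self, test, trace):
        # e.g. seed the record the agent is expected to look up
        pass

    def post_evaluate(self, test, test_result, trace):
        # e.g. verify side effects (records updated, tickets opened) and clean up
        pass
```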
## 📚 Documentation
To get started, please visit the full documentation [here](https://awslabs.github.io/agent-evaluation/). To contribute, please refer to [CONTRIBUTING.md](./CONTRIBUTING.md).
## 👏 Contributors
Shout out to these awesome contributors: