https://github.com/athina-ai/athina-evals
Python SDK for running evaluations on LLM-generated responses
- Host: GitHub
- URL: https://github.com/athina-ai/athina-evals
- Owner: athina-ai
- Created: 2023-11-22T10:46:44.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-04-15T15:21:24.000Z (10 days ago)
- Last Synced: 2025-04-15T15:42:19.594Z (10 days ago)
- Topics: evaluation, evaluation-framework, evaluation-metrics, llm-eval, llm-evaluation, llm-evaluation-toolkit, llm-ops, llmops
- Language: Python
- Homepage: https://docs.athina.ai
- Size: 1.84 MB
- Stars: 276
- Watchers: 5
- Forks: 17
- Open Issues: 2
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-llm-eval - athina-ai - athina-ai is an open-source library providing plug-and-play preset evals and a modular, extensible framework for writing and running evaluations, helping engineers systematically improve the reliability and performance of their LLMs through eval-driven development. It overcomes the limitations of traditional workflows, enabling rapid experimentation and customizable evaluators with consistent metrics. | (Tools)
- StarryDivineSky - athina-ai/athina-evals - athina-evals is a Python SDK for evaluating responses generated by large language models (LLMs). It lets developers run automated quality evaluations on LLM outputs with minimal setup. The project ships a set of predefined evaluation metrics, such as accuracy, relevance, and consistency, and users can define custom metrics for specific needs. Athina-evals works by comparing LLM outputs against reference answers or predefined rules, and it supports multiple LLM providers, including OpenAI, Anthropic, and Cohere. The project aims to help developers build more reliable, higher-quality LLM applications: it significantly reduces manual evaluation effort, improves evaluation consistency, and offers a flexible, extensible framework for evaluating a wide range of LLM tasks, with the goal of simplifying LLM evaluation workflows. (A01_Text Generation_Text Dialogue / Large language dialogue models and data)
README
# Overview
Athina is an Observability and Experimentation platform for AI teams.
This SDK is an open-source repository of [50+ preset evals](https://docs.athina.ai/evals/preset-evals/overview). You can also use [custom evals](https://docs.athina.ai/evals/custom-evals/overview).
This SDK also serves as a companion to [Athina IDE](https://athina.ai/develop) where you can prototype pipelines, run experiments and evaluations, and compare datasets.
---
### Quick Start
Follow [this notebook](https://github.com/athina-ai/athina-evals/blob/main/examples/run_eval_suite.ipynb) for a quick start guide. To get an Athina API key, sign up at https://app.athina.ai
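Before running anything, the SDK needs your API keys. A minimal sketch, assuming the `AthinaApiKey` and `OpenAiApiKey` helpers from `athina.keys` described in the docs, and that both keys are exported as environment variables:

```python
import os

from athina.keys import AthinaApiKey, OpenAiApiKey

# Assumption: both keys are set in the environment beforehand.
OpenAiApiKey.set_key(os.getenv("OPENAI_API_KEY"))  # used by LLM-graded evals
AthinaApiKey.set_key(os.getenv("ATHINA_API_KEY"))  # used to log results to Athina IDE
```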
---
### Run Evals
These evals can be run [programmatically](https://athina.ai/videos/run-evals-programmatically.mp4), or [via the UI](https://docs.athina.ai/ide/run-eval) on Athina IDE.
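For the programmatic route, here is a minimal sketch of a batched eval run. It assumes the `DoesResponseAnswerQuery` preset eval and its `run_batch(...).to_df()` interface from the preset-evals docs; the dataset rows are hypothetical:

```python
from athina.evals import DoesResponseAnswerQuery

# Hypothetical dataset: each row pairs a user query with the model's response.
data = [
    {"query": "What is the capital of France?", "response": "Paris."},
    {"query": "Who wrote Hamlet?", "response": "I like turtles."},
]

# Run the eval over every row and collect the results as a pandas DataFrame.
results = DoesResponseAnswerQuery(model="gpt-4").run_batch(data=data).to_df()
print(results)
```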
---
### Compare datasets side-by-side ([Docs](https://docs.athina.ai/ide/compare-datasets))
Once a dataset is logged to Athina IDE, you can also compare it against another dataset.

Once you run evals using Athina, they will be visible in [Athina IDE](https://athina.ai/develop) where you can run experiments, evals, and compare datasets side-by-side.
---
### Preset Evals
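The [preset evals](https://docs.athina.ai/evals/preset-evals/overview) cover common RAG failure modes such as faithfulness, answer relevance, and context sufficiency. A minimal sketch of running a suite of them over a dataset, assuming the `EvalRunner` and `RagLoader` interfaces from the docs; the file name is hypothetical:

```python
from athina.evals import (
    ContextContainsEnoughInformation,
    DoesResponseAnswerQuery,
    Faithfulness,
)
from athina.loaders import RagLoader
from athina.runner.run import EvalRunner

# Hypothetical dataset file with query/context/response fields per row.
dataset = RagLoader().load_json("rag_outputs.json")

eval_model = "gpt-4"
EvalRunner.run_suite(
    evals=[
        DoesResponseAnswerQuery(model=eval_model),
        Faithfulness(model=eval_model),
        ContextContainsEnoughInformation(model=eval_model),
    ],
    data=dataset,
    max_parallel_evals=2,  # each eval is an LLM call, so cap concurrency
)
```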
---
### Athina Steps
To use CodeExecutionV2, you need to install e2b.
```bash
pip install e2b-code-interpreter
```
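With e2b installed, CodeExecutionV2 executes code in a remote sandbox. The SDK wires this up internally; as a sanity check that the sandbox itself works, here is a minimal sketch using the `e2b-code-interpreter` client directly (the exact client API varies by version; this assumes the current `Sandbox.run_code` interface and an `E2B_API_KEY` environment variable):

```python
from e2b_code_interpreter import Sandbox

# Assumption: E2B_API_KEY is set in the environment.
with Sandbox() as sandbox:
    execution = sandbox.run_code("print(1 + 1)")
    print(execution.logs)
```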