# Overview

Athina is an Observability and Experimentation platform for AI teams.

This open-source SDK ships with [50+ preset evals](https://docs.athina.ai/evals/preset-evals/overview). You can also write [custom evals](https://docs.athina.ai/evals/custom-evals/overview).

This SDK also serves as a companion to [Athina IDE](https://athina.ai/develop) where you can prototype pipelines, run experiments and evaluations, and compare datasets.

---

### Quick Start
Follow [this notebook](https://github.com/athina-ai/athina-evals/blob/main/examples/run_eval_suite.ipynb) for a quick start guide.

To get an Athina API key, sign up at https://app.athina.ai
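Before running any evals in code, you set your OpenAI and Athina API keys. A minimal sketch (the key values are placeholders, and the import is guarded so the snippet is a no-op if the package isn't installed):

```python
import os

# Placeholder keys; in practice read these from your environment or a secrets
# manager rather than hard-coding them.
openai_key = os.getenv("OPENAI_API_KEY", "sk-placeholder")
athina_key = os.getenv("ATHINA_API_KEY", "athina-placeholder")

try:
    from athina.keys import AthinaApiKey, OpenAiApiKey

    # Register the keys once per process; evals pick them up automatically.
    OpenAiApiKey.set_key(openai_key)
    AthinaApiKey.set_key(athina_key)
except ImportError:
    # athina-evals not installed; install with `pip install athina`
    pass
```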

---

### Run Evals

These evals can be run [programmatically](https://athina.ai/videos/run-evals-programmatically.mp4), or [via the UI](https://docs.athina.ai/ide/run-eval) on Athina IDE.

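As a sketch of the programmatic path, here is a single datapoint run through one preset eval. `DoesResponseAnswerQuery` is one of the preset evals; the exact shape of the result object is an assumption, and the call is guarded because it needs both API keys:

```python
import os

# An illustrative datapoint: did the response actually answer the query?
datapoint = {
    "query": "What is the capital of France?",
    "response": "The capital of France is Paris.",
}

if os.getenv("OPENAI_API_KEY") and os.getenv("ATHINA_API_KEY"):
    from athina.evals import DoesResponseAnswerQuery

    # Runs the LLM-graded check; the result can be inspected or
    # converted to a DataFrame.
    result = DoesResponseAnswerQuery().run(**datapoint)
    print(result.to_df())
```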

---

### Compare datasets side-by-side ([Docs](https://docs.athina.ai/ide/compare-datasets))

Once a dataset is logged to Athina IDE, you can also compare it against another dataset.

![image](https://github.com/athina-ai/athina-evals/assets/7515552/90640acc-495e-45e0-b590-d6ddee8c5727)

Once you run evals using Athina, they will be visible in [Athina IDE](https://athina.ai/develop) where you can run experiments, evals, and compare datasets side-by-side.

---

### Preset Evals

See the [full list of 50+ preset evals](https://docs.athina.ai/evals/preset-evals/overview) in the docs.
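
Several preset evals can be run together as a suite over a batch of datapoints. A hedged sketch, guarded behind the API keys; whether raw dicts are accepted directly or must first go through a loader from `athina.loaders` is an assumption, so check the quick-start notebook for the exact flow:

```python
import os

# Illustrative RAG-style rows: query / context / response.
dataset = [
    {
        "query": "What is the capital of France?",
        "context": ["Paris is the capital and largest city of France."],
        "response": "The capital of France is Paris.",
    },
]

if os.getenv("OPENAI_API_KEY") and os.getenv("ATHINA_API_KEY"):
    from athina.evals import (
        ContextContainsEnoughInformation,
        DoesResponseAnswerQuery,
        Faithfulness,
    )
    from athina.runner.run import EvalRunner

    # Run three preset evals over the whole dataset; results also appear
    # in Athina IDE.
    EvalRunner.run_suite(
        evals=[
            DoesResponseAnswerQuery(),
            Faithfulness(),
            ContextContainsEnoughInformation(),
        ],
        data=dataset,
        max_parallel_evals=2,  # how many evals run concurrently
    )
```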

---

### Athina Steps

To use the `CodeExecutionV2` step, install the e2b code interpreter package:

```bash
pip install e2b-code-interpreter
```