https://github.com/synapsekit/evalci

LLM quality gates for every PR — run @eval_case suites automatically and block merge if quality drops below threshold

Host: GitHub
URL: https://github.com/synapsekit/evalci
Owner: SynapseKit
License: apache-2.0
Created: 2026-04-09T17:37:26.000Z (3 months ago)
Default Branch: main
Last Pushed: 2026-04-09T18:50:24.000Z (3 months ago)
Last Synced: 2026-04-09T19:38:56.937Z (3 months ago)
Topics: ai, ci, eval, github-actions, llm, llmops, machine-learning, quality-assurance, synapsekit, testing
Language: Python
Homepage: https://synapsekit.github.io/synapsekit-docs/
Size: 19.5 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- Funding: .github/FUNDING.yml
- License: LICENSE
- Security: SECURITY.md

README

# EvalCI by SynapseKit

[![GitHub Marketplace](https://img.shields.io/badge/Marketplace-EvalCI-blue?logo=github)](https://github.com/marketplace/actions/evalci-by-synapsekit)

[![License](https://img.shields.io/badge/License-Apache_2.0-orange.svg)](https://github.com/SynapseKit/evalci/blob/main/LICENSE)

[![Latest Release](https://img.shields.io/github/v/release/SynapseKit/evalci?label=release&color=orange)](https://github.com/SynapseKit/evalci/releases/latest)

[![GitHub Stars](https://img.shields.io/github/stars/SynapseKit/evalci?style=flat&color=orange)](https://github.com/SynapseKit/evalci/stargazers)

[![Docs](https://img.shields.io/badge/docs-synapsekit-orange)](https://synapsekit.github.io/synapsekit-docs/docs/evalci/overview)

[![Discussions](https://img.shields.io/github/discussions/SynapseKit/evalci)](https://github.com/SynapseKit/evalci/discussions)

[![Issues](https://img.shields.io/github/issues/SynapseKit/evalci)](https://github.com/SynapseKit/evalci/issues)

**LLM quality gates for every PR.** Run your `@eval_case` suites automatically and block merge if quality drops below threshold.

- Zero infrastructure — runs entirely in GitHub Actions

- 2-minute setup

- Works with any LLM provider (OpenAI, Anthropic, Gemini, and [30+ more](https://synapsekit.github.io/synapsekit-docs/))

- Posts a formatted results table as a PR comment

- Sets Action outputs for downstream steps

---

## Quickstart

Add `.github/workflows/eval.yml` to your repo:

```yaml
name: EvalCI

on:
  pull_request:

jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: SynapseKit/evalci@v1
        with:
          path: tests/evals
          threshold: "0.80"
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```

That's it. EvalCI will:

1. Install `synapsekit` into the runner

2. Discover and run all `@eval_case`-decorated functions under `tests/evals/`

3. Post a results table as a PR comment

4. Fail the check if any case scores below threshold

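As a rough illustration of step 4, the gate comes down to comparing each case's score with the threshold and turning any failure into a non-zero exit code, which is what fails a GitHub check. This is a sketch under stated assumptions, not EvalCI's implementation: `CaseResult`, `gate`, and the sample scores are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    """Hypothetical record for one eval case; not a SynapseKit type."""
    name: str
    score: float


def gate(results: list[CaseResult], threshold: float) -> int:
    """Return an exit code: 0 if every case meets the threshold, otherwise 1."""
    failures = [r for r in results if r.score < threshold]
    for r in failures:
        print(f"FAIL {r.name}: {r.score:.3f} < {threshold:.2f}")
    return 1 if failures else 0


if __name__ == "__main__":
    sample = [
        CaseResult("test_rag_relevancy", 0.85),
        CaseResult("test_rag_faithfulness", 0.65),
    ]
    # A CI step would pass this value to sys.exit(); non-zero fails the job.
    exit_code = gate(sample, threshold=0.80)
    print(f"exit code: {exit_code}")
```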
---

## Example eval file

```python
# tests/evals/test_rag.py
from synapsekit.testing import eval_case

# `my_rag_pipeline` and `retrieved_docs` are placeholders for your own
# application code; import or define them before running these cases.


@eval_case(min_score=0.80, max_cost_usd=0.01, max_latency_ms=3000)
def test_rag_relevancy(eval_context):
    result = my_rag_pipeline("What is SynapseKit?")
    return eval_context.score_relevancy(result, reference="SynapseKit is a Python library...")


@eval_case(min_score=0.75)
def test_rag_faithfulness(eval_context):
    result = my_rag_pipeline("How do I install SynapseKit?")
    return eval_context.score_faithfulness(result, context=retrieved_docs)
```
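The decorator arguments read as per-case limits on score, cost, and latency. How SynapseKit enforces them is not shown in this README; the sketch below is one plausible reading, assuming a case fails when it violates any limit it sets. `check_limits` and its parameters are hypothetical and not part of the SynapseKit API.

```python
from typing import Optional


def check_limits(
    score: float,
    cost_usd: float,
    latency_ms: float,
    min_score: Optional[float] = None,
    max_cost_usd: Optional[float] = None,
    max_latency_ms: Optional[float] = None,
) -> list[str]:
    """Return a list of violated limits; an empty list means the case passes."""
    violations = []
    # Each limit is optional, mirroring decorator calls that set only some of them.
    if min_score is not None and score < min_score:
        violations.append(f"score {score:.3f} is below min_score {min_score:.3f}")
    if max_cost_usd is not None and cost_usd > max_cost_usd:
        violations.append(f"cost ${cost_usd:.4f} exceeds max_cost_usd ${max_cost_usd:.4f}")
    if max_latency_ms is not None and latency_ms > max_latency_ms:
        violations.append(f"latency {latency_ms:.0f}ms exceeds max_latency_ms {max_latency_ms:.0f}ms")
    return violations


if __name__ == "__main__":
    # Illustrative measurements only, not real results.
    print(check_limits(0.85, 0.005, 1200, min_score=0.80, max_cost_usd=0.01, max_latency_ms=3000))
    print(check_limits(0.65, 0.012, 2500, min_score=0.75))
```

Returning the list of violations rather than a boolean keeps the reason for a failure available for reporting.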

---

## PR Comment

EvalCI posts a comment like this on every PR:

> ## EvalCI Results
>
> | | Test | Score | Cost | Latency |
> |---|---|---|---|---|
> | ✅ | test_rag_relevancy | 0.850 | $0.0050 | 1200ms |
> | ❌ | test_rag_faithfulness | 0.650 | $0.0120 | 2500ms |
>
> **1/2 passed** · Threshold: `0.80` · [SynapseKit EvalCI](https://synapsekit.github.io/synapsekit-docs/)

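For readers who want to reproduce a similar summary in their own tooling, the table above can be rendered with plain string formatting. This is a minimal sketch, not EvalCI's code: `render_comment` and the tuple layout are hypothetical, and the trailing docs link from the real comment is omitted.

```python
def render_comment(rows, threshold: float) -> str:
    """Render a Markdown results table.

    rows: iterable of (name, score, cost_usd, latency_ms, passed) tuples.
    """
    rows = list(rows)
    lines = [
        "## EvalCI Results",
        "",
        "| | Test | Score | Cost | Latency |",
        "|---|---|---|---|---|",
    ]
    for name, score, cost_usd, latency_ms, passed in rows:
        icon = "✅" if passed else "❌"
        lines.append(
            f"| {icon} | {name} | {score:.3f} | ${cost_usd:.4f} | {int(latency_ms)}ms |"
        )
    n_passed = sum(1 for row in rows if row[4])
    lines += ["", f"**{n_passed}/{len(rows)} passed** · Threshold: `{threshold:.2f}`"]
    return "\n".join(lines)


if __name__ == "__main__":
    # Values copied from the example comment above.
    print(render_comment(
        [
            ("test_rag_relevancy", 0.85, 0.005, 1200, True),
            ("test_rag_faithfulness", 0.65, 0.012, 2500, False),
        ],
        threshold=0.80,
    ))
```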
---

## Inputs

| Input | Description | Default |
|---|---|---|
| `path` | Path to eval files or directory | `.` |
| `threshold` | Global minimum score (0.0–1.0) | `0.7` |
| `extras` | pip extras for synapsekit (e.g. `openai,anthropic`) | `openai` |
| `synapsekit-version` | synapsekit version to install, or `latest` | `latest` |
| `github-token` | Token for posting PR comments | `${{ github.token }}` |
| `fail-on-regression` | Fail if score regresses vs. baseline | `false` |
| `token` | EvalCI backend API token _(future)_ | (none) |

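The `fail-on-regression` input implies comparing current scores against a stored baseline. This README does not say where the baseline comes from or how a regression is defined, so the sketch below is purely an assumption: it treats a regression as any case whose score dropped by more than a tolerance. `find_regressions` and `tolerance` are hypothetical names.

```python
def find_regressions(
    baseline: dict[str, float],
    current: dict[str, float],
    tolerance: float = 0.0,
) -> dict[str, float]:
    """Return {case_name: score_drop} for cases that fell by more than tolerance."""
    drops = {}
    for name, base_score in baseline.items():
        # Cases missing from the current run are skipped, not counted as regressions.
        if name not in current:
            continue
        drop = base_score - current[name]
        if drop > tolerance:
            drops[name] = round(drop, 6)
    return drops


if __name__ == "__main__":
    # Illustrative scores only.
    baseline = {"test_rag_relevancy": 0.90, "test_rag_faithfulness": 0.70}
    current = {"test_rag_relevancy": 0.80, "test_rag_faithfulness": 0.75}
    print(find_regressions(baseline, current))
```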
## Outputs

| Output | Description |
|---|---|
| `passed` | Number of eval cases that passed |
| `failed` | Number of eval cases that failed |
| `total` | Total number of eval cases run |
| `mean-score` | Mean score across all eval cases |

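GitHub Actions reads step outputs as `name=value` lines appended to the file named by the `GITHUB_OUTPUT` environment variable. The sketch below shows how the four outputs above could be derived and written that way. It assumes a case passes when its score is at or above the threshold; `summarize` and `write_outputs` are hypothetical helpers, not EvalCI functions.

```python
import os
from statistics import mean
from typing import Optional


def summarize(scores: dict[str, float], threshold: float) -> dict[str, str]:
    """Compute the passed/failed/total/mean-score values as strings."""
    total = len(scores)
    passed = sum(1 for s in scores.values() if s >= threshold)
    mean_score = mean(list(scores.values())) if scores else 0.0
    return {
        "passed": str(passed),
        "failed": str(total - passed),
        "total": str(total),
        "mean-score": f"{mean_score:.3f}",
    }


def write_outputs(outputs: dict[str, str], path: Optional[str] = None) -> None:
    """Append name=value lines to the GITHUB_OUTPUT file, if one is configured."""
    path = path or os.environ.get("GITHUB_OUTPUT")
    if not path:
        return  # Not running inside GitHub Actions; nothing to write.
    with open(path, "a", encoding="utf-8") as fh:
        for name, value in outputs.items():
            fh.write(f"{name}={value}\n")


if __name__ == "__main__":
    # Illustrative scores only.
    summary = summarize({"test_rag_relevancy": 0.85, "test_rag_faithfulness": 0.65}, 0.80)
    print(summary)
    write_outputs(summary)
```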
---

## Using outputs in downstream steps

```yaml
- uses: SynapseKit/evalci@v1
  id: eval
  with:
    path: tests/evals

- run: |
    echo "Passed: ${{ steps.eval.outputs.passed }}/${{ steps.eval.outputs.total }}"
    echo "Mean score: ${{ steps.eval.outputs.mean-score }}"
```

---

## Multiple providers

```yaml
- uses: SynapseKit/evalci@v1
  with:
    extras: "openai,anthropic"
    threshold: "0.75"
  env:
    OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
```

---

## Badge

```markdown
[![EvalCI](https://github.com/{owner}/{repo}/actions/workflows/eval.yml/badge.svg)](https://github.com/{owner}/{repo}/actions/workflows/eval.yml)
```

---

## Documentation

Full documentation is available at **[synapsekit.github.io/synapsekit-docs/docs/evalci/overview](https://synapsekit.github.io/synapsekit-docs/docs/evalci/overview)**

| | |
|---|---|
| [Overview](https://synapsekit.github.io/synapsekit-docs/docs/evalci/overview) | What EvalCI is and how it works |
| [Quickstart](https://synapsekit.github.io/synapsekit-docs/docs/evalci/quickstart) | Set up in 5 minutes |
| [Writing eval cases](https://synapsekit.github.io/synapsekit-docs/docs/evalci/writing-evals) | How to write `@eval_case` functions |
| [Action reference](https://synapsekit.github.io/synapsekit-docs/docs/evalci/action-reference) | All inputs, outputs, and configuration |
| [Examples](https://synapsekit.github.io/synapsekit-docs/docs/evalci/examples) | RAG, agents, multi-provider workflows |

## About

EvalCI is built on [SynapseKit](https://synapsekit.github.io/synapsekit-docs/) — a Python library for building LLM applications with 30+ provider integrations and a built-in evaluation framework.

- [Documentation](https://synapsekit.github.io/synapsekit-docs/docs/evalci/overview)

- [SynapseKit](https://github.com/SynapseKit/SynapseKit)

- [Issues](https://github.com/SynapseKit/evalci/issues)

- [Discussions](https://github.com/SynapseKit/evalci/discussions)
