https://github.com/synapsekit/evalci
LLM quality gates for every PR — run @eval_case suites automatically and block merge if quality drops below threshold
ai ci eval github-actions llm llmops machine-learning quality-assurance synapsekit testing
- Host: GitHub
- URL: https://github.com/synapsekit/evalci
- Owner: SynapseKit
- License: apache-2.0
- Created: 2026-04-09T17:37:26.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2026-04-09T18:50:24.000Z (3 months ago)
- Last Synced: 2026-04-09T19:38:56.937Z (3 months ago)
- Topics: ai, ci, eval, github-actions, llm, llmops, machine-learning, quality-assurance, synapsekit, testing
- Language: Python
- Homepage: https://synapsekit.github.io/synapsekit-docs/
- Size: 19.5 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- Funding: .github/FUNDING.yml
- License: LICENSE
- Security: SECURITY.md
README
# EvalCI by SynapseKit
[Marketplace](https://github.com/marketplace/actions/evalci-by-synapsekit) · [License](https://github.com/SynapseKit/evalci/blob/main/LICENSE) · [Latest release](https://github.com/SynapseKit/evalci/releases/latest) · [Stars](https://github.com/SynapseKit/evalci/stargazers) · [Docs](https://synapsekit.github.io/synapsekit-docs/docs/evalci/overview) · [Discussions](https://github.com/SynapseKit/evalci/discussions) · [Issues](https://github.com/SynapseKit/evalci/issues)
**LLM quality gates for every PR.** Run your `@eval_case` suites automatically and block the merge if quality drops below the threshold.
- Zero infrastructure — runs entirely in GitHub Actions
- 2-minute setup
- Works with any LLM provider (OpenAI, Anthropic, Gemini, and [30+ more](https://synapsekit.github.io/synapsekit-docs/))
- Posts a formatted results table as a PR comment
- Sets Action outputs for downstream steps
---
## Quickstart
Add `.github/workflows/eval.yml` to your repo:
```yaml
name: EvalCI
on:
  pull_request:
jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: SynapseKit/evalci@v1
        with:
          path: tests/evals
          threshold: "0.80"
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```
That's it. EvalCI will:
1. Install `synapsekit` into the runner
2. Discover and run all `@eval_case`-decorated functions under `tests/evals/`
3. Post a results table as a PR comment
4. Fail the check if any case scores below threshold
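
The gating rule in step 4 can be pictured with a short sketch. This is not EvalCI's source code: `CaseResult` and `gate` are hypothetical names, and treating a score equal to the threshold as a pass is an assumption. The only parts taken from the list above are that a case fails when it scores below the threshold and that one failing case fails the check. A non-zero exit code is what makes a GitHub Actions step fail.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    """Hypothetical record for one eval case; not a SynapseKit type."""
    name: str
    score: float


def gate(results: list[CaseResult], threshold: float) -> int:
    """Return an exit code: 0 if every case meets the threshold, else 1."""
    failing = [r for r in results if r.score < threshold]
    for r in failing:
        print(f"FAIL {r.name}: {r.score:.3f} < {threshold:.2f}")
    return 1 if failing else 0


# Example: one case above and one below a 0.80 threshold.
exit_code = gate(
    [CaseResult("test_rag_relevancy", 0.85),
     CaseResult("test_rag_faithfulness", 0.65)],
    threshold=0.80,
)
```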
---
## Example eval file
```python
# tests/evals/test_rag.py
from synapsekit.testing import eval_case

# `my_rag_pipeline` and `retrieved_docs` are placeholders for your own
# application code; import or define them before running these cases.


@eval_case(min_score=0.80, max_cost_usd=0.01, max_latency_ms=3000)
def test_rag_relevancy(eval_context):
    result = my_rag_pipeline("What is SynapseKit?")
    return eval_context.score_relevancy(result, reference="SynapseKit is a Python library...")


@eval_case(min_score=0.75)
def test_rag_faithfulness(eval_context):
    result = my_rag_pipeline("How do I install SynapseKit?")
    return eval_context.score_faithfulness(result, context=retrieved_docs)
```
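
To make the three `@eval_case` limits concrete, here is a minimal sketch of how a score, cost, and latency could be checked against them. `check_limits` is a hypothetical helper, not part of `synapsekit`, and it assumes the limits are inclusive (a score equal to `min_score` passes, a cost equal to `max_cost_usd` passes). How SynapseKit actually measures cost and latency is not covered here.

```python
from typing import Optional


def check_limits(score: float, cost_usd: float, latency_ms: float,
                 min_score: float,
                 max_cost_usd: Optional[float] = None,
                 max_latency_ms: Optional[float] = None) -> list[str]:
    """Return a list of violations; an empty list means the case passes."""
    violations = []
    if score < min_score:
        violations.append(f"score {score:.3f} < min_score {min_score:.2f}")
    # A limit left as None is treated as "not set" and is skipped.
    if max_cost_usd is not None and cost_usd > max_cost_usd:
        violations.append(f"cost ${cost_usd:.4f} > max_cost_usd ${max_cost_usd:.4f}")
    if max_latency_ms is not None and latency_ms > max_latency_ms:
        violations.append(f"latency {latency_ms:.0f}ms > max_latency_ms {max_latency_ms:.0f}ms")
    return violations


# Limits copied from test_rag_relevancy above; the measurements are made up.
problems = check_limits(0.85, 0.005, 1200,
                        min_score=0.80, max_cost_usd=0.01, max_latency_ms=3000)
```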
---
## PR Comment
EvalCI posts a comment like this on every PR:
> ## EvalCI Results
>
> | | Test | Score | Cost | Latency |
> |---|---|---|---|---|
> | ✅ | test_rag_relevancy | 0.850 | $0.0050 | 1200ms |
> | ❌ | test_rag_faithfulness | 0.650 | $0.0120 | 2500ms |
>
> **1/2 passed** · Threshold: `0.80` · [SynapseKit EvalCI](https://synapsekit.github.io/synapsekit-docs/)
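
A comment in this shape is plain Markdown, so it is easy to reproduce. The sketch below builds the same table from a list of result tuples. `render_comment` and the tuple layout are hypothetical, chosen only to match the columns and number formats shown above; the sketch omits the trailing docs link and is not the Action's own rendering code.

```python
def render_comment(rows, threshold: float) -> str:
    """Render (name, score, cost_usd, latency_ms, ok) tuples as a Markdown table."""
    lines = [
        "## EvalCI Results",
        "",
        "| | Test | Score | Cost | Latency |",
        "|---|---|---|---|---|",
    ]
    passed = 0
    for name, score, cost_usd, latency_ms, ok in rows:
        passed += 1 if ok else 0
        mark = "✅" if ok else "❌"
        lines.append(
            f"| {mark} | {name} | {score:.3f} | ${cost_usd:.4f} | {latency_ms:.0f}ms |"
        )
    lines += ["", f"**{passed}/{len(rows)} passed** · Threshold: `{threshold:.2f}`"]
    return "\n".join(lines)


# The two rows from the example comment above.
comment = render_comment(
    [("test_rag_relevancy", 0.85, 0.005, 1200, True),
     ("test_rag_faithfulness", 0.65, 0.012, 2500, False)],
    threshold=0.80,
)
```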
---
## Inputs
| Input | Description | Default |
|---|---|---|
| `path` | Path to eval files or directory | `.` |
| `threshold` | Global minimum score (0.0–1.0) | `0.7` |
| `extras` | pip extras for synapsekit (e.g. `openai,anthropic`) | `openai` |
| `synapsekit-version` | synapsekit version to install, or `latest` | `latest` |
| `github-token` | Token for posting PR comments | `${{ github.token }}` |
| `fail-on-regression` | Fail if score regresses vs. baseline | `false` |
| `token` | EvalCI backend API token _(future)_ | — |
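
The `extras` and `synapsekit-version` inputs map naturally onto a pip requirement string. The sketch below shows one way to combine them. `pip_requirement` is a hypothetical helper, the assumption that `latest` means "no version pin" is mine, and the version number in the usage line is invented for illustration. Only the `package[extra1,extra2]==version` syntax is standard pip.

```python
def pip_requirement(extras: str = "openai", version: str = "latest") -> str:
    """Build a pip requirement such as 'synapsekit[openai,anthropic]==1.2.0'."""
    # Accept "openai, anthropic" as well as "openai,anthropic".
    extras_list = [e.strip() for e in extras.split(",") if e.strip()]
    spec = "synapsekit"
    if extras_list:
        spec += "[" + ",".join(extras_list) + "]"
    # Assumption: "latest" leaves the version unpinned.
    if version != "latest":
        spec += f"=={version}"
    return spec


default_spec = pip_requirement()
pinned_spec = pip_requirement("openai, anthropic", "1.2.0")  # hypothetical version
```

The resulting string could be passed to `pip install` as a single argument.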
## Outputs
| Output | Description |
|---|---|
| `passed` | Number of eval cases that passed |
| `failed` | Number of eval cases that failed |
| `total` | Total number of eval cases run |
| `mean-score` | Mean score across all eval cases |
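
For readers curious how four values like these become step outputs, the sketch below computes them from per-case results and appends them as `name=value` lines to the file named by the `GITHUB_OUTPUT` environment variable, which is the standard GitHub Actions mechanism. `summarize` and `write_outputs` are hypothetical names, and the three-decimal formatting of `mean-score` is an assumption, not a documented EvalCI behavior.

```python
import os
from statistics import mean


def summarize(results: list[tuple[float, bool]]) -> dict[str, str]:
    """Turn (score, ok) pairs into the four output values, as strings."""
    passed = sum(1 for _, ok in results if ok)
    scores = [score for score, _ in results]
    return {
        "passed": str(passed),
        "failed": str(len(results) - passed),
        "total": str(len(results)),
        # Guard against an empty suite, where mean() would raise.
        "mean-score": f"{mean(scores):.3f}" if scores else "0.000",
    }


def write_outputs(outputs: dict[str, str], path: str) -> None:
    """Append name=value lines in the format GitHub Actions expects."""
    with open(path, "a", encoding="utf-8") as fh:
        for name, value in outputs.items():
            fh.write(f"{name}={value}\n")


summary = summarize([(0.85, True), (0.65, False)])
# GITHUB_OUTPUT is only set inside a workflow run, so skip writing elsewhere.
if os.environ.get("GITHUB_OUTPUT"):
    write_outputs(summary, os.environ["GITHUB_OUTPUT"])
```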
---
## Using outputs in downstream steps
```yaml
- uses: SynapseKit/evalci@v1
  id: eval
  with:
    path: tests/evals
- run: |
    echo "Passed: ${{ steps.eval.outputs.passed }}/${{ steps.eval.outputs.total }}"
    echo "Mean score: ${{ steps.eval.outputs.mean-score }}"
```
---
## Multiple providers
```yaml
- uses: SynapseKit/evalci@v1
  with:
    extras: "openai,anthropic"
    threshold: "0.75"
  env:
    OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
```
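
With several providers, a missing secret tends to surface as a confusing failure in the middle of a run. One option is a small preflight check in your own eval code, sketched below. `REQUIRED_KEYS` and `missing_keys` are hypothetical: the two variable names come from the workflow above, but the mapping from extras to variables is an assumption, and nothing here is provided by EvalCI or SynapseKit.

```python
import os
from typing import Mapping

# Assumed mapping from pip extra to the API key variable it needs.
REQUIRED_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def missing_keys(extras: str, environ: Mapping[str, str] = os.environ) -> list[str]:
    """Return the API key variables that are unset or empty for the given extras."""
    names = [
        REQUIRED_KEYS[e.strip()]
        for e in extras.split(",")
        if e.strip() in REQUIRED_KEYS
    ]
    return [name for name in names if not environ.get(name)]


# Example with an explicit environment so the result does not depend on the host.
missing = missing_keys("openai,anthropic", {"OPENAI_API_KEY": "sk-example"})
```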
---
## Badge
```markdown
[![EvalCI](https://github.com/{owner}/{repo}/actions/workflows/eval.yml/badge.svg)](https://github.com/{owner}/{repo}/actions/workflows/eval.yml)
```
---
## Documentation
Full documentation is available at **[synapsekit.github.io/synapsekit-docs/docs/evalci/overview](https://synapsekit.github.io/synapsekit-docs/docs/evalci/overview)**.
| Page | Description |
|---|---|
| [Overview](https://synapsekit.github.io/synapsekit-docs/docs/evalci/overview) | What EvalCI is and how it works |
| [Quickstart](https://synapsekit.github.io/synapsekit-docs/docs/evalci/quickstart) | Set up in 5 minutes |
| [Writing eval cases](https://synapsekit.github.io/synapsekit-docs/docs/evalci/writing-evals) | How to write `@eval_case` functions |
| [Action reference](https://synapsekit.github.io/synapsekit-docs/docs/evalci/action-reference) | All inputs, outputs, and configuration |
| [Examples](https://synapsekit.github.io/synapsekit-docs/docs/evalci/examples) | RAG, agents, multi-provider workflows |
## About
EvalCI is built on [SynapseKit](https://synapsekit.github.io/synapsekit-docs/) — a Python library for building LLM applications with 30+ provider integrations and a built-in evaluation framework.
- [Documentation](https://synapsekit.github.io/synapsekit-docs/docs/evalci/overview)
- [SynapseKit](https://github.com/SynapseKit/SynapseKit)
- [Issues](https://github.com/SynapseKit/evalci/issues)
- [Discussions](https://github.com/SynapseKit/evalci/discussions)