https://github.com/zer0contextlost/llmgate

Pre-deploy LLM regression testing for CI pipelines

Host: GitHub
URL: https://github.com/zer0contextlost/llmgate
Owner: zer0contextlost
License: mit
Created: 2026-04-28T15:30:59.000Z (2 months ago)
Default Branch: dev
Last Pushed: 2026-04-28T15:38:57.000Z (2 months ago)
Last Synced: 2026-04-28T17:19:36.020Z (2 months ago)
Topics: ci, evaluation, llm, llmops, pytest, python, regression-testing, testing
Language: Python
Size: 11.7 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# llmgate

**Pre-deploy LLM regression testing for CI pipelines.** Trace your LLM calls, then `llmgate diff baseline current` fails your PR if output quality dropped. No server, no account, SQLite only.

```python
import llmgate

@llmgate.trace
def answer(question: str) -> str:
    return my_llm_call(question)
```

That's it. Every call is logged locally. When you change your prompt or swap models, run:

```bash
llmgate diff main feature-branch
```

If outputs have degraded, the command exits with status 1 and your PR check fails. No server, no account, no config.

---

## Install

```bash
pip install llmgate
```

## Usage

### 1. Trace your LLM calls

```python
import llmgate
import os

os.environ["LLMGATE_RUN_ID"] = "v1.0"  # set per run, or use git SHA in CI

@llmgate.trace
def my_pipeline(query: str) -> str:
    context = retrieve(query)
    return llm.complete(f"{context}\n\n{query}")
```
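The comment above suggests using the git SHA as the run ID. One way to do that outside CI might look like the sketch below. `resolve_run_id` is a hypothetical helper, not part of llmgate, and it assumes `git` is on your `PATH`; it falls back to a default label when that is not the case.

```python
import os
import subprocess


def resolve_run_id(default: str = "local") -> str:
    """Pick a run ID: explicit env var first, then short git SHA, then a default."""
    explicit = os.environ.get("LLMGATE_RUN_ID")
    if explicit:
        return explicit
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--short", "HEAD"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is missing, or this is not a git checkout
        return default
    return result.stdout.strip() or default


# Set the ID before any traced function runs, as in the example above.
os.environ["LLMGATE_RUN_ID"] = resolve_run_id()
```

An explicit `LLMGATE_RUN_ID` always wins here, so a value set by CI is left untouched.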

### 2. Assert output quality

```python
output = my_pipeline("What is the capital of France?")

llmgate.assert_contains(output, "Paris")
llmgate.assert_output(output, lambda s: len(s) < 500, "response too long")
# `baseline` is a reference output string, e.g. one saved from an earlier run
llmgate.assert_similarity(output, baseline, threshold=0.85)
```
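To give a feel for what a similarity threshold means, here is a rough stand-in for a similarity assertion. It is not llmgate's implementation: `token_similarity` and `check_similarity` are hypothetical names, and the metric (a `difflib` ratio over whitespace-separated tokens) is an assumption about what "token-level similarity" could look like.

```python
from difflib import SequenceMatcher


def token_similarity(a: str, b: str) -> float:
    """Ratio in [0, 1] of matching whitespace-separated tokens between a and b."""
    return SequenceMatcher(None, a.split(), b.split(), autojunk=False).ratio()


def check_similarity(output: str, baseline: str, threshold: float = 0.85) -> None:
    """Raise AssertionError when output drifts too far from baseline."""
    score = token_similarity(output, baseline)
    if score < threshold:
        raise AssertionError(
            f"similarity {score:.2f} is below threshold {threshold:.2f}"
        )


baseline = "The capital of France is Paris."
check_similarity("The capital of France is Paris.", baseline)  # identical text passes
```

With this metric, changing one token out of four gives a score of 0.75, which would fail at the 0.85 threshold used above.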

### 3. Diff runs in CI

```bash
# See all recorded runs
llmgate runs

# Compare two runs; exits 1 if regressions are found
llmgate diff v1.0 v1.1

# Inspect a specific run
llmgate show abc123
```

### 4. GitHub Actions

```yaml
- name: Run LLM eval suite
  env:
    LLMGATE_RUN_ID: ${{ github.sha }}
  run: python examples/eval_suite.py
- name: Check for regressions
  # github.base_ref is only set on pull_request events; a run with that ID must already be recorded
  run: llmgate diff ${{ github.base_ref }} ${{ github.sha }}
```

---

## How it works

- All traces are stored in `.llmgate.db`, a SQLite file that you can commit or cache as a CI artifact
- `@llmgate.trace` works with any function that returns a string or an OpenAI/Anthropic response object
- `llmgate diff` computes token-level similarity between baseline and current outputs
- Nothing leaves your machine unless you choose to push the `.db` file
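The local-storage idea in the list above can be sketched with the standard library alone. Everything specific here is an assumption for illustration: the `make_tracer` helper, the `calls` table layout, and the `.llmgate_sketch.db` path are hypothetical, and llmgate's real schema may differ.

```python
import functools
import os
import sqlite3
import time


def make_tracer(db_path: str):
    """Build a decorator that logs each call of the wrapped function to SQLite."""
    def trace(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            output = fn(*args, **kwargs)
            run_id = os.environ.get("LLMGATE_RUN_ID", "default")
            conn = sqlite3.connect(db_path)
            try:
                with conn:  # commits on success, rolls back on error
                    conn.execute(
                        "CREATE TABLE IF NOT EXISTS calls "
                        "(run_id TEXT, fn TEXT, input TEXT, output TEXT, ts REAL)"
                    )
                    conn.execute(
                        "INSERT INTO calls VALUES (?, ?, ?, ?, ?)",
                        (run_id, fn.__name__, repr((args, kwargs)),
                         str(output), time.time()),
                    )
            finally:
                conn.close()
            return output
        return wrapper
    return trace


# Hypothetical file name; nothing is written until a traced function is called.
trace = make_tracer(".llmgate_sketch.db")
```

Because each row carries the run ID, two runs can later be compared by selecting rows for each ID from the same file.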

## CLI reference

```
llmgate runs                          # list all runs with stats
llmgate show <run_id>                 # inspect calls in a run
llmgate diff <baseline> <current>     # compare runs, exit 1 on regression
  --threshold FLOAT                   # similarity threshold (default: 0.8)
  --no-fail                           # report only, don't exit 1
```
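The interaction between `--threshold`, `--no-fail`, and the exit code can be pictured with a small sketch. This is not the CLI's source: `gate` is a hypothetical function, outputs are assumed to be paired by input string, and the similarity metric is a `difflib` ratio over whitespace tokens chosen for illustration.

```python
from difflib import SequenceMatcher


def gate(baseline: dict[str, str], current: dict[str, str],
         threshold: float = 0.8, no_fail: bool = False) -> int:
    """Return an exit code: 1 if any shared input regressed, otherwise 0."""
    regressions = []
    # Only inputs present in both runs can be compared.
    for key in sorted(baseline.keys() & current.keys()):
        score = SequenceMatcher(
            None, baseline[key].split(), current[key].split(), autojunk=False
        ).ratio()
        if score < threshold:
            regressions.append((key, score))
    for key, score in regressions:
        print(f"REGRESSION {key!r}: similarity {score:.2f} < {threshold:.2f}")
    # no_fail mirrors --no-fail: still report, but never signal failure.
    return 1 if regressions and not no_fail else 0
```

A wrapper script could pass the result to `sys.exit`, which is what makes a CI step fail when the code is 1.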
