https://github.com/zer0contextlost/llmgate
Pre-deploy LLM regression testing for CI pipelines
ci evaluation llm llmops pytest python regression-testing testing
Last synced: 2 months ago
- Host: GitHub
- URL: https://github.com/zer0contextlost/llmgate
- Owner: zer0contextlost
- License: mit
- Created: 2026-04-28T15:30:59.000Z (2 months ago)
- Default Branch: dev
- Last Pushed: 2026-04-28T15:38:57.000Z (2 months ago)
- Last Synced: 2026-04-28T17:19:36.020Z (2 months ago)
- Topics: ci, evaluation, llm, llmops, pytest, python, regression-testing, testing
- Language: Python
- Size: 11.7 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# llmgate
**Pre-deploy LLM regression testing for CI pipelines.** Trace your LLM calls, then `llmgate diff baseline current` fails your PR if output quality dropped. No server, no account, SQLite only.
```python
import llmgate

@llmgate.trace
def answer(question: str) -> str:
    return my_llm_call(question)
```
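The decorator's internals are not documented here. As a rough mental model only, a tracing decorator can wrap the function, time the call, and append its input and output to a local SQLite table tagged with the run ID. The sketch below uses only the standard library; `trace_to`, the `traces` table, its columns, and the `"local"` fallback run ID are hypothetical and are not llmgate's actual implementation or schema.

```python
import functools
import json
import os
import sqlite3
import time
from typing import Callable


def trace_to(conn: sqlite3.Connection) -> Callable:
    """Hypothetical stand-in for @llmgate.trace: log each call to `conn`."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS traces ("
        " run_id TEXT, fn TEXT, args TEXT, output TEXT, latency_s REAL)"
    )

    def decorator(fn: Callable[..., str]) -> Callable[..., str]:
        @functools.wraps(fn)
        def wrapper(*args, **kwargs) -> str:
            start = time.perf_counter()
            output = fn(*args, **kwargs)
            latency = time.perf_counter() - start
            # Tag the row with the current run; "local" is an assumed default.
            run_id = os.environ.get("LLMGATE_RUN_ID", "local")
            conn.execute(
                "INSERT INTO traces VALUES (?, ?, ?, ?, ?)",
                (
                    run_id,
                    fn.__name__,
                    json.dumps([args, kwargs], default=str),
                    output,
                    latency,
                ),
            )
            conn.commit()
            return output

        return wrapper

    return decorator


# Usage with an in-memory database and a fake model call.
conn = sqlite3.connect(":memory:")


@trace_to(conn)
def answer(question: str) -> str:
    return f"Echo: {question}"  # placeholder for a real LLM call


answer("What is the capital of France?")
```

Keeping everything in one local file is what makes the "no server, no account" workflow possible: two runs can later be compared with a plain SQL query over the same table.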
That's it. Every call is logged locally. When you change your prompt or swap models, run:
```bash
llmgate diff main feature-branch
```
If outputs regressed (similarity to the baseline run fell below the threshold), the command exits 1 and the CI check on your PR fails. No server, no account, no config.
---
## Install
```bash
pip install llmgate
```
## Usage
### 1. Trace your LLM calls
```python
import os

import llmgate

os.environ["LLMGATE_RUN_ID"] = "v1.0"  # set per run, or use the git SHA in CI

@llmgate.trace
def my_pipeline(query: str) -> str:
    context = retrieve(query)
    return llm.complete(f"{context}\n\n{query}")
```
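The comment above suggests using the git SHA as the run ID in CI. One way to do that, sketched under assumptions: prefer an explicit `LLMGATE_RUN_ID`, then the `GITHUB_SHA` variable that GitHub Actions sets, then `git rev-parse HEAD` for local runs. `resolve_run_id` and its `"local"` fallback are hypothetical and not part of llmgate. Set the variable before any traced function is called, since exactly when llmgate reads it is not documented here.

```python
import os
import subprocess


def resolve_run_id(default: str = "local") -> str:
    """Hypothetical helper: choose a value for LLMGATE_RUN_ID."""
    # An explicitly configured run ID always wins.
    if os.environ.get("LLMGATE_RUN_ID"):
        return os.environ["LLMGATE_RUN_ID"]
    # GitHub Actions exposes the triggering commit as GITHUB_SHA.
    if os.environ.get("GITHUB_SHA"):
        return os.environ["GITHUB_SHA"]
    # Locally, use the current commit if we are inside a git checkout.
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        # git is missing or this is not a repository.
        return default


os.environ["LLMGATE_RUN_ID"] = resolve_run_id()
```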
### 2. Assert output quality
```python
import llmgate

output = my_pipeline("What is the capital of France?")
baseline = "The capital of France is Paris."  # e.g. a previously approved output
llmgate.assert_contains(output, "Paris")
llmgate.assert_output(output, lambda s: len(s) < 500, "response too long")
llmgate.assert_similarity(output, baseline, threshold=0.85)
```
### 3. Diff runs in CI
```bash
# See all recorded runs
llmgate runs
# Compare two runs — exits 1 if regressions found
llmgate diff v1.0 v1.1
# Inspect a specific run
llmgate show abc123
```
### 4. GitHub Actions
```yaml
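- name: Run LLM eval suite
  env:
    LLMGATE_RUN_ID: ${{ github.sha }}
  run: python examples/eval_suite.py
- name: Check for regressions
  # github.base_ref is only set on pull_request events; this step assumes
  # .llmgate.db already holds a baseline run recorded under that ID.
  run: llmgate diff ${{ github.base_ref }} ${{ github.sha }}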
```
---
## How it works
- All traces are stored in `.llmgate.db` (SQLite). Commit it or cache it as a CI artifact so the baseline run is available when `llmgate diff` runs
- `@llmgate.trace` works with any function that returns a string, or OpenAI/Anthropic response objects
- `llmgate diff` computes token-level similarity between baseline and current outputs
- Nothing leaves your machine unless you choose to push the `.db` file
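The exact similarity metric is not specified above, so the following is only a sketch of the idea: match calls by input, score each baseline/current output pair with `difflib.SequenceMatcher` over whitespace tokens (a stand-in for llmgate's token-level similarity, not its actual metric), and return exit code 1 if any pair falls below the threshold. The `diff_runs` and `token_similarity` names and the dict-of-outputs input format are hypothetical; `threshold=0.8` and `no_fail` mirror the CLI options documented in the reference below.

```python
import difflib


def token_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two outputs, compared token by token."""
    return difflib.SequenceMatcher(None, a.split(), b.split()).ratio()


def diff_runs(
    baseline: dict[str, str],
    current: dict[str, str],
    threshold: float = 0.8,
    no_fail: bool = False,
) -> int:
    """Compare outputs keyed by input and return a process exit code."""
    regressions = []
    # Only inputs present in both runs are compared in this sketch.
    for key in sorted(baseline.keys() & current.keys()):
        score = token_similarity(baseline[key], current[key])
        if score < threshold:
            regressions.append((key, score))
    for key, score in regressions:
        print(f"REGRESSION {score:.2f} < {threshold}: {key!r}")
    # With no_fail, regressions are reported but never fail the build.
    if regressions and not no_fail:
        return 1
    return 0


baseline = {"capital of France?": "The capital of France is Paris."}
current = {"capital of France?": "I am not sure, maybe Lyon."}
# In a CLI entry point you would call sys.exit(diff_runs(baseline, current)).
```

A CI step only needs the exit code: any non-zero status fails the job, which is why a report-only mode has to return 0 explicitly.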
## CLI reference
```
llmgate runs                        # list all runs with stats
llmgate show <run_id>               # inspect calls in a run
llmgate diff <baseline> <current>   # compare two runs, exit 1 on regression
  --threshold FLOAT                 # similarity threshold (default: 0.8)
  --no-fail                         # report only, don't exit 1
```