https://github.com/geval-labs/geval
Decision orchestration and reconciliation for AI changes.
- Host: GitHub
- URL: https://github.com/geval-labs/geval
- Owner: geval-labs
- License: mit
- Created: 2026-01-24T21:11:03.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2026-03-30T15:53:55.000Z (2 months ago)
- Last Synced: 2026-03-30T17:35:45.443Z (2 months ago)
- Topics: ai-agents, aievals, evals, evaluation, geval, llm-evaluation, llms, open-source
- Language: Rust
- Homepage: https://geval.io
- Size: 352 KB
- Stars: 18
- Watchers: 1
- Forks: 8
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
README
# Geval
Decision orchestration and reconciliation for AI changes.
You bring all kinds of signals and your rules. Geval orchestrates and reconciles them into one outcome. No brain — just your rules applied, every time.
---
## Demo video
**[Watch the Geval demo on YouTube →](https://youtu.be/v6LuxIshgDU)** — walkthrough of how Geval turns signals and policy rules into **PASS**, **REQUIRE_APPROVAL**, or **BLOCK**.
---
## Try it in under a minute
**1. Download** (pick your OS):
```bash
# Linux
curl -sSL https://github.com/geval-labs/geval/releases/latest/download/geval-linux-x86_64 -o geval && chmod +x geval
# macOS (Apple Silicon)
curl -sSL https://github.com/geval-labs/geval/releases/latest/download/geval-macos-aarch64 -o geval && chmod +x geval
# Windows (PowerShell) — see note below
Invoke-WebRequest -Uri https://github.com/geval-labs/geval/releases/latest/download/geval-windows-x86_64.exe -OutFile geval.exe
```
> **Windows:** Open **PowerShell as Administrator** (right‑click → *Run as administrator*). Then run the download command and `.\geval.exe demo`.
**2. Run the demo** (no files needed):
```bash
./geval demo # Linux / macOS — use ./ so you run this binary, not another "geval" in PATH
.\geval.exe demo # Windows (same folder as geval.exe)
```
You get a report and one outcome: **PASS**, **REQUIRE_APPROVAL**, or **BLOCK** — produced by the demo contract and signals. [Use in CI →](geval/docs/github-actions.md)
**No binary for your OS?** [Build from source](geval/docs/installation.md#build-from-source).
> **If you see "unknown command 'init'" or "required option '--eval'"** — you're running a **different** program named `geval` (e.g. from npm or another install). Use the **binary from [Releases](https://github.com/geval-labs/geval/releases)** or build from source and run it with `./geval` (or put it first in your PATH).
### Start from a template (like create-react-app)
Inside your project (your codebase is not changed except for one new folder), run the **same binary** you downloaded (e.g. `./geval`):
```bash
./geval init # or: /path/to/geval init
```
This creates a **.geval** folder with:
- **contract.yaml** — Names your release gate, versions it, and lists policy files to evaluate.
- **policies/** — Two starter files with descriptive names (`safety-and-blocking.yaml`, `quality-and-approval.yaml`); edit rules to match your metrics.
- **signals.json** — Example pipeline metrics; replace with your real signal names and values.
- **README.md** — What each file is for and how to run checks.
Then run:
```bash
./geval check --contract .geval/contract.yaml --signals .geval/signals.json
```
Use a different folder: `./geval init my-rules`. Overwrite existing files: `./geval init --force`.
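As a quick sanity check after `./geval init`, the layout listed above can be verified with a short script. This is a minimal sketch and not part of Geval: the helper name `missing_template_files` is hypothetical, and the file list is taken from the description above, assuming both starter policies sit directly inside `policies/`.

```python
from pathlib import Path

# Files this README says `geval init` creates, relative to the template folder.
# Assumption: both starter policy files live directly under policies/.
EXPECTED_FILES = [
    "contract.yaml",
    "policies/safety-and-blocking.yaml",
    "policies/quality-and-approval.yaml",
    "signals.json",
    "README.md",
]


def missing_template_files(root: str = ".geval") -> list[str]:
    """Return the expected template files that do not exist under `root`."""
    base = Path(root)
    return [rel for rel in EXPECTED_FILES if not (base / rel).is_file()]


if __name__ == "__main__":
    missing = missing_template_files(".geval")
    if missing:
        print("Missing template files:", ", ".join(missing))
    else:
        print("Template folder looks complete.")
```

If you initialized into a different folder (for example `my-rules`), pass that folder name as `root`.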
### Updating
Rerun the download command for your OS and replace the old binary with the new one. Check the version with `./geval --version`.
---
## Use Geval with your own signals and contract
You need a **contract** (one YAML file that references one or more **policy** files) and a **signals** file. Geval evaluates each policy against the same signals, then combines the outcomes (for example, all must pass, or a block from any policy blocks overall). Run `./geval init` for a template with a contract and two policies, or create the files yourself by following the steps below.
**All kinds of signals:** Not every signal needs a score. You can mix: entries with a numeric `value`, and entries with no value (presence-only). Use a rule with `operator: presence` to match “this metric exists.” [Details →](geval/docs/signals-and-rules.md)
### Step 1: Your signals (data file)
A list of evidence: what you measured, observed, or flagged. Each item has a **metric** (name). **Value** is optional — use it for scores; omit it for “this happened” (presence-only).
Example — save as `mydata.json`:
```json
{
  "signals": [
    { "metric": "accuracy", "value": 0.94 },
    { "metric": "engagement_drop", "value": 0.02 }
  ]
}
```
You can add labels like `component` or `system` if you need them. [Full example →](geval/examples/signals.json)
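To make the mix of scored and presence-only entries concrete, here is a minimal Python sketch that reads a signals document and separates the two kinds. It models the file format described above and is not Geval's own code. The `human_review_flag` entry and the helper names `split_signals` and `is_present` are illustrative assumptions, as is the choice to treat a metric that has a value as also "present".

```python
import json

# The example signals from above, plus one hypothetical presence-only entry.
# "human_review_flag" is an illustrative name, not a Geval built-in.
SIGNALS_JSON = """
{
  "signals": [
    { "metric": "accuracy", "value": 0.94 },
    { "metric": "engagement_drop", "value": 0.02 },
    { "metric": "human_review_flag" }
  ]
}
"""


def split_signals(doc: dict) -> tuple[dict, set]:
    """Return ({metric: value} for scored entries, set of presence-only metrics)."""
    scored, presence_only = {}, set()
    for entry in doc["signals"]:
        if "value" in entry:
            scored[entry["metric"]] = entry["value"]
        else:
            presence_only.add(entry["metric"])
    return scored, presence_only


def is_present(doc: dict, metric: str) -> bool:
    """Model of a `presence` match: true if any entry names the metric.

    Assumption: an entry with a value also counts as present.
    """
    return any(entry["metric"] == metric for entry in doc["signals"])


doc = json.loads(SIGNALS_JSON)
scored, presence_only = split_signals(doc)
print(scored)                                # {'accuracy': 0.94, 'engagement_drop': 0.02}
print(presence_only)                         # {'human_review_flag'}
print(is_present(doc, "human_review_flag"))  # True
```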
### Step 2: Your contract and policies
A **contract** is a YAML file that lists one or more **policy** files and a **combination rule** (how to merge their outcomes). Each **policy** file contains rules with **unique** priorities, and each rule has the form: **when** [condition on signals], **then** [pass / block / require_approval].
**Prefer a form instead of writing YAML by hand?** Use **[config.geval.io](https://config.geval.io)** to generate Geval-compatible `contract.yaml` and policy files (download or copy), then validate with `geval validate-contract` and run `geval check` as below.
Example contract — save as `contract.yaml`:
```yaml
name: my-gate
version: "1.0.0"
combine: worst_case
policies:
  - path: policy.yaml
```
Example policy — save as `policy.yaml` (path relative to the contract file):
```yaml
name: quality
version: "1.0.0"
policy:
rules:
- priority: 1
name: block_bad_engagement
when:
metric: engagement_drop
operator: ">"
threshold: 0
then:
action: block
- priority: 2
name: allow_good_accuracy
when:
metric: accuracy
operator: ">="
threshold: 0.9
then:
action: pass
```
**Combine (`worst_case`):** any `block` wins; otherwise any `require_approval`; otherwise `pass`. **Rule priorities** must be **unique** within a policy, and **1** is the highest. Geval records every matching rule, and the match with the best (lowest-numbered) priority decides that policy's outcome. **Operators:** `>`, `<`, `>=`, `<=`, `==`, `presence`. **Actions:** `pass`, `block`, `require_approval`.
[Full example →](geval/examples/contract.yaml) and [policy →](geval/examples/policy.yaml)
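The priority and `worst_case` rules described above can be modeled in a few lines of Python. This is a sketch of the described semantics under stated assumptions, not Geval's implementation (Geval itself is written in Rust). The function names are hypothetical, a policy with no matching rule is assumed to yield `pass`, and a presence-only signal is assumed never to satisfy a numeric comparison.

```python
import operator

# Comparison operators listed above; `presence` is handled separately.
COMPARATORS = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
}

# Severity order used to model `worst_case`: block > require_approval > pass.
SEVERITY = {"pass": 0, "require_approval": 1, "block": 2}


def rule_matches(rule: dict, signals: dict) -> bool:
    """Check one rule's `when` clause against {metric: value-or-None} signals."""
    when = rule["when"]
    if when["metric"] not in signals:
        return False
    if when["operator"] == "presence":
        return True
    value = signals[when["metric"]]
    if value is None:  # assumption: presence-only signals fail numeric comparisons
        return False
    return COMPARATORS[when["operator"]](value, when["threshold"])


def evaluate_policy(rules: list, signals: dict, default: str = "pass") -> str:
    """Among matching rules, the lowest priority number (1 = highest) wins.

    Assumption: a policy with no matching rule yields `default`.
    """
    matches = [r for r in rules if rule_matches(r, signals)]
    if not matches:
        return default
    return min(matches, key=lambda r: r["priority"])["then"]["action"]


def combine_worst_case(outcomes: list) -> str:
    """Any block wins; otherwise any require_approval; otherwise pass."""
    return max(outcomes, key=SEVERITY.__getitem__, default="pass")


# The example policy and signals from this README.
RULES = [
    {"priority": 1, "name": "block_bad_engagement",
     "when": {"metric": "engagement_drop", "operator": ">", "threshold": 0},
     "then": {"action": "block"}},
    {"priority": 2, "name": "allow_good_accuracy",
     "when": {"metric": "accuracy", "operator": ">=", "threshold": 0.9},
     "then": {"action": "pass"}},
]
SIGNALS = {"accuracy": 0.94, "engagement_drop": 0.02}

policy_outcome = evaluate_policy(RULES, SIGNALS)
print(policy_outcome)                        # block
print(combine_worst_case([policy_outcome]))  # block
```

Under this model, both example rules match the example signals (`engagement_drop` of 0.02 is above 0, and `accuracy` of 0.94 is at least 0.9), and the priority 1 rule decides the policy, so the result is `block`. Setting `engagement_drop` to 0 would leave only the `pass` rule matching.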
### Step 3: Run Geval
```bash
./geval check --contract contract.yaml --signals mydata.json
```
(Windows: `.\geval.exe check --contract contract.yaml --signals mydata.json`)
### Step 4: Read the outcome
- **PASS** — Every policy passed (or combined rule says go).
- **REQUIRE_APPROVAL** — At least one policy requires approval.
- **BLOCK** — At least one policy blocks.
To see **per-policy results** and the combined decision:
```bash
./geval explain --contract contract.yaml --signals mydata.json
```
To validate the contract and all referenced policies:
```bash
./geval validate-contract contract.yaml
```
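If you drive these commands from a script, it can help to build the argument lists in one place. The sketch below only assembles the three invocations shown above, using the flags documented in this README. The helper names are hypothetical, and nothing is assumed about Geval's exit codes or output format.

```python
import platform
import shlex


def geval_binary(system: str = "") -> str:
    """Pick the local binary path used in this README for the given OS name."""
    system = system or platform.system()
    return r".\geval.exe" if system == "Windows" else "./geval"


def check_command(contract: str, signals: str, system: str = "") -> list[str]:
    """Arguments for `geval check` with the two flags shown above."""
    return [geval_binary(system), "check", "--contract", contract, "--signals", signals]


def explain_command(contract: str, signals: str, system: str = "") -> list[str]:
    """Arguments for `geval explain` (per-policy results and combined decision)."""
    return [geval_binary(system), "explain", "--contract", contract, "--signals", signals]


def validate_command(contract: str, system: str = "") -> list[str]:
    """Arguments for `geval validate-contract`."""
    return [geval_binary(system), "validate-contract", contract]


# Print the Linux / macOS form of the check command without running it.
print(shlex.join(check_command("contract.yaml", "mydata.json", system="Linux")))
# ./geval check --contract contract.yaml --signals mydata.json
```

Each list can be passed to `subprocess.run(...)` from the folder that contains the binary. How the outcome is surfaced to the calling process is not covered here, so check the GitHub Actions guide before gating a pipeline on it.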
---
[The problem](#the-problem) • [What Geval is](#what-geval-is) • [Commands](#commands) • [Docs](#documentation)
---
## The problem
You have many signals: scores, A/B results, human reviews, flags. You change a model or a prompt. Then what?
- One signal says “better.”
- Another says “worse.”
- Someone asks: “Do we ship?”
Today that call happens in chat or a meeting. Hard to repeat. Hard to audit. You don't need a system that "decides" for you — you need **orchestration and reconciliation**: one place to define rules, one place to feed all your signals (not just numbers), and one deterministic outcome every time.
---
## What Geval is
**Geval is a decision orchestration and reconciliation engine.** It does not make decisions. It has no brain. You provide:
1. **Your signals** (one file): any kind, such as scores, presence-only entries, flags, or labels. Non-uniform is fine.
2. **Your rules** (a contract plus one or more policy files): for example, “If engagement drops, block. If accuracy is below X, need approval.”
Geval **orchestrates** the run and **reconciles** your signals against your rules in priority order. Same inputs + same rules = same outcome. It returns:
| Outcome | Meaning |
|--------|--------|
| **PASS** | No rule matched a block or require-approval. Good to go. |
| **REQUIRE_APPROVAL** | A rule matched; it says a person must approve first. |
| **BLOCK** | A rule matched; it says don’t ship. Fix first. |
Each run is recorded: which rules, which signals, when. So you can always answer: “Why did we ship?” and “Who approved?” — without any black box.
---
## Commands
Run with `./geval` (or ensure this repo’s binary is the one in your PATH):
| Command | What it does |
|--------|----------------|
| `./geval demo` | Run the built-in example. Try this first. |
| `./geval init` | Create .geval/ with contract and policies. Edit and run. |
| `./geval check --contract <contract.yaml> --signals <signals.json>` | Evaluate contract → one outcome (PASS / REQUIRE_APPROVAL / BLOCK) |
| `./geval explain --contract <contract.yaml> --signals <signals.json>` | Per-policy results and combined decision report |
| `./geval validate-contract <contract.yaml>` | Validate contract and all referenced policies |
| `./geval approve` / `./geval reject` | Record a person’s approval or rejection |
---
## Documentation
| Guide | Description |
|-------|-------------|
| [**Demo video (YouTube)**](https://youtu.be/v6LuxIshgDU) | Walkthrough of Geval |
| [**Config generator (web)**](https://config.geval.io) | Fill in forms → download `contract.yaml` and policies |
| [**Architecture**](geval/docs/architecture.md) | Contract = multiple policies + combine rule; module layout |
| [**Signals and rules**](geval/docs/signals-and-rules.md) | Non-uniform signals (scores, presence-only, mix); how rules use them |
| [**Signal assumptions**](geval/docs/signal-assumptions.md) | What we assume; what input forms we accept (number, string, trace, object) |
| [**Versioning**](geval/docs/versioning.md) | Contract, policy, and signals versioning; nothing unversioned |
| [**Extending**](geval/docs/extending.md) | How to add a combination rule or change behavior; process and conventions |
| [**GitHub Actions**](geval/docs/github-actions.md) | Use Geval in CI |
| [**Examples**](geval/examples/README.md) | Sample data and rules files |
| [**Customer demo (feature story)**](geval/docs/customer-demo-feature.md) | Signals, policies, rules, and PASS/BLOCK/approval narrative for demos |
| [**Installation**](geval/docs/installation.md) | Install, PATH, build from source |
| [**Developer workflow**](geval/docs/developer-workflow.md) | PRs, check, approve/reject |
| [**Auditing**](geval/docs/auditing.md) | How decisions are recorded |
---
## Contributing
Contributions welcome. [CONTRIBUTING.md](CONTRIBUTING.md). Build from source: [Installation](geval/docs/installation.md#build-from-source).
---
## License
MIT © [Geval Contributors](https://github.com/geval-labs/geval/graphs/contributors)
---
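[Website](https://geval.io) • [Demo video](https://youtu.be/v6LuxIshgDU) • [Config generator](https://config.geval.io) • [Releases](https://github.com/geval-labs/geval/releases) • [GitHub](https://github.com/geval-labs/geval)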