https://github.com/rul1an/assay
CI-native evidence compiler for agent systems: MCP policy enforcement, evidence receipts, Trust Basis claims, and reviewable artifacts.
- Host: GitHub
- URL: https://github.com/rul1an/assay
- Owner: Rul1an
- License: mit
- Created: 2025-12-20T22:56:24.000Z
- Default Branch: main
- Last Pushed: 2026-05-23T20:52:32.000Z
- Last Synced: 2026-05-23T21:25:28.552Z
- Topics: agent-security, ai-agents, ai-security, ci, cyclonedx, evidence, evidence-bundles, evidence-receipts, github-actions, mcp, mcp-server, openfeature, policy-as-code, policy-enforcement, promptfoo, provenance, rust, sbom, supply-chain-security, trust-basis
- Language: Rust
- Homepage: https://getassay.dev
- Size: 210 MB
- Stars: 2
- Watchers: 0
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Codeowners: .github/CODEOWNERS
- Security: SECURITY.md
- Roadmap: docs/ROADMAP.md
README
# Assay
Evidence compiler for agent review artifacts
Portable evidence receipts, verifiable bundles, and bounded Trust Basis claims for agent systems.
See It Work · Promptfoo JSONL · OpenFeature · CycloneDX ML-BOM · Quick Start · CI Guide · Discussions
---
Use Assay if you already have machine-readable AI outcomes or agent tool-call tests and want a small reviewable artifact boundary in CI.
Start with the path that matches what you already have:
| You have | Use this when | What you get | Next click |
|---|---|---|---|
| Promptfoo JSONL from CI evals | You want smaller PR evidence than a full eval export | Eval outcome receipts, verified bundle, Trust Basis diff | [Promptfoo JSONL](docs/use-cases/evidence-receipts-from-promptfoo-jsonl.md) |
| OpenFeature boolean `EvaluationDetails` | You want CI evidence for a runtime flag decision boundary | Decision receipt, verified bundle, Trust Basis diff | [OpenFeature EvaluationDetails](docs/use-cases/openfeature-evaluationdetails-to-ci-review-artifact.md) |
| CycloneDX ML-BOM model component | You want CI evidence of the model inventory/provenance boundary as recorded in the ML-BOM | Inventory receipt, verified bundle, Trust Basis diff | [CycloneDX ML-BOM](docs/use-cases/cyclonedx-mlbom-model-to-inventory-receipt.md) |
| MCP tool calls | You are ready to put a policy file around tool execution | Allow/deny audit trail and evidence for observed tool behavior | [MCP Quick Start](examples/mcp-quickstart/) |
| A GitHub PR gate | You want CI to block regressions from checked artifacts | Trust Basis diff, gate status, SARIF/JUnit-ready output | [CI Guide](docs/guides/github-action.md) |
The core workflow is intentionally small: import or record a bounded outcome, bundle and verify it, compile `trust-basis.json`, then gate the Trust Basis diff. Assay does not make the upstream tool the source of truth; it makes the evidence boundary inspectable.
```text
Trust Basis Gate
Status: OK
Bundles verified: 1
Regressed claims: 0
```
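The gate summary above can be sketched as a small report type. This is a minimal illustration, not Assay's implementation. `GateReport`, the `FAIL` label, and the 0/1 exit codes are assumptions. The only rule taken from this README is that the gate reports on regressed claims.

```rust
/// Hypothetical summary of one gate run (not Assay's real type).
pub struct GateReport {
    pub bundles_verified: usize,
    pub regressed_claims: usize,
}

impl GateReport {
    /// Assumption: the gate passes only when no claim regressed.
    pub fn passed(&self) -> bool {
        self.regressed_claims == 0
    }

    /// Renders the summary in the shape shown above.
    /// "FAIL" is a placeholder label for the failing case.
    pub fn render(&self) -> String {
        let status = if self.passed() { "OK" } else { "FAIL" };
        format!(
            "Trust Basis Gate\nStatus: {}\nBundles verified: {}\nRegressed claims: {}\n",
            status, self.bundles_verified, self.regressed_claims
        )
    }

    /// Assumed CI convention: exit 0 on pass, 1 on failure.
    pub fn exit_code(&self) -> i32 {
        if self.passed() {
            0
        } else {
            1
        }
    }
}
```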
Assay is not a trust-score engine, a generic eval dashboard, or a hosted observability product. See [What Assay is and is not](docs/concepts/scope.md) for the boundary.
## Is This For Me?
**Yes, if you:**
- already have eval output, runtime decisions, inventory artifacts, or MCP tool-call tests
- want a CI review artifact instead of a dashboard-only result
- need bounded auditability, not a scalar trust badge
**Not yet, if you:**
- need Assay to judge model correctness or policy quality for you
- want a hosted dashboard as the primary product
- want a compliance claim instead of a bounded evidence boundary
## Install
```bash
cargo install assay-cli
```
CI: [GitHub Action](https://github.com/marketplace/actions/assay-ai-agent-security). Python SDK: `pip install assay-it`.
No hosted backend. No API keys for core flows. Deterministic: same input, same decision.
Evidence levels and non-goals
Trust claims use explicit **epistemology**, not a single “safety score”:
| Level | Meaning |
|-------|---------|
| `verified` | Backed by direct evidence or offline verification in the bundle/path |
| `self_reported` | Emitted by the system without stronger independent corroboration |
| `inferred` | Derived from bounded, documented rules |
| `absent` | No trustworthy evidence supports the claim |
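The following Rust sketch models the four levels as a closed set rather than as a number. The level strings come from the table above. `EvidenceLevel`, `ClaimEvidence`, and the precedence used in `classify` are hypothetical and only illustrate the idea. That precedence checks offline verification first, then rule-based derivation, then self-report.

```rust
/// The four evidence levels from the table, as a closed set.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EvidenceLevel {
    Verified,
    SelfReported,
    Inferred,
    Absent,
}

impl EvidenceLevel {
    pub fn as_str(self) -> &'static str {
        match self {
            EvidenceLevel::Verified => "verified",
            EvidenceLevel::SelfReported => "self_reported",
            EvidenceLevel::Inferred => "inferred",
            EvidenceLevel::Absent => "absent",
        }
    }

    /// Unknown strings are rejected instead of being mapped to a default.
    pub fn parse(s: &str) -> Option<Self> {
        match s {
            "verified" => Some(EvidenceLevel::Verified),
            "self_reported" => Some(EvidenceLevel::SelfReported),
            "inferred" => Some(EvidenceLevel::Inferred),
            "absent" => Some(EvidenceLevel::Absent),
            _ => None,
        }
    }
}

/// Hypothetical flags describing what backs a single claim.
pub struct ClaimEvidence {
    pub offline_verified: bool,  // direct evidence checked offline in the bundle
    pub derived_by_rule: bool,   // produced by a bounded, documented rule
    pub emitted_by_system: bool, // the system reported it, nothing stronger
}

/// Assumed precedence: verification, then rule-based derivation, then self-report.
pub fn classify(e: &ClaimEvidence) -> EvidenceLevel {
    if e.offline_verified {
        EvidenceLevel::Verified
    } else if e.derived_by_rule {
        EvidenceLevel::Inferred
    } else if e.emitted_by_system {
        EvidenceLevel::SelfReported
    } else {
        EvidenceLevel::Absent
    }
}
```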
Assay does **not** ship a primary aggregate trust score or a `safe/unsafe` badge as the main output. See [ADR-033](docs/architecture/ADR-033-OTel-Trust-Compiler-Positioning.md).
## What ships today
| Output | Role |
|--------|------|
| **Policy gate** | MCP `wrap` — deterministic allow/deny before tools run (see CLI note below the diagram). |
| **Evidence bundle** | Offline-verifiable, tamper-evident archive for audit and replay. |
| **External receipts** | Selected eval outcomes, runtime decision details, and inventory/provenance surfaces as bounded evidence receipts with JSON Schema contracts. |
| **Trust Basis** | Canonical `trust-basis.json` — bounded claim classification from verified bundles. |
| **Trust Card** | `trustcard.json` / `trustcard.md` / `trustcard.html` — same claims, review-friendly artifacts. |
| **SARIF / CI** | GitHub Action, Security tab integration, policy gates on PRs. |
> **Repository truth:** release notes and [CHANGELOG.md](CHANGELOG.md) remain the authority for what is actually public. `main` may carry release-prep commits before a tag is cut; crates.io publication is separate from repository merge state.
```
Agent ──► Assay ──► MCP Server
│
├─ ✅ ALLOW / ❌ DENY (policy)
├─► 📋 Evidence bundle (verifiable)
└─► 📊 Trust Basis → Trust Card → SARIF / CI
```
> **CLI:** The `mcp` command group is **hidden** from top-level `assay --help` while the surface stabilizes; it is supported. Use `assay mcp --help`, `assay mcp wrap …`, or follow the [MCP Quickstart](examples/mcp-quickstart/).
> **Wedge, not category.** “MCP firewall” describes the control plane; **trust compilation** describes the outcome: reviewable claims backed by evidence. See [ADR-033](docs/architecture/ADR-033-OTel-Trust-Compiler-Positioning.md) and [RFC-005](docs/architecture/RFC-005-trust-compiler-mvp-2026q2.md).
## See It Work
```bash
cargo install assay-cli
mkdir -p /tmp/assay-demo && echo "safe content" > /tmp/assay-demo/safe.txt
assay mcp wrap --policy examples/mcp-quickstart/policy.yaml \
-- npx @modelcontextprotocol/server-filesystem /tmp/assay-demo
```
```
✅ ALLOW read_file path=/tmp/assay-demo/safe.txt reason=policy_allow
✅ ALLOW list_dir path=/tmp/assay-demo/ reason=policy_allow
❌ DENY read_file path=/tmp/outside-demo.txt reason=path_constraint_violation
❌ DENY exec cmd=ls reason=tool_denied
```
Inspect the audit artifact:
```bash
assay evidence show demo/fixtures/bundle.tar.gz
```

The bundle is tamper-evident and cryptographically verifiable. Signed mandate events can include an Ed25519-backed authorization trail for high-risk actions.
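The sketch below illustrates what tamper-evident means here. It records a digest per file at bundle time and reports any file whose content changed or went missing. This is not Assay's bundle format, and the manifest layout is hypothetical. FNV-1a stands in for a cryptographic hash only so the example needs no external crates. It offers no protection against deliberate tampering.

```rust
/// FNV-1a, 64-bit. NOT cryptographic: used only so this sketch has no
/// external dependencies. A real bundle needs a cryptographic hash.
fn fnv1a64(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &byte in data {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Hypothetical manifest entry: a path and the digest recorded at bundle time.
pub struct ManifestEntry {
    pub path: String,
    pub digest: u64,
}

/// Records one digest per (path, content) pair.
pub fn build_manifest(files: &[(&str, &[u8])]) -> Vec<ManifestEntry> {
    files
        .iter()
        .map(|(path, content)| ManifestEntry {
            path: path.to_string(),
            digest: fnv1a64(content),
        })
        .collect()
}

/// Returns manifest paths whose content changed or is missing.
/// An empty result means every recorded file still matches.
pub fn verify(manifest: &[ManifestEntry], files: &[(&str, &[u8])]) -> Vec<String> {
    let mut mismatched = Vec::new();
    for entry in manifest {
        match files.iter().find(|(path, _)| *path == entry.path) {
            Some((_, content)) if fnv1a64(content) == entry.digest => {}
            _ => mismatched.push(entry.path.clone()),
        }
    }
    mismatched
}
```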
### Trust artifacts from a verified bundle
After a bundle verifies, compile the claim artifact:
```bash
# Machine-readable claim basis (deterministic, claim-first)
assay trust-basis generate demo/fixtures/bundle.tar.gz > trust-basis.json
```
`trust-basis.json` is the canonical output for CI and review. Claim `id` values are stable across runs; consumers should key by `id`, not row count or order. It is not a scalar trust score.
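Because consumers key claims by `id`, they can compare two Trust Basis files regardless of row order or count. The sketch below assumes each claim reduces to an `(id, level)` pair. It treats a regression as a claim whose level got weaker or that disappeared. The strength order follows the order of the evidence-level table. That ordering is an assumption, not a documented Assay rule.

```rust
use std::collections::BTreeMap;

/// Assumed strength order, following the order of the evidence-level table.
/// Unknown level strings get `None`.
fn rank(level: &str) -> Option<u8> {
    match level {
        "verified" => Some(3),
        "self_reported" => Some(2),
        "inferred" => Some(1),
        "absent" => Some(0),
        _ => None,
    }
}

/// Claim ids whose level got weaker, or which vanished, between two runs.
/// Claims are matched by `id`, so row order and row count do not matter.
/// Missing or unknown levels are treated like `absent` (an assumption).
pub fn regressed_claims(baseline: &[(&str, &str)], current: &[(&str, &str)]) -> Vec<String> {
    let now: BTreeMap<&str, &str> = current.iter().copied().collect();
    let mut regressed: Vec<String> = baseline
        .iter()
        .filter(|(id, before)| {
            let after = now.get(*id).copied().and_then(rank).unwrap_or(0);
            after < rank(before).unwrap_or(0)
        })
        .map(|(id, _)| id.to_string())
        .collect();
    regressed.sort();
    regressed.dedup();
    regressed
}
```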
The current claim-visible receipt families are Promptfoo assertion-component results, OpenFeature boolean `EvaluationDetails`, and CycloneDX ML-BOM model components. See the [receipt-family matrix](docs/reference/receipt-family-matrix.json), the [three-family note](docs/notes/EVIDENCE-RECEIPTS-FOR-AI-OUTCOMES-RUNTIME-DECISIONS-MODEL-INVENTORY.md), and [Evidence Receipts in Action](docs/notes/EVIDENCE-RECEIPTS-IN-ACTION.md).
Trust Card details
```bash
assay trustcard generate demo/fixtures/bundle.tar.gz --out-dir ./trust-out
# -> trust-out/trustcard.json , trust-out/trustcard.md , trust-out/trustcard.html
```
The Trust Card is a deterministic render of the same claim rows plus frozen non-goals; `trustcard.json` is canonical, while Markdown and static HTML are reviewer projections. Contract versions, pack floors, and release checklist: [MIGRATION — Trust Compiler 3.2](docs/architecture/MIGRATION-TRUST-COMPILER-3.2.md), [receipt-family matrix](docs/reference/receipt-family-matrix.json). Release history belongs in [CHANGELOG.md](CHANGELOG.md).
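The "deterministic render" property can be illustrated by sorting claim rows by `id` before rendering. The same claims then always produce the same Markdown, whatever the input order. `ClaimRow` and the table layout are hypothetical and are not the real `trustcard.md` format.

```rust
/// Hypothetical claim row; the real `trustcard.json` schema may differ.
pub struct ClaimRow<'a> {
    pub id: &'a str,
    pub level: &'a str,
}

/// Renders claim rows as a Markdown table. Sorting by `id` first makes the
/// output independent of input order, so the same claims always yield
/// byte-identical Markdown.
pub fn render_markdown(rows: &[ClaimRow<'_>]) -> String {
    let mut sorted: Vec<_> = rows.iter().collect();
    sorted.sort_by(|a, b| a.id.cmp(b.id));
    let mut out = String::from("| Claim | Evidence level |\n|---|---|\n");
    for row in sorted {
        out.push_str(&format!("| `{}` | {} |\n", row.id, row.level));
    }
    out
}
```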
## Add to Cursor in 30 Seconds
Assay ships a helper that finds your local Cursor MCP config path and prints a ready-to-paste entry:
```bash
assay mcp config-path cursor
```
It generates JSON like:
```json
{
  "filesystem-secure": {
    "command": "assay",
    "args": [
      "mcp",
      "wrap",
      "--policy",
      "/path/to/policy.yaml",
      "--",
      "npx",
      "-y",
      "@modelcontextprotocol/server-filesystem",
      "/Users/you"
    ]
  }
}
```
The same wrapped command works in other MCP clients — see [MCP Quick Start](docs/mcp/quickstart.md).
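Clients that launch servers programmatically can build the same wrapped invocation with `std::process::Command`. The sketch below mirrors the arguments in the JSON entry above. `wrapped_server` is a hypothetical helper, and the command is only constructed, not spawned.

```rust
use std::process::Command;

/// Builds `assay mcp wrap --policy <policy> -- <server command...>`,
/// matching the Cursor entry above. Nothing is executed here.
pub fn wrapped_server(policy: &str, server: &[&str]) -> Command {
    let mut cmd = Command::new("assay");
    cmd.args(["mcp", "wrap", "--policy", policy, "--"]);
    // Everything after `--` is the wrapped MCP server's own command line.
    cmd.args(server);
    cmd
}
```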
## Policy Is Simple
```yaml
version: "2.0"
name: "my-policy"
tools:
  allow: ["read_file", "list_dir"]
  deny: ["exec", "shell", "write_file"]
schemas:
  read_file:
    type: object
    additionalProperties: false
    properties:
      path:
        type: string
        pattern: "^/app/.*"
        minLength: 1
    required: ["path"]
```
Legacy `constraints:` policies still work. Use `assay policy migrate` to convert them to the v2 JSON Schema form, or `assay init --from-trace trace.jsonl` to generate a policy from observed behavior.
See [Policy Files](docs/reference/config/policies.md).
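The decision logic for the example policy can be sketched as follows. This is an illustration under stated assumptions, not Assay's evaluator. Deny-list entries win, tools on neither list are denied, and the `read_file` schema is checked by hand. The `tool_denied` and `path_constraint_violation` reasons mirror the sample output in See It Work, while `schema_violation` is a made-up code for extra arguments. The regex `^/app/.*` is implemented as a prefix check, which is equivalent for that pattern.

```rust
/// Outcome of evaluating one tool call.
#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Allow,
    Deny(&'static str), // reason code
}

/// Hypothetical in-memory form of the example policy above.
pub struct Policy<'a> {
    pub allow: &'a [&'a str],
    pub deny: &'a [&'a str],
    /// `^/app/.*` only requires the path to start with `/app/`,
    /// so a prefix check is equivalent for this pattern.
    pub read_file_prefix: &'a str,
}

/// Evaluates a tool call given as (argument name, string value) pairs.
pub fn evaluate(policy: &Policy<'_>, tool: &str, args: &[(&str, &str)]) -> Decision {
    // Assumption: the deny list wins, and tools on neither list are denied.
    if policy.deny.contains(&tool) || !policy.allow.contains(&tool) {
        return Decision::Deny("tool_denied");
    }
    if tool == "read_file" {
        // additionalProperties: false, so only `path` is accepted.
        // "schema_violation" is a made-up reason code for this sketch.
        if args.iter().any(|(name, _)| *name != "path") {
            return Decision::Deny("schema_violation");
        }
        // required: ["path"], plus the pattern (which also implies minLength: 1).
        match args.iter().find(|(name, _)| *name == "path") {
            Some((_, path)) if path.starts_with(policy.read_file_prefix) => {}
            _ => return Decision::Deny("path_constraint_violation"),
        }
    }
    Decision::Allow
}
```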
Other import paths and protocol adapters
### OpenTelemetry in, canonical evidence out
Assay ingests OpenTelemetry JSONL, builds replayable traces, and exports **canonical evidence** — OTel is a bridge, not the sole semantic authority.
```bash
assay trace ingest-otel \
--input otel-export.jsonl \
--db .eval/eval.db \
--out-trace traces/otel.v2.jsonl
```
See [OpenTelemetry & Langfuse](docs/guides/otel-langfuse.md).
### Protocol adapters
Assay ships adapters that map protocol events into **canonical evidence**:
| Protocol | Adapter | What it maps |
|----------|---------|--------------|
| **ACP** (OpenAI/Stripe) | `assay-adapter-acp` | Checkout events, payment intents, tool calls |
| **A2A** (Google) | `assay-adapter-a2a` | Agent capabilities, task delegation, artifacts |
| **UCP** (Google/Shopify) | `assay-adapter-ucp` | Discover/buy/post-purchase state transitions |
Adapter crates live in the workspace and are driven through the Assay binary; they are not published as separate `crates.io` packages.
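The adapter pattern described above can be sketched as a trait that maps a protocol-specific event into one shared record. Everything here is hypothetical: `Adapter`, `CanonicalEvent`, and the A2A-style `TaskDelegated` event are illustrative names. Assay's real canonical evidence schema is richer.

```rust
/// Hypothetical canonical evidence record; Assay's real schema is richer.
#[derive(Debug, PartialEq)]
pub struct CanonicalEvent {
    pub source: &'static str, // which adapter produced it
    pub kind: String,         // normalized event kind
    pub subject: String,      // what the event is about
}

/// One adapter per protocol: map a protocol-specific event into the shared record.
pub trait Adapter {
    type Event;
    /// Returns `None` when the event cannot be mapped, rather than guessing.
    fn map(&self, event: &Self::Event) -> Option<CanonicalEvent>;
}

/// Illustrative A2A-style delegation event; field names are made up.
pub struct TaskDelegated {
    pub task_id: String,
    pub to_agent: String,
}

pub struct A2aAdapter;

impl Adapter for A2aAdapter {
    type Event = TaskDelegated;

    fn map(&self, event: &TaskDelegated) -> Option<CanonicalEvent> {
        if event.task_id.is_empty() {
            return None;
        }
        Some(CanonicalEvent {
            source: "a2a",
            kind: "task_delegation".to_string(),
            subject: format!("{} -> {}", event.task_id, event.to_agent),
        })
    }
}
```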
## Add to CI
```yaml
# .github/workflows/assay.yml
name: Assay Gate
on: [push, pull_request]
permissions:
  contents: read
  security-events: write
jobs:
  assay:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: Rul1an/assay-action@v2
```
PRs that violate policy get blocked; SARIF can surface in the Security tab.
## Why Assay
| | |
|---|---|
| **Canonical evidence** | Assay’s evidence model is the stable contract; OTel and adapters map into it. |
| **Deterministic** | Same input, same decision — not probabilistic. |
| **Portable artifacts** | Bundles, Trust Basis, Trust Card, SARIF — for CI, review, audit. |
| **Bounded claims** | Explicit about what is **verified** vs **visible** vs **absent** — no score-first UX. |
| **MCP-native wedge** | `assay mcp wrap` is the fast path (the `mcp` group is hidden from `assay --help`; use `assay mcp --help`). Adapters extend the same engine. |
| **Offline-first** | No backend required for core enforcement and bundle verification. |
Measured latency
On the M1 Pro/macOS fragmented-IPI harness, protected tool-decision path:
- Main protection run: `0.771ms` p50 / `1.913ms` p95
- Fast-path scenario: `0.345ms` p50 / `1.145ms` p95
These are tool-decision timings, not end-to-end model latency. (See [Research, mappings & experiments](#research-mappings--experiments) for methodology context.)
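Readers reproducing p50/p95 figures can compute a nearest-rank percentile over per-decision samples, as sketched below. The harness's exact percentile method is not stated here, so nearest-rank is an assumption. Interpolating methods give slightly different values on small samples.

```rust
/// Nearest-rank percentile over latency samples (e.g. milliseconds).
/// Returns `None` for an empty sample set or `p` outside 0..=100.
pub fn percentile(samples: &[f64], p: f64) -> Option<f64> {
    if samples.is_empty() || !(0.0..=100.0).contains(&p) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // rank = ceil(p / 100 * n), clamped to at least 1 (1-based)
    let rank = ((p / 100.0) * sorted.len() as f64).ceil().max(1.0) as usize;
    Some(sorted[rank - 1])
}
```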
## Learn More
- [Promptfoo JSONL to Evidence Receipts](docs/use-cases/evidence-receipts-from-promptfoo-jsonl.md) — smallest adoption path for existing eval artifacts
- [OpenFeature EvaluationDetails to CI Review Artifact](docs/use-cases/openfeature-evaluationdetails-to-ci-review-artifact.md) — runtime decision receipt path
- [CycloneDX ML-BOM Model to Inventory Receipt](docs/use-cases/cyclonedx-mlbom-model-to-inventory-receipt.md) — model inventory/provenance receipt path
- [MCP Quickstart](examples/mcp-quickstart/) — filesystem server walkthrough
- [Policy Files](docs/reference/config/policies.md) — YAML schema for `assay mcp wrap`
- [OpenTelemetry & Langfuse](docs/guides/otel-langfuse.md) — traces → replay and evidence
- [CI Guide](docs/guides/github-action.md) — GitHub Action
- [Evidence Store](docs/guides/evidence-store-aws-s3.md) — S3, B2, MinIO
- [ADR-033: Trust compiler positioning](docs/architecture/ADR-033-OTel-Trust-Compiler-Positioning.md)
- [RFC-005: Trust compiler MVP & Trust Card](docs/architecture/RFC-005-trust-compiler-mvp-2026q2.md)
## Internal: Assay-Runner
Assay-Runner is an internal measured-run subsystem used by Assay's delegated Linux/eBPF acceptance path. It is **not a standalone product**. As of Phase 2D, the runner candidate is split into extraction-ready Rust crates (`assay-runner-schema`, `assay-runner-core`, `assay-runner-linux`) — all `publish = false` — plus the `runner-fixtures/` package tree (Node fixture marked `"private": true`; Python fixture has no distribution surface). Everything stays inside this repository.
- [Assay-Runner reference index](docs/reference/runner/index.md) — internal contracts, boundary map, slice history
- [Measured-run proof-bundle walkthrough](docs/reference/runner/examples/measured-run-proof-bundle.md) — read-only walkthrough for maintainers evaluating standalone use cases
- [Phase 2D consolidation audit](docs/reference/runner/phase-2d-consolidation-audit.md) — current burn-in criteria; the extraction question is closed until the criteria are observed and at least one concrete external use case appears
No release commitment. No timeline. No external demand has been measured.
## Research, mappings & experiments
**Bounded context:** numbers below support **mapping and experiments**, not a product “security score.”
- [OWASP MCP Top 10 Mapping](docs/security/OWASP-MCP-TOP10-MAPPING.md) — how Assay relates to each risk category (coverage is **not** a scalar guarantee).
- Third-party survey: popular MCP servers often show weak defaults — Assay adds policy + evidence; see discussion in the mapping doc.
- [Security experiments](docs/architecture/SYNTHESIS-TRUST-CHAIN-TRIFECTA-2026q2.md) — attack vectors and harness notes (methodology matters more than headline counts).
## Contributing
```bash
cargo test --workspace
cargo clippy --workspace --all-targets -- -D warnings
```
See [CONTRIBUTING.md](CONTRIBUTING.md). **Discussions:** [GitHub Discussions](https://github.com/Rul1an/assay/discussions) — seed topics for pinned threads live in [docs/community/DISCUSSIONS.md](docs/community/DISCUSSIONS.md).
## License
[MIT](LICENSE)