https://github.com/edycutjong/litmus
🧪 Output-grading quality gate agent: grades any deliverable 0-100 with a rubric, on-chain
- Host: GitHub
- URL: https://github.com/edycutjong/litmus
- Owner: edycutjong
- License: mit
- Created: 2026-06-13T12:27:09.000Z (11 days ago)
- Default Branch: main
- Last Pushed: 2026-06-14T02:32:34.000Z (10 days ago)
- Last Synced: 2026-06-14T03:14:13.815Z (10 days ago)
- Topics: a2a, agent, croo, grading, quality
- Language: TypeScript
- Size: 12.8 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 7
Metadata Files:
- Readme: README.md
- License: LICENSE
- Security: SECURITY.md
# Litmus 🧪
Output-grading quality gate agent: grades any deliverable 0-100 with a rubric, on-chain
---
## 📸 See it in Action
> **The Quality Gate Workflow.** Deliverable Received → Litmus Applies Grading Rubric → Score (0-100) Calculated → Feedback & On-Chain Grade Delivered.
---
## 💡 The Problem & Solution
In an autonomous agent economy, output quality varies wildly. How do you trust an agent's work without manual human review?
**Litmus** is an AI Quality Gate Agent. It acts as an automated, impartial grader that evaluates deliverables against strict, predefined rubrics. If an agent submits subpar code, writing, or analysis, Litmus rejects it, ensuring only high-quality work passes the gate.
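At its core, a gate like this reduces to a weighted sum of per-category scores plus a threshold check. The following is a minimal sketch that assumes a hypothetical `CategoryScore` shape, weights that sum to 1, and an illustrative pass threshold of 70. The README does not state Litmus's actual categories or cutoff.

```typescript
// Hypothetical shapes: Litmus's real rubric/result types are not documented here.
interface CategoryScore {
  name: string;
  weight: number; // fraction of the total; weights are expected to sum to 1
  score: number; // 0-100 for this category
}

interface GateResult {
  score: number; // weighted total, clamped to 0-100 and rounded
  passed: boolean; // true when score >= threshold
}

// Combine per-category scores into one 0-100 grade and apply the pass threshold.
// The default threshold of 70 is illustrative only.
function gradeAndGate(categories: CategoryScore[], threshold = 70): GateResult {
  const total = categories.reduce((sum, c) => sum + c.weight * c.score, 0);
  const score = Math.round(Math.min(100, Math.max(0, total)));
  return { score, passed: score >= threshold };
}

const result = gradeAndGate([
  { name: "Correctness", weight: 0.5, score: 80 },
  { name: "Completeness", weight: 0.35, score: 60 },
  { name: "Format/Clarity", weight: 0.15, score: 100 },
]);
console.log(result); // { score: 76, passed: true }
```

Because the threshold is a parameter, a stricter requester could raise the bar without changing the scoring logic.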
**Key Features:**
- **Objective Grading:** Evaluates work across multiple rubric categories, assigning a score from 0-100 that is deterministic in mock mode and low-variance with the live LLM tribunal.
- **Quality Gatekeeper:** Automatically rejects work that falls below the acceptable threshold.
- **On-Chain Attestation:** Cryptographically signs the grade to ensure the evaluation is immutable and verifiable.
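Attestation of this kind rests on a standard sign-then-verify pattern. The sketch below uses Node's built-in `node:crypto` Ed25519 functions on a canonically serialized grade. The `GradeRecord` fields and the `ord_example` ID are hypothetical, and the sketch does not show how Litmus actually anchors a signature on Base or through CROO.

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// Hypothetical grade record; field names are illustrative, not Litmus's schema.
interface GradeRecord {
  orderId: string;
  score: number;
  rubricVersion: string;
}

// Serialize with a fixed field order so signer and verifier see identical bytes.
function canonicalBytes(g: GradeRecord): Buffer {
  return Buffer.from(JSON.stringify([g.orderId, g.score, g.rubricVersion]), "utf8");
}

// Fresh Ed25519 key pair; a real deployment would load a persistent key instead.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

const grade: GradeRecord = { orderId: "ord_example", score: 76, rubricVersion: "v1" };
// For Ed25519, Node expects the algorithm argument to be null.
const signature = sign(null, canonicalBytes(grade), privateKey);

console.log(verify(null, canonicalBytes(grade), publicKey, signature)); // true

// Any change to the grade invalidates the signature.
const tampered: GradeRecord = { ...grade, score: 95 };
console.log(verify(null, canonicalBytes(tampered), publicKey, signature)); // false
```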
## The Constellation: On-Chain A2A Graph
Litmus is the constellation's **quality oracle**: other agents pay it on-chain to grade a deliverable 0–100 against a rubric. A two-model "tribunal" (with a tiebreaker) keeps scoring stable (score standard deviation σ < 4). Verifiable, paid, impartial grading-as-a-service is a primitive that a conventional API marketplace can't offer.
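The following sketch shows one plausible way a two-judge tribunal with a tiebreaker could combine scores. It is an assumption, not Litmus's documented algorithm. The `tribunal` function, the 10-point `tolerance`, and the median-of-three tiebreak are all hypothetical, and the judges are stubbed with fixed scores instead of real LLM calls.

```typescript
// A grader is any async function returning a 0-100 score for a deliverable.
type Grader = (deliverable: string) => Promise<number>;

// Median of three numbers: discards whichever score is the outlier.
function median3(a: number, b: number, c: number): number {
  return a + b + c - Math.max(a, b, c) - Math.min(a, b, c);
}

// Hypothetical tribunal: two graders score independently. If they agree within
// `tolerance` points, their rounded mean is returned; otherwise a third grader
// is consulted and the median of all three scores wins.
async function tribunal(
  deliverable: string,
  judgeA: Grader,
  judgeB: Grader,
  tiebreaker: Grader,
  tolerance = 10,
): Promise<number> {
  const [a, b] = await Promise.all([judgeA(deliverable), judgeB(deliverable)]);
  if (Math.abs(a - b) <= tolerance) {
    return Math.round((a + b) / 2);
  }
  const c = await tiebreaker(deliverable);
  return median3(a, b, c);
}

// Stub graders that always return a fixed score (stand-ins for LLM calls).
const fixed = (n: number): Grader => async () => n;

tribunal("draft", fixed(72), fixed(78), fixed(0)).then(console.log); // 75 (judges agree)
tribunal("draft", fixed(40), fixed(80), fixed(75)).then(console.log); // 75 (tiebreaker used)
```

Taking the median instead of the mean means a single wildly off judge cannot drag the final grade, which is one way to keep variance low.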
```mermaid
graph LR
User([Any Agent / User]) -->|hires to grade| L[Litmus 🧪]
M[Maestro 🎼] -->|grade + re-grade in its reflection loop| L
G[Gauntlet 🧤] -.->|certifies| L
classDef hot fill:#F59E0B,stroke:#111,color:#111,font-weight:bold;
class L hot;
```
- **Depth:** Maestro hires Litmus **twice** per pipeline (once to grade, once to re-grade the self-corrected draft), which makes it a high-traffic A2A node.
- **Anti-gaming:** rubric weights are validated and Format/Clarity is capped at 15% so agents can't farm a passing grade on style alone.
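The anti-gaming rule can be enforced with a validation pass that runs before any grading. The following is a minimal sketch that assumes weights are fractions that must sum to 1 and that an over-cap Format/Clarity weight is rejected rather than clamped. Only the 15% cap itself comes from this README, and `validateRubric` is a hypothetical name.

```typescript
// Hypothetical rubric shape; category names are illustrative.
interface RubricCategory {
  name: string;
  weight: number; // fraction of the final score, 0..1
}

const STYLE_CATEGORY = "Format/Clarity";
const STYLE_CAP = 0.15; // the 15% cap stated in the README

// Throws if the weights are malformed or let style dominate the grade.
function validateRubric(rubric: RubricCategory[]): void {
  if (rubric.length === 0) throw new Error("rubric has no categories");
  for (const c of rubric) {
    if (!Number.isFinite(c.weight) || c.weight < 0) {
      throw new Error(`invalid weight for ${c.name}`);
    }
  }
  const total = rubric.reduce((sum, c) => sum + c.weight, 0);
  // Tolerate floating-point rounding when weights are given as decimals.
  if (Math.abs(total - 1) > 1e-9) {
    throw new Error(`weights sum to ${total}, expected 1`);
  }
  const style = rubric.find((c) => c.name === STYLE_CATEGORY);
  if (style && style.weight > STYLE_CAP) {
    throw new Error(`${STYLE_CATEGORY} weight ${style.weight} exceeds cap ${STYLE_CAP}`);
  }
}

// Passes: style is exactly at the cap.
validateRubric([
  { name: "Correctness", weight: 0.5 },
  { name: "Completeness", weight: 0.35 },
  { name: "Format/Clarity", weight: 0.15 },
]);
```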
## Live Run Log: On-Chain Proof (Base Mainnet)
Real CAP grading orders Litmus fulfilled as a **provider**.
**Total real CAP orders: _0_** · _last updated: 2026-06-__
| # | Date | Counterparty (requester) | Amount (USDC) | Order ID | Tx (BaseScan) | Score |
|---|------|--------------------------|---------------|----------|---------------|-------|
| 1 | _2026-06-__ | _Maestro / external_ | _0.00_ | `_ord_…_` | [0x…](https://basescan.org/tx/0x…) | _N_/100 |
> Order IDs + pay tx are in the provider logs and the CROO dashboard. Delete this note once populated.
## Architecture & Tech Stack
| Layer | Technology |
|---|---|
| **Runtime** | Node.js (TypeScript) |
| **Ecosystem** | Constellation A2A (croo-core) |
| **Testing** | Vitest |
## Getting Started
### Prerequisites
- Node.js ≥ 20
- npm
### Installation
1. Clone: `git clone https://github.com/edycutjong/litmus.git`
2. Install: `npm install`
3. Configure: `cp .env.example .env.local`, then fill in your service ID and an LLM API key (skip this step for mock mode)
### ▶️ Run it now: offline mock mode (no wallet, no USDC)
```bash
npm install
CROO_MOCK=true npm run dev # boots the grader provider with no on-chain calls
```
Grading works with **no API key** (deterministic mock grade); set `OPENAI_API_KEY` and/or `ANTHROPIC_API_KEY` to enable the live LLM tribunal. Run `npm run stability` to reproduce the σ < 4 scoring-variance harness.
## 🧪 Testing & CI
**4-stage pipeline:** Quality → Security → Build → Deploy Gate
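The σ < 4 target bounds the spread of repeated scores for the same deliverable. The sketch below assumes population standard deviation and uses made-up scores. It is not the code behind `npm run stability`, and `stdDev` and `isStable` are hypothetical names.

```typescript
// Population standard deviation of repeated scores for the same deliverable.
function stdDev(scores: number[]): number {
  if (scores.length === 0) throw new Error("no scores");
  const mean = scores.reduce((s, x) => s + x, 0) / scores.length;
  const variance = scores.reduce((s, x) => s + (x - mean) ** 2, 0) / scores.length;
  return Math.sqrt(variance);
}

// Hypothetical check mirroring the README's σ < 4 target.
function isStable(scores: number[], maxSigma = 4): boolean {
  return stdDev(scores) < maxSigma;
}

const runs = [74, 78, 76, 72, 80]; // illustrative scores, not real harness output
console.log(stdDev(runs).toFixed(2)); // 2.83
console.log(isStable(runs)); // true
```

With five runs, using the sample standard deviation (dividing by n − 1) instead would give about 3.16 for the same data. The choice matters near the threshold, so a real harness should state which one it uses.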
```bash
# ── Code Quality ────────────────────────────
make lint # ESLint
make typecheck # TypeScript check
make test # Run tests
make test-coverage # Coverage report
make ci # Full quality gate
# ── Security ────────────────────────────────
make security-scan # npm audit + license check
```
| Layer | Tool | Status |
|---|---|---|
| Code Quality | ESLint + TypeScript | ✅ |
| Unit Testing | Vitest | ✅ |
| Security (SAST) | CodeQL | ✅ |
| Security (SCA) | Dependabot + npm audit | ✅ |
| Secret Scanning | TruffleHog | ✅ |
## Project Structure
```text
dorahacks-croo-litmus/
├── docs/        # README assets (hero, screenshots)
├── src/         # Application source code
├── scripts/     # Build and run scripts
├── __tests__/   # Vitest test suites
├── .github/     # CI workflows
└── README.md    # You are here
```
## Deploy
Containerized for any PaaS. Litmus is a background **worker**: it connects out to the CROO WebSocket and needs no inbound port.
```bash
docker build -t litmus .
docker run --env-file .env.local litmus
```
## License
[MIT](LICENSE) © 2026 Edy Cu
## Acknowledgments
Built for the DoraHacks CROO Hackathon 2026.