# PromptProof

Deterministic LLM testing for production reliability.

**Record→Replay + policy‑as‑code** to catch PII leaks, schema drift, and behavioral regressions **before merge**.

<p>
  <a href="https://github.com/codespaces/new?hide_repo_select=true&ref=main&repo=1035076523">
    <img alt="Open in Codespaces" src="https://github.com/codespaces/badge.svg" />
  </a>
  &nbsp;
  <a href="https://gitpod.io/#https://github.com/geminimir/promptproof">
    <img alt="Open in Gitpod" src="https://img.shields.io/badge/Gitpod-Open%20Workspace-blue?logo=gitpod" />
  </a>
</p>

[![CI](https://img.shields.io/github/actions/workflow/status/geminimir/promptproof/promptproof.yml?branch=main)](https://github.com/geminimir/promptproof/actions)
[![Action](https://img.shields.io/badge/Marketplace-promptproof--action-blue?logo=github)](https://github.com/marketplace/actions/promptproof-eval)
[![npm (CLI)](https://img.shields.io/npm/v/promptproof-cli?label=promptproof-cli)](https://www.npmjs.com/package/promptproof-cli)
[![npm (SDK)](https://img.shields.io/npm/v/promptproof-sdk-node?label=sdk--node)](https://www.npmjs.com/package/promptproof-sdk-node)
![node](https://img.shields.io/badge/node-%3E%3D18-brightgreen)
![license](https://img.shields.io/badge/license-MIT-green)
[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](#-contributing)

## Try it in 60s 🚀

```bash
# Clone and try the example (expect a failure)
git clone https://github.com/geminimir/promptproof.git
cd promptproof
corepack enable && corepack prepare pnpm@9 --activate
pnpm i
pnpm run try:example
```

This runs PromptProof against a deliberately failing JSON output. **No API calls, no setup**: just pure validation.

Then fix it:

```bash
pnpm run fix:example  # Now it passes!
```

## Quickstart

### Run locally

```bash
npx promptproof-cli@latest eval -c promptproof.yaml --out report
```

### GitHub Action

```yaml
# .github/workflows/promptproof.yml
name: PromptProof
on: [pull_request]
permissions: { contents: read, pull-requests: write, security-events: write }
jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: geminimir/promptproof-action@v0
        with:
          config: promptproof.yaml
          format: sarif  # optional: upload to Code Scanning
```

### One‑line recording (Node)

```ts
import OpenAI from 'openai'
import { withPromptProofOpenAI } from 'promptproof-sdk-node/openai'
const ai = withPromptProofOpenAI(new OpenAI({ apiKey: process.env.OPENAI_API_KEY }), { suite: 'support-replies' })
```

This writes sanitized JSONL records to `fixtures/<suite>/outputs*.jsonl`, which CI then replays deterministically with no network calls.

## Guarantees
- **Deterministic CI:** replay fixtures offline; **zero** network calls in CI.
- **Safety by default:** PII redaction (emails, phone numbers) is on by default, and the SDK never blocks your app if recording fails.
- **Provider‑agnostic:** we evaluate outputs, not vendors.
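
You can think of the redaction guarantee as a masking pass over text before it is written to a fixture. The sketch below is hypothetical. `redactPII` and the `[EMAIL]`/`[PHONE]` placeholders are illustrative names, not the SDK's API. The patterns are loose assumptions modeled on the `no_pii` check in Step 3, and they are not the SDK's actual redaction rules.

```typescript
// Hypothetical redaction pass, not the SDK's implementation.
// Both patterns are assumptions. The phone pattern matches at least nine characters
// that begin and end with a digit (plus an optional leading "+"), so it will also
// mask some dates and IDs.
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g
const PHONE = /\+?\d[\d\s().-]{7,}\d/g

function redactPII(text: string): string {
  // Mask emails first so digits inside an address are not read as a phone number.
  return text.replace(EMAIL, '[EMAIL]').replace(PHONE, '[PHONE]')
}

console.log(redactPII('Contact jane.doe@example.com or call +1 (555) 123-4567.'))
// prints: Contact [EMAIL] or call [PHONE].
```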

## Why not just JSON schema at runtime?
- We enforce **pre‑merge** gates that block risky PRs, rather than best‑effort runtime checks.
- **Replay** of real outputs removes flakiness (no live model calls in CI).
- **Budgets** catch cost/latency creep alongside quality rules.

## Examples
- Demo app: `examples/node-support-bot/`
- Fixtures: `fixtures/` (support replies, RAG, tool calls)
- Failure Zoo: `zoo/` (real cases with copy‑pasteable rules)

## What it looks like
![Red→Green PR demo](./docs/assets/red-green.gif)

## 🎯 Key Features

- **Deterministic Replay**: Test against recorded LLM outputs with zero network calls in CI
- **Comprehensive Assertions**: JSON schemas, regex patterns, numeric bounds, string operations, list/set equality, file diffs, and custom checks
- **Regression Testing**: Snapshot baselines and automatic comparison to catch new failures and performance degradation
- **Cost Controls**: Budget gates for total cost, per-test cost, and latency with regression tracking
- **Flake Management**: Seed control and multiple runs with stability scoring for non-deterministic checks
- **CI/CD Integration**: GitHub Action that fails PRs on violations with detailed reporting
- **Provider Agnostic**: Works with OpenAI, Anthropic, and any HTTP-based LLM API
- **Privacy First**: Built-in PII redaction and offline evaluation

## 📋 Requirements

- Node.js >= 18.0.0
- npm >= 8.0.0

## 🚀 Quick Start

### Install Packages

```bash
# Install CLI for evaluation
npm install -g promptproof-cli@beta

# Install SDK for recording (in your project)
npm install promptproof-sdk-node@beta
```

### Initialize in Your Project

```bash
# Initialize project structure
promptproof init --suite support-replies
```

This creates:
- `promptproof.yaml` - Policy configuration
- `fixtures/` - Directory for recorded outputs
- `.github/workflows/promptproof.yml` - GitHub Action workflow

## 📝 Record → Replay Workflow

### Step 1: Record LLM Outputs (One Line Change!)

#### OpenAI Integration

```javascript
import OpenAI from 'openai'
import { withPromptProofOpenAI } from 'promptproof-sdk-node/openai'

const base = new OpenAI({ apiKey: process.env.OPENAI_API_KEY })
export const ai = withPromptProofOpenAI(base, { suite: 'support-replies' })

// Use normally - fixtures are recorded automatically
const response = await ai.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: 'Hello!' }]
})
```

#### Anthropic Integration

```javascript
import Anthropic from '@anthropic-ai/sdk'
import { withPromptProofAnthropic } from 'promptproof-sdk-node/anthropic'

const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY })
export const claude = withPromptProofAnthropic(anthropic, { suite: 'rag-answers' })

// Use normally - fixtures are recorded automatically
const response = await claude.messages.create({
  model: 'claude-3-sonnet-20240229',
  max_tokens: 1000,
  messages: [{ role: 'user', content: 'Hello!' }]
})
```

#### Generic HTTP Integration

```javascript
import { wrapFetch } from 'promptproof-sdk-node/http'

// Wrap global fetch so that LLM API calls made through it are recorded
globalThis.fetch = wrapFetch(globalThis.fetch, { suite: 'generic-llm' })

// Subsequent fetch calls to LLM APIs are recorded automatically
const response = await fetch('https://api.openai.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({ model: 'gpt-4', messages: [{ role: 'user', content: 'Hello!' }] })
})
```
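
A fetch wrapper of this kind can be pictured as follows. It forwards the call unchanged. It then copies the request and response into a record, drops credential headers, and swallows any recording error. This is a sketch under stated assumptions, not how `wrapFetch` is actually implemented. `recordingFetch`, `CapturedCall`, the `sink` callback, and the list of secret header names are all hypothetical.

```typescript
// Minimal structural types so the sketch does not depend on DOM or Node typings.
interface RequestInitLike {
  method?: string
  headers?: Record<string, string>
  body?: string
}
interface ResponseLike {
  status: number
  clone(): ResponseLike
  text(): Promise<string>
}
type FetchLike = (url: string, init?: RequestInitLike) => Promise<ResponseLike>

// Hypothetical record shape, loosely inspired by the pp.v1 fixture format.
interface CapturedCall {
  suite: string
  url: string
  requestHeaders: Record<string, string>
  requestBody?: string
  status: number
  responseText: string
  latencyMs: number
}

// Assumed set of credential headers that must never be written to disk.
const SECRET_HEADERS = new Set(['authorization', 'x-api-key', 'api-key'])

function stripSecrets(headers: Record<string, string> = {}): Record<string, string> {
  const kept: Record<string, string> = {}
  for (const [name, value] of Object.entries(headers)) {
    if (!SECRET_HEADERS.has(name.toLowerCase())) kept[name] = value
  }
  return kept
}

// Wraps a fetch-like function: the real call is untouched, recording is best-effort.
function recordingFetch(
  inner: FetchLike,
  suite: string,
  sink: (call: CapturedCall) => void
): FetchLike {
  return async (url, init) => {
    const started = Date.now()
    const response = await inner(url, init) // network errors still reach the caller
    const latencyMs = Date.now() - started
    try {
      // Read a clone so the caller can still consume the original body.
      const responseText = await response.clone().text()
      sink({
        suite,
        url,
        requestHeaders: stripSecrets(init?.headers),
        requestBody: init?.body,
        status: response.status,
        responseText,
        latencyMs,
      })
    } catch (err) {
      // A failed recording is logged, never rethrown into the app.
      console.warn('fixture recording failed:', err)
    }
    return response
  }
}
```

A real recorder would also have to append each captured call to a JSONL file. This sketch leaves that step to the `sink` callback.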

### Step 2: Fixtures Are Created Automatically

Each LLM call writes a sanitized record to `fixtures/<suite>/outputs.jsonl`. In the JSONL file each record occupies a single line. It is pretty-printed here for readability:

```json
{
  "schema_version": "pp.v1",
  "id": "auto-generated",
  "timestamp": "2024-08-10T12:34:56Z",
  "source": "dev",
  "input": {
    "prompt": "user: Hello!\nassistant: Hi there!",
    "params": { "model": "gpt-4", "temperature": 0.7 }
  },
  "output": { "text": "Hello! How can I help you today?" },
  "metrics": { "latency_ms": 812, "cost_usd": 0.0012 },
  "redaction": { "status": "sanitized" }
}
```

### Step 3: Define Your Contracts

```yaml
# promptproof.yaml
schema_version: pp.v1
fixtures: fixtures/support-replies/outputs.jsonl
checks:
  - id: no_pii
    type: regex_forbidden
    target: output.text
    patterns:
      - "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}"  # No emails
      - "\\b\\+?\\d[\\d\\s().-]{7,}\\b"                      # No phone numbers

  - id: response_schema
    type: json_schema
    target: output.json
    schema:
      type: object
      required: [status, message]
      properties:
        status: { type: string, enum: [success, error] }
        message: { type: string }

  - id: contains_disclaimer
    type: string_contains
    target: output.text
    expected: "We cannot guarantee"
    ignore_case: true

  - id: response_list_exact
    type: list_equality
    target: output.json.items
    expected: ["step1", "step2", "step3"]
    order_sensitive: true

budgets:
  cost_usd_per_run_max: 0.50
  cost_usd_total_max: 10.00  # Total cost gate
  latency_ms_p95_max: 2000
  cost_usd_total_pct_increase_max: 10  # Max 10% cost increase vs baseline

mode: warn  # Start with 'warn', switch to 'fail' after validation
```
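
The sketch below makes the Step 3 contract semantics concrete. It is a simplified evaluator for three of the check types: `regex_forbidden`, `string_contains` (with `ignore_case`), and `list_equality` (with `order_sensitive`). It is a hypothetical illustration, not PromptProof's evaluation engine. The `Check` union, `getPath`, `runCheck`, and `evaluate` are invented for this sketch. How the real tool resolves targets, applies regex flags, or defaults `order_sensitive` is assumed.

```typescript
// Hypothetical, simplified evaluator for three check types from promptproof.yaml.
type Check =
  | { id: string; type: 'regex_forbidden'; target: string; patterns: string[] }
  | { id: string; type: 'string_contains'; target: string; expected: string; ignore_case?: boolean }
  | { id: string; type: 'list_equality'; target: string; expected: string[]; order_sensitive?: boolean }

interface Violation { checkId: string; record: number; message: string }

// Resolve a dotted target such as "output.json.items"; missing keys yield undefined.
function getPath(obj: unknown, path: string): unknown {
  return path.split('.').reduce<unknown>(
    (cur, key) =>
      cur !== null && typeof cur === 'object' ? (cur as Record<string, unknown>)[key] : undefined,
    obj
  )
}

// Returns a failure message, or null when the check passes.
function runCheck(check: Check, value: unknown): string | null {
  switch (check.type) {
    case 'regex_forbidden': {
      const text = String(value ?? '')
      const hit = check.patterns.find((p) => new RegExp(p).test(text))
      return hit === undefined ? null : `forbidden pattern matched: ${hit}`
    }
    case 'string_contains': {
      const text = String(value ?? '')
      const found = check.ignore_case
        ? text.toLowerCase().includes(check.expected.toLowerCase())
        : text.includes(check.expected)
      return found ? null : `expected substring not found: "${check.expected}"`
    }
    case 'list_equality': {
      if (!Array.isArray(value)) return 'target is not a list'
      const actual = value.map(String)
      const expected = [...check.expected]
      // Assumption: only an explicit order_sensitive: false compares the lists as multisets.
      if (check.order_sensitive === false) {
        actual.sort()
        expected.sort()
      }
      const same = actual.length === expected.length && actual.every((v, i) => v === expected[i])
      return same ? null : `list mismatch: got [${actual.join(', ')}]`
    }
  }
}

// Parse JSONL (one JSON object per non-empty line) and run every check on every record.
function evaluate(jsonl: string, checks: Check[]): Violation[] {
  const violations: Violation[] = []
  const lines = jsonl.split('\n').filter((line) => line.trim() !== '')
  lines.forEach((line, i) => {
    const record: unknown = JSON.parse(line)
    for (const check of checks) {
      const message = runCheck(check, getPath(record, check.target))
      if (message !== null) violations.push({ checkId: check.id, record: i + 1, message })
    }
  })
  return violations
}
```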

### Step 4: Evaluate Against Contracts

```bash
# Local evaluation
promptproof eval -c promptproof.yaml

# With regression comparison against baseline
promptproof eval -c promptproof.yaml --regress

# With flake controls for non-deterministic checks
promptproof eval -c promptproof.yaml --seed 42 --runs 3

# Create a snapshot after a successful run
promptproof snapshot promptproof.yaml --promote

# In CI (automatic via GitHub Action)
# Runs on every PR and blocks merge on violations
```

## ⚙️ Environment Variables

Control recording behavior with environment variables:

| Variable | Default | Description |
|----------|---------|-------------|
| `PP_RECORD` | `1` (dev), `0` (prod) | Master on/off switch for recording |
| `PP_SAMPLE_RATE` | `1.0` | Fraction of calls to record, from `0.0` (none) to `1.0` (all) |
| `PP_SUITE` | from options | Overrides the suite name |
| `PP_OUT` | `fixtures` | Output directory for fixtures |
| `PP_SOURCE` | `NODE_ENV` | Environment label for recorded fixtures |
| `PP_SHARD_BY_PID` | `0` | When `1`, write to `outputs.<pid>.jsonl` (one file per process) |

## 🔒 Safety & Privacy

- ✅ **Redaction ON by default** - emails, phones, SSNs masked
- ✅ **Never blocks your app** - recording failures are logged, not thrown
- ✅ **No secrets recorded** - API keys and auth headers excluded
- ✅ **Deterministic** - the same input produces the same record (ignoring `timestamp` and `id`)
- ✅ **Production ready** - sampling controls, PID sharding for concurrency

## 📊 Example Output

```
✓ Evaluated 142 fixtures
✗ 3 violations found:

  [no_pii] Record #47: Found forbidden pattern (email) at output.text
  [response_schema] Record #89: Missing required field 'status'
  [latency_budget] P95 latency: 2341ms exceeds limit of 2000ms

📊 Regression Comparison
Baseline: 2024-01-15-stable
⚠ 2 new failures:
  • [string_contains] test-102: Expected string "disclaimer" not found
  • [cost_budget] Total cost $12.50 exceeds budget $10.00
✓ 1 fixed failures
↔ 0 unchanged failures

Cost & Performance:
Cost: ↑ $2.50 (25.0%)
P95 Latency: ↑ 341ms

Exit code: 1
```
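
The budget lines in this example output come down to simple arithmetic over each fixture's `latency_ms` and `cost_usd`. Below is a minimal sketch with hypothetical helper names. It assumes a nearest-rank p95, since this README does not say which percentile method PromptProof uses. The cost line implies a baseline total of $10.00. The increase is then (12.50 - 10.00) / 10.00 = 25%, which would exceed `cost_usd_total_pct_increase_max: 10`.

```typescript
// Hypothetical budget gate over per-record fixture metrics.
interface Metrics {
  latency_ms: number
  cost_usd: number
}

// Subset of the budgets block from promptproof.yaml (per-run cost omitted here).
interface Budgets {
  cost_usd_total_max?: number
  latency_ms_p95_max?: number
  cost_usd_total_pct_increase_max?: number
}

// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return 0
  const sorted = [...values].sort((a, b) => a - b)
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100))
  return sorted[rank - 1]
}

// Percentage change relative to the baseline; any growth from a zero baseline is unbounded.
function pctIncrease(baseline: number, current: number): number {
  if (baseline === 0) return current > 0 ? Infinity : 0
  return ((current - baseline) / baseline) * 100
}

function checkBudgets(metrics: Metrics[], budgets: Budgets, baselineTotalCost?: number): string[] {
  const violations: string[] = []
  const totalCost = metrics.reduce((sum, m) => sum + m.cost_usd, 0)
  const p95 = percentile(metrics.map((m) => m.latency_ms), 95)

  if (budgets.cost_usd_total_max !== undefined && totalCost > budgets.cost_usd_total_max) {
    violations.push(`total cost $${totalCost.toFixed(2)} exceeds $${budgets.cost_usd_total_max.toFixed(2)}`)
  }
  if (budgets.latency_ms_p95_max !== undefined && p95 > budgets.latency_ms_p95_max) {
    violations.push(`p95 latency ${p95}ms exceeds ${budgets.latency_ms_p95_max}ms`)
  }
  // The regression gate only applies when a baseline snapshot is available.
  if (budgets.cost_usd_total_pct_increase_max !== undefined && baselineTotalCost !== undefined) {
    const pct = pctIncrease(baselineTotalCost, totalCost)
    if (pct > budgets.cost_usd_total_pct_increase_max) {
      violations.push(`total cost up ${pct.toFixed(1)}% vs baseline (max ${budgets.cost_usd_total_pct_increase_max}%)`)
    }
  }
  return violations
}
```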

## 🏗️ Architecture

```
App/Service → SDK Wrapper → fixtures/*.jsonl
                    ↓
Developer → PR → GitHub Action → CLI eval → Report → Pass/Fail Gate
```

## 🔧 CLI Commands

```
promptproof eval        # Run contract checks on fixtures
  --regress             # Compare against baseline snapshot
  --seed <n>            # Set seed for non-deterministic checks
  --runs <k>            # Run non-deterministic checks k times

promptproof snapshot    # Create evaluation snapshot
  --promote             # Promote to baseline
  --tag <name>          # Custom snapshot tag

promptproof init        # Initialize project with templates
promptproof promote     # Convert logs to fixture format
promptproof redact      # Remove PII from fixtures
promptproof validate    # Validate fixture schema
```

> **Note**: Use `npx promptproof-cli@beta` or install globally with `npm install -g promptproof-cli@beta`
>
> **SDK**: Install `promptproof-sdk-node@beta` in your project for automatic fixture recording
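
The `--seed` and `--runs` flags target checks whose outcome can change from run to run. This README does not document PromptProof's stability formula, so the following is one way to picture it, under assumptions. Seed a small PRNG so every run is reproducible, repeat the check k times, and report the share of runs that agree with the majority outcome. `mulberry32` is a well-known small generator. `stabilityScore` and the example check are hypothetical.

```typescript
// mulberry32: a small, widely used 32-bit seeded PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0
  return () => {
    a = (a + 0x6d2b79f5) >>> 0
    let t = a
    t = Math.imul(t ^ (t >>> 15), t | 1)
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61)
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296
  }
}

type Outcome = 'pass' | 'fail'

// Hypothetical scoring: run a check k times from one seed and report the share
// of runs that agree with the majority outcome (1.0 means fully stable).
function stabilityScore(check: (rand: () => number) => Outcome, seed: number, runs: number) {
  const rand = mulberry32(seed)
  const outcomes: Outcome[] = []
  for (let i = 0; i < runs; i++) outcomes.push(check(rand))
  const passes = outcomes.filter((o) => o === 'pass').length
  const majority: Outcome = passes * 2 >= runs ? 'pass' : 'fail' // ties count as pass
  const agreeing = majority === 'pass' ? passes : runs - passes
  return { outcomes, majority, score: runs === 0 ? 1 : agreeing / runs }
}
```

Under this definition, a check that passes in two of three runs (`--runs 3`) scores 2/3, or about 0.67. Reusing the same seed reproduces the same sequence of outcomes.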
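
## 📦 Packages

### Core Packages
- **`promptproof-cli@beta`**: Command-line interface for evaluation
- **`promptproof-sdk-node@beta`**: SDK wrappers for OpenAI, Anthropic, HTTP

### Architecture
- **CLI**: Evaluates pre-recorded fixtures against contracts
- **SDK**: Automatically records LLM interactions to fixtures
- **Workflow**: SDK records → CLI evaluates → CI gates

### Coming Soon
- **`@promptproof/action`**: GitHub Action for CI integration
- **`@promptproof/evaluator`**: Core evaluation engine (bundled in CLI)

## 🎪 Failure Zoo

Browse real-world LLM failure cases in our [Failure Zoo](./zoo) - anonymized production incidents with patterns and mitigations.

## 🎭 Demo Project

See our [demo project](./promptproof-demo-project) for a complete working example:
- **Realistic LLM application** with support & RAG endpoints
- **SDK integration** with automatic fixture recording
- **CLI validation** with intentional failure modes
- **CI/CD integration** via GitHub Actions
- **Red → Green demonstrations** showing PromptProof in action

## Support & Community
- Issues: https://github.com/geminimir/promptproof/issues
- Discussions: [GitHub Discussions](https://github.com/geminimir/promptproof/discussions)

## 🤝 Contributing

We welcome contributions! See [CONTRIBUTING.md](./CONTRIBUTING.md) for guidelines.

## 📄 License

MIT License - see [LICENSE](./LICENSE) for details.

## 🔗 Links

- [Website](https://promptproof.io)
- [GitHub Repository](https://github.com/geminimir/promptproof)
- [CLI Package](https://www.npmjs.com/package/promptproof-cli) (`@beta`)
- [SDK Package](https://www.npmjs.com/package/promptproof-sdk-node) (`@beta`)
- [GitHub Action](https://github.com/geminimir/promptproof-action)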