{"id":44728289,"url":"https://github.com/doronp/agentshield-benchmark","last_synced_at":"2026-02-22T00:00:40.186Z","repository":{"id":338598335,"uuid":"1158377756","full_name":"doronp/agentshield-benchmark","owner":"doronp","description":"Open benchmark for AI agent security tools — prompt injection, data exfiltration, tool abuse, provenance","archived":false,"fork":false,"pushed_at":"2026-02-15T16:27:22.000Z","size":907,"stargazers_count":9,"open_issues_count":1,"forks_count":2,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-02-17T22:53:52.033Z","etag":null,"topics":["agent-security","ai-security","benchmark","guardrails","llm-security","prompt-injection"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/doronp.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-15T09:09:25.000Z","updated_at":"2026-02-17T07:06:19.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/doronp/agentshield-benchmark","commit_stats":null,"previous_names":["doronp/agentshield-benchmark"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/doronp/agentshield-benchmark","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/doronp%2Fagentshield-benchmark","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/doronp%2Fagentshield-bench
mark/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/doronp%2Fagentshield-benchmark/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/doronp%2Fagentshield-benchmark/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/doronp","download_url":"https://codeload.github.com/doronp/agentshield-benchmark/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/doronp%2Fagentshield-benchmark/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29596069,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-18T20:59:56.587Z","status":"ssl_error","status_checked_at":"2026-02-18T20:58:41.434Z","response_time":162,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agent-security","ai-security","benchmark","guardrails","llm-security","prompt-injection"],"created_at":"2026-02-15T18:04:00.692Z","updated_at":"2026-02-18T21:00:49.677Z","avatar_url":"https://github.com/doronp.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/logo.jpg\" alt=\"AgentShield Benchmark\" width=\"300\" /\u003e\n\u003c/p\u003e\n\n\u003ch1 align=\"center\"\u003eAgentShield 
Benchmark\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003e\u003cstrong\u003eThe first head-to-head benchmark of commercial agent protection providers.\u003c/strong\u003e\u003c/p\u003e\n\nAgentShield is an open, reproducible benchmark suite that evaluates how well commercial AI agent security products defend against real-world attacks — and how much they cost you in latency, false positives, and dollars.\n\n## Disclosure\n\nThis benchmark is maintained by the team behind [Agent Guard](https://agentguard.co/). To ensure credibility, Agent Guard's results were obtained using our [Commit-Reveal Integrity Protocol](src/protocol/README.md) — a commit-reveal scheme with Ed25519 signatures that allows proprietary solutions to participate without revealing their implementation, while cryptographically proving result integrity. The verification bundle is published in `results/` for independent verification. Note: this protocol verifies that results were not tampered with after execution; it does not independently attest which model produced the results.\n\nThe test corpus, scoring methodology, and all adapter code are fully open source and auditable. We welcome third-party verification and contributions from the community.\n\nIf you believe any aspect of the methodology unfairly advantages or disadvantages a particular provider, please [open an issue](../../issues).\n\n## Current Status\n\nThis benchmark currently includes tested results for **6 providers** across ML models, SaaS APIs, and pattern-based scanners with **537 test cases** across 8 categories. 
We are actively expanding coverage — contributions of new provider adapters are welcome.\n\n### Latest Results\n\n| Provider | Score | PI | Jailbreak | Data Exfil | Tool Abuse | Over-Refusal | Multi-Agent | Provenance | P50 (ms) |\n|---|---|---|---|---|---|---|---|---|---|\n| **AgentGuard**² | **98.4** | 98.5% | 97.8% | 100.0% | 100.0% | 100.0% | 100.0% | 85.0% | 1 |\n| **Deepset DeBERTa** | **87.6** | 99.5% | 97.8% | 95.4% | 98.8% | 63.1% | 100.0% | 100.0% | 19 |\n| **Lakera Guard** | **79.4** | 97.6% | 95.6% | 96.6% | 86.3% | 58.5% | 94.3% | 95.0% | 133 |\n| ProtectAI DeBERTa v2 | 51.4 | 77.1% | 86.7% | 43.7% | 12.5% | 95.4% | 74.3% | 65.0% | 19 |\n| ClawGuard | 38.9 | 62.9% | 22.2% | 40.2% | 17.5% | 100.0% | 40.0% | 25.0% | 0 |\n| LLM Guard¹ | ~38.7 | 77.1% | — | 30.8% | 8.9% | — | — | — | 111 |\n\n¹ Scored on 517-case corpus (pre-provenance). Re-run pending for 537-case corpus with updated penalty.\n² Tested via Commit-Reveal Integrity Protocol (Ed25519 signatures) using a proprietary provenance-based solution. See [protocol documentation](src/protocol/README.md). 
Verification bundle included in results.\n\n## Benchmark Categories\n\n| # | Category | Tests | Weight | What It Measures |\n|---|----------|-------|--------|-----------------|\n| 1 | **Prompt Injection** | 205 | 20% | Direct, indirect, and context-manipulation injection attacks |\n| 2 | **Jailbreak** | 45 | 10% | DAN variants, roleplay, authority impersonation, token smuggling |\n| 3 | **Data Exfiltration** | 87 | 15% | Resistance to data leakage via tool calls, markdown, errors |\n| 4 | **Tool Abuse** | 80 | 15% | Unauthorized tool calls, scope escalation, parameter tampering |\n| 5 | **Over-Refusal** | 65 | 15% | False positive rate on legitimate requests (penalty only) |\n| 6 | **Multi-Agent Security** | 35 | 10% | Cross-agent attacks, delegation exploits, trust boundary violations |\n| 7 | **Latency Overhead** | — | 10% | Added latency (p50, p95, p99) from the protection layer |\n| 8 | **Provenance \u0026 Audit** | 20 | 5% | Detecting fake authorization claims, spoofed provenance chains, unverifiable approvals |\n\n**Total: 537 test cases** across 8 categories (7 scored + latency).\n\n## Scoring\n\nEach provider receives a **per-category score (0-100)** and a **composite score** computed as the weighted geometric mean across attack detection categories. Over-refusal is excluded from the composite and instead applied as a standalone penalty (`(FPR^1.3) * 40`) to avoid double-counting. A provider that blocks 50% of legitimate requests loses ~16 points — security that breaks usability isn't security.\n\n## Quick Start\n\n```bash\n# Install dependencies\nnpm install\n\n# Run the full benchmark suite\nnpm run benchmark\n\n# Validate the test corpus\nnpm run validate-corpus\n\n# Run tests\nnpm test\n\n# Type check\nnpm run typecheck\n```\n\n\u003e **Note:** Running the benchmark requires API keys or local services for each provider under test. Copy `.env.example` to `.env` and configure the providers you want to benchmark. 
See [PROVIDERS.md](./PROVIDERS.md) for setup instructions.\n\n## Project Structure\n\n```\nagentshield-benchmark/\n├── corpus/                  # Attack and benign test cases (JSONL)\n│   ├── categories.json      # Category definitions and weights\n│   ├── prompt-injection/    # 205 prompt injection test cases\n│   ├── jailbreak/           # 45 jailbreak test cases\n│   ├── data-exfiltration/   # 87 data exfiltration test cases\n│   ├── tool-abuse/          # 80 tool abuse test cases\n│   ├── over-refusal/        # 65 legitimate request test cases\n│   ├── multi-agent/         # 35 multi-agent security test cases\n│   └── provenance-audit/    # 20 provenance \u0026 audit test cases\n├── src/\n│   ├── types.ts             # Core TypeScript interfaces\n│   ├── runner.ts            # Test runner engine\n│   ├── scoring.ts           # Scoring and aggregation\n│   ├── run-benchmark.ts     # CLI entry point with provider discovery\n│   ├── adapters/            # Provider adapter implementations\n│   └── protocol/            # Commit-reveal integrity protocol\n├── scripts/\n│   ├── hf-model-server.py   # HuggingFace model server for ML-based providers\n│   └── validate-corpus.sh   # Corpus validation script\n├── site/                    # Static leaderboard website\n├── docs/\n│   └── providers.md         # Provider research and API details\n├── package.json\n└── tsconfig.json\n```\n\n## Adding a Provider\n\nSee [`src/adapters/README.md`](./src/adapters/README.md) for the adapter interface. Each provider adapter extends `BaseAdapter` and implements the `evaluateImpl()` method.\n\n## Reproducibility\n\nAgentShield is designed for reproducible benchmark runs. Every result JSON includes the metadata needed to verify and replicate a run.\n\n### Reproducing a Benchmark Run\n\n1. **Check the corpus hash** — each report includes a `corpusHash` (SHA-256 of all JSONL files). 
Verify your local corpus matches:\n   ```bash\n   # The runner prints the hash at startup:\n   # Computing corpus hash...\n   #    Corpus hash: a1b2c3d4...\n   ```\n   Compare this against the `corpusHash` field in the results JSON.\n\n2. **Use a shuffle seed** — to get deterministic test ordering, pass a `shuffleSeed`:\n   ```typescript\n   const report = await runBenchmark(providers, {\n     shuffle: true,\n     shuffleSeed: 42,\n   });\n   ```\n   The same seed produces the same test order on the same corpus. The seed is recorded in `report.config.shuffleSeed`.\n\n3. **Match the environment** — the report records `environment.os`, `environment.arch`, and `environment.nodeVersion`. While results should be environment-independent, matching these eliminates one variable.\n\n### Commit-Reveal Integrity Protocol\n\nAgentShield includes a commit-reveal protocol (`src/protocol/`) that allows vendors to run the benchmark locally on proprietary models while cryptographically proving result integrity. See [`src/protocol/README.md`](./src/protocol/README.md) for details.\n\n### What the Results JSON Contains\n\n| Field | Description |\n|-------|-------------|\n| `version` | Benchmark suite version |\n| `corpusHash` | SHA-256 of corpus JSONL files — verifies test data integrity |\n| `environment` | OS, architecture, Node.js version, and timestamp |\n| `config.shuffleSeed` | PRNG seed used for test ordering (if set) |\n| `providers[].providerVersion` | Version of each provider (package version or API version) |\n| `providers[].results[]` | Individual test case results with timestamps |\n\n### Environment Requirements\n\n- **Node.js** \u003e= 20.0.0\n- **TypeScript** \u003e= 5.7.0\n- Provider-specific dependencies (see [PROVIDERS.md](./PROVIDERS.md))\n\n## Contributing\n\nSee [CONTRIBUTING.md](./CONTRIBUTING.md) for guidelines. 
We welcome contributions of:\n\n- New test cases (especially novel attack vectors)\n- Provider adapters\n- Scoring methodology improvements\n\nPlease open an issue before submitting large changes.\n\n## License\n\nApache 2.0 — see [LICENSE](./LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdoronp%2Fagentshield-benchmark","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdoronp%2Fagentshield-benchmark","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdoronp%2Fagentshield-benchmark/lists"}