{"id":51063540,"url":"https://github.com/toxy4ny/kevlar-benchmark","last_synced_at":"2026-06-23T04:30:22.237Z","repository":{"id":330445037,"uuid":"1114358222","full_name":"toxy4ny/kevlar-benchmark","owner":"toxy4ny","description":"Kevlar Benchmark: OWASP Top 10 for Agentic Apps (AI-Agents) 2026 a Red Team Benchmark","archived":false,"fork":false,"pushed_at":"2026-01-16T14:50:46.000Z","size":97,"stargazers_count":23,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-01-17T04:38:51.007Z","etag":null,"topics":["2025","2026","ai","ai-agent","ai-agents","cybersecurity","education","hacking","hacking-tool","hacking-tools","owasp-top-10","redteam","redteaming","redteaming-tools"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/toxy4ny.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-12-11T09:02:37.000Z","updated_at":"2026-01-16T14:50:50.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/toxy4ny/kevlar-benchmark","commit_stats":null,"previous_names":["toxy4ny/kevlar-benchmark"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/toxy4ny/kevlar-benchmark","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/toxy4ny%2Fkevlar-benchmark","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/toxy4ny%2Fkevlar-benchmark/tags"
,"releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/toxy4ny%2Fkevlar-benchmark/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/toxy4ny%2Fkevlar-benchmark/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/toxy4ny","download_url":"https://codeload.github.com/toxy4ny/kevlar-benchmark/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/toxy4ny%2Fkevlar-benchmark/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34675970,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-23T02:00:07.161Z","response_time":65,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["2025","2026","ai","ai-agent","ai-agents","cybersecurity","education","hacking","hacking-tool","hacking-tools","owasp-top-10","redteam","redteaming","redteaming-tools"],"created_at":"2026-06-23T04:30:21.515Z","updated_at":"2026-06-23T04:30:22.217Z","avatar_url":"https://github.com/toxy4ny.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Kevlar: OWASP Top 10 for Agentic Apps 2026 Benchmark\n\nBuilt in collaboration with [POXEK AI](https://github.com/szybnev) and [COPYLEFTDEV](https://github.com/copyleftdev).\n\n\u003e **Full-coverage red team framework** for AI agent security testing  \n\u003e 
Based on [OWASP Top 10 for Agentic Applications (2026)](https://owasp.org/www-project-top-10-for-large-language-model-applications/)  \n\u003e ✅ Licensed under **CC BY-SA 4.0** | ✅ For **authorized red teaming only**\n\n---\n\n## Mission\n\nDetect, exploit, and report **Agent-Specific Injection (ASI)** vulnerabilities before adversaries do.\nKevlar automates adversarial testing of all **10 OWASP ASI risks**, ordered by real-world criticality from **Appendix D**.\n\n---\n\n## Architecture Overview\n\n```\n+-------------------------+\n|   Threat Orchestrator   | \u003c- Prioritizes ASI01 -\u003e ASI10\n+-----------+-------------+\n            |\n            v\n+-----------------------------------------------------+\n|                    ASI Modules                      |\n|  +-------------+ +-------------+ +--------------+   |\n|  |  CRITICAL   | |    HIGH     | |   MEDIUM     |   |\n|  | ASI01-ASI05 | | ASI06-ASI08 | | ASI09-ASI10  |   |\n|  +-------------+ +-------------+ +--------------+   |\n+-----------+-------------------------+---------------+\n            |                         |\n            v                         v\n+---------------------+ +--------------------------+\n|   Exploit Simulator | |   Detection \u0026 Reporting  |\n| - EchoLeak          | | - Data Exfil Detector    |\n| - MCP Poisoning     | | - Goal Drift Analyzer    |\n| - RCE Chains        | | - AIVSS Scoring Engine   |\n+---------------------+ +--------------------------+\n```\n\n---\n\n## OWASP ASI Coverage Matrix\n\n| Rank | ASI ID | Vulnerability                      | Criticality | Real Incidents (2025)         | Status      |\n|------|--------|------------------------------------|-------------|-------------------------------|-------------|\n| 1    | ASI01  | Agent Goal Hijack                  | Critical    | EchoLeak, Operator, Inception | Implemented |\n| 2    | ASI05  | Unexpected Code Execution (RCE)    | Critical    | Cursor RCE, Replit Meltdown   | Implemented |\n| 3    | 
ASI03  | Identity \u0026 Privilege Abuse         | High        | Copilot Studio Leak           | Implemented |\n| 4    | ASI02  | Tool Misuse \u0026 Exploitation         | High        | EDR Bypass via Chaining       | Implemented |\n| 5    | ASI04  | Agentic Supply Chain               | High        | Postmark MCP BCC              | Implemented |\n| 6    | ASI06  | Memory \u0026 Context Poisoning         | Medium      | Gemini Memory Corruption      | Implemented |\n| 7    | ASI07  | Insecure Inter-Agent Comms         | Medium      | Agent-in-the-Middle           | Implemented |\n| 8    | ASI08  | Cascading Failures                 | Medium      | Financial Trading Collapse    | Implemented |\n| 9    | ASI09  | Human-Agent Trust Exploitation     | Medium      | Fake Explainability           | Implemented |\n| 10   | ASI10  | Rogue Agents                       | Medium      | Self-Replicating Agents       | Implemented |\n\n**Source**: Appendix D, OWASP ASI 2026 - 20+ real-world exploits from May-Oct 2025\n\n---\n\n## Project Structure\n\n```\nkevlar-benchmark/\n├── pyproject.toml\n├── README.md, CLAUDE.md\n├── src/kevlar/\n│   ├── __init__.py\n│   ├── cli.py                     # Main CLI entry point\n│   ├── core/\n│   │   ├── __init__.py\n│   │   ├── orchestrator.py        # ThreatOrchestrator\n│   │   └── types.py               # SessionLog dataclass\n│   ├── agents/\n│   │   ├── __init__.py\n│   │   ├── protocol.py            # AgentProtocol (typing)\n│   │   ├── mock.py                # MockCopilotAgent\n│   │   ├── langchain.py           # RealLangChainAgent\n│   │   └── adapters/\n│   │       ├── asi02.py           # LangChainASI02Agent\n│   │       └── asi04.py           # LangChainASI04Agent\n│   └── modules/                   # ASI test modules\n│       ├── critical/              # ASI01-ASI05\n│       ├── high/                  # ASI06-ASI08\n│       └── medium/                # ASI09-ASI10\n├── scripts/\n│   └── run_asi*.py                # Individual 
ASI runners\n└── tests/                         # pytest tests\n```\n\n---\n\n## Quick Start\n\n```bash\n# Clone repository\ngit clone https://github.com/toxy4ny/kevlar-benchmark\ncd kevlar-benchmark\n\n# Install dependencies\nuv sync\n\n# Run full benchmark (interactive mode)\nuv run kevlar\n\n# Or run individual ASI test scripts\nuv run scripts/run_asi01.py   # Agent Goal Hijack\nuv run scripts/run_asi02.py   # Tool Misuse\nuv run scripts/run_asi03.py   # Identity Abuse\nuv run scripts/run_asi04.py   # Supply Chain\nuv run scripts/run_asi05.py   # RCE\nuv run scripts/run_asi06.py   # Memory Poisoning\nuv run scripts/run_asi07.py   # Inter-Agent Comms\nuv run scripts/run_asi08.py   # Cascading Failures\nuv run scripts/run_asi09.py   # Human Trust\nuv run scripts/run_asi10.py   # Rogue Agents\n```\n\n---\n\n## CLI Usage\n\nKevlar supports both interactive and non-interactive modes.\n\n### Interactive Mode\n\n```bash\nuv run kevlar\n```\n\n### Non-Interactive Mode\n\n```bash\n# Run specific ASI tests\nuv run kevlar --asi ASI01 --asi ASI05 --mode mock\n\n# Run all tests with real agent\nuv run kevlar --all --mode real --model llama3.1\n\n# Custom output path with quiet mode\nuv run kevlar --asi ASI01 --output report.json --quiet\n```\n\n### CI/CD Integration\n\n```bash\n# CI mode: quiet output + exit codes based on severity\nuv run kevlar --all --ci\n\n# Check exit code\nuv run kevlar --all --ci; echo \"Exit code: $?\"\n```\n\n**Exit Codes:**\n| Code | Meaning |\n|------|---------|\n| 0 | No vulnerabilities found |\n| 1 | Medium/High vulnerabilities found |\n| 2 | Critical vulnerabilities found |\n| 130 | Interrupted (SIGINT) |\n\n### Dependency Check\n\nReal agent mode requires LangChain and Ollama. 
Check availability before running:\n\n```bash\n# Check if dependencies are available\nuv run kevlar --check\n```\n\nIf dependencies are missing, `--mode real` will fail with a clear error suggesting `--mode mock`.\n\n### All Options\n\n```\n-a, --asi TEXT          ASI tests to run (can be repeated)\n    --all               Run all ASI tests\n-m, --mode [mock|real]  Agent mode (default: mock)\n    --model TEXT        Model for real agent (default: llama3.1)\n-o, --output PATH       Output report path\n-q, --quiet             Suppress banner and colors\n    --ci                CI mode: quiet + exit codes\n    --check             Check agent dependencies and exit\n    --version           Show version\n    --help              Show help\n```\n\n---\n\n## Testing\n\n```bash\n# Run all tests\nuv run pytest tests/\n\n# Unit tests only\nuv run pytest tests/unit/\n\n# Integration tests only\nuv run pytest tests/integration/\n\n# Verbose with short traceback\nuv run pytest tests/ -v --tb=short\n```\n\n### Test Stats\n\n- **591 tests** total\n- **Coverage: ~56%** (threshold: 40%)\n- Unit tests for all 10 ASI modules\n- Integration tests for CLI and orchestrator pipeline\n\n---\n\n## ASI01: Agent Goal Hijack\n\nKevlar simulates **4 real-world attack scenarios** from OWASP:\n\n### Attack Surface Map\n\n```\nUser Input -\u003e [Agent] -\u003e Tools -\u003e Memory -\u003e Human\n     ^          ^          ^        ^       ^\n  [Inception] [EchoLeak] [RCE]  [Poison] [Trust Exploit]\n```\n\n### Test Scenarios\n\n| Scenario           | Technique                   | Evidence to Detect                    |\n|--------------------|-----------------------------|---------------------------------------|\n| **EchoLeak**       | Zero-click email injection  | HTTP call to `attacker.com`           |\n| **Operator**       | Web content poisoning (RAG) | `\"EXFILTRATE\"` in agent output        |\n| **Inception**      | Malicious Google Doc        | Unauthorized wire transfer approved   |\n| 
**Goal-lock Drift**| Recurring calendar prompt   | Goal alignment score \u003c 0.3            |\n\n---\n\n## Risk Scoring: OWASP AIVSS Integration\n\nKevlar outputs structured reports compatible with the **OWASP AI Vulnerability Scoring System (AIVSS)**:\n\n```json\n{\n  \"asi_id\": \"ASI01\",\n  \"aivss_score\": 9.8,\n  \"risk_level\": \"CRITICAL\",\n  \"attack_vector\": \"INDIRECT_PROMPT_INJECTION\",\n  \"blast_radius\": \"ORGANIZATION_WIDE\",\n  \"remediation\": \"https://owasp.org/www-project-top-10-for-large-language-model-applications/2026/en/asi01/\"\n}\n```\n\nReports are generated as JSON in `reports/kevlar_aivss_report_\u003ctimestamp\u003e.json`.\n\n---\n\n## Legal \u0026 Ethical Notice\n\n**Kevlar is for authorized red teaming only.**\n\nDo not test systems without **written permission**.\nMisuse violates:\n- Computer Fraud and Abuse Act (CFAA)\n- GDPR / CCPA (if PII is exposed)\n- OWASP Ethical Guidelines\n\nBy using Kevlar, you agree to test **only**:\n- Your own agents\n- Systems where you hold **explicit authorization**\n- Isolated lab environments\n\n---\n\n## License\n\n[![CC BY-SA 4.0](https://img.shields.io/badge/License-CC%20BY--SA%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by-sa/4.0/)\n\nYou are free to **share and adapt** - even commercially - as long as you:\n1. **Give appropriate credit**\n2. **Indicate if changes were made**\n3. **Distribute under the same license (ShareAlike)**\n\nCopyright 2026 - [toxy4ny](https://github.com/toxy4ny) | Part of the **Kevlar Offensive AI Security Suite**\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftoxy4ny%2Fkevlar-benchmark","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftoxy4ny%2Fkevlar-benchmark","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftoxy4ny%2Fkevlar-benchmark/lists"}