{"id":49030908,"url":"https://github.com/stackonehq/stackone-defender","last_synced_at":"2026-04-22T17:01:57.122Z","repository":{"id":350046663,"uuid":"1184399557","full_name":"StackOneHQ/stackone-defender","owner":"StackOneHQ","description":null,"archived":false,"fork":false,"pushed_at":"2026-04-14T03:09:58.000Z","size":45254,"stargazers_count":2,"open_issues_count":2,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-19T09:37:12.687Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/StackOneHQ.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-17T14:51:25.000Z","updated_at":"2026-04-08T16:45:21.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/StackOneHQ/stackone-defender","commit_stats":null,"previous_names":["stackonehq/stackone-defender"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/StackOneHQ/stackone-defender","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StackOneHQ%2Fstackone-defender","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StackOneHQ%2Fstackone-defender/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StackOneHQ%2Fstackone-defender/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StackOneHQ%2Fstackone-defender/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/StackOneHQ","download_url":"https://codeload.github.com/StackOneHQ/stackone-defender/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StackOneHQ%2Fstackone-defender/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32145873,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-22T15:33:03.595Z","status":"ssl_error","status_checked_at":"2026-04-22T15:30:42.712Z","response_time":58,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-04-19T09:06:00.361Z","updated_at":"2026-04-22T17:01:57.116Z","avatar_url":"https://github.com/StackOneHQ.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\n  \u003cpicture\u003e\n    \u003csource media=\"(prefers-color-scheme: dark)\" srcset=\"https://raw.githubusercontent.com/StackOneHQ/defender/main/assets/banner-dark.svg\" /\u003e\n    \u003cimg src=\"https://raw.githubusercontent.com/StackOneHQ/defender/main/assets/banner-light.svg\" alt=\"Defender by StackOne — Indirect prompt injection protection for MCP tool calls\" width=\"800\" /\u003e\n  \u003c/picture\u003e\n\n  \u003cp\u003e\n    \u003ca href=\"https://pypi.org/project/stackone-defender/\"\u003e\u003cimg src=\"https://img.shields.io/pypi/v/stackone-defender?style=flat-square\u0026color=047B43\u0026label=pypi\" alt=\"PyPI version\" /\u003e\u003c/a\u003e\n    \u003ca href=\"https://github.com/StackOneHQ/stackone-defender/releases\"\u003e\u003cimg src=\"https://img.shields.io/github/v/release/StackOneHQ/stackone-defender?style=flat-square\u0026color=047B43\u0026label=release\" alt=\"latest GitHub release\" /\u003e\u003c/a\u003e\n    \u003ca href=\"https://github.com/StackOneHQ/stackone-defender/stargazers\"\u003e\u003cimg src=\"https://img.shields.io/github/stars/StackOneHQ/stackone-defender?style=flat-square\u0026color=047B43\" alt=\"GitHub stars\" /\u003e\u003c/a\u003e\n    \u003ca href=\"./LICENSE\"\u003e\u003cimg src=\"https://img.shields.io/pypi/l/stackone-defender?style=flat-square\u0026color=047B43\" alt=\"License\" /\u003e\u003c/a\u003e\n    \u003cimg src=\"https://img.shields.io/badge/Python-3.11+-047B43?style=flat-square\" alt=\"Python 3.11+\" /\u003e\n  \u003c/p\u003e\n  \u003cp\u003e\n    \u003cimg src=\"https://img.shields.io/badge/model-22MB-047B43?style=flat-square\" alt=\"Model size: 22MB\" /\u003e\n    \u003cimg src=\"https://img.shields.io/badge/latency-~10ms-047B43?style=flat-square\" alt=\"Latency: ~10ms\" /\u003e\n    \u003cimg src=\"https://img.shields.io/badge/CPU--only-no%20GPU%20needed-047B43?style=flat-square\" alt=\"CPU only\" /\u003e\n    \u003cimg src=\"https://img.shields.io/badge/F1%20Score-90.8%25-047B43?style=flat-square\" alt=\"F1 Score: 90.8%\" /\u003e\n  \u003c/p\u003e\n\n\u003c/div\u003e\n\n---\n\nIndirect prompt injection defense for AI agents using tool calls (MCP, CLI, or direct APIs). Detects and neutralizes attacks hidden in tool results (emails, documents, PRs, etc.) before they reach your LLM.\n\n**Python package:** [`stackone-defender`](https://pypi.org/project/stackone-defender/) — aligned with [`@stackone/defender`](https://www.npmjs.com/package/@stackone/defender) on npm.\n\n## Installation\n\n**pip**\n\n```bash\npip install stackone-defender\n```\n\n**uv**\n\n```bash\nuv add stackone-defender\n```\n\n**Tier 2 (ONNX)** — add extras:\n\n```bash\npip install stackone-defender[onnx]\n# or: uv add \"stackone-defender[onnx]\"\n```\n\nThe ONNX model (~22MB) is bundled in the wheel — no extra downloads at runtime.\n\n## Quick start\n\n```python\nfrom stackone_defender import create_prompt_defense\n\n# Tier 1 + Tier 2 are on by default. block_high_risk=True enables allow/block.\ndefense = create_prompt_defense(block_high_risk=True)\n\n# Optional: preload ONNX to avoid first-call latency (requires [onnx] extra)\ndefense.warmup_tier2()\n\nresult = defense.defend_tool_result(tool_output, \"gmail_get_message\")\n\nif not result.allowed:\n    print(f\"Blocked: risk={result.risk_level}, score={result.tier2_score}\")\n    print(f\"Detections: {', '.join(result.detections)}\")\nelse:\n    send_to_llm(result.sanitized)\n```\n\n## How it works\n\n\u003cpicture\u003e\n  \u003csource media=\"(prefers-color-scheme: dark)\" srcset=\"https://raw.githubusercontent.com/StackOneHQ/defender/main/assets/demo-dark.svg\" /\u003e\n  \u003cimg src=\"https://raw.githubusercontent.com/StackOneHQ/defender/main/assets/demo-light.svg\" alt=\"Defender flow: poisoned tool output is sanitized and evaluated; high-risk content can be blocked before the LLM\" width=\"900\" /\u003e\n\u003c/picture\u003e\n\n`defend_tool_result()` runs two tiers:\n\n### Tier 1 — Pattern detection (sync, ~1 ms)\n\n- **Unicode normalization** — homoglyph resistance (e.g. Cyrillic `а` → ASCII `a`)\n- **Role stripping** — `SYSTEM:`, `ASSISTANT:`, `\u003csystem\u003e`, `[INST]`, etc.\n- **Pattern removal** — phrases like “ignore previous instructions”\n- **Encoding detection** — suspicious Base64/URL-shaped payloads\n- **Boundary annotation** — `[UD-{id}]…[/UD-{id}]` wrappers around untrusted spans\n\n### Tier 2 — ML classification (ONNX)\n\nSentence-level MiniLM classifier (int8 ONNX ~22 MB, bundled):\n\n- Split text into sentences, score each (0.0 = benign, 1.0 = injection-like), take the max\n- Catches paraphrased or novel injections missed by regex\n- Roughly ~10 ms per batch after warmup (CPU)\n\n**Benchmarks** (F1 @ threshold 0.5):\n\n| Benchmark | F1 | Samples |\n|-----------|-----|--------|\n| Qualifire (in-distribution) | 0.8686 | ~1.5k |\n| xxz224 (out-of-distribution) | 0.8834 | ~22.5k |\n| jayavibhav (adversarial) | 0.9717 | ~1k |\n| **Average** | **0.9079** | ~25k |\n\n### `allowed` vs `risk_level`\n\n- Use **`allowed`** for gating when `block_high_risk=True`: `False` means do not pass `sanitized` to the model as-is.\n- **`risk_level`** is diagnostic: it starts at `default_risk_level` (default `\"medium\"`) and is **escalated** by Tier 1 / Tier 2 signals — not reduced. Use it for logging, not as the sole block signal unless you implement your own policy.\n\n| Level | Typical trigger |\n|-------|------------------|\n| `low` | No strong signals |\n| `medium` | Lighter pattern / sanitization signals |\n| `high` / `critical` | Strong injection patterns, encoding signals, or high Tier 2 score |\n\n## API\n\n### `create_prompt_defense(**kwargs)`\n\n```python\ndefense = create_prompt_defense(\n    enable_tier1=True,\n    enable_tier2=True,\n    block_high_risk=False,\n    default_risk_level=\"medium\",\n    tier2_fields=[\"subject\", \"body\", \"snippet\"],  # optional: scope Tier 2 to these JSON keys\n    config={\n        \"tier2\": {\n            \"high_risk_threshold\": 0.8,\n            \"tier2_fields\": None,  # or list[str]; constructor tier2_fields wins if set\n        },\n    },\n)\n```\n\n### `defense.defend_tool_result(value, tool_name)`\n\nRuns Tier 1 sanitization on risky fields, then Tier 2 on extracted text (with optional field scoping). **Synchronous** — no `await`.\n\n```python\n@dataclass\nclass DefenseResult:\n    allowed: bool\n    risk_level: RiskLevel\n    sanitized: Any\n    detections: list[str]\n    fields_sanitized: list[str]\n    patterns_by_field: dict[str, list[str]]\n    tier2_score: float | None = None\n    tier2_skip_reason: str | None = None\n    max_sentence: str | None = None\n    latency_ms: float = 0.0\n```\n\n### `defense.defend_tool_results(items)`\n\n```python\nresults = defense.defend_tool_results([\n    {\"value\": email_data, \"tool_name\": \"gmail_get_message\"},\n    {\"value\": doc_data, \"tool_name\": \"documents_get\"},\n    {\"value\": pr_data, \"tool_name\": \"github_get_pull_request\"},\n])\nfor r in results:\n    if not r.allowed:\n        print(\"Blocked:\", \", \".join(r.fields_sanitized))\n```\n\n### `defense.analyze(text)`\n\nTier 1 only — useful for debugging pattern hits without full tool-result traversal.\n\n### Tier 2 warmup\n\n```python\ndefense = create_prompt_defense()\ndefense.warmup_tier2()  # no-op if enable_tier2=False or ONNX extra missing\n```\n\n## Integration example\n\n```python\nfrom stackone_defender import create_prompt_defense\n\ndefense = create_prompt_defense(block_high_risk=True)\ndefense.warmup_tier2()\n\ndef run_tool_and_defend(raw_result: dict, tool_name: str):\n    outcome = defense.defend_tool_result(raw_result, tool_name)\n    if not outcome.allowed:\n        return {\"error\": \"Content blocked by safety filter\", \"risk_level\": outcome.risk_level}\n    return outcome.sanitized\n\n# Example agent loop\nsanitized = run_tool_and_defend(gmail_api.get_message(msg_id), \"gmail_get_message\")\n```\n\n## Risky field detection\n\nOnly **string** values under configured “risky” keys are scanned and sanitized. [`RiskyFieldConfig`](https://github.com/StackOneHQ/stackone-defender/blob/main/src/stackone_defender/types.py) provides global names/patterns plus **`tool_overrides`** (wildcard tool names → field list), same idea as the npm package.\n\n| Tool pattern | Scanned fields |\n|--------------|----------------|\n| `gmail_*`, `email_*` | subject, body, snippet, content |\n| `documents_*` | name, description, content, title |\n| `github_*` | name, title, body, description, message |\n| `hris_*` | name, notes, bio, description |\n| `ats_*` | name, notes, description, summary |\n| `crm_*` | name, description, notes, content |\n\nOtherwise the default list applies: `name`, `description`, `content`, `title`, `notes`, `summary`, `bio`, `body`, `text`, `message`, `comment`, `subject`, plus suffix patterns like `*_body`, `*_description`, etc. Structural keys such as `id`, `url`, `created_at` are not treated as risky by default.\n\n## Development\n\n```bash\nuv sync --group dev\nuv run pytest\n```\n\n## License\n\nApache-2.0 — see [LICENSE](./LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstackonehq%2Fstackone-defender","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstackonehq%2Fstackone-defender","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstackonehq%2Fstackone-defender/lists"}