{"id":47977081,"url":"https://github.com/vladlen-codes/llm-security-toolkit","last_synced_at":"2026-04-04T10:57:25.785Z","repository":{"id":342284471,"uuid":"1173262888","full_name":"vladlen-codes/llm-security-toolkit","owner":"vladlen-codes","description":"Python library that sits between your app and the LLM client, adding security checks around every model call.","archived":false,"fork":false,"pushed_at":"2026-03-31T17:57:17.000Z","size":171,"stargazers_count":4,"open_issues_count":0,"forks_count":2,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-04T10:57:21.728Z","etag":null,"topics":["ai","ai-security","llm-inference","security"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vladlen-codes.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-05T07:11:39.000Z","updated_at":"2026-03-31T17:57:21.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/vladlen-codes/llm-security-toolkit","commit_stats":null,"previous_names":["vladlen-codes/llm-security-toolkit"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/vladlen-codes/llm-security-toolkit","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vladlen-codes%2Fllm-security-toolkit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vladlen-cod
es%2Fllm-security-toolkit/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vladlen-codes%2Fllm-security-toolkit/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vladlen-codes%2Fllm-security-toolkit/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vladlen-codes","download_url":"https://codeload.github.com/vladlen-codes/llm-security-toolkit/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vladlen-codes%2Fllm-security-toolkit/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31397056,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-04T10:20:44.708Z","status":"ssl_error","status_checked_at":"2026-04-04T10:20:06.846Z","response_time":60,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","ai-security","llm-inference","security"],"created_at":"2026-04-04T10:57:25.268Z","updated_at":"2026-04-04T10:57:25.755Z","avatar_url":"https://github.com/vladlen-codes.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# LLM Security Toolkit\n### Architecture \u0026 Detailed Technical Specification\n\n\u003e A production-grade Python middleware library for securing every LLM call, input guards, output validation, tool-call enforcement, and policy-driven control.\n\n---\n\n## Table 
of Contents\n\n1. [What Is This Project?](#1-what-is-this-project)\n2. [High-Level Architecture](#2-high-level-architecture)\n3. [Repository Structure](#3-repository-structure)\n4. [Core Types \u0026 Models](#4-core-types--models)\n5. [Policy Engine](#5-policy-engine)\n6. [The Guards Layer](#6-the-guards-layer)\n7. [Providers Layer](#7-providers-layer)\n8. [Middleware Layer](#8-middleware-layer)\n9. [Logging \u0026 Exceptions](#9-logging--exceptions)\n10. [Public API Surface](#10-public-api-surface)\n11. [Tests, Examples \u0026 Docs](#11-tests-examples--docs)\n12. [Extensibility \u0026 Design Principles](#12-extensibility--design-principles)\n13. [Future Roadmap](#13-future-roadmap)\n\n---\n\n## 1. What Is This Project?\n\nThe **LLM Security Toolkit** is a Python middleware library that sits between your application code and any LLM provider, intercepting every model call to enforce security checks before and after the model responds.\n\nThink of it as a firewall designed specifically for LLM calls:\n\n- Scans every prompt for **injection and jailbreak patterns** before it reaches the model\n- Validates every response for **unsafe content, credential leaks, or dangerous commands**\n- Enforces **schema rules** on every tool/function call the model tries to make\n- Applies **configurable policies** to decide whether to block, warn, or log\n\nThe library exposes a clean, importable API that adds only a few extra lines to an existing Python AI app. 
No infrastructure changes required.\n\n| Property | Value |\n|---|---|\n| Type | Python library (importable package, pip-installable) |\n| Purpose | Security middleware between app code and LLM provider |\n| Primary Interface | Decorator / context manager / provider wrapper |\n| Guards | Prompt injection, unsafe output, dangerous tool calls |\n| Policy Engine | YAML or Python dict — per-endpoint policies |\n| Provider Support | OpenAI (v1), Generic callable, extensible to Claude, Gemini |\n| Framework Support | FastAPI (native), Flask (planned) |\n| Return Type | `GuardDecision { allowed, score, reasons, safe_output }` |\n\n---\n\n## 2. High-Level Architecture\n\n### 2.1 System Overview\n\nThe toolkit is organized into **five distinct layers**, each with a clearly scoped responsibility:\n\n```\n┌─────────────────────────────────────────────────────────────┐\n│                      Your Application                       │\n└────────────────────────────┬────────────────────────────────┘\n                             │\n┌────────────────────────────▼────────────────────────────────┐\n│                 Middleware Layer (FastAPI / Flask)           │\n│          Dependency injection or global middleware           │\n└────────────────────────────┬────────────────────────────────┘\n                             │\n┌────────────────────────────▼────────────────────────────────┐\n│                    Providers Layer                          │\n│         OpenAIProvider / GenericProvider (adapters)         │\n└──────┬─────────────────────┴──────────────────┬────────────┘\n       │                                         │\n┌──────▼──────┐                         ┌───────▼──────┐\n│   Guards    │                         │   Guards     │\n│  (Input)    │                         │  (Output)    │\n│ prompts.py  │                         │ outputs.py   │\n│  tools.py   │                         │  tools.py    │\n└──────┬──────┘                         └───────┬──────┘\n       │      
                                   │\n┌──────▼─────────────────────────────────────────▼────────────┐\n│                     Policy Engine                           │\n│           Policy | config.py | policies.py                  │\n└─────────────────────────────────────────────────────────────┘\n```\n\n### 2.2 Request / Response Flow\n\nEvery guarded LLM call follows a **six-stage pipeline**:\n\n| # | Stage | What Happens |\n|---|---|---|\n| 1 | App calls guarded provider | `guarded_openai_chat(prompt, tools, policy)` |\n| 2 | Input scanners run | `scan_prompt()` on prompt + system instructions |\n| 3 | Risk decision | Block / Warn / Allow based on policy thresholds |\n| 4 | Forward to LLM | Real API call to OpenAI / local model |\n| 5 | Output scanners run | `scan_output()` + `validate_tool_call()` |\n| 6 | Return GuardDecision | `{ allowed, score, reasons, safe_output }` |\n\n\u003e At each stage, if a policy threshold is exceeded, the pipeline **short-circuits** and returns a `GuardDecision` immediately — the LLM is never reached for blocked inputs, and blocked outputs are never returned to the user.\n\n---\n\n## 3. 
Repository Structure\n\nThe project follows the **src-layout** convention to avoid import conflicts and mirrors the separation of concerns across its five internal layers:\n\n```\nllm-security-toolkit/\n├── README.md                    # Project overview and quick-start\n├── CONTRIBUTING.md              # Fork \u0026 contribution guide\n├── CODE_OF_CONDUCT.md           # Community standards\n├── LICENSE                      # BSD-3-Clause (encourages forks)\n├── pyproject.toml               # Build config, deps, tool settings\n├── .pre-commit-config.yaml      # ruff, black, mypy on every commit\n├── .github/\n│   └── workflows/ci.yml         # Tests + lint on push / PR\n│\n├── src/llm_security/            # Main package (src layout)\n│   ├── __init__.py              # Public re-exports\n│   ├── types.py                 # ScanResult, GuardDecision, ToolCall\n│   ├── policies.py              # Policy models + built-in presets\n│   ├── config.py                # YAML / dict → Policy loaders\n│   ├── exceptions.py            # BlockedByPolicyError, etc.\n│   ├── logging.py               # log_decision() + hooks\n│   ├── guards/\n│   │   ├── prompts.py           # Input / injection guards\n│   │   ├── outputs.py           # Output / content guards\n│   │   └── tools.py             # Tool-call validation guards\n│   ├── providers/\n│   │   ├── base.py              # ProviderAdapter ABC\n│   │   ├── openai.py            # OpenAI concrete adapter\n│   │   └── generic.py           # Generic callable adapter\n│   └── middleware/\n│       ├── fastapi.py           # FastAPI dependency + middleware\n│       └── flask.py             # Flask (planned)\n│\n├── tests/                       # Pytest test suite\n├── examples/                    # Runnable minimal examples\n└── docs/                        # MkDocs documentation\n```\n\n---\n\n## 4. Core Types \u0026 Models\n\n\u003e **File:** `src/llm_security/types.py`\n\nEvery part of the library speaks the same three data structures. 
These are the *lingua franca* of the entire package.\n\n### ScanResult\n\nThe output of a single guard check. Every guard function returns one of these:\n\n```python\n@dataclass\nclass ScanResult:\n    allowed:     bool            # True = safe to proceed\n    score:       float           # 0.0 (safe) → 1.0 (critical risk)\n    reasons:     List[str]       # Human-readable explanations\n    safe_output: Optional[str]   # Redacted text (output guards only)\n```\n\n### GuardDecision\n\nThe top-level result returned to your application — an aggregation of all `ScanResult`s from all active guards:\n\n```python\n@dataclass\nclass GuardDecision:\n    allowed:      bool\n    score:        float\n    reasons:      List[str]\n    safe_output:  Optional[str]\n    scan_results: List[ScanResult]  # Full audit trail\n```\n\n### ToolCall\n\nRepresents a structured tool/function invocation that the model requested:\n\n```python\n@dataclass\nclass ToolCall:\n    name:   str    # e.g. 'read_file'\n    args:   Dict   # e.g. { 'path': '/etc/passwd' }\n    schema: Dict   # JSON Schema the args must conform to\n```\n\n---\n\n## 5. Policy Engine\n\n\u003e **Files:** `policies.py` + `config.py`\n\nA `Policy` is the single configuration object that controls the entire security pipeline. 
Every guard, every provider, every middleware reads from it.\n\n### 5.1 Policy Structure\n\n```python\nclass Policy(BaseModel):\n    # Guard toggles\n    prompt_guard_enabled:  bool  = True\n    output_guard_enabled:  bool  = True\n    tool_guard_enabled:    bool  = True\n\n    # Thresholds (0.0 – 1.0)\n    block_threshold:  float = 0.75   # Score above this → block\n    warn_threshold:   float = 0.40   # Score above this → log warning\n\n    # Allowed tool names (None = allow all)\n    allowed_tools:  Optional[List[str]] = None\n\n    # On block: raise exception OR return GuardDecision\n    raise_on_block: bool = True\n```\n\n### 5.2 Built-in Policy Presets\n\n| Policy | Behavior | Best For |\n|---|---|---|\n| `StrictPolicy` | Block on any risk signal | Production, sensitive apps |\n| `BalancedPolicy` | Block high-risk, warn medium | Standard apps (default) |\n| `LoggingOnlyPolicy` | Never block — log only | Development / testing |\n\n### 5.3 Loading Policies\n\n```python\n# From YAML file (recommended for production)\npolicy = load_policy_from_yaml('policies/production.yaml')\n\n# From dict (useful in tests)\npolicy = load_policy_from_dict({\n    'block_threshold': 0.8,\n    'allowed_tools': ['read_file', 'search_web'],\n})\n```\n\n---\n\n## 6. The Guards Layer\n\n\u003e **Files:** `src/llm_security/guards/`\n\nGuards are the **security brain** of the toolkit. Each guard module is small, focused, and independently testable — designed to be easy to fork and extend with new detection rules.\n\n| Guard Module | Pattern Detected | Category | Default Action |\n|---|---|---|---|\n| Prompt Guard | `\"ignore previous instructions\"` | Injection | Block or warn |\n| Prompt Guard | `\"pretend you are the system\"` | Jailbreak | Block or warn |\n| Prompt Guard | Requests to reveal hidden context | Exfiltration | Block |\n| Output Guard | API keys, tokens, passwords | Secret leak | Redact + warn |\n| Output Guard | Shell commands (`rm -rf`, `curl`, etc.) 
| OS command | Block |\n| Output Guard | Self-harm or malware instructions | Content | Block |\n| Tool Guard | Invalid tool name | Schema | Block |\n| Tool Guard | `rm -rf /` or admin API calls | Dangerous op | Block |\n| Tool Guard | Args not matching schema | Validation | Block |\n\n### 6.1 Prompt Guard — `guards/prompts.py`\n\nRuns **before** any API call. Scans the user prompt and system instructions for patterns that indicate an attempt to subvert the model's behaviour.\n\n```python\ndef scan_prompt(prompt: str, policy: Policy) -\u003e ScanResult:\n    \"\"\"\n    Heuristic patterns checked:\n      - 'ignore previous instructions' / 'disregard above'\n      - 'you are now the system prompt'\n      - 'repeat everything above' (context exfiltration)\n      - 'DAN' jailbreak variants\n      - Base64-encoded instructions\n    Returns ScanResult with score + reasons.\n    \"\"\"\n```\n\n### 6.2 Output Guard — `guards/outputs.py`\n\nRuns on the full text of the model's response before it reaches your application. 
Can optionally **redact** sensitive material rather than blocking outright.\n\n```python\ndef scan_output(text: str, policy: Policy) -\u003e ScanResult:\n    \"\"\"\n    Patterns checked:\n      - Credential regexes (API keys, JWTs, SSH private keys)\n      - Shell command patterns (rm, curl, wget, sudo)\n      - Malware / ransomware indicators\n      - Self-harm or violence instructions\n    safe_output field will contain redacted version if score \u003c block_threshold.\n    \"\"\"\n```\n\n### 6.3 Tool Call Guard — `guards/tools.py`\n\nIntercepts every function/tool invocation the model wants to make and validates it against the policy's allowlist and the tool's JSON schema.\n\n```python\ndef validate_tool_call(call: ToolCall, policy: Policy) -\u003e ScanResult:\n    \"\"\"\n    Checks applied:\n      - Tool name in policy.allowed_tools (if allowlist defined)\n      - Args validate against call.schema (jsonschema)\n      - Blocked operation patterns (file deletion, network scanning)\n      - Internal admin API URL detection\n    \"\"\"\n```\n\n---\n\n## 7. Providers Layer\n\n\u003e **Files:** `src/llm_security/providers/`\n\nProviders wrap real LLM clients. They orchestrate the full guard pipeline — input scan → forward → output scan — and return a `GuardDecision` to the caller.\n\n| File | Responsibility |\n|---|---|\n| `providers/base.py` | Abstract base class `ProviderAdapter`. Defines the `chat()` interface all providers must implement. |\n| `providers/openai.py` | Concrete adapter wrapping the OpenAI Python SDK. Runs all guards automatically around every `chat()` call. |\n| `providers/generic.py` | Accepts any callable as the LLM. The user passes their own client function; the adapter handles the full guard flow around it. 
|\n\n### 7.1 ProviderAdapter Interface\n\n```python\nclass ProviderAdapter(ABC):\n    @abstractmethod\n    def chat(\n        self, *,\n        messages: List[Dict],\n        tools:    Optional[List[Dict]] = None,\n        policy:   Optional[Policy] = None,\n    ) -\u003e GuardDecision: ...\n```\n\n### 7.2 OpenAI Adapter Flow\n\n1. `scan_prompt()` on all user + system messages\n2. If allowed → call `openai.chat.completions.create(...)`\n3. `scan_output()` on response content\n4. `validate_tool_call()` on any `tool_calls` the model requested\n5. Return `GuardDecision` aggregating all results\n\n---\n\n## 8. Middleware Layer\n\n\u003e **Files:** `src/llm_security/middleware/`\n\nThe middleware layer lets you guard an entire HTTP endpoint with minimal code changes.\n\n### 8.1 FastAPI Integration\n\n```python\n# Dependency injection: guard all calls to /chat\ndef get_guarded_openai() -\u003e OpenAIProvider:\n    # Takes no parameters: FastAPI would read a Pydantic-typed `policy`\n    # argument from the request body, letting clients override the policy\n    return OpenAIProvider(policy=BalancedPolicy())\n\n@app.post('/chat')\nasync def chat(\n    req: ChatRequest,\n    provider: OpenAIProvider = Depends(get_guarded_openai),\n):\n    decision = provider.chat(messages=req.messages)\n    if not decision.allowed:\n        raise HTTPException(400, detail=decision.reasons)\n    return { 'reply': decision.safe_output }\n```\n\n### 8.2 Middleware vs Dependency\n\n| Approach | Best For |\n|---|---|\n| Dependency (`Depends`) | Per-route policy. Inject a different provider per endpoint. Most flexible. |\n| Middleware class | Global policy applied to every request. Good for org-wide defaults. |\n| Flask middleware | Planned for v1.1. Same pattern adapted for Flask's `before/after_request` hooks. |\n\n---\n\n## 9. 
Logging \u0026 Exceptions\n\n### 9.1 Structured Logging — `logging.py`\n\nEvery `GuardDecision` can be passed to `log_decision()`, which emits a structured JSON log entry compatible with any logging backend:\n\n```python\nimport json\nimport logging\nfrom datetime import datetime, timezone\n\ndef log_decision(decision: GuardDecision, logger: logging.Logger) -\u003e None:\n    # Serialize to a JSON string so any backend can parse the entry\n    logger.info(json.dumps({\n        'allowed':   decision.allowed,\n        'score':     decision.score,\n        'reasons':   decision.reasons,\n        # Timezone-aware UTC; datetime.utcnow() is deprecated since Python 3.12\n        'timestamp': datetime.now(timezone.utc).isoformat(),\n    }))\n# Future: OpenTelemetry spans, Datadog trace hooks\n```\n\n### 9.2 Exception Hierarchy — `exceptions.py`\n\n| Exception | When Raised |\n|---|---|\n| `BlockedByPolicyError` | Prompt or output score exceeds `block_threshold` and `raise_on_block=True` |\n| `InvalidToolCallError` | Tool name not in allowlist, or args fail schema validation |\n\n---\n\n## 10. Public API Surface\n\n\u003e **File:** `src/llm_security/__init__.py`\n\nThe top-level package re-exports everything a user needs. Nothing implementation-specific is public:\n\n```python\nfrom .providers.openai  import OpenAIProvider\nfrom .providers.generic import GenericProvider\nfrom .policies          import StrictPolicy, BalancedPolicy, LoggingOnlyPolicy\nfrom .config            import load_policy_from_dict, load_policy_from_yaml\nfrom .types             import ScanResult, GuardDecision, ToolCall\nfrom .exceptions        import BlockedByPolicyError, InvalidToolCallError\n\n__all__ = [\n    'OpenAIProvider', 'GenericProvider',\n    'StrictPolicy', 'BalancedPolicy', 'LoggingOnlyPolicy',\n    'load_policy_from_dict', 'load_policy_from_yaml',\n    'ScanResult', 'GuardDecision', 'ToolCall',\n    'BlockedByPolicyError', 'InvalidToolCallError',\n]\n```\n\n---\n\n## 11. 
Tests, Examples \u0026 Docs\n\n### 11.1 Test Suite — `tests/`\n\n| File | Covers |\n|---|---|\n| `test_policies.py` | Policy loading from dict and YAML, threshold logic, preset validation |\n| `test_guards_prompts.py` | Each injection and jailbreak pattern: pass and fail cases |\n| `test_guards_outputs.py` | Credential regex, OS command patterns, content categories |\n| `test_guards_tools.py` | Schema validation, allowlist enforcement, blocked operations |\n| `test_providers_openai.py` | OpenAI adapter with mocked API — full pipeline test |\n| `test_middleware_fastapi.py` | FastAPI TestClient integration — dependency injection |\n\n### 11.2 Examples — `examples/`\n\n- `basic_openai_guard.py` — Minimal OpenAI guard in 15 lines\n- `fastapi_endpoint_guard.py` — Full FastAPI endpoint with policy injection\n- `custom_policy_example.py` — Writing and loading a custom YAML policy\n\n### 11.3 Documentation — `docs/`\n\n- `getting-started.md` — Install, first call, first policy\n- `configuration.md` — Full Policy reference and YAML schema\n- `providers.md` — How to add a new ProviderAdapter\n- `middleware.md` — FastAPI and Flask integration guides\n- `contributing.md` — Adding new guard rules, running tests\n\n---\n\n## 12. Extensibility \u0026 Design Principles\n\nThe toolkit is deliberately designed to be **fork-friendly** and **contribution-friendly**. These principles guide every architectural decision:\n\n### Small, Focused Guards\nEach guard function is a single Python function with one job. Adding a new detection rule means adding one function and one test — no class hierarchies to navigate.\n\n### Policy-First Design\nAll security decisions flow through the `Policy` object. Operators can change security posture (strict vs. logging-only) with a config file change — no code change required.\n\n### Provider Abstraction\nThe `ProviderAdapter` ABC means any LLM client can be wrapped. 
Adding Claude, Gemini, or a local Ollama model requires implementing one method: `chat()`.\n\n### Zero Infra Requirement\nThe toolkit is a pure Python package. No sidecar, no agent, no proxy. It runs in-process alongside your existing app.\n\n---\n\n## 13. Future Roadmap\n\n| Version | Feature | Status |\n|---|---|---|\n| v1.0 | OpenAI adapter + prompt/output/tool guards + FastAPI | Planned |\n| v1.1 | Anthropic (Claude) provider adapter | Planned |\n| v1.1 | Flask middleware | Planned |\n| v1.2 | OpenTelemetry tracing integration | Idea |\n| v1.3 | Gemini + local model (Ollama) adapters | Idea |\n| v2.0 | Optional hosted SaaS gateway pairing | Future |\n\n---\n\n*LLM Security Toolkit — Architecture Document v1.0*\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvladlen-codes%2Fllm-security-toolkit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvladlen-codes%2Fllm-security-toolkit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvladlen-codes%2Fllm-security-toolkit/lists"}