{"id":51293173,"url":"https://github.com/askuma/guardrailprobe","last_synced_at":"2026-06-30T12:03:45.322Z","repository":{"id":363130499,"uuid":"1252097617","full_name":"askuma/guardrailprobe","owner":"askuma","description":"Independent OWASP LLM Top 10 benchmark for AI guardrail backends. Test your safety layer, not your model. NeMo · Presidio · Lakera · OpenAI · Azure · AWS.","archived":false,"fork":false,"pushed_at":"2026-06-24T12:15:27.000Z","size":629,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-24T14:10:12.580Z","etag":null,"topics":["ai-safety","benchmark","compliance","eu-ai-act","fastapi","guardrails","lakera","llm-security","nemo","owasp","presidio","prompt-injection","python","red-teaming"],"latest_commit_sha":null,"homepage":"https://askuma.github.io/guardrailprobe","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/askuma.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-28T07:27:06.000Z","updated_at":"2026-06-24T12:15:31.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/askuma/guardrailprobe","commit_stats":null,"previous_names":["askuma/guardrail_framework_complete"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/askuma/guardrailprobe","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositor
ies/askuma%2Fguardrailprobe","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/askuma%2Fguardrailprobe/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/askuma%2Fguardrailprobe/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/askuma%2Fguardrailprobe/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/askuma","download_url":"https://codeload.github.com/askuma/guardrailprobe/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/askuma%2Fguardrailprobe/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34965647,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-30T02:00:05.919Z","response_time":92,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-safety","benchmark","compliance","eu-ai-act","fastapi","guardrails","lakera","llm-security","nemo","owasp","presidio","prompt-injection","python","red-teaming"],"created_at":"2026-06-30T12:03:44.235Z","updated_at":"2026-06-30T12:03:45.308Z","avatar_url":"https://github.com/askuma.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# guardrailprobe\n\n**Provider-agnostic AI guardrail benchmarking tool.**\nTests your guardrail layer — not your model — across 11 backends against the 
OWASP LLM Top 10.\n\n[![CI](https://github.com/askuma/guardrailprobe/actions/workflows/ci.yml/badge.svg)](https://github.com/askuma/guardrailprobe/actions/workflows/ci.yml)\n[![PyPI](https://img.shields.io/pypi/v/guardrailprobe)](https://pypi.org/project/guardrailprobe/)\n[![License](https://img.shields.io/badge/license-Apache--2.0-blue)](LICENSE)\n[![Python](https://img.shields.io/pypi/pyversions/guardrailprobe)](https://pypi.org/project/guardrailprobe/)\n[![Probes](https://img.shields.io/badge/probes-78-22c55e)](METHODOLOGY.md)\n[![Backends](https://img.shields.io/badge/backends-11-0ea5e9)](#supported-backends)\n\n---\n\n## What it does\n\n`guardrailprobe` fires 78 attack probes at your guardrail endpoints and tells you which ones let attacks through. It produces:\n\n- **Pass/fail per probe** across OWASP LLM01–LLM10 and content-moderation categories\n- **Side-by-side comparison** of multiple backends in a single run\n- **Signed benchmark reports** (PDF with RFC 3161 timestamp, JSON, Markdown)\n- **Flask dashboard** for ad-hoc probe runs and report browsing\n\nNo framework lock-in. No cloud account required for the locally running backends. 
Just point it at an endpoint and run.\n\n---\n\n## Supported backends\n\n| Backend | Adapter key | Notes |\n|---|---|---|\n| NVIDIA NeMo Guardrails | `nemo` | Requires `pip install \"guardrailprobe[nemo]\"` (includes `nemoguardrails`, `langchain`, `langchain-openai`, `langchain-aws`, `langchain-community`) |\n| Guardrails AI | `guardrails_ai` | Regex fallback always available; SDK optional |\n| Microsoft Presidio | `presidio` | Requires `pip install \"guardrailprobe[presidio]\"` |\n| Lakera Guard | `lakera` | Requires `LAKERA_GUARD_API_KEY` |\n| OpenAI Moderation | `openai_moderation` | Requires `OPENAI_API_KEY` |\n| Azure Content Safety | `azure_content_safety` | Requires `AZURE_CONTENT_SAFETY_KEY` + `AZURE_CONTENT_SAFETY_ENDPOINT` |\n| Azure Prompt Shields | `azure_prompt_shields` | Shares credentials with `azure_content_safety` — no separate key |\n| AWS Bedrock Guardrails | `aws_bedrock` | Requires AWS credentials, `AWS_BEDROCK_GUARDRAIL_ID`, and `AWS_DEFAULT_REGION` |\n| Meta LlamaFirewall | `llama_firewall` | Requires `pip install \"guardrailprobe[llamafirewall]\"` |\n| LLM Guard | `llm_guard` | Requires `pip install \"guardrailprobe[llm_guard]\"` |\n| GA Guard | `ga_guard` | Requires `GA_GUARD_API_URL` (must be `https://`) |\n\nAdapters with missing credentials return `SKIPPED` gracefully — partial configurations run fine.\n\n---\n\n## Installation\n\n```bash\npip install guardrailprobe\n```\n\nWith optional SDK extras:\n\n```bash\n# All extras\npip install \"guardrailprobe[all]\"\n\n# Pick what you need\npip install \"guardrailprobe[nemo,guardrails_ai,presidio]\"\n```\n\nSkip the spaCy model download (e.g. in CI):\n\n```bash\nGUARDRAILPROBE_SKIP_SPACY=1 pip install guardrailprobe\n```\n\n---\n\n## Quick start\n\n```bash\n# 1. Set up credentials — interactive wizard (or copy .env.example to .env and edit manually)\nguardrailprobe init\n\n# 2. Check which backends are ready\nguardrailprobe status\n\n# 3. Run a benchmark (current month, all configured backends)\nguardrailprobe run --output-dir ./reports\n\n# 4. 
Run against specific backends only\nguardrailprobe run --backends lakera,openai_moderation --output-dir ./reports\n\n# 5. Launch the dashboard\nguardrailprobe dashboard\n```\n\n---\n\n## Docker\n\n### Start the dashboard\n\n```bash\ndocker compose up\n```\n\nOpen http://localhost:8080. The container starts even without a `.env` file — the **Setup Guide** card in the dashboard lists exactly which environment variables each unready adapter needs.\n\n### Configure credentials\n\n```bash\ncp .env.example .env\n# Fill in the keys for the backends you want to test, then:\ndocker compose up\n```\n\nThe `.env` file is optional. Any variables already exported in your shell are passed through automatically via the `environment:` block in `docker-compose.yml`.\n\n### Adapter status in Docker\n\nHere is the out-of-the-box status for each adapter and what you need to enable it:\n\n| Adapter | Dependencies | What you need |\n|---|---|---|\n| `guardrails_ai` | None (regex fallback built-in) | Nothing — works without credentials |\n| `presidio` | spaCy model (bundled in image) | Nothing — runs locally |\n| `nemo` | `nemoguardrails` + LangChain stack (bundled) | One LLM provider, tried in priority order: `NEMO_OPENAI_API_KEY`, Ollama (`OLLAMA_BASE_URL`, local/offline), AWS Bedrock (`AWS_ACCESS_KEY_ID` + `AWS_SECRET_ACCESS_KEY`; recommended, no rate limits), `OPENROUTER_API_KEY`, `OPENAI_API_KEY`, `AZURE_OPENAI_API_KEY`, or `ANTHROPIC_API_KEY`. Works without any LLM key in Colang pattern-matching mode. 
|\n| `aws_bedrock` | `boto3` SDK (bundled) | `AWS_BEDROCK_GUARDRAIL_ID`, `AWS_DEFAULT_REGION`, AWS credentials |\n| `lakera` | None — direct REST via `httpx` | `LAKERA_GUARD_API_KEY` |\n| `openai_moderation` | None — direct REST via `httpx` | `OPENAI_API_KEY` |\n| `azure_content_safety` | None — direct REST via `httpx` | `AZURE_CONTENT_SAFETY_KEY` + `AZURE_CONTENT_SAFETY_ENDPOINT` |\n| `azure_prompt_shields` | None — direct REST via `httpx` | Same as `azure_content_safety` — no separate key |\n| `ga_guard` | None — direct REST via `httpx` | `GA_GUARD_API_URL` (must be `https://`) |\n| `llama_firewall` | Volume-mounted (not in image) | See below |\n| `llm_guard` | Volume-mounted (not in image) | See below |\n\n### LlamaFirewall (Meta PromptGuard 2)\n\nLlamaFirewall runs a local ML model and is excluded from the Docker image to keep it lean.\n\n**Requirements:** Python 3.10–3.12 on the host (PyTorch is a transitive dependency).\n\n```bash\n# 1. Install into ./site-packages on your host\n# --ignore-installed avoids false conflicts with other packages in your host environment\npython3.12 -m pip install llamafirewall --target ./site-packages --ignore-installed\n\n# 2. Restart the container — the entrypoint detects the package automatically\ndocker compose up\n```\n\nOn startup you will see:\n\n```\n[guardrailprobe] site-packages mounted — llama_firewall: YES  llm_guard: NO\n```\n\nNo environment variables are required. LlamaFirewall runs fully offline.\n\n---\n\n### LLM Guard (Protect AI)\n\nLLM Guard runs PromptInjection and Toxicity scanners locally and is also excluded from the Docker image.\n\n**Requirements:** Python 3.9–3.12 on the host.\n\n```bash\n# 1. Install into ./site-packages on your host\npython3.12 -m pip install llm-guard --target ./site-packages --ignore-installed\n\n# 2. 
Restart the container\ndocker compose up\n```\n\nOn startup you will see:\n\n```\n[guardrailprobe] site-packages mounted — llama_firewall: NO  llm_guard: YES\n```\n\nNo environment variables are required. LLM Guard runs fully offline.\n\n---\n\n### Install both at once\n\n```bash\npython3.12 -m pip install llamafirewall llm-guard --target ./site-packages --ignore-installed\ndocker compose up\n```\n\nThe container prints the detected status for each package at startup and skips any that are absent — no configuration required.\n\n---\n\n### GA Guard endpoint\n\nPoint the `ga_guard` backend at your GA Guard (or any HTTPS guardrail) API:\n\n| Variable | Required | Description |\n|---|---|---|\n| `GA_GUARD_API_URL` | **Yes** | Target endpoint — must start with `https://` |\n| `GA_GUARD_API_KEY` | No | Bearer token or API key |\n| `GA_GUARD_AUTH_HEADER` | No | Header name for the key (default: `Authorization`) |\n\nAdd to `.env`:\n\n```bash\nGA_GUARD_API_URL=https://your-guardrail-api.example.com/check\nGA_GUARD_API_KEY=your-key-here\n# GA_GUARD_AUTH_HEADER=X-Api-Key   # only needed if the API uses a non-standard header\n```\n\n### One-shot benchmark via Docker\n\n```bash\ndocker compose run --rm guardrailprobe \\\n  guardrailprobe run --year 2026 --month 6 --output-dir /app/reports\n```\n\nReports are written to the `guardrailprobe_reports` named volume and also to `./docs/benchmarks` on the host (via the `./docs` bind mount).\n\n### Ollama (local LLM for NeMo, GPU recommended)\n\nThe container uses `network_mode: host` so `localhost:11434` inside the container reaches the host's Ollama process directly, without exposing Ollama to the LAN.\n\n```bash\n# Start Ollama on the host (separate terminal)\nollama serve\nollama pull llama3.2\n\n# Enable in .env\necho \"OLLAMA_BASE_URL=http://localhost:11434\" \u003e\u003e .env\n\ndocker compose up\n```\n\nOllama is **disabled by default** (`OLLAMA_BASE_URL=`). 
CPU inference with llama3.2 is too slow for NeMo's 3-call-per-probe workflow (~20 s/probe); a GPU is recommended. Without `OLLAMA_BASE_URL`, NeMo falls through to AWS Bedrock if credentials are present.\n\n### Skip the spaCy model download (CI / constrained environments)\n\n```bash\ndocker compose build --build-arg SKIP_SPACY=1\n```\n\n---\n\n## Probes\n\n78 built-in attack probes across 11 categories:\n\n| Category | OWASP ref | Probes |\n|---|---|---|\n| Prompt Injection | LLM01 | 7 |\n| Insecure Output Handling | LLM02 | 6 |\n| Training Data Poisoning | LLM03 | 5 |\n| Model Denial of Service | LLM04 | 6 |\n| Supply Chain Vulnerabilities | LLM05 | 5 |\n| Sensitive Info Disclosure | LLM06 | 7 |\n| Insecure Plugin Design | LLM07 | 6 |\n| Excessive Agency | LLM08 | 6 |\n| Overreliance | LLM09 | 5 |\n| Model Theft | LLM10 | 5 |\n| Content Moderation | CM-001–020 | 20 |\n| **Total** | | **78** |\n\nSee [METHODOLOGY.md](METHODOLOGY.md) for probe design, scoring, and reproduction steps.\n\n---\n\n## Reports\n\nEach `guardrailprobe run` produces three artifacts in the output directory:\n\n```\nreports/\n  benchmark_2026_06.pdf   # Signed PDF with RFC 3161 timestamp\n  benchmark_2026_06.json  # Machine-readable full results\n  benchmark_2026_06.md    # Human-readable summary\n```\n\nTo sign reports with your own certificate:\n\n```bash\nguardrailprobe cert generate          # self-signed P12 for testing\nguardrailprobe cert show              # inspect the active signing cert\nguardrailprobe cert verify report.pdf # verify an existing report\n```\n\nSet `GUARDRAIL_SIGNING_KEY_P12` to the path of your P12 file.\n\n---\n\n## Configuration\n\nChoose either approach — both produce the same `.env` file.\n\n### Option A — Interactive wizard (recommended)\n\n```bash\nguardrailprobe init\n```\n\nWalks through each backend and prompts for keys. Press **Enter** to skip any adapter you don't have credentials for. 
Writes only what you enter to `.env`.\n\n### Option B — Edit manually\n\n```bash\ncp .env.example .env\n# Open .env and fill in the keys for the backends you want to test\n```\n\n---\n\n### Key variables\n\n| Variable | Backend |\n|---|---|\n| `LAKERA_GUARD_API_KEY` | Lakera Guard |\n| `OPENAI_API_KEY` | OpenAI Moderation; NeMo fallback (priority 5) |\n| `AZURE_CONTENT_SAFETY_ENDPOINT` + `AZURE_CONTENT_SAFETY_KEY` | Azure Content Safety **and** Azure Prompt Shields |\n| `AWS_ACCESS_KEY_ID` + `AWS_SECRET_ACCESS_KEY` | AWS Bedrock Guardrails; NeMo LLM via Bedrock (priority 3, recommended) |\n| `AWS_BEDROCK_GUARDRAIL_ID` + `AWS_DEFAULT_REGION` | AWS Bedrock Guardrails — guardrail ID and region |\n| `GA_GUARD_API_URL` | GA Guard / any HTTPS guardrail endpoint |\n| `GA_GUARD_API_KEY` | GA Guard — optional API key |\n| `GUARDRAIL_SIGNING_KEY_P12` | PDF signing certificate path |\n\n#### NeMo-specific variables\n\n| Variable | Default | Description |\n|---|---|---|\n| `NEMO_OPENAI_API_KEY` | — | Dedicated OpenAI key for NeMo only (priority 1); avoids sharing with OpenAI Moderation backend |\n| `NEMO_OPENAI_MODEL` | `gpt-4o-mini` | Model when using OpenAI or OpenRouter as NeMo LLM |\n| `NEMO_BEDROCK_MODEL` | `amazon.nova-pro-v1:0` | Bedrock model for NeMo intent classification |\n| `OPENROUTER_API_KEY` | — | OpenRouter free-tier LLM for NeMo (priority 4, 16 req/min limit) |\n| `OPENROUTER_MODEL` | `nvidia/nemotron-3-nano-30b-a3b:free` | Model when using OpenRouter |\n| `OLLAMA_BASE_URL` | _(empty)_ | Local Ollama endpoint for NeMo (priority 2); set to `http://localhost:11434` to enable. 
Requires GPU — CPU inference is too slow for NeMo's 3-call-per-probe flow |\n| `OLLAMA_MODEL` | `llama3.2` | Ollama model name |\n| `AZURE_OPENAI_API_KEY` + `AZURE_OPENAI_ENDPOINT` | — | Azure OpenAI as NeMo LLM (priority 6) |\n| `ANTHROPIC_API_KEY` | — | Anthropic Claude as NeMo LLM via LangChain (priority 7) |\n\nAfter either option, verify which backends are ready:\n\n```bash\nguardrailprobe status\n```\n\n---\n\n## Python API\n\n```python\nfrom guardrailprobe import GuardrailBackend\nfrom guardrailprobe.runner import RedTeamRunner\nfrom guardrailprobe.probes import ProbeLibrary, AttackCategory\n\nrunner = RedTeamRunner()\nlibrary = ProbeLibrary()\n\n# Run all probes against one backend\nreport = runner.run(GuardrailBackend.LAKERA, library.all_probes())\nprint(f\"Pass rate: {report.pass_rate:.1%}\")\n\n# Compare backends\ncomparison = runner.compare_backends(\n    [GuardrailBackend.LAKERA, GuardrailBackend.OPENAI_MODERATION],\n    library.all_probes(),\n)\nprint(f\"Best overall: {comparison.best_overall}\")\n\n# Filter probes\ninjection_probes = library.get_by_category(AttackCategory.PROMPT_INJECTION)\ncritical_probes  = library.get_by_severity(\"critical\")\ncm_probes        = library.get_content_moderation_probes()\n```\n\n---\n\n## Development\n\n```bash\nGUARDRAILPROBE_SKIP_SPACY=1 pip install -e \".[dev]\"\npytest tests/ -v\nruff check guardrailprobe/ tests/\n```\n\n---\n\n## License\n\nApache-2.0 — see [LICENSE](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faskuma%2Fguardrailprobe","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faskuma%2Fguardrailprobe","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faskuma%2Fguardrailprobe/lists"}