Out-Of-Tree Llama Stack Eval Provider for Red Teaming LLM Systems with Garak
https://github.com/trustyai-explainability/llama-stack-provider-trustyai-garak
- Host: GitHub
- URL: https://github.com/trustyai-explainability/llama-stack-provider-trustyai-garak
- Owner: trustyai-explainability
- License: apache-2.0
- Created: 2025-07-15T21:37:29.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2025-08-01T01:27:37.000Z (3 months ago)
- Last Synced: 2025-08-01T03:25:12.147Z (3 months ago)
- Topics: garak, llm-security, llmops, redteaming, responsible-ai
- Language: Python
- Size: 111 KB
- Stars: 0
- Watchers: 0
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
# TrustyAI Garak (`trustyai_garak`): Out-of-Tree Llama Stack Eval Provider for Garak Red Teaming
## About
This repository implements [Garak](https://github.com/NVIDIA/garak) as an out-of-tree Llama Stack eval provider for **security testing and red teaming** of Large Language Models, with optional **Shield Integration** for scanning guardrailed LLM systems.
## Features
- **Security Vulnerability Detection**: Automated testing for prompt injection, jailbreaks, toxicity, and bias
- **Compliance Framework Support**: Pre-built benchmarks for established standards ([OWASP LLM Top 10](https://genai.owasp.org/llm-top-10/), [AVID taxonomy](https://docs.avidml.org/taxonomy/effect-sep-view))
- **Shield Integration**: Test LLMs with and without Llama Stack shields for comparative security analysis
- **Concurrency Control**: Configurable limits for concurrent scans and shield operations
- **Custom Probe Support**: Run specific garak security probes
- **Enhanced Reporting**: Multiple garak output formats including HTML reports and detailed logs
## Quick Start
### Prerequisites
- Python 3.12+
- Access to an OpenAI-compatible model endpoint
### Installation
```bash
# Clone the repository
git clone https://github.com/trustyai-explainability/llama-stack-provider-trustyai-garak.git
cd llama-stack-provider-trustyai-garak
# Create & activate venv
python3 -m venv .venv
source .venv/bin/activate
# Install dependencies
pip install -e .
```
### Configuration
Set up your environment variables:
```bash
export VLLM_URL="http://your-model-endpoint/v1"
export INFERENCE_MODEL="your-model-name"
# Optional: Configure scan behavior
export GARAK_TIMEOUT="10800" # 3 hours default
export GARAK_MAX_CONCURRENT_JOBS="5" # Max concurrent scans
export GARAK_MAX_WORKERS="5" # Max workers for shield scanning
```
### Run Security Scans
#### Basic Mode (Standard Garak Scanning)
```bash
# Start the Llama Stack server
llama stack run run.yaml --image-type venv
# The server will be available at http://localhost:8321
```
#### Enhanced Mode (With Shield Integration)
```bash
# Start with safety and shield capabilities
llama stack run run-with-safety.yaml --image-type venv
# Includes safety, shields, and telemetry APIs
```
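In either mode, you can confirm the server is reachable and the pre-registered garak benchmarks are visible before starting a scan. A minimal sketch using `llama-stack-client`, assuming the default port shown above:
```python
from llama_stack_client import LlamaStackClient

# Connect to the locally running Llama Stack server
client = LlamaStackClient(base_url="http://localhost:8321")

# Listing benchmarks confirms the server is up and the trustyai_garak
# provider has registered its scan profiles and compliance benchmarks
print([b.identifier for b in client.benchmarks.list().data])
```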
## Demos
Interactive examples are available in the `demos/` directory:
- **[Getting Started](demos/01-getting_started_with_garak.ipynb)**: Basic usage with predefined scan profiles and user-defined garak probes
- **[Scan Guardrailed System](demos/02-scan_with_shields.ipynb)**: Llama Stack shield integration for scanning a guardrailed LLM system
- **[Concurrency Limit Test](demos/concurrency_limit_test.ipynb)**: Testing concurrent scan limits
## Compliance Frameworks
The following compliance-framework benchmarks are pre-registered and available out of the box:
### Compliance Standards
| Framework | Benchmark ID | Description | Duration |
|-----------|--------------|--------------| --------|
| **[OWASP LLM Top 10](https://genai.owasp.org/llm-top-10/)** | `owasp_llm_top10` | OWASP Top 10 for Large Language Model Applications | ~8 hours |
| **[AVID Security](https://docs.avidml.org/taxonomy/effect-sep-view/security)** | `avid_security` | AI Vulnerability Database - Security vulnerabilities | ~8 hours |
| **[AVID Ethics](https://docs.avidml.org/taxonomy/effect-sep-view/ethics)** | `avid_ethics` | AI Vulnerability Database - Ethical concerns | ~30 minutes |
| **[AVID Performance](https://docs.avidml.org/taxonomy/effect-sep-view/performance)** | `avid_performance` | AI Vulnerability Database - Performance issues | ~40 minutes |
### Scan Profiles for Testing
| Profile | Benchmark ID | Duration | Probes |
|---------|--------------|----------|---------|
| **Quick** | `quick` | ~5 minutes | Essential security checks (3 specific probes) |
| **Standard** | `standard` | ~1 hour | Standard attack vectors (5 probe categories) |
_Note: All duration estimates above were measured with a Qwen2.5 7B model deployed via vLLM on OpenShift._
## Usage Examples
### Discover Available Benchmarks
```python
from llama_stack_client import LlamaStackClient
client = LlamaStackClient(base_url="http://localhost:8321")
# List all available benchmarks (auto-registered)
benchmarks = client.benchmarks.list()
for benchmark in benchmarks.data:
    print(f"- {benchmark.identifier}: {benchmark.metadata.get('name', 'No name')}")
```
### Compliance Framework Testing
```python
# Run OWASP LLM Top 10 security assessment
job = client.eval.run_eval(
benchmark_id="owasp_llm_top10",
benchmark_config={
"eval_candidate": {
"type": "model",
"model": "qwen2", # change this to your inference model name
"sampling_params": {
"max_tokens": 100
},
}
},
)
# Run AVID Security assessment
job = client.eval.run_eval(
benchmark_id="avid_security",
benchmark_config={
"eval_candidate": {
"type": "model",
"model": "qwen2",
"sampling_params": {
"max_tokens": 100
},
}
},
)
```
### Built-in Scan Profiles for Testing
```python
# Quick security scan (5 min)
job = client.eval.run_eval(
benchmark_id="quick",
benchmark_config={
"eval_candidate": {
"type": "model",
"model": "qwen2", # change this to your inference model name
"sampling_params": {
"max_tokens": 100
},
}
},
)
```
### Custom Garak Probes
```python
# Register custom probes
client.benchmarks.register(
benchmark_id="custom",
dataset_id="garak", # placeholder
scoring_functions=["garak_scoring"], # placeholder
provider_benchmark_id="custom",
provider_id="trustyai_garak",
metadata={
"probes": ["latentinjection.LatentJailbreak", "snowball.GraphConnectivity"],
"timeout": 900 # 15 minutes
}
)
```
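Once registered, the custom benchmark is run exactly like the built-in ones via `run_eval`; a minimal sketch (the model name is an example, as above):
```python
# Run the newly registered custom benchmark
job = client.eval.run_eval(
    benchmark_id="custom",
    benchmark_config={
        "eval_candidate": {
            "type": "model",
            "model": "qwen2",  # change this to your inference model name
            "sampling_params": {"max_tokens": 100},
        }
    },
)
```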
### Shield Integration (Enhanced Mode)
```python
# Test with input shields only
client.benchmarks.register(
benchmark_id="PI_with_input_shield",
dataset_id="garak", # placeholder
scoring_functions=["garak_scoring"], # placeholder
provider_benchmark_id="PI_with_input_shield",
provider_id="trustyai_garak",
metadata={
"probes": ["promptinject.HijackHateHumans"],
"timeout": 600,
"shield_ids": ["Prompt-Guard-86M"] # Applied to input only
}
)
# Test with separate input/output shields
client.benchmarks.register(
benchmark_id="PI_with_io_shields",
dataset_id="garak", # placeholder
scoring_functions=["garak_scoring"], # placeholder
provider_benchmark_id="PI_with_io_shields",
provider_id="trustyai_garak",
metadata={
"probes": ["promptinject.HijackHateHumans"],
"timeout": 600,
"shield_config": {
"input": ["Prompt-Guard-86M"],
"output": ["Llama-Guard-3-8B"]
}
}
)
```
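To measure shield effectiveness (see Results Interpretation below), the same probe can be run with and without shields and the scores compared. A sketch, assuming a hypothetical baseline benchmark `PI_no_shield` registered with the same pattern but no shield metadata:
```python
# Hypothetical baseline: same probe, no shields
client.benchmarks.register(
    benchmark_id="PI_no_shield",
    dataset_id="garak",                   # placeholder
    scoring_functions=["garak_scoring"],  # placeholder
    provider_benchmark_id="PI_no_shield",
    provider_id="trustyai_garak",
    metadata={"probes": ["promptinject.HijackHateHumans"], "timeout": 600},
)

eval_candidate = {
    "type": "model",
    "model": "qwen2",  # change this to your inference model name
    "sampling_params": {"max_tokens": 100},
}

# Run both benchmarks; compare their scores once the jobs complete
baseline_job = client.eval.run_eval(
    benchmark_id="PI_no_shield", benchmark_config={"eval_candidate": eval_candidate}
)
shielded_job = client.eval.run_eval(
    benchmark_id="PI_with_input_shield", benchmark_config={"eval_candidate": eval_candidate}
)
```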
### Job Management
```python
# Check job status
job_status = client.eval.jobs.status(job_id=job.job_id, benchmark_id="quick")
print(f"Job status: {job_status.status}")
print(f"Running jobs: {job_status.metadata.get('running_jobs', 'N/A')}")
# Cancel a running job
client.eval.jobs.cancel(job_id=job.job_id, benchmark_id="quick")
# Get evaluation results
if job_status.status == "completed":
    results = client.eval.get_eval_job_result(job_id=job.job_id, benchmark_id="quick")
```
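Since scans can take anywhere from a few minutes to several hours (see the duration estimates above), a simple polling loop is often convenient. A sketch built from the calls shown above; the exact set of terminal status values is an assumption and may vary by Llama Stack version:
```python
import time

benchmark_id = "quick"
while True:
    job_status = client.eval.jobs.status(job_id=job.job_id, benchmark_id=benchmark_id)
    # "completed" is shown above; "failed"/"cancelled" are assumed terminal states
    if job_status.status in ("completed", "failed", "cancelled"):
        break
    time.sleep(30)  # poll every 30 seconds

# Once completed, fetch results with get_eval_job_result as shown above
```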
### Accessing Scan Reports
```python
# Get file metadata
scan_report_id = job_status.metadata["scan_report_file_id"]
scan_log_id = job_status.metadata["scan_log_file_id"]
scan_html_id = job_status.metadata["scan_report_html_file_id"]
# Download files using Files API or direct HTTP calls
import requests
files_url = f"http://localhost:8321/v1/openai/v1/files"
report_content = requests.get(f"{files_url}/{scan_report_id}/content")
```
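The same pattern can save the HTML report to disk for review in a browser; a short sketch following the `requests` call above:
```python
# Download the HTML report and write it locally
html_response = requests.get(f"{files_url}/{scan_html_id}/content")
html_response.raise_for_status()
with open("garak_scan_report.html", "wb") as f:
    f.write(html_response.content)
print("Saved garak_scan_report.html")
```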
## Configuration Options
### Provider Configuration (`run.yaml` / `run-with-safety.yaml`)
```yaml
providers:
eval:
- provider_id: trustyai_garak
provider_type: inline::trustyai_garak
config:
base_url: ${env.BASE_URL:=http://localhost:8321/v1} # llama-stack service base url
timeout: ${env.GARAK_TIMEOUT:=10800} # 3 hours default
max_concurrent_jobs: ${env.GARAK_MAX_CONCURRENT_JOBS:=5} # Concurrent scan limit
max_workers: ${env.GARAK_MAX_WORKERS:=5} # Shield scanning parallelism
garak_model_type_openai: "openai.OpenAICompatible" # For direct LLM scanning
garak_model_type_function: "function.Single" # For shield-integrated scanning
```
### Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `BASE_URL` | `http://localhost:8321/v1` | Llama Stack service base URL |
| `GARAK_TIMEOUT` | `10800` | Maximum timeout for garak scan (seconds) |
| `GARAK_MAX_CONCURRENT_JOBS` | `5` | Maximum concurrent garak scans |
| `GARAK_MAX_WORKERS` | `5` | Maximum workers for shield scanning |
## Results Interpretation
- **Vulnerability Score**: 0.0 (secure) to 1.0 (vulnerable)
- **Threshold**: Scores ≥ 0.5 indicate potential security concerns (a sketch for flagging these programmatically follows this list)
- **Shield Comparison**: Compare scores with and without shields to measure effectiveness
- **Detailed Reports**: Available via Llama Stack `files` API in multiple formats:
- JSON report (`scan_report_file_id`)
- HTML report (`scan_report_html_file_id`)
- Detailed logs (`scan_log_file_id`)
- Hit logs (`scan_hitlog_file_id`)
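As an illustration of the 0.5 threshold, the scores returned by `get_eval_job_result` can be scanned for entries at or above it. This sketch assumes the generic Llama Stack eval response shape (a `scores` mapping with `aggregated_results` per entry); inspect your own results object to confirm the exact field names:
```python
# Flag entries whose aggregated score crosses the 0.5 vulnerability threshold.
# NOTE: the layout of `results.scores` assumed here may differ from what this
# provider actually returns; adjust the field names accordingly.
THRESHOLD = 0.5
for name, scoring_result in results.scores.items():
    for metric, value in scoring_result.aggregated_results.items():
        if isinstance(value, (int, float)) and value >= THRESHOLD:
            print(f"Potential vulnerability: {name} ({metric}={value:.2f})")
```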
## Deployment Modes
### Basic Mode (`run.yaml`)
- Standard garak scanning against OpenAI-compatible endpoints
- APIs: `inference`, `eval`, `files`
- Best for: Basic security testing
### Enhanced Mode (`run-with-safety.yaml`)
- Shield-integrated scanning to test Guardrailed systems
- APIs: `inference`, `eval`, `files`, `safety`, `shields`, `telemetry`
- Best for: Advanced security testing with defense evaluation