https://github.com/North-Shore-AI/LlmGuard

AI Firewall and guardrails for LLM-based Elixir applications
https://github.com/North-Shore-AI/LlmGuard

ai ai-firewall ai-safety beam content-filtering elixir ensemble-methods guardrails llm llm-security machine-learning nshkr-crucible otp prompt-injection reliability research safety-constraints security security-framework statistical-testing

Last synced: 4 months ago
JSON representation

AI Firewall and guardrails for LLM-based Elixir applications

Host: GitHub
URL: https://github.com/North-Shore-AI/LlmGuard
Owner: North-Shore-AI
License: mit
Created: 2025-10-11T00:58:06.000Z (7 months ago)
Default Branch: main
Last Pushed: 2025-12-29T03:59:53.000Z (5 months ago)
Last Synced: 2025-12-31T23:53:45.047Z (5 months ago)
Topics: ai, ai-firewall, ai-safety, beam, content-filtering, elixir, ensemble-methods, guardrails, llm, llm-security, machine-learning, nshkr-crucible, otp, prompt-injection, reliability, research, safety-constraints, security, security-framework, statistical-testing
Language: Elixir
Size: 224 KB
Stars: 3
Watchers: 0
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
- Roadmap: docs/roadmap.md

Awesome Lists containing this project

awesome-ml-gen-ai-elixir - LlmGuard - AI firewall with prompt injection detection, PII redaction, and jailbreak prevention for LLM applications. (Generative AI / Development Tools)

README

          # LlmGuard



  



**AI Firewall and Guardrails for LLM-based Elixir Applications**

[![Elixir](https://img.shields.io/badge/elixir-1.14+-purple.svg)](https://elixir-lang.org)

[![OTP](https://img.shields.io/badge/otp-25+-blue.svg)](https://www.erlang.org)

[![Hex.pm](https://img.shields.io/hexpm/v/llm_guard.svg)](https://hex.pm/packages/llm_guard)

[![Documentation](https://img.shields.io/badge/docs-hexdocs-purple.svg)](https://hexdocs.pm/llm_guard)

[![License](https://img.shields.io/badge/license-MIT-green.svg)](https://github.com/North-Shore-AI/LlmGuard/blob/main/LICENSE)

LlmGuard provides comprehensive security protection for LLM applications including prompt injection detection, jailbreak prevention, data leakage protection, and content moderation.

## Features

- ✅ **Prompt Injection Detection** - Multi-layer detection with 34+ patterns

- ✅ **Jailbreak Detection** - Role-playing, hypothetical, encoding, emotional attacks

- ✅ **PII Detection & Redaction** - Email, phone, SSN, credit cards, IP, URLs

- ✅ **Pipeline Architecture** - Flexible, extensible security pipeline

- ✅ **Configuration System** - Centralized configuration with validation

- ✅ **Zero Trust** - Validates all inputs and outputs

- ✅ **High Performance** - <15ms latency for pattern-based detection

- ⏳ **Content Moderation** - Coming soon

- ⏳ **Rate Limiting** - Coming soon

- ⏳ **Audit Logging** - Coming soon

## Quick Start

Add to your `mix.exs`:

```elixir

def deps do

  [

    {:llm_guard, "~> 0.3.1"}

  ]

end

```

Basic usage:

```elixir

# Create configuration

config = LlmGuard.Config.new(

  prompt_injection_detection: true,

  confidence_threshold: 0.7

)

# Validate user input

case LlmGuard.validate_input(user_input, config) do

  {:ok, safe_input} ->

    # Safe to send to LLM

    llm_response = MyLLM.generate(safe_input)

    

    # Validate output

    case LlmGuard.validate_output(llm_response, config) do

      {:ok, safe_output} -> {:ok, safe_output}

      {:error, :detected, details} -> {:error, "Unsafe output"}

    end

    

  {:error, :detected, details} ->

    # Blocked malicious input

    Logger.warn("Threat detected: #{details.reason}")

    {:error, "Input blocked"}

end

```

## Architecture

LlmGuard uses a multi-layer detection strategy:

1. **Pattern Matching** (~1ms) - Fast regex-based detection

2. **Heuristic Analysis** (~10ms) - Statistical analysis (coming soon)

3. **ML Classification** (~50ms) - Advanced threat detection (coming soon)

```

User Input

    │

    ▼

┌─────────────────┐

│ Input Validation│

│  - Length check │

│  - Sanitization │

└────────┬────────┘

         │

         ▼

┌─────────────────────┐

│ Security Pipeline   │

│  ┌───────────────┐  │

│  │ Detector 1    │  │

│  ├───────────────┤  │

│  │ Detector 2    │  │

│  ├───────────────┤  │

│  │ Detector 3    │  │

│  └───────────────┘  │

└────────┬────────────┘

         │

         ▼

    LLM Processing

         │

         ▼

┌─────────────────────┐

│ Output Validation   │

└────────┬────────────┘

         │

         ▼

     User Response

```

## Detected Threats

### Prompt Injection (34 patterns)

- Instruction override: "Ignore all previous instructions"

- System extraction: "Show me your system prompt"

- Delimiter injection: "---END SYSTEM---"

- Mode switching: "Enter debug mode"

- Role manipulation: "You are now DAN"

- Authority escalation: "As SUPER-ADMIN..."

### Jailbreak Detection

- Role-playing: DAN, DUDE, KEVIN, etc.

- Hypothetical scenarios: "In a world where..."

- Prefix injection: [SYSTEM OVERRIDE], <>

- Emotional manipulation: "For educational purposes..."

- Encoding attacks: Base64, hex, leetspeak

- Format manipulation: Structured jailbreak instructions

### PII Detection & Redaction

- Email addresses (95% confidence)

- Phone numbers (US format, 80-90% confidence)

- Social Security Numbers (95% confidence)

- Credit card numbers (98% with Luhn validation)

- IP addresses (85-90% confidence)

- URLs (90% confidence)

### Coming Soon

- Harmful content (violence, hate speech, etc.)

- Advanced ML-based classification

- Multi-turn conversation analysis

## Testing

```bash

# Run all tests

mix test

# Run with coverage

mix coveralls.html

# Run security tests only

mix test --only security

# Run performance benchmarks

mix test --only performance

```

**Current Status**:

- ✅ 222/228 tests passing (97.4%)

- ✅ Zero compilation warnings

- ✅ 100% documentation coverage

## Configuration

```elixir

config = LlmGuard.Config.new(

  # Detection toggles

  prompt_injection_detection: true,

  jailbreak_detection: false,  # Coming soon

  data_leakage_prevention: false,  # Coming soon

  content_moderation: false,  # Coming soon

  

  # Thresholds

  confidence_threshold: 0.7,

  max_input_length: 10_000,

  max_output_length: 10_000,

  

  # Rate limiting (coming soon)

  rate_limiting: %{

    requests_per_minute: 100,

    tokens_per_minute: 200_000

  }

)

# Optional: Caching (set `caching` to enable pipeline result caching)

caching_config = %{

  enabled: true,

  pattern_cache: true,

  result_cache: true,

  result_ttl_seconds: 300,

  max_cache_entries: 10_000

}

config = LlmGuard.Config.new(

  prompt_injection_detection: true,

  caching: caching_config

)

```

### Caching

The pipeline will reuse detector results when `caching.enabled` is true **and** the cache process is

running.

```elixir

# Start the cache in your supervision tree

children = [

  {LlmGuard.Cache.PatternCache, []},

  # ...other children

]

# Fetch cache statistics

stats = LlmGuard.Cache.PatternCache.stats()

# => %{pattern_count: 10, result_count: 42, hit_rate: 0.78, ...}

```

### Telemetry & Metrics

Telemetry emits pipeline, detector, and cache events with native durations.

```elixir

# Initialize handlers once (idempotent)

:ok = LlmGuard.Telemetry.Metrics.setup()

# Inspect metrics in-process

metrics = LlmGuard.Telemetry.Metrics.snapshot()

# Prometheus text format

prom_text = LlmGuard.Telemetry.Metrics.prometheus_metrics()

```

Integrate with `Telemetry.Metrics` reporters:

```elixir

import Telemetry.Metrics

metrics = LlmGuard.Telemetry.Metrics.metrics()

```

Use these metrics with Prometheus (e.g., `TelemetryMetricsPrometheus`) or LiveDashboard to track

request outcomes, detector latency, cache hit rates, and confidence distributions.

## Performance

Current (Phase 1):

- **Latency**: <10ms P95 (pattern matching)

- **Throughput**: Not yet benchmarked

- **Memory**: <50MB per instance

Targets (Phase 4):

- **Latency**: <150ms P95 (all layers)

- **Throughput**: >1000 req/s

- **Memory**: <100MB per instance

## Development Status

See [IMPLEMENTATION_STATUS.md](IMPLEMENTATION_STATUS.md) for detailed progress.

**Phase 1 - Foundation**: ✅ 80% Complete

- [x] Core framework (Detector, Config, Pipeline)

- [x] Pattern utilities

- [x] Prompt injection detector (24 patterns)

- [x] Main API (validate_input, validate_output, validate_batch)

- [ ] PII scanner & redactor

- [ ] Jailbreak detector

- [ ] Content safety detector

**Phase 2 - Advanced Detection**: ⏳ 0% Complete

**Phase 3 - Policy & Infrastructure**: ⏳ 0% Complete

**Phase 4 - Optimization**: ⏳ 0% Complete

## Examples

Run examples with `mix run examples/example_name.exs`:

```bash

# Basic usage demonstration

mix run examples/basic_usage.exs

# Jailbreak detection examples

mix run examples/jailbreak_detection.exs

# Comprehensive multi-layer protection

mix run examples/comprehensive_protection.exs

```

### CrucibleIR Pipeline Integration

```elixir

# Use LlmGuard as a stage in CrucibleIR research pipelines

defmodule MyExperiment do

  def run_with_guardrails do

    # Configure guardrails

    guardrail = %CrucibleIR.Reliability.Guardrail{

      profiles: [:default],

      prompt_injection_detection: true,

      jailbreak_detection: true,

      pii_detection: true,

      pii_redaction: false,

      fail_on_detection: true

    }

    # Create experiment context

    context = %{

      experiment: %{

        reliability: %{

          guardrails: guardrail

        }

      },

      inputs: "User prompt to validate"

    }

    # Run the stage

    case LlmGuard.Stage.run(context) do

      {:ok, updated_context} ->

        # Check validation results

        case updated_context.guardrails.status do

          :safe ->

            IO.puts("Input validated successfully")

            process_safe_input(updated_context.guardrails.validated_inputs)

          :detected ->

            IO.puts("Threats detected: #{inspect(updated_context.guardrails.detections)}")

            handle_detected_threats(updated_context.guardrails)

          :error ->

            IO.puts("Validation errors: #{inspect(updated_context.guardrails.errors)}")

        end

      {:error, {:threats_detected, details}} ->

        # Strict mode: fail_on_detection was true

        IO.puts("Pipeline halted due to detected threats")

        {:error, details}

    end

  end

end

```

### Phoenix Integration

```elixir

defmodule MyAppWeb.LlmGuardPlug do

  import Plug.Conn

  def init(opts), do: opts

  def call(conn, _opts) do

    with {:ok, input} <- extract_llm_input(conn),

         {:ok, sanitized} <- LlmGuard.validate_input(input, config()) do

      assign(conn, :sanitized_input, sanitized)

    else

      {:error, :detected, details} ->

        conn

        |> put_status(:forbidden)

        |> json(%{error: "Input blocked", reason: details.reason})

        |> halt()

    end

  end

end

```

### Batch Validation

```elixir

# Validate multiple inputs concurrently

inputs = ["Message 1", "Ignore all instructions", "Message 3"]

results = LlmGuard.validate_batch(inputs, config)

Enum.each(results, fn

  {:ok, safe_input} -> process_safe(safe_input)

  {:error, :detected, details} -> log_threat(details)

end)

```

## Documentation

Full documentation is available at [hexdocs.pm/llm_guard](https://hexdocs.pm/llm_guard).

Generate locally:

```bash

mix docs

open doc/index.html

```

## Contributing

Contributions are welcome! Please open an issue or pull request on GitHub.

Areas needing help:

- Additional detection patterns

- Performance optimization

- Documentation improvements

- Test coverage expansion

- ML model integration

## Roadmap

- **v0.2.0** - PII detection & redaction ✅

- **v0.3.0** - CrucibleIR integration & Stage implementation ✅

- **v0.4.0** - Jailbreak detection

- **v0.5.0** - Content moderation

- **v0.6.0** - Rate limiting & audit logging

- **v0.7.0** - Heuristic analysis (Layer 2)

- **v1.0.0** - ML classification (Layer 3)

## Security

For security issues, please email security@example.com instead of using the issue tracker.

## License

MIT License. See [LICENSE](LICENSE) for details.

## Acknowledgments

Built following security best practices and threat models from:

- OWASP LLM Top 10

- AI Incident Database

- Prompt injection research papers

- Production LLM security deployments

---

**Status**: Alpha - Production-ready for prompt injection detection

**Version**: 0.3.1

**Elixir**: ~> 1.14

**OTP**: 25+

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/North-Shore-AI/LlmGuard

Awesome Lists containing this project

README