{"id":46380640,"url":"https://github.com/brainpolo/llmshield","last_synced_at":"2026-03-05T06:06:42.470Z","repository":{"id":275459867,"uuid":"925751074","full_name":"brainpolo/llmshield","owner":"brainpolo","description":"Shields your confidential data in LLM prompts from third party AI providers, allowing you to send with confidence without compromising security and privacy.","archived":false,"fork":false,"pushed_at":"2026-02-04T18:37:15.000Z","size":943,"stargazers_count":8,"open_issues_count":1,"forks_count":2,"subscribers_count":2,"default_branch":"main","last_synced_at":"2026-02-24T14:16:31.962Z","etag":null,"topics":["ai","llms","privacy","security"],"latest_commit_sha":null,"homepage":"https://brainpolo.github.io/llmshield/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/brainpolo.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-02-01T16:57:02.000Z","updated_at":"2026-02-05T21:31:01.000Z","dependencies_parsed_at":"2025-02-02T17:40:28.354Z","dependency_job_id":"9215bf0d-9f0b-43a4-bfe2-12901a50b21a","html_url":"https://github.com/brainpolo/llmshield","commit_stats":null,"previous_names":["brainpolo/llmshield"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/brainpolo/llmshield","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brainpolo%2Fllmshield","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositori
es/brainpolo%2Fllmshield/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brainpolo%2Fllmshield/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brainpolo%2Fllmshield/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/brainpolo","download_url":"https://codeload.github.com/brainpolo/llmshield/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brainpolo%2Fllmshield/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30111797,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-05T03:40:26.266Z","status":"ssl_error","status_checked_at":"2026-03-05T03:39:15.902Z","response_time":93,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","llms","privacy","security"],"created_at":"2026-03-05T06:06:40.847Z","updated_at":"2026-03-05T06:06:42.465Z","avatar_url":"https://github.com/brainpolo.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\n# 🛡️ LLMShield\n\n[![Python 3.12 | 3.13 | 3.14](https://img.shields.io/badge/python-3.12%20%7C%203.13%20%7C%203.14-blue.svg)](https://www.python.org/downloads/)\n[![License: AGPL 
v3](https://img.shields.io/badge/License-AGPL%20v3-blue.svg)](https://www.gnu.org/licenses/agpl-3.0)\n[![Zero Dependencies](https://img.shields.io/badge/dependencies-zero-green.svg)](https://pypi.org/project/llmshield/)\n[![PyPI version](https://img.shields.io/pypi/v/llmshield.svg)](https://pypi.org/project/llmshield/)\n\n**A lightweight, zero-dependency Python library for protecting PII in LLM interactions.**\n\n_Designed for seamless integration into API-driven applications with minimal configuration._\n\n\u003c/div\u003e\n\n## Overview\n\nLLMShield delivers reliable detection and protection of sensitive information in English in LLM interactions by automatically redacting PII before sending context to the underlying LLM provider SDK. LLMShield then restores the redacted PII from the response.\n\n### Key Features\n\n| Core Capabilities | Advanced Features |\n|-------------------|-------------------|\n| **Zero Dependencies**\u003cbr/\u003ePure Python implementation | **Conversation Memory**\u003cbr/\u003eMulti-turn support with entity consistency |\n| **Entity Detection**\u003cbr/\u003eAutomatic identification using multi-layered analysis | **Streaming Support**\u003cbr/\u003eReal-time processing for streaming responses |\n| **Selective Protection**\u003cbr/\u003eGranular control over specific entity types | **Performance Optimised**\u003cbr/\u003eEfficient architecture with intelligent caching |\n| **Universal Compatibility**\u003cbr/\u003eWorks with major LLM providers | **Ready for Production**\u003cbr/\u003eReliable and secure for service integration |\n\n## Installation\n\n```bash\npip install llmshield\n```\n\n## Quick Start\n\n### Provider Compatibility\n\nLLMShield has been fully tested with these providers:\n\n| Provider | Status | Features |\n|----------|--------|----------|\n| **OpenAI Chat Completions API** | Full Support | Chat, Structured Output, Streaming, Tools |\n| **Anthropic Messages API** | Full Support | Chat, Structured Output, Streaming, 
Tools |\n| **Google Gemini API** | Full Support | Chat, Structured Output, Streaming, Tools |\n| **Cohere Chat API** | Full Support | Chat, Structured Output, Streaming, Tools |\n| **xAI Responses API** | Full Support | Chat, Structured Output, Streaming, Tools |\n| **OpenAI Compatibility Standard** | Full Support | Chat, Structured Output, Streaming, Tools |\n\n\u003e **Note:** Due to model behaviour differences, slight performance variations may occur. Tune parameters and PII filtration levels based on your requirements.\n\n### Basic Usage\n\n```python\nfrom openai import OpenAI\nfrom llmshield import LLMShield\n\n# Initialise with any LLM provider\nclient = OpenAI(api_key=\"your-api-key\")\nshield = LLMShield(llm_func=client.chat.completions.create)\n\n# Single request with automatic protection\nmessages = [\n    {\"role\": \"user\", \"content\": \"Draft an email to Sarah Johnson at sarah.j@techcorp.com\"}\n]\nresponse = shield.ask(model=\"gpt-4\", messages=messages)\n\n# Multi-turn conversation with entity consistency\nmessages = [\n    {\"role\": \"user\", \"content\": \"I'm John Smith from DataCorp\"},\n    {\"role\": \"assistant\", \"content\": \"Hello! 
How can I help you?\"},\n    {\"role\": \"user\", \"content\": \"Email me at john@datacorp.com\"}\n]\nresponse = shield.ask(model=\"gpt-4\", messages=messages)\n```\n\n### Streaming Support\n\n```python\nmessages = [\n    {\"role\": \"user\", \"content\": \"Generate a report about Jane Doe (jane@example.com)\"}\n]\nresponse_stream = shield.ask(model=\"gpt-4\", messages=messages, stream=True)\n\nfor chunk in response_stream:\n    print(chunk, end=\"\", flush=True)\n```\n\n### Manual Protection (Advanced)\n\nFor custom LLM integrations:\n\n```python\nshield = LLMShield()\n\n# Protect sensitive information\ncloaked_prompt, entity_map = shield.cloak(\n    \"Contact John Doe at john.doe@company.com or call +1-555-0123\"\n)\nprint(cloaked_prompt)\n# Output: \"Contact \u003cPERSON_0\u003e at \u003cEMAIL_1\u003e or call \u003cPHONE_2\u003e\"\n\n# Process with LLM\nllm_response = your_llm_function(cloaked_prompt)\n\n# Restore original entities\nrestored_response = shield.uncloak(llm_response, entity_map)\n```\n\n\u003e **Important:** Individual `cloak()` and `uncloak()` methods support single messages only and do not maintain conversation history. For multi-turn conversations with entity consistency across messages, use the `ask()` method.\n\n\u003e **Note:** PII cloaking only applies to text-based inputs (`str`, `list[str]`, and `messages`). Non-text inputs such as file paths, binary data, and Pydantic models are passed through to the LLM without cloaking, as PII detection requires scannable text content.\n\n## High-Level Data Flow\n\n\u003cdiv align=\"center\"\u003e\n\n```mermaid\ngraph LR\n    A[\"Raw Input\u003cbr/\u003e'Contact Dr. Smith at smith@hospital.org'\"] --\u003e B[\"Entity Detection\u003cbr/\u003ePERSON: Dr. 
Smith\u003cbr/\u003eEMAIL: smith@hospital.org\"]\n\n    B --\u003e C[\"PII Anonymisation\u003cbr/\u003e'Contact \u003cPERSON_0\u003e at \u003cEMAIL_1\u003e'\"]\n\n    C --\u003e D[\"LLM Processing\u003cbr/\u003eSafe text sent to\u003cbr/\u003eOpenAI, Claude, etc.\"]\n\n    D --\u003e E[\"Response Restoration\u003cbr/\u003ePlaceholders → Original PII\"]\n\n    E --\u003e F[\"Protected Output\u003cbr/\u003e'I'll help you contact Dr. Smith\u003cbr/\u003eat smith@hospital.org'\"]\n\n    %% Styling\n    classDef flowStyle fill:#f8f9fa,stroke:#495057,stroke-width:2px,color:#212529\n    classDef detectionStyle fill:#e8f4f8,stroke:#0c63e4,stroke-width:2px,color:#212529\n    classDef anonymisationStyle fill:#fff3cd,stroke:#856404,stroke-width:2px,color:#212529\n    classDef llmStyle fill:#f0e6ff,stroke:#6f42c1,stroke-width:2px,color:#212529\n    classDef restorationStyle fill:#d1ecf1,stroke:#0c5460,stroke-width:2px,color:#212529\n\n    class A flowStyle\n    class B detectionStyle\n    class C anonymisationStyle\n    class D llmStyle\n    class E restorationStyle\n    class F flowStyle\n```\n\n\u003c/div\u003e\n\n## Under the Hood: System Architecture\n\n\u003cdiv align=\"center\"\u003e\n\n```mermaid\ngraph LR\n    subgraph Input [\"Input Layer\"]\n        A[\"Textual Input\u003cbr/\u003eContains PII Entities\"]\n    end\n\n    subgraph Detection [\"Entity Detection Engine\"]\n        B[\"Configurable Waterfall Detection\u003cbr/\u003e• Phase 1: Pattern Recognition (RegEx)\u003cbr/\u003e• Phase 2: Numerical Validation (Luhn)\u003cbr/\u003e• Phase 3: Linguistic Analysis (NLP)\u003cbr/\u003e• Selective Type Filtering (EntityConfig)\u003cbr/\u003e9 Entity Types: PERSON, ORGANISATION, EMAIL, etc.\"]\n    end\n\n    subgraph Cloaking [\"Entity Anonymisation\"]\n        C[\"Classification \u0026 Tokenization\u003cbr/\u003ePII → Typed Placeholders\u003cbr/\u003eDeterministic Mapping\u003cbr/\u003eFormat: \u003cTYPE_INDEX\u003e\"]\n    end\n\n    subgraph Provider [\"LLM Provider 
Interface\"]\n        D[\"Provider-Agnostic API Gateway\u003cbr/\u003eSupported: OpenAI, Anthropic Claude,\u003cbr/\u003eGoogle Gemini, Azure OpenAI,\u003cbr/\u003eAWS Bedrock, Custom Endpoints\"]\n    end\n\n    subgraph Restoration [\"Entity De-anonymisation\"]\n        E[\"Inverse Token Mapping\u003cbr/\u003ePlaceholder Detection\u003cbr/\u003eBidirectional Text Reconstruction\u003cbr/\u003eIntegrity Preservation\"]\n    end\n\n    subgraph Output [\"Output Layer\"]\n        F[\"Reconstructed Response\u003cbr/\u003eOriginal PII Restored\u003cbr/\u003eStream-Compatible\"]\n    end\n\n    subgraph Memory [\"State Management System\"]\n        G[\"Singleton Dictionary Cache\u003cbr/\u003eLRU Conversation Cache\u003cbr/\u003eHash-Based Entity Mapping\u003cbr/\u003eO(1) Lookup Complexity\u003cbr/\u003e95% Memory Reduction\"]\n    end\n\n    %% Primary data flow\n    A --\u003e B\n    B --\u003e C\n    C --\u003e D\n    D --\u003e E\n    E --\u003e F\n\n    %% State management interactions\n    Memory -.-\u003e|\"Read/Write\u003cbr/\u003eEntity Maps\"| C\n    Memory -.-\u003e|\"Consistency\u003cbr/\u003eValidation\"| E\n\n    %% Styling\n    classDef inputStyle fill:#f8f9fa,stroke:#495057,stroke-width:2px,color:#212529\n    classDef detectionStyle fill:#e8f4f8,stroke:#0c63e4,stroke-width:2px,color:#212529\n    classDef cloakingStyle fill:#fff3cd,stroke:#856404,stroke-width:2px,color:#212529\n    classDef providerStyle fill:#f0e6ff,stroke:#6f42c1,stroke-width:2px,color:#212529\n    classDef restorationStyle fill:#d1ecf1,stroke:#0c5460,stroke-width:2px,color:#212529\n    classDef outputStyle fill:#d4edda,stroke:#155724,stroke-width:2px,color:#212529\n    classDef memoryStyle fill:#f8d7da,stroke:#721c24,stroke-width:2px,color:#212529\n\n    class A inputStyle\n    class B detectionStyle\n    class C cloakingStyle\n    class D providerStyle\n    class E restorationStyle\n    class F outputStyle\n    class G memoryStyle\n```\n\n\u003c/div\u003e\n\n## Entity 
Detection\n\nThe library detects and protects the following entity types:\n\n| Entity Type | Examples | Placeholder Format |\n|-------------|----------|--------------------|\n| **Person** | John Doe, Dr. Smith | `\u003cPERSON_0\u003e` |\n| **Organisation** | Acme Corp, NHS | `\u003cORGANISATION_0\u003e` |\n| **Place** | London, Main Street | `\u003cPLACE_0\u003e` |\n| **Email** | user@domain.com | `\u003cEMAIL_0\u003e` |\n| **Phone** | +1-555-0123 | `\u003cPHONE_0\u003e` |\n| **URL** | https://example.com | `\u003cURL_0\u003e` |\n| **Credit Card** | 4111-1111-1111-1111 | `\u003cCREDIT_CARD_0\u003e` |\n| **IP Address** | 192.168.1.1 | `\u003cIP_ADDRESS_0\u003e` |\n\n## Built-in Memory for Multi-Turn Conversations\n\n\u003e **Note:** LLMShield maintains entity consistency across conversation turns, ensuring the same person or organisation receives the same placeholder throughout the session. This memory is built-in with zero external dependencies.\n\n\u003cdiv align=\"center\"\u003e\n\n```mermaid\nsequenceDiagram\n    participant User\n    participant LLMShield\n    participant LLM\n\n    User-\u003e\u003eLLMShield: \"I'm John Doe from DataCorp\"\n    LLMShield-\u003e\u003eLLM: \"I'm \u003cPERSON_0\u003e from \u003cORGANISATION_1\u003e\"\n    LLM-\u003e\u003eUser: \"Hello John Doe! 
How can I help?\"\n\n    User-\u003e\u003eLLMShield: \"Email john.doe@datacorp.com\"\n    LLMShield-\u003e\u003eLLM: \"Email \u003cEMAIL_2\u003e\"\n    LLM-\u003e\u003eUser: \"I'll send it to john.doe@datacorp.com\"\n```\n\n\u003c/div\u003e\n\n## Provider Setup Examples\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eOpenAI Configuration\u003c/strong\u003e - Standard \u0026 Beta APIs with full feature support\u003c/summary\u003e\n\n```python\nfrom openai import OpenAI\nimport llmshield\n\n# Configuration constants\nOPENAI_API_KEY = \"your-openai-api-key\"\nAI_MAX_RETRIES = 3\nAI_TIMEOUT = 30.0\nOPENAI_MODEL = \"gpt-4o\"\n\n# Initialise OpenAI client\nopenai_client = OpenAI(\n    api_key=OPENAI_API_KEY,\n    max_retries=AI_MAX_RETRIES,\n    timeout=AI_TIMEOUT,\n)\n\n# Standard Chat Completions API\nopenai_shield = llmshield.LLMShield(\n    llm_func=openai_client.chat.completions.create\n)\n\n# Beta API with Structured Output\nopenai_beta_shield = llmshield.LLMShield(\n    llm_func=openai_client.beta.chat.completions.parse\n)\n\n# Usage examples\nmessages = [\n    {\"role\": \"user\", \"content\": \"Draft email to john.doe@company.com about Q4 report\"}\n]\n\nresponse = openai_shield.ask(model=OPENAI_MODEL, messages=messages)\n\n# Streaming with protection\nmessages = [\n    {\"role\": \"user\", \"content\": \"Generate customer report for Alice Smith\"}\n]\n\nfor chunk in openai_shield.ask(model=OPENAI_MODEL, messages=messages, stream=True):\n    print(chunk, end=\"\", flush=True)\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003exAI Configuration\u003c/strong\u003e - OpenAI-compatible with zero additional setup\u003c/summary\u003e\n\n```python\nfrom openai import OpenAI  # xAI uses OpenAI SDK\nimport llmshield\n\n# Configuration constants\nXAI_BASE_URL = \"https://api.x.ai/v1\"\nXAI_API_KEY = \"your-xai-api-key\"\nAI_MAX_RETRIES = 3\nAI_TIMEOUT = 30.0\nXAI_MODEL = \"grok-beta\"\n\n# Initialise xAI 
client\nxai_client = OpenAI(\n    base_url=XAI_BASE_URL,\n    api_key=XAI_API_KEY,\n    max_retries=AI_MAX_RETRIES,\n    timeout=AI_TIMEOUT,\n)\n\n# Create shield - identical to OpenAI setup\nxai_shield = llmshield.LLMShield(\n    llm_func=xai_client.chat.completions.create\n)\n\n# Usage with xAI models\nmessages = [\n    {\"role\": \"user\", \"content\": \"Analyse customer data: John Smith, john@company.com, +1-555-0123\"}\n]\n\nresponse = xai_shield.ask(model=XAI_MODEL, messages=messages)\n\n# Multi-turn conversations with entity consistency\nmessages = [\n    {\"role\": \"user\", \"content\": \"I'm Sarah Johnson from TechCorp\"},\n    {\"role\": \"assistant\", \"content\": \"Hello! How can I help you?\"},\n    {\"role\": \"user\", \"content\": \"Email me the report at sarah.j@techcorp.com\"}\n]\n\nresponse = xai_shield.ask(model=XAI_MODEL, messages=messages)\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eAnthropic Configuration\u003c/strong\u003e - Native Messages API with advanced tool support\u003c/summary\u003e\n\n```python\nfrom anthropic import Anthropic\nimport llmshield\n\n# Configuration constants\nANTHROPIC_API_KEY = \"your-anthropic-api-key\"\nAI_MAX_RETRIES = 3\nAI_TIMEOUT = 30.0\nANTHROPIC_MODEL = \"claude-3-5-sonnet-20241022\"\n\n# Initialise Anthropic client\nanthropic_client = Anthropic(\n    api_key=ANTHROPIC_API_KEY,\n    max_retries=AI_MAX_RETRIES,\n    timeout=AI_TIMEOUT,\n)\n\n# Create shield with Messages API\nanthropic_shield = llmshield.LLMShield(\n    llm_func=anthropic_client.messages.create\n)\n\n# Usage with Claude models\nmessages = [\n    {\"role\": \"user\", \"content\": \"Review customer info: Alice Cooper, alice@musiccorp.com\"}\n]\n\nresponse = anthropic_shield.ask(model=ANTHROPIC_MODEL, messages=messages)\n\n# Streaming support\nmessages = [\n    {\"role\": \"user\", \"content\": \"Generate report for client data\"}\n]\n\nfor chunk in anthropic_shield.ask(model=ANTHROPIC_MODEL, 
messages=messages, stream=True):\n    print(chunk, end=\"\", flush=True)\n\n# Tool usage with PII protection\nPHONE_TOOL_SCHEMA = {\n    \"name\": \"make_call\",\n    \"description\": \"Make a phone call\",\n    \"input_schema\": {\n        \"type\": \"object\",\n        \"properties\": {\"phone\": {\"type\": \"string\"}},\n        \"required\": [\"phone\"]\n    }\n}\n\nmessages = [\n    {\"role\": \"user\", \"content\": \"Call John Doe at +1-555-0123\"}\n]\n\nresponse = anthropic_shield.ask(\n    model=ANTHROPIC_MODEL,\n    messages=messages,\n    tools=[PHONE_TOOL_SCHEMA]\n)\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eOpenAI-Compatible Providers\u003c/strong\u003e - Universal setup for any OpenAI-compatible API\u003c/summary\u003e\n\n```python\nfrom openai import OpenAI\nimport llmshield\n\n# Configuration constants - customize for your provider\nPROVIDER_BASE_URL = \"https://api.your-provider.com/v1\"\nPROVIDER_API_KEY = \"your-provider-api-key\"\nAI_MAX_RETRIES = 3\nAI_TIMEOUT = 30.0\nPROVIDER_MODEL = \"provider-specific-model\"\n\n# Generic OpenAI-compatible provider setup\ncompatible_client = OpenAI(\n    base_url=PROVIDER_BASE_URL,\n    api_key=PROVIDER_API_KEY,\n    max_retries=AI_MAX_RETRIES,\n    timeout=AI_TIMEOUT,\n)\n\n# Create shield - same interface for all providers\nprovider_shield = llmshield.LLMShield(\n    llm_func=compatible_client.chat.completions.create\n)\n\n# Usage with any OpenAI-compatible provider\nmessages = [\n    {\"role\": \"user\", \"content\": \"Process data: Emma Wilson, emma@startup.io, 192.168.1.100\"}\n]\n\nresponse = provider_shield.ask(model=PROVIDER_MODEL, messages=messages)\n```\n\n**Compatible Providers Include:**\n\n- **Together AI** • **Fireworks AI** • **Anyscale** • **Replicate**\n- **Groq** • **Perplexity** • **DeepInfra** • **OpenRouter**\n- **Local deployments** (Ollama, vLLM, etc.)\n\n\u003c/details\u003e\n\n\u003e **Zero Learning Curve:** Same `LLMShield` interface 
works across all providers. Switch between OpenAI, xAI, Anthropic, and compatible providers without changing your code structure.\n\n## Configuration\n\n### Custom Delimiters\n\n```python\nshield = LLMShield(\n    start_delimiter='[[',\n    end_delimiter=']]'\n)\n# Entities appear as [[PERSON_0]], [[EMAIL_1]], etc.\n```\n\n### Conversation Caching\n\nLLMShield implements an **LRU (Least Recently Used) cache** to maintain entity consistency across multi-turn conversations. The cache stores entity mappings for conversation histories, ensuring that all entities (persons, organisations, emails, phones, etc.) mentioned in different messages receive the same placeholders.\n\n```python\nshield = LLMShield(\n    llm_func=your_llm_function,\n    max_cache_size=10_000  # Default: 10,000\n)\n```\n\n#### Cache Sizing Guidelines\n\n\u003ctable\u003e\n\u003ctr\u003e\n\u003ctd width=\"33%\"\u003e\n\n**Small Applications**\n\n- \u003c 1,000 concurrent conversations\n- `max_cache_size=1000-5000`\n- ~500KB-1MB memory\n\n\u003c/td\u003e\n\u003ctd width=\"33%\"\u003e\n\n**Medium Applications**\n\n- 1,000-10,000 concurrent conversations\n- `max_cache_size=5000-10000`\n- ~5MB-10MB memory\n\n\u003c/td\u003e\n\u003ctd width=\"33%\"\u003e\n\n**Large Applications**\n\n- \u003e 100,000 concurrent conversations\n- `max_cache_size=50000-100000`\n- ~50MB-100MB memory\n\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/table\u003e\n\n**Memory Calculation:** Each conversation stores a dictionary mapping PII entities to their placeholders. 
With an average of 20 PII entities per conversation, each cache entry uses approximately 1-2KB of memory (entity text + placeholder mappings + metadata).\n\n**Cache Strategy Decision Tree:**\n\n\u003cdiv align=\"center\"\u003e\n\n```mermaid\nflowchart TD\n    A[Cache Configuration] --\u003e B{Concurrent conversations\u003cbr/\u003eper worker?}\n\n    B --\u003e|\u003c 10,000| C[Normal Cache\u003cbr/\u003emax_cache_size: 10000\u003cbr/\u003eMemory: ~10MB]\n    B --\u003e|10,000 - 50,000| D[Large Cache\u003cbr/\u003emax_cache_size: 50000\u003cbr/\u003eMemory: ~50MB]\n    B --\u003e|\u003e 50,000| E[Enterprise Cache\u003cbr/\u003emax_cache_size: 100000\u003cbr/\u003eMemory: ~100MB]\n\n    C --\u003e F{Need user-specific\u003cbr/\u003ecaching?}\n    D --\u003e F\n    E --\u003e F\n\n    F --\u003e|No| G[Single LLMShield instance\u003cbr/\u003ewith chosen cache size]\n    F --\u003e|Yes| H[Multiple LLMShield instances\u003cbr/\u003epartitioned by user type]\n```\n\n\u003c/div\u003e\n\n**Per-Shield Caching Strategy:**\n\nEach `LLMShield` instance maintains its own independent cache, providing flexibility for:\n\n- **Demographic Partitioning**: Separate caches for different user types (premium vs. free, geographic regions, etc.)\n- **Use Case Isolation**: Different cache strategies for customer service vs. internal tools vs. 
public APIs\n- **Memory Allocation**: Distribute memory budgets across multiple shield instances based on priority\n- **Custom Strategies**: Implement specialized caching logic for specific workflows or data sensitivity levels\n\n**Cache Effectiveness Factors:**\n\n- **Short-lived workers**: Cache benefits diminish with frequent recycling - prioritize memory efficiency\n- **Long-lived workers**: Larger caches significantly reduce \"cold start\" latency for entity detection\n- **Worker density**: Many workers sharing server resources require smaller per-worker caches\n- **Traffic variability**: Spiky loads benefit from larger caches to handle burst scenarios\n\n\u003e **Performance Impact:** Cache hit rates above 80% significantly improve performance for multi-turn conversations by avoiding re-detection of previously seen entities. Size your cache based on expected concurrent \"fresh\" conversations that your server workers are actively serving, not total daily volume.\n\n### Selective PII Detection\n\n\u003e **New in v2.0+:** LLMShield supports chaining for selective entity detection. 
This allows you to selectively disable specific types of PII protection based on your requirements while maintaining a clean, readable configuration.\n\n#### Factory Methods for Common Configurations\n\n```python\nfrom llmshield import LLMShield\n\n# Opt-In: Enable ALL detection (including technical CONCEPTS like API/SQL)\nshield = LLMShield.enable_all()\n\n# Disable location-based entities (PLACE, IP_ADDRESS, URL)\nshield = LLMShield.disable_locations()\n\n# Baseline Chaining (Start with everything, then subtract)\nshield = LLMShield.enable_all() \\\n            .without_locations() \\\n            .without_persons() \\\n            .without_concepts()\n\n# Disable contact information (EMAIL, PHONE)\nshield = LLMShield.disable_contacts()\n\n# Enable only financial entities (CREDIT_CARD)\nshield = LLMShield.only_financial()\n```\n\n#### Custom Entity Configuration\n\nFor fine-grained control, use the `EntityConfig` class:\n\n```python\nfrom llmshield import LLMShield\nfrom llmshield.entity_detector import EntityConfig, EntityType\n\n# Create custom configuration\nconfig = EntityConfig().with_disabled(\n    EntityType.EMAIL,      # Disable email detection\n    EntityType.PHONE,      # Disable phone detection\n    EntityType.URL         # Disable URL detection\n)\n\nshield = LLMShield(entity_config=config)\n\n# Or enable only specific types\nconfig = EntityConfig().with_enabled(\n    EntityType.PERSON,     # Only detect persons\n    EntityType.CREDIT_CARD # Only detect credit cards\n)\n\nshield = LLMShield(entity_config=config)\n```\n\n#### Available Entity Types\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eAll configurable entity types\u003c/strong\u003e\u003c/summary\u003e\n\n```python\nEntityType.PERSON          # Names (John Doe, Dr. 
Smith)\nEntityType.ORGANISATION    # Companies (Microsoft Corp)\nEntityType.PLACE           # Locations (London, Main Street)\nEntityType.EMAIL           # Email addresses\nEntityType.PHONE           # Phone numbers\nEntityType.URL             # Web addresses\nEntityType.CREDIT_CARD     # Credit card numbers\nEntityType.IP_ADDRESS      # IP addresses\nEntityType.CONCEPT         # Technical acronyms (API, SQL) - [OPT-IN ONLY]\n```\n\n\u003e **Note on CONCEPT Detection**: In v2.0+, `EntityType.CONCEPT` is **disabled by default** to prevent false positives with technical documentation. Use `.enable_all()` or `.with_enabled(EntityType.CONCEPT)` to activate it.\n\n\u003c/details\u003e\n\n#### Using Selective Detection with ask()\n\nSelective detection works seamlessly with the `ask()` method for end-to-end protection:\n\n```python\nfrom openai import OpenAI\nfrom llmshield import LLMShield\n\nclient = OpenAI(api_key=\"your-api-key\")\n\n# Create shield that ignores URLs and IP addresses\nshield = LLMShield.disable_locations(llm_func=client.chat.completions.create)\n\n# This will protect names and emails but allow URLs through\nmessages = [\n    {\"role\": \"user\", \"content\": \"Contact John Doe at john@company.com or visit https://company.com\"}\n]\nresponse = shield.ask(model=\"gpt-4\", messages=messages)\n# Cloaked: \"Contact \u003cPERSON_0\u003e at \u003cEMAIL_1\u003e or visit https://company.com\"\n```\n\n#### Performance Benefits\n\nSelective detection can improve performance by:\n\n- **Reducing detection overhead** for unused entity types\n- **Minimizing placeholder generation** and entity mapping\n- **Faster text processing** with fewer regex operations\n\n\u003ctable\u003e\n\u003ctr\u003e\n\u003ctd width=\"50%\"\u003e\n\n**Recommended Configurations:**\n\n- **Customer service**: Disable `PLACE` and `URL` if not handling location data\n- **Financial applications**: Use `only_financial()` for credit card protection only\n- **Internal tools**: Disable `PERSON` 
detection if processing system logs\n- **Public APIs**: Enable all types for maximum protection\n\n\u003c/td\u003e\n\u003ctd width=\"50%\"\u003e\n\n**Performance Impact:**\n\n- **Memory usage**: 20-40% reduction\n- **Processing speed**: 15-30% improvement\n- **Cache efficiency**: Higher hit rates\n- **Latency**: Lower response times\n\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/table\u003e\n\n## Language Support\n\nLLMShield currently supports English only with approximately 95% accuracy.\nWe plan to add support for other languages in the future.\n\n## Development\n\n### Setup\n\nRequires [uv](https://docs.astral.sh/uv/) for dependency management.\n\n```bash\ngit clone https://github.com/brainpolo/llmshield.git\ncd llmshield\nuv sync\n```\n\nThis creates a `.venv/` with Python 3.14 and installs all dev dependencies.\n\n### Testing\n\n```bash\n# Run all tests\nmake tests\n\n# Generate coverage report\nmake coverage\n\n# Run linting and formatting\nmake ruff\n\n# Check documentation coverage\nmake doc-coverage\n```\n\n### Building and Publishing\n\n```bash\n# Build the package\nmake build\n```\n\n### Publishing to PyPI (maintainers only)\n\n1. **Update version** in `pyproject.toml`\n2. **Run quality checks**:\n\n   ```bash\n   make tests\n   make coverage\n   make ruff\n   ```\n\n3. **Build and publish**:\n\n   ```bash\n   make build\n   uv run twine upload dist/*\n   ```\n\n## Security Considerations\n\n| Security Aspect | Recommendation |\n|-----------------|----------------|\n| **Validation** | Validate cloaked outputs before LLM transmission |\n| **Storage** | Securely store entity mappings for persistent sessions |\n| **Delimiters** | Choose delimiters that don't conflict with your data format |\n| **Input Validation** | Implement comprehensive input validation |\n| **Auditing** | Regularly audit entity detection accuracy |\n\n## Contributing\n\nWe welcome contributions! 
Please see [CONTRIBUTING.md](CONTRIBUTING.md) for development guidelines and contribution process.\n\n## License\n\nThis project is licensed under the **GNU Affero General Public License v3.0** - see [LICENSE.txt](LICENSE.txt) for details.\n\n## Support\n\nFor questions, issues, or feature requests:\n\n- **GitHub Issues**: [Report bugs or request features](https://github.com/brainpolo/llmshield/issues)\n- **Documentation**: [Full documentation](https://llmshield.readthedocs.io)\n- **Community**: [Discussions and support](https://github.com/brainpolo/llmshield/discussions)\n\n## Maintainers\n\n- **Aditya Dedhia** ([@adityadedhia](https://github.com/adityadedhia))\n- **Sebastian Andres** ([@S-andres0694](https://github.com/S-andres0694))\n\n## Production Usage\n\nLLMShield is used in production environments by [brainful.one](https://brainful.one) to protect user data confidentiality.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbrainpolo%2Fllmshield","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbrainpolo%2Fllmshield","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbrainpolo%2Fllmshield/lists"}