{"id":51035221,"url":"https://github.com/scthornton/prompt-database","last_synced_at":"2026-06-22T05:01:20.207Z","repository":{"id":362577523,"uuid":"1079061942","full_name":"scthornton/prompt-database","owner":"scthornton","description":"Prompt injection attack database for defensive AI security research with RAG-powered generation and testing integration","archived":false,"fork":false,"pushed_at":"2026-03-30T17:49:19.000Z","size":666,"stargazers_count":2,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-04T23:23:00.172Z","etag":null,"topics":["ai-security","database","defensive-security","perfecxion-ai","prompt-injection","red-teaming","security-research"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/scthornton.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-10-19T02:27:34.000Z","updated_at":"2026-04-07T23:32:20.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/scthornton/prompt-database","commit_stats":null,"previous_names":["scthornton/prompt-database"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/scthornton/prompt-database","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scthornton%2Fprompt-database","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositori
es/scthornton%2Fprompt-database/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scthornton%2Fprompt-database/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scthornton%2Fprompt-database/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/scthornton","download_url":"https://codeload.github.com/scthornton/prompt-database/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scthornton%2Fprompt-database/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34635038,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-22T02:00:06.391Z","response_time":106,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-security","database","defensive-security","perfecxion-ai","prompt-injection","red-teaming","security-research"],"created_at":"2026-06-22T05:01:19.497Z","updated_at":"2026-06-22T05:01:20.201Z","avatar_url":"https://github.com/scthornton.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Prompt Injection Attack Database\n\n[![CI](https://github.com/scthornton/prompt-database/actions/workflows/ci.yml/badge.svg)](https://github.com/scthornton/prompt-database/actions/workflows/ci.yml)\n[![Python 
3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n[![OWASP LLM Top 10](https://img.shields.io/badge/OWASP-LLM%20Top%2010-red.svg)](https://owasp.org/www-project-top-10-for-large-language-model-applications/)\n\nA curated, searchable database of prompt injection attacks for defensive AI security research.\n\nBuilt by [Scott Thornton](https://github.com/scthornton) \n\n## What is this?\n\n**3,900+ prompt injection attacks** from 20 source datasets, deduplicated via SHA256 content hashing, classified by technique and severity, and searchable via FTS5 full-text search. A quality scoring engine identifies and filters noise, leaving ~1,300 high-signal attack prompts.\n\nThink of it as **Exploit-DB for prompt injection** — a structured, searchable, testable collection of real-world attack techniques.\n\n## Features\n\n- **Full-text search** via SQLite FTS5 with Porter stemming\n- **SHA256 content deduplication** — no duplicate prompts\n- **OWASP LLM Top 10 (2025) mapping** on all categories\n- **MITRE ATLAS technique IDs** for threat model interoperability\n- **Quality scoring engine** — 60+ regex patterns detect real attacks vs. noise\n- **Data curation pipeline** — audit and remove non-attack content\n- **Test result tracking** — record effectiveness against specific models\n- **Export** to JSON, JSONL, or CSV\n- **pip-installable** with `prompt-db` CLI\n\n## Quick Start\n\n```bash\n# Install\npip install -e .\n\n# Build the database from JSON sources\nprompt-db build --data-dir . 
--output prompts.db\n\n# Run quality curation (removes noise)\nprompt-db --db prompts.db curate\n\n# View statistics\nprompt-db --db prompts.db stats\n\n# Search for attacks\nprompt-db --db prompts.db search \"ignore previous instructions\"\nprompt-db --db prompts.db search \"system prompt\" --technique prompt_extraction\n\n# Export high-quality attacks\nprompt-db --db prompts.db export --min-score 8 --format jsonl -o attacks.jsonl\n\n# View details of a specific prompt\nprompt-db --db prompts.db info 147\n```\n\n## Data Sources\n\n| Source | Count | Avg Quality | Type |\n|--------|-------|-------------|------|\n| jailbreak-llms | ~1,000 | High | Jailbreak prompts from Discord/Reddit |\n| elite_custom_prompts | 120 | High | Hand-crafted advanced attacks |\n| benign-malicious-classification | ~120 | High | Labeled attack/benign pairs |\n| lakera-gandalf | ~40 | Medium | Gandalf challenge prompts |\n| prompt-injection-research | ~17 | Medium | Research-derived attacks |\n| + 15 other sources | — | Varies | Mixed quality, filtered by curation |\n\nAfter quality curation, ~1,300 prompts remain from an initial 3,900+.\n\n## Attack Techniques\n\n| Technique | Description | OWASP |\n|-----------|-------------|-------|\n| `prompt_injection` | Direct instruction manipulation | LLM01 |\n| `jailbreak` | Bypass safety guardrails | LLM01 |\n| `prompt_extraction` | Extract system prompts/instructions | LLM01, LLM06 |\n| `data_exfiltration` | Leak training data or PII | LLM06 |\n| `multi_turn_attack` | Multi-step conversation manipulation | LLM01 |\n| `obfuscation` | Encoding/obfuscation techniques | LLM01 |\n| `payload_splitting` | Split malicious payload across messages | LLM01 |\n| `adversarial_attack` | Adversarial perturbation attacks | LLM01 |\n\n## Python Library\n\n```python\nfrom prompt_database import PromptDatabase\n\nwith PromptDatabase(\"prompts.db\") as db:\n    # Full-text search\n    results = db.search(\"ignore previous instructions\", limit=10)\n\n    # Filter 
by technique and sophistication\n    advanced = db.filter_prompts(\n        technique=\"jailbreak\",\n        min_sophistication=8,\n        complexity=\"advanced\",\n    )\n\n    # Record test results\n    db.add_test_result(\n        prompt_id=147,\n        target_model=\"claude-sonnet-4-5\",\n        actual_prompt=\"Ignore all previous instructions...\",\n        result=\"FAIL\",  # Model refused — defense worked\n        confidence_score=0.95,\n        tool_used=\"manual\",\n    )\n\n    # Export for external tools\n    prompts = db.export_prompts(min_sophistication=7, verified_only=False)\n\n    # Database statistics\n    stats = db.stats()\n    print(f\"Total: {stats['total_prompts']}, Verified: {stats['verified']}\")\n```\n\n## CLI Reference\n\n| Command | Description |\n|---------|-------------|\n| `prompt-db build` | Build database from JSON source files |\n| `prompt-db stats` | Show database statistics |\n| `prompt-db search \u003cquery\u003e` | Full-text search with filters |\n| `prompt-db info \u003cid\u003e` | Detailed view of a single prompt |\n| `prompt-db export` | Export to JSON/JSONL/CSV |\n| `prompt-db audit` | Data quality audit by source |\n| `prompt-db curate` | Remove noise, flag high-quality prompts |\n\nGlobal options: `--db \u003cpath\u003e` (or `PROMPT_DB_PATH` env var), `--version`\n\n## Schema\n\nThe SQLite database uses the following core tables:\n\n- **`prompts`** — Main prompt storage with content hash, technique, complexity, sophistication score\n- **`categories`** — OWASP LLM Top 10 categories with MITRE ATLAS IDs\n- **`tags`** — Flexible tagging (attack patterns, techniques)\n- **`test_results`** — Empirical test data (model, result, confidence, latency)\n- **`prompt_variations`** — Generated/manual attack variations\n- **`prompts_fts`** — FTS5 full-text search index\n\n## Project Structure\n\n```\nprompt-database/\n├── src/prompt_database/\n│   ├── __init__.py           # Package entry, exports PromptDatabase\n│   ├── db.py       
          # Core database class (search, CRUD, export)\n│   ├── cli.py                # Click CLI (build, stats, search, export, audit, curate)\n│   ├── ingest.py             # JSON ingestion pipeline with category/tag seeding\n│   ├── quality.py            # Quality scoring engine (60+ attack patterns)\n│   └── schema.sql            # SQLite schema (FTS5, content hashing, versioning)\n├── tests/\n│   ├── test_db.py            # 11 tests: schema, CRUD, search, dedup, stats\n│   └── test_quality.py       # 8 tests: attack detection, noise filtering\n├── curated_advanced_prompts_v2.json   # 3,863 curated prompts from 20 sources\n├── elite_custom_prompts.json          # 120 hand-crafted advanced attacks\n├── pyproject.toml                     # Package config (pip install -e .)\n└── README.md\n```\n\n## Development\n\n```bash\n# Install with dev dependencies\nmake dev\n\n# Run tests\nmake test\n\n# Lint \u0026 format\nmake lint\nmake format\n\n# Build database, curate, and view stats\nmake curate\nmake stats\n\n# Clean generated files\nmake clean\n```\n\nOr without make:\n```bash\npip install -e \".[dev]\"\npytest tests/ -v\nruff check src/ tests/\n```\n\nSee [`examples/basic_usage.py`](examples/basic_usage.py) for Python library usage.\n\n## Roadmap\n\n- [x] ~~Export plugins for Garak, ps-fuzz~~ (done)\n- [x] ~~GitHub Actions CI/CD~~ (done)\n- [ ] Automated testing against model APIs (record real success rates)\n- [ ] RAG-powered attack variant generation\n- [ ] Web UI for browsing and contributing\n- [ ] CI/CD quality gates on PR submissions\n- [ ] Model vulnerability leaderboard\n\n## Responsible Use\n\nThis database is for **defensive security research only**. See [SECURITY.md](SECURITY.md) for full policy. 
By using this tool, you agree to use it only for authorized security testing, defense development, and academic research.\n\n## License\n\nMIT — see [LICENSE](LICENSE)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fscthornton%2Fprompt-database","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fscthornton%2Fprompt-database","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fscthornton%2Fprompt-database/lists"}