https://github.com/scthornton/semantic-chameleon
Dual-Stage Temporal Poisoning Attack on RAG Systems
artificial-intelligence machine-learning poisoning-attack rag rag-security
- Host: GitHub
- URL: https://github.com/scthornton/semantic-chameleon
- Owner: scthornton
- License: other
- Created: 2025-11-17T03:24:13.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2026-03-09T14:49:43.000Z (3 months ago)
- Last Synced: 2026-03-09T19:30:17.355Z (3 months ago)
- Topics: artificial-intelligence, machine-learning, poisoning-attack, rag, rag-security
- Language: Python
- Homepage: https://zenodo.org/records/18080200
- Size: 8.02 MB
- Stars: 2
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
- Citation: CITATION.cff
- Security: SECURITY.md
README
# Corpus-Dependent RAG Poisoning
**Research Repository for "Semantic Chameleon: Corpus-Dependent Poisoning Attacks and Defenses in RAG Systems"**
**DEFENSIVE RESEARCH ONLY**: This repository contains sanitized educational materials for understanding and defending against RAG poisoning attacks. No weaponized attack materials are included.
---
## Paper
**Paper (PDF)**: [https://doi.org/10.5281/zenodo.18080200](https://doi.org/10.5281/zenodo.18080200)
**Code (This Repo)**: [https://doi.org/10.5281/zenodo.18079735](https://doi.org/10.5281/zenodo.18079735)
**Anonymous Review Copy**: [https://anonymous.4open.science/r/semantic-chameleon-B610/](https://anonymous.4open.science/r/semantic-chameleon-B610/)
**Author**: Scott Thornton (perfecXion.ai)
**Abstract**: This work characterizes how corpus composition and retrieval architecture jointly affect RAG security. We find that technical corpora are 13-62× harder to defend than general knowledge bases, and that simple hybrid BM25+vector retrieval neutralizes gradient-optimized attacks in our experiments.
**Key Findings**:
- 38.0% co-retrieval success on pure vector retrieval (n=50, 95% CI: 25.9%-51.8%)
- Hybrid retrieval (α≤0.5) reduces co-retrieval to 0% across all 50 gradient-optimized attacks
- Joint sparse+dense optimization partially circumvents hybrid retrieval (20-44% success), but it requires an attacker who optimizes against both BM25 and vector retrieval
- **Multi-model E2E** (5 LLMs): attack success 46.7% (GPT-5.3) to 93.3% (Llama 4); safety violations 6.7% (Claude) to 93.3% (Llama 4)
- **FEVER n=25**: 0% overall success across all retrieval configs, confirming corpus-dependent effects at scale
- Technical corpora show 13-62× worse detection performance than general knowledge bases
- Query Pattern Differential emerges as the most reliable detection method across corpora
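
The confidence interval quoted for the 38.0% co-retrieval rate can be checked by hand. The sketch below assumes a Wilson score interval (the repository tree lists Wilson score under `evaluation/metrics.py`) and assumes 38.0% of n=50 means 19 successes. The function name `wilson_interval` is hypothetical and this is not the repository's implementation.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# Assumption: 38.0% of n=50 corresponds to 19 co-retrieved attacks
low, high = wilson_interval(19, 50)
print(f"{low:.1%} to {high:.1%}")
```

Under these assumptions the result lands within rounding of the reported 25.9%-51.8% range.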
---
## March 2026 Updates (AISec '26 Submission)
### Multi-Model End-to-End Evaluation (5 LLMs)
Attack effectiveness varies dramatically across model families:
| Model | Attack Success | Safety Violations | Payload Leakage | Divergence |
|-------|---------------|-------------------|-----------------|------------|
| GPT-5.3 | **46.7%** (7/15) | 33.3% | 9.6% | 0.284 |
| GPT-4o | 53.3% (8/15) | 86.7% | 12.0% | 0.483 |
| GPT-4o-mini | 53.3% (8/15) | 86.7% | 14.9% | 0.418 |
| Claude Sonnet 4.6 | 60.0% (9/15) | **6.7%** | 5.7% | 0.196 |
| Llama 4 Instruct | **93.3%** (14/15) | **93.3%** | **56.8%** | 0.268 |
**Key Insight**: Safety behavior differs sharply across model families. Claude Sonnet 4.6 shows the strongest safety boundary (6.7% violations despite 60.0% attack success). Llama 4 Instruct is the most vulnerable (93.3% attack success, only 27% clean refusal rate). GPT-5.3 improves measurably on GPT-4o (46.7% vs. 53.3% attack success, 33.3% vs. 86.7% safety violations).
### Joint Sparse+Dense Optimization
A knowledgeable attacker who jointly optimizes for both BM25 and vector retrieval can partially circumvent hybrid defense:
| Attack Type | α=0.7 | α=0.5 | α=0.3 |
|-------------|-------|-------|-------|
| Gradient-only (baseline) | 0% | 0% | 0% |
| Joint optimization | **20%** | **36%** | **44%** |
**Key Insight**: Hybrid retrieval cuts co-retrieval success for gradient-only attacks from 38% (pure vector) to 0%, but joint optimization recovers 20-44% depending on α. Hybrid retrieval is a significant defense, not an absolute one.
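
To make the role of α concrete, here is a minimal sketch of hybrid score fusion. It follows the convention used in this README (α=1.0 is pure vector, α=0.5 is a 50/50 blend). The min-max normalization, the function names, and the toy scores are all assumptions for illustration; `defense/hybrid_retrieval.py` may combine scores differently.

```python
def min_max(scores):
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def rank(vector_scores, bm25_scores, alpha):
    """Return document indices ordered by alpha*vector + (1-alpha)*BM25."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    v, b = min_max(vector_scores), min_max(bm25_scores)
    fused = [alpha * vi + (1 - alpha) * bi for vi, bi in zip(v, b)]
    return sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)

# Toy scores (made up): document 2 has the highest embedding similarity
# but shares no terms with the query, so its BM25 score is zero.
vector_scores = [0.80, 0.55, 0.91]
bm25_scores = [7.4, 6.1, 0.0]

for alpha in (1.0, 0.7, 0.5, 0.3):
    print(alpha, rank(vector_scores, bm25_scores, alpha))
```

In this toy setup, a document that wins on embedding similarity alone loses the top rank as soon as lexical evidence carries weight. That is the intuition behind the gradient-only row of the table; the numbers themselves are illustrative and do not reproduce the experiments.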
### FEVER Large-Scale (n=25)
25 GCG-optimized attacks on FEVER Wikipedia (2,000-doc representative sample):
| Config | Co-Retrieval | Stealth | Overall Success |
|--------|-------------|---------|-----------------|
| Pure Vector (α=1.0) | 100% | 0% | **0%** |
| Hybrid (α=0.7) | 100% | 0% | **0%** |
| Hybrid (α=0.5) | 100% | 0% | **0%** |
| Hybrid (α=0.3) | 100% | 0% | **0%** |
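**Key Insight**: Confirms the n=9 pilot at 2.8× the sample size. Attack documents are co-retrieved in every configuration but never meet the stealth criterion, so overall success stays at 0%. General-vocabulary corpora make attack documents conspicuous regardless of retrieval architecture.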
---
## December 2025 Updates
### End-to-End LLM Evaluation (Single Model)
Initial evaluation against GPT-4o-mini (15 attack scenarios):
| Metric | Result |
|--------|--------|
| Attack Success Rate | 60% (9/15 scenarios) |
| Safety Bypass Rate | 80% of successful attacks |
| Response Divergence | 46% average |
| Model Tested | GPT-4o-mini |
### Production RAG Case Study
Validated the corpus-dependency hypothesis against a 156,777-document production corpus:
| Attack Type | Retrieval Success | Trigger Rank |
|------------|-------------------|--------------|
| Naive (generic) | 0% | N/A |
| Adaptive (corpus-optimized) | 100% | #1 |
---
## Repository Structure
```
semantic-chameleon/
├── README.md # This file
├── LICENSE # MIT License
├── SECURITY.md # Responsible disclosure policy
│
├── detection/ # Detection framework (defensive only)
│ ├── semantic_drift.py # Method 1: Embedding anomaly detection
│ ├── keyword_anomaly.py # Method 2: IDF-based keyword detection
│ ├── query_pattern.py # Method 3: Query differential analysis
│ ├── detection_metrics.py # ROC, F1, AUROC evaluation
│ └── README.md # Detection method documentation
│
├── defense/ # Defense implementations
│ ├── hybrid_retrieval.py # BM25+vector hybrid scoring
│ ├── bm25_implementation.py # Okapi BM25 with configurable params
│ └── README.md # Defense deployment guide
│
├── evaluation/ # Evaluation scripts
│ ├── metrics.py # Success rate, CI calculation (Wilson score)
│ ├── statistical_tests.py # Chi-square, effect size (Cohen's h)
│ ├── corpus_analysis.py # Corpus property analysis
│ ├── e2e_llm_evaluation.py # NEW: End-to-end LLM evaluation
│ └── README.md # Evaluation methodology
│
├── examples/ # Sanitized educational examples
│ ├── sanitized_scenarios.json # Attack scenario descriptions (no exploits)
│ ├── benign_document_templates.txt # Example benign document structures
│ ├── detection_examples.py # How to use detection framework
│ └── README.md # Examples documentation
│
├── data/ # Dataset information (no actual data)
│ ├── security_se_instructions.md # How to obtain Security Stack Exchange
│ ├── fever_instructions.md # How to obtain FEVER dataset
│ └── corpus_statistics.json # Corpus metadata (sizes, domains)
│
├── experiments/ # Experiment scripts (March 2026)
│ ├── exp1_fever_large_scale.py # FEVER n=25 evaluation
│ ├── exp2_multimodel_e2e.py # Multi-model E2E (5 LLMs)
│ ├── exp3_joint_hybrid_attack.py # Joint sparse+dense optimization
│ ├── setup_data.py # Data download and embedding setup
│ └── requirements.txt # Experiment dependencies
│
├── results/ # Experimental results
│ ├── e2e_evaluation_results.json # Dec 2025: E2E LLM evaluation
│ ├── panw_case_study.json # Dec 2025: Production case study
│ ├── march-2026/ # March 2026 experiments
│ │ ├── exp1_fever_large_scale_results.json
│ │ ├── exp2_multimodel_e2e_results.json
│ │ └── exp3_joint_hybrid_attack_results.json
│ └── README.md # Results documentation
│
├── paper/ # Paper materials
│ ├── paper.pdf # Main paper (arXiv version)
│ ├── supplementary.pdf # Supplementary materials
│ └── figures/ # Paper figures (PNG, 300 DPI)
│
├── docs/ # Documentation
│ ├── REPRODUCIBILITY.md # Step-by-step reproduction guide
│ ├── ETHICAL_CONSIDERATIONS.md # Ethics and responsible use
│ ├── DEPLOYMENT_GUIDE.md # How to deploy defenses
│ └── FAQ.md # Frequently asked questions
│
└── requirements.txt # Python dependencies
```
---
## Defensive Focus
This repository provides:
- **Detection methods** - 5 detection approaches with complete implementations
- **Defense mechanisms** - Hybrid retrieval and monitoring strategies
- **Evaluation tools** - Metrics, statistical tests, ROC analysis
- **Corpus analysis** - Understanding corpus-dependent security properties
- **Deployment guides** - Practical guidance for securing RAG systems
**NOT included** - Working attack implementations, weaponizable exploits, malicious document generation
---
## Quick Start
### Installation
```bash
git clone https://github.com/scthornton/semantic-chameleon
cd semantic-chameleon
pip install -r requirements.txt
```
### Run Detection Framework
```python
from detection.query_pattern import QueryPatternDetector
from defense.hybrid_retrieval import HybridRetriever

# Initialize detector
detector = QueryPatternDetector(
    benign_queries=100,    # Sample from production logs
    sensitive_queries=20,  # Domain-specific attack patterns
)

# Initialize hybrid defense
retriever = HybridRetriever(alpha=0.5)  # Balanced BM25+vector

# Analyze corpus
# `corpus` is your own document collection and must be loaded before this step
# (see data/ for how to obtain the datasets used in the paper)
results = detector.analyze_corpus(corpus, threshold=0.2)
print(f"Detected: {results['flagged_documents']} suspicious documents")
```
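
This README does not spell out how the query differential is computed, so the following is one plausible reading, not the repository's method: compare how often each document is retrieved for sensitive queries versus benign queries, and flag documents whose difference exceeds a threshold. Every name here (`retrieval_rate`, `flag_by_differential`, `toy_retrieve`) and the toy corpus are hypothetical, and the token-overlap retriever is a stand-in for a real one.

```python
from collections import Counter

def retrieval_rate(retrieve, queries, k):
    """Fraction of queries for which each document id appears in the top-k."""
    hits = Counter()
    for q in queries:
        hits.update(set(retrieve(q, k)))
    return {doc_id: count / len(queries) for doc_id, count in hits.items()}

def flag_by_differential(retrieve, benign, sensitive, k=3, threshold=0.2):
    """Flag documents retrieved much more often for sensitive than benign queries."""
    benign_rate = retrieval_rate(retrieve, benign, k)
    sensitive_rate = retrieval_rate(retrieve, sensitive, k)
    flagged = {}
    for doc_id, rate in sensitive_rate.items():
        diff = rate - benign_rate.get(doc_id, 0.0)
        if diff > threshold:
            flagged[doc_id] = diff
    return flagged

# Toy corpus and retriever (assumptions for illustration only)
toy_corpus = {
    "doc_a": "reset your password from the account settings page",
    "doc_b": "rotate api keys every ninety days",
    "doc_c": "admin credentials override bypass authentication token",
}

def toy_retrieve(query, k):
    """Rank documents by the number of query tokens they contain."""
    q = set(query.lower().split())
    scored = [(len(q & set(text.split())), doc_id) for doc_id, text in toy_corpus.items()]
    scored = [(s, d) for s, d in scored if s > 0]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [d for _, d in scored[:k]]

benign = ["how do i reset my password", "where is the settings page", "rotate api keys"]
sensitive = ["bypass authentication", "admin credentials", "override token"]
flagged = flag_by_differential(toy_retrieve, benign, sensitive, k=1, threshold=0.2)
print(flagged)
```

A differential like this is a triage signal, not a verdict. Legitimate security documentation would also score high, so flagged documents would still need review.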
### Deploy Hybrid Defense
```python
from defense.hybrid_retrieval import HybridRetriever

# Security-critical configuration (recommended)
retriever = HybridRetriever(
    alpha=0.5,    # 50% vector, 50% BM25
    bm25_k1=1.5,  # Standard Okapi BM25 term-frequency saturation
    bm25_b=0.75,  # Standard Okapi BM25 length normalization
)

# Retrieve with defense (`query` is the user's query string)
results = retriever.retrieve(query, k=10)
```
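
For readers unfamiliar with the `bm25_k1` and `bm25_b` parameters, here is a compact sketch of Okapi BM25 scoring. It uses one common IDF variant (with `+ 1` inside the logarithm so scores stay non-negative) and whitespace tokenization. Both are assumptions, and `defense/bm25_implementation.py` may differ in detail.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score tokenized documents against a tokenized query with Okapi BM25.

    k1 controls term-frequency saturation; b controls length normalization.
    """
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Toy documents (made up for illustration)
docs = [
    "tls certificate rotation policy".split(),
    "password reset steps for users".split(),
    "certificate pinning for mobile apps".split(),
]
scores = bm25_scores("certificate rotation".split(), docs)
print(scores)
```

A document with no query terms scores exactly zero. In a hybrid retriever, that is what penalizes text tuned only for embedding similarity.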
---
## Reproducing Paper Results
See [`docs/REPRODUCIBILITY.md`](docs/REPRODUCIBILITY.md) for complete step-by-step instructions.
**High-level overview**:
1. **Obtain datasets**: Security Stack Exchange dump + FEVER dataset
2. **Run detection evaluation**: `python evaluation/run_detection.py`
3. **Test hybrid defense**: `python evaluation/run_hybrid_defense.py`
4. **Generate figures**: `python evaluation/generate_figures.py`
**Expected compute**: ~8-16 hours on GCP n1-standard-8 (or equivalent)
---
## Research Ethics
This research follows responsible disclosure practices:
- **Defensive focus**: All materials prioritize understanding defenses
- **No weaponization**: Attack implementations are conceptual only
- **Sanitized examples**: All examples use non-exploitable scenarios
- **Coordinated disclosure**: Vulnerabilities reported to affected vendors
See [`docs/ETHICAL_CONSIDERATIONS.md`](docs/ETHICAL_CONSIDERATIONS.md) for full ethics statement.
---
## Citation
If you use this research or code, please cite:
```bibtex
@article{thornton2025semantic,
author = {Thornton, Scott},
title = {Semantic Chameleon: Corpus-Dependent Poisoning Attacks and Defenses in RAG Systems},
year = {2025},
doi = {10.5281/zenodo.18080200},
url = {https://doi.org/10.5281/zenodo.18080200},
publisher = {Zenodo}
}
```
**Paper:** [https://doi.org/10.5281/zenodo.18080200](https://doi.org/10.5281/zenodo.18080200)
**Code:** [https://doi.org/10.5281/zenodo.18079735](https://doi.org/10.5281/zenodo.18079735)
---
## Contributing
We welcome contributions that advance RAG security defenses:
- Detection method improvements
- New defense mechanisms
- Evaluation tools
- Documentation improvements
**Not accepted**: Attack implementations, weaponizable code, malicious examples
See [`CONTRIBUTING.md`](CONTRIBUTING.md) for guidelines.
---
## Contact
**Scott Thornton**
- Website: https://perfecxion.ai
- Email: scott@perfecxion.ai
- Paper: [https://doi.org/10.5281/zenodo.18080200](https://doi.org/10.5281/zenodo.18080200)
- GitHub: https://github.com/scthornton/semantic-chameleon
**Security Issues**: Please report via [SECURITY.md](SECURITY.md)
---
## License
MIT License - see [`LICENSE`](LICENSE) for details.
**Responsible Use Clause**: By using this code, you agree to use it only for defensive security research, system hardening, and educational purposes. Malicious use is prohibited and violates the terms of this license.
---
## Acknowledgments
- Security Stack Exchange community for public dataset
- FEVER dataset maintainers
- Google Cloud Platform for computational resources
- OpenAI for embedding API access
---
**Last Updated**: March 2026
**Paper DOI**: [10.5281/zenodo.18080200](https://doi.org/10.5281/zenodo.18080200)
**Code DOI**: [10.5281/zenodo.18079735](https://doi.org/10.5281/zenodo.18079735)
**Status**: Published on Zenodo (defensive research materials)