https://github.com/scthornton/semantic-chameleon

Dual-Stage Temporal Poisoning Attack on RAG Systems
artificial-intelligence machine-learning poisoning-attack rag rag-security
Host: GitHub
URL: https://github.com/scthornton/semantic-chameleon
Owner: scthornton
License: other
Created: 2025-11-17T03:24:13.000Z (7 months ago)
Default Branch: main
Last Pushed: 2026-03-09T14:49:43.000Z (3 months ago)
Last Synced: 2026-03-09T19:30:17.355Z (3 months ago)
Topics: artificial-intelligence, machine-learning, poisoning-attack, rag, rag-security
Language: Python
Homepage: https://zenodo.org/records/18080200
Size: 8.02 MB
Stars: 2
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
- Citation: CITATION.cff
- Security: SECURITY.md

# Corpus-Dependent RAG Poisoning

[![Paper DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.18080200.svg)](https://doi.org/10.5281/zenodo.18080200)

[![Code DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.18079735.svg)](https://doi.org/10.5281/zenodo.18079735)

**Research Repository for "Semantic Chameleon: Corpus-Dependent Poisoning Attacks and Defenses in RAG Systems"**

**DEFENSIVE RESEARCH ONLY**: This repository contains sanitized educational materials for understanding and defending against RAG poisoning attacks. No weaponized attack materials are included.

---

## Paper

**Paper (PDF)**: [https://doi.org/10.5281/zenodo.18080200](https://doi.org/10.5281/zenodo.18080200)

**Code (This Repo)**: [https://doi.org/10.5281/zenodo.18079735](https://doi.org/10.5281/zenodo.18079735)

**Anonymous Review Copy**: [https://anonymous.4open.science/r/semantic-chameleon-B610/](https://anonymous.4open.science/r/semantic-chameleon-B610/)

**Author**: Scott Thornton (perfecXion.ai)

**Abstract**: This work characterizes how corpus composition and retrieval architecture jointly affect RAG security. We find that technical corpora are 13-62× harder to defend than general knowledge bases, and that simple hybrid BM25+vector retrieval neutralizes gradient-optimized attacks in our experiments.

**Key Findings**:

- 38.0% co-retrieval success on pure vector retrieval (n=50, 95% CI: 25.9%-51.8%)

- Hybrid retrieval (α≤0.5) reduces co-retrieval to 0% across all 50 gradient-optimized attacks

- Joint sparse+dense optimization partially circumvents hybrid retrieval (20-44% success), so the hybrid defense raises the bar for attackers without eliminating the attack

- **Multi-model E2E** (5 LLMs): attack success 46.7% (GPT-5.3) to 93.3% (Llama 4); safety violations 6.7% (Claude) to 93.3% (Llama 4)

- **FEVER n=25**: 0% overall success across all retrieval configs, confirming corpus-dependent effects at scale

- Technical corpora show 13-62× worse detection performance than general knowledge bases

- Query Pattern Differential emerges as the most reliable detection method across corpora
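
The confidence interval quoted for the 38.0% figure can be checked by hand. Below is a minimal sketch of the standard Wilson score interval, which the repository structure names as the CI method in `evaluation/metrics.py`. The function name `wilson_interval` and the count of 19 successes (inferred from 38.0% of n=50) are assumptions made for illustration, not taken from the repository code.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Assumed count: 19 of 50 attacks co-retrieved, i.e. 38.0%.
low, high = wilson_interval(19, 50)
print(f"95% CI: {low:.1%} to {high:.1%}")
```

Under these assumptions the result should land close to the 25.9%-51.8% interval reported above. The Wilson interval is often preferred over the normal approximation at this sample size because it stays inside [0, 1] and behaves sensibly for rates near 0%.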

---

## March 2026 Updates (AISec '26 Submission)

### Multi-Model End-to-End Evaluation (5 LLMs)

Attack effectiveness varies dramatically across model families:

| Model | Attack Success | Safety Violations | Payload Leakage | Divergence |
|-------|---------------|-------------------|-----------------|------------|
| GPT-5.3 | **46.7%** (7/15) | 33.3% | 9.6% | 0.284 |
| GPT-4o | 53.3% (8/15) | 86.7% | 12.0% | 0.483 |
| GPT-4o-mini | 53.3% (8/15) | 86.7% | 14.9% | 0.418 |
| Claude Sonnet 4.6 | 60.0% (9/15) | **6.7%** | 5.7% | 0.196 |
| Llama 4 Instruct | **93.3%** (14/15) | **93.3%** | **56.8%** | 0.268 |

**Key Insight**: Safety training maturity differs widely between models. Claude shows the strongest safety boundary (6.7% violations despite 60.0% attack success). Llama 4 is the most vulnerable (93.3% attack success, only 27% clean refusal rate). GPT-5.3 shows a measurable improvement over GPT-4o (33.3% vs. 86.7% safety violations).

### Joint Sparse+Dense Optimization

A knowledgeable attacker who jointly optimizes for both BM25 and vector retrieval can partially circumvent hybrid defense:

| Attack Type | α=0.7 | α=0.5 | α=0.3 |
|-------------|-------|-------|-------|
| Gradient-only (baseline) | 0% | 0% | 0% |
| Joint optimization | **20%** | **36%** | **44%** |

**Key Insight**: Hybrid retrieval cuts gradient-only attack success from 38% (pure vector) to 0%, but an attacker who jointly optimizes for both retrievers recovers 20-44%. Hybrid retrieval is a significant defense, not an absolute one.
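
To make the role of α concrete, here is a minimal sketch of one common way to fuse dense and sparse scores: min-max normalize each score list, then blend linearly. It assumes α is the weight on the vector score, which matches this README's usage (α=1.0 is pure vector). The function names and the toy scores are hypothetical, and the normalization choice is an assumption; none of this is taken from `defense/hybrid_retrieval.py`.

```python
def min_max(scores: list[float]) -> list[float]:
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_scores(vector_scores: list[float], bm25_scores: list[float],
                  alpha: float = 0.5) -> list[float]:
    """Blend normalized scores: alpha * vector + (1 - alpha) * BM25."""
    if len(vector_scores) != len(bm25_scores):
        raise ValueError("score lists must have the same length")
    v, b = min_max(vector_scores), min_max(bm25_scores)
    return [alpha * vs + (1 - alpha) * bs for vs, bs in zip(v, b)]


def top_doc(scores: list[float]) -> int:
    """Index of the highest-scoring candidate."""
    return max(range(len(scores)), key=scores.__getitem__)


# Toy candidates (made-up numbers). Doc 0 stands in for a document tuned only
# for embedding similarity: high vector score, no keyword overlap.
vector_scores = [0.95, 0.85, 0.75]
bm25_scores = [0.0, 8.0, 10.0]

for alpha in (1.0, 0.7, 0.5, 0.3):
    fused = hybrid_scores(vector_scores, bm25_scores, alpha)
    print(f"alpha={alpha}: top document is {top_doc(fused)}")
```

The toy numbers only illustrate the mechanism: a document that scores well on one signal alone loses rank as the other signal gains weight. They are not a reproduction of the experiments in the table.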

### FEVER Large-Scale (n=25)

25 GCG-optimized attacks on FEVER Wikipedia (2,000-doc representative sample):

| Config | Co-Retrieval | Stealth | Overall Success |
|--------|-------------|---------|-----------------|
| Pure Vector (α=1.0) | 100% | 0% | **0%** |
| Hybrid (α=0.7) | 100% | 0% | **0%** |
| Hybrid (α=0.5) | 100% | 0% | **0%** |
| Hybrid (α=0.3) | 100% | 0% | **0%** |

**Key Insight**: Confirms n=9 pilot at 2.8× scale. General-vocabulary corpora make attack documents conspicuous regardless of retrieval architecture.

---

## December 2025 Updates

### End-to-End LLM Evaluation (Single Model)

Initial evaluation against GPT-4o-mini (15 attack scenarios):

| Metric | Result |
|--------|--------|
| Attack Success Rate | 60% (9/15 scenarios) |
| Safety Bypass Rate | 80% of successful attacks |
| Response Divergence | 46% average |
| Model Tested | GPT-4o-mini |

### Production RAG Case Study

Validated corpus-dependency hypothesis against a 156,777-document production corpus:

| Attack Type | Retrieval Success | Trigger Rank |
|------------|-------------------|--------------|
| Naive (generic) | 0% | N/A |
| Adaptive (corpus-optimized) | 100% | #1 |

---

## Repository Structure

```

semantic-chameleon/
├── README.md                          # This file
├── LICENSE                            # MIT License
├── SECURITY.md                        # Responsible disclosure policy
│
├── detection/                         # Detection framework (defensive only)
│   ├── semantic_drift.py             # Method 1: Embedding anomaly detection
│   ├── keyword_anomaly.py            # Method 2: IDF-based keyword detection
│   ├── query_pattern.py              # Method 3: Query differential analysis
│   ├── detection_metrics.py          # ROC, F1, AUROC evaluation
│   └── README.md                     # Detection method documentation
│
├── defense/                           # Defense implementations
│   ├── hybrid_retrieval.py           # BM25+vector hybrid scoring
│   ├── bm25_implementation.py        # Okapi BM25 with configurable params
│   └── README.md                     # Defense deployment guide
│
├── evaluation/                        # Evaluation scripts
│   ├── metrics.py                    # Success rate, CI calculation (Wilson score)
│   ├── statistical_tests.py          # Chi-square, effect size (Cohen's h)
│   ├── corpus_analysis.py            # Corpus property analysis
│   ├── e2e_llm_evaluation.py         # NEW: End-to-end LLM evaluation
│   └── README.md                     # Evaluation methodology
│
├── examples/                          # Sanitized educational examples
│   ├── sanitized_scenarios.json      # Attack scenario descriptions (no exploits)
│   ├── benign_document_templates.txt # Example benign document structures
│   ├── detection_examples.py         # How to use detection framework
│   └── README.md                     # Examples documentation
│
├── data/                              # Dataset information (no actual data)
│   ├── security_se_instructions.md   # How to obtain Security Stack Exchange
│   ├── fever_instructions.md         # How to obtain FEVER dataset
│   └── corpus_statistics.json        # Corpus metadata (sizes, domains)
│
├── experiments/                       # Experiment scripts (March 2026)
│   ├── exp1_fever_large_scale.py     # FEVER n=25 evaluation
│   ├── exp2_multimodel_e2e.py        # Multi-model E2E (5 LLMs)
│   ├── exp3_joint_hybrid_attack.py   # Joint sparse+dense optimization
│   ├── setup_data.py                 # Data download and embedding setup
│   └── requirements.txt              # Experiment dependencies
│
├── results/                           # Experimental results
│   ├── e2e_evaluation_results.json   # Dec 2025: E2E LLM evaluation
│   ├── panw_case_study.json          # Dec 2025: Production case study
│   ├── march-2026/                   # March 2026 experiments
│   │   ├── exp1_fever_large_scale_results.json
│   │   ├── exp2_multimodel_e2e_results.json
│   │   └── exp3_joint_hybrid_attack_results.json
│   └── README.md                     # Results documentation
│
├── paper/                             # Paper materials
│   ├── paper.pdf                     # Main paper (arXiv version)
│   ├── supplementary.pdf             # Supplementary materials
│   └── figures/                      # Paper figures (PNG, 300 DPI)
│
├── docs/                              # Documentation
│   ├── REPRODUCIBILITY.md            # Step-by-step reproduction guide
│   ├── ETHICAL_CONSIDERATIONS.md     # Ethics and responsible use
│   ├── DEPLOYMENT_GUIDE.md           # How to deploy defenses
│   └── FAQ.md                        # Frequently asked questions
│
└── requirements.txt                   # Python dependencies

```
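
The structure above lists Cohen's h as the effect-size measure in `evaluation/statistical_tests.py`. As a minimal sketch of the standard formula (not the repository's code; the function name `cohens_h` is an assumption), the effect size for the contrast between 38% co-retrieval on pure vector retrieval and 0% on hybrid retrieval can be computed like this:

```python
import math


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: difference of arcsine-transformed proportions."""
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("proportions must lie in [0, 1]")
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


# Contrast taken from the Key Findings: 38% (pure vector) vs 0% (hybrid).
h = cohens_h(0.38, 0.0)
print(f"Cohen's h = {h:.2f}")
```

By Cohen's conventional benchmarks (0.2 small, 0.5 medium, 0.8 large), a value above 0.8 counts as a large effect. The arcsine transform is what lets the measure handle proportions at the 0% boundary, where a simple ratio would be undefined.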

---

## Defensive Focus

This repository provides:

- **Detection methods** - 5 detection approaches with complete implementations

- **Defense mechanisms** - Hybrid retrieval and monitoring strategies

- **Evaluation tools** - Metrics, statistical tests, ROC analysis

- **Corpus analysis** - Understanding corpus-dependent security properties

- **Deployment guides** - Practical guidance for securing RAG systems

**NOT included** - Working attack implementations, weaponizable exploits, malicious document generation

---

## Quick Start

### Installation

```bash

git clone https://github.com/scthornton/semantic-chameleon
cd semantic-chameleon
pip install -r requirements.txt

```

### Run Detection Framework

```python

from detection.query_pattern import QueryPatternDetector
from defense.hybrid_retrieval import HybridRetriever

# Initialize detector
detector = QueryPatternDetector(
    benign_queries=100,   # Sample from production logs
    sensitive_queries=20  # Domain-specific attack patterns
)

# Initialize hybrid defense
retriever = HybridRetriever(alpha=0.5)  # Balanced BM25+vector

# Analyze corpus
# NOTE: `corpus` is your already-loaded document collection (loading not shown here)
results = detector.analyze_corpus(corpus, threshold=0.2)
print(f"Detected: {results['flagged_documents']} suspicious documents")

```

### Deploy Hybrid Defense

```python

from defense.hybrid_retrieval import HybridRetriever

# Security-critical configuration (recommended)
retriever = HybridRetriever(
    alpha=0.5,           # 50% vector, 50% BM25
    bm25_k1=1.5,         # Standard Okapi BM25
    bm25_b=0.75
)

# Retrieve with defense
# NOTE: `query` is the user's query string (not defined in this snippet)
results = retriever.retrieve(query, k=10)

```
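
For readers unfamiliar with the `bm25_k1` and `bm25_b` parameters, the following is a minimal, self-contained sketch of Okapi BM25 scoring over pre-tokenized documents. It is not the code in `defense/bm25_implementation.py`. The function name, the toy documents, and the choice of the non-negative IDF variant (the `+ 1` inside the logarithm, as used by Lucene) are assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_terms: list[str], docs: list[list[str]],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each tokenized document against the query with Okapi BM25."""
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for d in docs for term in set(d))

    scores = []
    for doc in docs:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_terms:
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            df = doc_freq[term]
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
            # k1 caps term-frequency saturation; b scales length normalization.
            length_norm = 1 - b + b * len(doc) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


# Toy corpus (hypothetical), already lowercased and tokenized.
docs = [
    ["reset", "password", "policy"],
    ["rotate", "api", "keys"],
    ["password", "hashing", "bcrypt", "password"],
]
print(bm25_scores(["password"], docs))
```

Because BM25 rewards exact term overlap, a document with no query terms scores zero regardless of how close its embedding is, which is the property the hybrid configuration relies on.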

---

## Reproducing Paper Results

See [`docs/REPRODUCIBILITY.md`](docs/REPRODUCIBILITY.md) for complete step-by-step instructions.

**High-level overview**:

1. **Obtain datasets**: Security Stack Exchange dump + FEVER dataset

2. **Run detection evaluation**: `python evaluation/run_detection.py`

3. **Test hybrid defense**: `python evaluation/run_hybrid_defense.py`

4. **Generate figures**: `python evaluation/generate_figures.py`

**Expected compute**: ~8-16 hours on GCP n1-standard-8 (or equivalent)

---

## Research Ethics

This research follows responsible disclosure practices:

- **Defensive focus**: All materials prioritize understanding defenses

- **No weaponization**: Attack implementations are conceptual only

- **Sanitized examples**: All examples use non-exploitable scenarios

- **Coordinated disclosure**: Vulnerabilities reported to affected vendors

See [`docs/ETHICAL_CONSIDERATIONS.md`](docs/ETHICAL_CONSIDERATIONS.md) for full ethics statement.

---

## Citation

If you use this research or code, please cite:

```bibtex

@article{thornton2025semantic,
  author    = {Thornton, Scott},
  title     = {Semantic Chameleon: Corpus-Dependent Poisoning Attacks and Defenses in RAG Systems},
  year      = {2025},
  doi       = {10.5281/zenodo.18080200},
  url       = {https://doi.org/10.5281/zenodo.18080200},
  publisher = {Zenodo}
}

```

**Paper:** [https://doi.org/10.5281/zenodo.18080200](https://doi.org/10.5281/zenodo.18080200)

**Code:** [https://doi.org/10.5281/zenodo.18079735](https://doi.org/10.5281/zenodo.18079735)

---

## Contributing

We welcome contributions that advance RAG security defenses:

- Detection method improvements

- New defense mechanisms

- Evaluation tools

- Documentation improvements

**Not accepted**: Attack implementations, weaponizable code, malicious examples

See [`CONTRIBUTING.md`](CONTRIBUTING.md) for guidelines.

---

## Contact

**Scott Thornton**

- Website: https://perfecxion.ai

- Email: scott@perfecxion.ai

- Paper: [https://doi.org/10.5281/zenodo.18080200](https://doi.org/10.5281/zenodo.18080200)

- GitHub: https://github.com/scthornton/semantic-chameleon

**Security Issues**: Please report via [SECURITY.md](SECURITY.md)

---

## License

MIT License - see [`LICENSE`](LICENSE) for details.

**Responsible Use Clause**: By using this code, you agree to use it only for defensive security research, system hardening, and educational purposes. Malicious use is prohibited and violates the terms of this license.

---

## Acknowledgments

- Security Stack Exchange community for public dataset

- FEVER dataset maintainers

- Google Cloud Platform for computational resources

- OpenAI for embedding API access

---

**Last Updated**: March 2026

**Paper DOI**: [10.5281/zenodo.18080200](https://doi.org/10.5281/zenodo.18080200)

**Code DOI**: [10.5281/zenodo.18079735](https://doi.org/10.5281/zenodo.18079735)

**Status**: Published on Zenodo (defensive research materials)