https://github.com/llaraspata/hallucinationdetection
Analyzing the correlation between Hallucinations and Knowledge Conflicts in Large Language Models
https://github.com/llaraspata/hallucinationdetection
hallucinations knowledge-conflicts llm probing
Last synced: 8 months ago
JSON representation
Analyzing the correlation between Hallucinations and Knowledge Conflicts in Large Language Models
- Host: GitHub
- URL: https://github.com/llaraspata/hallucinationdetection
- Owner: llaraspata
- Created: 2024-12-18T13:47:18.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-10-26T11:01:10.000Z (8 months ago)
- Last Synced: 2025-10-26T13:05:17.351Z (8 months ago)
- Topics: hallucinations, knowledge-conflicts, llm, probing
- Language: Jupyter Notebook
- Homepage:
- Size: 69.2 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Analyzing the correlation between Hallucinations and Knowledge Conflicts in Large Language Models
This project investigates whether hallucinations correlate to knowledge conflicts in LLMs. It provides tools and scripts to collect, analyze, and probe model outputs for factual inconsistencies, supporting research into model reliability and interpretability.
To assess if hallucinations can be detected by using knowledge conflict probing models, we implemented the pipeline illustrated in the figure below.

Vice versa, to check if knowledge conflicts can be detected by using hallucination probing models, we implemented what shown in the next figure.

## π οΈ Setup
> [!NOTE]
> This project "imports" code from several reference studies by including their repositories. As a result, you need to install dependencies for each referenced repository separately by following the setup instructions for each project below.
### Root project
1. Clone the repository:
```bash
git clone https://github.com/llaraspata/HallucinationDetection.git
git submodule update --init --recursive
cd HallucinationDetection
```
2. Create and activate a virtual environment using uv
```bash
uv venv --python 3.11.5
source .venv/bin/activate
```
3. Install dependencies:
```bash
uv pip install -r requirements.txt
```
### Hallucination probing project
1. Move to the project folder:
```bash
cd llm-hallucinations-factual-qa
```
2. Create and activate a virtual environment using uv
```bash
uv venv --python 3.11.5
source .venv/bin/activate
```
3. Install dependencies:
```bash
bash setup.sh
```
### Knowledge Conflict probing project
1. Move to the project folder:
```bash
cd SAE-based-representation-engineering
```
2. Create and activate a virtual environment using uv
```bash
uv venv --python 3.9
source .venv/bin/activate
```
3. Install dependencies:
```bash
bash ./scripts/install.sh
```
## π Datasets
Our analysis on hallucination detection involved the following datasets:
- **Mu-SHROOM (SemEval 2025)**, which collects pairs of questions and hallucinated answer. Its instances cover 14 different languages. The adopted dataset is `data/raw/labeled.json`
- **HaluEval**, available on [π€HuggingFace](https://huggingface.co/datasets/pminervini/HaluEval), which collects human-annotated pairs of (question, answer). For our purposes, we used the `dialog` subset.
- **HaluBench**, available on [π€HuggingFace](https://huggingface.co/datasets/PatronusAI/HaluBench), which collects instances sourced from real-world domains, spanning from finance to medicine for hallucination detection in Question-Answering tasks.
Our analysis on knowledge conflict detection involved the **NQ-Swap** dataset (available on [π€HuggingFace](https://huggingface.co/datasets/pminervini/NQ-Swap)), collects artificially constructed conflicting data pairs designed to test and evaluate LLMs' ability to handle knowledge conflicts in question-answering tasks.
## π§ͺ Experiments
> [!NOTE]
> If you have Internet access during computations, then remove the option `use_local` from the commands below, otherwise you have to download both models and datasets running the following commands:
> ```bash
> huggingface-cli download --repo-type dataset
> huggingface-cli download
> ```
### 1. Detect Hallucination through Knowledge Conflicts
First of all, you have to train knowledge conflict probing models. So run the following commands:
```bash
cd SAE-based-representation-engineering
source .venv/bin/activate
python -W ignore -m hallucination.probing_model.save_activations
python -W ignore -m hallucination.probing_model.activation_patterns
python -W ignore -m hallucination.probing_model.prepare_eval
python -W ignore -m hallucination.probing_model.train_probing_model
```
The last command will save all the trained probing models. You can run the cells in the notebook `SAE-based-representation-engineering/hallucination/notebook/plot_accuracy.ipynb` from Section 3, to push them in a WandB workspace. This notebook plots performance metrics for knowledge conflicts detection (in this setting only), also.
Then, you should move to the root project and run the following command to pull the model artifacts from the previous WandB workspace.
```bash
cd ../HallucinationDetection
source .venv/bin/activate
python -W ignore -m src.model.download_kc_probing_model
```
Lastly, you can run the following commands to predict and evaluate the performances of knowledge conflicts probing models on all hallucination datasets.
```bash
python -W ignore -m src.model.predict --model_name "meta-llama/Meta-Llama-3-8B" --data_name "mushroom" --use_local
python -W ignore -m src.model.predict --model_name "meta-llama/Meta-Llama-3-8B" --data_name "halu_eval" --use_local
python -W ignore -m src.model.predict --model_name "meta-llama/Meta-Llama-3-8B" --data_name "halu_bench" --use_local
python -W ignore -m src.evaluation.eval --model_name "meta-llama/Meta-Llama-3-8B" --data_name "mushroom"
python -W ignore -m src.evaluation.eval --model_name "meta-llama/Meta-Llama-3-8B" --data_name "halu_eval"
python -W ignore -m src.evaluation.eval --model_name "meta-llama/Meta-Llama-3-8B" --data_name "halu_bench"
```
The notebook `2.0-ll-results-analysis-kc.ipynb` plots the results of this last task.
### 2. Detect Knowledge Conflicts through Hallucination
First of all, you have to train collect artifacts and train hallucination probing models. So run the following commands:
```bash
cd llm-hallucinations-factual-qa
source .venv/bin/activate
python -m result_collector
python -W ignore -m classifier_model
```
Then, you can run the following commands to predict and evaluate the performances of hallucinations probing models on NQ-Swap.
```bash
python -m result_collector_kc
python -m predict_kc_by_hall
```
The notebook `llm-hallucinations-factual-qa/plot_accuracy.ipynb` plots the results for both tasks.
## π Project Structure
```
HallucinationDetection/
βββ π README.md
βββ π requirements.txt
βββ π setup.py
βββ π data/ # Mu-SHROOM dataset
βββ π src/ # Main source code for detecting hallucinations through knowledge conflicts
β βββ π data/ # Dataset loaders and processors
β βββ π model/ # Core detection models and utilities
β βββ π evaluation/ # Evaluation metrics and scripts
β βββ π visualization/ # Plotting and analysis tools
βββ π models/ # Trained probing models
βββ π notebooks/ # Analyzis notebooks
βββ π results/ # Evaluation results
βββ π predictions/ # Model predictions
βββ π scripts/ # Utility scripts
βββ π artifacts/ # Generated artifacts and cache
βββ π images/ # Documentation images and schemas
β βββ π schema/ # Architecture diagrams (SVG)
β βββ π hallucination_detection/ # Result visualizations
βββ π llm-hallucinations-factual-qa/ # Original hallucination detection research (with further implementation for our research)
βββ π SAE-based-representation-engineering/ # Original Knowledge conflict probing research (with further implementation for our research)
βββ π wandb/ # Weights & Biases experiment logs
```