{"id":32499268,"url":"https://github.com/llaraspata/hallucinationdetection","last_synced_at":"2025-10-27T15:50:11.204Z","repository":{"id":320865624,"uuid":"905270816","full_name":"llaraspata/HallucinationDetection","owner":"llaraspata","description":"Analyzing the correlation between Hallucinations and Knowledge Conflicts in Large Language Models","archived":false,"fork":false,"pushed_at":"2025-10-26T11:01:10.000Z","size":72529,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-10-26T13:05:17.351Z","etag":null,"topics":["hallucinations","knowledge-conflicts","llm","probing"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/llaraspata.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-12-18T13:47:18.000Z","updated_at":"2025-10-26T11:01:13.000Z","dependencies_parsed_at":"2025-10-26T13:17:06.242Z","dependency_job_id":null,"html_url":"https://github.com/llaraspata/HallucinationDetection","commit_stats":null,"previous_names":["llaraspata/hallucinationdetection"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/llaraspata/HallucinationDetection","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/llaraspata%2FHallucinationDetection","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/llaraspata%2FHallucinationDetection/tags","
releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/llaraspata%2FHallucinationDetection/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/llaraspata%2FHallucinationDetection/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/llaraspata","download_url":"https://codeload.github.com/llaraspata/HallucinationDetection/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/llaraspata%2FHallucinationDetection/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":281295815,"owners_count":26476759,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-27T02:00:05.855Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["hallucinations","knowledge-conflicts","llm","probing"],"created_at":"2025-10-27T15:50:10.422Z","updated_at":"2025-10-27T15:50:11.199Z","avatar_url":"https://github.com/llaraspata.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Analyzing the correlation between Hallucinations and Knowledge Conflicts in Large Language Models\n\n\nThis project investigates whether hallucinations correlate to knowledge conflicts in LLMs. 
It provides tools and scripts to collect, analyze, and probe model outputs for factual inconsistencies, supporting research into model reliability and interpretability.\n\nTo assess whether hallucinations can be detected by using knowledge conflict probing models, we implemented the pipeline illustrated in the figure below.\n![Hallucination by Knowledge Conflicts Schema](images/schema/Hallucination_by_KC.svg)\n\nConversely, to check whether knowledge conflicts can be detected by using hallucination probing models, we implemented the pipeline shown in the next figure.\n![Knowledge Conflicts by Hallucinations Schema](images/schema/KC_by_Hallucination.svg)\n\n\n## 🛠️ Setup\n\n\u003e [!NOTE]\n\u003e This project \"imports\" code from several reference studies by including their repositories. As a result, you need to install dependencies for each referenced repository separately by following the setup instructions for each project below.\n\n### Root project\n1. Clone the repository and initialize its submodules:\n```bash\ngit clone https://github.com/llaraspata/HallucinationDetection.git\ncd HallucinationDetection\ngit submodule update --init --recursive\n```\n\n2. Create and activate a virtual environment using uv:\n```bash\nuv venv --python 3.11.5\nsource .venv/bin/activate\n```\n\n3. Install dependencies:\n```bash\nuv pip install -r requirements.txt\n```\n\n### Hallucination probing project\n1. Move to the project folder:\n```bash\ncd llm-hallucinations-factual-qa\n```\n\n2. Create and activate a virtual environment using uv:\n```bash\nuv venv --python 3.11.5\nsource .venv/bin/activate\n```\n\n3. Install dependencies:\n```bash\nbash setup.sh\n```\n\n### Knowledge Conflict probing project\n1. Move to the project folder:\n```bash\ncd SAE-based-representation-engineering\n```\n\n2. Create and activate a virtual environment using uv:\n```bash\nuv venv --python 3.9\nsource .venv/bin/activate\n```\n\n3. 
Install dependencies:\n```bash\nbash ./scripts/install.sh\n```\n\n\n## 📊 Datasets\n\nOur analysis of hallucination detection involved the following datasets:\n\n- **Mu-SHROOM (SemEval 2025)**, which collects pairs of questions and hallucinated answers. Its instances cover 14 different languages. The file we used is `data/raw/labeled.json`.\n\n- **HaluEval**, available on [🤗HuggingFace](https://huggingface.co/datasets/pminervini/HaluEval), which collects human-annotated pairs of (question, answer). For our purposes, we used the `dialog` subset.\n\n- **HaluBench**, available on [🤗HuggingFace](https://huggingface.co/datasets/PatronusAI/HaluBench), which collects instances sourced from real-world domains, ranging from finance to medicine, for hallucination detection in question-answering tasks.\n\nOur analysis of knowledge conflict detection involved the **NQ-Swap** dataset (available on [🤗HuggingFace](https://huggingface.co/datasets/pminervini/NQ-Swap)), which collects artificially constructed conflicting data pairs designed to test and evaluate LLMs' ability to handle knowledge conflicts in question-answering tasks.\n\n## 🧪 Experiments\n\n\u003e [!NOTE]\n\u003e If you have Internet access during computations, remove the `--use_local` option from the commands below. Otherwise, you have to download both models and datasets beforehand by running the following commands:\n\u003e ```bash\n\u003e huggingface-cli download --repo-type dataset \u003cdataset_repo_id\u003e\n\u003e huggingface-cli download \u003cmodel_repo_id\u003e\n\u003e ```\n\n\n### 1. Detect Hallucination through Knowledge Conflicts\nFirst, you have to train the knowledge conflict probing models. 
To do so, run the following commands:\n\n```bash\ncd SAE-based-representation-engineering\nsource .venv/bin/activate\n\npython -W ignore -m hallucination.probing_model.save_activations\npython -W ignore -m hallucination.probing_model.activation_patterns\npython -W ignore -m hallucination.probing_model.prepare_eval\npython -W ignore -m hallucination.probing_model.train_probing_model\n```\n\nThe last command saves all the trained probing models. To push them to a WandB workspace, run the cells of the notebook `SAE-based-representation-engineering/hallucination/notebook/plot_accuracy.ipynb`, starting from Section 3. This notebook also plots performance metrics for knowledge conflict detection (in this setting only).\n\nThen, move back to the root project and run the following commands to pull the model artifacts from that WandB workspace.\n```bash\n# Back to the HallucinationDetection root (the parent of the submodule folder)\ncd ..\nsource .venv/bin/activate\npython -W ignore -m src.model.download_kc_probing_model\n```\n\nLastly, run the following commands to generate predictions and evaluate the performance of the knowledge conflict probing models on all hallucination datasets.\n```bash\npython -W ignore -m src.model.predict --model_name \"meta-llama/Meta-Llama-3-8B\" --data_name \"mushroom\" --use_local\npython -W ignore -m src.model.predict --model_name \"meta-llama/Meta-Llama-3-8B\" --data_name \"halu_eval\" --use_local\npython -W ignore -m src.model.predict --model_name \"meta-llama/Meta-Llama-3-8B\" --data_name \"halu_bench\" --use_local\n\npython -W ignore -m src.evaluation.eval --model_name \"meta-llama/Meta-Llama-3-8B\" --data_name \"mushroom\"\npython -W ignore -m src.evaluation.eval --model_name \"meta-llama/Meta-Llama-3-8B\" --data_name \"halu_eval\"\npython -W ignore -m src.evaluation.eval --model_name \"meta-llama/Meta-Llama-3-8B\" --data_name \"halu_bench\"\n```\n\nThe notebook `2.0-ll-results-analysis-kc.ipynb` plots the results of this last task.\n\n\n### 2. 
Detect Knowledge Conflicts through Hallucination\nFirst, you have to collect artifacts and train the hallucination probing models. To do so, run the following commands:\n\n```bash\ncd llm-hallucinations-factual-qa\nsource .venv/bin/activate\n\npython -m result_collector\npython -W ignore -m classifier_model\n```\n\nThen, run the following commands to generate predictions and evaluate the performance of the hallucination probing models on NQ-Swap.\n```bash\npython -m result_collector_kc\npython -m predict_kc_by_hall\n```\n\nThe notebook `llm-hallucinations-factual-qa/plot_accuracy.ipynb` plots the results for both tasks.\n\n\n## 📁 Project Structure\n\n```\nHallucinationDetection/\n├── 📄 README.md\n├── 📄 requirements.txt\n├── 📄 setup.py\n├── 📁 data/                                  # Mu-SHROOM dataset\n├── 📁 src/                                   # Main source code for detecting hallucinations through knowledge conflicts\n│   ├── 📁 data/                              # Dataset loaders and processors\n│   ├── 📁 model/                             # Core detection models and utilities\n│   ├── 📁 evaluation/                        # Evaluation metrics and scripts\n│   └── 📁 visualization/                     # Plotting and analysis tools\n├── 📁 models/                                # Trained probing models\n├── 📁 notebooks/                             # Analysis notebooks\n├── 📁 results/                               # Evaluation results\n├── 📁 predictions/                           # Model predictions\n├── 📁 scripts/                               # Utility scripts\n├── 📁 artifacts/                             # Generated artifacts and cache\n├── 📁 images/                                # Documentation images and schemas\n│   ├── 📁 schema/                            # Architecture diagrams (SVG)\n│   └── 📁 hallucination_detection/           # Result visualizations\n├── 📁 llm-hallucinations-factual-qa/         # Original hallucination detection research (with further 
implementation for our research)\n├── 📁 SAE-based-representation-engineering/  # Original Knowledge conflict probing research (with further implementation for our research)\n└── 📁 wandb/                                 # Weights \u0026 Biases experiment logs\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fllaraspata%2Fhallucinationdetection","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fllaraspata%2Fhallucinationdetection","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fllaraspata%2Fhallucinationdetection/lists"}