https://github.com/declare-lab/trust-align

Codes and datasets for the paper Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse
- Host: GitHub
- URL: https://github.com/declare-lab/trust-align
- Owner: declare-lab
- Created: 2024-09-17T12:49:59.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2025-03-03T08:00:13.000Z (3 months ago)
- Last Synced: 2025-03-27T18:21:29.279Z (2 months ago)
- Topics: rag, retrieval-augmented-generation
- Language: Python
- Homepage:
- Size: 2.41 MB
- Stars: 47
- Watchers: 1
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse (ICLR 2025, Oral, top 1.8%)
> 📣 4/2/25: We have updated our repo structure to hopefully be more user-friendly!
> 📣 31/1/25: We have open-sourced the Trust-Aligned models [here](https://huggingface.co/collections/declare-lab/trust-align-679491760dd03cc5f4d479e6)!
> 📣 22/1/25: This paper has been accepted to ICLR 2025!
This repository contains the original implementation of [Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse](https://arxiv.org/abs/2409.11242) (accepted at ICLR 2025). There are two parts to this repository:
1. Trust-Align: A preference dataset and framework that aligns LLMs to be more trustworthy, as measured by higher Trust-Score.
2. Trust-Eval: A framework to evaluate the trustworthiness of inline-cited outputs generated by large language models (LLMs) within the Retrieval-Augmented Generation (RAG) setting.
**Paper abstract:**
LLMs are an integral part of retrieval-augmented generation (RAG) systems. While many studies focus on evaluating the quality of end-to-end RAG systems, there is a lack of research on understanding the appropriateness of an LLM for the RAG task. Thus, we introduce a new metric, Trust-Score, that provides a holistic evaluation of the trustworthiness of LLMs in an RAG framework. We show that various prompting methods, such as in-context learning, fail to adapt LLMs effectively to the RAG task. Thus, we propose Trust-Align, a framework to align LLMs for higher Trust-Score. LLaMA-3-8b, aligned with our method, significantly outperforms open-source LLMs of comparable sizes on ASQA (↑10.7), QAMPARI (↑29.2), and ELI5 (↑14.9).
## Data
The **evaluation** dataset used in Trust-Eval is available on [Trust-Align Huggingface](https://huggingface.co/datasets/declare-lab/Trust-Score/tree/main/Trust-Score).
The **SFT and DPO training** dataset used in Trust-Align is also available on [Trust-Align Huggingface](https://huggingface.co/datasets/declare-lab/Trust-Score/tree/main/Trust-Align).
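If you prefer to fetch the data programmatically, below is a minimal sketch using `huggingface_hub` (a convenience, not part of the official instructions; the repo id and folder names come from the links above, while the exact file names inside each folder are not listed here):

```python
# Minimal sketch: download the Trust-Eval and Trust-Align data folders locally.
# Assumes `huggingface_hub` is installed (`pip install huggingface_hub`).
from huggingface_hub import snapshot_download

# Evaluation data used by Trust-Eval (Trust-Score/ folder of the dataset repo).
eval_dir = snapshot_download(
    repo_id="declare-lab/Trust-Score",
    repo_type="dataset",
    allow_patterns=["Trust-Score/*"],
)

# SFT and DPO training data used by Trust-Align (Trust-Align/ folder).
train_dir = snapshot_download(
    repo_id="declare-lab/Trust-Score",
    repo_type="dataset",
    allow_patterns=["Trust-Align/*"],
)

print("Evaluation data at:", eval_dir)
print("Training data at:", train_dir)
```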
## Trust-Eval
Trust-Eval quantifies trustworthiness along three main axes using Trust-Score:
1. **Response Correctness**: Correctness of the generated claims
2. **Attribution Quality**: Quality of the citations generated. Concerns the recall (Are generated statements well supported by their set of citations?) and precision (Are the citations relevant to the statements?) of citations (a toy sketch follows this list).
3. **Refusal Groundedness**: Ability of the model to discern if the question can be answered given the documents
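To make the attribution axis concrete, here is a toy sketch of citation recall and precision. This is not the Trust-Eval implementation (Trust-Eval judges support with an NLI model and follows the definitions in the paper); the `entails(premise, claim)` callable below is a hypothetical stand-in for such a judge, and the precision rule is a deliberate simplification:

```python
# Toy sketch of citation recall/precision for inline-cited RAG outputs.
from typing import Callable, Sequence


def attribution_quality(
    statements: Sequence[str],            # generated statements/claims
    citations: Sequence[Sequence[int]],   # per-statement cited document indices
    docs: Sequence[str],                  # retrieved documents
    entails: Callable[[str, str], bool],  # entails(premise, claim) -> bool, e.g. an NLI judge
) -> tuple[float, float]:
    recall_hits = 0       # statements supported by their full citation set
    precision_hits = 0    # individual citations that support their statement
    total_citations = 0
    for stmt, cited in zip(statements, citations):
        cited_text = " ".join(docs[i] for i in cited)
        if cited and entails(cited_text, stmt):
            recall_hits += 1
        for i in cited:
            total_citations += 1
            if entails(docs[i], stmt):
                precision_hits += 1
    citation_recall = recall_hits / max(len(statements), 1)
    citation_precision = precision_hits / max(total_citations, 1)
    return citation_recall, citation_precision
```

In practice the entailment calls would be batched through a model; the plain loop here is only meant to show how the two ratios relate to the definitions above.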
We release Trust-Eval as a standalone package. You can install it by following the steps below:
1. **Set up a Python environment**
```bash
conda create -n trust_eval python=3.10.13
conda activate trust_eval
```

2. **Install dependencies**
```bash
pip install trust_eval
```

> Note that vLLM will be installed with CUDA 12.1. Please ensure your CUDA setup is compatible (a quick sanity check is sketched after these steps).
3. **Set up NLTK**
```python
import nltk
nltk.download('punkt_tab')
```

Please refer to [Trust-Eval README](./trust_eval/README.md) for more information.
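Optionally, you can sanity-check the environment after installation. This is not part of the official setup; it only confirms which CUDA version your PyTorch build targets (vLLM is installed against CUDA 12.1, per the note above) and that the NLTK `punkt_tab` resource is available:

```python
# Optional post-install checks (assumes torch was pulled in by the dependencies).
import nltk
import torch

print("PyTorch CUDA build:", torch.version.cuda)   # expect something like "12.1"
print("GPU visible:", torch.cuda.is_available())

try:
    nltk.data.find("tokenizers/punkt_tab")
    print("punkt_tab: already downloaded")
except LookupError:
    nltk.download("punkt_tab")
```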
## Trust-Align
### Set up
```bash
conda create -n cite python=3.10.13
conda activate cite
pip install -r requirements.txt
```

We use the latest version of `alignment-handbook` for training (version `alignment-handbook-0.4.0.dev0`). We followed the installation instructions from the [alignment-handbook repository](https://github.com/huggingface/alignment-handbook):
```bash
git clone https://github.com/huggingface/alignment-handbook.git
cd ./alignment-handbook/
python -m pip install .
```

Please refer to [Trust-Align README](./trust_align/README.md) for more information.
## Bugs or Questions?
If you have any questions related to the code or the paper, feel free to email Shang Hong (`[email protected]`). If you encounter any problems when using the code, or want to report a bug, you can open an issue. Please try to describe the problem in detail so we can help you better and faster!
## Citation
If you find our code, data, models, or the paper useful, please cite the paper:
```bibtex
@misc{song2024measuringenhancingtrustworthinessllms,
title={Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse},
author={Maojia Song and Shang Hong Sim and Rishabh Bhardwaj and Hai Leong Chieu and Navonil Majumder and Soujanya Poria},
year={2024},
eprint={2409.11242},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2409.11242},
}
```