https://github.com/declare-lab/trust-align

Codes and datasets for the paper Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse
- Host: GitHub
- URL: https://github.com/declare-lab/trust-align
- Owner: declare-lab
- Created: 2024-09-17T12:49:59.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2025-03-03T08:00:13.000Z (3 months ago)
- Last Synced: 2025-03-27T18:21:29.279Z (2 months ago)
- Topics: rag, retrieval-augmented-generation
- Language: Python
- Homepage:
- Size: 2.41 MB
- Stars: 47
- Watchers: 1
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse (ICLR 2025, Oral, top 1.8%)
> 📣 4/2/25: We have updated our repo structure to hopefully be more user-friendly!
> 📣 31/1/25: We have open-sourced the Trust-Aligned models [here](https://huggingface.co/collections/declare-lab/trust-align-679491760dd03cc5f4d479e6)!
> 📣 22/1/25: This paper has been accepted to ICLR 2025!
This repository contains the original implementation of [Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse](https://arxiv.org/abs/2409.11242) (accepted at ICLR 2025). There are two parts to this repository:
1. Trust-Align: A preference dataset and framework that aligns LLMs to be more trustworthy, as measured by higher Trust-Score.
2. Trust-Eval: A framework to evaluate the trustworthiness of inline-cited outputs generated by large language models (LLMs) within the Retrieval-Augmented Generation (RAG) setting.
**Paper abstract:**
LLMs are an integral part of retrieval-augmented generation (RAG) systems. While many studies focus on evaluating the quality of end-to-end RAG systems, there is a lack of research on understanding the appropriateness of an LLM for the RAG task. Thus, we introduce a new metric, Trust-Score, that provides a holistic evaluation of the trustworthiness of LLMs in an RAG framework. We show that various prompting methods, such as in-context learning, fail to adapt LLMs effectively to the RAG task. Thus, we propose Trust-Align, a framework to align LLMs for higher Trust-Score. LLaMA-3-8b, aligned with our method, significantly outperforms open-source LLMs of comparable sizes on ASQA (↑10.7), QAMPARI (↑29.2), and ELI5 (↑14.9).
## Data
The **evaluation** dataset used in Trust-Eval is available on [Trust-Align Huggingface](https://huggingface.co/datasets/declare-lab/Trust-Score/tree/main/Trust-Score).
The **SFT and DPO training** dataset used in Trust-Align is also available on [Trust-Align Huggingface](https://huggingface.co/datasets/declare-lab/Trust-Score/tree/main/Trust-Align).
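If you prefer to fetch the data programmatically, below is a minimal sketch using `huggingface_hub` (a convenience, not part of the official instructions; the repo id and folder names come from the links above, while the exact file names inside each folder are not listed here):

```python
# Minimal sketch: download the Trust-Eval and Trust-Align data folders locally.
# Assumes `huggingface_hub` is installed (`pip install huggingface_hub`).
from huggingface_hub import snapshot_download

# Evaluation data used by Trust-Eval (Trust-Score/ folder of the dataset repo).
eval_dir = snapshot_download(
    repo_id="declare-lab/Trust-Score",
    repo_type="dataset",
    allow_patterns=["Trust-Score/*"],
)

# SFT and DPO training data used by Trust-Align (Trust-Align/ folder).
train_dir = snapshot_download(
    repo_id="declare-lab/Trust-Score",
    repo_type="dataset",
    allow_patterns=["Trust-Align/*"],
)

print("Evaluation data at:", eval_dir)
print("Training data at:", train_dir)
```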
## Trust-Eval
Trust-Eval quantifies trustworthiness along three main axes using Trust-Score:
1. **Response Correctness**: Correctness of the generated claims
2. **Attribution Quality**: Quality of the citations generated. Concerns the recall (Are generated statements well supported by their set of citations?) and precision (Are the citations relevant to the statements?) of citations (a toy sketch follows this list).
3. **Refusal Groundedness**: Ability of the model to discern if the question can be answered given the documents
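To make the attribution axis concrete, here is a toy sketch of citation recall and precision. This is not the Trust-Eval implementation (Trust-Eval judges support with an NLI model and follows the definitions in the paper); the `entails(premise, claim)` callable below is a hypothetical stand-in for such a judge, and the precision rule is a deliberate simplification:

```python
# Toy sketch of citation recall/precision for inline-cited RAG outputs.
from typing import Callable, Sequence


def attribution_quality(
    statements: Sequence[str],            # generated statements/claims
    citations: Sequence[Sequence[int]],   # per-statement cited document indices
    docs: Sequence[str],                  # retrieved documents
    entails: Callable[[str, str], bool],  # entails(premise, claim) -> bool, e.g. an NLI judge
) -> tuple[float, float]:
    recall_hits = 0       # statements supported by their full citation set
    precision_hits = 0    # individual citations that support their statement
    total_citations = 0
    for stmt, cited in zip(statements, citations):
        cited_text = " ".join(docs[i] for i in cited)
        if cited and entails(cited_text, stmt):
            recall_hits += 1
        for i in cited:
            total_citations += 1
            if entails(docs[i], stmt):
                precision_hits += 1
    citation_recall = recall_hits / max(len(statements), 1)
    citation_precision = precision_hits / max(total_citations, 1)
    return citation_recall, citation_precision
```

In practice the entailment calls would be batched through a model; the plain loop here is only meant to show how the two ratios relate to the definitions above.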
We release Trust-Eval as a standalone package. You can install it by following the steps below:
1. **Set up a Python environment**
```bash
conda create -n trust_eval python=3.10.13
conda activate trust_eval
```

2. **Install dependencies**
```bash
pip install trust_eval
```

> Note that vLLM will be installed with CUDA 12.1. Please ensure your CUDA setup is compatible (a quick sanity check is sketched after these steps).
3. **Set up NLTK**
```python
import nltk
nltk.download('punkt_tab')
```

Please refer to [Trust-Eval README](./trust_eval/README.md) for more information.
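Optionally, you can sanity-check the environment after installation. This is not part of the official setup; it only confirms which CUDA version your PyTorch build targets (vLLM is installed against CUDA 12.1, per the note above) and that the NLTK `punkt_tab` resource is available:

```python
# Optional post-install checks (assumes torch was pulled in by the dependencies).
import nltk
import torch

print("PyTorch CUDA build:", torch.version.cuda)   # expect something like "12.1"
print("GPU visible:", torch.cuda.is_available())

try:
    nltk.data.find("tokenizers/punkt_tab")
    print("punkt_tab: already downloaded")
except LookupError:
    nltk.download("punkt_tab")
```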
## Trust-Align
### Set up
```bash
conda create -n cite python=3.10.13
conda activate cite
pip install -r requirements.txt
```

We use the latest version of `alignment-handbook` for training (version `alignment-handbook-0.4.0.dev0`). We followed the installation instructions from the [alignment-handbook repository](https://github.com/huggingface/alignment-handbook):
```bash
git clone https://github.com/huggingface/alignment-handbook.git
cd ./alignment-handbook/
python -m pip install .
```

Please refer to [Trust-Align README](./trust_align/README.md) for more information.
## Bugs or Questions?
If you have any questions related to the code or the paper, feel free to email Shang Hong (`[email protected]`). If you encounter any problems when using the code, or want to report a bug, you can open an issue. Please try to describe the problem in detail so we can help you better and faster!
## Citation
If you find our code, data, models, or the paper useful, please cite the paper:
```bibtex
@misc{song2024measuringenhancingtrustworthinessllms,
title={Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse},
author={Maojia Song and Shang Hong Sim and Rishabh Bhardwaj and Hai Leong Chieu and Navonil Majumder and Soujanya Poria},
year={2024},
eprint={2409.11242},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2409.11242},
}
```