https://github.com/dmis-lab/retpo

[NAACL 2025 Findings] Ask Optimal Questions: Aligning Large Language Models with Retriever's Preference in Conversational Search

Host: GitHub
URL: https://github.com/dmis-lab/retpo
Owner: dmis-lab
Created: 2025-01-31T04:13:12.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2025-04-30T08:03:14.000Z (about 1 year ago)
Last Synced: 2025-04-30T09:22:58.835Z (about 1 year ago)
Language: Python
Homepage: https://arxiv.org/abs/2402.11827
Size: 624 KB
Stars: 3
Watchers: 2
Forks: 0
Open Issues: 2
Metadata Files:
- Readme: README.md


# RetPO

Official implementation of "[Ask Optimal Questions: Aligning Large Language Models with Retriever's Preference in Conversational Search](https://arxiv.org/abs/2402.11827)".

> [Chanwoong Yoon¹*](https://scholar.google.com/citations?user=-9GfY0AAAAAJ&hl=en), [Gangwoo Kim¹*](https://scholar.google.com/citations?user=TmWGEFgAAAAJ&hl=en), [Byeongguk Jeon¹](https://scholar.google.com/citations?user=_Kw32VoAAAAJ&hl=en), [Sungdong Kim²,³](https://scholar.google.com/citations?user=xKrSnDoAAAAJ&hl=en), [Yohan Jo⁴](https://scholar.google.com/citations?user=xp3LGRQAAAAJ&hl=en), [Jaewoo Kang¹](https://scholar.google.co.kr/citations?user=RaBZafQAAAAJ&hl=en)

> Korea University¹, NAVER Cloud², KAIST AI³, Seoul National University⁴

> In Findings of NAACL 2025.

📃 Paper | 🤗 Model | 🤗 RF-Collection

Overview Image

> **Abstract** Conversational search, unlike single-turn retrieval tasks, requires understanding the current question within a dialogue context. The common approach of rewrite-then-retrieve aims to decontextualize questions to be self-sufficient for off-the-shelf retrievers, but most existing methods produce sub-optimal query rewrites due to the limited ability to incorporate signals from the retrieval results. To overcome this limitation, we present a novel framework RetPO (Retriever's Preference Optimization), which is designed to optimize a language model (LM) for reformulating search queries in line with the preferences of the target retrieval systems. The process begins by prompting a large LM to produce various potential rewrites and then collects retrieval performance for these rewrites as the retrievers' preferences. Through the process, we construct a large-scale dataset called RF collection, containing Retrievers' Feedback on over 410K query rewrites across 12K conversations. Furthermore, we fine-tune a smaller LM using this dataset to align it with the retrievers' preferences as feedback. The resulting model demonstrates superiority on two benchmarks, surpassing the previous state-of-the-art performance of rewrite-then-retrieve approaches, including GPT-3.5.

## Content
1. Installation Instructions
2. Evaluation
3. RetPO (Retriever's Preference Optimization)

## 1. Installation Instructions
Please be aware that we utilize two distinct environments.
1. retpo_search (retriever indexing and search)
2. retpo_qr (QR model training and inference)
> The base retrieval code uses faiss-gpu, which is tied to specific versions of CUDA and torch. If the versions do not match, errors may occur. Therefore, we use separate environments.

### retpo_search
Because our pipeline runs a large number of dense-retriever searches, we recommend using faiss-gpu.
```bash
# create environment
conda create -n retpo_search python=3.9 && conda activate retpo_search

# install torch
pip install torch==1.12.0+cu116 torchvision==0.13.0+cu116 torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cu116

# install faiss: choose either the CPU or the GPU build, not both
# CPU
pip install faiss-cpu==1.7.3
# GPU
pip install https://github.com/kyamagu/faiss-wheels/releases/download/v1.7.3/faiss_gpu-1.7.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

# other requirements
pip install -r requirements.txt

```
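
Since faiss-gpu is tied to specific CUDA and torch versions, it can help to confirm the pins before indexing. The following is a minimal sketch, not part of the repository: it only reads installed package metadata with the standard library, the helper names `installed_version` and `check_pins` are hypothetical, and the expected versions are copied from the install commands above.

```python
from importlib import metadata

# Pins copied from the install commands above. "faiss" is satisfied by
# either the CPU wheel (faiss-cpu) or the GPU wheel (faiss-gpu).
EXPECTED = {"torch": "1.12.0+cu116", "faiss": "1.7.3"}


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pins(lookup=installed_version):
    """Compare installed versions with EXPECTED and return a list of problems."""
    problems = []
    torch_version = lookup("torch")
    if torch_version != EXPECTED["torch"]:
        problems.append(f"torch: expected {EXPECTED['torch']}, found {torch_version}")
    faiss_version = lookup("faiss-gpu") or lookup("faiss-cpu")
    if faiss_version != EXPECTED["faiss"]:
        problems.append(f"faiss: expected {EXPECTED['faiss']}, found {faiss_version}")
    return problems


if __name__ == "__main__":
    # Run inside the activated retpo_search environment.
    for line in check_pins() or ["retpo_search pins look consistent"]:
        print(line)
```

If you installed a different torch variant for your hardware, adjust `EXPECTED` accordingly; the check is only as strict as the pins you give it.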

### retpo_qr
```bash
# create environment
cd retpo_qr/
conda create -n retpo_qr python=3.10 && conda activate retpo_qr

# install torch
pip install torch==2.1.0 # this specific version is crucial for reproducibility. you may need to install other variants based on your hardware.

# install dependencies
python -m pip install .

# Flash Attention 2 (optional, but recommended for faster training)
python -m pip install flash-attn --no-build-isolation
# If your machine has less than 96GB of RAM and many CPU cores, limit the parallel build jobs instead, e.g.:
# MAX_JOBS=4 python -m pip install flash-attn --no-build-isolation
```

## 2. Evaluation

### Preparation
We evaluate our method mainly with two retrievers, BM25 and ANCE, on two conversational QA benchmarks, TopiOCQA and QReCC.

Well-organized repositories already exist for preprocessing these datasets and indexing their passages for retrieval, and we recommend running them before using our code. We mainly follow _ConvGQR_.

Specifically, to run our code, you need to prepare the following files.

> You can find the code to prepare these folders here:
> - `pyserini_index/`: https://github.com/fengranMark/ConvGQR/blob/main/bm25/create_index.sh
> - `tokenized/`: https://github.com/fengranMark/ConvGQR/blob/main/gen_tokenized_doc.py
> - `embeddings/`: https://github.com/fengranMark/ConvGQR/blob/main/gen_doc_embeddings.py

```bash
ROOT_DIR/
└── datasets/
    ├── checkpoints                   # Retriever checkpoints
    │   └── ad-hoc-ance-msmarco       # https://huggingface.co/3ricL/ad-hoc-ance-msmarco
    ├── topiocqa/
    │   ├── pyserini_index/
    │   ├── full_wiki_segments.tsv
    │   ├── tokenized/
    │   └── embeddings/
    └── qrecc/
        ├── pyserini_index/
        ├── full_wiki_segments.tsv
        ├── tokenized/
        └── embeddings/
```
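
Before launching evaluation, a quick check that the expected files are in place can save a failed run. The sketch below is illustrative only and not part of the repository: `missing_paths` is a hypothetical helper, and it assumes that `checkpoints`, `topiocqa/`, and `qrecc/` all sit under `ROOT_DIR/datasets`, which is one reading of the layout above.

```python
from pathlib import Path

# Entries expected inside each dataset folder, taken from the layout above.
REQUIRED_PER_DATASET = ["pyserini_index", "full_wiki_segments.tsv", "tokenized", "embeddings"]
DATASETS = ["topiocqa", "qrecc"]


def missing_paths(root_dir):
    """Return the expected paths that do not exist under root_dir/datasets.

    Assumption: checkpoints and both dataset folders live under datasets/.
    """
    base = Path(root_dir) / "datasets"
    expected = [base / "checkpoints" / "ad-hoc-ance-msmarco"]
    for name in DATASETS:
        expected += [base / name / entry for entry in REQUIRED_PER_DATASET]
    return [path for path in expected if not path.exists()]


if __name__ == "__main__":
    # Replace "." with your ROOT_DIR.
    for path in missing_paths("."):
        print(f"missing: {path}")
```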

### Reproduce our performance
To reproduce our reported performance, download the queries generated by RetPO from [this Google Drive](https://drive.google.com/drive/folders/1YyXpzb8QXjaKajI1kNKywAn3ZZtSGCDe?usp=drive_link) and place them in ```ROOT_DIR/distill_outputs``` inside the repository.

You can reproduce our main results by running the following command.
```bash
cd eval
bash ./scripts/bm25_topiocqa.sh
```
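
For readers unfamiliar with how a retrieval run is scored, the sketch below computes two common ranking metrics, MRR and Recall@k, from ranked passage IDs. It is a generic illustration under stated assumptions, not the repository's evaluation code: the function names and the `run`/`qrels` dictionary shapes are hypothetical, and the scripts above may report additional or differently computed metrics.

```python
def reciprocal_rank(ranked_ids, gold_ids):
    """Return 1/rank of the first gold passage in the ranking, or 0.0 if none appears."""
    for rank, passage_id in enumerate(ranked_ids, start=1):
        if passage_id in gold_ids:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked_ids, gold_ids, k):
    """Return the fraction of gold passages found in the top-k results."""
    if not gold_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(gold_ids))
    return hits / len(gold_ids)


def evaluate(run, qrels, k=10):
    """Average both metrics over all queries in qrels.

    run:   query id -> list of passage ids, best first (hypothetical format)
    qrels: query id -> set of gold passage ids (hypothetical format)
    """
    n = len(qrels)
    mrr = sum(reciprocal_rank(run.get(q, []), gold) for q, gold in qrels.items()) / n
    recall = sum(recall_at_k(run.get(q, []), gold, k) for q, gold in qrels.items()) / n
    return {"MRR": mrr, f"Recall@{k}": recall}


if __name__ == "__main__":
    run = {"q1": ["p3", "p1", "p2"], "q2": ["p9", "p8"]}
    qrels = {"q1": {"p1"}, "q2": {"p7"}}
    print(evaluate(run, qrels, k=10))
```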

## 3. RetPO (Retriever's Preference Optimization)

### Download RF-Collection
We construct a large-scale dataset called RF-COLLECTION, containing Retrievers’ Feedback on over 410K query rewrites across 12K conversations.
You can download it from [Hugging Face](https://huggingface.co/datasets/dmis-lab/RF-Collection) with the following snippet.
```python
from datasets import load_dataset

# Set ROOT_DIR to the path of your local copy of this repository.
ROOT_DIR = "."

ds = load_dataset("dmis-lab/RF-Collection", cache_dir=f"{ROOT_DIR}/retpo_qr/")
```
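
To illustrate how retriever feedback on candidate rewrites can be turned into preference data, the sketch below ranks rewrites by a retrieval score and emits pairwise preferences. This is an assumption-laden illustration, not the authors' exact construction procedure: `build_preference_pairs`, the `(rewrite, score)` input shape, and the `prompt`/`chosen`/`rejected` field names are hypothetical and do not reflect the actual RF-Collection schema.

```python
from itertools import combinations


def build_preference_pairs(question, candidates, min_gap=0.0):
    """Turn scored rewrites into pairwise preferences.

    question:   the conversational question used as the prompt
    candidates: list of (rewrite, score) tuples, where a higher score means
                better retrieval, e.g. the reciprocal rank of the gold passage
    min_gap:    minimum score difference required to form a pair
    """
    # Sort best first; combinations() then yields (better, worse) in that order.
    ranked = sorted(candidates, key=lambda item: item[1], reverse=True)
    pairs = []
    for (better, better_score), (worse, worse_score) in combinations(ranked, 2):
        # Skip ties and near-ties, which carry no clear preference signal.
        if better_score - worse_score > min_gap:
            pairs.append({"prompt": question, "chosen": better, "rejected": worse})
    return pairs


if __name__ == "__main__":
    candidates = [
        ("Who directed it?", 0.0),
        ("Who directed the film Inception?", 1.0),
        ("Inception director", 0.5),
    ]
    for pair in build_preference_pairs("Who directed it?", candidates):
        print(pair["chosen"], ">", pair["rejected"])
```

Raising `min_gap` keeps only pairs where the retriever clearly prefers one rewrite, trading dataset size for cleaner preference signal.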

## Citation

```bibtex
@article{yoon2024ask,
title={Ask Optimal Questions: Aligning Large Language Models with Retriever's Preference in Conversational Search},
author={Yoon, Chanwoong and Kim, Gangwoo and Jeon, Byeongguk and Kim, Sungdong and Jo, Yohan and Kang, Jaewoo},
journal={arXiv preprint arXiv:2402.11827},
year={2024}
}
```

## Contact
For more information or any questions about our work, feel free to contact me (cwyoon99 (at) korea.ac.kr or gmail.com).
