https://github.com/dmis-lab/retpo
[NAACL 2025 Findings] Ask Optimal Questions: Aligning Large Language Models with Retriever's Preference in Conversational Search
- Host: GitHub
- URL: https://github.com/dmis-lab/retpo
- Owner: dmis-lab
- Created: 2025-01-31T04:13:12.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-04-30T08:03:14.000Z (about 1 year ago)
- Last Synced: 2025-04-30T09:22:58.835Z (about 1 year ago)
- Language: Python
- Homepage: https://arxiv.org/abs/2402.11827
- Size: 624 KB
- Stars: 3
- Watchers: 2
- Forks: 0
- Open Issues: 2
# RetPO
Official implementation of "[Ask Optimal Questions: Aligning Large Language Models with Retriever's Preference](https://arxiv.org/abs/2402.11827)".
> [Chanwoong Yoon<sup>1*</sup>](https://scholar.google.com/citations?user=-9GfY0AAAAAJ&hl=en), [Gangwoo Kim<sup>1*</sup>](https://scholar.google.com/citations?user=TmWGEFgAAAAJ&hl=en), [Byeongguk Jeon<sup>1</sup>](https://scholar.google.com/citations?user=_Kw32VoAAAAJ&hl=en), [Sungdong Kim<sup>2,3</sup>](https://scholar.google.com/citations?user=xKrSnDoAAAAJ&hl=en), [Yohan Jo<sup>4</sup>](https://scholar.google.com/citations?user=xp3LGRQAAAAJ&hl=en), [Jaewoo Kang<sup>1</sup>](https://scholar.google.co.kr/citations?user=RaBZafQAAAAJ&hl=en)
> Korea University<sup>1</sup>, NAVER Cloud<sup>2</sup>, KAIST AI<sup>3</sup>, Seoul National University<sup>4</sup>
> In NAACL 2025.
[Paper](https://arxiv.org/abs/2402.11827) | Model | [RF-Collection](https://huggingface.co/datasets/dmis-lab/RF-Collection)
> **Abstract** Conversational search, unlike single-turn retrieval tasks, requires understanding the current question within a dialogue context. The common approach of rewrite-then-retrieve aims to decontextualize questions to be self-sufficient for off-the-shelf retrievers, but most existing methods produce sub-optimal query rewrites due to the limited ability to incorporate signals from the retrieval results. To overcome this limitation, we present a novel framework RetPO (Retriever's Preference Optimization), which is designed to optimize a language model (LM) for reformulating search queries in line with the preferences of the target retrieval systems. The process begins by prompting a large LM to produce various potential rewrites and then collects retrieval performance for these rewrites as the retrievers' preferences. Through the process, we construct a large-scale dataset called RF collection, containing Retrievers' Feedback on over 410K query rewrites across 12K conversations. Furthermore, we fine-tune a smaller LM using this dataset to align it with the retrievers' preferences as feedback. The resulting model demonstrates superiority on two benchmarks, surpassing the previous state-of-the-art performance of rewrite-then-retrieve approaches, including GPT-3.5.
## Content
1. Installation Instructions
2. Evaluation
3. RetPO (Retriever's Preference Optimization)
## 1. Installation Instructions
Please be aware that we utilize two distinct environments.
1. retpo_search (retriever indexing and search)
2. retpo_qr (QR model training and inference)
> The base retrieval code uses faiss-gpu, which is tied to specific versions of CUDA and torch. If the versions do not match, errors may occur. Therefore, we use separate environments.
### retpo_search
Because our pipeline runs a large number of dense-retrieval searches, we recommend using faiss-gpu.
```bash
# create environment
conda create -n retpo_search python=3.9 && conda activate retpo_search
# install torch
pip install torch==1.12.0+cu116 torchvision==0.13.0+cu116 torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cu116
# install faiss: pick ONE of the two options below (faiss-cpu or faiss-gpu)
# CPU
pip install faiss-cpu==1.7.3
# GPU
pip install https://github.com/kyamagu/faiss-wheels/releases/download/v1.7.3/faiss_gpu-1.7.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
# other requirements
pip install -r requirements.txt
```
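Since a version mismatch between torch and faiss is the stated reason for keeping a separate environment, a quick check after installation can save debugging time. The following is a minimal sketch, not part of the repository: it assumes the distribution names `torch`, `faiss-gpu`, and `faiss-cpu`, and the helper names `installed_version` and `check_pins` are hypothetical.

```python
from importlib import metadata

# Pins copied from the install commands above.
EXPECTED = {"torch": "1.12.0", "faiss": "1.7.3"}


def installed_version(*dist_names):
    """Return the version of the first installed distribution, or None."""
    for name in dist_names:
        try:
            return metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return None


def check_pins(found, expected=EXPECTED):
    """Return {package: (expected, found)} for every pin that is not met."""
    problems = {}
    for pkg, pin in expected.items():
        version = found.get(pkg)
        # Drop the local suffix so that "1.12.0+cu116" matches "1.12.0".
        if version is None or version.split("+")[0] != pin:
            problems[pkg] = (pin, version)
    return problems


if __name__ == "__main__":
    found = {
        "torch": installed_version("torch"),
        # Assumption: faiss is installed as either faiss-gpu or faiss-cpu.
        "faiss": installed_version("faiss-gpu", "faiss-cpu"),
    }
    print(check_pins(found) or "retpo_search pins look consistent")
```

Run it inside the activated `retpo_search` environment. An empty result means both pins match; otherwise each entry shows the expected and the installed version.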
### retpo_qr
```bash
# create environment
cd retpo_qr/
conda create -n retpo_qr python=3.10 && conda activate retpo_qr
# install torch
pip install torch==2.1.0 # this specific version is crucial for reproducibility. you may need to install other variants based on your hardware.
# install dependencies
python -m pip install .
# Flash Attention 2 (Optional, but Recommended for Faster Training)
# If your machine has less than 96GB of RAM and many CPU cores, reduce MAX_JOBS, e.g. prefix the command below with MAX_JOBS=4
python -m pip install flash-attn --no-build-isolation
```
## 2. Evaluation
### Preparation
We mainly evaluate our method with two types of retrievers, BM25 and ANCE, on two conversational QA benchmarks: TopiOCQA and QReCC.
There are well-organized repositories for preprocessing these datasets and indexing passages for retrieval. We recommend using them before running our code. We mainly follow [ConvGQR](https://github.com/fengranMark/ConvGQR).
Specifically, to run our code, you need to prepare the following files.
> You can find the code to prepare these folders here:
> pyserini_index/ # https://github.com/fengranMark/ConvGQR/blob/main/bm25/create_index.sh
> tokenized/ # https://github.com/fengranMark/ConvGQR/blob/main/gen_tokenized_doc.py
> embeddings/ # https://github.com/fengranMark/ConvGQR/blob/main/gen_doc_embeddings.py
```bash
ROOT_DIR/
└── datasets/
    ├── checkpoints                  # Retriever checkpoints
    │   └── ad-hoc-ance-msmarco      # https://huggingface.co/3ricL/ad-hoc-ance-msmarco
    ├── topiocqa/
    │   ├── pyserini_index/
    │   ├── full_wiki_segments.tsv
    │   ├── tokenized/
    │   └── embeddings/
    └── qrecc/
        ├── pyserini_index/
        ├── full_wiki_segments.tsv
        ├── tokenized/
        └── embeddings/
```
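Before launching evaluation, it may help to confirm that the files listed above are in place. The sketch below is not part of the repository. It assumes that `checkpoints`, `topiocqa/`, and `qrecc/` all sit under `ROOT_DIR/datasets/`, and `missing_paths` is a hypothetical helper name.

```python
from pathlib import Path

# Assumed nesting: every entry lives under ROOT_DIR/datasets/.
PER_DATASET = ["pyserini_index", "full_wiki_segments.tsv", "tokenized", "embeddings"]
EXPECTED = ["datasets/checkpoints/ad-hoc-ance-msmarco"] + [
    f"datasets/{dataset}/{entry}"
    for dataset in ("topiocqa", "qrecc")
    for entry in PER_DATASET
]


def missing_paths(root):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    # Replace "ROOT_DIR" with the actual path to your root directory.
    for rel in missing_paths("ROOT_DIR"):
        print(f"missing: {rel}")
```

An empty result means all nine expected entries were found. The check only tests for existence, so it does not validate index or embedding contents.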
### Reproduce our performance
To reproduce our reported performance, download the queries generated by RetPO from [this Google Drive](https://drive.google.com/drive/folders/1YyXpzb8QXjaKajI1kNKywAn3ZZtSGCDe?usp=drive_link) and place them in `ROOT_DIR/distill_outputs`.
You can reproduce our main performance by running the following command.
```bash
cd eval
bash ./scripts/bm25_topiocqa.sh
```
## 3. RetPO (Retriever's Preference Optimization)
### Download RF-Collection
We construct a large-scale dataset called RF-COLLECTION, containing Retrievers' Feedback on over 410K query rewrites across 12K conversations.
You can download it from [Huggingface](https://huggingface.co/datasets/dmis-lab/RF-Collection) using the following command.
```python
from datasets import load_dataset
# "{ROOT_DIR}" is a placeholder: replace it with the actual path to your root directory
ds = load_dataset("dmis-lab/RF-Collection", cache_dir="{ROOT_DIR}/retpo_qr/")
```
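The abstract describes collecting retrieval performance for candidate rewrites and treating it as the retriever's preference. One way such feedback could be turned into (chosen, rejected) pairs for preference optimization is sketched below. This is an illustration under stated assumptions, not the repository's pipeline: the `ScoredRewrite` fields, the use of the gold passage's rank as the score, the `min_gap` threshold, and the example queries are all hypothetical and do not reflect the actual RF-Collection schema.

```python
from itertools import combinations
from typing import NamedTuple


class ScoredRewrite(NamedTuple):
    text: str  # candidate query rewrite
    rank: int  # hypothetical score: rank of the gold passage (1 = best)


def build_preference_pairs(candidates, min_gap=1):
    """Pair rewrites so that the better-ranked one is 'chosen'.

    Pairs whose rank difference is zero or below min_gap are skipped,
    since they carry little or no preference signal.
    """
    pairs = []
    for a, b in combinations(candidates, 2):
        gap = abs(a.rank - b.rank)
        if gap == 0 or gap < min_gap:
            continue
        chosen, rejected = (a, b) if a.rank < b.rank else (b, a)
        pairs.append((chosen.text, rejected.text))
    return pairs


# Made-up example: three rewrites of the same conversational question.
candidates = [
    ScoredRewrite("Who directed the film Inception?", 1),
    ScoredRewrite("Who directed it?", 57),
    ScoredRewrite("Inception film director", 3),
]
pairs = build_preference_pairs(candidates, min_gap=2)
for chosen, rejected in pairs:
    print(f"chosen={chosen!r} rejected={rejected!r}")
```

Raising `min_gap` keeps only pairs with a clear difference in retrieval quality, at the cost of fewer training pairs.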
## Citation
```bibtex
@article{yoon2024ask,
title={Ask Optimal Questions: Aligning Large Language Models with Retriever's Preference in Conversational Search},
author={Yoon, Chanwoong and Kim, Gangwoo and Jeon, Byeongguk and Kim, Sungdong and Jo, Yohan and Kang, Jaewoo},
journal={arXiv preprint arXiv:2402.11827},
year={2024}
}
```
## Contact
For more information or questions about our work, feel free to contact me (cwyoon99 (at) korea.ac.kr or gmail.com).