https://github.com/kidist-amde/ddro
We introduce direct document relevance optimization (DDRO) for training a pairwise ranking model. DDRO encourages the model to focus on document-level relevance during generation.
- Host: GitHub
- URL: https://github.com/kidist-amde/ddro
- Owner: kidist-amde
- License: apache-2.0
- Created: 2024-10-17T12:00:45.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-08-25T15:31:43.000Z (about 2 months ago)
- Last Synced: 2025-08-25T17:37:27.047Z (about 2 months ago)
- Topics: alignment, dense-retrieval, dpo, generative, generative-ai, generative-model, generative-retrieval, information-retrieval, ir, nlp, nlp-machine-learning, ranking-algorithm, ranking-system, rankings, retrieval, rlhf, semantic-id, semantic-ids
- Language: Python
- Homepage:
- Size: 2.1 MB
- Stars: 24
- Watchers: 1
- Forks: 3
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE
# DDRO: Direct Document Relevance Optimization for Generative Information Retrieval
[arXiv:2504.05181](https://arxiv.org/abs/2504.05181) · [License: Apache 2.0](LICENSE) · [Models on Hugging Face](https://huggingface.co/kiyam)

This repository contains the official implementation of our SIGIR 2025 paper:
**[Lightweight and Direct Document Relevance Optimization for Generative IR (DDRO)](https://arxiv.org/abs/2504.05181)**
- Optimizing Generative Retrieval with Ranking-Aligned Objectives
---

### Repository Under Development

This repository is actively under development; changes and improvements may land frequently. Thanks for your patience, and stay tuned for updates!
---
## Table of Contents

- [Motivation](#motivation)
- [What DDRO Does](#what-ddro-does)
- [Learning Objectives](#learning-objectives-in-ddro)
- [Setup & Dependencies (Steps to Reproduce)](#1-install-environment)
- [Preprocessed Data & Model Checkpoints](#preprocessed-data--model-checkpoints)
- [Evaluate Pre-trained Models from Hugging Face](#quick-evaluation)
- [Citation](#citation)

## Motivation
**Misalignment in Learning Objectives:**
Gen-IR models are typically trained via next-token prediction (cross-entropy loss) over docid tokens.
While effective for language modeling, this objective:
- Optimizes **token-level generation**
- Is not designed for **document-level ranking**

As a result, Gen-IR models are not directly optimized for **learning-to-rank**, which is the core requirement in IR systems.
## What DDRO Does
In this work, we ask:
> _How can Gen-IR models directly learn to rank documents, instead of just predicting the next token?_
We propose **DDRO**:
**Lightweight and Direct Document Relevance Optimization for Gen-IR**

### Key Contributions
- Aligns training objective with ranking by using **pairwise preference learning**
- Trains the model to **prefer relevant documents over non-relevant ones**
- Bridges the gap between **autoregressive training** and **ranking-based optimization**
- Requires **no reinforcement learning or reward modeling**

---
### Learning Objectives in DDRO
We optimize DDRO in two phases:
---
#### Phase 1: Supervised Fine-Tuning (SFT)

Learn to generate the correct **docid** sequence given a query by minimizing the autoregressive token-level cross-entropy loss, i.e., maximize the likelihood of generating the correct docid given a query:

$$
\mathcal{L}_{\text{SFT}} = -\sum_{t=1}^{|\text{docid}|} \log \pi_\theta\!\left(\text{docid}_t \mid \text{docid}_{<t},\, q\right)
$$
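As a rough illustration (not the repository's training code), the SFT objective is the teacher-forced cross-entropy over docid tokens with a seq2seq model; the query and docid strings below are toy placeholders:

```python
# Hedged sketch of one SFT step: token-level cross-entropy on docid tokens.
# Assumes a local/hub t5-base checkpoint; names are illustrative only.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")

query = "what is generative retrieval"
docid = "12 7 431 88"  # toy docid token sequence (e.g., PQ codes rendered as text)

inputs = tokenizer(query, return_tensors="pt")
labels = tokenizer(docid, return_tensors="pt").input_ids

# Teacher-forced negative log-likelihood over the docid tokens,
# i.e., exactly the loss written above.
loss = model(input_ids=inputs.input_ids,
             attention_mask=inputs.attention_mask,
             labels=labels).loss
loss.backward()  # an optimizer step would follow in a real training loop
```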
---

#### Phase 2: Pairwise Ranking Optimization (DDRO Loss)
This phase improves the **ranking quality** of generated document identifiers by applying a **pairwise learning-to-rank objective** inspired by **Direct Preference Optimization (DPO)**.
*Rafailov et al., 2023: [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://arxiv.org/abs/2305.18290)*

$$
\mathcal{L}_{\text{DDRO}} = -\,\mathbb{E}_{(q,\ \text{docid}^{+},\ \text{docid}^{-})}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(\text{docid}^{+}\mid q)}{\pi^{\text{ref}}(\text{docid}^{+}\mid q)} - \beta \log \frac{\pi_\theta(\text{docid}^{-}\mid q)}{\pi^{\text{ref}}(\text{docid}^{-}\mid q)}\right)\right]
$$
### Description
This **Direct Document Relevance Optimization (DDRO)** loss guides the model to **prefer relevant documents (`docid⁺`) over non-relevant ones (`docid⁻`)** by comparing how both the current model and a frozen reference model score each document:
* `docid⁺`: A relevant document for the query `q`
* `docid⁻`: A non-relevant or less relevant document
* $\pi_\theta$: The current model being optimized
* $\pi^{\text{ref}}$: A frozen reference model (typically trained with SFT in Phase 1)
* $\beta$: Temperature-like factor controlling sensitivity
* $\sigma$: Sigmoid function, mapping scores to the $[0,1]$ preference space

This encourages the model to rank a relevant `docid⁺` higher than a non-relevant `docid⁻`.
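To make the objective concrete, here is a minimal sketch of the pairwise loss in PyTorch, assuming the sequence-level log-probabilities of `docid⁺` and `docid⁻` under the policy and the frozen reference model are already available (the tensor values are toy numbers):

```python
# Hedged sketch of the DPO-style pairwise loss defined above.
import torch
import torch.nn.functional as F

def ddro_loss(policy_pos_logps, policy_neg_logps,
              ref_pos_logps, ref_neg_logps, beta=0.1):
    """Pairwise preference loss over (docid⁺, docid⁻) log-probabilities."""
    pos_logratio = policy_pos_logps - ref_pos_logps  # log πθ(d⁺|q) − log πref(d⁺|q)
    neg_logratio = policy_neg_logps - ref_neg_logps  # log πθ(d⁻|q) − log πref(d⁻|q)
    # −log σ(β · (positive margin − negative margin)), averaged over the batch
    return -F.logsigmoid(beta * (pos_logratio - neg_logratio)).mean()

# Toy batch of two (query, docid⁺, docid⁻) triples:
loss = ddro_loss(torch.tensor([-3.2, -2.8]), torch.tensor([-4.1, -3.9]),
                 torch.tensor([-3.5, -3.0]), torch.tensor([-4.0, -3.7]))
```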
### Usage
The DPO loss is used **after** the SFT phase to **fine-tune the ranking behavior** of the model. Instead of just generating `docid`, the model now **learns to rank `docidโบ` higher than `docidโป`** in a relevance/preference-aligned manner.
---
### Why It Works
- Directly **encourages higher generation scores for relevant documents**
- Uses **contrastive ranking** rather than token-level generation
- Avoids reward modeling or RL while remaining efficient and scalable

---
### Why DDRO is Different from Standard DPO
While our optimization is inspired by the DPO framework [Rafailov et al., 2023](https://arxiv.org/abs/2305.18290), its adaptation to **Generative Document Retrieval** is **non-trivial**:
- In contrast to open-ended preference alignment, our task involves **structured docid generation** under **beam decoding constraints**
- Our model uses an **encoder-decoder** architecture rather than decoder-only
- The objective is **document-level ranking**, not open-ended preference generation

This required a **novel integration** of preference optimization into **retrieval-specific pipelines**, making DDRO uniquely suited for GenIR.
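To make the decoding constraint concrete, here is a hedged sketch of trie-constrained beam search using Hugging Face's `prefix_allowed_tokens_fn`; the toy docid sequences and the t5-base model are illustrative, and the repository's own trie utilities may differ in detail.

```python
# Hedged sketch: restrict beam search so it can only emit token sequences that
# correspond to existing docids, via a simple prefix trie.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

class DocidTrie:
    def __init__(self, sequences):
        self.root = {}
        for seq in sequences:
            node = self.root
            for tok in seq:
                node = node.setdefault(tok, {})

    def next_tokens(self, prefix):
        node = self.root
        for tok in prefix:
            node = node.get(tok, {})
        return list(node.keys())

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")

# Toy docid vocabulary: each docid is a list of token ids ending in </s> (id 1 for T5).
trie = DocidTrie([[31, 7, 5, 1], [31, 9, 2, 1], [44, 3, 8, 1]])

def allowed_tokens(batch_id, generated_ids):
    # generated_ids starts with the decoder start token; skip it when walking the trie.
    prefix = generated_ids.tolist()[1:]
    return trie.next_tokens(prefix) or [tokenizer.eos_token_id]

inputs = tokenizer("what is generative retrieval", return_tensors="pt")
beams = model.generate(**inputs, num_beams=3, num_return_sequences=3,
                       max_new_tokens=8, prefix_allowed_tokens_fn=allowed_tokens)
```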
## Project Structure
```bash
src/
├── data/             # Data downloading, preprocessing, and docid instance generation
├── pretrain/         # DDRO model training and evaluation logic (incl. ddro)
├── scripts/          # Entry-point shell scripts for SFT, ddro, BM25, and preprocessing
├── utils/            # Core utilities (tokenization, trie, metrics, trainers)
├── ddro.yml          # Conda environment (for training DDRO)
├── pyserini.yml      # Conda environment (for BM25 retrieval with Pyserini)
├── README.md         # You're here!
└── requirements.txt  # Additional Python dependencies
```
### Important

> **Each subdirectory includes a detailed `README.md` with instructions.**

---
## Setup & Dependencies
### 1. Install Environment
Clone the repository and create the conda environment:
```bash
git clone https://github.com/kidist-amde/ddro.git
cd ddro
conda env create -f ddro_env.yml
conda activate ddro_env
```
---

### 2. Download Datasets and Pretrained Model

We use the MS MARCO document (top-300k) and Natural Questions (NQ-320k) datasets, and a pretrained T5 model. To download them, run the following commands from the project root (`ddro/`):
```bash
bash ./src/data/download/download_msmarco_datasets.sh
bash ./src/data/download/download_nq_datasets.sh
python ./src/data/download/download_t5_model.py
```

For details and download links, refer to [src/data/download/README.md](https://github.com/kidist-amde/ddro/tree/main/src/data/download#readme).

## 3. Data Preparation

DDRO is evaluated on both the **Natural Questions (NQ)** and **MS MARCO** datasets.

### Sample the Top-300K MS MARCO Subset

Run the following script to preprocess and extract the top-300K most relevant MS MARCO documents based on qrels:

```bash
bash scripts/preprocess/sample_top_docs.sh
```
- This will generate `resources/datasets/processed/msmarco-docs-sents.top.300k.json.gz`
  (sentence-tokenized JSONL format, ranked by relevance frequency)
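If you want to inspect the sampled corpus, here is a minimal sketch (assuming one JSON object per line inside the gzip file; the exact field names depend on the preprocessing script):

```python
# Hedged sketch: peek at the first few records of the sampled MS MARCO subset.
import gzip
import json

path = "resources/datasets/processed/msmarco-docs-sents.top.300k.json.gz"
with gzip.open(path, "rt", encoding="utf-8") as f:
    for i, line in enumerate(f):
        doc = json.loads(line)      # one sentence-tokenized document per line
        print(sorted(doc.keys()))   # list the available fields
        if i == 2:                  # stop after three records
            break
```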
---
### Expected Directory Structure
Once everything is downloaded and processed, your `resources/` directory should look like this:

```
resources/
├── datasets/
│   ├── raw/
│   │   ├── msmarco-data/   # Raw MS MARCO dataset
│   │   └── nq-data/        # Raw Natural Questions dataset
│   └── processed/          # Preprocessed outputs
└── transformer_models/
    └── t5-base/            # Local copy of T5 model & tokenizer
```
---
### Important

> To process and sample both datasets, generate document IDs, and prepare training/evaluation instances, please refer to the corresponding README:
> [`src/data/dataprep/README.md`](https://github.com/kidist-amde/ddro/tree/main/src/data/data_prep#readme)
---

## Training Pipeline

### Phase 1: Supervised Fine-Tuning (SFT)
We first train a **Supervised Fine-Tuning (SFT) model** using **next-token prediction** across three stages:
1. **Pretraining** on document content (`doc → docid`)
2. **Search Pretraining** on pseudo queries (`pseudoquery → docid`)
3. **Finetuning** on real queries using supervised pairs from qrels (with gold docids) (`query → docid`)

This results in a **seed model** trained to autoregressively generate document identifiers.
You can run all stages with a single command:
```bash
bash ddro/src/scripts/sft/launch_SFT_training.sh
```

The `--encoding` flag in the script supports id formats like `pq` and `url`.
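For illustration only, the three stages differ in what the model reads as input, while the target is always a docid; the strings below are toy placeholders, not the actual instance format produced by the data-prep scripts.

```python
# Hedged sketch of the three SFT stages as (input -> target docid) pairs.
stages = {
    "pretrain":        ("full document text ...",             "docid of that document"),
    "search_pretrain": ("pseudo query generated for the doc", "docid of that document"),
    "finetune":        ("real user query (from qrels)",       "gold docid for that query"),
}

for stage, (source, target) in stages.items():
    # Every stage uses the same next-token objective from Phase 1; only the
    # input side of the pair changes.
    print(f"{stage:>15}: {source!r} -> {target!r}")
```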
---

## Phase 2: DDRO Training (Pairwise Optimization)

After training the SFT model (Phase 1), we apply **Phase 2: Direct Document Relevance Optimization**, which fine-tunes the model with a **pairwise ranking objective** that trains it to prefer relevant documents over non-relevant ones.
This bridges the gap between **autoregressive generation** and **ranking-based retrieval**.
We implement this using a custom version of Hugging Face's [`DPOTrainer`](https://github.com/huggingface/trl).
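For intuition, here is a hedged sketch (not the repository's trainer) of how sequence-level docid log-probabilities can be computed with an encoder-decoder model and combined into the pairwise objective; the toy query, docid strings, and β value are illustrative.

```python
# Hedged sketch: score docid⁺ and docid⁻ under the policy and a frozen
# reference model, then apply the DPO-style loss from Phase 2.
import torch
import torch.nn.functional as F
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-base")
policy = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
reference = AutoModelForSeq2SeqLM.from_pretrained("t5-base")  # frozen SFT copy
reference.requires_grad_(False)

def docid_logprob(model, query, docid):
    """Sum of per-token log-probabilities of `docid` given `query`."""
    enc = tokenizer(query, return_tensors="pt")
    labels = tokenizer(docid, return_tensors="pt").input_ids
    logits = model(input_ids=enc.input_ids,
                   attention_mask=enc.attention_mask,
                   labels=labels).logits
    token_logps = F.log_softmax(logits, dim=-1).gather(-1, labels.unsqueeze(-1))
    return token_logps.squeeze(-1).sum(dim=-1)

q, pos, neg = "what is generative retrieval", "31 7 5", "44 3 8"  # toy triple
beta = 0.1
margin = (docid_logprob(policy, q, pos) - docid_logprob(reference, q, pos)) \
       - (docid_logprob(policy, q, neg) - docid_logprob(reference, q, neg))
loss = -F.logsigmoid(beta * margin).mean()
```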
Run DDRO training and evaluation:
```bash
bash scripts/ddro/slurm_submit_ddro_training.sh
bash scripts/ddro/slurm_submit_ddro_eval.sh
```

---
## Model Evaluation

### Evaluate Pre-trained Models from Hugging Face
You can directly evaluate our published models without training from scratch:
#### Available Models:
- `kiyam/ddro-msmarco-pq` - MS MARCO with PQ encoding
- `kiyam/ddro-msmarco-tu` - MS MARCO with Title+URL encoding
- `kiyam/ddro-nq-pq` - Natural Questions with PQ encoding
- `kiyam/ddro-nq-tu` - Natural Questions with Title+URL encoding

#### Quick Evaluation:
```bash
# For SLURM clusters:
sbatch src/pretrain/hf_eval/slurm_submit_hf_eval.sh

# Or run directly:
encoding="url_title"  # Choose from: "url_title", "pq"

python src/pretrain/hf_eval/eval_hf_docid_ranking.py \
--per_gpu_batch_size 4 \
--log_path logs/msmarco/dpo_HF_url.log \
--pretrain_model_path kiyam/ddro-msmarco-tu \
--docid_path resources/datasets/processed/msmarco-data/encoded_docid/${encoding}_docid.txt \
--test_file_path resources/datasets/processed/msmarco-data/eval_data/query_dev.${encoding}.jsonl \
--dataset_script_dir src/data/data_scripts \
--dataset_cache_dir ./cache \
--num_beams 15 \
--add_doc_num 6144 \
--max_seq_length 64 \
--max_docid_length 100 \
--use_docid_rank True \
--docid_format msmarco \
--lookup_fallback True \
--device cuda:0
```

#### Key Parameters:
- `--encoding`: Use `"url_title"` or `"pq"` to match your model type
- `--docid_format`: Use `"msmarco"` or `"nq"` depending on the dataset
- `--pretrain_model_path`: Specify the Hugging Face model you want to evaluate
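As a quick sanity check outside the shipped script, here is a hedged sketch of loading one of the checkpoints above and generating candidate docids with beam search; it assumes the checkpoint loads as a standard seq2seq model and omits the docid vocabulary extension, trie constraints, lookup fallback, and metrics that `eval_hf_docid_ranking.py` handles:

```python
# Hedged sketch: beam-search docid generation from a published checkpoint.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "kiyam/ddro-msmarco-tu"  # or ddro-msmarco-pq / ddro-nq-pq / ddro-nq-tu
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

inputs = tokenizer("what is the capital of france", return_tensors="pt")
beams = model.generate(**inputs, num_beams=15, num_return_sequences=10,
                       max_new_tokens=100)
for rank, seq in enumerate(beams, start=1):
    print(rank, tokenizer.decode(seq, skip_special_tokens=True))
```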
#### Pre-generated Resources:

You can use our pre-generated encoded document IDs from [Hugging Face Datasets](https://huggingface.co/datasets/kiyam/ddro-docids) to skip the data preparation step.

Evaluation logs and metrics are saved to:
```
logs/
outputs/
```
---

## Datasets Used
We evaluate DDRO on two standard retrieval benchmarks:
- [MS MARCO Document Ranking](https://microsoft.github.io/msmarco/Datasets.html#document-ranking-dataset)
- [Natural Questions (NQ)](https://ai.google.com/research/NaturalQuestions)

## Preprocessed Data & Model Checkpoints
All datasets, pseudo queries, docid encodings, and model checkpoints are available here:
[DDRO Generative IR Collection on Hugging Face](https://huggingface.co/collections/kiyam/ddro-generative-document-retrieval-680f63f2e9a72033598461c5)

---

## Acknowledgments
We gratefully acknowledge the following open-source projects:
- [ULTRON](https://github.com/smallporridge/WebUltron)
- [HuggingFace TRL](https://github.com/huggingface/trl)
- [NCI (Neural Corpus Indexer)](https://github.com/solidsea98/Neural-Corpus-Indexer-NCI)
- [docTTTTTquery](https://github.com/castorini/docTTTTTquery)

---

## License
This project is licensed under the [Apache 2.0 License](LICENSE).
---
## Citation
```bibtex
@inproceedings{mekonnen2025lightweight,
title={Lightweight and Direct Document Relevance Optimization for Generative Information Retrieval},
author={Mekonnen, Kidist Amde and Tang, Yubao and de Rijke, Maarten},
booktitle={Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval},
pages={1327--1338},
year={2025}
}
```
---

## Contact
For questions, please open an [issue](https://github.com/kidist-amde/DDRO-Direct-Document-Relevance-Optimization/issues).
© 2025 **Kidist Amde Mekonnen** · Made with ❤️ at [IRLab](https://irlab.science.uva.nl/), University of Amsterdam.
---