# RankLLM
[PyPI](https://pypi.org/project/rank-llm/) | [Downloads](https://pepy.tech/project/rank-llm) | [arXiv](https://arxiv.org/abs/2309.15088) | [License](https://www.apache.org/licenses/LICENSE-2.0)

We offer a suite of rerankers: pointwise models like MonoT5, pairwise models like DuoT5, and listwise models with a focus on open-source LLMs compatible with [vLLM](https://github.com/vllm-project/vllm), [SGLang](https://github.com/sgl-project/sglang), or [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM). We also support RankGPT and RankGemini variants, which are proprietary listwise rerankers. Additionally, we support reranking with the first-token logits only to improve inference efficiency. Some of the code in this repository is borrowed from [RankGPT](https://github.com/sunnweiwei/RankGPT), [PyGaggle](https://github.com/castorini/pygaggle), and [LiT5](https://github.com/castorini/LiT5)!
## Releases
The current release is version `0.21.0`.

## Content
1. [Installation](#installation)
2. [Quick Start](#quick-start)
3. [End-to-end Run and 2CR](#end-to-end-run-and-2cr)
4. [Model Zoo](#model-zoo)
5. [Training](#training)
6. [Community Contribution](#community-contribution)
7. [References and Citations](#references)
8. [Acknowledgments](#acknowledgments)

> **⚠️ RankLLM is not compatible with macOS**, regardless of whether you are using an Intel-based Mac or Apple Silicon (M-series). We recommend using Linux or Windows instead.
## ❗ JDK 21 Warning
Because rank_llm relies on [Anserini](https://github.com/castorini/anserini), you must have JDK 21 installed.
Please note that using JDK 11 is not supported and may lead to errors.

## Create Conda Environment
```bash
conda create -n rankllm python=3.10
conda activate rankllm
```

## Install PyTorch with CUDA (Windows/Linux)
```bash
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
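# Optional sanity check (assumes the CUDA build installed correctly):
python -c "import torch; print(torch.cuda.is_available())"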
```

## Install OpenJDK with Maven if you want to use the retriever
```bash
conda install -c conda-forge openjdk=21 maven -y
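# Optional: confirm that JDK 21 is now the active version:
java -version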
```

## Install Dependencies
```bash
pip install -r requirements.txt
```

## Install SGLang or TensorRT-LLM (Optional)
### Install SGLang (Optional)
```bash
pip install -e .[sglang] # local installation for development
pip install rank-llm[sglang] # or pip installation
```

Remember to install flashinfer to use the `SGLang` backend.
```bash
pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.4/
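# Optional: verify that the SGLang install resolved (a __version__ attribute is assumed):
python -c "import sglang; print(sglang.__version__)"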
```

### Install TensorRT-LLM (Optional)
```bash
pip install -e .[tensorrt-llm] # local installation for development
pip install rank-llm[tensorrt-llm] # or pip installation
```
# ⏳ Quick Start
The following code snippet is a minimal walkthrough of retrieval, reranking, evaluation, and invocation analysis for the top 100 retrieved documents for queries from `DL19`. In this example, `BM25` is used as the retriever and `RankZephyr` as the reranker. Additional sample snippets are available under the `src/rank_llm/demo` directory.
```python
from pathlib import Path

from rank_llm.analysis.response_analysis import ResponseAnalyzer
from rank_llm.data import DataWriter
from rank_llm.evaluation.trec_eval import EvalFunction
from rank_llm.rerank import Reranker, get_openai_api_key
from rank_llm.rerank.listwise import (
SafeOpenai,
VicunaReranker,
ZephyrReranker,
)
from rank_llm.retrieve.retriever import Retriever
from rank_llm.retrieve.topics_dict import TOPICS

# -------- Retrieval --------
# By default BM25 is used for retrieval of top 100 candidates.
dataset_name = "dl19"
retrieved_results = Retriever.from_dataset_with_prebuilt_index(dataset_name)

# Users can specify other retrieval methods and number of retrieved candidates.
# retrieved_results = Retriever.from_dataset_with_prebuilt_index(
# dataset_name, RetrievalMethod.SPLADE_P_P_ENSEMBLE_DISTIL, k=50
# )
# ---------------------------

# --------- Rerank ----------
# Rank Zephyr model
reranker = ZephyrReranker()

# Rank Vicuna model
# reranker = VicunaReranker()

# RankGPT
# model_coordinator = SafeOpenai("gpt-4o-mini", 4096, keys=get_openai_api_key())
# reranker = Reranker(model_coordinator)

rerank_results = reranker.rerank_batch(requests=retrieved_results)
# ---------------------------

# ------- Evaluation --------
# Evaluate retrieved results.
ndcg_10_retrieved = EvalFunction.from_results(retrieved_results, TOPICS[dataset_name])
print(ndcg_10_retrieved)

# Evaluate rerank results.
ndcg_10_rerank = EvalFunction.from_results(rerank_results, TOPICS[dataset_name])
print(ndcg_10_rerank)

# By default ndcg@10 is the eval metric; other values can be specified:
# eval_args = ["-c", "-m", "map_cut.100", "-l2"]
# map_100_rerank = EvalFunction.from_results(rerank_results, topics, eval_args)
# print(map_100_rerank)

# eval_args = ["-c", "-m", "recall.20"]
# recall_20_rerank = EvalFunction.from_results(rerank_results, topics, eval_args)
# print(recall_20_rerank)
# ---------------------------
# --- Analyze invocations ---
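# ResponseAnalyzer inspects the raw LLM invocations recorded during reranking and
# tallies malformed responses (the exact error categories depend on the reranker).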
analyzer = ResponseAnalyzer.from_inline_results(rerank_results)
error_counts = analyzer.count_errors(verbose=True)
print(error_counts)
# ---------------------------

# ------ Save results -------
writer = DataWriter(rerank_results)
Path(f"demo_outputs/").mkdir(parents=True, exist_ok=True)
writer.write_in_jsonl_format(f"demo_outputs/rerank_results.jsonl")
writer.write_in_trec_eval_format(f"demo_outputs/rerank_results.txt")
writer.write_inference_invocations_history(
f"demo_outputs/inference_invocations_history.json"
)
# ---------------------------
```

# End-to-end Run and 2CR
If you are interested in running retrieval and reranking end-to-end or reproducing the results from the [reference papers](#✨-references), `run_rank_llm.py` is a convenient wrapper script that combines the two steps.

The comprehensive list of our two-click reproduction commands is available on the [MS MARCO V1](https://castorini.github.io/rank_llm/src/rank_llm/2cr/msmarco-v1-passage.html) and [MS MARCO V2](https://castorini.github.io/rank_llm/src/rank_llm/2cr/msmarco-v2-passage.html) webpages, covering the DL19/DL20 and DL21-DL23 datasets, respectively. Moving forward, we plan to cover more datasets and retrievers in our 2CR pages. The rest of this section provides some sample e2e runs.
## RankZephyr

We can run the RankZephyr model with the following command:
```bash
python src/rank_llm/scripts/run_rank_llm.py --model_path=castorini/rank_zephyr_7b_v1_full --top_k_candidates=100 --dataset=dl20 \
--retrieval_method=SPLADE++_EnsembleDistil_ONNX --prompt_mode=rank_GPT --context_size=4096 --variable_passages
```

Including the `--sglang_batched` flag lets you run the model in batched mode using the `SGLang` library, while `--tensorrt_batched` does the same using the `TensorRT-LLM` library.
If you want to run multiple passes of the model, you can use the `--num_passes` flag, as in the sketch below.
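For example, a run combining these flags might look like the following (the specific values here, such as three passes, are illustrative assumptions rather than recommended settings):
```bash
python src/rank_llm/scripts/run_rank_llm.py --model_path=castorini/rank_zephyr_7b_v1_full --top_k_candidates=100 --dataset=dl20 \
  --retrieval_method=bm25 --prompt_mode=rank_GPT --context_size=4096 --variable_passages \
  --sglang_batched --num_passes=3
```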
## RankGPT4-o
We can run the RankGPT4-o model with the following command:
```bash
python src/rank_llm/scripts/run_rank_llm.py --model_path=gpt-4o --top_k_candidates=100 --dataset=dl20 \
--retrieval_method=bm25 --prompt_mode=rank_GPT_APEER --context_size=4096 --use_azure_openai
```
Note that the `--prompt_mode` is set to `rank_GPT_APEER` to use the LLM refined prompt from [APEER](https://arxiv.org/abs/2406.14449).
This can be changed to `rank_GPT` to use the original prompt.

## LiT5
We can run the LiT5-Distill V2 model (which can rerank 100 documents in a single pass) with the following command:
```bash
python src/rank_llm/scripts/run_rank_llm.py --model_path=castorini/LiT5-Distill-large-v2 --top_k_candidates=100 --dataset=dl19 \
--retrieval_method=bm25 --prompt_mode=LiT5 --context_size=150 --batch_size=4 \
--variable_passages --window_size=100
```

We can run the original LiT5-Distill model (which works with a window size of 20) with the following command:
```bash
python src/rank_llm/scripts/run_rank_llm.py --model_path=castorini/LiT5-Distill-large --top_k_candidates=100 --dataset=dl19 \
--retrieval_method=bm25 --prompt_mode=LiT5 --context_size=150 --batch_size=32 \
--variable_passages
```

We can run the LiT5-Score model with the following command:
```bash
python src/rank_llm/scripts/run_rank_llm.py --model_path=castorini/LiT5-Score-large --top_k_candidates=100 --dataset=dl19 \
--retrieval_method=bm25 --prompt_mode=LiT5 --context_size=150 --batch_size=8 \
--window_size=100 --variable_passages
```

## MonoT5
The following runs the 3B variant of MonoT5 trained for 10K steps:
```bash
python src/rank_llm/scripts/run_rank_llm.py --model_path=castorini/monot5-3b-msmarco-10k --top_k_candidates=1000 --dataset=dl19 \
--retrieval_method=bm25 --prompt_mode=monot5 --context_size=512
```

Note that we usually rerank 1K candidates with MonoT5.
## DuoT5
The following runs the 3B variant of DuoT5 trained for 10K steps:
```bash
python src/rank_llm/scripts/run_rank_llm.py --model_path=castorini/duot5-3b-msmarco-10k --top_k_candidates=50 --dataset=dl19 \
--retrieval_method=bm25 --prompt_mode=duot5
```

Since Duo's pairwise comparison has $O(n^2)$ runtime complexity, we recommend reranking the top 50 candidates using DuoT5 models; even at $n = 50$, this is on the order of $50 \times 49 = 2450$ pairwise inferences, and the cost grows quadratically with more candidates.
## FirstMistral
We can run the FirstMistral model, reranking using the first-token logits only with the following command:
```bash
python src/rank_llm/scripts/run_rank_llm.py --model_path=castorini/first_mistral --top_k_candidates=100 --dataset=dl20 \
  --retrieval_method=SPLADE++_EnsembleDistil_ONNX --prompt_mode=rank_GPT --context_size=4096 --variable_passages \
  --use_logits --use_alpha --num_gpus 1
```

Omit `--use_logits` if you wish to perform traditional listwise reranking.
## Gemini Flash 2.0
First install genai:
```bash
pip install -e .[genai] # local installation for development
pip install rank-llm[genai] # or pip installation
```

Then run the following command:
```bash
python src/rank_llm/scripts/run_rank_llm.py --model_path=gemini-2.0-flash-001 --top_k_candidates=100 --dataset=dl20 \
--retrieval_method=SPLADE++_EnsembleDistil_ONNX --prompt_mode=rank_GPT_APEER --context_size=4096
```

# Model Zoo

The following is a table of the listwise models our repository was primarily built to handle (with the models hosted on Hugging Face):
`vLLM`, `SGLang`, and `TensorRT-LLM` backends are only supported for `RankZephyr` and `RankVicuna` models.
| Model Name | Hugging Face Identifier/Link |
|-------------------|---------------------------------------------|
| RankZephyr 7B V1 - Full - BF16 | [castorini/rank_zephyr_7b_v1_full](https://huggingface.co/castorini/rank_zephyr_7b_v1_full) |
| RankVicuna 7B - V1 | [castorini/rank_vicuna_7b_v1](https://huggingface.co/castorini/rank_vicuna_7b_v1) |
| RankVicuna 7B - V1 - No Data Augmentation | [castorini/rank_vicuna_7b_v1_noda](https://huggingface.co/castorini/rank_vicuna_7b_v1_noda) |
| RankVicuna 7B - V1 - FP16 | [castorini/rank_vicuna_7b_v1_fp16](https://huggingface.co/castorini/rank_vicuna_7b_v1_fp16) |
| RankVicuna 7B - V1 - No Data Augmentation - FP16 | [castorini/rank_vicuna_7b_v1_noda_fp16](https://huggingface.co/castorini/rank_vicuna_7b_v1_noda_fp16) |

We also officially support the following rerankers built by our group:
## LiT5 Suite
The following is a table specifically for our LiT5 suite of models hosted on HuggingFace:
| Model Name | 🤗 Hugging Face Identifier/Link |
|-----------------------|---------------------------------------------|
| LiT5 Distill base | [castorini/LiT5-Distill-base](https://huggingface.co/castorini/LiT5-Distill-base) |
| LiT5 Distill large | [castorini/LiT5-Distill-large](https://huggingface.co/castorini/LiT5-Distill-large) |
| LiT5 Distill xl | [castorini/LiT5-Distill-xl](https://huggingface.co/castorini/LiT5-Distill-xl) |
| LiT5 Distill base v2 | [castorini/LiT5-Distill-base-v2](https://huggingface.co/castorini/LiT5-Distill-base-v2) |
| LiT5 Distill large v2 | [castorini/LiT5-Distill-large-v2](https://huggingface.co/castorini/LiT5-Distill-large-v2) |
| LiT5 Distill xl v2 | [castorini/LiT5-Distill-xl-v2](https://huggingface.co/castorini/LiT5-Distill-xl-v2) |
| LiT5 Score base | [castorini/LiT5-Score-base](https://huggingface.co/castorini/LiT5-Score-base) |
| LiT5 Score large | [castorini/LiT5-Score-large](https://huggingface.co/castorini/LiT5-Score-large) |
| LiT5 Score xl | [castorini/LiT5-Score-xl](https://huggingface.co/castorini/LiT5-Score-xl) |

Now you can run top-100 reranking with the v2 model in a single pass while maintaining efficiency!
## MonoT5 Suite - Pointwise Rerankers
The following is a table specifically for our monoT5 suite of models hosted on HuggingFace:
| Model Name | 🤗 Hugging Face Identifier/Link |
|-----------------------------------|--------------------------------------------------------|
| monoT5 Small MSMARCO 10K | [castorini/monot5-small-msmarco-10k](https://huggingface.co/castorini/monot5-small-msmarco-10k) |
| monoT5 Small MSMARCO 100K | [castorini/monot5-small-msmarco-100k](https://huggingface.co/castorini/monot5-small-msmarco-100k) |
| monoT5 Base MSMARCO | [castorini/monot5-base-msmarco](https://huggingface.co/castorini/monot5-base-msmarco) |
| monoT5 Base MSMARCO 10K | [castorini/monot5-base-msmarco-10k](https://huggingface.co/castorini/monot5-base-msmarco-10k) |
| monoT5 Large MSMARCO 10K | [castorini/monot5-large-msmarco-10k](https://huggingface.co/castorini/monot5-large-msmarco-10k) |
| monoT5 Large MSMARCO | [castorini/monot5-large-msmarco](https://huggingface.co/castorini/monot5-large-msmarco) |
| monoT5 3B MSMARCO 10K | [castorini/monot5-3b-msmarco-10k](https://huggingface.co/castorini/monot5-3b-msmarco-10k) |
| monoT5 3B MSMARCO | [castorini/monot5-3b-msmarco](https://huggingface.co/castorini/monot5-3b-msmarco) |
| monoT5 Base Med MSMARCO | [castorini/monot5-base-med-msmarco](https://huggingface.co/castorini/monot5-base-med-msmarco) |
| monoT5 3B Med MSMARCO | [castorini/monot5-3b-med-msmarco](https://huggingface.co/castorini/monot5-3b-med-msmarco) |

We recommend the Med models for biomedical retrieval. We also provide both 10K checkpoints (generally better out-of-domain effectiveness) and 100K checkpoints (better in-domain effectiveness).
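To illustrate the checkpoint choice: DL19 is in-domain for MS MARCO, so the 100K checkpoint is the natural pick there. The command below simply adapts the MonoT5 example from above and is an illustrative sketch, not a tuned configuration:
```bash
python src/rank_llm/scripts/run_rank_llm.py --model_path=castorini/monot5-small-msmarco-100k --top_k_candidates=1000 --dataset=dl19 \
  --retrieval_method=bm25 --prompt_mode=monot5 --context_size=512
```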
# Training
Please check the `training` directory for fine-tuning open-source listwise rerankers.
# Community Contribution
If you would like to contribute to the project, please refer to the [contribution guidelines](CONTRIBUTING.md).

# ✨ References

If you use RankLLM, please cite the following relevant papers:
[[2309.15088] RankVicuna: Zero-Shot Listwise Document Reranking with Open-Source Large Language Models](https://arxiv.org/abs/2309.15088)
```
@ARTICLE{pradeep2023rankvicuna,
title = {{RankVicuna}: Zero-Shot Listwise Document Reranking with Open-Source Large Language Models},
author = {Ronak Pradeep and Sahel Sharifymoghaddam and Jimmy Lin},
year = {2023},
journal = {arXiv:2309.15088}
}
```

[[2312.02724] RankZephyr: Effective and Robust Zero-Shot Listwise Reranking is a Breeze!](https://arxiv.org/abs/2312.02724)
```
@ARTICLE{pradeep2023rankzephyr,
title = {{RankZephyr}: Effective and Robust Zero-Shot Listwise Reranking is a Breeze!},
author = {Ronak Pradeep and Sahel Sharifymoghaddam and Jimmy Lin},
year = {2023},
journal = {arXiv:2312.02724}
}
```

If you use one of the LiT5 models, please cite the following relevant paper:
[[2312.16098] Scaling Down, LiTting Up: Efficient Zero-Shot Listwise Reranking with Seq2seq Encoder-Decoder Models](https://arxiv.org/abs/2312.16098)
```
@ARTICLE{tamber2023scaling,
title = {Scaling Down, LiTting Up: Efficient Zero-Shot Listwise Reranking with Seq2seq Encoder-Decoder Models},
author = {Manveer Singh Tamber and Ronak Pradeep and Jimmy Lin},
year = {2023},
journal = {arXiv:2312.16098}
}
```

If you use one of the monoT5 models, please cite the following relevant paper:
[[2101.05667] The Expando-Mono-Duo Design Pattern for Text Ranking with Pretrained Sequence-to-Sequence Models](https://arxiv.org/abs/2101.05667)
```
@ARTICLE{pradeep2021emd,
title = {The Expando-Mono-Duo Design Pattern for Text Ranking with Pretrained Sequence-to-Sequence Models},
author = {Ronak Pradeep and Rodrigo Nogueira and Jimmy Lin},
year = {2021},
journal = {arXiv:2101.05667},
}
```

If you use the FirstMistral model, please consider citing:
[[2411.05508] An Early FIRST Reproduction and Improvements to Single-Token Decoding for Fast Listwise Reranking](https://arxiv.org/abs/2411.05508)
```
@ARTICLE{chen2024firstrepro,
title = {An Early FIRST Reproduction and Improvements to Single-Token Decoding for Fast Listwise Reranking},
author = {Zijian Chen and Ronak Pradeep and Jimmy Lin},
year = {2024},
journal = {arXiv:2411.05508}
}
```

If you would like to cite the FIRST methodology, please consider citing:
[[2406.15657] FIRST: Faster Improved Listwise Reranking with Single Token Decoding](https://arxiv.org/abs/2406.15657)
```
@ARTICLE{reddy2024first,
title = {FIRST: Faster Improved Listwise Reranking with Single Token Decoding},
author = {Reddy, Revanth Gangi and Doo, JaeHyeok and Xu, Yifei and Sultan, Md Arafat and Swain, Deevya and Sil, Avirup and Ji, Heng},
year = {2024},
journal = {arXiv:2406.15657},
}
```
# 🙏 Acknowledgments

This research is supported in part by the Natural Sciences and Engineering Research Council (NSERC) of Canada.