{"id":27984395,"url":"https://github.com/sunnweiwei/RankGPT","last_synced_at":"2025-05-08T05:01:54.282Z","repository":{"id":153626668,"uuid":"629969902","full_name":"sunnweiwei/RankGPT","owner":"sunnweiwei","description":"Is ChatGPT Good at Search? LLMs as Re-Ranking Agent [EMNLP 2023 Outstanding Paper Award]","archived":false,"fork":false,"pushed_at":"2024-03-10T08:27:04.000Z","size":26165,"stargazers_count":589,"open_issues_count":3,"forks_count":58,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-04-21T10:53:57.695Z","etag":null,"topics":["chatgpt","information-retrieval","large-language-models","reranking"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2304.09542","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sunnweiwei.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-04-19T11:56:06.000Z","updated_at":"2025-04-13T19:24:44.000Z","dependencies_parsed_at":"2023-06-09T05:00:46.151Z","dependency_job_id":"373fa69d-9af4-4666-96fc-c5880d4248f9","html_url":"https://github.com/sunnweiwei/RankGPT","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sunnweiwei%2FRankGPT","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sunnweiwei%2FRankGPT/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sunnweiwei%2FRankGPT/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sunn
weiwei%2FRankGPT/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sunnweiwei","download_url":"https://codeload.github.com/sunnweiwei/RankGPT/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253002856,"owners_count":21838640,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chatgpt","information-retrieval","large-language-models","reranking"],"created_at":"2025-05-08T05:01:48.540Z","updated_at":"2025-05-08T05:01:54.194Z","avatar_url":"https://github.com/sunnweiwei.png","language":"Python","funding_links":[],"categories":["A01_文本生成_文本对话","Python","Search \u0026 Retrieval"],"sub_categories":["大语言对话模型及数据"],"readme":"# RankGPT: LLMs as Re-Ranking Agent\n\n[![Generic badge](https://img.shields.io/badge/arXiv-2304.09542-red.svg)](https://arxiv.org/abs/2304.09542)\n[![LICENSE](https://img.shields.io/badge/license-Apache-blue.svg?style=flat)](https://www.apache.org/licenses/LICENSE-2.0)\n\nCode for the paper \"[Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agent](https://arxiv.org/abs/2304.09542)\"\n\nThis project aims to explore generative LLMs such as ChatGPT and GPT-4 for relevance ranking in Information Retrieval (IR).\n\n\n## News\n- **[2023.12.10]** Our [RankGPT](https://arxiv.org/abs/2304.09542) paper won the Outstanding Paper Award at EMNLP 2023! 🎉🎉🎉\n- **[2023.11.06]** Introduced [Instruction Distillation](https://github.com/sunnweiwei/RankGPT/tree/main/InstructDistill): simplifying complex ranking instructions to enhance the efficiency of LLMs. 
Achieves SOTA ranking performance using only open-source LLMs!\n- **[2023.10.08]** Our paper has been accepted for presentation at the EMNLP 2023 main conference. See the updated version at https://arxiv.org/pdf/2304.09542.pdf!\n- **[2023.08.05]** Now supports Azure, Claude, Cohere, and Llama2 via [LiteLLM](https://github.com/BerriAI/litellm)!\n- **[2023.07.11]** Released NovelEval, a new test set of novel search questions and passages that have not been contaminated by the latest LLMs (e.g., GPT-4). See [NovelEval](https://github.com/sunnweiwei/RankGPT/tree/main/NovelEval) for details.\n- **[2023.04.23]** Shared 100K ChatGPT-predicted permutations on the MS MARCO training set [here](#download-data-and-model).\n- **[2023.04.19]** Our paper is now available at https://arxiv.org/abs/2304.09542\n\n## Quick example\nThe following defines a query and three candidate passages:\n\n```python\nitem = {\n    'query': 'How much impact do masks have on preventing the spread of the COVID-19?',\n    'hits': [\n        {'content': 'Title: Universal Masking is Urgent in the COVID-19 Pandemic: SEIR and Agent Based Models, Empirical Validation, Policy Recommendations Content: We present two models for the COVID-19 pandemic predicting the impact of universal face mask wearing upon the spread of the SARS-CoV-2 virus--one employing a stochastic dynamic network based compartmental SEIR (susceptible-exposed-infectious-recovered) approach, and the other employing individual ABM (agent-based modelling) Monte Carlo simulation--indicating (1) significant impact under (near) universal masking when at least 80% of a population is wearing masks, versus minimal impact when only 50% or less of the population is wearing masks, and (2) significant impact when universal masking is adopted early, by Day 50 of a regional outbreak, versus minimal impact when universal masking is adopted late. These effects hold even at the lower filtering rates of homemade masks. 
To validate these theoretical models, we compare their predictions against a new empirical data set we have collected'},\n        {'content': 'Title: Masking the general population might attenuate COVID-19 outbreaks Content: The effect of masking the general population on a COVID-19 epidemic is estimated by computer simulation using two separate state-of-the-art web-based softwares, one of them calibrated for the SARS-CoV-2 virus. The questions addressed are these: 1. Can mask use by the general population limit the spread of SARS-CoV-2 in a country? 2. What types of masks exist, and how elaborate must a mask be to be effective against COVID-19? 3. Does the mask have to be applied early in an epidemic? 4. A brief general discussion of masks and some possible future research questions regarding masks and SARS-CoV-2. Results are as follows: (1) The results indicate that any type of mask, even simple home-made ones, may be effective. Masks use seems to have an effect in lowering new patients even the protective effect of each mask (here dubbed\"one-mask protection\") is'},\n        {'content': 'Title: To mask or not to mask: Modeling the potential for face mask use by the general public to curtail the COVID-19 pandemic Content: Face mask use by the general public for limiting the spread of the COVID-19 pandemic is controversial, though increasingly recommended, and the potential of this intervention is not well understood. We develop a compartmental model for assessing the community-wide impact of mask use by the general, asymptomatic public, a portion of which may be asymptomatically infectious. Model simulations, using data relevant to COVID-19 dynamics in the US states of New York and Washington, suggest that broad adoption of even relatively ineffective face masks may meaningfully reduce community transmission of COVID-19 and decrease peak hospitalizations and deaths. 
Moreover, mask use decreases the effective transmission rate in nearly linear proportion to the product of mask effectiveness (as a fraction of potentially infectious contacts blocked) and coverage rate (as'}\n    ]\n}\n\n```\n\nWe can re-rank the passages using ChatGPT with instructional permutation generation:\n\n```python\nfrom rank_gpt import permutation_pipeline\nnew_item = permutation_pipeline(item, rank_start=0, rank_end=3, model_name='gpt-3.5-turbo', api_key='Your OPENAI Key!')\nprint(new_item)\n```\n\nWe get the following result:\n\n```python\n{\n    'query': 'How much impact do masks have on preventing the spread of the COVID-19?',\n    'hits': [\n        {'content': 'Title: Universal Masking is Urgent in the COVID-19 Pandemic: SEIR and Agent Based Models, Empirical Validation, Policy Recommendations Content: We present two models for the COVID-19 pandemic predicting the impact of universal face mask wearing upon the spread of the SARS-CoV-2 virus--one employing a stochastic dynamic network based compartmental SEIR (susceptible-exposed-infectious-recovered) approach, and the other employing individual ABM (agent-based modelling) Monte Carlo simulation--indicating (1) significant impact under (near) universal masking when at least 80% of a population is wearing masks, versus minimal impact when only 50% or less of the population is wearing masks, and (2) significant impact when universal masking is adopted early, by Day 50 of a regional outbreak, versus minimal impact when universal masking is adopted late. These effects hold even at the lower filtering rates of homemade masks. 
To validate these theoretical models, we compare their predictions against a new empirical data set we have collected'},\n        {'content': 'Title: To mask or not to mask: Modeling the potential for face mask use by the general public to curtail the COVID-19 pandemic Content: Face mask use by the general public for limiting the spread of the COVID-19 pandemic is controversial, though increasingly recommended, and the potential of this intervention is not well understood. We develop a compartmental model for assessing the community-wide impact of mask use by the general, asymptomatic public, a portion of which may be asymptomatically infectious. Model simulations, using data relevant to COVID-19 dynamics in the US states of New York and Washington, suggest that broad adoption of even relatively ineffective face masks may meaningfully reduce community transmission of COVID-19 and decrease peak hospitalizations and deaths. Moreover, mask use decreases the effective transmission rate in nearly linear proportion to the product of mask effectiveness (as a fraction of potentially infectious contacts blocked) and coverage rate (as'},\n        {'content': 'Title: Masking the general population might attenuate COVID-19 outbreaks Content: The effect of masking the general population on a COVID-19 epidemic is estimated by computer simulation using two separate state-of-the-art web-based softwares, one of them calibrated for the SARS-CoV-2 virus. The questions addressed are these: 1. Can mask use by the general population limit the spread of SARS-CoV-2 in a country? 2. What types of masks exist, and how elaborate must a mask be to be effective against COVID-19? 3. Does the mask have to be applied early in an epidemic? 4. A brief general discussion of masks and some possible future research questions regarding masks and SARS-CoV-2. Results are as follows: (1) The results indicate that any type of mask, even simple home-made ones, may be effective. 
Masks use seems to have an effect in lowering new patients even the protective effect of each mask (here dubbed\"one-mask protection\") is'}\n    ]\n}\n```\n\n\u003cdetails\u003e\n\u003csummary\u003eStep-by-step example\u003c/summary\u003e\n  \n  ```python\n  from rank_gpt import create_permutation_instruction, run_llm, receive_permutation\n  \n  # (1) Create the permutation generation instruction\n  messages = create_permutation_instruction(item=item, rank_start=0, rank_end=3, model_name='gpt-3.5-turbo')\n  # (2) Get the ChatGPT-predicted permutation\n  permutation = run_llm(messages, api_key=\"Your OPENAI Key!\", model_name='gpt-3.5-turbo')\n  # (3) Use the permutation to re-rank the passages\n  item = receive_permutation(item, permutation, rank_start=0, rank_end=3)\n  \n  ```\n  \n\u003c/details\u003e\n\n## Sliding window strategy\n\nWe introduce a sliding window strategy for instructional permutation generation that enables LLMs to rank more passages than fit within their maximum token limit.\n\nThe idea is to slide the window from the back of the candidate list to the front, re-ranking only the passages inside the window at each step.\n\nBelow is an example that re-ranks 3 passages with a window size of 2 and a step size of 1:\n\n```python\nfrom rank_gpt import sliding_windows\napi_key = \"Your OPENAI Key\"\nnew_item = sliding_windows(item, rank_start=0, rank_end=3, window_size=2, step=1, model_name='gpt-3.5-turbo', api_key=api_key)\nprint(new_item)\n```\n\n## Evaluation on Benchmarks\nWe use [pyserini](https://github.com/castorini/pyserini) to retrieve 100 passages for each query and re-rank them using instructional permutation generation.\n\nExample of evaluation on TREC-DL19:\n\n```python\nfrom pyserini.search import LuceneSearcher, get_topics, get_qrels\nfrom rank_gpt import run_retriever, sliding_windows\nimport tempfile\nfrom tqdm import tqdm\nopenai_key = None  # Your OpenAI key\n\n# Retrieve passages using pyserini BM25.\nsearcher = LuceneSearcher.from_prebuilt_index('msmarco-v1-passage')\ntopics = 
get_topics('dl19-passage')\nqrels = get_qrels('dl19-passage')\nrank_results = run_retriever(topics, searcher, qrels, k=100)\n\n# Run sliding window permutation generation\nnew_results = []\nfor item in tqdm(rank_results):\n    new_item = sliding_windows(item, rank_start=0, rank_end=100, window_size=20, step=10, model_name='gpt-3.5-turbo', api_key=openai_key)\n    new_results.append(new_item)\n\n# Evaluate nDCG@10\nfrom trec_eval import EvalFunction\ntemp_file = tempfile.NamedTemporaryFile(delete=False).name\nEvalFunction.write_file(new_results, temp_file)\nEvalFunction.main('dl19-passage', temp_file)\n```\n\nRun the evaluation on all benchmarks:\n\n```sh\npython run_evaluation.py\n```\n\nBelow are the results (average nDCG@10) of our preliminary experiments on [TREC](https://microsoft.github.io/msmarco/TREC-Deep-Learning-2020.html), [BEIR](https://github.com/beir-cellar/beir), and [Mr. TyDi](https://github.com/castorini/mr.tydi).\n\n![Results on benchmarks](assets/benchmark-results.png)\n\n\n## Training Specialized Models\n\n### Download data and model\n\n| File | Note | Link |\n|:-------------------------------|:--------|:--------:|\n| marco-train-10k.jsonl | 10K queries sampled from MS MARCO | [Google drive](https://drive.google.com/file/d/1G3MpQ5a4KgUS13JJZFE9aQvCbQfgSQzj/view?usp=share_link) |\n| marco-train-10k-gpt3.5.json | ChatGPT-predicted permutations of the 10K queries | [Google drive](https://drive.google.com/file/d/1i7ckK7kN7BAqq5g7xAd0dLv3cTYYiclA/view?usp=share_link) |\n| deberta-10k-rank_net | Specialized DeBERTa model trained with RankNet loss | [Google drive](https://drive.google.com/file/d/1-KEpJ2KnJCqiJof4zNEA4m78tnwgxKhb/view?usp=share_link) |\n| marco-train-100k.jsonl | 100K queries from MS MARCO | [Google drive](https://drive.google.com/file/d/1OgF4rj89FWSr7pl1c7Hu4x0oQYIMwhik/view?usp=share_link) |\n| marco-train-100k-gpt3.5.json | ChatGPT-predicted permutations of the 100K queries | [Google 
drive](https://drive.google.com/file/d/1z327WOKr70rC4UfOlQVBQnuLxChi_uPs/view?usp=share_link) |\n\n### Distill LLM to a small specialized model\n\n```bash\npython specialization.py \\\n--model microsoft/deberta-v3-base \\\n--loss rank_net \\\n--data data/marco-train-10k.jsonl \\\n--permutation marco-train-10k-gpt3.5.json \\\n--save_path out/deberta-10k-rank_net \\\n--do_train true \\\n--do_eval true\n```\n\nOr run on multiple GPUs using [accelerate](https://github.com/huggingface/accelerate):\n\n```bash\naccelerate launch --num_processes 4 specialization.py \\\n--model microsoft/deberta-v3-base \\\n--loss rank_net \\\n--data data/marco-train-10k.jsonl \\\n--permutation marco-train-10k-gpt3.5.json \\\n--save_path out/deberta-10k-rank_net \\\n--do_train true \\\n--do_eval true\n```\n\n### Evaluate the distilled model on benchmarks\n\n```bash\npython specialization.py \\\n--model out/deberta-10k-rank_net \\\n--do_train false \\\n--do_eval true\n```\n\nThe following figure shows the results of the distilled specialized models with different model sizes and numbers of training queries.\n\n![Specialization results.](assets/specialization-results.png)\n\n## Cite\n\n```latex\n@article{Sun2023IsCG,\n  title={Is ChatGPT Good at Search? 
Investigating Large Language Models as Re-Ranking Agent},\n  author={Weiwei Sun and Lingyong Yan and Xinyu Ma and Pengjie Ren and Dawei Yin and Zhaochun Ren},\n  journal={ArXiv},\n  year={2023},\n  volume={abs/2304.09542}\n}\n```\n\n```latex\n@article{Sun2023InstructionDM,\n  title={Instruction Distillation Makes Large Language Models Efficient Zero-shot Rankers},\n  author={Weiwei Sun and Zheng Chen and Xinyu Ma and Lingyong Yan and Shuaiqiang Wang and Pengjie Ren and Zhumin Chen and Dawei Yin and Zhaochun Ren},\n  journal={ArXiv},\n  year={2023},\n  volume={abs/2311.01555}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsunnweiwei%2FRankGPT","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsunnweiwei%2FRankGPT","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsunnweiwei%2FRankGPT/lists"}