https://github.com/vvukimy/mdr
Code for NAACL 2024 "MDR: Model-Specific Demonstration Retrieval at Inference Time for In-Context Learning".
- Host: GitHub
- URL: https://github.com/vvukimy/mdr
- Owner: vvukimy
- Created: 2024-03-19T12:13:52.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-07-04T12:15:54.000Z (almost 2 years ago)
- Last Synced: 2025-08-01T14:51:29.811Z (10 months ago)
- Topics: in-context-learning, large-language-models, retrieval-augmented-generation
- Language: Python
- Homepage:
- Size: 230 KB
- Stars: 6
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# MDR
Code for NAACL 2024 paper: [MDR: Model-Specific Demonstration Retrieval at Inference Time for In-Context Learning](https://aclanthology.org/2024.naacl-long.235/).
## Environment Setup
```bash
cd MDR
bash install.sh
```
## Preparation
Follow the instructions in [UPRISE](https://github.com/microsoft/LMOps/tree/main/uprise#1-download-retriever-and-prompt-pool) to download the pre-trained retriever and the pre-constructed demonstration pool.
After downloading, encode the demonstration pool with the demonstration encoder:
```bash
bash ./scripts/gen_demonstration_embeds.sh
```
## Quick Start
Download [demonstration_pool_GPTNeo.json](https://drive.google.com/file/d/1m4ls7Unl36-NaGCLyKPAUtqeMAZJcb6J/view?usp=drive_link) to `./demonstration_pools`. Then run the provided shell script to evaluate MDR on different tasks with GPTNeo-2.7B and to see how the demonstration retrieval process works:
```bash
bash ./scripts/run_GPTNeo_2.7B.sh
```
You can change the variable `DEMONSTRATION_POOL` to `path_to_demonstration_pool` (downloaded from UPRISE) to see how MDR calculates the eigenvalue and loss for each sample in the test dataset given a specific inference model.
## Evaluating MDR on Any Task and Model
Customize your scripts to support different tasks and models by setting the following parameters:
- `LLM`: the name of the LLM, in Hugging Face format;
- `DEMONSTRATION_POOL`: because the calculated eigenvalue and loss are specific to the model, create a separate demonstration pool file for each model (just copy the downloaded demonstration pool file and rename it);
- `TASKS`: MDR supports 20+ datasets; specify the tasks to evaluate using the task names defined in `./DPR/dpr/utils/tasks.py`.
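The copy-and-rename step for `DEMONSTRATION_POOL` can be scripted. The following is a minimal sketch, not part of the MDR codebase: the helper names `pool_path_for_model` and `copy_pool_for_model` and the `demonstration_pool_<model>.json` naming scheme are assumptions chosen to resemble the `demonstration_pool_GPTNeo.json` file from the Quick Start.

```python
import re
import shutil
from pathlib import Path


def pool_path_for_model(pool_dir: str, llm_name: str) -> Path:
    """Build a per-model pool file path (hypothetical naming scheme)."""
    # Replace characters that are awkward in file names, such as the "/"
    # in a Hugging Face model id like "EleutherAI/gpt-neo-2.7B".
    safe_name = re.sub(r"[^A-Za-z0-9._-]+", "_", llm_name)
    return Path(pool_dir) / f"demonstration_pool_{safe_name}.json"


def copy_pool_for_model(source_pool: str, pool_dir: str, llm_name: str) -> Path:
    """Copy the downloaded pool to a model-specific file if it does not exist yet."""
    target = pool_path_for_model(pool_dir, llm_name)
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        # Do not overwrite an existing copy. Assumption: the file for a model
        # may already have been used in earlier runs with that model.
        shutil.copyfile(source_pool, target)
    return target
```

Calling `copy_pool_for_model("path_to_demonstration_pool", "./demonstration_pools", "EleutherAI/gpt-neo-2.7B")` returns the path of the new copy, which can then be assigned to `DEMONSTRATION_POOL` in your script.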
## Acknowledgement
This repository is built using the [UPRISE](https://github.com/microsoft/LMOps/tree/main/uprise) codebase.