{"id":28532273,"url":"https://github.com/sail-sg/dice","last_synced_at":"2025-07-16T12:35:46.130Z","repository":{"id":244078146,"uuid":"814076961","full_name":"sail-sg/dice","owner":"sail-sg","description":"Official implementation of Bootstrapping Language Models via DPO Implicit Rewards","archived":false,"fork":false,"pushed_at":"2025-04-15T01:48:27.000Z","size":19285,"stargazers_count":44,"open_issues_count":0,"forks_count":3,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-06-09T15:53:00.061Z","etag":null,"topics":["alignment","large-language-models","preference-learning","rlhf"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2406.09760","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sail-sg.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-06-12T09:44:09.000Z","updated_at":"2025-05-03T03:10:36.000Z","dependencies_parsed_at":"2025-04-15T02:43:15.639Z","dependency_job_id":null,"html_url":"https://github.com/sail-sg/dice","commit_stats":null,"previous_names":["sail-sg/dice"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/sail-sg/dice","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sail-sg%2Fdice","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sail-sg%2Fdice/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sail-sg%2Fdice/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sail-sg%2Fdice/manifests","owner_url":"https
://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sail-sg","download_url":"https://codeload.github.com/sail-sg/dice/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sail-sg%2Fdice/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":264091945,"owners_count":23556211,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["alignment","large-language-models","preference-learning","rlhf"],"created_at":"2025-06-09T15:38:01.156Z","updated_at":"2025-07-16T12:35:46.086Z","avatar_url":"https://github.com/sail-sg.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Bootstrapping with DPO Implicit Rewards (DICE)\n\n[![Collection](https://img.shields.io/badge/🤗-Model%20Collection-blue)](https://huggingface.co/collections/sail/dice-6684de998e62fe07709d67eb)\n[![Paper Arvix](https://img.shields.io/badge/Paper-Arvix%20Link-green)](https://arxiv.org/abs/2406.09760)\n[![Code License](https://img.shields.io/badge/Code%20License-MIT-yellow.svg)](https://github.com/sail-sg/dice/blob/main/LICENSE)\n\nThis repository contains the implementation of our paper Bootstrapping Language Models via DPO Implicit Rewards. 
We show that the implicit reward model from prior DPO training can be used to bootstrap and further align LLMs.\n\n\u003cimg src=\"./DICE.png\" width=\"1000px\"\u003e\u003c/img\u003e\n\n## Quick links\n- [Bootstrapping with DPO Implicit Rewards (DICE)](#bootstrapping-with-dpo-implicit-rewards-dice)\n  - [Quick links](#quick-links)\n  - [Base Models and Released Models](#base-models-and-released-models)\n  - [Setup](#setup)\n    - [Install dependencies](#install-dependencies)\n    - [Setup the bash script](#setup-the-bash-script)\n  - [Training scripts](#training-scripts)\n  - [Acknowledgement](#acknowledgement)\n  - [Citation](#citation)\n\n## Base Models and Released Models\n| **Model**                  | **AE2 LC** | **AE2 WR** |\n|----------------------------|:----------:|:----------:|\n| 🤗[Llama-3-Base-8B-SFT-DPO](https://huggingface.co/princeton-nlp/Llama-3-Base-8B-SFT-DPO)    | 18.20      | 15.50      |\n| 🤗[Llama-3-Base-8B-DICE Iter1](https://huggingface.co/sail/Llama-3-Base-8B-DICE-Iter1) | 25.08      | 25.77      |\n| 🤗[Llama-3-Base-8B-DICE Iter2](https://huggingface.co/sail/Llama-3-Base-8B-DICE-Iter2) | 27.55      | 30.99      |\n| 🤗[Zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta)             | 12.69      | 10.71      |\n| 🤗[Zephyr-7B-DICE Iter1](https://huggingface.co/sail/Zephyr-7B-DICE-Iter1)       | 19.03      | 17.67      |\n| 🤗[Zephyr-7B-DICE Iter2](https://huggingface.co/sail/Zephyr-7B-DICE-Iter2)       | 20.71      | 20.16      |\n\nPlease refer to [pipeline.sh#1.1_response_generation](https://github.com/sail-sg/dice/blob/21abbe8c44ad2d608dbcf14551c209064ce66540/scripts/run_dice/pipeline.sh#L105) for instructions on batch inference with the appropriate chat template. 
\n\n## Setup\n### Install dependencies\nInstall the dependencies with the following commands: \n```bash\ngit clone https://github.com/sail-sg/dice.git\nconda create -n dice python=3.10\nconda activate dice\ncd dice/llama-factory\npip install -e .[deepspeed,metrics,bitsandbytes]\n\ncd ..\npip install -e .\npip install -r requirements.txt\n\n# optional: install flash attention\npip install flash-attn --no-build-isolation\n```\n\n### Setup the bash script\nSet `DICE_DIR` to the local path of this repo in two files: \n- `scripts/run_dice/iter.sh`\n- `scripts/run_dice/pipeline.sh`\n\nE.g. `DICE_DIR=\"/home/username/dice\"`\n\n## Training scripts\nWe provide sample training scripts for both the Llama3 and Zephyr settings. We recommend running the scripts on `8x A100 GPUs`. For other hardware environments, you might need to adjust the scripts. \n\n- Llama3\n  ```bash\n  bash scripts/run_dice/iter.sh llama3\n  ```\n\n- Zephyr\n  ```bash\n  bash scripts/run_dice/iter.sh zephyr\n  ```\n\n\n## Acknowledgement\nThis repo is built on [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory). Thanks for the amazing work!\n\n## Citation\nPlease consider citing our paper if you find the repo helpful in your work:\n\n```bibtex\n@inproceedings{chen2025bootstrapping,\n   title={Bootstrapping Language Models with DPO Implicit Rewards},\n   author={Chen, Changyu and Liu, Zichen and Du, Chao and Pang, Tianyu and Liu, Qian and Sinha, Arunesh and Varakantham, Pradeep and Lin, Min},\n   booktitle={International Conference on Learning Representations (ICLR)},\n   year={2025}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsail-sg%2Fdice","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsail-sg%2Fdice","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsail-sg%2Fdice/lists"}