Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Official repository for ORPO
https://github.com/xfactlab/orpo
- Host: GitHub
- URL: https://github.com/xfactlab/orpo
- Owner: xfactlab
- License: apache-2.0
- Created: 2024-03-11T12:31:38.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-05-31T06:39:39.000Z (7 months ago)
- Last Synced: 2024-08-01T22:05:40.536Z (5 months ago)
- Language: Python
- Size: 1.87 MB
- Stars: 390
- Watchers: 6
- Forks: 35
- Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- StarryDivineSky - xfactlab/orpo
README
# **ORPO**
### **`Updates (24.03.25)`**
- [X] A sample script for `ORPOTrainer` in 🤗TRL has been added to `trl/test_orpo_trainer_demo.py` (a minimal usage sketch follows this list)
- [X] A new model, 🤗kaist-ai/mistral-orpo-capybara-7k, has been added to the 🤗ORPO Collection
- [X] You can now try ORPO in 🤗TRL, Axolotl, and LLaMA-Factory🔥
- [X] We are preparing a general guideline for training LLMs with ORPO; stay tuned🔥
- [X] **Mistral-ORPO-β** achieved a 14.7% length-controlled (LC) win rate on the official AlpacaEval Leaderboard🔥
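For reference, here is a minimal sketch of what an ORPO run with 🤗TRL's `ORPOTrainer` looks like, assuming a 2024-era TRL release (`ORPOTrainer` first shipped in TRL v0.8.2). The base model, dataset, and hyperparameters below are illustrative placeholders, not the exact setup used in this repository or in `trl/test_orpo_trainer_demo.py`.

```python
# Minimal ORPO fine-tuning sketch with TRL's ORPOTrainer (TRL >= 0.8.2 assumed).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_name = "mistralai/Mistral-7B-v0.1"  # placeholder base model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# ORPOTrainer expects "prompt", "chosen", and "rejected" columns. The dataset
# below is a placeholder; its message lists are flattened to plain strings.
dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")
dataset = dataset.map(
    lambda row: {
        "prompt": row["prompt"],
        "chosen": row["chosen"][-1]["content"],
        "rejected": row["rejected"][-1]["content"],
    }
)

config = ORPOConfig(
    output_dir="./mistral-orpo",
    beta=0.1,                      # λ in the paper: weight of the odds-ratio term
    max_length=2048,
    max_prompt_length=1024,
    per_device_train_batch_size=4,
    learning_rate=5e-6,
    num_train_epochs=1,
)

trainer = ORPOTrainer(model=model, args=config, train_dataset=dataset, tokenizer=tokenizer)
trainer.train()
```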
This is the official repository for **ORPO: Monolithic Preference Optimization without Reference Model**. The detailed results in the paper can be found in:
- [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kaist-ai%2Fmistral-orpo-beta)
- [AlpacaEval](#alpacaeval)
- [MT-Bench](#mt-bench)
- [IFEval](#ifeval)

### **`Model Checkpoints`**
Our models trained with ORPO can be found below; a minimal inference sketch follows these lists:
- [X] **Mistral-ORPO-Capybara-7k**: 🤗 kaist-ai/mistral-orpo-capybara-7k
- [X] **Mistral-ORPO-⍺**: 🤗 kaist-ai/mistral-orpo-alpha
- [X] **Mistral-ORPO-β**: 🤗 kaist-ai/mistral-orpo-beta

The corresponding logs for the average log probabilities of chosen/rejected responses during training are reported in:
- [X] **Mistral-ORPO-Capybara-7k**: TBU
- [X] **Mistral-ORPO-⍺**: Wandb Report for Mistral-ORPO-⍺
- [X] **Mistral-ORPO-β**: Wandb Report for Mistral-ORPO-β
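The checkpoints load like any 🤗Transformers chat model. Below is a minimal inference sketch using the standard `apply_chat_template` API; the prompt and generation settings are illustrative, not the ones used for the reported benchmarks.

```python
# Minimal inference sketch for a released checkpoint (settings are illustrative).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kaist-ai/mistral-orpo-beta"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain ORPO in one paragraph."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```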
### **`AlpacaEval`**
Figure 1. AlpacaEval 2.0 score for the models trained with different alignment methods.
### **`MT-Bench`**
Figure 2. MT-Bench result by category.
### **`IFEval`**
IFEval scores are measured with EleutherAI/lm-evaluation-harness by applying the chat template. The scores for Llama-2-Chat (70B), Zephyr-β (7B), and Mixtral-8X7B-Instruct-v0.1 were originally reported in this tweet. A reproduction sketch follows the table.
| **Model Type** | **Prompt-Strict** | **Prompt-Loose** | **Inst-Strict** | **Inst-Loose** |
|--------------------|:-----------------:|:----------------:|:---------------:|:--------------:|
| **Llama-2-Chat (70B)** | 0.4436 | 0.5342 | 0.5468 | 0.6319 |
| **Zephyr-β (7B)** | 0.4233 | 0.4547 | 0.5492 | 0.5767 |
| **Mixtral-8X7B-Instruct-v0.1** | 0.5213 | **0.5712** | 0.6343 | **0.6823** |
| **Mistral-ORPO-⍺ (7B)** | 0.5009 | 0.5083 | 0.5995 | 0.6163 |
| **Mistral-ORPO-β (7B)** | **0.5287** | 0.5564 | **0.6355** | 0.6619 |
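As a rough guide, the IFEval setup above can be approximated with the harness's Python API. This is a hedged sketch assuming a recent lm-evaluation-harness release (>= v0.4) that exposes the `ifeval` task and an `apply_chat_template` option; the exact harness version and flags behind the table are not specified in this README, so treat the settings as illustrative.

```python
# Approximate reproduction of the IFEval setup with lm-evaluation-harness
# (harness >= 0.4 with chat-template support assumed; settings are illustrative).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=kaist-ai/mistral-orpo-beta,dtype=bfloat16",
    tasks=["ifeval"],
    apply_chat_template=True,  # the scores above are measured with the chat template applied
    batch_size=8,
)
print(results["results"]["ifeval"])
```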