https://github.com/rlhflow/online-dpo-r1
Codebase for Iterative DPO Using Rule-based Rewards
Last synced: 12 months ago
- Host: GitHub
- URL: https://github.com/rlhflow/online-dpo-r1
- Owner: RLHFlow
- Created: 2025-02-12T00:33:18.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-04-11T14:51:52.000Z (about 1 year ago)
- Last Synced: 2025-04-11T15:54:11.215Z (about 1 year ago)
- Language: Python
- Size: 10.4 MB
- Stars: 235
- Watchers: 15
- Forks: 31
- Open Issues: 9
Metadata Files:
- Readme: README.md