https://github.com/rlhflow/online-dpo-r1

Codebase for Iterative DPO Using Rule-based Rewards
