https://github.com/Cornell-RL/drpo
Dataset Reset Policy Optimization
- Host: GitHub
- URL: https://github.com/Cornell-RL/drpo
- Owner: Cornell-RL
- License: apache-2.0
- Created: 2024-04-12T13:43:03.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-04-12T13:53:22.000Z (about 1 year ago)
- Last Synced: 2024-09-03T03:01:55.672Z (8 months ago)
- Language: Python
- Size: 32.2 KB
- Stars: 25
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-RLHF - official
README
# Dataset Reset Policy Optimization (DR-PO)
Fork of [Learning from Feedback Details](https://github.com/vwxyzjn/summarize_from_feedback_details/tree/62c37d63c212c55bde52833611eb642a95facb5c), adapted for DR-PO on the TL;DR summarization task.
More information coming soon...