https://github.com/carperai/instructgpt
For experiments involving InstructGPT. Currently used for documenting open research questions.
- Host: GitHub
- URL: https://github.com/carperai/instructgpt
- Owner: CarperAI
- License: MIT
- Created: 2022-10-10T22:50:30.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2022-11-08T17:48:38.000Z (over 3 years ago)
- Last Synced: 2025-01-08T02:15:42.194Z (over 1 year ago)
- Size: 5.86 KB
- Stars: 71
- Watchers: 9
- Forks: 4
- Open Issues: 25
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# BigModelName
This repository collects open questions about RLHF and InstructGPT as they pertain to BigModelName.
## Open Questions
* What is the preference rate of PPO vs. PPO-ptx? Why was 27.8 chosen as the mixing factor between the pre-training gradients and the PPO gradients?
* What do the gradient norms and gradient noise scales look like for PPO gradients vs. pre-training gradients?
* How important is supervised fine-tuning (SFT) on human-written completions?
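
To make the first two questions concrete, here is a minimal sketch of how the two gradient sources might be mixed and compared. It assumes the mixing factor scales the pre-training gradient before it is added to the PPO gradient, and it treats gradients as flat lists of floats. The helper names (`l2_norm`, `mix_gradients`, `ptx_share`) are hypothetical and not taken from any existing codebase.

```python
import math

# Mixing factor quoted in the question above. Assumption: it multiplies the
# pre-training gradient, not the PPO gradient.
PTX_COEF = 27.8


def l2_norm(grad):
    """Euclidean norm of a flat gradient vector."""
    return math.sqrt(sum(g * g for g in grad))


def mix_gradients(ppo_grad, ptx_grad, ptx_coef=PTX_COEF):
    """Return ppo_grad + ptx_coef * ptx_grad, element by element."""
    if len(ppo_grad) != len(ptx_grad):
        raise ValueError("gradient vectors must have the same length")
    return [p + ptx_coef * q for p, q in zip(ppo_grad, ptx_grad)]


def ptx_share(ppo_grad, ptx_grad, ptx_coef=PTX_COEF):
    """Fraction of the summed norms that comes from the scaled pre-training term."""
    ppo = l2_norm(ppo_grad)
    ptx = ptx_coef * l2_norm(ptx_grad)
    total = ppo + ptx
    return ptx / total if total > 0 else 0.0
```

Logging `l2_norm` for each source and `ptx_share` over the course of training would be one way to see whether a given mixing factor lets one term dominate the update.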