Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Awesome-RLHF
Awesome Reinforcement Learning from Human Feedback, the secret behind ChatGPT XD
https://github.com/andy-yangz/Awesome-RLHF
Repos
LM RLHF
Datasets
LM RLHF
- HH-RLHF: A dataset created by Anthropic.
📜 Papers & Blog
LM RLHF
- ChatGPT: Optimizing Language Models for Dialogue
- Fine-Tuning Language Models from Human Preferences
- Learning to summarize with human feedback
- WebGPT: Browser-assisted question-answering with human feedback
- Training language models to follow instructions with human feedback
- Improving alignment of dialogue agents via targeted human judgements
- Scaling Laws for Reward Model Overoptimization
- Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
- Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
- Dynamic Planning in Open-Ended Dialogue using Reinforcement Learning
- Is Reinforcement Learning (Not) for Natural Language Processing?: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization
- Offline RL for Natural Language Generation with Implicit Language Q Learning
- Recursively Summarizing Books with Human Feedback
Survey
Pre-LM RLHF
Videos & Lectures