# Awesome Reinforcement Learning from Human Feedback

![GitHub stars](https://img.shields.io/github/stars/andy-yangz/Awesome-RLHF.svg?color=red&style=for-the-badge)
![GitHub forks](https://img.shields.io/github/forks/andy-yangz/Awesome-RLHF.svg?style=for-the-badge)
![GitHub activity](https://img.shields.io/github/last-commit/andy-yangz/Awesome-RLHF?color=yellow&style=for-the-badge)

A collection of resources on Reinforcement Learning from Human Feedback (RLHF), with a focus on its application to pre-trained language models.

- [Awesome Reinforcement Learning from Human Feedback](#awesome-reinforcement-learning-from-human-feedback)
- [📜 Papers \& Blog](#-papers--blog)
- [Survey](#survey)
- [Pre-LM RLHF](#pre-lm-rlhf)
- [LM RLHF](#lm-rlhf)
- [Repos](#repos)
- [Datasets](#datasets)
- [Videos \& Lectures](#videos--lectures)
- [TODO](#todo)
- [📧Contact Me](#contact-me)

## 📜 Papers & Blog

### Survey

- [Illustrating Reinforcement Learning from Human Feedback (RLHF)](https://huggingface.co/blog/rlhf): the main inspiration for this repo

### Pre-LM RLHF
- [TAMER: Training an Agent Manually via Evaluative Reinforcement](https://www.cs.utexas.edu/~pstone/Papers/bib2html-links/ICDL08-knox.pdf)
- [Interactive Learning from Policy-Dependent Human Feedback](http://proceedings.mlr.press/v70/macglashan17a/macglashan17a.pdf)
- [Deep Reinforcement Learning from Human Preferences](https://proceedings.neurips.cc/paper/2017/hash/d5e2c0adad503c91f91df240d0cd4e49-Abstract.html) [[Blog](https://www.deepmind.com/blog/learning-through-human-feedback)]
- [Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces](https://ojs.aaai.org/index.php/AAAI/article/view/11485)
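
The papers above (notably *Deep Reinforcement Learning from Human Preferences*) fit a reward model to pairwise human comparisons using a Bradley–Terry model. As a minimal, framework-free sketch (the scalar rewards here are illustrative, not tied to any specific implementation):

```python
import math

def preference_prob(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry probability that the 'chosen' trajectory is preferred,
    given scalar scores from a learned reward model."""
    return 1.0 / (1.0 + math.exp(r_rejected - r_chosen))

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Negative log-likelihood of the human preference label; minimizing
    this trains the reward model to score preferred outputs higher."""
    return -math.log(preference_prob(r_chosen, r_rejected))
```

Equal scores give probability 0.5; the loss shrinks as the margin between chosen and rejected scores grows.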

### LM RLHF

- [Fine-Tuning Language Models from Human Preferences](https://arxiv.org/abs/1909.08593) [[Code (TensorFlow)](https://github.com/openai/lm-human-preferences)]
- [Learning to summarize with human feedback](https://proceedings.neurips.cc/paper/2020/hash/1f89885d556929e98d3ef9b86448f951-Abstract.html) [[Video](https://www.youtube.com/watch?v=vLTmnaMpQCs)]
- [Recursively Summarizing Books with Human Feedback](https://arxiv.org/abs/2109.10862)
- [WebGPT: Browser-assisted question-answering with human feedback](https://arxiv.org/abs/2112.09332)
- [Training language models to follow instructions with human feedback](https://arxiv.org/abs/2203.02155)
- [Teaching language models to support answers with verified quotes](https://www.deepmind.com/publications/gophercite-teaching-language-models-to-support-answers-with-verified-quotes)
- [Improving alignment of dialogue agents via targeted human judgements](https://arxiv.org/abs/2209.14375)
- [ChatGPT: Optimizing Language Models for Dialogue](https://openai.com/blog/chatgpt/)
- [Scaling Laws for Reward Model Overoptimization](https://arxiv.org/abs/2210.10760)
- [Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback](https://arxiv.org/abs/2204.05862)
- [Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned](https://arxiv.org/abs/2209.07858)
- [Dynamic Planning in Open-Ended Dialogue using Reinforcement Learning](https://arxiv.org/abs/2208.02294)
- [Is Reinforcement Learning (Not) for Natural Language Processing?: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization](https://arxiv.org/abs/2210.01241) [[Code](https://github.com/allenai/RL4LMs)]
- [Offline RL for Natural Language Generation with Implicit Language Q Learning](https://arxiv.org/abs/2206.11871) [[Code](https://github.com/Sea-Snell/Implicit-Language-Q-Learning)]

## Repos

- [Transformer Reinforcement Learning (TRL)](https://github.com/lvwerra/trl): trains GPT-style transformer models with ***Proximal Policy Optimization*** (**PPO**)
- [Transformer Reinforcement Learning X (TRLX)](https://github.com/CarperAI/trlx): an enhanced fork of TRL that adds ***Implicit Language Q-Learning*** (**ILQL**)
- [RL4LMs (A modular RL library to fine-tune language models to human preferences)](https://github.com/allenai/RL4LMs) [[Site](https://rl4lms.apps.allenai.org/)]: thoroughly tested and benchmarked with over **2000 experiments** on language generation tasks, covering multiple metrics and several RL algorithms. Also supports seq2seq models (e.g., T5, BART).
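
A common ingredient in the RLHF fine-tuning these libraries implement is a KL-shaped reward: the reward-model score minus a penalty for drifting from a frozen reference model. A minimal sketch of that shaping (function name and the per-token KL estimate are illustrative assumptions, not any library's actual API):

```python
def shaped_reward(rm_score: float,
                  logprob_policy: float,
                  logprob_ref: float,
                  beta: float = 0.1) -> float:
    """RLHF reward shaping: reward-model score minus a KL penalty that
    keeps the fine-tuned policy close to the frozen reference model.
    The KL term is estimated from log-probs of the sampled token."""
    kl_estimate = logprob_policy - logprob_ref
    return rm_score - beta * kl_estimate
```

When the policy matches the reference, the penalty vanishes; larger `beta` anchors generations more tightly to the reference model.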

## Datasets

- [HH-RLHF](https://github.com/anthropics/hh-rlhf) [[HF Hub](https://huggingface.co/datasets/Anthropic/hh-rlhf)]: a human-preference dataset of paired helpful/harmless dialogue transcripts, created by Anthropic.
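
Each HH-RLHF example pairs a preferred (`chosen`) and a dispreferred (`rejected`) dialogue transcript. A minimal sketch of the record shape for reward-model training (the transcript text below is made up for illustration):

```python
# Illustrative HH-RLHF-style record: both fields hold full dialogue
# transcripts; only the final assistant turn differs between them.
example = {
    "chosen": "\n\nHuman: How do I bake bread?\n\nAssistant: Start with flour, water, salt, and yeast...",
    "rejected": "\n\nHuman: How do I bake bread?\n\nAssistant: I can't help with that.",
}

def to_preference_pair(record: dict) -> tuple[str, str]:
    """Split one record into (chosen, rejected) texts for a reward model."""
    return record["chosen"], record["rejected"]
```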

## Videos & Lectures

- [Learning Task Specifications for Reinforcement Learning from Human Feedback](https://www.youtube.com/watch?v=vebzz6EKD2w)
- [Reinforcement Learning from Human Feedback: From Zero to chatGPT](https://www.youtube.com/watch?v=2MBJOuVq380)

## TODO

- [ ] Add more descriptions

## 📧Contact Me

If you have any questions, please feel free to contact me (📧: [email protected]).