# Awesome Reinforcement Learning from Human Feedback

![GitHub stars](https://img.shields.io/github/stars/andy-yangz/Awesome-RLHF.svg?color=red&style=for-the-badge)
![GitHub forks](https://img.shields.io/github/forks/andy-yangz/Awesome-RLHF.svg?style=for-the-badge)
![GitHub activity](https://img.shields.io/github/last-commit/andy-yangz/Awesome-RLHF?color=yellow&style=for-the-badge)

A collection of resources on Reinforcement Learning from Human Feedback (RLHF), with a focus on its application to pre-trained language models.

- [Awesome Reinforcement Learning from Human Feedback](#awesome-reinforcement-learning-from-human-feedback)
- [📜 Papers \& Blog](#-papers--blog)
- [Survey](#survey)
- [Pre-LM RLHF](#pre-lm-rlhf)
- [LM RLHF](#lm-rlhf)
- [Repos](#repos)
- [Datasets](#datasets)
- [Videos \& Lectures](#videos--lectures)
- [TODO](#todo)
- [📧Contact Me](#contact-me)

## 📜 Papers & Blog

### Survey

- [Illustrating Reinforcement Learning from Human Feedback (RLHF)](https://huggingface.co/blog/rlhf): the main inspiration for this repo

### Pre-LM RLHF
- [TAMER: Training an Agent Manually via Evaluative Reinforcement](https://www.cs.utexas.edu/~pstone/Papers/bib2html-links/ICDL08-knox.pdf)
- [Interactive Learning from Policy-Dependent Human Feedback](http://proceedings.mlr.press/v70/macglashan17a/macglashan17a.pdf)
- [Deep Reinforcement Learning from Human Preferences](https://proceedings.neurips.cc/paper/2017/hash/d5e2c0adad503c91f91df240d0cd4e49-Abstract.html) [[Blog](https://www.deepmind.com/blog/learning-through-human-feedback)]
- [Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces](https://ojs.aaai.org/index.php/AAAI/article/view/11485)
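
The papers above (notably *Deep Reinforcement Learning from Human Preferences*) fit a reward model to pairwise human comparisons using a Bradley–Terry model. As a minimal, framework-free sketch (the scalar rewards here are illustrative, not tied to any specific implementation):

```python
import math

def preference_prob(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry probability that the 'chosen' trajectory is preferred,
    given scalar scores from a learned reward model."""
    return 1.0 / (1.0 + math.exp(r_rejected - r_chosen))

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Negative log-likelihood of the human preference label; minimizing
    this trains the reward model to score preferred outputs higher."""
    return -math.log(preference_prob(r_chosen, r_rejected))
```

Equal scores give probability 0.5; the loss shrinks as the margin between chosen and rejected scores grows.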

### LM RLHF

- [Fine-Tuning Language Models from Human Preferences](https://arxiv.org/abs/1909.08593) [[Code (TensorFlow)](https://github.com/openai/lm-human-preferences)]
- [Learning to summarize with human feedback](https://proceedings.neurips.cc/paper/2020/hash/1f89885d556929e98d3ef9b86448f951-Abstract.html) [[Video](https://www.youtube.com/watch?v=vLTmnaMpQCs)]
- [Recursively Summarizing Books with Human Feedback](https://arxiv.org/abs/2109.10862)
- [WebGPT: Browser-assisted question-answering with human feedback](https://arxiv.org/abs/2112.09332)
- [Training language models to follow instructions with human feedback](https://arxiv.org/abs/2203.02155)
- [Teaching language models to support answers with verified quotes](https://www.deepmind.com/publications/gophercite-teaching-language-models-to-support-answers-with-verified-quotes)
- [Improving alignment of dialogue agents via targeted human judgements](https://arxiv.org/abs/2209.14375)
- [ChatGPT: Optimizing Language Models for Dialogue](https://openai.com/blog/chatgpt/)
- [Scaling Laws for Reward Model Overoptimization](https://arxiv.org/abs/2210.10760)
- [Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback](https://arxiv.org/abs/2204.05862)
- [Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned](https://arxiv.org/abs/2209.07858)
- [Dynamic Planning in Open-Ended Dialogue using Reinforcement Learning](https://arxiv.org/abs/2208.02294)
- [Is Reinforcement Learning (Not) for Natural Language Processing?: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization](https://arxiv.org/abs/2210.01241) [[Code](https://github.com/allenai/RL4LMs)]
- [Offline RL for Natural Language Generation with Implicit Language Q Learning](https://arxiv.org/abs/2206.11871) [[Code](https://github.com/Sea-Snell/Implicit-Language-Q-Learning)]

## Repos

- [Transformer Reinforcement Learning (TRL)](https://github.com/lvwerra/trl): trains GPT-style transformer models with ***Proximal Policy Optimization*** (**PPO**)
- [Transformer Reinforcement Learning X (TRLX)](https://github.com/CarperAI/trlx): an enhanced fork of TRL that adds ***Implicit Language Q-Learning*** (**ILQL**)
- [RL4LMs (A modular RL library to fine-tune language models to human preferences)](https://github.com/allenai/RL4LMs) [[Site](https://rl4lms.apps.allenai.org/)]: thoroughly tested and benchmarked with over **2000 experiments** on language generation tasks, covering multiple metrics and several RL algorithms. Also supports seq2seq models (e.g., T5, BART).
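
A common ingredient in the RLHF fine-tuning these libraries implement is a KL-shaped reward: the reward-model score minus a penalty for drifting from a frozen reference model. A minimal sketch of that shaping (function name and the per-token KL estimate are illustrative assumptions, not any library's actual API):

```python
def shaped_reward(rm_score: float,
                  logprob_policy: float,
                  logprob_ref: float,
                  beta: float = 0.1) -> float:
    """RLHF reward shaping: reward-model score minus a KL penalty that
    keeps the fine-tuned policy close to the frozen reference model.
    The KL term is estimated from log-probs of the sampled token."""
    kl_estimate = logprob_policy - logprob_ref
    return rm_score - beta * kl_estimate
```

When the policy matches the reference, the penalty vanishes; larger `beta` anchors generations more tightly to the reference model.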

## Datasets

- [HH-RLHF](https://github.com/anthropics/hh-rlhf) [[HF Hub](https://huggingface.co/datasets/Anthropic/hh-rlhf)]: a human-preference dataset of paired helpful/harmless dialogue transcripts, created by Anthropic.
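
Each HH-RLHF example pairs a preferred (`chosen`) and a dispreferred (`rejected`) dialogue transcript. A minimal sketch of the record shape for reward-model training (the transcript text below is made up for illustration):

```python
# Illustrative HH-RLHF-style record: both fields hold full dialogue
# transcripts; only the final assistant turn differs between them.
example = {
    "chosen": "\n\nHuman: How do I bake bread?\n\nAssistant: Start with flour, water, salt, and yeast...",
    "rejected": "\n\nHuman: How do I bake bread?\n\nAssistant: I can't help with that.",
}

def to_preference_pair(record: dict) -> tuple[str, str]:
    """Split one record into (chosen, rejected) texts for a reward model."""
    return record["chosen"], record["rejected"]
```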

## Videos & Lectures

- [Learning Task Specifications for Reinforcement Learning from Human Feedback](https://www.youtube.com/watch?v=vebzz6EKD2w)
- [Reinforcement Learning from Human Feedback: From Zero to chatGPT](https://www.youtube.com/watch?v=2MBJOuVq380)

## TODO

- [ ] Add more descriptions

## 📧Contact Me

If you have any questions, please feel free to contact me (📧: [email protected]).