# Human-Feedback-For-AI-awesome
We aim to maintain an up-to-date list of progress (papers, blog posts, code, etc.) on **Human Feedback For AI** (LLMs, text-to-image generation, robot control, and other tasks), and to provide a guide to the papers that have received wide interest.
Please feel free to [open an issue](https://github.com/Fhujinwu/Human-Feedback-awesome/issues) to add papers.

## Table of Contents

- [Human Feedback for LLM](#human-feedback-for-llm)
- [Human Feedback for Text-Image](#human-feedback-for-text-image)
- [Human Feedback for Robot Control](#human-feedback-for-robot-control)
- [About Reinforcement Learning](#about-reinforcement-learning)

## Human Feedback for LLM
* Deep reinforcement learning from human preferences, nips'17. [[paper]](https://proceedings.neurips.cc/paper_files/paper/2017/file/d5e2c0adad503c91f91df240d0cd4e49-Paper.pdf)
* Recursively Summarizing Books with Human Feedback, arxiv'22. [[paper]](https://arxiv.org/pdf/2109.10862.pdf)
* InstructGPT: Training Language Models to Follow Instructions With Human Feedback, nips'22. [[paper]](https://proceedings.neurips.cc/paper_files/paper/2022/file/b1efde53be364a73914f58805a001731-Paper-Conference.pdf) [[video]](https://www.bilibili.com/video/BV1hd4y187CR/)
* Fine-tuning language models to find agreement among humans with diverse preferences, nips'22. [[paper]](https://proceedings.neurips.cc/paper_files/paper/2022/file/f978c8f3b5f399cae464e85f72e28503-Paper-Conference.pdf)
* Constitutional AI: Harmlessness from AI Feedback, arxiv'22. [[paper]](https://arxiv.org/pdf/2212.08073.pdf)
* Training a helpful and harmless assistant with reinforcement learning from human feedback, arxiv'22. [[paper]](https://arxiv.org/pdf/2204.05862.pdf)
* Direct Preference Optimization: Your Language Model is Secretly a Reward Model, arxiv'23. [[paper]](https://arxiv.org/pdf/2305.18290.pdf) [[code]](https://github.com/eric-mitchell/direct-preference-optimization) [[blogs]](https://zhuanlan.zhihu.com/p/634705904) (a minimal loss sketch follows this list)
* RRHF: Rank responses to align language models with human feedback without tears, arxiv'23. [[paper]](https://arxiv.org/pdf/2304.05302.pdf) [[code]](https://github.com/GanjinZero/RRHF) [[blogs]](https://mp.weixin.qq.com/s/MiToPmFuNXY9wJcKH7pZPw)
* RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment, arxiv'23. [[paper]](https://arxiv.org/pdf/2304.06767.pdf) [[code]](https://github.com/OptimalScale/LMFlow) [[blogs]](https://mp.weixin.qq.com/s/rhO0bE8CCQsQzsH3kdTbCA)
* Fine-Grained Human Feedback Gives Better Rewards for Language Model Training, arxiv'23. [[paper]](https://arxiv.org/pdf/2306.01693.pdf) [[code]](https://github.com/allenai/FineGrainedRLHF) [[blogs]](https://mp.weixin.qq.com/s/iqf6Tw2iyYNAUoAj3f1MNw)
* Fine-Tuning Language Models with Advantage-Induced Policy Alignment, arxiv'23. [[paper]](https://arxiv.org/pdf/2306.02231.pdf)
* Scaling Laws for Reward Model Overoptimization, icml'23. [[paper]](https://proceedings.mlr.press/v202/gao23h/gao23h.pdf)
* Reward Collapse in Aligning Large Language Models, arxiv'23. [[paper]](https://arxiv.org/pdf/2305.17608.pdf) [[blogs]](https://mp.weixin.qq.com/s/REqLcA9CMEM8M7DYZpuC-Q)
* Chain of Hindsight Aligns Language Models with Feedback, arxiv'23. [[paper]](https://arxiv.org/pdf/2302.02676.pdf)
* Principled Reinforcement Learning with Human Feedback from Pairwise or K-wise Comparisons, arxiv'23. [[paper]](https://arxiv.org/pdf/2301.11270.pdf)
* Reinforcement Learning from Diverse Human Preferences, arxiv'23. [[paper]](https://arxiv.org/pdf/2301.11774.pdf)
* Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback, arxiv'23. [[paper]](https://arxiv.org/pdf/2303.05453.pdf)
* Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization, iclr'23. [[paper]](https://arxiv.org/pdf/2210.01241.pdf) [[code]](https://github.com/allenai/RL4LMs)
* How to Query Human Feedback Efficiently in RL? arxiv'23. [[paper]](https://arxiv.org/pdf/2305.18505.pdf)
* Pretraining Language Models with Human Preferences, icml'23. [[paper]](https://proceedings.mlr.press/v202/korbak23a/korbak23a.pdf)
* Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback, arxiv'23. [[paper]](https://arxiv.org/pdf/2307.15217.pdf)
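
The DPO entry above reframes RLHF as a single classification-style loss over preference pairs, so it is easy to illustrate. Below is a minimal sketch of that objective, assuming the summed per-response log-probabilities are already computed; the function name, argument names, and the `beta` default are illustrative assumptions, not the reference implementation linked above.

```python
# A minimal sketch of the DPO objective (Rafailov et al., 2023).
# This is an illustration, not the authors' reference code.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss over a batch of preference pairs.

    Each argument is a 1-D tensor of summed token log-probabilities of the
    chosen / rejected responses under the trainable policy or the frozen
    reference model; beta scales the implicit KL penalty.
    """
    # Implicit rewards are the policy-vs-reference log-ratios.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Logistic loss that pushes the chosen response above the rejected one.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

if __name__ == "__main__":
    # Toy usage with random log-probabilities for a batch of 4 pairs.
    b = 4
    loss = dpo_loss(torch.randn(b), torch.randn(b), torch.randn(b), torch.randn(b))
    print(float(loss))
```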

## Human Feedback for Text-Image
* Aligning text-to-image models using human feedback, arxiv'23. [[paper]](https://arxiv.org/pdf/2302.12192.pdf) [[blogs]](https://mp.weixin.qq.com/s/FrqpybryiJ-ikO4ZVeISIg)
* Better aligning text-to-image models with human preference, arxiv'23. [[paper]](https://arxiv.org/pdf/2303.14420.pdf) [[code]](https://github.com/tgxs002/align_sd)
* DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models, arxiv'23. [[paper]](https://arxiv.org/pdf/2305.16381.pdf)
* ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation, arxiv'23. [[paper]](https://arxiv.org/pdf/2304.05977.pdf) [[code]](https://github.com/THUDM/ImageReward) (a pairwise-loss sketch follows this list)
* AGIQA-3K: An Open Database for AI-Generated Image Quality Assessment, arxiv'23. [[paper]](https://arxiv.org/pdf/2306.04717.pdf) [[code]](https://github.com/lcysyzxdxc/AGIQA-3k-Database)
* AIGCIQA2023: A Large-scale Image Quality Assessment Database for AI Generated Images: from the Perspectives of Quality, Authenticity and Correspondence, arxiv'23. [[paper]](https://arxiv.org/pdf/2307.00211.pdf) [[code]](https://github.com/wangjiarui153/AIGCIQA2023)
* Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation, arxiv'23. [[paper]](https://arxiv.org/pdf/2305.01569.pdf)
* Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis, arxiv'23. [[paper]](https://arxiv.org/pdf/2306.09341.pdf) [[code]](https://github.com/tgxs002/HPSv2)
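
ImageReward, Pick-a-Pic, and HPS v2 above all train reward models from pairwise human preferences over generated images. The sketch below shows a Bradley-Terry-style pairwise loss on top of precomputed image and prompt features; the `PreferenceRewardHead` module, the feature dimension, and all shapes are hypothetical placeholders rather than any of those projects' actual architectures.

```python
# A minimal sketch of pairwise preference learning for a text-to-image reward
# model. The MLP head and the random "CLIP-like" features are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PreferenceRewardHead(nn.Module):
    """Maps concatenated image + prompt features to a scalar preference score."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, img_feat, txt_feat):
        return self.mlp(torch.cat([img_feat, txt_feat], dim=-1)).squeeze(-1)

def pairwise_preference_loss(score_preferred, score_rejected):
    # Bradley-Terry: maximize the log-probability that the preferred image wins.
    return -F.logsigmoid(score_preferred - score_rejected).mean()

if __name__ == "__main__":
    head = PreferenceRewardHead()
    txt = torch.randn(8, 512)  # prompt features (placeholder)
    img_win, img_lose = torch.randn(8, 512), torch.randn(8, 512)
    loss = pairwise_preference_loss(head(img_win, txt), head(img_lose, txt))
    loss.backward()
    print(float(loss))
```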

## Human Feedback for Robot Control
* Aligning human preferences with baseline objectives in reinforcement learning, icra'23. [[paper]](https://www.diva-portal.org/smash/get/diva2:1744884/FULLTEXT01.pdf)
* Feedback-efficient interactive reinforcement learning via relabeling experience and unsupervised pre-training, icml'21. [[paper]](https://proceedings.mlr.press/v139/lee21i.html) (a relabeling sketch follows this list)
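
Preference-based RL for control, as in the feedback-efficient approach above, fits a reward model to human preferences over trajectory segments and then relabels previously stored transitions with it so the agent can keep learning off-policy. Below is a rough sketch of the relabeling step only, under assumed shapes and a hypothetical `RewardNet`; it is not the paper's implementation.

```python
# A rough sketch of reward relabeling in preference-based RL for control.
# The reward network, buffer layout, and dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class RewardNet(nn.Module):
    """Learned reward model r(s, a) fit to human segment preferences."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def relabel_replay_buffer(reward_net, observations, actions):
    """Replace stored environment rewards with the learned preference reward."""
    with torch.no_grad():
        return reward_net(observations, actions)

if __name__ == "__main__":
    obs_dim, act_dim, n = 11, 3, 1000
    new_rewards = relabel_replay_buffer(RewardNet(obs_dim, act_dim),
                                        torch.randn(n, obs_dim),
                                        torch.randn(n, act_dim))
    print(new_rewards.shape)  # torch.Size([1000])
```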

## About Reinforcement Learning
* Augmented Proximal Policy Optimization for Safe Reinforcement Learning, aaai'23. [[paper]](https://ojs.aaai.org/index.php/AAAI/article/view/25888) (a clipped-surrogate sketch follows below)
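
The entry above builds on Proximal Policy Optimization, which is also the base policy-gradient algorithm behind most of the RLHF work listed earlier. For reference, here is a minimal sketch of PPO's standard clipped surrogate loss; the function name and the `clip_eps` default are illustrative.

```python
# A minimal sketch of PPO's clipped surrogate objective (Schulman et al., 2017).
import torch

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    # Probability ratio between the current and the data-collecting policy.
    ratio = torch.exp(log_probs_new - log_probs_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    # PPO maximizes the element-wise minimum; return its negated mean as a loss.
    return -torch.min(unclipped, clipped).mean()

if __name__ == "__main__":
    n = 16
    print(float(ppo_clip_loss(torch.randn(n), torch.randn(n), torch.randn(n))))
```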