https://github.com/natolambert/rlhf-book

Textbook on reinforcement learning from human feedback

ai alignment rlhf

Last synced: 5 months ago

Host: GitHub
URL: https://github.com/natolambert/rlhf-book
Owner: natolambert
License: other
Created: 2024-05-24T17:08:24.000Z (about 2 years ago)
Default Branch: main
Last Pushed: 2026-02-01T01:39:23.000Z (5 months ago)
Last Synced: 2026-02-01T09:18:37.018Z (5 months ago)
Topics: ai, alignment, rlhf
Language: Python
Homepage: https://rlhfbook.com/
Size: 26 MB
Stars: 1,488
Watchers: 22
Forks: 134
Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE-CHAPTERS

Awesome Lists containing this project

awesome-LLM-resources - Textbook on reinforcement learning from human feedback
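StarryDivineSky - natolambert/rlhf-book - A textbook project on reinforcement learning from human feedback (RLHF), intended to help readers understand and master RLHF techniques. The project covers the basic concepts, algorithms, and applications of RLHF and provides a wealth of code examples and experimental results. Its core idea is to use human preference data to train a reward model, which then guides the learning of a reinforcement learning agent. The project may include discussion of alignment problems and of how to use human feedback to improve model safety, reliability, and controllability. By studying it, readers can learn how to build agents based on human feedback and apply them to practical scenarios such as dialogue generation, text summarization, and robot control. The project offers both theoretical knowledge and practical guidance and is suited to researchers and engineers interested in RLHF. (A01_Text generation_Text dialogue / Large language dialogue models and data)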

README

# RLHF Book

A comprehensive guide to Reinforcement Learning from Human Feedback (and a broad introduction to post-training language models).

**[Read online](https://rlhfbook.com)** | **[Pre-order print](https://hubs.la/Q03Tc3cf0)**

This book is my attempt to open-source all the knowledge I've gained working at the frontier of open models during the post-ChatGPT takeoff of language models.
When I started, many established methods like rejection sampling had no canonical reference.
On the other side, industry practices to make the models more personable -- colloquially called Character Training -- had no open research.
It was obvious to me that there would be a payoff to documenting these methods, and that learning the fundamentals, carefully curating the references (in an era of AI slop), and everything in between would be a wonderful starting point for people.

Today, I'm adding code and seeing this as a home base for people who want to learn.
You should use coding assistants to ask questions.
You should buy the physical book because the real world matters.
You should read the specific AI outputs tailored to you.

In the future I want to add more educational resources to this, such as open source slide decks and more ways to learn.
In the end, with how impossible it is to measure human preferences, RLHF will never be a solved problem.

Thank you for reading.
Thank you for contributing any feedback or engaging with the community.

-- Nathan Lambert, @natolambert

## Repository Structure

```
rlhf-book/
├── book/                     # Book source and build files
│   ├── chapters/             # Markdown source (01-introduction.md, etc.)
│   ├── images/               # Figures referenced in chapters
│   ├── assets/               # Brand assets (covers, logos)
│   ├── templates/            # Pandoc templates (HTML, PDF, EPUB)
│   ├── scripts/              # Build utilities
│   └── data/                 # Library data
├── code/                     # Reference implementations
│   ├── policy_gradients/     # PPO, REINFORCE, GRPO, RLOO
│   ├── reward_models/        # Preference RM, ORM, PRM training
│   └── direct_alignment/     # DPO and variants
├── diagrams/                 # Diagram source files
│   ├── scripts/              # Python generation scripts
│   ├── tikz/                 # LaTeX/TikZ sources
│   └── specs/                # YAML specifications
├── build/                    # Generated output (git-ignored)
└── Makefile                  # Build system
```

## Code Library

Reference implementations for RLHF algorithms in `code/`:
- Policy gradient methods (PPO, REINFORCE, GRPO, RLOO, etc.)
- Reward model training (preference RM, ORM, PRM)
- Direct alignment methods

See [code/README.md](code/README.md) for setup and usage.
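
For a rough feel of what two of these families optimize, here is a minimal sketch using only the Python standard library. It is not taken from `code/` and does not reflect that library's API: the function names (`log_sigmoid`, `preference_rm_loss`, `dpo_loss`) and the default `beta=0.1` are assumptions made for illustration. It shows the standard Bradley-Terry pairwise loss commonly used for preference reward models and the standard DPO loss, each for a single chosen/rejected pair.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    # Branch on the sign so exp() is only ever called on a non-positive value.
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def preference_rm_loss(chosen_reward: float, rejected_reward: float) -> float:
    """Bradley-Terry pairwise loss: -log sigmoid(r_chosen - r_rejected).

    Hypothetical helper for illustration, not the repository's implementation.
    """
    return -log_sigmoid(chosen_reward - rejected_reward)


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value, chosen only for this example
) -> float:
    """DPO loss for one pair, given summed sequence log-probabilities.

    Hypothetical helper for illustration, not the repository's implementation.
    """
    # Log-ratio of policy to reference model for each completion.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    return -log_sigmoid(beta * (chosen_logratio - rejected_logratio))


if __name__ == "__main__":
    # The reward model loss shrinks as the chosen reward pulls ahead.
    print(preference_rm_loss(2.0, 0.0))
    print(preference_rm_loss(0.0, 2.0))
    # With policy == reference, the DPO loss equals log(2).
    print(dpo_loss(-1.0, -2.0, -1.0, -2.0))
```

Both losses reduce to `-log sigmoid(margin)`; they differ only in where the margin comes from (a learned scalar reward versus a scaled difference of policy-to-reference log-ratios). A real implementation would operate on batched tensors rather than Python floats.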

## Book Source

Book source files are in `book/`. Build locally:

```bash
make html # Build HTML site
make pdf # Build PDF (requires LaTeX)
```

See [book/README.md](book/README.md) for detailed build instructions.

## Diagrams

The `diagrams/` directory contains source files for figures used in the book. These are designed to be reusable for presentations, blog posts, or your own learning materials. Generate them with:

```bash
cd diagrams && make all
```

## Citation

To cite this book, please use the following format:

```bibtex
@book{rlhf2025,
author = {Nathan Lambert},
title = {Reinforcement Learning from Human Feedback},
year = {2025},
publisher = {Online},
url = {https://rlhfbook.com},
}
```

## License

- Code: [MIT](LICENSE-CODE)
- Chapters: [CC-BY-NC-SA-4.0](LICENSE-CHAPTERS)

## Contributors

While I get the credit as the sole "author" and creator of this project, I've been super lucky to have many contributions from early readers. These have massively accelerated the editing progress and flat-out added meaningful content to the book. I'm happy to send substantive contributors free copies of the book and expect the internet goodwill to pay them back in unexpected ways.

See all [contributors](https://github.com/natolambert/rlhf-book/graphs/contributors).

## Star History

[![Star History Chart](https://api.star-history.com/svg?repos=natolambert/rlhf-book&type=Date)](https://www.star-history.com/#natolambert/rlhf-book&Date)
