# RLHF Shakespeare

A transformer trained on the works of Shakespeare, then fine-tuned to generate positive sentiment samples using RLHF.
This is a suggested exercise from chapter 7 of Jacob Hilton's [deep_learning_curriculum](https://github.com/jacobhilton/deep_learning_curriculum/blob/master/7-Alignment.md).

Here are some unconditional samples after fine-tuning:

```
Becomes him nobly; So do's Arcites mirth,
But Palamons sadnes is a kinde of mirth,
So mingled, as if, as I, use love to thee,
Making more clamorous than to be
```
```
Becomes him nobly; So do's Arcites mirth,
But Palamons sadnes is a kinde of mirth,
So mingled, as, as if I were to love;
And every man could make more good
```

And a few from the pre-trained model, for comparison:
```
To strike him there, [_Reads._] Boy, I will you bite it,
And do my rage well compos’d it.
Who may be call’d Anchient Romeo’s friend,
```
```
I will forestall my right disgrace and ever.

SIR TOBY.
Come, come, sir.

CLOWN.
[_Sings._] Away, you’ll be hanged and hang yourselves
```

The fine-tuned model seems to have learned to produce more positive samples, but at the expense
of variety and coherence.

## Training:

The model was fine-tuned using [PPO](https://arxiv.org/abs/1707.06347).
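
For reference, here is a minimal sketch of the PPO clipped surrogate loss; the actual training logic lives in `train_ppo.py`, and the function name and clipping coefficient here are illustrative:

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    # Per-token probability ratio between the current and behavior policies.
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the pessimistic (minimum) surrogate; negate to get a loss.
    return -torch.min(unclipped, clipped).mean()
```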

![avg return](avg_return.png)

If you're curious about more metrics (fraction of ratios clipped, KL(current model || original model), intermediate samples),
you can check out the `logs` directory.

## To reproduce:

1. Pre-train a transformer using a standard language modeling objective on the works of Shakespeare:

```
python pre_train.py
```
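
The training loop itself is in `pre_train.py`; the objective is ordinary next-token cross-entropy, roughly:

```python
import torch.nn.functional as F

def language_modeling_loss(logits, tokens):
    # Predict token t+1 from positions up to t: shift targets left by one.
    # logits: (batch, seq, vocab), tokens: (batch, seq)
    return F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),
        tokens[:, 1:].reshape(-1),
    )
```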

2. Fine-tune a reward model that acts as a sentiment classifier,
trained on human-labeled data:

```
python train_rew_model.py
```
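
A common way to build such a reward model is to put a scalar head on top of the pre-trained transformer; the real architecture is in `train_rew_model.py`, but a hypothetical sketch looks like:

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Hypothetical sketch: scalar sentiment score from a transformer backbone."""

    def __init__(self, transformer, d_model):
        super().__init__()
        self.transformer = transformer  # assumed to return (batch, seq, d_model)
        self.head = nn.Linear(d_model, 1)

    def forward(self, tokens):
        hidden = self.transformer(tokens)
        # Score each sample from the final position's hidden state.
        return self.head(hidden[:, -1]).squeeze(-1)
```

The score can then be trained against the human sentiment labels with a binary cross-entropy loss.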

3. Fine-tune a PPO agent to generate samples that are classified as positive sentiment by the reward model:

```
python train_ppo.py
```
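
A typical way to combine the pieces, and the source of the KL(current model || original model) metric in the logs, is to reward each sample with the classifier's score minus a KL penalty toward the original model. A sketch, with an illustrative coefficient `beta`:

```python
def shaped_reward(sentiment_score, logp_current, logp_original, beta=0.1):
    # Classifier score minus a per-sample KL-style penalty (log-prob gap summed
    # over tokens) that discourages drifting too far from the pre-trained model.
    kl_penalty = (logp_current - logp_original).sum()
    return sentiment_score - beta * kl_penalty
```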