https://github.com/carperai/autocrit
A repository for transformer critique learning and generation
- Host: GitHub
- URL: https://github.com/carperai/autocrit
- Owner: CarperAI
- Created: 2023-04-07T13:57:34.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2023-12-07T17:58:14.000Z (over 2 years ago)
- Last Synced: 2023-12-10T19:33:29.247Z (over 2 years ago)
- Language: Python
- Size: 1.12 MB
- Stars: 68
- Watchers: 6
- Forks: 12
- Open Issues: 6
Metadata Files:
- Readme: README.md
README
# AutoCrit
A repository for transformer critique learning and generation.
## Scalar reward models
Train [OpenLLaMA-13B](https://github.com/openlm-research/open_llama) as a scalar reward model on the [Helpful and Harmless dataset](https://github.com/anthropics/hh-rlhf):
```bash
accelerate launch --config_file configs/accelerate/zero2.yaml \
train_reward_model.py \
--model_path openlm-research/open_llama_13b \
--dataset pvduy/rm_oa_hh \
--batch_size 1 \
--eval_interval 1000 \
--lr 0.00001 \
--weight_decay 0 \
--num_unfrozen_layers 12 \
--gradient_checkpointing \
--checkpoint_dir checkpoints \
--calibration_datasets reciprocate/vicuna-fair-eval
```
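The command above trains the model to emit one scalar per sequence from preference data. The script's exact objective is not shown here, so the following is only a minimal sketch, assuming the common pairwise (Bradley-Terry) formulation for scalar reward models: for each pair, the loss is `-log(sigmoid(r_chosen - r_rejected))`. The helper names `softplus`, `pairwise_reward_loss`, and `pairwise_accuracy` are hypothetical and not part of this repository.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_reward_loss(chosen: Sequence[float], rejected: Sequence[float]) -> float:
    """Mean of -log(sigmoid(r_chosen - r_rejected)) over preference pairs.

    Hypothetical helper: assumes a Bradley-Terry pairwise objective, which may
    differ from what train_reward_model.py actually optimizes.
    """
    if len(chosen) != len(rejected) or not chosen:
        raise ValueError("expected equal-length, non-empty reward sequences")
    # -log(sigmoid(m)) == softplus(-m), where m = r_chosen - r_rejected
    return sum(softplus(rr - rc) for rc, rr in zip(chosen, rejected)) / len(chosen)


def pairwise_accuracy(chosen: Sequence[float], rejected: Sequence[float]) -> float:
    """Fraction of pairs where the chosen response gets a strictly higher reward."""
    if len(chosen) != len(rejected) or not chosen:
        raise ValueError("expected equal-length, non-empty reward sequences")
    return sum(rc > rr for rc, rr in zip(chosen, rejected)) / len(chosen)


# Two toy pairs: one clearly ranked correctly, one tie.
print(pairwise_reward_loss([2.0, 0.5], [0.0, 0.5]))  # ≈ 0.410
print(pairwise_accuracy([2.0, 0.5], [0.0, 0.5]))     # 0.5
```

Under this objective only reward differences are trained, so an individual score is most meaningful when compared with the scores of other responses.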
Usage:
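```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer, BitsAndBytesConfig
ckpt = "reciprocate/openllama-13b_rm_oasst-hh"
# 4-bit loading requires the bitsandbytes package and a CUDA GPU
model = AutoModelForSequenceClassification.from_pretrained(
    ckpt, quantization_config=BitsAndBytesConfig(load_in_4bit=True)
)
tokenizer = AutoTokenizer.from_pretrained(ckpt)
inputs = tokenizer("ASSISTANT: This sentence is a lie.", return_tensors="pt").to(model.device)
with torch.no_grad():
    # logits have shape (1, 1): a single scalar reward for the whole input
    print(model(**inputs).logits.item())
```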
Output:
```python
-1.626953125
```
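Since the model returns a single unnormalized scalar, a common way to use it is to compare candidate responses rather than read one score in isolation. The sketch below is illustrative only: `score` is a hypothetical callable (for example, a wrapper around the usage snippet above that returns the `.item()` value), and it assumes candidates are already formatted the way the checkpoint expects, such as with the `ASSISTANT:` prefix shown above. `preference_probability` and `rerank` are hypothetical names, not repository APIs.

```python
import math
from typing import Callable, List, Sequence, Tuple


def preference_probability(reward_a: float, reward_b: float) -> float:
    """P(A preferred over B), reading the two scalar rewards through a
    Bradley-Terry (logistic) model. Computed stably for large gaps."""
    d = reward_a - reward_b
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


def rerank(candidates: Sequence[str], score: Callable[[str], float]) -> List[Tuple[float, str]]:
    """Score each candidate once and return (reward, text) pairs, best first."""
    scored = [(score(text), text) for text in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Stand-in scorer for illustration only; replace with a reward-model call.
def toy_score(text: str) -> float:
    return float(len(text))


print(rerank(["ASSISTANT: Hi.", "ASSISTANT: Hello there!"], toy_score)[0][1])
print(preference_probability(1.0, 1.0))  # 0.5
```

Reading reward differences through a logistic function matches the pairwise view of reward modeling, but whether the scores from this checkpoint are calibrated as probabilities in that sense is an assumption.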