https://github.com/freedomintelligence/ovm


Host: GitHub
URL: https://github.com/freedomintelligence/ovm
Owner: FreedomIntelligence
Created: 2023-11-16T06:08:18.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2024-04-02T06:08:26.000Z (about 2 years ago)
Last Synced: 2025-03-30T19:22:22.419Z (about 1 year ago)
Language: Python
Size: 4.28 MB
Stars: 65
Watchers: 11
Forks: 6
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# OVM, Outcome-supervised Value Models for Planning in Mathematical Reasoning

Code, metrics, and models for the paper [Outcome-supervised Verifiers for Planning in Mathematical Reasoning](https://arxiv.org/pdf/2311.09724.pdf)

The key technical implementations (in `utils/sampling.py`) are:

1. **Value-guided beam search**: step-level beam search guided by a value model

2. **Batch generation with a calculator, using a cache** (2-3 times faster than a naive implementation)
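The calculator part of item 2 can be illustrated with a minimal sketch. It assumes GSM8K-style calculator annotations of the form `<<48/2=24>>`, where the model writes `<<expr=` and the calculator supplies the result; the helper names `calculate` and `fill_calculator` are hypothetical. The only cache here is `functools.lru_cache` memoizing evaluated expressions across a batch, which is not necessarily the cache `utils/sampling.py` relies on.

```python
import ast
import operator
import re
from functools import lru_cache
from typing import List, Optional, Union

Number = Union[int, float]

# Arithmetic operators this toy calculator accepts; anything else is rejected
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
    ast.UAdd: operator.pos,
}


def _eval(node: ast.AST) -> Number:
    """Recursively evaluate a parsed arithmetic expression (no names, no calls)."""
    if isinstance(node, ast.Expression):
        return _eval(node.body)
    if (isinstance(node, ast.Constant) and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.operand))
    raise ValueError(f"unsupported expression: {ast.dump(node)}")


@lru_cache(maxsize=4096)
def calculate(expr: str) -> Optional[Number]:
    """Evaluate `expr` (hypothetical helper); repeated expressions hit the cache."""
    try:
        value = _eval(ast.parse(expr, mode="eval"))
    except (SyntaxError, ValueError, ZeroDivisionError):
        return None
    if isinstance(value, float) and value.is_integer():
        value = int(value)  # render 24 rather than 24.0
    return value


# A calculator call the model has opened but not yet closed, e.g. "... <<48/2="
_OPEN_CALL = re.compile(r"<<([^<>=]*)=$")


def fill_calculator(batch: List[str]) -> List[str]:
    """Append 'result>>' to each sequence that ends with an open calculator call."""
    filled = []
    for text in batch:
        match = _OPEN_CALL.search(text)
        result = calculate(match.group(1).strip()) if match else None
        filled.append(f"{text}{result}>>" if result is not None else text)
    return filled
```

In a batched decoding loop, one would call something like `fill_calculator` on the decoded sequences whenever generation stops on `=` inside an open `<<`, then re-tokenize only the sequences that changed. Evaluating with `ast` rather than `eval` keeps model-written expressions from executing arbitrary code.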

## Models and Data

| Model | Dataset | Link |
|----------------------|--------------|--------------------------|
| OVM-Llama2-7B | GSM8K | [parameters](https://huggingface.co/FreedomIntelligence/OVM-llama2-7b) |
| OVM-Mistral-7B | GSM8K | [parameters](https://huggingface.co/FreedomIntelligence/OVM-Mistral-7b) |

The training data for our value models (generated by the generators) is available in [dataset](https://huggingface.co/datasets/FreedomIntelligence/OVM-dataset).

The training data for the Process Reward Models (PRMs) on GSM8K is available in [dataset](https://huggingface.co/datasets/FreedomIntelligence/OVM-process).

## Notes on the code

1. Directories
    - `configs`: configs for model training with `accelerate`
    - `data`: benchmarks and generator-created data for training the value model
    - `eval_results`: metrics and responses
        - `generator`: generator-only (greedy, self-consistency, or pass@k)
        - `verifier`: ORM accuracy
        - `generator_with_verifier`: guided beam search, i.e. OVM and PRM
    - `scripts`: scripts for training and inference
    - `utils`: functions and classes

2. `target_set`
    - GSM8K: `train` and `test`, corresponding to the training set and the test set respectively
    - Game of 24: `train` and `mid`
        - `train`: the first 900 problems
        - `mid`: problems with indices 901-1000

3. The scripts for GSM8K and Game of 24 are similar. For simplicity, we use GSM8K as the example below; you can run the same pipeline on Game of 24 by replacing `gsm8k` with `game24`.
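The `train`/`mid` convention for Game of 24 can be sketched as follows, assuming the puzzles are taken in file order from `data/game24/24.csv` and that the file starts with a single header row. The function names `load_game24_csv` and `split_game24` are hypothetical and not part of the repository.

```python
import csv
from typing import Dict, List


def load_game24_csv(path: str) -> List[List[str]]:
    """Read the puzzle CSV (hypothetical helper), dropping its assumed header row."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        return [row for row in reader]


def split_game24(rows: List[List[str]]) -> Dict[str, List[List[str]]]:
    """Split puzzle rows (file order) into the `train` and `mid` target sets."""
    return {
        "train": rows[:900],     # problems 1-900
        "mid": rows[900:1000],   # problems 901-1000 (1-indexed)
    }
```

Note the off-by-one: with 1-indexed problem numbers, problems 901-1000 correspond to the Python slice `rows[900:1000]`.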

## Training

### Train the generator

Training data for generator:
- GSM8K: `data/gsm8k/train.jsonl`, from [OpenAI GSM8K](https://github.com/openai/grade-school-math/blob/master/grade_school_math/data/train.jsonl)
- Game of 24: `data/game24/train.jsonl`, the first 900 problems in `data/game24/24.csv` (from [ToT](https://github.com/princeton-nlp/tree-of-thought-llm/blob/master/src/tot/data/24/24.csv)) with enumerated solutions

Before running `train_generator.sh` (under `scripts/gsm8k` or `scripts/game24`), first set `WANDB_API_KEY`, `WANDB_ENTITY`, `model_name_or_path`, and `save_dir`. The generator is named by `save_generator_id`.

```bash
cd OVM
bash scripts/gsm8k/train_generator.sh
```

### Train the OVM

#### Generation

First, use the generator `generator_id` to generate `n_solutions` solutions for each question in the training set:
```bash
cd OVM
bash scripts/gsm8k/generate.sh
```
Before running, set the path of your generator checkpoint `model_name_or_path` and set `--target_set train`.

The output will be saved to `data/gsm8k/model_generation/`

#### Training

Train the OVM using `train_verifier.sh`. First set `WANDB_API_KEY`, `WANDB_ENTITY`, `save_dir`, and `checkpoint_dir` (the path of the generator checkpoint). The verifier is named by `save_verifier_id`.
```bash
cd OVM
bash scripts/gsm8k/train_verifier.sh
```
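As we read the outcome-supervision idea behind OVM, each sampled solution is labeled only by whether its final answer is correct, and every partial path (prefix) of that solution inherits the same label as a training target for the value model. The sketch below assumes GSM8K-style solutions that end with `#### <answer>` and treats each non-empty line as one step; `extract_answer` and `outcome_labels` are hypothetical helpers, not functions from this repository.

```python
import re
from typing import List, Optional, Tuple

# GSM8K-style solutions end with a line such as "#### 72"
_FINAL_ANSWER = re.compile(r"####\s*([-\d,.]+)")


def extract_answer(solution: str) -> Optional[str]:
    """Return the normalized final answer after '####', or None if it is missing."""
    match = _FINAL_ANSWER.search(solution)
    if match is None:
        return None
    return match.group(1).replace(",", "").rstrip(".")


def outcome_labels(solution: str, gold: str) -> List[Tuple[str, int]]:
    """Pair every step prefix of `solution` with the outcome label of the full path."""
    steps = [s for s in solution.split("\n") if s.strip()]  # assumption: one step per line
    label = int(extract_answer(solution) == gold)  # 1 if the final answer is correct
    # Outcome supervision: each partial path gets the label of its completed path
    return [("\n".join(steps[: i + 1]), label) for i in range(len(steps))]
```

No step-level annotation is needed: the only supervision signal is the final-answer check, which is what distinguishes this from the process-supervised PRM data.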

## Inference

### Value-Guided Beam Search

Configure your generator checkpoint path `model_name_or_path` and verifier checkpoint path `verifier_model_name_or_path` in `eval_step_beam.sh`:
```bash
cd OVM
bash scripts/gsm8k/eval_step_beam.sh
```
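Value-guided beam search can be sketched at the step level as follows. This is an illustrative sketch, not the code in `utils/sampling.py`: `expand` (sample candidate next steps from the generator), `value` (score a partial path with the value model) and `is_finished` are hypothetical callables, and `n_per_beam` is a hypothetical parameter name.

```python
from typing import Callable, List


def value_guided_beam_search(
    question: str,
    expand: Callable[[str, int], List[str]],  # hypothetical: sample k next steps
    value: Callable[[str], float],            # hypothetical: value-model score
    is_finished: Callable[[str], bool],       # hypothetical: path has a final answer
    n_beam: int = 3,
    n_per_beam: int = 2,
    max_steps: int = 10,
) -> str:
    """Expand each beam by one step, then keep the n_beam highest-valued paths."""
    beams: List[str] = [question]
    for _ in range(max_steps):
        candidates: List[str] = []
        for path in beams:
            if is_finished(path):
                candidates.append(path)  # completed paths compete unchanged
                continue
            # Sample n_per_beam possible next steps from the generator
            candidates.extend(path + step for step in expand(path, n_per_beam))
        # Rank all partial paths by the value model's estimate; keep the best n_beam
        beams = sorted(candidates, key=value, reverse=True)[:n_beam]
        if all(is_finished(p) for p in beams):
            break
    return max(beams, key=value)
```

With toy stand-ins, e.g. `expand=lambda path, k: ["a", "b"][:k]`, `value=lambda path: path.count("a")` and `is_finished=lambda path: len(path) >= 3`, calling the search with `question=""` and `n_beam=2` returns `"aaa"`. The key design point is that pruning happens after every step, using the value model's estimate of how likely a partial path is to reach a correct answer.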

When `dedup_mode=1`, the search prioritizes linguistically different candidates: for example, with `n_beam=3` and sorted candidates `['a', 'a', 'b', 'b', 'c']`, it selects `['a', 'b', 'c']` rather than `['a', 'a', 'b']`.
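The `dedup_mode=1` selection rule can be sketched as below, assuming the candidates arrive already sorted by value, best first. The behavior when there are fewer distinct texts than `n_beam` is not specified here; this sketch assumes it falls back to repeated candidates in value order. `select_beams` is a hypothetical name.

```python
from typing import List, Sequence


def select_beams(sorted_candidates: Sequence[str], n_beam: int, dedup_mode: int = 0) -> List[str]:
    """Pick n_beam candidates from a list sorted by value (best first)."""
    if dedup_mode != 1:
        # Plain selection: take the top n_beam, duplicates included
        return list(sorted_candidates[:n_beam])
    seen = set()
    distinct: List[str] = []
    repeats: List[str] = []
    for cand in sorted_candidates:
        if cand in seen:
            repeats.append(cand)  # held back unless distinct texts run out
        else:
            seen.add(cand)
            distinct.append(cand)
    # Distinct texts first (still in value order), then repeats as a fallback
    return (distinct + repeats)[:n_beam]
```

`select_beams(['a', 'a', 'b', 'b', 'c'], 3, dedup_mode=1)` returns `['a', 'b', 'c']`, while `dedup_mode=0` returns `['a', 'a', 'b']`, matching the example above.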

The output will be saved to `eval_results/gsm8k/generator_with_verifier/test` (or `eval_results/game24/generator_with_verifier/mid` for Game of 24).

### Vanilla Sampling with ORM

1. First, sample the data: configure the generator checkpoint `model_name_or_path`, and set `--target_set test`.
```bash
cd OVM
bash scripts/gsm8k/generate.sh
```

2. Then use the ORM to score and rerank the samples: configure the verifier checkpoint `verifier_model_name_or_path`.
```bash
cd OVM
bash scripts/gsm8k/eval_with_verifier.sh
```

The output will be saved to `eval_results/gsm8k/generator_with_verifier/test`
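Reranking with the ORM amounts to best-of-N selection: for each question, keep the sampled solution with the highest verifier score and check its final answer. In the sketch below, the record fields `answer` and `orm_score` and the helper names are hypothetical; the actual output format of `eval_with_verifier.sh` may differ.

```python
from typing import Dict, List, Sequence


def select_by_orm(samples: Sequence[Dict], score_key: str = "orm_score") -> Dict:
    """Best-of-N: return the sampled solution the verifier scores highest."""
    return max(samples, key=lambda s: s[score_key])


def reranked_accuracy(per_question: List[Sequence[Dict]], gold: List[str]) -> float:
    """Fraction of questions whose top-scored sample has the gold final answer."""
    correct = sum(
        select_by_orm(samples)["answer"] == g
        for samples, g in zip(per_question, gold)
    )
    return correct / len(gold)
```

Unlike value-guided beam search, the verifier here only sees complete solutions, so it cannot prune unpromising paths during generation.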

### Greedy

Configure your generator checkpoint path `model_name_or_path`:
```bash
cd OVM
bash scripts/gsm8k/greedy_eval.sh
```
The output will be saved to `eval_results/gsm8k/generator/test`

## Citation
```
@misc{yu2023outcomesupervised,
title={Outcome-supervised Verifiers for Planning in Mathematical Reasoning},
author={Fei Yu and Anningzhe Gao and Benyou Wang},
year={2023},
eprint={2311.09724},
archivePrefix={arXiv},
primaryClass={cs.AI}
}
```
