
# General-Reasoner: Advancing LLM Reasoning Across All Domains (beyond Math)

🪡 We introduce a novel framework within GRPO that uses a model-based verifier to support a wider range of answer types. We train on WebInstruct-verified, a dataset covering a wide range of reasoning topics beyond math, and demonstrate substantial improvements over traditional binary rule-based rewards across diverse domains.

✅ A model-based verifier supports verification of diverse answer types such as math expressions, strings, lists, fractions, and matrices;

✅ Small 7B/14B models achieve robust cross-domain rewards, boosting MMLU-Pro performance by 13%;

✅ Our method does not require any additional SFT.

Check out our [arXiv paper](https://arxiv.org/abs/2505.14652) for the details!

## Highlights

Our experimental results are as follows:



Our discipline distribution is as follows:



Our model-based verifier helps scale up the pool of verifiable reasoning questions:



---

## Resources

### Training Data

|Data|Size|Link|
|-|-|-|
|WebInstruct-verified| 230k | [🤗](https://huggingface.co/datasets/TIGER-Lab/WebInstruct-verified)|
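
You can also peek at the data directly with the `datasets` library. This is a quick sketch that assumes the dataset exposes a standard `train` split on the Hub:

```python
# Quick look at WebInstruct-verified from the Hugging Face Hub.
# Assumes the dataset exposes a standard "train" split.
from datasets import load_dataset

ds = load_dataset("TIGER-Lab/WebInstruct-verified", split="train")
print(ds)      # size and column names
print(ds[0])   # one example record
```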

### General Verifier

|Model|Backbone|Link|
|-|-|-|
|General-Verifier|Qwen/Qwen2.5-Math-1.5B|[🤗](https://huggingface.co/TIGER-Lab/general-verifier)|

Check out the HF model page to learn how to use it, and feel free to plug it into your current RL training framework.
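
For a rough idea of how such a verifier can be queried, here is a minimal sketch using `transformers`. The prompt layout and the `Final Decision: Yes` convention below are illustrative assumptions; the exact template is documented on the HF model page.

```python
# Minimal sketch of scoring a model answer with the general verifier.
# The prompt format and the "Final Decision" convention are assumptions
# for illustration; see TIGER-Lab/general-verifier on HF for the exact
# template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TIGER-Lab/general-verifier"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = (
    "### Question: What is 1/2 + 1/3?\n"
    "### Ground Truth Answer: 5/6\n"
    "### Student Answer: 0.8333\n"
    "Verify whether the student answer is equivalent to the ground truth. "
    "End with 'Final Decision: Yes' or 'Final Decision: No'.\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=512)
verdict = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

# Map the verdict to a binary reward for RL training (assumed convention).
reward = 1.0 if "Final Decision: Yes" in verdict else 0.0
print(verdict, reward)
```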

### Model Checkpoint

|Model|Backbone|Link|
|-|-|-|
|General-Reasoner-Qwen2.5-7B|Qwen2.5-7B-Base|[🤗](https://huggingface.co/TIGER-Lab/General-Reasoner-Qwen2.5-7B)|
|General-Reasoner-Qwen2.5-14B|Qwen2.5-14B-Base|[🤗](https://huggingface.co/TIGER-Lab/General-Reasoner-Qwen2.5-14B)|
|General-Reasoner-Qwen3-4B|Qwen3-4B-Base|[🤗](https://huggingface.co/TIGER-Lab/General-Reasoner-Qwen3-4B)|
|General-Reasoner-Qwen3-14B|Qwen3-14B-Base|[🤗](https://huggingface.co/TIGER-Lab/General-Reasoner-Qwen3-14B)|

---

## Installation

```bash
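# install the CUDA 12.4 PyTorch wheels first so flash-attn compiles against them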
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu124
pip install flash-attn --no-build-isolation
pip install -e ./verl
pip install vllm==0.8.3
pip install flashinfer-python
pip install math-verify
```

---

## Training

### 1. Prepare Data
```bash
python data_preprocess.py --local-dir /webinstruct-verified
```
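
If you want to sanity-check the output, the sketch below assumes the script writes parquet splits (e.g. `train.parquet`) under the output directory, as verl preprocessing scripts typically do; adjust the path to whatever the script actually produces:

```python
# Hypothetical sanity check of the preprocessed data.
# Assumes data_preprocess.py writes train.parquet under --local-dir
# (verl's usual convention); adjust the path if it differs.
import pandas as pd

df = pd.read_parquet("/webinstruct-verified/train.parquet")
print(f"{len(df)} training examples")
print(df.columns.tolist())
print(df.iloc[0])
```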

### 2. Download Verifier
```bash
huggingface-cli download TIGER-Lab/general-verifier --local-dir /general-reasoner-verifier
```

### 3. Download Backbone Model
```bash
huggingface-cli download Qwen/Qwen2.5-7B --local-dir /Qwen2.5-7B
```

### 4. Configure Training Script
Edit the environment variables in `train_general_reasoner.sh` to fit your system setup.

### 5. Launch Ray Cluster
```bash
# On the head node:
ray start --head --port=6379
# On each additional worker node, join the head:
# ray start --address=<head-node-ip>:6379
```
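
Before launching training, you can verify that the cluster is up and sees the expected GPUs from Python:

```python
# Confirm the Ray cluster is reachable and lists the expected resources.
import ray

ray.init(address="auto")        # attach to the cluster started above
print(ray.cluster_resources())  # e.g. {'CPU': ..., 'GPU': ...}
```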

### 6. Start Training
```bash
bash train_general_reasoner.sh
```

---

## Evaluation

### MMLU-PRO:

```bash
python -m evaluation.eval_mmlupro \
--model_path TIGER-Lab/General-Reasoner-14B \
--output_file output-mmlupro-General-Reasoner-14B.json
```

### SuperGPQA:

```bash
python -m evaluation.eval_supergpqa \
--model_path TIGER-Lab/General-Reasoner-14B \
--output_file output-supergpqa-General-Reasoner-14B.json
```

### Math-Related Tasks & GPQA
We evaluate math tasks and GPQA using the `simple-evals` framework.
For non-multiple-choice questions, answer equivalence is verified using `GPT-4o`.
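
For intuition, the snippet below sketches this kind of LLM-based equivalence check. The prompt is illustrative, not the exact grader template from `simple-evals`:

```python
# Sketch of an LLM-based answer-equivalence check, in the spirit of the
# grading simple-evals performs; the prompt here is illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def answers_equivalent(question: str, reference: str, candidate: str) -> bool:
    resp = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,
        messages=[{
            "role": "user",
            "content": (
                f"Question: {question}\n"
                f"Reference answer: {reference}\n"
                f"Candidate answer: {candidate}\n"
                "Are the two answers equivalent? Answer Yes or No."
            ),
        }],
    )
    return resp.choices[0].message.content.strip().lower().startswith("yes")

print(answers_equivalent("What is 1/2 + 1/3?", "5/6", "0.8333..."))
```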

#### 1. Configure OpenAI Key
```bash
export OPENAI_API_KEY=
```

#### 2. Serve the Model
```bash
vllm serve TIGER-Lab/General-Reasoner-14B --tensor-parallel-size 4
```
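
The server exposes an OpenAI-compatible endpoint (by default at `http://localhost:8000/v1`), so you can smoke-test it before running the evaluation:

```python
# Smoke-test the vLLM server via its OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="TIGER-Lab/General-Reasoner-14B",
    messages=[{"role": "user", "content": "What is 17 * 24?"}],
)
print(resp.choices[0].message.content)
```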

#### 3. Run Evaluation
```bash
# Run as a script: the simple-evals directory name contains a hyphen,
# so it cannot be imported via `python -m`.
python evaluation/simple-evals/run_simple_evals_qwen.py \
--model General-Reasoner-14B
```

> By default, the model uses greedy decoding.
> For AIME24 and AIME25, scores are averaged over 32 runs with temperature 1.
> For more configuration details, refer to `evaluation/simple-evals/run_simple_evals_qwen.py`.

---

## Detailed Results

Our models are trained from the corresponding Qwen2.5 and Qwen3 base models.

### General results

| Model Name | MMLU-Pro | GPQA | SuperGPQA | TheoremQA | BBEH |
| ---------------------------- | -------- | -------- | --------- | --------- | -------- |
| Qwen2.5-7B-Base | 47.7 | 25.8 | 26.7 | 29.1 | 8.0 |
| Qwen2.5-7B-Instruct | 57.0 | 33.8 | 30.7 | 36.6 | 12.2 |
| SimpleRL-Qwen2.5-7B-Zoo | 51.5 | 24.2 | 29.9 | 38.0 | 11.9 |
| **General-Reasoner-7B** | **58.9** | **34.3** | **34.2** | **45.3** | **12.5** |
| | | | | | |
| Qwen2.5-14B-Base | 53.3 | 32.8 | 30.7 | 33.0 | 10.8 |
| Qwen2.5-14B-Instruct | 62.7 | 41.4 | 35.8 | 41.9 | 15.2 |
| SimpleRL-Qwen2.5-14B-Zoo | 64.0 | 39.4 | 35.7 | 40.8 | 13.6 |
| **General-Reasoner-14B** | **66.6** | **43.4** | **39.5** | **44.3** | **15.2** |
| | | | | | |
| Qwen3-4B-Base | 51.6 | 26.3 | 25.4 | 34.8 | 8.1 |
| Qwen3-4B-Instruct | 61.8 | 41.7 | 32.1 | 42.0 | **14.9** |
| **General-Reasoner-Qw3-4B** | **62.8** | **42.9** | **32.5** | **48.3** | 12.2 |
| | | | | | |
| Qwen3-14B-Base | 64.2 | 45.9 | 36.5 | 44.0 | 13.0 |
| Qwen3-14B-Instruct | **70.9** | 54.8 | 39.8 | 42.4 | **19.2** |
| **General-Reasoner-Qw3-14B** | 70.3 | **56.1** | **39.9** | **54.4** | 17.3 |

### Math-related results

| Model Name | MATH-500 | Olympiad | Minerva | GSM8K | AMC | AIME24x32 | AIME25x32 |
| ---------------------------- | -------- | -------- | ------- | ----- | ---- | --------- | --------- |
| Qwen2.5-7B-Base | 60.2 | 28.6 | 36.0 | 83.1 | 30.0 | 3.8 | 1.4 |
| Qwen2.5-7B-Instruct | 75.0 | 39.4 | 45.2 | 90.9 | 52.5 | 12.5 | 8.5 |
| SimpleRL-Qwen2.5-7B-Zoo | 74.0 | 41.9 | 49.6 | 90.7 | 60.0 | 15.2 | 7.5 |
| **General-Reasoner-7B** | 76.0 | 37.9 | 54.0 | 92.7 | 55.0 | 13.8 | 10.4 |
| | | | | | | | |
| Qwen2.5-14B-Base | 65.4 | 33.5 | 24.3 | 91.6 | 37.5 | 3.6 | 2.9 |
| Qwen2.5-14B-Instruct | 77.4 | 44.7 | 52.2 | 94.5 | 57.5 | 12.2 | 11.0 |
| SimpleRL-Qwen2.5-14B-Zoo | 77.2 | 44.6 | 54.0 | 94.2 | 60.0 | 12.9 | 11.8 |
| **General-Reasoner-14B** | 78.6 | 42.1 | 58.1 | 94.2 | 70.0 | 17.5 | 16.9 |
| | | | | | | | |
| Qwen3-4B-Base | 68.2 | 34.8 | 42.3 | 72.6 | 47.5 | 10.3 | 6.7 |
| Qwen3-4B-Instruct (non-think) | 80.4 | 49.0 | 57.0 | 92.0 | 62.5 | 22.5 | 16.1 |
| **General-Reasoner-Qw3-4B** | 80.6 | 47.7 | 57.7 | 92.2 | 60.0 | 20.0 | 15.4 |
| | | | | | | | |
| Qwen3-14B-Base | 74.6 | 44.3 | 55.9 | 93.2 | 55.0 | 14.7 | 11.4 |
| Qwen3-14B-Instruct (non-think)| 82.0 | 52.4 | 59.9 | 93.9 | 57.5 | 28.5 | 25.1 |
| **General-Reasoner-Qw3-14B** | 83.8 | 51.9 | 68.0 | 94.4 | 70.0 | 24.4 | 19.2 |

## Acknowledgements

This project is built upon the following open-source projects:

- [VERL](https://github.com/volcengine/verl/tree/main/verl)
- [simpleRL-reason](https://github.com/hkust-nlp/simpleRL-reason)
- [simple-evals](https://github.com/openai/simple-evals)

## Citation

```tex
@article{general-reasoner,
  title={General-Reasoner: Advancing LLM Reasoning Across All Domains},
  author={Xueguang Ma and Qian Liu and Dongfu Jiang and Ge Zhang and Zejun Ma and Wenhu Chen},
  year={2025},
  journal={arXiv:2505.14652},
  url={https://arxiv.org/abs/2505.14652},
}
```