https://github.com/tiger-ai-lab/general-reasoner
General Reasoner: Advancing LLM Reasoning Across All Domains
- Host: GitHub
- URL: https://github.com/tiger-ai-lab/general-reasoner
- Owner: TIGER-AI-Lab
- Created: 2025-04-10T03:28:44.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2025-06-10T04:14:56.000Z (4 months ago)
- Last Synced: 2025-06-10T05:21:28.457Z (4 months ago)
- Topics: llm, reasoning, rl
- Language: Python
- Homepage: https://tiger-ai-lab.github.io/General-Reasoner
- Size: 7.49 MB
- Stars: 132
- Watchers: 2
- Forks: 6
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
# General-Reasoner: Advancing LLM Reasoning Across All Domains (beyond Math)
🪡 We introduce a novel framework within GRPO that uses a model-based verifier to support a wider range of answer types. We train on WebInstruct-verified, a dataset covering a wide range of reasoning topics beyond math, and demonstrate substantial improvements over traditional binary rule-based rewards across diverse domains.
✅ A model-based verifier supports verification of diverse answer types such as math expressions, strings, lists, fractions, and matrices;
✅ Small 7B/14B models achieve robust cross-domain rewards, boosting MMLU-Pro performance by 13%;
✅ Our method does not require any additional SFT.

Check out our [Arxiv Paper](https://arxiv.org/abs/2505.14652) for the details!
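As a toy illustration of why a binary rule-based reward under-credits equivalent answers (this is a sketch, not the repo's trained verifier):

```python
from fractions import Fraction

def rule_based_match(pred: str, gold: str) -> bool:
    """Binary rule-based reward: exact string match only."""
    return pred.strip() == gold.strip()

def lenient_match(pred: str, gold: str) -> bool:
    """Toy verifier: accept numerically equivalent answers
    (fractions vs. decimals) before falling back to string match."""
    try:
        return Fraction(pred.strip()) == Fraction(gold.strip())
    except (ValueError, ZeroDivisionError):
        return pred.strip() == gold.strip()

# "1/2" and "0.5" are the same answer, but exact match rejects it:
print(rule_based_match("1/2", "0.5"))  # → False
print(lenient_match("1/2", "0.5"))     # → True
```

A trained verifier generalizes this idea to strings, lists, matrices, and free-form expressions that no fixed rule set covers.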
## Highlights
Our experimental results are as follows:

*(figure: experimental results)*

Our discipline distribution is as follows:

*(figure: discipline distribution of the training data)*

Our model-based verifier helps scale up the set of verifiable reasoning questions:

*(figure: verifiable question scaling with the model-based verifier)*
---
## Resources
### Training Data
|Data|Size|Link|
|-|-|-|
|WebInstruct-verified| 230k | [🤗](https://huggingface.co/datasets/TIGER-Lab/WebInstruct-verified)|

### General Verifier
|Model|Backbone|Link|
|-|-|-|
|General-Verifier|Qwen/Qwen2.5-Math-1.5B|[🤗](https://huggingface.co/TIGER-Lab/general-verifier)|

Check out the HF page to learn how to use it. Feel free to plug this verifier into your current RL training framework.
### Model Checkpoint
|Model|Backbone|Link|
|-|-|-|
|General-Reasoner-Qwen2.5-7B|Qwen2.5-7B-Base|[🤗](https://huggingface.co/TIGER-Lab/General-Reasoner-Qwen2.5-7B)|
|General-Reasoner-Qwen2.5-14B|Qwen2.5-14B-Base|[🤗](https://huggingface.co/TIGER-Lab/General-Reasoner-Qwen2.5-14B)|
|General-Reasoner-Qwen3-4B|Qwen3-4B-Base|[🤗](https://huggingface.co/TIGER-Lab/General-Reasoner-Qwen3-4B)|
|General-Reasoner-Qwen3-14B|Qwen3-14B-Base|[🤗](https://huggingface.co/TIGER-Lab/General-Reasoner-Qwen3-14B)|
---

## Installation
```bash
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu124
pip install flash-attn --no-build-isolation
pip install -e ./verl
pip install vllm==0.8.3
pip install flashinfer-python
pip install math-verify
```

---
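A quick post-install sanity check can be sketched as follows; the import names are assumptions on my part (e.g. `flash-attn` typically imports as `flash_attn`, and `flashinfer-python` as `flashinfer`):

```python
import importlib.util

def check_installed(packages):
    """Map each import name to whether it is resolvable in this environment."""
    return {p: importlib.util.find_spec(p) is not None for p in packages}

# Assumed import names for the packages installed above:
status = check_installed(
    ["torch", "flash_attn", "verl", "vllm", "flashinfer", "math_verify"]
)
for name, ok in status.items():
    print(f"{name}: {'ok' if ok else 'MISSING'}")
```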
## Training
### 1. Prepare Data
```bash
python data_preprocess.py --local-dir /webinstruct-verified
```

### 2. Download Verifier
```bash
huggingface-cli download TIGER-Lab/general-verifier --local-dir /general-reasoner-verifier
```

### 3. Download Backbone Model
```bash
huggingface-cli download Qwen/Qwen2.5-7B --local-dir /Qwen2.5-7B
```

### 4. Configure Training Script
Edit the environment variables in `train_general_reasoner.sh` to fit your system setup.

### 5. Launch Ray Cluster
```bash
ray start --address :6379
```

### 6. Start Training
```bash
bash train_general_reasoner.sh
```

---
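Under the hood, training uses the bundled verl for GRPO-style RL with verifier rewards. As an illustration only (not the repo's actual implementation), the group-relative advantage GRPO assigns to each sampled rollout can be sketched as:

```python
import statistics

def grpo_advantages(rewards, eps=1e-6):
    """Normalize each rollout's reward by its group's mean and std,
    giving the group-relative advantages used in GRPO-style updates."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Verifier rewards for 4 rollouts of one prompt (1.0 = verified correct):
adv = grpo_advantages([1.0, 0.0, 1.0, 0.0])
print([round(a, 2) for a in adv])  # → [1.0, -1.0, 1.0, -1.0]
```

Rollouts the verifier judges correct get positive advantage relative to their group, so no absolute reward scale or value model is needed.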
## Evaluation
### MMLU-Pro:
```bash
python -m evaluation.eval_mmlupro \
--model_path TIGER-Lab/General-Reasoner-14B \
--output_file output-mmlupro-General-Reasoner-14B.json
```

### SuperGPQA:
```bash
python -m evaluation.eval_supergpqa \
--model_path TIGER-Lab/General-Reasoner-14B \
--output_file output-supergpqa-General-Reasoner-14B.json
```

### Math-Related Tasks & GPQA
We evaluate math and GPQA tasks using the `simple-eval` framework.
For non-multiple choice questions, answer equivalence is verified using `GPT-4o`.

#### 1. Configure OpenAI Key
```bash
export OPENAI_API_KEY=
```

#### 2. Serve the Model
```bash
vllm serve TIGER-Lab/General-Reasoner-14B --tensor-parallel-size 4
```

#### 3. Run Evaluation
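Before running the full evaluation, you can optionally sanity-check the served model. vLLM exposes an OpenAI-compatible API, by default on port 8000; a minimal request-building sketch under that assumption:

```python
import json
import urllib.request

def build_chat_request(model, prompt, base_url="http://localhost:8000/v1"):
    """Build an OpenAI-compatible /chat/completions request for the
    locally served vLLM model (port 8000 is vLLM's default)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.0,  # greedy decoding, matching the default eval setup
    }
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    return req, payload

req, payload = build_chat_request("TIGER-Lab/General-Reasoner-14B", "What is 7 * 8?")
# With the server from step 2 running: response = urllib.request.urlopen(req).read()
```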
```bash
python -m evaluation.simple-evals.run_simple_evals_qwen \
--model General-Reasoner-14B
```

> By default, the model uses greedy decoding.
> For AIME24 and AIME25, scores are averaged over 32 runs with temperature 1.
> For more configuration details, refer to `evaluation/simple-evals/run_simple_evals_qwen.py`.

---
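The 32-run averaging for AIME can be sketched as follows (the run counts and benchmark size below are hypothetical, purely for illustration):

```python
def avg_at_k(solved_per_run, num_problems):
    """Mean accuracy over k independent sampled runs: each run solves
    some subset of the benchmark; the reported score is the average."""
    return sum(s / num_problems for s in solved_per_run) / len(solved_per_run)

# Hypothetical: 4 runs (the eval uses 32) solving 6, 5, 7, 6 of 30 problems:
print(round(avg_at_k([6, 5, 7, 6], 30), 3))  # → 0.2
```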
## Detailed Results
Our models are trained from the corresponding Qwen base models.
### General results
| Model Name | MMLU-Pro | GPQA | SuperGPQA | TheoremQA | BBEH |
| ---------------------------- | -------- | -------- | --------- | --------- | -------- |
| Qwen2.5-7B-Base | 47.7 | 25.8 | 26.7 | 29.1 | 8.0 |
| Qwen2.5-7B-Instruct | 57.0 | 33.8 | 30.7 | 36.6 | 12.2 |
| SimpleRL-Qwen2.5-7B-Zoo | 51.5 | 24.2 | 29.9 | 38.0 | 11.9 |
| **General-Reasoner-7B** | **58.9** | **34.3** | **34.2** | **45.3** | **12.5** |
| | | | | | |
| Qwen2.5-14B-Base | 53.3 | 32.8 | 30.7 | 33.0 | 10.8 |
| Qwen2.5-14B-Instruct | 62.7 | 41.4 | 35.8 | 41.9 | 15.2 |
| SimpleRL-Qwen2.5-14B-Zoo | 64.0 | 39.4 | 35.7 | 40.8 | 13.6 |
| **General-Reasoner-14B** | **66.6** | **43.4** | **39.5** | **44.3** | **15.2** |
| | | | | | |
| Qwen3-4B-Base | 51.6 | 26.3 | 25.4 | 34.8 | 8.1 |
| Qwen3-4B-Instruct | 61.8 | 41.7 | 32.1 | 42.0 | **14.9** |
| **General-Reasoner-Qw3-4B** | **62.8** | **42.9** | **32.5** | **48.3** | 12.2 |
| | | | | | |
| Qwen3-14B-Base | 64.2 | 45.9 | 36.5 | 44.0 | 13.0 |
| Qwen3-14B-Instruct | **70.9** | 54.8 | 39.8 | 42.4 | **19.2** |
| **General-Reasoner-Qw3-14B** | 70.3 | **56.1** | **39.9** | **54.4** | 17.3 |

### Math-related results
| Model Name | MATH-500 | Olympiad | Minerva | GSM8K | AMC | AIME24x32 | AIME25x32 |
| ---------------------------- | -------- | -------- | ------- | ----- | ---- | --------- | --------- |
| Qwen2.5-7B-Base | 60.2 | 28.6 | 36.0 | 83.1 | 30.0 | 3.8 | 1.4 |
| Qwen2.5-7B-Instruct | 75.0 | 39.4 | 45.2 | 90.9 | 52.5 | 12.5 | 8.5 |
| SimpleRL-Qwen2.5-7B-Zoo | 74.0 | 41.9 | 49.6 | 90.7 | 60.0 | 15.2 | 7.5 |
| **General-Reasoner-7B** | 76.0 | 37.9 | 54.0 | 92.7 | 55.0 | 13.8 | 10.4 |
| | | | | | | | |
| Qwen2.5-14B-Base | 65.4 | 33.5 | 24.3 | 91.6 | 37.5 | 3.6 | 2.9 |
| Qwen2.5-14B-Instruct | 77.4 | 44.7 | 52.2 | 94.5 | 57.5 | 12.2 | 11.0 |
| SimpleRL-Qwen2.5-14B-Zoo | 77.2 | 44.6 | 54.0 | 94.2 | 60.0 | 12.9 | 11.8 |
| **General-Reasoner-14B** | 78.6 | 42.1 | 58.1 | 94.2 | 70.0 | 17.5 | 16.9 |
| | | | | | | | |
| Qwen3-4B-Base | 68.2 | 34.8 | 42.3 | 72.6 | 47.5 | 10.3 | 6.7 |
| Qwen3-4B-Instruct(non-think) | 80.4 | 49.0 | 57.0 | 92.0 | 62.5 | 22.5 | 16.1 |
| **General-Reasoner-Qw3-4B** | 80.6 | 47.7 | 57.7 | 92.2 | 60.0 | 20.0 | 15.4 |
| | | | | | | | |
| Qwen3-14B-Base | 74.6 | 44.3 | 55.9 | 93.2 | 55.0 | 14.7 | 11.4 |
| Qwen3-14B-Instruct(non-think)| 82.0 | 52.4 | 59.9 | 93.9 | 57.5 | 28.5 | 25.1 |
| **General-Reasoner-Qw3-14B** | 83.8 | 51.9 | 68.0 | 94.4 | 70.0 | 24.4 | 19.2 |

## Acknowledgements
This project is built upon the following open-source projects:
- [VERL](https://github.com/volcengine/verl/tree/main/verl)
- [simpleRL-reason](https://github.com/hkust-nlp/simpleRL-reason)
- [simple-evals](https://github.com/openai/simple-evals)

## Citation
```tex
@article{general-reasoner,
title={General-Reasoner: Advancing LLM Reasoning Across All Domains},
author={Xueguang Ma and Qian Liu and Dongfu Jiang and Ge Zhang and Zejun Ma and Wenhu Chen},
year={2025},
journal={arXiv:2505.14652},
url={https://arxiv.org/abs/2505.14652},
}
```