https://github.com/tiger-ai-lab/acecoder
The official repo for "AceCoder: Acing Coder RL via Automated Test-Case Synthesis"
- Host: GitHub
- URL: https://github.com/tiger-ai-lab/acecoder
- Owner: TIGER-AI-Lab
- License: MIT
- Created: 2025-02-02T21:17:31.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2025-04-09T06:00:20.000Z (7 months ago)
- Last Synced: 2025-04-09T06:32:05.138Z (7 months ago)
- Topics: code, codellm, llm
- Language: Python
- Homepage: https://tiger-ai-lab.github.io/AceCoder/
- Size: 5.66 MB
- Stars: 75
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# 🂡 AceCoder
Authors: Huaye Zeng, Dongfu Jiang, Haozhe Wang, Ping Nie, Xiaotong Chen, Wenhu Chen @ TIGER-Lab
## 🔥News
- [2025/2/3] We release the [AceCoder Paper](https://arxiv.org/abs/2502.01718), along with the [🤗 Models and Datasets](https://huggingface.co/collections/TIGER-Lab/acecoder-67a16011a6c7d65cad529eba) on Hugging Face.
## Overview

### Abstract
- We introduce AceCoder, the first work to propose a fully automated pipeline for synthesizing large-scale, reliable test cases for reward-model training and reinforcement learning in the coding domain. To do this, we curated the dataset [AceCode-87K](https://huggingface.co/datasets/TIGER-Lab/AceCode-87K): starting from a seed code dataset, we prompt powerful LLMs to "imagine" proper test cases for each coding question and then filter out the noisy ones.
- We trained two reward models, [AceCodeRM-7B](https://huggingface.co/TIGER-Lab/AceCodeRM-7B) and [AceCodeRM-32B](https://huggingface.co/TIGER-Lab/AceCodeRM-32B), on the constructed [preference pairs](https://huggingface.co/datasets/TIGER-Lab/AceCodePair-300K). Best-of-N sampling results on HumanEval(+), MBPP(+), BigCodeBench, and LiveCodeBench (V4) show consistent improvements.
- We perform RL training from three policy models: Qwen2.5-7B-Instruct, Qwen2.5-Coder-7B-Base, and Qwen2.5-Coder-7B-Instruct. Two types of reward can be used: the trained reward model (AceCodeRM-7B) and a rule-based reward, i.e., the binary pass rate over the test cases in the dataset (a minimal sketch of this rule-based reward follows this list). Additionally, we experiment with RL directly from the base model, in the style of DeepSeek-R1. Results show that RL directly from the base Qwen2.5-Coder model yields a **25%** improvement on HumanEval-plus and **6%** on MBPP-plus within just **80** optimization steps.
- To our knowledge, this is the first work to propose a fully automated pipeline for synthesizing large-scale, reliable tests for reward-model training and reinforcement learning in the coding domain. We believe AceCode-87K will unlock the potential of RL training for code generation models and help the community further push the boundaries of LLMs' coding abilities.
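As a concrete illustration of the rule-based reward mentioned above, the sketch below computes a binary pass rate by executing a candidate program against its synthesized test cases. The test-case format (a list of `assert` strings per prompt) and the helper name are assumptions for illustration only; see `train/train_rl` for the actual implementation, and note that a production setup would additionally sandbox untrusted code.

```python
# Minimal sketch of a rule-based (binary pass-rate) reward. The test-case
# format (a list of `assert ...` strings) is an assumption for illustration,
# not this repo's actual implementation.
import subprocess
import sys
import tempfile


def binary_pass_reward(program: str, test_cases: list[str], timeout: float = 10.0) -> float:
    """Return 1.0 if `program` passes every test case, else 0.0."""
    script = program + "\n" + "\n".join(test_cases) + "\n"
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(script)
        path = f.name
    try:
        result = subprocess.run([sys.executable, path], capture_output=True, timeout=timeout)
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0


# Example: a correct solution passes all synthesized tests and earns reward 1.0.
program = "def add(a, b):\n    return a + b"
tests = ["assert add(1, 2) == 3", "assert add(-1, 1) == 0"]
print(binary_pass_reward(program, tests))
```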
## 📚Dataset
- [AceCode-87K](https://huggingface.co/datasets/TIGER-Lab/AceCode-87K): The first large-scale coding dataset with an average of 16 test cases per prompt, synthesized by GPT-4o-mini; a minimal loading sketch follows this list.
- [AceCodePair-300K](https://huggingface.co/datasets/TIGER-Lab/AceCodePair-300K): Preference pairs constructed from AceCode-87K for training the reward models.
- AceCode-87K-hard: A subset of the 25% hardest examples, which you can create by following the commands [here](https://github.com/TIGER-AI-Lab/AceCoder/tree/main/train/train_rl#data-preparation).
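The datasets can be pulled from the Hugging Face Hub with the `datasets` library. The split and column names used below (`question`, `test_cases`) are assumptions for illustration; consult the dataset card for the actual schema.

```python
# Minimal sketch of loading AceCode-87K. The split and column names below are
# assumptions for illustration; check the dataset card for the real schema.
from datasets import load_dataset

ds = load_dataset("TIGER-Lab/AceCode-87K", split="train")
print(ds)  # inspect the available columns

example = ds[0]
print(example.get("question"))    # the coding prompt (assumed field name)
print(example.get("test_cases"))  # the synthesized tests (assumed field name)
```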
## 🤗Model
### AceCodeRM (Reward Model)
- [AceCodeRM-7B](https://huggingface.co/TIGER-Lab/AceCodeRM-7B): A reward model trained on AceCodePair-300K from Qwen2.5-Coder-7B-Instruct
- [AceCodeRM-32B](https://huggingface.co/TIGER-Lab/AceCodeRM-32B): A reward model trained on AceCodePair-300K from Qwen2.5-Coder-32B-Instruct
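Both reward models are trained on preference pairs. The standard objective for this setup is a Bradley-Terry style pairwise ranking loss; the sketch below illustrates that generic loss only and is not necessarily the exact recipe used in this repo (see [train/train_rm/README.md](train/train_rm/README.md) for the actual configuration).

```python
# Generic Bradley-Terry pairwise ranking loss commonly used for reward-model
# training on preference pairs. Illustration only; not this repo's exact recipe.
import torch
import torch.nn.functional as F


def pairwise_rm_loss(chosen_rewards: torch.Tensor, rejected_rewards: torch.Tensor) -> torch.Tensor:
    """-log(sigmoid(r_chosen - r_rejected)), averaged over the batch."""
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()


# Toy example: the chosen (test-passing) program should receive the higher score.
chosen = torch.tensor([1.3, 0.7])
rejected = torch.tensor([0.2, 0.9])
print(pairwise_rm_loss(chosen, rejected))
```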
### AceCoder (RL Model)
| Initial Policy Model | Reward Type | Training dataset | Final RL Model |
|:---------------------:|:-----------:|:----------------:|:--------------:|
| Qwen2.5-7B-Instruct | AceCodeRM-7B | AceCode-87K-hard (22k) | [TIGER-Lab/AceCoder-Qwen2.5-7B-Ins-RM](https://huggingface.co/TIGER-Lab/AceCoder-Qwen2.5-7B-Ins-RM) |
| Qwen2.5-7B-Instruct | Rule | AceCode-87K-hard (22k) | [TIGER-Lab/AceCoder-Qwen2.5-7B-Ins-Rule](https://huggingface.co/TIGER-Lab/AceCoder-Qwen2.5-7B-Ins-Rule) |
| Qwen2.5-Coder-7B-Instruct | AceCodeRM-7B | AceCode-87K-hard (22k) | [TIGER-Lab/AceCoder-Qwen2.5-Coder-7B-Ins-RM](https://huggingface.co/TIGER-Lab/AceCoder-Qwen2.5-Coder-7B-Ins-RM) |
| Qwen2.5-Coder-7B-Instruct | Rule | AceCode-87K-hard (22k) | [TIGER-Lab/AceCoder-Qwen2.5-Coder-7B-Ins-Rule](https://huggingface.co/TIGER-Lab/AceCoder-Qwen2.5-Coder-7B-Ins-Rule) |
| Qwen2.5-Coder-7B | AceCodeRM-7B | AceCode-87K-hard (22k) | [TIGER-Lab/AceCoder-Qwen2.5-Coder-7B-Base-RM](https://huggingface.co/TIGER-Lab/AceCoder-Qwen2.5-Coder-7B-Base-RM) |
| Qwen2.5-Coder-7B | Rule | AceCode-87K-hard (22k) | [TIGER-Lab/AceCoder-Qwen2.5-Coder-7B-Base-Rule](https://huggingface.co/TIGER-Lab/AceCoder-Qwen2.5-Coder-7B-Base-Rule) |
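The RL-tuned policies in the table above are standard causal language models and can be sampled with Hugging Face Transformers. The sketch below is illustrative only: the prompt and decoding settings are arbitrary defaults, not the configuration used for the reported results.

```python
# Sketch of sampling from one of the released RL models with Transformers.
# Prompt and decoding settings are illustrative, not the paper's configuration.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TIGER-Lab/AceCoder-Qwen2.5-7B-Ins-RM"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Write a Python function that checks whether a string is a palindrome."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

outputs = model.generate(inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```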
## 📈 Performance
See our [website](https://tiger-ai-lab.github.io/AceCoder/) or [paper](https://arxiv.org/abs/2502.01718) for detailed performance report.
## 🚀Quick Start
After cloning the repository, initialize its submodules:
```bash
# fetch the git submodules bundled with the repo
git submodule init
git submodule update
```
### Use AceCodeRM
First install acecoder as a package:
```bash
pip install git+https://github.com/TIGER-AI-Lab/AceCoder.git
```
Then see [examples/run_acecoderm.py](examples/run_acecoderm.py) for how to use AceCodeRM; running `python examples/run_acecoderm.py` will run the example.
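If the RM checkpoint exposes a standard sequence-classification head (an assumption here, not something this repo states), scoring a question/solution pair could look roughly like the sketch below; otherwise, stick to the example script above.

```python
# Hedged sketch of scoring a candidate solution with AceCodeRM-7B. Whether the
# checkpoint loads via AutoModelForSequenceClassification is an assumption;
# examples/run_acecoderm.py is the supported path.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

rm_id = "TIGER-Lab/AceCodeRM-7B"
tokenizer = AutoTokenizer.from_pretrained(rm_id)
rm = AutoModelForSequenceClassification.from_pretrained(
    rm_id, torch_dtype="auto", device_map="auto", trust_remote_code=True
)

question = "Write a function that returns the sum of two numbers."
candidate = "def add(a, b):\n    return a + b"
messages = [
    {"role": "user", "content": question},
    {"role": "assistant", "content": candidate},
]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(rm.device)
with torch.no_grad():
    score = rm(inputs).logits[0]
print(score)  # higher means the RM prefers this candidate
```

For Best-of-N sampling, score each of the N candidates for a question this way and keep the highest-scoring one.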
### Training Reward Model
See [train/train_rm/README.md](train/train_rm/README.md) for detailed instructions.
### Training RL Model
See [train/train_rl/README.md](train/train_rl/README.md) for detailed instructions.
### Evaluation
We use [EvalPlus](https://github.com/evalplus/evalplus), [BigCodeBench](https://github.com/bigcode-project/bigcodebench), and [LiveCodeBench](https://github.com/LiveCodeBench/LiveCodeBench) to evaluate HumanEval(+) and MBPP(+), BigCodeBench, and LiveCodeBench (V4), respectively.
## Citation
If you find this work helpful, please consider citing:
```bibtex
@article{AceCoder,
  title={AceCoder: Acing Coder RL via Automated Test-Case Synthesis},
  author={Zeng, Huaye and Jiang, Dongfu and Wang, Haozhe and Nie, Ping and Chen, Xiaotong and Chen, Wenhu},
  journal={ArXiv},
  year={2025},
  volume={2502.01718}
}
```