
# 🂡 AceCoder

Authors: Huaye Zeng, Dongfu Jiang, HaoZhe Wang, Ping Nie, Xiaotong Chen, Wenhu Chen @ TIGER-Lab

## 🔥News

- [2025/2/3] We release the [AceCoder Paper](https://arxiv.org/abs/2502.01718), along with the [🤗 Models and Datasets](https://huggingface.co/collections/TIGER-Lab/acecoder-67a16011a6c7d65cad529eba) on Hugging Face.

## Overview
![AceCoder overview](./assets/images/ac_overview.png)

- We introduce AceCoder, the first work to propose a fully automated pipeline for synthesizing large-scale, reliable test cases for reward model training and reinforcement learning in the coding scenario. To do this, we curated the dataset [AceCode-87K](https://huggingface.co/datasets/TIGER-Lab/AceCode-87K): starting from a seed code dataset, we prompt powerful LLMs to "imagine" proper test cases for each coding question and then filter out the noisy ones.

- We trained two reward models, [AceCodeRM-7B](https://huggingface.co/TIGER-Lab/AceCodeRM-7B) and [AceCodeRM-32B](https://huggingface.co/TIGER-Lab/AceCodeRM-32B), on the constructed [preference pairs](https://huggingface.co/datasets/TIGER-Lab/AceCodePair-300K). Best-of-N sampling results on HumanEval(+), MBPP(+), BigCodeBench, and LiveCodeBench (V4) show consistent improvements.

- We perform RL training from three policy models: Qwen2.5-7B-Instruct, Qwen2.5-Coder-7B-Base, and Qwen2.5-Coder-7B-Instruct. Two types of reward can be used: the trained reward model AceCodeRM-7B, or a rule-based reward, i.e. the binary pass rate over the test cases in the dataset (a minimal sketch of this reward follows this list). Additionally, we experiment with RL directly from the base model, as in DeepSeek-R1. Results show that RL directly from the Qwen2.5-Coder base model yields a **25%** improvement on HumanEval-plus and **6%** on MBPP-plus within just **80** optimization steps.

- We believe AceCode-87K will unlock the potential of RL training for code generation models and help the community further push the boundaries of LLMs' coding abilities.
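
For intuition, here is a minimal sketch of the rule-based reward described above: a binary reward of 1.0 if a candidate program passes all of its synthesized test cases, else 0.0. The subprocess-with-timeout sandbox and the helper names (`run_tests`, `binary_reward`) are our illustrative assumptions, not the repo's implementation; see `train/train_rl` for the authoritative training code.

```python
# Illustrative sketch of the binary pass-rate reward (assumptions: each test
# case is an executable `assert` statement; a subprocess with a timeout is
# enough sandboxing for illustration). Not the repo's actual implementation.
import subprocess
import sys

def run_tests(program: str, test_cases: list[str], timeout: float = 5.0) -> bool:
    """Return True iff `program` passes every test case."""
    source = program + "\n" + "\n".join(test_cases)
    try:
        result = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0

def binary_reward(program: str, test_cases: list[str]) -> float:
    """Rule-based reward: 1.0 if all synthesized tests pass, else 0.0."""
    return 1.0 if run_tests(program, test_cases) else 0.0

program = "def add(a, b):\n    return a + b"
tests = ["assert add(1, 2) == 3", "assert add(-1, 1) == 0"]
print(binary_reward(program, tests))  # 1.0
```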

## 📚Dataset
- [AceCode-87K](https://huggingface.co/datasets/TIGER-Lab/AceCode-87K): The first large-scale coding dataset with an average of 16 test cases per prompt, synthesized by GPT-4o-mini.
- [AceCodePair-300K](https://huggingface.co/datasets/TIGER-Lab/AceCodePair-300K): Preference pairs constructed from AceCode-87K for training the reward model.
- AceCode-87K-hard: a harder subset that you can create by sampling the 25% hardest examples, following the commands [here](https://github.com/TIGER-AI-Lab/AceCoder/tree/main/train/train_rl#data-preparation).
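
All of these datasets live on the Hugging Face Hub, so they load with the standard `datasets` library. The sketch below assumes the column names shown on the AceCode-87K dataset card (e.g. `question` and `test_cases`); verify them against the card for the version you download.

```python
# Load AceCode-87K from the Hugging Face Hub. Column names ("question",
# "test_cases") are assumptions based on the dataset card; verify before use.
from datasets import load_dataset

ds = load_dataset("TIGER-Lab/AceCode-87K", split="train")
print(ds)                          # size and column names
example = ds[0]
print(example["question"])         # the coding prompt
print(example["test_cases"][:2])   # first two synthesized test cases
```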

## 🤗Model

### AceCodeRM (Reward Model)
- [AceCodeRM-7B](https://huggingface.co/TIGER-Lab/AceCodeRM-7B): A reward model trained on AceCodePair-300K from Qwen2.5-Coder-7B-Instruct
- [AceCodeRM-32B](https://huggingface.co/TIGER-Lab/AceCodeRM-32B): A reward model trained on AceCodePair-300K from Qwen2.5-Coder-32B-Instruct

### AceCoder (RL Model)
| Initial Policy Model | Reward Type | Training dataset | Final RL Model |
|:---------------------:|:-----------:|:----------------:|:--------------:|
| Qwen2.5-7B-Instruct | AceCodeRM-7B | AceCode-87K-hard (22k) | [TIGER-Lab/AceCoder-Qwen2.5-7B-Ins-RM](https://huggingface.co/TIGER-Lab/AceCoder-Qwen2.5-7B-Ins-RM) |
| Qwen2.5-7B-Instruct | Rule | AceCode-87K-hard (22k) | [TIGER-Lab/AceCoder-Qwen2.5-7B-Ins-Rule](https://huggingface.co/TIGER-Lab/AceCoder-Qwen2.5-7B-Ins-Rule) |
| Qwen2.5-Coder-7B-Instruct | AceCodeRM-7B | AceCode-87K-hard (22k) | [TIGER-Lab/AceCoder-Qwen2.5-Coder-7B-Ins-RM](https://huggingface.co/TIGER-Lab/AceCoder-Qwen2.5-Coder-7B-Ins-RM) |
| Qwen2.5-Coder-7B-Instruct | Rule | AceCode-87K-hard (22k) | [TIGER-Lab/AceCoder-Qwen2.5-Coder-7B-Ins-Rule](https://huggingface.co/TIGER-Lab/AceCoder-Qwen2.5-Coder-7B-Ins-Rule) |
| Qwen2.5-Coder-7B | AceCodeRM-7B | AceCode-87K-hard (22k) | [TIGER-Lab/AceCoder-Qwen2.5-Coder-7B-Base-RM](https://huggingface.co/TIGER-Lab/AceCoder-Qwen2.5-Coder-7B-Base-RM) |
| Qwen2.5-Coder-7B | Rule | AceCode-87K-hard (22k) | [TIGER-Lab/AceCoder-Qwen2.5-Coder-7B-Base-Rule](https://huggingface.co/TIGER-Lab/AceCoder-Qwen2.5-Coder-7B-Base-Rule) |

## 📈 Performance
See our [website](https://tiger-ai-lab.github.io/AceCoder/) or [paper](https://arxiv.org/abs/2502.01718) for detailed performance report.

## 🚀Quick Start

First, initialize and update the git submodules:

```bash
git submodule init
git submodule update
```

### Use AceCodeRM
First, install `acecoder` as a package:
```bash
pip install git+https://github.com/TIGER-AI-Lab/AceCoder.git
```
Then see [examples/run_acecoderm.py](examples/run_acecoderm.py) for how to use AceCodeRM; running `python examples/run_acecoderm.py` executes the example.
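
For orientation, here is a minimal best-of-N reranking sketch. It assumes AceCodeRM-7B loads as a `transformers` sequence-classification model whose chat template accepts a question/answer pair; if these assumptions do not match the released checkpoint, defer to [examples/run_acecoderm.py](examples/run_acecoderm.py).

```python
# Hedged sketch of best-of-N reranking with AceCodeRM-7B. The loading class,
# chat formatting, and logits indexing are assumptions; the authoritative
# usage is examples/run_acecoderm.py.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_path = "TIGER-Lab/AceCodeRM-7B"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSequenceClassification.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

question = "Write a function add(a, b) that returns the sum of two integers."
candidates = [
    "def add(a, b):\n    return a + b",
    "def add(a, b):\n    return a - b",  # intentionally wrong candidate
]

scores = []
for program in candidates:
    chat = [
        {"role": "user", "content": question},
        {"role": "assistant", "content": program},
    ]
    input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(model.device)
    with torch.no_grad():
        reward = model(input_ids=input_ids).logits[0, 0].item()
    scores.append(reward)

# Best-of-N: keep the candidate the reward model scores highest.
print(candidates[scores.index(max(scores))])
```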

### Training Reward Model
See [train/train_rm/README.md](train/train_rm/README.md) for detailed instructions.

### Training RL Model
See [train/train_rl/README.md](train/train_rl/README.md) for detailed instructions.

### Evaluation
We use [EvalPlus](https://github.com/evalplus/evalplus), [BigCodeBench](https://github.com/bigcode-project/bigcodebench), and [LiveCodeBench](https://github.com/LiveCodeBench/LiveCodeBench) to evaluate HumanEval(+), MBPP(+), BigCodeBench, and LiveCodeBench (V4), respectively.
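
As a concrete example, a typical EvalPlus run scores a JSONL file of generated samples against HumanEval(+); the exact flags and sample format depend on the EvalPlus version you install, so treat this as a sketch.

```bash
pip install evalplus

# Score generated completions (EvalPlus JSONL sample format) on HumanEval
# and HumanEval+; use --dataset mbpp for MBPP(+).
evalplus.evaluate --dataset humaneval --samples samples.jsonl
```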

## Citation
If you find this work helpful, please consider citing:
```bibtex
@article{AceCoder,
  title={AceCoder: Acing Coder RL via Automated Test-Case Synthesis},
  author={Zeng, Huaye and Jiang, Dongfu and Wang, Haozhe and Nie, Ping and Chen, Xiaotong and Chen, Wenhu},
  journal={arXiv preprint arXiv:2502.01718},
  year={2025}
}
```