{"id":28653883,"url":"https://github.com/tiger-ai-lab/acecoder","last_synced_at":"2025-06-13T07:08:01.018Z","repository":{"id":275491730,"uuid":"926233491","full_name":"TIGER-AI-Lab/AceCoder","owner":"TIGER-AI-Lab","description":"The official repo for \"AceCoder: Acing Coder RL via Automated Test-Case Synthesis\"","archived":false,"fork":false,"pushed_at":"2025-04-09T06:00:20.000Z","size":5930,"stargazers_count":75,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-09T06:32:05.138Z","etag":null,"topics":["code","codellm","llm"],"latest_commit_sha":null,"homepage":"https://tiger-ai-lab.github.io/AceCoder/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TIGER-AI-Lab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-02-02T21:17:31.000Z","updated_at":"2025-04-09T06:00:24.000Z","dependencies_parsed_at":"2025-03-29T08:23:24.005Z","dependency_job_id":"82f40c6b-ca46-4bac-94e7-2913148e2f84","html_url":"https://github.com/TIGER-AI-Lab/AceCoder","commit_stats":null,"previous_names":["wenhuchen/acecoder"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/TIGER-AI-Lab/AceCoder","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TIGER-AI-Lab%2FAceCoder","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TIGER-AI-Lab%2FAceCoder/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TIGER-AI-Lab%2FAceCoder/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hos
ts/GitHub/repositories/TIGER-AI-Lab%2FAceCoder/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/TIGER-AI-Lab","download_url":"https://codeload.github.com/TIGER-AI-Lab/AceCoder/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TIGER-AI-Lab%2FAceCoder/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259599331,"owners_count":22882357,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["code","codellm","llm"],"created_at":"2025-06-13T07:08:00.396Z","updated_at":"2025-06-13T07:08:00.983Z","avatar_url":"https://github.com/TIGER-AI-Lab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🂡 AceCoder\n\n\u003ca target=\"_blank\" href=\"https://arxiv.org/abs/2502.01718\"\u003e\n\u003cimg style=\"height:22pt\" src=\"https://img.shields.io/badge/-Paper-red?style=flat\u0026logo=arxiv\"\u003e\u003c/a\u003e\n\u003ca target=\"_blank\" href=\"https://github.com/TIGER-AI-Lab/AceCoder\"\u003e\n\u003cimg style=\"height:22pt\" src=\"https://img.shields.io/badge/-Code-green?style=flat\u0026logo=github\"\u003e\u003c/a\u003e\n\u003ca target=\"_blank\" href=\"https://tiger-ai-lab.github.io/AceCoder/\"\u003e\n\u003cimg style=\"height:22pt\" src=\"https://img.shields.io/badge/-🌐%20Website-blue?style=flat\"\u003e\u003c/a\u003e\n\u003ca target=\"_blank\" href=\"https://huggingface.co/datasets/TIGER-Lab/AceCode-87K\"\u003e\n\u003cimg style=\"height:22pt\" src=\"https://img.shields.io/badge/-🤗%20Dataset-red?style=flat\"\u003e\u003c/a\u003e\n\u003ca 
target=\"_blank\" href=\"https://huggingface.co/collections/TIGER-Lab/acecoder-67a16011a6c7d65cad529eba\"\u003e\n\u003cimg style=\"height:22pt\" src=\"https://img.shields.io/badge/-🤗%20Models-red?style=flat\"\u003e\u003c/a\u003e\n\u003c!-- \u003ca target=\"_blank\" href=\"https://twitter.com/DongfuJiang/status/1805438506137010326\"\u003e\n\u003cimg style=\"height:22pt\" src=\"https://img.shields.io/badge/-Tweet-blue?style=flat\u0026logo=twitter\"\u003e\u003c/a\u003e --\u003e\n\u003cbr\u003e\n\n\n\u003cspan style=\"color:#183385; font-size: 14pt; font-family: Roboto, Helvetica, Arial, Heveltica Neue, sans-serif\"\u003e\n     \u003cb\u003eAuthors:\u003c/b\u003e\n     \u003ca class=\"name\" target=\"_blank\" href=\"https://www.wyett-zeng.com/about.html\"\u003eHuaye Zeng\u003c/a\u003e, \n     \u003ca class=\"name\" target=\"_blank\" href=\"https://jdf-prog.github.io/\"\u003eDongfu Jiang\u003c/a\u003e, \n     \u003ca class=\"name\" target=\"_blank\" href=\"#\"\u003eHaoZhe Wang\u003c/a\u003e,\n     \u003ca class=\"name\" target=\"_blank\" href=\"#\"\u003ePing Nie\u003c/a\u003e,\n     \u003ca class=\"name\" target=\"_blank\" href=\"#\"\u003eXiaotong Chen\u003c/a\u003e,\n     \u003ca class=\"name\" target=\"_blank\" href=\"https://wenhuchen.github.io/\"\u003eWenhu Chen\u003c/a\u003e\u0026nbsp; @ \n     \u003ca class=\"btna\" target=\"_blank\" href=\"https://huggingface.co/TIGER-Lab\"\u003eTIGER-Lab\u003c/a\u003e \u0026nbsp; \n     \u003c/span\u003e\n\n## 🔥News\n\n- [2025/2/3] We release the [AceCoder Paper](https://arxiv.org/abs/2502.01718), along with the [🤗 Models and Datasets](https://huggingface.co/collections/TIGER-Lab/acecoder-67a16011a6c7d65cad529eba) on Hugging Face. 
\n\n\n## Overview\n![./assets/images/ac_overview.png](./assets/images/ac_overview.png)\n\n\u003cdetails\u003e\u003csummary\u003eAbstract\u003c/summary\u003e \n\n- We introduce AceCoder, the first work to propose a fully automated pipeline for synthesizing large-scale reliable tests used for reward model training and reinforcement learning in the coding scenario. To do this, we curated the dataset [AceCode-87K](https://huggingface.co/datasets/TIGER-Lab/AceCode-87K), where we start from a seed code dataset, prompt powerful LLMs to \"imagine\" proper test cases for each coding question, and filter out the noisy ones.\n\n- We trained two reward models, [AceCodeRM-7B](https://huggingface.co/TIGER-Lab/AceCodeRM-7B) and [AceCodeRM-32B](https://huggingface.co/TIGER-Lab/AceCodeRM-32B), on the constructed [preference pairs](https://huggingface.co/datasets/TIGER-Lab/AceCodePair-300K). Best-of-N sampling results on HumanEval(+), MBPP(+), BigCodeBench, and LiveCodeBench (V4) show consistent improvement.\n\n- We perform RL training from three policy models: Qwen2.5-7B-Instruct, Qwen2.5-Coder-7B-Base, and Qwen2.5-Coder-7B-Instruct. Two types of reward can be used: the trained reward model AceCodeRM-7B and a rule-based reward, i.e. the binary pass rate over the test cases in the dataset. Additionally, we experiment with RL directly from the base model, as in DeepSeek-R1. Results show that RL directly from the base Qwen2.5-Coder model yields a **25%** improvement on HumanEval-plus and **6%** on MBPP-plus within just **80** optimization steps.\n\n- To our knowledge, this is the first work to propose a fully automated pipeline for synthesizing large-scale reliable tests used for reward model training and reinforcement learning in the coding scenario. 
We believe our dataset will unlock the potential of RL training for code generation models and help the community further push the boundaries of LLMs' coding abilities.\n\n\u003c/details\u003e\n\n## 📚Dataset\n- [AceCode-87K](https://huggingface.co/datasets/TIGER-Lab/AceCode-87K): The first large-scale coding dataset with an average of 16 test cases per prompt, synthesized by GPT-4o-mini.\n- [AceCodePair-300K](https://huggingface.co/datasets/TIGER-Lab/AceCodePair-300K): Preference pairs constructed from AceCode-87K for training reward models.\n- AceCode-87K-hard: A 25% sample made up of the hard examples, which you can create by following the commands [here](https://github.com/TIGER-AI-Lab/AceCoder/tree/main/train/train_rl#data-preparation).\n\n## 🤗Model\n\n### AceCodeRM (Reward Model)\n- [AceCodeRM-7B](https://huggingface.co/TIGER-Lab/AceCodeRM-7B): A reward model trained on AceCodePair-300K from Qwen2.5-Coder-7B-Instruct\n- [AceCodeRM-32B](https://huggingface.co/TIGER-Lab/AceCodeRM-32B): A reward model trained on AceCodePair-300K from Qwen2.5-Coder-32B-Instruct\n\n### AceCoder (RL Model)\n| Initial Policy Model | Reward Type | Training dataset | Final RL Model |\n|:---------------------:|:-----------:|:----------------:|:--------------:|\n| Qwen2.5-7B-Instruct   | AceCodeRM-7B      | AceCode-87K-hard (22k)      | [TIGER-Lab/AceCoder-Qwen2.5-7B-Ins-RM](https://huggingface.co/TIGER-Lab/AceCoder-Qwen2.5-7B-Ins-RM) |\n| Qwen2.5-7B-Instruct   | Rule      | AceCode-87K-hard (22k)      | [TIGER-Lab/AceCoder-Qwen2.5-7B-Ins-Rule](https://huggingface.co/TIGER-Lab/AceCoder-Qwen2.5-7B-Ins-Rule) |\n| Qwen2.5-Coder-7B-Instruct   | AceCodeRM-7B      | AceCode-87K-hard (22k)      | [TIGER-Lab/AceCoder-Qwen2.5-Coder-7B-Ins-RM](https://huggingface.co/TIGER-Lab/AceCoder-Qwen2.5-Coder-7B-Ins-RM) |\n| Qwen2.5-Coder-7B-Instruct   | Rule      | AceCode-87K-hard (22k)      | [TIGER-Lab/AceCoder-Qwen2.5-Coder-7B-Ins-Rule](https://huggingface.co/TIGER-Lab/AceCoder-Qwen2.5-Coder-7B-Ins-Rule) |\n| 
Qwen2.5-Coder-7B   | AceCodeRM-7B      | AceCode-87K-hard (22k)      | [TIGER-Lab/AceCoder-Qwen2.5-Coder-7B-Base-RM](https://huggingface.co/TIGER-Lab/AceCoder-Qwen2.5-Coder-7B-Base-RM) |\n| Qwen2.5-Coder-7B   | Rule      | AceCode-87K-hard (22k)      | [TIGER-Lab/AceCoder-Qwen2.5-Coder-7B-Base-Rule](https://huggingface.co/TIGER-Lab/AceCoder-Qwen2.5-Coder-7B-Base-Rule) |\n\n## 📈 Performance\nSee our [website](https://tiger-ai-lab.github.io/AceCoder/) or [paper](https://arxiv.org/abs/2502.01718) for a detailed performance report.\n\n## 🚀Quick Start\n\n```bash\ngit submodule init\ngit submodule update\n```\n\n### Use AceCodeRM\nFirst install acecoder as a package:\n```bash\npip install git+https://github.com/TIGER-AI-Lab/AceCoder.git\n```\nThen see [examples/run_acecoderm.py](examples/run_acecoderm.py) for how to use AceCodeRM. Running `python examples/run_acecoderm.py` executes the example.\n\n### Training Reward Model\nSee [train/train_rm/README.md](train/train_rm/README.md) for detailed instructions.\n\n### Training RL Model\nSee [train/train_rl/README.md](train/train_rl/README.md) for detailed instructions.\n\n### Evaluation\nWe use [EvalPlus](https://github.com/evalplus/evalplus), [BigCodeBench](https://github.com/bigcode-project/bigcodebench), and [LiveCodeBench](https://github.com/LiveCodeBench/LiveCodeBench) to evaluate on HumanEval(+) and MBPP(+), BigCodeBench, and LiveCodeBench (V4), respectively.\n\n## Citation\nIf you find this work helpful, please consider citing:\n```bibtex\n@article{AceCoder,\n    title={AceCoder: Acing Coder RL via Automated Test-Case Synthesis},\n    author={Zeng, Huaye and Jiang, Dongfu and Wang, Haozhe and Nie, Ping and Chen, Xiaotong and Chen, Wenhu},\n    journal={ArXiv},\n    year={2025},\n    
volume={2502.01718}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftiger-ai-lab%2Facecoder","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftiger-ai-lab%2Facecoder","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftiger-ai-lab%2Facecoder/lists"}