{"id":19932048,"url":"https://github.com/amazon-science/llm-code-preference","last_synced_at":"2025-10-07T00:09:03.871Z","repository":{"id":259261337,"uuid":"876971321","full_name":"amazon-science/llm-code-preference","owner":"amazon-science","description":"Training and Benchmarking LLMs for Code Preference.","archived":false,"fork":false,"pushed_at":"2024-11-15T21:00:10.000Z","size":773,"stargazers_count":35,"open_issues_count":0,"forks_count":2,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-09-09T05:03:39.823Z","etag":null,"topics":["code-generation","llm-evaluation","llm-training","llms-benchmarking"],"latest_commit_sha":null,"homepage":"https://llm-code-preference.github.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/amazon-science.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-10-22T21:32:25.000Z","updated_at":"2025-08-15T19:22:06.000Z","dependencies_parsed_at":"2025-07-12T13:29:43.709Z","dependency_job_id":null,"html_url":"https://github.com/amazon-science/llm-code-preference","commit_stats":null,"previous_names":["amazon-science/llm-code-preference"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/amazon-science/llm-code-preference","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amazon-science%2Fllm-code-preference","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amazon-science%2Fllm-code-preference/tags","releases_url":"https://repos.ecosyste.ms/
api/v1/hosts/GitHub/repositories/amazon-science%2Fllm-code-preference/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amazon-science%2Fllm-code-preference/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/amazon-science","download_url":"https://codeload.github.com/amazon-science/llm-code-preference/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amazon-science%2Fllm-code-preference/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278699052,"owners_count":26030444,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-06T02:00:05.630Z","response_time":65,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["code-generation","llm-evaluation","llm-training","llms-benchmarking"],"created_at":"2024-11-12T23:08:52.403Z","updated_at":"2025-10-07T00:09:03.843Z","avatar_url":"https://github.com/amazon-science.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Learning Code Preference via Synthetic Evolution\n\n[![Web](https://img.shields.io/badge/Website-8A2BE2.svg?style=flat-square)](https://llm-code-preference.github.io/)\n[![arXiv](https://img.shields.io/badge/arXiv-2410.03837-b31b1b.svg?style=flat-square)](https://arxiv.org/abs/2410.03837)\n\n\u003cp align=\"center\"\u003e\n    \u003ca 
href=\"#-tldr\"\u003e📰 TL;DR\u003c/a\u003e •\n    \u003ca href=\"#-evaluation\"\u003e🔎 Evaluation\u003c/a\u003e •\n    \u003ca href=\"#-training\"\u003e🧪 Training\u003c/a\u003e •\n    \u003ca href=\"#-synthetic-data-generation\"\u003e🔮 Synthetic Data Generation\u003c/a\u003e •\n    \u003ca href=\"#-citation\"\u003e📜 Citation\u003c/a\u003e •\n    \u003ca href=\"#-acknowledgement\"\u003e🙏 Acknowledgement\u003c/a\u003e\n\u003c/p\u003e\n\n## 📰 TL;DR\n\nHow to *effectively* and *efficiently* obtain code preferences and judgements is an important yet under-studied topic! \n\nTo this end, our work provides:\n\n* **CodeFavor**: an open recipe to train code preference models with from-scratch data!\n  * *Commit-Instruct*: code commits -\u003e code preference\n  * *Critic-Evol*: code critique \u0026 revising -\u003e code preference\n* **CodePrefBench**: 1364 code preference tasks covering both verifiable and human objectives:\n  * *Code Correctness*\n  * *Code Efficiency*\n  * *Code Security*\n  * *Human Preference*\n* **Study**: our paper provides comprehensive studies!\n  * *Human studies*: quantifying the cost and performance of human preference based on 18 developers\n  * *Case studies*: our Appendix presents case studies of model preferences over code correctness, efficiency, and security\n  * *Controlled experiments*: impact of data, comment, criteria, modeling, etc. 
on training preference models\n\n![](./assets/codefavor.png)\n\n## 🔎 Evaluation\n\n### Environment\n\n* Python requirements: 3.10 or higher.\n\n```bash\nconda create -n codefavor python=3.10 -y\nconda activate codefavor\npip install -r requirements.txt\n```\n\n### CodePrefBench\n\n```bash\n# OpenAI server\npython codefavor/evaluate.py --model-id \"gpt-4o-2024-05-13\" --model-type openai --concurrency 80\n# Other OpenAI-compatible servers (vLLM, DeepSeek APIs, etc.)\npython codefavor/evaluate.py --model-id \"google/gemma-2-27b-it\" --model-type openai --concurrency 80 --model-url http://localhost:8000/v1\n# Claude models via Bedrock\npython codefavor/evaluate.py --model-id \"anthropic.claude-3-sonnet-20240229-v1:0\" --model-type bedrock --concurrency 10\n# Pairwise RM\npython codefavor/evaluate.py --model-id ./models/mix-cls-mistral-7b-it_bs32_ep1_lr5e-6-l3-70b/checkpoint-688 --model-type pair-rm\n```\n\n* Supported `--model-type`: `huggingface`, `openai`, `bedrock`, `pair-rm`, and `google`\n\n## 🧪 Training\n\n### Environment\n\n```bash\ngit clone https://github.com/axolotl-ai-cloud/axolotl.git axolotl-dep\ncd axolotl-dep\n\npip install torch==2.3.0\npip install packaging ninja wandb\npip install -e '.[flash-attn,deepspeed]'\n```\n\n### Use existing dataset\n\n```bash\npython scripts/axolotl/prepare_data.py \\\n    --decomposed-dataset datasets/train/editpackft-Llama-3-70B-Instruct.commit_instruct.decompose.jsonl \\\n    --judge-type classification --both-order\npython scripts/axolotl/prepare_data.py \\\n    --decomposed-dataset datasets/train/Llama-3-8B-Instruct-SOSS.teacher.Llama-3-70B-Instruct.critic_evol.decompose.jsonl \\\n    --judge-type classification --both-order\n```\n\n### Train models using Axolotl\n\n```bash\naccelerate launch -m axolotl.cli.train \\\n    scripts/axolotl/recipe/gemma/cls-commit-instruct-from-llama3-70b.yaml \\\n    --deepspeed scripts/axolotl/zero3.json\n# or use `torchrun` if your `accelerate` is complaining\ntorchrun --nproc_per_node 
8 -m axolotl.cli.train \\\n    scripts/axolotl/recipe/gemma/cls-commit-instruct-from-llama3-70b.yaml \\\n    --deepspeed scripts/axolotl/zero3.json\n```\n\n## 🔮 Synthetic Data Generation\n\n### Commit-Instruct from Scratch\n\n```bash\n# Supports the OpenAI and Bedrock interfaces\n# OAI interface\npython codefavor/prompt/commit_instruct.py --model-id \"deepseek-chat\" --model-type \"openai\" --concurrency 256 --dataset editpackft --model-url \"https://api.deepseek.com/v1\"\n# Bedrock interface\npython codefavor/prompt/commit_instruct.py --model-id \"meta.llama3-1-405b-instruct-v1:0\" --model-type \"bedrock\" --concurrency 10 --dataset editpackft\n```\n\n### Critic-Evol from Scratch\n\n```bash\npython codefavor/prompt/critic_evol.py --weak-dataset ./datasets/train/Llama-3-8B-Instruct-SOSS.jsonl \\\n                                     --model-id \"deepseek-coder\" --model-url \"https://api.deepseek.com/v1\"\npython codefavor/prompt/critic_evol.py --weak-dataset ./datasets/train/Llama-3-8B-Instruct-SOSS.jsonl \\\n                                     --model-id \"meta.llama3-1-405b-instruct-v1:0\" --concurrency 10\n```\n\n* Pairwise training code is partially adapted from https://github.com/RLHFlow/RLHF-Reward-Modeling/tree/main/pair-pm\n\n## 📜 Citation\n\n```bibtex\n@article{codefavor,\n  title = {Learning Code Preference via Synthetic Evolution},\n  author = {Liu, Jiawei and Nguyen, Thanh and Shang, Mingyue and Ding, Hantian and Li, Xiaopeng and Yu, Yu and Kumar, Varun and Wang, Zijian},\n  journal = {arXiv preprint arXiv:2410.03837},\n  year = {2024},\n}\n```\n\n## 🙏 Acknowledgement\n\n* Our training code is partially adapted from [RLHFlow](https://github.com/RLHFlow/RLHF-Reward-Modeling).\n* Our evaluation code is partially adapted from [RepoQA](https://github.com/evalplus/repoqa).\n* The seed corpus used in this paper comes from [EditPackFT](https://huggingface.co/datasets/nuprl/EditPackFT) and 
[Self-OSS-Instruct](https://huggingface.co/datasets/bigcode/self-oss-instruct-sc2-exec-filter-50k).\n\n## 🎓 Research Use Only\nThis source code is being released solely for academic and scientific reproducibility purposes, in support of the methods and findings described in the associated publication. Pull requests are not being accepted in order to maintain the code exactly as it was used in the paper, but interested parties are encouraged to open an issue requesting open source community development.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famazon-science%2Fllm-code-preference","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Famazon-science%2Fllm-code-preference","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famazon-science%2Fllm-code-preference/lists"}