Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
List: Awesome-Code-LLM
👨‍💻 An awesome and curated list of best code-LLM for research.
https://github.com/huybery/Awesome-Code-LLM
- Host: GitHub
- URL: https://github.com/huybery/Awesome-Code-LLM
- Owner: huybery
- License: mit
- Created: 2023-07-05T06:42:09.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2024-02-21T03:18:42.000Z (3 months ago)
- Last Synced: 2024-03-11T22:04:16.267Z (3 months ago)
- Topics: awesome, code-generation, large-language-models
- Size: 248 KB
- Stars: 521
- Watchers: 23
- Forks: 27
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
Lists
- Awesome-LLM - Awesome-Code-LLM - An awesome and curated list of best code-LLM for research. (Other Papers)
- ultimate-awesome - Awesome-Code-LLM - 👨‍💻 An awesome and curated list of best code-LLM for research. (Other Lists / Julia Lists)
- Awesome-DevAI - Awesome-Code-LLM
- awesome-llmops - Awesome-Code-LLM - An awesome and curated list of best code-LLM for research. (Awesome Lists / Profiling)
- awesome-stars - Awesome-Code-LLM - An awesome and curated list of best code-LLM for research. (Others)
- awesome - huybery/Awesome-Code-LLM - 👨‍💻 An awesome and curated list of best code-LLM for research. (miscellaneous)
README
![](code-banner.png)
## 🧵 Table of Contents
- [🧵 Table of Contents](#-table-of-contents)
- [🚀 Leaderboard](#-leaderboard)
- [💡 Evaluation Toolkit](#-evaluation-toolkit)
- [📚 Paper](#-paper)
- [▶️ Pre-Training](#️-pre-training)
- [▶️ Instruction Tuning](#️-instruction-tuning)
- [▶️ Alignment with Feedback](#️-alignment-with-feedback)
- [▶️ Prompting](#️-prompting)
- [▶️ Evaluation \& Benchmark](#️-evaluation--benchmark)
- [▶️ Using LLMs while coding](#️-using-llms-while-coding)
- [🙌 Contributors](#-contributors)
- [Cite as](#cite-as)
- [Acknowledgement](#acknowledgement)
- [Star History](#star-history)

## 🚀 Leaderboard
Central Leaderboard (sorted by HumanEval Pass@1)
| Model | Params | HumanEval | MBPP | HF | Source |
| ------------------------ | ------ | --------- | ---- | ------------------------------------------------------------- | ------------------------------------------------------- |
| GPT-4 + Reflexion | ? | 91.0 | 77.1 | | [paper](https://arxiv.org/abs/2303.11366) |
| GPT-4 (latest) | ? | 84.1 | 80.0 | | [github](https://github.com/deepseek-ai/DeepSeek-Coder) |
| DeepSeek-Coder-Instruct | 33B | 79.3 | 70.0 | [ckpt](https://hf.co/deepseek-ai/deepseek-coder-33b-instruct) | [github](https://github.com/deepseek-ai/DeepSeek-Coder) |
| DeepSeek-Coder-Instruct  | 7B     | 78.6      | 65.4 | [ckpt](https://hf.co/deepseek-ai/deepseek-coder-6.7b-instruct) | [github](https://github.com/deepseek-ai/DeepSeek-Coder) |
| GPT-3.5-Turbo (latest) | ? | 76.2 | 70.8 | | [github](https://github.com/deepseek-ai/DeepSeek-Coder) |
| Code-Llama | 34B | 62.2 | 61.2 | | [paper](https://arxiv.org/abs/2308.12950) |
| Pangu-Coder2 | 15B | 61.6 | | | [paper](https://arxiv.org/abs/2307.14936) |
| WizardCoder-15B | 15B | 57.3 | 51.8 | [ckpt](https://hf.co/WizardLM/WizardCoder-15B-V1.0) | [paper](https://arxiv.org/abs/2306.08568) |
| Code-Davinci-002 | ? | 47.0 | | | [paper](https://arxiv.org/abs/2107.03374) |
| StarCoder-15B (Prompted) | 15B | 40.8 | 49.5 | [ckpt](https://hf.co/bigcode/starcoder) | [paper](https://arxiv.org/abs/2305.06161) |
| PaLM 2-S                 | ?      | 37.6      | 50.0 |                                                               | [paper](https://arxiv.org/abs/2305.10403)                |
| PaLM-Coder-540B | 540B | 36.0 | 47.0 | | [paper](https://arxiv.org/abs/2204.02311) |
| InstructCodeT5+ | 16B | 35.0 | | | [paper](https://arxiv.org/abs/2305.07922) |
| StarCoder-15B | 15B | 33.6 | 52.7 | [ckpt](https://hf.co/bigcode/starcoder) | [paper](https://arxiv.org/abs/2305.06161) |
| Code-Cushman-001 | ? | 33.5 | 45.9 | | [paper](https://arxiv.org/abs/2107.03374) |
| CodeT5+ | 16B | 30.9 | | | [paper](https://arxiv.org/abs/2305.07922) |
| LLaMA2-70B | 70B | 29.9 | | [ckpt](https://hf.co/meta-llama/Llama-2-70b-hf) | [paper](https://arxiv.org/abs/2307.09288) |
| CodeGen-16B-Mono | 16B | 29.3 | 35.3 | | [paper](https://arxiv.org/abs/2203.13474) |
| PaLM-540B | 540B | 26.2 | 36.8 | | [paper](https://arxiv.org/abs/2204.02311) |
| LLaMA-65B | 65B | 23.7 | 37.7 | | [paper](https://arxiv.org/abs/2302.13971) |
| CodeGeeX | 13B | 22.9 | 24.4 | | [paper](https://arxiv.org/abs/2303.17568) |
| LLaMA-33B | 33B | 21.7 | 30.2 | | [paper](https://arxiv.org/abs/2302.13971) |
| CodeGen-16B-Multi | 16B | 18.3 | 20.9 | | [paper](https://arxiv.org/abs/2203.13474) |
| AlphaCode                | 1.1B   | 17.1      |      |                                                               | [paper](https://arxiv.org/abs/2203.07814)                |
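
The HumanEval and MBPP columns report pass@1 scores. For reference, here is a minimal NumPy sketch of the unbiased pass@k estimator from the Codex paper (Chen et al., 2021, listed under Pre-Training below), where `n` samples are drawn per problem and `c` of them pass the unit tests:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021): the probability
    that at least one of k samples, chosen from n generations of which
    c are correct, passes all unit tests."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# With a single sample per problem, pass@1 is just the raw pass rate:
assert pass_at_k(n=1, c=1, k=1) == 1.0
assert pass_at_k(n=1, c=0, k=1) == 0.0
```
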
| Leaderboard | Access |
| :----------------------------------: | ----------------------------------------------------------------------------------|
| Big Code Models Leaderboard | [[Source](https://huggingface.co/spaces/bigcode/bigcode-models-leaderboard)] |
| BIRD | [[Source](https://bird-bench.github.io)] |
| CanAiCode Leaderboard | [[Source](https://huggingface.co/spaces/mike-ravkine/can-ai-code-results)] |
| Coding LLMs Leaderboard | [[Source](https://leaderboard.tabbyml.com)] |
| CRUXEval Leaderboard | [[Source](https://crux-eval.github.io/leaderboard.html)] |
| EvalPlus | [[Source](https://evalplus.github.io/leaderboard.html)] |
| HumanEval.jl | [[Source](https://github.com/01-ai/HumanEval.jl)] |
| InfiCoder-Eval | [[Source](https://infi-coder.github.io/inficoder-eval)] |
| InterCode | [[Source](https://intercode-benchmark.github.io)] |
| Program Synthesis Models Leaderboard | [[Source](https://accubits.com/open-source-program-synthesis-models-leaderboard)] |
| Spider | [[Source](https://yale-lily.github.io/spider)] |

## 💡 Evaluation Toolkit
- [bigcode-evaluation-harness](https://github.com/bigcode-project/bigcode-evaluation-harness): A framework for the evaluation of autoregressive code generation language models.
- [code-eval](https://github.com/abacaj/code-eval): A framework for the evaluation of autoregressive code generation language models on HumanEval.
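
Both toolkits follow the same generate-then-execute pattern. As a minimal illustration, the sketch below scores completions with OpenAI's reference `human-eval` package (https://github.com/openai/human-eval), following the usage shown in that repository's README; `generate_one` is a hypothetical stand-in for the model under test:

```python
# Minimal sketch using OpenAI's reference human-eval package,
# installed per https://github.com/openai/human-eval.
# `generate_one` is a hypothetical stand-in for your model.
from human_eval.data import read_problems, write_jsonl

def generate_one(prompt: str) -> str:
    raise NotImplementedError("call the model under test here")

problems = read_problems()  # task_id -> {prompt, entry_point, test, ...}
samples = [
    {"task_id": task_id, "completion": generate_one(problems[task_id]["prompt"])}
    for task_id in problems
]
write_jsonl("samples.jsonl", samples)
# Then score from the shell (this executes untrusted code; use a sandbox):
#   evaluate_functional_correctness samples.jsonl
```
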
## 📚 Paper

### ▶️ Pre-Training
1. **Evaluating Large Language Models Trained on Code** `Preprint`
   [[Paper](https://arxiv.org/abs/2107.03374)] *Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, et al.* 2021.07
2. **CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis** `ICLR23`
   [[Paper](https://arxiv.org/abs/2203.13474)] *Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong.* 2022.03
3. **ERNIE-Code: Beyond English-Centric Cross-lingual Pretraining for Programming Languages** `ACL23 (Findings)`
   [[Paper](https://aclanthology.org/2023.findings-acl.676.pdf)][[Repo](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/model_zoo/ernie-code)] *Yekun Chai, Shuohuan Wang, Chao Pang, Yu Sun, Hao Tian, and Hua Wu.* 2022.12
4. **SantaCoder: don't reach for the stars!** `Preprint`
   [[Paper](https://arxiv.org/abs/2301.03988)] *Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, et al.* 2023.01
5. **CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Evaluations on HumanEval-X** `Preprint`
   [[Paper](https://arxiv.org/abs/2303.17568)] *Qinkai Zheng, Xiao Xia, Xu Zou, Yuxiao Dong, Shan Wang, Yufei Xue, Zihan Wang, Lei Shen, Andi Wang, Yang Li, Teng Su, Zhilin Yang, Jie Tang.* 2023.03
6. **CodeGen2: Lessons for Training LLMs on Programming and Natural Languages** `ICLR23`
   [[Paper](https://arxiv.org/abs/2305.02309)] *Erik Nijkamp, Hiroaki Hayashi, Caiming Xiong, Silvio Savarese, Yingbo Zhou.* 2023.05
7. **StarCoder: may the source be with you!** `Preprint`
   [[Paper](https://arxiv.org/abs/2305.06161)] *Raymond Li, Loubna Ben Allal, Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, et al.* 2023.05
8. **CodeT5+: Open Code Large Language Models for Code Understanding and Generation** `Preprint`
   [[Paper](https://arxiv.org/abs/2305.07922)] *Yue Wang, Hung Le, Akhilesh Deepak Gotmare, Nghi D.Q. Bui, Junnan Li, Steven C.H. Hoi.* 2023.05
9. **Textbooks Are All You Need** `Preprint`
   [[Paper](https://arxiv.org/abs/2306.11644)] *Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, et al.* 2023.06
10. **Code Llama: Open Foundation Models for Code** `Preprint`
    [[Paper](https://arxiv.org/abs/2308.12950)] *Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, et al.* 2023.08

### ▶️ Instruction Tuning
1. **WizardCoder: Empowering Code Large Language Models with Evol-Instruct** `Preprint`
   [[Paper](https://arxiv.org/abs/2306.08568)] *Ziyang Luo, Can Xu, Pu Zhao, Qingfeng Sun, Xiubo Geng, Wenxiang Hu, Chongyang Tao, Jing Ma, Qingwei Lin, Daxin Jiang.* 2023.06
2. **OctoPack: Instruction Tuning Code Large Language Models** `Preprint`
   [[Paper](https://arxiv.org/abs/2308.07124)][[Repo](https://github.com/bigcode-project/octopack)] *Niklas Muennighoff, Qian Liu, Armel Zebaze, Qinkai Zheng, Binyuan Hui, Terry Yue Zhuo, Swayam Singh, Xiangru Tang, Leandro von Werra, Shayne Longpre.* 2023.08

### ▶️ Alignment with Feedback
1. **CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning** `NeurIPS22`
   [[Paper](https://arxiv.org/abs/2207.01780)] *Hung Le, Yue Wang, Akhilesh Deepak Gotmare, Silvio Savarese, Steven C.H. Hoi.* 2022.07
2. **Execution-based Code Generation using Deep Reinforcement Learning** `TMLR23`
   [[Paper](https://arxiv.org/abs/2301.13816)] *Parshin Shojaee, Aneesh Jain, Sindhu Tipirneni, Chandan K. Reddy.* 2023.01
3. **RLTF: Reinforcement Learning from Unit Test Feedback** `Preprint`
   [[Paper](https://arxiv.org/abs/2307.04349)] *Jiate Liu, Yiqin Zhu, Kaiwen Xiao, Qiang Fu, Xiao Han, Wei Yang, Deheng Ye.* 2023.07
4. **PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback** `Preprint`
   [[Paper](https://arxiv.org/abs/2307.14936)] *Bo Shen, Jiaxin Zhang, Taihong Chen, Daoguang Zan, Bing Geng, An Fu, Muhan Zeng, Ailun Yu, Jichuan Ji, Jingyang Zhao, Yuenan Guo, Qianxiang Wang.* 2023.07

### ▶️ Prompting
1. **CodeT: Code Generation with Generated Tests** `ICLR23`
   [[Paper](https://arxiv.org/abs/2207.10397)] *Bei Chen, Fengji Zhang, Anh Nguyen, Daoguang Zan, Zeqi Lin, Jian-Guang Lou, Weizhu Chen.* 2022.07
2. **Coder Reviewer Reranking for Code Generation** `ICML23`
   [[Paper](https://arxiv.org/abs/2211.16490)] *Tianyi Zhang, Tao Yu, Tatsunori B. Hashimoto, Mike Lewis, Wen-tau Yih, Daniel Fried, Sida I. Wang.* 2022.11
3. **LEVER: Learning to Verify Language-to-Code Generation with Execution** `ICML23`
   [[Paper](https://arxiv.org/abs/2302.08468)] *Ansong Ni, Srini Iyer, Dragomir Radev, Ves Stoyanov, Wen-tau Yih, Sida I. Wang, Xi Victoria Lin.* 2023.02
4. **Teaching Large Language Models to Self-Debug** `Preprint` (loop sketched after this list)
   [[Paper](https://arxiv.org/abs/2304.05128)] *Xinyun Chen, Maxwell Lin, Nathanael Schärli, Denny Zhou.* 2023.04
5. **Demystifying GPT Self-Repair for Code Generation** `Preprint`
   [[Paper](https://arxiv.org/abs/2306.09896)] *Theo X. Olausson, Jeevana Priya Inala, Chenglong Wang, Jianfeng Gao, Armando Solar-Lezama.* 2023.06
6. **SelfEvolve: A Code Evolution Framework via Large Language Models** `Preprint`
   [[Paper](https://arxiv.org/abs/2306.02907)] *Shuyang Jiang, Yuhao Wang, Yu Wang.* 2023.06
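
A common thread in the self-debugging and self-repair papers above is an execute-and-refine loop: sample a program, run it against unit tests, and feed any failure back to the model for a revision. The sketch below illustrates only that general pattern; the `llm` callable and the prompt wording are assumptions, not any paper's exact method.

```python
# Hypothetical execute-and-refine ("self-debug") loop. `llm` is an
# assumed text-completion callable; prompts are illustrative only.
from typing import Callable

def self_debug(llm: Callable[[str], str], task: str, tests: str,
               max_rounds: int = 3) -> str:
    code = llm(f"Write a Python function for this task:\n{task}")
    for _ in range(max_rounds):
        try:
            exec(code + "\n" + tests, {})  # run candidate against the tests
            return code  # all tests passed
        except Exception as err:
            # Feed the failure back and ask for a revision.
            code = llm(
                f"Task:\n{task}\n\nCode:\n{code}\n\n"
                f"It fails with {err!r}. Return a fixed version."
            )
    return code  # best effort after max_rounds
```
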
### ▶️ Evaluation & Benchmark

1. **Measuring Coding Challenge Competence With APPS** `NeurIPS21`
   > Named APPS
   [[Paper](https://arxiv.org/abs/2105.09938)][[Repo](https://github.com/hendrycks/apps)] *Dan Hendrycks, Steven Basart, Saurav Kadavath, Mantas Mazeika, Akul Arora, Ethan Guo, Collin Burns, Samir Puranik, Horace He, Dawn Song, Jacob Steinhardt.* 2021.05
2. **Program Synthesis with Large Language Models** `Preprint`
   > Named MBPP
   [[Paper](https://arxiv.org/abs/2108.07732)] *Jacob Austin, Augustus Odena, Maxwell Nye, Maarten Bosma, Henryk Michalewski, David Dohan, Ellen Jiang, Carrie Cai, Michael Terry, Quoc Le, Charles Sutton.* 2021.08
3. **DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation** `ICML23`
   [[Paper](https://arxiv.org/abs/2211.11501)] *Yuhang Lai, Chengxi Li, Yiming Wang, Tianyi Zhang, Ruiqi Zhong, Luke Zettlemoyer, Scott Wen-tau Yih, Daniel Fried, Sida Wang, Tao Yu.* 2022.11
4. **RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems** `Preprint`
   [[Paper](https://arxiv.org/abs/2306.03091)] *Tianyang Liu, Canwen Xu, Julian McAuley.* 2023.06
5. **Can ChatGPT replace StackOverflow? A Study on Robustness and Reliability of Large Language Model Code Generation** `Preprint`
   [[Paper](https://arxiv.org/abs/2308.10335)] *Li Zhong, Zilong Wang.* 2023.08
### ▶️ Using LLMs while coding
1. **Awesome-DevAI: A list of resources about using LLMs while building software** `Awesome`
   [[Repo](https://github.com/continuedev/Awesome-DevAI)] *Ty Dunn, Nate Sesti.* 2023.10
## 🙌 Contributors
This is an active repository and your contributions are always welcome! If you have any questions about this opinionated list, do not hesitate to contact me at `[email protected]`.
## Cite as
```
@software{awesome-code-llm,
author = {Binyuan Hui},
title = {An awesome and curated list of best code-LLM for research},
howpublished = {\url{https://github.com/huybery/Awesome-Code-LLM}},
year = 2023,
}
```

## Acknowledgement
This project is inspired by [Awesome-LLM](https://github.com/Hannibal046/Awesome-LLM).
## Star History
[![Star History Chart](https://api.star-history.com/svg?repos=huybery/Awesome-Code-LLM&type=Date)](https://star-history.com/#huybery/Awesome-Code-LLM&Date)
**[⬆ Back to ToC](#-table-of-contents)**