Awesome-Code-LLM
[TMLR] A curated list of language modeling research for code (and other software engineering activities), plus related datasets.
https://github.com/codefuse-ai/Awesome-Code-LLM
1. Surveys
2. Models
2.1 Base LLMs and Pretraining Strategies
- 2022-01
- 2022-04 - neox)]
- 2023-03
- 2023-07
- 2023-09 - 1_5)]
- 2023-09 - inc/Baichuan2)]
- 2023-09
- 2023-10 - src)]
- 2023-12
- 2023-12
- 2023-12 - research/YAYI2)]
- 2024-01 - ai/DeepSeek-LLM)]
- 2024-01 - of-experts/)]
- 2024-01
- 2024-02
- 2024-01 - ai/DeepSeek-MoE)]
- 2024-02 - open-models/)]
- 2024-03 - ai/Yi)]
- 2024-03 - 3-family)]
- 2024-04 - 34B)]
- 2024-04 - ai/JetMoE)]
- 2024-04
- 2024-04 - llama/llama3)] [[paper](https://arxiv.org/abs/2407.21783)]
- 2024-05 - ai/DeepSeek-V2)]
- 2024-04
- 2024-04
- 2024-04 - FLM)]
- 2024-05 - 7B)]
- 2024-05 - art-projection/MAP-NEO)]
- 2024-06
- 2024-06
- 2024-06
- 2024-06
- 2024-06
- 2024-06
- 2024-06
- 2024-06
- 2024-06
- 2024-07
- 2024-07
- 2024-07
- 2024-07
- 2024-07
- 2024-08
- 2024-09
- 2024-09
- 2024-09
- 2024-10
- 2024-10
- 2024-11
- 2024-11
- 2024-11
- 2024-12
- 2024-12
- 2024-12
- 2024-12
- 2024-12
- 2024-12
- 2024-12
- 2024-12
- 2024-12
- 2024-12
- 2025-01
- 2025-01
- 2024-06
- 2024-06
- 2024-11
- 2025-02
- 2025-02
- 2025-03
- 2025-02
- 2025-03
- 2025-04
- 2025-03
- 2025-05
- 2025-05
- 2025-05
- 2025-05
- 2025-05
- 2025-06
- 2025-07
- 2025-07
- 2025-08
- 2025-07
- 2025-07
- 2025-08
- 2025-09
- 2025-09
- 2025-10
- 2025-11
- 2025-12
- 2026-01
- 2026-02
- 2026-02
- 2026-02
- 2026-02
- 2026-02
- 2026-04
2.2 Existing LLM Adapted to Code
2.3 General Pretraining on Code
- 2019-12 - research/google-research/tree/master/cubert)]
- 2020-02
- 2020-09
- 2021-08
- 2021-10
- 2022-05
- 2020-05
- 2021-12
- 2022-02 - LMs)]
- 2022-03
- 2022-04
- 2022-06
- 2022-07
- 2023-01
- 2023-05
- 2023-06 - 1)]
- 2020-10
- 2021-02 - mastropaolo/TransferLearning4Code)]
- 2021-02
- 2021-03
- 2021-09
- 2022-01
- 2022-06
- 2023-05
- 2020-12
- 2022-03
- 2024-01 - ai/DeepSeek-Coder)]
- 2024-02
- 2024-01
- 2022-12 - code)]
- 2024-03
- 2024-04
- 2024-05 - 07] [[paper](https://arxiv.org/abs/2407.13739)]
- 2024-02
- 2024-07
- 2024-09
- 2024-10
- 2024-11
- 2025-01
- 2025-03
- 2025-05
- 2025-06
- 2025-05
- 2025-06
- 2025-09
- 2025-09
- 2025-10
- 2025-09
- 2025-10
- 2025-10
- 2026-02
- 2026-02
- 2026-03
- 2026-02
2.4 (Instruction) Fine-Tuning on Code
Categories
- 5. Methods/Models for Downstream Tasks: 1,248
- 8. Datasets: 582
- 3. When Coding Meets Reasoning: 315
- 2. Models: 286
- 6. Analysis of AI-Generated Code: 246
- 4. Code LLM for Low-Resource, Low-Level, and Domain-Specific Languages: 122
- 7. Human-LLM Interaction: 74
- News: 62
- 9. Recommended Readings: 32
- 5. Datasets: 29
- 4. Datasets: 20
- 1. Surveys: 17
- 6. Datasets: 4
- Other Awesome LLM Reading Lists: 3
- Star History: 2
- 7. User-LLM Interaction: 1
Sub Categories
- 8.2 Benchmarks: 612
- 3.5 Frontend Navigation: 179
- Text-To-SQL: 171
- 3.3 Code Agents: 119
- Vulnerability Detection: 116
- Others: 114
- 2.1 Base LLMs and Pretraining Strategies: 98
- Code Generation: 92
- Code Commenting and Summarization: 83
- Test Generation: 79
- 2.4 (Instruction) Fine-Tuning on Code: 76
- Malicious Code Detection: 75
- Program Repair: 75
- 3.1 Coding for Reasoning: 66
- Security and Vulnerabilities: 59
- 3.4 Interactive Coding: 55
- 2.3 General Pretraining on Code: 54
- Code Review: 49
- Code Translation: 47
- Frontend Development: 46
- 2.5 Reinforcement Learning on Code: 44
- Repository-Level Coding: 42
- Code Similarity and Embedding (Clone Detection, Code Search): 38
- Correctness: 34
- Issue Resolution: 32
- 5.2 Benchmarks: 30
- Requirement Engineering: 28
- Log Analysis: 26
- Program Proof: 26
- Automated Machine Learning: 25
- AI-Generated Code Detection: 25
- Compiler Optimization: 24
- Code RAG: 23
- Code Refactoring and Migration: 23
- Binary Analysis and Decompilation: 23
- 4.2 Benchmarks: 20
- Efficiency: 20
- 3.2 Code Simulation: 18
- Software Configuration: 17
- Code Ranking: 16
- Code QA & Reasoning: 16
- Robustness: 15
- Oracle Generation: 15
- 2.2 Existing LLM Adapted to Code: 14
- Hallucination: 13
- Fuzz Testing: 12
- Interpretability: 12
- Software Modeling: 10
- API Usage: 10
- Privacy: 9
- Commit Message Generation: 8
- Mutation Testing: 7
- Bias: 7
- 8.1 Pretraining: 6
- Type Prediction: 4
- 6.2 Benchmarks: 4
- Contamination: 3