{"id":13526374,"url":"https://github.com/codefuse-ai/Awesome-Code-LLM","last_synced_at":"2025-04-01T07:32:04.732Z","repository":{"id":207297585,"uuid":"694517360","full_name":"codefuse-ai/Awesome-Code-LLM","owner":"codefuse-ai","description":"[TMLR] A curated list of language modeling researches for code and related datasets.","archived":false,"fork":false,"pushed_at":"2024-10-30T05:29:15.000Z","size":9698,"stargazers_count":1607,"open_issues_count":0,"forks_count":106,"subscribers_count":40,"default_branch":"main","last_synced_at":"2024-10-30T08:36:32.729Z","etag":null,"topics":["ai","awesome","datasets","llm","nlp","papers","software-engineering","survey","tmlr"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2311.07989","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/codefuse-ai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-09-21T06:49:47.000Z","updated_at":"2024-10-30T07:25:27.000Z","dependencies_parsed_at":"2023-11-27T13:29:18.590Z","dependency_job_id":"1901aad6-f4a4-43be-9dd5-4acdafb1f2e3","html_url":"https://github.com/codefuse-ai/Awesome-Code-LLM","commit_stats":null,"previous_names":["codefuse-ai/awesome-code-llm"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codefuse-ai%2FAwesome-Code-LLM","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codefuse-ai%2FAwesome-Code-LLM/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codefuse-ai%2FAwesome-Code-LLM/relea
ses","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codefuse-ai%2FAwesome-Code-LLM/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/codefuse-ai","download_url":"https://codeload.github.com/codefuse-ai/Awesome-Code-LLM/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246600769,"owners_count":20803481,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","awesome","datasets","llm","nlp","papers","software-engineering","survey","tmlr"],"created_at":"2024-08-01T06:01:28.739Z","updated_at":"2025-04-01T07:32:04.711Z","avatar_url":"https://github.com/codefuse-ai.png","language":null,"funding_links":[],"categories":["HarmonyOS","📎 附录","A01_文本生成_文本对话","Inbox: Speech-to-text (STT) and spoken content analysis","Getting started","NLP","Others","Other Lists","🙏 Acknowledgements","Other Awesome Lists","Topics"],"sub_categories":["Windows Manager","🤝 友情链接","大语言对话模型及数据","Creative Uses of Generative AI Image Synthesis Tools","TeX Lists","🧪 Frontier Labs and Teams","Others","Code LLM"],"readme":"# Awesome-Code-LLM\n\nThis is the repo for our [TMLR](https://jmlr.org/tmlr/) survey [Unifying the Perspectives of NLP and Software Engineering: A Survey on Language Models for Code](https://arxiv.org/abs/2311.07989) - a comprehensive review of LLM researches for code. Works in each category are ordered chronologically. 
If you have a basic understanding of machine learning but are new to NLP, we also provide a list of recommended readings in [section 9](#9-recommended-readings).\n\n## News\n\n🔥🔥🔥 [2025/03/25] Featured papers:\n\n- 🔥🔥 [Every Sample Matters: Leveraging Mixture-of-Experts and High-Quality Data for Efficient and Accurate Code LLM](https://arxiv.org/abs/2503.17793) from Ant Group.\n\n- 🔥🔥 [LLMs Love Python: A Study of LLMs' Bias for Programming Languages and Libraries](https://arxiv.org/abs/2503.17181) from King’s College London.\n\n- 🔥 [EnvBench: A Benchmark for Automated Environment Setup](https://arxiv.org/abs/2503.14443) from JetBrains Research.\n\n- 🔥 [The KoLMogorov Test: Compression by Code Generation](https://arxiv.org/abs/2503.13992) from Meta.\n\n🔥🔥🔥 Recent works from Codefuse:\n\n- Graph-Aligned LLM for Improved Source Code Understanding: [codefuse-ai/GALLa](https://github.com/codefuse-ai/GALLa)\n- Code Graph Model: [codefuse-ai/CodeFuse-CGM](https://github.com/codefuse-ai/CodeFuse-CGM)\n- EasyDeploy: [codefuse-ai/EasyDeploy](https://github.com/codefuse-ai/EasyDeploy)\n- Rodimus: [codefuse-ai/rodimus](https://github.com/codefuse-ai/rodimus)\n- Code General Embedding: [codefuse-ai/CodeFuse-CGE](https://github.com/codefuse-ai/CodeFuse-CGE)\n\n🔥🔥\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp; [2025/01/11] 20 papers from NeurIPS 2024 have been collected. You may search for the keyword \"NeurIPS 2024\" on this page.\n\n🔥\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp; [2024/09/06] **Our survey has been accepted for publication by [Transactions on Machine Learning Research (TMLR)](https://jmlr.org/tmlr/).**\n\n#### How to Contribute\n\nIf you find a paper to be missing from this repository, misplaced in a category, or lacking a reference to its journal/conference information, please do not hesitate to create an issue. 
If you find this repo helpful, please cite our survey:\n\n```\n@article{zhang2024unifying,\n   title={Unifying the Perspectives of {NLP} and Software Engineering: A Survey on Language Models for Code},\n   author={Ziyin Zhang and Chaoyu Chen and Bingchang Liu and Cong Liao and Zi Gong and Hang Yu and Jianguo Li and Rui Wang},\n   journal={Transactions on Machine Learning Research},\n   issn={2835-8856},\n   year={2024},\n   url={https://openreview.net/forum?id=hkNnGqZnpa},\n   note={}\n}\n```\n\n## Table of Contents\n\n1. [Surveys](#1-surveys)\n\n2. [Models](#2-models)\n\n   2.1 [Base LLMs and Pretraining Strategies](#21-base-llms-and-pretraining-strategies)\n\n   2.2 [Existing LLM Adapted to Code](#22-existing-llm-adapted-to-code)\n\n   2.3 [General Pretraining on Code](#23-general-pretraining-on-code)\n\n   - [Encoder](#encoder)\n   - [Decoder](#decoder)\n   - [Encoder-Decoder](#encoder-decoder)\n   - [UniLM](#unilm)\n\n   \u003c!-- prettier ignore --\u003e\n\n   2.4 [(Instruction) Fine-Tuning on Code](#24-instruction-fine-tuning-on-code)\n\n   2.5 [Reinforcement Learning on Code](#25-reinforcement-learning-on-code)\n\n3. [When Coding Meets Reasoning](#3-when-coding-meets-reasoning)\n\n   3.1 [Coding for Reasoning](#31-coding-for-reasoning)\n\n   3.2 [Code Simulation](#32-code-simulation)\n\n   3.3 [Code Agents](#33-code-agents)\n\n   3.4 [Interactive Coding](#34-interactive-coding)\n\n   3.5 [Frontend Navigation](#35-frontend-navigation)\n\n4. [Code LLM for Low-Resource, Low-Level, and Domain-Specific Languages](#4-code-llm-for-low-resource-low-level-and-domain-specific-languages)\n\n5. 
[Methods/Models for Downstream Tasks](#5-methodsmodels-for-downstream-tasks)\n\n   - Programming\n\n     - [Code Generation](#code-generation)\n     - [Code RAG](#code-rag)\n     - [Code Ranking](#code-ranking)\n     - [Code Translation](#code-translation)\n     - [Code Commenting and Summarization](#code-commenting-and-summarization)\n     - [Program Repair](#program-repair)\n     - [Code Similarity and Embedding (Clone Detection, Code Search)](#code-similarity-and-embedding-clone-detection-code-search)\n     - [Code Refactoring and Migration](#code-refactoring-and-migration)\n     - [Type Prediction](#type-prediction)\n     - [Repository-Level Coding](#repository-level-coding)\n     - [Frontend Development](#frontend-development)\n     - [Automated Machine Learning](#automated-machine-learning)\n     - [Text-To-SQL](#text-to-sql)\n     - [Program Proof](#program-proof)\n\n   - Testing and Deployment\n\n     - [Test Generation](#test-generation)\n     - [Oracle Generation](#oracle-generation)\n     - [Mutation Testing](#mutation-testing)\n     - [Fuzz Testing](#fuzz-testing)\n     - [Vulnerability Detection](#vulnerability-detection)\n     - [Malicious Code Detection](#malicious-code-detection)\n     - [Compiler Optimization](#compiler-optimization)\n     - [Binary Analysis and Decompilation](#binary-analysis-and-decompilation)\n\n   - DevOps\n\n     - [Commit Message Generation](#commit-message-generation)\n     - [Code Review](#code-review)\n     - [Log Analysis](#log-analysis)\n     - [Software Configuration](#software-configuration)\n     - [Code QA \u0026 Reasoning](#code-qa--reasoning)\n\n   - Requirement\n\n     - [Software Modeling](#software-modeling)\n     - [Requirement Engineering](#requirement-engineering)\n\n6. 
[Analysis of AI-Generated Code](#6-analysis-of-ai-generated-code)\n\n   - [Security and Vulnerabilities](#security-and-vulnerabilities)\n   - [Correctness](#correctness)\n   - [Hallucination](#hallucination)\n   - [Efficiency](#efficiency)\n   - [Robustness](#robustness)\n   - [Interpretability](#interpretability)\n   - [API Usage](#api-usage)\n   - [Privacy](#privacy)\n   - [Bias](#bias)\n   - [AI-Generated Code Detection](#ai-generated-code-detection)\n   - [Others](#others)\n\n7. [Human-LLM Interaction](#7-human-llm-interaction)\n\n8. [Datasets](#8-datasets)\n\n   8.1 [Pretraining](#81-pretraining)\n\n   8.2 [Benchmarks](#82-benchmarks)\n\n   - [Integrated Benchmarks](#integrated-benchmarks)\n   - [Evaluation Metrics](#evaluation-metrics)\n   - [Program Synthesis](#program-synthesis)\n   - [Visually Grounded Program Synthesis](#visually-grounded-program-synthesis)\n   - [Code Reasoning and QA](#code-reasoning-and-qa)\n   - [Text-to-SQL](#text-to-sql-1)\n   - [Code Translation](#code-translation-1)\n   - [Program Repair](#program-repair-1)\n   - [Code Summarization](#code-summarization)\n   - [Defect/Vulnerability Detection](#defectvulnerability-detection)\n   - [Code Retrieval](#code-retrieval)\n   - [Type Inference](#type-inference)\n   - [Commit Message Generation](#commit-message-generation-1)\n   - [Repo-Level Coding](#repo-level-coding)\n\n9. [Recommended Readings](#9-recommended-readings)\n\n10. [Citation](#citation)\n\n11. [Star History](#star-history)\n\n12. [Join Us](#join-us)\n\n## 1. Surveys\n\nWe list several recent surveys on similar topics. While they are all about language models for code, 1-2 focus on the NLP side, 3-6 focus on the SE side, and 7-14 were released after ours.\n\n1. \"Large Language Models Meet NL2Code: A Survey\" [2022-12] [ACL 2023] [[paper](https://arxiv.org/abs/2212.09420)]\n\n2. \"A Survey on Pretrained Language Models for Neural Code Intelligence\" [2022-12] [[paper](https://arxiv.org/abs/2212.10079)]\n\n3. 
\"An Empirical Comparison of Pre-Trained Models of Source Code\" [2023-02] [ICSE 2023] [[paper](https://arxiv.org/abs/2302.04026)]\n\n4. \"Large Language Models for Software Engineering: A Systematic Literature Review\" [2023-08] [[paper](https://arxiv.org/abs/2308.10620)]\n\n5. \"Towards an Understanding of Large Language Models in Software Engineering Tasks\" [2023-08] [[paper](https://arxiv.org/abs/2308.11396)]\n\n6. \"Pitfalls in Language Models for Code Intelligence: A Taxonomy and Survey\" [2023-10] [[paper](https://arxiv.org/abs/2310.17903)]\n\n7. \"A Survey on Large Language Models for Software Engineering\" [2023-12] [[paper](https://arxiv.org/abs/2312.15223)]\n\n8. \"Deep Learning for Code Intelligence: Survey, Benchmark and Toolkit\" [2023-12] [[paper](https://arxiv.org/abs/2401.00288)]\n\n9. \"A Survey of Neural Code Intelligence: Paradigms, Advances and Beyond\" [2024-03] [[paper](https://arxiv.org/abs/2403.14734)]\n\n10. \"Tasks People Prompt: A Taxonomy of LLM Downstream Tasks in Software Verification and Falsification Approaches\" [2024-04] [[paper](https://arxiv.org/abs/2404.09384)]\n\n11. \"Automatic Programming: Large Language Models and Beyond\" [2024-05] [[paper](https://arxiv.org/abs/2405.02213)]\n\n12. \"Software Engineering and Foundation Models: Insights from Industry Blogs Using a Jury of Foundation Models\" [2024-10] [[paper](https://arxiv.org/abs/2410.09012)]\n\n13. \"Deep Learning-based Software Engineering: Progress, Challenges, and Opportunities\" [2024-10] [[paper](https://arxiv.org/abs/2410.13110)]\n\n14. \"Large Language Models (LLMs) for Source Code Analysis: applications, models and datasets\" [2025-03] [[paper](https://arxiv.org/abs/2503.17502)]\n\n## 2. Models\n\n\u003cp align='center'\u003e\n\u003cimg src='imgs/overview.png' style='width: 80%; '\u003e\n\u003c/p\u003e\n\n### 2.1 Base LLMs and Pretraining Strategies\n\nThese LLMs are not specifically trained for code, but have demonstrated varying coding capability.\n\n1. 
**LaMDA**: \"LaMDA: Language Models for Dialog Applications\" [2022-01] [[paper](https://arxiv.org/abs/2201.08239)]\n\n2. **PaLM**: \"PaLM: Scaling Language Modeling with Pathways\" [2022-04] [JMLR] [[paper](https://arxiv.org/abs/2204.02311)]\n\n3. **GPT-NeoX**: \"GPT-NeoX-20B: An Open-Source Autoregressive Language Model\" [2022-04] [ACL 2022 Workshop on Challenges \u0026 Perspectives in Creating LLMs] [[paper](https://arxiv.org/abs/2204.06745)] [[repo](https://github.com/EleutherAI/gpt-neox)]\n\n4. **BLOOM**: \"BLOOM: A 176B-Parameter Open-Access Multilingual Language Model\" [2022-11] [[paper](https://arxiv.org/abs/2211.05100)] [[model](https://huggingface.co/models?search=bigscience/bloom)]\n\n5. **LLaMA**: \"LLaMA: Open and Efficient Foundation Language Models\" [2023-02] [[paper](https://arxiv.org/abs/2302.13971)]\n\n6. **GPT-4**: \"GPT-4 Technical Report\" [2023-03] [[paper](https://arxiv.org/abs/2303.08774)]\n\n7. **LLaMA 2**: \"Llama 2: Open Foundation and Fine-Tuned Chat Models\" [2023-07] [[paper](https://arxiv.org/abs/2307.09288)] [[repo](https://github.com/facebookresearch/llama)]\n\n8. **Phi-1.5**: \"Textbooks Are All You Need II: phi-1.5 technical report\" [2023-09] [[paper](https://arxiv.org/abs/2309.05463)] [[model](https://huggingface.co/microsoft/phi-1_5)]\n\n9. **Baichuan 2**: \"Baichuan 2: Open Large-scale Language Models\" [2023-09] [[paper](https://arxiv.org/abs/2309.10305)] [[repo](https://github.com/baichuan-inc/Baichuan2)]\n\n10. **Qwen**: \"Qwen Technical Report\" [2023-09] [[paper](https://arxiv.org/abs/2309.16609)] [[repo](https://github.com/QwenLM/Qwen)]\n\n11. **Mistral**: \"Mistral 7B\" [2023-10] [[paper](https://arxiv.org/abs/2310.06825)] [[repo](https://github.com/mistralai/mistral-src)]\n\n12. **Gemini**: \"Gemini: A Family of Highly Capable Multimodal Models\" [2023-12] [[paper](https://arxiv.org/abs/2312.11805)]\n\n13. 
**Phi-2**: \"Phi-2: The surprising power of small language models\" [2023-12] [[blog](https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/)]\n\n14. **YAYI2**: \"YAYI 2: Multilingual Open-Source Large Language Models\" [2023-12] [[paper](https://arxiv.org/abs/2312.14862)] [[repo](https://github.com/wenge-research/YAYI2)]\n\n15. **DeepSeek**: \"DeepSeek LLM: Scaling Open-Source Language Models with Longtermism\" [2024-01] [[paper](https://arxiv.org/abs/2401.02954)] [[repo](https://github.com/deepseek-ai/DeepSeek-LLM)]\n\n16. **Mixtral**: \"Mixtral of Experts\" [2024-01] [[paper](https://arxiv.org/abs/2401.04088)] [[blog](https://mistral.ai/news/mixtral-of-experts/)]\n\n17. **DeepSeekMoE**: \"DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models\" [2024-01] [[paper](https://arxiv.org/abs/2401.12246)] [[repo](https://github.com/deepseek-ai/DeepSeek-MoE)]\n\n18. **Orion**: \"Orion-14B: Open-source Multilingual Large Language Models\" [2024-01] [[paper](https://arxiv.org/abs/2401.06066)] [[repo](https://github.com/OrionStarAI/Orion)]\n\n19. **OLMo**: \"OLMo: Accelerating the Science of Language Models\" [2024-02] [[paper](https://arxiv.org/abs/2402.00838)] [[repo](https://github.com/allenai/OLMo)]\n\n20. **Gemma**: \"Gemma: Open Models Based on Gemini Research and Technology\" [2024-02] [[paper](https://storage.googleapis.com/deepmind-media/gemma/gemma-report.pdf)] [[blog](https://blog.google/technology/developers/gemma-open-models/)]\n\n21. **Claude 3**: \"The Claude 3 Model Family: Opus, Sonnet, Haiku\" [2024-03] [[paper](https://www-cdn.anthropic.com/de8ba9b01c9ab7cbabf5c33b80b7bbc618857627/Model_Card_Claude_3.pdf)] [[blog](https://www.anthropic.com/news/claude-3-family)]\n\n22. **Yi**: \"Yi: Open Foundation Models by 01.AI\" [2024-03] [[paper](https://arxiv.org/abs/2403.04652)] [[repo](https://github.com/01-ai/Yi)]\n\n23. 
**Poro**: \"Poro 34B and the Blessing of Multilinguality\" [2024-04] [[paper](https://arxiv.org/abs/2404.01856)] [[model](https://huggingface.co/LumiOpen/Poro-34B)]\n\n24. **JetMoE**: \"JetMoE: Reaching Llama2 Performance with 0.1M Dollars\" [2024-04] [[paper](https://arxiv.org/abs/2404.07413)] [[repo](https://github.com/myshell-ai/JetMoE)]\n\n25. **LLaMA 3**: \"The Llama 3 Herd of Models\" [2024-04] [[blog](https://ai.meta.com/blog/meta-llama-3/)] [[repo](https://github.com/meta-llama/llama3)] [[paper](https://arxiv.org/abs/2407.21783)]\n\n26. **Reka Core**: \"Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models\" [2024-04] [[paper](https://arxiv.org/abs/2404.12387)]\n\n27. **Phi-3**: \"Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone\" [2024-04] [[paper](https://arxiv.org/abs/2404.14219)]\n\n28. **OpenELM**: \"OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework\" [2024-04] [[paper](https://arxiv.org/abs/2404.14619)] [[repo](https://github.com/apple/corenet/tree/main/projects/openelm)]\n\n29. **Tele-FLM**: \"Tele-FLM Technical Report\" [2024-04] [[paper](https://arxiv.org/abs/2404.16645)] [[model](https://huggingface.co/CofeAI/Tele-FLM)]\n\n30. **DeepSeek-V2**: \"DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model\" [2024-05] [[paper](https://arxiv.org/abs/2405.04434)] [[repo](https://github.com/deepseek-ai/DeepSeek-V2)]\n\n31. **GECKO**: \"GECKO: Generative Language Model for English, Code and Korean\" [2024-05] [[paper](https://arxiv.org/abs/2405.15640)] [[model](https://huggingface.co/kifai/GECKO-7B)]\n\n32. **MAP-Neo**: \"MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series\" [2024-05] [[paper](https://arxiv.org/abs/2405.19327)] [[repo](https://github.com/multimodal-art-projection/MAP-NEO)]\n\n33. 
**Zyda**: \"Zyda: A 1.3T Dataset for Open Language Modeling\" [2024-06] [[paper](https://arxiv.org/abs/2406.01981)]\n\n34. **Skywork-MoE**: \"Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models\" [2024-06] [[paper](https://arxiv.org/abs/2406.06563)]\n\n35. **Xmodel-LM**: \"Xmodel-LM Technical Report\" [2024-06] [[paper](https://arxiv.org/abs/2406.02856)]\n\n36. **GEB**: \"GEB-1.3B: Open Lightweight Large Language Model\" [2024-06] [[paper](https://arxiv.org/abs/2406.09900)]\n\n37. **HARE**: \"HARE: HumAn pRiors, a key to small language model Efficiency\" [2024-06] [[paper](https://arxiv.org/abs/2406.11410)]\n\n38. **DCLM**: \"DataComp-LM: In search of the next generation of training sets for language models\" [2024-06] [[paper](https://arxiv.org/abs/2406.11794)]\n\n39. **Nemotron-4**: \"Nemotron-4 340B Technical Report\" [2024-06] [[paper](https://arxiv.org/abs/2406.11704)]\n\n40. **ChatGLM**: \"ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools\" [2024-06] [[paper](https://arxiv.org/abs/2406.12793)]\n\n41. **FineWeb**: \"The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale\" [2024-06] [[paper](https://arxiv.org/abs/2406.17557)]\n\n42. **YuLan**: \"YuLan: An Open-source Large Language Model\" [2024-06] [[paper](https://arxiv.org/abs/2406.19853)]\n\n43. **Gemma 2**: \"Gemma 2: Improving Open Language Models at a Practical Size\" [2024-06] [[paper](https://storage.googleapis.com/deepmind-media/gemma/gemma-2-report.pdf)]\n\n44. **H2O-Danube3**: \"H2O-Danube3 Technical Report\" [2024-07] [[paper](https://arxiv.org/abs/2407.09276)]\n\n45. **Qwen2**: \"Qwen2 Technical Report\" [2024-07] [[paper](https://arxiv.org/abs/2407.10671)]\n\n46. **ALLaM**: \"ALLaM: Large Language Models for Arabic and English\" [2024-07] [[paper](https://arxiv.org/abs/2407.15390)]\n\n47. 
**SeaLLMs 3**: \"SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages\" [2024-07] [[paper](https://arxiv.org/abs/2407.19672)]\n\n48. **AFM**: \"Apple Intelligence Foundation Language Models\" [2024-07] [[paper](https://arxiv.org/abs/2407.21075)]\n\n49. \"To Code, or Not To Code? Exploring Impact of Code in Pre-training\" [2024-08] [[paper](https://arxiv.org/abs/2408.10914)]\n\n50. **OLMoE**: \"OLMoE: Open Mixture-of-Experts Language Models\" [2024-09] [[paper](https://arxiv.org/abs/2409.02060)]\n\n51. \"How Does Code Pretraining Affect Language Model Task Performance?\" [2024-09] [[paper](https://arxiv.org/abs/2409.04556)]\n\n52. **EuroLLM**: \"EuroLLM: Multilingual Language Models for Europe\" [2024-09] [[paper](https://arxiv.org/abs/2409.16235)]\n\n53. \"Which Programming Language and What Features at Pre-training Stage Affect Downstream Logical Inference Performance?\" [2024-10] [EMNLP 2024] [[paper](https://arxiv.org/abs/2410.06735)]\n\n54. **GPT-4o**: \"GPT-4o System Card\" [2024-10] [[paper](https://arxiv.org/abs/2410.21276)]\n\n55. **Hunyuan-Large**: \"Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent\" [2024-11] [[paper](https://arxiv.org/abs/2411.02265)]\n\n56. **Crystal**: \"Crystal: Illuminating LLM Abilities on Language and Code\" [2024-11] [[paper](https://arxiv.org/abs/2411.04156)]\n\n57. **Zyda-2**: \"Zyda-2: a 5 Trillion Token High-Quality Dataset\" [2024-11] [[paper](https://arxiv.org/abs/2411.06068)]\n\n58. **Xmodel-1.5**: \"Xmodel-1.5: An 1B-scale Multilingual LLM\" [2024-11] [[paper](https://arxiv.org/abs/2411.10083)]\n\n59. **Yi-Lightning**: \"Yi-Lightning Technical Report\" [2024-12] [[paper](https://arxiv.org/abs/2412.01253)]\n\n60. \"RedStone: Curating General, Code, Math, and QA Data for Large Language Models\" [2024-12] [[paper](https://arxiv.org/abs/2412.03398)]\n\n61. 
**EXAONE 3.5**: \"EXAONE 3.5: Series of Large Language Models for Real-world Use Cases\" [2024-12] [[paper](https://arxiv.org/abs/2412.04862)]\n\n62. \"The Rise and Down of Babel Tower: Investigating the Evolution Process of Multilingual Code Large Language Model\" [2024-12] [[paper](https://arxiv.org/abs/2412.07298)]\n\n63. **Phi-4**: \"Phi-4 Technical Report\" [2024-12] [[paper](https://arxiv.org/abs/2412.08905)]\n\n64. **Typhoon 2**: \"Typhoon 2: A Family of Open Text and Multimodal Thai Large Language Models\" [2024-12] [[paper](https://arxiv.org/abs/2412.13702)]\n\n65. **Qwen2.5**: \"Qwen2.5 Technical Report\" [2024-12] [[paper](https://arxiv.org/abs/2412.15115)]\n\n66. **YuLan-Mini**: \"YuLan-Mini: An Open Data-efficient Language Model\" [2024-12] [[paper](https://arxiv.org/abs/2412.17743)]\n\n67. **DeepSeek-V3**: \"DeepSeek-V3 Technical Report\" [2024-12] [[paper](https://arxiv.org/abs/2412.19437)]\n\n68. **OLMo 2**: \"2 OLMo 2 Furious\" [2024-12] [[paper](https://arxiv.org/abs/2501.00656)]\n\n69. **FinerWeb**: \"FinerWeb-10BT: Refining Web Data with LLM-Based Line-Level Filtering\" [2025-01] [[paper](https://arxiv.org/abs/2501.07314)]\n\n70. **MiniMax-01**: \"MiniMax-01: Scaling Foundation Models with Lightning Attention\" [2025-01] [[paper](https://arxiv.org/abs/2501.08313)]\n\n71. **SmolLM2**: \"SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model\" [2025-02] [[paper](https://arxiv.org/abs/2502.02737)]\n\n72. **Salamandra**: \"Salamandra Technical Report\" [2025-02] [[paper](https://arxiv.org/abs/2502.08489)]\n\n73. **Kanana**: \"Kanana: Compute-efficient Bilingual Language Models\" [2025-02] [[paper](https://arxiv.org/abs/2502.18934)]\n\n74. **Phi-4-Mini**: \"Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs\" [2025-03] [[paper](https://arxiv.org/abs/2503.01743)]\n\n75. 
**Ling**: \"Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs\" [2025-03] [[paper](https://arxiv.org/abs/2503.05139)]\n\n### 2.2 Existing LLM Adapted to Code\n\nThese models are general-purpose LLMs further pretrained on code-related data.\n\n- **Codex** (GPT-3): \"Evaluating Large Language Models Trained on Code\" [2021-07] [[paper](https://arxiv.org/abs/2107.03374)]\n\n- **PaLM Coder** (PaLM): \"PaLM: Scaling Language Modeling with Pathways\" [2022-04] [JMLR] [[paper](https://arxiv.org/abs/2204.02311)]\n\n- **Minerva** (PaLM): \"Solving Quantitative Reasoning Problems with Language Models\" [2022-06] [[paper](https://arxiv.org/abs/2206.14858)]\n\n- **PaLM 2 \\*** (PaLM 2): \"PaLM 2 Technical Report\" [2023-05] [[paper](https://arxiv.org/abs/2305.10403)]\n\n- **Code LLaMA** (LLaMA 2): \"Code Llama: Open Foundation Models for Code\" [2023-08] [[paper](https://arxiv.org/abs/2308.12950)] [[repo](https://github.com/facebookresearch/codellama)]\n\n- **Lemur** (LLaMA 2): \"Lemur: Harmonizing Natural Language and Code for Language Agents\" [2023-10] [ICLR 2024 Spotlight] [[paper](https://arxiv.org/abs/2310.06830)]\n\n- **BTX** (LLaMA 2): \"Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM\" [2024-03] [[paper](https://arxiv.org/abs/2403.07816)]\n\n- **HiRoPE**: \"HiRoPE: Length Extrapolation for Code Models Using Hierarchical Position\" [2024-03] [ACL 2024] [[paper](https://arxiv.org/abs/2403.19115)]\n\n- \"Mastering Text, Code and Math Simultaneously via Fusing Highly Specialized Language Models\" [2024-03] [[paper](https://arxiv.org/abs/2403.08281)]\n\n- **CodeGemma**: \"CodeGemma: Open Code Models Based on Gemma\" [2024-04] [[paper](https://storage.googleapis.com/deepmind-media/gemma/codegemma_report.pdf)] [[model](https://huggingface.co/models?search=google/codegemma)]\n\n- **DeepSeek-Coder-V2**: \"DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence\" [2024-06] 
[[paper](https://arxiv.org/abs/2406.11931)]\n\n- \"Promise and Peril of Collaborative Code Generation Models: Balancing Effectiveness and Memorization\" [2024-09] [[paper](https://arxiv.org/abs/2409.12020)]\n\n- **Qwen2.5-Coder**: \"Qwen2.5-Coder Technical Report\" [2024-09] [[paper](https://arxiv.org/abs/2409.12186)]\n\n- **Lingma SWE-GPT**: \"Lingma SWE-GPT: An Open Development-Process-Centric Language Model for Automated Software Improvement\" [2024-11] [[paper](https://arxiv.org/abs/2411.00622)]\n\n- **Ling-Coder-Lite**: \"Every Sample Matters: Leveraging Mixture-of-Experts and High-Quality Data for Efficient and Accurate Code LLM\" [2025-03] [[paper](https://arxiv.org/abs/2503.17793)]\n\n### 2.3 General Pretraining on Code\n\nThese models are Transformer encoders, decoders, and encoder-decoders pretrained from scratch using existing objectives for general language modeling.\n\n\u003cp align='center'\u003e\n\u003cimg src='imgs/model_detail.png' style='width: 90%; '\u003e\n\u003c/p\u003e\n\n#### Encoder\n\n1. **CuBERT** (MLM + NSP): \"Learning and Evaluating Contextual Embedding of Source Code\" [2019-12] [ICML 2020] [[paper](https://arxiv.org/abs/2001.00059)] [[repo](https://github.com/google-research/google-research/tree/master/cubert)]\n\n2. **CodeBERT** (MLM + RTD): \"CodeBERT: A Pre-Trained Model for Programming and Natural Languages\" [2020-02] [EMNLP 2020 findings] [[paper](https://arxiv.org/abs/2002.08155)] [[repo](https://github.com/microsoft/CodeBERT)]\n\n3. **GraphCodeBERT** (MLM + DFG Edge Prediction + DFG Node Alignment): \"GraphCodeBERT: Pre-training Code Representations with Data Flow\" [2020-09] [ICLR 2021] [[paper](https://arxiv.org/abs/2009.08366)] [[repo](https://github.com/microsoft/CodeBERT)]\n\n4. **SynCoBERT** (MLM + Identifier Prediction + AST Edge Prediction + Contrastive Learning): \"SynCoBERT: Syntax-Guided Multi-Modal Contrastive Pre-Training for Code Representation\" [2021-08] [[paper](https://arxiv.org/abs/2108.04556)]\n\n5. 
**DISCO** (MLM + Node Type MLM + Contrastive Learning): \"Towards Learning (Dis)-Similarity of Source Code from Program Contrasts\" [2021-10] [ACL 2022] [[paper](https://arxiv.org/abs/2110.03868)]\n\n6. **Code-MVP** (MLM + Type Inference + Contrastive Learning): \"CODE-MVP: Learning to Represent Source Code from Multiple Views with Contrastive Pre-Training\" [2022-05] [NAACL 2022 Technical Track] [[paper](https://arxiv.org/abs/2205.02029)]\n\n7. **CodeSage** (MLM + Deobfuscation + Contrastive Learning): \"Code Representation Learning At Scale\" [2024-02] [ICLR 2024] [[paper](https://arxiv.org/abs/2402.01935)]\n\n8. **CoLSBERT** (MLM): \"Scaling Laws Behind Code Understanding Model\" [2024-02] [[paper](https://arxiv.org/abs/2402.12813)]\n\n#### Decoder\n\n1. **GPT-C** (CLM): \"IntelliCode Compose: Code Generation Using Transformer\" [2020-05] [ESEC/FSE 2020] [[paper](https://arxiv.org/abs/2005.08025)]\n\n2. **CodeGPT** (CLM): \"CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation\" [2021-02] [NeurIPS Datasets and Benchmarks 2021] [[paper](https://arxiv.org/abs/2102.04664)] [[repo](https://github.com/microsoft/CodeXGLUE)]\n\n3. **CodeParrot** (CLM) [2021-12] [[blog](https://huggingface.co/blog/codeparrot)]\n\n4. **PolyCoder** (CLM): \"A Systematic Evaluation of Large Language Models of Code\" [2022-02] [DL4C@ICLR 2022] [[paper](https://arxiv.org/abs/2202.13169)] [[repo](https://github.com/VHellendoorn/Code-LMs)]\n\n5. **CodeGen** (CLM): \"CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis\" [2022-03] [ICLR 2023] [[paper](https://arxiv.org/abs/2203.13474)] [[repo](https://github.com/salesforce/CodeGen)]\n\n6. **InCoder** (Causal Masking): \"InCoder: A Generative Model for Code Infilling and Synthesis\" [2022-04] [ICLR 2023] [[paper](https://arxiv.org/abs/2204.05999)] [[repo](https://github.com/dpfried/incoder)]\n\n7. 
**PyCodeGPT** (CLM): \"CERT: Continual Pre-Training on Sketches for Library-Oriented Code Generation\" [2022-06] [IJCAI-ECAI 2022] [[paper](https://arxiv.org/abs/2206.06888)] [[repo](https://github.com/microsoft/PyCodeGPT)]\n\n8. **PanGu-Coder** (CLM): \"PanGu-Coder: Program Synthesis with Function-Level Language Modeling\" [2022-07] [[paper](https://arxiv.org/abs/2207.11280)]\n\n9. **SantaCoder** (FIM): \"SantaCoder: don't reach for the stars!\" [2023-01] [[paper](https://arxiv.org/abs/2301.03988)] [[model](https://huggingface.co/bigcode/santacoder)]\n\n10. **CodeGeeX** (CLM): \"CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Evaluations on HumanEval-X\" [2023-03] [[paper](https://arxiv.org/abs/2303.17568)] [[repo](https://github.com/THUDM/CodeGeeX)]\n\n11. **StarCoder** (FIM): \"StarCoder: may the source be with you!\" [2023-05] [[paper](https://arxiv.org/abs/2305.06161)] [[model](https://huggingface.co/bigcode/starcoder)]\n\n12. **Phi-1** (CLM): \"Textbooks Are All You Need\" [2023-06] [[paper](https://arxiv.org/abs/2306.11644)] [[model](https://huggingface.co/microsoft/phi-1)]\n\n13. **CodeFuse** (CLM): \"CodeFuse-13B: A Pretrained Multi-lingual Code Large Language Model\" [2023-10] [[paper](https://arxiv.org/abs/2310.06266)] [[model](https://huggingface.co/codefuse-ai/CodeFuse-13B)]\n\n14. **DeepSeek Coder** (CLM+FIM): \"DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence\" [2024-01] [[paper](https://arxiv.org/abs/2401.14196)] [[repo](https://github.com/deepseek-ai/DeepSeek-Coder)]\n\n15. **StarCoder2** (CLM+FIM): \"StarCoder 2 and The Stack v2: The Next Generation\" [2024-02] [[paper](https://arxiv.org/abs/2402.19173)] [[repo](https://github.com/bigcode-project/starcoder2)]\n\n16. **CodeShell** (CLM+FIM): \"CodeShell Technical Report\" [2024-03] [[paper](https://arxiv.org/abs/2403.15747)] [[repo](https://github.com/WisdomShell/codeshell)]\n\n17. 
**CodeQwen1.5** [2024-04] [[blog](https://qwenlm.github.io/blog/codeqwen1.5/)]\n\n18. **Granite**: \"Granite Code Models: A Family of Open Foundation Models for Code Intelligence\" [2024-05] [[paper](https://arxiv.org/abs/2405.04324)] \"Scaling Granite Code Models to 128K Context\" [2024-07] [[paper](https://arxiv.org/abs/2407.13739)]\n\n19. **NT-Java**: \"Narrow Transformer: Starcoder-Based Java-LM For Desktop\" [2024-07] [[paper](https://arxiv.org/abs/2407.03941)]\n\n20. **Arctic-SnowCoder**: \"Arctic-SnowCoder: Demystifying High-Quality Data in Code Pretraining\" [2024-09] [[paper](https://arxiv.org/abs/2409.02326)]\n\n21. **aiXcoder**: \"aiXcoder-7B: A Lightweight and Effective Large Language Model for Code Completion\" [2024-10] [[paper](https://arxiv.org/abs/2410.13187)]\n\n22. **OpenCoder**: \"OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models\" [2024-11] [[paper](https://arxiv.org/abs/2411.04905)]\n\n#### Encoder-Decoder\n\n1. **PyMT5** (Span Corruption): \"PyMT5: multi-mode translation of natural language and Python code with transformers\" [2020-10] [EMNLP 2020] [[paper](https://arxiv.org/abs/2010.03150)]\n\n2. **Mastropaolo et al.** (Span Corruption): \"Studying the Usage of Text-To-Text Transfer Transformer to Support Code-Related Tasks\" [2021-02] [ICSE 2021] [[paper](https://arxiv.org/abs/2102.02017)] [[repo](https://github.com/antonio-mastropaolo/TransferLearning4Code)]\n\n3. **DOBF** (MLM + Deobfuscation): \"DOBF: A Deobfuscation Pre-Training Objective for Programming Languages\" [2021-02] [NeurIPS 2021] [[paper](https://arxiv.org/abs/2102.07492)] [[repo](https://github.com/facebookresearch/CodeGen/blob/main/docs/dobf.md)]\n\n4. **PLBART** (DAE): \"Unified Pre-training for Program Understanding and Generation\" [2021-03] [NAACL 2021] [[paper](https://arxiv.org/abs/2103.06333)] [[repo](https://github.com/wasiahmad/PLBART)]\n\n5. 
**CodeT5** (Span Corruption + Identifier Tagging + Masked Identifier Prediction + Text2Code + Code2Text): \"CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation\" [2021-09] [EMNLP 2021] [[paper](https://arxiv.org/abs/2109.00859)] [[repo](https://github.com/salesforce/CodeT5)]\n\n6. **SPT-Code** (Span Corruption + NSP + Method Name Prediction): \"SPT-Code: Sequence-to-Sequence Pre-Training for Learning Source Code Representations\" [2022-01] [ICSE 2022 Technical Track] [[paper](https://arxiv.org/abs/2201.01549)]\n\n7. **AlphaCode** (MLM + CLM): \"Competition-Level Code Generation with AlphaCode\" [2022-02] [Science] [[paper](https://arxiv.org/abs/2203.07814)] [[blog](https://deepmind.google/discover/blog/competitive-programming-with-alphacode/)]\n\n8. **NatGen** (Code Naturalization): \"NatGen: Generative pre-training by \"Naturalizing\" source code\" [2022-06] [ESEC/FSE 2022] [[paper](https://arxiv.org/abs/2206.07585)] [[repo](https://github.com/saikat107/NatGen)]\n\n9. **ERNIE-Code** (Span Corruption + Pivot-based Translation LM): \"ERNIE-Code: Beyond English-Centric Cross-lingual Pretraining for Programming Languages\" [2022-12] [ACL 2023 Findings] [[paper](https://aclanthology.org/2023.findings-acl.676.pdf)] [[repo](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/model_zoo/ernie-code)]\n\n10. **CodeT5+** (Span Corruption + CLM + Text-Code Contrastive Learning + Text-Code Translation): \"CodeT5+: Open Code Large Language Models for Code Understanding and Generation\" [2023-05] [EMNLP 2023] [[paper](https://arxiv.org/abs/2305.07922)] [[repo](https://github.com/salesforce/CodeT5)]\n\n11. **AST-T5** (Span Corruption): \"AST-T5: Structure-Aware Pretraining for Code Generation and Understanding\" [2024-01] [ICML 2024] [[paper](https://arxiv.org/abs/2401.03003)]\n\n12. **DivoT5**: \"Directional Diffusion-Style Code Editing Pre-training\" [2025-01] [[paper](https://arxiv.org/abs/2501.12079)]\n\n#### UniLM\n\n1. 
**CugLM** (MLM + NSP + CLM): \"Multi-task Learning based Pre-trained Language Model for Code Completion\" [2020-12] [ASE 2020] [[paper](https://arxiv.org/abs/2012.14631)]\n\n2. **UniXcoder** (MLM + NSP + CLM + Span Corruption + Contrastive Learning + Code2Text): \"UniXcoder: Unified Cross-Modal Pre-training for Code Representation\" [2022-03] [ACL 2022] [[paper](https://arxiv.org/abs/2203.03850)] [[repo](https://github.com/microsoft/CodeBERT)]\n\n### 2.4 (Instruction) Fine-Tuning on Code\n\nThese models apply Instruction Fine-Tuning techniques to enhance the capabilities of Code LLMs.\n\n1. **WizardCoder** (StarCoder + Evol-Instruct): \"WizardCoder: Empowering Code Large Language Models with Evol-Instruct\" [2023-06] [ICLR 2024] [[paper](https://arxiv.org/abs/2306.08568)] [[repo](https://github.com/nlpxucan/WizardLM)]\n\n2. **PanGu-Coder 2** (StarCoder + Evol-Instruct + RRTF): \"PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback\" [2023-07] [[paper](https://arxiv.org/abs/2307.14936)]\n\n3. **OctoCoder** (StarCoder) / **OctoGeeX** (CodeGeeX2): \"OctoPack: Instruction Tuning Code Large Language Models\" [2023-08] [ICLR 2024 Spotlight] [[paper](https://arxiv.org/abs/2308.07124)] [[repo](https://github.com/bigcode-project/octopack)]\n\n4. \"At Which Training Stage Does Code Data Help LLMs Reasoning\" [2023-09] [ICLR 2024 Spotlight] [[paper](https://arxiv.org/abs/2309.16298)]\n\n5. **InstructCoder**: \"InstructCoder: Instruction Tuning Large Language Models for Code Editing\" [2023-10] [[paper](https://arxiv.org/abs/2310.20329)] [[repo](https://github.com/qishenghu/CodeInstruct)]\n\n6. **MFTCoder**: \"MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning\" [2023-11] [KDD 2024] [[paper](https://arxiv.org/abs/2311.02303)] [[repo](https://github.com/codefuse-ai/MFTCoder)]\n\n7. \"LLM-Assisted Code Cleaning For Training Accurate Code Generators\" [2023-11] [ICLR 2024] [[paper](https://arxiv.org/abs/2311.14904)]\n\n8. 
**Magicoder**: \"Magicoder: Empowering Code Generation with OSS-Instruct\" [2023-12] [ICML 2024] [[paper](https://arxiv.org/abs/2312.02120)]\n\n9. **WaveCoder**: \"WaveCoder: Widespread And Versatile Enhancement For Code Large Language Models By Instruction Tuning\" [2023-12] [ACL 2024] [[paper](https://arxiv.org/abs/2312.14187)]\n\n10. **Astraios**: \"Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models\" [2024-01] [[paper](https://arxiv.org/abs/2401.00788)]\n\n11. **DolphCoder**: \"DolphCoder: Echo-Locating Code Large Language Models with Diverse and Multi-Objective Instruction Tuning\" [2024-02] [ACL 2024] [[paper](https://arxiv.org/abs/2402.09136)]\n\n12. **SafeCoder**: \"Instruction Tuning for Secure Code Generation\" [2024-02] [ICML 2024] [[paper](https://arxiv.org/abs/2402.09497)]\n\n13. \"Code Needs Comments: Enhancing Code LLMs with Comment Augmentation\" [2024-02] [ACL 2024 Findings] [[paper](https://arxiv.org/abs/2402.13013)]\n\n14. **CCT**: \"Code Comparison Tuning for Code Large Language Models\" [2024-03] [[paper](https://arxiv.org/abs/2403.19121)]\n\n15. **SAT**: \"Structure-aware Fine-tuning for Code Pre-trained Models\" [2024-04] [[paper](https://arxiv.org/abs/2404.07471)]\n\n16. **CodeFort**: \"CodeFort: Robust Training for Code Generation Models\" [2024-04] [EMNLP 2024 Findings] [[paper](https://arxiv.org/abs/2405.01567)]\n\n17. **XFT**: \"XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts\" [2024-04] [ACL 2024] [[paper](https://arxiv.org/abs/2404.15247)] [[repo](https://github.com/ise-uiuc/xft)]\n\n18. **AIEV-Instruct**: \"AutoCoder: Enhancing Code Large Language Model with AIEV-Instruct\" [2024-05] [[paper](https://arxiv.org/abs/2405.14906)]\n\n19. **AlchemistCoder**: \"AlchemistCoder: Harmonizing and Eliciting Code Capability by Hindsight Tuning on Multi-source Data\" [2024-05] [NeurIPS 2024] [[paper](https://arxiv.org/abs/2405.19265)]\n\n20. 
\"From Symbolic Tasks to Code Generation: Diversification Yields Better Task Performers\" [2024-05] [[paper](https://arxiv.org/abs/2405.19787)]\n\n21. \"Unveiling the Impact of Coding Data Instruction Fine-Tuning on Large Language Models Reasoning\" [2024-05] [[paper](https://arxiv.org/abs/2405.20535)]\n\n22. **SemCoder**: \"SemCoder: Training Code Language Models with Comprehensive Semantics Reasoning\" [2024-06] [NeurIPS 2024] [[paper](https://arxiv.org/abs/2406.01006)]\n\n23. **PLUM**: \"PLUM: Preference Learning Plus Test Cases Yields Better Code Language Models\" [2024-06] [[paper](https://arxiv.org/abs/2406.06887)]\n\n24. **mCoder**: \"McEval: Massively Multilingual Code Evaluation\" [2024-06] [[paper](https://arxiv.org/abs/2406.07436)]\n\n25. \"Unlock the Correlation between Supervised Fine-Tuning and Reinforcement Learning in Training Code Large Language Models\" [2024-06] [[paper](https://arxiv.org/abs/2406.10305)]\n\n26. **Code-Optimise**: \"Code-Optimise: Self-Generated Preference Data for Correctness and Efficiency\" [2024-06] [[paper](https://arxiv.org/abs/2406.12502)]\n\n27. **UniCoder**: \"UniCoder: Scaling Code Large Language Model via Universal Code\" [2024-06] [ACL 2024] [[paper](https://arxiv.org/abs/2406.16441)]\n\n28. \"Brevity is the soul of wit: Pruning long files for code generation\" [2024-06] [[paper](https://arxiv.org/abs/2407.00434)]\n\n29. \"Code Less, Align More: Efficient LLM Fine-tuning for Code Generation with Data Pruning\" [2024-07] [[paper](https://arxiv.org/abs/2407.05040)]\n\n30. **InverseCoder**: \"InverseCoder: Unleashing the Power of Instruction-Tuned Code LLMs with Inverse-Instruct\" [2024-07] [[paper](https://arxiv.org/abs/2407.05700)]\n\n31. \"Curriculum Learning for Small Code Language Models\" [2024-07] [[paper](https://arxiv.org/abs/2407.10194)]\n\n32. 
**Genetic-Instruct**: \"Genetic Instruct: Scaling up Synthetic Generation of Coding Instructions for Large Language Models\" [2024-07] [[paper](https://arxiv.org/abs/2407.21077)]\n\n33. **DataScope**: \"API-guided Dataset Synthesis to Finetune Large Code Models\" [2024-08] [[paper](https://arxiv.org/abs/2408.08343)]\n\n34. **XCoder**: \"How Do Your Code LLMs Perform? Empowering Code Instruction Tuning with High-Quality Data\" [2024-09] [EMNLP 2024] [[paper](https://arxiv.org/abs/2409.03810)]\n\n35. **GALLa**: \"GALLa: Graph Aligned Large Language Models for Improved Source Code Understanding\" [2024-09] [[paper](https://arxiv.org/abs/2409.04183)]\n\n36. **HexaCoder**: \"HexaCoder: Secure Code Generation via Oracle-Guided Synthetic Training Data\" [2024-09] [[paper](https://arxiv.org/abs/2409.06446)]\n\n37. **AMR-Evol**: \"AMR-Evol: Adaptive Modular Response Evolution Elicits Better Knowledge Distillation for Large Language Models in Code Generation\" [2024-10] [EMNLP 2024] [[paper](https://arxiv.org/abs/2410.00558)]\n\n38. **LintSeq**: \"Training Language Models on Synthetic Edit Sequences Improves Code Synthesis\" [2024-10] [[paper](https://arxiv.org/abs/2410.02749)]\n\n39. **CoBa**: \"CoBa: Convergence Balancer for Multitask Finetuning of Large Language Models\" [2024-10] [EMNLP 2024] [[paper](https://arxiv.org/abs/2410.06741)]\n\n40. **CursorCore**: \"CursorCore: Assist Programming through Aligning Anything\" [2024-10] [[paper](https://arxiv.org/abs/2410.07002)]\n\n41. **SelfCodeAlign**: \"SelfCodeAlign: Self-Alignment for Code Generation\" [2024-10] [NeurIPS 2024] [[paper](https://arxiv.org/abs/2410.24198)]\n\n42. \"Mastering the Craft of Data Synthesis for CodeLLMs\" [2024-10] [[paper](https://arxiv.org/abs/2411.00005)]\n\n43. **CodeLutra**: \"CodeLutra: Boosting LLM Code Generation via Preference-Guided Refinement\" [2024-11] [[paper](https://arxiv.org/abs/2411.05199)]\n\n44. 
**DSTC**: \"DSTC: Direct Preference Learning with Only Self-Generated Tests and Code to Improve Code LMs\" [2024-11] [[paper](https://arxiv.org/abs/2411.13611)]\n\n45. **WarriorCoder**: \"WarriorCoder: Learning from Expert Battles to Augment Code Large Language Models\" [2024-12] [[paper](https://arxiv.org/abs/2412.17395)]\n\n46. **EpiCoder**: \"EpiCoder: Encompassing Diversity and Complexity in Code Generation\" [2025-01] [[paper](https://arxiv.org/abs/2501.04694)]\n\n47. **Qwen2.5-xCoder**: \"Multi-Agent Collaboration for Multilingual Code Instruction Tuning\" [2025-02] [[paper](https://arxiv.org/abs/2502.07487)]\n\n48. **UnitCoder**: \"UnitCoder: Scalable Iterative Code Synthesis with Unit Test Guidance\" [2025-02] [[paper](https://arxiv.org/abs/2502.11460)]\n\n49. **GiFT**: \"GiFT: Gibbs Fine-Tuning for Code Generation\" [2025-02] [[paper](https://arxiv.org/abs/2502.11466)]\n\n50. **KodCode**: \"KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding\" [2025-03] [[paper](https://arxiv.org/abs/2503.02951)]\n\n51. **NextCoder**: \"Robust Learning of Diverse Code Edits\" [2025-03] [[paper](https://arxiv.org/abs/2503.03656)]\n\n52. **FAIT**: \"FAIT: Fault-Aware Fine-Tuning for Better Code Generation\" [2025-03] [[paper](https://arxiv.org/abs/2503.16913)]\n\n### 2.5 Reinforcement Learning on Code\n\n1. **CompCoder**: \"Compilable Neural Code Generation with Compiler Feedback\" [2022-03] [ACL 2022] [[paper](https://arxiv.org/abs/2203.05132)]\n\n2. **CodeRL**: \"CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning\" [2022-07] [NeurIPS 2022] [[paper](https://arxiv.org/abs/2207.01780)] [[repo](https://github.com/salesforce/CodeRL)]\n\n3. **PPOCoder**: \"Execution-based Code Generation using Deep Reinforcement Learning\" [2023-01] [TMLR 2023] [[paper](https://arxiv.org/abs/2301.13816)] [[repo](https://github.com/reddy-lab-code-research/PPOCoder)]\n\n4. 
**RLTF**: \"RLTF: Reinforcement Learning from Unit Test Feedback\" [2023-07] [[paper](https://arxiv.org/abs/2307.04349)] [[repo](https://github.com/Zyq-scut/RLTF)]\n\n5. **B-Coder**: \"B-Coder: Value-Based Deep Reinforcement Learning for Program Synthesis\" [2023-10] [ICLR 2024] [[paper](https://arxiv.org/abs/2310.03173)]\n\n6. **IRCoCo**: \"IRCoCo: Immediate Rewards-Guided Deep Reinforcement Learning for Code Completion\" [2024-01] [FSE 2024] [[paper](https://arxiv.org/abs/2401.16637)]\n\n7. **StepCoder**: \"StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback\" [2024-02] [ACL 2024] [[paper](https://arxiv.org/abs/2402.01391)]\n\n8. **RLPF \u0026 DPA**: \"Performance-Aligned LLMs for Generating Fast Code\" [2024-04] [[paper](https://arxiv.org/abs/2404.18864)]\n\n9. \"Measuring memorization in RLHF for code completion\" [2024-06] [[paper](https://arxiv.org/abs/2406.11715)]\n\n10. \"Applying RLAIF for Code Generation with API-usage in Lightweight LLMs\" [2024-06] [[paper](https://arxiv.org/abs/2406.20060)]\n\n11. **RLCoder**: \"RLCoder: Reinforcement Learning for Repository-Level Code Completion\" [2024-07] [[paper](https://arxiv.org/abs/2407.19487)]\n\n12. **PF-PPO**: \"Policy Filtration in RLHF to Fine-Tune LLM for Code Generation\" [2024-09] [[paper](https://arxiv.org/abs/2409.06957)]\n\n13. **Coffee-Gym**: \"Coffee-Gym: An Environment for Evaluating and Improving Natural Language Feedback on Erroneous Code\" [2024-09] [EMNLP 2024] [[paper](https://arxiv.org/abs/2409.19715)]\n\n14. **RLEF**: \"RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning\" [2024-10] [[paper](https://arxiv.org/abs/2410.02089)]\n\n15. **CodePMP**: \"CodePMP: Scalable Preference Model Pretraining for Large Language Model Reasoning\" [2024-10] [[paper](https://arxiv.org/abs/2410.02229)]\n\n16. 
**CodeDPO**: \"CodeDPO: Aligning Code Models with Self Generated and Verified Source Code\" [2024-10] [[paper](https://arxiv.org/abs/2410.05605)]\n\n17. \"Process Supervision-Guided Policy Optimization for Code Generation\" [2024-10] [[paper](https://arxiv.org/abs/2410.17621)]\n\n18. \"Aligning CodeLLMs with Direct Preference Optimization\" [2024-10] [[paper](https://arxiv.org/abs/2410.18585)]\n\n19. **FALCON**: \"FALCON: Feedback-driven Adaptive Long/short-term memory reinforced Coding Optimization system\" [2024-10] [[paper](https://arxiv.org/abs/2410.21349)]\n\n20. **PFPO**: \"Preference Optimization for Reasoning with Pseudo Feedback\" [2024-11] [[paper](https://arxiv.org/abs/2411.16345)]\n\n21. **o1-Coder**: \"o1-Coder: an o1 Replication for Coding\" [2024-11] [[paper](https://arxiv.org/abs/2412.00154)]\n\n22. **PRLCoder**: \"Process-Supervised Reinforcement Learning for Code Generation\" [2025-02] [[paper](https://arxiv.org/abs/2502.01715)]\n\n23. **AceCoder**: \"ACECODER: Acing Coder RL via Automated Test-Case Synthesis\" [2025-02] [[paper](https://arxiv.org/abs/2502.01718)]\n\n24. **Focused-DPO**: \"Focused-DPO: Enhancing Code Generation Through Focused Preference Optimization on Error-Prone Points\" [2025-02] [[paper](https://arxiv.org/abs/2502.11475)]\n\n25. **SWE-RL**: \"SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution\" [2025-02] [[paper](https://arxiv.org/abs/2502.18449)]\n\n## 3. When Coding Meets Reasoning\n\n### 3.1 Coding for Reasoning\n\n1. **PAL**: \"PAL: Program-aided Language Models\" [2022-11] [ICML 2023] [[paper](https://arxiv.org/abs/2211.10435)] [[repo](https://github.com/reasoning-machines/pal)]\n\n2. **PoT**: \"Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks\" [2022-11] [TMLR 2023] [[paper](https://arxiv.org/abs/2211.12588)] [[repo](https://github.com/wenhuchen/Program-of-Thoughts)]\n\n3. 
**PaD**: \"PaD: Program-aided Distillation Can Teach Small Models Reasoning Better than Chain-of-thought Fine-tuning\" [2023-05] [NAACL 2024] [[paper](https://arxiv.org/abs/2305.13888)]\n\n4. **CSV**: \"Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification\" [2023-08] [ICLR 2024] [[paper](https://arxiv.org/abs/2308.07921)]\n\n5. **MathCoder**: \"MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning\" [2023-10] [ICLR 2024] [[paper](https://arxiv.org/abs/2310.03731)]\n\n6. **CoC**: \"Chain of Code: Reasoning with a Language Model-Augmented Code Emulator\" [2023-12] [ICML 2024] [[paper](https://arxiv.org/abs/2312.04474)]\n\n7. **EHRAgent**: \"EHRAgent: Code Empowers Large Language Models for Few-shot Complex Tabular Reasoning on Electronic Health Records\" [2024-01] [EMNLP 2024] [[paper](https://arxiv.org/abs/2401.07128)]\n\n8. **MARIO**: \"MARIO: MAth Reasoning with code Interpreter Output -- A Reproducible Pipeline\" [2024-01] [ACL 2024 Findings] [[paper](https://arxiv.org/abs/2401.08190)]\n\n9. \"Code Prompting Elicits Conditional Reasoning Abilities in Text+Code LLMs\" [2024-01] [EMNLP 2024] [[paper](https://arxiv.org/abs/2401.10065)]\n\n10. **ReGAL**: \"ReGAL: Refactoring Programs to Discover Generalizable Abstractions\" [2024-01] [ICML 2024] [[paper](https://arxiv.org/abs/2401.16467)]\n\n11. **CodeAct**: \"Executable Code Actions Elicit Better LLM Agents\" [2024-02] [ICML 2024] [[paper](https://arxiv.org/abs/2402.01030)]\n\n12. **MultiPoT**: \"Python is Not Always the Best Choice: Embracing Multilingual Program of Thoughts\" [2024-02] [EMNLP 2024] [[paper](https://arxiv.org/abs/2402.10691)]\n\n13. **HProPro**: \"Exploring Hybrid Question Answering via Program-based Prompting\" [2024-02] [ACL 2024] [[paper](https://arxiv.org/abs/2402.10812)]\n\n14. **HTL**: \"How Do Humans Write Code? 
Large Models Do It the Same Way Too\" [2024-02] [EMNLP 2024] [[paper](https://arxiv.org/abs/2402.15729)]\n\n15. **xSTREET**: \"Eliciting Better Multilingual Structured Reasoning from LLMs through Code\" [2024-03] [ACL 2024] [[paper](https://arxiv.org/abs/2403.02567)]\n\n16. **FlowMind**: \"FlowMind: Automatic Workflow Generation with LLMs\" [2024-03] [[paper](https://arxiv.org/abs/2404.13050)]\n\n17. **Think-and-Execute**: \"Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models\" [2024-04] [EMNLP 2024] [[paper](https://arxiv.org/abs/2404.02575)]\n\n18. **CoRE**: \"CoRE: LLM as Interpreter for Natural Language Programming, Pseudo-Code Programming, and Flow Programming of AI Agents\" [2024-05] [[paper](https://arxiv.org/abs/2405.06907)]\n\n19. **MuMath-Code**: \"MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical Reasoning\" [2024-05] [EMNLP 2024] [[paper](https://arxiv.org/abs/2405.07551)]\n\n20. **COGEX**: \"Learning to Reason via Program Generation, Emulation, and Search\" [2024-05] [NeurIPS 2024] [[paper](https://arxiv.org/abs/2405.16337)]\n\n21. \"Arithmetic Reasoning with LLM: Prolog Generation \u0026 Permutation\" [2024-05] [[paper](https://arxiv.org/abs/2405.17893)]\n\n22. \"Can LLMs Reason in the Wild with Programs?\" [2024-06] [EMNLP 2024 Findings] [[paper](https://arxiv.org/abs/2406.13764)]\n\n23. **DotaMath**: \"DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning\" [2024-07] [[paper](https://arxiv.org/abs/2407.04078)]\n\n24. **CIBench**: \"CIBench: Evaluating Your LLMs with a Code Interpreter Plugin\" [2024-07] [[paper](https://arxiv.org/abs/2407.10499)]\n\n25. **PyBench**: \"PyBench: Evaluating LLM Agent on various real-world coding tasks\" [2024-07] [[paper](https://arxiv.org/abs/2407.16732)]\n\n26. 
**AdaCoder**: \"AdaCoder: Adaptive Prompt Compression for Programmatic Visual Question Answering\" [2024-07] [[paper](https://arxiv.org/abs/2407.19410)]\n\n27. **PyramidCoder**: \"Pyramid Coder: Hierarchical Code Generator for Compositional Visual Question Answering\" [2024-07] [[paper](https://arxiv.org/abs/2407.20563)]\n\n28. **CodeGraph**: \"CodeGraph: Enhancing Graph Reasoning of LLMs with Code\" [2024-08] [[paper](https://arxiv.org/abs/2408.13863)]\n\n29. **SIaM**: \"SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models\" [2024-08] [[paper](https://arxiv.org/abs/2408.15565)]\n\n30. **CodePlan**: \"CodePlan: Unlocking Reasoning Potential in Large Language Models by Scaling Code-form Planning\" [2024-09] [[paper](https://arxiv.org/abs/2409.12452)]\n\n31. **PoT**: \"Proof of Thought: Neurosymbolic Program Synthesis allows Robust and Interpretable Reasoning\" [2024-09] [[paper](https://arxiv.org/abs/2409.17270)]\n\n32. **MetaMath**: \"MetaMath: Integrating Natural Language and Code for Enhanced Mathematical Reasoning in Large Language Models\" [2024-09] [[paper](https://arxiv.org/abs/2409.19381)]\n\n33. \"BabelBench: An Omni Benchmark for Code-Driven Analysis of Multimodal and Multistructured Data\" [2024-10] [[paper](https://arxiv.org/abs/2410.00773)]\n\n34. **CodeSteer**: \"Steering Large Language Models between Code Execution and Textual Reasoning\" [2024-10] [[paper](https://arxiv.org/abs/2410.03524)]\n\n35. **MathCoder2**: \"MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code\" [2024-10] [[paper](https://arxiv.org/abs/2410.08196)]\n\n36. **LLMFP**: \"Planning Anything with Rigor: General-Purpose Zero-Shot Planning with LLM-based Formalized Programming\" [2024-10] [[paper](https://arxiv.org/abs/2410.12112)]\n\n37. **Prove**: \"Not All Votes Count! 
Programs as Verifiers Improve Self-Consistency of Language Models for Math Reasoning\" [2024-10] [[paper](https://arxiv.org/abs/2410.12608)]\n\n38. **PROVE**: \"Trust but Verify: Programmatic VLM Evaluation in the Wild\" [2024-10] [[paper](https://arxiv.org/abs/2410.13121)]\n\n39. **GeoCoder**: \"GeoCoder: Solving Geometry Problems by Generating Modular Code through Vision-Language Models\" [2024-10] [[paper](https://arxiv.org/abs/2410.13510)]\n\n40. **ReasonAgain**: \"ReasonAgain: Using Extractable Symbolic Programs to Evaluate Mathematical Reasoning\" [2024-10] [[paper](https://arxiv.org/abs/2410.19056)]\n\n41. **GFP**: \"Gap-Filling Prompting Enhances Code-Assisted Mathematical Reasoning\" [2024-11] [[paper](https://arxiv.org/abs/2411.05407)]\n\n42. **UTMath**: \"UTMath: Math Evaluation with Unit Test via Reasoning-to-Coding Thoughts\" [2024-11] [[paper](https://arxiv.org/abs/2411.07240)]\n\n43. **CoCoP**: \"CoCoP: Enhancing Text Classification with LLM through Code Completion Prompt\" [2024-11] [[paper](https://arxiv.org/abs/2411.08979)]\n\n44. **REPL-Plan**: \"Interactive and Expressive Code-Augmented Planning with Large Language Models\" [2024-11] [[paper](https://arxiv.org/abs/2411.13826)]\n\n45. **CrossPAL**: \"Empowering Multi-step Reasoning across Languages via Program-Aided Language Models\" [2024-11] [EMNLP 2024] [[paper](https://aclanthology.org/2024.emnlp-main.678/)]\n\n46. \"From Code to Play: Benchmarking Program Search for Games Using Large Language Models\" [2024-12] [[paper](https://arxiv.org/abs/2412.04057)]\n\n47. **CoinMath**: \"CoinMath: Harnessing the Power of Coding Instruction for Math LLMs\" [2024-12] [[paper](https://arxiv.org/abs/2412.11699)]\n\n48. **MultiLingPoT**: \"MultiLingPoT: Enhancing Mathematical Reasoning with Multilingual Program Fine-tuning\" [2024-12] [[paper](https://arxiv.org/abs/2412.12609)]\n\n49. 
**ProgCo**: \"ProgCo: Program Helps Self-Correction of Large Language Models\" [2025-01] [[paper](https://arxiv.org/abs/2501.01264)]\n\n50. **PIE**: \"Pseudocode-Injection Magic: Enabling LLMs to Tackle Graph Computational Tasks\" [2025-01] [[paper](https://arxiv.org/abs/2501.13731)]\n\n51. **AutoCode4Math**: \"Learning Autonomous Code Integration for Math Language Models\" [2025-02] [[paper](https://arxiv.org/abs/2502.00691)]\n\n52. **MIHTCCT**: \"MIH-TCCT: Mitigating Inconsistent Hallucinations in LLMs via Event-Driven Text-Code Cyclic Training\" [2025-02] [[paper](https://arxiv.org/abs/2502.08904)]\n\n53. **ToolCoder**: \"ToolCoder: A Systematic Code-Empowered Tool Learning Framework for Large Language Models\" [2025-02] [[paper](https://arxiv.org/abs/2502.11404)]\n\n54. **RM-PoT**: \"RM-PoT: Reformulating Mathematical Problems and Solving via Program of Thoughts\" [2025-02] [[paper](https://arxiv.org/abs/2502.12589)]\n\n55. **SBSC**: \"SBSC: Step-By-Step Coding for Improving Mathematical Olympiad Performance\" [2025-02] [[paper](https://arxiv.org/abs/2502.16666)]\n\n56. \"Towards Better Understanding of Program-of-Thought Reasoning in Cross-Lingual and Multilingual Environments\" [2025-02] [[paper](https://arxiv.org/abs/2502.17956)]\n\n57. \"Code to Think, Think to Code: A Survey on Code-Enhanced Reasoning and Reasoning-Driven Code Intelligence in LLMs\" [2025-02] [[paper](https://arxiv.org/abs/2502.19411)]\n\n58. 
\"The KoLMogorov Test: Compression by Code Generation\" [2025-03] [[paper](https://arxiv.org/abs/2503.13992)]\n\n### 3.2 Code Simulation\n\n- \"Code Simulation Challenges for Large Language Models\" [2024-01] [[paper](https://arxiv.org/abs/2401.09074)]\n\n- \"CodeMind: A Framework to Challenge Large Language Models for Code Reasoning\" [2024-02] [[paper](https://arxiv.org/abs/2402.09664)]\n\n- \"Executing Natural Language-Described Algorithms with Large Language Models: An Investigation\" [2024-02] [[paper](https://arxiv.org/abs/2403.00795)]\n\n- \"Can Language Models Pretend Solvers? Logic Code Simulation with LLMs\" [2024-03] [[paper](https://arxiv.org/abs/2403.16097)]\n\n- \"Evaluating Large Language Models with Runtime Behavior of Program Execution\" [2024-03] [[paper](https://arxiv.org/abs/2403.16437)]\n\n- \"NExT: Teaching Large Language Models to Reason about Code Execution\" [2024-04] [ICML 2024] [[paper](https://arxiv.org/abs/2404.14662)]\n\n- \"SelfPiCo: Self-Guided Partial Code Execution with LLMs\" [2024-07] [[paper](https://arxiv.org/abs/2407.16974)]\n\n- \"Large Language Models as Code Executors: An Exploratory Study\" [2024-10] [[paper](https://arxiv.org/abs/2410.06667)]\n\n- \"VISUALCODER: Guiding Large Language Models in Code Execution with Fine-grained Multimodal Chain-of-Thought Reasoning\" [2024-10] [[paper](https://arxiv.org/abs/2410.23402)]\n\n- \"CoCoNUT: Structural Code Understanding does not fall out of a tree\" [2025-01] [[paper](https://arxiv.org/abs/2501.16456)]\n\n- \"CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction\" [2025-02] [[paper](https://arxiv.org/abs/2502.07316)]\n\n- \"SURGE: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors\" [2025-02] [[paper](https://arxiv.org/abs/2502.11167)]\n\n- \"What I cannot execute, I do not understand: Training and Evaluating LLMs on Program Execution Traces\" [2025-02] [[paper](https://arxiv.org/abs/2503.05703)]\n\n### 3.3 Code Agents\n\n1. 
**Self-collaboration**: \"Self-collaboration Code Generation via ChatGPT\" [2023-04] [[paper](https://arxiv.org/abs/2304.07590)]\n\n2. **ChatDev**: \"Communicative Agents for Software Development\" [2023-07] [[paper](https://arxiv.org/abs/2307.07924)] [[repo](https://github.com/OpenBMB/ChatDev)]\n\n3. **MetaGPT**: \"MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework\" [2023-08] [[paper](https://arxiv.org/abs/2308.00352)] [[repo](https://github.com/geekan/MetaGPT)]\n\n4. **CodeChain**: \"CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules\" [2023-10] [ICLR 2024] [[paper](https://arxiv.org/abs/2310.08992)]\n\n5. **CodeAgent**: \"CodeAgent: Enhancing Code Generation with Tool-Integrated Agent Systems for Real-World Repo-level Coding Challenges\" [2024-01] [ACL 2024] [[paper](https://arxiv.org/abs/2401.07339)]\n\n6. **CONLINE**: \"CoCoST: Automatic Complex Code Generation with Online Searching and Correctness Testing\" [2024-03] [EMNLP 2024] [[paper](https://arxiv.org/abs/2403.13583)]\n\n7. **LCG**: \"When LLM-based Code Generation Meets the Software Development Process\" [2024-03] [[paper](https://arxiv.org/abs/2403.15852)]\n\n8. **RepairAgent**: \"RepairAgent: An Autonomous, LLM-Based Agent for Program Repair\" [2024-03] [[paper](https://arxiv.org/abs/2403.17134)]\n\n9. **MAGIS**: \"MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution\" [2024-03] [[paper](https://arxiv.org/abs/2403.17927)]\n\n10. **SoA**: \"Self-Organized Agents: A LLM Multi-Agent Framework toward Ultra Large-Scale Code Generation and Optimization\" [2024-04] [[paper](https://arxiv.org/abs/2404.02183)]\n\n11. **AutoCodeRover**: \"AutoCodeRover: Autonomous Program Improvement\" [2024-04] [[paper](https://arxiv.org/abs/2404.05427)]\n\n12. **SWE-agent**: \"SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering\" [2024-05] [[paper](https://arxiv.org/abs/2405.15793)]\n\n13. 
**MapCoder**: \"MapCoder: Multi-Agent Code Generation for Competitive Problem Solving\" [2024-05] [ACL 2024] [[paper](https://arxiv.org/abs/2405.11403)]\n\n14. \"Fight Fire with Fire: How Much Can We Trust ChatGPT on Source Code-Related Tasks?\" [2024-05] [[paper](https://arxiv.org/abs/2405.12641)]\n\n15. **FunCoder**: \"Divide-and-Conquer Meets Consensus: Unleashing the Power of Functions in Code Generation\" [2024-05] [[paper](https://arxiv.org/abs/2405.20092)]\n\n16. **CTC**: \"Multi-Agent Software Development through Cross-Team Collaboration\" [2024-06] [[paper](https://arxiv.org/abs/2406.08979)]\n\n17. **MASAI**: \"MASAI: Modular Architecture for Software-engineering AI Agents\" [2024-06] [[paper](https://arxiv.org/abs/2406.11638)]\n\n18. **AgileCoder**: \"AgileCoder: Dynamic Collaborative Agents for Software Development based on Agile Methodology\" [2024-06] [[paper](https://arxiv.org/abs/2406.11912)]\n\n19. **CodeNav**: \"CodeNav: Beyond tool-use to using real-world codebases with LLM agents\" [2024-06] [[paper](https://arxiv.org/abs/2406.12276)]\n\n20. **INDICT**: \"INDICT: Code Generation with Internal Dialogues of Critiques for Both Security and Helpfulness\" [2024-06] [[paper](https://arxiv.org/abs/2407.02518)]\n\n21. **AppWorld**: \"AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents\" [2024-07] [[paper](https://arxiv.org/abs/2407.18901)]\n\n22. **CortexCompile**: \"CortexCompile: Harnessing Cortical-Inspired Architectures for Enhanced Multi-Agent NLP Code Synthesis\" [2024-08] [[paper](https://arxiv.org/abs/2409.02938)]\n\n23. **Survey**: \"Large Language Model-Based Agents for Software Engineering: A Survey\" [2024-09] [[paper](https://arxiv.org/abs/2409.02977)]\n\n24. 
**PairCoder**: \"A Pair Programming Framework for Code Generation via Multi-Plan Exploration and Feedback-Driven Refinement\" [2024-09] [ASE 2024] [[paper](https://arxiv.org/abs/2409.05001)] [[repo](https://github.com/nju-websoft/PairCoder)]\n\n25. **AutoSafeCoder**: \"AutoSafeCoder: A Multi-Agent Framework for Securing LLM Code Generation through Static Analysis and Fuzz Testing\" [2024-09] [[paper](https://arxiv.org/abs/2409.10737)]\n\n26. **SuperCoder2.0**: \"SuperCoder2.0: Technical Report on Exploring the feasibility of LLMs as Autonomous Programmer\" [2024-09] [[paper](https://arxiv.org/abs/2409.11190)]\n\n27. **Survey**: \"Agents in Software Engineering: Survey, Landscape, and Vision\" [2024-09] [[paper](https://arxiv.org/abs/2409.09030)]\n\n28. **MOSS**: \"MOSS: Enabling Code-Driven Evolution and Context Management for AI Agents\" [2024-09] [[paper](https://arxiv.org/abs/2409.16120)]\n\n29. **HyperAgent**: \"HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale\" [2024-09] [[paper](https://arxiv.org/abs/2409.16299)]\n\n30. \"Compositional Hardness of Code in Large Language Models -- A Probabilistic Perspective\" [2024-09] [[paper](https://arxiv.org/abs/2409.18028)]\n\n31. **RGD**: \"RGD: Multi-LLM Based Agent Debugger via Refinement and Generation Guidance\" [2024-10] [[paper](https://arxiv.org/abs/2410.01242)]\n\n32. **Seeker**: \"Seeker: Enhancing Exception Handling in Code with LLM-based Multi-Agent Approach\" [2024-10] [[paper](https://arxiv.org/abs/2410.06949)]\n\n33. **REDO**: \"REDO: Execution-Free Runtime Error Detection for COding Agents\" [2024-10] [[paper](https://arxiv.org/abs/2410.09117)]\n\n34. \"Evaluating Software Development Agents: Patch Patterns, Code Quality, and Issue Complexity in Real-World GitHub Scenarios\" [2024-10] [[paper](https://arxiv.org/abs/2410.12468)]\n\n35. 
**EvoMAC**: \"Self-Evolving Multi-Agent Collaboration Networks for Software Development\" [2024-10] [[paper](https://arxiv.org/abs/2410.16946)]\n\n36. **VisionCoder**: \"VisionCoder: Empowering Multi-Agent Auto-Programming for Image Processing with Hybrid LLMs\" [2024-10] [[paper](https://arxiv.org/abs/2410.19245)]\n\n37. **AutoKaggle**: \"AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions\" [2024-10] [[paper](https://arxiv.org/abs/2410.20424)]\n\n38. **Watson**: \"Watson: A Cognitive Observability Framework for the Reasoning of Foundation Model-Powered Agents\" [2024-11] [[paper](https://arxiv.org/abs/2411.03455)]\n\n39. **CodeTree**: \"CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models\" [2024-11] [[paper](https://arxiv.org/abs/2411.04329)]\n\n40. **EvoCoder**: \"LLMs as Continuous Learners: Improving the Reproduction of Defective Code in Software Issues\" [2024-11] [[paper](https://arxiv.org/abs/2411.13941)]\n\n41. **AEGIS**: \"AEGIS: An Agent-based Framework for General Bug Reproduction from Issue Descriptions\" [2024-11] [[paper](https://arxiv.org/abs/2411.18015)]\n\n42. **ExecutionAgent**: \"You Name It, I Run It: An LLM Agent to Execute Tests of Arbitrary Projects\" [2024-12] [[paper](https://arxiv.org/abs/2412.10133)]\n\n43. **GHIssueMarket**: \"GHIssuemarket: A Sandbox Environment for SWE-Agents Economic Experimentation\" [2024-12] [[paper](https://arxiv.org/abs/2412.11722)]\n\n44. **SWE-Gym**: \"Training Software Engineering Agents and Verifiers with SWE-Gym\" [2024-12] [[paper](https://arxiv.org/abs/2412.21139)]\n\n45. **SWE-Fixer**: \"SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution\" [2025-01] [[paper](https://arxiv.org/abs/2501.05040)]\n\n46. **CodeCoR**: \"CodeCoR: An LLM-Based Self-Reflective Multi-Agent Framework for Code Generation\" [2025-01] [[paper](https://arxiv.org/abs/2501.07811)]\n\n47. 
**QualityFlow**: \"QualityFlow: An Agentic Workflow for Program Synthesis Controlled by LLM Quality Checks\" [2025-01] [[paper](https://arxiv.org/abs/2501.17167)]\n\n48. **Cogito**: \"Cogito, ergo sum: A Neurobiologically-Inspired Cognition-Memory-Growth System for Code Generation\" [2025-01] [[paper](https://arxiv.org/abs/2501.18653)]\n\n49. **OrcaLoca**: \"OrcaLoca: An LLM Agent Framework for Software Issue Localization\" [2025-02] [[paper](https://arxiv.org/abs/2502.00350)]\n\n50. **BRT Agent**: \"Agentic Bug Reproduction for Effective Automated Program Repair at Google\" [2025-02] [[paper](https://arxiv.org/abs/2502.01821)]\n\n51. **CodeSim**: \"CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging\" [2025-02] [[paper](https://arxiv.org/abs/2502.05664)]\n\n52. **SyncMind**: \"SyncMind: Measuring Agent Out-of-Sync Recovery in Collaborative Software Engineering\" [2025-02] [[paper](https://arxiv.org/abs/2502.06994)]\n\n53. **SoRFT**: \"SoRFT: Issue Resolving with Subtask-oriented Reinforced Fine-Tuning\" [2025-02] [[paper](https://arxiv.org/abs/2502.20127)]\n\n54. \"Is Multi-Agent Debate (MAD) the Silver Bullet? An Empirical Analysis of MAD in Code Summarization and Translation\" [2025-03] [[paper](https://arxiv.org/abs/2503.12029)]\n\n55. **DARS**: \"DARS: Dynamic Action Re-Sampling to Enhance Coding Agent Performance by Adaptive Tree Traversal\" [2025-03] [[paper](https://arxiv.org/abs/2503.14269)]\n\n56. 
**SEAlign**: \"SEAlign: Alignment Training for Software Engineering Agent\" [2025-03] [[paper](https://arxiv.org/abs/2503.18455)]\n\n### 3.4 Interactive Coding\n\n- \"Interactive Program Synthesis\" [2017-03] [[paper](https://arxiv.org/abs/1703.03539)]\n\n- \"Question selection for interactive program synthesis\" [2020-06] [PLDI 2020] [[paper](https://dl.acm.org/doi/10.1145/3385412.3386025)]\n\n- \"Interactive Code Generation via Test-Driven User-Intent Formalization\" [2022-08] [[paper](https://arxiv.org/abs/2208.05950)]\n\n- \"Improving Code Generation by Training with Natural Language Feedback\" [2023-03] [TMLR] [[paper](https://arxiv.org/abs/2303.16749)]\n\n- \"Self-Refine: Iterative Refinement with Self-Feedback\" [2023-03] [NeurIPS 2023] [[paper](https://arxiv.org/abs/2303.17651)]\n\n- \"Teaching Large Language Models to Self-Debug\" [2023-04] [[paper](https://arxiv.org/abs/2304.05128)]\n\n- \"Self-Edit: Fault-Aware Code Editor for Code Generation\" [2023-05] [ACL 2023] [[paper](https://arxiv.org/abs/2305.04087)]\n\n- \"LeTI: Learning to Generate from Textual Interactions\" [2023-05] [[paper](https://arxiv.org/abs/2305.10314)]\n\n- \"Is Self-Repair a Silver Bullet for Code Generation?\" [2023-06] [ICLR 2024] [[paper](https://arxiv.org/abs/2306.09896)]\n\n- \"InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback\" [2023-06] [NeurIPS 2023] [[paper](https://arxiv.org/abs/2306.14898)]\n\n- \"INTERVENOR: Prompting the Coding Ability of Large Language Models with the Interactive Chain of Repair\" [2023-11] [ACL 2024 Findings] [[paper](https://arxiv.org/abs/2311.09868)]\n\n- \"OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement\" [2024-02] [ACL 2024 Findings] [[paper](https://arxiv.org/abs/2402.14658)]\n\n- \"Iterative Refinement of Project-Level Code Context for Precise Code Generation with Compiler Feedback\" [2024-03] [ACL 2024 Findings] [[paper](https://arxiv.org/abs/2403.16792)]\n\n- \"CYCLE: Learning to 
Self-Refine the Code Generation\" [2024-03] [[paper](https://arxiv.org/abs/2403.18746)]\n\n- \"LLM-based Test-driven Interactive Code Generation: User Study and Empirical Evaluation\" [2024-04] [[paper](https://arxiv.org/abs/2404.10100)]\n\n- \"SOAP: Enhancing Efficiency of Generated Code via Self-Optimization\" [2024-05] [[paper](https://arxiv.org/abs/2405.15189)]\n\n- \"Code Repair with LLMs gives an Exploration-Exploitation Tradeoff\" [2024-05] [NeurIPS 2024] [[paper](https://arxiv.org/abs/2405.17503)]\n\n- \"ReflectionCoder: Learning from Reflection Sequence for Enhanced One-off Code Generation\" [2024-05] [[paper](https://arxiv.org/abs/2405.17057)]\n\n- \"Training LLMs to Better Self-Debug and Explain Code\" [2024-05] [NeurIPS 2024] [[paper](https://arxiv.org/abs/2405.18649)]\n\n- \"Requirements are All You Need: From Requirements to Code with LLMs\" [2024-06] [[paper](https://arxiv.org/abs/2406.10101)]\n\n- \"I Need Help! Evaluating LLM's Ability to Ask for Users' Support: A Case Study on Text-to-SQL Generation\" [2024-07] [EMNLP 2024] [[paper](https://arxiv.org/abs/2407.14767)]\n\n- \"An Empirical Study on Self-correcting Large Language Models for Data Science Code Generation\" [2024-08] [[paper](https://arxiv.org/abs/2408.15658)]\n\n- \"RethinkMCTS: Refining Erroneous Thoughts in Monte Carlo Tree Search for Code Generation\" [2024-09] [[paper](https://arxiv.org/abs/2409.09584)]\n\n- \"From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging\" [2024-10] [[paper](https://arxiv.org/abs/2410.01215)] [[repo](https://github.com/YerbaPage/MGDebugger)]\n\n- \"What Makes Large Language Models Reason in (Multi-Turn) Code Generation?\" [2024-10] [[paper](https://arxiv.org/abs/2410.08105)]\n\n- \"The First Prompt Counts the Most! 
An Evaluation of Large Language Models on Iterative Example-based Code Generation\" [2024-11] [[paper](https://arxiv.org/abs/2411.06774)]\n\n- \"Planning-Driven Programming: A Large Language Model Programming Workflow\" [2024-11] [[paper](https://arxiv.org/abs/2411.14503)]\n\n- \"ConAIR: Consistency-Augmented Iterative Interaction Framework to Enhance the Reliability of Code Generation\" [2024-11] [[paper](https://arxiv.org/abs/2411.15587)]\n\n- \"Socratic Human Feedback (SoHF): Expert Steering Strategies for LLM Code Generation\" [2024-11] [EMNLP 2024 Findings] [[paper](https://aclanthology.org/2024.findings-emnlp.908/)]\n\n- \"PerfCodeGen: Improving Performance of LLM Generated Code with Execution Feedback\" [2024-11] [[paper](https://arxiv.org/abs/2412.03578)]\n\n- \"GenX: Mastering Code and Test Generation with Execution Feedback\" [2024-12] [[paper](https://arxiv.org/abs/2412.13464)]\n\n- \"Helping LLMs Improve Code Generation Using Feedback from Testing and Static Analysis\" [2024-12] [[paper](https://arxiv.org/abs/2412.14841)]\n\n- \"Outcome-Refining Process Supervision for Code Generation\" [2024-12] [[paper](https://arxiv.org/abs/2412.15118)]\n\n- \"Tree-of-Code: A Tree-Structured Exploring Framework for End-to-End Code Generation and Execution in Complex Task Handling\" [2024-12] [[paper](https://arxiv.org/abs/2412.15305)]\n\n- \"Dynamic Scaling of Unit Tests for Code Reward Modeling\" [2025-01] [[paper](https://arxiv.org/abs/2501.01054)]\n\n- \"Revisit Self-Debugging with Self-Generated Tests for Code Generation\" [2025-01] [[paper](https://arxiv.org/abs/2501.12793)]\n\n- \"Learning to Generate Unit Tests for Automated Debugging\" [2025-02] [[paper](https://arxiv.org/abs/2502.01619)]\n\n- \"Large Language Model Guided Self-Debugging Code Generation\" [2025-02] [[paper](https://arxiv.org/abs/2502.02928)]\n\n- \"On Iterative Evaluation and Enhancement of Code Quality Using GPT-4o\" [2025-02] [[paper](https://arxiv.org/abs/2502.07399)]\n\n- \"Intention is 
All You Need: Refining Your Code from Your Intention\" [2025-02] [[paper](https://arxiv.org/abs/2502.08172)]\n\n- \"RefineCoder: Iterative Improving of Large Language Models via Adaptive Critique Refinement for Code Generation\" [2025-02] [[paper](https://arxiv.org/abs/2502.09183)]\n\n- \"VisPath: Automated Visualization Code Synthesis via Multi-Path Reasoning and Feedback-Driven Optimization\" [2025-02] [[paper](https://arxiv.org/abs/2502.11140)]\n\n- \"S\\*: Test Time Scaling for Code Generation\" [2025-02] [[paper](https://arxiv.org/abs/2502.14382)]\n\n- \"LLM4EFFI: Leveraging Large Language Models to Enhance Code Efficiency and Correctness\" [2025-02] [[paper](https://arxiv.org/abs/2502.18489)]\n\n- \"Multi-Turn Code Generation Through Single-Step Rewards\" [2025-02] [[paper](https://arxiv.org/abs/2502.20380)]\n\n- \"ConvCodeWorld: Benchmarking Conversational Code Generation in Reproducible Feedback Environments\" [2025-02] [[paper](https://arxiv.org/abs/2502.19852)]\n\n- \"IterPref: Focal Preference Learning for Code Generation via Iterative Debugging\" [2025-03] [[paper](https://arxiv.org/abs/2503.02783)]\n\n### 3.5 Frontend Navigation\n\n- \"MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding\" [2021-10] [ACL 2022] [[paper](https://arxiv.org/abs/2110.08518)]\n\n- \"WebKE: Knowledge Extraction from Semi-structured Web with Pre-trained Markup Language Model\" [2021-10] [CIKM 2021] [[paper](https://dl.acm.org/doi/abs/10.1145/3459637.3482491)]\n\n- \"WebGPT: Browser-assisted question-answering with human feedback\" [2021-12] [[paper](https://arxiv.org/abs/2112.09332)]\n\n- \"CM3: A Causal Masked Multimodal Model of the Internet\" [2022-01] [[paper](https://arxiv.org/abs/2201.07520)]\n\n- \"DOM-LM: Learning Generalizable Representations for HTML Documents\" [2022-01] [[paper](https://arxiv.org/abs/2201.10608)]\n\n- \"WebFormer: The Web-page Transformer for Structure Information Extraction\" [2022-02] [WWW 2022] 
[[paper](https://arxiv.org/abs/2202.00217)]\n\n- \"A Dataset for Interactive Vision-Language Navigation with Unknown Command Feasibility\" [2022-02] [ECCV 2022] [[paper](https://arxiv.org/abs/2202.02312)]\n\n- \"WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents\" [2022-07] [NeurIPS 2022] [[paper](https://arxiv.org/abs/2207.01206)]\n\n- \"Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding\" [2022-10] [ICML 2023] [[paper](https://arxiv.org/abs/2210.03347)]\n\n- \"Understanding HTML with Large Language Models\" [2022-10] [EMNLP 2023 Findings] [[paper](https://arxiv.org/abs/2210.03945)]\n\n- \"WebUI: A Dataset for Enhancing Visual UI Understanding with Web Semantics\" [2023-01] [CHI 2023] [[paper](https://arxiv.org/abs/2301.13280)]\n\n- \"Mind2Web: Towards a Generalist Agent for the Web\" [2023-06] [NeurIPS 2023] [[paper](https://arxiv.org/abs/2306.06070)]\n\n- \"A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis\" [2023-07] [ICLR 2024] [[paper](https://arxiv.org/abs/2307.12856)]\n\n- \"WebArena: A Realistic Web Environment for Building Autonomous Agents\" [2023-07] [[paper](https://arxiv.org/abs/2307.13854)]\n\n- \"CogAgent: A Visual Language Model for GUI Agents\" [2023-12] [[paper](https://arxiv.org/abs/2312.08914)]\n\n- \"GPT-4V(ision) is a Generalist Web Agent, if Grounded\" [2024-01] [[paper](https://arxiv.org/abs/2401.01614)]\n\n- \"WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models\" [2024-01] [[paper](https://arxiv.org/abs/2401.13919)]\n\n- \"WebLINX: Real-World Website Navigation with Multi-Turn Dialogue\" [2024-02] [[paper](https://arxiv.org/abs/2402.05930)]\n\n- \"OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web\" [2024-02] [[paper](https://arxiv.org/abs/2402.17553)]\n\n- \"AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent\" [2024-04] 
[[paper](https://arxiv.org/abs/2404.03648)]\n\n- \"WILBUR: Adaptive In-Context Learning for Robust and Accurate Web Agents\" [2024-04] [[paper](https://arxiv.org/abs/2404.05902)]\n\n- \"AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation\" [2024-04] [[paper](https://arxiv.org/abs/2404.12753)]\n\n- \"GUICourse: From General Vision Language Models to Versatile GUI Agents\" [2024-06] [[paper](https://arxiv.org/abs/2406.11317)]\n\n- \"NaviQAte: Functionality-Guided Web Application Navigation\" [2024-09] [[paper](https://arxiv.org/abs/2409.10741)]\n\n- \"MobileVLM: A Vision-Language Model for Better Intra- and Inter-UI Understanding\" [2024-09] [[paper](https://arxiv.org/abs/2409.14818)]\n\n- \"Multimodal Auto Validation For Self-Refinement in Web Agents\" [2024-10] [[paper](https://arxiv.org/abs/2410.00689)]\n\n- \"Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents\" [2024-10] [[paper](https://arxiv.org/abs/2410.05243)]\n\n- \"Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation\" [2024-10] [[paper](https://arxiv.org/abs/2410.13232)]\n\n- \"Harnessing Webpage UIs for Text-Rich Visual Understanding\" [2024-10] [[paper](https://arxiv.org/abs/2410.13824)]\n\n- \"AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents\" [2024-10] [[paper](https://arxiv.org/abs/2410.13825)]\n\n- \"Beyond Browsing: API-Based Web Agents\" [2024-10] [[paper](https://arxiv.org/abs/2410.16464)]\n\n- \"Large Language Models Empowered Personalized Web Agents\" [2024-10] [[paper](https://arxiv.org/abs/2410.17236)]\n\n- \"AdvWeb: Controllable Black-box Attacks on VLM-powered Web Agents\" [2024-10] [[paper](https://arxiv.org/abs/2410.17401)]\n\n- \"Auto-Intent: Automated Intent Discovery and Self-Exploration for Large Language Model Web Agents\" [2024-10] [[paper](https://arxiv.org/abs/2410.22552)]\n\n- \"OS-ATLAS: A Foundation Action Model for Generalist GUI Agents\" [2024-10] 
[[paper](https://arxiv.org/abs/2410.23218)]\n\n- \"From Context to Action: Analysis of the Impact of State Representation and Context on the Generalization of Multi-Turn Web Navigation Agents\" [2024-10] [[paper](https://arxiv.org/abs/2410.23555)]\n\n- \"AutoGLM: Autonomous Foundation Agents for GUIs\" [2024-10] [[paper](https://arxiv.org/abs/2411.00820)]\n\n- \"WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning\" [2024-11] [[paper](https://arxiv.org/abs/2411.02337)]\n\n- \"The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use\" [2024-11] [[paper](https://arxiv.org/abs/2411.10323)]\n\n- \"ScribeAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data\" [2024-11] [[paper](https://arxiv.org/abs/2411.15004)]\n\n- \"ShowUI: One Vision-Language-Action Model for GUI Visual Agent\" [2024-11] [[paper](https://arxiv.org/abs/2411.17465)]\n\n- \"Large Language Model-Brained GUI Agents: A Survey\" [2024-11] [[paper](https://arxiv.org/abs/2411.18279)]\n\n- \"Free your mouse! 
Command Large Language Models to Generate Code to Format Word Documents\" [2024-11] [EMNLP 2024] [[paper](https://aclanthology.org/2024.emnlp-main.902/)]\n\n- \"Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction\" [2024-12] [[paper](https://arxiv.org/abs/2412.04454)]\n\n- \"Falcon-UI: Understanding GUI Before Following User Instructions\" [2024-12] [[paper](https://arxiv.org/abs/2412.09362)]\n\n- \"WEPO: Web Element Preference Optimization for LLM-based Web Navigation\" [2024-12] [[paper](https://arxiv.org/abs/2412.10742)]\n\n- \"AutoDroid-V2: Boosting SLM-based GUI Agents via Code Generation\" [2024-12] [[paper](https://arxiv.org/abs/2412.18116)]\n\n- \"Beyond Pass or Fail: A Multi-dimensional Benchmark for Mobile UI Navigation\" [2025-01] [[paper](https://arxiv.org/abs/2501.02863)]\n\n- \"WebWalker: Benchmarking LLMs in Web Traversal\" [2025-01] [[paper](https://arxiv.org/abs/2501.07572)]\n\n- \"GUI-Bee: Align GUI Action Grounding to Novel Environments via Autonomous Exploration\" [2025-01] [[paper](https://arxiv.org/abs/2501.13896)]\n\n- \"UI-TARS: Pioneering Automated GUI Interaction with Native Agents\" [2025-01] [[paper](https://arxiv.org/abs/2501.12326)]\n\n- \"Smoothing Grounding and Reasoning for MLLM-Powered GUI Agents with Query-Oriented Pivot Tasks\" [2025-03] [[paper](https://arxiv.org/abs/2503.00401)]\n\n## 4. 
Code LLM for Low-Resource, Low-Level, and Domain-Specific Languages\n\n- [**Ruby**] \"On the Transferability of Pre-trained Language Models for Low-Resource Programming Languages\" [2022-04] [ICPC 2022] [[paper](https://arxiv.org/abs/2204.09653)]\n\n- [**Verilog**] \"Benchmarking Large Language Models for Automated Verilog RTL Code Generation\" [2022-12] [DATE 2023] [[paper](https://arxiv.org/abs/2212.11140)]\n\n- [**OCL**] \"On Codex Prompt Engineering for OCL Generation: An Empirical Study\" [2023-03] [MSR 2023] [[paper](https://arxiv.org/abs/2303.16244)]\n\n- [**Ansible-YAML**] \"Automated Code generation for Information Technology Tasks in YAML through Large Language Models\" [2023-05] [DAC 2023] [[paper](https://arxiv.org/abs/2305.02783)]\n\n- [**Hansl**] \"The potential of LLMs for coding with low-resource and domain-specific programming languages\" [2023-07] [[paper](https://arxiv.org/abs/2307.13018)]\n\n- [**Verilog**] \"VeriGen: A Large Language Model for Verilog Code Generation\" [2023-07] [[paper](https://arxiv.org/abs/2308.00708)]\n\n- [**Verilog**] \"RTLLM: An Open-Source Benchmark for Design RTL Generation with Large Language Model\" [2023-08] [[paper](https://arxiv.org/abs/2308.05345)]\n\n- [**Racket, OCaml, Lua, R, Julia**] \"Knowledge Transfer from High-Resource to Low-Resource Programming Languages for Code LLMs\" [2023-08] [[paper](https://arxiv.org/abs/2308.09895)]\n\n- [**Verilog**] \"VerilogEval: Evaluating Large Language Models for Verilog Code Generation\" [2023-09] [ICCAD 2023] [[paper](https://arxiv.org/abs/2309.07544)]\n\n- [**Verilog**] \"RTLFixer: Automatically Fixing RTL Syntax Errors with Large Language Models\" [2023-11] [[paper](https://arxiv.org/abs/2311.16543)]\n\n- [**Verilog**] \"Advanced Large Language Model (LLM)-Driven Verilog Development: Enhancing Power, Performance, and Area Optimization in Code Synthesis\" [2023-12] [[paper](https://arxiv.org/abs/2312.01022)]\n\n- [**Verilog**] \"RTLCoder: Outperforming GPT-3.5 in Design 
RTL Generation with Our Open-Source Dataset and Lightweight Solution\" [2023-12] [[paper](https://arxiv.org/abs/2312.08617)]\n\n- [**Verilog**] \"BetterV: Controlled Verilog Generation with Discriminative Guidance\" [2024-02] [ICML 2024] [[paper](https://arxiv.org/abs/2402.03375)]\n\n- [**R**] \"Empirical Studies of Parameter Efficient Methods for Large Language Models of Code and Knowledge Transfer to R\" [2024-03] [[paper](https://arxiv.org/abs/2405.01553)]\n\n- [**Haskell**] \"Investigating the Performance of Language Models for Completing Code in Functional Programming Languages: a Haskell Case Study\" [2024-03] [[paper](https://arxiv.org/abs/2403.15185)]\n\n- [**Verilog**] \"A Multi-Expert Large Language Model Architecture for Verilog Code Generation\" [2024-04] [[paper](https://arxiv.org/abs/2404.08029)]\n\n- [**Verilog**] \"CreativEval: Evaluating Creativity of LLM-Based Hardware Code Generation\" [2024-04] [[paper](https://arxiv.org/abs/2404.08806)]\n\n- [**Alloy**] \"An Empirical Evaluation of Pre-trained Large Language Models for Repairing Declarative Formal Specifications\" [2024-04] [[paper](https://arxiv.org/abs/2404.11050)]\n\n- [**Verilog**] \"Evaluating LLMs for Hardware Design and Test\" [2024-04] [[paper](https://arxiv.org/abs/2405.02326)]\n\n- [**Kotlin, Swift, and Rust**] \"Software Vulnerability Prediction in Low-Resource Languages: An Empirical Study of CodeBERT and ChatGPT\" [2024-04] [[paper](https://arxiv.org/abs/2404.17110)]\n\n- [**Verilog**] \"MEIC: Re-thinking RTL Debug Automation using LLMs\" [2024-05] [[paper](https://arxiv.org/abs/2405.06840)]\n\n- [**Bash**] \"Tackling Execution-Based Evaluation for NL2Bash\" [2024-05] [[paper](https://arxiv.org/abs/2405.06807)]\n\n- [**Fortran, Julia, Matlab, R, Rust**] \"Evaluating AI-generated code for C++, Fortran, Go, Java, Julia, Matlab, Python, R, and Rust\" [2024-05] [[paper](https://arxiv.org/abs/2405.13101)]\n\n- [**OpenAPI**] \"Optimizing Large Language Models for OpenAPI Code 
Completion\" [2024-05] [[paper](https://arxiv.org/abs/2405.15729)]\n\n- [**Kotlin**] \"Kotlin ML Pack: Technical Report\" [2024-05] [[paper](https://arxiv.org/abs/2405.19250)]\n\n- [**UCLID5**] \"Synthetic Programming Elicitation for Text-to-Code in Very Low-Resource Programming and Formal Languages\" [2024-06] [NeurIPS 2024] [[paper](https://arxiv.org/abs/2406.03636)]\n\n- [**Verilog**] \"VerilogReader: LLM-Aided Hardware Test Generation\" [2024-06] [[paper](https://arxiv.org/abs/2406.04373)]\n\n- \"Benchmarking Generative Models on Computational Thinking Tests in Elementary Visual Programming\" [2024-06] [[paper](https://arxiv.org/abs/2406.09891)]\n\n- [**Logo**] \"Program Synthesis Benchmark for Visual Programming in XLogoOnline Environment\" [2024-06] [[paper](https://arxiv.org/abs/2406.11334)]\n\n- [**Ansible YAML, Bash**] \"DocCGen: Document-based Controlled Code Generation\" [2024-06] [EMNLP 2024] [[paper](https://arxiv.org/abs/2406.11925)]\n\n- [**Qiskit**] \"Qiskit HumanEval: An Evaluation Benchmark For Quantum Code Generative Models\" [2024-06] [[paper](https://arxiv.org/abs/2406.14712)]\n\n- [**Perl, Golang, Swift**] \"DistiLRR: Transferring Code Repair for Low-Resource Programming Languages\" [2024-06] [[paper](https://arxiv.org/abs/2406.14867)]\n\n- [**Verilog**] \"AssertionBench: A Benchmark to Evaluate Large-Language Models for Assertion Generation\" [2024-06] [[paper](https://arxiv.org/abs/2406.18627)]\n\n- \"A Comparative Study of DSL Code Generation: Fine-Tuning vs. 
Optimized Retrieval Augmentation\" [2024-07] [[paper](https://arxiv.org/abs/2407.02742)]\n\n- [**JSON, XML, YAML**] \"ConCodeEval: Evaluating Large Language Models for Code Constraints in Domain-Specific Languages\" [2024-07] [[paper](https://arxiv.org/abs/2407.03387)]\n\n- [**Verilog**] \"AutoBench: Automatic Testbench Generation and Evaluation Using LLMs for HDL Design\" [2024-07] [[paper](https://arxiv.org/abs/2407.03891)]\n\n- [**Verilog**] \"CodeV: Empowering LLMs for Verilog Generation through Multi-Level Summarization\" [2024-07] [[paper](https://arxiv.org/abs/2407.10424)]\n\n- [**Verilog**] \"ITERTL: An Iterative Framework for Fine-tuning LLMs for RTL Code Generation\" [2024-07] [[paper](https://arxiv.org/abs/2407.12022)]\n\n- [**Verilog**] \"OriGen: Enhancing RTL Code Generation with Code-to-Code Augmentation and Self-Reflection\" [2024-07] [[paper](https://arxiv.org/abs/2407.16237)]\n\n- [**Verilog**] \"Large Language Model for Verilog Generation with Golden Code Feedback\" [2024-07] [[paper](https://arxiv.org/abs/2407.18271)]\n\n- [**Verilog**] \"AutoVCoder: A Systematic Framework for Automated Verilog Code Generation using LLMs\" [2024-07] [[paper](https://arxiv.org/abs/2407.18333)]\n\n- [**RPA**] \"Plan with Code: Comparing approaches for robust NL to DSL generation\" [2024-08] [[paper](https://arxiv.org/abs/2408.08335)]\n\n- [**Verilog**] \"VerilogCoder: Autonomous Verilog Coding Agents with Graph-based Planning and Abstract Syntax Tree (AST)-based Waveform Tracing Tool\" [2024-08] [[paper](https://arxiv.org/abs/2408.08927)]\n\n- [**Verilog**] \"Revisiting VerilogEval: Newer LLMs, In-Context Learning, and Specification-to-RTL Tasks\" [2024-08] [[paper](https://arxiv.org/abs/2408.11053)]\n\n- [**MaxMSP, Web Audio**] \"Benchmarking LLM Code Generation for Audio Programming with Visual Dataflow Languages\" [2024-09] [[paper](https://arxiv.org/abs/2409.00856)]\n\n- [**Verilog**] \"RTLRewriter: Methodologies for Large Models aided RTL Code Optimization\" 
[2024-09] [[paper](https://arxiv.org/abs/2409.11414)]\n\n- [**Verilog**] \"CraftRTL: High-quality Synthetic Data Generation for Verilog Code Models with Correct-by-Construction Non-Textual Representations and Targeted Code Repair\" [2024-09] [[paper](https://arxiv.org/abs/2409.12993)]\n\n- [**Bash**] \"ScriptSmith: A Unified LLM Framework for Enhancing IT Operations via Automated Bash Script Generation, Assessment, and Refinement\" [2024-09] [[paper](https://arxiv.org/abs/2409.17166)]\n\n- [**Survey**] \"Survey on Code Generation for Low resource and Domain Specific Programming Languages\" [2024-10] [[paper](https://arxiv.org/abs/2410.03981)]\n\n- [**R**] \"Do Current Language Models Support Code Intelligence for R Programming Language?\" [2024-10] [[paper](https://arxiv.org/abs/2410.07793)]\n\n- [**PLC**] \"Agents4PLC: Automating Closed-loop PLC Code Generation and Verification in Industrial Control Systems using LLM-based Agents\" [2024-10] [[paper](https://arxiv.org/abs/2410.14209)]\n\n- [**Lua**] \"Evaluating Quantized Large Language Models for Code Generation on Low-Resource Language Benchmarks\" [2024-10] [[paper](https://arxiv.org/abs/2410.14766)]\n\n- \"Improving Parallel Program Performance Through DSL-Driven Code Generation with LLM Optimizers\" [2024-10] [[paper](https://arxiv.org/abs/2410.15625)]\n\n- [**R, D, Racket, Bash**] \"Bridge-Coder: Unlocking LLMs' Potential to Overcome Language Gaps in Low-Resource Code\" [2024-10] [[paper](https://arxiv.org/abs/2410.18957)]\n\n- [**SPICE**] \"SPICEPilot: Navigating SPICE Code Generation and Simulation with AI Guidance\" [2024-10] [[paper](https://arxiv.org/abs/2410.20553)]\n\n- [**IEC 61131-3 ST**] \"Training LLMs for Generating IEC 61131-3 Structured Text with Online Feedback\" [2024-10] [[paper](https://arxiv.org/abs/2410.22159)]\n\n- [**Verilog**] \"MetRex: A Benchmark for Verilog Code Metric Reasoning Using LLMs\" [2024-11] [[paper](https://arxiv.org/abs/2411.03471)]\n\n- [**Verilog**] \"CorrectBench: 
Automatic Testbench Generation with Functional Self-Correction using LLMs for HDL Design\" [2024-11] [[paper](https://arxiv.org/abs/2411.08510)]\n\n- [**MUMPS, ALC**] \"Leveraging LLMs for Legacy Code Modernization: Challenges and Opportunities for LLM-Generated Documentation\" [2024-11] [[paper](https://arxiv.org/abs/2411.14971)]\n\n- [**Power Query M, OfficeScript, Excel formulas**] \"RAR: Retrieval-augmented retrieval for code generation in low resource languages\" [2024-11] [EMNLP 2024] [[paper](https://aclanthology.org/2024.emnlp-main.1199/)]\n\n- [**ST**] \"A Multi-Agent Framework for Extensible Structured Text Generation in PLCs\" [2024-12] [[paper](https://arxiv.org/abs/2412.02410)]\n\n- [**Verilog**] \"PromptV: Leveraging LLM-powered Multi-Agent Prompting for High-quality Verilog Generation\" [2024-12] [[paper](https://arxiv.org/abs/2412.11014)]\n\n- [**HPC**] \"HPC-Coder-V2: Studying Code LLMs Across Low-Resource Parallel Languages\" [2024-12] [[paper](https://arxiv.org/abs/2412.15178)]\n\n- [**Verilog**] \"RTLSquad: Multi-Agent Based Interpretable RTL Design\" [2025-01] [[paper](https://arxiv.org/abs/2501.05470)]\n\n- [**G-code**] \"GLLM: Self-Corrective G-Code Generation using Large Language Models with User Feedback\" [2025-01] [[paper](https://arxiv.org/abs/2501.17584)]\n\n- [**Julia, Lua, R, Racket**] \"Enhancing Code Generation for Low-Resource Languages: No Silver Bullet\" [2025-01] [[paper](https://arxiv.org/abs/2501.19085)]\n\n- [**F\\***] \"Building A Proof-Oriented Programmer That Is 64% Better Than GPT-4o Under Data Scarsity\" [2025-02] [[paper](https://arxiv.org/abs/2502.11901)]\n\n- \"Exploring Code Language Models for Automated HLS-based Hardware Generation: Benchmark, Infrastructure and Analysis\" [2025-02] [ASP-DAC 2025] [[paper](https://arxiv.org/abs/2502.13921)]\n\n- [**Alloy**] \"On the Effectiveness of Large Language Models in Writing Alloy Formulas\" [2025-02] [[paper](https://arxiv.org/abs/2502.15441)]\n\n- [**Solidity**] \"SolEval: 
Benchmarking Large Language Models for Repository-level Solidity Code Generation\" [2025-02] [[paper](https://arxiv.org/abs/2502.18793)]\n\n- [**PennyLane**] \"PennyLang: Pioneering LLM-Based Quantum Code Generation with a Novel PennyLane-Centric Dataset\" [2025-03] [[paper](https://arxiv.org/abs/2503.02497)]\n\n- [**Verilog**] \"VeriMind: Agentic LLM for Automated Verilog Generation with a Novel Evaluation Metric\" [2025-03] [[paper](https://arxiv.org/abs/2503.16514)]\n\n- [**Modelica**] \"ModiGen: A Large Language Model-Based Workflow for Multi-Task Modelica Code Generation\" [2025-03] [[paper](https://arxiv.org/abs/2503.18460)]\n\n- [**Excel**] \"Synthetic Function Demonstrations Improve Generation in Low-Resource Programming Languages\" [2025-03] [[paper](https://arxiv.org/abs/2503.18760)]\n\n## 5. Methods/Models for Downstream Tasks\n\nFor each task, the first column contains non-neural methods (e.g. n-gram, TF-IDF, and (occasionally) static program analysis); the second column contains non-Transformer neural methods (e.g. LSTM, CNN, GNN); the third column contains Transformer-based methods (e.g. BERT, GPT, T5).\n\n\u003cp align='center'\u003e\n\u003cimg src='imgs/downstream-1.png' style='width: 100%; '\u003e\n\u003cimg src='imgs/downstream-2.png' style='width: 100%; '\u003e\n\u003cimg src='imgs/downstream-3.png' style='width: 100%; '\u003e\n\u003cimg src='imgs/downstream-4.png' style='width: 100%; '\u003e\n\u003cimg src='imgs/downstream-5.png' style='width: 100%; '\u003e\n\u003c/p\u003e\n\n### Code Generation\n\n- \"Enhancing Large Language Models in Coding Through Multi-Perspective Self-Consistency\" [2023-09] [ACL 2024] [[paper](https://arxiv.org/abs/2309.17272)]\n\n- \"Self-Infilling Code Generation\" [2023-11] [ICML 2024] [[paper](https://arxiv.org/abs/2311.17972)]\n\n- \"JumpCoder: Go Beyond Autoregressive Coder via Online Modification\" [2024-01] [ACL 2024] [[paper](https://arxiv.org/abs/2401.07870)]\n\n- \"The Larger the Better? 
Improved LLM Code-Generation via Budget Reallocation\" [2024-03] [[paper](https://arxiv.org/abs/2404.00725)]\n\n- \"Quantifying Contamination in Evaluating Code Generation Capabilities of Language Models\" [2024-03] [ACL 2024] [[paper](https://arxiv.org/abs/2403.04811)]\n\n- \"Comments as Natural Logic Pivots: Improve Code Generation via Comment Perspective\" [2024-04] [ACL 2024 Findings] [[paper](https://arxiv.org/abs/2404.07549)]\n\n- \"Distilling Algorithmic Reasoning from LLMs via Explaining Solution Programs\" [2024-04] [[paper](https://arxiv.org/abs/2404.08148)]\n\n- \"Quality Assessment of Prompts Used in Code Generation\" [2024-04] [[paper](https://arxiv.org/abs/2404.10155)]\n\n- \"Model Cascading for Code: Reducing Inference Costs with Model Cascading for LLM Based Code Generation\" [2024-05] [[paper](https://arxiv.org/abs/2405.15842)]\n\n- \"A Survey on Large Language Models for Code Generation\" [2024-06] [[paper](https://arxiv.org/abs/2406.00515)]\n\n- \"Is Programming by Example solved by LLMs?\" [2024-06] [NeurIPS 2024] [[paper](https://arxiv.org/abs/2406.08316)]\n\n- \"Benchmarks and Metrics for Evaluations of Code Generation: A Critical Review\" [2024-06] [[paper](https://arxiv.org/abs/2406.12655)]\n\n- \"MPCODER: Multi-user Personalized Code Generator with Explicit and Implicit Style Representation Learning\" [2024-06] [ACL 2024] [[paper](https://arxiv.org/abs/2406.17255)]\n\n- \"Revisiting the Impact of Pursuing Modularity for Code Generation\" [2024-07] [EMNLP 2024 Findings] [[paper](https://arxiv.org/abs/2407.11406)]\n\n- \"Evaluating Long Range Dependency Handling in Code Generation Models using Multi-Step Key Retrieval\" [2024-07] [[paper](https://arxiv.org/abs/2407.21049)]\n\n- \"When to Stop? 
Towards Efficient Code Generation in LLMs with Excess Token Prevention\" [2024-07] [[paper](https://arxiv.org/abs/2407.20042)]\n\n- \"Assessing Programming Task Difficulty for Efficient Evaluation of Large Language Models\" [2024-07] [[paper](https://arxiv.org/abs/2407.21227)]\n\n- \"ArchCode: Incorporating Software Requirements in Code Generation with Large Language Models\" [2024-08] [ACL 2024] [[paper](https://www.arxiv.org/abs/2408.00994)]\n\n- \"Fine-tuning Language Models for Joint Rewriting and Completion of Code with Potential Bugs\" [2024-08] [ACL 2024 Findings] [[paper](https://aclanthology.org/2024.findings-acl.938/)]\n\n- \"Selective Prompt Anchoring for Code Generation\" [2024-08] [[paper](https://arxiv.org/abs/2408.09121)]\n\n- \"Bridging the Language Gap: Enhancing Multilingual Prompt-Based Code Generation in LLMs via Zero-Shot Cross-Lingual Transfer\" [2024-08] [[paper](https://arxiv.org/abs/2408.09701)]\n\n- \"No Man is an Island: Towards Fully Automatic Programming by Code Search, Code Generation and Program Repair\" [2024-09] [[paper](https://arxiv.org/abs/2409.03267)]\n\n- \"Planning In Natural Language Improves LLM Search For Code Generation\" [2024-09] [[paper](https://arxiv.org/abs/2409.03733)]\n\n- \"Multi-Programming Language Ensemble for Code Generation in Large Language Model\" [2024-09] [[paper](https://arxiv.org/abs/2409.04114)]\n\n- \"USCD: Improving Code Generation of LLMs by Uncertainty-Aware Selective Contrastive Decoding\" [2024-09] [[paper](https://arxiv.org/abs/2409.05923)]\n\n- \"Eliciting Instruction-tuned Code Language Models' Capabilities to Utilize Auxiliary Function for Code Generation\" [2024-09] [EMNLP 2024 Findings] [[paper](https://arxiv.org/abs/2409.13928)]\n\n- \"Selection of Prompt Engineering Techniques for Code Generation through Predicting Code Complexity\" [2024-09] [[paper](https://arxiv.org/abs/2409.16416)]\n\n- \"Horizon-Length Prediction: Advancing Fill-in-the-Middle Capabilities for Code Generation with 
Lookahead Planning\" [2024-10] [[paper](https://arxiv.org/abs/2410.03103)]\n\n- \"Showing LLM-Generated Code Selectively Based on Confidence of LLMs\" [2024-10] [[paper](https://arxiv.org/abs/2410.03234)]\n\n- \"AutoFeedback: An LLM-based Framework for Efficient and Accurate API Request Generation\" [2024-10] [[paper](https://arxiv.org/abs/2410.06943)]\n\n- \"Enhancing LLM Agents for Code Generation with Possibility and Pass-rate Prioritized Experience Replay\" [2024-10] [[paper](https://arxiv.org/abs/2410.12236)]\n\n- \"From Solitary Directives to Interactive Encouragement! LLM Secure Code Generation by Natural Language Prompting\" [2024-10] [[paper](https://arxiv.org/abs/2410.14321)]\n\n- \"Self-Explained Keywords Empower Large Language Models for Code Generation\" [2024-10] [[paper](https://arxiv.org/abs/2410.15966)]\n\n- \"Context-Augmented Code Generation Using Programming Knowledge Graphs\" [2024-10] [[paper](https://arxiv.org/abs/2410.18251)]\n\n- \"In-Context Code-Text Learning for Bimodal Software Engineering\" [2024-10] [[paper](https://arxiv.org/abs/2410.18107)]\n\n- \"Combining LLM Code Generation with Formal Specifications and Reactive Program Synthesis\" [2024-10] [[paper](https://arxiv.org/abs/2410.19736)]\n\n- \"Less is More: DocString Compression in Code Generation\" [2024-10] [[paper](https://arxiv.org/abs/2410.22793)]\n\n- \"Multi-Programming Language Sandbox for LLMs\" [2024-10] [[paper](https://arxiv.org/abs/2410.23074)]\n\n- \"Personality-Guided Code Generation Using Large Language Models\" [2024-10] [[paper](https://arxiv.org/abs/2411.00006)]\n\n- \"Scattered Forest Search: Smarter Code Space Exploration with LLMs\" [2024-11] [[paper](https://arxiv.org/abs/2411.05010)]\n\n- \"Anchor Attention, Small Cache: Code Generation with Large Language Models\" [2024-11] [[paper](https://arxiv.org/abs/2411.06680)]\n\n- \"ROCODE: Integrating Backtracking Mechanism and Program Analysis in Large Language Models for Code Generation\" [2024-11] 
[[paper](https://arxiv.org/abs/2411.07112)]\n\n- \"SRA-MCTS: Self-driven Reasoning Augmentation with Monte Carlo Tree Search for Enhanced Code Generation\" [2024-11] [[paper](https://arxiv.org/abs/2411.11053)]\n\n- \"Language-to-Code Translation with a Single Labeled Example\" [2024-11] [EMNLP 2024] [[paper](https://aclanthology.org/2024.emnlp-main.462/)]\n\n- \"VeCoGen: Automating Generation of Formally Verified C Code with Large Language Models\" [2024-11] [[paper](https://arxiv.org/abs/2411.19275)]\n\n- \"Seed-CTS: Unleashing the Power of Tree Search for Superior Performance in Competitive Coding Tasks\" [2024-12] [[paper](https://arxiv.org/abs/2412.12544)]\n\n- \"LoGFiLM: Fine-Tuning A Large Language Model for Automated Generation of Log Statements\" [2024-12] [[paper](https://arxiv.org/abs/2412.18835)]\n\n- \"The Impact of Prompt Programming on Function-Level Code Generation\" [2024-12] [[paper](https://arxiv.org/abs/2412.20545)]\n\n- \"Leveraging Metamemory Mechanisms for Enhanced Data-Free Code Generation in LLMs\" [2025-01] [[paper](https://arxiv.org/abs/2501.07892)]\n\n- \"GREEN-CODE: Optimizing Energy Efficiency in Large Language Models for Code Generation\" [2025-01] [[paper](https://arxiv.org/abs/2501.11006)]\n\n- \"Chain of Grounded Objectives: Bridging Process and Goal-oriented Prompting for Code Generation\" [2025-01] [[paper](https://arxiv.org/abs/2501.13978)]\n\n- \"CoCoEvo: Co-Evolution of Programs and Test Cases to Enhance Code Generation\" [2025-02] [[paper](https://arxiv.org/abs/2502.10802)]\n\n- \"Boost, Disentangle, and Customize: A Robust System2-to-System1 Pipeline for Code Generation\" [2025-02] [[paper](https://arxiv.org/abs/2502.12492)]\n\n- \"Learning to Solve and Verify: A Self-Play Framework for Code and Test Generation\" [2025-02] [[paper](https://arxiv.org/abs/2502.14948)]\n\n- \"Thinking Before Running! 
Efficient Code Generation with Thorough Exploration and Optimal Refinement\" [2025-02] [[paper](https://arxiv.org/abs/2502.17442)]\n\n- \"Grammar-Based Code Representation: Is It a Worthy Pursuit for LLMs?\" [2025-03] [[paper](https://arxiv.org/abs/2503.05507)]\n\n- \"Enhancing High-Quality Code Generation in Large Language Models with Comparative Prefix-Tuning\" [2025-03] [[paper](https://arxiv.org/abs/2503.09020)]\n\n- \"Modularization is Better: Effective Code Generation with Modular Prompting\" [2025-03] [[paper](https://arxiv.org/abs/2503.12483)]\n\n- \"A Semantic-based Optimization Approach for Repairing LLMs: Case Study on Code Generation\" [2025-03] [[paper](https://arxiv.org/abs/2503.12899)]\n\n- \"Uncertainty-Guided Chain-of-Thought for Code Generation with LLMs\" [2025-03] [[paper](https://arxiv.org/abs/2503.15341)]\n\n### Code RAG\n\n- \"CodeGRAG: Extracting Composed Syntax Graphs for Retrieval Augmented Cross-Lingual Code Generation\" [2024-05] [[paper](https://arxiv.org/abs/2405.02355)]\n\n- \"Prompt-based Code Completion via Multi-Retrieval Augmented Generation\" [2024-05] [[paper](https://arxiv.org/abs/2405.07530)]\n\n- \"A Lightweight Framework for Adaptive Retrieval In Code Completion With Critique Model\" [2024-06] [[paper](https://arxiv.org/abs/2406.10263)]\n\n- \"Preference-Guided Refactored Tuning for Retrieval Augmented Code Generation\" [2024-09] [[paper](https://arxiv.org/abs/2409.15895)]\n\n- \"Building A Coding Assistant via the Retrieval-Augmented Language Model\" [2024-10] [[paper](ht","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcodefuse-ai%2FAwesome-Code-LLM","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcodefuse-ai%2FAwesome-Code-LLM","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcodefuse-ai%2FAwesome-Code-LLM/lists"}