awesome-ai4code
A collection of recent papers, benchmarks, and datasets in the AI4Code domain.
https://github.com/bdqnghi/awesome-ai4code
Tools/Products
AI code completion tools
More General Coding Assistants
ChatGPT in your editor
LLM-powered natural language compilers
Academic
Conferences
- Automated Software Engineering (ASE)
- Programming Language Design and Implementation (PLDI)
- International Conference on Learning Representations (ICLR)
- Empirical Methods in Natural Language Processing (EMNLP)
- North American Chapter of the Association for Computational Linguistics (NAACL)
- Annual Meeting of the Association for Computational Linguistics (ACL)
- International Conference on Software Engineering (ICSE)
- Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE)
- International Conference on Machine Learning (ICML)
- Conference on Neural Information Processing Systems (NeurIPS)
Papers (this list is a bit outdated and needs updating)
- Large Language Models of Code Fail at Completing Code with Potential Bugs - Tuan Dinh, Jinman Zhao, Samson Tan, Renato Negrinho, Leonard Lausen, Sheng Zha, George Karypis.
- Large Language Models Meet NL2Code: A Survey - Daoguang Zan, Bei Chen, Fengji Zhang, Dianjie Lu, Bingchao Wu, Bei Guan, Yongji Wang, Jian-Guang Lou (EMNLP 2023)
- RepoFusion: Training Code Models to Understand Your Repository - Disha Shrivastava, Denis Kocetkov, Harm de Vries, Dzmitry Bahdanau, Torsten Scholak
- XCODEEVAL: An Execution-based Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval - Mohammad Abdullah Matin Khan, M Saiful Bari, Xuan Long Do, Weishi Wang, Md Rizwan Parvez, Shafiq Joty
Pretrained CodeLLMs
Papers (this list is a bit outdated and needs updating)
- CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation - Yue Wang, Weishi Wang, Shafiq Joty, Steven C.H. Hoi (EMNLP 2021) (***CodeT5***).
- CodeBERT: A Pre-Trained Model for Programming and Natural Language - Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, Ming Zhou (EMNLP 2020 Findings) (***CodeBERT***).
- Learning and Evaluating Contextual Embedding of Source Code - Aditya Kanade, Petros Maniatis, Gogul Balakrishnan, Kensen Shi. (ICML 2020) (***CuBERT***).
- Unsupervised Translation of Programming Languages - Marie-Anne Lachaux, Baptiste Roziere, Lowik Chanussot, Guillaume Lample (NeurIPS 2020) (***Transcoder***).
- Contrastive Code Representation Learning
- CoTexT: Multi-task Learning with Code-Text Transformer
- How could Neural Networks understand Programs? - Dinglan Peng, Shuxin Zheng, Yatao Li, Guolin Ke, Di He, Tie-Yan Liu (ICML 2021) (***OSCAR***).
- Unified Pre-training for Program Understanding and Generation - Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang (NAACL 2021) (***PLBART***).
- Exploring Software Naturalness through Neural Language Models (***C-BERT***).
- PYMT5: multi-mode translation of natural language and PYTHON code with transformers
- DOBF: A Deobfuscation Pre-Training Objective for Programming Languages - Baptiste Roziere, Marie-Anne Lachaux, Marc Szafraniec, Guillaume Lample (arXiv 2021) (***DOBF***).
- Studying the Usage of Text-To-Text Transfer Transformer to Support Code-Related Tasks
- Disentangled Code Representation Learning for Multiple Programming Languages (ACL Findings 2021) (***CODEDISEN***).
- SYNCOBERT: Syntax-Guided Multi-Modal Contrastive Pre-Training for Code Representation
- TreeBERT: A Tree-Based Pre-Trained Model for Programming Language
- Empirical Study of Transformers for Source Code
- GraphCodeBERT: Pre-training Code Representations with Data Flow - Daya Guo, Shuo Ren, Shuai Lu, Zhangyin Feng, Duyu Tang, Shujie Liu, Long Zhou, Nan Duan, Alexey Svyatkovskiy, Shengyu Fu, Michele Tufano, Shao Kun Deng, Colin Clement, Dawn Drain, Neel Sundaresan, Jian Yin, Daxin Jiang, Ming Zhou (ICLR 2021) (***GraphCodeBERT***).
- CodeTrans: Towards Cracking the Language of Silicone’s Code Through Self-Supervised Deep Learning and High Performance Computing
- Self-Supervised Learning for Code Retrieval and Summarization through Semantic-Preserving Program Transformations - Nghi D. Q. Bui, Yijun Yu, Lingxiao Jiang (SIGIR 2021) (***Corder***).
Talks and Tutorials
Dataset and Benchmark
Papers (this list is a bit outdated and needs updating)