awesome-ai4code
A collection of recent papers, benchmarks, and datasets in the AI4Code domain.
https://github.com/bdqnghi/awesome-ai4code
-
Academic
-
Conferences
- International Conference on Software Engineering (ICSE)
- Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE)
- Automated Software Engineering (ASE)
- Programming Language Design and Implementation (PLDI)
- International Conference on Machine Learning (ICML)
- International Conference on Neural Information Processing Systems (NeurIPS)
- International Conference on Learning Representations (ICLR)
- Empirical Methods in Natural Language Processing (EMNLP)
- North American Chapter of the Association for Computational Linguistics (NAACL)
- Annual Meeting of the Association for Computational Linguistics (ACL)
- Association for the Advancement of Artificial Intelligence (AAAI)
-
Papers (this list is somewhat outdated and needs updating)
- Large Language Models of Code Fail at Completing Code with Potential Bugs - Tuan Dinh, Jinman Zhao, Samson Tan, Renato Negrinho, Leonard Lausen, Sheng Zha, George Karypis.
- Large Language Models Meet NL2Code: A Survey - Daoguang Zan, Bei Chen, Fengji Zhang, Dianjie Lu, Bingchao Wu, Bei Guan, Yongji Wang, Jian-Guang Lou (ACL 2023)
- RepoFusion: Training Code Models to Understand Your Repository - Disha Shrivastava, Denis Kocetkov, Harm de Vries, Dzmitry Bahdanau, Torsten Scholak
- XCODEEVAL: An Execution-based Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval - Mohammad Abdullah Matin Khan, M Saiful Bari, Xuan Long Do, Weishi Wang, Md Rizwan Parvez, Shafiq Joty
-
-
Dataset and Benchmark
-
Papers (this list is somewhat outdated and needs updating)
-
-
Pretrained CodeLLMs
-
Papers (this list is somewhat outdated and needs updating)
- CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation - Yue Wang, Weishi Wang, Shafiq Joty, Steven C.H. Hoi (EMNLP 2021) (***CodeT5***).
- CodeBERT: A Pre-Trained Model for Programming and Natural Languages - Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, Ming Zhou (EMNLP 2020 Findings) (***CodeBERT***).
- Learning and Evaluating Contextual Embedding of Source Code - Aditya Kanade, Petros Maniatis, Gogul Balakrishnan, Kensen Shi. (ICML 2020) (***CuBERT***).
- Self-Supervised Learning for Code Retrieval and Summarization through Semantic-Preserving Program Transformations - Nghi D. Q. BUI, Yijun YU, Lingxiao JIANG (SIGIR 2021) (***Corder***).
- Unsupervised Translation of Programming Languages - Marie-Anne Lachaux, Baptiste Roziere, Lowik Chanussot, Guillaume Lample (NeurIPS 2020) (***Transcoder***).
- Contrastive Code Representation Learning
- CoTexT: Multi-task Learning with Code-Text Transformer
- How could Neural Networks understand Programs? - Yan Liu (ICML 2021) (***OSCAR***)
- Unified Pre-training for Program Understanding and Generation - Wei Chang (NAACL 2021) (***PLBART***).
- Exploring Software Naturalness through Neural Language Models - BERT***).
- PYMT5: multi-mode translation of natural language and PYTHON code with transformers
- DOBF: A Deobfuscation Pre-Training Objective for Programming Languages - Anne Lachaux, Marc Szafraniec, Guillaume Lample (arXiv 2021) (***DOBF***).
- Studying the Usage of Text-To-Text Transfer Transformer to Support Code-Related Tasks
- CodeTrans: Towards Cracking the Language of Silicone’s Code Through Self-Supervised Deep Learning and High Performance Computing
- Disentangled Code Representation Learning for Multiple Programming Languages - Findings 2021) (***CODEDISEN***).
- SYNCOBERT: Syntax-Guided Multi-Modal Contrastive Pre-Training for Code Representation
- TreeBERT: A Tree-Based Pre-Trained Model for Programming Language
- Empirical Study of Transformers for Source Code
- GraphCodeBERT: Pre-training Code Representations with Data Flow - Daya Guo, Shuo Ren, Shuai Lu, Zhangyin Feng, Duyu Tang, Shujie Liu, Long Zhou, Nan Duan, Alexey Svyatkovskiy, Shengyu Fu, Michele Tufano, Shao Kun Deng, Colin Clement, Dawn Drain, Neel Sundaresan, Jian Yin, Daxin Jiang, Ming Zhou (ICLR 2021) (***GraphCodeBERT***).
- InferCode: Self-Supervised Learning of Code Representations by Predicting Subtrees - Nghi D. Q. BUI, Yijun YU, Lingxiao JIANG (ICSE 2021) (***InferCode***).
-
-
Talk and Tutorial
-
Talks and Tutorials
-
Papers (this list is somewhat outdated and needs updating)
-
-
Tools/Products
-
AI code completion tools
-
ChatGPT in your editor
-
LLM-powered natural language compilers
-
More General Coding Assistants
-