Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
awesome-ai4code
A collection of recent papers, benchmarks, and datasets in the AI4Code domain.
https://github.com/bdqnghi/awesome-ai4code
Last synced: 4 days ago
Tools/Products
AI code completion tools
More General Coding Assistants
ChatGPT in your editor
LLM-powered natural language compilers
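As a rough sketch of what the "LLM-powered natural language compilers" above do, the snippet below prompts a chat model to turn a natural-language spec into source code. This is an illustration under assumptions, not the implementation of any listed tool; the model name and prompt are placeholders.

```python
# Minimal sketch: "compile" a natural-language spec to code with an LLM.
# Assumes the openai Python package (>=1.0) and OPENAI_API_KEY in the env.
from openai import OpenAI

client = OpenAI()

def compile_nl(spec: str, language: str = "python") -> str:
    """Ask a chat model to emit code implementing the spec."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder: any code-capable chat model
        messages=[
            {"role": "system",
             "content": f"You are a compiler. Reply with only valid {language} code."},
            {"role": "user", "content": spec},
        ],
    )
    return response.choices[0].message.content

print(compile_nl("a function that returns the n-th Fibonacci number"))
```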
Academic
Conferences
- Automated Software Engineering (ASE)
- Programming Language Design and Implementation (PLDI)
- International Conference on Learning Representations (ICLR)
- Empirical Methods in Natural Language Processing (EMNLP)
- North American Chapter of the Association for Computational Linguistics (NAACL)
- Annual Meeting of the Association for Computational Linguistics (ACL)
- International Conference on Software Engineering (ICSE)
- Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE)
- International Conference on Machine Learning (ICML)
- Conference on Neural Information Processing Systems (NeurIPS)
Papers (this list is a bit outdated and needs updating)
- Large Language Models of Code Fail at Completing Code with Potential Bugs - Tuan Dinh, Jinman Zhao, Samson Tan, Renato Negrinho, Leonard Lausen, Sheng Zha, George Karypis.
- Large Language Models Meet NL2Code: A Survey - Daoguang Zan, Bei Chen, Fengji Zhang, Dianjie Lu, Bingchao Wu, Bei Guan, Yongji Wang, Jian-Guang Lou (EMNLP 2023)
- RepoFusion: Training Code Models to Understand Your Repository - Disha Shrivastava, Denis Kocetkov, Harm de Vries, Dzmitry Bahdanau, Torsten Scholak
- XCODEEVAL: An Execution-based Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval - Mohammad Abdullah Matin Khan, M Saiful Bari, Xuan Long Do, Weishi Wang, Md Rizwan Parvez, Shafiq Joty (an execution-based scoring sketch follows this list)
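XCODEEVAL above is execution-based: a candidate program counts as correct only if it actually runs and produces the expected output on the test cases. Below is a minimal sketch of that scoring idea; the harness and test cases are invented for illustration and are not the benchmark's real evaluation code.

```python
# Minimal sketch of execution-based scoring: run a generated program
# against (stdin, expected stdout) pairs; all must match to pass.
import subprocess
import sys

def passes_tests(candidate_src: str, tests: list[tuple[str, str]]) -> bool:
    for stdin_data, expected in tests:
        result = subprocess.run(
            [sys.executable, "-c", candidate_src],
            input=stdin_data,
            capture_output=True,
            text=True,
            timeout=5,  # guard against non-terminating candidates
        )
        if result.returncode != 0 or result.stdout.strip() != expected.strip():
            return False
    return True

candidate = "print(int(input()) * 2)"  # stand-in for a model-generated program
print(passes_tests(candidate, [("2", "4"), ("21", "42")]))  # True
```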
Pretrained CodeLLMs
Papers (this list is a bit outdated and needs updating)
- CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation - Yue Wang, Weishi Wang, Shafiq Joty, Steven C.H. Hoi (EMNLP 2021) (***CodeT5***) (a usage sketch follows this list).
- CodeBERT: A Pre-Trained Model for Programming and Natural Language - Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, Ming Zhou (EMNLP 2020 Findings) (***CodeBERT***).
- Learning and Evaluating Contextual Embedding of Source Code - Aditya Kanade, Petros Maniatis, Gogul Balakrishnan, Kensen Shi. (ICML 2020) (***CuBERT***).
- InferCode: Self-Supervised Learning of Code Representations by Predicting Subtrees - Nghi D. Q. BUI, Yijun YU, Lingxiao JIANG (ICSE 2021) (***InferCode***).
- Unsupervised Translation of Programming Languages - Marie-Anne Lachaux, Baptiste Roziere, Lowik Chanussot, Guillaume Lample (NeurIPS 2020) (***Transcoder***).
- Contrastive Code Representation Learning
- CoTexT: Multi-task Learning with Code-Text Transformer
- How could Neural Networks understand Programs? - Dinglan Peng, Shuxin Zheng, Yatao Li, Guolin Ke, Di He, Tie-Yan Liu (ICML 2021) (***OSCAR***).
- Unified Pre-training for Program Understanding and Generation - Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang (NAACL 2021) (***PLBART***).
- Exploring Software Naturalness through Neural Language Models (***C-BERT***).
- PyMT5: multi-mode translation of natural language and Python code with transformers
- DOBF: A Deobfuscation Pre-Training Objective for Programming Languages - Baptiste Roziere, Marie-Anne Lachaux, Marc Szafraniec, Guillaume Lample (arXiv 2021) (***DOBF***).
- Studying the Usage of Text-To-Text Transfer Transformer to Support Code-Related Tasks
- Disentangled Code Representation Learning for Multiple Programming Languages (ACL Findings 2021) (***CODEDISEN***).
- SYNCOBERT: Syntax-Guided Multi-Modal Contrastive Pre-Training for Code Representation
- TreeBERT: A Tree-Based Pre-Trained Model for Programming Language
- Empirical Study of Transformers for Source Code
- CodeTrans: Towards Cracking the Language of Silicon's Code Through Self-Supervised Deep Learning and High Performance Computing
- Self-Supervised Learning for Code Retrieval and Summarization through Semantic-Preserving Program Transformations - Nghi D. Q. BUI, Yijun YU, Lingxiao JIANG (SIGIR 2021) (***Corder***).
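Several of the pretrained models above are published on the Hugging Face Hub and load through the transformers library. As one concrete usage sketch, the snippet below (adapted from the Salesforce/codet5-base model card) uses CodeT5 for masked-span infilling; other checkpoints load the same way with their own model classes.

```python
# Masked-span infilling with CodeT5 (adapted from the Salesforce/codet5-base
# model card): the model predicts the code hidden behind <extra_id_0>.
from transformers import RobertaTokenizer, T5ForConditionalGeneration

tokenizer = RobertaTokenizer.from_pretrained("Salesforce/codet5-base")
model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-base")

text = "def greet(user): print(f'hello <extra_id_0>!')"
input_ids = tokenizer(text, return_tensors="pt").input_ids

generated_ids = model.generate(input_ids, max_length=10)
# expected to decode to something like "user"
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
```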
Talks and Tutorials
Dataset and Benchmark
Papers (this list is a bit outdated and needs updating)
Keywords

program-synthesis (3), ai (3), typescript (2), python (2), chatgpt (2), ml (2), machine-learning (2), data-science (1), data (1), cnn (1), bert (1), robotics (1), lean (1), vscode (1), gpt-4 (1), gpt-3 (1), llms (1), gpt (1), generative-models (1), emacs (1), search (1), react (1), javascript (1), frontend (1), language-model (1), documentation-generator (1), stylometry (1), source (1), jam-programming-competition (1), google-code-jam (1), dataset (1), contest (1), code (1), authorship-recognition (1), authorship-identification (1), authorship-attribution (1), puzzles (1), programming-competitions (1), code-generation (1), tensorflow (1), self-attention (1), rnn (1), representation-learning (1), programming-language-theory (1), open-data (1), nlp-machine-learning (1), nlp (1), neural-networks (1), natural-language-processing (1), machine-learning-on-source-code (1)