# Recent Advances in Programming Language Pre-Trained Models (PL-PTMs)
Maintained by WANG Yue ([email protected]). Last updated on 2021/12/17.

## General PL-PTMs

[Learning and Evaluating Contextual Embedding of Source Code](https://arxiv.org/abs/2001.00059), \[[code](https://github.com/google-research/google-research/tree/master/cubert)\] ICML 2020 (CuBERT)

[CodeBERT: A Pre-Trained Model for Programming and Natural Languages](https://arxiv.org/abs/2002.08155), \[[code](https://github.com/microsoft/CodeBERT)\] EMNLP 2020 Findings (CodeBERT)
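
A minimal usage sketch (mine, not the authors'): CodeBERT ships a public `microsoft/codebert-base` checkpoint on the Hugging Face Hub, and since it is pre-trained on NL-PL pairs, a natural-language query and a code snippet can be encoded jointly:

```python
# Sketch: contextual embeddings from the released CodeBERT checkpoint.
# Assumes: pip install torch transformers (and network access to the Hub).
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base")

# CodeBERT is trained on (natural language, code) pairs, so the two
# segments are passed as a sentence pair.
inputs = tokenizer("return the larger value",
                   "def max(a, b): return a if a > b else b",
                   return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # shape: (1, seq_len, 768)
print(hidden.shape)
```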

[GraphCodeBERT: Pre-training Code Representations with Data Flow](https://arxiv.org/abs/2009.08366), \[[code](https://github.com/microsoft/CodeBERT/tree/master/GraphCodeBERT)\] ICLR 2021 (GraphCodeBERT)

[Unified Pre-training for Program Understanding and Generation](https://arxiv.org/abs/2103.06333), \[[code](https://github.com/wasiahmad/PLBART)\] NAACL 2021 (PLBART)

[Unsupervised Translation of Programming Languages](https://arxiv.org/abs/2006.03511), \[[code](https://github.com/facebookresearch/TransCoder)\] NeurIPS 2020 (TransCoder)

[Exploring Software Naturalness through Neural Language Models](https://arxiv.org/abs/2006.12641), arXiv 2020/06 (C-BERT)

[PyMT5: multi-mode translation of natural language and Python code with transformers](https://arxiv.org/abs/2010.03150), EMNLP 2020 (PyMT5)

[Contrastive Code Representation Learning](https://arxiv.org/abs/2007.04973), \[[code](https://github.com/parasj/contracode)\] arXiv 2020/07 (ContraCode)

[DOBF: A Deobfuscation Pre-Training Objective for Programming Languages](https://arxiv.org/abs/2102.07492), arXiv 2021/02 (DOBF)

[Studying the Usage of Text-To-Text Transfer Transformer to Support Code-Related Tasks](https://arxiv.org/abs/2102.02017), \[[code](https://github.com/antonio-mastropaolo/T5-learning-ICSE_2021)\] ICSE 2021

[CodeTrans: Towards Cracking the Language of Silicone’s Code Through Self-Supervised Deep Learning and High Performance Computing](https://arxiv.org/abs/2104.02443), \[[code](https://github.com/agemagician/CodeTrans)\] arXiv 2021/04 (CodeTrans)

[How could Neural Networks understand Programs?](https://arxiv.org/pdf/2105.04297.pdf), \[[code](https://github.com/pdlan/OSCAR)\] ICML 2021 (OSCAR)

[CoTexT: Multi-task Learning with Code-Text Transformer](https://arxiv.org/abs/2105.08645), arXiv 2021/05 (CoTexT)

[Disentangled Code Representation Learning for Multiple Programming Languages](https://aclanthology.org/2021.findings-acl.391.pdf), ACL 2021 Findings (CODEDISEN)

[SynCoBERT: Syntax-Guided Multi-Modal Contrastive Pre-Training for Code Representation](https://arxiv.org/pdf/2108.04556v3.pdf), arXiv 2021/09 (SynCoBERT)

[TreeBERT: A Tree-Based Pre-Trained Model for Programming Language](https://arxiv.org/abs/2105.12485), UAI 2021 (TreeBERT)

[CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation](https://arxiv.org/pdf/2109.00859.pdf), \[[code](https://github.com/salesforce/CodeT5)\] \[[blog](https://blog.einstein.ai/codet5/)\] \[[media](https://venturebeat.com/2021/09/07/salesforces-codet5-system-can-understand-and-generate-code/)\] \[[slide](https://yuewang-cuhk.github.io/file/CodeT5_final_slide_p20.pdf)\] \[[poster](https://yuewang-cuhk.github.io/file/CodeT5_Poster.pdf)\] EMNLP 2021 (CodeT5)
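
The CodeT5 release includes a `Salesforce/codet5-base` checkpoint on the Hugging Face Hub; a minimal span-infilling sketch adapted from the model card (the `<extra_id_0>` sentinel marks the span to predict, as in T5):

```python
# Sketch: span infilling with the released CodeT5 checkpoint.
# Assumes: pip install torch transformers.
from transformers import RobertaTokenizer, T5ForConditionalGeneration

tokenizer = RobertaTokenizer.from_pretrained("Salesforce/codet5-base")
model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-base")

text = "def greet(user): print(f'hello <extra_id_0>!')"
input_ids = tokenizer(text, return_tensors="pt").input_ids

# The model fills in the masked span, e.g. "{user}".
generated = model.generate(input_ids, max_length=10)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```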

## Task-specific PL-PTMs

**Code Completion:** [Multi-task Learning based Pre-trained Language Model for Code Completion](https://arxiv.org/abs/2012.14631), ASE 2020 (CugLM)

**Code Completion:** [IntelliCode Compose: Code Generation using Transformer](https://arxiv.org/abs/2005.08025), FSE 2020 (IntelliCode Compose)

**Code Completion:** [Improving Code Autocompletion with Transfer Learning](https://arxiv.org/abs/2105.05991), arXiv 2021/05

**Program Repair:** [Generating Bug-Fixes Using Pretrained Transformers](https://arxiv.org/abs/2104.07896), arXiv 2021/04 (DeepCode)

**Program Repair:** [DeepDebug: Fixing Python Bugs Using Stack Traces, Backtranslation, and Code Skeletons](https://arxiv.org/pdf/2105.09352.pdf), arXiv 2021/05 (DeepDebug)

**Program Repair:** [TFix: Learning to Fix Coding Errors with a Text-to-Text Transformer](https://files.sri.inf.ethz.ch/website/papers/icml21-tfix.pdf), ICML 2021

**Program Repair:** [CURE: Code-Aware Neural Machine Translation for Automatic Program Repair](https://arxiv.org/abs/2103.00073), ICSE 2021

**Unit Test Generation:** [Unit Test Case Generation with Transformers and Focal Context](https://arxiv.org/pdf/2009.05617.pdf), arXiv 2021/05

**Code Generation:** [Evaluating Large Language Models Trained on Code](https://arxiv.org/abs/2107.03374), arXiv 2021/07 (Codex)

**Code Generation:** [Program Synthesis with Large Language Models](https://arxiv.org/abs/2108.07732), arXiv 2021/08
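
Both code-generation papers above evaluate functional correctness with pass@k (the probability that at least one of k sampled programs passes all unit tests) rather than match-based metrics. The Codex paper gives a numerically stable, unbiased estimator; here is a direct transcription:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from the Codex paper:
    n samples drawn per problem, c of which pass the tests."""
    if n - c < k:
        return 1.0  # every size-k subset contains a passing sample
    # 1 - C(n-c, k) / C(n, k), expanded as a stable running product
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

print(pass_at_k(n=200, c=4, k=1))  # 0.02
```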

## Other Deep Models for Code-related Tasks

[Language-Agnostic Representation Learning of Source Code from Structure and Context](https://arxiv.org/abs/2103.11318), \[[code](https://github.com/danielzuegner/code-transformer)\] ICLR 2021 (Code Transformer)

[GN-Transformer: Fusing AST and Source Code information in Graph Networks](https://openreview.net/forum?id=XavM6v_q59q), OpenReview 2020/09 (GN-Transformer)

**Program Repair:** [Hoppity: Learning Graph Transformations to Detect and Fix Bugs in Programs](https://openreview.net/forum?id=SJeqs6EFvB), ICLR 2020 (Hoppity)

## Benchmarks & Datasets

[CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation](https://arxiv.org/abs/2102.04664), \[[code](https://github.com/microsoft/CodeXGLUE)\] arXiv 2021/02
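
Most CodeXGLUE tasks are also mirrored as Hugging Face datasets; a loading sketch, assuming the `code_x_glue_ct_code_to_text` mirror of the code-summarization task (the dataset ID and field names are the Hub mirror's, worth double-checking against the repo above):

```python
# Sketch: loading the CodeXGLUE code-to-text (Python) task from its
# Hugging Face Hub mirror. Assumes: pip install datasets.
from datasets import load_dataset

ds = load_dataset("code_x_glue_ct_code_to_text", "python", split="train")
print(ds[0]["code"])       # a Python function
print(ds[0]["docstring"])  # its reference summary
```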

[Project CodeNet: A Large-Scale AI for Code Dataset for Learning a Diversity of Coding Tasks](https://github.com/IBM/Project_CodeNet/blob/main/ProjectCodeNet_NeurIPS2021.pdf), \[[code](https://github.com/IBM/Project_CodeNet)\] NeurIPS 2021

[Measuring Coding Challenge Competence With APPS](https://arxiv.org/pdf/2105.09938.pdf), arXiv 2021/05