Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
List: awesome-programming-language-pretraining-papers
Recent Advances in Programming Language Pre-Trained Models (PL-PTMs)
- Host: GitHub
- URL: https://github.com/yuewang-cuhk/awesome-programming-language-pretraining-papers
- Owner: yuewang-cuhk
- Created: 2021-05-22T13:13:42.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2021-12-17T02:47:30.000Z (about 3 years ago)
- Last Synced: 2024-05-19T14:00:56.205Z (7 months ago)
- Size: 20.5 KB
- Stars: 56
- Watchers: 2
- Forks: 10
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- ultimate-awesome - awesome-programming-language-pretraining-papers - Recent Advances in Programming Language Pre-Trained Models (PL-PTMs). (Other Lists / Monkey C Lists)
README
# Recent Advances in Programming Language Pre-Trained Models (PL-PTMs)
Maintained by WANG Yue ([email protected]). Last updated on 2021/12/17.

## General PL-PTMs
[Learning and Evaluating Contextual Embedding of Source Code](https://arxiv.org/abs/2001.00059), \[[code](https://github.com/google-research/google-research/tree/master/cubert)\] ICML 2020 (CuBERT)
[CodeBERT: A Pre-Trained Model for Programming and Natural Languages](https://arxiv.org/abs/2002.08155), \[[code](https://github.com/microsoft/CodeBERT)\] EMNLP 2020 Findings (CodeBERT)
[GraphCodeBERT: Pre-training Code Representations with Data Flow](https://arxiv.org/abs/2009.08366), \[[code](https://github.com/microsoft/CodeBERT/tree/master/GraphCodeBERT)\] ICLR 2021 (GraphCodeBERT)
[Unified Pre-training for Program Understanding and Generation](https://arxiv.org/abs/2103.06333), \[[code](https://github.com/wasiahmad/PLBART)\] NAACL 2021 (PLBART)
[Unsupervised Translation of Programming Languages](https://arxiv.org/abs/2006.03511), \[[code](https://github.com/facebookresearch/TransCoder)\] NeurIPS 2020 (TransCoder)
[Exploring Software Naturalness through Neural Language Models](https://arxiv.org/abs/2006.12641), arXiv 2020/06 (C-BERT)
[PYMT5: multi-mode translation of natural language and PYTHON code with transformers](https://arxiv.org/abs/2010.03150), EMNLP 2020 (PYMT5)
[Contrastive Code Representation Learning](https://arxiv.org/abs/2007.04973), \[[code](https://github.com/parasj/contracode)\] arXiv 2020/07 (ContraCode)
[DOBF: A Deobfuscation Pre-Training Objective for Programming Languages](https://arxiv.org/abs/2102.07492), arXiv 2021/02 (DOBF)
[Studying the Usage of Text-To-Text Transfer Transformer to Support Code-Related Tasks](https://arxiv.org/abs/2102.02017), \[[code](https://github.com/antonio-mastropaolo/T5-learning-ICSE_2021)\] ICSE 2021
[CodeTrans: Towards Cracking the Language of Silicone’s Code Through Self-Supervised Deep Learning and High Performance Computing](https://arxiv.org/abs/2104.02443), \[[code](https://github.com/agemagician/CodeTrans)\] arXiv 2021/04 (CodeTrans)
[How could Neural Networks understand Programs?](https://arxiv.org/pdf/2105.04297.pdf), \[[code](https://github.com/pdlan/OSCAR)\] ICML 2021 (OSCAR)
[CoTexT: Multi-task Learning with Code-Text Transformer](https://arxiv.org/abs/2105.08645), arXiv 2021/05 (CoTexT)
[Disentangled Code Representation Learning for Multiple Programming Languages](https://aclanthology.org/2021.findings-acl.391.pdf), ACL Findings 2021 (CODEDISEN)
[SYNCOBERT: Syntax-Guided Multi-Modal Contrastive Pre-Training for Code Representation](https://arxiv.org/pdf/2108.04556v3.pdf), arXiv 2021/09 (SYNCOBERT)
[TreeBERT: A Tree-Based Pre-Trained Model for Programming Language](https://arxiv.org/abs/2105.12485), UAI 2021
[CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation](https://arxiv.org/pdf/2109.00859.pdf), EMNLP 2021 \[[code](https://github.com/salesforce/CodeT5)\] \[[blog](https://blog.einstein.ai/codet5/)\] \[[media](https://venturebeat.com/2021/09/07/salesforces-codet5-system-can-understand-and-generate-code/)\] \[[slide](https://yuewang-cuhk.github.io/file/CodeT5_final_slide_p20.pdf)\] \[[poster](https://yuewang-cuhk.github.io/file/CodeT5_Poster.pdf)\]
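Several of the general PL-PTMs above release checkpoints on the Hugging Face Hub. Below is a minimal sketch (assuming the `transformers` library and the `microsoft/codebert-base` / `Salesforce/codet5-base` checkpoint names from the official releases; verify them on the Hub) of loading an encoder-only and an encoder-decoder model.

```python
# Minimal sketch: loading two of the pre-trained checkpoints listed above via
# Hugging Face Transformers. The checkpoint names are assumptions based on the
# official releases; check them on the Hub before use.
from transformers import AutoModel, AutoTokenizer, T5ForConditionalGeneration

# CodeBERT: encoder-only, useful for code/NL embeddings.
cb_tok = AutoTokenizer.from_pretrained("microsoft/codebert-base")
cb_model = AutoModel.from_pretrained("microsoft/codebert-base")

code = "def add(a, b): return a + b"
inputs = cb_tok(code, return_tensors="pt")
embedding = cb_model(**inputs).last_hidden_state.mean(dim=1)  # pooled code vector

# CodeT5: encoder-decoder, usable for generation-style tasks after fine-tuning.
t5_tok = AutoTokenizer.from_pretrained("Salesforce/codet5-base")
t5_model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-base")

# CodeT5 pre-training uses identifier-aware span masking; <extra_id_0> marks a masked span.
out = t5_model.generate(
    **t5_tok("def <extra_id_0>(a, b): return a + b", return_tensors="pt"),
    max_length=10,
)
print(t5_tok.decode(out[0], skip_special_tokens=True))
```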
## Task-specific PL-PTMs
**Code Completion:** [Multi-task Learning based Pre-trained Language Model for Code Completion](https://arxiv.org/abs/2012.14631), ASE 2020 (CugLM)
**Code Completion:** [IntelliCode Compose: Code Generation using Transformer](https://arxiv.org/abs/2005.08025), FSE 2020 (IntelliCode Compose)
**Code Completion:** [Improving Code Autocompletion with Transfer Learning](https://arxiv.org/abs/2105.05991), arXiv 2021/05
**Program Repair:** [Generating Bug-Fixes Using Pretrained Transformers](https://arxiv.org/abs/2104.07896), arXiv 2021/04 (DeepCode)
**Program Repair:** [DeepDebug: Fixing Python Bugs Using Stack Traces, Backtranslation, and Code Skeletons](https://arxiv.org/pdf/2105.09352.pdf), arXiv 2021/05 (DeepDebug)
**Program Repair:** [TFix: Learning to Fix Coding Errors with a Text-to-Text Transformer](https://files.sri.inf.ethz.ch/website/papers/icml21-tfix.pdf), ICML 2021
**Program Repair:** [CURE: Code-Aware Neural Machine Translation for Automatic Program Repair](https://arxiv.org/abs/2103.00073), ICSE 2021
**Unit Test Generation:** [Unit Test Case Generation with Transformers and Focal Context](https://arxiv.org/pdf/2009.05617.pdf), arXiv 2021/05
**Code Generation:** [Evaluating Large Language Models Trained on Code](https://arxiv.org/abs/2107.03374), arXiv 2021/07 (Codex)
**Code Generation:** [Program Synthesis with Large Language Models](https://arxiv.org/abs/2108.07732), arXiv 2021/08
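Both code-generation papers above evaluate functional correctness with pass@k over sampled completions; the Codex paper derives an unbiased estimator for it, reproduced here as a short numpy sketch.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from the Codex paper (Chen et al., 2021).

    n: total completions sampled for a problem
    c: number of those completions that pass the unit tests
    k: completion budget assumed by the metric
    """
    if n - c < k:
        return 1.0  # too few failures for any size-k sample to miss
    # 1 - P(all k sampled completions are incorrect)
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Example: 200 samples per problem, 23 pass the tests
print(pass_at_k(200, 23, 1), pass_at_k(200, 23, 10))
```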
## Other Deep Models for Code-related Tasks
[Language-Agnostic Representation Learning of Source Code from Structure and Context](https://arxiv.org/abs/2103.11318), \[[code](https://github.com/danielzuegner/code-transformer)\] ICLR 2021 (Code Transformer)
[GN-Transformer: Fusing AST and Source Code information in Graph Networks](https://openreview.net/forum?id=XavM6v_q59q), OpenReview 2020/09 (GN-Transformer)
**Program Repair:** [Hoppity: Learning Graph Transformations to Detect and Fix Bugs in Programs](https://openreview.net/forum?id=SJeqs6EFvB), ICLR 2020 (HOPPITY)
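The structure-aware models in this section consume ASTs, or graphs built from them, in addition to raw tokens. Below is a toy sketch, using only Python's standard `ast` module, of the node/edge structure they operate on; the pipelines in these papers are considerably richer and language-agnostic.

```python
# Toy sketch of the structural input used by AST/graph-based code models
# (Code Transformer, GN-Transformer, HOPPITY): parse source into an AST and
# collect its nodes and parent->child edges. Standard library only.
import ast

source = "def add(a, b):\n    return a + b\n"
tree = ast.parse(source)

nodes, edges = [], []
for parent in ast.walk(tree):
    nodes.append(type(parent).__name__)
    for child in ast.iter_child_nodes(parent):
        edges.append((type(parent).__name__, type(child).__name__))

print(nodes)   # e.g. ['Module', 'FunctionDef', 'arguments', 'Return', ...]
print(edges)   # parent -> child pairs forming the tree structure
```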
## Benchmarks & Datasets
[CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation](https://arxiv.org/abs/2102.04664), \[[code](https://github.com/microsoft/CodeXGLUE)\] arXiv 2021/02
[Project CodeNet: A Large-Scale AI for Code Dataset for Learning a Diversity of Coding Tasks](https://github.com/IBM/Project_CodeNet/blob/main/ProjectCodeNet_NeurIPS2021.pdf), NeurIPS 2021 \[[code](https://github.com/IBM/Project_CodeNet)\]
[Measuring Coding Challenge Competence With APPS](https://arxiv.org/pdf/2105.09938.pdf), arXiv 2021/05
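For quick experiments, the CodeXGLUE tasks are also mirrored on the Hugging Face Hub. A minimal sketch of loading the code-to-text (Python) training split with the `datasets` library follows; the dataset id is the community mirror's name and is an assumption to verify against the Hub, since the canonical data lives in the microsoft/CodeXGLUE repository.

```python
# Minimal sketch: loading one CodeXGLUE task via the `datasets` library.
# The dataset id below refers to the community mirror on the Hugging Face Hub
# and is an assumption; the canonical data is in the microsoft/CodeXGLUE repo.
from datasets import load_dataset

ds = load_dataset("code_x_glue_ct_code_to_text", "python", split="train")
example = ds[0]
print(example["code"][:80])        # source function
print(example["docstring"][:80])   # reference summary
```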