awesome-software-engineering-research

A curated list of software engineering research, data sets, and tools.
https://github.com/FudanSELab/awesome-software-engineering-research

PapersWithCode
- Query Refinement/Expansion
  - Chatbot4QR
- Stack Overflow
  - AnswerBot
- Method Name Generation
  - debug-method-name-2019-ICSE - Work on real-world code bases, presented at ICSE 2019. The repository includes data.
  - DeepName-2021-ICSE - An automated approach for method name consistency checking and suggestion.
- Documentation Bad Smell
  - DocSmell Benchmark
Data Sets and Benchmarks
- API Usage Pattern Recommendation
  - FOCUS - A context-aware collaborative-filtering system that exploits cross relationships among OSS projects to suggest the inclusion of additional API invocations and concrete API usage patterns. Paper: "FOCUS: A Recommender System for Mining API Function Calls and Usage Patterns".
- Bug Localization
  - EMSE Bug Location Data Set - Data set accompanying the paper "... Retrieval-based Bug Localization" by Oscar Chaparro, Juan Manuel Florez, and Andrian Marcus.
  - DeepLocalize
  - TSSB-3M - Large-scale datasets of single-statement bug fixes in Python.
- Others
  - Stack Exchange - Anonymized dump of all user-contributed content on the Stack Exchange network.
- Code Language
  - Code-LMs - A multi-lingual code corpus used to train language models, plus some pretrained language models for code, e.g., GPT-2 and PolyCoder. [Paper](https://arxiv.org/pdf/2202.13169.pdf).
- Test Oracle Generation
  - Toga
- Microservice System
  - Train Ticket: A Benchmark Microservice System
- NL-based Code Search
  - facebook Neural-Code-Search-Evaluation-Dataset
  - CosBench
  - codesearchnet - H. Husain, H.-H. Wu, T. Gazit, M. Allamanis, and M. Brockschmidt, “CodeSearchNet challenge: Evaluating the state of semantic code search,” arXiv, vol. abs/1909.09436, 2019. Repository: github/codesearchnet.
- Code-Related Task Benchmark
  - CodeXGLUE - A benchmark dataset and open challenge for code intelligence. It includes 14 datasets for 10 diversified code intelligence tasks covering the following scenarios: 1) code-code (clone detection, defect detection, cloze test, code completion, code repair, and code-to-code translation); 2) text-code (natural language code search, text-to-code generation); 3) code-text (code summarization); 4) text-text (documentation translation).
  - Project CodeNet - Provides the AI-for-Code research community with a large-scale, diverse, high-quality curated dataset to drive innovation in AI techniques. Project CodeNet contains approximately 14 million code samples, each of which is an intended solution to one of 4000 coding problems. It aims to do for AI for Code what ImageNet did for computer vision.
- Library-Oriented Code Generation
  - PyCodeGPT - Continual pre-training on sketches for library-oriented code generation.
- API Misuse
  - MUBench - A benchmark for API-misuse detectors, based on the MUBench benchmarking dataset. If you encounter any problems using MUBench, please report them to the maintainers. For questions, contact Sven Amann.
  - CryptoAPI-Bench
- Variable Misuse
  - great - [Great](https://github.com/VHellendoorn/ICLR20-Great), the dataset for the variable-misuse task, described in the ICLR 2020 paper 'Global Relational Models of Source Code' (https://openreview.net/forum?id=B1lnbRNtwr). The repository contains the data and code to replicate that paper on models of source code that combine global and structural information, including the Graph-Sandwich model family and the GREAT (Graph-Relational Embedding Attention Transformer) model.
- Programming-Language Understanding and Repair
  - PLUR - PLUR (Programming-Language Understanding and Repair) is a collection of source code datasets suitable for graph-based machine learning. It provides scripts for downloading, processing, and loading the datasets through a unified API and common data structures.
- Feature Location
  - Deny Benchmark on Feature Location - Benchmark accompanying the paper "... A Taxonomy and Survey". It provides feature location data sets for ArgoUML, Eclipse, JabRef, jEdit, and muCommander.
- API Recommendation
  - BIKER - Data from the ASE paper "... API Knowledge Gap" ([Paper](https://dl.acm.org/doi/10.1145/3238147.3238191)), including about 400 API retrieval tasks from Stack Overflow.
Other Resource Lists
- Code Analysis

Programming Languages

Python (12), Java (3), Jupyter Notebook (1), C# (1)
