Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
awesome-software-engineering-research
A curated list of software engineering research, data sets, and tools.
https://github.com/FudanSELab/awesome-software-engineering-research
PapersWithCode
Query Refinement/Expansion
Stack Overflow
Method Name Generation
- debug-method-name-2019-ICSE - [...] real-world code bases. This work will be presented at ICSE 2019. This re[...] including data with [...].
- DeepName-2021-ICSE - A Context-based Automated Approach for Method Name Consistency Checking and Suggestion.
Documentation Bad Smell
Data Sets and Benchmarks
API Usage Pattern Recommendation
- FOCUS - A context-aware collaborative-filtering system that exploits cross relationships among OSS projects to suggest the inclusion of additional API invocations and concrete API usage patterns. Paper: "FOCUS: A Recommender System for Mining API Function Calls and Usage Patterns".
Bug Localization
- EMSE Bug Location Data Set - Retrieval-based Bug Localization" by Oscar Chaparro, Juan Manuel Florez, Andrian Marcus.
- DeepLocalize
- TSSB-3M - scale datasets of single statement bug fixes in Python.
Others
- Stack Exchange - Anonymized dump of all user-contributed content on the Stack Exchange network.
Code Language
- Code-LMs - A multi-lingual code corpus used to train language models, together with some pretrained language models for code, e.g., GPT-2 and PolyCoder. [Paper](https://arxiv.org/pdf/2202.13169.pdf).
Test Oracle Generation
Microservice System
NL-based Code Search
- facebook Neural-Code-Search-Evaluation-Dataset
- CosBench
- codesearchnet - H. Husain, H.-H. Wu, T. Gazit, M. Allamanis, and M. Brockschmidt, “CodeSearchNet Challenge: Evaluating the State of Semantic Code Search,” ArXiv, vol. abs/1909.09436, 2019, github/codesearchnet.
Code-Related Task Benchmark
- CodeXGLUE - A benchmark dataset and open challenge for code intelligence. It includes 14 datasets for 10 diversified code intelligence tasks covering the following scenarios: 1) code-code (clone detection, defect detection, cloze test, code completion, code repair, and code-to-code translation); 2) text-code (natural language code search, text-to-code generation); 3) code-text (code summarization); 4) text-text (documentation translation).
- Project CodeNet - Provides the AI-for-Code research community with a large-scale, diverse, and high-quality curated dataset to drive innovation in AI techniques. Project CodeNet is a large-scale dataset with approximately 14 million code samples, each of which is an intended solution to one of 4000 coding problems. Project CodeNet aims to do for AI for Code what ImageNet did for computer vision.
Library-Oriented Code Generation
- PyCodeGPT - CERT: Continual Pre-Training on Sketches for Library-Oriented Code Generation
API Misuse
- MUBench - misuse detectors, based on the MUBench benchmarking dataset. If you encounter any problems using MUBench, please report them to us. If you have any questions, please contact Sven Amann.
- CryptoAPI-Bench
Variable Misuse
- great - [ICLR20-Great](https://github.com/VHellendoorn/ICLR20-Great), the dataset for the variable-misuse task, described in the ICLR 2020 paper 'Global Relational Models of Source Code' (https://openreview.net/forum?id=B1lnbRNtwr). This repository contains the data and code to replicate our ICLR 2020 paper on models of source code that combine global and structural information, including the Graph-Sandwich model family and the GREAT (Graph-Relational Embedding Attention Transformer) model.
Programming-Language Understanding and Repair
- PLUR - PLUR (Programming-Language Understanding and Repair) is a collection of source code datasets suitable for graph-based machine learning. It offers a unified API and data structures for all datasets, along with scripts for downloading, processing, and loading them.
API Recommendation
- BIKER - "API Method Recommendation without Worrying about the Task-API Knowledge Gap", ASE, [Paper](https://dl.acm.org/doi/10.1145/3238147.3238191), including about 400 API retrieval tasks from Stack Overflow.
Feature Location
- Deny Benchmark on Feature Location - A Taxonomy and Survey". It provides feature location data sets for ArgoUML, Eclipse, JabRef, jEdit, and muCommander.
Other Resource Lists
Programming Languages
Sub Categories
- Code Analysis: 6
- Bug Localization: 3
- NL-based Code Search: 3
- Method Name Generation: 3
- Stack Overflow: 2
- Code-Related Task Benchmark: 2
- API Misuse: 2
- Feature Location: 1
- Others: 1
- API Recommendation: 1
- Variable Misuse: 1
- Documentation Bad Smell: 1
- Library-Oriented Code Generation: 1
- Microservice System: 1
- Query Refinement/Expansion: 1
- Test Oracle Generation: 1
- Code Language: 1
- Programming-Language Understanding and Repair: 1
- API Usage Pattern Recommendation: 1
Keywords
- deep-learning: 3
- datasets: 2
- machine-learning: 2
- aaron-swartz: 1
- awesome-public-datasets: 1
- opendata: 1
- gpt-2: 1
- source-code: 1
- bugs: 1
- dataset: 1
- mining: 1
- program-synthesis: 1
- research: 1
- software-engineering: 1
- bert: 1
- cnn: 1
- data: 1
- data-science: 1
- machine-learning-on-source-code: 1
- ml: 1
- natural-language-processing: 1
- neural-networks: 1
- nlp: 1
- nlp-machine-learning: 1
- open-data: 1
- programming-language-theory: 1
- python: 1
- representation-learning: 1
- rnn: 1
- self-attention: 1
- tensorflow: 1