https://github.com/seart-group/dl4se
Building Training Datasets for Deep Learning Models in Software Engineering and Empirical Software Engineering Research
- Host: GitHub
- URL: https://github.com/seart-group/dl4se
- Owner: seart-group
- License: mit
- Created: 2022-02-09T20:57:32.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2024-06-26T15:00:44.000Z (6 months ago)
- Last Synced: 2024-06-27T16:09:10.889Z (6 months ago)
- Topics: dataset-generation, deep-learning, docker-compose, jsonl, liquibase, mining-software-repositories, mining-source-code, msr, postgresql, software-engineering, source-code-analysis, spring-boot, spring-boot-application, spring-boot-server
- Language: Java
- Homepage: https://seart-dh.si.usi.ch
- Size: 4.2 MB
- Stars: 16
- Watchers: 2
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Codeowners: CODEOWNERS
# SEART Data Hub
The SEART Data Hub platform makes it easy to create large-scale datasets that can be used either to run empirical MSR
studies or to train Deep Learning models that automate software engineering tasks.
## Contents
This project contains several modules:
- `dl4se-model`: A module containing domain model classes used for mapping the relational database structure to the
programming environment;
- `dl4se-analyzer`: A module containing implementations of code analysis operations running on `tree-sitter`;
- `dl4se-transformer`: A module containing implementations of code transformation operations running on `tree-sitter`;
- `dl4se-crawler`: A standalone crawler application that we use to mine source code from GitHub repositories indexed by
[GitHub Search](https://seart-ghs.si.usi.ch/);
- `dl4se-server`: A Spring Boot server application that acts as our platform back-end;
- `dl4se-spring`: Common Spring Boot configuration and utilities used in both the server and the crawler;
- `dl4se-website`: A front-end web application written in Vue.
## Installation and Usage
This section details the steps for setting up and running the project locally on your machine.
### [Environment](README_ENV.md)
### [Database](README_DB.md)
### [Usage](README_RUN.md)
### [Dockerization](README_DOCKER.md)
## License
[MIT](LICENSE)
## FAQ
### How do you implement language-specific analysis heuristics?
Heuristics used to identify test code in Java and Python can be found
[here](dl4se-analyzer/src/main/java/ch/usi/si/seart/analyzer/predicate/path/JavaTestFilePredicate.java) and
[here](dl4se-analyzer/src/main/java/ch/usi/si/seart/analyzer/predicate/path/PythonTestFilePredicate.java).
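
As a rough illustration of what a path-based test-file heuristic can look like, the sketch below flags a Java file as test code based only on its path. It is not the code of `JavaTestFilePredicate`: the class name `TestPathHeuristic` and the specific rules (a `test` or `tests` directory, a `Test` prefix, or a `Test`/`Tests` suffix on the file name) are assumptions chosen for the example.

```java
import java.nio.file.Path;
import java.util.Locale;
import java.util.function.Predicate;

// Hypothetical sketch of a path-based heuristic; NOT the project's actual predicate.
final class TestPathHeuristic implements Predicate<Path> {

    private static final String EXTENSION = ".java";

    @Override
    public boolean test(Path path) {
        Path fileName = path.getFileName();
        if (fileName == null) return false;

        String name = fileName.toString();
        if (!name.endsWith(EXTENSION)) return false;

        // Assumed rule 1: conventional test class names such as FooTest, FooTests, TestFoo.
        String stem = name.substring(0, name.length() - EXTENSION.length());
        if (stem.endsWith("Test") || stem.endsWith("Tests")) return true;
        if (stem.startsWith("Test") && stem.length() > 4
                && Character.isUpperCase(stem.charAt(4))) return true;

        // Assumed rule 2: any parent directory named "test" or "tests".
        Path parent = path.getParent();
        if (parent == null) return false;
        for (Path segment : parent) {
            String dir = segment.toString().toLowerCase(Locale.ROOT);
            if (dir.equals("test") || dir.equals("tests")) return true;
        }
        return false;
    }
}
```

Because the class implements `Predicate<Path>`, it can be passed directly to `Stream.filter` when walking a cloned repository. A heuristic like this trades precision for speed, since it never parses the file contents.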
Heuristics used to identify boilerplate code can be found
[here](dl4se-analyzer/src/main/java/ch/usi/si/seart/analyzer/enumerate/JavaBoilerplateEnumerator.java) and
[here](dl4se-analyzer/src/main/java/ch/usi/si/seart/analyzer/enumerate/PythonBoilerplateEnumerator.java) respectively.
### How can I request a feature or ask a question?
If you have ideas for a feature you would like to see implemented or if you have any questions, we encourage you to
create a new [discussion](https://github.com/seart-group/DL4SE/discussions/). By initiating a discussion, you can engage
with the community and our team, and we'll respond promptly to address your queries or consider your feature requests.
### How can I report a bug?
To report any issues or bugs you encounter, please create a [new issue](https://github.com/seart-group/DL4SE/issues/).
Providing detailed information about the problem you're facing will help us understand and address it more effectively.
Rest assured, we are committed to promptly reviewing and responding to the issues you raise, working collaboratively
to resolve any bugs and improve the overall user experience.