https://github.com/bambuscontrol/satd-git-extractor
A tool for extracting additional data from Git repositories of a selection of Apache OSS projects in an SATD dataset.
- Host: GitHub
- URL: https://github.com/bambuscontrol/satd-git-extractor
- Owner: BambusControl
- License: apache-2.0
- Created: 2024-01-24T13:59:07.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2024-01-25T08:41:56.000Z (11 months ago)
- Last Synced: 2024-01-25T18:53:32.626Z (11 months ago)
- Language: Python
- Size: 156 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: readme.md
- License: LICENSE
README
# SATD GIT Extractor
A Python project for extracting additional data from Git repositories of a selection of Apache OSS projects in the SATD dataset.
- [SATD Dataset Project](https://github.com/yikun-li/satd-different-sources-data)
- [Dataset](https://github.com/yikun-li/satd-different-sources-data/blob/c3c13955ce6c3e68f98fa08829adf41f37281b9a/satd-dataset-commit_messages.csv)

A dataset containing listings of Apache repositories is provided under [resources](./resources).
> [!NOTE]
> The [PyDriller](https://github.com/ishepard/pydriller) library is used for extracting data from Git repositories.
> It does not handle commits that are merged from a fork of a repository well.

## Usage
Data extraction will output a CSV file per extracted directory.
```shell
python -m satd_git_extractor extract --repositories repos.csv --commits satd-dataset-commit_messages.csv --exports-dir export --clone-dir repos
```

Merge the extracted data into the SATD CSV file.
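To illustrate what a merge of this kind involves, here is a minimal sketch using only the standard library. It is not the project's implementation: the function name `merge_exports` and the join column `sha` are hypothetical, and the real dataset and export files may use different column names.

```python
import csv
from pathlib import Path


def merge_exports(commits_csv: Path, exports_dir: Path, output_csv: Path, key: str = "sha") -> int:
    """Left-join rows from every CSV in exports_dir onto commits_csv by `key`.

    `key` is an assumed column name holding the commit hash.
    Returns the number of commit rows that received extra data.
    """
    extra_by_key = {}   # commit hash -> exported row
    extra_fields = []   # exported column names, in first-seen order
    for export_file in sorted(exports_dir.glob("*.csv")):
        with export_file.open(newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            for name in reader.fieldnames or []:
                if name != key and name not in extra_fields:
                    extra_fields.append(name)
            for row in reader:
                extra_by_key[row[key]] = row

    with commits_csv.open(newline="", encoding="utf-8") as src, \
            output_csv.open("w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        base_fields = list(reader.fieldnames or [])
        # Only append columns the commits file does not already have.
        new_fields = [f for f in extra_fields if f not in base_fields]
        writer = csv.DictWriter(dst, fieldnames=base_fields + new_fields, restval="")
        writer.writeheader()
        merged = 0
        for row in reader:
            extra = extra_by_key.get(row[key])
            if extra is not None:
                row.update({f: extra[f] for f in new_fields if f in extra})
                merged += 1
            # Commits without exported data keep empty values in the new columns.
            writer.writerow(row)
    return merged
```

Rows without a matching export are kept, so the output has the same number of rows as the input dataset. The project's own `merge` subcommand is the supported way to run this step: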
```shell
python -m satd_git_extractor merge --commits satd-dataset-commit_messages.csv --exports-dir export --output satd-commits-merged-dataset.csv
```

## Initialize Project
> [!NOTE]
> The following instructions are for Windows using a [virtual environment](https://docs.python.org/3/library/venv.html).
> The [Python version](./.python-version) is specified for [PyEnv](https://github.com/pyenv/pyenv).

Initialize a Python virtual environment.
```shell
python -m venv env
```

Activate the environment.
```powershell
.\env\Scripts\Activate.ps1
```

Install dependencies.
```shell
pip install -r requirements.txt
```

Install the project (in editable mode).
```shell
pip install --editable .
```
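
After these steps, a quick sanity check can confirm that the interpreter is running inside the virtual environment and that the package is importable. The sketch below is not part of the project: the helper names `in_virtualenv` and `is_installed` are hypothetical, and it relies only on the standard library.

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def is_installed(module_name: str) -> bool:
    # find_spec returns None when a top-level module cannot be located.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    # The module name is taken from the `python -m satd_git_extractor` commands above.
    print("satd_git_extractor importable:", is_installed("satd_git_extractor"))
```

If the first check reports `False`, the activation script was probably not run in the current shell session.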