Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/michaeldorner/tax_se
Replication package for our work on "Taxing Collaborative Software Engineering"
codereview github github-api python replication-package tax
Last synced: 2 months ago
- Host: GitHub
- URL: https://github.com/michaeldorner/tax_se
- Owner: michaeldorner
- License: mit
- Created: 2023-04-06T08:03:25.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-04-12T07:24:47.000Z (8 months ago)
- Last Synced: 2024-04-12T15:08:37.849Z (8 months ago)
- Topics: codereview, github, github-api, python, replication-package, tax
- Language: Jupyter Notebook
- Homepage:
- Size: 52.7 KB
- Stars: 4
- Watchers: 1
- Forks: 3
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Taxing Collaborative Software Engineering
[![GitHub](https://img.shields.io/github/license/michaeldorner/tax_se)](./LICENSE)
[![Codacy Badge](https://app.codacy.com/project/badge/Grade/cca06dbbf55946b883129195e855ecd1)](https://app.codacy.com/gh/michaeldorner/tax_se/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade)

Replication package for our work on "Taxing Collaborative Software Engineering"
## Requirements
This replication package requires Python 3.10 or higher. Install the dependencies via:
```
python3 -m pip install -r requirements.txt
```

For faster loading, we recommend optionally installing [`orjson`](https://github.com/ijl/orjson) via pip:
```
python3 -m pip install orjson
```

## How to run
### Step 1: Crawl
First, we collect all timelines from all pull requests at a GitHub instance. [`crawl.py`](crawl.py) requires a [personal access token for your GitHub instance](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token) and an output directory where the results are stored:
```
python3 crawl.py
```
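At its core, the crawl fetches the timeline of each pull request from the GitHub REST API. The following is not the actual `crawl.py` implementation, just a minimal sketch of building such a request; the endpoint and headers follow the GitHub REST API, while the repository name and token below are placeholders:

```python
import urllib.request

def build_timeline_request(api_url: str, owner: str, repo: str,
                           number: int, token: str) -> urllib.request.Request:
    """Build an authenticated request for a pull request's timeline.

    Pull requests are issues in the GitHub REST API, so their
    timelines live under the /issues/ endpoint.
    """
    url = f"{api_url}/repos/{owner}/{repo}/issues/{number}/timeline"
    return urllib.request.Request(url, headers={
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    })

# Example (the token is a placeholder, not a real credential):
req = build_timeline_request("https://api.github.com",
                             "michaeldorner", "tax_se", 1, "<your-token>")
```

Sending the request (e.g. with `urllib.request.urlopen`) returns a JSON list of timeline events, which the crawler stores per pull request.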
[`crawl.py`](crawl.py) also provides the following optional command line arguments:
- `--api_url` for the GitHub instance URL (default: `https://api.github.com`)
- `--disable_cache` to disable caching (not recommended for larger instances)
- `--num_workers` for parallel processes (default: 1)
- `--organization` for limiting to one organization (helpful for organizations hosted on github.com)

To list all options in detail, run:
```
python3 crawl.py --help
```

### Step 2: Model pull requests as cross-border communication channels
For this step, you will need:
1) The directory of the previously collected data; and,
2) A mapping of users to countries. This can be either a `dict` for a static mapping (which does not capture changes in a user's location over time) or a monthly-sampled data frame for a time-dependent mapping (which does).

Run [`notebook.ipynb`](notebook.ipynb) and follow the instructions given as inline comments.
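The two mapping variants can be sketched as follows. This is an illustration with made-up users and countries, using a plain nested `dict` in place of the monthly-sampled data frame the notebook expects:

```python
from datetime import date

# Static mapping: one country per user, fixed for all time.
static_mapping = {"alice": "SE", "bob": "DE"}

# Time-dependent mapping, sampled monthly: each user maps to a series
# of (month, country) samples.
monthly_mapping = {
    "alice": {date(2023, 1, 1): "SE", date(2023, 2, 1): "US"},
}

def country_of(user: str, month: date) -> str:
    """Look up a user's country for a given month, falling back to
    the most recent earlier sample."""
    samples = monthly_mapping[user]
    candidates = [m for m in samples if m <= month]
    return samples[max(candidates)]
```

With the static mapping, `alice` is always in `SE`; with the time-dependent one, her country changes to `US` from February 2023 on.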
## License
Copyright © 2023 Michael Dorner.
This work is licensed under [MIT license](LICENSE).