# Taxing Collaborative Software Engineering

[![GitHub](https://img.shields.io/github/license/michaeldorner/tax_se)](./LICENSE)
[![Codacy Badge](https://app.codacy.com/project/badge/Grade/cca06dbbf55946b883129195e855ecd1)](https://app.codacy.com/gh/michaeldorner/tax_se/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade)

Replication package for our work on "Taxing Collaborative Software Engineering"

## Requirements

This replication package requires Python 3.10 or higher. Install the dependencies via:

```
python3 -m pip install -r requirements.txt
```

For faster loading, we recommend optionally installing [`orjson`](https://github.com/ijl/orjson) via pip:
```
python3 -m pip install orjson
```
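
If `orjson` is installed, it is typically picked up through an import fallback. The following Python sketch shows that common pattern (an illustration, not necessarily this package's actual code; the `load_json` helper is hypothetical):
```
# Sketch of an optional-dependency fallback (hypothetical helper, not
# necessarily the code used in this package).
try:
    import orjson

    def load_json(path):
        # orjson parses raw bytes and is noticeably faster than the stdlib.
        with open(path, "rb") as f:
            return orjson.loads(f.read())
except ImportError:
    import json

    def load_json(path):
        # Standard-library fallback when orjson is not installed.
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
```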

## How to run

### Step 1: Crawl

First, we collect the timelines of all pull requests from a GitHub instance. [`crawl.py`](crawl.py) requires a [personal access token for your GitHub instance](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token) and an output directory where the results are stored:
```
python3 crawl.py <api_token> <output_dir>
```
[`crawl.py`](crawl.py) also provides the following optional command-line arguments:
- `--api_url` for the GitHub instance URL (default: `https://api.github.com`)
- `--disable_cache` to disable caching (not recommended for larger instances)
- `--num_workers` for the number of parallel processes (default: 1)
- `--organization` to limit the crawl to a single organization (helpful for organizations hosted on github.com)
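
For example, crawling a single organization on github.com with four parallel workers could look like this (the token, output directory, and organization name are placeholders):
```
python3 crawl.py <api_token> <output_dir> --num_workers 4 --organization <organization>
```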

To list all options in detail, run:
```
python3 crawl.py --help
```

### Step 2: Model pull requests as cross-border communication channels

For this step, you will need:
1) The directory of the previously collected data; and,
2) A mapping of users to countries. This can either be a `dict` for a static mapping (which does not capture changes in a user's location over time) or a monthly sampled data frame for a time-dependent mapping (which does); see the sketch below.
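
The following Python sketch illustrates both formats (user names, countries, and the exact data-frame orientation are hypothetical; the notebook's inline comments define the shape it expects):
```
import pandas as pd

# Static mapping: one country per user, ignoring relocations.
static_mapping = {
    "alice": "SE",
    "bob": "DE",
}

# Time-dependent mapping: monthly sampled, so relocations are captured.
time_dependent_mapping = pd.DataFrame(
    [["SE", "SE", "US"],   # alice moves to the US in March
     ["DE", "DE", "DE"]],  # bob stays in Germany
    index=["alice", "bob"],
    columns=pd.period_range("2023-01", "2023-03", freq="M"),
)
```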

Run [`notebook.ipynb`](notebook.ipynb) and follow the instructions given as inline comments.

## License

Copyright © 2023 Michael Dorner.

This work is licensed under the [MIT license](LICENSE).