{"id":17160612,"url":"https://github.com/michaeldorner/tax_se","last_synced_at":"2025-04-13T14:11:16.851Z","repository":{"id":186168477,"uuid":"624322075","full_name":"michaeldorner/tax_se","owner":"michaeldorner","description":"Replication package for our work on \"Taxing Collaborative Software Engineering\"","archived":false,"fork":false,"pushed_at":"2024-04-12T07:24:47.000Z","size":54,"stargazers_count":4,"open_issues_count":1,"forks_count":3,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-27T05:12:28.526Z","etag":null,"topics":["codereview","github","github-api","python","replication-package","tax"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/michaeldorner.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-04-06T08:03:25.000Z","updated_at":"2023-04-24T18:58:18.000Z","dependencies_parsed_at":null,"dependency_job_id":"ad1774bc-bf2f-4fbc-bb43-c17f78af85cd","html_url":"https://github.com/michaeldorner/tax_se","commit_stats":null,"previous_names":["michaeldorner/tax_se"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michaeldorner%2Ftax_se","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michaeldorner%2Ftax_se/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michaeldorner%2Ftax_se/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michaeldorner%2Ftax_se/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/micha
eldorner","download_url":"https://codeload.github.com/michaeldorner/tax_se/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248724629,"owners_count":21151561,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["codereview","github","github-api","python","replication-package","tax"],"created_at":"2024-10-14T22:25:24.602Z","updated_at":"2025-04-13T14:11:16.830Z","avatar_url":"https://github.com/michaeldorner.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Taxing Collaborative Software Engineering\n\n[![GitHub](https://img.shields.io/github/license/michaeldorner/tax_se)](./LICENSE)\n[![Codacy Badge](https://app.codacy.com/project/badge/Grade/cca06dbbf55946b883129195e855ecd1)](https://app.codacy.com/gh/michaeldorner/tax_se/dashboard?utm_source=gh\u0026utm_medium=referral\u0026utm_content=\u0026utm_campaign=Badge_grade)\n\nReplication package for our work on \"Taxing Collaborative Software Engineering\"\n\n## Requirements\n\nThis replication package requires Python 3.10 or higher. Install the dependencies via:\n\n```\npython3 -m pip install -r requirements.txt\n```\n\nFor faster loading, we recommend optionally installing [`orjson`](https://github.com/ijl/orjson) via pip:\n```\npython3 -m pip install orjson\n```\n\n## How to run\n\n### Step 1: Crawl\n\nFirst, we collect the timelines of all pull requests on a GitHub instance. 
[`crawl.py`](crawl.py) requires an [`\u003capi_token\u003e` for your GitHub instance](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token) and an `\u003cout_dir\u003e` in which the results are stored:\n```\npython3 crawl.py \u003capi_token\u003e \u003cout_dir\u003e\n```\n[`crawl.py`](crawl.py) also provides the following optional command line arguments:\n- `--api_url` for the GitHub instance URL (default: `https://api.github.com`)\n- `--disable_cache` to disable caching (not recommended for larger instances)\n- `--num_workers` for the number of parallel processes (default: 1)\n- `--organization` for limiting the crawl to one organization (helpful for organizations hosted on github.com)\n\nTo list all options in detail, run:\n```\npython3 crawl.py --h\n```\n\n### Step 2: Model pull requests as cross-border communication channels\n\nFor this step, you will need:\n1) The directory of the previously collected data; and,\n2) A mapping of users to countries. This can be either a `dict` for a static mapping (does not capture changes in the users' location over time) or a monthly sampled data frame for a time-dependent mapping (captures changes in the users' location over time).\n\nRun [`notebook.ipynb`](notebook.ipynb) and follow the instructions given as inline comments.\n\n## License\n\nCopyright © 2023 Michael Dorner.\n\nThis work is licensed under the [MIT license](LICENSE).\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmichaeldorner%2Ftax_se","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmichaeldorner%2Ftax_se","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmichaeldorner%2Ftax_se/lists"}