{"id":18521601,"url":"https://github.com/transferwise/pipelinewise-tap-github","last_synced_at":"2025-04-09T09:33:16.952Z","repository":{"id":45495864,"uuid":"387439997","full_name":"transferwise/pipelinewise-tap-github","owner":"transferwise","description":"Singer.io Tap for extracting data from the GitHub API - PipelineWise compatible","archived":true,"fork":false,"pushed_at":"2024-09-18T16:02:27.000Z","size":257,"stargazers_count":0,"open_issues_count":4,"forks_count":4,"subscribers_count":64,"default_branch":"main","last_synced_at":"2025-03-10T23:08:53.965Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/transferwise.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-07-19T11:26:27.000Z","updated_at":"2024-09-19T23:21:25.000Z","dependencies_parsed_at":"2024-11-06T17:33:42.649Z","dependency_job_id":"4bbb49d3-adba-4bb6-b38b-9d819ff9de47","html_url":"https://github.com/transferwise/pipelinewise-tap-github","commit_stats":{"total_commits":176,"total_committers":34,"mean_commits":5.176470588235294,"dds":0.7784090909090909,"last_synced_commit":"5e4d2456d510eee5da23b016b2324f86a3612ddb"},"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/transferwise%2Fpipelinewise-tap-github","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/transferwise%2Fpipelinewise-tap-github/tags","releases_url":"https://r
epos.ecosyste.ms/api/v1/hosts/GitHub/repositories/transferwise%2Fpipelinewise-tap-github/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/transferwise%2Fpipelinewise-tap-github/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/transferwise","download_url":"https://codeload.github.com/transferwise/pipelinewise-tap-github/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248012851,"owners_count":21033253,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-06T17:26:42.938Z","updated_at":"2025-04-09T09:33:16.391Z","avatar_url":"https://github.com/transferwise.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Notice\nTo better serve Wise business and customer needs, the PipelineWise codebase needs to shrink.\nWe have made the difficult decision that, going forward many components of PipelineWise will be removed or incorporated in the main repo.\nThe last version before this decision is [v0.64.1](https://github.com/transferwise/pipelinewise/tree/v0.64.1)\n\nWe thank all in the open-source community, that over the past 6 years, have helped to make PipelineWise a robust product for heterogeneous replication of many many Terabytes, daily\n\n# pipelinewise-tap-github\n\n[![PyPI version](https://badge.fury.io/py/pipelinewise-tap-github.svg)](https://badge.fury.io/py/pipelinewise-tap-github)\n[![PyPI - Python 
Version](https://img.shields.io/pypi/pyversions/pipelinewise-tap-github.svg)](https://pypi.org/project/pipelinewise-tap-github/)\n[![License: AGPL v3](https://img.shields.io/badge/License-AGPLv3-yellow.svg)](https://opensource.org/licenses/AGPL-3.0)\n\n[Singer](https://singer.io) tap that produces JSON-formatted data from the GitHub API following the [Singer spec](https://github.com/singer-io/getting-started/blob/master/SPEC.md).\n\nThis is a [PipelineWise](https://transferwise.github.io/pipelinewise) compatible tap connector.\n\nThis tap:\n- Pulls raw data from the [GitHub REST API](https://developer.github.com/v3/)\n- Extracts the following resources from GitHub for a single repository:\n  - [Assignees](https://developer.github.com/v3/issues/assignees/#list-assignees)\n  - [Collaborators](https://developer.github.com/v3/repos/collaborators/#list-collaborators)\n  - [Commits](https://developer.github.com/v3/repos/commits/#list-commits-on-a-repository)\n  - [Issues](https://developer.github.com/v3/issues/#list-issues-for-a-repository)\n  - [Pull Requests](https://developer.github.com/v3/pulls/#list-pull-requests)\n  - [Comments](https://developer.github.com/v3/issues/comments/#list-comments-in-a-repository)\n  - [Reviews](https://developer.github.com/v3/pulls/reviews/#list-reviews-on-a-pull-request)\n  - [Review Comments](https://developer.github.com/v3/pulls/comments/)\n  - [Stargazers](https://developer.github.com/v3/activity/starring/#list-stargazers)\n- Outputs the schema for each resource\n- Incrementally pulls data based on the input state\n\n## Quick start\n\n1. Install\n\n   We recommend using a virtualenv:\n\n    ```bash\n    python3 -m venv venv\n    . venv/bin/activate\n    pip install --upgrade pip\n    pip install .\n    ```\n\n2. Create a GitHub access token\n\n    Log in to your GitHub account, go to the\n    [Personal Access Tokens](https://github.com/settings/tokens) settings\n    page, and generate a new token with at least the `repo` scope. 
Save this\n    access token; you'll need it for the next step.\n\n3. Create the config file\n\n    Create a JSON file containing the required fields and any optional ones.\n    You can choose an allow-list or deny-list strategy by combining `organization` with `repos_include` or `repos_exclude`, both of which support wildcards.\n\nConfig                      |Required?  |Description\n:---------------------------|:---------:|:---------------\naccess_token                |yes        |The access token used to access the GitHub API\nstart_date                  |yes        |The date (inclusive) from which to start extracting data\norganization                |no         |The organization you want to extract the data from\nrepos_include               |no         |Allow list strategy to extract selected repos data from organization. Supports wildcard matching\nrepos_exclude               |no         |Deny list to extract all repos from organization except the ones listed. Supports wildcard matching\ninclude_archived            |no         |true/false to include archived repos. Default false\ninclude_disabled            |no         |true/false to include disabled repos. Default false\nrepository                  |no         |(DEPRECATED) Allow list strategy to extract selected repos data from organization (has priority over repos_exclude)\nmax_rate_limit_wait_seconds |no         |Max time to wait if you hit the GitHub API rate limit. 
Defaults to 600s\n\nExample:\n```json\n{\n  \"access_token\": \"ghp_16C7e42F292c6912E7710c838347Ae178B4a\",\n  \"organization\": \"singer-io\",\n  \"repos_exclude\": \"*tests* api-docs\",\n  \"repos_include\": \"tap* getting-started pipelinewise-github\",\n  \"start_date\": \"2021-01-01T00:00:00Z\",\n  \"include_archived\": false,\n  \"include_disabled\": false,\n  \"max_rate_limit_wait_seconds\": 800\n}\n```\n\n\u003e You can also pass `singer-io/tap-github another-org/tap-octopus` in `repos_include`.\n\n\u003e For backward compatibility you can pass `repository: \"singer-io/tap-github singer-io/getting-started\"`\n\n\u003e :warning: **If you have very small repos with total size less than 0.5KB**: These will currently be excluded, as the GitHub repositories API returns `size: 0` for these, and `tap_github/__init__.py` currently uses `size \u003c= 0` as a way to filter out repos with no commits.\n\n4. Run the tap in discovery mode to generate the properties.json file\n\n    ```bash\n    tap-github --config config.json --discover \u003e properties.json\n    ```\n5. In the properties.json file, select the streams to sync\n\n    Each stream in the properties.json file has a \"schema\" entry.  To select a stream to sync, add `\"selected\": true` to that stream's \"schema\" entry.  For example, to sync the pull_requests stream:\n    ```\n    ...\n    \"tap_stream_id\": \"pull_requests\",\n    \"schema\": {\n      \"selected\": true,\n      \"properties\": {\n        \"updated_at\": {\n          \"format\": \"date-time\",\n          \"type\": [\n            \"null\",\n            \"string\"\n          ]\n        }\n    ...\n    ```\n\n6. Run the application\n\n    `tap-github` can be run with:\n\n    ```bash\n    tap-github --config config.json --properties properties.json\n    ```\n\n\n## To run tests\n\n1. Install the Python test dependencies in a virtual env\n```\n  python3 -m venv venv\n  . 
venv/bin/activate\n  pip install --upgrade pip\n  pip install -e .[test]\n```\n\n2. To run unit tests:\n```\n  pytest tests/unittests\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftransferwise%2Fpipelinewise-tap-github","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftransferwise%2Fpipelinewise-tap-github","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftransferwise%2Fpipelinewise-tap-github/lists"}