{"id":19231499,"url":"https://github.com/maximilien/ghtrack","last_synced_at":"2025-04-21T03:32:00.694Z","repository":{"id":136510770,"uuid":"280541613","full_name":"maximilien/ghtrack","owner":"maximilien","description":"A python tool to keep track of GitHub users stats","archived":false,"fork":false,"pushed_at":"2023-09-28T19:09:36.000Z","size":79,"stargazers_count":3,"open_issues_count":0,"forks_count":3,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-04-01T09:01:31.239Z","etag":null,"topics":["cli","github","maximilien","python","stats"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/maximilien.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.adoc","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-07-17T23:08:01.000Z","updated_at":"2023-11-15T01:59:44.000Z","dependencies_parsed_at":null,"dependency_job_id":"6cd41194-6008-4170-b138-8102032ed89d","html_url":"https://github.com/maximilien/ghtrack","commit_stats":null,"previous_names":[],"tags_count":18,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maximilien%2Fghtrack","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maximilien%2Fghtrack/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maximilien%2Fghtrack/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maximilien%2Fghtrack/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/maximilien","downlo
ad_url":"https://codeload.github.com/maximilien/ghtrack/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249991043,"owners_count":21357192,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cli","github","maximilien","python","stats"],"created_at":"2024-11-09T15:44:08.530Z","updated_at":"2025-04-21T03:32:00.424Z","avatar_url":"https://github.com/maximilien.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ghtrack\n\nAutomates tracking of commits, PRs, reviews, and issues for a group of GitHub users across the repos of one organization.\n\n# Getting started\n\nYou need two things to get started with the ghtrack CLI, referred to from now on as the CLI or `ght`.\n\nFirst, you need a developer or admin account for your `GitHub` ID. [Register](https://developer.github.com/program/), it's free. Once registered, you can create an access token in order to use the GitHub APIs v3 (or later).\n\nSecond, you need to set up your [environment](#Environment). See that section for details. Note that you can set up your local machine with Python 3 and the dependencies.\n\nAlternatively, you can use one of my published images, ideally the latest one: `docker.io/drmax/ghtrack:latest`. 
Then run the image locally and start an interactive Bash session with:\n\n```bash\ndocker run -it docker.io/drmax/ghtrack:latest /bin/bash\n```\n\n## Credentials\n\nOnce you have an access token for your GitHub account, make sure to copy it and keep it safe.\n\nYou will need this access token when invoking the CLI. You can pass it with each invocation, set it in an environment variable, or add it to a file.\n\n1. Pass it on the command line using `--access-token \u003cGitHub access token here\u003e`, without the '\u003c\u003e'.\n\n2. When using an environment variable, set it as follows:\n\n```bash\nexport GH_ACCESS_TOKEN=\u003cGitHub access token here\u003e\n```\n\n3. Alternatively, you can create a `.ghtrack.yml` file and add the token there. Then `ght` will find the credentials for you automatically.\n\nCreate your `./.ghtrack.yml` file with the following command or with your favorite editor:\n\n```bash\ncat \u003e .ghtrack.yml \u003c\u003cEOF\ngh_access_token: \u003cGitHub access token here\u003e\nEOF\n```\n\n*WARNING* Do not share, check in to GitHub, or make public any access token or other credential data.\n\n## User guide\n\nThe following is a brief user guide for the `ght` CLI. 
You can see an abbreviated version of this user guide by running `ght --help`:\n\n```bash\n➜  ghtrack git:(master) ./ght -h\nGitHub track\n\nUsage:\n  ght commits MONTH ORG [options]\n  ght prs MONTH ORG [options]\n  ght reviews MONTH ORG [options]\n  ght issues MONTH ORG [options]\n  ght stats MONTH ORG [options]\n\n  ght (-h | --help)\n  ght (-v | --version)\n\nOptions:\n  --verbose                      Show all output.\n\n  --commits                      Collect commits stats.\n  --prs                          Collect PRs stats.\n  --reviews                      Collect reviews stats.\n  --issues                       Collect issues stats.\n\n  --summarize                    Summarize collected stats.\n\n  --rate-limit                   Enables rate limiting (default or speficy --rt-* options).\n  --rate-limit-random            Enables rate limiting by randomly picking max and sleep value with default or --rl-* values as ceilings.\n  --rl-max=100                   Max number of API calls before sleeping [default: 100].\n  --rl-sleep=30m                 Time to sleep once max API calls reach, e.g., 30m, 1h for 30 mins, 1 hour [default: 30m].\n\n  -s --state=closed              State one of 'open' or 'closed' [default: closed].\n\n  --users=user1,user2,...        List of GitHub user IDs to track.\n\n  --all-repos                    Track all repositories in GitHub organization.\n  --repos=repo1,repo2,...        List of repositories in GitHub organization to track.\n  --skip-repos=repo1,repo2,...   
List of repositories in GitHub organization to skip.\n  --show-all-stats               Show all stats even when 0 or non-existant for a user [default: False].\n\n  -a --access-token=ACCESS_TOKEN Your GitHub access token to access GitHub APIs.\n\n  -o --output=CSV                The format of the output: text, json, yml, or csv [default: text].\n  -f --file=output.csv           The file path to save results file.\n\n  -h --help                      Show this screen.\n  -v --version                   Show version.\n```\n\n### `commits`\n\nThe `commits` command group is used to get commit statistics.\n\n#### Usage\n\n```bash\nght commits march knative --users=maximilien \\\n                          --all-repos \\\n                          --output=CSV \\\n                          --show-all-stats \\\n                          --file=maximilien-march-knative.csv\n```\n\n#### Description\n\nCollects all commit data for GitHub user 'maximilien' during the month of 'march' in all repos of the 'knative' organization and saves it into a CSV file. All stats are displayed, so if 'maximilien' has 0 commits in a repo, the output will display 0.\n\n### `prs`\n\nThe `prs` command group is used to get PR statistics.\n\n#### Usage\n\n```bash\nght prs apr knative --users=maximilien \\\n                    --all-repos \\\n                    --skip-repos=client,client-contrib \\\n                    --output=yaml \\\n                    --file=maximilien-april-all-but-client-client-repos.yml\n```\n\n#### Description\n\nCollects all PR data for GitHub user 'maximilien' during the month of 'april' (three-letter abbreviations are OK) in all repos of the 'knative' organization, except 'client' and 'client-contrib', and saves it into a YAML file. 
Since `--show-all-stats` is not used, only repos in which the user has PRs appear in the output.\n\n### `reviews`\n\nThe `reviews` command group is used to get review statistics on 'open' or 'closed' (default) PRs.\n\n#### Usage\n\n```bash\nght reviews july  knative --users=maximilien,mattmore \\\n                          --repos=client \\\n                          --state=open \\\n                          --output=JSON\n```\n\n#### Description\n\nCollects all review statistics, on 'open' PRs, for GitHub users 'maximilien' and 'mattmore' during the month of 'july' in the 'client' repo of the 'knative' organization and displays the result as JSON in the terminal.\n\n#### Example Output\n\n```bash\nGetting reviews for 2 users in 1 repos via GitHub APIs... be patient\nGetting 'reviews' for 'maximilien' in organization: 'knative'\n[============================================================] 100.0% ...processing repos\nGetting 'reviews' for 'mattmore' in organization: 'knative'\n[=========================================================---] 94.7% ...processing repos\n\n{\n    \"mattmore\": {\n        \"client\": 0\n    },\n    \"maximilien\": {\n        \"client\": 2\n    },\n    \"request\": {\n        \"data\": \"reviews\",\n        \"month\": \"july\",\n        \"org\": \"knative\",\n        \"state\": \"open\",\n        \"year\": 2020\n    }\n}\nShowing only non-zero stats, use --show-all-stats to view all\nOK\n```\n\n### `issues`\n\nThe `issues` command group is used to get issue statistics on 'open' or 'closed' (default) issues.\n\n#### Usage\n\n```bash\nght issues november knative --users=maximilien \\\n                            --repos=client,client-contrib \\\n                            --show-all-stats \\\n                            --output=txt\n```\n\n#### Description\n\nCollects all issue statistics, counting only issues that are 'closed', for GitHub user 'maximilien' during the month of 'november' in the 'client' and 'client-contrib' repos of the 'knative' organization and shows it 
as text in standard output. All stats are shown, even when 0.\n\n#### Example Output\n\n```bash\nGetting issues for 1 users in 2 repos via GitHub APIs... be patient\nGetting 'issues' for 'maximilien' in organization: 'knative'\n[============================================================] 100.0% ...processing repos\n\norg        year  month     data    state\n-------  ------  --------  ------  -------\nknative    2020  november  issues  closed\n\nuser        repo            data      count\n----------  --------------  ------  -------\nmaximilien  client          issues        0\nmaximilien  client-contrib  issues        0\n```\n\n### `stats`\n\nThe `stats` command group is used to get summary statistics for commits, PRs, reviews, and issues.\n\n#### Usage\n\n```bash\nght stats june knative --users=maximilien \\\n                       --commits --prs --reviews --issues \\\n                       --all-repos \\\n                       --skip-repos=client,client-contrib\n```\n\n#### Description\n\nCollects summary stats ('--commits', '--prs', '--reviews', and '--issues') for GitHub user 'maximilien' during the month of 'june' in all repos of the 'knative' organization except 'client' and 'client-contrib', and displays them in standard output.\n\nYou can of course specify a subset of the flags '--commits', '--prs', '--reviews', and '--issues' to collect only those statistics.\n\n### common flags\n\nSome additional documentation on common flags:\n\n#### `--verbose`\n\nTurn this on by simply using `--verbose` to see all output. The CLI by default shows a lot of output but with `--verbose` all output is shown.\n\n#### `--summarize`\n\nUsing this flag with any of the commands generates two additional tables of data that summarize the results independent of specific users, that is, the total number of commits, issues, reviews, and PRs for each repo. This data is shown in two views [data, repo, total] and [repo, data, total]. 
For example:\n\n```bash\n./ght stats july knative --commits --issues --summarize \\\n                         --users=maximilien,octocat \\\n                         --repos=client,client-contrib \\\n                         --show-all-stats -o text \nGetting 'commits' for 'maximilien' in organization: 'knative'\n[============================================================] 100.0% ...processing repos\n...\n\norg        year  month    data     state\n-------  ------  -------  -------  -------\nknative    2020  july     commits  closed\n\nuser        repo            data       count\n----------  --------------  -------  -------\nmaximilien  client          commits        5\nmaximilien  client-contrib  commits        0\noctocat     client          commits        0\noctocat     client-contrib  commits        0\n\n...\n\norg        year  month    data    state\n-------  ------  -------  ------  -------\nknative    2020  july     issues  closed\n\nuser        repo            data      count\n----------  --------------  ------  -------\nmaximilien  client          issues        0\nmaximilien  client-contrib  issues        0\noctocat     client          issues        0\noctocat     client-contrib  issues        0\n\nrepo            data       total\n--------------  -------  -------\nclient          commits        5\nclient          prs            0\nclient          reviews        0\n...\n\ndata     repo              total\n-------  --------------  -------\ncommits  client                0\ncommits  client                5\ncommits  client-contrib        0\ncommits  client-contrib        5\n...\n\nOK\n```\n\n#### `--show-all-stats`\n\nIn many cases, query results include many entries with 0 totals. For instance, user `octocat` has 0 reviews, 0 PRs, and 0 commits. Using `--show-all-stats` will show an entry for all collected data (0 or not). 
By default, entries with 0 totals are omitted.\n\n#### `--rate-limit`\n\nUsing the CLI for large queries (particularly for reviews) can result in hundreds of API calls to GitHub. While `ght` could get faster by caching intermediate data, using better totaling algorithms, or using smarter data structures, none of these would solve the fundamental issue.\n\nThe public GitHub v3 API limits what end users can do with it, so various options are not available at this point. Commit queries are fast because they are pre-computed and cached, and the result is complete totals for the past year. Most other API calls, e.g., for PRs, reviews, and issues, are more limited: you cannot do fine-grained queries, ask for counts, or even set date limits (in the case of reviews).\n\nSo in the current implementation, `ght` often has to get all the data and process it locally. This is good for the GitHub API servers but bad for local clients (`ght`). But as the GitHub APIs are free, one cannot complain.\n\nTo avoid running into rate limiting errors (performing more API calls than allowed within a period of time), the CLI offers `--rate-limit` and `--rate-limit-random`, which allow the CLI to slow down its API invocations. This is done as follows:\n\n1. Use `--rate-limit` and `ght` will automatically sleep periodically once it reaches a fixed number of API calls (by default, 100 calls and a 30-minute sleep).\n\n2. Use `--rate-limit` with the associated `--rl-max` and `--rl-sleep` flags to specify the max number of API calls and the duration of the sleep. 
For instance, the following call rate limits after 5 API calls and sleeps for 10 seconds before continuing:\n\n```bash\n./ght stats july knative --commits --issues --summarize \\\n                         --users=maximilien,octocat \\\n                         --repos=client,client-contrib \\\n                         --show-all-stats -o text \\\n                         --rate-limit --rl-max=5 --rl-sleep=10s\n\nGetting 'commits' for 'maximilien' in organization: 'knative'\n[============================================================] 100.0% ...processing repos\nGetting 'commits' for 'octocat' in organization: 'knative'\n[================================================------------] 80.0% ...processing repos\nWarning: Rate limit API calls reach '5' and sleeping for '10' seconds\n...\n```\n\nAll commands (stats, commits, prs, reviews, and issues) accept the `--rate-limit` flags.\n\n#### `--rate-limit-random`\n\nWorks exactly like `--rate-limit`, except that the values for max API calls and for sleep are determined using a random number generator, which selects a random value between 1 and the configured value for max API calls or for sleep.\n\n## Workflows\n\nTODO\n\n# Developing\n\nWe welcome your contributions. You can contribute by opening [issues](/issues) for features you would like and bugs you find, or by submitting [PRs](/pulls) when you have specific changes to suggest. These changes can be to source code, tests, and docs.\n\n## Environment\n\nThis CLI uses Python 3.0 or later. Please [download Python 3](https://www.python.org/downloads/) for your particular environment to get started.\n\n### Local\n\nTo run this CLI on your local machine, you will need to install some dependencies in addition to Python 3. You can do so using Python's `pip` tool. 
First, ensure [`pip` is installed](https://pip.pypa.io/en/stable/installing/) on your machine.\n\nOnce `pip` is installed, install the dependencies with:\n\n```bash\npip install PyGitHub==1.51\npip install PyYAML==5.3.1\npip install docopt==0.6.2\npip install tabulate==0.8.7\n```\n\nYou can verify that your system is set up correctly by running the unit tests: `./hack/build.sh --test`.\n\nAlso run the CLI help with: `ght --help`\n\n### Container\n\nAlternatively, you can use the public `docker.io/drmax/ghtrack:latest` image: instantiate it, open a command line shell in it, and use the tool there. This image has all dependencies and code for `ght`. The following command does this:\n\n```bash\ndocker run -it docker.io/drmax/ghtrack:latest /bin/bash\n```\n\n### Create image\n\nIf you set the environment variable 'DOCKER_USERNAME' to your [Docker Hub](https://hub.docker.com/) username and install the [docker tooling](https://docs.docker.com/get-docker/), you can generate a Docker container image by running `./hack/build.sh --docker`. The image will contain all dependencies and this tool's source code.\n\n## Testing\n\nThe code includes both unit tests and integration tests. You can run all unit tests by invoking: `./hack/build.sh --tests`.\n\nIntegration tests require a GitHub access token, either in a file called `.ghtrack.yml` or in an environment variable called `GH_ACCESS_TOKEN`. You can then invoke `./build/build.sh --e2e` to run the integration tests.\n\nYou can run both types of tests in sequence with `./hack/build.sh --tests`\n\nOnce you can run all the tests, make your changes, add more tests, and verify that all tests still pass. Then create and submit a PR.\n\n# Next steps?\n\nThe following are immediate next steps:\n\n1. add some common workflows (e.g., get reviews in last two months for k8s or knative)\n2. 
look at how to speed up some of the operations by caching intermediate data or GitHub API calls\n3. make e2e tests faster (which might follow from a solution for 2)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaximilien%2Fghtrack","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmaximilien%2Fghtrack","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaximilien%2Fghtrack/lists"}