{"id":13585576,"url":"https://github.com/jgehrcke/github-repo-stats","last_synced_at":"2025-04-07T10:31:14.715Z","repository":{"id":37537741,"uuid":"321770581","full_name":"jgehrcke/github-repo-stats","owner":"jgehrcke","description":"GitHub Action for advanced repository traffic analysis and reporting","archived":false,"fork":false,"pushed_at":"2023-10-01T13:53:50.000Z","size":437,"stargazers_count":312,"open_issues_count":16,"forks_count":41,"subscribers_count":4,"default_branch":"main","last_synced_at":"2024-10-30T06:34:01.717Z","etag":null,"topics":["monitoring","statistics","visualization"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jgehrcke.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGES.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2020-12-15T19:37:04.000Z","updated_at":"2024-10-29T18:09:34.000Z","dependencies_parsed_at":"2024-01-14T04:35:51.246Z","dependency_job_id":"1bc55889-4e06-4c19-aeaf-e9427e9b31f7","html_url":"https://github.com/jgehrcke/github-repo-stats","commit_stats":{"total_commits":313,"total_committers":5,"mean_commits":62.6,"dds":"0.019169329073482455","last_synced_commit":"306db38ad131cab2aa5f2cd3062bf6f8aa78c1aa"},"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jgehrcke%2Fgithub-repo-stats","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jgehrcke%2Fgithub-repo-stats/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jgehrcke%2Fgithub-repo-stats/releases","manifes
ts_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jgehrcke%2Fgithub-repo-stats/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jgehrcke","download_url":"https://codeload.github.com/jgehrcke/github-repo-stats/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":222456219,"owners_count":16987612,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["monitoring","statistics","visualization"],"created_at":"2024-08-01T15:05:01.403Z","updated_at":"2024-11-06T03:31:29.418Z","avatar_url":"https://github.com/jgehrcke.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# github-repo-stats\n\nThis is a GitHub Action originally built to overcome the [14-day limitation](https://github.com/isaacs/github/issues/399) of GitHub's built-in traffic statistics.\n\nRun this daily to collect potentially valuable data.\n\nAccording to the motto: a data snapshot each day keeps the doctor away 🍎\n\nSee [this Action in Marketplace](https://github.com/marketplace/actions/github-repo-stats).\n\nHigh-level method description:\n\n* This GitHub Action runs once per day. Each run yields a snapshot of repository traffic statistics (influenced by the past 14 days). Snapshots are persisted via git.\n* Each run performs data analysis on all individual snapshots and generates a report from the aggregate — covering an *arbitrarily* long time frame.\n\nLooking for a quick start? 
Follow the [simple tutorial](https://github.com/jgehrcke/github-repo-stats/wiki/Tutorial) in the Wiki.\n\n\n## Demo\n\n**Demo 1**:\n* [HTML report](https://jgehrcke.github.io/ghrs-test/jgehrcke/github-repo-stats/latest-report/report.html), [PDF report](https://jgehrcke.github.io/ghrs-test/jgehrcke/github-repo-stats/latest-report/report.pdf)\n* [Workflow file](https://github.com/jgehrcke/ghrs-test/blob/github-repo-stats/.github/workflows/github-repo-stats-ghrs.yml), [data branch](https://github.com/jgehrcke/ghrs-test/tree/github-repo-stats/jgehrcke/github-repo-stats)\n\n**Demo 2**:\n* [HTML report](https://jgehrcke.github.io/ghrs-test/jgehrcke/covid-19-germany-gae/latest-report/report.html), [PDF report](https://jgehrcke.github.io/ghrs-test/jgehrcke/covid-19-germany-gae/latest-report/report.pdf)\n* [Workflow file](https://github.com/jgehrcke/ghrs-test/blob/github-repo-stats/.github/workflows/github-repo-stats-cov19.yml), [data branch](https://github.com/jgehrcke/ghrs-test/tree/github-repo-stats/jgehrcke/github-repo-stats)\n\nFor more use cases (and their setup), see the \"Used by\" section below.\n\n## Highlights\n\n* The report is generated in two document formats: HTML and PDF.\n* The HTML report resembles how GitHub renders Markdown and is meant to be exposed via GitHub Pages.\n* Charts are based on [Altair](https://github.com/altair-viz/altair)/[Vega](https://vega.github.io/vega/).\n* The PDF report contains vector graphics.\n* Data updates, aggregation results, and report files are stored in the git repository that you install this Action in: this Action commits changes to a special branch. No cloud storage or database is needed. 
As a result, you have a complete and transparent history of data updates and reports, with clear commit messages, in a single place.\n* The observed repository (the one to build the report for) can be different from the repository you install this Action in.\n* The HTML report can be served right away via GitHub Pages (that is how the demo above works).\n* Careful data analysis: there are a number of traps ([example](https://github.com/jgehrcke/github-repo-stats/blob/5fefc527288995e2e7e35593db496451580f51db/analyze.py#L748)) when aggregating data based on what the GitHub Traffic API returns. This project tries to avoid them. One goal of this project is to perform [advanced analysis](https://github.com/jgehrcke/github-repo-stats/blob/5fefc527288995e2e7e35593db496451580f51db/analyze.py#L478) where possible.\n\n## Report content\n\n* Traffic stats:\n  * Unique and total views per day\n  * Unique and total clones per day\n  * Top referrers (where people come from when they land in your repository)\n  * Top paths (what people like to look at in your repository)\n* Evolution of stargazers\n* Evolution of forks\n\n## Credits\n\nThis project stands on the shoulders of giants:\n\n* [Pandoc](https://pandoc.org/) for rendering HTML from Markdown.\n* [Altair](https://altair-viz.github.io/) and [Vega-Lite](https://vega.github.io/vega-lite/) for visualization.\n* [Pandas](https://pandas.pydata.org/) for data analysis.\n* The [CPython](https://www.python.org/) ecosystem, which has always been fun for me to build software in.\n\n## Documentation\n\n### Terminology: *stats repository* and *data repository*\n\nNaming is hard :-). Let's define two concepts and their names:\n\n* The *stats repository* is the repository to fetch stats for and to generate the report for.\n* The *data repository* is the repository to store data and report files in. 
This is also the repository this Action runs in.\n\nLet me know if you can think of better names.\n\nThese two repositories can be the same. But they don't have to be :-).\n\nThat is, you can, for example, set up this Action in a private repository but have it observe a public repository.\n\n### Setup\n\nThis section contains brief instructions for a scenario where the data repository is different from the stats repository.\nFor a more detailed walkthrough (showing how to create a personal access token and which `git` commands to use), follow the [Tutorial](https://github.com/jgehrcke/github-repo-stats/wiki/Tutorial) in the wiki.\n\nExample scenario:\n\n* stats repository: `bob/nice-project`\n* data repository: `bob/private-ghrs-data-repo`\n\nCreate a GitHub Actions workflow file in the *data repository* (in the example this is the repo `bob/private-ghrs-data-repo`). Example path: `.github/workflows/repostats-for-nice-project.yml`.\n\nExample workflow file content with code comments:\n\n```yaml\non:\n  schedule:\n    # Run this once per day, towards the end of the day for keeping the most\n    # recent data point most meaningful (hours are interpreted in UTC).\n    - cron: \"0 23 * * *\"\n  workflow_dispatch: # Allow for running this manually.\n\njobs:\n  j1:\n    name: repostats-for-nice-project\n    runs-on: ubuntu-latest\n    steps:\n      - name: run-ghrs\n        uses: jgehrcke/github-repo-stats@RELEASE\n        with:\n          # Define the stats repository (the repo to fetch\n          # stats for and to generate the report for).\n          # Remove the parameter when the stats repository\n          # and the data repository are the same.\n          repository: bob/nice-project\n          # Set a GitHub API token that can read the GitHub\n          # repository traffic API for the stats repository,\n          # and that can push commits to the data repository\n          # (which this workflow file lives in, to store data\n          # and the 
report files).\n          ghtoken: ${{ secrets.ghrs_github_api_token }}\n\n```\n\n**Note:** the recommended way to run this Action is on a schedule, once per day. Really.\n\n**Note:** defining `ghtoken: ${{ secrets.ghrs_github_api_token }}` is required. In the _data_ repository (where the action is executed) you need to have a secret defined, with the name `GHRS_GITHUB_API_TOKEN` (of course you can change the name in both places).\nThe content of the secret needs to be an API token that has the `repo` scope. Follow the [tutorial](https://github.com/jgehrcke/github-repo-stats/wiki/Tutorial) for precise instructions.\n\n### Config parameter reference\n\nIn the workflow file you can set various configuration parameters. They\nare specified and documented in the `action.yml` file (the reference). Here\nis a quick description, for convenience:\n\n* `ghtoken`: GitHub API token for reading the GitHub repository traffic API for\n  the stats repo, and for pushing commits to the data repo. Required.\n* `repository`: Repository spec (`\u003cowner-or-org\u003e/\u003creponame\u003e`) for the repository\n  to fetch statistics for. Default: `${{ github.repository }}` (the repo this\n  Action runs in).\n* `databranch`: Branch to push data to (in the data repo).\n  Default: `github-repo-stats`\n* `ghpagesprefix`: Set this if the data branch in the data repo is exposed via\n  GitHub pages. 
Must not end with a slash.\n  Example: `https://jgehrcke.github.io/ghrs-test`\n  Default: none\n\nIt is recommended that you create the data branch and delete all files from that branch before setting this Action up in your repository, so that this data branch appears as a tidy environment.\nYou can of course remove files from that branch at any other point in time, too.\n\n### Tracking multiple repositories via `matrix`\n\nThe GitHub Actions workflow specification language allows for defining a matrix of different job configurations through the [`jobs.\u003cjob_id\u003e.strategy.matrix`](https://docs.github.com/en/actions/learn-github-actions/workflow-syntax-for-github-actions#jobsjob_idstrategymatrix) directive.\nThis can be used for efficiently tracking multiple stats repositories from within the same data repository.\n\n_Example workflow file:_\n\n```yaml\nname: fetch-repository-stats\nconcurrency: fetch-repository-stats\n\non:\n  schedule:\n    - cron: \"0 23 * * *\"\n  workflow_dispatch:\n\njobs:\n  run-ghrs-with-matrix:\n    name: repostats-for-nice-projects\n    runs-on: ubuntu-latest\n    strategy:\n      matrix:\n        # The repositories to generate reports for.\n        statsRepo: ['bob/nice-project', 'alice/also-nice-project']\n      # Do not cancel\u0026fail all remaining jobs upon first job failure.\n      fail-fast: false\n      # Help avoid commit conflicts. 
Note(JP): this should not be\n      # necessary anymore, feedback appreciated\n      max-parallel: 1\n    steps:\n      - name: run-ghrs\n        uses: jgehrcke/github-repo-stats@RELEASE\n        with:\n          repository: ${{ matrix.statsRepo }}\n          ghtoken: ${{ secrets.ghrs_github_api_token }}\n```\n\n## Developer notes\n\n### CLI tests\n\nHere is how to run [bats](https://github.com/bats-core/bats-core)-based checks from within a checkout:\n\n```bash\n$ git clone https://github.com/jgehrcke/github-repo-stats\n$ cd github-repo-stats/\n\n$ make clitests\n...\n1..5\nok 1 analyze.py: snapshots: some, vcagg: yes, stars: some, forks: none\nok 2 analyze.py: snapshots: some, vcagg: yes, stars: none, forks: some\nok 3 analyze.py: snapshots: some, vcagg: yes, stars: some, forks: some\nok 4 analyze.py: snapshots: some, vcagg: no, stars: some, forks: some\nok 5 analyze.py + pdf.py: snapshots: some, vcagg: no, stars: some, forks: some\n```\n\n### Lint\n\n```bash\n$ make lint\n...\nAll done! ✨ 🍰 ✨\n...\n```\n\n### Local run of entrypoint.sh\n\nSet environment variables, for example:\n\n```bash\nexport GITHUB_REPOSITORY=jgehrcke/ghrs-test\nexport GITHUB_WORKFLOW=\"localtesting\"\nexport INPUT_DATABRANCH=databranch-test\nexport INPUT_GHTOKEN=\"c***1\"\nexport INPUT_REPOSITORY=jgehrcke/covid-19-germany-gae\nexport INPUT_GHPAGESPREFIX=\"none\"\nexport GHRS_FILES_ROOT_PATH=\"/home/jp/dev/github-repo-stats\"\nexport GHRS_TESTING=\"true\"\n```\n\n(For an up-to-date list of required environment variables, see `.github/workflows/prs.yml`.)\n\nRun it in an empty directory. 
Example:\n\n```bash\ncd /tmp/ghrstest\nrm -rf .* *; bash /home/jp/dev/github-repo-stats/entrypoint.sh\n```\n\n## Further resources\n\n* [“GitHub Stars” -- useful for *what*?](https://opensource.stackexchange.com/questions/5110/github-stars-is-a-very-useful-metric-but-for-what/5114#5114)\n* [GitHub Traffic API docs](https://docs.github.com/en/free-pro-team@latest/rest/reference/repos#traffic)\n* [Do your own views count?](https://stackoverflow.com/a/63697886/145400)\n\n## Used by\n\nA few rather randomly picked use cases:\n\n\n* https://github.com/idurar/erp-crm/tree/github-repo-stats/idurar/idurar-erp-crm\n* https://github.com/awslabs/aws-security-analytics-bootstrap/tree/github-repo-stats/awslabs/aws-security-analytics-bootstrap\n* https://github.com/centerofci/mathesar/tree/github-repo-stats/centerofci/mathesar\n* https://github.com/carbon-design-system/carbon/tree/github-repo-stats/carbon-design-system/carbon\n* https://github.com/Pythagora-io/pythagora/tree/github-repo-stats/Pythagora-io/pythagora\n* https://github.com/ignite-hq/cli/tree/github-repo-stats/ignite-hq/cli\n* https://github.com/tom-doerr/github_repo_stats_data/tree/master/tom-doerr\n* https://github.com/ethyca/fides-stats/tree/main/ethyca/fides\n* https://github.com/dylansdaniels/hnn_tracking_test/tree/main/jonescompneurolab/hnn\n* https://github.com/idaholab/repository-statistics/tree/main/idaholab\n* https://github.com/Declipsonator/Tweaks-Stats/tree/main/Declipsonator/Meteor-Tweaks\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjgehrcke%2Fgithub-repo-stats","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjgehrcke%2Fgithub-repo-stats","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjgehrcke%2Fgithub-repo-stats/lists"}