{"id":13405650,"url":"https://github.com/dogsheep/github-to-sqlite","last_synced_at":"2025-04-12T17:45:40.899Z","repository":{"id":35102946,"uuid":"207052882","full_name":"dogsheep/github-to-sqlite","owner":"dogsheep","description":"Save data from GitHub to a SQLite database","archived":false,"fork":false,"pushed_at":"2024-01-15T05:56:13.000Z","size":143,"stargazers_count":420,"open_issues_count":24,"forks_count":45,"subscribers_count":8,"default_branch":"main","last_synced_at":"2025-04-03T20:11:43.121Z","etag":null,"topics":["datasette","datasette-io","datasette-tool","dogsheep","github-api","sqlite"],"latest_commit_sha":null,"homepage":"https://github-to-sqlite.dogsheep.net/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dogsheep.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-09-08T02:50:28.000Z","updated_at":"2025-03-06T09:02:19.000Z","dependencies_parsed_at":"2024-06-19T13:27:52.578Z","dependency_job_id":"72089f4a-a632-408a-971e-0ca241a20021","html_url":"https://github.com/dogsheep/github-to-sqlite","commit_stats":{"total_commits":175,"total_committers":6,"mean_commits":"29.166666666666668","dds":0.06285714285714283,"last_synced_commit":"6eb97a2da73e1d71a53d3039474de34b0408f478"},"previous_names":[],"tags_count":24,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dogsheep%2Fgithub-to-sqlite","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dogsheep%2Fgithub-to-sqlite/tags","releases_url":"https://repos
.ecosyste.ms/api/v1/hosts/GitHub/repositories/dogsheep%2Fgithub-to-sqlite/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dogsheep%2Fgithub-to-sqlite/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dogsheep","download_url":"https://codeload.github.com/dogsheep/github-to-sqlite/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248609548,"owners_count":21132915,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["datasette","datasette-io","datasette-tool","dogsheep","github-api","sqlite"],"created_at":"2024-07-30T19:02:07.392Z","updated_at":"2025-04-12T17:45:40.867Z","avatar_url":"https://github.com/dogsheep.png","language":"Python","readme":"# github-to-sqlite\n\n[![PyPI](https://img.shields.io/pypi/v/github-to-sqlite.svg)](https://pypi.org/project/github-to-sqlite/)\n[![Changelog](https://img.shields.io/github/v/release/dogsheep/github-to-sqlite?include_prereleases\u0026label=changelog)](https://github.com/dogsheep/github-to-sqlite/releases)\n[![Tests](https://github.com/dogsheep/github-to-sqlite/workflows/Test/badge.svg)](https://github.com/dogsheep/github-to-sqlite/actions?query=workflow%3ATest)\n[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/dogsheep/github-to-sqlite/blob/main/LICENSE)\n\nSave data from GitHub to a SQLite database.\n\n\u003c!-- toc --\u003e\n\n- [Demo](#demo)\n- [How to install](#how-to-install)\n- [Authentication](#authentication)\n- [Fetching issues for a 
repository](#fetching-issues-for-a-repository)\n- [Fetching pull requests for a repository](#fetching-pull-requests-for-a-repository)\n- [Fetching issue comments for a repository](#fetching-issue-comments-for-a-repository)\n- [Fetching commits for a repository](#fetching-commits-for-a-repository)\n- [Fetching releases for a repository](#fetching-releases-for-a-repository)\n- [Fetching tags for a repository](#fetching-tags-for-a-repository)\n- [Fetching contributors to a repository](#fetching-contributors-to-a-repository)\n- [Fetching repos belonging to a user or organization](#fetching-repos-belonging-to-a-user-or-organization)\n- [Fetching specific repositories](#fetching-specific-repositories)\n- [Fetching repos that have been starred by a user](#fetching-repos-that-have-been-starred-by-a-user)\n- [Fetching users that have starred specific repos](#fetching-users-that-have-starred-specific-repos)\n- [Fetching GitHub Actions workflows](#fetching-github-actions-workflows)\n- [Scraping dependents for a repository](#scraping-dependents-for-a-repository)\n- [Fetching emojis](#fetching-emojis)\n- [Making authenticated API calls](#making-authenticated-api-calls)\n\n\u003c!-- tocstop --\u003e\n\n## Demo\n\nhttps://github-to-sqlite.dogsheep.net/ hosts a [Datasette](https://datasette.io/) demo of a database created by [running this tool](https://github.com/dogsheep/github-to-sqlite/blob/main/.github/workflows/deploy-demo.yml#L40-L60) against all of the repositories in the [Dogsheep GitHub organization](https://github.com/dogsheep), plus the [datasette](https://github.com/simonw/datasette) and [sqlite-utils](https://github.com/simonw/sqlite-utils) repositories.\n\n## How to install\n\n    $ pip install github-to-sqlite\n\n## Authentication\n\nCreate a GitHub personal access token: https://github.com/settings/tokens\n\nRun this command and paste in your new token:\n\n    $ github-to-sqlite auth\n\nThis will create a file called `auth.json` in your current directory containing 
the required value. To save the file at a different path or filename, use the `--auth=myauth.json` option.\n\nAs an alternative to using an `auth.json` file you can add your access token to an environment variable called `GITHUB_TOKEN`.\n\n## Fetching issues for a repository\n\nThe `issues` command retrieves all of the issues belonging to a specified repository.\n\n    $ github-to-sqlite issues github.db simonw/datasette\n\nIf an `auth.json` file is present it will use the token from that file. It works without authentication for public repositories but you should be aware that GitHub have strict IP-based rate limits for unauthenticated requests.\n\nYou can point to a different location of `auth.json` using `-a`:\n\n    $ github-to-sqlite issues github.db simonw/datasette -a /path/to/auth.json\n\nYou can use the `--issue` option one or more times to load specific issues:\n\n    $ github-to-sqlite issues github.db simonw/datasette --issue=1\n\nExample: [issues table](https://github-to-sqlite.dogsheep.net/github/issues)\n\n## Fetching pull requests for a repository\n\nWhile pull requests are a type of issue, you will get more information on pull requests by pulling them separately. 
For example, whether a pull request has been merged and when.\n\nFollowing the API of issues, the `pull-requests` command retrieves all of the pull requests belonging to a specified repository.\n\n    $ github-to-sqlite pull-requests github.db simonw/datasette\n\nYou can use the `--pull-request` option one or more times to load specific pull requests:\n\n    $ github-to-sqlite pull-requests github.db simonw/datasette --pull-request=81\n\nNote that the `merged_by` column on the `pull_requests` table will only be populated for pull requests that are loaded using the `--pull-request` option - the GitHub API does not return this field for pull requests that are loaded in bulk.\n\nYou can load only pull requests in a certain state with the `--state` option:\n\n    $ github-to-sqlite pull-requests --state=open github.db simonw/datasette\n\nPull requests across an entire organization (or more than one) can be loaded with `--org`:\n\n    $ github-to-sqlite pull-requests --state=open --org=psf --org=python github.db\n\nYou can use a search query to find pull requests.  Note that no more than 1000 will be loaded (this is a GitHub API limitation), and some data will be missing (base and head SHAs).  
When using searches, other filters are ignored; put all criteria into the search itself:\n\n    $ github-to-sqlite pull-requests --search='org:python defaultdict state:closed created:\u003c2023-09-01' github.db\n\nExample: [pull_requests table](https://github-to-sqlite.dogsheep.net/github/pull_requests)\n\n## Fetching issue comments for a repository\n\nThe `issue-comments` command retrieves all of the comments on all of the issues in a repository.\n\nIt is recommended you run `issues` first, so that each imported comment can have a foreign key pointing to its issue.\n\n    $ github-to-sqlite issues github.db simonw/datasette\n    $ github-to-sqlite issue-comments github.db simonw/datasette\n\nYou can use the `--issue` option to only load comments for a specific issue within that repository, for example:\n\n    $ github-to-sqlite issue-comments github.db simonw/datasette --issue=1\n\nExample: [issue_comments table](https://github-to-sqlite.dogsheep.net/github/issue_comments)\n\n## Fetching commits for a repository\n\nThe `commits` command retrieves details of all of the commits for one or more repositories. It currently fetches the SHA, commit message and author and committer details; it does not retrieve the full commit body.\n\n    $ github-to-sqlite commits github.db simonw/datasette simonw/sqlite-utils\n\nThe command accepts one or more repositories.\n\nBy default it will stop as soon as it sees a commit that has previously been retrieved. 
You can force it to retrieve all commits (including those that have been previously inserted) using `--all`.\n\nExample: [commits table](https://github-to-sqlite.dogsheep.net/github/commits)\n\n## Fetching releases for a repository\n\nThe `releases` command retrieves the releases for one or more repositories.\n\n    $ github-to-sqlite releases github.db simonw/datasette simonw/sqlite-utils\n\nThe command accepts one or more repositories.\n\nExample: [releases table](https://github-to-sqlite.dogsheep.net/github/releases)\n\n## Fetching tags for a repository\n\nThe `tags` command retrieves all of the tags for one or more repositories.\n\n    $ github-to-sqlite tags github.db simonw/datasette simonw/sqlite-utils\n\nExample: [tags table](https://github-to-sqlite.dogsheep.net/github/tags)\n\n## Fetching contributors to a repository\n\nThe `contributors` command retrieves details of all of the contributors for one or more repositories.\n\n    $ github-to-sqlite contributors github.db simonw/datasette simonw/sqlite-utils\n\nThe command accepts one or more repositories. 
It populates a `contributors` table, with foreign keys to `repos` and `users` and a `contributions` table listing the number of commits to that repository for each contributor.\n\nExample: [contributors table](https://github-to-sqlite.dogsheep.net/github/contributors)\n\n## Fetching repos belonging to a user or organization\n\nThe `repos` command fetches repos belonging to a user or organization.\n\nWithout any other arguments, this command will fetch all repos that the currently authenticated user owns, collaborates on or can access via one of their organizations:\n\n    $ github-to-sqlite repos github.db\n\nTo fetch repos belonging to a specific user or organization, provide their username as an argument:\n\n    $ github-to-sqlite repos github.db dogsheep # organization\n    $ github-to-sqlite repos github.db simonw # user\n\nYou can pass more than one username to fetch for multiple users or organizations at once:\n\n    $ github-to-sqlite repos github.db simonw dogsheep\n\nAdd the `--readme` option to save the README for the repo in a column called `readme`. 
Add `--readme-html` to save the HTML rendered version of the README into a column called `readme_html`.\n\nExample: [repos table](https://github-to-sqlite.dogsheep.net/github/repos)\n\n## Fetching specific repositories\n\nYou can use `-r` with the `repos` command one or more times to fetch just specific repositories.\n\n    $ github-to-sqlite repos github.db -r simonw/datasette -r dogsheep/github-to-sqlite\n\n## Fetching repos that have been starred by a user\n\nThe `starred` command fetches the repos that have been starred by a user.\n\n    $ github-to-sqlite starred github.db simonw\n\nIf you are using an `auth.json` file you can omit the username to retrieve the starred repos for the authenticated user.\n\nExample: [stars table](https://github-to-sqlite.dogsheep.net/github/stars)\n\n## Fetching users that have starred specific repos\n\nThe `stargazers` command fetches the users that have starred the specified repos.\n\n    $ github-to-sqlite stargazers github.db simonw/datasette dogsheep/github-to-sqlite\n\nYou can specify one or more repositories using `owner/repo` syntax.\n\nUsers fetched using this command will be inserted into the `users` table. 
Many-to-many records showing which repository they starred will be added to the `stars` table.\n\n## Fetching GitHub Actions workflows\n\nThe `workflows` command fetches the YAML workflow configurations from each repository's `.github/workflows` directory and parses them to populate `workflows`, `jobs` and `steps` tables.\n\n    $ github-to-sqlite workflows github.db simonw/datasette dogsheep/github-to-sqlite\n\nYou can specify one or more repositories using `owner/repo` syntax.\n\nExample: [workflows table](https://github-to-sqlite.dogsheep.net/github/workflows), [jobs table](https://github-to-sqlite.dogsheep.net/github/jobs), [steps table](https://github-to-sqlite.dogsheep.net/github/steps)\n\n## Scraping dependents for a repository\n\nThe GitHub dependency graph can show other GitHub projects that depend on a specific repo, for example [simonw/datasette/network/dependents](https://github.com/simonw/datasette/network/dependents).\n\nThis data is not yet available through the GitHub API. 
The `scrape-dependents` command scrapes those pages and uses the GitHub API to load full versions of the dependent repositories.\n\n    $ github-to-sqlite scrape-dependents github.db simonw/datasette\n\nThe command accepts one or more repositories.\n\nAdd `-v` for verbose output.\n\nExample: [dependents table](https://github-to-sqlite.dogsheep.net/github/dependents?_sort_desc=first_seen_utc)\n\n## Fetching emojis\n\nYou can fetch a list of every emoji supported by GitHub using the `emojis` command:\n\n    $ github-to-sqlite emojis github.db\n\nThis will create a table called `emojis` with a primary key `name` and a `url` column.\n\nIf you add the `--fetch` option the command will also fetch the binary content of the images and place them in an `image` column:\n\n    $ github-to-sqlite emojis emojis.db -f\n    [########----------------------------]  397/1799   22%  00:03:43\n\nYou can then use the [datasette-render-images](https://github.com/simonw/datasette-render-images) plugin to browse them visually.\n\nExample: [emojis table](https://github-to-sqlite.dogsheep.net/github/emojis)\n\n## Making authenticated API calls\n\nThe `github-to-sqlite get` command provides a convenient shortcut for making authenticated calls to the API. Once you have created your `auth.json` file (or set a `GITHUB_TOKEN` environment variable) you can use it like this:\n\n    $ github-to-sqlite get https://api.github.com/gists\n\nThis will make an authenticated call to the URL you provide and pretty-print the resulting JSON to the console.\n\nYou can omit the `https://api.github.com/` prefix, for example:\n\n    $ github-to-sqlite get /gists\n\nMany GitHub APIs are [paginated using the HTTP Link header](https://docs.github.com/en/rest/guides/traversing-with-pagination). 
You can follow this pagination and output a list of all of the resulting items using `--paginate`:\n\n    $ github-to-sqlite get /users/simonw/repos --paginate\n\nYou can output newline-delimited JSON for each item using `--nl`. This can be useful for streaming items into another tool.\n\n    $ github-to-sqlite get /users/simonw/repos --nl\n","funding_links":[],"categories":["Python","github-api","sqlite"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdogsheep%2Fgithub-to-sqlite","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdogsheep%2Fgithub-to-sqlite","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdogsheep%2Fgithub-to-sqlite/lists"}