{"id":29953753,"url":"https://github.com/bocaletto-luca/githubtointernetarchive","last_synced_at":"2026-05-11T07:15:12.899Z","repository":{"id":302742503,"uuid":"1013476733","full_name":"bocaletto-luca/GithubToInternetArchive","owner":"bocaletto-luca","description":"GitHub Archiver to Internet Archive A single-file Python tool (main.py) that mirrors every repository of a GitHub user or organization and uploads each mirror as a tarball to Internet Archive. Metadata (description, license, topics) are automatically pulled from GitHub and attached to each upload. @bocaletto-luca","archived":false,"fork":false,"pushed_at":"2025-07-04T01:22:36.000Z","size":21,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-07-04T02:25:12.632Z","etag":null,"topics":["bocaletto-luca","github-to-internet-archive","internet-archive","linux","opensource","python","terminal"],"latest_commit_sha":null,"homepage":"https://bocaletto-luca.github.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bocaletto-luca.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-04T01:14:10.000Z","updated_at":"2025-07-04T01:22:39.000Z","dependencies_parsed_at":"2025-07-04T02:25:17.872Z","dependency_job_id":"394f220c-4f0a-4cbd-99bc-2645ae91587e","html_url":"https://github.com/bocaletto-luca/GithubToInternetArchive","commit_stats":null,"previous_names":["bocaletto-luca/githubtointernetarchive"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pk
g:github/bocaletto-luca/GithubToInternetArchive","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bocaletto-luca%2FGithubToInternetArchive","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bocaletto-luca%2FGithubToInternetArchive/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bocaletto-luca%2FGithubToInternetArchive/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bocaletto-luca%2FGithubToInternetArchive/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bocaletto-luca","download_url":"https://codeload.github.com/bocaletto-luca/GithubToInternetArchive/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bocaletto-luca%2FGithubToInternetArchive/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":268563888,"owners_count":24270663,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-03T02:00:12.545Z","response_time":2577,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bocaletto-luca","github-to-internet-archive","internet-archive","linux","opensource","python","terminal"],"created_at":"2025-08-03T15:11:09.099Z","updated_at":"2026-05-11T07:15:12.861Z","avatar_url":"https://github.com/bocaletto-luca.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],
"readme":"# GitHub Archiver to Internet Archive\n\nA single-file Python tool (`main.py`) that mirrors every repository of a GitHub user or organization and uploads each mirror as a tarball to Internet Archive. Metadata (description, license, topics) are automatically pulled from GitHub and attached to each upload.\n\n---\n\n## Features\n\n- Fetch all public repos for a GitHub user/org via the GitHub API  \n- Clone or update each repo as a bare mirror (`git clone --mirror`)  \n- Package each mirror into a `tar.gz` archive  \n- Push archives to archive.org under your chosen collection  \n- Embed rich metadata (repo description, SPDX license URL, topics)  \n- Clean up local mirrors by default (optional `--keep-mirror`)  \n- Zero external scripts—everything lives in one `main.py`  \n\n---\n\n## Prerequisites\n\n- **Python 3.7+**  \n- **git** CLI on your `PATH`  \n- **A GitHub Personal Access Token (PAT)** with `repo` scope  \n- **An Internet Archive account** with an Access Key \u0026 Secret Key  \n\nPython dependencies (install in a virtual environment):\n\n```bash\npip install PyGithub internetarchive\n```\n\n---\n\n## Installation\n\n1. Clone this repo:\n\n   ```bash\n   git clone https://github.com/bocaletto-luca/github-archiver.git\n   cd github-archiver\n   ```\n\n2. (Optional) Create and activate a virtual environment:\n\n   ```bash\n   python3 -m venv .venv\n   source .venv/bin/activate\n   ```\n\n3. Install Python requirements:\n\n   ```bash\n   pip install -r requirements.txt\n   ```\n\n4. 
Make the script executable:\n\n   ```bash\n   chmod +x main.py\n   ```\n\n---\n\n## Configuration\n\n### GitHub Token\n\nExport your GitHub PAT to the environment:\n\n```bash\nexport GITHUB_TOKEN=\"ghp_your_personal_access_token\"\n```\n\n### Internet Archive Credentials\n\nConfigure the `internetarchive` CLI once:\n\n```bash\npip install internetarchive\nia configure\n```\n\nYou will be prompted for your **Access Key** and **Secret Key**. These are saved to the `internetarchive` configuration file (`ia.ini`).\n\n---\n\n## Usage\n\n```bash\n./main.py \\\n  --github-user bocaletto-luca \\\n  --github-token \"${GITHUB_TOKEN}\" \\\n  --ia-collection github-archive \\\n  [--output-dir ./backups] \\\n  [--keep-mirror]\n```\n\nOptions:\n\n- `--github-user`  GitHub username or org (e.g. `bocaletto-luca`)  \n- `--github-token` GitHub PAT with `repo` scope  \n- `--ia-collection` Target Internet Archive collection (e.g. `github-archive`)  \n- `--output-dir`    Local folder to store mirrors \u0026 tarballs (default `./backups`)  \n- `--keep-mirror`   Do not delete the local mirror after upload  \n\n### Example: Full Run\n\n```bash\nexport GITHUB_TOKEN=\"ghp_ABC123...\"\nia configure     # enter your Access Key \u0026 Secret Key when prompted\n./main.py \\\n  --github-user bocaletto-luca \\\n  --github-token \"$GITHUB_TOKEN\" \\\n  --ia-collection github-archive\n```\n\nThis will:\n\n1. Fetch all repos under `bocaletto-luca`  \n2. Mirror or update each as `./backups/\u003crepo\u003e.git`  \n3. Create `./backups/\u003crepo\u003e.git.tar.gz`  \n4. Upload to `archive.org/details/github-archive__\u003crepo\u003e`  \n5. Clean up the local mirror  \n\n---\n\n## How It Works\n\n1. **GitHub API**  \n   The script uses `PyGithub` to list all repositories for the given user/org.  \n\n2. **Mirroring**  \n   Each repo is cloned (or updated) as a bare mirror. This preserves all branches, tags, PR refs, etc.  \n\n3. **Archiving**  \n   Mirrors are compressed into `tar.gz` files for efficient upload.  \n\n4. 
**Uploading**  \n   The `internetarchive` Python client pushes the tarball to archive.org, adding:\n   - **title**: `username/repo mirror`  \n   - **description**: the repo’s GitHub description  \n   - **licenseurl** (if SPDX license is set)  \n   - **subject**: GitHub topics as tags  \n   - **collection** and **mediatype**  \n\n5. **Cleanup**  \n   By default, local mirrors are removed after upload. Use `--keep-mirror` to retain them.\n\n---\n\n## Repository Structure\n\n```text\ngithub-archiver/\n├── LICENSE\n├── main.py\n├── requirements.txt\n└── README.md\n```\n\n- **main.py**         – all-in-one script  \n- **requirements.txt** – Python dependencies  \n- **LICENSE**          – GPL-3.0 License  \n- **README.md**        – this documentation  \n\n---\n\n## Author\n\n**Luca Bocaletto**  \nGitHub: [@bocaletto-luca](https://github.com/bocaletto-luca)  \n\n---\n\n## License\n\nDistributed under the **GPL-3.0 License**. See [LICENSE](LICENSE) for details.  \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbocaletto-luca%2Fgithubtointernetarchive","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbocaletto-luca%2Fgithubtointernetarchive","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbocaletto-luca%2Fgithubtointernetarchive/lists"}