{"id":18243950,"url":"https://github.com/rybesh/capture-urls","last_synced_at":"2025-04-04T13:30:58.641Z","repository":{"id":39883328,"uuid":"367513102","full_name":"rybesh/capture-urls","owner":"rybesh","description":"Archive a list of URLs using the Wayback Machine","archived":false,"fork":false,"pushed_at":"2024-12-06T16:27:48.000Z","size":40,"stargazers_count":5,"open_issues_count":1,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-20T13:11:47.114Z","etag":null,"topics":["save-page-now","wayback-machine","web-archiving"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"unlicense","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rybesh.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-05-15T01:08:11.000Z","updated_at":"2024-12-30T22:26:02.000Z","dependencies_parsed_at":"2023-02-02T20:20:10.490Z","dependency_job_id":"37a5d777-70ab-4b8d-a04b-20c1cc2a303a","html_url":"https://github.com/rybesh/capture-urls","commit_stats":{"total_commits":26,"total_committers":2,"mean_commits":13.0,"dds":0.5,"last_synced_commit":"91cd36d420d2ac9c30245e61c45e4fbfcdbf4f56"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rybesh%2Fcapture-urls","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rybesh%2Fcapture-urls/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rybesh%2Fcapture-urls/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rybesh%2Fcapture-urls/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rybesh","download_url":"https://codeload.github.com/rybesh/capture-urls/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247184949,"owners_count":20897860,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["save-page-now","wayback-machine","web-archiving"],"created_at":"2024-11-05T09:04:23.354Z","updated_at":"2025-04-04T13:30:58.326Z","avatar_url":"https://github.com/rybesh.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Archive a list of URLs using the Wayback Machine\n\n** You need Python 3.10 or later to run this script. **\n\nThis script uses the [Save Page Now 2 Public API](https://docs.google.com/document/d/1Nsv52MvSjbLb2PCpHlat0gkzw0EvtSgpKHu4mk0MnrA/edit).\n\nTo use it:\n\n1. Clone or [download](https://github.com/rybesh/capture-urls/archive/refs/heads/main.zip \"download repository as a zip file\") and unzip this repository.\n\n1. Install the required Python libraries. Assuming you cloned or\n   unzipped this repository to the directory `path/to/capture-urls/`:\n\n   ```\n   cd path/to/capture-urls/\n   make\n   ```\n\n1. Go to https://archive.org/account/s3.php and get your S3-like API keys.\n\n1. In `path/to/capture-urls/`, create a file called `secret.py` with\n   the following contents:\n\n   ```python\n   ACCESS_KEY = 'your access key'\n   SECRET_KEY = 'your secret key'\n   ```\n   \n   (Use the actual values of your access key and secret key, not `your\n   access key` and `your secret key`.)\n   \n1. *Optionally* edit `config.py` to your liking.\n\n1. Archive your URLs:\n   ```\n   cat urls.txt | ./capture-urls.py \u003e archived-urls.txt\n   ```\n   `urls.txt` should contain a list of URLs to be archived, one on each line.\n\n1. Archiving URLs can take a long time. You can interrupt the process\n   with `Ctrl-C`. This will create a file called `progress.json` with\n   the state of the archiving process so far. If you start the process\n   again, it will pick up where it left off. You can add new URLs to\n   `urls.txt` before you restart the process.\n\n1. When it finishes running you should have a list of the archived\n   URLs in `archived-urls.txt`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frybesh%2Fcapture-urls","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frybesh%2Fcapture-urls","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frybesh%2Fcapture-urls/lists"}