{"id":16776726,"url":"https://github.com/n0tan3rd/memgatorbulkdownload","last_synced_at":"2025-06-26T09:33:58.052Z","repository":{"id":74125214,"uuid":"129684228","full_name":"N0taN3rd/memgatorBulkDownload","owner":"N0taN3rd","description":null,"archived":false,"fork":false,"pushed_at":"2018-04-18T02:16:26.000Z","size":14,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-01-23T05:28:31.298Z","etag":null,"topics":["memento","memento-protocol","memento-rfc","memgator","timemap","timemaps","web-archiving"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/N0taN3rd.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-04-16T04:23:29.000Z","updated_at":"2024-07-30T16:14:14.000Z","dependencies_parsed_at":null,"dependency_job_id":"1f2bfef0-5023-4c8e-b394-d1ebc316e1b7","html_url":"https://github.com/N0taN3rd/memgatorBulkDownload","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/N0taN3rd%2FmemgatorBulkDownload","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/N0taN3rd%2FmemgatorBulkDownload/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/N0taN3rd%2FmemgatorBulkDownload/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/N0taN3rd%2FmemgatorBulkDownload/manifests","owner_url":"https://repos.ecosy
ste.ms/api/v1/hosts/GitHub/owners/N0taN3rd","download_url":"https://codeload.github.com/N0taN3rd/memgatorBulkDownload/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243911496,"owners_count":20367684,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["memento","memento-protocol","memento-rfc","memgator","timemap","timemaps","web-archiving"],"created_at":"2024-10-13T07:10:48.844Z","updated_at":"2025-03-16T18:42:45.591Z","avatar_url":"https://github.com/N0taN3rd.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Memgator Bulk TimeMap Downloader\n\n Have you ever had a need to download 100 or 1 million TimeMaps using [oduwsdl/memgator](https://github.com/oduwsdl/memgator)?\n\nWith the caveat that it must be done in a timely manner?\n\nIf so then you are in luck because this project has you covered.\n\n# Requirements\n**Requires python 3**\n\nBe sure to install the dependencies first\n\n- ```[sudo] pip install -r requirements.txt```\n\nYou also need a running instance of [oduwsdl/MemGator](https://github.com/oduwsdl/memgator)\n\nIf you do not have one. 
You can get one at [oduwsdl/MemGator/releases](https://github.com/oduwsdl/MemGator/releases)\n\n# Usage\n\n#### Basic usage\n```\n$ python download.py -m {MGURL} {FORMAT2} -d {DUMPDIR} -u {LIST}\n# MGURL   =\u003e http://localhost:1208\n# FORMAT  =\u003e link|json|cdxj\n# FORMAT2 =\u003e (-l, --link)|(-j, --json)|(-c, --cdxj)\n# DUMPDIR =\u003e Path to directory where TimeMaps will be dumped\n# LIST    =\u003e Path to URL list\n```\n\n#### Full usage\n```\n$ python download.py --help\nusage: download [-h] [-m MEMURL] [-w WORKERS] [-r REQUESTS] [-d DUMP] -u URLS\n                [-k KEY] [-j | -l | -c]\n\nBulk download TimeMaps using a local memgator instance\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -m MEMURL, --memurl MEMURL\n                        URL for running memgator instance. Defaults to\n                        http://localhost:1208/timemap/json\n  -w WORKERS, --workers WORKERS\n                        Max number of worker processes spawned. Defaults to 5\n  -r REQUESTS, --requests REQUESTS\n                        How many requests should be queued per chunk. Defaults\n                        to 10\n  -d DUMP, --dump DUMP  Directory to dump the TimeMaps in. Defaults to\n                        \u003ccwd\u003e/timemaps\n  -u URLS, --urls URLS  Path to file (.txt, .csv, .json) containing list of\n                        URLs. File type detected by considering extension. If\n                        .csv must supply -k \u003ckey\u003e so we know where to get the\n                        url\n  -k KEY, --key KEY     The csv key for the urls\n  -j, --json            Download TimeMaps in json format. Default format\n  -l, --link            Download TimeMaps in link format\n  -c, --cdxj            Download TimeMaps in cdxj format\n```\n\n#### URL List Format\n- **.txt**: 1 URL per line\n- **.csv**: Requires -k or --key {KEY} argument. 
_KEY_ is the CSV column containing the URLs\n- **.json**: List of URLs\n\n# License\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fn0tan3rd%2Fmemgatorbulkdownload","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fn0tan3rd%2Fmemgatorbulkdownload","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fn0tan3rd%2Fmemgatorbulkdownload/lists"}