{"id":13910644,"url":"https://github.com/rob-sve/iadownloader","last_synced_at":"2025-07-18T09:32:22.789Z","repository":{"id":52475736,"uuid":"350103861","full_name":"rsvensson/iadownloader","owner":"rsvensson","description":"Auto-download files and collections from Internet Archive","archived":false,"fork":false,"pushed_at":"2021-04-27T21:57:44.000Z","size":48,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-05-11T07:35:11.573Z","etag":null,"topics":["download","downloader","internet-archive","python","tqdm"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rsvensson.png","metadata":{"files":{"readme":"README.org","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-03-21T19:56:10.000Z","updated_at":"2023-09-04T04:15:30.000Z","dependencies_parsed_at":"2022-09-16T07:22:12.010Z","dependency_job_id":null,"html_url":"https://github.com/rsvensson/iadownloader","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rsvensson%2Fiadownloader","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rsvensson%2Fiadownloader/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rsvensson%2Fiadownloader/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rsvensson%2Fiadownloader/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rsvensson","download_url":"https://codeload.github.com/rsvensson/iadownloader/tar.gz/refs/heads/
main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":214276910,"owners_count":15709598,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["download","downloader","internet-archive","python","tqdm"],"created_at":"2024-08-07T00:01:40.069Z","updated_at":"2024-11-25T19:31:30.245Z","avatar_url":"https://github.com/rsvensson.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"#+TITLE: iadownloader\n#+AUTHOR: rsvensson\n#+EMAIL: rsvensson.malmo@gmail.com\n#+DESCRIPTION: Auto-download files from Internet Archive\n#+KEYWORDS: python, internet archive, download\n\n** Summary\n/iadownloader/ is a tool to automatically download files from the [[https://archive.org/][Internet Archive]]. It will download all the files - individually or as a compressed archive - in an internet archive upload url automatically, to a configurable download location (defaults to the current working directory). 
It can also download complete collections and the like by parsing either JSON or CSV files generated by Internet Archive's [[https://archive.org/advancedsearch.php][advanced search]] tool.\n\n** Usage\n#+BEGIN_SRC shell\niadownloader.py [-h] [-c] [-o OUTPUT_DIR] [-t THREADS] [-T] url\n\npositional arguments:\n  url                   URL or path to json/csv file\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -c, --compressed      Get the compressed archive download instead of the individual files\n  -o OUTPUT_DIR, --output_dir OUTPUT_DIR\n                        Path to output directory\n  -t THREADS, --threads THREADS\n                        Number of simultaneous downloads (maximum of 10)\n  -T, --torrent         Only download the torrent file if available\n#+END_SRC\n\nThe basic usage is to simply invoke iadownloader with a download url.\n#+BEGIN_SRC shell\npython iadownloader.py https://archive.org/download/\u003curl\u003e\n#+END_SRC\nThis causes all the files in the url to be downloaded to the directory the script was invoked from.\n\nOptionally specify the download location:\n#+BEGIN_SRC shell\npython iadownloader.py -o /download/path https://archive.org/download/\u003curl\u003e\n#+END_SRC\n\nTo download the compressed archive of the upload just add the '-c' flag:\n#+BEGIN_SRC shell\npython iadownloader.py -c -o /download/path https://archive.org/download/\u003curl\u003e\n#+END_SRC\n\nYou can also specify the number of threads (up to 10):\n#+BEGIN_SRC shell\npython iadownloader.py -t 8 -o /download/path https://archive.org/download/\u003curl\u003e\n#+END_SRC\nIt defaults to 4 threads if not specified.\n\n*Don't confuse \"download url\" with individual file urls.* Those are trivially downloaded through your web browser. This tool simplifies downloading all the files included in an upload on Internet Archive. Even this can be done using the Web UI quite easily. 
Where iadownloader shines is the ability to download full collections automatically.\n\nTo download a whole collection, all files from a certain author, etc, go to Internet Archive's [[https://archive.org/advancedsearch.php][advanced search]] tool and follow these steps:\n1. Scroll down to \"Advanced Search returning JSON, XML, and more\". In the \"Query\" field enter /collection:\u003cname of collection\u003e/ for collections, /creator:\u003cname of creator\u003e/ for creators, etc. In \"Field to return\" select \"identifier\" if not already selected. Select an appropriate \"Number of results\" depending on the collection.\n2. Choose either JSON format or CSV format. CSV format is a bit more convenient since it prompts you to download it immediately, while the JSON format opens a javascript page with embedded JSON data. Save the .csv file to a location of your choice. If you choose JSON, save the page and make sure to save it with the .json ending rather than the suggested .js one.\n3. Run iadownloader.py like this:\n   #+BEGIN_SRC sh\n   python iadownloader.py -o /download/path /path/to/csv-or-json-file\n   #+END_SRC\niadownloader will go through all the downloads of the collection and download them into the download path.\n\n** Requirements\niadownloader uses /requests/, /lxml/, and /tqdm/ to do its magic. To make sure you have them, use the included requirements.txt:\n#+BEGIN_SRC sh\npip install -r requirements.txt\n#+END_SRC\nOf course, you need python and pip as well.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frob-sve%2Fiadownloader","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frob-sve%2Fiadownloader","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frob-sve%2Fiadownloader/lists"}