{"id":13582646,"url":"https://github.com/ArchiveTeam/wpull","last_synced_at":"2025-04-06T14:31:08.055Z","repository":{"id":12358165,"uuid":"15005712","full_name":"ArchiveTeam/wpull","owner":"ArchiveTeam","description":"Wget-compatible web downloader and crawler.","archived":false,"fork":false,"pushed_at":"2024-04-29T12:41:59.000Z","size":4107,"stargazers_count":556,"open_issues_count":198,"forks_count":77,"subscribers_count":23,"default_branch":"develop","last_synced_at":"2024-10-30T00:56:00.243Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ArchiveTeam.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2013-12-07T13:03:15.000Z","updated_at":"2024-10-26T02:21:06.000Z","dependencies_parsed_at":"2023-07-13T13:08:05.230Z","dependency_job_id":"fa7834ee-55f6-4d94-87c1-552df2b1077e","html_url":"https://github.com/ArchiveTeam/wpull","commit_stats":{"total_commits":1876,"total_committers":10,"mean_commits":187.6,"dds":"0.020788912579957408","last_synced_commit":"cfa5bcc571e7ff2d5175d8299e90651955c72df5"},"previous_names":["chfoo/wpull"],"tags_count":97,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ArchiveTeam%2Fwpull","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ArchiveTeam%2Fwpull/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ArchiveTeam%2Fwpull/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/
hosts/GitHub/repositories/ArchiveTeam%2Fwpull/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ArchiveTeam","download_url":"https://codeload.github.com/ArchiveTeam/wpull/tar.gz/refs/heads/develop","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247495809,"owners_count":20948110,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T15:02:54.570Z","updated_at":"2025-04-06T14:31:08.025Z","avatar_url":"https://github.com/ArchiveTeam.png","language":"HTML","funding_links":[],"categories":["Tools \u0026 Software","HTML","Download utilities","Web Archiving"],"sub_categories":["Acquisition","General","Crawlers \u0026 Capture"],"readme":"=====\nWpull\n=====\n\n\nWpull is a Wget-compatible (or remake/clone/replacement/alternative) web\ndownloader and crawler.\n\n.. 
image:: https://raw.githubusercontent.com/chfoo/wpull/master/icon/wpull_logo_full.png\n   :target: https://github.com/chfoo/wpull\n   :alt: A dog pulling a box via a harness.\n\nNotable Features:\n\n* Written in Python: lightweight, modifiable, robust, \u0026 scriptable\n* Graceful stopping; on-disk database resume\n* PhantomJS \u0026 youtube-dl integration (experimental)\n\n\nInstall\n=======\n\nWpull uses `Python 3 \u003chttp://python.org/download/\u003e`_.\n\nOnce Python is installed, download Wpull from PyPI using pip::\n\n    pip3 install wpull\n\nFor detailed installation instructions and potential caveats, please see\nhttps://wpull.readthedocs.io/en/master/install.html.\n\n\nExample Commands\n================\n\nTo download the About page of Google.com::\n\n    wpull google.com/about\n\nTo archive a website::\n\n    wpull billy.blogsite.example \\\n        --warc-file blogsite-billy \\\n        --no-check-certificate \\\n        --no-robots --user-agent \"InconspicuousWebBrowser/1.0\" \\\n        --wait 0.5 --random-wait --waitretry 600 \\\n        --page-requisites --recursive --level inf \\\n        --span-hosts-allow linked-pages,page-requisites \\\n        --escaped-fragment --strip-session-id \\\n        --sitemaps \\\n        --reject-regex \"/login\\.php\" \\\n        --tries 3 --retry-connrefused --retry-dns-error \\\n        --timeout 60 --session-timeout 21600 \\\n        --delete-after --database blogsite-billy.db \\\n        --quiet --output-file blogsite-billy.log\n\nTo see all options::\n\n    wpull --help\n\n\nDocumentation\n=============\n\nDocumentation is located at https://wpull.readthedocs.io/. Please have\na look at it before using Wpull's advanced features.\n\n\nHelp\n====\n\nNeed help? Please see our `Help\n\u003chttps://wpull.readthedocs.io/en/master/help.html\u003e`_ page which contains\nfrequently asked questions and support information.\n\nThe issue tracker is located at https://github.com/chfoo/wpull/issues.\n\n\nDev\n===\n\n.. 
image:: https://travis-ci.org/ArchiveTeam/wpull.png\n   :target: https://travis-ci.org/ArchiveTeam/wpull\n   :alt: Travis CI build status\n\n.. image:: https://coveralls.io/repos/chfoo/wpull/badge.png\n   :target: https://coveralls.io/r/chfoo/wpull\n   :alt: Coveralls report\n\n\nContributions and feedback are greatly appreciated. \n\n\nCredits\n=======\n\nCopyright 2013-2016 by Christopher Foo and others. License GPL v3.\n\nThis project contains third-party source code licensed under different terms:\n\n* wpull.backport.logging\n* wpull.thirdparty.robotexclusionrulesparser\n* wpull.thirdparty.dammit\n\nWe would like to acknowledge the authors of GNU Wget as Wpull uses algorithms\nfrom Wget.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FArchiveTeam%2Fwpull","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FArchiveTeam%2Fwpull","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FArchiveTeam%2Fwpull/lists"}