{"id":21035473,"url":"https://github.com/archiveteam/ludios_wpull","last_synced_at":"2025-03-13T20:24:31.824Z","repository":{"id":140744262,"uuid":"151554523","full_name":"ArchiveTeam/ludios_wpull","owner":"ArchiveTeam","description":"wpull fork with fixes and faster parsing using html5-parser; used by grab-site; should go away when wpull is similarly improved","archived":false,"fork":false,"pushed_at":"2024-07-07T01:55:07.000Z","size":2760,"stargazers_count":27,"open_issues_count":11,"forks_count":6,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-03-12T03:29:08.968Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ArchiveTeam.png","metadata":{"files":{"readme":"README.orig.rst","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-10-04T10:27:34.000Z","updated_at":"2025-01-05T06:41:27.000Z","dependencies_parsed_at":null,"dependency_job_id":"5a135d27-b442-4878-931e-a7b9d5354527","html_url":"https://github.com/ArchiveTeam/ludios_wpull","commit_stats":{"total_commits":1976,"total_committers":13,"mean_commits":152.0,"dds":0.07034412955465585,"last_synced_commit":"dcdca16e970f0cba17755038611b37e6bd93a6e9"},"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ArchiveTeam%2Fludios_wpull","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ArchiveTeam%2Fludios_wpull/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ArchiveTeam%2Fludios_wpull/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ArchiveTeam%2Fludios_wpull/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ArchiveTeam","download_url":"https://codeload.github.com/ArchiveTeam/ludios_wpull/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243475424,"owners_count":20296723,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-19T13:15:01.027Z","updated_at":"2025-03-13T20:24:31.804Z","avatar_url":"https://github.com/ArchiveTeam.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"=====\nWpull\n=====\n\n\nWpull is a Wget-compatible (or remake/clone/replacement/alternative) web\ndownloader and crawler.\n\n.. image:: https://raw.githubusercontent.com/chfoo/wpull/master/icon/wpull_logo_full.png\n   :target: https://github.com/chfoo/wpull\n   :alt: A dog pulling a box via a harness.\n\nNotable Features:\n\n* Written in Python: lightweight, modifiable, robust, \u0026 scriptable\n* Graceful stopping; on-disk database resume\n* youtube-dl integration (experimental)\n\n\nInstall\n=======\n\nWpull uses `Python 3 \u003chttp://python.org/download/\u003e`_.\n\nOnce Python is installed, download Wpull from PyPI using pip::\n\n    pip3 install wpull\n\nFor detailed installation instructions and potential caveats, please see\nhttps://wpull.readthedocs.io/en/master/install.html.\n\n\nExample Commands\n================\n\nTo download the About page of Google.com::\n\n    wpull google.com/about\n\nTo archive a website::\n\n    wpull billy.blogsite.example \\\n        --warc-file blogsite-billy \\\n        --no-check-certificate \\\n        --no-robots --user-agent \"InconspiuousWebBrowser/1.0\" \\\n        --wait 0.5 --random-wait --waitretry 600 \\\n        --page-requisites --recursive --level inf \\\n        --span-hosts-allow linked-pages,page-requisites \\\n        --escaped-fragment --strip-session-id \\\n        --sitemaps \\\n        --reject-regex \"/login\\.php\" \\\n        --tries 3 --retry-connrefused --retry-dns-error \\\n        --timeout 60 --session-timeout 21600 \\\n        --delete-after --database blogsite-billy.db \\\n        --quiet --output-file blogsite-billy.log\n\nTo see all options::\n\n    wpull --help\n\n\nDocumentation\n=============\n\nDocumentation is located at https://wpull.readthedocs.io/. Please have\na look at it before using Wpull's advanced features.\n\n\nHelp\n====\n\nNeed help? Please see our `Help\n\u003chttps://wpull.readthedocs.io/en/master/help.html\u003e`_ page which contains\nfrequently asked questions and support information.\n\nThe issue tracker is located at https://github.com/chfoo/wpull/issues.\n\n\nDev\n===\n\n.. image:: https://travis-ci.org/chfoo/wpull.png\n   :target: https://travis-ci.org/chfoo/wpull\n   :alt: Travis CI build status\n\n.. image:: https://coveralls.io/repos/chfoo/wpull/badge.png\n   :target: https://coveralls.io/r/chfoo/wpull\n   :alt: Coveralls report\n\n\nContributions and feedback are greatly appreciated. \n\n\nCredits\n=======\n\nCopyright 2013-2016 by Christopher Foo and others. License GPL v3.\n\nThis project contains third-party source code licensed under different terms:\n\n* wpull.backport.logging\n* wpull.thirdparty.robotexclusionrulesparser\n* wpull.thirdparty.dammit\n\nWe would like to acknowledge the authors of GNU Wget as Wpull uses algorithms\nfrom Wget.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farchiveteam%2Fludios_wpull","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Farchiveteam%2Fludios_wpull","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farchiveteam%2Fludios_wpull/lists"}