{"id":13826043,"url":"https://github.com/rajatomar788/pywebcopy","last_synced_at":"2025-07-08T23:30:46.634Z","repository":{"id":32766795,"uuid":"142004087","full_name":"rajatomar788/pywebcopy","owner":"rajatomar788","description":"Locally saves webpages to your hard disk with images, css, js \u0026 links as is.","archived":false,"fork":false,"pushed_at":"2024-07-31T07:30:00.000Z","size":1811,"stargazers_count":547,"open_issues_count":27,"forks_count":108,"subscribers_count":8,"default_branch":"master","last_synced_at":"2024-11-07T16:50:52.602Z","etag":null,"topics":["archive-tool","crawler","html","html-parser","mirror","python","web","webpage"],"latest_commit_sha":null,"homepage":"https://rajatomar788.github.io/pywebcopy/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rajatomar788.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2018-07-23T11:18:54.000Z","updated_at":"2024-11-07T14:19:28.000Z","dependencies_parsed_at":"2023-12-12T17:54:44.226Z","dependency_job_id":"42307685-131b-40d2-bade-8b5968059825","html_url":"https://github.com/rajatomar788/pywebcopy","commit_stats":{"total_commits":125,"total_committers":11,"mean_commits":"11.363636363636363","dds":0.36,"last_synced_commit":"cf4bbaf8fc2a0632db3a5fbe67b18b28637b1153"},"previous_names":[],"tags_count":9,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rajatomar788%2Fpywebcopy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rajatomar788%2Fpywebcopy/tags","releases_url":"https:/
/repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rajatomar788%2Fpywebcopy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rajatomar788%2Fpywebcopy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rajatomar788","download_url":"https://codeload.github.com/rajatomar788/pywebcopy/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225470632,"owners_count":17479366,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["archive-tool","crawler","html","html-parser","mirror","python","web","webpage"],"created_at":"2024-08-04T09:01:31.304Z","updated_at":"2025-07-08T23:30:46.622Z","avatar_url":"https://github.com/rajatomar788.png","language":"Python","readme":"```\n    ____       _       __     __    ______                     _____   \n   / __ \\__  _| |     / /__  / /_  / ____/___  ____  __  __   /__  /   \n  / /_/ / / / / | /| / / _ \\/ __ \\/ /   / __ \\/ __ \\/ / / /     / /    \n / ____/ /_/ /| |/ |/ /  __/ /_/ / /___/ /_/ / /_/ / /_/ /     / /     \n/_/    \\__, / |__/|__/\\___/_.___/\\____/\\____/ .___/\\__, /     /_/      \n      /____/                               /_/    /____/               \n```\n\n`Created By : Raja Tomar`\n`License : Apache License 2.0`\n`Email: rajatomar788@gmail.com`\n[![Downloads](https://pepy.tech/badge/pywebcopy)](https://pepy.tech/project/pywebcopy)\n\n\nPyWebCopy is a free tool for copying full or partial websites locally\nonto your hard-disk for offline viewing.\n\nPyWebCopy will scan the specified website and download 
its content onto your hard-disk.\nLinks to resources such as style-sheets, images, and other pages in the website\nwill automatically be remapped to match the local path.\nUsing its extensive configuration, you can define which parts of a website will be copied and how.\n\n## What can PyWebCopy do?\n\nPyWebCopy will examine the HTML mark-up of a website and attempt to discover all linked resources\nsuch as other pages, images, videos, file downloads - anything and everything.\nIt will download all of these resources, and continue to search for more.\nIn this manner, PyWebCopy can \"crawl\" an entire website and download everything it sees\nin an effort to create a reasonable facsimile of the source website.\n\n## What can PyWebCopy not do?\n\nPyWebCopy does not include a virtual DOM or any form of JavaScript parsing.\nIf a website relies heavily on JavaScript to operate, PyWebCopy is unlikely\nto make a true copy, since it cannot discover links that the site\ngenerates dynamically with JavaScript.\n\nPyWebCopy does not download the raw source code of a website;\nit can only download what the HTTP server returns.\nWhile it will do its best to create an offline copy of a website,\nadvanced data-driven websites may not work as expected once they have been copied.\n\n## Installation\n\n`pywebcopy` is available on PyPI and is easily installed using `pip`:\n\n```shell\n\n$ pip install pywebcopy\n\n```\n\nYou are ready to go. 
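\n\nIf you later want to confirm the installed version from your shell, or upgrade to a newer release, the standard `pip` subcommands apply (this is generic pip usage, not anything pywebcopy-specific):\n\n```shell\n$ pip show pywebcopy\n$ pip install --upgrade pywebcopy\n```\n\n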
Read the tutorials below to get started.\n\n## First steps\n\nYou should always check that the latest pywebcopy is installed successfully.\n\n```pydocstring\n\u003e\u003e\u003e import pywebcopy\n\u003e\u003e\u003e pywebcopy.__version__\n7.x.x\n```\n\nYour version may be different; now you can continue with the tutorial.\n\n## Basic Usage\n\nTo save any single page, just type this in a Python console:\n\n```python\n\nfrom pywebcopy import save_webpage\nsave_webpage(\n      url=\"https://httpbin.org/\",\n      project_folder=\"E://savedpages//\",\n      project_name=\"my_site\",\n      bypass_robots=True,\n      debug=True,\n      open_in_browser=True,\n      delay=None,\n      threaded=False,\n)\n\n```\n\nTo save a full website (this could overload the target server, so be careful):\n\n```python\n\nfrom pywebcopy import save_website\n\nsave_website(\n      url=\"https://httpbin.org/\",\n      project_folder=\"E://savedpages//\",\n      project_name=\"my_site\",\n      bypass_robots=True,\n      debug=True,\n      open_in_browser=True,\n      delay=None,\n      threaded=False,\n)\n\n```\n\n### Running Tests\nRunning tests is simple and doesn't require any external library. 
\nJust run this command from the root directory of the pywebcopy package.\n\n\n```shell\n$ python -m pywebcopy -t\n```\n\n\n\n### Command Line Interface\n`pywebcopy` has a very easy-to-use command-line interface that\ncan help you get things done without having to worry about\nthe longer programmatic route.\n\n- #### Getting the list of commands\n    ```shell\n    $ python -m pywebcopy --help\n    ```\n- #### Using CLI\n  ```\n  Usage: pywebcopy [-p|--page|-s|--site|-t|--tests] [--url=URL [,--location=LOCATION [,--name=NAME [,--pop [,--bypass_robots [,--quite [,--delay=DELAY]]]]]]]\n  \n  Python library to clone/archive pages or sites from the Internet.\n  \n  Options:\n    --version             show program's version number and exit\n    -h, --help            show this help message and exit\n    --url=URL             url of the entry point to be retrieved.\n    --location=LOCATION   Location where files are to be stored.\n    -n NAME, --name=NAME  Project name of this run.\n    -d DELAY, --delay=DELAY\n                          Delay between consecutive requests to the server.\n    --bypass_robots       Bypass the robots.txt restrictions.\n    --threaded            Use threads for faster downloading.\n    -q, --quite           Suppress the logging from this library.\n    --pop                 open the html page in default browser window after\n                          finishing the task.\n  \n    CLI Actions List:\n      Primary actions available through cli.\n  \n      -p, --page          Quickly saves a single page.\n      -s, --site          Saves the complete site.\n      -t, --tests         Runs tests for this library.\n  \n  \n  ```\n- #### Running tests\n  ```shell\n    $ python -m pywebcopy run_tests\n  ```\n\n\n### Authentication and Cookies\nAuthentication is often needed to access a certain page.\nIt's really easy to authenticate with `pywebcopy` because it uses a \n`requests.Session` object for its base HTTP activity, which can be accessed \nthrough the `WebPage.session` 
attribute. There are\nplenty of tutorials on setting up authentication with `requests.Session`.\n\nHere is an example of filling in a form:\n\n```python\nfrom pywebcopy.configs import get_config\n\nconfig = get_config('http://httpbin.org/')\nwp = config.create_page()\nwp.get(config['project_url'])\nform = wp.get_forms()[0]\nform.inputs['email'].value = 'bar'     # replace with real credentials\nform.inputs['password'].value = 'baz'  # replace with real credentials\nwp.submit_form(form)\nwp.get_links()\n\n```\n\n\nYou can read more in the `docs` folder of the GitHub repository.","funding_links":[],"categories":["Python"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frajatomar788%2Fpywebcopy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frajatomar788%2Fpywebcopy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frajatomar788%2Fpywebcopy/lists"}