{"id":26121701,"url":"https://github.com/threezh1/sitecopy","last_synced_at":"2025-04-07T18:11:28.518Z","repository":{"id":40613951,"uuid":"227624588","full_name":"Threezh1/SiteCopy","owner":"Threezh1","description":"sitecopy is a tool that facilitates personal website backup and network data collection","archived":false,"fork":false,"pushed_at":"2024-01-21T11:48:45.000Z","size":14,"stargazers_count":599,"open_issues_count":7,"forks_count":178,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-03-31T17:16:20.335Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Threezh1.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-12-12T14:23:33.000Z","updated_at":"2025-03-31T02:57:23.000Z","dependencies_parsed_at":"2025-03-17T15:21:46.465Z","dependency_job_id":null,"html_url":"https://github.com/Threezh1/SiteCopy","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Threezh1%2FSiteCopy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Threezh1%2FSiteCopy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Threezh1%2FSiteCopy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Threezh1%2FSiteCopy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Threezh1","download_url":"https://codeload.github.com/Threezh1/SiteCo
py/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247704571,"owners_count":20982298,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-03-10T14:35:35.726Z","updated_at":"2025-04-07T18:11:28.492Z","avatar_url":"https://github.com/Threezh1.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# SiteCopy\n\nsitecopy is a tool that facilitates personal website backup and network data collection\n\n## Introduction\n\nWebsite copying, also called website backup, means using a tool to save the full content of a web page. This goes beyond saving a single HTML page: the CSS, JS, and static files referenced in the page source are saved as well, so the whole site can be browsed locally. Similar tools exist online, but they are not ideal to use, so I wrote a Python script to make personal website backups, and the collection of online material, more convenient.\n\n- Tool name: SiteCopy\n- Author: Threezh1\n- Blog: http://www.threezh1.com/\n\nDevelopment notes for SiteCopy: [How to elegantly copy every page of a website](https://xz.aliyun.com/t/6941)\n\nCopying any website on the internet requires authorization first. Users bear the consequences of any action that harms network security; the author is not responsible.\n\n## Usage\n\nPython version: 3.7\n\nInstall dependencies: `pip3 install -r requirements.txt`\n\n- Copy a single page\n\n`python sitecopy.py -u \"http://www.threezh1.com\"`\n\n- Copy an entire site\n\n`python sitecopy.py -u \"http://www.threezh1.com\" -e`\n\n- Copy multiple pages\n\n`python sitecopy.py -s \"site.txt\"`\n\n- Copy multiple sites\n\n`python sitecopy.py -s \"site.txt\" -e`\n\n\nNumber of link-crawling iterations: -d (default 200)\n\nNumber of threads: -t (default 30)\n\nExample: crawl every page of www.threezh1.com with 200 link-crawling iterations and 30 threads\n\n`python sitecopy.py -u \"http://www.threezh1.com\" -e -d 200 -t 30`\n\n## Site copy test\n\n- Copying my own blog: https://threezh1.com 
Time taken: 2 minutes 48 seconds\n\nRun screenshot:\n\n![pic_11.jpg](https://s2.ax1x.com/2019/12/12/QcnOp9.jpg)\n\nDirectory screenshot:\n\n![pic_07.jpg](https://i.loli.net/2019/12/12/MRmv4licZCb5OzD.jpg)\n\nPage screenshot:\n\n![pic_06.jpg](https://i.loli.net/2019/12/12/4ydL371zCEiVJnZ.jpg)\n\n\n## Known issues\n\n1. In some cases path replacement is applied more than once, which prevents the page from displaying correctly\n2. Sites or image hosts with anti-crawling measures cannot be saved properly\n3. Network problems can prevent the script from running properly\n\nI would very much like to discuss solutions to these problems with others. My email: makefoxm@qq.com\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthreezh1%2Fsitecopy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fthreezh1%2Fsitecopy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthreezh1%2Fsitecopy/lists"}