{"id":20660695,"url":"https://github.com/zenoyang/webcrawler","last_synced_at":"2025-08-02T09:15:52.222Z","repository":{"id":118164660,"uuid":"88851861","full_name":"zenoyang/WebCrawler","owner":"zenoyang","description":"A collection of web crawler code","archived":false,"fork":false,"pushed_at":"2018-05-04T03:29:42.000Z","size":26913,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-05-31T11:35:11.334Z","etag":null,"topics":["crawler","scrapy","spider","web-crawler"],"latest_commit_sha":null,"homepage":null,"language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zenoyang.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-04-20T10:12:06.000Z","updated_at":"2020-10-22T11:27:24.000Z","dependencies_parsed_at":null,"dependency_job_id":"d48ef716-1565-427e-986f-a4b442ad9355","html_url":"https://github.com/zenoyang/WebCrawler","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/zenoyang/WebCrawler","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zenoyang%2FWebCrawler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zenoyang%2FWebCrawler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zenoyang%2FWebCrawler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zenoyang%2FWebCrawler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zenoyang","download_url":"https://codeload.github.com/zenoyang/WebCrawler/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zenoyang%2FWebCrawler/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":268362046,"owners_count":24238533,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-02T02:00:12.353Z","response_time":74,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","scrapy","spider","web-crawler"],"created_at":"2024-11-16T19:05:49.675Z","updated_at":"2025-08-02T09:15:51.948Z","avatar_url":"https://github.com/zenoyang.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"# WebCrawler\n\n## 100offer_crawler\nCollects job postings from 100offer\n\n## caike_crawler\nCollects career information from Caike (才客网)\n\n## Ganji_JN.py\nCrawls rental listings for Jinan from Ganji (赶集网). URL: http://jn.ganji.com/fang1/\n\n## Scrapy/xici\nUses Scrapy to crawl proxy IPs from Xici (西刺) and store them in MongoDB; the IPs have not been validated yet. http://www.xicidaili.com/nn/\n\n\n## Scrapy/zhihu\nUses Scrapy to crawl all user profiles on Zhihu (知乎) and store them in MongoDB. The crawler's IP was banned; this is still unresolved.\n\n\n## Scrapy/doubanBook\nUses Scrapy to crawl book information from Douban Books (豆瓣图书) and save it as CSV. https://book.douban.com/tag/%E5%8E%86%E5%8F%B2\n\n\n## huaban\nCrawls images from Huaban (花瓣网), which loads its content asynchronously. http://huaban.com/\n\n\n## shixiseng\nCrawls Python internship listings from Shixiseng (实习僧) and saves them as XLS. http://www.shixiseng.com/\n\n\n## ss\nUses a crawler to get around internet censorship (科学上网). http://free.ishadow.online/  http://h6v6.com/\n\n\n## 读写文档 (reading and writing documents)\nReading and writing CSV, DOC, PDF, and TXT files\n\n\n## send_qq_email\nSends email from a QQ Mail account with Python\n\n\n## toutiao\nAnalyzes Ajax requests to crawl street-style photos (街拍) from Toutiao (今日头条). http://www.toutiao.com/\n\n\n## jupyter\nInstalling and launching Jupyter\n\n\n## craw_bin_tdp\nCrawls all TDPs (team description papers) and executables from recent RoboCup 2D World Cups. http://chaosscripting.net/files/competitions/RoboCup/WorldCup/\n\n\n## meizitu\nCrawls all images from Meizitu (妹子图). http://www.mzitu.com/\n\n\n## baike_spider\nCrawls 1,000 entries from Baidu Baike (百度百科). http://baike.baidu.com/view/21087.htm\n\n\n## login_weibo_cn\nLogs in to the mobile version of Sina Weibo (新浪微博). https://weibo.cn/login/\n\n\n## 静谧\nUsing cookies, basic usage of the urllib library, and handling URLError exceptions\nCrawls threads from Baidu Tieba (百度贴吧) and jokes from Qiushibaike (糗事百科)\n\n\n## 爬虫隐藏 (hiding the crawler)\nA few simple ways to make requests look like they come from a real browser\n\n\n## 翻译脚本 (translation script)\nA translation script built on Youdao Translate (有道). http://fanyi.youdao.com/\n\n\n## 使用proxy (using proxies)\nUsing and verifying proxies\nhttp://www.whatismyip.com.tw\nhttp://www.ip138.com\nhttp://www.ip.cn/\n\n\n## 数据库存储 (database storage)\nConnecting to SQL Server and MySQL\n\n\n## 图片的存储 (image storage)\nDownloading images\n\n\n## 网页下载器 (web page downloader)\nUsing urllib\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzenoyang%2Fwebcrawler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzenoyang%2Fwebcrawler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzenoyang%2Fwebcrawler/lists"}