{"id":13414824,"url":"https://github.com/DropsDevopsOrg/ECommerceCrawlers","last_synced_at":"2025-03-14T22:32:13.312Z","repository":{"id":37405991,"uuid":"178336701","full_name":"DropsDevopsOrg/ECommerceCrawlers","owner":"DropsDevopsOrg","description":"Hands-on 🐍 crawlers for a variety of websites and e-commerce data 🕷. Includes 🕸: Taobao products, WeChat official accounts, Dianping, Qichacha, job sites, Xianyu, Ali tasks, cnblogs, Weibo, Baidu Tieba, Douban Movie, Baotu, Quanjing, Douban Music, a provincial drug administration, Sohu News, machine-learning text collection, fofa asset collection, Autohome, National Bureau of Statistics, Baidu keyword index counts, spider pan-directory, Toutiao, Douban film reviews, Ctrip, Xiaomi App Store, Anjuke, Tujia homestays ❤️❤️❤️. WeChat crawler showcase project:","archived":false,"fork":false,"pushed_at":"2024-05-22T15:19:02.000Z","size":7943,"stargazers_count":4753,"open_issues_count":47,"forks_count":1343,"subscribers_count":144,"default_branch":"master","last_synced_at":"2024-11-13T04:00:19.776Z","etag":null,"topics":["alitask","baidu","baidu-tieba","baotu","boss","crawler","ctrip","dazhong-spider","douban-movie","douban-music","fofa","lagou","python3","quanjing","scrapy","sohu","taobao-spider","wechat","xianyu","zhilianzhaopin"],"latest_commit_sha":null,"homepage":"http://wechat.doonsec.com/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DropsDevopsOrg.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":null,"patreon":null,"open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"custom":null}},"created_at":"2019-03-29T05:12:50.000Z","updated_at":"2024-11-12T13:06:38.000Z","dependencies_parsed_at":"2023-01-24T14:15:48.042Z","dependency_job_id":"cfee821c-e522-476e-94a2-e8d3ba10a897","html_url":"https://github.com/DropsDevopsOrg/ECommerce
Crawlers","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DropsDevopsOrg%2FECommerceCrawlers","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DropsDevopsOrg%2FECommerceCrawlers/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DropsDevopsOrg%2FECommerceCrawlers/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DropsDevopsOrg%2FECommerceCrawlers/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DropsDevopsOrg","download_url":"https://codeload.github.com/DropsDevopsOrg/ECommerceCrawlers/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243482876,"owners_count":20297900,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["alitask","baidu","baidu-tieba","baotu","boss","crawler","ctrip","dazhong-spider","douban-movie","douban-music","fofa","lagou","python3","quanjing","scrapy","sohu","taobao-spider","wechat","xianyu","zhilianzhaopin"],"created_at":"2024-07-30T21:00:37.363Z","updated_at":"2025-03-14T22:32:13.306Z","avatar_url":"https://github.com/DropsDevopsOrg.png","language":"Python","funding_links":[],"categories":["网络服务","Crawler","Python"],"sub_categories":["网络爬虫"],"readme":"[![](https://img.shields.io/badge/language-Python35-green.svg)]() [![](https://img.shields.io/badge/Branch-master-green.svg?longCache=true)]() 
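[![](https://img.shields.io/github/followers/DropsDevopsOrg.svg?label=Follow)]() ![GitHub contributors](https://img.shields.io/github/contributors/DropsDevopsOrg/ECommerceCrawlers.svg) [![](https://img.shields.io/github/forks/DropsDevopsOrg/ECommerceCrawlers.svg?label=Fork\u0026style=social)]() [![](https://img.shields.io/github/stars/DropsDevopsOrg/ECommerceCrawlers.svg?style=social)]() [![](https://img.shields.io/github/watchers/DropsDevopsOrg/ECommerceCrawlers.svg?label=Watch\u0026style=social)]()\n\n## ECommerceCrawlers\n\nA collection of 🐍 crawlers for e-commerce product data, gathered as crawling practice. Every project was written by a team member. The hands-on projects are meant to work through the problems that commonly come up when writing crawlers.\n\nRead each project's readme for an analysis of how the crawl works.\n\nFor pyers who already know crawling well, this is a good set of examples that saves you from reinventing the wheel. The projects are updated and maintained frequently so that they work right after download and cut down crawling time.\n\nFor beginners, the ✍️ hands-on projects show how a crawler is built from scratch. To build up crawling knowledge, see the [project wiki](https://github.com/DropsDevopsOrg/ECommerceCrawlers/wiki/%E7%88%AC%E8%99%AB%E5%88%B0%E5%BA%95%E8%BF%9D%E6%B3%95%E5%90%97%3F). Crawling can be very complex and technically demanding, but with the right approach, being able to crawl data from mainstream websites within a short time is quite achievable. It is best to have a concrete goal from the start.\n\nDriven by a goal, your learning becomes more precise and efficient. All the prerequisites you think you need can be learned along the way to that goal 😁😁😁.\n\nFor more advanced crawling techniques, we recommend Wang Ping's [Yuanrenxue (猿人学) advanced crawler reverse-engineering course](https://j.youzan.com/zF-n-2); mention a referral from AJay13 to get the internal discount price.\n\nCorrections to this project's shortcomings are welcome via ⭕️Issues or 🔔PR.\n\n\u003e Large files uploaded earlier run through 3/4 of the commits, and each clone had grown to 100M, which goes against our original intent. We cannot efficiently delete every such file (too lazy), so we will re-initialize the repository's commits. From now on we will not upload crawled data, and we will tidy up the repository structure.\n\n## About\n\n- Gitee repository: [AJay13/ECommerceCrawlers](https://gitee.com/AJay13/ECommerceCrawlers)\n- GitHub repository: [DropsDevopsOrg/ECommerceCrawlers](https://github.com/DropsDevopsOrg/ECommerceCrawlers)\n- Project showcase platform: [http://wechat.doonsec.com](http://wechat.doonsec.com)\n\n## Income\n\nNearly 80% of the projects are crawlers written for clients; each was added to the repository only after the client agreed it could be open-sourced.\n\n\n\n## CrawlerDemo\n\n- [x] [DianpingCrawler](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/DianpingCrawler): Dianping crawling\n- [x] [East_money](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/East_money): crawling East Money with scrapy\n- [x] [📛TaobaoCrawler(new)](\u003chttps://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/TaobaoCrawler(new)\u003e): information crawling for Alibaba's own platforms (Taobao, Tmall, Xianyu, Cainiao Guoguo, Fliggy, etc.), cookie-free, 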
theoretically not blocked by anti-crawler mechanisms (only the Taobao crawler is provided; the approach and the encryption scheme are the same for the others)\n- [x] [📛SIPO Patent Examination](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/SIPO专利审查): automated client for SIPO patent examination\n- [x] [📛QiChaCha](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/QiChaCha): Qichacha data on industrial parks and companies nationwide\n- [x] [TaobaoCrawler](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/TaobaoCrawler): Taobao product crawling\n- [x] [📛ZhaopinCrawler](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/ZhaopinCrawler): crawling of the major job sites\n- [x] [ShicimingjuCrawleAndDisplay](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/ShicimingjuCrawleAndDisplay): crawling and display of the Shicimingju poetry site\n- [x] [XianyuCrawler](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/XianyuCrawler): Xianyu product crawling\n- [x] [SohuNewCrawler](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/SohuNewCrawler): news site crawling\n- [x] [WechatCrawler](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/WechatCrawler): WeChat official account crawling\n- [x] [cnblog](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/cnblog): crawling cnblogs with scrapy\n- [x] [WeiboCrawler](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/WeiboCrawler): cookie-free Weibo data crawling\n- [x] [OtherCrawlers](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/OthertCrawler): some fun crawler examples\n  - [x] [0x01 Baidu Tieba](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/OthertCrawler#0x01baidutieba)\n  - [x] [0x02 Douban Movie](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/OthertCrawler#0x02doubanmovie)\n  - [x] [0x03 Ali tasks](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/OthertCrawler#0x03alitask)\n  - [x] [0x04 Baotu videos](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/OthertCrawler#0x04baotu)\n  - [x] [0x05 Quanjing images](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/OthertCrawler#0x05quanjing)\n  - [x] [0x06 Douban Music](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/OthertCrawler#0x06douban_music)\n  - [x] [0x07 
a provincial drug administration](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/OthertCrawler#0x07gdfda_pharmacy)\n  - [x] [0x08 fofa](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/OthertCrawler#0x08fofa)\n  - [ ] [0x09 Autohome](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/OthertCrawler#0x09autohome)\n  - [ ] [0x010 National Bureau of Statistics]()\n  - [x] [0x10 baidu](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/OthertCrawler/0x10baidu)\n  - [x] [0x11 spider pan-directory](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/OthertCrawler/0x11zzc)\n  - [x] [0x12 Toutiao](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/OthertCrawler/0x12toutiao)\n  - [x] [0x13 Douban film review analysis](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/OthertCrawler/0x13douban_yingping)\n  - [x] [0x14 Ctrip review crawling](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/OthertCrawler/0x14ctrip_crawler)\n  - [x] [0x15 Xiaomi App Store crawling](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/OthertCrawler/0x15xiaomiappshop)\n  - [x] [0x16 Coolapk app info collection](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/OthertCrawler/0x16kuanappshop)\n  - [ ] [0x17 Zhihu info collection](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/OthertCrawler/0x17zhihu)\n  - [x] [0x18 Bing image collection](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/OthertCrawler/0x18bing_img)\n  - [x] [0x19 Anjuke info collection](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/OthertCrawler/0x19anjuke)\n  - [x] [0x20 Tujia homestay info collection](https://github.com/DropsDevopsOrg/ECommerceCrawlers/tree/master/OthertCrawler/0x20tujiaminsu)\n\n## Contribution👏\n\n| \u003ca  href=\"https://gitee.com/joseph31\"\u003e\u003cimg class=\"avatar\" src=\"https://avatars3.githubusercontent.com/u/47005658?s=460\u0026v=4\" width=\"48\" height=\"48\" alt=\"@joseph31\"\u003e\u003c/a\u003e | \u003ca  href=\"https://github.com/Joynice\"\u003e\u003cimg class=\"avatar\" 
src=\"https://avatars0.githubusercontent.com/u/22851022?s=96\u0026amp;v=4\" width=\"48\" height=\"48\" alt=\"@Joynice\"\u003e\u003c/a\u003e | \u003ca href=\"https://github.com/liangweiyang\"\u003e\u003cimg class=\"avatar\" src=\"https://avatars0.githubusercontent.com/u/37971213?s=96\u0026amp;v=4\" width=\"48\" height=\"48\" alt=\"@liangweiyang\"\u003e\u003c/a\u003e | \u003ca href=\"https://github.com/Hatcat123\"\u003e\u003cimg class=\"avatar\" src=\"https://avatars0.githubusercontent.com/u/28727970?s=96\u0026amp;v=4\" width=\"48\" height=\"48\" alt=\"@Hatcat123\"\u003e\u003c/a\u003e | \u003ca href=\"https://github.com/jihu9\"\u003e\u003cimg class=\"avatar\" src=\"https://avatars0.githubusercontent.com/u/17663102?s=96\u0026amp;v=4\" width=\"48\" height=\"48\" alt=\"@jihu9\"\u003e\u003c/a\u003e | \u003ca href=\"https://github.com/ctycode\"\u003e\u003cimg class=\"avatar\" src=\"https://avatars3.githubusercontent.com/u/56985178?s=96\u0026amp;v=4\" width=\"48\" height=\"48\" alt=\"@ctycode\"\u003e\u003c/a\u003e |\u003ca href=\"https://github.com/sparkyuyuanyuan\"\u003e\u003cimg class=\"avatar\" src=\"https://avatars3.githubusercontent.com/u/50583631?s=96\u0026amp;v=4\" width=\"48\" height=\"48\" alt=\"@sparkyuyuanyuan\"\u003e\u003c/a\u003e |\n| :---------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | 
:----------------------------------------------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------: |:------------------------------:|\n| [joseph31](https://gitee.com/joseph31) | [Joynice](https://github.com/Joynice) | [liangweiyang](https://github.com/liangweiyang) | [Hatcat123](https://github.com/Hatcat123) | [jihu9](https://github.com/jihu9) | [ctycode](https://github.com/ctycode) | [sparkyuyuanyuan](https://github.com/sparkyuyuanyuan) |\n\n\n\u003e wait for you\n\n## What Will You Learn?\n\nUseful techniques used in this project:\n\n- Data analysis\n  - [x] Chrome DevTools\n  - [x] Fiddler\n  - [x] Firefox\n  - [ ] Appium\n  - [x] anyproxy\n  - [x] mitmproxy\n- Data collection\n  - [x] [urllib]()\n  - [x] [requests](https://2.python-requests.org//zh_CN/latest/user/quickstart.html)\n  - [x] scrapy\n  - [x] selenium\n  - [ ] pyppeteer\n- Data parsing\n  - [x] re\n  - [x] beautifulsoup\n  - [x] xpath\n  - [x] pyquery\n  - [x] css\n- Data storage\n  - [x] txt text files\n  - [x] csv\n  - [x] excel\n  - [x] mysql\n  - [x] redis\n  - [x] mongodb\n- Anti-crawling handling\n  - [x] bypassing Taobao detection with mitmproxy\n  - [x] JS data decryption\n  - [x] building the matching fingerprint library from JS data\n  - [x] text obfuscation\n  - [ ] interleaved dirty data\n- Efficient crawling\n  - [x] single-threaded\n  - [x] multithreaded\n  - [x] multiprocess\n  - [x] async coroutines\n  - [x] producer-consumer multithreading\n  - [x] distributed crawler system\n\n\u003e 
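_Links point to official documentation or recommended examples_\n\n## What's a Spider 🕷?\n\n**[ECommerceCrawlers wiki](https://github.com/DropsDevopsOrg/ECommerceCrawlers/wiki)**\n\n### 🙋0x01 Introduction to crawlers\n\n**Crawlers**\n\nA crawler is a program or script that automatically fetches information from the World Wide Web according to a set of rules.\n\n**[Is crawling actually illegal?](https://github.com/DropsDevopsOrg/ECommerceCrawlers/wiki/%E7%88%AC%E8%99%AB%E5%88%B0%E5%BA%95%E8%BF%9D%E6%B3%95%E5%90%97%3F)**\n\n**What crawlers are used for**\n\n- Market analysis: e-commerce analysis, business district analysis, primary and secondary market analysis, etc.\n- Market monitoring: monitoring e-commerce, news, housing listings, etc.\n- Opportunity discovery: finding bidding and tender intelligence, uncovering customer data, discovering enterprise customers, etc.\n\n**Web page basics**\n\n- url\n- html\n- css\n- js\n\n**Robots protocol**\n\nNothing works without rules, and the Robots protocol is the rulebook for crawlers: it tells crawlers and search engines which pages may be fetched and which may not.\nIt is usually a text file named robots.txt placed in the site's root directory.\n\n### 🙋0x02 Crawling process\n\n**Fetching data**\n\n**Simulated data fetching**\n\n### 🙋0x03 Parsing data\n\n**re**\n\n**beautifulsoup**\n\n**xpath**\n\n**pyquery**\n\n**css**\n\n### 🙋0x04 Storing data\n\nSmall-scale data storage (text)\n\n- txt text files\n- csv\n- excel\n\nLarge-scale data storage (databases)\n\n- mysql\n- redis\n- mongodb\n\n### 🙋0x05 Anti-crawling measures\n\nAnti-crawling\n\nCountering anti-crawling\n\n### 🙋0x06 Efficient crawling\n\nMultithreading\n\nMultiprocessing\n\nAsync coroutines\n\nscrapy framework\n\n### 🙋0x07 Visualization\n\nflask Web\n\ndjango Web\n\ntkinter\n\necharts\n\nelectron\n\n## Padding\n\n…………\n\n## Awesome-Example😍:\n\n- [CriseLYJ/awesome-python-login-model](https://github.com/CriseLYJ/awesome-python-login-model)\n\n- [lb2281075105/Python-Spider](https://github.com/lb2281075105/Python-Spider)\n\n- [SpiderCrackDemo](https://github.com/wkunzhi/SpiderCrackDemo)\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FDropsDevopsOrg%2FECommerceCrawlers","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FDropsDevopsOrg%2FECommerceCrawlers","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FDropsDevopsOrg%2FECommerceCrawlers/lists"}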