{"id":13843732,"url":"https://github.com/striver-ing/wechat-spider","last_synced_at":"2025-04-14T08:56:43.437Z","repository":{"id":38549982,"uuid":"111760843","full_name":"striver-ing/wechat-spider","owner":"striver-ing","description":"Open-source WeChat crawler: scrapes all articles, read counts, like counts, and comment content of official accounts. Easy to deploy. Actively maintained!!!","archived":false,"fork":false,"pushed_at":"2023-03-31T14:36:41.000Z","size":2693,"stargazers_count":2538,"open_issues_count":52,"forks_count":611,"subscribers_count":68,"default_branch":"master","last_synced_at":"2025-04-07T01:11:20.190Z","etag":null,"topics":["wechat"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/striver-ing.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2017-11-23T03:53:15.000Z","updated_at":"2025-04-06T16:00:37.000Z","dependencies_parsed_at":"2024-02-06T00:58:06.396Z","dependency_job_id":"37f7fbbb-1fbb-44b9-b1b4-e84d8f751ca1","html_url":"https://github.com/striver-ing/wechat-spider","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/striver-ing%2Fwechat-spider","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/striver-ing%2Fwechat-spider/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/striver-ing%2Fwechat-spider/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/striver-ing%2Fwechat-spider/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/striver-ing","download_url":"https://code
load.github.com/striver-ing/wechat-spider/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248852112,"owners_count":21171839,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["wechat"],"created_at":"2024-08-04T17:02:25.532Z","updated_at":"2025-04-14T08:56:43.395Z","avatar_url":"https://github.com/striver-ing.png","language":"Python","funding_links":[],"categories":["Python (1887)","Python"],"sub_categories":[],"readme":"# WeChat Spider\n\n\u003c!--\u003c!--**This crawler uses a man-in-the-middle approach, so its data is not very timely and the WeChat account in use may be banned; use it at your own discretion.\nTo monitor a large number of official accounts in a `long-term`, `stable`, `real-time` way, you can use the following API:**\n\n[http://182.92.108.94:2119/client/wechat_article/document](http://182.92.108.94:2119/client/wechat_article/document)\n--\u003e\n\nThe deployment guide follows.\n\nFor technical documentation, see: [https://t.zsxq.com/7ubmqNJ](https://t.zsxq.com/7ubmqNJ)\n\nFor a scraping approach based on reverse engineering, see: [https://wx.zsxq.com/dweb2/index/topic_detail/215584212588541](https://wx.zsxq.com/dweb2/index/topic_detail/215584212588541)\n\n## Features:\n\n- [x] Detect articles newly published by official accounts each day\n- [x] Scrape official account information\n- [x] Scrape article lists\n- [x] Scrape article details\n- [x] Scrape read counts, like counts, and comment counts\n- [x] Scrape comment content\n- [x] Convert temporary links to permanent links\n\nDownload the prebuilt executables here:\n\nLink: https://pan.baidu.com/s/1hyhj6YnV-L9w8LPx42FFzQ  Password: qnk6\n\n## Highlights:\n\n1. **No installation**: supports Mac and Windows; double-click the application to run it\n2. **Automated**: just configure the list of official accounts to monitor; once started, the software scrapes the accounts and their articles automatically every day\n3. **Easy to integrate**: scraped data is stored in MySQL, which makes it easy to process\n4. **No missed items**: task status flags ensure that no official account or article is skipped\n5. **Distributed**: multiple WeChat accounts can collect at the same time, and the WeChat client can run on Android, iPhone, Mac, or Windows\n\n## Sample data\n\n**1. Official account data**\n![-w829](media/15584541954959.jpg)\n\n**2. Article list data**\n![-w1369](media/15584542414888.jpg)\n\n**3. Article data**\n![-w1466](media/15584545518249.jpg)\n\n**4. 
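Read, like, and comment data**\n![-w623](media/15584546784023.jpg)\n\n**5. Comment data**\n![-w1033](media/15584547028361.jpg)\n\n## Requirements\n\n1. MySQL: stores the scraped data and the task tables\n2. Redis: caches tasks to reduce the number of MySQL operations\n\n## Installation and configuration\n\n\u003e Consult the installation notes below as needed; they are for reference only. Environments differ, so your installation steps may vary; online resources can help.\n\n### 1. Install MySQL\n#### 1.1 Windows\n#### 1.2 Mac\n### 2. Install Redis\n#### 2.1 Windows\n#### 2.2 Mac\n### 3. Install the certificate\n\nVisit mitm.it in a browser and download it, or search Baidu for how to install the mitmproxy certificate\n\n#### 3.1 iPhone\n1. After downloading and installing, do not forget the final step\n2. Open Settings - General - About - Certificate Trust Settings\n3. Enable the mitmproxy option.\n\n#### 3.2 Android\n1. Verify after installation\n2. Open Settings - Security - Trusted credentials\n3. Check that the installed certificate is present\n\n#### 3.3 Windows\n 2. Double-click to run\n 3. Install to Local Machine\n 4. Skip when prompted for a key\n 5. Select “Place all certificates in the following store”, then choose “Trusted Root Certification Authorities”\n 6. Finally, when the warning dialog appears, click “Yes”\n\n#### 3.4 Mac\n2. Double-click the downloaded file to install it\n3. Open Keychain Access.app\n4. Select login (Keychains) and Certificates (Category), then find mitmproxy\n5. Click mitmproxy and set Trust to Always Trust\n\n\n### 4. Configure the proxy\n\n\u003e If you use a phone, make sure the phone and the computer running wechat-spider are connected to the same router\n\n#### 4.1 iPhone\n\nOpen Settings - Wi-Fi - the connected network - Configure Proxy - Manual\nEnter the IP of the machine where wechat-spider is installed and port 8080\n\n#### 4.2 Android\n\nOpen Settings - WLAN - long-press the connected network - Modify network - Advanced options - Manual\nEnter the IP of the machine where wechat-spider is installed and port 8080\n\n#### 4.3 Windows\nOpen Chrome Settings -\u003e Advanced\n![A580D0082CCEE0621F98FAF003C5530E](media/A580D0082CCEE0621F98FAF003C5530E.png)\n![95AE10B3227FDE0637AB227A5A8267E3](media/95AE10B3227FDE0637AB227A5A8267E3.png)\n\n#### 4.4 Mac\n\nOpen System Preferences (System Preferences.app) - Network - Advanced - Proxies - Secure Web Proxy (HTTPS)\nEnter the IP of the machine where wechat-spider is installed and port 8080\n\n![-w668](media/15584581938431.jpg)\n![-w667](media/15584582326072.jpg)\n\n\n\n## Usage\n\n### 1. Install the certificate and configure the proxy as described above\n### 2. Configure config.yaml correctly\n\nThis mainly means setting the MySQL and Redis connection details; make sure both connections succeed\n\n### 3. Create the database wechat\n\n![-w418](media/15610827417503.jpg)\n\n\n### 4. Start wechat-spider\n\nIf auto_create_tables is set to true in the config, this step creates the MySQL tables automatically. Set it to true for the first start, then to false once the tables have been created\n    \n### 5. Add official account tasks\n\n![-w201](media/15584578582622.jpg)\nInsert rows into wechat_account_task, for example:\n![-w503](media/15584579051963.jpg)\nOnly __biz needs to be filled in, for example: MzIxNzg1ODQ0MQ==\n\n### 6. 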
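Tap any official account and view its message history\n\n![-w637](media/15584585019970.jpg)\n\nWhen the prompt shown in the red box above appears, everything is working; after a short while you can check the database to verify the data\n\nTechnical discussion\n----\nIf you have questions or suggestions, join the QQ group to discuss them. Please include the note `微信爬虫学习交流` (WeChat spider study and discussion)\n\n\u003cimg src='https://i.imgur.com/5FM26rc.png' align = 'center' width = \"250\" style = \"margin-top:20px\"\u003e\n\n\n## FAQ\n\n### 1. MySQL connection problems\n\nProblem: connecting raises the exception object supporting the buffer api required\n![](media/15610832298058.jpg)\nSolution: if the password consists only of digits, such as 123456, wrap it in double quotes in the config file, as follows:\n\n    mysqldb:\n      ip: localhost\n      port: 3306\n      db: wechat\n      user: root\n      passwd: \"123456\"\n      auto_create_tables: true # whether to create tables automatically; set to true while the tables do not exist and to false once they exist, which speeds up startup\n\n### 2. Certificate or security warnings after the proxy is configured correctly\n\nThe cause is that my certificate has expired; see https://www.cnblogs.com/yunlongaimeng/p/9617708.html for installation steps\n\n### 3. No-task message\n\nCheck whether any __biz has been added to the wechat_account_task table. Try adding several for testing\n\n### 4. Exception:DISCARD without MULTI\n\n![-w406](media/15632498867519.jpg)\n\n### 5. No traffic captured after a normal start\n\n1. Check whether the proxy is set\n2. Check whether the port is already in use\n\n## WeChat donations\n\nRunning an open-source project is not easy; maintaining the code and answering questions takes up most of my time. To keep the work going, and if **this project happens to be useful to you**, your support would be much appreciated (*￣︶￣).\n\nDeployment support and answers to questions are available (only for donors and developers who submit PRs).\n\nWeChat: boris_tm\n\n![Donation QR code](media/%E8%B5%9E%E8%B5%8F%E7%A0%81.png)\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstriver-ing%2Fwechat-spider","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstriver-ing%2Fwechat-spider","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstriver-ing%2Fwechat-spider/lists"}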