{"id":13602417,"url":"https://github.com/lijinma/wechat_spider","last_synced_at":"2025-04-12T21:33:52.673Z","repository":{"id":43189717,"uuid":"87156368","full_name":"lijinma/wechat_spider","owner":"lijinma","description":"Scrapes WeChat official account articles through a proxy, including read counts and like counts; built on anyproxy.","archived":false,"fork":false,"pushed_at":"2020-09-04T02:44:14.000Z","size":44,"stargazers_count":949,"open_issues_count":27,"forks_count":226,"subscribers_count":55,"default_branch":"master","last_synced_at":"2025-04-04T01:09:08.272Z","etag":null,"topics":["anyproxy","wechat","wechat-spider"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lijinma.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-04-04T06:49:24.000Z","updated_at":"2025-03-07T14:57:26.000Z","dependencies_parsed_at":"2022-08-31T21:03:08.789Z","dependency_job_id":null,"html_url":"https://github.com/lijinma/wechat_spider","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lijinma%2Fwechat_spider","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lijinma%2Fwechat_spider/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lijinma%2Fwechat_spider/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lijinma%2Fwechat_spider/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lijinma","download_url":"https://codeload.github.com/lijinma/wechat_spider/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://
github.com","kind":"github","repositories_count":248636502,"owners_count":21137470,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anyproxy","wechat","wechat-spider"],"created_at":"2024-08-01T18:01:22.639Z","updated_at":"2025-04-12T21:33:52.644Z","avatar_url":"https://github.com/lijinma.png","language":"JavaScript","funding_links":[],"categories":["JavaScript"],"sub_categories":[],"readme":"wechat_spider\n=====\n\n[![NPM version](https://badge.fury.io/js/wechat_spider.png)](http://badge.fury.io/js/wechat_spider)\n[![David Status](https://david-dm.org/lijinma/wechat_spider.png)](https://david-dm.org/lijinma/wechat_spider)\n\n## [Notice] This scraper no longer works because the WeChat API has changed; please treat the code as a reference for the approach.\n\n\u003cp align=\"center\"\u003e\n  \u003cbr\u003e\n  \u003cb\u003eNever stop creating, never stop delivering\u003c/b\u003e\n  \u003cbr\u003e\n  \u003ca href=\"https://www.yousails.com\"\u003e\n    \u003cimg src=\"https://yousails.com/banners/brand.png\" width=350\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\nThis project scrapes WeChat official account articles through a proxy. First, you should understand the two mainstream methods currently used to scrape WeChat official accounts; see my article:\n\n[How to elegantly scrape the article history of a WeChat official account](https://mp.weixin.qq.com/s?__biz=MjM5NDA0Mjc0MQ==\u0026mid=2651552202\u0026idx=1\u0026sn=832cd8e9c4f5babcd20e6a52ee03611e\u0026chksm=bd721fd08a0596c6005f9c77f1c7b1f06fef2cceebd67dde0f33c822f8053d7c521a753c0101\u0026scene=0\u0026key=31688975937a18944006a2d2a5b0c346a7e091a3f473e69af65ebce0e0722a9bdac3cc4281c2eb40f110c3b87a727d8f42b8265a7c1cb20744f74eadf0178023744783aab775c2d47ac7a30b16c65548\u0026ascene=0\u0026uin=OTM1MDQxMDQw)\n\nIn short, there are two common approaches today: one goes through Sogou WeChat search, the other uses a proxy. This project uses the proxy approach.\n\nI originally wrote a more complex tool that combined Node.js's anyproxy with PHP's Laravel 
framework to do all this. One day in the shower it finally dawned on me that I had overcomplicated it: the tool is inherently simple. I walked a friend who works in media through it, and he was up and running quickly.\n\n## Output\n\nThe tool produces two files: wechat.sqlite and wechat.csv. wechat.csv has to be generated with the command `wechat_spider csv`.\n\nBelow is the data for my own official account:\n\n![file](https://raw.githubusercontent.com/lijinma/MyBox/master/spider/spider_1.png)\n\nColumn headers explained:\n\n```\naccountName: official account name\nauthor: author\ntitle: article title\ncontentUrl: article link\ncover: article cover image\ndigest: article summary\nidx: 1 means the first article published that day, 2 means the second, and so on.\nsourceUrl: link behind the 阅读原文 (\"Read more\") button\ncreateTime: article creation time\nreadNum: read count\nlikeNum: like count\nrewardNum: reward (tip) count\nelectedCommentNum: number of comments selected for display\n```\n\n## Installation\n\n### Install Node.js\n\nDownload the latest version from https://nodejs.org/zh-cn/ .\n\n### Install Python 2.x and other build dependencies\n\nThe tool depends on sqlite, which is compiled through [node-gyp](https://github.com/nodejs/node-gyp). The build requires Python 2.x (3.x does not work) and VCBuild.exe, so Windows users must install them or the installation will fail.\n\nWindows users can install the build dependencies by running `npm install --global --production windows-build-tools` in a PowerShell session with administrator privileges.\n\n### Verify that Node and Python are installed correctly\n\nOn Mac use Terminal; on Windows use cmd:\n\n```bash\n$ npm -v\n4.3.0\n\n$ python\nPython 2.7.6 (default, Nov 18 2013, 15:12:51)\n[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)] on darwin\nType \"help\", \"copyright\", \"credits\" or \"license\" for more information.\n\u003e\u003e\u003e\n```\n\nIf you see output similar to the above, the tools are installed.\n\n### Install wechat_spider\n\n```bash\n$ npm install wechat_spider -g\n```\n\n### Verify that wechat_spider is installed correctly\n\n```bash\n$ wechat_spider --help\n\n  Usage: wechat_spider [options]\n\n  Options:\n\n    -h, --help     output usage information\n    -V, --version  output the version number\n```\n\nIf you see output similar to the above, wechat_spider is installed.\n\n## Usage\n\nUsage takes four steps: start the proxy, point your phone at the proxy, open an official account's article history (scraping then starts automatically), and finally generate the csv.\n\n### Install the certificate on first run\n\nStep 1: Start the tool, from Terminal on Mac or cmd on Windows:\n\n$ wechat_spider\n\nOn first run you need to trust the certificate.\n\nBy default the certificate folder opens automatically. If it does not, open http://localhost:8002/fetchCrtFile in a browser to download rootCA.crt. Once you have the root certificate, double-click it and follow the operating system prompts to trust rootCA:\n\n* Windows\n  * ![https://t.alipayobjects.com/tfscom/T1D3hfXeFtXXXXXXXX.jpg_700x.jpg](https://t.alipayobjects.com/tfscom/T1D3hfXeFtXXXXXXXX.jpg_700x.jpg)\n* Mac\n  * 
![https://t.alipayobjects.com/tfscom/T1NwFfXn0oXXXXXXXX.jpg_400x.jpg](https://t.alipayobjects.com/tfscom/T1NwFfXn0oXXXXXXXX.jpg_400x.jpg)\n\nStep 2: Set up the proxy on your phone:\n\n* On first use the phone also needs the certificate. Open http://localhost:8002/qr_root in a browser, scan the QR code with WeChat, then [important] open it in a browser:\n\n  * \u003cimg src=\"https://raw.githubusercontent.com/lijinma/MyBox/master/spider/spider_2.jpeg\" width=\"300px\"\u003e\n  * \u003cimg src=\"https://raw.githubusercontent.com/lijinma/MyBox/master/spider/spider_3.jpeg\" width=\"300px\"\u003e\n  * \u003cimg src=\"https://raw.githubusercontent.com/lijinma/MyBox/master/spider/spider_4.jpeg\" width=\"300px\"\u003e\n  * \u003cimg src=\"https://raw.githubusercontent.com/lijinma/MyBox/master/spider/spider_5.jpeg\" width=\"300px\"\u003e\n  * \u003cimg src=\"https://raw.githubusercontent.com/lijinma/MyBox/master/spider/spider_6.jpeg\" width=\"300px\"\u003e\n\n* Next, find your computer's IP address, say 192.168.1.5\n* Set the phone's proxy to your computer:\n\n  * \u003cimg src=\"https://raw.githubusercontent.com/lijinma/MyBox/master/spider/spider_7.jpeg\" width=\"300px\"\u003e\n\n  * \u003cimg src=\"https://raw.githubusercontent.com/lijinma/MyBox/master/spider/spider_8.jpeg\" width=\"300px\"\u003e\n  \nStep 3: Pick a WeChat official account and tap to view its article history\n\n* \u003cimg src=\"https://raw.githubusercontent.com/lijinma/MyBox/master/spider/spider_9.jpeg\" width=\"300px\"\u003e\n* \u003cimg src=\"https://raw.githubusercontent.com/lijinma/MyBox/master/spider/spider_10.jpeg\" width=\"300px\"\u003e\n\nStep 4: Wait for the page saying that collection for one official account is complete (“一个公众号采集完成”); you can then generate the csv\n\n```bash\n$ wechat_spider csv\n```\n\n## Donate\nI'm Jinma, a programmer who likes to make things happen. If this little tool helps you, feel free to buy me a coffee. Thanks :)\n\n![](http://xiaolai.co/img/alipay.jpeg)\n![](http://xiaolai.co/img/pay.png)\n\n\n## LICENSE\n\nMIT.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flijinma%2Fwechat_spider","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flijinma%2Fwechat_spider","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flijinma%2Fwechat_spider/lists"}