Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/15921483570/wechat_spider
WeChat official account crawler (just set a proxy to crawl all historical articles with one click)
Last synced: 5 days ago
- Host: GitHub
- URL: https://github.com/15921483570/wechat_spider
- Owner: 15921483570
- Created: 2016-10-26T10:25:29.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2018-04-09T12:53:56.000Z (over 6 years ago)
- Last Synced: 2024-08-02T18:39:01.184Z (3 months ago)
- Language: Go
- Size: 0 Bytes
- Stars: 137
- Watchers: 7
- Forks: 242
- Open Issues: 2
Metadata Files:
- Readme: README.md
README
# wechat_spider
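WeChat official account crawler (just set a proxy to crawl all historical articles of a specified official account with one click)

- A simple demo: [simple_server.go][1]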
```
package main

import (
	"log"
	"net/http"

	"github.com/elazarl/goproxy"
	"github.com/sundy-li/wechat_spider"
)

func main() {
	var port = "8899"
	proxy := goproxy.NewProxyHttpServer()
	// Uncomment the next line to see detailed logs.
	// wechat.Verbose = true
	proxy.OnResponse().DoFunc(
		wechat_spider.ProxyHandle(wechat_spider.NewBaseProcessor()),
	)
	log.Println("server will listen on port: " + port)
	log.Fatal(http.ListenAndServe(":"+port, proxy))
}
```

- Usage:
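Once the server is running, set your phone's proxy to this machine's IP address on port 8899, open the WeChat client, and tap the "view historical articles" button of any official account. All historical articles of that account are then crawled (automatic pagination is already supported).

- Custom output: implement the `Output` method of the `Processor` interface; see [custom_output_server.go][2].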
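
The custom output idea above can be sketched in isolation. This is only an illustration of the embed-and-override pattern: `Article`, `Processor`, `BaseProcessor`, and `CollectingProcessor` below are hypothetical stand-ins defined in this snippet, not the types exported by `wechat_spider`, whose real `Output` signature may differ. See `custom_output_server.go` for the actual interface.

```go
package main

import "fmt"

// Article is a hypothetical stand-in for one crawled article.
type Article struct {
	Title string
	URL   string
}

// Processor is a simplified, hypothetical interface; the real one in
// wechat_spider may have more methods and different signatures.
type Processor interface {
	Output(articles []Article)
}

// BaseProcessor mimics a default implementation that prints to stdout.
type BaseProcessor struct{}

func (BaseProcessor) Output(articles []Article) {
	for _, a := range articles {
		fmt.Println(a.Title, a.URL)
	}
}

// CollectingProcessor embeds BaseProcessor and replaces Output so that
// results are kept in memory as tab-separated lines instead of printed.
type CollectingProcessor struct {
	BaseProcessor
	Lines []string
}

func (p *CollectingProcessor) Output(articles []Article) {
	for _, a := range articles {
		p.Lines = append(p.Lines, fmt.Sprintf("%s\t%s", a.Title, a.URL))
	}
}

func main() {
	c := &CollectingProcessor{}
	var p Processor = c // the caller only sees the interface
	p.Output([]Article{{Title: "Hello", URL: "https://example.com/1"}})
	for _, line := range c.Lines {
		fmt.Println(line)
	}
}
```

The same shape would apply to writing to a file or a database: only the body of the overriding `Output` changes.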
[1]: https://github.com/sundy-li/wechat_spider/blob/master/examples/simple_server.go
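[2]: https://github.com/sundy-li/wechat_spider/blob/master/examples/custom_output_server.go

- WeChat blocks clients that send requests too frequently, so the pagination requests for historical articles call the `Sleep()` method. By default each request sleeps 50ms; you can define a custom Processor that overrides this method to suit your situation.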