Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


https://github.com/15921483570/wechat_spider

WeChat official account crawler (just set a proxy, then crawl all historical articles with one click)

Last synced: 5 days ago


Host: GitHub
URL: https://github.com/15921483570/wechat_spider
Owner: 15921483570
Created: 2016-10-26T10:25:29.000Z (about 8 years ago)
Default Branch: master
Last Pushed: 2018-04-09T12:53:56.000Z (over 6 years ago)
Last Synced: 2024-08-02T18:39:01.184Z (3 months ago)
Language: Go
Size: 0 Bytes
Stars: 137
Watchers: 7
Forks: 242
Open Issues: 2
Metadata Files:
- Readme: README.md


README

# wechat_spider

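WeChat official account crawler (just set a proxy, then crawl the entire article history of a chosen official account with one click)

- A simple demo: [simple_server.go][1]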

```go
package main

import (
	"log"
	"net/http"

	"github.com/elazarl/goproxy"
	"github.com/sundy-li/wechat_spider"
)

func main() {
	var port = "8899"

	// Create an HTTP proxy server that the phone will route traffic through.
	proxy := goproxy.NewProxyHttpServer()

	// Uncomment to see detailed logs.
	// wechat_spider.Verbose = true

	// Hand every proxied response to the spider's default processor.
	proxy.OnResponse().DoFunc(
		wechat_spider.ProxyHandle(wechat_spider.NewBaseProcessor()),
	)

	log.Println("server will listen on port: " + port)
	log.Fatal(http.ListenAndServe(":"+port, proxy))
}
```

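- Usage:

After starting the server, set your phone's proxy to this machine's IP address on port 8899. Open the WeChat client and tap the "view historical articles" button of any official account; the crawler then fetches all of that account's historical articles (automatic pagination is already supported).

- To customize the output destination, implement the `Output` method of the `Processor` interface; see [custom_output_server.go][2].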

  [1]: https://github.com/sundy-li/wechat_spider/blob/master/examples/simple_server.go

  [2]: https://github.com/sundy-li/wechat_spider/blob/master/examples/custom_output_server.go

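- WeChat blocks overly frequent requests, so the pagination requests for historical articles call the `Sleep()` method, which by default sleeps 50ms per request. You can override this method in a custom `Processor` to suit your situation.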
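
The two customization points above (a custom `Output` and a longer `Sleep()`) can be sketched with Go struct embedding. This is a minimal, self-contained illustration and not the library's actual API: `Article`, `Processor`, `BaseProcessor`, `CustomProcessor`, `NewCustomProcessor`, and `crawl` are all hypothetical stand-ins defined here, and the real interface in `wechat_spider` may have different method signatures. Check [custom_output_server.go][2] for the real types before adapting it.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// Article is a hypothetical stand-in for a crawled article (assumption).
type Article struct {
	Title string
	URL   string
}

// Processor is a hypothetical interface with the two hooks the README
// mentions: Output and Sleep. The real interface may differ.
type Processor interface {
	Output(articles []Article)
	Sleep()
}

// BaseProcessor plays the role of a default processor: it prints each
// article and pauses 50ms between page requests.
type BaseProcessor struct{}

func (BaseProcessor) Output(articles []Article) {
	for _, a := range articles {
		fmt.Println(a.Title, a.URL)
	}
}

func (BaseProcessor) Sleep() { time.Sleep(50 * time.Millisecond) }

// CustomProcessor embeds the base and overrides both hooks.
type CustomProcessor struct {
	BaseProcessor
	Delay time.Duration
	Sink  strings.Builder // stands in for a file, database, or queue
}

// Compile-time check that the custom type satisfies the interface.
var _ Processor = (*CustomProcessor)(nil)

// NewCustomProcessor builds a processor with a delay given in milliseconds.
func NewCustomProcessor(delayMillis int) *CustomProcessor {
	return &CustomProcessor{Delay: time.Duration(delayMillis) * time.Millisecond}
}

// Output writes tab-separated lines to the sink instead of printing them.
func (p *CustomProcessor) Output(articles []Article) {
	for _, a := range articles {
		fmt.Fprintf(&p.Sink, "%s\t%s\n", a.Title, a.URL)
	}
}

// Sleep replaces the 50ms default with the configured delay.
func (p *CustomProcessor) Sleep() { time.Sleep(p.Delay) }

// crawl simulates the pagination loop: emit one page, then pause.
func crawl(p Processor, pages [][]Article) {
	for _, page := range pages {
		p.Output(page)
		p.Sleep()
	}
}

func main() {
	p := NewCustomProcessor(200)
	crawl(p, [][]Article{
		{{Title: "first", URL: "https://example.com/1"}},
		{{Title: "second", URL: "https://example.com/2"}},
	})
	fmt.Print(p.Sink.String())
}
```

Because `crawl` calls the hooks through the interface, the overriding methods on `CustomProcessor` are the ones that run. A longer delay slows the crawl but should lower the chance of WeChat blocking the requests.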