https://github.com/lki/wescraper

Search WeChat official account articles using Scrapy and Sogou Search
scrapy sogou wechat

Host: GitHub
URL: https://github.com/lki/wescraper
Owner: LKI
Created: 2016-05-17T08:21:04.000Z (over 9 years ago)
Default Branch: gh-pages
Last Pushed: 2017-03-25T04:37:47.000Z (over 8 years ago)
Last Synced: 2025-05-16T08:43:51.494Z (5 months ago)
Topics: scrapy, sogou, wechat
Language: Python
Size: 43 KB
Stars: 46
Watchers: 10
Forks: 27
Open Issues: 1
Metadata Files:
- Readme: README.md

# WeScraper (WEchat SCRAPER)

This tool uses Python 2.7 and [scrapy][scrapy] to search for articles from WeChat official accounts.

# Usage

## Querying from the command line

Install Scrapy, then run a query directly.

```
pip install scrapy
python wescraper/scraper.py account liriansu miawu > we.json # look up official accounts related to liriansu and miawu
python wescraper/scraper.py key-day liriansu miawu > we.json # look up articles related to liriansu and miawu (from the past day)
```

## Querying via a web server

Install Scrapy and Tornado, then query through a local server:

```
pip install scrapy tornado
python wescraper/server.py
```

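Once the server is up, you can fetch the article list of WeChat official accounts via
`http://localhost/account/foo/bar/baz...`.

Alternatively, use `http://localhost/key-year/foo/bar/baz...`
to search official account articles by keyword.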
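
The URL pattern above can also be assembled programmatically. The following is a minimal sketch, assuming the server is reachable at `http://localhost` (as the URLs above suggest) and that each account name or keyword occupies one path segment. `build_query_url` is a hypothetical helper and not part of wescraper. The response format is not documented here, so the sketch only builds the URL.

```python
from urllib.parse import quote  # Python 3; on Python 2.7 use: from urllib import quote


def build_query_url(query_type, terms, base="http://localhost"):
    """Build a URL such as http://localhost/account/foo/bar (hypothetical helper).

    query_type: a path prefix shown in this README, e.g. "account" or "key-year".
    terms: account names or keywords, one path segment each (an assumption).
    """
    if not terms:
        raise ValueError("at least one search term is required")
    # Percent-encode each term so spaces, slashes and non-ASCII keywords
    # stay inside a single path segment.
    segments = [quote(term, safe="") for term in terms]
    return "/".join([base.rstrip("/"), query_type] + segments)


if __name__ == "__main__":
    print(build_query_url("account", ["liriansu", "miawu"]))
    print(build_query_url("key-year", ["python"]))
```

The resulting string can be passed to any HTTP client. If the server runs on a different host or port, pass it through the `base` argument.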

## Calling from Python code

See the source of [scraper.py][scraper-py].
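
If you would rather not import the module, one alternative is to drive the command-line entry point shown earlier from Python. This is only a sketch under assumptions: `build_command` and `run_query` are hypothetical helpers that do not exist in wescraper, the interpreter name `python` is assumed to resolve to a Python 2.7 installation with Scrapy, and the script path assumes you run from the repository root. The output is returned as raw text because its exact structure is not described here.

```python
import subprocess


def build_command(mode, terms, python_exe="python", script="wescraper/scraper.py"):
    """Assemble the argument list for one query (hypothetical helper).

    mode: a query mode shown in this README, e.g. "account" or "key-day".
    terms: account names or keywords to search for.
    """
    if not terms:
        raise ValueError("at least one search term is required")
    return [python_exe, script, mode] + list(terms)


def run_query(mode, terms, **kwargs):
    """Run the scraper as a child process and return its stdout as text."""
    output = subprocess.check_output(build_command(mode, terms, **kwargs))
    # Assumes the script writes UTF-8 to stdout.
    return output.decode("utf-8")


if __name__ == "__main__":
    # Equivalent to: python wescraper/scraper.py account liriansu miawu
    print(build_command("account", ["liriansu", "miawu"]))
```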

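# Details

* Some configurable parameters are in [config.py][config-py].

* When querying an official account, the first entry in the result list is used by default.

* This tool may get banned; for countermeasures see [Scrapy: Avoiding getting banned][anti]
(in general, switching to another IP solves the problem).

* [cookie.py][cookie-py] maintains a cookie pool: each request uses one of n cookies chosen at random, and if a cookie gets banned it is swapped for another.

* Modifications on top of this code are welcome; just remember to run the unit tests: `python wescraper/test/test.py`

* This tool relies entirely on [Sogou WeChat Search][sogou] to scrape articles, so if the Sogou WeChat Search interface changes, scraping may fail.

* [Python is awesome!][dive-into-python] :wink: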
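
The cookie pool described in the list above can be illustrated with a short sketch. This is not the code in cookie.py: `CookiePool`, `pick`, and `mark_banned` are hypothetical names, and the cookie values are placeholders. It only shows the idea of choosing a cookie at random and retiring one once it is banned.

```python
import random


class CookiePool(object):
    """Hypothetical pool: pick a live cookie at random, drop banned ones."""

    def __init__(self, cookies):
        self._live = list(cookies)

    def pick(self):
        """Return one live cookie chosen at random."""
        if not self._live:
            raise RuntimeError("all cookies in the pool have been banned")
        return random.choice(self._live)

    def mark_banned(self, cookie):
        """Remove a banned cookie so that later picks use a different one."""
        if cookie in self._live:
            self._live.remove(cookie)

    def __len__(self):
        return len(self._live)


if __name__ == "__main__":
    pool = CookiePool([{"id": "a"}, {"id": "b"}, {"id": "c"}])  # placeholder cookies
    cookie = pool.pick()
    pool.mark_banned(cookie)   # pretend the request using this cookie was blocked
    replacement = pool.pick()  # drawn only from the cookies that remain
    print(len(pool), replacement != cookie)
```

How a ban is detected (for example, by status code or a redirect to a verification page) is left out because it depends on the site's behavior at the time.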

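# Copyright / Disclaimer

Copyright of the code belongs to the original GitHub author, @LKI.
Commercial use is strictly prohibited; otherwise feel free to repost or fork.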

[scrapy]: https://github.com/scrapy/scrapy
[scraper-py]: /wescraper/scraper.py
[config-py]: /wescraper/config.py
[anti]: http://doc.scrapy.org/en/latest/topics/practices.html#avoiding-getting-banned
[cookie-py]: /wescraper/cookie.py
[sogou]: http://weixin.sogou.com/
[dive-into-python]: http://www.diveintopython.net/
