https://github.com/zhanlaoban/spider_for_crawling_comments

A Scrapy-based crawler that scrapes message content from the Local Leaders Message Board (地方领导留言网), for example http://liuyan.people.com.cn/threads/list?fid=539.

Host: GitHub
URL: https://github.com/zhanlaoban/spider_for_crawling_comments
Owner: zhanlaoban
Created: 2019-04-16T02:47:16.000Z (about 6 years ago)
Default Branch: master
Last Pushed: 2019-04-16T03:21:57.000Z (about 6 years ago)
Last Synced: 2025-04-12T01:54:30.261Z (2 months ago)
Language: Python
Size: 53.7 KB
Stars: 3
Watchers: 1
Forks: 2
Open Issues: 1
Metadata Files:
- Readme: README.md

# spider_for_Crawling_Comments
A Scrapy-based crawler that scrapes message content from the Local Leaders Message Board (地方领导留言网) at "http://liuyan.people.com.cn".

# Environment:
```pip install scrapy requests```

# Usage:
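1. Find the message list for the city you want to crawl. Taking Beijing as an example: http://liuyan.people.com.cn/threads/list?fid=539
2. Note the fid (539 here) and set it as the fid value in get_tid.py. You can change the iteration count of the outermost for loop in get_tid.py; the default is 20, which crawls 20*10 records. The 10 records per iteration is not configurable.
3. Run the following command:
```python get_tid.py```
This produces tid.txt in the current folder.
4. Copy that tid.txt into tutorial/tutorial/spiders, so that it sits in the same directory as comments_spider.py.
5. Go to the /tutorial/ directory and run the following command:
```scrapy crawl comments -o comments.csv```
This writes the message content to a file named comments.csv.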
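
Steps 2 and 3 above can be sketched roughly as follows. This is not the code in get_tid.py, only a minimal illustration of a loop that requests 20 batches and writes the collected tids to tid.txt. The endpoint `LIST_URL`, the form fields `fid` and `lastItem`, the JSON key `responseData`, and all function names are assumptions made for illustration; check get_tid.py for the real values.

```python
from typing import Callable, Iterable, List

# Assumption: hypothetical endpoint. Replace it with the URL used in get_tid.py.
LIST_URL = "http://liuyan.people.com.cn/threads/queryThreadsList"


def fetch_page(fid: int, last_tid: str) -> List[str]:
    """POST one list request and return the tids of that batch.

    The form fields and the JSON layout below are assumptions.
    """
    import requests  # imported lazily so the helpers below work without it

    resp = requests.post(
        LIST_URL,
        data={"fid": fid, "lastItem": last_tid},
        timeout=10,
    )
    resp.raise_for_status()
    payload = resp.json()
    return [str(item["tid"]) for item in payload["responseData"]]


def collect_tids(
    fid: int,
    pages: int = 20,
    fetch: Callable[[int, str], List[str]] = fetch_page,
) -> List[str]:
    """Request `pages` batches in a row; the batch size is whatever each request returns."""
    tids: List[str] = []
    last_tid = ""
    for _ in range(pages):
        batch = fetch(fid, last_tid)
        if not batch:  # nothing more to read
            break
        tids.extend(batch)
        last_tid = batch[-1]  # continue after the last tid seen
    return tids


def save_tids(tids: Iterable[str], path: str = "tid.txt") -> None:
    """Write one tid per line."""
    with open(path, "w", encoding="utf-8") as f:
        for tid in tids:
            f.write(f"{tid}\n")
```

Under these assumptions, `save_tids(collect_tids(539))` would write up to 20*10 tids for Beijing. Passing `fetch` as a parameter keeps the loop testable without network access.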

# Technology stack
1. The requests module sends POST requests to the site to obtain the tid values.
2. Scrapy then uses the tid values obtained above to crawl the message content.
3. Feel free to contact me if you have any questions about this repo.
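
The hand-off between the two stages above, from tid.txt to the pages Scrapy requests, might look like the sketch below. It is not taken from comments_spider.py. The URL pattern `THREAD_URL` and the helper names `load_tids` and `thread_urls` are assumptions made for illustration.

```python
from pathlib import Path
from typing import Iterable, Iterator, List

# Assumption: hypothetical detail-page pattern. Check comments_spider.py for the real one.
THREAD_URL = "http://liuyan.people.com.cn/threads/content?tid={tid}"


def load_tids(path: str = "tid.txt") -> List[str]:
    """Read one tid per line, skipping blank lines and duplicates, keeping order."""
    seen = set()
    tids: List[str] = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        tid = line.strip()
        if tid and tid not in seen:
            seen.add(tid)
            tids.append(tid)
    return tids


def thread_urls(tids: Iterable[str]) -> Iterator[str]:
    """Turn each tid into the URL of its message page."""
    for tid in tids:
        yield THREAD_URL.format(tid=tid)
```

Inside a Scrapy spider, `start_requests` could iterate over `thread_urls(load_tids(...))` and yield `scrapy.Request(url, callback=self.parse)` for each URL. Resolving the file with `Path(__file__).with_name("tid.txt")` would find tid.txt next to the spider module regardless of the directory the crawl is started from.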
