https://github.com/facert/tumblr_spider
A multithreaded Tumblr crawler written in Python
- Host: GitHub
- URL: https://github.com/facert/tumblr_spider
- Owner: facert
- License: MIT
- Created: 2016-11-21T02:50:06.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2020-07-22T02:56:48.000Z (almost 5 years ago)
- Last Synced: 2024-08-01T18:39:11.793Z (9 months ago)
- Topics: python, spider, tumblr
- Language: Python
- Size: 131 KB
- Stars: 462
- Watchers: 37
- Forks: 167
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# tumblr_spider
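A multithreaded Tumblr crawler written in Python.

#### install
> pip install -r requirements.txt

#### run
> python tumblr.py username (where `username` is the username of any popular blogger)

## snapshot
#### Crawl results
> `user.txt` holds the usernames of the bloggers that were crawled; `source.txt` holds the collected video URLs.

#### How it works
> Starting from one popular blogger's username, the script automatically collects the usernames of the other bloggers whose posts that blogger has reblogged, puts them in the crawl queue, and crawls them recursively.

#### Disclaimer
> This is a serious crawler (straight face). What it collects depends heavily on the first username you supply. Also, Tumblr is blocked by the firewall for certain reasons, so the simplest approach is to run the script on an overseas VPS.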
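
The queue-driven discovery described above can be sketched roughly as follows. This is a minimal illustration, not the code in `tumblr.py`: the function `crawl`, its parameters `num_workers` and `max_users`, and the callable `fetch_reblogged_from` are all hypothetical names, and an in-memory `FAKE_GRAPH` stands in for real Tumblr requests so the example runs offline.

```python
import queue
import threading


def crawl(seed, fetch_reblogged_from, num_workers=4, max_users=100):
    """Discover usernames reachable from `seed` using a shared work queue.

    `fetch_reblogged_from(username)` is a hypothetical callable returning the
    usernames that blog has reblogged from; a real crawler would scrape them.
    """
    seen = {seed}              # usernames already queued, to avoid repeats
    lock = threading.Lock()    # protects `seen` across worker threads
    pending = queue.Queue()
    pending.put(seed)

    def worker():
        while True:
            user = pending.get()
            try:
                if user is None:   # sentinel: no more work
                    return
                try:
                    others = fetch_reblogged_from(user)
                except Exception:
                    others = []    # skip blogs that fail to load
                for other in others:
                    with lock:
                        if other in seen or len(seen) >= max_users:
                            continue
                        seen.add(other)
                    pending.put(other)   # enqueue before marking this task done
            finally:
                pending.task_done()

    threads = [threading.Thread(target=worker, daemon=True)
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    pending.join()             # returns once every queued username is processed
    for _ in threads:
        pending.put(None)      # one sentinel per worker
    for t in threads:
        t.join()
    return seen


# Stand-in data for illustration only: who reblogged from whom.
FAKE_GRAPH = {
    "alice": ["bob", "carol"],
    "bob": ["carol", "dave"],
    "carol": ["alice"],
    "dave": [],
}

if __name__ == "__main__":
    users = crawl("alice", lambda u: FAKE_GRAPH.get(u, []))
    print(sorted(users))  # ['alice', 'bob', 'carol', 'dave']
```

The `seen` set keeps a blogger from being queued twice when two blogs reblog from each other, and `max_users` bounds a crawl that would otherwise keep expanding for as long as new usernames turn up.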