https://github.com/swhl/baiduimagecrawling
A super-lightweight Baidu image crawler, modified from https://github.com/kong36088/BaiduImageSpider
- Host: GitHub
- URL: https://github.com/swhl/baiduimagecrawling
- Owner: SWHL
- License: mit
- Created: 2025-01-11T09:21:06.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-01-16T00:05:44.000Z (over 1 year ago)
- Last Synced: 2025-09-15T11:53:29.631Z (9 months ago)
- Topics: baidu, crawling, image, spider
- Language: Python
- Homepage:
- Size: 13.7 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
README
### Introduction
A super-lightweight Baidu image crawler, modified from https://github.com/kong36088/BaiduImageSpider
### Installation
```bash
pip install baidu_image_crawling
```
### Python usage
```python
from baidu_image_crawling.main import Crawler
crawler = Crawler(0.05, save_dir="outputs")  # crawl delay of 0.05
# Crawl the keyword "美女": 2 pages in total, starting at page 1, 30 images per page, i.e. 2*30=60 images overall
crawler(word="美女", total_page=2, start_page=1, per_page=30)
```
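The page arithmetic in the comment above (2 pages of 30 images each, 60 in total) can be checked before starting a crawl. The following is a minimal sketch, not part of the package: `plan_crawl` is a hypothetical helper, and it assumes that pages are fetched consecutively starting at `start_page`. The count it returns is an upper bound, since individual downloads may fail.

```python
from typing import List, Tuple


def plan_crawl(total_page: int, start_page: int = 1, per_page: int = 30) -> Tuple[List[int], int]:
    """Return the page numbers to visit and the maximum number of images.

    Hypothetical helper: assumes pages are consecutive from start_page.
    """
    if total_page < 1 or start_page < 1 or per_page < 1:
        raise ValueError("total_page, start_page and per_page must all be >= 1")
    # Consecutive page numbers: start_page, start_page + 1, ...
    pages = list(range(start_page, start_page + total_page))
    # Upper bound on the image count: pages * images per page.
    max_images = total_page * per_page
    return pages, max_images


pages, max_images = plan_crawl(total_page=2, start_page=1, per_page=30)
print(pages, max_images)
```

The same three values can then be passed to `crawler(word=..., total_page=..., start_page=..., per_page=...)` as in the example above.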
### Command-line usage
```bash
baidu_image_crawling -w 美女 -tp 1 -sp 1 -pp 2
```
View the parameter documentation:
```bash
$ baidu_image_crawling -h
usage: baidu_image_crawling [-h] -w WORD -tp TOTAL_PAGE -sp START_PAGE [-pp [PER_PAGE]] [-sd SAVE_DIR] [-d DELAY]

options:
  -h, --help            show this help message and exit
  -w WORD, --word WORD  Keyword to crawl
  -tp TOTAL_PAGE, --total_page TOTAL_PAGE
                        Total number of pages to crawl
  -sp START_PAGE, --start_page START_PAGE
                        Starting page number
  -pp [PER_PAGE], --per_page [PER_PAGE]
                        Page size (images per page)
  -sd SAVE_DIR, --save_dir SAVE_DIR
                        Directory where images are saved
  -d DELAY, --delay DELAY
                        Crawl delay (interval)
```
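To make the option list easier to read, here is a rough `argparse` sketch that produces an interface of the same shape as the help output above. It is an illustration under stated assumptions, not the package's actual source: the flag names and required/optional status follow the usage line, while the argument types, the defaults (`30`, `"outputs"`, `0.05`) and the `build_parser` name are assumptions borrowed from the Python example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser shaped like the documented help output (a sketch)."""
    parser = argparse.ArgumentParser(prog="baidu_image_crawling")
    # Required options, as shown without brackets in the usage line.
    parser.add_argument("-w", "--word", type=str, required=True,
                        help="Keyword to crawl")
    parser.add_argument("-tp", "--total_page", type=int, required=True,
                        help="Total number of pages to crawl")
    parser.add_argument("-sp", "--start_page", type=int, required=True,
                        help="Starting page number")
    # nargs="?" matches "[-pp [PER_PAGE]]"; the value 30 is an assumption.
    parser.add_argument("-pp", "--per_page", type=int, nargs="?",
                        default=30, const=30,
                        help="Page size (images per page)")
    # Assumed defaults, taken from the Python usage example.
    parser.add_argument("-sd", "--save_dir", type=str, default="outputs",
                        help="Directory where images are saved")
    parser.add_argument("-d", "--delay", type=float, default=0.05,
                        help="Crawl delay (interval)")
    return parser


# Parse the same arguments as the command-line example, with an ASCII keyword.
args = build_parser().parse_args(["-w", "cat", "-tp", "1", "-sp", "1", "-pp", "2"])
print(args.word, args.total_page, args.start_page, args.per_page)
```

With these arguments the crawl covers 1 page of 2 images, so at most 2 images are requested.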