https://github.com/swhl/baiduimagecrawling

A super lightweight Baidu image crawler, modified from https://github.com/kong36088/BaiduImageSpider
Host: GitHub
URL: https://github.com/swhl/baiduimagecrawling
Owner: SWHL
License: MIT
Created: 2025-01-11T09:21:06.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2025-01-16T00:05:44.000Z (over 1 year ago)
Last Synced: 2025-09-15T11:53:29.631Z (9 months ago)
Topics: baidu, crawling, image, spider
Language: Python
Homepage:
Size: 13.7 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE

README

🕷️ Baidu Image Crawling

### Introduction

A super lightweight Baidu image crawler, modified from https://github.com/kong36088/BaiduImageSpider

### Installation

```bash
pip install baidu_image_crawling
```

### Python Usage

```python
from baidu_image_crawling.main import Crawler

crawler = Crawler(0.05, save_dir="outputs")  # crawl delay of 0.05

# Crawl the keyword "美女": 2 pages in total, starting at page 1, 30 images per page, i.e. 2*30=60 images overall
crawler(word="美女", total_page=2, start_page=1, per_page=30)
```
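The page arithmetic in the comment above can be sketched as a small helper. This is only an illustration, not the package's internals: the function names `expected_image_count` and `page_offsets` are hypothetical, and the sketch assumes that pages are 1-indexed and contiguous, so that page `n` begins at image offset `(n - 1) * per_page`.

```python
from typing import List


def expected_image_count(total_page: int, per_page: int = 30) -> int:
    """Upper bound on the number of images requested: pages times page size."""
    if total_page < 1 or per_page < 1:
        raise ValueError("total_page and per_page must be positive")
    return total_page * per_page


def page_offsets(total_page: int, start_page: int = 1, per_page: int = 30) -> List[int]:
    """Zero-based image offset at which each requested page would begin.

    Assumption: pages are 1-indexed and contiguous (hypothetical model,
    not confirmed against the package source).
    """
    if total_page < 1 or start_page < 1 or per_page < 1:
        raise ValueError("all arguments must be positive")
    first = (start_page - 1) * per_page
    return [first + i * per_page for i in range(total_page)]


if __name__ == "__main__":
    # Mirrors the call above: total_page=2, start_page=1, per_page=30
    print(expected_image_count(2, 30))
    print(page_offsets(2, start_page=1, per_page=30))
```

The actual number of saved files may be lower than this bound if some downloads fail or the search returns fewer results.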

### Command-Line Usage

```bash
baidu_image_crawling -w 美女 -tp 1 -sp 1 -pp 2
```
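When the command needs to be launched from a Python script, for example to crawl several keywords in a loop, the argument list can be assembled programmatically. The sketch below is a hypothetical helper (`build_command` is not part of the package); it only uses the flags listed in the tool's help output and omits optional flags that are not set.

```python
from typing import List, Optional


def build_command(
    word: str,
    total_page: int,
    start_page: int,
    per_page: Optional[int] = None,
    save_dir: Optional[str] = None,
    delay: Optional[float] = None,
) -> List[str]:
    """Assemble an argv list for the baidu_image_crawling CLI."""
    # Required flags: keyword, total pages, start page
    cmd = [
        "baidu_image_crawling",
        "-w", word,
        "-tp", str(total_page),
        "-sp", str(start_page),
    ]
    # Optional flags are appended only when a value was supplied
    if per_page is not None:
        cmd += ["-pp", str(per_page)]
    if save_dir is not None:
        cmd += ["-sd", save_dir]
    if delay is not None:
        cmd += ["-d", str(delay)]
    return cmd


if __name__ == "__main__":
    # Same invocation as the shell example above
    print(build_command("美女", total_page=1, start_page=1, per_page=2))
```

With the package installed, the resulting list can be passed to `subprocess.run(cmd, check=True)`; passing a list rather than a single string avoids shell-quoting issues with non-ASCII keywords.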

View the parameter documentation:

```bash
$ baidu_image_crawling -h
usage: baidu_image_crawling [-h] -w WORD -tp TOTAL_PAGE -sp START_PAGE [-pp [PER_PAGE]] [-sd SAVE_DIR] [-d DELAY]

options:
  -h, --help            show this help message and exit
  -w WORD, --word WORD  keyword to crawl
  -tp TOTAL_PAGE, --total_page TOTAL_PAGE
                        total number of pages to crawl
  -sp START_PAGE, --start_page START_PAGE
                        starting page number
  -pp [PER_PAGE], --per_page [PER_PAGE]
                        page size (images per page)
  -sd SAVE_DIR, --save_dir SAVE_DIR
                        directory where images are saved
  -d DELAY, --delay DELAY
                        crawl delay (interval)
```
```
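For readers who want to see how an interface like the one in the help output could be declared, here is a minimal `argparse` sketch that reproduces the same flags. It is not the package's actual source: the argument types and the default values (`30`, `"outputs"`, `0.05`) are assumptions borrowed from the Python example, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare a parser with the same flags as the help output (sketch only)."""
    parser = argparse.ArgumentParser(prog="baidu_image_crawling")
    parser.add_argument("-w", "--word", type=str, required=True,
                        help="keyword to crawl")
    parser.add_argument("-tp", "--total_page", type=int, required=True,
                        help="total number of pages to crawl")
    parser.add_argument("-sp", "--start_page", type=int, required=True,
                        help="starting page number")
    # nargs="?" yields the "[-pp [PER_PAGE]]" form seen in the usage line;
    # the default of 30 is an assumption
    parser.add_argument("-pp", "--per_page", type=int, nargs="?", default=30,
                        help="page size (images per page)")
    # Defaults below are assumptions taken from the Python usage example
    parser.add_argument("-sd", "--save_dir", type=str, default="outputs",
                        help="directory where images are saved")
    parser.add_argument("-d", "--delay", type=float, default=0.05,
                        help="crawl delay (interval)")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-w", "cat", "-tp", "1", "-sp", "1", "-pp", "2"])
    print(args)
```

Marking `-w`, `-tp`, and `-sp` as `required=True` is what makes them appear without square brackets in the usage line.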
