https://github.com/dxsooo/shortvideocrawl
Short video crawler based on scrapy
crawler kuaishou scrapy spider video-crawler
- Host: GitHub
- URL: https://github.com/dxsooo/shortvideocrawl
- Owner: dxsooo
- License: apache-2.0
- Created: 2023-01-29T10:53:41.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2025-07-02T18:15:18.000Z (12 months ago)
- Last Synced: 2025-07-02T19:26:32.940Z (12 months ago)
- Topics: crawler, kuaishou, scrapy, spider, video-crawler
- Language: Python
- Homepage:
- Size: 205 KB
- Stars: 13
- Watchers: 2
- Forks: 2
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
# ShortVideoCrawl
A short-video crawler based on [scrapy](https://github.com/scrapy/scrapy). It collects the videos returned by a search query on each target site.
Supports:
|Site|Name|Status|
|-|-|-|
|快手|[kuaishou](https://www.kuaishou.com/)| :heavy_check_mark: |
|西瓜视频|[ixigua](https://www.ixigua.com/)| :construction: |
|新片场|[xinpianchang](https://www.xinpianchang.com/)| :heavy_check_mark: |
|好看视频|[haokan](https://haokan.baidu.com/)| :construction: |
|度小视/全民小视频*|quanmin| :heavy_check_mark: |
> \*The official 度小视/全民小视频 website has been taken offline, but the spider in this project still works (tested 2024.6).
## Usage
Requirements:
- Python 3.10+
- Poetry
### prepare
```bash
git clone https://github.com/dxsooo/ShortVideoCrawl
cd ShortVideoCrawl
poetry install --only main
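# On Poetry 2.x, `poetry shell` requires the poetry-plugin-shell plugin;
# alternatively, prefix each command below with `poetry run`.
poetry shell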
```
### run
For example:
```bash
cd shortvideocrawl
# Main parameters (passed with -a):
#   query: search keyword
#   count: target number of videos

# kuaishou
scrapy crawl kuaishou -a query='蔡徐坤' -a count=50

# ixigua (under construction): highest resolution, size under 64 MB, duration under 5 min
# scrapy crawl ixigua -a query='蔡徐坤' -a count=50

# xinpianchang: highest resolution, size under 64 MB, duration under 5 min; only returns a fixed number of videos
scrapy crawl xinpianchang -a query='蔡徐坤'

# haokan (under construction): highest resolution
# scrapy crawl haokan -a query='蔡徐坤' -a count=50

# quanmin
scrapy crawl quanmin -a query='蔡徐坤' -a count=50
```
```
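The comments in the run example describe a selection rule for ixigua and xinpianchang: take the highest resolution whose size is under 64 MB and whose duration is under 5 minutes. The following is a minimal sketch of such a rule, not the project's actual code. The `Variant` type, its field names, and `pick_variant` are hypothetical, and it is an assumption that the limits are checked per downloadable variant.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

MAX_SIZE_BYTES = 64 * 1024 * 1024  # "size smaller than 64 MB"
MAX_DURATION_SECONDS = 5 * 60      # "duration smaller than 5 min"


@dataclass(frozen=True)
class Variant:
    """One downloadable rendition of a video (hypothetical structure)."""
    url: str
    height: int              # vertical resolution in pixels, e.g. 720
    size_bytes: int
    duration_seconds: float


def pick_variant(variants: Iterable[Variant]) -> Optional[Variant]:
    """Return the highest-resolution variant within both limits, or None."""
    eligible = [
        v for v in variants
        if v.size_bytes < MAX_SIZE_BYTES
        and v.duration_seconds < MAX_DURATION_SECONDS
    ]
    # default=None covers the case where no variant satisfies the limits.
    return max(eligible, key=lambda v: v.height, default=None)
```

With this rule, a 1080p variant of 80 MB is skipped in favour of a 720p variant of 30 MB, and a video longer than 5 minutes yields no download at all.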
Videos are saved in `./videos`, and each file is named with the video's ID on the source platform.
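For scripting several crawls and then collecting the results, something like the sketch below may help. It only wraps the `scrapy crawl` commands shown above and reads the `./videos` directory. The helper names (`build_crawl_command`, `list_saved_videos`, `crawl_all`) are hypothetical, and it assumes the file name without its extension is the video ID.

```python
import subprocess
from pathlib import Path
from typing import Dict, List, Optional


def build_crawl_command(spider: str, query: str,
                        count: Optional[int] = None) -> List[str]:
    """Build the `scrapy crawl` argument list used in the README examples."""
    cmd = ["scrapy", "crawl", spider, "-a", f"query={query}"]
    if count is not None:  # the xinpianchang example is run without `count`
        cmd += ["-a", f"count={count}"]
    return cmd


def list_saved_videos(video_dir: Path) -> Dict[str, Path]:
    """Map video ID (file name without extension) to the saved file path."""
    if not video_dir.is_dir():
        return {}
    return {p.stem: p for p in sorted(video_dir.iterdir()) if p.is_file()}


def crawl_all(queries: List[str], spider: str = "kuaishou",
              count: int = 50) -> None:
    """Run one crawl per query; call this from the `shortvideocrawl` directory."""
    for query in queries:
        # check=True raises CalledProcessError if scrapy exits with an error.
        subprocess.run(build_crawl_command(spider, query, count), check=True)
```

Calling `crawl_all(["蔡徐坤"])` followed by `list_saved_videos(Path("videos"))` would run the kuaishou example and return the downloaded files keyed by video ID.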