https://github.com/dxsooo/shortvideocrawl
Short video crawler based on scrapy
crawler kuaishou scrapy spider video-crawler
- Host: GitHub
- URL: https://github.com/dxsooo/shortvideocrawl
- Owner: dxsooo
- License: apache-2.0
- Created: 2023-01-29T10:53:41.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2024-11-18T11:10:53.000Z (7 months ago)
- Last Synced: 2025-03-26T03:41:40.878Z (3 months ago)
- Topics: crawler, kuaishou, scrapy, spider, video-crawler
- Language: Python
- Homepage:
- Size: 261 KB
- Stars: 13
- Watchers: 2
- Forks: 2
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# ShortVideoCrawl
[](./LICENSE)
[](https://www.codefactor.io/repository/github/dxsooo/shortvideocrawl)

Short video crawler based on [scrapy](https://github.com/scrapy/scrapy), crawling the target sites by search query.
Supports:

|Site|Name|Status|
|-|-|-|
|| [kuaishou](https://www.kuaishou.com/)| :heavy_check_mark: |
|| [ixigua](https://www.ixigua.com/)| :construction: |
|新片场|[xinpianchang](https://www.xinpianchang.com/)| :heavy_check_mark: |
||[haokan](https://haokan.baidu.com/)| :construction: |
|度小视/全民小视频*|quanmin| :heavy_check_mark: |

> \*The official 度小视/全民小视频 site has been taken offline, but this project still works with it (tested June 2024).
## Usage
Requirements:

- python 3.10+
- poetry

### prepare
```bash
git clone https://github.com/dxsooo/ShortVideoCrawl
cd ShortVideoCrawl
poetry install --only main
poetry shell
```

### run
For example:
```bash
cd shortvideocrawl

# main parameters:
#   query: search query word
#   count: target number of videos

# kuaishou
scrapy crawl kuaishou -a query='蔡徐坤' -a count=50

# ixigua, with highest resolution, size under 64 MB, duration under 5 min
# scrapy crawl ixigua -a query='蔡徐坤' -a count=50

# xinpianchang, with highest resolution, size under 64 MB, duration under 5 min,
# but can only fetch a fixed number of videos
scrapy crawl xinpianchang -a query='蔡徐坤'

# haokan, with highest resolution
# scrapy crawl haokan -a query='蔡徐坤' -a count=50

# quanmin
scrapy crawl quanmin -a query='蔡徐坤' -a count=50
```

Videos are saved in `./videos`, named by the video id of the source platform.
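Note that scrapy passes `-a` arguments to a spider as strings, so `count=50` arrives as the string `'50'` and has to be cast before use. A minimal sketch of that argument handling (the class name and defaults here are illustrative, not this project's actual spider code):

```python
# Sketch of how scrapy-style "-a name=value" arguments are normalized.
# Class name and defaults are hypothetical, not taken from this project.
class QuerySpiderArgs:
    def __init__(self, query=None, count="50"):
        if not query:
            raise ValueError("a search query is required, e.g. -a query='...'")
        self.query = query
        self.count = int(count)  # -a values always arrive as strings

args = QuerySpiderArgs(query="蔡徐坤", count="50")
print(args.query, args.count)  # -> 蔡徐坤 50
```

The same casting applies to any numeric spider argument given on the `scrapy crawl` command line.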