https://github.com/dxsooo/shortvideocrawl

Short video crawler based on scrapy

# ShortVideoCrawl

[![GitHub](https://img.shields.io/github/license/dxsooo/ShortVideoCrawl)](./LICENSE)
[![CodeFactor](https://www.codefactor.io/repository/github/dxsooo/shortvideocrawl/badge)](https://www.codefactor.io/repository/github/dxsooo/shortvideocrawl)

Short video crawler based on [scrapy](https://github.com/scrapy/scrapy), which fetches videos from the target sites by search query.

Supported sites (:heavy_check_mark: working, :construction: in progress):

|Site|Spider name|Status|
|-|-|-|
|kuaishou| [kuaishou](https://www.kuaishou.com/)| :heavy_check_mark: |
|xigua| [ixigua](https://www.ixigua.com/)| :construction: |
|新片场|[xinpianchang](https://www.xinpianchang.com/)| :heavy_check_mark: |
|haokan|[haokan](https://haokan.baidu.com/)| :construction: |
|度小视/全民小视频*|quanmin| :heavy_check_mark: |

> \*The official 度小视/全民小视频 website has been taken offline, but this project still works with it (tested in June 2024).

## Usage

Requirements:

- Python 3.10+
- Poetry
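A quick pre-flight check for these requirements could look like the sketch below. It is not part of this project: it assumes the interpreter is available as `python3` and that `python3 --version` prints `Python X.Y.Z`, and `python_version_ok` and `check_requirements` are hypothetical helper names.

```bash
# Hypothetical helper: succeed if a version string such as "3.11.4" is 3.10 or newer
python_version_ok() {
  local major rest minor
  major=${1%%.*}      # "3.11.4" -> "3"
  rest=${1#*.}        # "3.11.4" -> "11.4"
  minor=${rest%%.*}   # "11.4"   -> "11"
  [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 10 ]; }
}

# Hypothetical helper: check for Python 3.10+ (assumed to be `python3`) and Poetry
check_requirements() {
  local version
  if ! command -v python3 >/dev/null 2>&1; then
    echo "python3 not found" >&2
    return 1
  fi
  # `python3 --version` prints e.g. "Python 3.11.4"; strip the "Python " prefix
  version=$(python3 --version 2>&1)
  version=${version#Python }
  if ! python_version_ok "$version"; then
    echo "Python 3.10+ required, found $version" >&2
    return 1
  fi
  if ! command -v poetry >/dev/null 2>&1; then
    echo "poetry not found" >&2
    return 1
  fi
  echo "requirements OK (Python $version)"
}
```

Running `check_requirements` before `poetry install` returns non-zero and prints the reason if either requirement is missing.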

### Prepare

```bash
git clone https://github.com/dxsooo/ShortVideoCrawl
cd ShortVideoCrawl
poetry install --only main
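# on Poetry 2.x, `poetry shell` requires the poetry-plugin-shell plugin;
# `eval $(poetry env activate)` is the built-in alternative
poetry shell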
```

### Run

For example:

```bash
cd shortvideocrawl

# main parameters:
# query: search keyword
# count: number of videos to fetch

# kuaishou
scrapy crawl kuaishou -a query='蔡徐坤' -a count=50

# xigua (under construction): highest resolution, size < 64 MB, duration < 5 min
# scrapy crawl ixigua -a query='蔡徐坤' -a count=50

# xinpianchang: highest resolution, size < 64 MB, duration < 5 min;
# it only returns a fixed number of videos, so no count is given
scrapy crawl xinpianchang -a query='蔡徐坤'

# haokan (under construction): highest resolution
# scrapy crawl haokan -a query='蔡徐坤' -a count=50

# quanmin
scrapy crawl quanmin -a query='蔡徐坤' -a count=50
```
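The commands above can also be batched for one query across several spiders. This is a minimal sketch, not part of the project: `run` and `crawl_query` are hypothetical helper names, and `DRY_RUN` is an assumed environment variable that makes the script print each `scrapy crawl` command instead of executing it. It uses only the spider names and `-a` arguments shown above, and leaves out `count` for xinpianchang.

```bash
# Hypothetical helper: run a command, or only print it when DRY_RUN is set
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: crawl one query with each spider name given after the count
# usage: crawl_query QUERY COUNT SPIDER...
crawl_query() {
  local query="$1" count="$2" spider
  shift 2
  for spider in "$@"; do
    if [ "$spider" = xinpianchang ]; then
      # xinpianchang only returns a fixed number of videos, so no count is passed
      run scrapy crawl "$spider" -a "query=$query" || echo "crawl failed: $spider" >&2
    else
      run scrapy crawl "$spider" -a "query=$query" -a "count=$count" || echo "crawl failed: $spider" >&2
    fi
  done
}
```

For instance, `DRY_RUN=1 crawl_query '蔡徐坤' 50 kuaishou xinpianchang quanmin` previews three commands; dropping `DRY_RUN=1` runs them (from the `shortvideocrawl` directory, inside the Poetry environment). Spider names are passed explicitly rather than hard-coded, so the in-progress spiders (ixigua, haokan) only run when requested.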

Videos are saved in `./videos`, named by their video ID on the source platform.
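To check what a run produced, the downloaded files can be listed by ID. This is a minimal sketch, assuming each file in `./videos` is named `<video id>.<extension>` (the extension is not specified above); `list_video_ids` is a hypothetical helper, not part of the project.

```bash
# Hypothetical helper: print one video ID per file in the download directory
# (default ./videos), assuming file names of the form "<video id>.<extension>"
list_video_ids() {
  local dir="${1:-./videos}" f name
  [ -d "$dir" ] || return 0        # nothing downloaded yet
  for f in "$dir"/*; do
    [ -f "$f" ] || continue        # skips the literal glob when the directory is empty
    name=$(basename "$f")
    echo "${name%.*}"              # strip the extension, if any
  done
}
```

`list_video_ids | wc -l` then gives the number of downloaded videos, which can be compared with the `count` passed to the spider.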