https://github.com/dxsooo/shortvideocrawl
Short video crawler based on scrapy
crawler kuaishou scrapy spider video-crawler
- Host: GitHub
- URL: https://github.com/dxsooo/shortvideocrawl
- Owner: dxsooo
- License: apache-2.0
- Created: 2023-01-29T10:53:41.000Z (almost 2 years ago)
- Default Branch: master
- Last Pushed: 2024-06-09T16:20:38.000Z (5 months ago)
- Last Synced: 2024-06-09T17:54:10.066Z (5 months ago)
- Topics: crawler, kuaishou, scrapy, spider, video-crawler
- Language: Python
- Homepage:
- Size: 259 KB
- Stars: 11
- Watchers: 2
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# ShortVideoCrawl
[![GitHub](https://img.shields.io/github/license/dxsooo/ShortVideoCrawl)](./LICENSE)
[![CodeFactor](https://www.codefactor.io/repository/github/dxsooo/shortvideocrawl/badge)](https://www.codefactor.io/repository/github/dxsooo/shortvideocrawl)

A short video crawler based on [scrapy](https://github.com/scrapy/scrapy). It collects videos by running search queries against the target sites.

Supports:
|Site|Name|Status|
|-|-|-|
|| [kuaishou](https://www.kuaishou.com/)| :heavy_check_mark: |
|| [ixigua](https://www.ixigua.com/)| :construction: |
|新片场|[xinpianchang](https://www.xinpianchang.com/)| :heavy_check_mark: |
||[haokan](https://haokan.baidu.com/)| :construction: |
|度小视/全民小视频*|quanmin| :heavy_check_mark: |

> \*The official 度小视/全民小视频 website has been taken offline, but this project still works with it (tested 2024.6).

## Usage
requirements:
- python 3.10+
- poetry

### prepare

```bash
git clone https://github.com/dxsooo/ShortVideoCrawl
cd ShortVideoCrawl
poetry install --only main
poetry shell
```

### run

For example:
```bash
cd shortvideocrawl

# main parameters:
#   query: query word
#   count: target video count

# kuaishou
scrapy crawl kuaishou -a query='蔡徐坤' -a count=50

# xigua, with highest resolution, size smaller than 64 MB and duration shorter than 5 min
# scrapy crawl ixigua -a query='蔡徐坤' -a count=50

# xinpianchang, with highest resolution, size smaller than 64 MB and duration shorter than 5 min,
# but it can only get a fixed number of videos
scrapy crawl xinpianchang -a query='蔡徐坤'

# haokan, with highest resolution
# scrapy crawl haokan -a query='蔡徐坤' -a count=50

# quanmin
scrapy crawl quanmin -a query='蔡徐坤' -a count=50
```

Videos are saved in `./videos`, each named with its video ID on the source platform.
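
To run several spiders or queries in one go, the commands above can be generated from Python. The following is a minimal sketch, not part of ShortVideoCrawl: `build_crawl_command` and `saved_video_ids` are hypothetical helper names, the only arguments used are the `-a query=...` and `-a count=...` shown above, and it assumes every video is a single file directly under `videos` whose file stem is the video ID.

```python
import shlex
from pathlib import Path


def build_crawl_command(spider, query, count=None):
    """Return the argv list for one `scrapy crawl` invocation.

    Hypothetical helper: it only assembles the arguments documented in
    this README and does not start a crawl itself.
    """
    cmd = ["scrapy", "crawl", spider, "-a", f"query={query}"]
    if count is not None:  # the xinpianchang example above passes no count
        cmd += ["-a", f"count={count}"]
    return cmd


def saved_video_ids(videos_dir="videos"):
    """Return the sorted file stems found in the output directory.

    Assumption: one file per video, stored directly under `videos_dir`,
    with the source platform's video ID as the file stem.
    """
    root = Path(videos_dir)
    if not root.is_dir():
        return []
    return sorted(p.stem for p in root.iterdir() if p.is_file())


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for spider in ("kuaishou", "quanmin"):
        print(shlex.join(build_crawl_command(spider, "蔡徐坤", count=50)))
    # List the IDs of whatever has been downloaded so far.
    print(saved_video_ids())
```

To actually start a crawl, each list could be passed to `subprocess.run(cmd, check=True, cwd="shortvideocrawl")` from the repository root, inside the environment created with `poetry shell`.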