https://github.com/dxsooo/shortvideocrawl
Short video crawler based on scrapy
crawler kuaishou scrapy spider video-crawler
- Host: GitHub
- URL: https://github.com/dxsooo/shortvideocrawl
- Owner: dxsooo
- License: apache-2.0
- Created: 2023-01-29T10:53:41.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2024-11-18T11:10:53.000Z (7 months ago)
- Last Synced: 2025-03-26T03:41:40.878Z (3 months ago)
- Topics: crawler, kuaishou, scrapy, spider, video-crawler
- Language: Python
- Homepage:
- Size: 261 KB
- Stars: 13
- Watchers: 2
- Forks: 2
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# ShortVideoCrawl
[](./LICENSE)
[](https://www.codefactor.io/repository/github/dxsooo/shortvideocrawl)

Short video crawler based on [scrapy](https://github.com/scrapy/scrapy), crawling the target sites by search query.
Supports:

|Site|Name|Status|
|-|-|-|
|| [kuaishou](https://www.kuaishou.com/)| :heavy_check_mark: |
|| [ixigua](https://www.ixigua.com/)| :construction: |
|新片场|[xinpianchang](https://www.xinpianchang.com/)| :heavy_check_mark: |
||[haokan](https://haokan.baidu.com/)| :construction: |
|度小视/全民小视频*|quanmin| :heavy_check_mark: |

> \*The official 度小视/全民小视频 site has been taken offline, but this project still works with it (tested June 2024).
## Usage
Requirements:

- python 3.10+
- poetry

### prepare
```bash
git clone https://github.com/dxsooo/ShortVideoCrawl
cd ShortVideoCrawl
poetry install --only main
poetry shell
```

### run
For example:
```bash
cd shortvideocrawl

# main parameters:
#   query: search query word
#   count: target number of videos

# kuaishou
scrapy crawl kuaishou -a query='蔡徐坤' -a count=50

# ixigua, with highest resolution, size under 64 MB, duration under 5 min
# scrapy crawl ixigua -a query='蔡徐坤' -a count=50

# xinpianchang, with highest resolution, size under 64 MB, duration under 5 min,
# but can only fetch a fixed number of videos
scrapy crawl xinpianchang -a query='蔡徐坤'

# haokan, with highest resolution
# scrapy crawl haokan -a query='蔡徐坤' -a count=50

# quanmin
scrapy crawl quanmin -a query='蔡徐坤' -a count=50
```

Videos are saved in `./videos`, named by the video id of the source platform.
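Note that scrapy passes `-a` arguments to a spider as strings, so `count=50` arrives as the string `'50'` and has to be cast before use. A minimal sketch of that argument handling (the class name and defaults here are illustrative, not this project's actual spider code):

```python
# Sketch of how scrapy-style "-a name=value" arguments are normalized.
# Class name and defaults are hypothetical, not taken from this project.
class QuerySpiderArgs:
    def __init__(self, query=None, count="50"):
        if not query:
            raise ValueError("a search query is required, e.g. -a query='...'")
        self.query = query
        self.count = int(count)  # -a values always arrive as strings

args = QuerySpiderArgs(query="蔡徐坤", count="50")
print(args.query, args.count)  # -> 蔡徐坤 50
```

The same casting applies to any numeric spider argument given on the `scrapy crawl` command line.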