https://github.com/clemapfel/douyin_scraper
Takes a list of Douyin video URLs and outputs a filtered CSV of metadata along with the downloaded videos.
- Host: GitHub
- URL: https://github.com/clemapfel/douyin_scraper
- Owner: Clemapfel
- License: mit
- Created: 2023-01-15T15:12:31.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2023-01-18T16:02:13.000Z (over 3 years ago)
- Last Synced: 2025-08-11T14:51:01.147Z (11 months ago)
- Language: Python
- Homepage:
- Size: 30.3 KB
- Stars: 8
- Watchers: 2
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# Douyin Scraper
# Installation
```
git clone https://github.com/Clemapfel/douyin_scraper.git
```
# Dependencies
`json` and `csv` are part of the Python standard library and do not need to be installed. Install the remaining packages with:
```
pip install requests
pip install douyin-tiktok-scraper
```
# Usage
### 1. Collect Video URLs
Paste the video URLs, one per line, into a file, henceforth assumed to be `douyin_scraper/video_ids.txt`.
Example `video_ids.txt`:
```
https://www.douyin.com/video/7188679476946029885
https://www.douyin.com/video/7188514977211305277
https://www.douyin.com/video/7188463948864261434
```
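Before running the scraper it can help to check that every line of the URL file has the expected shape. The following is a minimal sketch, not part of the repository: the function name `parse_video_urls` is hypothetical, and the pattern assumes that every URL looks like the examples above (`https://www.douyin.com/video/<numeric id>`).

```python
import re

# Assumption: valid entries look like the README examples,
# i.e. https://www.douyin.com/video/<digits>, optionally with a trailing slash.
VIDEO_URL = re.compile(r"^https://www\.douyin\.com/video/(\d+)/?$")


def parse_video_urls(text):
    """Split file contents into (valid, invalid).

    valid is a list of (url, video_id) pairs, invalid a list of rejected lines.
    """
    valid, invalid = [], []
    for line in text.splitlines():
        url = line.strip()
        if not url:
            continue  # ignore blank lines
        match = VIDEO_URL.match(url)
        if match:
            valid.append((url, match.group(1)))
        else:
            invalid.append(url)
    return valid, invalid


# Example input standing in for the contents of video_ids.txt.
example = """https://www.douyin.com/video/7188679476946029885
https://www.douyin.com/video/7188514977211305277
not-a-video-url
"""
valid, invalid = parse_video_urls(example)
print(len(valid), "valid;", "rejected:", invalid)
```

To check a real file, pass the result of `open("video_ids.txt", encoding="utf-8").read()` instead of `example`.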
### 2. Specify Metadata Filter
Paste the JSON keys, one per line, into a file, henceforth assumed to be `douyin_scraper/filter.txt`. The values for these keys will be extracted
from the raw metadata into the output CSV file.
Example `filter.txt`:
```
digg_count
play_count
share_count
```
For a full list of allowed keys, run the script once and inspect one of the `raw.json` files in `douyin_scraper/out`.
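Inspecting a large `raw.json` by hand can be tedious, so a small helper can list every key it contains. This is a sketch under assumptions: the helper name `collect_keys` is hypothetical, the sample data is invented for illustration, and the real structure of `raw.json` may differ.

```python
import json


def collect_keys(data, found=None):
    """Recursively gather every dictionary key in a decoded JSON value."""
    if found is None:
        found = set()
    if isinstance(data, dict):
        for key, value in data.items():
            found.add(key)
            collect_keys(value, found)
    elif isinstance(data, list):
        for item in data:
            collect_keys(item, found)
    return found


# Hypothetical excerpt standing in for the contents of a raw.json file.
sample = json.loads(
    '{"desc": "demo", "statistics": {"digg_count": 120, "play_count": 0}}'
)
for key in sorted(collect_keys(sample)):
    print(key)
```

To run it on actual output, replace `sample` with the result of `json.load` on one of the `raw.json` files in `douyin_scraper/out`.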
### 3. Execute Script
In your console, navigate to `douyin_scraper/`, then execute:
```commandline
python3 scrape.py video_ids.txt filter.txt
```
Here `video_ids.txt` is the file from step 1 and `filter.txt` is the file from step 2; both must be located in the same folder as `scrape.py`.
Let the script run until the following appears:
```
Process finished with exit code 0
```
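The command in this step passes two positional arguments. How `scrape.py` reads them is not shown here; the sketch below illustrates one conventional way to handle such an invocation with `argparse`. The argument names `video_ids` and `filter` are assumptions chosen to mirror the file names.

```python
import argparse


def build_parser():
    """Parser for two positional file arguments (hypothetical names)."""
    parser = argparse.ArgumentParser(
        description="Scrape Douyin metadata for a list of video URLs."
    )
    parser.add_argument("video_ids", help="text file with one video URL per line")
    parser.add_argument("filter", help="text file with one JSON key per line")
    return parser


# Parse an explicit argument list here; a script would normally call
# parse_args() without arguments to read sys.argv instead.
args = build_parser().parse_args(["video_ids.txt", "filter.txt"])
print(args.video_ids, args.filter)
```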
### 4. Collect Output
Each script run creates a new folder in `./out` containing the videos and video metadata for all videos
in `video_ids.txt`. In addition, a `.csv` file whose name includes the current time in the format `YYYY-MM-DD_hh:mm`, containing
the values for all keys specified in `filter.txt`, will appear in the same directory as the script.
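
As an illustration of the output described above, the following sketch builds a file name from a timestamp in the `YYYY-MM-DD_hh:mm` format and writes only the filtered keys as CSV columns. It is not the script's actual code: the `out_` prefix, the function names, and the flat sample row are all assumptions.

```python
import csv
import io
from datetime import datetime


def csv_name(now):
    """File name with a YYYY-MM-DD_hh:mm timestamp; the 'out_' prefix is hypothetical."""
    return f"out_{now.strftime('%Y-%m-%d_%H:%M')}.csv"


def write_rows(handle, keys, rows):
    """Write one CSV row per video, keeping only the columns listed in keys."""
    # extrasaction="ignore" drops any metadata fields not named in keys.
    writer = csv.DictWriter(handle, fieldnames=keys, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)


keys = ["digg_count", "play_count", "share_count"]
# Invented sample row; real metadata would come from the scraper.
rows = [{"digg_count": 120, "play_count": 0, "share_count": 7, "desc": "dropped"}]

buffer = io.StringIO()  # in-memory stand-in for the output file
write_rows(buffer, keys, rows)
print(csv_name(datetime(2023, 1, 18, 16, 2)))
print(buffer.getvalue())
```

Note that `:` is not a valid character in Windows file names, so a variant such as `%H-%M` may be needed on that platform.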