https://github.com/clemapfel/douyin_scraper

take list of douyin videos, outputs filtered csv of metadata along with downloaded videos
Host: GitHub
URL: https://github.com/clemapfel/douyin_scraper
Owner: Clemapfel
License: mit
Created: 2023-01-15T15:12:31.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2023-01-18T16:02:13.000Z (over 3 years ago)
Last Synced: 2025-08-11T14:51:01.147Z (11 months ago)
Language: Python
Homepage:
Size: 30.3 KB
Stars: 8
Watchers: 2
Forks: 2
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# Douyin Scraper

# Installation

```
git clone https://github.com/Clemapfel/douyin_scraper.git
```

# Dependencies

`json` and `csv` are part of the Python standard library and need no installation. Install the remaining packages with:

```
pip install requests
pip install douyin-tiktok-scraper
```
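Before running the script, you can ask Python whether each required module can be found in the current environment. This is a minimal sketch. The import name `douyin_tiktok_scraper` for the `douyin-tiktok-scraper` package is an assumption, and `missing_modules` is a hypothetical helper.

```python
import importlib.util

# "json" and "csv" ship with Python; "requests" and "douyin_tiktok_scraper"
# (assumed import name of the douyin-tiktok-scraper package) come from pip.
REQUIRED = ["json", "csv", "requests", "douyin_tiktok_scraper"]


def missing_modules(names=REQUIRED):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("missing:", ", ".join(missing) if missing else "none")
```

An empty result means every listed module is importable.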

# Usage

### 1. Collect Video URLs

Paste the video URLs into a text file, one URL per line. This file is henceforth assumed to be `douyin_scraper/video_ids.txt`.

Example `video_ids.txt`:
```
https://www.douyin.com/video/7188679476946029885
https://www.douyin.com/video/7188514977211305277
https://www.douyin.com/video/7188463948864261434
```
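One way to read such a file and pull the numeric video ID out of each URL is sketched below. It assumes every URL has the `douyin.com/video/<id>` form shown above; `parse_video_ids` and `read_video_ids` are hypothetical names, not taken from `scrape.py`.

```python
import re

# Assumption: URLs look like https://www.douyin.com/video/<numeric id>,
# as in the example file; other URL forms are rejected.
VIDEO_URL = re.compile(r"douyin\.com/video/(\d+)")


def parse_video_ids(lines):
    """Return the numeric video IDs from an iterable of URL lines, in order."""
    ids = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        match = VIDEO_URL.search(line)
        if match is None:
            raise ValueError(f"not a douyin video URL: {line!r}")
        ids.append(match.group(1))
    return ids


def read_video_ids(path="video_ids.txt"):
    """Read and parse the URL list file (one URL per line)."""
    with open(path, encoding="utf-8") as f:
        return parse_video_ids(f)
```

For the example file, `read_video_ids("video_ids.txt")` would return the three IDs as strings, in file order.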

### 2. Specify Metadata Filter

Paste the JSON keys you want to extract into a text file, one key per line. This file is henceforth assumed to be `douyin_scraper/filter.txt`. The values for these keys will be extracted from the raw metadata into the output CSV file.

Example `filter.txt`:
```
digg_count
play_count
share_count
```
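Raw metadata is nested JSON, so a key such as `digg_count` may sit several levels deep. The sketch below reads the filter file and looks a key up wherever it appears in the document (depth-first, first match wins). This lookup strategy and the helper names are assumptions for illustration, not necessarily what `scrape.py` does.

```python
def parse_filter_keys(lines):
    """Return the non-empty, stripped keys from the lines of filter.txt."""
    return [line.strip() for line in lines if line.strip()]


def read_filter_keys(path="filter.txt"):
    """Read filter.txt (one key per line)."""
    with open(path, encoding="utf-8") as f:
        return parse_filter_keys(f)


def find_key(data, key):
    """Depth-first search for `key` in nested dicts/lists; None if absent."""
    if isinstance(data, dict):
        if key in data:
            return data[key]
        children = data.values()
    elif isinstance(data, list):
        children = data
    else:
        return None  # scalars cannot contain keys
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None
```

Returning the first match keeps the output at one value per key. If the same key name occurs in several places in the metadata, a path-based lookup would be needed instead.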

For a full list of allowed keys, run the script once and inspect one of the `raw.json` files in `douyin_scraper/out`.
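Instead of reading `raw.json` by eye, the keys can be listed programmatically. This sketch assumes the `raw.json` files sit somewhere below `out/` and searches recursively, since the exact folder layout is not documented here; the helper names are hypothetical.

```python
import json
from pathlib import Path


def collect_keys(data, keys=None):
    """Recursively gather every dict key in parsed JSON (dicts and lists)."""
    if keys is None:
        keys = set()
    if isinstance(data, dict):
        for key, value in data.items():
            keys.add(key)
            collect_keys(value, keys)
    elif isinstance(data, list):
        for item in data:
            collect_keys(item, keys)
    return keys


def list_raw_keys(out_dir="out"):
    """Return the sorted keys of the first raw.json found below `out_dir`."""
    # Assumption: raw.json files live somewhere under out/ (searched recursively).
    raw = next(iter(sorted(Path(out_dir).rglob("raw.json"))), None)
    if raw is None:
        raise FileNotFoundError(f"no raw.json under {out_dir!r}; run scrape.py first")
    with raw.open(encoding="utf-8") as f:
        return sorted(collect_keys(json.load(f)))
```

Usage: `print("\n".join(list_raw_keys("out")))` prints one candidate key per line, ready to copy into `filter.txt`.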

### 3. Execute Script

In your console, navigate to `douyin_scraper/`, then execute:

```commandline
python3 scrape.py video_ids.txt filter.txt
```

Here `video_ids.txt` is the file from step 1 and `filter.txt` the file from step 2; both are located in the same folder as `scrape.py`.
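For reference, a script with this two-argument interface could validate its inputs as below. This is a hypothetical sketch using the standard `argparse` module; it is not `scrape.py`'s actual argument handling.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse `scrape.py VIDEO_IDS FILTER`; exit with an error if a file is missing."""
    parser = argparse.ArgumentParser(
        description="Download douyin videos and write filtered metadata to CSV.")
    parser.add_argument("video_ids", type=Path,
                        help="text file with one video URL per line")
    parser.add_argument("filter_file", type=Path, metavar="filter",
                        help="text file with one metadata key per line")
    args = parser.parse_args(argv)
    for path in (args.video_ids, args.filter_file):
        if not path.is_file():
            parser.error(f"file not found: {path}")  # exits with status 2
    return args
```

Checking the files up front fails fast with a usage message instead of partway through a long scrape.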

Let the script run until it finishes and the following appears:
```
Process finished with exit code 0
```
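`Process finished with exit code 0` is how IDEs such as PyCharm report a successful exit; when the script is run from a plain terminal this line may not appear, but the exit code can still be checked (for example with `echo $?` in a POSIX shell). To drive the scraper from another Python program, the sketch below runs it as a subprocess and checks for exit code 0. `run_scraper` is a hypothetical helper; its defaults are the file names used above.

```python
import subprocess
import sys


def run_scraper(video_ids="video_ids.txt", filter_file="filter.txt", script="scrape.py"):
    """Run the scraper in a child Python process; True if it exited with code 0."""
    # sys.executable reuses the interpreter (and its installed packages)
    # that is running this code.
    result = subprocess.run([sys.executable, script, video_ids, filter_file])
    return result.returncode == 0
```

Run it from inside `douyin_scraper/` so the relative paths resolve.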

### 4. Collect Output

Each script run creates a new folder in `./out` containing the videos and video metadata for all videos in `video_ids.txt`. In addition, a `.csv` file whose name contains the current time in the format `YYYY-MM-DD_hh:mm` will appear in the same directory as the script. It contains the values of all keys specified in `filter.txt`.
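The final step can be sketched as writing one CSV row per video with a column for each filter key. The row shape (a `video_id` column plus the filter keys), the exact file name, and the helper name are assumptions; the CSV actually written by `scrape.py` may be laid out differently.

```python
import csv
from datetime import datetime
from pathlib import Path


def write_filtered_csv(rows, keys, directory=".", now=None):
    """Write one CSV row per video with a column per filter key; return the path.

    `rows` maps video id -> {key: value} (hypothetical shape); keys missing
    for a video are written as empty cells.
    """
    if now is None:
        now = datetime.now()
    # Name embeds the time as YYYY-MM-DD_hh:mm, as described above.
    # Note: ':' is not allowed in Windows file names.
    path = Path(directory) / (now.strftime("%Y-%m-%d_%H:%M") + ".csv")
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["video_id", *keys])
        writer.writeheader()
        for video_id, values in rows.items():
            writer.writerow({"video_id": video_id, **{k: values.get(k, "") for k in keys}})
    return path
```

`rows` could be built by looking up each filter key in every video's raw metadata; passing `now` explicitly makes the file name reproducible.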
