https://github.com/parva101/video2dataset
FiftyOne plugin that converts YouTube or local videos into image datasets with smart keyframe extraction.
computer-vision dataset fiftyone plugin video youtube
- Host: GitHub
- URL: https://github.com/parva101/video2dataset
- Owner: Parva101
- License: apache-2.0
- Created: 2026-03-25T03:11:22.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2026-03-25T05:47:53.000Z (3 months ago)
- Last Synced: 2026-03-26T09:32:10.752Z (3 months ago)
- Topics: computer-vision, dataset, fiftyone, plugin, video, youtube
- Language: Python
- Homepage: https://github.com/Parva101/video2dataset#readme
- Size: 23.4 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Codeowners: .github/CODEOWNERS
- Security: SECURITY.md
# video2dataset (`@parva101/video2dataset`)
## Demo
### YouTube -> Dataset
[Demo recording: YouTube URL to dataset]
### Local Video -> Dataset
[Demo recording: local video to dataset]
A production-ready FiftyOne Python plugin that converts YouTube URLs or local
video files into image datasets by extracting representative keyframes.
## Features
- YouTube ingest via `yt-dlp`
- Local video ingest
- Three frame selection strategies:
  - `uniform` (every N seconds)
  - `scene_change` (histogram-based scene detection)
  - `hybrid` (uniform + scene change)
- Optional perceptual hash deduplication (`pHash`)
- Automatic dataset creation + open in FiftyOne App
- Source metadata attached to samples and dataset info
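The three selection strategies listed above can be illustrated with a small, dependency-free sketch. This is not the plugin's code: the function names, the use of per-frame normalized histograms as input, the half-L1 distance as the scene score, and the "keep the earliest frames" handling of `max_frames` are all assumptions made for illustration.

```python
from typing import List, Sequence


def uniform_indices(num_frames: int, fps: float, interval_seconds: float) -> List[int]:
    """Frame indices spaced roughly `interval_seconds` apart (hypothetical helper)."""
    if interval_seconds <= 0 or fps <= 0:
        raise ValueError("fps and interval_seconds must be positive")
    step = max(1, int(round(fps * interval_seconds)))
    return list(range(0, num_frames, step))


def histogram_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Half the L1 distance between two normalized histograms, in [0, 1]."""
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


def scene_change_indices(
    histograms: Sequence[Sequence[float]], scene_threshold: float
) -> List[int]:
    """Indices whose histogram differs from the previous frame by more than the threshold.

    The first frame is always kept (an assumption of this sketch).
    """
    if not histograms:
        return []
    selected = [0]
    for i in range(1, len(histograms)):
        if histogram_distance(histograms[i - 1], histograms[i]) > scene_threshold:
            selected.append(i)
    return selected


def hybrid_indices(uniform: List[int], scene: List[int], max_frames: int) -> List[int]:
    """Union of both strategies, sorted, truncated to the earliest `max_frames`."""
    return sorted(set(uniform) | set(scene))[:max_frames]


# Toy two-bin histograms with one abrupt change between frames 2 and 3.
hists = [[1.0, 0.0], [1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.0, 1.0], [0.1, 0.9]]

uniform = uniform_indices(num_frames=len(hists), fps=1.0, interval_seconds=2.0)
scene = scene_change_indices(hists, scene_threshold=0.5)
print(uniform)                                        # [0, 2, 4]
print(scene)                                          # [0, 3]
print(hybrid_indices(uniform, scene, max_frames=3))   # [0, 2, 3]
```

In this toy model, lowering `scene_threshold` or `interval_seconds` selects more frames, which is the direction of the advice in the Troubleshooting section.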
## Operators
- `sample_from_youtube`
- `sample_from_video`
## Requirements
- Python 3.10+
- FiftyOne (latest stable recommended)
- Python packages in [`requirements.txt`](./requirements.txt)
- `ffmpeg` on `PATH` (recommended)
Install dependencies:
```bash
pip install -r requirements.txt
```
## Installation
Install from GitHub:
```bash
fiftyone plugins download https://github.com/Parva101/video2dataset
```
Install only this plugin:
```bash
fiftyone plugins download https://github.com/Parva101/video2dataset --plugin-names @parva101/video2dataset
```
Verify install:
```bash
fiftyone plugins list
fiftyone operators list
```
## Usage
1. Launch the FiftyOne App
2. Open the Operator Browser
3. Run one of:
   - `Video Sampler: YouTube to dataset`
   - `Video Sampler: local video to dataset`
4. Configure:
   - `dataset_name` (required)
   - `strategy`: `uniform | scene_change | hybrid`
   - `max_frames`
   - `interval_seconds`
   - `scene_threshold` (`0` to `1`)
   - `dedup` and `dedup_threshold`
   - `overwrite_dataset`
   - `output_dir` (optional)
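The configuration options above can be captured in a small validated structure before they are passed to an operator. The sketch below is hypothetical and is not part of the plugin. The field names come from the list above. Every default value, the integer type of `dedup_threshold`, and all validation rules other than the `0` to `1` range of `scene_threshold` are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

STRATEGIES = ("uniform", "scene_change", "hybrid")


@dataclass
class SamplerParams:
    """Hypothetical container for the operator inputs listed in the README."""

    dataset_name: str                  # required
    strategy: str = "hybrid"           # assumed default
    max_frames: int = 100              # assumed default
    interval_seconds: float = 2.0      # assumed default
    scene_threshold: float = 0.3       # assumed default; documented range is 0 to 1
    dedup: bool = True                 # assumed default
    dedup_threshold: int = 5           # assumed default and type
    overwrite_dataset: bool = False    # assumed default
    output_dir: Optional[str] = None   # optional

    def __post_init__(self) -> None:
        if not self.dataset_name:
            raise ValueError("dataset_name is required")
        if self.strategy not in STRATEGIES:
            raise ValueError(f"strategy must be one of {STRATEGIES}")
        if not 0.0 <= self.scene_threshold <= 1.0:
            raise ValueError("scene_threshold must be between 0 and 1")
        if self.max_frames <= 0:
            raise ValueError("max_frames must be positive")
        if self.interval_seconds <= 0:
            raise ValueError("interval_seconds must be positive")


params = SamplerParams(dataset_name="my-video-frames", strategy="uniform")
print(params.strategy, params.scene_threshold)
```

Validating early like this surfaces a mistyped strategy name or an out-of-range threshold before any video is downloaded or decoded.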
## Output schema
Sample fields:
- `filepath`
- `source_type`
- `sampling_strategy`
- `timestamp_sec`
- `frame_number`
Additional YouTube fields:
- `source_url`
- `video_id`
- `title`
- `duration_sec`
- `uploader`
Dataset `info` fields:
- `source_type`
- `source_metadata`
- `extraction_info`
## Development
```bash
pip install -r requirements.txt
pip install -r requirements-dev.txt
pytest
ruff check .
```
## Troubleshooting
- If a YouTube download fails, verify that the URL is public and update `yt-dlp`
- If no frames are selected, lower `scene_threshold` or `interval_seconds`
- If dedup removes nearly every frame, lower `dedup_threshold` or disable `dedup`
- If the dataset already exists, enable `overwrite_dataset` or use a different name
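To see why lowering `dedup_threshold` keeps more frames, consider a minimal sketch of hash-based deduplication. It assumes the threshold is a maximum Hamming distance between integer perceptual hashes. The plugin may define or scale its threshold differently, and the function names here are hypothetical.

```python
from typing import List


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def dedup_indices(hashes: List[int], dedup_threshold: int) -> List[int]:
    """Keep a frame only if its hash is more than `dedup_threshold` bits
    away from every frame kept so far."""
    kept: List[int] = []
    for i, h in enumerate(hashes):
        if all(hamming_distance(h, hashes[j]) > dedup_threshold for j in kept):
            kept.append(i)
    return kept


# Toy 8-bit hashes standing in for real perceptual hashes.
hashes = [0b0000_0000, 0b0000_0001, 0b0000_0111, 0b1111_1111]

print(dedup_indices(hashes, dedup_threshold=4))  # [0, 3]
print(dedup_indices(hashes, dedup_threshold=2))  # [0, 2, 3]
print(dedup_indices(hashes, dedup_threshold=0))  # [0, 1, 2, 3]
```

Under this assumption, a high threshold treats loosely similar frames as duplicates, so a video with little visual variation can collapse to a handful of frames.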
## Contributing
See [CONTRIBUTING.md](./CONTRIBUTING.md).
## Security
Report vulnerabilities via [SECURITY.md](./SECURITY.md).
## Changelog
See [CHANGELOG.md](./CHANGELOG.md).
## License
Licensed under Apache-2.0. See [LICENSE](./LICENSE).