https://github.com/hellock/wld

# **W**ild**L**ife **D**ocumentary (WLD) Dataset

## Introduction
The dataset contains 15 documentary films downloaded from YouTube.
Their durations range from 9 to 50 minutes,
and together they contain more than 747,000 frames.
More than 4,000 object tracklets across 65 categories are annotated.

Here is an overview of the dataset.
![Dataset overview](http://www.chenkai.site/projects/documentary-learning/dataset.png)

## Content
The dataset is organized as follows:
- `videos/`: Downloaded raw videos should be extracted here.
- `frames/`: Video frames will be generated here.
- `subtitles/`: Subtitles of the videos, in SRT format. The subtitles were
originally auto-generated by YouTube, and we corrected some obvious mistakes manually.
- `annotations/`: Bounding box annotations, in JSON format. Coordinates are 0-based and the bounding boxes are labeled as [x1, y1, x2, y2]. The videos are fully annotated with the help of object tracking.
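As an illustration of the `[x1, y1, x2, y2]` box convention, here is a minimal sketch that converts a box to `[x, y, width, height]`. The README does not say whether `x2` and `y2` are inclusive pixel indices, so the `inclusive` flag is an assumption you should check against the annotation files. The helper names `box_size` and `to_xywh` are hypothetical and not part of this repository.

```python
def box_size(box, inclusive=True):
    """Return (width, height) of a 0-based [x1, y1, x2, y2] box.

    ASSUMPTION: with inclusive=True, x2 and y2 are treated as the last
    pixel inside the box, so one pixel is added to each side length.
    """
    x1, y1, x2, y2 = box
    extra = 1 if inclusive else 0
    return x2 - x1 + extra, y2 - y1 + extra


def to_xywh(box, inclusive=True):
    """Convert [x1, y1, x2, y2] to [x, y, width, height]."""
    x1, y1, _, _ = box
    width, height = box_size(box, inclusive)
    return [x1, y1, width, height]


if __name__ == "__main__":
    # A made-up box, not taken from the dataset.
    print(to_xywh([10, 20, 49, 59]))
```

Keeping the inclusive/exclusive choice as an explicit parameter makes it easy to switch once the convention is confirmed.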

## Citation
If you use the WLD dataset in your research, please consider citing our paper:

```
@inproceedings{chen2017discover,
author = {Kai Chen and Hang Song and Chen Change Loy and Dahua Lin},
title = {Discover and Learn New Objects from Documentaries},
booktitle = {Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = jul,
year = {2017}
}
```

## Download
1. Download the raw videos from [Google Drive](https://drive.google.com/open?id=0BwdE-vDvqKjHVG5ETmtCRU9qNzQ) and extract all videos to the folder `videos/`.
2. Run the script `video2frames.py` (requires OpenCV) to convert all videos into frames.
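
For readers who want to see what the conversion step involves, the following is a rough sketch of frame extraction with OpenCV. It is not the repository's `video2frames.py`. The function names, the JPEG format, and the `frames/<video name>/<6-digit index>.jpg` naming scheme are all assumptions made for illustration, and the real script may lay out its output differently.

```python
from pathlib import Path


def frame_path(frames_dir, video_path, index):
    """Return the output path for one frame.

    ASSUMPTION: frames are stored as frames_dir/<video stem>/<index>.jpg
    with a zero-padded 6-digit index. This naming is hypothetical.
    """
    stem = Path(video_path).stem
    return Path(frames_dir) / stem / f"{index:06d}.jpg"


def extract_frames(video_path, frames_dir="frames"):
    """Decode a video, write every frame as a JPEG, return the frame count."""
    import cv2  # imported here so frame_path() works without OpenCV installed

    cap = cv2.VideoCapture(str(video_path))
    if not cap.isOpened():
        raise IOError(f"cannot open video: {video_path}")

    count = 0
    try:
        while True:
            ok, frame = cap.read()
            if not ok:  # end of stream or decode failure
                break
            out = frame_path(frames_dir, video_path, count)
            out.parent.mkdir(parents=True, exist_ok=True)
            cv2.imwrite(str(out), frame)
            count += 1
    finally:
        cap.release()
    return count
```

Calling `extract_frames` on each downloaded video file would fill `frames/` with one subfolder per video. When working with the provided annotations, prefer the repository's own script so that frame indices match.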