https://github.com/DmZhukov/CrossTask


Host: GitHub
URL: https://github.com/DmZhukov/CrossTask
Owner: DmZhukov
License: bsd-3-clause
Created: 2019-03-19T10:49:15.000Z (almost 7 years ago)
Default Branch: master
Last Pushed: 2022-02-14T13:33:10.000Z (almost 4 years ago)
Last Synced: 2024-10-27T07:32:05.049Z (about 1 year ago)
Language: Python
Size: 7.81 KB
Stars: 86
Watchers: 5
Forks: 9
Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

awesome-self-supervised-multimodal-learning - Link

README

# Cross-task weakly supervised learning from instructional videos

## About
This is an implementation of the paper "Cross-task weakly supervised learning from instructional videos" by D. Zhukov, J.-B. Alayrac, R. G. Cinbis, D. Fouhey, I. Laptev and J. Sivic [[arXiv](https://arxiv.org/abs/1903.08225)]

Please consider citing the paper if you use our code or data:
> @INPROCEEDINGS{Zhukov2019,
> author = {Zhukov, Dimitri and Alayrac, Jean-Baptiste and Cinbis, Ramazan Gokberk and Fouhey, David and Laptev, Ivan and Sivic, Josef},
> title = {Cross-task weakly supervised learning from instructional videos},
> booktitle = {CVPR},
> year = {2019},
> }

## CrossTask dataset
The CrossTask dataset contains instructional videos collected for 83 different tasks.
For each task we provide an ordered list of steps with manual descriptions.
The dataset is divided into two parts: 18 primary and 65 related tasks.
Videos for the primary tasks are collected manually and provided with annotations for temporal step boundaries.
Videos for the related tasks are collected automatically and don't have annotations.

Tasks, video URLs and annotations are provided [here](https://www.di.ens.fr/~dzhukov/crosstask/crosstask_release.zip). See readme.txt for details.
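Since features are provided at one vector per second, it can be convenient to turn the temporal step boundaries into per-second labels. The sketch below is only an illustration: it assumes each annotation row has the form `step index, start time, end time` (times in seconds, step index starting at 1), which is not stated here, so check readme.txt in the release archive for the actual format. The function name `steps_to_labels` and the example rows are hypothetical.

```python
import csv
import io
import math


def steps_to_labels(csv_text, num_steps, num_seconds):
    """Return a num_seconds x num_steps 0/1 matrix marking the active step per second.

    Assumed row format (not confirmed by the README): step,start,end
    with a 1-based step index and times in seconds.
    """
    labels = [[0] * num_steps for _ in range(num_seconds)]
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        step, start, end = int(row[0]), float(row[1]), float(row[2])
        # Clamp the interval to the video length and mark every second it touches.
        first = max(0, math.floor(start))
        last = min(num_seconds, math.ceil(end))
        for t in range(first, last):
            labels[t][step - 1] = 1
    return labels


# Made-up annotation rows for an 8-second video of a task with 3 steps.
example = "1,0.5,2.0\n3,4.0,6.0\n"
labels = steps_to_labels(example, num_steps=3, num_seconds=8)
for second, row in enumerate(labels):
    print(second, row)
```

Rounding the start down and the end up marks every second that a step overlaps; a stricter rule (for example, requiring majority overlap) may suit evaluation better.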

Features are available [here](https://www.di.ens.fr/~dzhukov/crosstask/crosstask_features.zip) (30 GB). Features for each video are provided in a NumPy array with one 3200-dimensional feature vector per second. Each feature vector is a concatenation of RGB I3D features (columns 0-1023), ResNet-152 features (columns 1024-3071) and audio VGG features (columns 3072-3199).
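The column layout above can be applied with plain NumPy slicing. The following is a minimal sketch, not part of the released code: the helper name `split_features` and the dictionary keys are hypothetical, and the demo uses a synthetic array rather than a real feature file (loading a real file, presumably with `np.load`, depends on the archive layout, which is not described here).

```python
import numpy as np

# Column ranges taken from the description above (slice end is exclusive).
FEATURE_SLICES = {
    "i3d_rgb": slice(0, 1024),       # RGB I3D features
    "resnet152": slice(1024, 3072),  # ResNet-152 features
    "audio_vgg": slice(3072, 3200),  # audio VGG features
}


def split_features(features):
    """Split a (seconds, 3200) array into its three modality blocks."""
    features = np.asarray(features)
    if features.ndim != 2 or features.shape[1] != 3200:
        raise ValueError(f"expected shape (T, 3200), got {features.shape}")
    return {name: features[:, cols] for name, cols in FEATURE_SLICES.items()}


# Synthetic stand-in for one video: 5 seconds of features.
demo = np.zeros((5, 3200), dtype=np.float32)
parts = split_features(demo)
print({name: block.shape for name, block in parts.items()})
```

The slices are views into the original array, so splitting does not copy the feature data.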

Temporal constraints extracted from narration are available [here](https://www.di.ens.fr/~dzhukov/crosstask/crosstask_constraints.zip).

**Update 30/06/2019:** added videos_val.csv with the validation set from the paper and removed extra lines from the constraints.

**Update 14/02/2022:** Use [this](https://www.rocq.inria.fr/cluster-willow/dzhukov/missing_videos.tar.gz) link to download the videos that are no longer available on YouTube. Subtitles for the videos are available [here](https://www.rocq.inria.fr/cluster-willow/dzhukov/crosstask-subtitles.tar.gz).

## Code
The provided code can be used to train and evaluate the component model proposed in the paper on the CrossTask dataset.
It was tested with Python 3.7, PyTorch 1.0, NumPy 1.16 and Cython 0.29.

1. Clone the repository
```bash
git clone https://github.com/DmZhukov/CrossTask.git
cd CrossTask
```
2. Download and unpack the dataset
```bash
wget https://www.di.ens.fr/~dzhukov/crosstask/crosstask_release.zip
wget https://www.di.ens.fr/~dzhukov/crosstask/crosstask_features.zip
wget https://www.di.ens.fr/~dzhukov/crosstask/crosstask_constraints.zip
unzip '*.zip'
```
3. Compile Cython code
```bash
python setup.py build_ext --inplace
```
4. Run training
```bash
python train.py
```
