https://github.com/XiaohangZhan/conditional-motion-propagation
Code for our CVPR 2019 work.
- Host: GitHub
- URL: https://github.com/XiaohangZhan/conditional-motion-propagation
- Owner: XiaohangZhan
- License: MIT
- Created: 2019-04-08T05:40:31.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2019-07-19T13:04:30.000Z (over 6 years ago)
- Last Synced: 2024-07-05T16:09:12.383Z (over 1 year ago)
- Topics: deep-learning, interactive-annotation, motion-prediction, representation-learning, self-supervised, self-supervised-learning, unsupervised-learning, video-generation
- Language: Python
- Homepage:
- Size: 50.1 MB
- Stars: 142
- Watchers: 9
- Forks: 20
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project:
- awesome-self-supervised-learning
README
# Implementation of "Self-Supervised Learning via Conditional Motion Propagation" (CMP)
## Paper
Xiaohang Zhan, Xingang Pan, Ziwei Liu, Dahua Lin, Chen Change Loy, "[Self-Supervised Learning via Conditional Motion Propagation](https://arxiv.org/abs/1903.11412)", in CVPR 2019 [[Project Page](http://mmlab.ie.cuhk.edu.hk/projects/CMP/)]
For further information, please contact [Xiaohang Zhan](https://xiaohangzhan.github.io/).
## Demos (watch the full demos on [YouTube](https://www.youtube.com/watch?v=6R_oJCq5qMw))
* Conditional motion propagation (motion prediction by guidance)

* Guided video generation (draw arrows to animate a static image)

* Semi-automatic annotation (first row: interface, auto zoom-in, mask; second row: optical flows)

## Data collection
[YFCC frames](https://dl.fbaipublicfiles.com/unsupervised-video/UnsupVideo_Frames_v1.tar.gz) (45G).
[YFCC optical flows (LiteFlowNet)](https://drive.google.com/open?id=1S_TU1UjKms-U_Q4bOhXfUfIJX5hgwOtq) (29G).
[YFCC lists](https://drive.google.com/open?id=1ObzO7xWXolPKrIC39XCvjttZYEoVn6k2) (251M).
## Model collection
* Pre-trained models for semantic segmentation, instance segmentation and human parsing by CMP can be downloaded [here](https://drive.google.com/open?id=1Kx-OIcr2U44p9mlpV-SbhANQdtbn2rJR).
* Models for demos (conditional motion propagation, guided video generation and semi-automatic annotation) can be downloaded [here](https://drive.google.com/open?id=1JMuoexvRCUQ0cmtfyse-8OScLHA6tjuI).
## Requirements
* python>=3.6
* pytorch>=0.4.0
* other dependencies:
```sh
pip install -r requirements.txt
```
## Usage
0. Clone the repo.
```sh
git clone git@github.com:XiaohangZhan/conditional-motion-propagation.git
cd conditional-motion-propagation
```
### Representation learning
1. Prepare data (YFCC as an example)
```sh
mkdir data
mkdir data/yfcc
cd data/yfcc
# download YFCC frames, optical flows and lists to data/yfcc
tar -xf UnsupVideo_Frames_v1.tar.gz
tar -xf flow_origin.tar.gz
tar -xf lists.tar.gz
```
Then folder `data` looks like:
```
data
└── yfcc
    ├── UnsupVideo_Frames
    ├── flow_origin
    └── lists
        ├── train.txt
        └── val.txt
```
2. Train CMP for Representation Learning.
* If your server supports multi-node training:
```sh
sh experiments/rep_learning/alexnet_yfcc_voc_16gpu_70k/train.sh # 16 GPUs distributed training
python tools/weight_process.py --config experiments/rep_learning/alexnet_yfcc_voc_16gpu_70k/config.yaml --iter 70000 # extract weights of the image encoder to experiments/rep_learning/alexnet_yfcc_voc_16gpu_70k/checkpoints/convert_iter_70000.pth.tar
```
* If your server does not support multi-node training:
```sh
sh experiments/rep_learning/alexnet_yfcc_voc_8gpu_140k/train.sh # 8 GPUs distributed training
python tools/weight_process.py --config experiments/rep_learning/alexnet_yfcc_voc_8gpu_140k/config.yaml --iter 140000 # extract weights of the image encoder
```
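The conversion step is done by `tools/weight_process.py` in the repo; conceptually it keeps only the image-encoder weights out of the full training checkpoint. The sketch below shows that key-filtering idea on a plain dict. The prefix name `"module.image_encoder."` is an illustrative assumption, not necessarily the repo's exact key layout.

```python
# Sketch of extracting encoder weights from a full checkpoint state dict.
# The real conversion lives in tools/weight_process.py; the key prefix
# used here is a hypothetical example, not the repo's exact naming.
def extract_encoder(state_dict, prefix="module.image_encoder."):
    """Keep only keys under `prefix`, stripping the prefix itself."""
    return {k[len(prefix):]: v for k, v in state_dict.items()
            if k.startswith(prefix)}

full = {
    "module.image_encoder.conv1.weight": "W1",
    "module.image_encoder.conv1.bias": "b1",
    "module.flow_decoder.deconv1.weight": "W2",
}
encoder = extract_encoder(full)
print(sorted(encoder))   # ['conv1.bias', 'conv1.weight']
```

The resulting dict can then be loaded into a standalone encoder for downstream fine-tuning, which is what the converted `convert_iter_*.pth.tar` files are for.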
### Run demos
1. Download the [model](https://drive.google.com/open?id=1JMuoexvRCUQ0cmtfyse-8OScLHA6tjuI) and move it to `experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints/`.
2. Launch jupyter notebook and run `demos/cmp.ipynb` for conditional motion propagation, or `demos/demo_annot.ipynb` for semi-automatic annotation.
3. Train the model by yourself (optional)
```sh
# data not ready
sh experiments/semiauto_annot/resnet50_vip+mpii_liteflow/train.sh # 8 GPUs distributed training
```
### Results
1. Pascal VOC 2012 Semantic Segmentation (AlexNet)
| Method (AlexNet) | Supervision (data amount) | % mIoU |
| --- | --- | --- |
| Krizhevsky et al. [1] | ImageNet labels (1.3M) | 48.0 |
| Random | - (0) | 19.8 |
| Pathak et al. [2] | In-painting (1.2M) | 29.7 |
| Zhang et al. [3] | Colorization (1.3M) | 35.6 |
| Zhang et al. [4] | Split-Brain (1.3M) | 36.0 |
| Noroozi et al. [5] | Counting (1.3M) | 36.6 |
| Noroozi et al. [6] | Jigsaw (1.3M) | 37.6 |
| Noroozi et al. [7] | Jigsaw++ (1.3M) | 38.1 |
| Jenni et al. [8] | Spot-Artifacts (1.3M) | 38.1 |
| Larsson et al. [9] | Colorization (3.7M) | 38.4 |
| Gidaris et al. [10] | Rotation (1.3M) | 39.1 |
| Pathak et al. [11]* | Motion Segmentation (1.6M) | 39.7 |
| Walker et al. [12]* | Flow Prediction (3.22M) | 40.4 |
| Mundhenk et al. [13] | Context (1.3M) | 40.6 |
| Mahendran et al. [14] | Flow Similarity (1.6M) | 41.4 |
| Ours | CMP (1.26M) | 42.9 |
| Ours | CMP (3.22M) | 44.5 |
| Caron et al. [15] | Clustering (1.3M) | 45.1 |
| Feng et al. [16] | Feature Decoupling (1.3M) | 45.3 |
2. Pascal VOC 2012 Semantic Segmentation (ResNet-50)
| Method (ResNet-50) | Supervision (data amount) | % mIoU |
| --- | --- | --- |
| Krizhevsky et al. [1] | ImageNet labels (1.2M) | 69.0 |
| Random | - (0) | 42.4 |
| Walker et al. [12]* | Flow Prediction (1.26M) | 54.5 |
| Pathak et al. [11]* | Motion Segmentation (1.6M) | 54.6 |
| Ours | CMP (1.26M) | 59.0 |
3. COCO 2017 Instance Segmentation (ResNet-50)
| Method (ResNet-50) | Supervision (data amount) | Det. (% mAP) | Seg. (% mAP) |
| --- | --- | --- | --- |
| Krizhevsky et al. [1] | ImageNet labels (1.2M) | 37.2 | 34.1 |
| Random | - (0) | 19.7 | 18.8 |
| Pathak et al. [11]* | Motion Segmentation (1.6M) | 27.7 | 25.8 |
| Walker et al. [12]* | Flow Prediction (1.26M) | 31.5 | 29.2 |
| Ours | CMP (1.26M) | 32.3 | 29.8 |
4. LIP Human Parsing (ResNet-50)
| Method (ResNet-50) | Supervision (data amount) | Single-Person (% mIoU) | Multi-Person (% mIoU) |
| --- | --- | --- | --- |
| Krizhevsky et al. [1] | ImageNet labels (1.2M) | 42.5 | 55.4 |
| Random | - (0) | 32.5 | 35.0 |
| Pathak et al. [11]* | Motion Segmentation (1.6M) | 36.6 | 50.9 |
| Walker et al. [12]* | Flow Prediction (1.26M) | 36.7 | 52.5 |
| Ours | CMP (1.26M) | 36.9 | 51.8 |
| Ours | CMP (4.57M) | 40.2 | 52.9 |
Note: methods marked * did not report these results in their papers; we reimplemented them to obtain the numbers above.
### References
1. Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet classification with deep convolutional neural networks. In NeurIPS, 2012.
2. Deepak Pathak, Philipp Krahenbuhl, Jeff Donahue, Trevor Darrell, and Alexei A Efros. Context encoders: Feature learning by inpainting. In CVPR, 2016.
3. Richard Zhang, Phillip Isola, and Alexei A Efros. Colorful image colorization. In ECCV. Springer, 2016.
4. Richard Zhang, Phillip Isola, and Alexei A Efros. Split-brain autoencoders: Unsupervised learning by cross-channel prediction. In CVPR, 2017.
5. Mehdi Noroozi, Hamed Pirsiavash, and Paolo Favaro. Representation learning by learning to count. In ICCV, 2017.
6. Mehdi Noroozi and Paolo Favaro. Unsupervised learning of visual representations by solving jigsaw puzzles. In ECCV. Springer, 2016.
7. Mehdi Noroozi, Ananth Vinjimoor, Paolo Favaro, and Hamed Pirsiavash. Boosting self-supervised learning via knowledge transfer. In CVPR, 2018.
8. Simon Jenni and Paolo Favaro. Self-supervised feature learning by learning to spot artifacts. In CVPR, 2018.
9. Gustav Larsson, Michael Maire, and Gregory Shakhnarovich. Colorization as a proxy task for visual understanding. In CVPR, 2017.
10. Spyros Gidaris, Praveer Singh, and Nikos Komodakis. Unsupervised representation learning by predicting image rotations. In ICLR, 2018.
11. Deepak Pathak, Ross B Girshick, Piotr Dollar, Trevor Darrell, and Bharath Hariharan. Learning features by watching objects move. In CVPR, 2017.
12. Jacob Walker, Abhinav Gupta, and Martial Hebert. Dense optical flow prediction from a static image. In ICCV, 2015.
13. T Nathan Mundhenk, Daniel Ho, and Barry Y Chen. Improvements to context based self-supervised learning. In CVPR, 2018.
14. A. Mahendran, J. Thewlis, and A. Vedaldi. Cross pixel optical flow similarity for self-supervised learning. In ACCV, 2018.
15. Mathilde Caron, Piotr Bojanowski, Armand Joulin, and Matthijs Douze. Deep clustering for unsupervised learning of visual features. In ECCV, 2018.
16. Zeyu Feng, Chang Xu, and Dacheng Tao. Self-supervised representation learning by rotation feature decoupling. In CVPR, 2019.
### Core idea
A Chinese proverb: "牵一发而动全身" ("pull one hair and the whole body moves").
### Bibtex
```
@inproceedings{zhan2019self,
  author    = {Zhan, Xiaohang and Pan, Xingang and Liu, Ziwei and Lin, Dahua and Loy, Chen Change},
  title     = {Self-Supervised Learning via Conditional Motion Propagation},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2019}
}
```