https://github.com/XiaohangZhan/conditional-motion-propagation
Code for our CVPR 2019 work.
- Host: GitHub
- URL: https://github.com/XiaohangZhan/conditional-motion-propagation
- Owner: XiaohangZhan
- License: MIT
- Created: 2019-04-08T05:40:31.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2019-07-19T13:04:30.000Z (over 6 years ago)
- Last Synced: 2024-07-05T16:09:12.383Z (over 1 year ago)
- Topics: deep-learning, interactive-annotation, motion-prediction, representation-learning, self-supervised, self-supervised-learning, unsupervised-learning, video-generation
- Language: Python
- Homepage:
- Size: 50.1 MB
- Stars: 142
- Watchers: 9
- Forks: 20
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project:
- awesome-self-supervised-learning
README
# Implementation of "Self-Supervised Learning via Conditional Motion Propagation" (CMP)
## Paper
Xiaohang Zhan, Xingang Pan, Ziwei Liu, Dahua Lin, Chen Change Loy, "[Self-Supervised Learning via Conditional Motion Propagation](https://arxiv.org/abs/1903.11412)", in CVPR 2019 [[Project Page](http://mmlab.ie.cuhk.edu.hk/projects/CMP/)]
For further information, please contact [Xiaohang Zhan](https://xiaohangzhan.github.io/).
## Demos (watch the full demos on [YouTube](https://www.youtube.com/watch?v=6R_oJCq5qMw))
* Conditional motion propagation (motion prediction by guidance)

* Guided video generation (draw arrows to animate a static image)

* Semi-automatic annotation (first row: interface, auto zoom-in, mask; second row: optical flows)

## Data collection
[YFCC frames](https://dl.fbaipublicfiles.com/unsupervised-video/UnsupVideo_Frames_v1.tar.gz) (45G).
[YFCC optical flows (LiteFlowNet)](https://drive.google.com/open?id=1S_TU1UjKms-U_Q4bOhXfUfIJX5hgwOtq) (29G).
[YFCC lists](https://drive.google.com/open?id=1ObzO7xWXolPKrIC39XCvjttZYEoVn6k2) (251M).
## Model collection
* Pre-trained models for semantic segmentation, instance segmentation and human parsing by CMP can be downloaded [here](https://drive.google.com/open?id=1Kx-OIcr2U44p9mlpV-SbhANQdtbn2rJR).
* Models for demos (conditional motion propagation, guided video generation and semi-automatic annotation) can be downloaded [here](https://drive.google.com/open?id=1JMuoexvRCUQ0cmtfyse-8OScLHA6tjuI).
## Requirements
* python>=3.6
* pytorch>=0.4.0
* other dependencies:
```sh
pip install -r requirements.txt
```
## Usage
0. Clone the repo.
```sh
git clone git@github.com:XiaohangZhan/conditional-motion-propagation.git
cd conditional-motion-propagation
```
### Representation learning
1. Prepare data (YFCC as an example)
```sh
mkdir data
mkdir data/yfcc
cd data/yfcc
# download YFCC frames, optical flows and lists to data/yfcc
tar -xf UnsupVideo_Frames_v1.tar.gz
tar -xf flow_origin.tar.gz
tar -xf lists.tar.gz
```
Then folder `data` looks like:
```
data
└── yfcc
    ├── UnsupVideo_Frames
    ├── flow_origin
    └── lists
        ├── train.txt
        └── val.txt
```
2. Train CMP for Representation Learning.
* If your server supports multi-node training:
```sh
sh experiments/rep_learning/alexnet_yfcc_voc_16gpu_70k/train.sh # 16 GPUs distributed training
python tools/weight_process.py --config experiments/rep_learning/alexnet_yfcc_voc_16gpu_70k/config.yaml --iter 70000 # extract weights of the image encoder to experiments/rep_learning/alexnet_yfcc_voc_16gpu_70k/checkpoints/convert_iter_70000.pth.tar
```
* If your server does not support multi-node training:
```sh
sh experiments/rep_learning/alexnet_yfcc_voc_8gpu_140k/train.sh # 8 GPUs distributed training
python tools/weight_process.py --config experiments/rep_learning/alexnet_yfcc_voc_8gpu_140k/config.yaml --iter 140000 # extract weights of the image encoder
```
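The conversion step is done by `tools/weight_process.py` in the repo; conceptually it keeps only the image-encoder weights out of the full training checkpoint. The sketch below shows that key-filtering idea on a plain dict. The prefix name `"module.image_encoder."` is an illustrative assumption, not necessarily the repo's exact key layout.

```python
# Sketch of extracting encoder weights from a full checkpoint state dict.
# The real conversion lives in tools/weight_process.py; the key prefix
# used here is a hypothetical example, not the repo's exact naming.
def extract_encoder(state_dict, prefix="module.image_encoder."):
    """Keep only keys under `prefix`, stripping the prefix itself."""
    return {k[len(prefix):]: v for k, v in state_dict.items()
            if k.startswith(prefix)}

full = {
    "module.image_encoder.conv1.weight": "W1",
    "module.image_encoder.conv1.bias": "b1",
    "module.flow_decoder.deconv1.weight": "W2",
}
encoder = extract_encoder(full)
print(sorted(encoder))   # ['conv1.bias', 'conv1.weight']
```

The resulting dict can then be loaded into a standalone encoder for downstream fine-tuning, which is what the converted `convert_iter_*.pth.tar` files are for.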
### Run demos
1. Download the [model](https://drive.google.com/open?id=1JMuoexvRCUQ0cmtfyse-8OScLHA6tjuI) and move it to `experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints/`.
2. Launch jupyter notebook and run `demos/cmp.ipynb` for conditional motion propagation, or `demos/demo_annot.ipynb` for semi-automatic annotation.
3. Train the model by yourself (optional)
```sh
# data not ready
sh experiments/semiauto_annot/resnet50_vip+mpii_liteflow/train.sh # 8 GPUs distributed training
```
### Results
1. Pascal VOC 2012 Semantic Segmentation (AlexNet)
| Method (AlexNet) | Supervision (data amount) | % mIoU |
| --- | --- | --- |
| Krizhevsky et al. [1] | ImageNet labels (1.3M) | 48.0 |
| Random | - (0) | 19.8 |
| Pathak et al. [2] | In-painting (1.2M) | 29.7 |
| Zhang et al. [3] | Colorization (1.3M) | 35.6 |
| Zhang et al. [4] | Split-Brain (1.3M) | 36.0 |
| Noroozi et al. [5] | Counting (1.3M) | 36.6 |
| Noroozi et al. [6] | Jigsaw (1.3M) | 37.6 |
| Noroozi et al. [7] | Jigsaw++ (1.3M) | 38.1 |
| Jenni et al. [8] | Spot-Artifacts (1.3M) | 38.1 |
| Larsson et al. [9] | Colorization (3.7M) | 38.4 |
| Gidaris et al. [10] | Rotation (1.3M) | 39.1 |
| Pathak et al. [11]* | Motion Segmentation (1.6M) | 39.7 |
| Walker et al. [12]* | Flow Prediction (3.22M) | 40.4 |
| Mundhenk et al. [13] | Context (1.3M) | 40.6 |
| Mahendran et al. [14] | Flow Similarity (1.6M) | 41.4 |
| Ours | CMP (1.26M) | 42.9 |
| Ours | CMP (3.22M) | 44.5 |
| Caron et al. [15] | Clustering (1.3M) | 45.1 |
| Feng et al. [16] | Feature Decoupling (1.3M) | 45.3 |
2. Pascal VOC 2012 Semantic Segmentation (ResNet-50)
| Method (ResNet-50) | Supervision (data amount) | % mIoU |
| --- | --- | --- |
| Krizhevsky et al. [1] | ImageNet labels (1.2M) | 69.0 |
| Random | - (0) | 42.4 |
| Walker et al. [12]* | Flow Prediction (1.26M) | 54.5 |
| Pathak et al. [11]* | Motion Segmentation (1.6M) | 54.6 |
| Ours | CMP (1.26M) | 59.0 |
3. COCO 2017 Instance Segmentation (ResNet-50)
| Method (ResNet-50) | Supervision (data amount) | Det. (% mAP) | Seg. (% mAP) |
| --- | --- | --- | --- |
| Krizhevsky et al. [1] | ImageNet labels (1.2M) | 37.2 | 34.1 |
| Random | - (0) | 19.7 | 18.8 |
| Pathak et al. [11]* | Motion Segmentation (1.6M) | 27.7 | 25.8 |
| Walker et al. [12]* | Flow Prediction (1.26M) | 31.5 | 29.2 |
| Ours | CMP (1.26M) | 32.3 | 29.8 |
4. LIP Human Parsing (ResNet-50)
| Method (ResNet-50) | Supervision (data amount) | Single-Person (% mIoU) | Multi-Person (% mIoU) |
| --- | --- | --- | --- |
| Krizhevsky et al. [1] | ImageNet labels (1.2M) | 42.5 | 55.4 |
| Random | - (0) | 32.5 | 35.0 |
| Pathak et al. [11]* | Motion Segmentation (1.6M) | 36.6 | 50.9 |
| Walker et al. [12]* | Flow Prediction (1.26M) | 36.7 | 52.5 |
| Ours | CMP (1.26M) | 36.9 | 51.8 |
| Ours | CMP (4.57M) | 40.2 | 52.9 |
Note: methods marked * did not report these results in their papers; we reimplemented them to obtain the numbers above.
### References
1. Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet classification with deep convolutional neural networks. In NeurIPS, 2012.
2. Deepak Pathak, Philipp Krahenbuhl, Jeff Donahue, Trevor Darrell, and Alexei A Efros. Context encoders: Feature learning by inpainting. In CVPR, 2016.
3. Richard Zhang, Phillip Isola, and Alexei A Efros. Colorful image colorization. In ECCV. Springer, 2016.
4. Richard Zhang, Phillip Isola, and Alexei A Efros. Split-brain autoencoders: Unsupervised learning by cross-channel prediction. In CVPR, 2017.
5. Mehdi Noroozi, Hamed Pirsiavash, and Paolo Favaro. Representation learning by learning to count. In ICCV, 2017.
6. Mehdi Noroozi and Paolo Favaro. Unsupervised learning of visual representations by solving jigsaw puzzles. In ECCV. Springer, 2016.
7. Mehdi Noroozi, Ananth Vinjimoor, Paolo Favaro, and Hamed Pirsiavash. Boosting self-supervised learning via knowledge transfer. In CVPR, 2018.
8. Simon Jenni and Paolo Favaro. Self-supervised feature learning by learning to spot artifacts. In CVPR, 2018.
9. Gustav Larsson, Michael Maire, and Gregory Shakhnarovich. Colorization as a proxy task for visual understanding. In CVPR, 2017.
10. Spyros Gidaris, Praveer Singh, and Nikos Komodakis. Unsupervised representation learning by predicting image rotations. In ICLR, 2018.
11. Deepak Pathak, Ross B Girshick, Piotr Dollar, Trevor Darrell, and Bharath Hariharan. Learning features by watching objects move. In CVPR, 2017.
12. Jacob Walker, Abhinav Gupta, and Martial Hebert. Dense optical flow prediction from a static image. In ICCV, 2015.
13. T Nathan Mundhenk, Daniel Ho, and Barry Y Chen. Improvements to context based self-supervised learning. In CVPR, 2018.
14. A. Mahendran, J. Thewlis, and A. Vedaldi. Cross pixel optical flow similarity for self-supervised learning. In ACCV, 2018.
15. Mathilde Caron, Piotr Bojanowski, Armand Joulin, and Matthijs Douze. Deep clustering for unsupervised learning of visual features. In ECCV, 2018.
16. Zeyu Feng, Chang Xu, and Dacheng Tao. Self-supervised representation learning by rotation feature decoupling. In CVPR, 2019.
### Core idea
A Chinese proverb: "牵一发而动全身" ("pull one hair and the whole body moves").
### Bibtex
```
@inproceedings{zhan2019self,
  author    = {Zhan, Xiaohang and Pan, Xingang and Liu, Ziwei and Lin, Dahua and Loy, Chen Change},
  title     = {Self-Supervised Learning via Conditional Motion Propagation},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2019}
}
```