Official implementation:
- CMC: Contrastive Multiview Coding ([Paper](http://arxiv.org/abs/1906.05849))

Unofficial implementation:
- MoCo: Momentum Contrast for Unsupervised Visual Representation Learning ([Paper](https://arxiv.org/abs/1911.05722))
- InsDis: Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination ([Paper](https://arxiv.org/abs/1805.01978))

## Citation

If you find this repo useful for your research, please consider citing the paper:
```
@article{tian2019contrastive,
title={Contrastive Multiview Coding},
author={Tian, Yonglong and Krishnan, Dilip and Isola, Phillip},
journal={arXiv preprint arXiv:1906.05849},
year={2019}
}
```

## Contrastive Multiview Coding

This repo covers the implementation of CMC (as well as Momentum Contrast and Instance Discrimination), which learns representations from multiview data in a self-supervised way. By "multiview" we mean multiple sensory or modal data, or literally data from multiple viewpoints; it is flexible what counts as a "view":

"Contrastive Multiview Coding" [Paper](http://arxiv.org/abs/1906.05849), [Project Page](http://hobbitlong.github.io/CMC/).

![Teaser Image](http://hobbitlong.github.io/CMC/CMC_files/teaser.jpg)
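
For intuition, here is a minimal sketch of a two-view contrastive objective in PyTorch. It uses in-batch negatives rather than the NCE memory bank this repo actually uses (see the NCE flags below), and all names are illustrative:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z1, z2, temperature=0.07):
    """InfoNCE-style loss between two batches of view embeddings.

    z1, z2: (batch, dim) embeddings of the same samples under two views;
    matching rows are positives, all other rows act as negatives.
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                  # (batch, batch) similarities
    labels = torch.arange(z1.size(0), device=z1.device)
    # Symmetrize: contrast view 1 against view 2 and vice versa.
    return 0.5 * (F.cross_entropy(logits, labels)
                  + F.cross_entropy(logits.t(), labels))
```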

## Highlights

**(1) Representation quality as a function of number of contrasted views.**

We found that the more views we train with, the better the representation of each single view.

**(2) Contrastive objective vs. predictive objective**

We compare the contrastive objective to cross-view prediction, finding an advantage to the contrastive approach.

**(3) Unsupervised vs. supervised**

Several ResNets trained with our **unsupervised** CMC objective surpass the **supervised** AlexNet on ImageNet classification (e.g., 68.4% vs. 59.3%). For the first time on ImageNet classification, unsupervised methods surpass the classic supervised AlexNet proposed in 2012 (CPC++ and AMDIM also achieved this milestone concurrently).

## Updates

Aug 20, 2019 - ResNets on ImageNet have been added.

Nov 26, 2019 - New results updated. Implementation of **MoCo** and **InsDis** added.

Jan 18, 2020 - Weights of **InsDis** and **MoCo** added.

May 22, 2020 - ImageNet-100 list uploaded, see [`imagenet100.txt`](imagenet100.txt).

## Installation

This repo was tested with Ubuntu 16.04.5 LTS, Python 3.5, PyTorch 0.4.0, and CUDA 9.0, but it should run with any recent PyTorch version >= 0.4.0.

**Note:** It seems to us that training with PyTorch versions >= 1.0 yields slightly worse results. If you observe a similar discrepancy and figure out the cause, please report it, since we are trying to fix it as well.

## Training AlexNet/ResNets with CMC on ImageNet

**Note:** For AlexNet, we split across the channel dimension and use each half to encode the L and ab channels, respectively. For ResNets, we use a standard ResNet model to encode each view.
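
To make the two views concrete, here is a hedged sketch of the L/ab split (the repo's actual transforms live in `dataset.py`; `lab_views` is an illustrative name and scikit-image is an assumed dependency):

```python
import numpy as np
import torch
from skimage import color  # assumed dependency: scikit-image

def lab_views(pil_img):
    """Convert an RGB PIL image to Lab and split it into the two CMC views."""
    lab = color.rgb2lab(np.asarray(pil_img) / 255.0)   # (H, W, 3), float RGB in [0, 1]
    lab = torch.from_numpy(lab).permute(2, 0, 1).float()
    l_view = lab[0:1]    # luminance channel, fed to one encoder (half)
    ab_view = lab[1:3]   # chrominance channels, fed to the other
    return l_view, ab_view
```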

NCE flags:
- `--nce_k`: number of negatives to contrast for each positive. Default: 4096
- `--nce_m`: the momentum for dynamically updating the memory. Default: 0.5
- `--nce_t`: temperature that modulates the distribution. Default: 0.07 for ImageNet, 0.1 for STL-10
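
To show what these flags control, here is a minimal sketch (illustrative names; the repo's real implementation lives in its NCE modules): `--nce_m` blends old and new embeddings in the memory bank, and `--nce_t` scales the similarity scores.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def update_memory(memory, features, indices, m=0.5):
    """Momentum update of the memory bank (the role of --nce_m).

    memory:   (num_samples, dim) running bank of per-sample embeddings
    features: (batch, dim) fresh embeddings for the samples at `indices`
    """
    blended = m * memory[indices] + (1.0 - m) * features
    memory[indices] = F.normalize(blended, dim=1)

def scaled_scores(query, keys, t=0.07):
    """Similarities between one query and (1 positive + nce_k negative) keys,
    sharpened by the temperature t (the role of --nce_t)."""
    return (keys @ query) / t        # keys: (1 + nce_k, dim), query: (dim,)
```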

Path flags:
- `--data_folder`: specify the ImageNet data folder.
- `--model_path`: specify the path to save model.
- `--tb_path`: specify where to save tensorboard monitoring events.

Model flag:
- `--model`: specify which model to use, including *alexnet*, *resnets18*, *resnets50*, and *resnets101*

An example command line for training CMC (default: `AlexNet` on a single GPU):
```
CUDA_VISIBLE_DEVICES=0 python train_CMC.py --batch_size 256 --num_workers 36 \
 --data_folder /path/to/data \
 --model_path /path/to/save \
 --tb_path /path/to/tensorboard
```

Training CMC with ResNets requires at least 4 GPUs; the command for `resnet50v1` looks like:
```
CUDA_VISIBLE_DEVICES=0,1,2,3 python train_CMC.py --model resnet50v1 --batch_size 128 --num_workers 24 \
 --data_folder path/to/data \
 --model_path path/to/save \
 --tb_path path/to/tensorboard
```

To enable mixed-precision training, simply append the flag `--amp`. Note, however, that this is likely to harm the downstream classification accuracy; measured on the ImageNet-100 subset, the gap is about 0.5-1%.

By default, the training scripts use L and ab as the two contrasted views. You can switch to `YCbCr` by specifying `--view YCbCr`, which yields better results (by about 0.5-1%). If you want to use other color spaces as views, follow the line [here](https://github.com/HobbitLong/CMC/blob/master/train_CMC.py#L146); other color-transfer functions are already available in `dataset.py`, and a rough template for such a transform is sketched below.
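
A custom two-view transform might look like the following sketch (hypothetical class, not the repo's code; the shipped transforms are in `dataset.py`):

```python
import numpy as np
import torch
from PIL import Image

class YCbCrTwoViews:
    """Hypothetical transform: split an RGB PIL image into Y and CbCr views."""
    def __call__(self, img: Image.Image):
        ycbcr = np.asarray(img.convert('YCbCr'), dtype=np.float32) / 255.0
        ycbcr = torch.from_numpy(ycbcr).permute(2, 0, 1)  # (3, H, W)
        return ycbcr[0:1], ycbcr[1:3]                     # Y view, CbCr view
```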

## Training Linear Classifier

Path flags:
- `--data_folder`: specify the ImageNet data folder. Should be the same as above.
- `--save_path`: specify the path to save the linear classifier.
- `--tb_path`: specify where to save tensorboard events monitoring linear classifier training.

The model flag `--model` is similar to the above and should be specified.

Specify the checkpoint that you want to evaluate with the `--model_path` flag; this path should point directly to the `.pth` file.

This repo provides 3 ways to train the linear classifier: *single GPU*, *data parallel*, and *distributed data parallel*.
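
Conceptually, all three modes do the same thing: freeze the pretrained encoder and train only a linear layer on its features. A single-GPU sketch (function and argument names are assumptions, not the repo's API):

```python
import torch
import torch.nn as nn

def train_linear_probe(encoder, feat_dim, train_loader, num_classes=1000):
    """Train a linear classifier on frozen features (single-GPU sketch).

    `encoder`, `feat_dim`, and `train_loader` are assumed inputs: the
    pretrained backbone, its feature dimensionality at the chosen --layer,
    and an ImageNet DataLoader.
    """
    for p in encoder.parameters():
        p.requires_grad = False                  # freeze the backbone
    encoder.eval()

    classifier = nn.Linear(feat_dim, num_classes).cuda()
    optimizer = torch.optim.SGD(classifier.parameters(), lr=0.1, momentum=0.9)
    criterion = nn.CrossEntropyLoss()

    for images, labels in train_loader:
        with torch.no_grad():
            feats = encoder(images.cuda())       # features stay fixed
        loss = criterion(classifier(feats), labels.cuda())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return classifier
```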

An example command line for evaluating, say, `./models/alexnet.pth` looks like:
```
CUDA_VISIBLE_DEVICES=0 python LinearProbing.py --dataset imagenet \
--data_folder /path/to/data \
--save_path /path/to/save \
--tb_path /path/to/tensorboard \
--model_path ./models/alexnet.pth \
--model alexnet --learning_rate 0.1 --layer 5
```

**Note:** When training linear classifiers on top of ResNets, it is important to use a large learning rate, e.g., 30-50. Specifically, change `--learning_rate 0.1 --layer 5` to `--learning_rate 30 --layer 6` for `resnet50v1` and `resnet50v2`, and to `--learning_rate 50 --layer 6` for `resnet50v3`.

## Pretrained Models
Pretrained weights can be found in [Dropbox](https://www.dropbox.com/sh/5k4t77mt4011gyr/AABkBvKm2bGNNut0m6bLMK84a?dl=0).

Note:
- CMC weights are trained with the `NCE` loss, the `Lab` color space, `4096` negatives, and the `amp` option. Switching to the `softmax-ce` loss, the `YCbCr` color space, `65536` negatives, and turning off `amp` is likely to improve the results.
- `CMC_resnet50v2.pth` and `CMC_resnet50v3.pth` are trained with FastAutoAugment, which improves downstream accuracy by 0.8-1%. I will update the weights without FastAutoAugment once they are available.

InsDis and MoCo are trained with the same hyperparameters as in the MoCo paper (`epochs=200, lr=0.03, lr_decay_epochs=120,160, weight_decay=1e-4`), but with only 4 GPUs.
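
In PyTorch terms, those hyperparameters correspond roughly to the following setup (a sketch; `model` and `train_one_epoch` are assumed names, and the SGD momentum of 0.9 is an assumption):

```python
import torch

optimizer = torch.optim.SGD(model.parameters(), lr=0.03,
                            momentum=0.9, weight_decay=1e-4)
# Decay the learning rate 10x at epochs 120 and 160, for 200 epochs total.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[120, 160], gamma=0.1)

for epoch in range(200):
    train_one_epoch(model, optimizer)   # assumed training-loop helper
    scheduler.step()
```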

| Model | Arch | #Params (M) | Loss | #Negative | Accuracy (%) | Delta (%) |
|----------|:----:|:---:|:---:|:---:|:---:|:---:|
| InsDis | ResNet50 | 24 | NCE | 4096 | 56.5 | - |
| InsDis | ResNet50 | 24 | Softmax-CE | 4096 | 57.1 | +0.6 |
| InsDis | ResNet50 | 24 | Softmax-CE | 16384 | 58.5 | +1.4 |
| MoCo | ResNet50 | 24 | Softmax-CE | 16384 | 59.4 | +0.9|

## Momentum Contrast and Instance Discrimination

I have implemented and tested MoCo and InsDis on an ImageNet-100 subset (the code also allows training on full ImageNet by simply setting the flag `--dataset imagenet`):

The pre-training stage:

- For InsDis:
```
CUDA_VISIBLE_DEVICES=0,1,2,3 python train_moco_ins.py \
--batch_size 128 --num_workers 24 --nce_k 16384 --softmax
```
- For MoCo:
```
CUDA_VISIBLE_DEVICES=0,1,2,3 python train_moco_ins.py \
--batch_size 128 --num_workers 24 --nce_k 16384 --softmax --moco
```
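
The `--moco` flag swaps the memory bank's stored features for keys produced by a momentum-updated copy of the encoder. The core of that update looks roughly like this sketch (illustrative names; `m=0.999` follows the MoCo paper):

```python
import torch

@torch.no_grad()
def momentum_update(encoder_q, encoder_k, m=0.999):
    """Update the key encoder as an EMA of the query encoder (MoCo's core idea)."""
    for pq, pk in zip(encoder_q.parameters(), encoder_k.parameters()):
        pk.data.mul_(m).add_(pq.data, alpha=1.0 - m)
```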

The linear evaluation stage:
- For both InsDis and MoCo (`lr=10` works better than 30 on this subset; for full ImageNet, switch back to 30):
```
CUDA_VISIBLE_DEVICES=0 python eval_moco_ins.py --model resnet50 \
--model_path /path/to/model --num_workers 24 --learning_rate 10
```

The comparison of `CMC` (using YCbCr), `MoCo`, and `InsDis` on my ImageNet-100 subset is tabulated below:

| Model | Arch | #Params (M) | Loss | #Negative | Accuracy (%) |
|----------|:----:|:---:|:---:|:---:|:---:|
| InsDis | ResNet50 | 24 | NCE | 16384 | -- |
| InsDis | ResNet50 | 24 | Softmax-CE | 16384 | 69.1 |
| MoCo | ResNet50 | 24 | NCE | 16384 | -- |
| MoCo | ResNet50 | 24 | Softmax-CE | 16384 | 73.4 |
| CMC | 2xResNet50half | 12 | NCE | 4096 | -- |
| CMC | 2xResNet50half | 12 | Softmax-CE | 4096 | 75.8 |

For any questions, please contact Yonglong Tian ([email protected]).

## Acknowledgements

Part of this code is inspired by Zhirong Wu's unsupervised learning algorithm [lemniscate](https://github.com/zhirongw/lemniscate.pytorch).