https://github.com/andrewatanov/simclr-pytorch
PyTorch implementation of SimCLR: supports multi-GPU training and closely reproduces results
- Host: GitHub
- URL: https://github.com/andrewatanov/simclr-pytorch
- Owner: AndrewAtanov
- License: mit
- Created: 2020-12-10T13:52:17.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2024-04-29T10:29:25.000Z (about 1 year ago)
- Last Synced: 2025-04-09T15:05:36.199Z (3 months ago)
- Topics: contrastive-learning, deep-learning, pytorch, pytorch-implementation, representation-learning, self-supervised-learning
- Language: Jupyter Notebook
- Homepage:
- Size: 1.76 MB
- Stars: 203
- Watchers: 4
- Forks: 40
- Open Issues: 5
- Metadata Files:
- Readme: README.md
- License: LICENSE.md
README
# SimCLR PyTorch
This is an unofficial repository reproducing results of the paper [A Simple Framework for Contrastive Learning of Visual Representations](https://arxiv.org/abs/2002.05709). The implementation supports multi-GPU distributed training on several nodes with PyTorch `DistributedDataParallel`.
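As a quick refresher on what is being trained here: SimCLR pulls together the projections of two augmented views of the same image and pushes apart all other pairs in the batch via the NT-Xent (normalized temperature-scaled cross-entropy) loss. The snippet below is a minimal single-GPU sketch of that loss for illustration only; it is not this repository's implementation (see `models/losses.py` for the multi-GPU version), and the function and tensor names are made up.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    """Minimal NT-Xent loss for a batch of paired embeddings.

    z1, z2: [N, D] projections of two augmented views of the same N images.
    """
    n = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # [2N, D], unit norm
    sim = z @ z.t() / temperature                        # scaled cosine similarities
    # Mask out self-similarity so an example is never its own negative.
    sim.fill_diagonal_(float("-inf"))
    # The positive for row i is its other view: i+N for the first half, i-N for the second.
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)]).to(z.device)
    return F.cross_entropy(sim, targets)

# Example with random embeddings standing in for projector outputs.
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
print(nt_xent(z1, z2).item())
```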
## How close are we to the original SimCLR?
The implementation closely reproduces the original ResNet50 results on ImageNet and CIFAR-10.
| Dataset  | Batch Size | # Epochs | Training GPUs | Training time | Top-1 accuracy of linear evaluation (100% labels) | Reference (paper) |
|----------|------------|----------|---------------|---------------|----------------------------------------------------|-------------------|
| CIFAR-10 | 1024       | 1000     | 2 × V100      | 13 h          | 93.44                                              | 93.95             |
| ImageNet | 512        | 100      | 4 × V100      | 85 h          | 60.14                                              | 60.62             |
| ImageNet | 2048       | 200      | 16 × V100     | 55 h          | 65.58                                              | 65.83             |
| ImageNet | 2048       | 600      | 16 × V100     | 170 h         | 67.84                                              | 68.71             |

## Pre-trained weights
Try out the pre-trained models in this Colab notebook: [model_apply.ipynb](https://colab.research.google.com/github/AndrewAtanov/simclr-pytorch/blob/master/colabs/model_apply.ipynb)
You can download pre-trained weights from [here](https://drive.google.com/file/d/13tjpWYTzV8qLB5yY5raBn5cwtIyFtt6-/view?usp=sharing).
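A short sketch of inspecting one of the downloaded checkpoints before plugging it into `train.py`; the internal layout of the archive (e.g. whether it stores a `state_dict` entry) is an assumption here, so the snippet only lists what is inside:

```python
import torch

# Load one of the downloaded checkpoints on the CPU.
ckpt = torch.load(
    "pretrained_models/resnet50_cifar10_bs1024_epochs1000.pth.tar",
    map_location="cpu",
)

# The checkpoint layout is not documented above, so inspect the keys first
# instead of assuming a particular structure.
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))
else:
    print(type(ckpt))
```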
To evaluate the pretrained CIFAR-10 linear model and encoder, use the following command:
```bash
python train.py --problem eval --eval_only true --iters 1 --arch linear \
--ckpt pretrained_models/resnet50_cifar10_bs1024_epochs1000_linear.pth.tar \
--encoder_ckpt pretrained_models/resnet50_cifar10_bs1024_epochs1000.pth.tar
```

To evaluate the pretrained ImageNet linear model and encoder, use the following command:
```bash
export IMAGENET_PATH=.../raw-data
python train.py --problem eval --eval_only true --iters 1 --arch linear --data imagenet \
--ckpt pretrained_models/resnet50_imagenet_bs2k_epochs600_linear.pth.tar \
--encoder_ckpt pretrained_models/resnet50_imagenet_bs2k_epochs600.pth.tar
```

## Environment Setup
Create a Python environment from the provided config file using [miniconda](https://docs.conda.io/en/latest/miniconda.html):
```bash
conda env create -f environment.yml
conda activate simclr_pytorch

export IMAGENET_PATH=...  # if you have enough RAM, using /dev/shm usually speeds up data loading
export EXMAN_PATH=...     # a path for logs
```

## Training
Model training consists of two steps: (1) self-supervised encoder pretraining and (2) classifier learning with the encoder representations. Both steps are done with the `train.py` script. To see the help for the `sim-clr/eval` problem, call the following command: `python source/train.py --help --problem sim-clr/eval`.

### Self-supervised pretraining
#### CIFAR-10
The config `cifar_train_epochs1000_bs1024.yaml` contains the parameters to reproduce results for the CIFAR-10 dataset. It requires 2 V100 GPUs. The pretraining command is:

```bash
python train.py --config configs/cifar_train_epochs1000_bs1024.yaml
```

#### ImageNet
The configs `imagenet_params_epochs*_bs*.yaml` contain the parameters to reproduce results for the ImageNet dataset. Training requires 4 to 16 V100 GPUs depending on the batch size. The single-node (4 V100 GPUs) pretraining command is:

```bash
python train.py --config configs/imagenet_train_epochs100_bs512.yaml
```

#### Logs
The logs and the model will be stored at `./logs/exman-train.py/runs//`. You can access all the experiments from Python with `exman.Index('./logs/exman-train.py').info()`. See how to work with the logs in this Colab notebook: [read_logs.ipynb](https://colab.research.google.com/github/AndrewAtanov/simclr-pytorch/blob/master/colabs/read_logs.ipynb)
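For example, a quick way to browse the runs from a notebook, assuming `.info()` returns a pandas DataFrame with one row per run (the exact column names depend on the configs used):

```python
import exman

# Index every run that train.py has logged.
index = exman.Index('./logs/exman-train.py')

# info() collects the run parameters into a pandas DataFrame.
runs = index.info()
print(runs.columns.tolist())  # which parameters were recorded
print(runs.head())            # a first look at the recorded runs
```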
### Linear Evaluation
To train a linear classifier on top of the pretrained encoder, run the following command:

```bash
python train.py --config configs/cifar_eval.yaml --encoder_ckpt <path-to-encoder-checkpoint>
```

The above model with batch size 1024 gives `93.5` linear-evaluation test accuracy.
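For context, linear evaluation keeps the pretrained encoder frozen and only fits a linear classifier on top of its features. The sketch below illustrates that protocol in isolation; the `encoder`, feature dimension, and optimizer here are placeholders, not the repository's `--arch linear` code path.

```python
import torch
import torch.nn as nn

def linear_eval_step(encoder, classifier, optimizer, images, labels):
    """One linear-evaluation step: frozen encoder, trainable linear head."""
    encoder.eval()
    with torch.no_grad():                  # encoder weights stay frozen
        features = encoder(images)         # [N, feature_dim]
    logits = classifier(features)
    loss = nn.functional.cross_entropy(logits, labels)
    optimizer.zero_grad()
    loss.backward()                        # gradients flow only into the linear head
    optimizer.step()
    return loss.item()

# Hypothetical usage: a ResNet-50 encoder producing 2048-d features, 10 classes.
# classifier = nn.Linear(2048, 10)
# optimizer = torch.optim.SGD(classifier.parameters(), lr=0.1, momentum=0.9)
```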
### Pretraining with `DistributedDataParallel`

To train a model with a larger batch size on several nodes you need to set the `--dist ddp` flag and specify the following parameters:

- `--dist_address`: the address and a port of the main node in the `<address>:<port>` format
- `--node_rank`: 0 for the main node and 1,... for the others.
- `--world_size`: the number of nodes.

For example, to train with two nodes you need to run the following command on the main node:
```bash
python train.py --config configs/cifar_train_epochs1000_bs1024.yaml --dist ddp --dist_address <address>:<port> --node_rank 0 --world_size 2
```
and on the second node:
```bash
python train.py --config configs/cifar_train_epochs1000_bs1024.yaml --dist ddp --dist_address <address>:<port> --node_rank 1 --world_size 2
```

The ImageNet pretraining on 4 nodes, each with 4 GPUs, looks as follows:
```bash
node1: python train.py --config configs/imagenet_train_epochs200_bs2k.yaml --dist ddp --world_size 4 --dist_address <address>:<port> --node_rank 0
node2: python train.py --config configs/imagenet_train_epochs200_bs2k.yaml --dist ddp --world_size 4 --dist_address <address>:<port> --node_rank 1
node3: python train.py --config configs/imagenet_train_epochs200_bs2k.yaml --dist ddp --world_size 4 --dist_address <address>:<port> --node_rank 2
node4: python train.py --config configs/imagenet_train_epochs200_bs2k.yaml --dist ddp --world_size 4 --dist_address <address>:<port> --node_rank 3
```
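For orientation, the snippet below shows roughly how flags like `--dist_address`, `--node_rank`, and `--world_size` usually translate into a per-process `torch.distributed` setup when each node runs one process per GPU. It is a generic sketch of the pattern, not this repository's launcher code.

```python
import torch
import torch.distributed as dist

def init_distributed(dist_address, node_rank, world_size_nodes, local_rank):
    """Generic mapping from node-level flags to a per-process torch.distributed setup."""
    gpus_per_node = torch.cuda.device_count()
    # Global rank and world size are counted in processes (one per GPU), not nodes.
    rank = node_rank * gpus_per_node + local_rank
    world_size = world_size_nodes * gpus_per_node
    dist.init_process_group(
        backend="nccl",
        init_method=f"tcp://{dist_address}",  # e.g. "<address>:<port>" of the main node
        rank=rank,
        world_size=world_size,
    )
    torch.cuda.set_device(local_rank)
    return rank, world_size
```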

## Attribution

Parts of this code are based on the following repositories:
- [PyTorch](https://github.com/pytorch/pytorch), [PyTorch Examples](https://github.com/pytorch/examples/tree/ee964a2/imagenet), [PyTorch Lightning](https://github.com/PyTorchLightning/pytorch-lightning) for standard backbones, training loops, etc.
- [SimCLR - A Simple Framework for Contrastive Learning of Visual Representations](https://github.com/google-research/simclr) for more details on the original implementation
- [diffdist](https://github.com/ag14774/diffdist) for the multi-GPU contrastive loss implementation; it allows backpropagation through the `all_gather` operation (see [models/losses.py#L58](https://github.com/AndrewAtanov/simclr-pytorch/blob/master/models/losses.py#L62)). A conceptual sketch of the cross-GPU gather follows after this list.
- [Experiment Manager (exman)](https://github.com/ferrine/exman), a tool that stores logs, checkpoints, and parameter dicts in per-run folders and allows loading them into a pandas DataFrame, which is handy for processing in IPython notebooks.
- [NVIDIA APEX](https://github.com/NVIDIA/apex) for the LARS optimizer. We modified LARC to make it consistent with the SimCLR repo.
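To make the diffdist item above more concrete, the sketch below shows the general idea of computing a contrastive loss across GPUs: embeddings from every process are gathered so each GPU sees all negatives. This simplified version only keeps gradients for the local chunk (diffdist instead makes the gather itself differentiable) and assumes `torch.distributed` is already initialized.

```python
import torch
import torch.distributed as dist

def gather_embeddings(z):
    """Gather embeddings from every process so the contrastive loss sees all negatives.

    dist.all_gather returns detached tensors, so the local slice is re-inserted
    to keep the autograd graph for this process's own embeddings.
    """
    gathered = [torch.zeros_like(z) for _ in range(dist.get_world_size())]
    dist.all_gather(gathered, z)
    gathered[dist.get_rank()] = z  # keep the differentiable local copy
    return torch.cat(gathered, dim=0)
```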

## Acknowledgements

- This work was supported in part through computational resources of HPC facilities at NRU HSE.