# Replica Exchange Stochastic Gradient MCMC
Experiment code for "[Non-convex Learning via Replica Exchange Stochastic Gradient MCMC](https://arxiv.org/pdf/2008.05367.pdf)" (ICML 2020). This is a scalable replica exchange (also known as parallel tempering) stochastic gradient MCMC algorithm with clear acceleration guarantees. The algorithm proposes **corrected swaps** to connect the high-temperature process for **exploration** with the low-temperature process for **exploitation**.
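To make the swap concrete, here is a minimal sketch of a corrected swap test between two chains. This is not the repository's implementation; the argument names and the use of a single precomputed correction factor `F` are assumptions for illustration only:

```python
import numpy as np

def corrected_swap_accepted(U_low, U_high, T_low, T_high, F):
    """Decide whether the two chains swap parameters (illustrative sketch).

    U_low, U_high: stochastic energy (loss) estimates of the low- and
        high-temperature chains.
    T_low < T_high: chain temperatures.
    F: correction factor compensating for the noise in the stochastic
       energy estimates (the role played by bias_F in the scripts).
    """
    tau = 1.0 / T_low - 1.0 / T_high       # inverse-temperature gap
    log_s = tau * (U_low - U_high - F)     # corrected log swap rate
    return np.log(np.random.rand()) < min(0.0, log_s)
```

Without the correction, the noise in the stochastic energy estimates inflates the swap rate; subtracting `F` biases the test back toward valid swaps.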
```bibtex
@inproceedings{reSGMCMC,
title={Non-convex Learning via Replica Exchange Stochastic Gradient MCMC},
author={Wei Deng and Qi Feng* and Liyao Gao* and Faming Liang and Guang Lin},
booktitle = {Proceedings of the 37th International Conference on Machine Learning},
pages = {2474--2483},
year = {2020},
volume = {119}
}
```

# Simulation of Gaussian mixture distributions
## Environment
1. R
2. numDeriv (library)
3. ggplot2 (library)
Please check the files in the **simulation** folder.
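The simulation code itself is in R; purely for orientation, a multi-modal target of the kind simulated there can be written as a Gaussian mixture energy. The modes, scales, and weights below are illustrative assumptions, not the paper's exact setup:

```python
import numpy as np

def mixture_energy(x, mu=(-4.0, 3.0), sigma=0.7, weights=(0.4, 0.6)):
    """Negative log-density of a 1-D two-mode Gaussian mixture (sketch)."""
    comps = [w * np.exp(-0.5 * ((x - m) / sigma) ** 2)
             / (sigma * np.sqrt(2 * np.pi))
             for w, m in zip(weights, mu)]
    return -np.log(sum(comps))
```

A low-temperature chain tends to get trapped in one mode of such a target, while the high-temperature chain crosses the energy barrier; the swaps transfer those discoveries back.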
# Optimization of Supervised Learning on CIFAR100
## Environment
1. Python 2.7
2. PyTorch >= 1.1
3. NumPy
## How to run the code on CIFAR100 using ResNet-20
Setup: batch size 256 and 500 epochs. Simulated annealing is used by default.
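The annealing follows a geometric schedule; as a rough sketch (the loop and factors below are illustrative assumptions, not the script's defaults, which are set inside `bayes_cnn.py` via the `-lr_anneal` and `-anneal` flags):

```python
# Schematic geometric simulated annealing over epochs.
lr, T = 2e-6, 0.01                 # initial step size and temperature
lr_anneal, t_anneal = 0.996, 1.005 # illustrative per-epoch factors

for epoch in range(500):
    # ... one epoch of SGHMC updates with step size lr and temperature T ...
    lr *= lr_anneal                # decay the learning rate geometrically
    T /= t_anneal                  # cool the chain toward pure optimization
```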
- ![#f03c15](https://via.placeholder.com/15/f03c15/000000?text=+) `SGHMC` Set the default learning rate (lr) to 2e-6 and the temperature (T) to 0.01:
```bash
$ python bayes_cnn.py -data cifar100 -model resnet -depth 20 -sn 500 -train 256 -lr 2e-6 -T 0.01 -chains 1
```

- ![#c5f015](https://via.placeholder.com/15/c5f015/000000?text=+) `reSGHMC` The low-temperature chain uses the same settings as SGHMC; the high-temperature chain uses a higher lr=3e-6 (2e-6/LRgap) and a higher T=0.05 (0.01/Tgap); the initial F is 3e5:
```bash
$ python bayes_cnn.py -data cifar100 -model resnet -depth 20 -sn 500 -train 256 -chains 2 -LRgap 0.66 -Tgap 0.2 -F_jump 0.8 -bias_F 3e5
```

- ![#1589F0](https://via.placeholder.com/15/1589F0/000000?text=+) `Naive reSGHMC` Simply set bias_F=1e300 and F_jump=1, as follows:
```bash
$ python bayes_cnn.py -data cifar100 -model resnet -depth 20 -sn 500 -train 256 -chains 2 -F_jump 1 -bias_F 1e300
```

To use a larger batch size of 1024, you need a slower annealing rate and 2000 epochs to keep the total number of iterations the same:
```bash
$ python bayes_cnn.py -data cifar100 -model resnet -depth 20 -sn 2000 -train 1024 -chains 1 -lr_anneal 0.996 -anneal 1.005
$ python bayes_cnn.py -data cifar100 -model resnet -depth 20 -sn 2000 -train 1024 -chains 2 -lr_anneal 0.996 -anneal 1.005 -F_jump 0.8
```

Remark: If you do Bayesian model averaging every epoch and two swaps occur within the same epoch, the **acceleration may be neutralized**. To handle this issue, you need to consider a cooling time.
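One way to realize the cooling time (a minimal sketch, assuming an epoch-level swap loop; `cool` corresponds to the `-cool` flag used below, and the random proposal is a stand-in for the corrected swap test):

```python
import random

num_epochs, cool = 500, 20   # cool: minimum epochs between accepted swaps
last_swap = -cool            # epoch index of the most recent swap

for epoch in range(num_epochs):
    # ... one epoch of SGHMC updates for both chains ...
    wants_swap = random.random() < 0.1   # stand-in for the swap test
    if wants_swap and epoch - last_swap >= cool:
        # swap the parameters of the two chains here
        last_swap = epoch                # block further swaps for `cool` epochs
```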
To run the WRN models (WRN-16-8 and WRN-28-10), you can try the following:
```bash
$ python bayes_cnn.py -data cifar100 -model wrn -sn 500 -train 256 -chains 2 -F_jump 0.8 -cool 20 -bias_F 3e5
$ python bayes_cnn.py -data cifar100 -model wrn28 -sn 500 -train 256 -chains 2 -F_jump 0.8 -cool 20 -bias_F 3e5
```
Note that for the WRN models we need to include the extra **cooling time**, because two consecutive swaps within the same epoch happen frequently and cancel the acceleration effect.

To reduce the hyperparameter tuning cost, you can try **greedy** moves instead of swaps to break the detailed balance; this strategy gives the same optimization performance as the swap type (a sketch of the greedy rule follows the command below). For example:
```bash
$ python bayes_cnn.py -data cifar100 -model wrn -types greedy -sn 500 -train 256 -chains 2 -cool 20 -bias_F 3e5
```
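As a sketch of one plausible reading of the greedy rule: deterministically hand the lower-loss parameters to the low-temperature chain instead of performing a randomized swap. Whether the correction `F` enters this decision is an assumption here; the actual rule lives in the training script:

```python
def greedy_exchange(params_low, params_high, U_low, U_high, F=0.0):
    """Greedy variant (sketch): keep the lower-energy parameters in the
    low-temperature (exploitation) chain, with no randomized accept step.

    U_low, U_high are stochastic loss estimates; F is an optional
    correction margin guarding against noise in those estimates.
    """
    if U_high + F < U_low:              # high-T chain found a better region
        return params_high, params_low  # hand it to the low-T chain
    return params_low, params_high      # otherwise keep the chains as-is
```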
# Semi-supervised Learning via Bayesian GAN
## Environment
1. Python 2.7
2. TensorFlow == 1.0.0 (the version number may be critical)
3. NumPy
## How to run the code on CIFAR10 using Replica Exchange Stochastic Gradient MCMC
```bash
python ./bayesian_gan_hmc.py --dataset cifar --numz 10 --num_mcmc 2 --data_path ./output --out_dir ./output --train_iter 15000 --N 4000 --lr 0.00045 -LRgap 0.66 -Tgap 100 --semi_supervised --n_save 100 --gen_observed 4000 --fileName cifar10_4000_0.00045_0.66_100
```
For detailed instructions, please check the README.md file inside the **semi_supervised_learning** folder.