{"id":18587294,"url":"https://github.com/vlievin/ovis","last_synced_at":"2025-04-10T14:30:27.471Z","repository":{"id":37634569,"uuid":"243788652","full_name":"vlievin/ovis","owner":"vlievin","description":"Official code for the \"Optimal Variance Control of the Score Function Gradient Estimator for Importance Weighted Bounds\"","archived":false,"fork":false,"pushed_at":"2023-02-16T01:58:42.000Z","size":25216,"stargazers_count":10,"open_issues_count":6,"forks_count":0,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-03-24T21:51:12.350Z","etag":null,"topics":["deep-learning","machine-learning","optimization","vae","variational-autoencoder"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vlievin.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-02-28T15:04:45.000Z","updated_at":"2022-12-25T06:48:53.000Z","dependencies_parsed_at":"2024-11-13T15:01:25.898Z","dependency_job_id":null,"html_url":"https://github.com/vlievin/ovis","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vlievin%2Fovis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vlievin%2Fovis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vlievin%2Fovis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vlievin%2Fovis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hos
ts/GitHub/owners/vlievin","download_url":"https://codeload.github.com/vlievin/ovis/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248233946,"owners_count":21069493,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","machine-learning","optimization","vae","variational-autoencoder"],"created_at":"2024-11-07T00:38:50.580Z","updated_at":"2025-04-10T14:30:26.437Z","avatar_url":"https://github.com/vlievin.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"![Optimal Variance Control of the Score Function Gradient Estimator for Importance Weighted Bounds (a.k.a **OVIS**) credits: Thomas Jarrand](.assets/ovis-banner.png)\n\nOfficial code for the *Optimal Variance Control of the Score Function Gradient Estimator for Importance Weighted Bounds* (a.k.a **OVIS** : Optimal Variance -- Importance Sampling). Published at NeurIPS 2020.\n\n- [NeurIPS 2020 proceedings](https://proceedings.neurips.cc/paper/2020/hash/c15203a83f778ce8934d0efaf2d5c6f3-Abstract.html)\n- [arXiv preprint](https://arxiv.org/abs/2008.01998)\n\nOVIS is a state-of-the-art gradient estimator for discrete VAEs. This repo provides a user-friendly interface to OVIS and other gradient estimators. OVIS can easily be imported into your project to train and evaluate discrete VAEs. The implementation is compatible with a wide variety of VAE models, including hierarchical ones. 
This library allows reproducing all the experiments from the paper.\n\n## Citation\n\n```\n@inproceedings{NEURIPS2020_c15203a8,\n author = {Li\\'{e}vin, Valentin and Dittadi, Andrea and Christensen, Anders and Winther, Ole},\n booktitle = {Advances in Neural Information Processing Systems},\n editor = {H. Larochelle and M. Ranzato and R. Hadsell and M. F. Balcan and H. Lin},\n pages = {16591--16602},\n publisher = {Curran Associates, Inc.},\n title = {Optimal Variance Control of the Score-Function Gradient Estimator for Importance-Weighted Bounds},\n url = {https://proceedings.neurips.cc/paper/2020/file/c15203a83f778ce8934d0efaf2d5c6f3-Paper.pdf},\n volume = {33},\n year = {2020}\n}\n\n```\n\n## Included in this library\n\n* Reparameterization-free:\n    * Reinforce\n    * Reinforce + Neural Baseline\n    * [Vimco](https://arxiv.org/abs/1602.06725)\n    * [RWS (Reweighted Wake-Sleep)](https://arxiv.org/abs/1805.10469)\n    * [TVO (Thermodynamic Variational Objective)](https://arxiv.org/abs/1907.00031)\n    * [OVIS](https://arxiv.org/abs/2008.01998)\n\n* Reparameterization-based:\n    * VAE\n    * [IWAE](https://arxiv.org/abs/1509.00519)\n    * [IWAE-STL (Sticking the Landing)](https://arxiv.org/abs/1703.09194)\n    * [IWAE-DReG (Doubly Reparameterized Gradient Estimators for Monte Carlo Objectives)](https://arxiv.org/abs/1810.04152)\n\n\n## Train your own models using OVIS\n\nOVIS can easily be imported into your own project to train your own discrete VAE/generative models. You simply need to define your model following the example below. The full example is available in `example.py`. \n\n#### 0. Install as a package\n\n```bash\n# install the latest release\npip install git+https://github.com/vlievin/ovis.git\n# OR install in dev. mode\ngit clone https://github.com/vlievin/ovis.git \u0026\u0026 pip install -e ovis/\n```\n\n#### 1. Initialize a gradient estimator\n\nGradient estimators can be initialized in two lines. 
Using the estimator, computing the loss for your model is a one-liner.\n\n```python\n# init the estimator\nfrom ovis.estimators.config import parse_estimator_id\nEstimator, config = parse_estimator_id(\"ovis-gamma1\")\nestimator = Estimator(mc=1, iw=16, **config)\n\n# use it to compute the differentiable loss\nloss, diagnostics, output = estimator(model, x)\n```\n\n#### 2. Implement your own `nn.Module` following the `TemplateModel` class (`ovis/models/template.py`):\n\nOVIS relies on `torch.distributions` to implement the variational distributions. The library has been tested with Normal, Bernoulli and Categorical distributions, but it should work with any other distribution that provides a `.log_prob()` method. \n\nEvery model should implement:\n* `forward(self, x:Tensor, reparam:bool=False, **kwargs) -\u003e OUTPUT` \n* `sample_from_prior(self, bs: int, **kwargs)-\u003e OUTPUT`\n\nWhere the output format is defined as `OUTPUT=Dict[str, Union[Distribution, List[Tensor], List[Distribution]]]`.\n\nThe output is a dictionary with keys:\n* `px`: `Distribution` : distribution modelling `p(x|z)`\n* `z`: `List[Tensor]` : latent samples `z`, one item for each layer\n* `pz`: `List[Distribution]` : prior distribution `p(z)`, one item for each layer\n* `qz`: `List[Distribution]` : posterior distribution `q(z|x)`, one item for each layer\n\n\n```python\nfrom torch import nn, Tensor, zeros\nfrom torch.distributions import Bernoulli\nfrom ovis.models import TemplateModel\n\nclass SimpleModel(TemplateModel):\n    def __init__(self, xdim, zdim):\n        super().__init__()\n        self.inference_network = nn.Linear(xdim, zdim)\n        self.generative_model = nn.Linear(zdim, xdim)\n        self.register_buffer('prior', zeros((1, zdim,)))\n\n    def forward(self, x:Tensor, reparam:bool=False, **kwargs):\n        # q(z|x)\n        qz = Bernoulli(logits=self.inference_network(x))\n        # z ~ q(z|x)\n        z = qz.rsample() if reparam else qz.sample()\n        
# p(z)\n        pz = Bernoulli(logits=self.prior)\n        # p(x|z)\n        px = Bernoulli(logits=self.generative_model(z))\n        # store z, pz, qz as lists (useful for hierarchical models)\n        return {'px': px, 'z': [z], 'qz': [qz], 'pz': [pz]}\n\n    def sample_from_prior(self, bs: int, **kwargs):\n        pz = Bernoulli(logits=self.prior.expand(bs, *self.prior.shape[1:]))\n        z = pz.sample()\n        px = Bernoulli(logits=self.generative_model(z))\n        return {'px': px, 'z': [z], 'pz': [pz]}\n\n# generate x ~ Bernoulli(0.5), initialize a simple VAE, forward pass, prior sampling\nx = Bernoulli(logits=zeros((1, 10,))).sample()\nmodel = SimpleModel(10, 10)\noutput = model(x)\noutput = model.sample_from_prior(1)\n```\n\n#### 3. Train your model and analyse the gradients\n\nThe code below shows a simple training loop.\nNotice how `parameters` can be used for various types of scheduling (e.g. $\\beta$-annealing).\nThe estimator also returns useful information regarding the computation of the gradients, such as the ELBO, KL or the effective sample size (ESS).\n\n```python\nfrom ovis.analysis.gradients import get_gradients_statistics\nfrom booster import Aggregator\nagg = Aggregator()\nparameters = {'alpha': 0.9, 'beta': 1}\nfor x in loader:\n    global_step += 1\n    loss, diagnostics, output = estimator(model, x, backward=False, **parameters)\n    loss.mean().backward()\n    optimizer.step()\n    optimizer.zero_grad()\n    agg.update(diagnostics)\n    # update parameters\n    update_fn(parameters)\n    \n# epoch summary\nsummary = agg.data.to('cpu')\n\n# analyse the gradients of the parameters of the inference network\ngrad_stats, _ = get_gradients_statistics(estimator, model, x, mc_samples=10, key_filter='inference_network')\nsummary.update(grad_stats)\n\n# log data\nsummary.log(tensorboard_writer, global_step)\n```\n\n## Requirements\n\n```bash\nconda create -n ovis python=3.7\nconda activate ovis\n# use the instructions from 
https://pytorch.org/\nconda install pytorch=1.5.1 torchvision cudatoolkit=10.2 -c pytorch \npip install -r requirements.txt\n# [Optional] Install LaTeX (used for the figures)\n```\n\n## Abstract \n\nThis paper introduces novel results for the score function gradient estimator of the importance weighted variational bound (IWAE). We prove that in the limit of large $K$ (number of importance samples) one can choose the control variate such that the Signal-to-Noise ratio (SNR) of the estimator grows as $\\sqrt{K}$. This is in contrast to the standard pathwise gradient estimator where the SNR decreases as $1/\\sqrt{K}$. Based on our theoretical findings we develop a novel control variate that extends on VIMCO. Empirically, for the training of both continuous and discrete generative models, the proposed method yields superior variance reduction, resulting in an SNR for IWAE that increases with $K$ without relying on the reparameterization trick. The novel estimator is competitive with state-of-the-art reparameterization-free gradient estimators such as Reweighted Wake-Sleep (RWS) and the thermodynamic variational objective (TVO) when training generative models.\n\n## Reproducing the experiments\n\nAll experiments are managed through the script `manager.py`, which implements a multi-threaded queue system based on\n`TinyDB` and `filelock` protection. See `python manager.py --help` for more information about the number of \nsubprocesses and resuming experiments. The script `dbutils.py` provides a few utilities to inspect and clean \nthe experiment database.  `report.py` allows parsing an experiment directory and producing figures. Usage:\n\n```bash\n# begins an experiment with 2 processes per GPU (max. 
2 GPUs)\npython manager.py --exp exp_id --processes n_procs_per_gpu --max_gpus 2\n# show the experiment status [queued, aborted, failed, running, success] \npython dbutils.py --exp exp_id --check\n# requeue aborted experiments\npython dbutils.py --exp exp_id --requeue --requeue_level 1\n# generate plots\npython report.py --exp exp_id --metrics train:loss/L_k,train:grads/snr --pivot_metrics train:loss/L_k,train:grads/snr \n```\n\n### Asymptotic Variance\n\n![Asymptotic Variance](.assets/asymptotic-gradients.png)\n![Distribution of the Asymptotic Gradients](.assets/asymptotic-gradients-dist.png)\n\nAnalysis of the gradients for a simple Gaussian model. Figure 1:\n\n```bash\n# run the experiment\npython manager.py --exp asymptotic-variance\n# produce the figures\npython report_asymptotic_variance --exp asymptotic-variance\n# access the results\nopen reports/asymptotic-variance\n```\n\n### Gaussian Mixture Model\n\n![Training Summaries vs. K](.assets/gmm.png)\n\nTrain a simple Gaussian Mixture model. 
Figure 2:\n\n```bash\n# run the experiment\npython manager.py --exp gaussian-mixture-model\n# produce the figures\npython report.py --exp=gaussian-mixture-model \\\n    --keys=dataset,estimator,iw \\\n    --metrics=test:gmm/posterior_mse,test:gmm/prior_mse,train:grads/variance,train:grads/snr \\\n    --detailed_metrics=test:gmm/posterior_mse,test:gmm/prior_mse,train:loss/ess,train:grads/variance,train:grads/snr \\\n    --pivot_metrics=min:test:gmm/posterior_mse,min:test:gmm/prior_mse,mean:train:grads/snr \\\n    --ylims=train:loss/ess:0:21\n# access the results\nopen reports/gaussian-mixture-model\n```\n\n### Sigmoid Belief Network\n\nTrain a 3-layer Sigmoid Belief Network using the Importance-Weighted Bound (IW) and the Rényi Importance Weighted Bound (IWR).\nRun all experiments:\n\n```bash\n# run the experiment\npython manager.py --exp sigmoid-belief-network\n```\n\nFigure 3 (left, VIMCO + OVIS-IW), 3 seeds:\n\n![Training curves](.assets/figure3_left.png) \n\n```bash\n# gather the data\npython report.py --exp=sigmoid-belief-network  \\\n    --include=iwbound \\\n    --keys=dataset,estimator,iw  \\\n    --metrics=test:loss/L_k,train:loss/L_k,train:loss/kl_q_p,train:grads/snr \\\n    --detailed_metrics=test:loss/L_k,train:loss/L_k,train:loss/kl_q_p,train:loss/kl,train:loss/ess,train:active_units/au,train:grads/snr \\\n    --pivot_metrics=max:test:loss/L_k,max:train:loss/L_k,last:train:loss/kl_q_p,last:train:loss/ess \\\n    --ylims=test:loss/L_k:-94:-88,train:loss/L_k:-93:-86\n# produce the figure\npython report_figure3.py --figure left\n# access the results\nopen reports/sigmoid-belief-network-inc=iwbound\n```\n\nFigure 3 (right, TVO + OVIS-IWR), 3 seeds:\n\n![Training curves](.assets/figure3_right.png)\n\n```bash\n# gather the data\npython report.py --exp=sigmoid-belief-network  \\\n    --include=iwrbound \\\n    --keys=dataset,estimator,iw  \\\n    --metrics=test:loss/L_k,train:loss/L_k,train:loss/kl_q_p,train:grads/snr \\\n    
--detailed_metrics=test:loss/L_k,train:loss/L_k,train:loss/kl_q_p,train:loss/kl,train:loss/ess,train:active_units/au,train:grads/snr \\\n    --pivot_metrics=max:test:loss/L_k,max:train:loss/L_k,last:train:loss/kl_q_p,last:train:loss/ess \\\n    --ylims=test:loss/L_k:-94:-88,train:loss/L_k:-93:-86\n# produce the figure\npython report_figure3.py --figure right\n# access the results\nopen reports/sigmoid-belief-network-inc=iwrbound\n```\n\n### Gaussian VAE\n\n![Training summary vs. K](.assets/gaussian-vae.png)\n\nTrain a 1-layer Gaussian VAE. Figure 4:\n\n```bash\n# produce the figures\npython report.py --exp=gaussian-vae  \\\n    --keys=dataset,estimator,iw  \\\n    --metrics=test:loss/L_k,train:loss/L_k,train:loss/kl_q_p,train:grads/snr \\\n    --detailed_metrics=train:loss/L_k,train:loss/kl_q_p,train:grads/snr  \\\n    --pivot_metrics=max:train:loss/L_k,last:train:loss/kl_q_p,mean:train:loss/ess\n# access the results\nopen reports/gaussian-vae\n``` \n    \n## Additional Results\n\n### Binarized MNIST, Fashion MNIST and Omniglot\n\nFitting the Binarized MNIST, Fashion MNIST and Omniglot datasets. 
The hyperparameters are identical for all experiments.\nWith and without Rényi warmup.\n\n#### Sigmoid Belief Network\n\n```bash\npython report.py --exp=sigmoid-belief-network  \\\n   --keys=dataset,estimator,iw,warmup  \\\n   --metrics=test:loss/L_k,train:loss/L_k,train:loss/kl_q_p,train:grads/snr  \\\n   --detailed_metrics=test:loss/L_k,train:loss/L_k,train:loss/kl_q_p,train:loss/kl,train:loss/ess,train:active_units/au,train:grads/snr  \\\n   --pivot_metrics=max:test:loss/L_k,max:train:loss/L_k,last:train:loss/kl_q_p,last:train:loss/ess,last:train:active_units/au \\\n   --downsample 50 \\\n   --include tvo,vimco,ovis-gamma1\n```\n\n![Training a Sigmoid Belief Network](.assets/sbm-all-pivot.png)\n\n#### Gaussian VAE\n\n```bash\npython report.py --exp=gaussian-vae \\\n   --keys=dataset,estimator,iw,alpha  \\\n   --metrics=test:loss/L_k,train:loss/L_k,train:loss/kl_q_p,train:grads/snr  \\\n   --detailed_metrics=test:loss/L_k,train:loss/L_k,train:loss/kl_q_p,train:loss/kl,train:loss/ess,train:active_units/au,train:grads/snr  \\\n   --pivot_metrics=max:test:loss/L_k,max:train:loss/L_k,last:train:loss/kl_q_p,last:train:loss/ess,last:train:grads/snr \\\n   --downsample 50 \n```\n\n![Training a Gaussian VAE](.assets/gaussian-vae-all-pivot.png)\n\n\n### Budget Analysis\n\nIn this experiment, we compare the asymptotic OVIS (gamma=1) with the sample-based control OVIS-MC. In contrast with \nthe previous experiments, the total particle budget remains equal to `K`. \nThe `K` particles are used to estimate the gradient of the generative model, \n`K-S` particles are used to evaluate the score-based estimate of the gradient of the inference network and `S` particles \nare used to estimate the control variate. In the following plots, the identifier `ovis-Sy` indicates that `S = yK`. 
\nSee the experiment `.json` file for more details.\n\n#### Gaussian VAE\n\n```bash\n# run the experiment\npython manager.py --exp budget-analysis\n# produce the figures\npython report.py --exp=budget-analysis  \\\n    --keys=dataset,estimator,iw \\\n    --metrics=test:loss/L_k,train:loss/L_k,train:loss/kl_q_p,train:grads/snr  \\\n    --detailed_metrics=train:loss/L_k,train:loss/kl_q_p  \\\n    --pivot_metrics=max:train:loss/L_k,last:train:loss/kl_q_p,mean:train:loss/ess\n# access the results\nopen reports/budget-analysis\n```\n\n![budget analysis for a Gaussian VAE](.assets/budget-gaussian-vae.png)\n\n#### Sigmoid Belief Network\n\n```bash\n# run the experiment\npython manager.py --exp budget-analysis-sbm\n# produce the figures\npython report.py --exp=budget-analysis-sbm  \\\n    --keys=dataset,estimator,iw \\\n    --metrics=test:loss/L_k,train:loss/L_k,train:loss/kl_q_p,train:grads/snr  \\\n    --detailed_metrics=train:loss/L_k,train:loss/kl_q_p  \\\n    --pivot_metrics=max:train:loss/L_k,last:train:loss/kl_q_p,mean:train:loss/ess\n# access the results\nopen reports/budget-analysis-sbm\n```\n\n![budget analysis for a Sigmoid Belief Network](.assets/budget-sbm.png)\n\n\n### Computational Efficiency\n\nChecking the memory usage of the different estimators given different particle budgets.\n\n```bash\npython measure_efficiency.py\n```\n\n![Memory usage and epoch time](.assets/efficiency.png)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvlievin%2Fovis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvlievin%2Fovis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvlievin%2Fovis/lists"}