# Variational Dropout Sparsifies Deep Neural Networks
This code implements variational dropout for linear and convolutional layers in Chainer to sparsify deep neural networks.
It aims to replicate the experiments in the following paper:
```
Variational Dropout Sparsifies Deep Neural Networks.  
Dmitry Molchanov, Arsenii Ashukha, Dmitry Vetrov.  
ICML 2017.
```

See https://arxiv.org/pdf/1701.05369.pdf.

This repository contains:
- MNIST example using variational dropout
    - LeNet-300-100 and LeNet5
- CIFAR-10/100 example using variational dropout
    - VGGNet16
- Penn Treebank RNN language model example using variational dropout
    - 2-layer LSTM LM
    - This experiment is original to this repository and does not appear in the paper
- A general `Chain` for models using variational dropout
- A `Linear` link using variational dropout
- A `Convolution2D` link using variational dropout
- Sparse forward computation for the `Linear` link

The variational dropout code is partly based on the paper and on the authors' [repository](https://github.com/ars-ashuha/variational-dropout-sparsifies-dnn), which uses Theano instead of Chainer.
The example scripts are derived from Chainer's official examples.

# Requirements

- Python 3.6.0+
- Chainer 2.0.0+ (strictly required)
- NumPy 1.12.1+
- [SciPy](https://www.scipy.org/) for sparse matrix computation on CPU
- CuPy 1.0.0+ (if using a GPU)
- and their dependencies


# Examples

- MNIST: Convolutional network (LeNet5) or fully connected feedforward network (LeNet-300-100) for MNIST. The example is derived from the official MNIST example of Chainer v2.  
  ```
  python -u train_mnist.py --gpu=0
  ```
  Some settings differ from those of the experiments in the paper:
  the learning rate here is higher and is not decayed, and training uses warm-up (annealing) rather than
  two separate stages of pretraining (without VD) and fine-tuning (with VD).

- CIFAR-10 or 100: Convolutional network (VGGNet) for CIFAR. The example is derived from the official CIFAR example of Chainer v2.  
  ```
  python -u train_cifar.py --gpu=0
  ```
  This example currently does not fully reproduce the paper's results, in terms of either sparsity or retained accuracy.  
  Additional arguments are as follows:  
  - `--resume FILE`: Load a pretrained model (if needed). Default is none, in which case training starts from random initialization.
  - `--pretrain 1/0`: `1` pretrains without VD; `0` fine-tunes (from `--resume`) or runs warm-up training with VD. Default is 0.
  - `--dataset cifar10/cifar100`: Target dataset. Default is cifar10.

- PTB: Recurrent neural network language model (RNNLM) on the Penn Treebank. This experiment originates in this repository rather than in the paper. The example is derived from the official PTB example of Chainer v2.  
  ```
  python -u train_ptb.py --gpu=0
  ```
  The VD RNN requires a lot of memory and training time. In our experiments, introducing VD into the LSTM hurts performance even after pretraining.

# How to use variational dropout (VD) in Chainer

## VariationalDropoutChain
This repository implements a general model class, `VariationalDropoutChain`, which inherits from `chainer.Chain`.
The class has a method that calculates the joint objective (the sum of the cross-entropy loss and the KL divergence).
So, if your code uses Chainer's official updater, you can enable VD training by passing this method as the loss function:
```
from chainer import training

# train_iter, optimizer, args and model are set up as in the example scripts
updater = training.StandardUpdater(
    train_iter, optimizer, device=args.gpu,
    loss_func=model.calc_loss)
```
You can also monitor VD statistics of the model (e.g., sparsity)
during training using `chainer.training.extensions.PrintReport` (see the MNIST or CIFAR example).

## VariationalDropoutLinear, VariationalDropoutConvolution2D
A model based on `VariationalDropoutChain` can include special layers (Chainer links) in its structure.
This repository provides two:
- `VariationalDropoutLinear`, which inherits from `chainer.links.Linear`
- `VariationalDropoutConvolution2D`, which inherits from `chainer.links.Convolution2D`

You can use them simply by replacing existing `chainer.links.Linear` or `chainer.links.Convolution2D` layers, respectively.
All arguments of the original links are supported,
and additional hyperparameter arguments
(`p_threshold`, `loga_threshold` and `initial_log_sigma2`) are also available.
By default, they are set to the good values shown in the paper.

These links serve as building blocks of more complex neural networks.
For example,
a tanh RNN (i.e., a vanilla RNN) can be written with `chainer.links.Linear` layers.
Thus, its VD variant, `VariationalDropoutTanhRNN`, can be written with `VariationalDropoutLinear`.
It is used in the PTB example; see `VariationalDropoutTanhRNN` in `net.py` for its detailed structure.


## Converting an existing Chain to use VD
You can also use variational dropout with an existing `chainer.Chain` model class
by defining a subclass that inherits from both `VariationalDropoutChain` and the target class, and then
calling `.to_variational_dropout()` as follows:
```
# VD refers to the module providing VariationalDropoutChain; VGG16 is the CIFAR example's model
class VGG16VD(VD.VariationalDropoutChain, VGG16):
    def __init__(self, class_labels=10, warm_up=0.0001):
        super(VGG16VD, self).__init__(warm_up=warm_up, class_labels=class_labels)

model = VGG16VD()
model.to_variational_dropout()
```

This usage appears in the CIFAR example.

## Forward Propagation Using Sparse Computation with scipy.sparse
After training, especially VD training,
it is desirable to run inference with a lightweight model on the CPU.
A model based on `VariationalDropoutChain` provides the method `.to_cpu_sparse()`.
It transforms all linear layers in the model into new layers whose pruned weights
are stored as `scipy.sparse` matrices.
After VD training, this speeds up forward propagation and reduces memory usage.
However, the current implementation does not accelerate convolutional layers
because there is no good method for convolution with sparse filters.
Thus, a model consisting mostly of convolutional layers (e.g., VGGNet) cannot be accelerated.
See the MNIST example for usage.

Note: The transformed model works only on CPUs, and only for forward propagation at inference time.
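The idea behind pruning a linear layer and running it with `scipy.sparse` can be sketched with NumPy and SciPy as below. This is a minimal sketch, not the repository's `.to_cpu_sparse()` implementation: it assumes a weight is pruned when log α = log σ² - log θ² exceeds 3 (the threshold reported in the paper; the repository's `loga_threshold` default may differ), and `log_alpha`, `to_sparse_linear` and `sparse_linear_forward` are hypothetical names.

```python
import numpy as np
import scipy.sparse as sp

# Assumption: prune weights whose log alpha exceeds 3, the threshold reported in the paper.
LOG_ALPHA_THRESHOLD = 3.0


def log_alpha(theta, log_sigma2, eps=1e-8):
    """Element-wise log alpha = log sigma^2 - log theta^2 (eps avoids log(0))."""
    return log_sigma2 - np.log(theta ** 2 + eps)


def to_sparse_linear(theta, log_sigma2, threshold=LOG_ALPHA_THRESHOLD):
    """Hypothetical helper: zero out noisy weights and store the rest in CSR format."""
    keep = log_alpha(theta, log_sigma2) < threshold
    # Converting a dense array to CSR stores only the non-zero entries.
    return sp.csr_matrix(theta * keep)


def sparse_linear_forward(x, w_sparse, b):
    """Hypothetical helper: y = x W^T + b for dense x (batch, in) and sparse W (out, in)."""
    # Sparse (out, in) times dense (in, batch) gives a dense (out, batch) array.
    return np.asarray(w_sparse @ x.T).T + b


# Usage: a 2x3 layer in which two weights have very large noise and get pruned.
theta = np.array([[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]])
log_sigma2 = np.array([[5.0, -6.0, -6.0], [-6.0, 5.0, -6.0]])
w_sparse = to_sparse_linear(theta, log_sigma2)
y = sparse_linear_forward(np.array([[1.0, 2.0, 3.0]]), w_sparse, np.array([0.1, -0.1]))
print(w_sparse.nnz, y)
```

CSR fits this case because the forward pass is a sparse-times-dense matrix product over rows of the weight matrix; as noted above, the same trick does not carry over to convolutional layers.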
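The KL divergence term in the joint objective of `VariationalDropoutChain` can be illustrated with the approximation the paper gives for the log-uniform prior: -KL ≈ k1·sigmoid(k2 + k3·log α) - 0.5·log(1 + 1/α) - k1, with k1 = 0.63576, k2 = 1.87320 and k3 = 1.48695. The NumPy sketch below assumes that formula; it is not the repository's code, `neg_kl_approx` and `kl_penalty` are hypothetical names, and it does not show how the KL term is weighted against the cross-entropy loss.

```python
import numpy as np

# Constants of the paper's approximation of -KL for the log-uniform prior.
K1, K2, K3 = 0.63576, 1.87320, 1.48695


def neg_kl_approx(log_alpha):
    """Element-wise approximation of -KL(q(w | theta, alpha) || p(w))."""
    # sigmoid(z) written via tanh so that large |z| does not overflow.
    sigmoid = 0.5 * (1.0 + np.tanh(0.5 * (K2 + K3 * log_alpha)))
    # log(1 + 1/alpha) == log(1 + exp(-log_alpha)), computed stably with logaddexp.
    return K1 * sigmoid - 0.5 * np.logaddexp(0.0, -log_alpha) - K1


def kl_penalty(log_alphas):
    """Hypothetical helper: total approximate KL over a list of per-layer log alpha arrays."""
    return float(sum(-neg_kl_approx(la).sum() for la in log_alphas))


# Usage: the penalty shrinks as log alpha grows, so noisy (prunable) weights cost less.
for value in (-5.0, 0.0, 3.0, 10.0):
    print(value, kl_penalty([np.array([value])]))
```

Because the approximate KL decreases monotonically in log α, minimizing it pushes many weights toward large α, which is what makes the threshold-based pruning effective.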
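How a variational dropout linear layer like `VariationalDropoutLinear` can be evaluated during training is sketched below with the local reparameterization trick, which the paper combines with an additive parameterization (a separate variance σ² for each weight θ). This is a NumPy illustration under those assumptions, not the repository's link; `vd_linear_forward` is a hypothetical name.

```python
import numpy as np


def vd_linear_forward(x, theta, log_sigma2, b, rng):
    """Hypothetical training-time forward pass of a variational dropout linear layer.

    Local reparameterization: rather than sampling every weight
    w ~ N(theta, sigma^2), sample each pre-activation from its Gaussian
    marginal N(x theta^T + b, x^2 (sigma^2)^T).
    """
    mean = x @ theta.T + b
    var = (x ** 2) @ np.exp(log_sigma2).T
    noise = rng.standard_normal(mean.shape)
    return mean + np.sqrt(var) * noise


# Usage: with theta = 0 and sigma^2 = 1, each output has variance sum(x^2) = 9.
rng = np.random.default_rng(0)
x = np.tile(np.array([[1.0, 2.0, 2.0]]), (20000, 1))
y = vd_linear_forward(x, np.zeros((2, 3)), np.zeros((2, 3)), np.zeros(2), rng)
print(y.std(axis=0))
```

Sampling pre-activations instead of weights draws independent noise for every example in a minibatch, which lowers the variance of the gradient estimates.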