{"id":19401158,"url":"https://github.com/google-research/rigl","last_synced_at":"2025-04-06T05:16:10.468Z","repository":{"id":43367573,"uuid":"224050000","full_name":"google-research/rigl","owner":"google-research","description":"End-to-end training of sparse deep neural networks with little-to-no performance loss. ","archived":false,"fork":false,"pushed_at":"2023-01-26T17:47:14.000Z","size":830,"stargazers_count":320,"open_issues_count":0,"forks_count":49,"subscribers_count":16,"default_branch":"master","last_synced_at":"2025-03-30T04:08:44.697Z","etag":null,"topics":["computer-vision","machine-learning","neural-networks","sparse-training"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/google-research.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-11-25T22:03:16.000Z","updated_at":"2025-03-24T16:51:47.000Z","dependencies_parsed_at":"2023-02-14T20:15:59.723Z","dependency_job_id":null,"html_url":"https://github.com/google-research/rigl","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-research%2Frigl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-research%2Frigl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-research%2Frigl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-research%2Frigl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/google-research","download_url":"https://codeload.github.com/google-research/rigl/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247436286,"owners_count":20938533,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","machine-learning","neural-networks","sparse-training"],"created_at":"2024-11-10T11:17:23.879Z","updated_at":"2025-04-06T05:16:10.381Z","avatar_url":"https://github.com/google-research.png","language":"Python","readme":"# Rigging the Lottery: Making All Tickets Winners\n\u003cimg src=\"https://github.com/google-research/rigl/blob/master/imgs/flops8.jpg\" alt=\"80% Sparse Resnet-50\" width=\"45%\" align=\"middle\"\u003e\n\n**Paper**: [https://arxiv.org/abs/1911.11134](https://arxiv.org/abs/1911.11134)\n\n**15min Presentation** [[pml4dc](https://pml4dc.github.io/iclr2020/program/pml4dc_7.html)] [[icml](https://icml.cc/virtual/2020/paper/5808)]\n\n**ML Reproducibility Challenge 2020** [report](https://openreview.net/forum?id=riCIeP6LzEE)\n\n## Colabs for Calculating FLOPs of Sparse 
Models\n[MobileNet-v1](https://github.com/google-research/rigl/blob/master/rigl/imagenet_resnet/colabs/MobileNet_Counting.ipynb)\n\n[ResNet-50](https://github.com/google-research/rigl/blob/master/rigl/imagenet_resnet/colabs/Resnet_50_Param_Flops_Counting.ipynb)\n\n## Best Sparse Models\nParameters are float, so each parameter is represented with 4 bytes. Uniform\nsparsity distribution keeps first layer dense therefore have slightly larger size\nand parameters. ERK applies to all layers except for 99% sparse model, in which\nwe set the first layer to be dense, since otherwise we observe much worse\nperformance.\n\n### Extended Training Results\nPerformance of RigL increases significantly with extended training iterations.\nIn this section we extend the training of sparse models by 5x. Note that sparse\nmodels require much less FLOPs per training iteration and therefore most of the\nextended trainings cost less FLOPs than baseline dense training.\n\nObserving improving performance we wanted to understand where the performance of sparse networks saturates. Longest training we ran had 100x training length of the original\n100 epoch ImageNet training. This training costs 5.8x of the original dense training FLOPS and the resulting 99% sparse Resnet-50 achieves an impressive 68.15% test accuracy (vs 5x training accuracy of 61.86%).\n\n| S. Distribution |  Sparsity | Training FLOPs | Inference FLOPs | Model Size (Bytes) | Top-1 Acc | Ckpt         |\n|-----------------|-----------|----------------|-----------------|-------------------------------------|-----------|--------------|\n| - (DENSE)       | 0         | 3.2e18         | 8.2e9           | 102.122                             | 76.8      | -            |\n| ERK             | 0.8       | 2.09x          | 0.42x           | 23.683                              | 77.17     | [link](https://storage.googleapis.com/gresearch/rigl/s80erk5x.tar.gz) |\n| Uniform         | 0.8       | 1.14x          | 0.23x           | 23.685                              | 76.71     | [link](https://storage.googleapis.com/gresearch/rigl/s80uniform5x.tar.gz) |\n| ERK             | 0.9       | 1.23x          | 0.24x           | 13.499                              | 76.42     | [link](https://storage.googleapis.com/gresearch/rigl/s90erk5x.tar.gz) |\n| Uniform         | 0.9       | 0.66x          | 0.13x           | 13.532                              | 75.73     | [link](https://storage.googleapis.com/gresearch/rigl/s90uniform5x.tar.gz) |\n| ERK             | 0.95      | 0.63x          | 0.12x           | 8.399                               | 74.63     | [link](https://storage.googleapis.com/gresearch/rigl/s95erk5x.tar.gz) |\n| Uniform         | 0.95      | 0.42x          | 0.08x           | 8.433                               | 73.22     | [link](https://storage.googleapis.com/gresearch/rigl/s95uniform5x.tar.gz) |\n| ERK             | 0.965     | 0.45x          | 0.09x           | 6.904                               | 72.77     | [link](https://storage.googleapis.com/gresearch/rigl/s965erk5x.tar.gz) |\n| Uniform         | 0.965     | 0.34x          | 0.07x           | 6.904                               | 71.31     | [link](https://storage.googleapis.com/gresearch/rigl/s965uniform5x.tar.gz) |\n| ERK             | 0.99      | 0.29x          | 0.05x           | 4.354                    | 61.86     | [link](https://storage.googleapis.com/gresearch/rigl/s99erk5x.tar.gz) |\n| ERK             | 0.99  | 0.58x          | 0.05x           | 4.354                               | 
### Extended Training Results
The performance of RigL improves significantly with extended training. In this
section we extend the training of sparse models by 5x. Note that sparse models
require far fewer FLOPs per training iteration, so most of these extended runs
still cost fewer FLOPs than baseline dense training.

Seeing performance keep improving, we wanted to understand where the
performance of sparse networks saturates. The longest run we did was 100x the
length of the original 100-epoch ImageNet training. It costs 5.8x the FLOPs of
the original dense training, and the resulting 99% sparse ResNet-50 reaches an
impressive 68.15% top-1 accuracy (vs. 61.86% for the 5x run).

| S. Distribution |  Sparsity | Training FLOPs | Inference FLOPs | Model Size (MB) | Top-1 Acc | Ckpt         |
|-----------------|-----------|----------------|-----------------|-----------------|-----------|--------------|
| - (DENSE)       | 0         | 3.2e18         | 8.2e9           | 102.122         | 76.8      | -            |
| ERK             | 0.8       | 2.09x          | 0.42x           | 23.683          | 77.17     | [link](https://storage.googleapis.com/gresearch/rigl/s80erk5x.tar.gz) |
| Uniform         | 0.8       | 1.14x          | 0.23x           | 23.685          | 76.71     | [link](https://storage.googleapis.com/gresearch/rigl/s80uniform5x.tar.gz) |
| ERK             | 0.9       | 1.23x          | 0.24x           | 13.499          | 76.42     | [link](https://storage.googleapis.com/gresearch/rigl/s90erk5x.tar.gz) |
| Uniform         | 0.9       | 0.66x          | 0.13x           | 13.532          | 75.73     | [link](https://storage.googleapis.com/gresearch/rigl/s90uniform5x.tar.gz) |
| ERK             | 0.95      | 0.63x          | 0.12x           | 8.399           | 74.63     | [link](https://storage.googleapis.com/gresearch/rigl/s95erk5x.tar.gz) |
| Uniform         | 0.95      | 0.42x          | 0.08x           | 8.433           | 73.22     | [link](https://storage.googleapis.com/gresearch/rigl/s95uniform5x.tar.gz) |
| ERK             | 0.965     | 0.45x          | 0.09x           | 6.904           | 72.77     | [link](https://storage.googleapis.com/gresearch/rigl/s965erk5x.tar.gz) |
| Uniform         | 0.965     | 0.34x          | 0.07x           | 6.904           | 71.31     | [link](https://storage.googleapis.com/gresearch/rigl/s965uniform5x.tar.gz) |
| ERK             | 0.99      | 0.29x          | 0.05x           | 4.354           | 61.86     | [link](https://storage.googleapis.com/gresearch/rigl/s99erk5x.tar.gz) |
| ERK             | 0.99      | 0.58x          | 0.05x           | 4.354           | 63.89     | [link](https://storage.googleapis.com/gresearch/rigl/s99erk10x.tar.gz) |
| ERK             | 0.99      | 2.32x          | 0.05x           | 4.354           | 66.94     | [link](https://storage.googleapis.com/gresearch/rigl/s99erk40x.tar.gz) |
| ERK             | **0.99**  | 5.8x           | 0.05x           | 4.354           | **68.15** | [link](https://storage.googleapis.com/gresearch/rigl/s99erk100x.tar.gz) |

We also ran extended training with MobileNet-v1. Even training 100x longer, we
were not able to saturate performance: training longer consistently gave
better results.

| S. Distribution |  Sparsity | Training FLOPs | Inference FLOPs | Model Size (MB) | Top-1 Acc | Ckpt         |
|-----------------|-----------|----------------|-----------------|-----------------|-----------|--------------|
| - (DENSE)       | 0         | 4.5e17         | 1.14e9          | 16.864          | 72.1      | -            |
| ERK             | 0.89      | 1.39x          | 0.21x           | 2.392           | 69.31     | [link](https://storage.googleapis.com/gresearch/rigl/mbv1_s90_erk10x.tar.gz) |
| ERK             | 0.89      | 2.79x          | 0.21x           | 2.392           | 70.63     | [link](https://storage.googleapis.com/gresearch/rigl/mbv1_s90_erk50x.tar.gz) |
| Uniform         | 0.89      | 1.25x          | 0.09x           | 2.392           | 69.28     | [link](https://storage.googleapis.com/gresearch/rigl/mbv1_s90_uniform10x.tar.gz) |
| Uniform         | 0.89      | 6.25x          | 0.09x           | 2.392           | 70.25     | [link](https://storage.googleapis.com/gresearch/rigl/mbv1_s90_uniform50x.tar.gz) |
| Uniform         | 0.89      | 12.5x          | 0.09x           | 2.392           | 70.59     | [link](https://storage.googleapis.com/gresearch/rigl/mbv1_s90_uniform100x.tar.gz) |

### 1x Training Results

| S. Distribution |  Sparsity | Training FLOPs | Inference FLOPs | Model Size (MB) | Top-1 Acc | Ckpt         |
|-----------------|-----------|----------------|-----------------|-----------------|-----------|--------------|
| ERK             | 0.8       | 0.42x          | 0.42x           | 23.683          | 75.12     | [link](https://storage.googleapis.com/gresearch/rigl/s80erk1x.tar.gz) |
| Uniform         | 0.8       | 0.23x          | 0.23x           | 23.685          | 74.60     | [link](https://storage.googleapis.com/gresearch/rigl/s80uniform1x.tar.gz) |
| ERK             | 0.9       | 0.24x          | 0.24x           | 13.499          | 73.07     | [link](https://storage.googleapis.com/gresearch/rigl/s90erk1x.tar.gz) |
| Uniform         | 0.9       | 0.13x          | 0.13x           | 13.532          | 72.02     | [link](https://storage.googleapis.com/gresearch/rigl/s90uniform1x.tar.gz) |

### Results w/o label smoothing

| S. Distribution |  Sparsity | Training FLOPs | Inference FLOPs | Model Size (MB) | Top-1 Acc | Ckpt         |
|-----------------|-----------|----------------|-----------------|-----------------|-----------|--------------|
| ERK             | 0.8       | 0.42x          | 0.42x           | 23.683          | 75.02     | [link](https://storage.googleapis.com/gresearch/rigl/S80erk_nolabelsmooth_1x.tar.gz) |
| ERK             | 0.8       | 2.09x          | 0.42x           | 23.683          | 76.17     | [link](https://storage.googleapis.com/gresearch/rigl/S80erk_nolabelsmooth_5x.tar.gz) |
| ERK             | 0.9       | 0.24x          | 0.24x           | 13.499          | 73.4      | [link](https://storage.googleapis.com/gresearch/rigl/S90erk_nolabelsmooth_1x.tar.gz) |
| ERK             | 0.9       | 1.23x          | 0.24x           | 13.499          | 75.9      | [link](https://storage.googleapis.com/gresearch/rigl/S90erk_nolabelsmooth_5x.tar.gz) |
| ERK             | 0.95      | 0.13x          | 0.12x           | 8.399           | 70.39     | [link](https://storage.googleapis.com/gresearch/rigl/S95erk_nolabelsmooth_1x.tar.gz) |
| ERK             | 0.95      | 0.63x          | 0.12x           | 8.399           | 74.36     | [link](https://storage.googleapis.com/gresearch/rigl/S95erk_nolabelsmooth_5x.tar.gz) |

### Evaluating checkpoints
Download the checkpoints and run the evaluation on ERK checkpoints with the
following:

```bash
python imagenet_train_eval.py --mode=eval_once --output_dir=path/to/ckpt/folder \
    --eval_once_ckpt_prefix=model.ckpt-3200000 --use_folder_stub=False \
    --training_method=rigl --mask_init_method=erdos_renyi_kernel \
    --first_layer_sparsity=-1
```

When evaluating checkpoints with the uniform sparsity distribution, use
`--mask_init_method=random` and `--first_layer_sparsity=0`. Set
`--model_architecture=mobilenet_v1` when evaluating MobileNet checkpoints.

## Sparse Training Algorithms
In this repository we implement the following dynamic sparsity strategies:

1.  [SET](https://www.nature.com/articles/s41467-018-04316-3): Implements Sparse
    Evolutionary Training (SET), which replaces low-magnitude connections with
    randomly selected new ones.

2.  [SNFS](https://arxiv.org/abs/1907.04840): Implements momentum-based training
    *without* sparsity re-distribution.

3.  [RigL](https://arxiv.org/abs/1911.11134): Our method, RigL, removes a
    fraction of connections based on weight magnitudes and activates new ones
    using instantaneous gradient information (a minimal sketch of one update
    step follows this list).
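Concretely, one RigL mask update drops the active connections with the smallest weight magnitude and regrows the same number of inactive connections where the dense gradient magnitude is largest. A minimal NumPy sketch for a single weight tensor, assuming 0/1 masks and a fixed drop fraction (in the actual method the fraction is cosine-annealed and updates happen only every ΔT steps):

```python
import numpy as np

def rigl_update(weights, mask, grads, drop_fraction=0.3):
    """One RigL mask update (sketch) for a single weight tensor.

    `mask` is a 0/1 array with the same shape as `weights`; `grads` are
    dense gradients, which RigL only needs at the infrequent update steps.
    """
    old_mask = mask.astype(bool)
    n_update = int(drop_fraction * old_mask.sum())

    # Drop: deactivate the active connections with the smallest |weight|.
    drop_scores = np.where(old_mask, np.abs(weights), np.inf)
    mask.flat[np.argsort(drop_scores, axis=None)[:n_update]] = 0

    # Grow: activate the previously-inactive connections with the largest
    # |gradient| (gradient-based regrowth, rather than random regrowth,
    # is what distinguishes RigL from SET). New weights start at zero.
    grow_scores = np.where(old_mask, -np.inf, np.abs(grads))
    grow_idx = np.argsort(grow_scores, axis=None)[-n_update:]
    mask.flat[grow_idx] = 1
    weights.flat[grow_idx] = 0.0
    return weights, mask
```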
And the following one-shot pruning algorithm:

1. [SNIP](https://arxiv.org/abs/1810.02340): Single-shot Network Pruning based
   on connection sensitivity prunes the least salient connections before
   training (sketched below).
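SNIP scores each connection by |grad * weight|, the magnitude of the gradient of the loss with respect to a per-connection gate, computed on a single mini-batch before training, and prunes the lowest-scoring connections in one shot. A small sketch for a single weight tensor (the paper applies one global threshold across all layers):

```python
import numpy as np

def snip_mask(weights, grads, sparsity):
    """One-shot SNIP mask (sketch): keep the (1 - sparsity) fraction of
    connections with the largest saliency |grad * weight|, where the
    gradients come from one mini-batch taken before training."""
    saliency = np.abs(weights * grads)
    k = max(1, int(round((1.0 - sparsity) * saliency.size)))  # kept count
    threshold = np.sort(saliency, axis=None)[-k]
    return (saliency >= threshold).astype(weights.dtype)
```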
We have code for the following settings:
- [Imagenet2012](https://github.com/google-research/rigl/tree/master/rigl/imagenet_resnet):
  TPU-compatible code with ResNet-50 and MobileNet-v1/v2.
- [CIFAR-10](https://github.com/google-research/rigl/tree/master/rigl/cifar_resnet)
  with WideResNets.
- [MNIST](https://github.com/google-research/rigl/tree/master/rigl/mnist) with
  a 2-layer fully connected network.

## Setup
First clone this repo.
```bash
git clone https://github.com/google-research/rigl.git
cd rigl
```

We use the [NeurIPS 2019 MicroNet Challenge](https://micronet-challenge.github.io/)
code for counting the operations and size of our networks. Clone the
google-research repo and add the current folder to the Python path.
```bash
git clone https://github.com/google-research/google-research.git
mv google-research/ google_research/
export PYTHONPATH=$PYTHONPATH:$PWD
```

Now we can run some tests. The following script creates a virtual environment,
installs the necessary libraries, and runs a few tests.
```bash
bash run.sh
```

We need to activate the virtual environment before running an experiment. With
that, we are ready to run some trivial MNIST experiments.
```bash
source env/bin/activate

python rigl/mnist/mnist_train_eval.py
```

You can load and verify the performance of the ResNet-50 checkpoints as
follows.
```bash
python rigl/imagenet_resnet/imagenet_train_eval.py --mode=eval_once --training_method=baseline --eval_batch_size=100 --output_dir=/path/to/folder --eval_once_ckpt_prefix=s80_model.ckpt-1280000 --use_folder_stub=False
```

We use the [Official TPU Code](https://github.com/tensorflow/tpu/tree/master/models/official/resnet)
for loading ImageNet data. First clone the tensorflow/tpu repo, then add the
models/ folder to the Python path.
```bash
git clone https://github.com/tensorflow/tpu.git
export PYTHONPATH=$PYTHONPATH:$PWD/tpu/models/
```

## Other Implementations
- [Graphcore-TF-MNIST](https://github.com/graphcore/examples/tree/master/applications/tensorflow/dynamic_sparsity/mnist_rigl): with sparse matrix ops!
- [PyTorch implementation](https://github.com/McCrearyD/rigl-torch) by Dyllan McCreary.
- [Micrograd-Pure Python](https://evcu.github.io/ml/sparse-micrograd/): a toy
  example with a pure-Python sparse implementation. Caution: very slow, but fun.

## Citation
```
@incollection{rigl,
 author = {Evci, Utku and Gale, Trevor and Menick, Jacob and Castro, Pablo Samuel and Elsen, Erich},
 booktitle = {Proceedings of Machine Learning and Systems 2020},
 pages = {471--481},
 title = {Rigging the Lottery: Making All Tickets Winners},
 year = {2020}
}
```
## Disclaimer
This is not an official Google product.