{"id":18646423,"url":"https://github.com/ucbrise/actnn","last_synced_at":"2025-08-20T01:31:51.330Z","repository":{"id":41082643,"uuid":"360466575","full_name":"ucbrise/actnn","owner":"ucbrise","description":"ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training","archived":false,"fork":false,"pushed_at":"2022-12-22T03:03:50.000Z","size":180,"stargazers_count":201,"open_issues_count":9,"forks_count":30,"subscribers_count":7,"default_branch":"main","last_synced_at":"2024-12-09T00:26:34.529Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ucbrise.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-04-22T09:38:36.000Z","updated_at":"2024-10-18T17:13:23.000Z","dependencies_parsed_at":"2023-01-30T05:31:28.170Z","dependency_job_id":null,"html_url":"https://github.com/ucbrise/actnn","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ucbrise%2Factnn","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ucbrise%2Factnn/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ucbrise%2Factnn/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ucbrise%2Factnn/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ucbrise","download_url":"https://codeload.github.com/ucbrise/actnn/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":230382858,"owners_count":18216854,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-07T06:19:48.600Z","updated_at":"2024-12-19T05:08:02.297Z","avatar_url":"https://github.com/ucbrise.png","language":"Python","readme":"# ActNN : Activation Compressed Training\n\nThis is the official project repository for [ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training](https://arxiv.org/abs/2104.14129) by\nJianfei Chen\\*, Lianmin Zheng\\*, Zhewei Yao, Dequan Wang, Ion Stoica, Michael W. Mahoney, and Joseph E. Gonzalez.\n\n**TL; DR.**\nActNN is a PyTorch library for memory-efficient training. It reduces the training memory footprint by compressing the saved activations. ActNN is implemented as a collection of memory-saving layers. These layers have an identical interface to their PyTorch counterparts.\n\n## Abstract\nThe increasing size of neural network models has been critical for improvements in their accuracy, but device memory is not growing at the same rate. This creates fundamental challenges for training neural networks within limited memory environments. In this work, we propose ActNN, a memory-efficient training framework that stores randomly quantized activations for back propagation. We prove the convergence of ActNN for general network architectures, and we characterize the impact of quantization on the convergence via an exact expression for the gradient variance. Using our theory, we propose novel mixed-precision quantization strategies that exploit the activation's heterogeneity across feature dimensions, samples, and layers. These techniques can be readily applied to existing dynamic graph frameworks, such as PyTorch, simply by substituting the layers. We evaluate ActNN on mainstream computer vision models for classification, detection, and segmentation tasks. On all these tasks, ActNN compresses the activation to 2 bits on average, with negligible accuracy loss. ActNN reduces the memory footprint of the activation by 12×, and it enables training with a 6.6× to 14× larger batch size.\n\n![mem_speed_r50](mem_speed_benchmark/mem_speed_r50.png)\n*Batch size vs. training throughput on ResNet-50. Red cross mark means out-of-memory. The shaded yellow region denotes the possible batch sizes with full precision training. ActNN achieves significantly larger maximum batch size over other state-of-the-art systems and displays a nontrivial trade-off curve.*\n\n\n## Install\n- Requirements\n```\ntorch\u003e=1.7.1,\u003c=1.8.0\ntorchvision\u003e=0.8.2\nninja\u003e=1.10.1\n```\nGPU and CUDA Toolkit are required.\n\n- Build\n```bash\ngit clone git@github.com:ucbrise/actnn.git\ncd actnn/actnn\npip install -v -e .\n```\n\n## Usage\n[mem_speed_benchmark/train.py](mem_speed_benchmark/train.py) is an example on using ActNN for models from torchvision.\n\n### Basic Usage\n- Step1: Configure the optimization level  \nActNN provides several optimization levels to control the trade-off between memory saving and computational overhead.\nYou can set the optimization level by\n```python\nimport actnn\n# available choices are [\"L0\", \"L1\", \"L2\", \"L3\", \"L4\", \"L5\"]\nactnn.set_optimization_level(\"L3\")\n```\nSee [set_optimization_level](actnn/actnn/conf.py) for more details.\n\n- Step2: Convert the model to use ActNN's layers.  \n```python\nmodel = actnn.QModule(model)\n```\n**Note**:\n1. Convert the model _before_ calling `.cuda()`.\n2. Set the optimization level _before_ invoking `actnn.QModule` or constructing any ActNN layers.\n3. Automatic model conversion only works with standard PyTorch layers.\nPlease use the modules (`nn.Conv2d`, `nn.ReLU`, etc.), not the functions (`F.conv2d`, `F.relu`).  \n\n\n- Step3: Print the model to confirm that all the modules (Conv2d, ReLU, BatchNorm) are correctly converted to ActNN layers.\n```python\nprint(model)    # Should be actnn.QConv2d, actnn.QBatchNorm2d, etc.\n```\n\n\n### Advanced Features\n- Convert the model manually.  \nActNN is implemented as a collection of memory-saving layers, including `actnn.QConv1d, QConv2d, QConv3d, QConvTranspose1d, QConvTranspose2d, QConvTranspose3d,\n    QBatchNorm1d, QBatchNorm2d, QBatchNorm3d, QLinear, QReLU, QSyncBatchNorm, QMaxPool2d`. These layers have identical interface to their PyTorch counterparts.\nYou can construct the model manually using these layers as the building blocks.\nSee `ResNetBuilder` and `resnet_configs` in [image_classification/image_classification/resnet.py](image_classification/image_classification/resnet.py) for example.\n- (Optional) Change the data loader  \nIf you want to use per-sample gradient information for adaptive quantization,\nyou have to update the dataloader to return sample indices.\nYou can see `train_loader` in [mem_speed_benchmark/train.py](mem_speed_benchmark/train.py) for example.\nIn addition, you have to update the configurations.\n```python\nfrom actnn import config, QScheme\nconfig.use_gradient = True\nQScheme.num_samples = 1300000   # the size of training set\n```\nYou can find sample code in the above script.\n- (Beta) Mixed precision training   \nActNN works seamlessly with [Amp](https://github.com/NVIDIA/apex), please see [image_classification](image_classification/) for an example.\n\n## Examples\n\n### Benchmark Memory Usage and Training Speed\nSee [mem_speed_benchmark](mem_speed_benchmark/). Please do NOT measure the memory usage by `nvidia-smi`.\n`nvidia-smi` reports the size of the memory pool allocated by PyTorch, which can be much larger than the size of acutal used memory.\n\n### Image Classification\nSee [image_classification](image_classification/)\n\n### Object Detection, Semantic Segmentation, Self-Supervised Learning, ...\nHere is the example memory-efficient training for ResNet50, built upon the [OpenMMLab](https://openmmlab.com/) toolkits.\nWe use ActNN with the default optimization level (L3).\nOur training runs are available at [Weights \u0026 Biases](https://wandb.ai/actnn).\n\n#### Installation\n\n1. Install [mmcv](https://github.com/DequanWang/actnn-mmcv)\n```bash\nexport MMCV_ROOT=/path/to/clone/actnn-mmcv\ngit clone https://github.com/DequanWang/actnn-mmcv $MMCV_ROOT\ncd $MMCV_ROOT\nMMCV_WITH_OPS=1 MMCV_WITH_ORT=0 pip install -e .\n```\n\n2. Install [mmdet](https://github.com/DequanWang/actnn-mmdet), [mmseg](https://github.com/DequanWang/actnn-mmseg), [mmssl](https://github.com/DequanWang/actnn-mmssl), ...\n```bash\nexport MMDET_ROOT=/path/to/clone/actnn-mmdet\ngit clone https://github.com/DequanWang/actnn-mmdet $MMDET_ROOT\ncd $MMDET_ROOT\npython setup.py develop\n```\n\n```bash\nexport MMSEG_ROOT=/path/to/clone/actnn-mmseg\ngit clone https://github.com/DequanWang/actnn-mmseg $MMSEG_ROOT\ncd $MMSEG_ROOT\npython setup.py develop\n```\n\n```bash\nexport MMSSL_ROOT=/path/to/clone/actnn-mmssl\ngit clone https://github.com/DequanWang/actnn-mmssl $MMSSL_ROOT\ncd $MMSSL_ROOT\npython setup.py develop\n```\n\n#### Single GPU training\n```python\ncd $MMDET_ROOT\npython tools/train.py configs/actnn/faster_rcnn_r50_fpn_1x_coco_1gpu.py\n# https://wandb.ai/actnn/detection/runs/ye0aax5s\n# ActNN mAP 37.4 vs Official mAP 37.4\npython tools/train.py configs/actnn/retinanet_r50_fpn_1x_coco_1gpu.py\n# https://wandb.ai/actnn/detection/runs/1x9cwokw\n# ActNN mAP 36.3 vs Official mAP 36.5\n```\n\n```python\ncd $MMSEG_ROOT\npython tools/train.py configs/actnn/fcn_r50-d8_512x1024_80k_cityscapes_1gpu.py\n# https://wandb.ai/actnn/segmentation/runs/159if8da\n# ActNN mIoU 72.9 vs Official mIoU 73.6\npython tools/train.py configs/actnn/fpn_r50_512x1024_80k_cityscapes_1gpu.py\n# https://wandb.ai/actnn/segmentation/runs/25j9iyv3\n# ActNN mIoU 74.7 vs Official mIoU 74.5\n```\n\n#### Multiple GPUs training\n```python\ncd $MMSSL_ROOT\nbash tools/dist_train.sh configs/selfsup/actnn/moco_r50_v2_bs512_e200_imagenet_2gpu.py 2\n# https://wandb.ai/actnn/mmssl/runs/lokf7ydo\n# https://wandb.ai/actnn/mmssl/runs/2efmbuww\n# ActNN top1 67.3 vs Official top1 67.7\n```\n\nFor more detailed guidance, please refer to the docs of [mmcv](https://github.com/DequanWang/actnn-mmcv), [mmdet](https://github.com/DequanWang/actnn-mmdet), [mmseg](https://github.com/DequanWang/actnn-mmseg), [mmssl](https://github.com/DequanWang/actnn-mmssl).\n\n## FAQ\n1. Does ActNN supports CPU training?  \nCurrently, ActNN only supports CUDA.\n\n2. Accuracy degradation / diverged training with ActNN.  \nActNN applies lossy compression to the activations. In some challenging cases, our default compression strategy might be too aggressive.\nIn this case, you may try more conservative compression strategies (which consume more memory):\n    - 4-bit per-group quantization  \n   ```python\n   actnn.set_optimization_level(\"L2\")\n   ```\n   - 8-bit per-group quantization\n   ```python\n   actnn.set_optimization_level(\"L2\")\n   actnn.config.activation_compression_bits = [8]\n   ```\n    If none of these works, you may report to us by creating an issue.\n\n## Citation\n\nIf the actnn library is helpful in your research, please consider citing our paper:\n\n```bibtex\n@inproceedings{chen2021actnn,\n  title={ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training},\n  author={Chen, Jianfei and Zheng, Lianmin and Yao, Zhewei and Wang, Dequan and Stoica, Ion and Mahoney, Michael W and Gonzalez, Joseph E},\n  booktitle={International Conference on Machine Learning},\n  year={2021}\n}\n```\n","funding_links":[],"categories":["其他_机器学习与深度学习"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fucbrise%2Factnn","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fucbrise%2Factnn","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fucbrise%2Factnn/lists"}