{"id":19066132,"url":"https://github.com/epfml/powersgd","last_synced_at":"2025-04-12T20:42:09.211Z","repository":{"id":40366991,"uuid":"189630233","full_name":"epfml/powersgd","owner":"epfml","description":"Practical low-rank gradient compression for distributed optimization:  https://arxiv.org/abs/1905.13727","archived":false,"fork":false,"pushed_at":"2024-10-29T20:27:53.000Z","size":131,"stargazers_count":147,"open_issues_count":2,"forks_count":32,"subscribers_count":10,"default_branch":"master","last_synced_at":"2025-04-04T00:09:51.952Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/epfml.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-05-31T17:04:01.000Z","updated_at":"2025-03-28T15:11:18.000Z","dependencies_parsed_at":"2024-11-23T02:01:27.504Z","dependency_job_id":"f735b7af-89c3-4d9e-ba0f-167c9b500776","html_url":"https://github.com/epfml/powersgd","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epfml%2Fpowersgd","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epfml%2Fpowersgd/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epfml%2Fpowersgd/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epfml%2Fpowersgd/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/epfml","download_url":"https://codeload.g
ithub.com/epfml/powersgd/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248631668,"owners_count":21136554,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-09T00:54:33.135Z","updated_at":"2025-04-12T20:42:09.189Z","avatar_url":"https://github.com/epfml.png","language":"Python","readme":"# PowerSGD\n\nPractical Low-Rank Gradient Compression for Distributed Optimization\n\n[Video](https://www.youtube.com/watch?v=xVxSu7KGtHw)\n\nAbstract:\nWe study gradient compression methods to alleviate the communication bottleneck in data-parallel distributed optimization. Despite the significant attention received, current compression schemes either do not scale well or fail to achieve the target test accuracy. We propose a new low-rank gradient compressor based on power iteration that can i) compress gradients rapidly, ii) efficiently aggregate the compressed gradients using all-reduce, and iii) achieve test performance on par with SGD. The proposed algorithm is the only method evaluated that achieves consistent wall-clock speedups when benchmarked against regular SGD with an optimized communication backend. 
We demonstrate reduced training times for convolutional networks as well as LSTMs on common datasets.\n\n## Reference implementation\n\nThis is a reference implementation for the PowerSGD algorithm.\n\nInstallation:\n\n```bash\npip install git+https://github.com/epfml/powersgd.git\n```\n\nUsage:\n\n```diff\n+ from powersgd import PowerSGD, Config, optimizer_step\n\n  model = torchvision.models.resnet50(pretrained=True)\n  params = list(model.parameters())\n  optimizer = torch.optim.SGD(params, lr=0.1)\n\n+ powersgd = PowerSGD(params, config=Config(\n+     rank=1,  # lower rank =\u003e more aggressive compression\n+     min_compression_rate=10,  # skip compressing gradients whose compression rate would fall below this\n+     num_iters_per_step=2,  # lower number =\u003e more aggressive compression\n+     start_compressing_after_num_steps=0,\n+ ))\n\n  for each batch:\n      loss = ...\n-     optimizer.zero_grad()\n      loss.backward()\n-     optimizer.step()\n+     optimizer_step(optimizer, powersgd)\n```\n\n## Differences with the paper version\n\nThe version in this code base is a slight improvement over the version in the PowerSGD paper.\nIt looks a bit like Algorithm 2 in [this follow-up paper](https://arxiv.org/pdf/2008.01425.pdf).\n\nWe found that there are two ways to control the approximation quality in PowerSGD: the first is the 'rank' of the approximation, and the second is the 'number of PowerSGD iterations' in between gradient steps, while keeping the rank at 1. Because the cost of orthogonalisation grows as $O(\\text{rank}^2)$, increasing the rank can become inefficient, which leaves the number of iterations as the better knob to tune.\n\nIn the original PowerSGD paper, more iterations only improve the quality of the rank-k approximation, as the approximation converges to the \"best rank k approximation\". 
In the [follow-up paper](https://arxiv.org/pdf/2008.01425.pdf), intermediate results from these rank-1 power iterations are all used and communicated, effectively increasing the rank as the number of iterations grows.\n\nIn the original PowerSGD paper, we used two iterations per SGD step (a left and a right iteration). In this setting, there is not much of a difference. The difference appears when you use more power iteration steps per SGD step.\n\n## PyTorch implementation\n\nPyTorch features an implementation of PowerSGD as a [communication hook](https://pytorch.org/docs/stable/ddp_comm_hooks.html) for `DistributedDataParallel` models.\nBecause of the integration with DDP, the code is more involved than the code in this repository.\n\n## Research code\n\nResearch code for the experiments in the [PowerSGD paper](https://arxiv.org/abs/1905.13727) is located under [paper-code](./paper-code/README.md).\n\n## Selected follow-up work\n- [(Cho et al., 2019)](http://learningsys.org/neurips19/assets/papers/1_CameraReadySubmission_mlsys_grz_camera_ready.pdf) concurrently developed an algorithm that is fundamentally very similar to PowerSGD.\n- [(Ramesh et al., 2021 - DALL-E)](https://arxiv.org/abs/2102.12092) share valuable recommendations for using PowerSGD in large-scale transformer training.\n- [(Agarwal et al., 2020)](https://arxiv.org/pdf/2010.16248.pdf) share insights into adaptive compression with PowerSGD.\n- [(Vogels et al., 2020)](https://arxiv.org/abs/2008.01425) adapt PowerSGD to work in a decentralized setting (with sparse connectivity between workers).\n- [(Wang, 2021)](https://medium.com/pytorch/accelerating-pytorch-ddp-by-10x-with-powersgd-585aef12881d) introduces a variation of PowerSGD and describes his experience with PowerSGD on large language models.\n- [(Song et al., 2023)](https://arxiv.org/abs/2301.09830) utilizes PowerSGD (and a slight variant) to compress pipeline- and data-parallelism gradients in 3D parallelism-based LLM training.\n- (Please 
submit a PR if you want your work to be included here.)\n\n# Reference\n\nIf you use this code, please cite the following [paper](https://arxiv.org/abs/1905.13727):\n\n    @inproceedings{vkj2019powersgd,\n      author = {Vogels, Thijs and Karimireddy, Sai Praneeth and Jaggi, Martin},\n      title = \"{{PowerSGD}: Practical Low-Rank Gradient Compression for Distributed Optimization}\",\n      booktitle = {NeurIPS 2019 - Advances in Neural Information Processing Systems},\n      year = 2019,\n      url = {https://arxiv.org/abs/1905.13727}\n    }\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fepfml%2Fpowersgd","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fepfml%2Fpowersgd","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fepfml%2Fpowersgd/lists"}