{"id":13604230,"url":"https://github.com/MachineLearningSystem/swarm","last_synced_at":"2025-04-11T23:32:06.748Z","repository":{"id":185461989,"uuid":"596341291","full_name":"MachineLearningSystem/swarm","owner":"MachineLearningSystem","description":"Official code for \"SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient\"","archived":false,"fork":true,"pushed_at":"2023-02-01T12:28:53.000Z","size":4325,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2024-11-07T08:42:31.895Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":"yandex-research/swarm","license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MachineLearningSystem.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-02-02T00:59:47.000Z","updated_at":"2023-02-01T12:45:29.000Z","dependencies_parsed_at":null,"dependency_job_id":"6c0817ea-58f5-4959-934c-f939a7a38fdc","html_url":"https://github.com/MachineLearningSystem/swarm","commit_stats":null,"previous_names":["machinelearningsystem/swarm"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MachineLearningSystem%2Fswarm","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MachineLearningSystem%2Fswarm/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MachineLearningSystem%2Fswarm/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MachineLearningSystem%2Fswarm/manifests","owner_url":"https://repos.ecosyste.ms/api/v
1/hosts/GitHub/owners/MachineLearningSystem","download_url":"https://codeload.github.com/MachineLearningSystem/swarm/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248495064,"owners_count":21113561,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T19:00:41.954Z","updated_at":"2025-04-11T23:32:01.734Z","avatar_url":"https://github.com/MachineLearningSystem.png","language":null,"readme":"# SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient\n\n![Illustration of SWARM parallelism](swarm.png)\n\nThis repository contains the code to replicate the experiments of\n[\"SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient\"](https://arxiv.org/abs/2301.11913).\n\n**Note:** this codebase, as well as the project itself, is a work in progress:\ncertain features (e.g., rebalancing between pipeline stages) have not yet been added to the repository, and the paper may be updated as well.\nIn the meantime, you can watch this repository or visit the [repository](https://github.com/bigscience-workshop/petals)\nof [Petals](https://petals.ml/) — a similar project for *inference* of large language models that was inspired by SWARM\nand shares portions of its codebase with it.\n\n# Large-scale experiments and throughput estimation\n\nInstructions to replicate the experiments on large-scale language model pretraining and throughput estimation on\nmultiple preemptible nodes, as well as the prototype implementation of SWARM, are located in\nthe 
[swarm](./swarm) subfolder.\n\n# Bottleneck experiments\n\nInstructions to replicate the compression-aware architecture experiments can be found\nin [bottleneck/README.md](bottleneck/README.md).\n\n# Contacts\n\nFeel free to ask any questions about this work [by email](mailto:mryabinin0@gmail.com).\n\n# References\n\n```\n@misc{ryabinin2023swarm,\n    title={SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient},\n    author={Max Ryabinin and Tim Dettmers and Michael Diskin and Alexander Borzunov},\n    year={2023},\n    eprint={2301.11913},\n    archivePrefix={arXiv},\n    primaryClass={cs.DC}\n}\n```","funding_links":[],"categories":["Paper-Code"],"sub_categories":["Parallellism Training"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMachineLearningSystem%2Fswarm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FMachineLearningSystem%2Fswarm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMachineLearningSystem%2Fswarm/lists"}