{"id":13592221,"url":"https://github.com/kakaobrain/torchgpipe","last_synced_at":"2025-05-15T18:04:06.702Z","repository":{"id":42425405,"uuid":"185968705","full_name":"kakaobrain/torchgpipe","owner":"kakaobrain","description":"A GPipe implementation in PyTorch","archived":false,"fork":false,"pushed_at":"2024-07-25T10:55:29.000Z","size":460,"stargazers_count":840,"open_issues_count":9,"forks_count":99,"subscribers_count":32,"default_branch":"master","last_synced_at":"2025-05-15T18:04:02.129Z","etag":null,"topics":["checkpointing","deep-learning","gpipe","model-parallelism","parallelism","pipeline-parallelism","pytorch"],"latest_commit_sha":null,"homepage":"https://torchgpipe.readthedocs.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kakaobrain.png","metadata":{"files":{"readme":"README.ko.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-05-10T10:25:41.000Z","updated_at":"2025-05-12T04:56:09.000Z","dependencies_parsed_at":"2024-11-24T16:00:43.305Z","dependency_job_id":null,"html_url":"https://github.com/kakaobrain/torchgpipe","commit_stats":{"total_commits":334,"total_committers":8,"mean_commits":41.75,"dds":0.5239520958083832,"last_synced_commit":"a1b4ee25574864e7650e7905a69ce156da9752ec"},"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kakaobrain%2Ftorchgpipe","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kakaobrain%2Ftorchgpipe/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitH
ub/repositories/kakaobrain%2Ftorchgpipe/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kakaobrain%2Ftorchgpipe/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kakaobrain","download_url":"https://codeload.github.com/kakaobrain/torchgpipe/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254394720,"owners_count":22063984,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["checkpointing","deep-learning","gpipe","model-parallelism","parallelism","pipeline-parallelism","pytorch"],"created_at":"2024-08-01T16:01:07.050Z","updated_at":"2025-05-15T18:04:06.675Z","avatar_url":"https://github.com/kakaobrain.png","language":"Python","funding_links":[],"categories":["Pipeline Parallelism or Inter-layer Model Parallelism only:","Frameworks","分布式机器学习","Pytorch \u0026 related libraries｜Pytorch \u0026 相关库","Pytorch \u0026 related libraries","Python","Open Source Projects"],"sub_categories":["Survey","Other libraries｜其他库:","Other libraries:","3. 
Open Source Auto-Parallelism Framework"],"readme":"# torchgpipe \u003cimg src=\"docs/_static/not-pipe.svg\" height=\"20\" /\u003e\n\n[![PyPI](https://img.shields.io/pypi/v/torchgpipe.svg)](https://pypi.org/project/torchgpipe)\n[![Build Status](https://travis-ci.org/kakaobrain/torchgpipe.svg?branch=master)](https://travis-ci.org/kakaobrain/torchgpipe)\n[![Coverage Status](https://coveralls.io/repos/github/KakaoBrain/torchgpipe/badge.svg?branch=master)](https://coveralls.io/github/KakaoBrain/torchgpipe?branch=master)\n[![Documentation Status](https://readthedocs.org/projects/torchgpipe/badge/?version=latest)](https://torchgpipe.readthedocs.io/en/latest/?badge=latest)\n[![English README](https://img.shields.io/badge/readme-english-blue.svg)](README.md)\n\nA [GPipe](https://arxiv.org/abs/1811.06965) implementation for PyTorch. It uses\nCUDA rather than TPU.\n\n```python\nfrom torch import nn\nfrom torchgpipe import GPipe\n\n# a, b, c, d are placeholder nn.Module layers.\nmodel = nn.Sequential(a, b, c, d)\nmodel = GPipe(model, balance=[1, 1, 1, 1], chunks=8)\noutput = model(input)\n```\n\n## What is GPipe?\n\nGPipe is a training technique published by Google Brain that is useful for\nefficiently training large, memory-hungry models. According to the benchmarks in\nGoogle's paper, it can train a 25 times larger model on 8 times as many devices\n(TPUs) as the baseline, and train 3.5 times faster on 4 times as many devices.\n\n[GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism](https://arxiv.org/abs/1811.06965)\n\nGoogle used GPipe to train an AmoebaNet-B model with about 560 million\nparameters. That model reached 84.3% top-1 and 97.0% top-5 accuracy on ImageNet,\nthe state of the art as of May 2019.\n\nGPipe trains models as large as possible by combining two techniques, pipeline\nparallelism and checkpointing.\n\n\u003cdl\u003e\n\u003cdt\u003ePipeline Parallelism\u003c/dt\u003e\n\u003cdd\u003eGPipe first splits the model into several partitions and places each on a\n    different device so that more memory is available. Then, to keep the\n    partitions running in parallel as much as possible, it splits each input\n    mini-batch into several micro-batches and streams them through the\n    model.\u003c/dd\u003e\n\n\u003cdt\u003eCheckpointing\u003c/dt\u003e\n\u003cdd\u003eEach partition is checkpointed to maximize available memory. During\n    forward propagation, only the inputs and outputs at partition boundaries\n    are kept, and the hidden layers inside each partition are discarded. 
The discarded hidden layers are recomputed during\n    backpropagation.\u003c/dd\u003e\n\u003c/dl\u003e\n\n## Usage\n\ntorchgpipe currently supports the following environments:\n\n- Python 3.6 or later\n- PyTorch 1.1 or later\n\nFirst, install `torchgpipe` from PyPI:\n\n```sh\n$ pip install torchgpipe\n```\n\nWrapping any `nn.Sequential` module with `torchgpipe.GPipe` applies GPipe to it.\nThe `balance` argument sets the number of layers in each partition. The `chunks`\nargument sets the number of micro-batches. The inputs and outputs of the module,\nand of every partition boundary, must all be a `Tensor` or a\n`Tuple[Tensor, ...]`.\n\nThe following example splits a 4-layer module into 4 partitions of one layer\neach, with the number of micro-batches set to 8:\n\n```python\nfrom torch import nn\nfrom torchgpipe import GPipe\n\n# a, b, c, d are placeholder nn.Module layers; data_loader yields input batches.\nmodel = nn.Sequential(a, b, c, d)\nmodel = GPipe(model, balance=[1, 1, 1, 1], chunks=8)\n\nfor input in data_loader:\n    output = model(input)\n```\n\n## Documentation\n\nDetailed documentation, including the API reference, is available at\n[torchgpipe.readthedocs.io][rtd].\n\n[rtd]: https://torchgpipe.readthedocs.io/\n\n## Benchmarks\n\nDetails of each benchmark, along with additional benchmarks, are available at\n[torchgpipe.readthedocs.io][rtd-benchmarks].\n\n[rtd-benchmarks]: https://torchgpipe.readthedocs.io/en/stable/benchmarks.html\n\n### ResNet-101 Accuracy Benchmark\n\nBatch size | torchgpipe | nn.DataParallel | Goyal et al.\n---------- | ---------: | --------------: | -----------:\n256        | 21.99±0.13 |      22.02±0.11 |   22.08±0.06\n1K         | 22.24±0.19 |      22.04±0.24 |          N/A\n4K         | 22.13±0.09 |             N/A |          N/A\n\nTraining with GPipe should not require any additional hyperparameter tuning. To\nverify this, we reproduced the ResNet-101 accuracy (error rate) benchmark\nreported in Table 2(c) of the\n[Accurate, Large Minibatch SGD](https://arxiv.org/abs/1706.02677) paper.\n\n### U-Net (B, C) Memory Benchmark\n\nExperiment | U-Net (B, C) | Parameters | Memory usage\n---------- | ------------ | ---------: | -----------:\nbaseline   | (6, 72)      |     362.2M |     20.3 GiB\npipeline-1 | (11, 128)    |      2.21B |     20.5 GiB\npipeline-2 | (24, 128)    |      4.99B |     43.4 GiB\npipeline-4 | (24, 160)    |      7.80B |     79.1 GiB\npipeline-8 | (48, 160)    |     15.82B |    154.1 GiB\n\nWe measured how large a U-Net model can be trained with GPipe. 
*baseline* denotes\ntraining without GPipe, and *pipeline-1*, *-2*, *-4*, and *-8* denote training\nwith GPipe on that number of GPUs.\n\nThis benchmark uses a simplified U-Net architecture. The model size is set by the\nhyperparameters B and C, which are proportional to the number of layers and the\nnumber of filters, respectively.\n\n### U-Net (5, 64) Speed Benchmark\n\nExperiment | Throughput | Speed up\n---------- | ---------: | -------:\nbaseline   |   28.500/s |       1×\npipeline-1 |   24.456/s |   0.858×\npipeline-2 |   35.502/s |   1.246×\npipeline-4 |   67.042/s |   2.352×\npipeline-8 |   88.497/s |   3.105×\n\nThe U-Net architecture contains several long skip connections. To verify\nefficiency in the presence of skip connections, we measured U-Net throughput for\ndifferent numbers of GPUs.\n\n### AmoebaNet-D (18, 256) Speed Benchmark\n\nExperiment | Throughput | torchgpipe | Huang et al.\n---------- | ---------: | ---------: | -----------:\nn=2, m=1   |   26.733/s |         1× |           1×\nn=2, m=4   |   41.133/s |     1.546× |        1.07×\nn=2, m=32  |   47.386/s |     1.780× |        1.21×\nn=4, m=1   |   26.827/s |     1.006× |        1.13×\nn=4, m=4   |   44.543/s |     1.680× |        1.26×\nn=4, m=32  |   72.412/s |     2.711× |        1.84×\nn=8, m=1   |   24.918/s |     0.932× |        1.38×\nn=8, m=4   |   70.065/s |     2.625× |        1.72×\nn=8, m=32  |  132.413/s |     4.966× |        3.48×\n\n(*n*: number of partitions, *m*: number of micro-batches)\n\nWe reproduced the AmoebaNet-D (18, 256) training speed benchmark reported in\nTable 2 of the [GPipe](https://arxiv.org/abs/1811.06965) paper. The paper's *K*\nis written as *n* here.\n\n## Notes\n\nThis project works as its developers intend, but its interface is not yet\nfinalized. 
Until v0.1.0, the public API may change without warning.\n\n## Authors and Licensing\n\nThe torchgpipe project is developed by [Heungsub Lee][], [Myungryong Jeong][],\nand [Chiheon Kim][] at [Kakao Brain][], with help from [Sungbin Lim][],\n[Ildoo Kim][], [Woonhyuk Baek][], and [Boogeon Yoon][]. It is distributed under\nthe [BSD-3-Clause license](LICENSE).\n\n[Kakao Brain]: https://kakaobrain.com/\n[Heungsub Lee]: https://subl.ee/\n[Myungryong Jeong]: https://github.com/mrJeong\n[Chiheon Kim]: https://github.com/chiheonk\n[Sungbin Lim]: https://github.com/sungbinlim\n[Ildoo Kim]: https://github.com/ildoonet\n[Woonhyuk Baek]: https://github.com/wbaek\n[Boogeon Yoon]: https://github.com/bgyoon\n\n## Citation\n\nIf you use this library in research, please cite it with the BibTeX entry below.\n\n```\n@article{kim2020torchgpipe,\n    title={torchgpipe: On-the-fly Pipeline Parallelism for Training Giant Models},\n    author={Chiheon Kim and Heungsub Lee and Myungryong Jeong and Woonhyuk Baek and Boogeon Yoon and Ildoo Kim and Sungbin Lim and Sungwoong Kim},\n    year={2020},\n    eprint={2004.09910},\n    archivePrefix={arXiv}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkakaobrain%2Ftorchgpipe","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkakaobrain%2Ftorchgpipe","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkakaobrain%2Ftorchgpipe/lists"}