https://github.com/zjykzj/pytorch-distributed
demo for pytorch-distributed
- Host: GitHub
- URL: https://github.com/zjykzj/pytorch-distributed
- Owner: zjykzj
- License: apache-2.0
- Created: 2020-09-14T12:12:02.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2022-02-09T15:02:43.000Z (about 4 years ago)
- Last Synced: 2025-06-13T11:50:20.199Z (11 months ago)
- Topics: amp, distributed, distributed-data-parallel, distributeddataparallel, hybrid-precision-training, mixed-precision-training, pytorch
- Language: Python
- Size: 29.3 KB
- Stars: 4
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
«pytorch-distributed» uses PyTorch DistributedDataParallel to implement distributed training and AMP (automatic mixed precision) to implement mixed-precision training.
***At present, only single-machine, multi-GPU scenarios are considered.***
## Table of Contents
- [Table of Contents](#table-of-contents)
- [Background](#background)
- [Install](#install)
- [Usage](#usage)
- [Maintainers](#maintainers)
- [Thanks](#thanks)
- [Contributing](#contributing)
- [License](#license)
## Background
Distributed training makes full use of the computing power of multiple GPUs, training better model parameters faster. Mixed-precision training, meanwhile, both speeds up training and reduces memory usage during the training stage, which allows larger batch sizes.
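For context, a minimal mixed-precision training step with `torch.cuda.amp` might look like the sketch below; the toy model, optimizer, and random data are illustrative placeholders, not code from this repository:
```
# A minimal mixed-precision training step with torch.cuda.amp (a sketch).
# The model, optimizer, and data below are illustrative placeholders.
import torch

model = torch.nn.Linear(128, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = torch.nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()  # scales the loss to avoid fp16 gradient underflow

for _ in range(10):  # toy training loop on random data
    inputs = torch.randn(32, 128, device="cuda")
    targets = torch.randint(0, 10, (32,), device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():  # forward pass runs in mixed precision
        loss = criterion(model(inputs), targets)
    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(optimizer)         # unscales gradients, then calls optimizer.step()
    scaler.update()                # adjusts the scale factor for the next step
```
The `GradScaler` is what permits fp16 arithmetic without losing small gradient values: the loss is scaled up before backward and the gradients are unscaled before the optimizer step.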
## Install
```
$ pip install -r requirements.txt
```
## Usage
At present, four training scenarios are implemented (a multi-GPU sketch follows the list):
* Single-GPU training
* Multi-GPU training
* Single-GPU mixed-precision training
* Multi-GPU mixed-precision training
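For the multi-GPU scenarios, a minimal single-machine DistributedDataParallel setup launched with `torch.multiprocessing.spawn` might look like the following sketch; the port, toy model, and one-step loop are illustrative assumptions, not this repository's code:
```
# A minimal single-machine, multi-GPU DDP sketch (one process per GPU).
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

def worker(rank, world_size):
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"  # arbitrary free port
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    # Wrap the model so gradients are synchronized across ranks.
    model = DDP(torch.nn.Linear(128, 10).cuda(rank), device_ids=[rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    # One toy step; a real run would loop over a DistributedSampler-backed loader.
    inputs = torch.randn(32, 128, device=rank)
    targets = torch.randint(0, 10, (32,), device=rank)
    loss = torch.nn.functional.cross_entropy(model(inputs), targets)
    optimizer.zero_grad()
    loss.backward()  # gradients are all-reduced across ranks here
    optimizer.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    mp.spawn(worker, args=(world_size,), nprocs=world_size)
```
Combining this with the `GradScaler`/`autocast` pattern shown under Background gives the fourth scenario, multi-GPU mixed-precision training. In a real run, a `DistributedSampler` would also shard the dataset so each rank sees a different subset per epoch.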
## Maintainers
* zhujian - *Initial work* - [zjykzj](https://github.com/zjykzj)
## Thanks
* [examples/mnist_hogwild](https://github.com/pytorch/examples/tree/master/mnist_hogwild)
* [Distributed data parallel training in Pytorch](https://yangkky.github.io/2019/07/08/distributed-pytorch-tutorial.html)
* [tczhangzhi/pytorch-distributed](https://github.com/tczhangzhi/pytorch-distributed)
## Contributing
Anyone is welcome to participate! Open an [issue](https://github.com/zjykzj/pytorch-distributed/issues) or submit a PR.
Small note:
* Git commits should comply with the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0-beta.4/) specification
* If versioned, please conform to the [Semantic Versioning 2.0.0](https://semver.org) specification
* If editing the README, please conform to the [standard-readme](https://github.com/RichardLitt/standard-readme) specification.
## License
[Apache License 2.0](LICENSE) © 2020 zjykzj
