https://github.com/zjykzj/pytorch-distributed
demo for pytorch-distributed
- Host: GitHub
- URL: https://github.com/zjykzj/pytorch-distributed
- Owner: zjykzj
- License: apache-2.0
- Created: 2020-09-14T12:12:02.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2022-02-09T15:02:43.000Z (about 4 years ago)
- Last Synced: 2025-06-13T11:50:20.199Z (11 months ago)
- Topics: amp, distributed, distributed-data-parallel, distributeddataparallel, hybrid-precision-training, mixed-precision-training, pytorch
- Language: Python
- Size: 29.3 KB
- Stars: 4
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
«pytorch-distributed» uses PyTorch DistributedDataParallel to implement distributed training and AMP (automatic mixed precision) to implement mixed-precision training.
***At present, only single-machine, multi-GPU scenarios are considered.***
## Table of Contents
- [Table of Contents](#table-of-contents)
- [Background](#background)
- [Install](#install)
- [Usage](#usage)
- [Maintainers](#maintainers)
- [Thanks](#thanks)
- [Contributing](#contributing)
- [License](#license)
## Background
Distributed training makes full use of the computing power of multiple GPUs, training better model parameters faster. Mixed-precision training, meanwhile, both speeds up training and reduces memory usage during the training stage, which allows larger batch sizes.
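For context, a minimal mixed-precision training step with `torch.cuda.amp` might look like the sketch below; the toy model, optimizer, and random data are illustrative placeholders, not code from this repository:
```
# A minimal mixed-precision training step with torch.cuda.amp (a sketch).
# The model, optimizer, and data below are illustrative placeholders.
import torch

model = torch.nn.Linear(128, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = torch.nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()  # scales the loss to avoid fp16 gradient underflow

for _ in range(10):  # toy training loop on random data
    inputs = torch.randn(32, 128, device="cuda")
    targets = torch.randint(0, 10, (32,), device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():  # forward pass runs in mixed precision
        loss = criterion(model(inputs), targets)
    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(optimizer)         # unscales gradients, then calls optimizer.step()
    scaler.update()                # adjusts the scale factor for the next step
```
The `GradScaler` is what permits fp16 arithmetic without losing small gradient values: the loss is scaled up before backward and the gradients are unscaled before the optimizer step.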
## Install
```
$ pip install -r requirements.txt
```
## Usage
At present, four training scenarios are implemented (a multi-GPU sketch follows the list):
* Single-GPU training
* Multi-GPU training
* Single-GPU mixed-precision training
* Multi-GPU mixed-precision training
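For the multi-GPU scenarios, a minimal single-machine DistributedDataParallel setup launched with `torch.multiprocessing.spawn` might look like the following sketch; the port, toy model, and one-step loop are illustrative assumptions, not this repository's code:
```
# A minimal single-machine, multi-GPU DDP sketch (one process per GPU).
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

def worker(rank, world_size):
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"  # arbitrary free port
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    # Wrap the model so gradients are synchronized across ranks.
    model = DDP(torch.nn.Linear(128, 10).cuda(rank), device_ids=[rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    # One toy step; a real run would loop over a DistributedSampler-backed loader.
    inputs = torch.randn(32, 128, device=rank)
    targets = torch.randint(0, 10, (32,), device=rank)
    loss = torch.nn.functional.cross_entropy(model(inputs), targets)
    optimizer.zero_grad()
    loss.backward()  # gradients are all-reduced across ranks here
    optimizer.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    mp.spawn(worker, args=(world_size,), nprocs=world_size)
```
Combining this with the `GradScaler`/`autocast` pattern shown under Background gives the fourth scenario, multi-GPU mixed-precision training. In a real run, a `DistributedSampler` would also shard the dataset so each rank sees a different subset per epoch.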
## Maintainers
* zhujian - *Initial work* - [zjykzj](https://github.com/zjykzj)
## Thanks
* [examples/mnist_hogwild](https://github.com/pytorch/examples/tree/master/mnist_hogwild)
* [Distributed data parallel training in Pytorch](https://yangkky.github.io/2019/07/08/distributed-pytorch-tutorial.html)
* [tczhangzhi/pytorch-distributed](https://github.com/tczhangzhi/pytorch-distributed)
## Contributing
Anyone is welcome to participate! Open an [issue](https://github.com/zjykzj/pytorch-distributed/issues) or submit a PR.
Small note:
* Git commits should comply with the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0-beta.4/) specification
* If versioned, please conform to the [Semantic Versioning 2.0.0](https://semver.org) specification
* If editing the README, please conform to the [standard-readme](https://github.com/RichardLitt/standard-readme) specification.
## License
[Apache License 2.0](LICENSE) © 2020 zjykzj
