https://github.com/feifeibear/improved-baidu-allreduce
- Host: GitHub
- URL: https://github.com/feifeibear/improved-baidu-allreduce
- Owner: feifeibear
- License: apache-2.0
- Created: 2018-01-25T22:34:30.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2018-01-28T04:47:14.000Z (over 7 years ago)
- Last Synced: 2025-01-23T00:41:16.565Z (4 months ago)
- Language: Jupyter Notebook
- Size: 75.2 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# `Improved Allreduce`

Improves the Baidu allreduce for large-scale DNNs.

# `baidu-allreduce`
`baidu-allreduce` is a small C++ library demonstrating the ring allreduce and
ring allgather techniques. The goal is to provide a template for deep learning
framework authors to use when implementing these communication algorithms
within their respective frameworks.

A description of the ring allreduce with its application to deep learning is
available on the [Baidu SVAIL blog](http://research.baidu.com/bringing-hpc-techniques-deep-learning/).
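For intuition, here is a self-contained, single-process C++ sketch of the technique (not code from this repository): it simulates the two phases of a ring allreduce, scatter-reduce followed by allgather, over in-memory buffers that stand in for MPI ranks. The rank count, chunk layout, and initial values are arbitrary.

```c++
// Illustrative only: a single-process simulation of the ring allreduce,
// with std::vector buffers standing in for the per-rank CPU/GPU buffers.
#include <cstdio>
#include <vector>

int main() {
    const int ranks = 4;            // simulated processes in the ring
    const int length = 8;           // elements per rank; divisible by `ranks` here
    const int chunk = length / ranks;

    // Each simulated rank starts with its own data: rank r holds (r + 1) everywhere.
    std::vector<std::vector<float>> buf(ranks, std::vector<float>(length));
    for (int r = 0; r < ranks; ++r)
        for (int i = 0; i < length; ++i)
            buf[r][i] = float(r + 1);

    // Phase 1: scatter-reduce. In step s, rank r sends chunk (r - s) mod ranks to
    // rank r + 1, which adds it into its own copy. After ranks - 1 steps, rank r
    // holds the fully reduced chunk (r + 1) mod ranks.
    for (int s = 0; s < ranks - 1; ++s)
        for (int r = 0; r < ranks; ++r) {
            const int c = ((r - s) % ranks + ranks) % ranks;
            const int dst = (r + 1) % ranks;
            for (int i = 0; i < chunk; ++i)
                buf[dst][c * chunk + i] += buf[r][c * chunk + i];
        }

    // Phase 2: allgather. In step s, rank r forwards the already-reduced chunk
    // (r + 1 - s) mod ranks to rank r + 1, which overwrites its own copy.
    for (int s = 0; s < ranks - 1; ++s)
        for (int r = 0; r < ranks; ++r) {
            const int c = ((r + 1 - s) % ranks + ranks) % ranks;
            const int dst = (r + 1) % ranks;
            for (int i = 0; i < chunk; ++i)
                buf[dst][c * chunk + i] = buf[r][c * chunk + i];
        }

    // Every simulated rank now holds the elementwise sum 1 + 2 + 3 + 4 = 10.
    for (int r = 0; r < ranks; ++r) {
        std::printf("rank %d:", r);
        for (int i = 0; i < length; ++i) std::printf(" %g", buf[r][i]);
        std::printf("\n");
    }
    return 0;
}
```

In the library itself the same exchange pattern runs between real processes using MPI point-to-point communication, with CUDA device memory when a GPU is selected.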
## Installation

**Prerequisites:** Before compiling `baidu-allreduce`, make sure you have
installed CUDA (7.5 or greater) and an MPI implementation.

`baidu-allreduce` has been tested with [OpenMPI](https://www.open-mpi.org/),
but should work with any CUDA-aware MPI implementation, such as MVAPICH.

To compile `baidu-allreduce`, run
```bash
# Modify MPI_ROOT to point to your installation of MPI.
# You should see $MPI_ROOT/include/mpi.h and $MPI_ROOT/lib/libmpi.so.
# Modify CUDA_ROOT to point to your installation of CUDA.
make MPI_ROOT=/usr/lib/openmpi CUDA_ROOT=/path/to/cuda/lib64
```

You may need to modify your `LD_LIBRARY_PATH` environment variable to point to
your MPI implementation as well as your CUDA libraries.

To run the `baidu-allreduce` tests after compiling it, run
```bash
# On CPU.
mpirun --np 3 allreduce-test cpu

# On GPU. Requires a CUDA-aware MPI implementation.
mpirun --np 3 allreduce-test gpu
```

## Interface
The `baidu-allreduce` library provides the following C++ functions:
```c++
// Initialize the library, including MPI and if necessary the CUDA device.
// If device == NO_DEVICE, no GPU is used; otherwise, the device specifies which CUDA
// device should be used. All data passed to other functions must be on that device.
#define NO_DEVICE -1
void InitCollectives(int device);

// The ring allreduce. The lengths of the data chunks passed to this function
// must be the same across all MPI processes. The output memory will be
// allocated and written into `output`.
void RingAllreduce(float* data, size_t length, float** output);

// The ring allgather. The lengths of the data chunks passed to this function
// may differ across different devices. The output memory will be allocated and
// written into `output`.
void RingAllgather(float* data, size_t length, float** output);
```

The interface is simple and inflexible and is meant as a demonstration. The
code is fairly straightforward and the same technique can be integrated into
existing codebases in a variety of ways.
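As a usage sketch (not taken from this repository's sources), the CPU-only driver below exercises the declared interface; the include path `collectives.h` and the question of who frees `output` are assumptions to check against the actual code.

```c++
// A minimal CPU-only driver based on the declarations above. The header name
// and output ownership are assumptions; consult the library source.
#include <cstddef>
#include <cstdio>
#include <vector>

#include "collectives.h"   // assumed header exposing InitCollectives/RingAllreduce

int main() {
    // CPU-only run: NO_DEVICE means no CUDA device is selected.
    InitCollectives(NO_DEVICE);

    // Every MPI process must pass the same length to RingAllreduce.
    const size_t length = 1024;
    std::vector<float> data(length, 1.0f);

    float* output = nullptr;            // allocated by the library
    RingAllreduce(data.data(), length, &output);

    // output[i] now holds the sum of data[i] across all MPI processes.
    std::printf("output[0] = %f\n", output[0]);

    return 0;
}
```

Compile this against the library and your MPI installation, then launch it with `mpirun` just like the bundled tests.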