https://github.com/typhoonzero/nccl_rdma_demo
- Host: GitHub
- URL: https://github.com/typhoonzero/nccl_rdma_demo
- Owner: typhoonzero
- Created: 2018-04-25T11:38:45.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2018-04-28T03:29:34.000Z (about 8 years ago)
- Last Synced: 2025-02-01T14:45:14.355Z (over 1 year ago)
- Language: C++
- Size: 5.86 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# NCCL multi node communication demo
Demo code showing how to use NCCL2 to run AllReduce across multiple nodes.
# Build
You **must** install CUDA and NCCL before building:
```bash
git clone https://github.com/typhoonzero/nccl_rdma_demo.git
cd nccl_rdma_demo
make
```
# Run
We use [Redis](https://redis.io) to broadcast the NCCL unique id, so start a Redis
instance before running the demo. If you have Docker, just run:
```bash
docker run -d --name myredis -p 6379:6379 redis
```
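The unique id that gets broadcast is an opaque 128-byte blob (`ncclUniqueId`, sized by `NCCL_UNIQUE_ID_BYTES`). Redis values are binary safe, but raw bytes can be truncated at embedded zero bytes if they pass through C string APIs, so one simple option is to hex-encode the id before storing it. The sketch below is not taken from this repository: `UniqueIdBytes`, `EncodeIdAsHex` and `DecodeIdFromHex` are hypothetical names, and a plain byte array stands in for `ncclUniqueId` so the example compiles without NCCL.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for ncclUniqueId (an opaque 128-byte blob in NCCL),
// used here so the sketch builds without the NCCL headers.
constexpr std::size_t kUniqueIdBytes = 128;
using UniqueIdBytes = std::array<unsigned char, kUniqueIdBytes>;

// Encode the raw bytes as lowercase hex so the id can travel as plain text.
std::string EncodeIdAsHex(const UniqueIdBytes& id) {
  static const char kDigits[] = "0123456789abcdef";
  std::string out;
  out.reserve(id.size() * 2);
  for (unsigned char b : id) {
    out.push_back(kDigits[b >> 4]);
    out.push_back(kDigits[b & 0x0F]);
  }
  return out;
}

// Convert a single hex digit to its numeric value; throws on other input.
int HexValue(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  throw std::invalid_argument("not a hex digit");
}

// Decode a string produced by EncodeIdAsHex back into the raw id bytes.
UniqueIdBytes DecodeIdFromHex(const std::string& hex) {
  if (hex.size() != kUniqueIdBytes * 2) {
    throw std::invalid_argument("unexpected id length");
  }
  UniqueIdBytes id{};
  for (std::size_t i = 0; i < kUniqueIdBytes; ++i) {
    id[i] = static_cast<unsigned char>(HexValue(hex[2 * i]) * 16 +
                                       HexValue(hex[2 * i + 1]));
  }
  return id;
}
```

In such a scheme, one node would create the id, store the encoded string under an agreed key, and every other node would read and decode it before initializing its communicators.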
Run `./demo` to print the help message:
```
Usage: demo [redisip:port] [node count] [node id]
```
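As a rough illustration of how the three arguments in the usage line could be interpreted, here is a minimal parsing sketch. It is not the demo's actual code: `DemoArgs` and `ParseDemoArgs` are hypothetical names, and the range check on the node id is an assumption based on the zero-based ids used in the run examples.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical container for the three command-line arguments.
struct DemoArgs {
  std::string redis_host;
  int redis_port;
  int node_count;
  int node_id;
};

// Split "redisip:port" at the last colon and convert the numeric fields.
// Assumption: node ids are zero-based and must be smaller than node count.
DemoArgs ParseDemoArgs(const std::string& endpoint,
                       const std::string& node_count,
                       const std::string& node_id) {
  const std::size_t colon = endpoint.rfind(':');
  if (colon == std::string::npos || colon == 0 ||
      colon + 1 == endpoint.size()) {
    throw std::invalid_argument("expected redisip:port");
  }
  DemoArgs args;
  args.redis_host = endpoint.substr(0, colon);
  args.redis_port = std::stoi(endpoint.substr(colon + 1));
  args.node_count = std::stoi(node_count);
  args.node_id = std::stoi(node_id);
  if (args.node_count < 1 || args.node_id < 0 ||
      args.node_id >= args.node_count) {
    throw std::invalid_argument("node id must be in [0, node count)");
  }
  return args;
}
```

For example, `ParseDemoArgs("10.0.0.5:6379", "2", "1")` yields host `10.0.0.5`, port `6379`, two nodes, and node id `1`.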
Assuming you have 2 nodes with 4 GPUs each, run the demo as follows (the last argument is the node id, starting from 0):
* On node 1: `./demo [redis ip]:6379 2 0`
* On node 2: `./demo [redis ip]:6379 2 1`
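With 2 nodes and 4 GPUs per node, the AllReduce spans 8 GPUs in total. A common convention in multi-node NCCL programs is to give each GPU one global rank, computed as `node_id * gpus_per_node + local_gpu`. Whether this demo uses exactly that layout is an assumption. The sketch below only illustrates the arithmetic, and `RankInfo` and `RanksForNode` are hypothetical names.

```cpp
#include <vector>

// Hypothetical summary of the ranks owned by one node.
struct RankInfo {
  int world_size;                 // total number of ranks across all nodes
  std::vector<int> global_ranks;  // global rank of each local GPU
};

// Assumed layout: ranks are contiguous per node, ordered by node id.
RankInfo RanksForNode(int node_count, int node_id, int gpus_per_node) {
  RankInfo info;
  info.world_size = node_count * gpus_per_node;
  for (int local = 0; local < gpus_per_node; ++local) {
    info.global_ranks.push_back(node_id * gpus_per_node + local);
  }
  return info;
}
```

Under this layout, node id 0 would own global ranks 0 to 3 and node id 1 would own ranks 4 to 7, with a world size of 8.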