https://github.com/quiver-team/torch-quiver

PyTorch Library for Low-Latency, High-Throughput Graph Learning on GPUs.

distributed-computing geometric-deep-learning gpu-acceleration graph-learning graph-neural-networks pytorch




--------------------------------------------------------------------------------

Quiver is a distributed graph learning library for [PyTorch Geometric](https://github.com/pyg-team/pytorch_geometric) (PyG). The goal of Quiver is to make distributed graph learning easy to use and high-performance.

[![Documentation Status](https://readthedocs.org/projects/torch-quiver/badge/?version=latest)](https://torch-quiver.readthedocs.io/en/latest/?badge=latest)

--------------------------------------------------------------------------------

## Release 0.2.0 is out!

In the latest release `torch-quiver==0.2.0`, we have added support for efficient GNN serving and faster feature collection.

### High-throughput & Low-latency GNN Serving

Quiver now supports efficient GNN serving through a simple, easy-to-use serving API. For example, the following code snippet shows how to use Quiver to serve a GNN model:

```python
from torch_geometric.datasets import Reddit
from torch.multiprocessing import Queue
from quiver import RequestBatcher, HybridSampler, InferenceServer

# Define the dataset
dataset = Reddit(...)

# Queue through which incoming inference requests are streamed
stream_input_queue = Queue()

# Instantiate the auto batch component
request_batcher = RequestBatcher(stream_input_queue, ...)
# batched_queue_list = [cpu_batched_request_queue_list, gpu_batched_request_queue_list]
batched_queue_list = request_batcher.batched_request_queue_list()

# Instantiate the sampler component
hybrid_sampler = HybridSampler(dataset, batched_queue_list, ...)
# sampled_queue_list = [cpu_sampled_request_queue_list, gpu_sampled_request_queue_list]
sampled_queue_list = hybrid_sampler.sampled_request_queue_list()
hybrid_sampler.start()

# Instantiate the inference server component (model_path: path to the trained GNN model)
server = InferenceServer(model_path, dataset, sampled_queue_list, ...)
# result_queue_list = [Queue, ..., Queue]
result_queue_list = server.result_queue_list()

server.start()
```

A full example that uses Quiver to serve a GNN model on the Reddit dataset on a single machine can be found [here](examples/serving/reddit/reddit_serving.py).
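
The `RequestBatcher` stage above groups streamed requests into batches and splits them between CPU and GPU queues. A generic dynamic-batching loop can be sketched with the standard library as follows. This is a minimal sketch under stated assumptions, not Quiver's implementation: `collect_batch`, `route` and the batch-length rule are hypothetical, and Quiver's actual CPU/GPU decision relies on the workload metrics described under Key Idea rather than batch length.

```python
import queue
import time
from typing import List


def collect_batch(requests: queue.Queue, max_batch: int, timeout_s: float) -> List[int]:
    """Drain up to max_batch requests, waiting at most timeout_s for the batch to fill."""
    batch: List[int] = []
    deadline = time.monotonic() + timeout_s
    while len(batch) < max_batch:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(requests.get(timeout=remaining))
        except queue.Empty:
            break  # no more requests arrived before the deadline
    return batch


def route(batch: List[int], cpu_queue: queue.Queue, gpu_queue: queue.Queue,
          gpu_threshold: int) -> None:
    # Hypothetical rule: large batches go to the GPU queue, small ones to the CPU queue.
    (gpu_queue if len(batch) >= gpu_threshold else cpu_queue).put(batch)


# Usage: ten single-node requests, batches of at most 8
requests: queue.Queue = queue.Queue()
for node_id in range(10):
    requests.put(node_id)
cpu_q: queue.Queue = queue.Queue()
gpu_q: queue.Queue = queue.Queue()
route(collect_batch(requests, max_batch=8, timeout_s=0.05), cpu_q, gpu_q, gpu_threshold=4)
route(collect_batch(requests, max_batch=8, timeout_s=0.05), cpu_q, gpu_q, gpu_threshold=4)
```

The timeout bounds the latency a request can spend waiting for its batch to fill, while `max_batch` bounds the work handed to a single sampler call.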

### Test Serving

```cmd
$ cd examples/serving/reddit
$ python prepare_data.py
$ python reddit_serving.py
```

### Key Idea

Quiver's key idea is to exploit **workload metrics** to predict the irregular computation of GNN requests and to govern the use of GPUs for graph sampling and feature aggregation: (1) for graph sampling, Quiver calculates the **probabilistic sampled graph size**, a metric that predicts the degree of parallelism in graph sampling, and uses it to assign sampling tasks to GPUs only when doing so outperforms CPU-based sampling; and (2) for feature aggregation, Quiver relies on the **feature access probability** to decide which features to partition and which to replicate across a distributed GPU NUMA topology. Quiver achieves up to 35× lower latency with 8× higher throughput compared to state-of-the-art GNN systems (DGL and PyG).
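
To make the first metric more concrete, here is a simplified estimate of the sampled graph size of a multi-hop request. It assumes that at each hop a node with degree `d` contributes `min(d, k)` sampled neighbours for fanout `k`, and that each of its neighbours is reached with probability `min(d, k) / d`. This is an illustrative approximation under those assumptions, not the exact formula Quiver uses; `psgs`, `choose_device` and the `gpu_threshold` cut-off are hypothetical.

```python
from typing import Dict, List, Sequence


def psgs(adj: Dict[int, List[int]], seeds: Sequence[int], fanouts: Sequence[int]) -> float:
    """Expected number of sampled neighbours across all hops (simplified estimate)."""
    frontier = {s: 1.0 for s in seeds}  # node -> expected number of times it is expanded
    total = 0.0
    for k in fanouts:
        nxt: Dict[int, float] = {}
        for u, weight in frontier.items():
            neigh = adj.get(u, [])
            if not neigh:
                continue
            picked = min(len(neigh), k)   # neighbours sampled from u at this hop
            total += weight * picked
            p = picked / len(neigh)       # probability that a given neighbour is picked
            for w in neigh:
                nxt[w] = nxt.get(w, 0.0) + weight * p
        frontier = nxt
    return total


def choose_device(adj: Dict[int, List[int]], seeds: Sequence[int],
                  fanouts: Sequence[int], gpu_threshold: float) -> str:
    # Hypothetical rule: offload to the GPU only when the request exposes enough parallelism.
    return "gpu" if psgs(adj, seeds, fanouts) >= gpu_threshold else "cpu"
```

A small request (few seeds, low-degree neighbourhood) yields a small estimate and stays on the CPU, where kernel-launch and transfer overheads would otherwise dominate.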

The figure below compares the serving performance of Quiver, PyG (2.0.3) and [DGL](https://github.com/dmlc/dgl) (1.0.2) on a 2-GPU server running [GraphSAGE on the Reddit dataset](http://snap.stanford.edu/graphsage/).

![Throughput vs. Latency of GNN request serving](docs/serving/tp99.png)

---

## Why Quiver?

----
The primary motivation for this project is to make it easy to take a PyG program and scale it across many GPUs and CPUs. In a typical scenario, users develop graph learning programs with PyG's easy-to-use APIs and rely on Quiver to run these programs at large scale. To make such scaling effective, Quiver offers several novel features:

* **High performance**: Quiver enables GPUs to be used effectively to accelerate performance-critical graph learning tasks: graph sampling, feature collection and data-parallel training. As a result, Quiver often significantly outperforms PyG and DGL even on a single GPU (see the benchmark results below), especially when processing large-scale datasets and models.

* **High scalability**: Quiver can achieve (super-)linear scalability in distributed graph learning. This is achieved through Quiver's novel adaptive data/feature/processor management techniques and its effective use of fast networking technologies (e.g., NVLink and RDMA).

* **Easy to use**: To use Quiver, developers only need to add a few lines of code to existing PyG programs, which makes Quiver easy for PyG users to adopt and to deploy in production clusters.

### Faster Feature Aggregation

Feature aggregation is one of the performance bottlenecks of GNN systems. Quiver enables faster feature aggregation with the following techniques:

- Quiver uses the **feature access probability** metric to place popular features strategically on GPUs. A primary objective of feature placement is to enable GPUs to take advantage of low-latency connectivity to their peer GPUs, such as NVLink and InfiniBand. This allows GPUs to access features with low latency when aggregating them.

- Quiver uses GPU kernels that leverage efficient one-sided reads to access remote features over NVLink/InfiniBand.
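
The placement policy can be sketched in two steps: estimate how likely each node's feature is to be read in a mini-batch, then replicate the hottest features on every GPU, shard the next tier across GPUs, and leave the rest in CPU memory. This is a minimal sketch assuming uniformly drawn training seeds and independent sampling paths; `access_probability`, `place_features` and the slot parameters are hypothetical and do not correspond to Quiver's API or its exact estimator.

```python
from typing import Dict, List, Sequence, Tuple


def access_probability(adj: Dict[int, List[int]], train_nodes: Sequence[int],
                       batch_size: int, fanouts: Sequence[int]) -> Dict[int, float]:
    """Approximate the probability that each node's feature is read in one mini-batch."""
    p_seed = min(1.0, batch_size / len(train_nodes))  # seeds drawn uniformly (assumption)
    prob = {v: p_seed for v in train_nodes}
    frontier = dict(prob)
    for k in fanouts:
        nxt: Dict[int, float] = {}
        for u, pu in frontier.items():
            neigh = adj.get(u, [])
            if not neigh:
                continue
            pick = min(1.0, k / len(neigh))  # chance a given neighbour of u is sampled
            for w in neigh:
                # Combine paths assuming independence: P(A or B) = 1 - (1 - P(A))(1 - P(B))
                nxt[w] = 1.0 - (1.0 - nxt.get(w, 0.0)) * (1.0 - pu * pick)
        for w, pw in nxt.items():
            prob[w] = 1.0 - (1.0 - prob.get(w, 0.0)) * (1.0 - pw)
        frontier = nxt
    return prob


def place_features(prob: Dict[int, float], num_gpus: int, replicate_slots: int,
                   shard_slots: int) -> Tuple[List[int], Dict[int, List[int]], List[int]]:
    """Replicate the hottest nodes on all GPUs, shard the next tier, keep the rest on CPU."""
    ranked = sorted(prob, key=lambda v: (-prob[v], v))  # hottest first, ties by node id
    replicated = ranked[:replicate_slots]
    end = replicate_slots + shard_slots * num_gpus
    tier2 = ranked[replicate_slots:end]
    shards = {g: tier2[g::num_gpus] for g in range(num_gpus)}  # round-robin across GPUs
    cpu_nodes = ranked[end:]
    return replicated, shards, cpu_nodes
```

Replicating only the hottest tier keeps the most frequent reads local to each GPU, while sharding the next tier lets peers serve it over NVLink instead of falling back to host memory.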

More details of our feature aggregation techniques can be found in our repo [quiver-feature](https://github.com/quiver-team/quiver-feature).

The chart below compares the performance of Quiver, PyG (2.0.1) and [DGL](https://github.com/dmlc/dgl) (0.7.0) on a 4-GPU server running the [Open Graph Benchmark](https://ogb.stanford.edu/).

![e2e_benchmark](docs/multi_medias/imgs/benchmark_e2e_performance-min.png)

We will add multi-node results soon.

For system design details, see Quiver's [design overview](docs/Introduction_en.md) (Chinese version: [设计简介](docs/Introduction_cn.md)).

## Install

----
### Install Dependencies

Before installing Quiver, install:
1. [PyTorch](https://pytorch.org/get-started/locally/)
2. [PyG](https://github.com/pyg-team/pytorch_geometric)

### Pip Install

```cmd
$ pip install torch-quiver
```

We have tested Quiver with the following setup:

* OS: Ubuntu 18.04, Ubuntu 20.04
* CUDA: 10.2, 11.1
* GPU: P100, V100, Titan X, A6000

### Install From Source

```cmd
$ git clone https://github.com/quiver-team/torch-quiver.git && cd torch-quiver
$ QUIVER_ENABLE_CUDA=1 python setup.py install
```

### Test Install

To test the installation, clone the repository and run one of Quiver's examples:

```cmd
$ git clone git@github.com:quiver-team/torch-quiver.git && cd torch-quiver
$ python3 examples/pyg/reddit_quiver.py
```

A successful run should contain the following line:

`Epoch xx, Loss: xx.yy, Approx. Train: xx.yy`

### Use Quiver with Docker

[Docker](https://www.docker.com/) is the simplest way to use Quiver. Check the [guide](docker/README.md) for details.

## Quick Start

To use Quiver, you need to replace PyG's graph sampler and feature collector with `quiver.Sampler` and `quiver.Feature`. The replacement usually requires only a few changes in existing PyG programs.

### Use Quiver in Single-GPU PyG Scripts

Only three steps are required to enable Quiver in a single-GPU PyG script:

```python
import torch
import quiver

...

## Step 1: Replace PyG graph sampler
# train_loader = NeighborSampler(data.edge_index, ...) # Comment out PyG sampler
train_loader = torch.utils.data.DataLoader(train_idx) # Quiver: PyTorch Dataloader
quiver_sampler = quiver.pyg.GraphSageSampler(quiver.CSRTopo(data.edge_index), sizes=[25, 10]) # Quiver: Graph sampler

...

## Step 2: Replace PyG feature collectors
# feature = data.x.to(device) # Comment out PyG feature collector
quiver_feature = quiver.Feature(rank=0, device_list=[0]) # Quiver: Feature collector
quiver_feature.from_cpu_tensor(data.x) # Load node features from the CPU tensor

...

## Step 3: Train PyG models with Quiver
# for batch_size, n_id, adjs in train_loader: # Comment out PyG training loop
for seeds in train_loader: # Use PyTorch training loop in Quiver
    n_id, batch_size, adjs = quiver_sampler.sample(seeds) # Use Quiver graph sampler
    batch_feature = quiver_feature[n_id] # Use Quiver feature collector
    ...
...

```
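
Conceptually, `quiver.CSRTopo` stores the graph in compressed sparse row (CSR) form and `GraphSageSampler` draws up to `sizes[i]` neighbours per node at hop `i`. The following pure-Python sketch illustrates both ideas on a small edge list. It is not Quiver's GPU implementation: it treats the source list (`edge_index[0]` in PyG terms) as the CSR row, which is an assumption, and `to_csr`/`sample_hops` are hypothetical helpers that return only `n_id` (seeds first, as in PyG), not the `adjs` structures.

```python
import random
from typing import List, Sequence, Tuple


def to_csr(src: Sequence[int], dst: Sequence[int], num_nodes: int) -> Tuple[List[int], List[int]]:
    """Convert a COO edge list into CSR arrays (indptr, indices), with src as the row."""
    counts = [0] * num_nodes
    for s in src:
        counts[s] += 1
    indptr = [0] * (num_nodes + 1)
    for v in range(num_nodes):
        indptr[v + 1] = indptr[v] + counts[v]  # prefix sum of per-row edge counts
    indices = [0] * len(src)
    cursor = indptr[:-1]  # next free slot per row (the slice is a copy)
    for s, d in zip(src, dst):
        indices[cursor[s]] = d
        cursor[s] += 1
    return indptr, indices


def sample_hops(indptr: List[int], indices: List[int], seeds: Sequence[int],
                sizes: Sequence[int], rng: random.Random) -> List[int]:
    """Layer-wise neighbour sampling; returns n_id with the seeds first."""
    n_id = list(seeds)
    seen = set(n_id)
    for size in sizes:
        for u in list(n_id):  # sample for every node gathered before this hop
            neigh = indices[indptr[u]:indptr[u + 1]]
            picked = neigh if len(neigh) <= size else rng.sample(neigh, size)
            for w in picked:
                if w not in seen:
                    seen.add(w)
                    n_id.append(w)
    return n_id


indptr, indices = to_csr([0, 0, 1, 2], [1, 2, 2, 0], num_nodes=4)
n_id = sample_hops(indptr, indices, seeds=[0], sizes=[25, 10], rng=random.Random(0))
# n_id == [0, 1, 2]
```

CSR makes each node's neighbour list a contiguous slice, which is what lets a sampler (on CPU or GPU) read neighbours without scanning the whole edge list.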
### Use Quiver in Multi-GPU PyG Scripts

To use Quiver in multi-GPU PyG scripts, we can simply pass `quiver.Feature` and `quiver.Sampler` as arguments to the child processes launched in PyTorch's DDP training, as shown below:

```python
import torch
import torch.multiprocessing as mp
import quiver

# PyG DDP function that trains GNN models
def ddp_train(rank, feature, sampler):
    ...

if __name__ == '__main__':
    # Replace PyG graph sampler and feature collector with Quiver's alternatives
    quiver_sampler = quiver.pyg.GraphSageSampler(...)
    quiver_feature = quiver.Feature(...)

    world_size = torch.cuda.device_count()  # One training process per GPU
    mp.spawn(
        ddp_train,
        args=(quiver_feature, quiver_sampler),  # Pass Quiver components as arguments
        nprocs=world_size,
        join=True
    )
```
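
Inside `ddp_train`, each process typically trains on its own slice of the training seeds. A minimal sketch of one way to split them, assuming equal-sized slices with the remainder dropped so that every rank runs the same number of iterations; `seeds_for_rank` is a hypothetical helper, not part of Quiver:

```python
from typing import List, Sequence


def seeds_for_rank(train_idx: Sequence[int], rank: int, world_size: int) -> List[int]:
    """Give each DDP rank an equal, disjoint slice of the training seeds (remainder dropped)."""
    per_rank = len(train_idx) // world_size
    start = rank * per_rank
    return list(train_idx[start:start + per_rank])
```

Equal slices matter for DDP because gradient all-reduce is a collective operation: a rank that ran out of batches early would leave the other ranks waiting.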

A full multi-GPU Quiver example is available [here](examples/multi_gpu/pyg/ogb-products/dist_sampling_ogb_products_quiver.py).

### Run Quiver

Below is an example command that runs Quiver's example script `examples/pyg/reddit_quiver.py`:

```cmd
$ python3 examples/pyg/reddit_quiver.py
```

Quiver has the same launch command on both single-GPU servers and multi-GPU servers. We will provide multi-node examples soon.

## Examples

We provide several examples showing how to enable Quiver in real-world PyG scripts:

- Enabling Quiver in PyG's single-GPU examples: [ogbn-products](examples/pyg/) and [reddit](examples/pyg/).
- Enabling Quiver in PyG's multi-GPU examples: [ogbn-products](examples/multi_gpu/pyg/ogb-products/) and [reddit](examples/multi_gpu/pyg/reddit/).

## Documentation

Quiver provides many parameters to optimise the performance of its graph samplers (e.g., GPU-local or CPU-GPU hybrid) and feature collectors (e.g., feature replication/sharding strategies). Check [Documentation](https://torch-quiver.readthedocs.io/en/latest/) for details.
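
To illustrate what a replication/sharding strategy implies at lookup time, the sketch below maps each node ID to where its feature row lives (replicated on the local GPU, on a specific GPU shard, or in CPU memory) and groups a mini-batch's `n_id` by location so that each location can serve a single gather. The layout and the `build_location_index`/`group_requests` names are hypothetical illustrations, not Quiver's internal data structures.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def build_location_index(replicated: Sequence[int], shards: Dict[int, List[int]],
                         cpu_nodes: Sequence[int]) -> Dict[int, Tuple[str, int]]:
    """Map node id -> (location, row). 'local' means the row is replicated on every GPU."""
    loc: Dict[int, Tuple[str, int]] = {}
    for row, v in enumerate(replicated):
        loc[v] = ("local", row)
    for gpu, nodes in shards.items():
        for row, v in enumerate(nodes):
            loc[v] = (f"cuda:{gpu}", row)
    for row, v in enumerate(cpu_nodes):
        loc[v] = ("cpu", row)
    return loc


def group_requests(n_id: Sequence[int],
                   loc: Dict[int, Tuple[str, int]]) -> Dict[str, List[Tuple[int, int]]]:
    """Group a mini-batch's node ids by location as (output position, row) pairs."""
    groups: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for pos, v in enumerate(n_id):
        where, row = loc[v]
        groups[where].append((pos, row))
    return dict(groups)
```

Keeping the output position alongside each row lets the per-location results be scattered back into a single feature matrix in the original `n_id` order.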

## Community

We welcome contributors to join the development of Quiver. Quiver is currently maintained by researchers from the [University of Edinburgh](https://www.ed.ac.uk/), [Imperial College London](https://www.imperial.ac.uk/), [Tsinghua University](https://www.tsinghua.edu.cn/en/index.htm) and [University of Waterloo](https://uwaterloo.ca/). The development of Quiver has received support from [Alibaba](https://github.com/alibaba) and [Lambda Labs](https://lambdalabs.com/).

## Citation

If you find the design of Quiver useful or use Quiver in your work, please cite Quiver with the BibTeX entry below:
```bibtex
@misc{quiver2023,
  author = {Zeyuan Tan and Xiulong Yuan and Congjie He and Man-Kit Sit and Guo Li and Xiaoze Liu and Baole Ai and Kai Zeng and Peter Pietzuch and Luo Mai},
  title = {Quiver: Supporting GPUs for Low-Latency, High-Throughput GNN Serving with Workload Awareness},
  eprint = {2305.10863},
  archivePrefix = {arXiv},
  year = {2023}
}
```