https://github.com/DeepGraphLearning/graphvite
GraphVite: A General and High-performance Graph Embedding System
cuda data-visualization gpu knowledge-graph machine-learning network-embedding representation-learning
- Host: GitHub
- URL: https://github.com/DeepGraphLearning/graphvite
- Owner: DeepGraphLearning
- License: apache-2.0
- Created: 2019-07-16T15:48:20.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2024-06-14T21:18:09.000Z (5 months ago)
- Last Synced: 2024-06-14T22:34:32.498Z (5 months ago)
- Topics: cuda, data-visualization, gpu, knowledge-graph, machine-learning, network-embedding, representation-learning
- Language: C++
- Homepage: https://graphvite.io
- Size: 5.4 MB
- Stars: 1,198
- Watchers: 32
- Forks: 150
- Open Issues: 53
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
Awesome Lists containing this project
- awesome-python-machine-learning-resources - GitHub - 42% open · ⏱️ 14.01.2021 (graph data processing)
- awesome-document-similarity - GraphVite - graph embedding at high speed and large scale
- StarryDivineSky - DeepGraphLearning/graphvite
README
![GraphVite logo](asset/logo/logo.png)
GraphVite - graph embedding at high speed and large scale
=========================================================

[![Install with conda](https://anaconda.org/milagraph/graphvite/badges/version.svg)][conda]
[![License](https://anaconda.org/milagraph/graphvite/badges/license.svg)][license]
[![Downloads](https://anaconda.org/milagraph/graphvite/badges/downloads.svg)][conda]

[conda]: https://anaconda.org/milagraph/graphvite
[license]: LICENSE

[Docs] | [Tutorials] | [Benchmarks] | [Pre-trained Models]

[Docs]: https://graphvite.io/docs/latest/api/application
[Tutorials]: https://graphvite.io/tutorials
[Benchmarks]: https://graphvite.io/docs/latest/benchmark
[Pre-trained Models]: https://graphvite.io/docs/latest/pretrained_model

GraphVite is a general graph embedding engine, dedicated to high-speed and
large-scale embedding learning in various applications.

GraphVite provides complete training and evaluation pipelines for 3 applications:
**node embedding**, **knowledge graph embedding** and
**graph & high-dimensional data visualization**. In addition, it includes 9 popular
models, along with their benchmarks on a number of standard datasets.
Node Embedding
Knowledge Graph Embedding
Graph & High-dimensional Data Visualization
Here is a summary of the training time of GraphVite compared with the best open-source
implementations on 3 applications. All times are measured on a server with
24 CPU threads and 4 V100 GPUs.

Training time of node embedding on [Youtube] dataset.

| Model | Existing Implementation | GraphVite | Speedup |
|------------|-------------------------------|-----------|---------|
| [DeepWalk] | [1.64 hrs (CPU parallel)][1] | 1.19 mins | 82.9x |
| [LINE] | [1.39 hrs (CPU parallel)][2] | 1.17 mins | 71.4x |
| [node2vec] | [24.4 hrs (CPU parallel)][3] | 4.39 mins | 334x |

[Youtube]: http://conferences.sigcomm.org/imc/2007/papers/imc170.pdf
[DeepWalk]: https://arxiv.org/pdf/1403.6652.pdf
[LINE]: https://arxiv.org/pdf/1503.03578.pdf
[node2vec]: https://www.kdd.org/kdd2016/papers/files/rfp0218-groverA.pdf
[1]: https://github.com/phanein/deepwalk
[2]: https://github.com/tangjianpku/LINE
[3]: https://github.com/aditya-grover/node2vec

Training / evaluation time of knowledge graph embedding on [FB15k] dataset.

| Model | Existing Implementation | GraphVite | Speedup |
|-----------------|-----------------------------------|--------------------|---------------|
| [TransE] | [1.31 hrs / 1.75 mins (1 GPU)][4] | 13.5 mins / 54.3 s | 5.82x / 1.93x |
| [RotatE] | [3.69 hrs / 4.19 mins (1 GPU)][4] | 28.1 mins / 55.8 s | 7.88x / 4.50x |

[FB15k]: http://papers.nips.cc/paper/5071-translating-embeddings-for-modeling-multi-relational-data.pdf
[TransE]: http://papers.nips.cc/paper/5071-translating-embeddings-for-modeling-multi-relational-data.pdf
[RotatE]: https://arxiv.org/pdf/1902.10197.pdf
[4]: https://github.com/DeepGraphLearning/KnowledgeGraphEmbedding

Training time of high-dimensional data visualization on [MNIST] dataset.

| Model | Existing Implementation | GraphVite | Speedup |
|--------------|-------------------------------|-----------|---------|
| [LargeVis] | [15.3 mins (CPU parallel)][5] | 13.9 s | 66.8x |

[MNIST]: http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf
[LargeVis]: https://arxiv.org/pdf/1602.00370.pdf
[5]: https://github.com/lferry007/LargeVis

Requirements
------------

Generally, GraphVite works on any Linux distribution with CUDA >= 9.2.
The library is compatible with Python 2.7 and 3.6/3.7.
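
Both requirements can be checked before installing. The sketch below is a minimal helper written for illustration only. It is not part of GraphVite, and the names `cuda_version_from_nvcc` and `check_requirements` are hypothetical. It extracts the CUDA version from `nvcc -V` output using the same `(?<=V)\d+\.\d+` lookbehind that the conda install commands pass to `grep -Po`. It then compares that version, along with the interpreter version, against the requirements stated above. The sketch assumes `nvcc -V` prints a line such as `Cuda compilation tools, release 10.1, V10.1.243`.

```python
import re
import sys

# Requirements as stated in this README: CUDA >= 9.2, Python 2.7 or 3.6/3.7.
MIN_CUDA = (9, 2)
SUPPORTED_PYTHON = {(2, 7), (3, 6), (3, 7)}


def cuda_version_from_nvcc(nvcc_output):
    """Return (major, minor) from `nvcc -V` output, e.g. "... V10.1.243" -> (10, 1).

    Uses the same lookbehind on "V" as the grep -Po pattern in the install commands.
    """
    match = re.search(r"(?<=V)(\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no CUDA version found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def check_requirements(nvcc_output, python_version=None):
    """Return a list of problems; an empty list means both checks pass."""
    if python_version is None:
        python_version = tuple(sys.version_info[:2])
    problems = []
    cuda = cuda_version_from_nvcc(nvcc_output)
    if cuda < MIN_CUDA:  # tuples compare element-wise: (9, 0) < (9, 2)
        problems.append("CUDA %d.%d is older than 9.2" % cuda)
    if tuple(python_version) not in SUPPORTED_PYTHON:
        problems.append("Python %d.%d is not 2.7, 3.6 or 3.7" % tuple(python_version))
    return problems


if __name__ == "__main__":
    sample = "Cuda compilation tools, release 10.1, V10.1.243"
    print(cuda_version_from_nvcc(sample))
```

On a real machine, you could pass `subprocess.check_output(["nvcc", "-V"]).decode()` as `nvcc_output`. The helper compares version tuples rather than strings because string comparison would wrongly rank `"10.1"` below `"9.2"`.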
Installation
------------

### From Conda ###

```bash
conda install -c milagraph -c conda-forge graphvite cudatoolkit=$(nvcc -V | grep -Po "(?<=V)\d+\.\d+")
```

If you only need embedding training without evaluation, you can use the following
alternative with minimal dependencies.

```bash
conda install -c milagraph -c conda-forge graphvite-mini cudatoolkit=$(nvcc -V | grep -Po "(?<=V)\d+\.\d+")
```

### From Source ###
Before installation, make sure you have `conda` installed.
```bash
git clone https://github.com/DeepGraphLearning/graphvite
cd graphvite
conda install -y --file conda/requirements.txt
mkdir build
cd build && cmake .. && make && cd -
cd python && python setup.py install && cd -
```

### On Colab ###
```bash
!wget -c https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
!chmod +x Miniconda3-latest-Linux-x86_64.sh
!./Miniconda3-latest-Linux-x86_64.sh -b -p /usr/local -f

!conda install -y -c milagraph -c conda-forge graphvite \
python=3.6 cudatoolkit=$(nvcc -V | grep -Po "(?<=V)\d+\.\d+")
!conda install -y wurlitzer ipykernel
```

```python
import site
site.addsitedir("/usr/local/lib/python3.6/site-packages")
%reload_ext wurlitzer
```

Quick Start
-----------

Here is a quick-start example of the node embedding application.
```bash
graphvite baseline quick start
```

Typically, the example takes no more than 1 minute. You will obtain output like the following:

```
Batch id: 6000
loss = 0.371041

------------- link prediction --------------
AUC: 0.899933

----------- node classification ------------
macro-F1@20%: 0.242114
micro-F1@20%: 0.391342
```

Baseline Benchmark
------------------

To reproduce a baseline benchmark, you only need to specify the keywords of the
experiment, e.g. the model and dataset.

```bash
graphvite baseline [keyword ...] [--no-eval] [--gpu n] [--cpu m] [--epoch e]
```

You may also set the number of GPUs and the number of CPUs per GPU.
Use ``graphvite list`` to get a list of available baselines.
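
Because `--cpu` counts CPUs *per GPU*, a script that launches baselines has to divide the machine's CPU threads by the number of GPUs. On the 24-thread, 4-GPU benchmark server described above, that works out to 6. The helper below is a hypothetical sketch, not part of GraphVite. It only assembles an argument list from the flags shown in the usage line and passes it to `subprocess`. The keywords `deepwalk youtube` used in the example are illustrative assumptions, so check `graphvite list` for the actual baseline names.

```python
import multiprocessing
import subprocess


def baseline_command(keywords, gpus, total_cpus=None, no_eval=False, epochs=None):
    """Assemble argv for `graphvite baseline` from the documented flags.

    `--cpu` is treated as CPUs per GPU, so the machine's CPU threads are split
    evenly across the GPUs, with at least 1 CPU per GPU.
    """
    if gpus < 1:
        raise ValueError("need at least one GPU")
    if total_cpus is None:
        total_cpus = multiprocessing.cpu_count()
    cpus_per_gpu = max(1, total_cpus // gpus)

    cmd = ["graphvite", "baseline"] + list(keywords)
    if no_eval:
        cmd.append("--no-eval")
    cmd += ["--gpu", str(gpus), "--cpu", str(cpus_per_gpu)]
    if epochs is not None:
        cmd += ["--epoch", str(epochs)]
    return cmd


def run_baseline(keywords, gpus, **kwargs):
    """Launch the baseline; raises CalledProcessError if graphvite exits non-zero."""
    subprocess.check_call(baseline_command(keywords, gpus, **kwargs))
```

For example, `baseline_command(["deepwalk", "youtube"], gpus=4, total_cpus=24)` returns `["graphvite", "baseline", "deepwalk", "youtube", "--gpu", "4", "--cpu", "6"]`. Building a list instead of a shell string avoids quoting issues when keywords come from user input.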
Custom Experiment
-----------------

Create a YAML configuration scaffold for a graph, knowledge graph, visualization or
word graph application.

```bash
graphvite new [application ...] [--file f]
```

Fill in the necessary entries in the configuration by following the instructions. You
can then run the configuration with:

```bash
graphvite run [config] [--no-eval] [--gpu n] [--cpu m] [--epoch e]
```

High-dimensional Data Visualization
-----------------------------------

You can visualize your high-dimensional vectors with a single command in
GraphVite.

```bash
graphvite visualize [file] [--label label_file] [--save save_file] [--perplexity n] [--3d]
```

The file can be either a numpy dump `*.npy` or a text matrix `*.txt`. For the save
file, we recommend the `png` format, while `pdf` is also supported.

Contributing
------------

We welcome all contributions, from bug fixes to new features. Please let us know if you
have any suggestions for our library.

Development Team
----------------

GraphVite is developed by [MilaGraph], led by Prof. [Jian Tang].
Authors of this project are [Zhaocheng Zhu], [Shizhen Xu], [Meng Qu] and [Jian Tang].
Contributors include [Kunpeng Wang] and [Zhijian Duan].

[MilaGraph]: https://github.com/DeepGraphLearning
[Zhaocheng Zhu]: https://kiddozhu.github.io
[Shizhen Xu]: https://github.com/xsz
[Meng Qu]: https://mnqu.github.io
[Jian Tang]: https://jian-tang.com
[Kunpeng Wang]: https://github.com/Kwinpeng
[Zhijian Duan]: https://github.com/zjduan

Citation
--------

If you find GraphVite useful for your research or development, please cite the
following [paper].

[paper]: https://arxiv.org/pdf/1903.00757.pdf

```
@inproceedings{zhu2019graphvite,
title={GraphVite: A High-Performance CPU-GPU Hybrid System for Node Embedding},
author={Zhu, Zhaocheng and Xu, Shizhen and Qu, Meng and Tang, Jian},
booktitle={The World Wide Web Conference},
pages={2494--2504},
year={2019},
organization={ACM}
}
```

Acknowledgements
----------------

We would like to thank Compute Canada for supporting GPU servers. We especially thank
Wenbin Hou for useful discussions on C++ and GPU programming techniques.