https://github.com/veritasyin/subg_acc

SubG is a C/OpenMP-based library for accelerating subgraph operations in Python.
https://github.com/veritasyin/subg_acc

graph-representation-learning parallel-computing scalable-graph-learning subgraph

Last synced: about 22 hours ago
JSON representation

SubG is a C/OpenMP-based library for accelerating subgraph operations in Python.

Host: GitHub
URL: https://github.com/veritasyin/subg_acc
Owner: VeritasYin
License: bsd-2-clause
Created: 2021-06-22T18:42:50.000Z (about 4 years ago)
Default Branch: master
Last Pushed: 2024-12-31T02:04:56.000Z (6 months ago)
Last Synced: 2025-06-26T01:02:15.206Z (about 22 hours ago)
Topics: graph-representation-learning, parallel-computing, scalable-graph-learning, subgraph
Language: C
Homepage:
Size: 1.68 MB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

        # **SubG**: Subgraph Operation Accelerator



    

    

    



The `SubG` package is an extension library based on C and OpenMP to accelerate subgraph operations for building structural features and subgraph-based graph representation learning (SGRL). 

Follow the principles of algorithm system co-design, subgraph queries (e.g. ego-network in canonical SGRLs) of target links, motifs, and high-order patterns can be decomposed into node-level intermediate results (e.g. collection of walks by `walk_sampler` in [SUREL](https://arxiv.org/abs/2202.13538), set of nodes by `gset_sampler` in [SUREL+](https://github.com/VeritasYin/SUREL_Plus/blob/main/manuscript/SUREL_Plus_Full.pdf)), whose joint can act as proxies of subgraphs, and be reused among different queries.

Currently, `SubG` consists of the following methods for efficient and scalable implementation of SGRLs:

- `gset_sampler` node set sampling with structure encoder of landing probability (LP) 

- `walk_sampler` walk sampling with relative positional encoder (RPE)

- `batch_sampler` query sampling (a group of nodes) for mini-batch training of link prediction

- `walk_join` online joining of node-level walks to construct the proxy of subgraph for given queries (e.g. a link query $Q= \lbrace u,v \rbrace$ $\to$ join sampled walks of node $u$ and $v$ as $\mathcal{G}_{Q} = \lbrace W_u \uplus W_v \rbrace$)

## Update

**Dec. 30, 2024**

* Release v2.3 with bug fixes and improved memory efficiency

* Support pip install & MacOS

**Feb. 25, 2023**:

* Release v2.2 with more robust memory management of allocation, release and indexing (billion edges).

* Add bitwise-based hash for encoding structural features.

* Add test cases and script of wall time measure.

**Jan. 29, 2023**:

* Release v2.1 with refactored code base.

* More robust memory accessing with buffer for set sampler on large graphs (million nodes).

**Jan. 28, 2023**:

* Release v2.0 with the walk-based set sampler `gset_sampler`.

## Requirements

(Other versions may work, but are untested)

- python >= 3.8

- numpy >= 1.17

- gcc >= 8.4

- openmp (for MacOS, install llvm-openmp via Conda)

## Installation

```

pip install .

```

## Functions

### walk_sampler

```

subg.gset_sampler(indptr, indices, query, num_walks, num_steps) 

-> (numpy.array [n, num_walks*(num_steps+1)], n * (numpy.array [?], numpy.array [?,num_steps+1]))

```

Sample a collection of paths for each node in `query` (size of `n`) through `num_walks`-many `num_steps`-step random walks on the input graph in CSR format (`indptr`, `indices`), and encodes landing probability at each step of all nodes in the sampled set as structural features of the seed node. 

For usage examples, see [test.py](https://github.com/VeritasYin/subg_acc/blob/master/test/test.py).

#### Parameters

* **indptr** *(np.array)* - Index pointer array of the adjacency matrix in CSR format.

* **indices** *(np.array)* - Index array of the adjacency matrix in CSR format.

* **query** *(np.array / list)* - Nodes are queried to be sampled.

* **num_walks** *(int)* - The number of random walks.

* **num_steps** *(int)* - The number of steps in a walk.

* **nthread** *(int, optional)* - The number of threads.

* **seed** *(int, optional)* - Random seed.

#### Returns

* **walks** *(np.array)* - Sampled walks $W_q$ for each node in `query`.

* **rpes** *(np.array, np.array)* - Unique node set of sampled walks for each node in `query` and their corresponding structural encodings.

### gset_sampler

```

subg.gset_sampler(indptr, indices, query, num_walks, num_steps) 

-> (numpy.array [n], numpy.array [2,?], numpy.array [?,num_steps+1])

```

Sample a node set for each node in `query` (size of `n`) through `num_walks`-many `num_steps`-step random walks on the input graph in CSR format (`indptr`, `indices`), and encodes landing probability at each step of all nodes in the sampled set as structural features of the seed node. 

For usage examples, see [test.py](https://github.com/VeritasYin/subg_acc/blob/master/test/test.py).

#### Parameters

* **indptr** *(np.array)* - Index pointer array of the adjacency matrix in CSR format.

* **indices** *(np.array)* - Index array of the adjacency matrix in CSR format.

* **query** *(np.array / list)* - Nodes are queried to be sampled.

* **num_walks** *(int)* - The number of random walks.

* **num_steps** *(int)* - The number of steps in a walk.

* **bucket** *(int, optional)* - The buffer size for sampled neighbors per node.

* **nthread** *(int, optional)* - The number of threads.

* **seed** *(int, optional)* - Random seed.

#### Returns

* **nsize** *(np.array)* - The size of sampled set for each node in `query`.

* **remap** *(np.array)* - Pairwised node id and the index of its associated structural encoding in `enc` array.

* **enc** *(np.array)* - The compressed (unique) encoding of structural features.

### walk_join

```

subg.walk_join(walks, indices, query) 

-> (numpy.array [2,n*num_walks*(num_steps+1)*2])

```

Join the sampled walks for nodes in each `query` (size of `n`). For a link query $Q= \lbrace u,v \rbrace$, `walk_join` returns the indices of structural features for $u$ as $W_{u|u} \bigoplus W_{u|v}$ and for $v$ as $W_{v|u} \bigoplus W_{v|v}$. 

For usage examples, see [test.py](https://github.com/VeritasYin/subg_acc/blob/master/test/test.py).

#### Parameters

* **indptr** *(np.array)* - Index pointer array of the adjacency matrix in CSR format.

* **indices** *(np.array)* - Index array of the adjacency matrix in CSR format.

* **query** *(np.array / list)* - Nodes are queried to be sampled.

* **nthread** *(int, optional)* - The number of threads.

#### Returns

* **join_walk** *(np.array)* - The indices of structural features attached to the joint walks of given queries.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/veritasyin/subg_acc

Awesome Lists containing this project

README