{"id":22231449,"url":"https://github.com/veritasyin/subg_acc","last_synced_at":"2025-08-23T06:35:29.809Z","repository":{"id":93523567,"uuid":"379368822","full_name":"VeritasYin/subg_acc","owner":"VeritasYin","description":"SubG is a C/OpenMP-based library for accelerating subgraph operations in Python.","archived":false,"fork":false,"pushed_at":"2024-12-31T02:04:56.000Z","size":1764,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-06-26T01:02:15.206Z","etag":null,"topics":["graph-representation-learning","parallel-computing","scalable-graph-learning","subgraph"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/VeritasYin.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-06-22T18:42:50.000Z","updated_at":"2025-06-23T07:33:55.000Z","dependencies_parsed_at":"2025-01-30T08:25:24.903Z","dependency_job_id":"1f8ccce2-9a06-4997-a7ca-8cbf468ef3f2","html_url":"https://github.com/VeritasYin/subg_acc","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/VeritasYin/subg_acc","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VeritasYin%2Fsubg_acc","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VeritasYin%2Fsubg_acc/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VeritasYin%2Fsubg_acc/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/r
epositories/VeritasYin%2Fsubg_acc/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/VeritasYin","download_url":"https://codeload.github.com/VeritasYin/subg_acc/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VeritasYin%2Fsubg_acc/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271745679,"owners_count":24813521,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-23T02:00:09.327Z","response_time":69,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["graph-representation-learning","parallel-computing","scalable-graph-learning","subgraph"],"created_at":"2024-12-03T01:26:34.632Z","updated_at":"2025-08-23T06:35:29.759Z","avatar_url":"https://github.com/VeritasYin.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"# **SubG**: Subgraph Operation Accelerator\n\u003cp align=\"center\"\u003e\n    \u003ca href=\"https://github.com/VeritasYin/subg_acc/blob/master/LICENSE\"\u003e\u003cimg src=\"https://img.shields.io/badge/License-BSD%202--Clause-red.svg\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://github.com/VeritasYin/subg_acc/blob/master/setup.py\"\u003e\u003cimg src=\"https://img.shields.io/badge/Version-v2.3-orange\" alt=\"Version\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://hits.seeyoufarm.com\"\u003e\u003cimg 
src=\"https://hits.seeyoufarm.com/api/count/incr/badge.svg?url=https%3A%2F%2Fgithub.com%2FVeritasYin%2Fsubg_acc\u0026count_bg=%2379C83D\u0026title_bg=%23555555\u0026icon=\u0026icon_color=%23E7E7E7\u0026title=Hits\u0026edge_flat=false\"/\u003e\u003c/a\u003e\n\u003c/p\u003e\n\nThe `SubG` package is an extension library based on C and OpenMP that accelerates subgraph operations for building structural features and for subgraph-based graph representation learning (SGRL). \n\nFollowing the principle of algorithm-system co-design, subgraph queries (e.g. ego-networks in canonical SGRLs) of target links, motifs, and high-order patterns can be decomposed into node-level intermediate results (e.g. a collection of walks from `walk_sampler` in [SUREL](https://arxiv.org/abs/2202.13538), or a set of nodes from `gset_sampler` in [SUREL+](https://github.com/VeritasYin/SUREL_Plus/blob/main/manuscript/SUREL_Plus_Full.pdf)), which can be joined to act as proxies of subgraphs and reused across different queries.\n\nCurrently, `SubG` provides the following methods for efficient and scalable implementation of SGRLs:\n\n- `gset_sampler`: node set sampling with a structure encoder of landing probability (LP) \n- `walk_sampler`: walk sampling with a relative positional encoder (RPE)\n- `batch_sampler`: query sampling (a group of nodes) for mini-batch training of link prediction\n- `walk_join`: online joining of node-level walks to construct the proxy of a subgraph for given queries (e.g. a link query $Q= \\lbrace u,v \\rbrace$ $\\to$ join sampled walks of node $u$ and $v$ as $\\mathcal{G}_{Q} = \\lbrace W_u \\uplus W_v \\rbrace$)\n\n## Update\n**Dec. 30, 2024**:\n* Release v2.3 with bug fixes and improved memory efficiency.\n* Support pip install \u0026 macOS.\n\n**Feb. 25, 2023**:\n* Release v2.2 with more robust memory management of allocation, release and indexing (billions of edges).\n* Add bitwise-based hash for encoding structural features.\n* Add test cases and a script for measuring wall time.\n\n**Jan. 
29, 2023**:\n* Release v2.1 with refactored code base.\n* More robust memory access with a buffer for the set sampler on large graphs (millions of nodes).\n\n**Jan. 28, 2023**:\n* Release v2.0 with the walk-based set sampler `gset_sampler`.\n\n## Requirements\n(Other versions may work, but are untested)\n\n- python \u003e= 3.8\n- numpy \u003e= 1.17\n- gcc \u003e= 8.4\n- openmp (for macOS, install llvm-openmp via Conda)\n\n## Installation\n```\npip install .\n```\n\n## Functions\n\n### walk_sampler\n\n```\nsubg.walk_sampler(indptr, indices, query, num_walks, num_steps) \n-\u003e (numpy.array [n, num_walks*(num_steps+1)], n * (numpy.array [?], numpy.array [?,num_steps+1]))\n```\n\nSamples a collection of walks for each node in `query` (size of `n`) through `num_walks`-many `num_steps`-step random walks on the input graph in CSR format (`indptr`, `indices`), and encodes the landing probability at each step of all nodes in the sampled set as structural features of the seed node. \n\nFor usage examples, see [test.py](https://github.com/VeritasYin/subg_acc/blob/master/test/test.py).\n\n#### Parameters\n\n* **indptr** *(np.array)* - Index pointer array of the adjacency matrix in CSR format.\n* **indices** *(np.array)* - Index array of the adjacency matrix in CSR format.\n* **query** *(np.array / list)* - Query nodes to sample walks for.\n* **num_walks** *(int)* - The number of random walks.\n* **num_steps** *(int)* - The number of steps in a walk.\n* **nthread** *(int, optional)* - The number of threads.\n* **seed** *(int, optional)* - Random seed.\n\n#### Returns\n\n* **walks** *(np.array)* - Sampled walks $W_q$ for each node in `query`.\n* **rpes** *(np.array, np.array)* - Unique node set of the sampled walks for each node in `query` and the corresponding structural encodings.\n\n### gset_sampler\n\n```\nsubg.gset_sampler(indptr, indices, query, num_walks, num_steps) \n-\u003e (numpy.array [n], numpy.array [2,?], numpy.array [?,num_steps+1])\n```\n\nSamples a node set for each node in 
`query` (size of `n`) through `num_walks`-many `num_steps`-step random walks on the input graph in CSR format (`indptr`, `indices`), and encodes the landing probability at each step of all nodes in the sampled set as structural features of the seed node. \n\nFor usage examples, see [test.py](https://github.com/VeritasYin/subg_acc/blob/master/test/test.py).\n\n#### Parameters\n\n* **indptr** *(np.array)* - Index pointer array of the adjacency matrix in CSR format.\n* **indices** *(np.array)* - Index array of the adjacency matrix in CSR format.\n* **query** *(np.array / list)* - Query nodes to sample node sets for.\n* **num_walks** *(int)* - The number of random walks.\n* **num_steps** *(int)* - The number of steps in a walk.\n* **bucket** *(int, optional)* - The buffer size for sampled neighbors per node.\n* **nthread** *(int, optional)* - The number of threads.\n* **seed** *(int, optional)* - Random seed.\n\n#### Returns\n\n* **nsize** *(np.array)* - The size of the sampled set for each node in `query`.\n* **remap** *(np.array)* - Pairs of a node ID and the index of its associated structural encoding in the `enc` array.\n* **enc** *(np.array)* - The compressed (unique) encoding of structural features.\n\n### walk_join\n\n```\nsubg.walk_join(walks, indices, query) \n-\u003e (numpy.array [2,n*num_walks*(num_steps+1)*2])\n```\nJoins the sampled walks for the nodes in each `query` (size of `n`). For a link query $Q= \\lbrace u,v \\rbrace$, `walk_join` returns the indices of structural features for $u$ as $W_{u|u} \\bigoplus W_{u|v}$ and for $v$ as $W_{v|u} \\bigoplus W_{v|v}$. 
\n\nFor usage examples, see [test.py](https://github.com/VeritasYin/subg_acc/blob/master/test/test.py).\n\n#### Parameters\n\n* **indptr** *(np.array)* - Index pointer array of the adjacency matrix in CSR format.\n* **indices** *(np.array)* - Index array of the adjacency matrix in CSR format.\n* **query** *(np.array / list)* - Nodes are queried to be sampled.\n* **nthread** *(int, optional)* - The number of threads.\n\n#### Returns\n\n* **join_walk** *(np.array)* - The indices of structural features attached to the joint walks of given queries.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fveritasyin%2Fsubg_acc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fveritasyin%2Fsubg_acc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fveritasyin%2Fsubg_acc/lists"}