{"id":18800571,"url":"https://github.com/xtra-computing/thunderrw","last_synced_at":"2025-09-23T04:05:20.741Z","repository":{"id":86257287,"uuid":"363801585","full_name":"Xtra-Computing/ThunderRW","owner":"Xtra-Computing","description":"Source code of \"ThunderRW: An In-Memory Graph Random Walk Engine\" published in VLDB'2021 - By Shixuan Sun, Yuhang Chen, Shengliang Lu, Bingsheng He and Yuchen Li","archived":false,"fork":false,"pushed_at":"2021-08-15T05:51:28.000Z","size":4699,"stargazers_count":26,"open_issues_count":1,"forks_count":6,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-27T08:22:28.355Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Xtra-Computing.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-05-03T02:55:38.000Z","updated_at":"2024-10-16T14:31:23.000Z","dependencies_parsed_at":"2023-06-13T20:07:01.481Z","dependency_job_id":null,"html_url":"https://github.com/Xtra-Computing/ThunderRW","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Xtra-Computing%2FThunderRW","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Xtra-Computing%2FThunderRW/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Xtra-Computing%2FThunderRW/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Xtra-Computing%2FThunderRW/manifests",
"owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Xtra-Computing","download_url":"https://codeload.github.com/Xtra-Computing/ThunderRW/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248752375,"owners_count":21156080,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-07T22:19:08.481Z","updated_at":"2025-09-23T04:05:20.581Z","avatar_url":"https://github.com/Xtra-Computing.png","language":"C++","readme":"# ThunderRW: An In-Memory Graph Random Walk Engine\n\n## Introduction\n\nAs random walk is a powerful tool in many graph processing, mining and\nlearning applications, this paper proposes an efficient in-memory random\nwalk engine named ThunderRW. Compared with existing parallel systems on\nimproving the performance of a single graph operation, ThunderRW supports\nmassive parallel random walks. The core design of ThunderRW is motivated\nby our profiling results: common RW algorithms have as high as 73.1% CPU\npipeline slots stalled due to irregular memory access, which suffers\nsignificantly more memory stalls than the conventional graph workloads\nsuch as BFS and SSSP. To improve the memory efficiency, we first design a\ngeneric step-centric programming model named Gather-Move-Update to abstract\ndifferent RW algorithms. Based on the programming model, we develop the step\ninterleaving technique to hide memory access latency by switching the executions\nof different random walk queries. 
In our experiments, we use four representative\nRW algorithms including PPR, DeepWalk, Node2Vec and MetaPath to demonstrate the\nefficiency and programming flexibility of ThunderRW. Experimental results show\nthat ThunderRW outperforms state-of-the-art approaches by an order of magnitude,\nand the step interleaving technique significantly reduces CPU pipeline stalls\nfrom 73.1% to 15.0%.\n\nFor details, please refer to our VLDB'2021 paper\n\"ThunderRW: An In-Memory Graph Random Walk Engine\"\nby [Shixuan Sun](https://shixuansun.github.io/), [Yuhang Chen](https://alexcyh7.github.io/),\n[Shengliang Lu](https://github.com/lushl9301), [Bingsheng He](https://www.comp.nus.edu.sg/~hebs/) and\n[Yuchen Li](http://yuchenli.net/). If you have any further questions, please feel free to contact us.\n\nPlease cite our paper if you use our source code.\n\n* \"Shixuan Sun, Yuhang Chen, Shengliang Lu, Bingsheng He and Yuchen Li.\nThunderRW: An In-Memory Graph Random Walk Engine. VLDB 2021.\"\n\n\n## Compile\n\nUnder the root directory of the project, execute the following commands to compile the source code:\n\n```zsh\nmkdir build\ncd build\ncmake ..\nmake\n```\n\n## Preprocessing\n\nThunderRW takes a graph stored in the CSR format as the input. It supports directed,\nedge-labeled and edge-weighted graphs. 
In particular, it requires that\nthe graph data be stored as follows: 1) vertex IDs range from 0 to N-1,\nwhere N is the number of vertices in the graph; 2) the graph data such as\nthe graph structure, the edge label and the edge weight is stored in the same\nfolder; 3) b_degree.bin is an int32_t array recording the degree of each vertex where\nthe first element is unused, the second element records the number of vertices,\nand the following elements are vertex degrees; 4) b_adj.bin is an int32_t array\nrecording the neighbors of each vertex (i.e., the neighbor array in CSR); 5)\nb_edge_label.bin is an int32_t array recording the label of each edge with a\none-to-one correspondence to b_adj.bin; and 6) b_edge_weight.bin is a double\narray recording the weight of each edge with a one-to-one correspondence to\nb_adj.bin. Therefore, you need to convert your graph file into a format that\ncan be consumed by ThunderRW.\n\nWe provide the script `prepare_data.sh` to convert the edge list file. Before\nrunning our conversion tool, you must rename your graph file\nto `b_edge_list.bin`. In this repository, we use the amazon dataset as\nthe running example. The edge list file is under the `sample_dataset/amazon` folder.\nIn the script, `dataset_path` sets the dataset root path, `array` stores\nthe dataset name list and `skip_char` configures the skip character. Execute\nthe following command to convert the data.\n\n```zsh\n./prepare_data.sh\n```\n\nYou will see the following files in the data folder: `b_degree.bin`,\n`b_adj.bin`, `b_edge_label.bin` and `b_edge_weight.bin`.\nThe tool converts the edge list to an undirected graph and randomly assigns a label and a weight to each edge.\nWe use undirected graphs as the input to keep the workload of random walk queries nearly the same in our experiments.\n\n## Execution\n\nWe implement four algorithms with ThunderRW: PPR, DeepWalk,\nNode2Vec and MetaPath. 
The source files are under the `random_walk/apps` folder.\nThe parameters have been configured in the source files. You can override\nthe default settings (e.g., the number of walkers, the sampling\nmethod and the execution mode) in the source files or through command-line parameters, following the\ninstructions in the files. Here, we demonstrate how to specify the input graph data.\n\nExecute PPR with the following command. `-f` sets the input graph folder and\n`-n` sets the number of threads.\n\n```zsh\n./build/random_walk/ppr.out -f sample_dataset/amazon -n 10\n```\n\nExecute DeepWalk with the following command. `-ew` loads the edge\nweight array into main memory.\n\n```zsh\n./build/random_walk/deepwalk.out -f sample_dataset/amazon -n 10 -ew\n```\n\nExecute Node2Vec with the following command.\n\n```zsh\n./build/random_walk/node2vec.out -f sample_dataset/amazon -n 10 -ew\n```\n\nExecute MetaPath with the following command. `-el` loads the\nedge label array into main memory. `-s` sets the meta path schema.\n\n```zsh\n./build/random_walk/metapath.out -f sample_dataset/amazon -n 10 -el -s 0,1,2,3,4\n```\n\n## Configuration\n\nIn `random_walk/types.h`, you can disable the step interleaving technique\nby commenting out `ENABLE_STEPINTERLEAVING`. The ring sizes k and k´ can be configured\nby `RING_SIZE` and `SEARCH_RING_SIZE`, respectively.\n\n## Experiment Datasets\n\nYou can download the graphs used in our paper by following the\ninstructions in Section 6.1.","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxtra-computing%2Fthunderrw","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fxtra-computing%2Fthunderrw","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxtra-computing%2Fthunderrw/lists"}