{"id":27099542,"url":"https://github.com/puzzlef/pagerank-cuda","last_synced_at":"2026-02-03T13:31:05.630Z","repository":{"id":109077789,"uuid":"374990003","full_name":"puzzlef/pagerank-cuda","owner":"puzzlef","description":"Design of CUDA-based PageRank algorithm for link analysis.","archived":false,"fork":false,"pushed_at":"2025-04-08T18:00:08.000Z","size":547,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-09-05T15:28:43.741Z","etag":null,"topics":["algorithm","block","config","cuda","graph","launch","pagerank","point","switch","switched","thread","vertex"],"latest_commit_sha":null,"homepage":"https://gist.github.com/wolfram77/4ef16ab9699ac03a617b8731dd240e1f","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/puzzlef.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-06-08T11:49:11.000Z","updated_at":"2025-04-08T18:00:12.000Z","dependencies_parsed_at":"2023-05-31T21:16:01.848Z","dependency_job_id":null,"html_url":"https://github.com/puzzlef/pagerank-cuda","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":"puzzlef/pagerank-openmp","purl":"pkg:github/puzzlef/pagerank-cuda","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/puzzlef%2Fpagerank-cuda","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/puzzlef%2Fpagerank-cuda/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/puzzlef%2Fpagerank-cuda/releases","man
ifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/puzzlef%2Fpagerank-cuda/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/puzzlef","download_url":"https://codeload.github.com/puzzlef/pagerank-cuda/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/puzzlef%2Fpagerank-cuda/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29046558,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-03T10:09:22.136Z","status":"ssl_error","status_checked_at":"2026-02-03T10:09:16.814Z","response_time":96,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["algorithm","block","config","cuda","graph","launch","pagerank","point","switch","switched","thread","vertex"],"created_at":"2025-04-06T12:36:11.301Z","updated_at":"2026-02-03T13:31:05.623Z","avatar_url":"https://github.com/puzzlef.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"Design of **CUDA-based** *PageRank algorithm* for link analysis.\n\nAll *seventeen* graphs used in below experiments are\nstored in the *MatrixMarket (.mtx)* file format, and obtained from the\n*SuiteSparse* *Matrix Collection*. 
These include: *web-Stanford, web-BerkStan,*\n*web-Google, web-NotreDame, soc-Slashdot0811, soc-Slashdot0902,*\n*soc-Epinions1, coAuthorsDBLP, coAuthorsCiteseer, soc-LiveJournal1,*\n*coPapersCiteseer, coPapersDBLP, indochina-2004, italy_osm,*\n*great-britain_osm, germany_osm, asia_osm*. The experiments are implemented\nin *C++*, and compiled using *GCC 9* with *optimization level 3 (-O3)*.\nThe number of *iterations* taken for each test case is measured. `500` is the\n*maximum iterations* allowed. Statistics of each test case are\nprinted to *standard output (stdout)*, and redirected to a *log file*,\nwhich is then processed with a *script* to generate a *CSV file*, with\neach *row* representing the details of a *single test case*.\n\n\u003cbr\u003e\n\n\n### Finding Launch config for Block-per-vertex\n\nThis experiment ([block-adjust-launch]) was for finding a suitable **launch**\n**config** for **CUDA block-per-vertex**. For the launch config, the\n**block-size** (threads) was adjusted from `32`-`1024`, and the **grid-limit**\n(max grid-size) was adjusted from `1024`-`32768`. Each config was run 5 times\nper graph to get a good time measure.\n\n`MAXx64` appears to be a good config for most graphs. Here `MAX` is the\n**grid-limit**, and `64` is the **block-size**. This launch config is for the\nentire set of graphs, and could be slightly different for a subset of graphs. Also note\nthat this applies to *Tesla V100 PCIe 16GB*, and could be different for other\nGPUs. In order to measure error, [nvGraph] pagerank is taken as a reference.\n\n[block-adjust-launch]: https://github.com/puzzlef/pagerank-cuda/tree/block-adjust-launch\n\n\u003cbr\u003e\n\n\n### Finding Launch config for Thread-per-vertex\n\nThis experiment ([thread-adjust-launch]) was for finding a suitable **launch**\n**config** for **CUDA thread-per-vertex**. For the launch config, the\n**block-size** (threads) was adjusted from `32`-`1024`, and the **grid-limit**\n(max grid-size) was adjusted from `1024`-`32768`. 
Each config was run 5 times\nper graph to get a good time measure.\n\nOn average, the launch config doesn't seem to have a significant impact on\nperformance. However, `8192x128` appears to be a good config. Here `8192` is the\n*grid-limit*, and `128` is the *block-size*. Comparing with [graph properties],\nit seems it would be better to use `8192x512` for graphs with **high** *avg.\ndensity*, and `8192x32` for graphs with **high** *avg. degree*. Maybe sorting\nthe vertices by degree can have a good effect (due to less warp divergence).\nNote that this applies to **Tesla V100 PCIe 16GB**, and would be different for\nother GPUs. In order to measure error, [nvGraph] pagerank is taken as a\nreference.\n\n[thread-adjust-launch]: https://github.com/puzzlef/pagerank-cuda/tree/thread-adjust-launch\n\n\u003cbr\u003e\n\n\n### Sorting vertices by in-degree for Block-per-vertex?\n\nThis experiment ([block-sort-by-indegree]) was for finding the effect of sorting\nvertices and/or edges by in-degree for CUDA **block-per-vertex** based PageRank.\nFor this experiment, sorting of vertices and/or edges was either `NO`, `ASC`, or\n`DESC`. This gives a total of `3 * 3 = 9` cases. Each case is run on multiple\ngraphs, running each 5 times per graph for good time measure.\n\nResults show that sorting in *most cases* is **not faster**. In fact, in a\nnumber of cases, sorting actually slows down performance. Maybe (just maybe)\nthis is because a sorted arrangement tends to flood certain memory chunks with\ntoo many requests. In order to measure error, [nvGraph] pagerank is taken as a\nreference.\n\n[block-sort-by-indegree]: https://github.com/puzzlef/pagerank-cuda/tree/block-sort-by-indegree\n\n\u003cbr\u003e\n\n\n### Sorting vertices by in-degree for Thread-per-vertex?\n\nThis experiment ([thread-sort-by-indegree]) was for finding the effect of\nsorting vertices and/or edges by in-degree for CUDA **thread-per-vertex** based\nPageRank. 
For this experiment, sorting of vertices and/or edges was either `NO`,\n`ASC`, or `DESC`. This gives a total of `3 * 3 = 9` cases. Each case is run on\nmultiple graphs, running each 5 times per graph for good time measure.\n\nResults show that sorting in *most cases* is **slower**. Maybe this is because\na sorted arrangement tends to flood certain memory chunks with too many\nrequests. In order to measure error, [nvGraph] pagerank is taken as a reference.\n\n[thread-sort-by-indegree]: https://github.com/puzzlef/pagerank-cuda/tree/thread-sort-by-indegree\n\n\u003cbr\u003e\n\n\n### Sorting vertices by in-degree for Switched-per-vertex?\n\nThis experiment ([switched-sort-by-indegree]) was for finding the effect of\nsorting vertices and/or edges by in-degree for CUDA **switched-per-vertex**\nbased PageRank. For this experiment, sorting of vertices and/or edges was either\n`NO`, `ASC`, or `DESC`. This gives a total of `3 * 3 = 9` cases. `NO` here means\nthat vertices are partitioned by in-degree (edges remain unchanged). Each case\nis run on multiple graphs, running each 5 times per graph for good time measure.\n\nResults show that **sorting** in most cases is **not faster**. It's better to\nsimply **partition** *vertices* by *degree*. In order to measure error,\n[nvGraph] pagerank is taken as a reference.\n\n[switched-sort-by-indegree]: https://github.com/puzzlef/pagerank-cuda/tree/switched-sort-by-indegree\n\n\u003cbr\u003e\n\n\n### Finding Block Launch config for Switched-per-vertex\n\nThis experiment ([switched-adjust-block-launch]) was for finding a suitable\n**launch config** for **CUDA switched-per-vertex** for the block approach. For the\nlaunch config, the **block-size** (threads) was adjusted from `32`-`1024`, and\nthe **grid-limit** (max grid-size) was adjusted from `1024`-`32768`. Each config\nwas run 5 times per graph to get a good time measure.\n\n`MAXx256` appears to be a good config for most graphs. 
Here `MAX` is the\n*grid-limit*, and `256` is the *block-size*. Note that this applies to **Tesla**\n**V100 PCIe 16GB**, and would be different for other GPUs. In order to measure\nerror, [nvGraph] pagerank is taken as a reference.\n\n[switched-adjust-block-launch]: https://github.com/puzzlef/pagerank-cuda/tree/switched-adjust-block-launch\n\n\u003cbr\u003e\n\n\n### Finding Thread Launch config for Switched-per-vertex\n\nThis experiment ([switched-adjust-thread-launch]) was for finding a suitable\n**launch config** for **CUDA switched-per-vertex** for the thread approach. For the\nlaunch config, the **block-size** (threads) was adjusted from `32`-`1024`, and\nthe **grid-limit** (max grid-size) was adjusted from `1024`-`32768`. Each config\nwas run 5 times per graph to get a good time measure.\n\n`MAXx512` appears to be a good config for most graphs. Here `MAX` is the\n*grid-limit*, and `512` is the *block-size*. Note that this applies to **Tesla**\n**V100 PCIe 16GB**, and would be different for other GPUs. In order to measure\nerror, [nvGraph] pagerank is taken as a reference.\n\n[switched-adjust-thread-launch]: https://github.com/puzzlef/pagerank-cuda/tree/switched-adjust-thread-launch\n\n\u003cbr\u003e\n\n\n### Finding Switch point for Switched-per-vertex\n\nFor this experiment ([switched-adjust-switch-point]), `switch_degree` was varied\nfrom `2` - `1024`, and `switch_limit` was varied from `1` - `1024`.\n`switch_degree` defines the *in-degree* at which the *pagerank kernel* switches from\nthe **thread-per-vertex** approach to **block-per-vertex**. `switch_limit` defines\nthe minimum block size for the **thread-per-vertex** / **block-per-vertex** approach\n(if a block size is too small, it is merged with the other approach block). Each\ncase is run on multiple graphs, running each 5 times per graph for good time\nmeasure. 
It seems `switch_degree` of **64** and `switch_limit` of **32** would\nbe a good choice.\n\n[switched-adjust-switch-point]: https://github.com/puzzlef/pagerank-cuda/tree/switched-adjust-switch-point\n\n\u003cbr\u003e\n\n\n### Adjusting Per-iteration Rank scaling\n\n[nvGraph PageRank] appears to use [L2-norm per-iteration scaling]. This is\n(probably) required for finding a solution to the **eigenvalue problem**. However,\nas the *eigenvalue* for PageRank is `1`, this is not necessary. This experiment\nwas for observing if this was indeed true, and that any such *per-iteration\nscaling* doesn't affect the number of *iterations* needed to converge.\n\nIn this experiment ([adjust-iteration-scaling]), PageRank was computed with\n**L1**, **L2**, or **L∞-norm** and the effect of **L1** or **L2-norm** *scaling*\n*of ranks* was compared with **baseline (L0)**. Results match the above\nassumptions, and indeed no performance benefit is observed (except a reduction\nof a single iteration for the *soc-Slashdot0811*, *soc-Slashdot0902*,\n*soc-LiveJournal1*, and *italy_osm* graphs).\n\n[adjust-iteration-scaling]: https://github.com/puzzlef/pagerank-cuda/tree/adjust-iteration-scaling\n[nvGraph PageRank]: https://github.com/rapidsai/nvgraph/blob/main/cpp/src/pagerank.cu\n[L2-norm per-iteration scaling]: https://github.com/rapidsai/nvgraph/blob/main/cpp/src/pagerank.cu#L145\n\n\u003cbr\u003e\n\n\n### Comparing with nvGraph PageRank\n\nThis experiment ([compare-nvgraph]) was for comparing the performance between\nfinding pagerank using [nvGraph], finding pagerank using **CUDA**, and finding\npagerank using a single thread ([sequential]). Each technique was attempted on\ndifferent types of graphs, running each technique 5 times per graph to get a\ngood time measure. **CUDA** is the [switched-per-vertex] approach running on\nGPU. **CUDA** based pagerank is indeed much faster than **sequential** (CPU). 
In\norder to measure error, [nvGraph] pagerank is taken as a reference.\n\n[![](https://i.imgur.com/vDeiY1n.gif)][sheetp]\n\n[![](https://i.imgur.com/N1EUPCS.png)][sheetp]\n[![](https://i.imgur.com/5LaxhV4.png)][sheetp]\n\n[compare-nvgraph]: https://github.com/puzzlef/pagerank-cuda/tree/compare-nvgraph\n\n\u003cbr\u003e\n\n\n### Other experiments\n\n- [adjust-damping-factor](https://github.com/puzzlef/pagerank-cuda/tree/adjust-damping-factor)\n- [adjust-tolerance](https://github.com/puzzlef/pagerank-cuda/tree/adjust-tolerance)\n- [adjust-tolerance-function](https://github.com/puzzlef/pagerank-cuda/tree/adjust-tolerance-function)\n\n\u003cbr\u003e\n\u003cbr\u003e\n\n\n## References\n\n- [PageRank Algorithm, Mining massive Datasets (CS246), Stanford University](http://snap.stanford.edu/class/cs246-videos-2019/lec9_190205-cs246-720.mp4)\n- [CUDA by Example :: Jason Sanders, Edward Kandrot](http://www.mat.unimi.it/users/sansotte/cuda/CUDA_by_Example.pdf)\n- [Managed memory vs cudaHostAlloc - TK1](https://forums.developer.nvidia.com/t/managed-memory-vs-cudahostalloc-tk1/34281)\n- [SuiteSparse Matrix Collection]\n\n\u003cbr\u003e\n\u003cbr\u003e\n\n\n[![](https://i.imgur.com/fjeKRUf.jpg)](https://www.youtube.com/watch?v=TtTHBmL7N5U)\n[![ORG](https://img.shields.io/badge/org-puzzlef-green?logo=Org)](https://puzzlef.github.io)\n[![DOI](https://zenodo.org/badge/374990003.svg)](https://zenodo.org/badge/latestdoi/374990003)\n![](https://ga-beacon.deno.dev/G-KD28SG54JQ:hbAybl6nQFOtmVxW4if3xw/github.com/puzzlef/pagerank-cuda)\n\n[Prof. Dip Sankar Banerjee]: https://sites.google.com/site/dipsankarban/\n[Prof. 
Kishore Kothapalli]: https://cstar.iiit.ac.in/~kkishore/\n[SuiteSparse Matrix Collection]: https://suitesparse-collection-website.herokuapp.com\n[nvGraph]: https://github.com/rapidsai/nvgraph\n[sequential]: https://github.com/puzzlef/pagerank-sequential-vs-openmp\n[switched-per-vertex]: https://github.com/puzzlef/pagerank-cuda-switched-adjust-switch-point\n[charts]: https://photos.app.goo.gl/MLcbhUPmLEC7iaEm9\n[sheets]: https://docs.google.com/spreadsheets/d/12u5yq49MLS2QRhWHkZF7SWs1JSS4u1sb7wKl8ExrJgg/edit?usp=sharing\n[sheetp]: https://docs.google.com/spreadsheets/d/e/2PACX-1vTijFuWx76ZnNfJs5U0IEY1jMEWffi6Pc8uw4FbnXB1R3Puduyn-mPvq4kdMFyyhq0V7GJZQ0722nDS/pubhtml\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpuzzlef%2Fpagerank-cuda","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpuzzlef%2Fpagerank-cuda","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpuzzlef%2Fpagerank-cuda/lists"}