{"id":27099516,"url":"https://github.com/puzzlef/pagerank-openmp","last_synced_at":"2025-04-06T12:35:54.370Z","repository":{"id":109078427,"uuid":"366356464","full_name":"puzzlef/pagerank-openmp","owner":"puzzlef","description":"Design of OpenMP-based PageRank algorithm for link analysis.","archived":false,"fork":false,"pushed_at":"2024-07-22T14:24:04.000Z","size":227,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-07-22T17:18:34.337Z","etag":null,"topics":["experiment","graph","multi-threaded","openmp","pagerank","sequential","single-threaded"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/puzzlef.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-05-11T11:23:22.000Z","updated_at":"2024-07-22T14:24:08.000Z","dependencies_parsed_at":"2023-04-21T13:56:25.558Z","dependency_job_id":null,"html_url":"https://github.com/puzzlef/pagerank-openmp","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/puzzlef%2Fpagerank-openmp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/puzzlef%2Fpagerank-openmp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/puzzlef%2Fpagerank-openmp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/puzzlef%2Fpagerank-openmp/manifests","owner_url":"https://repos.
ecosyste.ms/api/v1/hosts/GitHub/owners/puzzlef","download_url":"https://codeload.github.com/puzzlef/pagerank-openmp/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247485270,"owners_count":20946397,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["experiment","graph","multi-threaded","openmp","pagerank","sequential","single-threaded"],"created_at":"2025-04-06T12:35:53.908Z","updated_at":"2025-04-06T12:35:54.362Z","avatar_url":"https://github.com/puzzlef.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"Design of **OpenMP-based** *PageRank algorithm* for link analysis.\n\n\u003cbr\u003e\n\n\n### Comparing with Ordered approach\n\n**Unordered PageRank** is the *standard* approach of PageRank computation (as\ndescribed in the original paper by Larry Page et al. [(1)][page]), where *two*\n*different rank vectors* are maintained; one representing the *current* ranks of\nvertices, and the other representing the *previous* ranks. On the other hand,\n**ordered PageRank** uses *a single rank vector*, representing the current ranks\nof vertices [(2)][pagerank]. This is similar to barrierless non-blocking\nimplementations of the PageRank algorithm by Hemalatha Eedi et al. [(3)][eedi].\nAs ranks are updated in the same vector (with each iteration), the order in\nwhich vertices are processed *affects* the final result (hence the adjective\n*ordered*). 
However, as PageRank is an iteratively converging algorithm, results\nobtained with either approach are *mostly the same*.\n\nIn this experiment ([approach-ordered]), we compare the performance of\n**ordered** and **unordered OpenMP-based PageRank** (and compare it alongside\n*ordered* and *unordered sequential PageRank*). A *schedule* of `dynamic, 2048`\nis used for *OpenMP-based PageRank* as obtained in [(4)][pagerank-openmp]. We\nuse the following PageRank parameters: damping factor `α = 0.85`, tolerance\n`τ = 10^-6`, and limit the maximum number of iterations to `L = 500`. The error\nbetween the current and the previous iteration is obtained with *L1-norm*, and\nis used to detect convergence. *Dead ends* in the graph are handled by always\nteleporting any vertex in the graph at random (*teleport* approach [(5)][teleport]).\nError in ranks obtained for each approach is measured relative to the *unordered*\n*sequential approach* using *L1-norm*.\n\nFrom the results, we observe that the **ordered OpenMP-based approach is**\n**somewhat faster** than the unordered approach **in terms of time**, and follows\na trend similar to that of sequential PageRank. However, the **ordered**\n**approach** (both OpenMP-based and sequential) **converges in significantly fewer**\n**iterations** than the unordered approach. This indicates that the ordered\napproach could have been quite a bit faster, but is not, because of *some*\noverhead (possibly *cache coherence* overhead due to parallel read-write access\nto the same vector). 
In any case, **ordered PageRank** is indeed **faster than**\n**unordered PageRank**.\n\n[approach-ordered]: https://github.com/puzzlef/pagerank-openmp/tree/approach-ordered\n\n\u003cbr\u003e\n\n\n### Adjusting Tolerance (Ordered approach)\n\nIn this experiment ([adjust-tolerance-ordered]), we perform *OpenMP-based*\n*ordered PageRank* while adjusting the tolerance `τ` from `10^-1` to `10^-14`\nwith three different tolerance functions: `L1-norm`, `L2-norm`, and `L∞-norm`.\nWe also compare it with unordered PageRank (both OpenMP-based and sequential)\nfor the same tolerance and tolerance function. We use a damping factor of\n`α = 0.85` and limit the maximum number of iterations to `L = 500`. The error between\nthe approaches is calculated with *L1-norm*. The *sequential unordered* approach\nis considered to be the *gold standard* (with respect to which error is measured). *Dead ends*\nin the graph are handled by always teleporting any vertex in the graph at\nrandom (*teleport* approach [(4)][teleport]). The teleport contribution to all vertices is\ncalculated *once* (for all vertices) at the beginning of each iteration.\n\nFrom the results, we observe that **OpenMP-based ordered PageRank** only\nconverges **faster** than the unordered approach **below a tolerance of**\n`τ = 10^-6`. This may be due to *cache coherence overhead* associated with the\nordered approach, which can exceed the benefit provided by the ordered approach with\nloose tolerance values. In terms of the number of iterations, we interestingly\nobserve that the OpenMP-based unordered/ordered approaches take more iterations\nthan the sequential approaches. We currently do not have an explanation for\nthis.\n\n[adjust-tolerance-ordered]: https://github.com/puzzlef/pagerank-openmp/tree/adjust-tolerance-ordered\n\n\u003cbr\u003e\n\n\n### Adjusting OpenMP schedule\n\nIn this experiment ([adjust-schedule]), we compare performance obtained for\n*OpenMP-based PageRank* for various *schedules*. 
Each thread is assigned a\ncertain number of *vertices* to process. The **schedule kind** is adjusted among\n`static` / `dynamic` / `guided` / `auto`, and the **chunk size** is adjusted\nfrom `1` to `65536`. We do this for the **rank computation step**. PageRank\nfactors, contributions, and the teleport contribution are computed with a\nsuitable OpenMP schedule (`auto`). We use the following PageRank parameters:\ndamping factor `α = 0.85`, tolerance `τ = 10^-6`, and limit the maximum number\nof iterations to `L = 500`. The error between the current and the previous\niteration is obtained with *L1-norm*, and is used to detect convergence.\n\nFrom the results, we observe that a **dynamic schedule with a chunk size of**\n**2048** appears to perform the **best**. This however may change based on the\nsize of graphs in the dataset, or the system used. In such cases `auto` schedule\nmay be used as a fallback. We also observe that the *difference in ranks*\nobtained from sequential and OpenMP-based approach is *relatively high*\n(`\u003c 10^-3`) on *large directed graphs*. This may be due to the fact that parallel\nreduce performed for teleport contribution calculation differs from sequential\nreduce due to *inaccuracies associated with 32-bit floating point format*\n*(float)*, and can be avoided by using *64-bit floating point format (double)*.\n\n[adjust-schedule]: https://github.com/puzzlef/pagerank-openmp/tree/adjust-schedule\n\n\u003cbr\u003e\n\n\n### Comparison with Hybrid approach\n\nThis experiment ([approach-hybrid]) was for comparing the performance between\nfinding pagerank using **uniform** OpenMP (*all* routines use OpenMP), or using\n**hybrid** OpenMP (*some* routines are *sequential*). Both techniques were\nattempted on different types of graphs, running each technique 5 times per graph\nto get a good time measure. 
Number of threads for this experiment (using\n`OMP_NUM_THREADS`) was varied from `2` to `48`.\n\nIt appears that **hybrid** approach performs **worse** in most cases, and only\nslightly better than *uniform* approach in a few cases. I am not sure why\nthat is the case; possibly there is some correlation between execution\ntime and some other parameter. Note that neither approach makes use of\n*SIMD instructions* which are available on all modern hardware.\n\n[approach-hybrid]: https://github.com/puzzlef/pagerank-openmp/tree/approach-hybrid\n\n\u003cbr\u003e\n\n\n### Comparison with Sequential implementation\n\nThis experiment ([compare-sequential]) was for comparing the performance between\nfinding pagerank using a single thread (**sequential**), or finding pagerank\naccelerated using **OpenMP**. Both techniques were attempted on different types\nof graphs, running each technique 5 times per graph to get a good time measure.\nNumber of threads for this experiment (using `OMP_NUM_THREADS`) was varied from\n`2` to `48`.\n\n**OpenMP** does seem to provide a **clear benefit** for most graphs (except for\nthe smallest ones). This speedup is definitely not directly proportional to the\nnumber of threads, as one would normally expect (Amdahl's law). Note that there\nis still room for improvement with **OpenMP** by using sequential versions of\ncertain routines instead of OpenMP versions because not all calculations benefit\nfrom multiple threads (e.g. [vector-multiplication-openmp]). Also note that\nneither approach makes use of *SIMD instructions* which are available on all\nmodern hardware.\n\n[![](https://i.imgur.com/Quuaqnv.gif)][sheets]\n\n[compare-sequential]: https://github.com/puzzlef/pagerank-openmp/tree/compare-sequential\n\n\u003cbr\u003e\n\u003cbr\u003e\n\n\n## References\n\n- [An Efficient Practical Non-Blocking PageRank Algorithm for Large Scale Graphs; Hemalatha Eedi et al. 
(2021)](https://ieeexplore.ieee.org/document/9407114)\n- [PageRank Algorithm, Mining massive Datasets (CS246), Stanford University](https://www.youtube.com/watch?v=ke9g8hB0MEo)\n- [The PageRank Citation Ranking: Bringing Order to the Web; Larry Page et al. (1998)](https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.38.5427)\n- [Ranking nodes in growing networks: When PageRank fails; Mariani et al. (2015)](https://www.nature.com/articles/srep16181)\n- [Local Approximation of PageRank and Reverse PageRank; Bar-Yossef et al. (2008)](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/34455.pdf)\n- [PageRank Beyond the Web; David F. Gleich (2015)](https://www.cs.purdue.edu/homes/dgleich/publications/Gleich%202015%20-%20prbeyond.pdf)\n- [Some methods of speeding up the convergence of iteration methods; Boris T. Polyak (1964)](https://www.researchgate.net/publication/243648538_Some_methods_of_speeding_up_the_convergence_of_iteration_methods)\n- [The University of Florida Sparse Matrix Collection; Timothy A. Davis et al. (2011)](https://doi.org/10.1145/2049662.2049663)\n- [When to Stop Slowly Convergent Iteration?; Prof. W. 
Kahan](https://people.eecs.berkeley.edu/~wkahan/Math128/SlowIter.pdf)\n- [Simple Trick to Dramatically Improve Speed of Convergence; Vincent Granville](https://www.datasciencecentral.com/simple-trick-to-dramatically-improve-speed-of-convergence/)\n- [What's the difference between \"static\" and \"dynamic\" schedule in OpenMP?](https://stackoverflow.com/a/10852852/1413259)\n- [OpenMP Dynamic vs Guided Scheduling](https://stackoverflow.com/a/43047074/1413259)\n- [Block Compressed Row Format (BSR)](https://scipy-lectures.org/advanced/scipy_sparse/bsr_matrix.html)\n- [Aitken's delta-squared process](https://en.wikipedia.org/wiki/Aitken%27s_delta-squared_process)\n- [Fixed-point iteration](https://en.wikipedia.org/wiki/Fixed-point_iteration)\n- [Steffensen's method](https://en.wikipedia.org/wiki/Steffensen%27s_method)\n- [Rate of convergence](https://en.wikipedia.org/wiki/Rate_of_convergence)\n\n\u003cbr\u003e\n\u003cbr\u003e\n\n\n[![](https://i.imgur.com/5vdxPZ3.jpg)](https://www.youtube.com/watch?v=rKv_l1RnSqs)\n[![ORG](https://img.shields.io/badge/org-puzzlef-green?logo=Org)](https://puzzlef.github.io)\n[![DOI](https://zenodo.org/badge/366356464.svg)](https://zenodo.org/badge/latestdoi/366356464)\n\n\n[Prof. Dip Sankar Banerjee]: https://sites.google.com/site/dipsankarban/\n[Prof. 
Kishore Kothapalli]: https://cstar.iiit.ac.in/~kkishore/\n[SuiteSparse Matrix Collection]: https://suitesparse-collection-website.herokuapp.com\n[graphs]: https://github.com/puzzlef/graphs\n[vector-multiplication-openmp]: https://github.com/puzzlef/vector-multiplication-openmp\n[charts]: https://photos.app.goo.gl/Bd8bwdZbppkdUQTU9\n[sheets]: https://docs.google.com/spreadsheets/d/1Mzmo9KYunJ9yv2ZNwFv73qPjf9VYNaP5YXJT0HVZgpo/edit?usp=sharing\n[page]: https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.38.5427\n[pagerank]: https://github.com/puzzlef/pagerank\n[eedi]: https://ieeexplore.ieee.org/document/9407114\n[pagerank-openmp]: https://github.com/puzzlef/pagerank-openmp/tree/adjust-schedule\n[teleport]: https://gist.github.com/wolfram77/94c38b9cfbf0c855e5f42fa24a8602fc\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpuzzlef%2Fpagerank-openmp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpuzzlef%2Fpagerank-openmp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpuzzlef%2Fpagerank-openmp/lists"}