{"id":39055121,"url":"https://github.com/dsg-uwaterloo/graphsurge","last_synced_at":"2026-01-17T18:00:05.974Z","repository":{"id":142054914,"uuid":"254778879","full_name":"dsg-uwaterloo/graphsurge","owner":"dsg-uwaterloo","description":"Graphs analytics on collections of views!","archived":false,"fork":false,"pushed_at":"2023-11-28T11:36:06.000Z","size":609,"stargazers_count":32,"open_issues_count":0,"forks_count":3,"subscribers_count":3,"default_branch":"main","last_synced_at":"2023-11-28T12:36:13.950Z","etag":null,"topics":["differential-computations","graph-analytics","view-collection","views"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2004.05297","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dsg-uwaterloo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2020-04-11T02:39:31.000Z","updated_at":"2023-11-28T11:36:18.000Z","dependencies_parsed_at":"2023-11-28T12:44:24.798Z","dependency_job_id":null,"html_url":"https://github.com/dsg-uwaterloo/graphsurge","commit_stats":null,"previous_names":[],"tags_count":0,"template":null,"template_full_name":null,"purl":"pkg:github/dsg-uwaterloo/graphsurge","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dsg-uwaterloo%2Fgraphsurge","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dsg-uwaterloo%2Fgraphsurge/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dsg-uwaterloo%2Fgraphsurge/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dsg-uwaterloo%2Fgraphsurge/manifests","owner_url":"https://repos.ecosyste.ms/api/v
1/hosts/GitHub/owners/dsg-uwaterloo","download_url":"https://codeload.github.com/dsg-uwaterloo/graphsurge/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dsg-uwaterloo%2Fgraphsurge/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28514939,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-17T17:57:59.192Z","status":"ssl_error","status_checked_at":"2026-01-17T17:57:52.527Z","response_time":85,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["differential-computations","graph-analytics","view-collection","views"],"created_at":"2026-01-17T18:00:05.754Z","updated_at":"2026-01-17T18:00:05.960Z","avatar_url":"https://github.com/dsg-uwaterloo.png","language":"Rust","readme":"# Graphsurge: Graph Analytics on View Collections using Differential Computations\n\n\u003cp align=\"center\"\u003e\n  \u003cimg width=\"400\" src=\"logo.png?raw=true\"\u003e\n\u003c/p\u003e\n\n![Continuous integration](https://github.com/dsg-uwaterloo/graphsurge/workflows/CI/badge.svg)\n\nGraphsurge is a new system for performing analytical computations on multiple snapshots or _views_\nof large-scale static property graphs. 
Graphsurge allows users to create _view collections_, a set\nof related views of a graph created by applying filter predicates on node and edge properties, and\nrun analytical computations on all the views of a collection efficiently.\n\nGraphsurge is built on top of [Timely Dataflow](https://github.com/TimelyDataflow/timely-dataflow)\nand [Differential Dataflow](https://github.com/TimelyDataflow/differential-dataflow), which provide\ntwo huge benefits:\n* Differential Dataflow can incrementally maintain the results for any general computation, including\ncyclic or iterative computations (which include many graph algorithms such as\n[Connected Components](https://en.wikipedia.org/wiki/Component_(graph_theory))). Analytical\ncomputations in Graphsurge are expressed using Differential operators, which enables reusing\ncomputation results across the views of a collection instead of running computations from scratch\non each view. This results in huge savings in the total runtime.\n* We use the Timely execution engine to seamlessly scale both the materialization of view\ncollections and running analytical computations to a distributed environment, similar to using\nother execution frameworks such as [Spark](https://spark.apache.org) or\n[Flink](https://flink.apache.org).\n\nGraphsurge stores view collections using a form of [delta encoding](https://en.wikipedia.org/wiki/Delta_encoding),\nwhere the data for a view GV\u003csub\u003ei\u003c/sub\u003e represent its difference from the previous view GV\u003csub\u003ei-1\u003c/sub\u003e.\nThis representation can also be directly used as input to Differential Dataflow computations.\n\nIn _general_, the runtime of a black-box differential computation (such as the\nuser-defined computations in Graphsurge) is correlated with the total number of diffs of a view\ncollection. 
Graphsurge enables 2 key optimizations based on this observation:\n* **Collection Ordering**: The total number of diffs of a view collection depends on the order of the\n views (similar views placed next to each other will generate fewer diffs), so we want to reorder\n a given set of views to get the lowest number of diffs. This Collection Ordering Problem is related\n to the Consecutive Block Minimization Problem, which is NP-Hard! Graphsurge solves this problem\n using a constant-factor approximation algorithm (resulting in up to 10x fewer diffs\n in our experiments).\n\n* **Adaptive Collection Splitting**: Maintaining computation results unsurprisingly implies an\noverhead for Differential Dataflow, as it needs to check the entire history of a\nkey to determine the effect of a new update. This overhead is especially large for cases where the\nnumber of diffs of a view is high, or for computations (like PageRank) that result\nin a large number of output changes even for a small number of input updates. In such cases, it is\nfaster to run the computation on a view from scratch instead of trying to reuse results from\nprevious views.\n\n  Graphsurge keeps track of the correlation between the number of diffs and the\n  actual computation time when running differentially and also when rerunning from scratch. It uses\n  a linear regression model to adaptively decide at runtime to split the view collection at the\n  point where rerunning from scratch is predicted to be faster than continuing to run\n  differentially.\n\nMore details on our techniques and experimental results can be found in [our paper](https://arxiv.org/abs/2004.05297).\n\n## Using Graphsurge\n\nGraphsurge is written in [Rust](https://www.rust-lang.org). 
To run the Graphsurge CLI, download and build\nthe binary:\n\n```bash\n$ git clone https://github.com/dsg-uwaterloo/graphsurge \u0026\u0026 cd graphsurge\n$ cargo build --release\n$ ./target/release/graphsurge\n```\n\n### Set the number of worker threads and process id:\n```bash\ngraphsurge\u003e SET THREADS 4 AND PROCESS_ID 0;\n```\n\n### Load a graph:\n```bash\ngraphsurge\u003e LOAD GRAPH WITH\n    VERTICES FROM 'data/small_properties/vertices.txt' and\n    EDGES FROM 'data/small_properties/edges.txt'\n    COMMENT '#';\n```\n### Create a view collection:\n```bash\ngraphsurge\u003e CREATE VIEW COLLECTION Years WHERE\n    [year \u003c= 2000 and u.country = 'canada' and v.country = 'canada'],\n    [year \u003c= 2005 and u.country = 'canada' and v.country = 'canada'],\n    [year \u003c= 2010 and u.country = 'canada' and v.country = 'canada'];\n```\n\n### Run computations:\n```bash\n$ mkdir bfs_results\n```\n```bash\ngraphsurge\u003e RUN COMPUTATION wcc ON COLLECTION Years SAVE RESULTS TO 'bfs_results';\n```\n\n### Running in a distributed environment:\n\nTo run Graphsurge on multiple machines, say on 2 hosts _server1_ and _server2_, start\nGraphsurge on each host and set the process ids:\n\n```bash\n# On server1\ngraphsurge\u003e SET THREADS 32 AND PROCESS_ID 0;\n```\n\n```bash\n# On server2\ngraphsurge\u003e SET THREADS 32 AND PROCESS_ID 1;\n```\n\nThen run the same queries on both of them. 
Make sure that server1 and server2\ncan access each other at the specified port.\n\n```bash\ngraphsurge\u003e LOAD GRAPH WITH\n    VERTICES FROM 'data/small_properties/vertices.txt' and\n    EDGES FROM 'data/small_properties/edges.txt'\n    COMMENT '#';\ngraphsurge\u003e CREATE VIEW COLLECTION Years WHERE\n    [year \u003c= 2000 and u.country = 'canada' and v.country = 'canada'],\n    [year \u003c= 2005 and u.country = 'canada' and v.country = 'canada'],\n    [year \u003c= 2010 and u.country = 'canada' and v.country = 'canada']\n    HOSTS 'server1:9000' 'server2:9000';\ngraphsurge\u003e RUN ARRANGED_DIFFERENTIAL COMPUTATION wcc on COLLECTION Years\n    HOSTS 'server1:9000' 'server2:9000';\n```\n\nThe same process can be repeated for additional host machines.\n\n### Writing new computations:\nGraphsurge already has [implementations](src/computations/builder.rs#L45)\nfor a set of common graph algorithms. New computations can be written using the [Analytics\nComputation API](gs_analytics_api/src). You can see examples of how to use the API for\n[bfs](src/computations/bfs) and [scc](src/computations/scc).\n\nCheck the [experiments](experiments/) folder for examples of how to use Graphsurge.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdsg-uwaterloo%2Fgraphsurge","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdsg-uwaterloo%2Fgraphsurge","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdsg-uwaterloo%2Fgraphsurge/lists"}