{"id":18811510,"url":"https://github.com/lhelwerd/webgraph","last_synced_at":"2025-04-13T20:31:35.628Z","repository":{"id":22989015,"uuid":"26339418","full_name":"lhelwerd/WebGraph","owner":"lhelwerd","description":"WebGraph framework with extensions","archived":false,"fork":false,"pushed_at":"2014-12-20T14:29:51.000Z","size":602,"stargazers_count":22,"open_issues_count":1,"forks_count":10,"subscribers_count":5,"default_branch":"master","last_synced_at":"2023-10-20T21:14:09.372Z","etag":null,"topics":["forked-repo","graph-compression","java"],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lhelwerd.png","metadata":{"files":{"readme":"README.txt","changelog":"CHANGES","contributing":null,"funding":null,"license":"COPYING","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2014-11-07T21:48:12.000Z","updated_at":"2023-10-20T21:14:09.373Z","dependencies_parsed_at":"2022-08-21T18:10:37.377Z","dependency_job_id":null,"html_url":"https://github.com/lhelwerd/WebGraph","commit_stats":null,"previous_names":[],"tags_count":0,"template":null,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lhelwerd%2FWebGraph","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lhelwerd%2FWebGraph/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lhelwerd%2FWebGraph/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lhelwerd%2FWebGraph/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lhelwerd","download_url":"https://codeload.github.com/lhelwerd/WebGraph/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github"
,"repositories_count":223604358,"owners_count":17172277,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["forked-repo","graph-compression","java"],"created_at":"2024-11-07T23:26:35.798Z","updated_at":"2024-11-07T23:26:36.404Z","avatar_url":"https://github.com/lhelwerd.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"How to set up and use the WebGraph framework\n\nIn order to compile the WebGraph framework and set it up so that it can be run,\nthe following steps should be followed:\n\n- Download and extract the Apache Maven binaries from\n  http://maven.apache.org/download.cgi\n- Download the dependencies tarball from http://webgraph.di.unimi.it and\n  extract it.\n- Retrieve the WebGraph framework source code with extensions from this\n  repository. This version contains the copy list and copy flags compression\n  formats, as well as additional flags for the other compression schemes.\n- Compile the JAR file of the framework using Maven by running \"mvn install\"\n  within the WebGraph root directory.\n- Copy the target/webgraph-3.4.2.jar file to the same location as the JAR\n  files from the dependencies.\n\nIn order to use Maven on a computer where it is not yet in the PATH, one\ncan run mvn.sh provided in the repository. This sets up a link to the Maven\nroot directory. Note that it only keeps the path setting for the current\nsession by default. Instructions on adding the paths to a .bashrc file, for\nexample, are given by the script. Otherwise, one should either run the script\nin a bash shell with \". 
./mvn.sh\" and run mvn afterwards, or with \"./mvn.sh install\"\n(i.e., in place of mvn).\n\nThe framework has different modes of operation that allow compressing,\ndecompressing and testing graph files in different formats. For example, the\nfollowing command recompresses a dataset with other parameters:\njava -cp \"*\" it.unimi.dsi.webgraph.BVGraph -o -m 1 ../sets/uk-2002 ../sets/uk-fast\n\nThis command creates an offsets file and alters the maximum reference chain\nlength of the given dataset.\n\nThe following parameters can be used to alter the implemented compression\nalgorithms:\n- -m determines the maximum length of a reference chain that we can make\n  during reference compression. -1 essentially allows reference chains of\n  any length, while 0 skips reference compression, which is\n  used by copy list, copy blocks and copy flags compression.\n- -w is the window size. 0 skips all reference compression, which is used by\n  copy list, copy blocks and copy flags compression.\n- -i is the minimum interval length. 0 skips interval compression.\n- -r is residual compression. This is a boolean parameter, where the value\n  0 skips gaps compression, and 1 performs gaps compression, which is the\n  default.\n- -b determines which copy compression is used. 0 is copy list, 1 is copy\n  blocks (default) and 2 is copy flags. This is only used if the maximum\n  reference chain length (-m) and window size (-w) are not 0.\n\nAs we can see, we can set certain parameters to 0 to make sure that their\nrespective compression schemes are not used. We can therefore test the algorithms separately. 
To be specific, we have the following parameters for the\ngiven compression formats:\n\n- No compression algorithm: -m 0 -w 0 -i 0 -r 0\n- Gaps compression: -m 0 -w 0 -i 0\n- Interval compression: -m 0 -w 0 -r 0\n- Copy blocks: -i 0 -r 0 -b 1\n- Copy list: -i 0 -r 0 -b 0\n- Copy flags: -i 0 -r 0 -b 2\n\nFor instance, this command creates a graph file with only gaps compression:\njava -cp \"*\" it.unimi.dsi.webgraph.BVGraph -m 0 -w 0 -i 0 ../sets/uk-2002 ../sets/uk-gaps\n\nThe compressed files are binary files, with special representations for numeric\nvalues that improve compression compared to plain text formats. The framework\ncan also decompress the graphs and store them in a readable format. There are\ntwo major plain text formats that the framework supports. The first plain text\nformat is the naive adjacency list format:\njava -cp \"*\" it.unimi.dsi.webgraph.ASCIIGraph -g BVGraph ../sets/uk-2002 ../sets/uk-raw\n\nThe second format is a naive listing of edges as pairs of nodes:\njava -cp \"*\" it.unimi.dsi.webgraph.ArcListASCIIGraph -g BVGraph ../sets/uk-2002 ../sets/uk.edges\n\nOne can also import a graph from these flat files, by swapping around the\n\"ASCIIGraph\" or \"ArcListASCIIGraph\" class with the \"BVGraph\" class. This\nrequires that those files are sorted correctly and do not contain non-edge\ndata such as comments. For example:\njava -cp \"*\" it.unimi.dsi.webgraph.BVGraph -g ArcListASCIIGraph ../sets/huge.txt.e ../sets/huge\nHere, one can again add specific compression flags to tune which compression\nand which settings to use. Note that this will almost always give increased\ncompression since the graph is no longer stored as ASCII text but as binary\ncodes, skewing the comparison.\n\nThe WebGraph framework also provides a speed test module, which has been\nadapted to use CPU time instead of wall-clock time. The speed test has two\ndifferent modes in which it can operate. 
By default, the speed test performs\nsequential testing, which times how long it takes to expand the whole\ncompressed graph. The following command can be used to run a sequential speed\ntest:\njava -cp \"*\" it.unimi.dsi.webgraph.test.SpeedTest -g BVGraph ../sets/uk-gaps\n\nWe can also perform random access testing. This mode can be started by giving\nthe parameter -r. This parameter requires a sample size as value, which\ndetermines how many randomly chosen nodes we use in the test in order to\nmeasure and average the random access times. This command can be used to run\na random access speed test:\njava -cp \"*\" it.unimi.dsi.webgraph.test.SpeedTest -g BVGraph -r 100000 ../sets/uk-gaps\n\nFinally, if the WebGraph framework runs out of memory, one can increase the\nJava heap size using the -Xmx command line flag and the Java thread stack size\nusing -Xss.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flhelwerd%2Fwebgraph","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flhelwerd%2Fwebgraph","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flhelwerd%2Fwebgraph/lists"}