{"id":23341680,"url":"https://github.com/filedrive-team/go-parallel-graphsync","last_synced_at":"2025-04-10T00:52:24.065Z","repository":{"id":62143239,"uuid":"520773303","full_name":"filedrive-team/go-parallel-graphsync","owner":"filedrive-team","description":"Parallel-GraphSync is an enhanced implementation for parallel synchronization of large IPLD graph data based on GraphSync protocol.","archived":false,"fork":false,"pushed_at":"2024-11-22T08:41:46.000Z","size":47810,"stargazers_count":5,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-04-10T00:52:12.543Z","etag":null,"topics":["filecoin","graphsync","ipfs"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/filedrive-team.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-08-03T07:04:38.000Z","updated_at":"2024-11-22T08:41:51.000Z","dependencies_parsed_at":"2024-08-21T10:36:14.134Z","dependency_job_id":null,"html_url":"https://github.com/filedrive-team/go-parallel-graphsync","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/filedrive-team%2Fgo-parallel-graphsync","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/filedrive-team%2Fgo-parallel-graphsync/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/filedrive-team%2Fgo-parallel-graphsync/releases","manifests_url":"https://repos.ecosyste.ms/api
/v1/hosts/GitHub/repositories/filedrive-team%2Fgo-parallel-graphsync/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/filedrive-team","download_url":"https://codeload.github.com/filedrive-team/go-parallel-graphsync/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248137995,"owners_count":21053775,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["filecoin","graphsync","ipfs"],"created_at":"2024-12-21T05:11:02.273Z","updated_at":"2025-04-10T00:52:24.044Z","avatar_url":"https://github.com/filedrive-team.png","language":"Go","readme":"# go-parallel-graphsync\n\n### Project Description\n\nCurrently, most projects in the Filecoin Retrieval Market use [GraphSync](https://github.com/ipfs/go-graphsync) to synchronize data.\n\nGraphSync works well for small files. Users can synchronize multiple small files from different nodes in parallel, which spreads the bandwidth load across those nodes and shortens the overall synchronization time.\n\nHowever, this one-to-one synchronization method may not work well for large files. As a single transfer stretches from minutes to tens of minutes or even hours, the entire bandwidth load stays on one node for that whole period. Meanwhile, other nodes that hold the same data serve none of it, so resources in the Filecoin network cannot be fully utilized.\n\nParallelGraphSync aims to provide a solution for synchronizing large files in the Filecoin Retrieval Market. 
It connects to multiple nodes that hold the requested data, measures each node's synchronization speed, and dynamically adjusts how data blocks are distributed among them so that the blocks are downloaded in parallel.\n\nFor example, assume Node A needs to synchronize a 1 GiB file, and Nodes B, C, and D each hold a copy. The network delays between Node A and the other nodes are:\n\n- A to B: 10ms\n- A to C: 50ms\n- A to D: 100ms\n\nThe transmission speed from each of Nodes B, C, and D to Node A is 1 MiB/s.\n\n![comparison diagram](docs/compare.png)\n\nThe results could be:\n\n- With GraphSync, synchronizing this file from Node B (the optimal peer) to Node A takes about 17 minutes (1 GiB at 1 MiB/s is 1024 s).\n- With ParallelGraphSync, when the synchronization strategy is adjusted to each node's network delay and transmission speed, the same transfer takes about 5 minutes 42 seconds, roughly one third of the time, because the three nodes each serve about a third of the blocks.\n\nIn practice the situation is more complicated and transfers take longer, but this does not cancel out the advantage of parallel synchronization.\n\nWith one-to-one data synchronization, adding more data copies only increases the aggregate throughput of the network, as in a CDN. 
The retrieval speed of any single download cannot improve significantly, because it is capped by one node's transmission speed.\n\nWith ParallelGraphSync, additional data copies increase the entire network's throughput and can also significantly improve the retrieval speed of a single download, especially for large files.\n\n### Value\n\n- Shift traffic from a single node to multiple nodes\n- Relieve the bandwidth pressure on any single node\n- Increase the speed of content delivery\n- Improve the utilization of network resources\n\n### Usage\n\n### Initializing a Parallel GraphSync Exchange\n\n```golang\nimport (\n  \"context\"\n\n  pargraphsyncimpl \"github.com/filedrive-team/go-parallel-graphsync/impl\"\n  gsnet \"github.com/ipfs/go-graphsync/network\"\n  ipld \"github.com/ipld/go-ipld-prime\"\n  \"github.com/libp2p/go-libp2p/core/host\" // path depends on your go-libp2p version\n)\n\nvar ctx context.Context  // parent context for the exchange\nvar h host.Host          // an already-constructed libp2p host\nvar lsys ipld.LinkSystem // loads and stores IPLD blocks\n\nnetwork := gsnet.NewFromLibp2pHost(h)\nexchange := pargraphsyncimpl.New(ctx, network, lsys)\n```\n\nParameter Notes:\n\n1. `ctx` is the parent context for all of GraphSync\n2. `network` is a network abstraction provided to GraphSync on top\n   of libp2p. This allows GraphSync to be tested without a real network\n3. 
`lsys` is a go-ipld-prime LinkSystem, which provides mechanisms for loading and constructing go-ipld-prime nodes from a link, and for saving IPLD nodes as serialized data\n\n### Using Parallel GraphSync With An IPFS BlockStore\n\ngo-graphsync provides a convenience function in its `storeutil` package for\nintegrating with IPFS blockstores.\n\n```golang\nimport (\n  \"context\"\n\n  pargraphsyncimpl \"github.com/filedrive-team/go-parallel-graphsync/impl\"\n  gsnet \"github.com/ipfs/go-graphsync/network\"\n  storeutil \"github.com/ipfs/go-graphsync/storeutil\"\n  blockstore \"github.com/ipfs/go-ipfs-blockstore\"\n  \"github.com/libp2p/go-libp2p/core/host\" // path depends on your go-libp2p version\n)\n\nvar ctx context.Context\nvar h host.Host\nvar bs blockstore.Blockstore\n\nnetwork := gsnet.NewFromLibp2pHost(h)\n// wrap the blockstore in a LinkSystem\nlsys := storeutil.LinkSystemForBlockstore(bs)\n\nexchange := pargraphsyncimpl.New(ctx, network, lsys)\n```\n\n### Calling Parallel GraphSync\n\n```golang\nvar exchange pargraphsync.ParallelGraphExchange\nvar ctx context.Context\nvar peers []peer.ID\nvar selector ipld.Node\nvar root ipld.Link\nvar extensions []graphsync.ExtensionData\n\nvar responseProgress \u003c-chan graphsync.ResponseProgress\nvar errs \u003c-chan error\n\n// pass the values, not their types; extensions... expands the slice into the variadic parameter\nresponseProgress, errs = exchange.RequestMany(ctx, peers, root, selector, extensions...)\n```\n\nParameter Notes:\n1. `ctx` is the context for this request. To cancel an in-progress request, cancel the context.\n2. `peers` is the list of peers to send this request to\n3. `root` is an IPLD Link, i.e. a CID (cidLink.Link{Cid})\n4. `selector` is an IPLD selector node. Using the selector builders from go-ipld-prime to construct it is recommended\n5. 
`extensions` is the list of extensions for this request; it can usually be omitted\n\n### Response Type\n\n```golang\ntype ResponseProgress struct {\n  Node      ipld.Node // a node which matched the graphsync query\n  Path      ipld.Path // the path of that node relative to the traversal start\n  LastBlock struct {  // LastBlock stores the Path and Link of the last block edge we had to load.\n    ipld.Path\n    ipld.Link\n  }\n}\n```\n\nThe above provides both immediate and relevant metadata for matching nodes in a traversal, and is very similar to the information provided by a local IPLD selector traversal in `go-ipld-prime`.\n\n### Benchmark\n\n```shell\ngo test -v -test.run '^Bench.*$' -test.bench 'BenchmarkGraphSync' ./example/ -benchtime=20x --benchmem\n\ngoos: darwin\ngoarch: amd64\npkg: github.com/filedrive-team/go-parallel-graphsync/example\ncpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz\nBenchmarkGraphSync\n    pargraphsync_test.go:599: peerId: 12D3KooWScBXkrSWH8vREGXSzNe3HEGDeRByWrytFr1G9BhQRJtj latency: 20ms\n    pargraphsync_test.go:599: peerId: 12D3KooWEfPLL4YQ1JpUir4R2tZoxcjrD78tUK6KzQd9ecFcx6zB latency: 600ms\n    pargraphsync_test.go:599: peerId: 12D3KooWFD11Qx1MPPksb8H3bvGBnRpwJywbmf8173Uhquzna43m latency: 25ms\n    pargraphsync_test.go:599: peerId: 12D3KooWAuScHgMXuVbWz8vYYVifFVaxyxBZwtyims98esGZ341k latency: 300ms\n    pargraphsync_test.go:599: peerId: 12D3KooWAdoznehjYnmJezqwu6AsYwMHAP8dPk2sXfx7isemnwoB latency: 26ms\n    pargraphsync_test.go:599: peerId: 12D3KooWB8WSx93KRVETxTF7Q4sSiLAUQQSXoFKjXTjtA632hgRH latency: 40ms\nBenchmarkGraphSync/Parallel-Graphsync_request_to_2_services\nBenchmarkGraphSync/Parallel-Graphsync_request_to_2_services-12                20        6114752715 ns/op        945141576 B/op    251551 allocs/op\nBenchmarkGraphSync/Parallel-Graphsync_request_to_3_services\nBenchmarkGraphSync/Parallel-Graphsync_request_to_3_services-12                20        4078232365 ns/op        944034287 B/op    237269 
allocs/op\nBenchmarkGraphSync/Parallel-Graphsync_request_to_4_services\nBenchmarkGraphSync/Parallel-Graphsync_request_to_4_services-12                20        3723480395 ns/op        948365954 B/op    249734 allocs/op\nBenchmarkGraphSync/Parallel-Graphsync_request_to_5_services\nBenchmarkGraphSync/Parallel-Graphsync_request_to_5_services-12                20        3009754078 ns/op        948281245 B/op    251301 allocs/op\nBenchmarkGraphSync/Parallel-Graphsync_request_to_6_services\nBenchmarkGraphSync/Parallel-Graphsync_request_to_6_services-12                20        2518337921 ns/op        952988479 B/op    266350 allocs/op\nBenchmarkGraphSync/Graphsync_request_to_1_service\nBenchmarkGraphSync/Graphsync_request_to_1_service-12                          20        6174913552 ns/op        936396342 B/op    231778 allocs/op\nPASS\nok      github.com/filedrive-team/go-parallel-graphsync/example 540.403s\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffiledrive-team%2Fgo-parallel-graphsync","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffiledrive-team%2Fgo-parallel-graphsync","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffiledrive-team%2Fgo-parallel-graphsync/lists"}