{"id":30893242,"url":"https://github.com/n0-computer/big-block-sync","last_synced_at":"2025-09-08T20:06:51.756Z","repository":{"id":311159220,"uuid":"1042689004","full_name":"n0-computer/big-block-sync","owner":"n0-computer","description":"A sync algorithm for somewhat content-addressed data","archived":false,"fork":false,"pushed_at":"2025-08-27T08:25:30.000Z","size":110,"stargazers_count":0,"open_issues_count":1,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-31T14:50:37.361Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/n0-computer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-08-22T12:17:16.000Z","updated_at":"2025-08-27T08:25:34.000Z","dependencies_parsed_at":"2025-08-22T14:40:30.227Z","dependency_job_id":"c96b8529-66aa-4c23-900a-e33bc8bba047","html_url":"https://github.com/n0-computer/big-block-sync","commit_stats":null,"previous_names":["n0-computer/big-block-sync"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/n0-computer/big-block-sync","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/n0-computer%2Fbig-block-sync","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/n0-computer%2Fbig-block-sync/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/n0-computer%2Fbig-block-sync/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/n0-computer%2Fbig-block
-sync/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/n0-computer","download_url":"https://codeload.github.com/n0-computer/big-block-sync/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/n0-computer%2Fbig-block-sync/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":274231511,"owners_count":25245600,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-08T02:00:09.813Z","response_time":121,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-09-08T20:06:46.124Z","updated_at":"2025-09-08T20:06:51.715Z","avatar_url":"https://github.com/n0-computer.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Sync from multiple sources with different hash\n\n**🧪 Experimental**\n\n## Provide side\n\nThis is just a vanilla iroh-blobs blobs provide, just so the cli is self-contained.\n\nMake up some random data and share it:\n```\n\u003e head -c 1000000000 /dev/urandom \u003e test1\n\u003e cargo run --release provide test1\n\nProviding content with ticket:\nblobac7gr35ay6cpsw2xw4wed4zfzmrtbc5mcpp5nyx2ygallc5kitdxcajinb2hi4dthixs6ylqomys2mjoojswyylzfzxdaltjojxwqltjojxwqltmnfxgwlrpaiaalxzlfgnn4aqbfiaqj7yc6akniaaaaaaaaaaaagn54aqa4xnynd53jhyewkn2ayynga3axgdtxm4ygy2acj4zbz4umdkvnwra\n```\n\nAlternatively, share some generated random 
data:\n\n```\n\u003e cargo run --release provide -n 1000000000\n\nProviding content with ticket:\nblobadqfuc5jvidteqajzlrxbsgdkdrjocigoowdwss5z5dzpk2qht2gyajinb2hi4dthixs6zlvmmys2mjoojswyylzfzxdaltjojxwqltjojxwqltmnfxgwlrpaiae7sdexkdjwayayculffmgtmbqaiqo4jkthf6j4euyg6bimypexqlexdivrel4myohgv5uwe5urmqj\n```\n\n## Sync side\n\nThe sync side takes multiple blob tickets for raw blobs. The hashes don't have to be identical, but the size must be.\n\nIt will download from all tickets concurrently and build the result out of the pieces.\n\nYou can print verbose statistics with `-v` or `-v -v`.\n\nYou can also provide a target path, in case you actually want the data.\n\n```\ncargo run --release sync -v -v \\\n  blobac7gr35ay6cpsw2xw4wed4zfzmrtbc5mcpp5nyx2ygallc5kitdxcajinb2hi4dthixs6ylqomys2mjoojswyylzfzxdaltjojxwqltjojxwqltmnfxgwlrpaiaalxzlfgnn4aqbfiaqj7yc6akniaaaaaaaaaaaagn54aqa4xnynd53jhyewkn2ayynga3axgdtxm4ygy2acj4zbz4umdkvnwra \\\n  blobadqfuc5jvidteqajzlrxbsgdkdrjocigoowdwss5z5dzpk2qht2gyajinb2hi4dthixs6zlvmmys2mjoojswyylzfzxdaltjojxwqltjojxwqltmnfxgwlrpaiae7sdexkdjwayayculffmgtmbqaiqo4jkthf6j4euyg6bimypexqlexdivrel4myohgv5uwe5urmqj\n\nNode       Errors\tChunks\tDuration\tRate\nfeca429989\t0\t963251\t9.331   s\t100.807  MiB/s\nbe68efa0c7\t1\t14336\t9.331   s\t1.500    MiB/s\nNode       Bitfield\nfeca429989 ████████████████████████████████████████████████████████████████████████████████████████████████████\nbe68efa0c7 ░                          ░      ░     ░     ░     ░     ░     ░    ░     ░     ░    ░░    ░     ░ \nDownloaded content: 1000000000 bytes, hash b57f9fd9af7dc03fd254a26da58176b374cae00af0a981f98fd71f5d95fc748a\n```\n\nTo get even more info, run with `RUST_LOG=big_block_sync=trace`.\n\n# Algorithm\n\n\u003c!-- thanks claude --\u003e\n\nThis implementation provides a smart parallel downloader that syncs content from multiple sources simultaneously, using dynamic quality-based scheduling and bitfield tracking for optimal performance.\nThe algorithm begins 
by connecting to all provided sources and measuring their initial latency to establish baseline quality metrics. Each source is ranked by a composite quality score that prioritizes nodes with fewer errors, higher download rates, and lower latency. Content is divided into fixed-size blocks, with the system maintaining separate bitfields to track which chunks are missing from the target, which ranges are currently claimed by active downloads, and which ranges each node has successfully contributed.\n\nThe core scheduling logic operates by claiming unclaimed chunks and assigning them to the highest-quality available nodes up to a configurable parallelism limit. As downloads complete, the algorithm updates the per-node statistics, including error counts, download rates, and contributed ranges. When no unclaimed chunks remain but downloads are still active, the system enters a *finish mode* where it compares the worst-performing busy node against the best available free node. If the performance gap is significant (typically a 4x rate difference), it cancels the slow download to reassign those chunks to a faster node.\n\nThis adaptive approach ensures optimal resource utilization throughout the download process, automatically load-balancing based on real-time performance while providing detailed statistics and visual bitfield representations showing exactly which parts of the content each node contributed to the final result.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fn0-computer%2Fbig-block-sync","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fn0-computer%2Fbig-block-sync","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fn0-computer%2Fbig-block-sync/lists"}