{"id":15285736,"url":"https://github.com/mg98/ipfs-replicate","last_synced_at":"2026-05-02T13:35:46.454Z","repository":{"id":65536816,"uuid":"594014793","full_name":"mg98/ipfs-replicate","owner":"mg98","description":"Replicate IPFS' distributed data structure locally, based on network traces.","archived":false,"fork":false,"pushed_at":"2023-02-07T20:54:47.000Z","size":232,"stargazers_count":0,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-04-27T05:36:59.078Z","etag":null,"topics":["crawler","dag","ipfs","redisgraph","scraper"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mg98.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2023-01-27T11:53:00.000Z","updated_at":"2026-02-13T11:59:55.000Z","dependencies_parsed_at":"2023-02-15T13:01:19.487Z","dependency_job_id":null,"html_url":"https://github.com/mg98/ipfs-replicate","commit_stats":{"total_commits":12,"total_committers":2,"mean_commits":6.0,"dds":0.08333333333333337,"last_synced_commit":"8b16a4e6e17adc57f48f961e1a33e88912789408"},"previous_names":["mg98/ipfs-replica"],"tags_count":11,"template":false,"template_full_name":null,"purl":"pkg:github/mg98/ipfs-replicate","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mg98%2Fipfs-replicate","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mg98%2Fipfs-replicate/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mg98%2Fipfs-replicate/releases","manifests_url":"https://repos.ecosys
te.ms/api/v1/hosts/GitHub/repositories/mg98%2Fipfs-replicate/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mg98","download_url":"https://codeload.github.com/mg98/ipfs-replicate/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mg98%2Fipfs-replicate/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32536580,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-02T12:25:33.646Z","status":"ssl_error","status_checked_at":"2026-05-02T12:24:51.733Z","response_time":132,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","dag","ipfs","redisgraph","scraper"],"created_at":"2024-09-30T15:07:23.792Z","updated_at":"2026-05-02T13:35:46.428Z","avatar_url":"https://github.com/mg98.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# IPFS Replicate\n\n[![Test](https://github.com/mg98/ipfs-replicate/actions/workflows/test.yml/badge.svg)](https://github.com/mg98/ipfs-replicate/actions/workflows/test.yml)\n[![codecov](https://codecov.io/gh/mg98/ipfs-replicate/branch/main/graph/badge.svg?token=R3OYXX1HC7)](https://codecov.io/gh/mg98/ipfs-replicate)\n[![Go Report 
Card](https://goreportcard.com/badge/github.com/mg98/ipfs-replicate?)](https://goreportcard.com/report/github.com/mg98/ipfs-replicate)\n![License](https://img.shields.io/github/license/mg98/ipfs-replicate)\n\nThis software lets you replicate the distributed DAG of content blocks in IPFS locally, based on network traces.\nTo this end, it replicates the data structure in a RedisGraph database and downloads raw data blocks to disk.\n\n## How Does It Work?\n\nThis program uses [IPFS Metric Exporter](https://github.com/trudi-group/ipfs-metric-exporter),\nwhich is a plugin for IPFS that exports (among other things) the CID requests from the P2P gossip to a RabbitMQ exchange instance.\n\n**IPFS Replicate** subscribes to this exchange and processes incoming messages\nby recursively fetching the contents of the requested CIDs\nand populating the local database and data folder.\n\nThe raw data blocks are written as files to disk while the data structure is persisted in a RedisGraph database.\n\nFurthermore, this tool allows you to export those user events.\nThis can be useful in combination with the locally produced data structure for analyses that also consider user behavior.\n\n## Setup and Run\n\nTo quickly spin something up, you can launch the infrastructure using:\n\n```sh\ndocker-compose up\n```\n\nIf you want to run this program without Docker, \nyou can download the [binary](https://github.com/mg98/ipfs-replicate/releases) directly\nor build this project from source (`go build .`).\n\nNote that this program depends on other components, which are listed in the [`docker-compose.yml`](./docker-compose.yml).\nYou might then want to adjust some [environment variables](./.env.example).\n\n## Author Notes\n\nThis software has its origin in my [master's thesis](https://marcelgregoriadis.com/master-thesis.pdf), \nwhere I used it to understand the type of files traded on IPFS\nand to follow the sequence of file retrievals for individual peers.\nThis allowed me 
to analyze the effectiveness of alternative chunking algorithms on data deduplication.\nI hope that by publishing this part of the software I can support the development of future scientific projects (or any other kind).\n\nPlease note that there is still a lot of room for improvement with this software.\nI regard the poor efficiency, or _event throughput_, as the biggest issue.\nDue to the nature of IPFS and the reliance on network retrievals (sometimes for heavy CID trees),\nthis program will not keep up with the pace of incoming Bitswap events... at all!\nAlthough this project already leverages parallelization techniques, I think those can be further extended or optimized.\n\nThat said, contributions will be considered and are very welcome ❤️ \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmg98%2Fipfs-replicate","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmg98%2Fipfs-replicate","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmg98%2Fipfs-replicate/lists"}