{"id":21643915,"url":"https://github.com/application-research/autoretrieve","last_synced_at":"2025-04-11T18:21:47.314Z","repository":{"id":37079762,"uuid":"440653034","full_name":"application-research/autoretrieve","owner":"application-research","description":"A server to make GraphSync data accessible on IPFS","archived":false,"fork":false,"pushed_at":"2024-08-21T03:29:28.000Z","size":678,"stargazers_count":22,"open_issues_count":35,"forks_count":7,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-04-11T18:21:41.547Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/application-research.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-12-21T21:19:24.000Z","updated_at":"2023-07-14T23:59:58.000Z","dependencies_parsed_at":"2023-02-18T03:00:46.694Z","dependency_job_id":null,"html_url":"https://github.com/application-research/autoretrieve","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/application-research%2Fautoretrieve","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/application-research%2Fautoretrieve/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/application-research%2Fautoretrieve/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/application-research%2Fautoretrieve/manifests","owner_url":"https://repos.ecosys
te.ms/api/v1/hosts/GitHub/owners/application-research","download_url":"https://codeload.github.com/application-research/autoretrieve/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248456388,"owners_count":21106607,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-25T05:36:52.471Z","updated_at":"2025-04-11T18:21:47.268Z","avatar_url":"https://github.com/application-research.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Autoretrieve\n\nAutoretrieve is a standalone Graphsync-to-Bitswap proxy server, which allows IPFS clients to retrieve data which may be available on the Filecoin network but not on IPFS (such as data that has become unpinned, or data that was uploaded to a Filecoin storage provider but was never sent to an IPFS provider).\n\n## What problem does Autoretrieve solve?\n\nProtocol Labs develops two decentralized data transfer protocols - Bitswap and GraphSync, which back the IPFS and Filecoin networks, respectively. Although built on similar principles, these networks are fundamentally incompatible and separate - a client of one network cannot retrieve data from a provider of the other. [TODO: link more info about differences.] This raises the following issue: what if there is data that exists on Filecoin, but not on IPFS?\n\nAutoretrieve is a \"translational proxy\" that allows data to be transferred from Filecoin to IPFS in an automated fashion. 
The existing alternatives to Autoretrieve's Filecoin-to-IPFS flow are manual transfer and the Boost IPFS node that providers may optionally enable.\n\nThe [Boost IPFS node](https://github.com/filecoin-project/boost/issues/709) is not always a feasible option, for two reasons:\n- Providers are not incentivized to enable this feature\n- Only free retrievals are supported\n\nIn comparison, Autoretrieve:\n- Is a dedicated node, so the operational burden and cost do not fall on storage provider operators\n- Supports paid retrievals (the Autoretrieve node operator covers the payment)\n\n## How does Autoretrieve work?\n\nAt its core, Autoretrieve is a Bitswap server. When a Bitswap request comes in, Autoretrieve queries an indexer for Filecoin storage providers that have the requested CID. The providers are sorted, and retrieval is attempted from each in turn until one retrieval opens successfully. As the GraphSync data arrives at Autoretrieve from the storage provider, it is streamed live back to the IPFS client.\n\nFor IPFS clients to retrieve Filecoin data using Autoretrieve, they must be connected to it. Currently, Autoretrieve can be advertised to the indexer (and by extension the DHT) by Estuary. Autoretrieve does not currently have an independent way to advertise its own data.\n\nIf an Autoretrieve node is not advertised, clients may still download data from it if a connection is established either manually or by chance while walking through the DHT searching for other providers.\n\n## Usage\n\nAutoretrieve uses Docker with BuildKit for build caching. Docker rebuilds are\nquite fast, so the setup is usable for local development. Check the docker-compose\ndocumentation for more help.\n\n```console\n$ DOCKER_BUILDKIT=1 docker-compose up\n```\n\nYou may optionally set `FULLNODE_API_INFO` to a custom fullnode's WebSocket\naddress. 
The default is `FULLNODE_API_INFO=wss://api.chain.love`.\n\nBy default, config files and cache are stored at `~/.autoretrieve`. When using\ndocker-compose, a binding is created to this directory. This location can be\nconfigured by setting `AUTORETRIEVE_DATA_DIR`.\n\nInternally, the Docker volume's path on the image is `/root/.autoretrieve`. Keep\nthis in mind when using the Docker image directly.\n\n## Configuration\n\nSome CLI flags and corresponding environment variables are available for basic configuration.\n\nFor more advanced configuration, `config.yaml` may be used. It lives in the autoretrieve data directory and is generated automatically when autoretrieve runs. It may also be generated manually using the `gen-config` subcommand.\n\nConfiguration sources are applied in the following order, from lowest to highest precedence, so later sources override earlier ones:\n- YAML config\n- Environment variables\n- CLI flags\n\n### YAML Example\n\n```yaml\nadvertise-endpoint-url: # leave blank to disable, example https://api.estuary.tech/autoretrieve/heartbeat (must be registered)\nadvertise-endpoint-token: # leave blank to disable\nlookup-endpoint-type: indexer # indexer | estuary\nlookup-endpoint-url: https://cid.contact # for estuary endpoint-type: https://api.estuary.tech/retrieval-candidates\nmax-bitswap-workers: 1\nrouting-table-type: dht\nprune-threshold: 1GiB # 1000000000, 1 GB, etc. Uses go-humanize for parsing. 
Table of valid byte sizes can be found here: https://github.com/dustin/go-humanize/blob/v1.0.0/bytes.go#L34-L62\npin-duration: 1h # 1h30m, etc.\nlog-resource-manager: false\nlog-retrieval-stats: false\ndisable-retrieval: false\ncid-blacklist:\n  - QmCID01234\n  - QmCID56789\n  - QmCIDabcde\nminer-blacklist:\n  - f01234\n  - f05678\nminer-whitelist:\n  - f01234\ndefault-miner-config:\n  retrieval-timeout: 1m\n  max-concurrent-retrievals: 1\nminer-configs:\n  f01234:\n    retrieval-timeout: 2m30s\n    max-concurrent-retrievals: 2\n  f05678:\n    max-concurrent-retrievals: 10\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapplication-research%2Fautoretrieve","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fapplication-research%2Fautoretrieve","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapplication-research%2Fautoretrieve/lists"}