{"id":19294571,"url":"https://github.com/web3-storage/pickup","last_synced_at":"2025-04-22T08:30:33.501Z","repository":{"id":37482460,"uuid":"496962809","full_name":"web3-storage/pickup","owner":"web3-storage","description":"🛻 Pull CARs from IPFS to S3","archived":false,"fork":false,"pushed_at":"2023-08-18T09:41:12.000Z","size":2936,"stargazers_count":3,"open_issues_count":12,"forks_count":1,"subscribers_count":12,"default_branch":"main","last_synced_at":"2025-04-01T20:51:27.539Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/web3-storage.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-05-27T10:58:55.000Z","updated_at":"2023-11-28T22:59:43.000Z","dependencies_parsed_at":"2024-11-09T22:38:53.387Z","dependency_job_id":"1db0ef6b-a425-4db5-a981-21405a36fcc8","html_url":"https://github.com/web3-storage/pickup","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/web3-storage%2Fpickup","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/web3-storage%2Fpickup/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/web3-storage%2Fpickup/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/web3-storage%2Fpickup/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/web3-storage","download_url":"https://codeload.github.com/web3-storage/pi
ckup/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250205971,"owners_count":21392157,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-09T22:38:44.513Z","updated_at":"2025-04-22T08:30:33.016Z","avatar_url":"https://github.com/web3-storage.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Pickup 🛻\n\nFetch content from IPFS by CID and save it to S3 as a CAR.\n\nThis repo deploys resources to AWS and stitches them together to provide a Lambda-based HTTP interface and a worker pool in ECS. Pin requests are queued and handled by the `pickup` service, an auto-scaling set of `kubo` nodes. The DAG is saved as a CAR to S3, where E-IPFS can index and provide it to the public IPFS network.\n\n## API\n\nA minimal [ipfs-cluster](https://github.com/ipfs-cluster/ipfs-cluster)-compatible HTTP API is provided for adding pins and checking pin status in [api/basic](api/basic). The response objects match the shape ipfs-cluster would return, so `pickup` can be used as a drop-in replacement. Many of the properties have no meaning for pickup and are filled with placeholder values.\n\n🏗 A full [pinning service api] is also implemented in [api/functions/PinningService.ts](api/functions/PinningService.ts), but is not currently in use. 
A future release may switch this to be the main interface once we need it.\n\n### POST /pins/:cid\n\nMake a pin request by CID, asking the service to fetch the content from IPFS.\n\n```bash\n$ curl -X POST 'https://pickup.dag.haus/pins/bafybeifpaez32hlrz5tmr7scndxtjgw3auuloyuyxblynqmjw5saapewmu' -H \"Authorization: Basic $PICKUP_BASIC_AUTH_TOKEN\" -s | jq\n{\n  \"replication_factor_min\": -1,\n  \"replication_factor_max\": -1,\n  \"name\": \"\",\n  \"mode\": \"recursive\",\n  \"shard_size\": 0,\n  \"user_allocations\": null,\n  \"expire_at\": \"0001-01-01T00:00:00Z\",\n  \"metadata\": {},\n  \"pin_update\": null,\n  \"origins\": [],\n  \"cid\": \"bafybeifpaez32hlrz5tmr7scndxtjgw3auuloyuyxblynqmjw5saapewmu\",\n  \"type\": \"pin\",\n  \"allocations\": [],\n  \"max_depth\": -1,\n  \"reference\": null,\n  \"timestamp\": \"2022-10-21T08:50:48.304Z\"\n}\n```\n\n### GET /pins/:cid\n\nFind the status of a pin.\n\n```bash\n$ curl -X GET 'https://pickup.dag.haus/pins/bafybeifpaez32hlrz5tmr7scndxtjgw3auuloyuyxblynqmjw5saapewmu' -H \"Authorization: Basic $PICKUP_BASIC_AUTH_TOKEN\" -s | jq\n{\n  \"cid\": \"bafybeifpaez32hlrz5tmr7scndxtjgw3auuloyuyxblynqmjw5saapewmu\",\n  \"name\": \"\",\n  \"allocations\": [],\n  \"origins\": [],\n  \"created\": \"2022-10-21T08:50:48.304Z\",\n  \"metadata\": null,\n  \"peer_map\": {\n    \"12D3KooWArSKMUUeLk3z2m5LKyb9wGyFL1BtWCT7Gq7Apoo77PUR\": {\n      \"peername\": \"elastic-ipfs\",\n      \"ipfs_peer_id\": \"bafzbeibhqavlasjc7dvbiopygwncnrtvjd2xmryk5laib7zyjor6kf3avm\",\n      \"ipfs_peer_addresses\": [\n        \"/dns4/elastic.dag.house/tcp/443/wss/p2p/bafzbeibhqavlasjc7dvbiopygwncnrtvjd2xmryk5laib7zyjor6kf3avm\"\n      ],\n      \"status\": \"pinned\",\n      \"timestamp\": \"2022-10-21T08:54:28.962Z\",\n      \"error\": \"\",\n      \"attempt_count\": 0,\n      \"priority_pin\": false\n    }\n  }\n}\n```\n\n## Environment\n\nSet the following in the pickup worker env to tune its behavior.\n\n### `MAX_CAR_BYTES`\n\nMaximum size, in bytes, of a 
CAR that pickup will fetch. Caps the amount of data we will pull in a single job.\n\n**default: 31 GiB** _(33,285,996,544 bytes)_\n\n### `FETCH_TIMEOUT_MS`\n\nHow long to wait for fetching a CAR before failing the job. Caps the amount of time we spend on a job.\n\n**default: 4 hrs**\n\n_2/3rds of home internet users can upload faster than 20Mbit/s (fixed broadband), at which 32GiB would transfer in 3.5hrs._\n\nsee: https://www.speedtest.net/global-index\nsee: https://www.omnicalculator.com/other/download-time?c=GBP\u0026v=fileSize:32!gigabyte,downloadSpeed:5!megabit\n\n### `FETCH_CHUNK_TIMEOUT_MS`\n\nHow long to wait between chunks of data before failing a CAR. Limits the amount of time we spend waiting on a stalled fetch.\n\n**default: 2 min**\n\n### `BATCH_SIZE`\n\nHow many pin requests to handle concurrently per worker.\n\nUsed to set both the concurrency per worker *and* the max number of messages each worker fetches from the queue in a single batch.\n\n**default: 10**\n\n## Getting Started\n\nPRs are deployed automatically to `https://\u003cpr#\u003e.pickup.dag.haus`. 
The `main` branch is deployed to https://staging.pickup.dag.haus and staging builds are promoted to prod manually via the UI at https://console.seed.run/dag-house/pickup\n\nTo work on this codebase you need:\n- Node.js v16\n- An AWS account with the AWS CLI configured locally\n- Copy `.env.tpl` to `.env.local` and set `CLUSTER_BASIC_AUTH_TOKEN` with a base64 encoded user:pass string.\n- Install the deps with `npm i`\n\nDeploy dev services to your AWS account and start the dev console:\n\n```console\nnpm start\n```\n\nSee: https://docs.sst.dev for more info on how things get deployed.\n\nTo remove the dev services from your AWS account:\n\n```console\nnpm run remove\n```\n\n## Overview\n\nProject structure:\n\n```\n├── Dockerfile - image for the pickup worker run in ECS\n├── api        - lambda \u0026 dynamoDB implementation of the pinning service api \n├── pickup     - worker to fetch cid as CAR and write to s3\n└── stacks     - sst and aws cdk code to deploy all the things \n```\n\nThe pinning service API is implemented as a lambda:\n\nThe `POST /pins {cid, name, origins, meta}` route creates:\n- A pinning service record in a DynamoDB table. Needed to fulfil the pinning service API. \n`(requestId, status, created, userid, appName, cid, name, origins[], meta{})`\n- A message on the SQS queue with the details needed to fetch a CID and write the CAR to S3. \n`(requestId, cid, origins[], awsRegion, s3Bucket, s3Path)`\n\nThe queue consumer is an autoscaling set of go-ipfs nodes (thanks @thattommyhall ✨), with a pickup sidecar, in ECS. 
The sidecar long-polls the SQS queue, gets the next message, connects to `origins[]`, fetches `cid` as a CAR, and writes it to S3 at `(awsRegion, s3Bucket, s3Path)`.\n\nWhile we wait for fetching the CAR to complete, we bump up the \"visibility timeout\" on the message, so that the message remains hidden from other workers, up to a configured `ipfsTimeout`.\n\nOn failure, where processing hits an error or a timeout, pickup will stop incrementing the visibility timeout on the message and it becomes visible in the queue again to be retried.\n\nAfter `maxRetries` we send the message to the Dead Letter Queue to take it out of circulation, and track metrics on failures.\n\nSuccess means the complete CAR has been saved on S3, for indexing by Elastic provider 🌐✨. Pickup deletes the message from the queue. The CAR has the `psaRequestId` in its metadata.\n\nOn successful write to S3, a lambda is triggered to update the status of the DynamoDB record for that `psaRequestId`.\n\n## Diagram\n\n\u003cpre\u003e\n\n                    ┌─────────────┐\n                    │   lambda    │\n    ●──────1.──────▶│ POST /pins  │────────2. insert──────────┐\n                    └─────────────┘                           │\n                           │                                  ▼\n                           │                        /───────────────────\\\n                           │                        │                   │\n                           │                        │     DynamoDB      │\n                      3. send msg                   │    PinRequests    │\n                           │                        │                   │\n                           │                        \\───────────────────/\n                           │                                  ▲\n                           ▼                                  │\n                      ┌─────────┐                        8. 
update\n                      │         │                             │\n                      │         │                      ┌─────────────┐\n                      │         │                      │   lambda    │\n                      │   SQS   │                      │   S3 PUT    │\n                      │  queue  │                      └─────────────┘\n                      │         │                             ▲\n                      │         │                             │\n                      │         │                        7. S3 Event\n                      └─────────┘                             │\n                           │                        ┌───────────────────┐\n                           │                        │                   │\n                           │                        │        S3         │\n                           │                        │                   │\n           ─ ─ ─ ─ ─ ─ ─ ─ ┼─ 4. process msg─┐      └───────────────────┘\n          │                                  │                ▲\n                           │                 │                │\n          │                                  │            6. S3 PUT\n          ▼                ▼                 ▼                │\n   ┌─────────────┐  ┌─────────────┐   ┌─────────────┐         │\n┌ ─│             │─ ┤             ├ ─ ┤             ├ ┐       │\n   │   pickup    │  │   pickup    │   │   pickup    │─────────┘\n│  │             │  │             │   │             │ │\n   └─────────────┘  └─────────────┘   └─────────────┘\n│         │                │                 ▲        │\n                                             │\n│         │                │            5. 
ipfs get   │\n                                             │\n│         ▼                ▼                 ▼        │\n   ┌─────────────┐  ┌─────────────┐   ┌─────────────┐\n│  │             │  │             │   │             │ │\n   │   go-ipfs   │  │   go-ipfs   │   │   go-ipfs   │\n│  │             │  │             │   │             │ │\n   └─────────────┘  └─────────────┘   └─────────────┘\nECS ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘\n\n\u003c/pre\u003e\n\n## Validation\n\nThe system provides a validation step that runs after the upload to S3.\n\nCARs are written to a temporary bucket. If the CAR is valid, it's copied to the target bucket, removed from the temporary one, and the pin state is updated to `pinned` in DynamoDB.\n\n## Integration with Elastic IPFS\n\nsee: https://github.com/elastic-ipfs/elastic-ipfs\n\nOur lambda sends a message to the indexer SQS topic when the CAR is written to our S3 bucket.\n\n## AWS notes\n\nRemove a bunch of buckets by bucket name prefix:\n\n```sh\n# danger! will delete things!\naws s3 ls | grep olizilla-pickup | awk '{print \"s3://\"$3}' | xargs -n 1 -I {} aws s3 rb {} --force;\n```\n\n[pinning service api]: https://ipfs.github.io/pinning-services-api-spec/\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fweb3-storage%2Fpickup","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fweb3-storage%2Fpickup","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fweb3-storage%2Fpickup/lists"}