{"id":21643917,"url":"https://github.com/application-research/delta-importer","last_synced_at":"2025-03-19T08:47:44.113Z","repository":{"id":153069538,"uuid":"619008200","full_name":"application-research/delta-importer","owner":"application-research","description":"Import client for Delta","archived":false,"fork":false,"pushed_at":"2023-07-04T16:37:21.000Z","size":7480,"stargazers_count":2,"open_issues_count":10,"forks_count":1,"subscribers_count":7,"default_branch":"main","last_synced_at":"2025-01-25T04:11:10.386Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/application-research.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-03-26T01:33:28.000Z","updated_at":"2023-04-14T19:56:08.000Z","dependencies_parsed_at":"2024-06-20T23:29:53.400Z","dependency_job_id":"4c54b570-24b3-4a9f-ada3-a7bee0018961","html_url":"https://github.com/application-research/delta-importer","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/application-research%2Fdelta-importer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/application-research%2Fdelta-importer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/application-research%2Fdelta-importer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/application-research%2Fdelta-importer/manifests","owner_url":"https://
repos.ecosyste.ms/api/v1/hosts/GitHub/owners/application-research","download_url":"https://codeload.github.com/application-research/delta-importer/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244394463,"owners_count":20445634,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-25T05:36:53.118Z","updated_at":"2025-03-19T08:47:44.086Z","avatar_url":"https://github.com/application-research.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\u003ch1\u003e Δ Importer \u003c/h1\u003e\n\n\n\u003cimg src=\"./docs/assets/hero.png\" width=700/\u003e\n\u003c/div\u003e\n\n## What is this?\n- Delta Importer is a tool designed to be run on the Storage Provider infrastructure.\n- It facilitates automation of import deals - that is, importing .car files from the filesystem that match the CID of deal proposals sent to the provider.\n- It integrates with Delta-DM (Dataset Manager) to request deals from the self-service API, facilitating a fully automated dealmaking \u0026 deal ingestion pipeline.\n- It has multiple modes of operation, covering a variety of different data ingestion strategies\n- It’s designed from the ground up to be high performance, written in Go. 
It has tuneable import frequency/concurrent maximum to optimize for sealing throughput\n- Only one instance of Delta Importer is required per instance of Boost \n\n## Project Goals\n\u003e We intend to make the deal ingestion process fully automated, intelligent and streamlined, such that there is no functional difference between End-to-end (Online) and Import (Offline) deals.\n\u003e This will allow large-scale providers to easily and efficiently onboard large datasets, where the data transfer is decoupled from the dealmaking process.\n\n# Requirements\n- Go v1.19+\n- Rust (needed to build filecoin-ffi)\n- [Boost v1.6.0+](https://github.com/filecoin-project/boost)\n\n\u003e Assumption: all carfiles to import are named `\u003cpieceCID\u003e.car` , which matches the PieceCID of the deal made with Boost.\n\u003e This obviates the need for a File\u003c\u003eDeal mapping, as the importer can simply scan the filesystem for a file matching the PieceCID of the deal.\n\n# Installation\n\nPerform the following steps from a user account with `root` privileges. Note: Once installed, the `delta-importer` binary can be run from any user account.\n\nBuild from Source\n1. Clone `git clone https://github.com/application-research/delta-importer.git` \n2. `make all`\n3. `make install`\n\nThis will install the `delta-importer` binary to `/usr/local/bin`. 
Test it out by running `delta-importer --help`.\n\n# Usage\n\n`delta-importer`\n\n```\nNAME:\n   delta-importer - An application to facilitate importing deals into a Filecoin Storage Provider\n\nUSAGE:\n   delta-importer [global options] command [command options] [arguments...]\n\nCOMMANDS:\n   daemon, d  run the delta-importer daemon to continuously import deals\n   stats      get stats about imported deals\n   help, h    Shows a list of commands or help for one command\n\nGLOBAL OPTIONS:\n   --help, -h     show help\n   --version, -v  print the version\n```\n\n\n## Running the Importer Daemon\n\n### Configuration\n\nBy default, `delta-importer` stores all its local data in the `~/delta/importer` directory of the user running it. If the directory does not exist, the tool will attempt to create it on first launch of the `daemon` command. The location can be changed using the `--dir` flag or the `DELTA_DIR` environment variable.\n\n### Command-Line Operation\nThe Delta Importer daemon requires a few configuration options to be set. These can be set via environment variables or via command-line flags.\n\nBelow is an example shell script that launches the importer daemon in **default** mode, importing a new deal every **260** seconds until a maximum of **175** deals are active in the sealing pipeline (AP+PC1+PC2+C2).\n\n```bash\ndelta-importer daemon \\\n  --boost-url 10.10.10.20 \\\n  --boost-gql-port 8080 \\\n  --boost-port 1288 \\\n  --boost-auth-token XXX.YYY.ZZZ \\\n  --max_concurrent 175 \\\n  --interval 260 \\\n  --mode default\n```\n\n\n#### Daemon Command Flags\n- Obtain the `boost-auth-token` by running the `boostd auth create-token --perm admin` command on your Boost node.\n- Obtain the `boost-url` and `boost-port` by running `boostd auth api-info --perm admin` on your Boost node.\n- The `--interval` and `--max_concurrent` flags are used to tweak the importer's speed. 
These parameters should be carefully tuned to match the provider's sealing throughput and available bandwidth. The example provided above is a good starting point for a provider with approximately 10 TiB/day of sealing throughput.\n- See *Operational Modes* below for an explanation of the `--mode` flag.\n- Set the `--staging-dir` flag to have Delta Importer automatically copy carfiles to a staging directory before importing them. This is useful if your carfiles reside on a slower or remote filesystem, as Boost needs to read them twice (once for CommP verification, and once for AddPiece). If this is set, the carfiles will be automatically deleted from the staging directory after import is complete.\n\n### datasets.json\nThe `datasets.json` file must be present in the `delta-importer` data directory (defaults to `~/delta/importer/`). This file maps client `wallets` (i.e., who is making deals) to a `dataset slug` (identifier) and a directory to search for CAR files to import.\n\nExample `datasets.json`\n```json\n[\n  {\n    \"dataset\": \"radiant-ml\",\n    \"address\": [\"f1p3l3wgnfukemmaupqecwcoqp7fcgjcqgqcq7rja\"],\n    \"dir\": \"/mnt/delta-datasets/radiant-poc\",\n    \"ignore\": false\n  },\n  {\n    \"dataset\": \"cancer-imaging-archive\",\n    \"address\": [\"f1p3l3wgnfukemmaupqecwcoqp7fcgjcqgqcq7rja\", \"f2vyp7qmi4pvuj3f3qiha6oyskrjdho2xw6cjiexi\"],\n    \"dir\": \"/mnt/delta-datasets/cancer-imaging-archive\",\n    \"ignore\": true\n  }\n]\n```\n\nThe entries in `datasets.json` are processed in order, so deals for the first dataset in the list are preferred.\n\nUsing the above example,\n- If a deal is found for `radiant-ml`, the importer will scan the `/mnt/delta-datasets/radiant-poc` directory for a CAR file matching the PieceCID of the deal. \n- If a match is found, the importer will import the data. 
\n- If no match is found, the importer will move on to the next dataset in the list, and attempt to import data for that dataset.\n\nSet the `ignore` flag to `true` to skip a dataset. This is useful if you want to speed up the import loop by excluding a dataset from import (e.g., if its datacap has been exhausted, or its data transfer is not yet complete).\n\n\u003eNote: The `dataset` field must be unique across all entries in the `datasets.json` file.\n\n### Operational Modes\nDelta Importer can be run in three modes:\n\n1. **Default (Boost Scanning) Mode**: This is the default mode. \n`--mode default // not required`\nIn this mode, Delta Importer will scan Boost for deals awaiting import, automatically match them to CAR files on the filesystem, and import them.\n\n\u003cdiv align=\"center\"\u003e\n\u003cimg src=\"./docs/assets/default-mode.png\" width=800/\u003e\n\u003c/div\u003e\n\u003cbr/\u003e\n\n\n2. **Pull Mode - Dataset**\n`--mode pull-dataset`\nIn this mode, Delta Importer will request deals from the DDM self-service API per dataset before attempting to import them. \n\n\u003cdiv align=\"center\"\u003e\n\u003cimg src=\"./docs/assets/pull-dataset-mode.png\" width=800/\u003e\n\u003c/div\u003e\n\u003cbr/\u003e\n\n3. **Pull Mode - CID**\n`--mode pull-cid`\nIn this mode, Delta Importer will scan the filesystem for CAR files and make a request to the DDM self-service API for each carfile.\nIt will check Boost to ensure duplicate deals are not requested.\n\n\u003cdiv align=\"center\"\u003e\n\u003cimg src=\"./docs/assets/pull-cid-mode.png\" width=800/\u003e\n\u003c/div\u003e\n\u003cbr/\u003e\n\nWhen running in either `Pull Mode`, the `--ddm-api` and `--ddm-token` flags are required. These indicate the DDM API endpoint and the API token to use when making deal requests to the DDM API. 
Contact your DDM administrator for these parameters.\n\nAdditionally, `Pull Mode` accepts two optional flags:\n- `--ddm-delay-start`, which delays the start epoch of requested deals by the specified number of days. Valid values are between `1` and `14`, for example `--ddm-delay-start 7`\n- `--ddm-advance-end`, which advances the end epoch (i.e., shortens the deal duration) by the specified number of days. Valid values are between `0` and `20`, for example `--ddm-advance-end 10`\n\n*Example pull mode (Dataset) configuration*\n```bash\ndelta-importer daemon \\\n  --boost-url 10.32.32.20 \\\n  --boost-gql-port 8080 \\\n  --boost-port 1288 \\\n  --boost-auth-token XXX.YYY.ZZZ \\\n  --max_concurrent 160 \\\n  --interval 220 \\\n  --mode pull-dataset \\\n  --ddm-delay-start 7 \\\n  --ddm-advance-end 10 \\\n  --ddm-api http://ddm-api.delta.store/api/v1/self-service \\\n  --ddm-token 4b28d311-8be6-48d7-801f-dcb6a87ad49d\n```\n\n## Other commands\n\nRun `delta-importer stats` to get a table showing statistics on imported deal data.\n\n\n\u003cimg src=\"./docs/assets/stats.png\" width=300/\u003e","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapplication-research%2Fdelta-importer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fapplication-research%2Fdelta-importer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapplication-research%2Fdelta-importer/lists"}