{"id":23113764,"url":"https://github.com/arken/arken","last_synced_at":"2025-08-16T20:31:28.070Z","repository":{"id":47485985,"uuid":"271689796","full_name":"arken/arken","owner":"arken","description":"A Distributed Digital Archive Built for the World's Open Source and Scientific Data.","archived":false,"fork":false,"pushed_at":"2023-12-27T15:21:30.000Z","size":747,"stargazers_count":9,"open_issues_count":2,"forks_count":2,"subscribers_count":5,"default_branch":"main","last_synced_at":"2024-11-24T18:18:21.818Z","etag":null,"topics":["arken","backup","cluster","distributed","ipfs","keyset","storage","storage-space"],"latest_commit_sha":null,"homepage":"https://arken.io","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/arken.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-06-12T02:24:12.000Z","updated_at":"2024-05-30T11:54:56.000Z","dependencies_parsed_at":"2024-11-15T22:02:23.690Z","dependency_job_id":"02ae62d3-d528-49de-bee3-06063336db8e","html_url":"https://github.com/arken/arken","commit_stats":null,"previous_names":["arkenproject/arken"],"tags_count":10,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arken%2Farken","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arken%2Farken/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arken%2Farken/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arken%2Farken/manifests","owner_url":"https://repos.ecos
yste.ms/api/v1/hosts/GitHub/owners/arken","download_url":"https://codeload.github.com/arken/arken/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":230056122,"owners_count":18165879,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arken","backup","cluster","distributed","ipfs","keyset","storage","storage-space"],"created_at":"2024-12-17T03:13:36.962Z","updated_at":"2024-12-17T03:13:38.509Z","avatar_url":"https://github.com/arken.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Arken\n\n\u003cimg src=\"https://avatars.githubusercontent.com/u/66809416?s=200\u0026v=4\"\u003e\n\nA Distributed Digital Archive Built for the World's Open Source and Scientific Data.\n\n[![Go Report Card](https://goreportcard.com/badge/github.com/arken/arken)](https://goreportcard.com/report/github.com/arken/arken)\n\n## Table of Contents\n\n- [A Bit of Backstory](#a-bit-of-backstory)\n- [What is Arken?](#what-is-arken)\n  - [What's a Manifest?](#whats-a-manifest)\n    - [Manifest Security](#manifest-security)\n    - [Rebalancing Data Across the Community](#rebalancing-data-across-the-community)\n- [Getting Started](#getting-started)\n- [What's the process as someone who wants to back up important data?](#whats-the-process-as-someone-who-wants-to-back-up-important-data)\n- [What's the process as someone donating their extra storage space?](#whats-the-process-as-someone-donating-their-extra-storage-space)\n\n## A Bit of Backstory\n\nMany researchers, museums, and archivists are struggling to host 
and protect a vast amount of important public data. \nOn the other hand, many of us developers, tinkerers, and general computer enthusiasts have extra storage \nspace on our home servers.\n\nThe goal of Arken is to build an autonomous system for organizing, balancing, and distributing this data among users who \ncan donate their extra space. \n\n```\n+-------------GitHub/GitLab/Gitea------------------+\n|    +----------+   +----------+    +----------+   |\n|    | manifest |   | manifest |    | manifest |   |\n|    +-----|----+   +-----|\\---+    +-----|----+   |\n+----------|--------------|-\\-------------|--------+\n           |              |  \\           /\\\n           |              |   \\         /  \\\n           |              |    \\       /    \\\n           |              |     \\     /      \\\n           |              |      \\   /        \\\n           |              |       \\ /          \\\n           v              v        v            v\n        [Arken]       [Arken]\u003c--\u003e[Arken]\u003c---\u003e[Arken]\n```\n\n## What is Arken?\n\nArken is a management engine that runs on top of the IPFS (InterPlanetary File System) protocol. Each instance of Arken \ncalculates which important files are hosted by the fewest other nodes on the network and should thus be \nlocally backed up to reduce the risk of data loss. Arken also tracks how much space it is using on your system and \nrespects the limits you set by locally deleting data that is already backed up by more than 10% of the cluster. \n\n### What's a Manifest?\n\nArken uses Manifests to transparently keep track of which files are important to the network and should be\nmonitored and backed up if needed. Unlike a Pinset in an IPFS cluster, a Manifest is simply a plain-text Git repository\nmade up of file identifiers. Manifests are also easy to audit, so you can see exactly what data you're helping\npreserve. 
Manifest repositories can contain an arbitrary number of directories used to organize manifest files as long as \nthey also contain a `config` TOML file. This config file sets a replication factor: the number of nodes across the\nnetwork that should be storing each file at any given time.\n\nWhile Manifests tell Arken which files should be stored on the subscribed nodes, they don't contain any of the\ndata to be backed up onto the network. To import data to a Manifest, users add files to IPFS and record the file \nidentifiers (IPFS CIDs) in a manifest file. From there, nodes will begin pulling data directly from the user to the cluster.\n\n##### Manifest Security\n\nSince Manifests are openly available through Git repositories, they can be easily audited but can only be changed by \nusers who have access to those Git repositories or through pull requests.\n\n##### Rebalancing Data Across the Community\n\nArken instances periodically query IPFS for the number of other nodes hosting a particular file and attempt to \nreplace well-backed-up files on the system with files that are below the replication threshold.\n\n## Getting Started\n\n#### Tutorials:\n[Getting Started with Arken on a Raspberry Pi](https://github.com/arken/arken/blob/master/docs/raspberry-pi-setup.md)\n\nTo start running a node, you can download Arken as a Golang program or as a Docker container. 
\n**It's recommended to run Arken as a Docker container for simplicity and ease of updating.** \n\n##### Docker:\n\n```\ndocker run -d --name arken \\\n -v STORAGE:/data/storage \\\n -v DATABASE:/data/database \\\n -v REPOSITORIES:/data/repositories \\\n -v CONFIG:/data/config \\\n -e ARKEN_GENERAL_POOLSIZE=2TB \\\n -e ARKEN_DB_PATH=/data/database/keys.db \\\n -e ARKEN_SOURCES_CONFIG=/data/config/keysets.yaml \\\n -e ARKEN_SOURCES_REPOSITORIES=/data/repositories \\\n -e ARKEN_SOURCES_STORAGE=/data/storage \\\n -p 4001:4001 \\\n --restart=always ghcr.io/arken/arken\n```\n\n##### Go Package:\n\n```\n# Recent Go toolchains install binaries with `go install` rather than `go get`.\ngo install github.com/arken/arken@latest\narken\n```\n\n### What's the process as someone who wants to back up important data?\n\nLet's say that you are a scholar who wants to preserve some important works of humanity, or a researcher who wants \nto back up the DNA of an extinct animal/plant. How would you go about adding your data to the distributed file system? \nFirst, you would download \u0026 run the [Arken Import Tool](https://github.com/arken/ait). Using the Arken Import Tool, you can create \na manifest file of the IPFS identifiers for your data. At this point you can either upload the manifest to your own Git \nrepository (this is best if you want to run your own pool of workers) or apply to have your data added to the\nCore manifest repository. The Core manifest repository consists of extremely important data to preserve and is what the \ncommunity donating their extra disk space uses by default.\n\n### What's the process as someone donating their extra storage space?\n\nOld computers or servers with some empty storage space make excellent Arken nodes. Check out our guide for configuring a Raspberry Pi with Docker, external storage, and Arken [here](https://github.com/arken/arken/blob/master/docs/raspberry-pi-setup.md). 
After installing the \nArken program, you can configure it either through environment variables or the Arken configuration file located at `~/.arken/`. You can check out an example of an Arken Docker-Compose file [here](https://github.com/arken/arken/blob/master/docs/examples/docker-compose.yml). The core manifest will be available by default, but because Manifests are just Git repositories, you can add and use \nany manifest you'd like. For example, you can donate space to the core community pool but also sync a custom manifest of \nsome vacation pictures amongst yours and a few friends' machines.\n\nAfter the configuration, that's it! Arken will continue to run in the background, determining files with the fewest \nnumber of other nodes hosting them and rebalancing as necessary.\n\n## License\n\nCopyright 2020-2022 Alec Scott \u0026 Arken Team \u003cteam@arken.io\u003e\n\nLicensed under the Apache License, Version 2.0 (the \"License\");\nyou may not use this file except in compliance with the License.\nYou may obtain a copy of the License at\n\nhttp://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing, software\ndistributed under the License is distributed on an \"AS IS\" BASIS,\nWITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\nSee the License for the specific language governing permissions and\nlimitations under the License.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farken%2Farken","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Farken%2Farken","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farken%2Farken/lists"}