{"id":15074479,"url":"https://github.com/fedragon/ark","last_synced_at":"2026-03-07T02:02:03.891Z","repository":{"id":197312548,"uuid":"627751646","full_name":"fedragon/ark","owner":"fedragon","description":"Manages an archive of media files","archived":false,"fork":false,"pushed_at":"2025-08-11T19:15:18.000Z","size":40672,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-10-13T05:31:41.366Z","etag":null,"topics":["golang","grpc","nas"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fedragon.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2023-04-14T05:59:57.000Z","updated_at":"2025-08-11T19:15:23.000Z","dependencies_parsed_at":null,"dependency_job_id":"6375c437-c664-4e09-961f-e60e80c42834","html_url":"https://github.com/fedragon/ark","commit_stats":null,"previous_names":["fedragon/ark"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/fedragon/ark","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fedragon%2Fark","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fedragon%2Fark/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fedragon%2Fark/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fedragon%2Fark/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/
hosts/GitHub/owners/fedragon","download_url":"https://codeload.github.com/fedragon/ark/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fedragon%2Fark/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30205893,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-06T19:07:06.838Z","status":"online","status_checked_at":"2026-03-07T02:00:06.765Z","response_time":53,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["golang","grpc","nas"],"created_at":"2024-09-25T03:33:43.612Z","updated_at":"2026-03-07T02:02:03.834Z","avatar_url":"https://github.com/fedragon.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Ark\n\nManages an archive of media files, identifying and skipping duplicates on import; it archives files by their creation date. Use at your own risk.\n\n**Note:** it can only guarantee atomic file moves on UNIX filesystems.\n\n## Raison d'être\n\nOver the years I have accumulated hundreds of GBs of photos and videos, stored on a multitude of removable drives. 
I eventually bought a NAS for home usage and moved all files there, but there's a lot of duplication and mess because of backups taken over the years.\n\nThere are of course plenty of applications on the market that can manage an archive of media, but this looked like (and was, in fact) an interesting pet project.\n\n## Decision log\n\n- I'm going to consider a file to be the duplicate of another one if and only if hashing them yields the same result: no other attributes (e.g. file name, creation date, ...) are taken into account\n- I'm going to compute file hashes using the [Go port](https://github.com/lukechampine/blake3) of the BLAKE3 cryptographic hash function, because of its performance\n- Server and client will communicate over gRPC: this enables them to run on different machines and to leverage HTTP/2 to stream files over the network\n- Instead of vanilla gRPC, I'm going to use the [connect-go](https://github.com/connectrpc/connect-go) library, primarily to experiment with it\n- Clients will authenticate their requests using JWTs, leveraging connect-go's [interceptors](https://connect.build/docs/go/interceptors)\n- I'm going to report some key metrics to [Prometheus](https://prometheus.io/docs/introduction/overview/) to measure the system's performance\n- I'm going to store file metadata (e.g. path, hash, ...) in a [Redis](https://redis.io/) database\n\n## Note: Creation date\n\nThe creation date is extracted, whenever possible, from the file's [EXIF](https://exiftool.org/TagNames/EXIF.html) header. 
When that is not possible (either because the file type is not supported or there is no EXIF data), the file's modification time is used as a fallback.\n\nEXIF can currently be parsed from:\n\n- JPEG, thanks to [go-jpeg-image-structure](https://github.com/dsoprea/go-jpeg-image-structure)\n- HEIC, thanks to [go-heic-exif-extractor](https://github.com/dsoprea/go-heic-exif-extractor)\n- TIFF-based formats such as TIFF, CR2, and ORF, thanks to my own [tiff-parser](https://github.com/fedragon/tiff-parser)\n\n## Components\n\n### Server\n\nRuns on a dedicated machine (a NAS or wherever you'd like to store your media).\nReceives `UploadFile` gRPC requests from clients, archiving files by creation date. It identifies files by their pre-computed hash and skips any duplicates that may be submitted for upload.\n\n### Client\n\nMay run on any machine with network access to the server.\nRecursively walks through a directory containing media files, computing the hash of each file and issuing `UploadFile` requests to the server. It initially sends only the file metadata: the actual file content is sent (in chunks) only if the server confirms that it's not a duplicate.\n\n## How it works\n\nThe diagram below describes how a Client uploads files to the Server. For brevity's sake, the diagram shows only how a single file is uploaded and omits errors. 
Any error will break the circuit.\n\n```\n                         +---------+                +---------+                +-----+ +-----+\n                         | Client  |                | Server  |                | DB  | | HDD |\n                         +---------+                +---------+                +-----+ +-----+\n                              |                          |                        |       |\n                              | Compute file hash        |                        |       |\n                              |------------------        |                        |       |\n                              |                 |        |                        |       |\n                              |\u003c-----------------        |                        |       |\n                              |                          |                        |       |\n                              | Send file metadata       |                        |       |\n                              |-------------------------\u003e|                        |       |\n                              |                          |                        |       |\n                              |                          | Does it exist          |       |\n                              |                          |-----------------------\u003e|       |\n         -------------------\\ |                          |                        |       |\n         | alt: file exists |-|                          |                        |       |\n         |------------------| |                          |                        |       |\n                              |                          |                        |       |\n                              |                          |                    Yes |       |\n                              |                          |\u003c-----------------------|       |\n                              |                          |          
              |       |\n                              |      File already exists |                        |       |\n                              |\u003c-------------------------|                        |       |\n                              |                          |                        |       |\n                              | Skip file                |                        |       |\n                              |----------                |                        |       |\n                              |         |                |                        |       |\n                              |\u003c---------                |                        |       |\n----------------------------\\ |                          |                        |       |\n| else: file does not exist |-|                          |                        |       |\n|---------------------------| |                          |                        |       |\n----------------------------\\ |                          |                        |       |\n| loop: for each file chunk |-|                          |                        |       |\n|---------------------------| |                          |                        |       |\n                              |                          |                        |       |\n                              | Send file chunk          |                        |       |\n                              |-------------------------\u003e|                        |       |\n                              |                          |                        |       |\n                              |                          | Store file chunk       |       |\n                              |                          |-----------------       |       |\n                              |                          |                |       |       |\n                              |                          |\u003c----------------       |       
|\n                              |                          |                        |       |\n                              |                       OK |                        |       |\n                              |\u003c-------------------------|                        |       |\n                 -----------\\ |                          |                        |       |\n                 | end loop |-|                          |                        |       |\n                 |----------| |                          |                        |       |\n                              |                          |                        |       |\n                              |                          | Atomically write file  |       |\n                              |                          |-------------------------------\u003e|\n                              |                          |                        |       |\n                              |                          |                        |    OK |\n                              |                          |\u003c-------------------------------|\n                              |                          |                        |       |\n                              |                          | Store file metadata    |       |\n                              |                          |-----------------------\u003e|       |\n                              |                          |                        |       |\n                              |                          |                     OK |       |\n                              |                          |\u003c-----------------------|       |\n                              |                          |                        |       |\n                              |                       OK |                        |       |\n                              |\u003c-------------------------|                        |       |\n             
         ------\\ |                          |                        |       |\n                      | end |-|                          |                        |       |\n                      |-----| |                          |                        |       |\n                              |                          |                        |       |\n```\n\n### Credits\n\nThe diagram has been generated by https://weidagang.github.io/text-diagram/ using the following script:\n\n```\nobject Client Server DB HDD\nClient-\u003eClient: Compute file hash\nClient-\u003eServer: Send file metadata\nServer-\u003eDB: Does it exist\nnote left of Client: alt: file exists\nDB-\u003eServer: Yes\nServer-\u003eClient: File already exists\nClient-\u003eClient: Skip file\nnote left of Client: else: file does not exist\nnote left of Client: loop: for each file chunk\nClient-\u003eServer: Send file chunk\nServer-\u003eServer: Store file chunk\nServer-\u003eClient: OK\nnote left of Client: end loop\nServer-\u003eHDD: Atomically write file\nHDD-\u003eServer: OK\nServer-\u003eDB: Store file metadata\nDB-\u003eServer: OK\nServer-\u003eClient: OK\nnote left of Client: end\n```\n\n## EXIF parsing resources\n\n- https://exiftool.org/TagNames/EXIF.html\n- http://lclevy.free.fr/cr2/\n- https://github.com/lclevy/libcraw2/blob/master/docs/cr2_poster.pdf\n- https://github.com/ImranAtBhimsoft/metadata-extractor\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffedragon%2Fark","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffedragon%2Fark","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffedragon%2Fark/lists"}