{"id":23344450,"url":"https://github.com/rec/hardback","last_synced_at":"2025-10-31T10:32:29.090Z","repository":{"id":137549430,"uuid":"178737544","full_name":"rec/hardback","owner":"rec","description":"📓 Hardcopy backups of digital data 📓","archived":false,"fork":false,"pushed_at":"2024-07-09T19:02:55.000Z","size":4379,"stargazers_count":1,"open_issues_count":18,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-02-13T18:53:01.426Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rec.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":"FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":"rec"}},"created_at":"2019-03-31T20:17:04.000Z","updated_at":"2023-02-26T12:35:52.000Z","dependencies_parsed_at":null,"dependency_job_id":"ddc7ba66-f236-44c3-8ea0-3142acdbe5a8","html_url":"https://github.com/rec/hardback","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rec%2Fhardback","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rec%2Fhardback/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rec%2Fhardback/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rec%2Fhardback/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rec","download_url":"https://codeload.github.com/rec/hardback/tar.gz/refs/heads/main","host":{"name":"GitHub","url"
:"https://github.com","kind":"github","repositories_count":247693704,"owners_count":20980726,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-21T06:26:41.485Z","updated_at":"2025-10-31T10:32:29.015Z","avatar_url":"https://github.com/rec.png","language":"Python","funding_links":["https://github.com/sponsors/rec"],"categories":[],"sub_categories":[],"readme":"# hardback: hard-copy backup of digital data\n\nNewest updates are\n[here](https://github.com/rec/hardback/blob/master/UPDATES.rst).\n\n## In one sentence\n\nArchive a digital document as a hardcopy book that can then be turned\nback into the original document.\n\n## High level picture\n\nThere are only two parts to this project:\n\n-   Writing the original document or book from a digital document\n-   Reading the book back in\n\nWriting is non-trivial, but there is a clear path to a good solution for\nthat.\n\nBut I really don\\'t have any solid solution yet to reading, short of\nsomeone scanning each QR code individually.\n\nOf course, that is a reasonable solution if you care about the data and\ndon\\'t mind paying someone to take the time.\n\n## The book format is EPUB\n\nThe output format will be EPUB, \u003chttps://en.wikipedia.org/wiki/EPUB\u003e\n-the only choice for an open-source book format, full-featured and\nuniversally accepted.\n\nI\\'m using a Python library called EBookLib for this - I haven\\'t looked\ninto it thoroughly yet, but it seems well-received and there is no other\ncandidate in Python.\n\nUpdate: EBookLib is fairly gnarly, but the underlying format is just\nXHTML, so 
I\\'m having reasonable success getting output.\n\n## The data format within the book is QR code\n\nQR codes will be used to store the data in 1k blocks - again, QR is the\nonly reasonable choice for solving the problem of printable data.\n\nA Python library called segno can write each one as a tiny PNG file\nabout 2K in size. This is quite reasonable - it means that we can aim to\ncreate a book document that\\'s less than three times the size of the\noriginal digital document. (Interestingly enough, SVG files were an\norder of magnitude larger - in some cases over one hundred times larger!)\n\nWe\\'ll be using QR code format 36, which holds up to 1,051 bytes at the\nhighest error correction code level, \\'H\\'.\n\nThe official list of all the QR code formats,\n\u003chttps://www.qrcode.com/en/about/version.html\u003e is poorly organized -\nclick on 31-40 and then scroll down.\n\nI\\'m going to use that to hold 1024 bytes of target data with an index\nand a hash of the original document, totalling 1,048 bytes. (The extra 3\nbytes aren\\'t entirely wasted - we get a tiny bit better error\ncorrection.)\n\n## Data layout\n\nThe binary data is divided into 1K *chunks*. A chunk is written to a QR\ncode as part of a *block*, which also contains an index and a hash of\nthe original document.\n\nThe layout in bytes within the block is by default like this:\n\n``` text\n| index [8] | document[8] | chunk [up to 1024] |\n```\n\nbut you can customize all these sizes.\n\nThere\\'s no checksum or error correction for this block itself, as the\nQR code is already taking care of that for us.\n\n`hash` is the first 16 bytes of the 32-byte SHA256 hash of the entire\ndocument. 
`data` is one kilobyte from your target file.\n\n`index` is an 8-byte signed integer - a number that can be positive,\nnegative or zero, and that fits into 8 bytes (or equivalently 16 hex\ndigits).\n\nIf the index is zero or negative, then it is a metadata block.\n\nThe block with index zero always contains a JSON description of the\noriginal file with the fields `filename`, `timestamp`, `size` and\n`sha256`. If the original filename is too long (which would be about 900\ncharacters or so!), it is truncated from the left.\n\nBlocks with negative indexes are currently unspecified and reserved for\nfuture expansion or for individuals to use. The first version of the\nsoftware will only produce output with non-negative indexes.\n\nIf `index` is positive, it\\'s the index of a data block. This means that\nthe first data block has `index` 1.\n\nEight bytes allows us to generate 2 to the power of 63 blocks of 1K\neach, or about 9 zettabytes (which is 9,000,000,000,000 gigabytes) -\nroughly the entire size of all the world\\'s data in 2019.\n\nWithin a block, `index` is represented in\n[big-endian](https://en.wikipedia.org/wiki/Endianness) (or intuitive or\nnetwork order) - which means the *most* significant digits occur first.\n\nIntel processors are little-endian, where the *least* significant digits\ncome first, so we use the [struct\nlibrary](https://docs.python.org/3/library/struct.html#byte-order-size-and-alignment)\nto make sure that the output is system-independent.\n\nRemembering that one byte is equal to two hex digits, if the hash of a\nfull document is\n`56484fd9aad8e87540609ca6c938f98fab60296b3bec808ea8b3e24da2035ce9` then\nthe resulting sequence of QR codes would look like:\n\n``` text\n0000000000000000 56484fd9aad8e87540609ca6c938f98f {\"filename\": \"me.jpg\", ...\n0000000000000001 56484fd9aad8e87540609ca6c938f98f ... 1024 bytes ...\n0000000000000002 56484fd9aad8e87540609ca6c938f98f ... more data  ...\n... 
etc\n```\n\nThis means that each QR code identifies itself as to what part of the\nwhole document it is.\n\nIt also means that the metadata block is key to understanding how the\nwhole system works! If you have a metadata block, then you can\nreconstruct at least part of the data even if a lot of it is lost.\nOtherwise, you really have to guess.\n\nSo we\\'re going to have to intersperse the metadata block within all the\nother blocks periodically if we really want something that can be\npartially reconstructed!\n\nUpdate - this is done: the metadata blocks appear in varying locations\non each page so even if a hole were punched through the book, some copy of\nthe metadata would probably survive.\n\nAlso, \\\"raw\\\" formats like RAW and AIFF are much preferable for this\nsort of archival activity because compressed formats dramatically\nmagnify the effect of any errors or gaps. If you had a book containing\nthe digital data for an AIFF or RAW, you could still reconstruct pieces\nof it even if you only have a limited number of pages, whereas you might\nget nothing at all if you were using mp3 or jpg files.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frec%2Fhardback","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frec%2Fhardback","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frec%2Fhardback/lists"}