{"id":18974548,"url":"https://github.com/m7a/bo-maxbupst","last_synced_at":"2026-03-19T07:25:42.738Z","repository":{"id":186383605,"uuid":"675090771","full_name":"m7a/bo-maxbupst","owner":"m7a","description":"Ma_Sys.ma Bupstash Extractor alternative Restoration Program for Bupstash Backups","archived":false,"fork":false,"pushed_at":"2024-04-28T19:31:00.000Z","size":1196,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-09-20T22:47:46.289Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Ada","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/m7a.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-08-05T18:37:01.000Z","updated_at":"2024-04-28T19:31:04.000Z","dependencies_parsed_at":null,"dependency_job_id":"c08cb9bb-59db-4d46-90ca-c935e63db926","html_url":"https://github.com/m7a/bo-maxbupst","commit_stats":null,"previous_names":["m7a/bo-maxbupst"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/m7a/bo-maxbupst","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/m7a%2Fbo-maxbupst","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/m7a%2Fbo-maxbupst/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/m7a%2Fbo-maxbupst/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/m7a%2Fbo-maxbupst/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/m7a","download_url":"htt
ps://codeload.github.com/m7a/bo-maxbupst/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/m7a%2Fbo-maxbupst/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29510156,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-16T09:05:14.864Z","status":"ssl_error","status_checked_at":"2026-02-16T08:55:59.364Z","response_time":115,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-08T15:15:23.758Z","updated_at":"2026-02-16T14:31:46.959Z","avatar_url":"https://github.com/m7a.png","language":"Ada","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\nsection: 32\nx-masysma-name: maxbupst\ntitle: Ma_Sys.ma Bupstash Extractor\ndate: 2023/06/13 22:48:05\nlang: en-US\nauthor: [\"Linux-Fan, Ma_Sys.ma (Ma_Sys.ma@web.de)\"]\nkeywords: [\"backup\", \"restore\", \"bupstash\", \"extract\", \"114.115\"]\nx-masysma-version: 1.0.0\nx-masysma-website: https://masysma.net/32/maxbupst.xhtml\nx-masysma-repository: https://www.github.com/m7a/bo-maxbupst\nx-masysma-owned: 1\nx-masysma-copyright: (c) 2022, 2023 Ma_Sys.ma \u003cinfo@masysma.net\u003e.\n---\nAbstract\n========\n\nThe Ma_Sys.ma Bupstash Extractor (`maxbupst`) is an application that can read\nand decode the data from Bupstash (\u003chttps://bupstash.io\u003e) Repositories as\ncreated with a supported Bupstash 
version.\n\nIntroduction\n============\n\nIn [backup_tests_borg_bupstash_kopia(37)](../37/backup_tests_borg_bupstash_kopia.xhtml)\nmultiple modern backup programs were tested as potential replacements for\n[jmbb(32)](../32/jmbb.xhtml). There, the conclusion was that Bupstash is a most\nviable replacement for JMBB.\n\nOne problem of the modern backup tools is that due to their advanced features\nlike encryption and deduplication, they tend to store data in their own\nproprietary formats that no other tool can read. For tools using fast-paced\ndevelopment or new programming languages (like e.g. Rust) it can be challenging\nto get them to compile as often new compilers and online dependency downloads\nare required.\n\nThis makes the old problem of not having the restoration software available\nin the time of need even more critical since while the software itself may be\navailable, some of its dependencies or an adequate compiler may not.\n\nAdditionally, given the rich feature set that supports multiple backups from\ndifferent machines and potentially uses different encryption keys for various\nparts of the backups, the modern tools come with a high amount of inherent\ncomplexity.\n\nJMBB, which is less modern a tool, has a design that tries to mitigate these\nrisks by being based on formats that can be decoded by combining multiple\nthird-party tools (aescrypt, cpio, xz) for restoring the backup contents\nalthough the restoration process may be slow and slightly off (in that\nrestored data can contain files that were deleted in a recent backup).\n\nFor using any of the more modern alternatives like Bupstash it seems it would\nbe best if there were multiple ways to restore a backup, too. 
To achieve this,\nthis repository provides an alternative restoration implementation in a\ndifferent programming language (Ada instead of Rust) and using different\nlibraries except for the decryption where it currently uses libsodium just like\nBupstash.\n\nJust like with the JMBB “emergency” restoration using standard tools, this\nimplementation does not aim at being as good as the original. Instead, it is\nintended to serve as an alternative _for the sake of having such_ that may\ncome at degraded performance and with a hugely limited set of features.\n\nThe Bupstash data format consists of versioned data structures. This\nimplementation does not support all of the data structure revisions. Instead\nit focuses on the structures that were used by specific versions of Bupstash\nthat were used productively by the Ma_Sys.ma i.e. some arbitrary set of\nversions is supported. See the table under _Supported Versions_ for details.\nThe idea is that if you only ever switch from one listed Bupstash version to\nanother one then the resulting backup data structures are restorable by the\nnewest Maxbupstash revision.\n\nIn an ideal world, this implementation would have been created entirely\nindependently from Bupstash without looking at its implementation, because\nthis might greatly increase resilience in that truly different implementations\nare unlikely to contain the same bugs, making a successful data restoration more\nlikely. Since the existing online documentation about Bupstash's data and crypto\nstructures is not a comprehensive specification, this was unfortunately not\nfeasible. 
Instead, the implementation closely follows Bupstash's in many places.\nThis makes it likely that structural bugs that exist in the original Bupstash\nare present in this implementation, too.\n\nIf you are interested in doing a proper “clean room” implementation of a tool\nto be compatible with the Bupstash data format, do not hesitate to contact me.\nI might be able to assist with the specification part.\n\nSupported Versions\n==================\n\nBupstash Versions  Maxbupstash Versions\n-----------------  --------------------\n0.10.3             1.1.1\n\nLicense\n=======\n\n\tMa_Sys.ma Bupstash Extractor\n\t(c) 2022, 2023 Ma_Sys.ma \u003cinfo@masysma.net\u003e\n\t\n\tThis program is free software: you can redistribute it and/or modify\n\tit under the terms of the GNU General Public License as published by\n\tthe Free Software Foundation, either version 3 of the License, or\n\t(at your option) any later version.\n\t\n\tThis program is distributed in the hope that it will be useful,\n\tbut WITHOUT ANY WARRANTY; without even the implied warranty of\n\tMERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the\n\tGNU General Public License for more details.\n\nHere is an overview about the Ma_Sys.ma-supplied dependencies' licenses:\n\nDependency                                License\n----------------------------------------  -------------\n[lz4_ada(32)](../32/lz4_ada.xhtml)        Expat (“MIT”)\n[blake3_ada(32)](../32/blake3_ada.xhtml)  CC0\n[tar_ada(32)](../32/tar_ada.xhtml)        GPL 3+\n\nWhy this chaos you might ask? The idea behind the licenses for LZ4 and Blake3\nis to align them with other important projects from the respective ecosystems,\ne.g. the LZ4 Specification or the Blake3 reference implementation as not to\nneedlessly restrict the license further than the original projects do.\n\nFor the Tar implementation there was no real reference implementation although\nit is inspired by `tar.rs` (cf. 
\u003chttps://docs.rs/tar/latest/tar/\u003e) and\nBupstash's `xtar.rs` hence the Ma_Sys.ma default “GPL v3 or later” is used. As\nfor maxbupst itself: Since it is not a library but rather an application\nprogram, the GPL should be less of a problem for practical use and hence the\nMa_Sys.ma default GPL v3+ was chosen over Bupstash's more permissive Expat\nlicense. If this is an issue for you, feel free to contact me about it and\nexplain whatever difficulty you see regarding the licensing.\n\nCompilation\n===========\n\nSome of Bupstash's required dependencies were found to not have any Ada\nequivalent readily available. Specifically, the following programs seemed not\nto be available in the Ada world: LZ4, Blake3 and TAR Archive creation.\n\nTo provide these features, dedicated separate libraries were thus developed as\npart of the Maxbupst development. Their pages are here:\n\n * [lz4_ada(32)](../32/lz4_ada.xhtml)\n * [blake3_ada(32)](../32/blake3_ada.xhtml)\n * [tar_ada(32)](../32/tar_ada.xhtml)\n\nAdditionally, the external dependency on libsodium is required, i.e. on\nDebian systems this is package `libsodium-dev`.\n\nThere are multiple ways to go about compiling this program depending on the\nintended mode of deployment. Please refer to the following subsections for\ndetails.\n\n## To Install as a Debian Package\n\nThe primary intended use case is to build all of the libraries as separate\nDebian packages, install them on the running Debian system and then compile\nmaxbupst and also install it as a Debian package. If the necessary dependencies\nlike `ant`, `gnat-12` and `devscripts` are installed, this can be achieved by\nrunning\n\n\tant package\n\nin all of the dependencies' individual directories, then installing all of the\nresulting packages like e.g. 
with `apt install ./...deb` and then compiling\n`maxbupst` with the same command in the repository checkout:\n\n\tant package\n\nThe resulting package can then be installed and the `maxbupst` command becomes\navailable.\n\n## To compile without Installation\n\nIn order to compile this package and obtain an executable that does not have\nany external dependencies except for the system's libsodium, use the following\ntarget:\n\n\tant build-rogue\n\nThis automatically downloads the required Ma_Sys.ma dependencies next to the\ncurrent repository checkout and statically links all of them into the single\n`maxbupst` binary output.\n\n## To test that the tool runs\n\nThe following sequence of commands is expected to produce a YAML output equal to\nthe one found in `testdata/small-0.10.3-expected.yml`:\n\n~~~{.bash}\nroot=\"$(pwd)\"\nexport BUPSTASH_KEY=\"$root/testdata/maxbupst-testkey.key\"\nexport BUPSTASH_REPOSITORY=\"$root/testdata/small-0.10.3\"\nulimit -s unlimited\nmaxbupst -l\n~~~\n\n## To compile for Windows\n\nCompilation on Windows is a little bit involved because installing the\ndependencies requires a lot of manual actions. Also, in order to set up GNAT on\nWindows, the currently recommended course of operation is to use Alire which\nmeans that in order to make use of this installation method, Alire itself needs\nto be set up first. The following steps give a rough guide that worked for me:\n\n 1. Install chocolatey if not already installed in an administrative powershell.\n    `Set-ExecutionPolicy Bypass -Scope Process -Force; [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))`\n    Details: command from \u003chttps://chocolatey.org/install\u003e “individual” variant.\n    Test by running `choco`. It should print out a version.\n 2. 
In the same shell, install git: `choco install git`.\n    The tool becomes available in newly created shells afterwards.\n 3. Install ant and its dependencies:\n    `choco install microsoft-openjdk` and then `choco install ant`\n    Further info: \u003chttps://community.chocolatey.org/packages/ant\u003e,\n    \u003chttps://mkyong.com/ant/how-to-install-apache-ant-on-windows/\u003e\n 4. Switch to a user shell in an empty directory\n 5. Clone the repository: `git clone https://github.com/m7a/bo-maxbupst`\n 6. Get GNAT via Alire\n    \u003chttps://github.com/alire-project/alire/releases/\u003e\n    Download e.g. `alr-1.2.2-bin-x86_64-windows.zip`. Copy `alr.exe` to\n    `bo-maxbupst`.\n 7. Switch directory: `cd bo-maxbupst`\n 8. Run Alire to install the GNAT compiler. Select not to install msys2.\n    `.\\alr toolchain --select`. Choose `gnat_native` and a recent gprbuild.\n    Try out `gnatmake` to check if the command is available.\n 9. Add Alire's toolchain to your PATH\n    `${env:PATH} = \"${env:PATH};${env:USERPROFILE}\\.config\\alire\\cache\\dependencies\\gnat_native_12.2.1_c210a022\\bin\"`\n    (adjust version to your installation)\n 10. Get Libsodium \u003chttps://download.libsodium.org/libsodium/releases/\u003e\n     Download the file with suffix `-msvc.zip` and extract the `libsodium.dll`\n     for your architecture. 
Copy it to `bo-maxbupst`.\n\nNow you should be ready to compile the “rogue” variant (others are not\nsupported on Windows):\n\n\tant build-rogue\n\nThe files that are necessary for Maxbupst to run are then: `libsodium.dll` and\n`maxbupst.exe`.\n\nIn order to test the functioning of the build, create a new directory `test`\noutside the repository directory and copy the files `libsodium.dll` and\n`maxbupst.exe` there.\n\nThen use a regular `cmd.exe` (not a powershell because that one mangles the\nbinary output and hence the resulting .tar will not be extractable) and run e.g.\nthe following commands:\n\n~~~{.batch}\nset ROOT=%CD%\\..\\bo-maxbupst\nset BUPSTASH_KEY=%ROOT%\\testdata\\maxbupst-testkey.key\nset BUPSTASH_REPOSITORY=%ROOT%\\testdata\\small-0.10.3\nmaxbupst -l\nmaxbupst -g -i 81e33ca1db1e6d562bc7146ddd9b37ab | tar -xf -\n~~~\n\nNote that it is really `tar -xf -` because `tar -x` gives an error on Windows :)\n\nUsage Documentation (Manpage)\n=============================\n\n## Name\n\n`maxbupst` -- Ma_Sys.ma Bupstash Extractor\n\n## Synopsis\n\n\tmaxbupst -l|list [-k KEY] [-r REPO]\n\tmaxbupst -g|get  [-k KEY] [-r REPO] -i ID\n\n## Description\n\nRead from a Bupstash repository and either display the list of items (`-l`) or\nrestore the data from a specific item (`-g`).\n\nKey and Repository locations can be passed either through the options\n`-k` and `-r` or through the environment variables `BUPSTASH_KEY` and\n`BUPSTASH_REPOSITORY`.\n\n## Options\n\n----  --------------------------------------------------------------------\n`-l`  List mode. List the repository's items as YAML-like output.\n`-g`  Get mode. Extracts the specified item ID and prints it to stdout.\n`-k`  Specify Bupstash private key file to use.\n`-r`  Specify Bupstash repository root directory to use.\n`-i`  Specify item ID to restore. 
Identify this item by using `-l` before.\n----  --------------------------------------------------------------------\n\n## Environment Variables\n\n`BUPSTASH_KEY`\n:   Alternative way to specify the key file to use. Equivalent to option `-k`.\n`BUPSTASH_REPOSITORY`\n:   Alternative way to specify the repository location. Equivalent to option\n    `-r`.\n\n## Stdout\n\nAny output data produced is written to stdout. For `-g` it is often advisable to\npipe the output through a `tar -x` command in order to extract the retrieved\ndata.\n\nWhen single data items are extracted, their output is returned directly and may\nthus require different processing.\n\n## Examples\n\n\tmaxbupst -l -k testdata/maxbupst-testkey.key -r testdata/small-0.10.3\n\n## Bugs\n\nThe order of items as retrieved by `-l` is file-system dependent and does not\nproperly reflect the order of creation.\n\nRun Advanced Tests\n==================\n\nTesting Backup software correctly is not all that easy. Even for a mere\n“restore” the only real test is: Does it restore the real production data of\ninterest. Tests with production data are always difficult to conduct for\nmultiple reasons: Data under consideration may be large, confidential and a\nrestore must not interfere with the productive systems' operations.\n\nHence the testing for maxbupst is threefold:\n\n 1. A simple restore test can be found in script `test_with_testdata.sh`. It\n    really just tries to restore known files from a tiny repository supplied\n    as part of the maxbupst source code. After compiling `maxbupst` and having\n    a `maxbupst` binary in the repository's directory, it can be run as-is\n    without any additional configuration being required.\n 2. A synthetic test based on Bupstash's `cli-tests` test suite. This one is\n    nice because it makes use of Bupstash's own test cases although it cannot\n    run all of them. 
It works by downloading the test definitions from the\n    Bupstash repository, editing them to replace `bupstash get` instances with\n    `maxbupst` and then running a sensible subset of the tests. If you used\n    `ant` to compile `maxbupst`, the necessary prerequisites for the script to\n    work may already be present. Additionally, package `bats` must be installed\n    for the script to be able to run the Bupstash tests. The script does not\n    take any parameters and can be run as-is then.\n 3. The test with production data. This one cannot run without additional\n    configuration by the user. Also, given how long it can take to complete, its\n    invocation is organized in multiple stages such that one can perform\n    one stage after another or even independent stages in parallel. To avoid\n    messing with the production data, it is intended that they are copied to\n    a working directory for the tests. These tests are contained in\n    subdirectory `test_with_production_data` and implemented in a GNU Makefile.\n    Run `make -C test_with_production_data help` to display information about\n    the variables and targets available. The broad idea is that you provide\n    an `env.mk` in that directory to set `MAXBT_PROD_SOURCE_DATA` and your\n    repository access data. Then you edit the `s1_update_backup.txt` target to\n    your needs and finally run `make -C test_with_production_data -j all` to\n    perform the end-to-end test. As additional dependencies, this test requires\n    `docker`, GNU `make`, GNU `tar` and `diff`.\n\nBupstash's Cryptosystem\n=======================\n\n~~~{.ada}\n-- from bupstash_key.ads\ntype Key is tagged limited record\n\tID:                   Bupstash_Types.XID;\n\tRollsum_Key:          String(1 .. 
Random_Seed_Bytes);\n\tData_Hash_Key_Part_1: Bupstash_Types.Partial_Hash_Key;\n\tData_Hash_Key_Part_2: Bupstash_Types.Partial_Hash_Key;\n\tData_PK:              Bupstash_Types.PK;\n\tData_SK:              Bupstash_Types.SK;\n\tData_PSK:             Bupstash_Types.PSK;\n\tIdx_Hash_Key_Part_1:  Bupstash_Types.Partial_Hash_Key;\n\tIdx_Hash_Key_Part_2:  Bupstash_Types.Partial_Hash_Key;\n\tIdx_PK:               Bupstash_Types.PK;\n\tIdx_SK:               Bupstash_Types.SK;\n\tIdx_PSK:              Bupstash_Types.PSK;\n\tMetadata_PK:          Bupstash_Types.PK;\n\tMetadata_SK:          Bupstash_Types.SK;\n\tMetadata_PSK:         Bupstash_Types.PSK;\nend record;\n~~~\n\n`ID` uniquely identifies this key, `Rollsum_Key` is used for deduplication\nduring backup creation and not needed for data restoration. The remainder of\nthe structure's entries are used for restoration and explained in the following.\n\nNote that this is _my_ understanding as a reader of the source code rather than\nthe inventor of Bupstash. Feel free to point out any parts where I understood\nthe hierarchy wrongly.\n\nDifferent keys are used to encrypt different parts of the repository as follows:\n\n\tMetadata (Items)   Index                      Data\n\t\n\t+----------+       +-------------------+      +-----------------------+\n\t| Backup 1 |------\u003e| hello.txt size 12 |-----\u003e| Hello world.#!/bin/sh |\n\t+----------+       | test.sh   size 24 |      |  -eu.echo Test.       |\n\t                   +-------------------+      +-----------------------+\n\t\n\tSchematic visualization of an example repository with a single item\n\tcontaining two files.\n\nMetadata Keys\n:   At the high-level, Bupstash repositories contain any number of _items_.\n    Metadata stores information about such an item. 
It contains a number\n    of tags, the date of backup creation and the addresses and sizes of the\n    associated index data (if any) and the associated data (always present).\n    The keys prefixed `Metadata_` are used for protecting the private (“secret”)\n    part of the metadata.\n\nIndex Keys\n:   If an item contains multiple files (i.e. is not just a data stream that\n    was added as-is to a bupstash repository) then the index stores information\n    about each of the contained files. Among other file metadata this includes\n    file paths and file sizes. The keys prefixed `Idx_` are used to protect\n    this data.\n\nData Keys\n:   Data is the actual backup contents. For restoration purposes, it can be\n    thought of as an opaque stream of bytes. If multiple files are contained\n    within the backup, the _index_ is used to associate suitable chunks of the\n    data stream to the individual files. The keys prefixed `Data_` are used\n    to protect the backup contents.\n\nThe use of multiple keys for the different repository contents seems sensible.\nIt allows sub-keys to be created to e.g. only access the backup metadata without\nhaving to be able to decrypt the index and data contents. The use of separate\nkeys for index and data ensures that adversaries cannot attack the system by\nexchanging index and data contents.\n\nFor each of the parts, PK, SK and PSK keys are stored.\n\n * PK (“Public Key”)\n * SK (“Secret Key”)\n * PSK (“Pre-Shared Key”)\n\nThe idea behind the PSK is that it is a symmetric key that is considered to\nbe between the public and the secret key in terms of secrecy: Unlike the public\nkey it is not stored together with the data, but unlike the secret key it is\nprovided in key files where SK is missing. For restoration purposes, SK and PSK\ncan both be considered required secret key inputs.\n\nAt the low level, all data chunks are encrypted using libsodium's cryptobox\nfunctionality. 
At the lowest level, the following two API calls are used for\ndecryption (`zsodiumbinding.ads`):\n\n * `crypto_box_curve25519xchacha20poly1305_beforenm`:\n   This function computes a “shared key” from public and secret keys that is\n   used for decryption. This is intended to be used as an input to\n   `open_easy_afternm` afterwards (bupstash does something peculiar here,\n   though -- read on). I subsequently call this `cryptobox-beforenm`.\n * `crypto_box_curve25519xchacha20poly1305_open_easy_afternm`:\n   This function decrypts an encrypted message by providing the “shared key”,\n   the ciphertext and a nonce that is typically stored together with the\n   ciphertext. I subsequently call this `cryptobox-open`.\n\nRaw ciphertext data in Bupstash consists of the following parts:\n\n\tCiphertext := Nonce || Cryptobox Ciphertext || PK\n\nThe decryption key (“box key”, BK) for the contained cryptobox ciphertext is\ncomputed as follows and from that, the plaintext:\n\n\tBK        := BLAKE3(Key=PSK, Data=cryptobox-beforenm(PK, SK))\n\tPlaintext := cryptobox-open(Key=BK, Nonce, Data=Cryptobox Ciphertext)\n\nThis encryption is used at the level of _chunks_ with the chunks being managed\nby a data and index tree for data and index contents respectively. In order\nto check that the content addressable storage is indeed addressed correctly,\nthe addresses inside the trees are checked as follows using a Hash Key (HK):\n\n\tHK               := BLAKE3(Key Part 1 || Key Part 2)\n\tComputed Address := BLAKE3(Key=HK, Data=Plaintext)\n\nIt is then asserted that the computed address corresponds to the address\nspecified in the tree. 
The concatenation of all decrypted plaintext chunks then\nforms the contents of the index and data respectively.\n\nGraphically, this scheme can be drawn as follows with HKP1 and HKP2 serving as a\nshorthand notation for _Key Part 1_ and _Key Part 2_ of the Hash Key:\n\n\tChunk                                    Key\n\t+-------+----------------------+----+    +----+-----+------+------+\n\t| Nonce | Cryptobox Ciphertext | PK |    | SK | PSK | HKP1 | HKP2 |\n\t+-------+----------------------+----+    +----+-----+------+------+\n\t    |              |             |          |    |    |          |\n\t    |              |             v          v    |    +- concat -+\n\t    |              |           +--------------+  |         |\n\t    |              |           |  cryptobox-  |  |         v\n\t    |              |           |   beforenm   |  |  +--------------+\n\t    |              |           +--------------+  |  | Blake 3 Hash |\n\t    |              |                   |         |  +--------------+\n\t    |              |                   |         |         |\n\t    |              |                   | data    | key     |\n\t    |              |                   v input   v input   |\n\t    |              |           +--------------------+      |\n\t    |              |           |    Blake 3 keyed   |      |\n\t    |              |           |    hash function   |      |\n\t    |              |           +--------------------+      |\n\t    |              |                        | BK           |\n\t    |              |                        |              |\n\t    | nonce        | ciphertext             | key          |\n\t    v input        v input                  v input        |\n\t+---------------------------------------------------+      |\n\t|                 cryptobox-open                    |      |\n\t+---------------------------------------------------+      |\n\t    | plaintext                                            | key\n\t    | output      
                                         v input\n\t    |                                    data input +---------------+\n\t    +----------------------------------------------\u003e| Blake 3 keyed |\n\t    |                                               | hash function |\n\t    |                                               +---------------+\n\t    |                                                      |\n\t    v                                                      v\n\t+---------------+                                   +---------------+\n\t| Plaintext to  |                                   |    computed   |\n\t| use           |                                   | chunk address |\n\t+---------------+                                   +---------------+\n\nSoftware Design\n===============\n\nThis section contains some notes about what I learned from reading the Bupstash\nsource code. It focuses on the data structures and is completed by a diagram\nwhich shows the implementation maxbupst at a high-level glance.\n\n## Bupstash's Index Structures\n\nTo associate file contents and metadata, bupstash does the following:\n\n * It creates two separate HTrees and iterates over both of them independently:\n   One for metadata and one for the actual file contents.\n * To restore, it iterates over the metatadata tree which consists of a stream\n   of records.\n * For each record, it reads out the _size_ of the respective item.\n * If _size_ is nonzero, it reads _size_ bytes from the data tree and produces\n   them as output.\n * Now the data tree cursor points to data from the next entry with data such\n   that the restore can continue by going to the next metadata item.\n * Additional data fields allow for restoration of a subset of files. I did not\n   check this in more detail because maxbupst only implements the “full” restore\n   functionality.\n\n## Bupstash's HTree Storage\n\nFrom a restorer point of view, Bupstash stores its data as a “stream”. 
Instead\nof having one large file where data can be appended, a tree structure is used\nto represent the “stream”.\n\nWhen there is only one backup containing multiple files then there are typically\ntwo streams: One for metadata and one for data (see above).\n\nThe storage only contains encrypted data and tree metadata. As a result, the\nHTree can be traversed without being decrypted. The actual (encrypted) data\nis contained in the leaf nodes whereas the other nodes contain unencrypted\nmetadata about which other tree nodes belong to the same subtree.\n\nBupstash's storage is “content-addressable”, i.e. the _ID_ of a tree node\nis actually the BLAKE3 hash over the concatenation of its (file) contents.\n\nAs a result, an entire stream can be identified by its ID. Such a tree can\nbasically be traversed as follows given the root node _ID_:\n\n * Read the file with file name _ID_\n * If this is already a leaf node, emit its contents as data.\n * If this is not a leaf node, decode the contained IDs in this node and\n   continue by traversing each of the contained nodes in order\n\nTo determine which nodes are leaf nodes and which are not, additional metadata\nis required. This metadata is stored by bupstash as part of the backup metadata\nand not contained in either of the two (metadata and data) trees.\n\nSince there is no obvious relation between the tree nodes and the files in the\nbackup, this structure is not expected to leak any sensitive information. In\norder to ensure that the trees are not tampered with, all content addresses\nmust be validated by independently computing the hash over their contents and\ncomparing the result with the ID they were found under.\n\nIn order to save memory it makes sense to not read the entire tree into RAM.\nRather, the list of leaf nodes is constructed while processing such that there\nare always only a few nodes loaded into RAM rather than, say, the entire backup\ncontents. 
The Maxbupst implementation simplifies this a little in that it reads\nall of the leaf nodes' addresses into RAM and holds them there while restoring\nthe backup. While this is a waste of memory, it is also the easiest variant that\ncould be implemented.\n\n## Maxbupst Package Dependencies\n\nThe following diagram shows the dependencies between the maxbupst Ada packages.\nAn arrow A -\u003e B defines a dependency of type “A knows B”. External library\ncomponents like Tar, Blake3 and LZ4 are shown for completeness despite not being\ncontained in the maxbupst source tree.\n\n~~~\n                                                              ┌─────┐\n                                                              │ LZ4 │\n                                                              └──▲──┘\n                                                                 │\n                                                         ┌───────┴─────┐\n                                                         │ Compression │\n                                                         └───────▲─────┘\n                              ┌───────┐                          │\n         ┌───────────────────▶│ Serde │               ┌────────┐ │\n         │           ┌───────▶│       │             ╔═╡ Crypto ╞═╪══════════╗\n         │           │        └───▲▲──┘             ║ └────────┘ │          ║\n         │           │            ││   ┌────────┐   ║            │          ║\n         │   ┌────┐  │            ││   │ Blake3 │◀──╫──────┐     │          ║\n         │ ╔═╡ FS ╞══╪══════╗     ││   └───▲─▲──┘   ║      │     │          ║\n         │ ║ └────┘  │      ║     ││       │ │      ║      │     │          ║\n         │ ║         │      ║     ││       │ │      ║      │     │  ZSodium ║\n         │ ║         │      ║     │└───────┼┐│      ║      │     │      ▲   ║\n┌─────┐  │ ║         │      ║     └───────┐│││      ║      │     │      │   ║\n│ Tar │  │ ║       Index ◀──╫────────────┐││││      ║      │     
│      │   ║\n└──▲──┘  │ ║         ▲      ║            │││││      ║      │     │      │   ║\n   │     │ ║         │      ║            │││││      ║   Decryption ─────┘   ║\n   └─────┼─╫─────── XTar ◀──╫────────────┤││││      ║      ▲▲               ║\n         │ ║                ║            │││││      ║      ││               ║\n         │ ╚════════════════╝            │││││      ╚══════╪╞═══════════════╝\n         │                               │││││             ││\n         │                               │││││             ││\n         │                               │││└┼─────────┐   ││\n         │   ┌──────┐                    │││ └─────┐   │   ││\n         │ ╔═╡ Tree ╞════════════════════╪╪╪═════╗ │   │   ││\n         │ ║ └──────┘                    │││     ║ │   │   ││\n         │ ║      ┌───────▶ HTree_LL ◀──┐│││     ║ │   │   ││\n         │ ║      │            ▲        ││││     ║ │   │   ││\n         │ ║  HTree_Iter       │        ││││     ║ │   │   ││\n         │ ║      ▲            │        ││││     ║ │   │   ││\n         │ ║      └────────────┼───── Restorer ──╫─┼───┼───┘│\n         │ ║                   │          ▲      ║ │   │    │\n         │ ║                   │          │      ║ │   │    │\n         │ ╚═══════════════════╪══════════╪══════╝ │   │    │\n         │                     │          │        │   │    │\n         └────────────────────┐│┌─────────┼────────┘   │    │\n                              │││┌────────┼────────────┼────┘\n                     ┌────┐   ││││        │            │\n                   ╔═╡ DB ╞═══╪╞╞╞════════╪════════════╪═══════════════╗\n                   ║ └────┘   ││││        │            │               ║\n                   ║          ││││        │            │      ZBase64  ║\n                   ║          ││││        │            │         ▲     ║\n                   ║          Item        │            Key       │     ║\n                   ║           ▲          │            ▲         │     ║\n                  
 ║           └─────── Repository ────┴─────────┘     ║\n                   ║                      ▲                            ║\n                   ║                      │                            ║\n                   ║                     Main                          ║\n                   ║                                                   ║\n                   ╚═══════════════════════════════════════════════════╝\n~~~\n\nHistory\n=======\n\nSubdirectory `testcpp` contains a previous attempt to write such a thing in C++\n(as an exercise to get to know modern C++ better). Once the need to learn more\nC++ fell away, the program was rewritten in Ada.\n\nFuture Directions\n=================\n\n * Currently, only Bupstash v0.10.3 is supported.\n   It is intended to add support for a newer version in the future.\n * The Ma_Sys.ma CI cannot build this. As a workaround, one can install the\n   library dependencies into the build container. The best fix would be to\n   replace the Ma_Sys.ma CI with something better.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fm7a%2Fbo-maxbupst","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fm7a%2Fbo-maxbupst","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fm7a%2Fbo-maxbupst/lists"}