{"id":18974512,"url":"https://github.com/m7a/lp-backup-tests","last_synced_at":"2026-04-08T15:30:19.914Z","repository":{"id":53931263,"uuid":"356428676","full_name":"m7a/lp-backup-tests","owner":"m7a","description":"Comparison of Modern Linux Backup Tools – Borg, Bupstash and Kopia - Resources","archived":false,"fork":false,"pushed_at":"2024-04-28T19:46:52.000Z","size":1528,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-01-01T09:08:01.100Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/m7a.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-04-10T00:20:03.000Z","updated_at":"2024-04-28T19:46:55.000Z","dependencies_parsed_at":"2022-08-13T04:50:19.239Z","dependency_job_id":null,"html_url":"https://github.com/m7a/lp-backup-tests","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/m7a%2Flp-backup-tests","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/m7a%2Flp-backup-tests/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/m7a%2Flp-backup-tests/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/m7a%2Flp-backup-tests/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/m7a","download_url":"https://codeload.github.com/m7a/lp-backup-tests/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"gi
thub","repositories_count":239972039,"owners_count":19727291,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-08T15:15:16.391Z","updated_at":"2026-04-08T15:30:19.844Z","avatar_url":"https://github.com/m7a.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\nsection: 37\nx-masysma-name: backup_tests_borg_bupstash_kopia\ntitle: Comparison of Modern Linux Backup Tools -- Borg, Bupstash and Kopia\ndate: 2021/03/16 18:59:46\nlang: en-US\nauthor: [\"Linux-Fan, Ma_Sys.ma (Ma_Sys.ma@web.de)\"]\nkeywords: [\"linux\", \"backup\", \"borg\", \"bupstash\", \"jmbb\", \"kopia\", \"deduplicating\"]\nx-masysma-version: 1.1.1\nx-masysma-repository: https://www.github.com/m7a/lp-backup-tests\nx-masysma-website: https://masysma.net/37/backup_tests_borg_bupstash_kopia.xhtml\nx-masysma-owned: 1\nx-masysma-copyright: |\n  Copyright (c) 2021 Ma_Sys.ma.\n  For further info send an e-mail to Ma_Sys.ma@web.de.\n---\nPreamble\n========\n\nTraditional backup tools can mostly be subdivided by the following\ncharacteristics:\n\nfile-based vs. image-based\n:   Image-based solutions make sure everything is backed up, but are potentially\n    difficult to restore on other (less powerful) hardware. Additionally,\n    creating images by using traditional tools like `dd` requires the disk that\n    is being backed up to be unmounted (to avoid consistency issues). 
This makes\n    image-based backups better suited for filesystems that allow doing\n    advanced operations like snapshots or `zfs send`-style images that contain\n    a consistent snapshot of the data of interest.\n    For file-based tools there is also a distinction between tools that exactly\n    replicate the source file structure in the backup target (e.g. `rsync` or\n    `rdiff-backup`) and tools that use an archive format to store backup\n    contents (e.g. `tar`).\nnetworked vs. single-host\n:   Networked solutions allow backing up multiple hosts and to some extent allow\n    for centralized administration. Traditionally, a dedicated client is\n    required to be installed on all machines to be backed up. Networked\n    solutions can be pull-based (server gets backups from the clients) or\n    push-based (client sends backup to server). Single-host solutions consist of\n    a single tool that is invoked to back up data from the current host to\n    a target storage. As this target storage can be a network target, the\n    distinction between networked and single-host solutions is not exactly\n    clear.\nincremental vs. full\n:   Traditionally, tools either do an actual 1:1 copy (full backup) or copy\n    “just the differences”, which can mean anything from “copy all changed files”\n    to “copy changes from within files”. Incremental schemes allow multiple\n    backup states to be kept without needing much disk space. However,\n    traditional tools require that another full backup be made in order to free\n    space used by previous changes.\n\nModern tools mostly advance things on the _incremental vs. full_ front by\nacting _incremental forever_ without the negative impacts that such a scheme\nhas when realized with traditional tools. Additionally, modern tools mostly\nrely on their own/custom archival format. 
While this may seem like a step\nback from tools that replicate the file structure, this approach has numerous\npotential advantages:\n\nEnclosing files in archives allows them and their metadata to be encrypted and\nportable across file systems.\n\nGiven that many backups will eventually be stored on online storage services\nlike Dropbox, Mega, Microsoft OneDrive or Google Drive, portability across file\nsystems is especially useful. Even when not storing backups online, portability\nensures that backup data can be copied by simple operations like `cp` without\ndamaging the contained metadata. Given that online storage providers are often\nnot exactly trustworthy, encryption is also required.\n\nAbstract\n========\n\nThis article attempts to compare three modern backup tools with respect to their\nfeatures and performance. The tools of interest are Borg, Bupstash and Kopia.\nAdditionally, the currently used Ma_Sys.ma Backup Tool JMBB is taken as a\nreference. While JMBB lacks many of the modern features offered by its\ncompetitors, the motivation for testing them is to replace JMBB with a more\nwidely used and feature-rich alternative.\n\nNot in this Document\n====================\n\nHere are a few points that are explicitly not covered in this comparison:\n\n * Client-server backup solutions: While these tools are useful in certain\n   contexts, they mostly expect there to be a central server to put backups\n   to. 
As this does not match the reality of the Ma_Sys.ma backup process\n   (which consists of multiple backup targets including offsite and offline\n   ones), such tools are not considered here.\n * Restic.\n * Traditional backup tools: tar, rsync, rdiff-backup, bup, dar, ...\n * Non-free (as per DFSG) tools.\n * Considerations on the respective tools' crypto implementations.\n\nTools\n=====\n\nHere is a short tabular overview of the backup tools compared in this article\nshowing some (potentially interesting) metadata (as of 2021/03/17; data mostly\ntaken from the respective GitHub repository pages).\n\nName            JMBB      Borg         Bupstash  Kopia\n--------------  --------  -----------  --------  -------\nFirst Released  2013/08   2010/03 (1)  2020/08   2019/05\nIn Debian       No        Yes          No        No\nCloc (2)        6372      71757        13134     55124\nImplemented In  Java      Python+C     Rust      Go\nVersion Tested  1.0.7     1.1.15       0.7.0     0.7.3\n\nTool      Type     Link\n--------  -------  --------------------------------------------\nJMBB      Website  \u003chttps://masysma.lima-city.de/32/jmbb.xhtml\u003e\n          Github   \u003chttps://github.com/m7a/lo-jmbb\u003e\nBorg      Website  \u003chttps://www.borgbackup.org/\u003e\n          Github   \u003chttps://github.com/borgbackup/borg\u003e\nBupstash  Website  \u003chttps://bupstash.io/\u003e\n          Github   \u003chttps://github.com/andrewchambers/bupstash\u003e\nKopia     Website  \u003chttps://kopia.io/\u003e\n          Github   \u003chttps://github.com/kopia/kopia\u003e\n\n### Footnotes\n\n 1. First release of Attic, which Borg is based on. Date according to Wikipedia\n    \u003chttps://en.wikipedia.org/wiki/Attic_(backup_software)\u003e\n 2. Excluding non-source repository content. 
This is sometimes difficult, e.g.\n    Kopia seems to consist of more than just the Go source code, but it is hard\n    to separate what is needed for documentation and what for the actual\n    application. Take the numbers as estimates.\n\nFeatures\n========\n\nAs already mentioned in the Preamble, encryption and portability across file\nsystems are essential for modern backup tools. Some other features, often\nalready provided by the traditional approaches, are expected to be found in\ntheir modern competitors, too. Additionally, one could envision some useful\nfeatures that are rarely available.\n\nThe following table presents Ma_Sys.ma's idea of a good set of features for\nany modern backup program. A good program is expected to have all of the\n_Basic Features_ and _Advanced Features_, while the _Very Advanced Features_\nare seen as useful but less important.\n\nFeature                                 JMBB  Borg  Bupstash  Kopia\n--------------------------------------  ----  ----  --------  -----\nBasic Features                                                 \nshrink on input-file deletion           Yes   Yes   Yes       Yes\nUNIX special files and metadata         Yes   Yes   Yes       No\nread only changed files                 Yes   Yes   Yes       Yes\nrestore individual files                Yes   Yes   Yes       Yes\ndata encryption                         Yes   Yes   Yes       Yes\nmetadata encryption                     Yes   Yes   Yes       Yes?\nportable across file systems            Yes   Yes   Yes       Yes\nmultithreading or parallelization       Yes   No    Yes       Yes\narbitrarily complex file names          No    Yes?  Yes?      
Yes?\ninput file size irrelevant              No    Yes   (11)      Yes\ninput file number irrelevant            (4)   (4)   Yes       Yes\n                                                               \nAdvanced Features                                              \ncompression                             Yes   Yes   Yes       Yes\nintegrity checks                        Yes   Yes   No        Yes\ndata archival                           Yes   (9)   (9)       No\nworks on slow target storage            Yes   (10)  (10)      Yes\nreadable by third-party tools           (2)   No    No        No\nWindows support w/o WSL/Cygwin          (3)   No    No        Yes\nretention policy for versions           No    Yes   Yes       Yes\ndeduplication                           No    Yes   Yes       Yes\ndirectly upload to remote               No    (1)   (1)       Yes\n                                                               \nVery Advanced Features                                         \nmount backup as r/o filesystem          No    Yes   No        Yes\nmultiple hosts backup to same target    No    (13)  Yes       Yes\nprocess non-persistent live streams     No    Yes   Yes       Yes\nconfigure output file size limit        No    Yes?  (12)      No\nconsistent state on interruption        No    Yes?  ?         ?\nincremental metadata store/update       No    No?   Yes?      Yes?\nconcurrent write to same target         No    No?   Yes       Yes\nretry on fail mechanisms                No    No?   No?       Yes?\nGDPR-style data deletion requests       (5)   No?   No        No\nintegrated cloud storage client         No    No    No        Yes\ndata redundancy/bit rot recover         No    No    No        No\ncrypto-trojan-proof pull-scheme         No    No    (6)       No?\nconsistently backup running VMs or DBs  No    No    No?       
(8)\nREST API for submitting backup inputs   No    No    No        (7)\nREST API for restoring                  No    No    No        (7)\nREST API for monitoring                 No    No    No        (7)\n\nYes?/No? := guessed.\n\n### Footnotes\n\n  1. Yes, but only to a tool-specific server.\n  2. Yes, but practically limited to restoration of individual files.\n  3. Yes, but only for restoring with a `cpio.exe`.\n  4. Both tools' capability for many files is limited. Borg is limited by its\n     sequential approach; JMBB is limited by loading its metadata completely\n     into RAM.\n  5. Yes, but: Requires obsoleting all related blocks manually by means of\n     `jmbb -e` and `obsolete id`. Impractical for anything more than a few\n     requests per year. Metadata is retained. Archival storage is not affected\n     (if used).\n  6. Yes, but: Requires running Bupstash's server on the backup target machine.\n     It is not actually a pull scheme but a crypto-trojan-proof push scheme!\n     See \u003chttps://bupstash.io/doc/guides/Remote%20Access%20Controls.html\u003e\n  7. APIs exist but their details had not been checked when this article was\n     written.\n  8. Can be implemented by custom Actions, see\n     \u003chttps://kopia.io/docs/advanced/actions/\u003e.\n     Examples in the documentation do not indicate if this can be used in\n     conjunction with reading backup input data from stdin. Hence it does not\n     seem well-suited for large files like VMs.\n  9. These tools can be configured to run in an append-only mode that allows\n     users to establish a basic archival scheme. These features were not tested.\n 10. Yes, but Borg failed to back up to a mounted WebDAV file system (during an\n     exploratory test). 
Bupstash consistently fails to back up to NFS but works\n     fine on either SSH or SSHFS targets (see _Data-Test to NFS and SSHFS_).\n     Hence, the problem with NFS is expected to be a bug rather than a general\n     problem with slow target storage.\n 11. Bupstash produces a large number of output files for large total backup\n     input sizes. This may cause problems if limitations exist in the underlying\n     storage/file system.\n 12. Practically irrelevant as the output file sizes are always rather small.\n 13. Technically, it works under these circumstances, but the\n     Borg documentation advises against doing this for performance and security\n     reasons, see \u003chttps://borgbackup.readthedocs.io/en/stable/faq.html#can-i-backup-from-multiple-servers-into-a-single-repository\u003e.\n\n## Rationale\n\nThe table is presented such that “Yes” means “good” and “No” means “bad”. Thus\nsome rows are negated, like _input file size irrelevant_: if the input\nfile size is limited/relevant, that's bad and hence “No” is given in the table.\n\n## Basic Features Explained\n\nWhile many of the points under _Basic Features_ are pretty obvious, some\ndeserve a closer look:\n\nUNIX special files and metadata\n:   For regular user data, it is already important that a backup retains\n    symlinks. I know at least one program that chokes on a missing socket\n    file (although sockets should be ephemeral and YES, this was in the respective\n    user's `/home` directory!). Additionally, when backing up system structures\n    like chroots, whole root file systems, containers etc. it becomes important\n    that devices and other UNIX special files are retained.\n\nMetadata Encryption\n:   Metadata is worth protecting. If you think otherwise, consider sending a\n    list of all your files to your worst enemy/competitor. What might they\n    learn? They will surely know what kind of software you are using from the\n    file extensions alone. 
They will know the names of all installed\n    programs if a system drive is backed up etc. Seeing when each of the files\n    changed is even more interesting: They will know what projects you are\n    working on and potentially even how well the effort is going...\n\nMultithreading or Parallelization\n:   Traditional backup tools (and even some of the more advanced ones) work\n    sequentially. The benefits of multiprocessing have been promoted for years,\n    yet many programs do not exploit that potential gain in performance.\n    With respect to backup tools, the advantages of parallelization are\n    sometimes downplayed by people claiming that a backup tool should not\n    interfere with the computer's other activities and hence be as minimal as\n    possible in terms of CPU/RAM resources. By this argument, a single-threaded\n    backup would be sufficient. This does not model reality adequately, though:\n    Modern tools need to back up not only highly loaded servers but also all\n    kinds of client devices. Doing a user's backup in the background may serve\n    for some basic “data loss” prevention, but a proper backup should contain\n    a consistent state of the data. A potential way to achieve this is to\n    let the user run backups explicitly. Additionally, many people back up to\n    external devices that should be taken offline “as soon as possible” to avoid\n    damage from electrical failures. In both cases, _short backup times_ are\n    desirable. While parallelization should not be equated with high performance\n    or short execution times in general, it _scales_ and hence allows the backup\n    to become faster with newer computers. 
It is always possible to turn\n    parallelization off to have lower CPU load over a longer time, but being\n    able to parallelize significant parts of the backup process saves time\n    in practice and is thus desirable!\n\n## Advanced Features Explained\n\nCompression and Deduplication\n:   Compression and deduplication are both techniques to make the effective\n    backup size on disk smaller. Distinguishing them clearly is not exactly\n    easy (see \u003chttps://stackoverflow.com/questions/35390533/actual-difference-between-data-compression-and-data-deduplication\u003e for an idea). Here is a shorter\n    “rough” idea: Compression works on an individual data stream that cannot\n    be added to/removed from later (except by rewriting it). Deduplication lifts\n    this limitation by providing CRUD (create, read, update, delete) operations\n    while still eliminating redundancies.\n    On the other hand, deduplication _does not imply_ compression because in\n    practice, deduplication only works for “rather large” redundant pieces\n    (many KiB) of data whereas typical compression algorithms already work\n    for sub-KiB input data sizes. Good tools hence combine both approaches.\n    Deduplication is a very useful feature for a backup tool because it allows\n    backing up certain data that are largely redundant (like virtual machines\n    and containers) to much smaller target storage. While it\n    is often claimed that “storage is cheap”, this claim fails as soon as the\n    limits of a single drive are reached. 
Even with cheap storage, needing less\n    of it allows more old backups to be retained at the same cost.\n\n~~~\n# an example of compression working efficiently for less than 1 KiB\n# of redundant input data:\n$ for i in $(seq 1 80); do echo hello world; done | wc -c\n960\n$ for i in $(seq 1 80); do echo hello world; done | gzip -9 | wc -c\n41\n~~~\n\nIntegrity Checks\n:   In an ideal world, everyone would test their backups regularly by performing\n    a full restore. Given storage and time constraints and the difficulty of\n    automating such a process -- remember that such a process works on actual\n    production data and needs to access them all -- the ability to verify a\n    backup's restorability without providing the original data for comparison\n    is an important feature.\n\nData Archival\n:   Often not considered a core part of a backup strategy, _data archival_ is\n    the process of collecting old data and storing it _away_, i.e. separately\n    from the “active” production data and backups. Accessing an archive is the\n    last resort for certain tasks like “our database has been corrupted for months\n    and nobody noticed”. While one might argue that such a thing _never_ occurs,\n    here is a counterexample from experience: In the Thunderbird E-Mail client,\n    I used to have a default setting that if a folder contained more than 1000\n    messages, the oldest ones would be deleted until there were only 1000 left.\n    This seemed reasonable at the time of setup but backfired when a local\n    folder of collected mailing list wisdom exceeded the limit. Of course, as\n    that folder was accessed very rarely, it took literally months to notice.\n    By then, a certain number of messages had already vanished. Had there only\n    been a “regular” backup, luck would have been needed to have it configured\n    to retain yearly snapshots or such. 
Using archive data, it was not exactly\n    easy to retrieve the lost messages (given that two MBOX files had to be\n    merged), but it was possible and succeeded.\n    Now why is the archive stored away? The main reasons are: (1) Archives are\n    much larger than the regular backups because they contain all of the data.\n    Hence, it makes sense to keep them on cheaper and slower storage. The\n    traditional way would be using tape, but a dedicated (low-end) NAS or an\n    extra-large pair of HDDs may also serve. (2) Archives should be protected\n    from typical accidents like deletion, malware etc. Just like with\n    backups, it makes sense to keep copies of them offsite, but (also just like\n    with backups) this means additional costs. In certain cases where archives\n    are rarely needed, it might make sense not to keep multiple copies of\n    them. Interestingly, data archival does not seem to be a feature\n    addressed by most modern backup tools.\n\nWorks on Slow Target Storage\n:   Even tools that do not support storing their data on network devices may be\n    used in conjunction with networked file systems like NFS, SSHFS,\n    WebDAV etc. Except for NFS, these file systems' characteristics differ\n    so much from local ones that it is not uncommon for tools to choke on them.\n    For example, I once tried to create a 7z archive directly on an SSHFS and it\n    was extremely slow: much slower than first creating the 7z locally and then\n    sending it to the remote with `scp`. While slowness has to be\n    accepted in this context _to some extent_, there are limits. 
Also, some\n    tools actually fail to store their backups on slow target storage.\n    Everything from stack traces to timeout errors and totally unclear messages\n    has been observed in practice, including cases where restoring the backup\n    was not possible afterwards.\n    It thus makes sense to research this explicitly, although it is difficult\n    to reach a definitive verdict on it in practice.\n\nReadable by Third-Party Tools\n:   A backup needs to be restorable in time of need. While the best choice for\n    restoration is certainly the tool the backup was created with, it is also\n    conceivable that the program will not be available on the machine where the\n    data is to be restored. Also, imagine that the restoration routine does not\n    work at all or fails due to an inconsistency in the backup or some other\n    error. One way to establish confidence in the reliability of the solution is\n    the ability to restore data even without the tool that originally wrote them.\n    Given the complexity of a storage format that includes deduplication,\n    encryption, compression and multiple backup versions/hosts, it is not\n    surprising that most modern tools' data can only be read by themselves. Yet,\n    it would be highly desirable for independent and compatible restoration\n    tools to exist!\n\nWindows Support w/o WSL/Cygwin\n:   Even if Linux is the primary system of concern (for the sake of this\n    article), there are good reasons why Windows support is beneficial:\n    In time of need, restoration might have to happen on whatever device is at\n    hand, including borrowed or old devices. Chances are, these will run some\n    (potentially ancient!) sort of Windows. Restoration on Windows would be\n    the only chance in this scenario. 
WSL, Cygwin, Docker, VMs and all other\n    imaginable “Linux on Windows” means _do work in practice_, but only under\n    favorable circumstances such as: a reasonably powerful computer,\n    administrative privileges, permission to modify OS data, Internet access.\n    From an entirely different point of view: Linux-only networks are pretty\n    rare. Most often, there are some Linux servers and Windows clients. Given\n    that a good backup tool is already known on Linux, why not use the tool\n    for Windows, too? Hence, while not strictly essential, the ability to also\n    _create backups on Windows_ has its advantages!\n\nRetention Policy for Versions\n:   A _retention policy_ specifies how many old backup states are to be kept.\n    Traditional solutions often limit this due to the technical aspects\n    of incremental and full backups, e.g.: All increments from the latest full\n    backup up to now inclusive need to be retained to restore the latest state.\n    Modern tools no longer have this restriction and can thus provide more\n    useful retention policies, e.g.: Keep the last three versions plus one\n    copy from each of the last three weeks. This would be up to six backup\n    states in total, reaching back three weeks from now with reduced “density”\n    for the older versions. Retention policies are desirable because they can\n    serve as a (limited) substitute for archival storage. Additionally, restoring\n    from a retained backup state is expected to always be faster/cheaper/...\n    than from archival storage.\n\nDirectly Upload to Remote\n:   As already mentioned in the Preamble, backups are nowadays often stored on\n    online storage. Of course, a local server may also serve, but\n    local filesystems as target storage are certainly the exception. It thus\n    makes sense for backup tools to integrate the ability to upload the backup\n    data directly to a target server. 
If the respective backup tool has a dedicated server\n    component, this can enable additional\n    features up to the complexity and power of networked backup solutions.\n\n## Ideas for Very Advanced Features\n\nMount Backup as R/O Filesystem\n:   Mounting backups is useful because it allows choosing the files to restore\n    using established file managers designed for the purpose of navigating\n    large directory structures. It is not strictly needed but user-friendly.\n\nMultiple Hosts Backup to same Target\n:   This is a feature from networked backup solutions that could be achieved by\n    single-host tools, too. It is especially interesting to consider tools\n    deduplicating across multiple hosts as this makes OS backups very efficient.\n    There are some limitations, e.g. multiple hosts writing to the same target\n    storage concurrently or one of the hosts corrupting the data of other hosts\n    by incorrectly/maliciously performing deduplication actions.\n\nProcess Non-Persistent Live Streams\n:   While this article concentrates on file backups, there are also things like\n    databases that can be backed up by “exporting” them to a file/stream. Some\n    backup tools can process these streams directly, avoiding writing sensitive\n    and large data to temporary storage.\n\nConfigure Output File Size Limit\n:   Cheap online storage services often impose a maximum file size. While it is\n    often pretty large (e.g. on the order of GiB) for paying customers, it is\n    often tightly limited for “free” accounts (e.g. on the order of a few MiB).\n    If a tool can adjust to these limits, it becomes usable with a wider range\n    of target storage.\n\nConsistent State on Interruption\n:   Backup processes might get interrupted. Just like other important processing\n    tasks, it should be possible to resume them and recover from crashes. Some\n    tools (e.g. 
JMBB) do not support this, though.\n\nIncremental Metadata Store/Update\n:   While the actual data contents are often compressed/deduplicated and stored\n    efficiently, minimizing the number of read and write operations, the same\n    does not necessarily hold for the metadata. JMBB just rewrites the whole\n    “database” of metadata on each run. Advanced tools often seem to use local\n    cache directories to speed up the management of metadata. Neither solution\n    is ideal: The ideal tool would need neither and would instead store\n    everything efficiently.\n\nConcurrent Write to Same Target\n:   While it seems difficult to support this for non-networked solutions,\n    there are multiple tools claiming to do this. The advantage is clearly that\n    one could configure many machines independently to store to the same\n    target and one would not need to coordinate the times at which backups are\n    performed. One could even have multiple users run backup processes to the\n    same target at the time of their own choice.\n\nRetry on Fail Mechanisms\n:   Especially in the presence of slow storage or virtual hard drives backed\n    by networks, it makes sense to retry failed write operations. Similarly,\n    the input file system may change while being backed up and a retry could\n    find another (consistent) state. Tools do not usually implement this,\n    possibly due to high complexity and difficulty in deciding _when_ to retry.\n\nGDPR-style Data Deletion Requests\n:   While it can be argued that data need not be explicitly deleted from\n    backups due to the excessive complexity of implementing it securely, it\n    would nevertheless be interesting to find out what means a backup program\n    can provide to actually delete data from its backups, i.e. to assure that\n    after completion of the process, the data is no longer\n    present in the backup. 
This differs from operations that try to reclaim\n    space occupied by deleted data in that the latter can work on a best-effort\n    basis whereas regulatory deletions must actually happen.\n\nIntegrated Cloud Storage Client\n:   This is the advanced version of _directly upload to remote_ where backups\n    not only go to a remote location but specifically to an Internet target.\n    Tools supporting this are expected to be able to directly communicate with\n    the respective vendor-specific APIs.\n\nData Redundancy and Bit Rot Recover\n:   The larger the data to be backed up becomes, the more likely _bit rot_ is\n    to occur. Many authors of backup tools argue that it is better to avoid bit\n    rot at a different level in the storage hierarchy, e.g. file systems like\n    ZFS could ensure this. This does not, however, match the practical\n    requirement of being able to use backups portably across file systems.\n    Portable devices impose strong restrictions: E.g. ZFS is a poor\n    choice given that it can only be read on specialized systems and needs to be\n    imported/exported all the time. Other Linux file systems are more portable\n    across Linux versions but cannot be read on Windows. If compatibility is\n    sought, only exFAT, FAT32 and the like remain -- all choices that offer\n    _no_ protection for the stored data. Hence, integrated redundancy is\n    useful. Tools do not usually seem to implement this, though.\n\nCrypto-Trojan-Proof Pull-Scheme\n:   In recent times, ransomware attacks have become more frequent. All backup\n    tools need to be audited as to how malware could destroy old copies of the\n    data it encrypts. In fact, many known ransomware attacks specifically\n    included a dedicated strategy by the adversary to delete backups. From this,\n    it has often been concluded that only a pull-based scheme is a safe scheme. 
It\n    turns out that reasonable ideas for making a safe push-based scheme (e.g.\n    simplified: with a server that does not allow deletion) also exist. All of\n    the measures to protect against ransomware are subsumed under this point.\n\nConsistently Backup running VMs or DBs\n:   In an ideal world, there would be no need to export/stream certain\n    hard-to-back-up data to the backup tool. Instead, the tool would detect\n    the presence of such data and automatically invoke the necessary backup\n    procedure. Typical file-based tools do not implement anything like that,\n    though.\n\nREST APIs\n:   Modern programs often interact with REST APIs. Having them available allows\n    for a high degree of automation and monitoring and thus enhances the\n    reliability and completeness of a backup. It is not a required feature,\n    though.\n\nBenchmark Scenario\n==================\n\nThe benchmark scenario presented in this article is intended to closely resemble\npractice, with some simplifications to allow performing tests in reasonable time.\n\nThese simplifications include running backups directly one after another,\nreducing the amount of input data and preferring faster storage to\ndistinguish the backup tools' performance from that of the underlying storage\nsystems.\n\nThe benchmark consists of multiple groups of tests described in the following\nsubsections.\n\n## Data-Test\n\nThe _Data-Test_ most closely resembles the typical backup operation. This test\nconsists of multiple past states of the most important Ma_Sys.ma data. 
These\nstates were recovered from an archive created with JMBB.\n\nHere is a table showing the 30 snapshots used for the tests:\n\nState  Represents Date  Size/GiB  Number of Files\n-----  ---------------  --------  ---------------\nx2a00  15.06.2018       26.73     387026\nx2a4c  07.07.2018       27.22     391406\nx2aaf  27.07.2018       27.95     393288\nx2af8  29.08.2018       27.94     398013\nx2b81  18.09.2018       28.28     417315\nx2bcb  14.10.2018       28.19     407058\nx2c15  13.11.2018       28.80     411347\nx2c80  18.12.2018       29.00     415495\nx2ccb  18.01.2019       29.72     422693\nx2d17  18.02.2019       30.04     429587\nx2d6b  14.03.2019       30.48     429836\nx2dbf  14.04.2019       30.94     453757\nx2e0c  14.05.2019       31.40     456110\nx2e57  09.06.2019       31.84     461060\nx2eb4  18.07.2019       33.26     466967\nx2efd  17.08.2019       33.68     470190\nx3001  26.12.2019       40.59     525879\nx30da  24.02.2020       35.93     473647\nx312a  24.03.2020       34.84     461554\nx31a3  21.04.2020       35.36     466773\nx31ec  27.05.2020       36.03     471625\nx323b  22.06.2020       37.14     476479\nx3288  23.07.2020       37.78     479961\nx32e1  23.08.2020       38.51     483756\nx3333  20.09.2020       38.97     487709\nx338a  03.10.2020       40.06     490852\nx33d5  28.10.2020       40.51     494283\nx3436  22.11.2020       41.50     484262\nx3483  18.12.2020       42.42     487438\nx34d7  20.01.2021       42.45     489874\n\nHere are two graphs displaying how the data might have changed between these\nversions. A file with change in modification date is considered _changed_,\na file not present in the previous backup is considered _added_ and a file\nnot present in the current backup is considered _removed_. 
Of course, in\npractice, it's the backup tools' job to identify the changes and they all do\nthis surprisingly well.\n\n![Estimated Changes in the Input Data between Backup Versions in MiB](backup_tests_borg_bupstash_kopia_att/inputchangesmib)\n\n![Estimated Changes in the Input Data between Backup Versions in Number of Files](backup_tests_borg_bupstash_kopia_att/inputchangesnumf)\n\nThe basic test procedure is as follows:\n\n * For each backup tool initialize an empty backup target repository\n * For each backup state:\n    * Load backup into ramdisk\n    * For each backup tool:\n       * backup from ramdisk to target repository.\n       * expire all but the most recent backup.\n       * invoke GC procedure.\n    * Analyze the state of the output directories\n * For each backup tool restore most recent state to ramdisk and compute\n   all files' SHA-256 hashes.\n\nThe Data-Test is executed for different backup target storage types:\n\n 1. Backup to local NVMe SSD\n 2. Backup to remote SATA SSD over NFS (without testing restore)\n 3. Backup to remote SATA SSD over SSHFS (without testing restore)\n\n## Games-Test\n\nThe _Games-Test_ scenario uses a larger input set of playonlinux game\ninstallations.\n\nTestset  Size/GiB  Number of Files  Largest File/GiB\n-------  --------  ---------------  ----------------\nGames    172.33    123 345          3.99\n\nThe test procedure is as follows:\n\n * For each backup tool except JMBB:\n    * Backup games directory from HDD to remote SATA SSD.\n    * Backup games directory again to remote SATA SSD.\n    * expire all but the most recent backup\n    * invoke GC procedure\n\nThe idea of this test is as follows:\n\n * The first run finds out how well the tool copes with many small files on\n   HDD. 
Traditionally, all tools have issues with this...\n * The second run checks how efficiently the tools cope with unchanged files.\n\nTo transfer data over network, the following protocols are used:\n\n * Borg: NFS, because it is known to work quite well from previous experience\n * Bupstash: SSH with Bupstash on target server\n * Kopia: SSH without Kopia on target server\n\nLike with the Data-Test, caches reside on a local NVMe SSD.\n\n## VM-Test\n\nThe _VM-Test_ simulates backing up virtual machines and is intended to check the\ntools' performance wrt. in-file deduplication potential. It consists of two\npartially overlapping test data sets.\n\nTest Data Set  Size/GiB  Number of Files  Largest File/GiB\n-------------  --------  ---------------  ----------------\nVMS0           414.70    9                88.31\nVMS1           459.28    9                88.31\n\nHere is a table of the input files. Names have been simplified, sizes are\ndisplayed as reported by DirStat 2's GUI.\n\nFile                       Changed  Size0/GiB  Size1/GiB\n-------------------------  -------  ---------  ---------\n`list.txt`                 yes      0          0\n`app-docker-backup.qcow2`  no       70.1       70.1\n`deb-64-new.qcow2`         yes      80.9       80.9\n`deb-mdvl-64....qcow2`     yes      38.62      --\n`deb-sid.qcow2`            yes      49.21      49.36\n`test-ubuntu.qcow2`        yes      --         22.16\n`win-10-64-de....qcow2`    yes      --         84.8\n`win-10-64-ds.qcow2`       no       88.31      88.31\n`win-8-1-32-ds-o.qcow2`    yes      23.21      --\n`win-8-1-32-ds.qcow2`      yes      52.12      52.12\n`win-xp-sp-3-ht-1.qcow2`   no       13.11      13.11\n                                                \n                           Sum      414.70     459.28\n\nThe test procedure is as follows:\n\n * For each backup tool except JMBB:\n    * Backup VMS0 from SATA HDD to remote SATA SSD\n    * Backup VMS1 from SATA HDD to remote SATA SSD\n    * expire all but 
the most recent backup\n    * invoke GC procedure\n\nKopia and Borg were run over NFS, Bupstash over SSH, because it failed to\nexecute the GC procedure over NFS.\n\n## Auxiliary Tools\n\nIn order to capture information about the respective tools' runtimes, the\nsystem loads and the states of files on disk, the following tools have been\nused in conjunction:\n\n * commandline tools `du`, `ls -R`, `find`,\n   [parallel(1)](https://manpages.debian.org/buster/parallel/parallel.1.en.html),\n   [sha256sum(1)](https://manpages.debian.org/buster/coreutils/sha256sum.1.en.html)\n * GNU Time ([time(1)](https://manpages.debian.org/buster/time/time.1.en.html),\n   not the shell builtin)\n * [dirstat(32)](../32/dirstat.xhtml) with PostgreSQL.\n * Libreoffice Calc to mangle tabular results and create diagrams from SQL\n   queries against the DirStat 2 database.\n * Telegraf, Grafana, Influxdb\n\nScripts used to run the respective tests can be found in directory `automation`\ninside the repository associated with this article.\n\n## Test Platforms\n\nThree computers participate in the tests, all running Debian 10 Buster amd64.\nThe tests themselves were executed in a systemd-nspawn container on\n`masysma-18` running Debian 11 Bullseye (Testing):\n\nHost          Use                   RAM/GiB  ECC  CPU                   FS\n------------  --------------------  -------  ---  --------------------  ----\n`masysma-18`  Main test machine     128      Y    Intel Xeon W-2295     ZFS\n`pte5`        NFS and SSHFS target  32       Y    Intel Xeon E3-1231v3  ext4\n`masysma-16`  Influxdb, Grafana     8        N    Intel Celeron J3455   ext4\n\nIn terms of network, all machines are connected through a Gigabit Ethernet\nswitch. Additionally, a 10GE link between `pte5` and `masysma-18` is established\nfor data transfers (NFS, SSHFS, SSH).\n\nPre-Test Insights\n=================\n\nBefore beginning the actual tests, some experiments were made to determine the\nusage of the respective tools. 
Additionally, previous experience with JMBB and\nBorg already existed. This section provides a summary of related insights and\nfindings.\n\nBorg works purely sequentially. There is a long-standing issue on GitHub\n\u003chttps://github.com/borgbackup/borg/issues/37\u003e about this and it boils down to\nthe fact that it is difficult to add multithreading after the fact. From regular\nuse at the Ma_Sys.ma, Borg is known to back up successfully over SSH to its own\nserver, to NFS (without a dedicated Borg server on the receiving end) and to\nexternal HDD storage.\n\nJMBB is highly resource-intensive. During its development, compression was the\nmain means of reducing output data size and the huge amount of memory was not\nseen as a problem due to ever-growing RAM sizes in computers. So far\nthis held true, but test results (see further down) confirm that JMBB's use of\nCPU and memory is quite wasteful.\n\nBupstash is the newest contender and has some rough edges. Upon getting started\nwith the tool, a _Permission denied_ error was encountered when trying to back up\nsome data. In fact, the user account lacked permissions to read some of the\nfiles, but Bupstash would neither tell which of the files the error was about\nnor continue the backup process. It was intended to report this\nas a bug, but as of 2021/04/09 the issue is already fixed in the latest git\nversion -- it now reports which file/directory caused the error.\n\nSetting up Bupstash is straightforward due to a limited and sensible set of\noptions. It is more complex than with JMBB or Borg, though, because it requires a\ndedicated key to be generated and that key is not backed up by default. This\nessentially means that it is the user's responsibility to protect the key with a\npassword and store the result with the backup in order to be able to\nrestore the backup later by using a password.\n\nKopia is the hardest to set up of all the tools considered here. 
This\nstarts with initializing the backup storage: All tools except JMBB need this step\nbefore performing the first backup, whereas JMBB asks the user interactively for\nthe password upon the first backup but otherwise does not need any setup.\nKopia, however, requires three setup steps:\n\n 1. Creation of a “repository”\n 2. Connecting to the repository\n 3. Configuration of a backup policy\n\nThe second step may stem from the fact that Kopia supports multiple different\ntarget storages that may require a dedicated and specific login procedure of\nsorts. The _backup policy_ concept is quite unintuitive, though: Kopia has\n“global” and target-specific policies where settings not configured in the\ntarget-specific one can be inherited from the “global” ones. Also, the default\npolicy specifies that files listed in `.kopiaignore` files are ignored, opening\nup a simple attack surface where malware could just add all of the user's\nfiles to hidden `.kopiaignore` files, tricking the user into believing that data\nis backed up while it is in fact ignored. To some extent, this reflects my point\nof view that backup excludes should be minimal so as not to confuse the user\nabout which files are backed up and which are not. Of course, `.kopiaignore` is\nalso a useful feature that allows ignoring certain files that do not need to be\nbacked up with exceptionally fine granularity.\n\nTest Results\n============\n\n## Data-Test\n\nHere is a table of the sizes of the backup target and cache directories after\nbacking up the respective given state. 
States in between have been tested, too,\nbut are not shown in the table.\n\nState  Tool      Files in Cache  Size of Cache/MiB  Backup Files  Backup/MiB\n-----  --------  --------------  -----------------  ------------  ----------\nx2a00  Borg      8               50                 63            18 022\n       Bupstash  3               13                 41 833        20 897\n       JMBB      --              --                 877           18 283\n       Kopia     51              197                2 629         24 948\n       Kopia*    54              197                2 635         24 901\n                                                                   \nx2eb4  Borg      8               66                 134           23 048\n       Bupstash  3               26                 49 863        26 298\n       JMBB      --              --                 1 221         24 172\n       Kopia     777             790                3 631         34 214\n       Kopia*    7 223           1 367              3 736         31 926\n                                                                   \nx34d7  Borg      8               90                 228           30 908\n       Bupstash  3               30                 63 056        33 978\n       JMBB      --              --                 1 717         34 357\n       Kopia     2 279           1 883              5 591         53 535\n       Kopia*    26 086          3 349              5 513         43 076\n\nThe following conclusions can be drawn from the output sizes for the respective\ntools:\n\n * Borg: Borg produces the smallest overall backup sizes and the smallest\n   number of files. These individual files are large, though (the largest file\n   in the backup is 502 MiB).\n * Bupstash: Performs reasonably well, but produces the largest number of output\n   files, more than ten times as many as the nearest contender. 
It is also worth\n   noting that almost all of these files are concentrated in a single directory,\n   which could cause problems with some file systems and thus limits the\n   portability of the resulting data structure.\n * JMBB: Works without cache directories and shows fragmentation effects:\n   While initial backups contain only relevant files and are almost as small\n   as Borg's, in the end, JMBB produces the second largest overall backup size.\n   Fragmentation is a known problem in JMBB and in practice, backups have\n   grown beyond the input data size due to it.\n * Kopia: At first sight, Kopia does not perform well _at all_: It produces\n   the largest backup output size already at state x2eb4 and uses _a lot_ of\n   cache space, and both figures grow even further by state x34d7. The culprit\n   seems to be a mechanism in Kopia that makes maintenance tasks (which are\n   required to free up space!) run only after a certain _wall clock time_ has\n   passed. As the backups were created in an automated fashion in less than 24h\n   of runtime, Kopia's maintenance routine _never ran_, i.e. the final state seen\n   actually contains snapshots of all the individual backup states without\n   anything freed. As of this writing, there does not even seem to be a flag\n   to force the maintenance.\n   See \u003chttps://kopia.discourse.group/t/how-to-immediately-gc-unused-blobs/295\u003e\n   and \u003chttps://github.com/kopia/kopia/issues/800\u003e.\n * Kopia*: In order to actually run Kopia's garbage collection procedure, a\n   separate VM was prepared with the test data and its clock was set to the\n   respective original backup dates before invoking Kopia. Logs and file sizes\n   confirm that the cleanup procedure actually ran. It is worth noting that this\n   invocation caused even larger cache directories to be generated. 
Upon\n   inspecting the cache's file structure, there is no obvious indicator as to\n   _why_ it is that large.\n\nJudging from the backup sizes, Borg seems to perform best, JMBB\nseems very viable and Bupstash does a good job apart from the large number of\nfiles. Kopia is more difficult to test because its behavior depends on the\nsystem time. Even with a “faked” system time, Kopia's backup size remains\nlarger than the other tools' and its cache usage seems excessive compared to\nthe other tools'.\n\nAnother aspect to check is how much computing power and time the tools\nneeded.\n\nColumns _Wall Time_, _Peak Memory_ and _Avg. CPU_ are populated by the data from\nGNU Time as reported for the script that does the actual backup run followed\nby a garbage collection. Although I/O statistics were also gathered by\n_Telegraf_, the data turned out to be quite unrealistic due to various\neffects including the long sampling interval (15 s) and the exclusion of child\nprocesses. Hence, I/O is not shown here.\n\nFor CPU: 100% means one core fully loaded, i.e. 2000% means “20 cores fully\nloaded”. The test system's Xeon W-2295 exposes 36 virtual cores to the OS.\nHence 3600% CPU load means “all cores fully loaded”. Due to Amdahl's Law, it\nis unlikely, if not impossible, to attain 3600% CPU for realistic workloads like\nthose presented here.\n\nState  Tool      Wall Time/s  Peak Memory/MiB  Avg. 
CPU/%\n-----  --------  -----------  ---------------  ----------\nx2a00  Borg      9 579        388.59           99\n       Bupstash  120          54.75            115\n       JMBB      1 297        24 912.91        2 862\n       Kopia     168          1 403.60         340\n       Kopia*    147          883.64           291\n                                                \nx2eb4  Borg      624          349.76           99\n       Bupstash  88           45.65            101\n       JMBB      117          18 481.07        2 276\n       Kopia     72           955.36           400\n       Kopia*    49           600.93           226\n                                                \nx34d7  Borg      359          399.51           99\n       Bupstash  106          51.02            100\n       JMBB      80           24 115.33        1 645\n       Kopia     91           757.91           415\n       Kopia*    56           558.57           224\n\nBefore drawing any conclusions, here are some notes about the data in general:\n\n * The values for all entries except “Kopia*” have been found to be similar to\n   those collected by _Telegraf_.\n * The data for Kopia* cannot be compared to the other ones because the VM has\n   a different target file system (ext4 rather than ZFS) and fewer CPU threads\n   (18 instead of 36).\n\nThe following conclusions can be drawn about the individual metrics from the\ntable above:\n\n * In terms of time, one can observe three tiers: (1) Borg is the slowest: about\n   factor 80 slower than the fastest contender for the initial backup and factor 3\n   slower than the next slowest alternative in the last backup. (2) JMBB is much\n   slower than its newer and more feature-rich alternatives Kopia and Bupstash\n   for the initial backup (factor 11 in this worst case), but it is the fastest\n   among the contenders for the last backup. It is quite notable\n   that in terms of speed, it can still compete with the new and shiny. 
(3)\n   Bupstash and Kopia show similar timings and perform initial backups at\n   astonishing speeds. While their times seem to differ significantly, it has\n   to be taken into account that backing up from RAM to NVMe SSD is an absolutely\n   ideal scenario where I/O times are almost zero. Hence, one cannot conclude\n   that in practice, one will be faster than the other. Judging from the\n   not-directly-comparable ext4 results for the Kopia* VM, one can see that\n   at least in this test, Kopia does not show any sign of being slowed down\n   by the actual activation of garbage collection routines. One might even\n   speculate that they speed up subsequent backups.\n * In terms of memory, the situation is quite obvious. From best (most\n   memory-efficient) to worst (most memory-wasteful) one can arrange them as\n   follows: Bupstash, Borg, Kopia, JMBB, with JMBB being far less efficient\n   than the others. In the worst case (last backup) it peaks at 473 times the\n   memory of the most efficient contender Bupstash. The only thing to note in\n   JMBB's defence is that, of course, having fewer cores available will make it\n   use less memory as fewer parallel XZ compressors will run. Still, it is clear\n   that JMBB requires the most memory by a large margin. As a side-note,\n   JMBB's README recommends 300 MiB + 36 * 600 MiB = 21900 MiB of RAM, showing\n   that even my past estimate falls short of what it really needs.\n   Apart from the extreme JMBB, a typical modern computer should not have any\n   issues with running any of the programs. 
It is interesting to note that\n   the best contender in terms of memory (Bupstash) undercuts its nearest\n   alternative Borg by factor 7 and operates fine with a maximum of 55 MiB\n   making this tool suited for invocation on low-end devices like ARM SBCs or\n   i386-machines which may have as little as 512 MiB RAM provided that one\n   can compile it for the respective platforms (not tested here!).\n * Looking at the CPU usage, Kopia and JMBB are the tools that obviously run\n   significant parts of their computation in parallel. JMBB is again the\n   heaviest on resources while the other tools load the CPU much less such that\n   other applications may continue to function with minimal performance issues.\n   Borg acts purely single-threaded yielding an average CPU usage of 99%.\n\n## Data-Test to NFS and SSHFS\n\nTo find out how times change with a more realistic scenario where data is sent\nover network, tests with NFS and SSHFS targets have been performed. The\nfollowing wall times could be observed for the different target storages.\nColumn _Local_ has been copied from before for comparison.\n\nState  Tool      Local/s  NFS/s  NFS3/s  SSHFS/s\n-----  --------  -------  -----  ------  -------\nx2a00  Borg      9 579    8 954  7 913   11 256\n       Bupstash  120      1 243  1 157   470\n       JMBB      1 297    690    696     8 392\n       Kopia     168      271    277     350\n                                          \nx2eb4  Borg      624      596    746     655\n       Bupstash  88       --     116     108\n       JMBB      117      106    69      492\n       Kopia     72       76     83      117\n                                          \nx34d7  Borg      359      367    383     371\n       Bupstash  106      --     122     113\n       JMBB      80       59     62      391\n       Kopia     91       96     107     151\n\nFrom past experience with the respective storage targets, one would have\nexpected to find the following sequence (shortest to 
longest time):\nLocal, NFS, SSHFS. It turns out that in practice, results vary greatly between\nthe backup tools.\n\nThe results for Bupstash over NFS are missing from the second test onwards\nbecause a reproducible error occurs when invoking the garbage\ncollection procedure over NFS:\n\n\tb77083cf8d227db12e904da3a175e2c3\n\t1 item(s) removed\n\tbupstash serve: Bad file descriptor (os error 9)\n\tbupstash gc: remote disconnected\n\tCommand exited with non-zero status 1\n\t56.05user 14.75system 1:11.13elapsed 99%CPU (0avgtext+0avgdata 51240maxresident)k\n\t59731063inputs+865458outputs (180major+65251minor)pagefaults 0swaps\n\nAs part of creating this article, this issue has been reported as\n\u003chttps://github.com/andrewchambers/bupstash/issues/157\u003e. Update 2021/04/11:\nNFS tests were repeated with mount options `nfsvers=3,nolock` to measure\nBupstash's performance over NFS -- the new results are provided in column\n_NFS3_.\n\nApart from that, certain combinations of tool and storage seem to be\nproblematic:\n\n * Bupstash takes ten times the execution time for the initial run on NFS.\n   Its access pattern seems to interact poorly with NFS, whereas it runs quite\n   acceptably over SSHFS. With the mount options that work around the locking\n   issue, the slowdown for operating Bupstash on NFS is much smaller (about\n   factor 1.3) for the subsequent runs.\n * JMBB takes four to six times the execution time when running on SSHFS.\n   One explanation might be that JMBB compresses all data it reads from and\n   writes to the target storage and hence sshfs' default compression would\n   decrease performance significantly. See\n   \u003chttps://www.admin-magazine.com/HPC/Articles/Sharing-Data-with-SSHFS\u003e and\n   \u003chttps://superuser.com/questions/344255/faster-way-to-mount-a-remote-file-system-than-sshfs\u003e\n   for some ideas about how performance could be improved. 
They were not tested.\n * Borg runs somewhat slowly on SSHFS for the initial backup.\n\nKopia shows the performance characteristics that one would have expected from\nexperience before this test, i.e. runs on NFS are slightly slower than on the\nlocal FS and runs on SSHFS are even slower.\n\nIt is quite interesting to note that under certain circumstances, tools run\n_faster_ on NFS than on the local file system. This is most likely related\nto (a) the difference in file systems (ZFS local vs. ext4 remote) and (b) the\nability to use the processing power and RAM cache of `pte5` in addition to the\nlocal resources on `masysma-18`. SSDs on `pte5` are known to be slower than\nthose on `masysma-18`, hence the difference cannot be explained by the\ncapabilities of the underlying storage alone.\n\nThe NFS and SSHFS tests suggest that, apart from certain bad\ninteractions, all tools are suitable for use on remote file systems.\n\n## Checking the Diagrams: Grafana Dashboards during the SSHFS-Test\n\nIn addition to GNU Time, tests were monitored by Telegraf, InfluxDB and Grafana.\nAlthough the results proved not to be all that useful in terms of precision,\nthey provided an exploratory “feeling” for the data.\n\n![Screenshot showing the Board as displayed for the whole time range of the SSHFS-Tests](backup_tests_borg_bupstash_kopia_att/grafana_all_sshfs.png)\n\nThe first screenshot is most notable for the _Ramdisk Disk Usage_ where one can\nobserve each of the data sets being filled in and staying constant during the\nprocessing. The upper left graph displays CPU, RAM, SWAP and DISK usages. The\nspikes in memory and CPU usage come from JMBB executions :). 
Below these\ndiagrams, one can find the per-process metrics, which are not really\nuseful on such a long time scale.\n\n![Screenshot showing the period of the large backup data set in the middle](backup_tests_borg_bupstash_kopia_att/grafana_single_sshfs.png)\n\nAlthough they are still largely indecipherable, one can already observe the\nlonger execution time for Borg from this perspective. `java` processes are not\nplotted here because other Java background tasks were running at the same time\nand would clutter the view by overlaying the other tools' graphs.\n\n![Screenshot of a single Kopia run](backup_tests_borg_bupstash_kopia_att/grafana_detail_kopia.png)\n\nDrilling further down reveals both the usefulness and the imprecision of the\napplication-specific diagrams. The first row of diagrams is now mostly constant\nwhereas the second and third rows show memory, I/O and CPU. From\nthe “stairs” one can already conclude that the sampling interval is far longer\nthan would have been needed to achieve precise values for I/O, hence the area\nunder the I/O graphs does not add up to the I/O actually performed.\n\n## Data-Test Restoration\n\nAll tools restored all file contents correctly from the backup according to the\nfiles' SHA-256 checksums.\n\nHere is a table of the tools' restore performance characteristics. In addition\nto the measures from GNU Time, a _Speed_ value has been derived from the data\nsize (around 44 008 MiB) and the wall time.\n\nTool      Wall Time/s  Speed/(MiB/s)  Peak Memory/MiB  Avg. 
CPU/%\n--------  -----------  -------------  ---------------  ----------\nBorg      1 333        33             148.80           98\nBupstash  306          144            72.34            132\nJMBB      2 806        16             16 583.30        119\nKopia     230          185            1 253.41         582\n\nThis time, the tools' performance can be clearly ordered from best to worst as\nfollows: Kopia, Bupstash, Borg, JMBB.\n\nThis test clearly separates the old from the new backup tools: Bupstash and\nKopia outperform their competitors by at least factor four and JMBB's restores\ntake twelve times as long as Kopia's while at the same time needing\nthirteen times as much memory.\n\nOne can also observe that Kopia is the only tool to perform significant parts of\nthe restore in parallel, yielding the best overall performance. Bupstash and Borg\nrun very efficiently wrt. memory consumption. It is good to know that while Borg\nneeds more than 300 MiB for backup creation, it can restore with about half of\nthat memory. Also, Borg is significantly faster in restoring than in creating the\ninitial backup although it has about 1.6 times the data to handle, accounting for\nthe increase in data size from the initial backup x2a00 to the last backup\nx34d7.\n\n## Games-Test\n\nHere are the test results for the _Games-Test_.\nThe following table shows the backup sizes after the respective tools' first\nruns. This data does not change significantly for the subsequent run:\n\nTool      Files in Cache  Size of Cache/MiB  Backup Files  Backup/MiB\n--------  --------------  -----------------  ------------  ----------\nBorg      8               24                 322           115 113\nBupstash  3               21                 138 288       130 550\nKopia     50              66                 14 371        154 748\n\nHere are the tools' performance results from GNU Time: T1 is the initial run\nand T2 the same run again i.e. 
with unchanged input data.\n\nState  Tool      Wall Time/s  Peak Memory/MiB  Avg. CPU/%\n-----  --------  -----------  ---------------  ----------\nT1     Borg      34 899       312.86           98\n       Bupstash  2 760        56.78            66\n       Kopia     1 369        2 715.75         277\n                                                \nT2     Borg      18           166.03           86\n       Bupstash  3            31.16            36\n       Kopia     10           160.90           263\n\nThe backup sizes follow a pattern similar to that of the Data-Test, although this\ntime, differences between them are larger and Kopia's larger backup size cannot\nbe attributed to missing garbage collection. On the positive side, caches seem\nto be smaller than in the Data-Test. This hints towards a certain growth in\ncache over time (i.e. after n \u003e 1 backups caches are larger than after one\nbackup).\n\nKopia needs 38.7 GiB more storage space than Borg, making the difference\nquite significant. Again, a huge difference in the number of files can be noted\nbetween the tools: Bupstash's 138 288 files, almost all in a single\ndirectory, can be foreseen to cause trouble when e.g. attempting to copy the\nfiles to another location or upload them to a remote storage.\n\nComparing the backup times has to take into account that the tools use different\nprotocols to transfer their data over the network. Borg and Kopia both run without\ntheir counterparts on the server (Borg over NFS, Kopia over SSH). In theory,\nBupstash, which uses its own server component on the target server (over SSH),\nmay thus show enhanced performance.\n\nInterestingly, Bupstash does not seem to realize this theoretical performance\nadvantage: Kopia outperforms it quite significantly -- Bupstash takes\nabout twice as long as Kopia. Borg again takes much longer (9.7 hours\nvs. 
0.8 hours for Bupstash).\n\nMemory values mostly resemble those from before, although Kopia's memory use\ngrows to 2.6 GiB, which is almost double the amount observed before. In terms of\nCPU, Bupstash uses significantly less than 100 % CPU (as per GNU Time), which may\nbe due to waiting for I/O or waiting for its server counterpart. To estimate\nthe load on the server side, statistics from Telegraf have been consulted.\nAccording to them, Bupstash's client side averaged 41 % and the server\nside averaged 8.87 %, which does not at all sum up to 100 %. Hence, it remains\nplausible that Bupstash was waiting for I/O operations here.\n\nTo conclude this test, none of the tools takes a notably long time to detect that\nthere are no changes between T1 and T2.\n\n## VM-Test\n\nThe third dataset to test is the _VM-Test_. Backup sizes and times are presented\nin the following tables in the same style as before.\n\nState  Tool      Files in Cache  Size of Cache/MiB  Backup Files  Backup/MiB\n-----  --------  --------------  -----------------  ------------  ----------\nVMS0   Borg      8               16                 387           139 773\n       Bupstash  3               36                 219 862       177 403\n       Kopia     52              54                 30 026        320 476\n                                                                   \nVMS1   Borg      8               12                 529           171 791\n       Bupstash  3               49                 254 693       213 892\n       Kopia     105             124                43 199        462 968\n\nState  Tool      Wall Time/s  Speed/(MiB/s)  Peak Memory/MiB  Avg. 
CPU/%\n-----  --------  -----------  -------------  ---------------  ----------\nVMS0   Borg      60 286       7.04           304.82           99\n       Bupstash  3 788        112.10         60.65            78\n       Kopia     2 227        190.68         1 093.11         270\n                                                               \nVMS1   Borg      28 471       16.52          259.66           98\n       Bupstash  3 128        150.33         67.39            73\n       Kopia     2 301        204.35         597.00           267\n\nAs the data is mostly read sequentially from HDD, it makes sense to check the\naverage reading speed the tools achieved. It was calculated by dividing the input\nsize by the wall time, and one can e.g. notice that the entries for\nKopia practically show the maximum reading speed available from HDDs.\nExperimentally invoking `pv deb-64-new.qcow2 \u003e /dev/null` (without anything\nbeing cached) shows figures between 150 MiB/s and 250 MiB/s, indicating that\nan average of 204.35 MiB/s can reasonably be seen as the practical maximum.\nNote that the HDDs are in a ZFS mirror, hence values above the typical\nmaximum of 200 MiB/s for HDDs are possible.\n\nOne can again observe the backup sizes increasing in the order Borg, Bupstash,\nKopia and the times decreasing in that order, with Borg (again) taking\nmuch longer than the others (from a factor of 27 comparing Borg and Kopia\nfor the initial run to a factor of 9 comparing Borg and Bupstash for the second run).\nIt can again be observed that Kopia runs in parallel and quickly while using the most\nmemory, although cache sizes stay low (like in the Games-Test). 
The differences\nin backup sizes between Bupstash and Kopia are increasing, with Kopia taking\nat least 1.8 times Bupstash's backup size.\n\n![Illustration of Bupstash's server component activities during the VMS1 test](backup_tests_borg_bupstash_kopia_att/grafana_detail_bupstash_server.png)\n\nAs with the Data-Test before, the final backup size for Kopia is not reliable\nbecause the garbage collection routine could not run. It is also not possible to\neasily garbage-collect the data after some time has passed. When trying to do\nthis, an output similar to the following is observed:\n\n\t$ date\n\tWed 07 Apr 2021 07:16:47 PM CEST\n\t$ kopia maintenance run --password=testwort --full\n\tRunning full maintenance...\n\tlooking for active contents\n\tprocessed(0/1) active 1\n\tprocessed(7/10) active 3\n\tlooking for unreferenced contents\n\tFound safe time to drop indexes: 2021-04-04 00:01:21.132068333 +0200 CEST\n\tDropping contents deleted before 2021-04-04 00:01:21.132068333 +0200 CEST\n\tRewriting contents from short packs...\n\tLooking for unreferenced blobs...\n\tDeleted total 1 unreferenced blobs (4.6 MB)\n\tFinished full maintenance.\n\nNote that this is multiple days after the actual test, which completed on\n2021-04-04.\n\nConclusion\n==========\n\nThere is a certain disparity between _problems_ and _features_ here: I\npersonally can do without most of the features but do not like to live with\nthe problems. Additionally, backup is a _must have_ but also not something\none gets in touch with often, as the processes themselves are automated, at least\nto the point that I as a user only call a script (e.g. connect USB drive,\ncall script, disconnect). 
From that point of view, most of the tools' advantages\nare largely uninteresting as long as there are no problems!\n\nThis is an unfortunate situation with backup tools in general, which may be one\nof the reasons why there are so few good tools to choose from :)\n\nWithout further delay, the following table summarizes the findings by recalling\nthe greatest issues observed for the respective tools:\n\nTool      Problems\n--------  -------------------------------------------\nBorg      -- very slow especially for initial backups\n           \nJMBB      -- very slow restore\n          -- no deduplication\n          -- no files above 8 GiB\n           \nKopia     -- no Unix pipes/special files support\n          -- large caches in Data-Test\n          -- rather large backup sizes\n           \nBupstash  -- large file numbers in single directory\n\nMy conclusion from this is that _Bupstash_ is the most viable candidate. There\nare still some rough edges, but given that it is the newest of the tools\nchecked, that is to be expected.\n\nFuture Directions\n=================\n\nNone of the tools will immediately replace JMBB here. Borg is currently in use\nfor all data that is too large for JMBB and does an acceptable job there\n(although it literally runs for hours). Given the current state of results,\nit seems most interesting to further check on Bupstash, especially wrt. the\nfollowing points:\n\n * Automate a stable compilation routine to run this tool on Debian stable\n   systems.\n * Think about replacing secondary large-file backups with Bupstash.\n   For backups to local (network) targets, the large number of files matters\n   less.\n * Experiment with code: Try to write a proof-of-concept custom restore for\n   Bupstash to understand its data storage format esp. wrt. crypto and archival\n   options. 
Think about how feasible it would be to write a custom tool that can _write_\n   in that format.\n * Experiment with code: Try to store Bupstash's large number of files in some\n   kind of database (Riak?) and find out if the resulting storage might be more\n   “portable” across file systems.\n\nSee Also\n========\n\n## Repository Contents\n\nThe repository is structured as follows:\n\nDirectory        Contents\n---------------  ------------------------------------------------\n`automation/`    Scripts used to run the actual (long) tests\n`docker/`        Files to try out the backup tools in containers.\n                 Not used for measurements!\n`evaluation/`    Files used for evaluating the measurements\n  `input_data/`  Gathered data about the input files (Data-Test)\n  `other/`       Dashboard JSON, RAMDISK performance\n  `scans/`       Queries for Dirstat 2 about the result sizes\n\n## External Links\n\n * Tools for backup scalability testing:\n   \u003chttps://github.com/borgbackup/backupdata\u003e.\n   Not used when creating this article, but interesting!\n * A more recent comparison of backup tools, which also includes restic\n   and Borg v2 beta: \u003chttps://github.com/deajan/backup-bench\u003e\n\nLicense\n=======\n\nLicense for the repository contents as well as this document.\nSee file `LICENSE.txt` in the repository.\n\n\tComparison of Modern Linux Backup Tools -- Borg, Bupstash and Kopia,\n\tCopyright (c) 2021 Ma_Sys.ma.\n\tFor further info send an e-mail to Ma_Sys.ma@web.de.\n\t\n\tThis program is free software: you can redistribute it and/or modify\n\tit under the terms of the GNU General Public License as published by\n\tthe Free Software Foundation, either version 3 of the License, or\n\t(at your option) any later version.\n\t\n\tThis program is distributed in the hope that it will be useful,\n\tbut WITHOUT ANY WARRANTY; without even the implied warranty of\n\tMERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  
See the\n\tGNU General Public License for more details.\n\t\n\tYou should have received a copy of the GNU General Public License\n\talong with this program.  If not, see \u003chttp://www.gnu.org/licenses/\u003e.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fm7a%2Flp-backup-tests","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fm7a%2Flp-backup-tests","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fm7a%2Flp-backup-tests/lists"}