Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/jedie/pyhardlinkbackup

Hardlink/Deduplication Backups with Python
https://github.com/jedie/pyhardlinkbackup

Last synced: 11 days ago
JSON representation

Hardlink/Deduplication Backups with Python

Awesome Lists containing this project

README

        

== pyhardlinkbackup

Hardlink/Deduplication Backups with Python.

* Backups should be saved as normal files in the filesystem:
** accessible without any extra software or extra meta files
** non-proprietary format
* Create backups with versioning
** every backup run creates a complete filesystem snapshot tree
** every snapshot tree can be deleted, without affecting the other snapshots
* Deduplication with hardlinks:
** Store only changed files, all other via hardlinks
** find duplicate files everywhere (even if renamed or moved files)
* useable under Windows and Linux

Requirement: Python 3.6 or newer.

Please: try, fork and contribute! ;)

| {{https://github.com/jedie/pyhardlinkbackup/workflows/test/badge.svg?branch=master|Build Status on github}} | [[https://github.com/jedie/pyhardlinkbackup/actions|github.com/jedie/pyhardlinkbackup/actions]] |
| {{https://travis-ci.org/jedie/pyhardlinkbackup.svg|Build Status on travis-ci.org}} | [[https://travis-ci.org/jedie/pyhardlinkbackup/|travis-ci.org/jedie/pyhardlinkbackup]] |
| {{https://ci.appveyor.com/api/projects/status/py5sl38ql3xciafc?svg=true|Build Status on appveyor.com}} | [[https://ci.appveyor.com/project/jedie/pyhardlinkbackup/history|ci.appveyor.com/project/jedie/pyhardlinkbackup]] |
| {{https://coveralls.io/repos/jedie/pyhardlinkbackup/badge.svg|Coverage Status on coveralls.io}} | [[https://coveralls.io/r/jedie/pyhardlinkbackup|coveralls.io/r/jedie/pyhardlinkbackup]] |
| {{https://requires.io/github/jedie/pyhardlinkbackup/requirements.svg?branch=master|Requirements Status on requires.io}} | [[https://requires.io/github/jedie/pyhardlinkbackup/requirements/|requires.io/github/jedie/pyhardlinkbackup/requirements/]] |

== Example

{{{
$ phlb backup ~/my/important/documents
...start backup, some time later...
$ phlb backup ~/my/important/documents
...
}}}
This will create deduplication backups like this:
{{{
~/pyhardlinkbackups
└── documents
├── 2016-01-07-085247
│ ├── phlb_config.ini
│ ├── spreadsheet.ods
│ ├── brief.odt
│ └── important_files.ext
└── 2016-01-07-102310
├── phlb_config.ini
├── spreadsheet.ods
├── brief.odt
└── important_files.ext
}}}

== Installation

=== Windows

# install Python 3: https://www.python.org/downloads/
# Download the file [[https://raw.githubusercontent.com/jedie/pyhardlinkbackup/master/boot_pyhardlinkbackup.cmd|boot_pyhardlinkbackup.cmd]]
# call **boot_pyhardlinkbackup.cmd** as admin (Right-click and use **Run as administrator**)

If everything works fine, you will get a venv here: {{{%ProgramFiles%\PyHardLinkBackup}}}

After the venv is created, call these scripts to finalize the setup:

# {{{%ProgramFiles%\PyHardLinkBackup\phlb_edit_config.cmd}}} - create a config .ini file
# {{{%ProgramFiles%\PyHardLinkBackup\phlb_migrate_database.cmd}}} - create database tables

To upgrade pyhardlinkbackup, call:

# {{{%ProgramFiles%\PyHardLinkBackup\phlb_upgrade_pyhardlinkbackup.cmd}}}

To start the Django webserver, call:

# {{{%ProgramFiles%\PyHardLinkBackup\phlb_run_django_webserver.cmd}}}

=== Linux

# Download the file [[https://raw.githubusercontent.com/jedie/pyhardlinkbackup/master/boot_pyhardlinkbackup.sh|boot_pyhardlinkbackup.sh]]
# call **boot_pyhardlinkbackup.sh**

If everything works fine, you will get a venv here: {{{~\pyhardlinkbackup}}}

After the venv is created, call these scripts to finalize the setup:

* {{{~/PyHardLinkBackup/phlb_edit_config.sh}}} - create a config .ini file
* {{{~/PyHardLinkBackup/phlb_migrate_database.sh}}} - create database tables

To upgrade pyhardlinkbackup, call:

* {{{~/PyHardLinkBackup/phlb_upgrade_pyhardlinkbackup.sh}}}

To start the Django webserver, call:

* {{{~/PyHardLinkBackup/phlb_run_django_webserver.sh}}}

== Starting a backup run

To start a backup run, use this helper script:

* Windows batch: {{{%ProgramFiles%\PyHardLinkBackup\pyhardlinkbackup_this_directory.cmd}}}
* Linux shell script: {{{~/PyHardLinkBackup/pyhardlinkbackup_this_directory.sh}}}

Copy this file to a location that should be backed up and just call it to run a backup.

== Verifying an existing backup

{{{
$ cd pyhardlinkbackup/
~/PyHardLinkBackup $ source bin/activate

(PyHardLinkBackup) ~/PyHardLinkBackup $ phlb verify --fast ~/PyHardLinkBackups/documents/2016-01-07-102310
}}}
With **--fast** the files' contents will not be checked.
If not given: The hashes from the files' contents will be calculated and compared. Thus, every file must be completely read from filesystem, so it will take some time.

A verify run does:
* Do all files in the backup exist?
* Compare file sizes
* Compare hashes from hash-file
* Compare files' modification timestamps
* Calculate hashes from files' contents and compare them (will be skipped if **--fast** used)

== Configuration

phlb will use a configuration file named: **PyHardLinkBackup.ini**

Search order is:
# current directory down to root
# user directory

E.g. if the current working directoy is **/foo/bar/my_files/** then the search path will be:

* {{{/foo/bar/my_files/PyHardLinkBackup.ini}}}
* {{{/foo/bar/PyHardLinkBackup.ini}}}
* {{{/foo/PyHardLinkBackup.ini}}}
* {{{/PyHardLinkBackup.ini}}}
* {{{~/PyHardLinkBackup.ini}}} //The user home directory under Windows/Linux//

=== Create / edit default .ini

You can just open the editor with the user directory .ini file with:
{{{
(PyHardLinkBackup) ~/PyHardLinkBackup $ phlb config
}}}

The defaults are stored here: [[https://github.com/jedie/PyHardLinkBackup/blob/master/pyhardlinkbackup/phlb/config_defaults.ini|/phlb/config_defaults.ini]]

=== Excluding files/folders from backup:

There are two ways to exclude files/folders from your backup.
Use the follow settings in your {{{PyHardLinkBackup.ini}}}
{{{
# Directory names that will be recursively excluded from backups (comma separated list!)
SKIP_DIRS= __pycache__, temp

# glob-style patterns to exclude files/folders from backups (used with Path.match(), Comma separated list!)
SKIP_PATTERNS= *.pyc, *.tmp, *.cache
}}}

The filesystem scan is divided into two steps:
1. Just scan the filesystem tree
2. Filter and load meta data for every directory item

The **SKIP_DIRS** is used in the first step.
The **SKIP_PATTERNS** is used the the second step.

== Upgrading pyhardlinkbackup

To upgrade to a new version just start this helper script:

* Windows: [[https://github.com/jedie/PyHardLinkBackup/blob/master/pyhardlinkbackup/helper_cmd/phlb_upgrade_pyhardlinkbackup.cmd|phlb_upgrade_pyhardlinkbackup.cmd]]
* Linux: [[https://github.com/jedie/PyHardLinkBackup/blob/master/pyhardlinkbackup/helper_sh/phlb_upgrade_pyhardlinkbackup.sh|phlb_upgrade_pyhardlinkbackup.sh]]

== Some notes

=== What is 'phlb' and 'manage' ?!?

**phlb** is a CLI.

**manage** is similar to a normal Django **manage.py**, but it always
uses the pyhardlinkbackup settings.

=== Why in hell do you use Django?!?

* Well, just because of the great database ORM and the Admin Site. ;)

=== How to go into the Django admin?

Just start:

* Windows: {{{phlb_run_django_webserver.cmd}}}
* Linux: {{{phlb_run_django_webserver.sh}}}

And then request 'localhost'
(Note: **--noreload** is needed for Windows with venv!)

== Running the unit tests

Just start: {{{phlb_run_tests.cmd}}} / {{{phlb_run_tests.sh}}} or do this:
{{{
$ cd pyhardlinkbackup/
~/PyHardLinkBackup $ source bin/activate
(PyHardLinkBackup) ~/PyHardLinkBackup $ manage test
}}}

== Using the CLI

{{{
$ cd pyhardlinkbackup/
~/PyHardLinkBackup $ source bin/activate
(PyHardLinkBackup) ~/PyHardLinkBackup $ phlb --help
Usage: phlb [OPTIONS] COMMAND [ARGS]...

pyhardlinkbackup

Options:
--version Show the version and exit.
--help Show this message and exit.

Commands:
add Scan all existing backup and add missing ones...
backup Start a Backup run
config Create/edit .ini config file
helper link helper files to given path
verify Verify a existing backup
}}}

== Add missing backups to the database

**phlb add** can be used in different scenarios:
* recreate the database
* add a backup manually

**phlb add** does this:
* scan the complete file tree under **BACKUP_PATH** (default: {{{~/PyHardLinkBackups}}})
* recreate all hash files
* add all files to database
* deduplicate with hardlinks, if possible

So it's possible to recreate the complete database:
* delete the current {{{.sqlite}}} file
* run **phlb add** to recreate

Another scenario is e.g.:
* DSLR images are stored on a network drive.
* You have already a copy of all files locally.
* You would like to add the local copy to pyhardlinkbackup.

Do the following steps:

* move the local files to a subdirectory below **BACKUP_PATH**
* e.g.: {{{~/PyHardLinkBackups/pictures/2015-12-29-000015/}}}
* Note: the date format in the subdirectory name must match the **SUB_DIR_FORMATTER** in your config
* run: **phlb add**

Now you can run **phlb backup** from your network drive to make a new, up-to-date backup.

=== Windows Development

Some notes about setting up a development environment on Windows: [[https://github.com/jedie/PyHardLinkBackup/blob/master/dev/WindowsDevelopment.creole|/dev/WindowsDevelopment.creole]]

=== Alternative solutions

* Attic: https://attic-backup.org/ (Not working on Windows, own backup archive format)
* BorgBackup: https://borgbackup.readthedocs.io/ (Fork of Attic with lots of improvements)
* msbackup: https://pypi.python.org/pypi/msbackup/ (Uses tar for backup archives)
* Duplicity: http://duplicity.nongnu.org/ (No Windows support, tar archive format)
* Burp: http://burp.grke.org/ (Client/Server solution)
* dirvish: http://www.dirvish.org/ (Client/Server solution)
* restic: https://github.com/restic/restic/ (Uses own backup archive format)

See also: https://github.com/restic/others#list-of-backup-software

== History

* **dev** - [[https://github.com/jedie/PyHardLinkBackup/compare/v0.13.0...master|compare v0.13.0...master]]
** TBC
* 18.03.2020 - v0.13.0 - [[https://github.com/jedie/PyHardLinkBackup/compare/v0.12.3...v0.13.0|compare v0.12.3...v0.13.0]]
** Dynamic chunk size
** replace {{{CHUNK_SIZE}}} in {{{PyHardLinkBackup.ini}}} with {{{MIN_CHUNK_SIZE}}}
** Fix misleading error msg for dst OSError, bad exception handling [[https://github.com/jedie/PyHardLinkBackup/issues/23|#23]]
** Fix "run django server doesn't work" [[https://github.com/jedie/PyHardLinkBackup/issues/39|#39]]
** Fix "add" command
** Deactivate PyPy tests on Travis CI
* 17.03.2020 - v0.12.3 - [[https://github.com/jedie/PyHardLinkBackup/compare/v0.12.2...v0.12.3|compare v0.12.2...v0.12.3]]
** Fix [[https://github.com/jedie/PyHardLinkBackup/issues/44|#44]] - wroing file size in process bar
** use {{{pytest-randomly}}}
** update requirements
* 06.03.2020 - v0.12.2 - [[https://github.com/jedie/PyHardLinkBackup/compare/v0.12.1...v0.12.2|compare v0.12.1...v0.12.2]]
** Enhance log file content
** Update requirements
** [[https://github.com/jedie/PyHardLinkBackup/pull/41|Fix too verbose output by decrease debug level]]
* 05.03.2020 - v0.12.1 - [[https://github.com/jedie/PyHardLinkBackup/compare/v0.12.0...v0.12.1|compare v0.12.0...v0.12.1]]
** revert lowercase {{{PyHardLinkBackup}}} for environment destination and default backup directory.
* 05.03.2020 - v0.12.0 - [[https://github.com/jedie/PyHardLinkBackup/compare/v0.11.0...v0.12.0|compare v0.11.0...v0.12.0]]
** Refactor backup process: Use https://github.com/jedie/IterFilesystem for less RAM usage and faster start on big source trees
** modernized project/sources:
*** Update to Django v2.2.x TLS
*** use poetry
*** add Makefile
*** run tests with pytest and tox
*** run tests only with python 3.8, 3.7, 3.6
*** run tests on github, too
*** remove support for python 3.5 ({{{os.scandir}}} fallback removed)
** **NOTE:** Windows support is not tested, yet! (Help wanted)
* 03.03.2019 - v0.11.0 - [[https://github.com/jedie/PyHardLinkBackup/compare/v0.10.1...v0.11.0|compare v0.10.1...v0.11.0]]
** Update boot files
** Use django v1.11.x
* 09.09.2016 - v0.10.1 - [[https://github.com/jedie/PyHardLinkBackup/compare/v0.10.0...v0.10.1|compare v0.10.0...v0.10.1]]
** Bugfix [[https://github.com/jedie/PyHardLinkBackup/issues/24|Catch scan dir errors #24]]
* 26.04.2016 - v0.10.0 - [[https://github.com/jedie/PyHardLinkBackup/compare/v0.9.1...v0.10.0|compare v0.9.1...v0.10.0]]
** move under Windows the install location from {{{%APPDATA%\PyHardLinkBackup}}} to {{{%ProgramFiles%\PyHardLinkBackup}}}
** to 'migrate': Just delete {{{%APPDATA%\PyHardLinkBackup}}} and install via **boot_pyhardlinkbackup.cmd**
* 26.04.2016 - v0.9.1 - [[https://github.com/jedie/PyHardLinkBackup/compare/v0.9.0...v0.9.1|compare v0.9.0...v0.9.1]]
** run migrate database in boot process
** Add missing migrate scripts
* 10.02.2016 - v0.9.0 - [[https://github.com/jedie/PyHardLinkBackup/compare/v0.8.0...v0.9.0|compare v0.8.0...v0.9.0]]
** Work-a-round for Windows MAX_PATH limit: Use {{{\\?\}}} path prefix internally.
** move **Path2()** to external lib: https://github.com/jedie/pathlib_revised
* 04.02.2016 - v0.8.0 - [[https://github.com/jedie/PyHardLinkBackup/compare/v0.7.0...v0.8.0|compare v0.7.0...v0.8.0]]
** New: add all missing backups to database: {{{phlb add}}} (s.above)
* 03.02.2016 - v0.7.0 - [[https://github.com/jedie/PyHardLinkBackup/compare/v0.6.4...v0.7.0|compare v0.6.4...v0.7.0]]
** New: verify a existing backup
** **IMPORTANT:** run database migration is needed!
* 01.02.2016 - v0.6.4 - [[https://github.com/jedie/PyHardLinkBackup/compare/v0.6.3...v0.6.4|compare v0.6.2...v0.6.4]]
** Windows: Bugfix temp rename error, because of the Windows API limitation, see: [[https://github.com/jedie/PyHardLinkBackup/issues/13#issuecomment-176241894|#13]]
** Linux: Bugfix scanner if symlink is broken
** Display local variables on low level errors
* 29.01.2016 - v0.6.3 - [[https://github.com/jedie/PyHardLinkBackup/compare/v0.6.2...v0.6.3|compare v0.6.2...v0.6.3]]
** Less verbose and better information about SKIP_DIRS/SKIP_PATTERNS hits
* 28.01.2016 - v0.6.2 - [[https://github.com/jedie/PyHardLinkBackup/compare/v0.6.1...v0.6.2|compare v0.6.1...v0.6.2]]
** Handle unexpected errors and continue backup with the next file
** Better handle interrupt key during execution
* 28.01.2016 - v0.6.1 - [[https://github.com/jedie/PyHardLinkBackup/compare/v0.6.0...v0.6.1|compare v0.6.0...v0.6.1]]
** Bugfix #13 by using a better temp rename routine
* 28.01.2016 - v0.6.0 - [[https://github.com/jedie/PyHardLinkBackup/compare/v0.5.1...v0.6.0|compare v0.5.1...v0.6.0]]
** New: faster backup by compare mtime/size only if old backup files exists
* 27.01.2016 - v0.5.1 - [[https://github.com/jedie/PyHardLinkBackup/compare/v0.5.0...v0.5.1|compare v0.5.0...v0.5.1]]
** **IMPORTANT:** run database migration is needed!
** New {{{.ini}}} setting: {{{LANGUAGE_CODE}}} for change translation
** mark if backup was finished compled
** Display information of last backup run
** Add more information into summary file
* 27.01.2016 - v0.5.0 - [[https://github.com/jedie/PyHardLinkBackup/compare/v0.4.2...v0.5.0|compare v0.4.2...v0.5.0]]
** refactory source tree scan. Split in two passed.
** **CHANGE** {{{SKIP_FILES}}} in {{{.ini}}} config to: {{{SKIP_PATTERNS}}}
** Backup from newest files to oldest files.
** Fix [[https://github.com/jedie/PyHardLinkBackup/issues/10|#10]]:
*** New **--name** cli option (optional) to force a backup name.
*** Display error message if backup name can be found (e.g.: backup a root folder)
* 22.01.2016 - v0.4.2 - [[https://github.com/jedie/PyHardLinkBackup/compare/v0.4.1...v0.4.2|compare v0.4.1...v0.4.2]]
** work-a-round for junction under windows, see also: https://www.python-forum.de/viewtopic.php?f=1&t=37725&p=290429#p290428 (de)
** Bugfix in windows batches: go into work dir.
** print some more status information in between.
* 22.01.2016 - v0.4.1 - [[https://github.com/jedie/PyHardLinkBackup/compare/v0.4.0...v0.4.1|compare v0.4.0...v0.4.1]]
** Skip files that can't be read/write. (and try to backup the remaining files)
* 21.01.2016 - v0.4.0 - [[https://github.com/jedie/PyHardLinkBackup/compare/v0.3.1...v0.4.0|compare v0.3.1...v0.4.0]]
** Search for //PyHardLinkBackup.ini// file in every parent directory from the current working dir
** increase default chunk size to 20MB
** save summary and log file for every backup run
* 15.01.2016 - v0.3.1 - [[https://github.com/jedie/PyHardLinkBackup/compare/v0.3.0...v0.3.1|compare v0.3.0...v0.3.1]]
** fix unittest run under windows
* 15.01.2016 - v0.3.0 - [[https://github.com/jedie/PyHardLinkBackup/compare/v0.2.0...v0.3.0|compare v0.2.0...v0.3.0]]
** **database migration needed**
** Add 'no_link_source' to database (e.g. Skip source, if 1024 links created under windows)
* 14.01.2016 - v0.2.0 - [[https://github.com/jedie/PyHardLinkBackup/compare/v0.1.8...v0.2.0|compare v0.1.8...v0.2.0]]
** good unittests coverage that covers the backup process
* 08.01.2016 - v0.1.8 - [[https://github.com/jedie/PyHardLinkBackup/compare/v0.1.0alpha0...v0.1.8|compare v0.1.0alpha0...v0.1.8]]
** install and runable under Windows
* 06.01.2016 - v0.1.0alpha0 - [[https://github.com/jedie/PyHardLinkBackup/commit/d42a5c59c0dcdf8d2f8bb2a3a3dc2c51862fed17|d42a5c5]]
** first Release on PyPi
* 29.12.2015 - [[https://github.com/jedie/PyHardLinkBackup/commit/2ce43d326fafbde5a3526194cf957f00efe0f198|commit 2ce43]]
** commit 'Proof of concept'

== Links

* https://pypi.python.org/pypi/pyhardlinkbackup/
* https://www.python-forum.de/viewtopic.php?f=6&t=37723 (de)
* https://github.com/jedie/pathlib_revised

== Donating

* [[https://www.paypal.me/JensDiemer|paypal.me/JensDiemer]]
* [[https://flattr.com/submit/auto?uid=jedie&url=https%3A%2F%2Fgithub.com%2Fjedie%2Fdjango-reversion-compare%2F|Flattr This!]]
* Send [[http://www.bitcoin.org/|Bitcoins]] to [[https://blockexplorer.com/address/1823RZ5Md1Q2X5aSXRC5LRPcYdveCiVX6F|1823RZ5Md1Q2X5aSXRC5LRPcYdveCiVX6F]]