https://github.com/trapexit/scorch
Silent CORruption CHecker and filesystem audit tool
- Host: GitHub
- URL: https://github.com/trapexit/scorch
- Owner: trapexit
- License: isc
- Created: 2016-10-11T18:43:23.000Z (about 9 years ago)
- Default Branch: master
- Last Pushed: 2023-02-04T23:59:24.000Z (over 2 years ago)
- Last Synced: 2024-10-15T10:34:46.352Z (about 1 year ago)
- Topics: bitrot, corruption, data-integrity, filesystem, md5, md5sum, sha1sum
- Language: Python
- Homepage:
- Size: 70.3 KB
- Stars: 195
- Watchers: 12
- Forks: 11
- Open Issues: 7
Metadata Files:
- Readme: README.md
- License: LICENSE
# scorch (Silent CORruption CHecker)
scorch is a tool to catalog files and their hashes to help in discovering file corruption, missing files, duplicate files, etc.
### Usage
```
usage: scorch [<options>] <instruction> [<directory>]

scorch (Silent CORruption CHecker) is a tool to catalog files, hash
digests, and other metadata to help in discovering file corruption,
missing files, duplicates, etc.

positional arguments:
instruction: * add: compute & store digests for found files
* append: compute & store digests for unhashed files
* backup: backs up selected database
* restore: restore backed up database
* list-backups: list database backups
* diff-backup: show diff between current & backup DB
* hashes: print available hash functions
* check: check stored info against files
* update: update metadata of changed files
* check+update: check and update if new
* cleanup: remove info of missing files
* delete: remove info for found files
* list: md5sum'ish compatible listing
* list-unhashed: list files not yet hashed
* list-missing: list files no longer on filesystem
* list-dups: list files w/ dup digests
* list-solo: list files w/ no dup digests
* list-failed: list files marked failed
* list-changed: list files marked changed
* in-db: show if files exist in DB
* found-in-db: print files found in DB
* notfound-in-db: print files not found in DB
directory: Directory or file to scan.

optional arguments:
-d, --db=: File to store digests and other metadata in. See
docs for info. (default: /var/tmp/scorch/scorch.db)
-v, --verbose: Make `instruction` more verbose. Actual behavior
depends on the instruction. Can be used multiple
times.
-q, --quote: Shell quote/escape filenames when printed.
-r, --restrict=: * sticky: restrict scan to files with sticky bit
* readonly: restrict scan to readonly files
-f, --fnfilter=: Restrict actions to files which match regex.
-F, --negate-fnfilter Negate the fnfilter regex match.
-s, --sort=: Sorting routine on input & output. (default: natural)
* random: shuffled / random
* natural: human-friendly sort, ascending
* natural-desc: human-friendly sort, descending
* radix: RADIX sort, ascending
* radix-desc: RADIX sort, descending
* mtime: sort by file mtime, ascending
* mtime-desc: sort by file mtime, descending
* checked: sort by last time checked, ascending
* checked-desc: sort by last time checked, descending
-m, --maxactions=: Max actions before exiting. (default: maxint)
-M, --maxdata=: Max bytes to process before exiting. (default: maxint)
Can use 'K', 'M', 'G', 'T' suffix.
-T, --maxtime=: Max time to process before exiting. (default: maxint)
Can use 's', 'm', 'h', 'd' suffix.
-b, --break-on-error: Any error or digest mismatch will cause an exit.
-D, --diff-fields=: Fields to use to indicate a file has 'changed' (vs.
bitrot / modified) and should be rehashed.
Combine with ','. (default: size)
* size
* inode
* mtime
* mode
-H, --hash=: Hash algo. Use 'scorch hashes' get available algos.
(default: md5)
-h, --help: Print this message.

exit codes:
* 0 : success, behavior executed, something found
* 1 : processing error
* 2 : error with command line arguments
* 4 : hash mismatch
* 8 : found
* 16 : not found, nothing processed
* 32 : interrupted
```

### Database
#### Format
The file is simply CSV compressed with gzip.
```
$ # file, hash:digest, size, mode, mtime, inode, state, checked
$ zcat /var/tmp/scorch/scorch.db
/tmp/files/a,md5:d41d8cd98f00b204e9800998ecf8427e,0,33188,1546377833.3844686,123456,0,1588895022.6193066
```

The 'state' value can be 'U' for unknown, 'C' for changed, 'F' for failed, or 'O' for OK.
The 'mtime' and 'checked' values are floating point seconds since epoch.
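
Since the database is plain gzip-compressed CSV, it can be inspected without scorch itself. The following is a minimal Python sketch, not scorch's own code: the `read_scorch_db` name is hypothetical, the column order is taken from the header comment in the listing above, and UTF-8 encoding is an assumption. The `state` column is left as a raw string.

```python
import csv
import gzip

# Column order as given in the header comment of the listing above.
FIELDS = ["file", "digest", "size", "mode", "mtime", "inode", "state", "checked"]


def read_scorch_db(path):
    """Yield one dict per row of a gzip-compressed CSV database file."""
    # Assumption: the file is UTF-8 text once decompressed.
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as fh:
        for row in csv.reader(fh):
            entry = dict(zip(FIELDS, row))
            # The digest column has the form "algo:hexdigest", e.g. "md5:d41d...".
            algo, _, hexdigest = entry.pop("digest").partition(":")
            entry["algo"] = algo
            entry["hexdigest"] = hexdigest
            entry["size"] = int(entry["size"])
            entry["mode"] = int(entry["mode"])
            # mtime and checked are floating point seconds since epoch.
            entry["mtime"] = float(entry["mtime"])
            entry["checked"] = float(entry["checked"])
            yield entry
```

For example, `for e in read_scorch_db("/var/tmp/scorch/scorch.db"): print(e["file"], e["algo"], e["state"])` would print one line per cataloged file.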
#### --db argument
The `--db` argument can take more than a path.
* /tmp/test/myfiles.db : Full path. Used as is.
* /tmp/test : If /tmp/test is a directory -> /tmp/test/scorch.db
* /tmp/test/ : Force interpretation as directory -> /tmp/test/scorch.db
* /tmp/test : /tmp/test is not a directory -> /tmp/test.db
* ./test : Prepend current working directory and same as above. Any relative path with a '/'.
* test : No forward slashes -> /var/tmp/scorch/test.db

If there is no extension then `.db` will be added.
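
The rules above can be sketched in Python as follows. This is an illustration of the rules as listed, under stated assumptions, and not scorch's actual implementation: `resolve_db_path` and its `cwd` and `isdir` parameters are hypothetical names introduced here (the injectable `isdir` only exists to make the sketch testable), and POSIX-style paths are assumed.

```python
import os

DEFAULT_DIR = "/var/tmp/scorch"  # default location named in the usage text
DEFAULT_NAME = "scorch.db"


def resolve_db_path(arg, cwd=None, isdir=os.path.isdir):
    """Map a --db argument to a database file path, following the listed rules."""
    if "/" not in arg:
        # Bare name: placed in the default directory.
        path = os.path.join(DEFAULT_DIR, arg)
    else:
        base = os.getcwd() if cwd is None else cwd
        # os.path.join ignores `base` when `arg` is already absolute.
        path = os.path.normpath(os.path.join(base, arg))
        # A trailing slash forces directory interpretation.
        if arg.endswith("/") or isdir(path):
            return os.path.join(path, DEFAULT_NAME)
    # Add ".db" only when the name has no extension.
    return path if os.path.splitext(path)[1] else path + ".db"
```

With this sketch, `resolve_db_path("test")` gives `/var/tmp/scorch/test.db`, and `resolve_db_path("/tmp/test/")` gives `/tmp/test/scorch.db` whether or not the directory exists.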
#### Backup / Restore
To simplify backing up the scorch database there is a **backup** instruction. Without a directory argument, the backup is written to the same location as the database. If directories are given as arguments, the backup is stored there.
```
$ scorch -v backup
/var/tmp/scorch/scorch.db.backup_2019-07-29T02:35:46Z
$ scorch -v backup /tmp
/tmp/scorch.db.backup_2019-07-29T02:36:12Z
$ scorch list-backups
/var/tmp/scorch/scorch.db.backup_2019-07-29T02:35:46Z
$ scorch list-backups /tmp
/tmp/scorch.db.backup_2019-07-29T02:36:12Z
/tmp/scorch.db.backup_2019-07-29T02:13:34Z
$ scorch restore /tmp/scorch.db.backup_2019-07-29T02:36:12Z
```

### Example
```
$ ls -lh /tmp/files
total 0
-rw-rw-r-- 1 nobody nogroup 0 May 3 16:30 a
-rw-rw-r-- 1 nobody nogroup 0 May 3 16:30 b
-rw-rw-r-- 1 nobody nogroup 0 May 3 16:30 c

$ scorch -v -d /tmp/hash.db add /tmp/files
1/3 /tmp/files/c: d41d8cd98f00b204e9800998ecf8427e
2/3 /tmp/files/a: d41d8cd98f00b204e9800998ecf8427e
3/3 /tmp/files/b: d41d8cd98f00b204e9800998ecf8427e

$ scorch -v -d /tmp/hash.db check /tmp/files
1/3 /tmp/files/a: OK
2/3 /tmp/files/b: OK
3/3 /tmp/files/c: OK

$ echo asdf > /tmp/files/d
$ scorch -v -d /tmp/hash.db list-unhashed /tmp/files
/tmp/files/d

$ scorch -v -d /tmp/hash.db append /tmp/files
1/1 /tmp/files/d: md5:2b00042f7481c7b056c4b410d28f33cf

$ scorch -d /tmp/hash.db list-dups /tmp/files
md5:d41d8cd98f00b204e9800998ecf8427e /tmp/files/a /tmp/files/b /tmp/files/c

$ scorch -v -d /tmp/hash.db list-dups /tmp/files
md5:d41d8cd98f00b204e9800998ecf8427e
- /tmp/files/a
- /tmp/files/b
- /tmp/files/c

$ echo foo > /tmp/files/a
$ scorch -v -d /tmp/hash.db check+update /tmp/files
1/4 /tmp/files/b: OK
2/4 /tmp/files/c: OK
3/4 /tmp/files/a: FILE CHANGED
- size: 0B -> 4B
- mtime: Tue Jan 1 16:23:57 2019 -> Tue Jan 1 16:24:09 2019
- hash: d41d8cd98f00b204e9800998ecf8427e -> d3b07384d113edec49eaa6238ad5ff00
4/4 /tmp/files/d: OK

$ scorch -v -d /tmp/hash.db list /tmp/files | cut -d: -f2- | md5sum -c
/tmp/files/c: OK
/tmp/files/d: OK
/tmp/files/a: OK
/tmp/files/b: OK
```

### Automation
A typical setup is initialized manually with **add** or **append**. Once the database has been created, a cron job can check, update, append to, and clean up the database. If **scorch** is not run in verbose mode, only differences or failures are printed, and cron will email that output to the user (if set up to do so).
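
A wrapper that needs to react to specific outcomes, rather than rely on printed output, can look at the exit status. The exit codes in the usage text are all powers of two, which suggests they can be combined as bit flags. That is an assumption here, not something the usage text states. Under that assumption, a small Python sketch could decode a status like this (the `ScorchExit` and `describe_exit` names are hypothetical and not part of scorch):

```python
import enum


class ScorchExit(enum.IntFlag):
    """Exit-code values copied from the usage text."""
    PROCESSING_ERROR = 1
    ARGUMENT_ERROR = 2
    HASH_MISMATCH = 4
    FOUND = 8
    NOT_FOUND = 16
    INTERRUPTED = 32


def describe_exit(status):
    """Return the names of all flags set in an exit status."""
    if status == 0:
        return ["SUCCESS"]
    # Assumption: several conditions may be OR'ed into a single status.
    return [flag.name for flag in ScorchExit if status & flag.value]
```

The status could come from `subprocess.run([...]).returncode`. Under the bit-flag assumption, `describe_exit(5)` would report both a processing error and a hash mismatch. For the simple case described above, a plain shell script is enough: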
```
#!/bin/sh

scorch -M 128G -T 2h check+update /tmp/files
scorch append /tmp/files
scorch cleanup /tmp/files
```

# Support
#### Contact / Issue submission
* github.com: https://github.com/trapexit/scorch/issues
* email: trapexit@spawn.link
* twitter: https://twitter.com/_trapexit
* reddit: https://www.reddit.com/user/trapexit
* discord: https://discord.gg/MpAr69V

#### Support development
This software is free to use and released under a very liberal license. That said, if you like this software and would like to support its development, donations are welcome.
* PayPal: trapexit@spawn.link
* Patreon: https://www.patreon.com/trapexit
* Bitcoin (BTC): 1DfoUd2m5WCxJAMvcFuvDpT4DR2gWX2PWb
* Bitcoin Cash (BCH): qrf257j0l09yxty4kur8dk2uma8p5vntdcpks72l8z
* Ethereum (ETH): 0xb486C0270fF75872Fc51d85879b9c15C380E66CA
* Litecoin (LTC): LW1rvHRPWtm2NUEMhJpP4DjHZY1FaJ1WYs
* Basic Attention Token (BAT): 0xE651d4900B4C305284Da43E2e182e9abE149A87A
* Zcash (ZEC): t1ZwTgmbQF23DJrzqbAmw8kXWvU2xUkkhTt
* Zcoin (XZC): a8L5Vz35KdCQe7Y7urK2pcCGau7JsqZ5Gw