https://github.com/vyrti/quichash
Ultra fast hashing app for Linux, Mac, Windows, Freebsd
https://github.com/vyrti/quichash
blake3 cross-platform deduplication freebsd hash linux macos md5 rust sha2 sha256 sha3 sha512 simd verification windows xxhash xxhash3
Last synced: 5 months ago
JSON representation
Ultra fast hashing app for Linux, Mac, Windows, Freebsd
- Host: GitHub
- URL: https://github.com/vyrti/quichash
- Owner: vyrti
- License: apache-2.0
- Created: 2025-11-17T23:53:36.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2026-01-09T18:55:16.000Z (5 months ago)
- Last Synced: 2026-01-13T20:54:36.335Z (5 months ago)
- Topics: blake3, cross-platform, deduplication, freebsd, hash, linux, macos, md5, rust, sha2, sha256, sha3, sha512, simd, verification, windows, xxhash, xxhash3
- Language: Rust
- Homepage:
- Size: 259 KB
- Stars: 16
- Watchers: 0
- Forks: 1
- Open Issues: 5
-
Metadata Files:
- Readme: README.md
- License: LICENSE-APACHE
Awesome Lists containing this project
README
# QuicHash
High-performance cryptographic hash utility with SIMD optimization.
## Features
- **Algorithms**: MD5, SHA-1, SHA-2/3, BLAKE2/3, xxHash3/128
- **Defaults**: BLAKE3 algorithm, parallel processing
- **HDD Mode**: Sequential processing with `--hdd` flag for old mechanical drives
- **SIMD**: Automatic hardware acceleration (SSE, AVX, AVX2, AVX-512, NEON)
- **Optional Fast Mode**: Quick hashing for large files (samples 300MB) ONLY for edge cases
- **Flexible Input**: Files, stdin, or text strings
- **Wildcard Patterns**: Support for `*`, `?`, and `[...]` patterns in file/directory arguments
- **Directory Scanning**: Recursive hashing with parallel processing by default
- **Verification**: Compare hashes against stored database
- **Database Comparison**: Compare two databases to identify changes, moves, and differences
- **Database Analysis**: Analyze database statistics, duplicates, and potential space savings
- **Deduplication**: Find and report duplicate files based on hash comparison
- **.hashignore**: Exclude files using gitignore patterns
- **Formats**: Standard, hashdeep, JSON
- **Compression**: LZMA compression for databases
- **Cross-Platform**: Linux, macOS, Windows, FreeBSD
## Installation
```bash
cargo install quichash
```
Or build from source:
```bash
cargo build --release
```
## Quick Start
```bash
# Hash a file (uses blake3 by default)
hash myfile.txt
# Hash text
hash --text "hello world"
# Hash from stdin
cat myfile.txt | hash
# Scan directory (parallel by default)
hash scan -d ./my_dir -b hashes.db
# Scan on old HDD (sequential)
hash scan -d ./my_dir -b hashes.db --hdd
# Verify
hash verify -b hashes.db -d ./my_dir
# Analyze database
hash analyze -d hashes.db
# List algorithms
hash list
```
## Usage
### Hash Files
```bash
hash myfile.txt # Uses blake3 by default
hash myfile.txt -a sha256 # Specify algorithm
hash myfile.txt -a sha256 -a blake3 # Multiple algorithms
hash largefile.iso -f # Fast mode
hash myfile.txt -b output.txt # Save to file
hash myfile.txt --json # JSON output
```
### Wildcard Patterns
Hash multiple files using wildcard patterns:
```bash
hash "*.txt" -a sha256 # All .txt files
hash "file?.bin" -a sha256 # file1.bin, fileA.bin, etc.
hash "[abc]*.jpg" -a sha256 # Files starting with a, b, or c
hash "img202405*.jpg" -a sha256 # All images from May 2024
```
Patterns work with all commands:
```bash
hash scan -d "data/*/hashes" -a sha256 -b output.db # Multiple directories
hash verify -b "*.db" -d "data/*" --json # Multiple databases/dirs
```
### Hash Text or Stdin
```bash
hash --text "hello world" -a sha256 # Hash text
cat myfile.txt | hash -a sha256 # Hash from stdin
```
### Scan Directory
```bash
hash scan -d /path/to/dir -b hashes.db # Basic (blake3, parallel)
hash scan -d /path/to/dir -b hashes.db --hdd # Sequential for old HDDs
hash scan -d /path/to/dir -a sha256 -b hashes.db # Custom algorithm
hash scan -d /path/to/dir -b hashes.db -f # Fast mode
hash scan -d /path/to/dir -b hashes.db -f --hdd # Fast mode, sequential
hash scan -d /path/to/dir -b hashes.db --compress # Compressed
hash scan -d /path/to/dir -b hashes.db --format hashdeep # Hashdeep format
```
### Verify Directory
```bash
hash verify -b hashes.db -d /path/to/dir # Parallel (default)
hash verify -b hashes.db -d /path/to/dir --hdd # Sequential for old HDDs
hash verify -b hashes.db -d /path/to/dir --json # JSON output
```
## Performance Optimizations
### Parallel Verification (Default)
The verification engine uses parallel processing by default for significantly faster verification:
```bash
# Parallel verification (default, 2-4x faster)
hash verify -b hashes.db -d /path/to/dir
# Sequential verification (for old HDDs)
hash verify -b hashes.db -d /path/to/dir --hdd
```
**Performance improvements:**
- **Parallel by default**: Uses all CPU cores via rayon (like scan)
- **Path canonicalization caching**: Reduces redundant filesystem calls
- **Optimized file collection**: Efficient recursive directory traversal
- **Reduced overhead**: Minimizes lock contention in parallel mode
**Parallel mode (default):**
- SSDs or NVMe drives (no seek penalty)
- Large numbers of files (>1000)
- Fast network storage
- Modern systems with multiple cores
**Sequential mode (--hdd flag):**
- Old mechanical HDDs (avoid thrashing)
- Network drives with high latency
- Systems with limited CPU cores
- When minimizing system load
```bash
hash verify -b hashes.db -d /path/to/dir # Verify
hash verify -b hashes.db.xz -d /path/to/dir # Compressed
hash verify -b hashes.db -d /path/to/dir --json # JSON
```
Output shows: Matches, Mismatches, Missing files, New files
### Compare Databases
Compare two hash databases to identify changes, moves, and differences:
```bash
hash compare db1.txt db2.txt # Compare two databases
hash compare db1.txt db2.txt -b report.txt # Save report to file
hash compare db1.txt db2.txt --format json # JSON output
hash compare db1.txt db2.txt --format hashdeep # Hashdeep audit format
hash compare db1.txt.xz db2.txt.xz # Compare compressed databases
hash compare db1.txt db2.txt.xz # Mix compressed and plain
```
Output shows:
- **Unchanged**: Files with same hash in both databases
- **Changed**: Files with different hashes
- **Moved**: Files renamed or moved (same hash, different path)
- **Removed**: Files in DB1 but not DB2
- **Added**: Files in DB2 but not DB1
### Analyze Database
Analyze a hash database to view statistics, duplicates, and potential space savings:
```bash
hash analyze -d hashes.db # Analyze database
hash analyze -d hashes.db --json # JSON output
hash analyze -d hashes.db -b report.txt # Save report to file
hash analyze -d hashes.db.xz # Analyze compressed database
```
Output shows:
- **Database info**: Path, format (standard/hashdeep), size
- **Summary**: Total files, unique hashes, algorithms used
- **File sizes**: Total size of all files (hashdeep format only)
- **Duplicates**: Number of duplicate groups, duplicate files, potential space savings
### Deduplicate Files
Find and report duplicate files based on hash comparison:
```bash
hash dedup -d /path/to/dir # Find duplicates (dry-run)
hash dedup -d /path/to/dir -b report.txt # Save report to file
hash dedup -d /path/to/dir -f # Fast mode
hash dedup -d /path/to/dir --json # JSON output
```
Output shows duplicate groups with file paths and sizes.
### Benchmark & List
```bash
hash benchmark # Benchmark all algorithms
hash benchmark -s 500 # Custom data size
hash list # List algorithms
hash list --json # JSON output
```
## Command-Line Options
| Command | Option | Description |
|---------|--------|-------------|
| | `FILE` | File or wildcard pattern to hash (omit for stdin) |
| | `-t, --text ` | Hash text string |
| | `-a, --algorithm ` | Algorithm (default: blake3) |
| | `-b, --output ` | Write to file |
| | `-f, --fast` | Fast mode (samples 300MB) |
| | `--json` | JSON output |
| scan | `-d, --directory ` | Directory or wildcard pattern to scan |
| | `-a, --algorithm ` | Algorithm (default: blake3) |
| | `-b, --database ` | Output database |
| | `--hdd` | Sequential mode for old HDDs (default: parallel) |
| | `-f, --fast` | Fast mode |
| | `--format ` | standard or hashdeep |
| | `--compress` | LZMA compression |
| | `--json` | JSON output |
| verify | `-b, --database ` | Database file or wildcard pattern |
| | `-d, --directory ` | Directory or wildcard pattern to verify |
| | `--json` | JSON output |
| compare | `DATABASE1` | First database file (supports .xz) |
| | `DATABASE2` | Second database file (supports .xz) |
| | `-b, --output ` | Write report to file |
| | `--format ` | plain-text, json, or hashdeep |
| analyze | `-d, --database ` | Database file to analyze (supports .xz) |
| | `-b, --output ` | Write report to file |
| | `--json` | JSON output |
| dedup | `-d, --directory ` | Directory to scan for duplicates |
| | `-f, --fast` | Fast mode |
| | `-b, --output ` | Write report to file |
| | `--json` | JSON output |
| benchmark | `-s, --size ` | Data size (default: 100) |
| | `--json` | JSON output |
## .hashignore
Exclude files using gitignore-style patterns:
```bash
cat > /path/to/dir/.hashignore << 'EOF'
*.log
*.tmp
build/
node_modules/
!important.log
EOF
hash scan -d /path/to/dir -a sha256 -b hashes.db
```
Patterns: `*.ext`, `dir/`, `!pattern`, `#comments`, `**/*.ext`
## Output Formats
**Standard** (default):
```
```
**Hashdeep**: CSV format with file size, compatible with hashdeep tool
**JSON**: Structured output for automation
## Performance
| Algorithm | Throughput | Use Case |
|-----------|-----------|----------|
| xxHash3 | 10-30 GB/s | Non-crypto, max speed |
| BLAKE3 | 1-3 GB/s | Crypto, fastest |
| SHA-512 | 600-900 MB/s | Crypto, 64-bit |
| SHA-256 | 500-800 MB/s | Crypto, common |
| SHA3-256 | 200-400 MB/s | Post-quantum |
**Tips:**
- Parallel processing is enabled by default (2-4x faster on multi-core)
- Use `--hdd` for old mechanical drives (sequential processing)
- Use `-f` for large files (10-100x faster)
- BLAKE3 is the default algorithm (fastest cryptographic hash)
- Compile with `RUSTFLAGS="-C target-cpu=native"` for best performance
**Fast Mode Speedup:**
- 1 GB: ~7x faster
- 10 GB: ~67x faster
- 100 GB: ~667x faster
## Fast Mode
Samples 300MB (first/middle/last 100MB) instead of entire file.
**Good for:** Quick checks, large files, backups
**Not for:** Full verification, forensics, small files
## Common Use Cases
```bash
# Verify downloaded file
hash downloaded-file.iso -a sha256
# Backup verification (parallel by default)
hash scan -d /data -b backup.db
hash verify -b backup.db -d /data
# Backup on old HDD (sequential processing)
hash scan -d /data -b backup.db --hdd
hash verify -b backup.db -d /data
# Monitor changes
hash scan -d /etc/config -b baseline.db
hash verify -b baseline.db -d /etc/config
# Compare two snapshots
hash scan -d /data -b snapshot1.db
# ... time passes ...
hash scan -d /data -b snapshot2.db
hash compare snapshot1.db snapshot2.db -b changes.txt
# Analyze database for duplicates and stats
hash analyze -d media.db # View stats and duplicates
hash analyze -d media.db --json # JSON output for automation
# Find duplicates in directory
hash dedup -d /media # Quick duplicate scan
# Forensic analysis
hash scan -d /evidence -a sha3-256 -b evidence.db
hash scan -d /evidence -a sha256 -b evidence.txt --format hashdeep
# Quick checksums (blake3 is default)
hash large-backup.tar.gz -f
hash scan -d /backups -b checksums.db -f
# Automation
hash verify -b hashes.db -d /data --json | jq '.report.mismatches'
hash compare db1.db db2.db --format json | jq '.summary'
```
## Algorithm Selection
**Recommended:**
- SHA-256: Widely supported, good security
- BLAKE3: Fastest cryptographic hash
- SHA3-256: Post-quantum resistant
**Deprecated:**
- MD5, SHA-1: Use only for compatibility
**Non-crypto (trusted environments):**
- xxHash3/128: Maximum speed
## SIMD Optimization
Automatic support for SSE, AVX, AVX2, AVX-512 (x86_64) and NEON (ARM).
Verify: `cargo test --release --test simd_verification -- --nocapture`
## Wildcard Patterns
Supported patterns:
- `*` - Matches any number of characters (e.g., `*.txt`, `file*`)
- `?` - Matches exactly one character (e.g., `file?.bin`)
- `[...]` - Matches any character in brackets (e.g., `[abc]*.jpg`)
**Examples:**
```bash
hash "*.txt" -a sha256 # All .txt files in current dir
hash "data/*.bin" -a sha256 # All .bin files in data/
hash "file?.txt" -a sha256 # file1.txt, fileA.txt, etc.
hash "[abc]*.jpg" -a sha256 # Files starting with a, b, or c
hash scan -d "backup/*/data" -a sha256 -b db.txt # Multiple directories
hash verify -b "*.db" -d "data/*" # All .db files against all data dirs
```
**Notes:**
- Patterns are expanded by the shell or the application
- If no files match, an error is displayed
- Multiple matches are processed in sorted order
- For scan/verify with multiple directories, results are aggregated
## Troubleshooting
| Issue | Solution |
|-------|----------|
| Unsupported algorithm | Run `hash list` to see available algorithms |
| Permission errors | Use `sudo hash scan -d /protected/dir ...` |
| Slow performance | Use `-p` for parallel, `-f` for fast mode, or BLAKE3 |
| Fast mode not working | Fast mode only works with files (not stdin/text) |
| .hashignore not working | Check file location: `/path/to/dir/.hashignore` |
| Wildcard pattern not matching | Ensure pattern is quoted (e.g., `"*.txt"` not `*.txt`) |
| No files match pattern | Check pattern syntax and file locations |
## Contributing
We welcome contributions to QuicHash! To contribute, you must certify that you have the right to submit your contribution and agree to license it under the project's dual MIT/Apache-2.0 license.
### Developer Certificate of Origin (DCO)
QuicHash uses the [Developer Certificate of Origin (DCO)](https://developercertificate.org/) process. This is a lightweight way for contributors to certify that they wrote or otherwise have the right to submit code or documentation to an open source project.
#### Inbound = Outbound License
All contributions to QuicHash are made under the same dual MIT/Apache-2.0 license as the project itself. By signing off on your commits, you agree that your contributions will be licensed under these same terms, with no additional restrictions.
#### How to Sign Off Commits
Contributors sign-off that they adhere to these requirements by adding a Signed-off-by line to commit messages.
```
This is my commit message
Signed-off-by: Random J Developer
```
Git even has a `-s` command line option to append this automatically to your commit message:
```bash
$ git commit -s -m 'This is my commit message'
```
## License
QuicHash is dual-licensed under either:
- **MIT License** ([LICENSE-MIT](LICENSE-MIT) or http://opensource.org/licenses/MIT)
- **Apache License, Version 2.0** ([LICENSE-APACHE](LICENSE-APACHE) or http://www.apache.org/licenses/LICENSE-2.0)
at your option.