https://github.com/stabla/syntegrity
Compute cryptographic hashes to follow your system structure integrity.
- Host: GitHub
- URL: https://github.com/stabla/syntegrity
- Owner: stabla
- Created: 2025-07-12T09:32:56.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2025-07-13T12:27:03.000Z (11 months ago)
- Last Synced: 2025-08-15T18:20:10.905Z (10 months ago)
- Language: Python
- Homepage:
- Size: 11.7 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Syntegrity
Project is currently under development.
A fast Python tool that computes cryptographic hashes for files and folders to check system integrity. It generates two different hashes for each folder - one for content and one for structure.
## What it does
Syntegrity scans directories recursively and computes:
- Individual file hashes (SHA-256)
- Folder content hashes (hash1) - combines all file and subfolder hashes
- Folder structure hashes (hash2) - based on folder metadata and organization
This lets you detect both content changes and structural modifications in your file system.
## Features
- **Fast processing** - Uses parallel processing and memory mapping for large files
- **Smart caching** - Saves computed hashes to avoid re-processing unchanged files
- **Dual hash system** - Separate hashes for content vs structure integrity
- **Recursive scanning** - Processes entire directory trees automatically
- **Error handling** - Gracefully handles permission errors and missing files
## Quick start
```bash
# Run the analyzer
python3 analyzer.py
```
## Output format
The tool outputs file hashes and folder hashes in this format:
```
Processing files:
/home/test-syntegrity/file1: 32c66107f0f4f2053128e519681fc8e88806d0d2b17607ce9f2362aff66ad6c7
/home/test-syntegrity/file2: 85df9a7c92f2e8c562629361ed51d54efb76e0f12ffd2a588f25f93a29d2a43e
Processing folders:
test-syntegrity:[content_hash];[structure_hash]
folder1:[content_hash];[structure_hash]
```
For folders, the format is: `foldername:[hash1];[hash2]`
- **hash1** = content integrity (all files and subfolders)
- **hash2** = structure integrity (folder metadata and organization)
## How it works
### Hash1 (Content Hash)
Computes a hash over the hashes of every file and subfolder inside the folder, recursively. A change to the content of any file anywhere below the folder therefore changes its hash1.
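A minimal sketch of a recursive content hash. It assumes each folder's hash1 is the SHA-256 of its children's hex digests taken in sorted name order. The exact combination rule used by `analyzer.py` is not documented here, and `file_sha256` and `content_hash` are hypothetical names:

```python
import hashlib
import os


def file_sha256(path: str) -> str:
    """Return the hex SHA-256 digest of a file, read in 64 KiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def content_hash(folder: str) -> str:
    """Hypothetical hash1: SHA-256 over the digests of all children, recursively.

    Children are visited in sorted name order so the result does not depend
    on the order in which the file system happens to list them.
    """
    h = hashlib.sha256()
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if os.path.islink(path):
            continue  # skip symlinks so link cycles cannot cause infinite recursion
        if os.path.isdir(path):
            child = content_hash(path)  # a subfolder contributes its own hash1
        elif os.path.isfile(path):
            child = file_sha256(path)
        else:
            continue  # skip sockets, FIFOs, device files
        h.update(child.encode("ascii"))
    return h.hexdigest()
```

In this sketch only digests are hashed, not names, so a rename leaves hash1 unchanged unless it alters the sorted order; catching renames is the job of hash2. An empty folder also hashes to the same value as an empty file (the SHA-256 of empty input).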
### Hash2 (Structure Hash)
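Computes a hash of the folder's immediate structure: file names, sizes, subfolder names, and modification times. This detects structural changes such as files being added, removed, renamed, or moved in or out. A permission change (`chmod`) alters neither size nor modification time, so it is only detected if permission bits are also part of the hashed metadata.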
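A minimal sketch of a structure hash built from the metadata listed above. It assumes one record per immediate child (name, entry type, size, modification time), sorted and fed to SHA-256; the record layout and the name `structure_hash` are hypothetical:

```python
import hashlib
import os


def structure_hash(folder: str) -> str:
    """Hypothetical hash2: SHA-256 over metadata of the folder's immediate entries.

    File contents are never read and subfolders are not descended into.
    """
    records = []
    with os.scandir(folder) as it:
        for entry in it:
            st = entry.stat(follow_symlinks=False)  # metadata of the entry itself
            if entry.is_symlink():
                kind = "l"
            elif entry.is_dir(follow_symlinks=False):
                kind = "d"
            else:
                kind = "f"
            # NUL separators cannot appear in file names, so fields stay unambiguous
            records.append(f"{entry.name}\0{kind}\0{st.st_size}\0{st.st_mtime_ns}")
    records.sort()  # entry names are unique, so this orders records by name
    h = hashlib.sha256()
    for record in records:
        # surrogateescape keeps undecodable Linux file names hashable
        h.update(record.encode("utf-8", "surrogateescape") + b"\n")
    return h.hexdigest()
```

This sketch does not record `st_mode`; appending it to each record would make permission changes visible as well. Because modification times are included, editing a file in place also changes its parent's hash2.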
### Performance optimizations
- Uses multiprocessing to spread hashing across CPU cores (capped at 8 workers by default)
- Memory maps large files (>1MB) for faster reading
- Caches results to avoid re-computing unchanged files
- Single-pass directory discovery reduces I/O operations
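A sketch of the caching idea. It assumes the cache is a pickled dict (the README names `.hash_cache.pkl` but not its layout) mapping an absolute path to `((size, mtime_ns), digest)`; `load_cache`, `save_cache` and `cached_file_hash` are hypothetical names:

```python
import hashlib
import os
import pickle

CACHE_PATH = ".hash_cache.pkl"  # file name from the README; layout below is assumed


def load_cache(path: str = CACHE_PATH) -> dict:
    """Load the cache dict, or start empty if it is missing or unreadable."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except (OSError, pickle.UnpicklingError, EOFError):
        return {}


def save_cache(cache: dict, path: str = CACHE_PATH) -> None:
    """Write the whole cache dict back to disk."""
    with open(path, "wb") as f:
        pickle.dump(cache, f)


def cached_file_hash(path: str, cache: dict) -> str:
    """Return a file's SHA-256, reusing the cached digest if size and mtime match."""
    st = os.stat(path)
    key = os.path.abspath(path)
    stamp = (st.st_size, st.st_mtime_ns)
    hit = cache.get(key)
    if hit is not None and hit[0] == stamp:
        return hit[1]  # unchanged since last run: skip reading the file
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    digest = h.hexdigest()
    cache[key] = (stamp, digest)  # one entry per path, replaced when it changes
    return digest
```

A size-and-mtime key trusts that every content change also touches one of the two; an edit that preserves both (for example, one followed by `touch -d` restoring the old timestamp) would be served from the cache. `pickle.load` can execute code embedded in a crafted file, so a cache like this should only be loaded from a location you control.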
## Configuration
Edit the directories list in `main()`:
```python
directories_to_process = [
    "/home/test-syntegrity",
    "/etc/config",
    "/var/log",
]
```
## Use cases
### System monitoring
```bash
# Create baseline
python3 analyzer.py > baseline.txt
# Later check for changes
python3 analyzer.py > current.txt
diff baseline.txt current.txt
```
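`diff` reports which lines changed; a short script can also tell content changes (hash1) apart from structure changes (hash2). A hypothetical sketch, assuming the report layout shown in the output format section (`path: hash` under `Processing files:`, `name:hash1;hash2` under `Processing folders:`, with optional square brackets around each hash):

```python
import sys


def parse_report(lines):
    """Split analyzer output into {file_path: hash} and {folder: (hash1, hash2)}."""
    files, folders = {}, {}
    section = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line == "Processing files:":
            section = "files"
        elif line == "Processing folders:":
            section = "folders"
        elif section == "files" and ": " in line:
            path, digest = line.rsplit(": ", 1)
            files[path] = digest
        elif section == "folders" and ":" in line:
            name, hashes = line.rsplit(":", 1)
            hash1, _, hash2 = hashes.partition(";")
            folders[name] = (hash1.strip("[]"), hash2.strip("[]"))
    return files, folders


def compare(baseline, current):
    """Yield one human-readable line per difference between two parsed reports."""
    old_files, old_folders = baseline
    new_files, new_folders = current
    for path in sorted(old_files.keys() | new_files.keys()):
        if path not in new_files:
            yield f"file removed: {path}"
        elif path not in old_files:
            yield f"file added: {path}"
        elif old_files[path] != new_files[path]:
            yield f"file content changed: {path}"
    for name in sorted(old_folders.keys() | new_folders.keys()):
        if name not in new_folders:
            yield f"folder removed: {name}"
        elif name not in old_folders:
            yield f"folder added: {name}"
        else:
            (c1, s1), (c2, s2) = old_folders[name], new_folders[name]
            if c1 != c2:
                yield f"folder content changed: {name}"
            if s1 != s2:
                yield f"folder structure changed: {name}"


def main(argv):
    """Print the differences between two saved reports: main([baseline, current])."""
    with open(argv[0]) as a, open(argv[1]) as b:
        for change in compare(parse_report(a), parse_report(b)):
            print(change)


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1:])
```

Saved as, say, `compare_reports.py` (a hypothetical name), it would run as `python3 compare_reports.py baseline.txt current.txt`. Folder lines are keyed by folder name only, so two folders with the same name in different trees would overwrite each other in this sketch.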
### File integrity checking
- Detect unauthorized file modifications
- Monitor system configuration changes
- Verify backup integrity
- Ensure build artifacts haven't changed
## Troubleshooting
### Empty hash collisions
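Every empty file has the same SHA-256 digest (`e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855`), so identical hashes for empty files are expected. If non-empty files or folders show the same hash, the cache may be stale; clear it: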
```bash
rm .hash_cache.pkl
```
### Permission errors
The script handles permission errors gracefully and logs them to stderr. Check the error output for details.
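A standard-library sketch of that behaviour: `os.walk` reports directories it cannot list through its `onerror` callback, and per-file `OSError`s are caught so the scan continues. The message format and the names `hash_tree` and `log_error` are hypothetical, not taken from `analyzer.py`:

```python
import hashlib
import os
import sys


def log_error(err: OSError) -> None:
    """Report an unreadable path on stderr and keep going."""
    print(f"warning: skipping {err.filename}: {err.strerror}", file=sys.stderr)


def hash_tree(top: str) -> dict:
    """Return {path: sha256} for every readable file under top.

    Directories that cannot be listed are reported via os.walk's onerror
    callback; files that vanish or cannot be opened are reported and skipped.
    """
    results = {}
    for dirpath, _dirnames, filenames in os.walk(top, onerror=log_error):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                h = hashlib.sha256()
                with open(path, "rb") as f:
                    for chunk in iter(lambda: f.read(65536), b""):
                        h.update(chunk)
                results[path] = h.hexdigest()
            except OSError as err:  # PermissionError, FileNotFoundError, ...
                log_error(err)
    return results
```

Because errors go to stderr, redirecting stdout to `baseline.txt` keeps the report clean while warnings still appear on the terminal.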
### Performance tuning
Adjust the worker count based on your system:
```python
from multiprocessing import cpu_count

max_workers = min(cpu_count(), 8)  # default: at most 8 workers
```
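A sketch of how `max_workers` might feed a process pool, using `concurrent.futures.ProcessPoolExecutor` from the standard library. Whether `analyzer.py` uses this class or `multiprocessing.Pool` is not stated, and `sha256_file` and `hash_files_parallel` are hypothetical names:

```python
import hashlib
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import cpu_count


def sha256_file(path: str) -> tuple:
    """Worker: return (path, hex digest). Defined at top level so it can be pickled."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB reads
            h.update(chunk)
    return path, h.hexdigest()


def hash_files_parallel(paths, max_workers=None) -> dict:
    """Hash files across processes; defaults to min(cpu_count(), 8) workers."""
    if max_workers is None:
        max_workers = min(cpu_count(), 8)
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        # chunksize batches several paths per task to cut inter-process overhead
        return dict(pool.map(sha256_file, paths, chunksize=16))
```

Call `hash_files_parallel` from under an `if __name__ == "__main__":` guard: on platforms that start workers with `spawn` (Windows, macOS), each worker re-imports the main module.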
## Technical details
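- **Time complexity**: O(n + B), where n = number of files and B = total bytes hashed (each uncached file is read once)
- **Space complexity**: O(n) for the stored hashes and cache, plus per-worker read buffers (p = number of parallel workers)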
- **Hash algorithm**: SHA-256
- **Cache location**: `.hash_cache.pkl`
The script adapts its approach based on file size:
- Small files (<1MB): Chunked reading
- Large files (1-10MB): Memory mapping
- Very large files (>10MB): Chunked memory mapping
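A sketch of that size-based dispatch with the README's 1 MB and 10 MB thresholds. It reads "chunked memory mapping" as mapping the whole file once and hashing it in fixed-size slices; the tool may instead map the file window by window. The chunk sizes and the name `hash_file` are assumptions:

```python
import hashlib
import mmap
import os

MB = 1024 * 1024
CHUNK = 64 * 1024      # read size for small files (assumed)
MMAP_WINDOW = 8 * MB   # slice size for very large mapped files (assumed)


def hash_file(path: str) -> str:
    """SHA-256 of a file, choosing a read strategy by file size."""
    size = os.path.getsize(path)
    h = hashlib.sha256()
    with open(path, "rb") as f:
        if size < MB:
            # Small file (also covers empty files, which cannot be mmapped).
            for chunk in iter(lambda: f.read(CHUNK), b""):
                h.update(chunk)
        else:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                if size <= 10 * MB:
                    # Large file: hash the whole mapping in one call.
                    h.update(mm)
                else:
                    # Very large file: feed the mapping in fixed-size slices.
                    for offset in range(0, len(mm), MMAP_WINDOW):
                        h.update(mm[offset:offset + MMAP_WINDOW])
    return h.hexdigest()
```

All three branches produce the same digest for the same bytes; only the I/O pattern differs.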
## Requirements
- Python 3.7+
- Linux/Unix system (for optimal performance)
- Standard library modules only (no external dependencies)
## License
MIT License - see LICENSE file for details.