https://github.com/paolostivanin/fastfilecheck

FastFileCheck is a fast, multithreaded file integrity checker for Linux, using parallel processing and a lightweight database to quickly hash and verify large volumes of files.

hash hashing integrity-checker lmdb multithreading

Host: GitHub
URL: https://github.com/paolostivanin/fastfilecheck
Owner: paolostivanin
License: gpl-3.0
Created: 2024-11-04T08:01:17.000Z (11 months ago)
Default Branch: main
Last Pushed: 2024-12-19T09:24:24.000Z (10 months ago)
Last Synced: 2025-01-04T22:46:34.653Z (9 months ago)
Topics: hash, hashing, integrity-checker, lmdb, multithreading
Language: C
Homepage:
Size: 134 KB
Stars: 0
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# FastFileCheck
FastFileCheck is a high-performance, multithreaded file integrity checker for Linux. It uses parallel processing and a lightweight database to hash and verify large volumes of files quickly, so that their integrity can be tracked over time.

Features:
* Multithreaded processing: automatically adapts to available CPU cores for optimal performance.
* Flexible configuration: see example.conf for all configuration options.
* Efficient hashing: uses fast, non-cryptographic hashing (xxHash) to detect file changes.
* Lightweight database storage: stores file hashes in a compact, memory-mapped database (LMDB) for rapid access and minimal overhead. The following information is stored for each file:
  * Full file path
  * Hash
  * Inode number
  * Link count
  * Block count
* Three modes of operation:
  - add: to register new files in the database.
  - check: to verify files against stored information, flagging any mismatches.
  - update: to update the database with new information for existing files.
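
As a rough illustration of the per-file information listed above, the following C sketch gathers the inode number, link count, and block count with `stat(2)` and compares two records the way the check mode might. The `file_record` layout and the function names `record_from_path` and `record_matches` are hypothetical and are not taken from the FastFileCheck source. The `hash` field is left at zero here; the real tool fills it with an xxHash digest.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical record layout: the field names are illustrative only. */
typedef struct {
    char     path[4096];   /* full file path */
    uint64_t hash;         /* content hash, computed elsewhere (0 in this sketch) */
    ino_t    inode;        /* inode number */
    nlink_t  link_count;   /* number of hard links */
    blkcnt_t block_count;  /* number of 512-byte blocks allocated */
} file_record;

/* Fill the metadata fields of `out` from stat(2).
 * Returns 0 on success, -1 if the path is too long or stat fails. */
int record_from_path(const char *path, file_record *out)
{
    struct stat st;

    assert(path != NULL && out != NULL);
    if (strlen(path) >= sizeof out->path || stat(path, &st) != 0)
        return -1;

    memset(out, 0, sizeof *out);
    strcpy(out->path, path);   /* length was checked above */
    out->inode = st.st_ino;
    out->link_count = st.st_nlink;
    out->block_count = st.st_blocks;
    return 0;
}

/* Returns 1 when every stored field still matches the current one, else 0. */
int record_matches(const file_record *stored, const file_record *current)
{
    return stored->hash == current->hash
        && stored->inode == current->inode
        && stored->link_count == current->link_count
        && stored->block_count == current->block_count;
}
```

In this sketch, an add-style operation would store the result of `record_from_path`, and a check-style operation would build a fresh record for the same path and report a mismatch when `record_matches` returns 0.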

Design overview:
* Main thread (producer): traverses directories and feeds the queue (one thread is more than enough for most use cases).
* Dedicated consumer thread: manages the queue and distributes work to the thread pool.
* Worker threads: compute hashes in parallel.

This separation of concerns is efficient because:
* Directory traversal is I/O bound and works well in a single thread
* Queue management is centralized, preventing race conditions
* Hash computation is CPU-intensive and properly parallelized
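
The producer/worker split described above can be sketched with POSIX threads. This is a simplified illustration, not the project's implementation: the dedicated consumer thread and the thread pool are collapsed into workers that pull directly from one bounded queue, and the worker only counts paths where a real worker would hash the file. All names (`path_queue`, `process_paths`, `QUEUE_CAP`, `MAX_WORKERS`) are hypothetical. The worker count is taken from `sysconf(_SC_NPROCESSORS_ONLN)`, which is available on Linux.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

#define QUEUE_CAP   64   /* hypothetical bound on queued paths */
#define MAX_WORKERS 16   /* hypothetical cap on worker threads */

typedef struct {
    const char *items[QUEUE_CAP];
    size_t head, tail, count;
    int closed;                          /* set once the producer is done */
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
} path_queue;

static void queue_init(path_queue *q)
{
    q->head = q->tail = q->count = 0;
    q->closed = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

/* Producer side: blocks while the queue is full. */
static void queue_push(path_queue *q, const char *path)
{
    assert(path != NULL);                /* NULL is reserved as the "done" signal */
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAP)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->items[q->tail] = path;
    q->tail = (q->tail + 1) % QUEUE_CAP;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Tell the workers that no more paths will arrive. */
static void queue_close(path_queue *q)
{
    pthread_mutex_lock(&q->lock);
    q->closed = 1;
    pthread_cond_broadcast(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Worker side: returns NULL once the queue is closed and drained. */
static const char *queue_pop(path_queue *q)
{
    const char *path = NULL;
    pthread_mutex_lock(&q->lock);
    while (q->count == 0 && !q->closed)
        pthread_cond_wait(&q->not_empty, &q->lock);
    if (q->count > 0) {
        path = q->items[q->head];
        q->head = (q->head + 1) % QUEUE_CAP;
        q->count--;
        pthread_cond_signal(&q->not_full);
    }
    pthread_mutex_unlock(&q->lock);
    return path;
}

typedef struct { path_queue *queue; size_t processed; } worker_ctx;

static void *worker_main(void *arg)
{
    worker_ctx *ctx = arg;
    while (queue_pop(ctx->queue) != NULL)
        ctx->processed++;                /* a real worker would hash the file here */
    return NULL;
}

/* Feed n paths from the calling (producer) thread; return how many were processed. */
size_t process_paths(const char **paths, size_t n)
{
    long cores = sysconf(_SC_NPROCESSORS_ONLN);
    size_t nworkers = cores > 0 ? (size_t)cores : 1;
    pthread_t threads[MAX_WORKERS];
    worker_ctx ctx[MAX_WORKERS];
    path_queue q;
    size_t total = 0;

    if (nworkers > MAX_WORKERS)
        nworkers = MAX_WORKERS;
    queue_init(&q);
    for (size_t i = 0; i < nworkers; i++) {
        ctx[i].queue = &q;
        ctx[i].processed = 0;
        int rc = pthread_create(&threads[i], NULL, worker_main, &ctx[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (size_t i = 0; i < n; i++)
        queue_push(&q, paths[i]);
    queue_close(&q);
    for (size_t i = 0; i < nworkers; i++) {
        pthread_join(threads[i], NULL);
        total += ctx[i].processed;
    }
    pthread_mutex_destroy(&q.lock);
    pthread_cond_destroy(&q.not_empty);
    pthread_cond_destroy(&q.not_full);
    return total;
}
```

Because every queue operation happens under a single mutex, the producer and the workers never touch the queue state concurrently, which is the property the centralized queue management is meant to provide. Compile with `-pthread` on toolchains that require it.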
