https://github.com/geolffreym/rolling-sync
Rolling Hash File Diffing
- Host: GitHub
- URL: https://github.com/geolffreym/rolling-sync
- Owner: geolffreym
- Created: 2022-04-08T04:35:20.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2024-04-21T15:40:04.000Z (about 1 year ago)
- Last Synced: 2025-03-29T13:35:18.989Z (2 months ago)
- Language: Go
- Size: 10 MB
- Stars: 6
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Rolling Hash Algorithm
_Spec v5 (2022-04-04)_
Implement a rolling-hash-based file diffing algorithm. Given an original and an updated version of an input, it should return a description (a "delta") that can be used to upgrade the original version of the file into the new one. The delta identifies the chunks that:
- Can be reused from the original file
- Have been added or modified and thus need to be synchronized

A real-world use case for this kind of construct is a distributed file storage system, where it reduces bandwidth and storage needs. If a user has a local copy of a file stored in the cloud, changes between the two instances can be synchronized using the diff produced by the rolling hash.
A library that does a similar thing is [rdiff](https://linux.die.net/man/1/rdiff). You don't need to fulfill the patch part of the API, only signature and delta.
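The signature and delta steps described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual implementation: the names `weak`, `Sign`, `Delta`, `Op`, and `match` are hypothetical, the weak checksum is an rsync-style rolling sum, and SHA-256 as the strong hash is an assumption. It also assumes a fixed chunk size greater than zero and leaves a trailing partial block of the original unindexed.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// weak is an rsync-style rolling checksum; uint16 math wraps mod 2^16.
type weak struct{ a, b, n uint16 }

func newWeak(window []byte) weak {
	w := weak{n: uint16(len(window))}
	for _, c := range window {
		w.a += uint16(c) // running sum of the bytes
		w.b += w.a       // sum of the running sums
	}
	return w
}

// roll slides the window one byte: out leaves, in enters.
func (w *weak) roll(out, in byte) {
	w.a += uint16(in) - uint16(out)
	w.b += w.a - w.n*uint16(out)
}

type blockRef struct {
	index  int
	strong [sha256.Size]byte
}

// Signature maps a weak checksum to the original blocks producing it.
type Signature map[weak][]blockRef

// Sign indexes each full block (assumes size > 0; a partial tail is skipped).
func Sign(original []byte, size int) Signature {
	sig := Signature{}
	for i := 0; i+size <= len(original); i += size {
		blk := original[i : i+size]
		key := newWeak(blk)
		sig[key] = append(sig[key], blockRef{i / size, sha256.Sum256(blk)})
	}
	return sig
}

// Op reuses original block Block, or carries Literal bytes when Block is -1.
type Op struct {
	Block   int
	Literal []byte
}

// Delta scans updated with a rolling window and emits copy/literal ops.
func Delta(sig Signature, updated []byte, size int) []Op {
	var ops []Op
	var lit []byte // pending bytes not found in the original
	var w weak
	i, fresh := 0, true // fresh: hash the window from scratch
	for i+size <= len(updated) {
		win := updated[i : i+size]
		if fresh {
			w, fresh = newWeak(win), false
		}
		if idx, ok := match(sig, w, win); ok {
			if len(lit) > 0 {
				ops, lit = append(ops, Op{-1, lit}), nil
			}
			ops = append(ops, Op{Block: idx})
			i, fresh = i+size, true
			continue
		}
		lit = append(lit, updated[i])
		if i+size < len(updated) {
			w.roll(updated[i], updated[i+size])
		}
		i++
	}
	if lit = append(lit, updated[i:]...); len(lit) > 0 {
		ops = append(ops, Op{-1, lit})
	}
	return ops
}

// match confirms a weak hit with SHA-256 to rule out collisions.
func match(sig Signature, key weak, win []byte) (int, bool) {
	for _, r := range sig[key] {
		if r.strong == sha256.Sum256(win) {
			return r.index, true
		}
	}
	return 0, false
}

func main() {
	sig := Sign([]byte("aaaabbbbccccdddd"), 4)
	for _, op := range Delta(sig, []byte("aaaaXXbbbbdddd"), 4) {
		fmt.Printf("block=%d literal=%q\n", op.Block, op.Literal)
	}
}
```

In this example the updated input inserts `XX` after the first block and drops the `cccc` block, so the sketch reports blocks 0, 1, and 3 as reusable and only `XX` as a literal to transmit. Because the checksum is rolled one byte at a time, shifted original blocks are still found after an insertion.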
## Requirements
- The hashing function receives the data as a parameter. Keep any filesystem operations separate.
- Chunk size can be fixed or dynamic, but any sufficiently sized data must be split into at least two chunks.
- Should be able to recognize changes between chunks. Only the exact differing locations should be added to the delta.
- Well-written unit tests are enough to describe the operation; no UI is necessary.
## Checklist
1. Input/output operations are separated from the calculations
2. Detects chunk changes and/or additions
3. Detects chunk removals
4. Detects additions between chunks with shifted original chunks
## Usage
- Run tests: `make test`
- Build: `make build`
- Test coverage: `make coverage`
- Benchmark: `make benchmark`
- Code analysis: `make check`
- Compile: `make compile`
## Next
- Compare immutable strings against byte slices, following this [string vs byte benchmark](https://medium.com/@felipedutratine/in-golang-should-i-work-with-bytes-or-strings-8bd1f5a7fd48)