{"id":37150656,"url":"https://github.com/i5heu/hazzy","last_synced_at":"2026-01-14T17:48:26.493Z","repository":{"id":214098354,"uuid":"735665623","full_name":"i5heu/hazzy","owner":"i5heu","description":"a fuzzy hash designed to be able to search for duplicates fast","archived":false,"fork":false,"pushed_at":"2023-12-25T19:53:42.000Z","size":18151,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-06-21T19:56:31.141Z","etag":null,"topics":["approximating","deduplication","hash","hashing","hashing-algorithm"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/i5heu.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-12-25T18:34:10.000Z","updated_at":"2024-01-12T12:06:30.000Z","dependencies_parsed_at":null,"dependency_job_id":"332fccf7-4af6-4849-a392-771e09e6d866","html_url":"https://github.com/i5heu/hazzy","commit_stats":null,"previous_names":["i5heu/hazzy"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/i5heu/hazzy","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/i5heu%2Fhazzy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/i5heu%2Fhazzy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/i5heu%2Fhazzy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/i5heu%2Fhazzy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/i5heu","downlo
ad_url":"https://codeload.github.com/i5heu/hazzy/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/i5heu%2Fhazzy/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28428939,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-14T16:38:47.836Z","status":"ssl_error","status_checked_at":"2026-01-14T16:34:59.695Z","response_time":107,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["approximating","deduplication","hash","hashing","hashing-algorithm"],"created_at":"2026-01-14T17:48:25.668Z","updated_at":"2026-01-14T17:48:26.488Z","avatar_url":"https://github.com/i5heu.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"! WORK IN PROGRESS !\n\n\n# Hazzy\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\".media/logo.png\"  width=\"33%\"\u003e\n\u003c/p\u003e\n\n## Description\n\nHazzy is a Go package that offers a unique approach to file hashing, particularly useful for identifying duplicates in files, either wholly or partially. It achieves this by computing hash values in chunks, enabling users to compare parts of files for potential similarities. 
This feature is particularly beneficial in data deduplication, storage optimization, and efficient file management.\n\n## Features\n\n- **Compression Ratio Calculation**: Hazzy calculates the compression ratio of a file, providing insight into the complexity of the file's content.\n- **Chunk-Based Hashing**: Files are hashed in chunks (100KB and 1KB sizes), enabling partial file comparison and detection of similarities across different files.\n- **Format**: The hash output format is `(compression ratio).(hash of 100KB chunks).(hash of 1KB chunks)`.\n\n## Advantages Over Traditional Hashing Methods\n\n### Compared to Pure Fuzzy Hashing\n\n- **More Informative**: Hazzy not only identifies similar files but also quantifies the level of similarity with its compression ratio and detailed chunk-based hashes.\n- **Efficient Partial Matching**: Unlike traditional fuzzy hashes that provide a single hash value, Hazzy's chunk-based approach allows for more granular comparison, making it easier to locate which parts of the files are similar.\n\n### Compared to Bloom Trees\n\n- **Simpler Implementation**: Hazzy offers a more straightforward approach without the need for complex data structures like Bloom trees.\n- **Reduced False Positives**: By providing detailed chunk-based hashes, Hazzy reduces the likelihood of false positives that can occur with Bloom tree implementations in large datasets.\n- **Versatility in File Size**: Effective for both large and small files, Hazzy ensures accurate hashing and comparison regardless of file size.\n\n## Example\n\nThe following image produces this hash:\n\n\u003cimg src=\"./testData/smol.jpeg\"  
width=\"33%\"\u003e\n\n`4.Is.33PidPezAaudnCIjEcl0Nd5jjeOaOZL2iVhsDZMutapsNSYqe3LW2EiikBfVxYB1sWAXwmKqr0BxqAzSjbDO1Uy9krHUYjnridr5xajV72leTJLp6uFNZF1swoVkDgsiFyIZODlRgdz979lhLJU7jVmii8878wJZwgCPEBHs715C5FVlFJHjK3OHuMhbqueAPVSzBGmoUfj21T0FOb4qjfp0qn7OhDeoJ2WAM568KfhXqtZPgUvvHGHDf5n1iGz00Qvukiv8kxB9SYIFcuHILjxJ6L7SK4eNDhvjo0LR7ETyPUspX25aRiVEwhpdh2weRYj6RYkilaoFORCo4aS7QU3xLLCeP8di35LbVlxZw5HLDlKzwCGt6igYjIihoifSxwlHYOYS4Q9ujti907BCiPQlKKm8HqDFC9vqudGZMyeR14ybCcpO6c3Td6FjnTHXRILCvkCRkqsZYKeDq2mLHUUTZ2M6ORj8odKrsjSefIEjhrddnSsY7ODfgWmbc3aloYXZtnQezwDhcuEUbUezKbPYhfRglZj29MOYWriHS0Y4HnAAO9jkhrWQE9OylHf3XWuRHcjmn6Ilv605Jb1Uwer4SMWyWE9S1HD0q2qKor6HmmSywC\n`\n\n\n## Installation\n\nTo install Hazzy, use the following go get command:\n\n```bash\ngo get -u github.com/i5heu/hazzy\n```\n\n## Usage\n\nHere's a basic example of how to use Hazzy:\n\n! WORK IN PROGRESS !  \n\n```go\npackage main\n\nimport (\n    \"fmt\"\n    \"github.com/i5heu/hazzy\"\n)\n\nfunc main() {\n    // Generate hash from a file\n    hash, err := hazzy.GenerateHashFromFile(\"path/to/your/file\")\n    if err != nil {\n        fmt.Println(\"Error:\", err)\n        return\n    }\n    fmt.Println(\"File Hash:\", hash)\n\n    // Generate hash from a byte slice\n    data := []byte(\"example data\")\n    hash, err = hazzy.GenerateHashFromBytes(data)\n    if err != nil {\n        fmt.Println(\"Error:\", err)\n        return\n    }\n    fmt.Println(\"Data Hash:\", hash)\n}\n```\n\n## Contributing\n\nContributions are welcome! Please feel free to submit pull requests, open issues, or suggest improvements.\n\n## License\n\nhazzy (c) 2023 Mia Heidenstedt and contributors\n\nSPDX-License-Identifier: AGPL-3.0","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fi5heu%2Fhazzy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fi5heu%2Fhazzy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fi5heu%2Fhazzy/lists"}