{"id":18729057,"url":"https://github.com/lichtso/rollinghashexperiments","last_synced_at":"2026-01-24T07:32:36.635Z","repository":{"id":151645401,"uuid":"100057053","full_name":"Lichtso/RollingHashExperiments","owner":"Lichtso","description":null,"archived":false,"fork":false,"pushed_at":"2017-08-17T15:08:26.000Z","size":6,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-05-19T20:33:51.995Z","etag":null,"topics":["rolling-hash-functions"],"latest_commit_sha":null,"homepage":null,"language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Lichtso.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-08-11T17:45:12.000Z","updated_at":"2017-08-11T17:49:33.000Z","dependencies_parsed_at":null,"dependency_job_id":"f5add8f8-6209-40aa-bccb-757e987cedd5","html_url":"https://github.com/Lichtso/RollingHashExperiments","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Lichtso/RollingHashExperiments","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Lichtso%2FRollingHashExperiments","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Lichtso%2FRollingHashExperiments/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Lichtso%2FRollingHashExperiments/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Lichtso%2FRollingHashExperiments/manifests","owner_url":"https://repos.ecosyste.ms/api/v
1/hosts/GitHub/owners/Lichtso","download_url":"https://codeload.github.com/Lichtso/RollingHashExperiments/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Lichtso%2FRollingHashExperiments/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28718960,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-24T05:53:42.649Z","status":"ssl_error","status_checked_at":"2026-01-24T05:53:41.698Z","response_time":89,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["rolling-hash-functions"],"created_at":"2024-11-07T14:25:31.849Z","updated_at":"2026-01-24T07:32:36.616Z","avatar_url":"https://github.com/Lichtso.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"# RollingHashExperiments\n\n## Interactive Demos\n[Example: search](https://lichtso.github.io/RollingHashExperiments/search.html) and [Example: diff](https://lichtso.github.io/RollingHashExperiments/diff.html).\nYou can experiment with text streams fetched via HTTP.\nThe text streams are segmented using a rolling hash function and then the resulting segments are dyed using a general hash function.\n\n## What is a [rolling hash](https://en.wikipedia.org/wiki/Rolling_hash) function?\nInstead of giving one output value for a given input sequence of data cells (e.g. 
bytes),\nit yields a value for each position in the sequence using a sliding window.\nWith some simple math (dynamic programming), these values can be calculated in linear time instead of quadratic time.\n\n## Why would you need a hash value for each position in the input sequence?\nThe two most popular use cases for rolling hashes are search algorithms and diff tools.\nIn the case of a [search algorithm](https://en.wikipedia.org/wiki/Rabin–Karp_algorithm) you simply hash the needle once and\nthen let a rolling hash with a window of the size of the needle iterate over the haystack.\nBy doing so you can save a lot of string comparisons, as a mismatch in the hash values definitely indicates a mismatch at that position.\nIn the case of diff tools you hash both input sequences and then cut them into chunks whenever the hash outputs a specific value (like 0).\nThen the resulting chunks are hashed by a \"normal\" generic hash function.\nThat way, you can split the inputs into chunks of varying size based on their contents and\nthen find similarities as well as differences locally based on the output values of the generic hash function.\nThis would not be possible with uniformly sized chunks, as an insertion at the beginning would ruin all following hash values,\nbecause the borders of the chunks would not move with the content.\n\n## How does a rolling hash work?\nThere are multiple ways, but the easiest one to understand is to use a positional notation as the sliding window.\nFor example, if we take the number 123456, it is encoded in a positional notation using a sequence of 6 symbols and the base 10 (the number of different possible digits).\nWhat we actually have is: `1*10^5 + 2*10^4 + 3*10^3 + 4*10^2 + 5*10^1 + 6*10^0 = 123456`.\nIf we now want to move our window one position to the right, we have to remove the leftmost symbol and append a new one (here 7) on the right.\n- `123456 - 1*10^5 = 23456`\n- `23456 * 10 = 234560`\n- `234560 + 7 = 234567`\n- `2*10^5 + 3*10^4 + 4*10^3 + 5*10^2 + 6*10^1 + 7*10^0 = 234567`\n\nIn general the 
window is defined as `sum over all i in range(0, window.length): window[i] * power(base, window.length-1-i)`.\nAnd the window shift is performed by: `newHashValue = (lastHashValue - symbolToRemove * power(base, window.length-1)) * base + symbolToAdd;`\nTo get the output into a specific range and let it appear more chaotic, we take the modulo at the end.\nThe modulo can also be applied to intermediate results to keep the numbers small, because reducing modulo a fixed number is compatible with addition, subtraction and multiplication.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flichtso%2Frollinghashexperiments","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flichtso%2Frollinghashexperiments","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flichtso%2Frollinghashexperiments/lists"}