# BinSearch

A Python module for binary search in sorted files.

Repository: https://github.com/ibnesayeed/binsearch (MIT License)

## Objective

Given a sorted file, perform a binary search on disk to determine whether a given lookup key is present in the file.
The file can contain fielded data, and matching may need to be performed on only a few selected fields.
It may also be desirable to perform a prefix match, where the lookup key is a prefix of one or more records in the file.
The search needs to be able to report no match, all matches, any match, the first match, or the last match.
If start and/or end byte positions are provided, the search should only be performed within that range.

## Functions

```
find(fh, key, **kw)       => None | ANY_MATCHED_LINE
find_first(fh, key, **kw) => None | FIRST_MATCHED_LINE
find_last(fh, key, **kw)  => None | LAST_MATCHED_LINE
find_all(fh, key, **kw)   => ITERATOR_OVER_ALL_MATCHED_LINES
```

## Parameters

* `fh` - File handle, open for binary reading
* `key` - Bytes, search key
* `fields` - Int, number of leading fields in each line to match against (Default: `None`)
* `matcher` - Function that extracts the portion of each line read to use for matching; overrides the `fields` param (Default: `line.strip()`)
* `prefix` - Boolean, whether to perform a prefix match instead of an exact match (Default: `False`)
* `prefix_boundary` - Bytes, only consider a prefix match if the character following the prefix is a boundary character (Default: `b''`)
* `start` - Int, byte position where the search range begins (Default: `0`)
* `end` - Int, byte position where the search range ends (Default: the size of the file)

## Use Cases

It can be useful in many places; the following are a few where we currently need it:

* [InterPlanetary Wayback (IPWB)](https://github.com/oduwsdl/ipwb) replay index search
* [MementoMap](https://github.com/oduwsdl/MementoMap) prefix search
* A built-in CLI tool to perform binary search on files

## Considerations

* Should the caller provide a file path (and let the API handle opening and closing the file) or an open file handle?
  * If the API handles opening and closing, it will open and close the file on every request.
  * If the caller provides a file handle, it can be reused, but the caller needs to write a couple of extra lines.
* Should the API check whether the file is sorted before performing a lookup?
  * If the input file is not sorted, the search functions may behave in unexpected ways.
  * Checking for sortedness has linear complexity and requires reading the whole file once; doing it on every search call would defeat the purpose of binary search.
  * Perhaps we can provide a separate helper function for users who want to verify sortedness once, as a bootstrapping step, before performing searches.
* Should the API work for reverse-sorted files too?
  * It is doable without any efficiency cost, but I do not see enough motivation to add code complexity unless there are compelling use cases.
* Should the `find_all()` function return `None` if no match is found, or an empty iterator that stops at the first iteration (i.e., yields zero items)?
  * Returning an empty iterator seems the saner approach, but the caller will need to iterate over it at least once to find out whether there was a match.
  * Returning `None` makes the return type polymorphic, and a call to the function cannot be used directly in a loop without first checking for `None`.
* Should we use an existing implementation?
  * There are a few similar implementations in Python, but their licenses are not permissive, so they cannot be used in projects with permissive licenses.
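
To illustrate the `find_first()` and `find_last()` contracts from the Functions section, here is a minimal sketch, not this module's actual code. It assumes newline-terminated lines, that `start` and `end` fall on line boundaries, and that the file is sorted by the bytes `matcher` returns; the helper names `_file_size`, `_line_after`, and `_bisect` are hypothetical. It bisects over byte offsets: after seeking to a probe position it discards the partial line there and compares the next whole line, so the file is never loaded into memory.

```python
import io


def _file_size(fh):
    # Hypothetical helper: Python file objects have no size() method,
    # so seek to the end and read the position instead.
    fh.seek(0, io.SEEK_END)
    return fh.tell()


def _line_after(fh, pos, start):
    """Return (offset, line) for the first whole line starting at or after pos."""
    if pos <= start:
        fh.seek(start)
    else:
        fh.seek(pos - 1)
        fh.readline()  # discard the rest of the line containing byte pos - 1
    return fh.tell(), fh.readline()


def _bisect(fh, key, start, end, matcher, right):
    """Offset of the first line whose match key is >= key (> key if right)."""
    lo, hi = start, end
    while lo < hi:
        mid = (lo + hi) // 2
        offset, line = _line_after(fh, mid, start)
        if offset >= end:
            hi = mid  # no whole line left in range past this probe
            continue
        k = matcher(line)
        if (k > key) if right else (k >= key):
            hi = mid
        else:
            lo = mid + 1
    return min(_line_after(fh, lo, start)[0], end)


def find_first(fh, key, matcher=bytes.strip, start=0, end=None):
    end = _file_size(fh) if end is None else end
    offset = _bisect(fh, key, start, end, matcher, right=False)
    if offset < end:
        fh.seek(offset)
        line = fh.readline()
        if matcher(line) == key:
            return line
    return None


def find_last(fh, key, matcher=bytes.strip, start=0, end=None):
    end = _file_size(fh) if end is None else end
    first = _bisect(fh, key, start, end, matcher, right=False)
    stop = _bisect(fh, key, start, end, matcher, right=True)
    last = None
    fh.seek(first)
    # Every line in [first, stop) has a key equal to `key`; keep the final one.
    while fh.tell() < stop:
        last = fh.readline()
    return last
```

For example, with `data = io.BytesIO(b"a 1\nb 1\nb 2\nc 1\n")` and `matcher=lambda line: line.split()[0]`, `find_first(data, b"b", matcher=...)` returns `b"b 1\n"` and `find_last` returns `b"b 2\n"`. This `find_last()` walks forward over the run of equal keys between the two bisection points, a simplification whose cost grows with the number of matches; a strictly logarithmic version would instead locate the line that ends at the upper bound.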
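
The `fields`, `prefix`, and `prefix_boundary` parameters from the Parameters section could be implemented as two small pieces: one that extracts the comparison key from a line and one that decides whether that key matches. The sketch below is hypothetical: the names `make_matcher` and `is_match`, the single-space field separator, and the reading of `prefix_boundary` as a set of allowed next bytes are all assumptions rather than this module's documented behavior.

```python
def make_matcher(fields=None, separator=b" "):
    """Build a function that extracts the part of a line used for matching.

    Hypothetical helper; the field `separator` is an assumption.
    """
    if fields is None:
        return bytes.strip  # default: compare the whole stripped line

    def matcher(line):
        # Keep only the first `fields` fields of the stripped line.
        parts = line.strip().split(separator, fields)
        return separator.join(parts[:fields])

    return matcher


def is_match(candidate, key, prefix=False, prefix_boundary=b""):
    """Exact or prefix comparison of an extracted candidate against key."""
    if not prefix:
        return candidate == key
    if not candidate.startswith(key):
        return False
    if not prefix_boundary or len(candidate) == len(key):
        # Assumption: with no boundary set, or an exact match, accept it.
        return True
    # Indexing bytes yields an int; `in` checks it against the boundary bytes.
    return candidate[len(key)] in prefix_boundary
```

In a file sorted by these keys, all lines that start with a given key are contiguous, so a prefix search can reuse the lower-bound step of an exact search and then scan that block, keeping the lines for which `is_match()` holds. With a boundary set, the accepted lines need not be adjacent within the block (`/` sorts before `0`, which sorts before `?`), so such a scan should run to the end of the block rather than stop at the first rejected line.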
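
The Considerations section suggests a separate helper for verifying sortedness once, up front. A minimal sketch of such a helper follows; the name `is_sorted` and its `matcher` parameter are hypothetical. It makes one linear pass and compares consecutive keys with Python's byte-wise ordering, the same ordering a bisection over `bytes` keys relies on.

```python
import io


def is_sorted(fh, matcher=bytes.strip):
    """Hypothetical helper: one linear pass checking keys never decrease."""
    fh.seek(0)
    previous = None
    for line in fh:  # binary file objects iterate line by line
        current = matcher(line)
        if previous is not None and current < previous:
            return False
        previous = current
    return True
```

Byte-wise comparison agrees with files sorted by byte value, for example with `LC_ALL=C sort`; a file sorted under a locale-aware collation may fail this check even though it looks sorted to a human.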
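
For the `find_all()` question under Considerations, the empty-iterator option can be sketched as a generator: it yields nothing when there is no match, so a call can go straight into a `for` loop. This is a hypothetical, self-contained sketch that matches whole stripped lines only and assumes `start` and `end` fall on line boundaries; `_first_offset` is an assumed helper name.

```python
import io


def _first_offset(fh, key, start, end):
    """Lower bound: byte offset of the first line whose stripped content is >= key."""

    def line_after(pos):
        # First whole line that starts at or after byte `pos`.
        if pos <= start:
            fh.seek(start)
        else:
            fh.seek(pos - 1)
            fh.readline()  # skip the rest of the line containing byte pos - 1
        return fh.tell(), fh.readline()

    lo, hi = start, end
    while lo < hi:
        mid = (lo + hi) // 2
        offset, line = line_after(mid)
        if offset >= end or line.strip() >= key:
            hi = mid
        else:
            lo = mid + 1
    return min(line_after(lo)[0], end)


def find_all(fh, key, start=0, end=None):
    """Yield each line equal to key; yields nothing at all when there is no match."""
    if end is None:
        fh.seek(0, io.SEEK_END)
        end = fh.tell()
    pos = _first_offset(fh, key, start, end)
    while pos < end:
        fh.seek(pos)  # re-seek each time, in case fh was used between yields
        line = fh.readline()
        if line.strip() != key:
            break  # the file is sorted, so the run of matches has ended
        pos += len(line)
        yield line
```

A caller that only needs to know whether anything matched can use `next(find_all(fh, key), None)`, which stops after the first matching line. The generator tracks its own byte position and seeks before every read, so other reads on the same `fh` between iterations do not derail it.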