# Byte-Oriented Binary Search

This is an algorithm for searching for a value inside a huge file. It does not load the whole file into memory, and it does not do a linear search over chunks loaded one at a time. Instead, it performs a binary search using pointers to the raw file data. The implementation is flexible: functions passed as parameters control how values are parsed, how the characters read are checked for validity, and how values are compared.

Splitting a file in half gives the middle byte, not the middle element, so the full element around that byte has to be recovered first. Because we are dealing with pointers to raw data, the pointer moves backwards and forwards looking for a delimiter. This program assumes that values are separated by newlines (either `\n` or `\r\n`). That way a complete line can be read by moving back to the start of the current line and then forward to the next newline.

The example below demonstrates the line-completion algorithm on lines formatted as `key:value`.

![Line-completion animation](./assets/completeLine.gif)

# Usage

### C++

```cpp
#include <cctype>
#include <cstdlib>
#include <iostream>
#include <string>

// bobs() and basicCompare() are provided by this repository.

// Parses the key (the part before ':') of a "key:value" line.
int parser(const std::string& s) {
    std::string key = s.substr(0, s.find(':'));
    return static_cast<int>(std::strtol(key.c_str(), nullptr, 10));
}

// Compares two parsed keys using the library's basicCompare().
int comparator(int a, int b) {
    return basicCompare(a, b);
}

// Accepts non-empty lines that start with an alphanumeric character or ':'.
bool validator(const std::string& s) {
    return !s.empty() && (std::isalnum(static_cast<unsigned char>(s[0])) || s[0] == ':');
}

int main() {
    std::string inputFile = "test.bin";
    int target = 1;

    std::string output = bobs(inputFile, target, parser, validator, comparator);
    std::cout << "Output: " << output << std::endl; // Done! In 4 steps. Output: 1:1

    return 0;
}
```

### Python

```python
# bobs() and basic_compare() are provided by this repository.

def parser(s: str) -> int:
    # The key is the part before ':' in a "key:value" line.
    return int(s.split(':')[0], 10)


def compare(a: int, b: int) -> int:
    return basic_compare(a, b)


def validator(s: str) -> bool:
    # Accept non-empty lines that start with an alphanumeric character or ':'.
    return len(s) > 0 and (s[0].isalnum() or s[0] == ':')


if __name__ == '__main__':
    file = 'test.bin'
    target = 12345
    output = bobs(file, target, parser, validator, compare)
    print(output)  # Done! In 2 steps. 12345:77
```
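To make the line-completion step and the surrounding search concrete, here is a minimal Python sketch of the same idea. It is not this repository's implementation. `complete_line`, `bobs_sketch` and this `basic_compare` are hypothetical names. The sketch assumes a UTF-8 file whose lines are sorted in ascending order of the parsed key, and a comparator called as `comparator(value, target)` that returns a negative, zero or positive number. It also adopts an assumed policy of skipping lines the validator rejects. Each iteration bisects the byte range, expands the middle byte to its full line, and discards the half that cannot contain the target.

```python
import os
import tempfile
from typing import BinaryIO, Callable, Optional, Tuple, TypeVar

T = TypeVar("T")


def basic_compare(a, b) -> int:
    # Hypothetical three-way comparison: <0 if a < b, 0 if equal, >0 if a > b.
    return (a > b) - (a < b)


def complete_line(f: BinaryIO, pos: int) -> Tuple[int, int, str]:
    """Expand byte offset `pos` to the whole line that contains it.

    Returns (start, end, text): `start` is the offset of the line's first byte,
    `end` is the offset just past its newline (or EOF), and `text` is the
    decoded line without its line terminator (LF or CRLF).
    """
    # Move backwards one byte at a time until the previous byte is a newline
    # (or the beginning of the file is reached).
    start = pos
    while start > 0:
        f.seek(start - 1)
        if f.read(1) == b"\n":
            break
        start -= 1
    # Read forwards from the line start up to and including the next newline.
    f.seek(start)
    raw = f.readline()
    end = start + len(raw)
    return start, end, raw.rstrip(b"\r\n").decode("utf-8")


def bobs_sketch(
    path: str,
    target: T,
    parser: Callable[[str], T],
    validator: Callable[[str], bool],
    comparator: Callable[[T, T], int],
) -> Optional[str]:
    """Binary search over a file whose lines are sorted by the parsed key.

    Invariant: `lo` and `hi` are always line-start offsets (or the file size),
    so [lo, hi) covers the whole lines that may still contain the target.
    """
    with open(path, "rb") as f:
        lo, hi = 0, os.path.getsize(path)
        while lo < hi:
            mid = (lo + hi) // 2  # the middle *byte*, not the middle line
            line_start, end, line = complete_line(f, mid)
            start = line_start
            # Assumed policy: skip forward past lines the validator rejects.
            while not validator(line) and end < hi:
                start, end, line = complete_line(f, end)
            if not validator(line):
                hi = line_start  # only rejected lines remain in [line_start, hi)
                continue
            cmp = comparator(parser(line), target)
            if cmp == 0:
                return line
            if cmp < 0:
                lo = end  # key too small: keep only the lines after this one
            else:
                hi = start  # key too large: keep only the lines before this one
    return None


if __name__ == "__main__":
    # Build a small sorted "key:value" file (keys 0..999, value = 2 * key).
    with tempfile.NamedTemporaryFile("wb", suffix=".bin", delete=False) as tmp:
        for k in range(1000):
            tmp.write(f"{k}:{2 * k}\n".encode("utf-8"))
        sample = tmp.name

    def parser(s: str) -> int:
        return int(s.split(":")[0], 10)

    def validator(s: str) -> bool:
        return len(s) > 0 and (s[0].isalnum() or s[0] == ":")

    print(bobs_sketch(sample, 123, parser, validator, basic_compare))  # 123:246
    os.remove(sample)
```

Because `lo` and `hi` always sit on line boundaries, every iteration removes at least one whole line, so the loop terminates even when the middle byte lands on a newline or on a rejected line. Stepping backwards one byte per `seek` keeps the sketch short but is slow for very long lines; reading backwards in fixed-size blocks would likely be faster.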