{"id":19390839,"url":"https://github.com/tecnickcom/binsearch","last_synced_at":"2025-04-24T00:31:38.494Z","repository":{"id":66250101,"uuid":"112932041","full_name":"tecnickcom/binsearch","owner":"tecnickcom","description":"Search unsigned integers in sorted binary file","archived":false,"fork":false,"pushed_at":"2025-03-26T10:32:15.000Z","size":466,"stargazers_count":6,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-04-02T22:51:16.916Z","etag":null,"topics":["binary","c","c99","digital","fast","filesystem","golang","memory-mapped-file","python","search"],"latest_commit_sha":null,"homepage":null,"language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tecnickcom.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":"CODEOWNERS","security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"custom":["https://www.paypal.com/cgi-bin/webscr?cmd=_donations\u0026currency_code=GBP\u0026business=paypal@tecnick.com\u0026item_name=donation%20for%20binsearch%20project"]}},"created_at":"2017-12-03T13:53:54.000Z","updated_at":"2025-03-26T09:36:17.000Z","dependencies_parsed_at":"2024-04-16T14:39:13.137Z","dependency_job_id":"fe19da85-88bb-4ffa-9fe3-756853a82d86","html_url":"https://github.com/tecnickcom/binsearch","commit_stats":null,"previous_names":[],"tags_count":33,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tecnickcom%2Fbinsearch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tecnickcom%2Fbinsearch
/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tecnickcom%2Fbinsearch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tecnickcom%2Fbinsearch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tecnickcom","download_url":"https://codeload.github.com/tecnickcom/binsearch/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250539440,"owners_count":21447309,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["binary","c","c99","digital","fast","filesystem","golang","memory-mapped-file","python","search"],"created_at":"2024-11-10T10:23:34.170Z","updated_at":"2025-04-24T00:31:33.870Z","avatar_url":"https://github.com/tecnickcom.png","language":"C","funding_links":["https://www.paypal.com/cgi-bin/webscr?cmd=_donations\u0026currency_code=GBP\u0026business=paypal@tecnick.com\u0026item_name=donation%20for%20binsearch%20project"],"categories":[],"sub_categories":[],"readme":"# Binsearch\n\n*Fast binary search for columnar data formats.*\n\n[![Master Build Status](https://secure.travis-ci.org/tecnickcom/binsearch.png?branch=master)](https://travis-ci.org/tecnickcom/binsearch?branch=master)\n[![Master Coverage Status](https://coveralls.io/repos/tecnickcom/binsearch/badge.svg?branch=master\u0026service=github)](https://coveralls.io/github/tecnickcom/binsearch?branch=master)\n\n* **category**    Libraries\n* **license**     MIT (see LICENSE)\n* **author**      Nicola Asuni\n* **copyright**   2017-2024 Nicola Asuni - Tecnick.com\n* 
**link**        https://github.com/tecnickcom/binsearch\n\n\n## Description\n\nThe functions provided here search for unsigned integers in a binary file made of adjacent constant-length binary blocks sorted in ascending order.\n\nFor example, the first 4 bytes of each 8-byte block below represent a big-endian `uint32`.\nThe integers are sorted in ascending order.\n\n```\n2f 81 f5 77 1a cc 7b 43\n2f 81 f5 78 76 5f 63 b8\n2f 81 f5 79 ca a9 a6 52\n```\n\nThis binary representation can be used to encode sortable key-value data, even with nested keys.\n\nThe xxd command-line application can be used to convert a binary file to a hexdump and back.\nFor example:\n\n```\nxxd -p -c8 binaryfile.bin \u003e hexfile.txt\nxxd -r -p hexfile.txt \u003e binaryfile.bin\n```\n\nThis library also provides functions to read columnar data in Little-Endian format.\n\nThe `mmap_binfile` function can extract some basic data from files in Apache Arrow, Feather or custom BINSRC format.\n\n\n\n\n## Getting Started\n\nThe reference code of this library is written in C and includes wrappers for Go and Python.\n\nA wrapper Makefile allows building the project on a Linux-compatible system with simple commands.  \nAll the artifacts and reports produced using this Makefile are stored in the *target* folder.  \n\nTo see all available options:\n```\nmake help\n```\n\nUse the command ```make all``` to build and test all the implementations.\n\n## NOTE\n\n* The \"_be_\" or \"BE\" functions refer to source files sorted in Big-Endian.\n* The \"_le_\" or \"LE\" functions refer to source files sorted in Little-Endian.\n\n\n\n## BINSRC Format:\n\n* 8 BYTE  : `BINSRC1\\0` magic number\n* 1 BYTE  : Number of columns.\n* One byte for each column type (i.e. 
1 for uint8, 2 for uint16, 4 for uint32, 8 for uint64).\n* (PADDING TO ALIGN THE DATA TO 8 BYTE)\n* 8 BYTE  : number of rows\n* cols * 8 BYTE : offset to the start of each column\n* (DATA BODY AS IN APACHE ARROW)\n\n### Example\n\n```\n42494e5352433100    : BINSRC1 magic number\n02                  : 2 columns\n04                  : first column is uint32_t (4 bytes)\n08                  : second column is uint64_t (8 bytes)\n0000000000          : padding to 8 bytes\n0b00000000000000    : 11 rows per column\n2800000000000000    : byte offset to the start of the first column\n5800000000000000    : byte offset to the start of the second column\n01000000            : first column - first row\n07000000            : ...\n0b000000            : \n61000000            : \n65000000            : \ne5030000            : \nf1030000            : \nf5260000            : \na3860100            : \n19990100            : \n19990100            : \n00000000            : padding to 8 bytes\n00803380257a0208    : second column - first row\n18399e43fea10048    : ...\n16eb5575fea10048    : \n00003a0074020180    : \n008013008d020180    : \n00007a0099020180    : \n00003a00622b01a0    : \n00807080622b01a0    : \n926625e3652b01a0    : \n039843d5672b01a0    : \n039843d5672b01a0    : \n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftecnickcom%2Fbinsearch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftecnickcom%2Fbinsearch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftecnickcom%2Fbinsearch/lists"}