{"id":22911988,"url":"https://github.com/opencoff/go-bbhash","last_synced_at":"2025-08-12T08:30:50.882Z","repository":{"id":47591620,"uuid":"139397675","full_name":"opencoff/go-bbhash","owner":"opencoff","description":"Fast Scalable Minimal Perfect Hash for Large Keysets","archived":false,"fork":false,"pushed_at":"2020-03-16T18:32:46.000Z","size":60,"stargazers_count":32,"open_issues_count":0,"forks_count":4,"subscribers_count":4,"default_branch":"master","last_synced_at":"2024-07-02T20:33:46.070Z","etag":null,"topics":["bbhash","constant-db","golang","key-value-database","key-value-store","minimal-perfect-hash","perfect-hash","perfect-hashing"],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/opencoff.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-07-02T06:02:47.000Z","updated_at":"2023-11-07T10:47:20.000Z","dependencies_parsed_at":"2022-09-10T12:02:26.363Z","dependency_job_id":null,"html_url":"https://github.com/opencoff/go-bbhash","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/opencoff%2Fgo-bbhash","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/opencoff%2Fgo-bbhash/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/opencoff%2Fgo-bbhash/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/opencoff%2Fgo-bbhash/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/opencoff","download_url":"https://codeload.github.com/opencof
f/go-bbhash/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":229661690,"owners_count":18103615,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bbhash","constant-db","golang","key-value-database","key-value-store","minimal-perfect-hash","perfect-hash","perfect-hashing"],"created_at":"2024-12-14T04:19:29.170Z","updated_at":"2025-08-12T08:30:50.840Z","avatar_url":"https://github.com/opencoff.png","language":"Go","readme":"[![GoDoc](https://godoc.org/github.com/opencoff/go-bbhash?status.svg)](https://godoc.org/github.com/opencoff/go-bbhash)\n[![Go Report Card](https://goreportcard.com/badge/github.com/opencoff/go-bbhash)](https://goreportcard.com/report/github.com/opencoff/go-bbhash)\n\n# go-bbhash - Fast, Scalable Minimal Perfect Hash for Large Sets\n\n# This library is superseded by\n[go-mph](https://github.com/opencoff/go-mph). go-bbhash is no longer\nmaintained. Please use go-mph.\n\n## What is it?\nA library to create, query and serialize/de-serialize minimal perfect hash functions\nover very large key sets.\n\nThis is an implementation of [this paper](https://arxiv.org/abs/1702.03154). 
It is in part\ninspired by Damian Gryski's [Boomphf](https://github.com/dgryski/go-boomphf). This implementation\ndiffers from Boomphf in one significant way: it adds an efficient serialization \u0026\ndeserialization API.\n\nThe library exposes the following types:\n\n- `BBHash`: Represents an instance of a minimal perfect hash\n  function as described in the paper above.\n- `DBWriter`: Used to construct a constant database of key-value\n  pairs - where the lookup of a given key is done in constant time\n  using `BBHash`. Essentially, this type serializes a collection\n  of key-value pairs using `BBHash` as the underlying index.\n- `DBReader`: Used for looking up key-values from a previously\n  constructed (serialized) database.\n\n*NOTE* Minimal Perfect Hash functions take a fixed input and\ngenerate a mapping to look up the items in constant time. In\nparticular, they are NOT a replacement for a traditional hash-table;\ni.e., they may yield false positives when queried using keys not\npresent during construction. In concrete terms:\n\n   Let S = {k0, k1, ... kn}  be your input key set.\n\n   If H: S -\u003e {0, .. n} is a minimal perfect hash function, then\n   H(kx) for kx NOT in S may yield an integer result (indicating\n   that kx was successfully \"looked up\").\n\nThus, if users of `BBHash` are unsure of the input being passed to such a\n`Lookup()` function, they should add an additional comparison against\nthe actual key to verify. Look at `dbreader.go:Find()` for an\nexample.\n\n## How do I use it?\nLike any other golang library: `go get github.com/opencoff/go-bbhash`.\n\n## Example Program\nThere is a working example of the `DBWriter` and `DBReader` interfaces in the\nfile *example/mphdb.go*. 
This example demonstrates the following functionality:\n\n- add one or more space delimited key/value files (first field is key, second\n  field is value)\n- add one or more CSV files (first field is key, second field is value)\n- Write the resulting MPH DB to disk\n- Read the DB and verify its integrity\n\nFirst, let's run some tests and make sure bbhash is working fine:\n\n```sh\n\n  $ git clone https://github.com/opencoff/go-bbhash\n  $ cd go-bbhash\n  $ make test\n\n```\n\nNow, let's build and run the example program:\n```sh\n\n  $ make\n  $ ./mphdb -h\n```\n\nThere is a helper python script to generate a very large text file of\nhostnames and IP addresses: `genhosts.py`. You can run it like so:\n\n```sh\n\n  $ python ./example/genhosts.py 192.168.0.0/16 \u003e a.txt\n```\n\nThe above example generates 65535 hostnames and corresponding IP addresses; each of the\nIP addresses is sequentially drawn from the subnet.\n\n**NOTE** If you use a \"/8\" subnet mask you will generate a _lot_ of data (~430MB in size).\n\nOnce you have the input generated, you can feed it to the `example` program above to generate\nan MPH DB:\n```sh\n\n  $ ./mphdb foo.db a.txt\n  $ ./mphdb -V foo.db\n```\n\nIt is possible that \"mphdb\" fails to construct a DB and complains of gamma being too small. 
In\nthat case, try increasing \"g\" like so:\n```sh\n  $ ./mphdb -g 2.75 foo.db a.txt\n```\n\n## Basic Usage of BBHash\nAssuming you have read your keys and hashed them into `uint64`, this is how you can use the library:\n\n```go\n\n        bb, err := bbhash.New(2.0, keys)\n        if err != nil { panic(err) }\n\n        // Now, call Find() with each key to get its unique mapping.\n        // Note: Find() returns values in the closed interval [1, len(keys)]\n        for i, k := range keys {\n                j := bb.Find(k)\n                fmt.Printf(\"%d: %#x maps to %d\\n\", i, k, j)\n        }\n\n```\n\n## Writing a DB Once, but lookup many times\nOne can construct an on-disk constant-time lookup using `BBHash` as\nthe underlying indexing mechanism. Such a DB is useful in situations\nwhere the key/value pairs are NOT changed frequently; i.e.,\nread-dominant workloads. The typical pattern in such situations is\nto build the constant-DB _once_ for efficient retrieval and do\nlookups multiple times.\n\n### Step-1: Construct the DB from multiple sources\nFor example, let us suppose that file *a.txt* and *b.csv* have lots\nof key,value pairs. 
We will build a constant DB using these.\n\n```go\n\n    wr, err := bbhash.NewDBWriter(\"file.db\")\n    if err != nil { panic(err) }\n\n    // add a.txt and b.csv to this db\n\n    // txt file delimited by white space;\n    // first token is the key, second token is the value\n    n, err := wr.AddTextFile(\"a.txt\", \" \\t\")\n    if err != nil { panic(err) }\n    fmt.Printf(\"a.txt: %d records added\\n\", n)\n\n    // CSV file - comma delimited\n    // lines starting with '#' are considered comments\n    // field 0 is the key; and field 1 is the value.\n    // The first line is assumed to be a header and ignored.\n    // n and err are already declared above, so assign with '='.\n    n, err = wr.AddCSVFile(\"b.csv\", ',', '#', 0, 1)\n    if err != nil { panic(err) }\n    fmt.Printf(\"b.csv: %d records added\\n\", n)\n\n    // Now, freeze the DB and write to disk.\n    // We will use a larger \"gamma\" value to increase chances of\n    // finding a minimal perfect hash function.\n    err = wr.Freeze(3.0)\n    if err != nil { panic(err) }\n```\n\nNow, `file.db` has the key/value pairs from the two input files\nstored in an efficient format for constant-time retrieval.\n\n### Step-2: Looking up Key in the DB\nContinuing the above example, suppose that you want to use the\nconstructed DB for repeated lookups of various keys and retrieve\ntheir corresponding values:\n\n```go\n\n    // read 'file.db' and cache up to 10,000\n    // records in memory.\n    rd, err := bbhash.NewDBReader(\"file.db\", 10000)\n    if err != nil { panic(err) }\n```\n\nNow, given a key `k`, we can use `rd` to look up the corresponding\nvalue:\n\n```go\n\n    val, err := rd.Find(k)\n\n    // Print the value only when the lookup succeeded.\n    if err == bbhash.ErrNoKey {\n        fmt.Printf(\"Key %x is not in the DB\\n\", k)\n    } else if err != nil {\n        fmt.Printf(\"Error: %s\\n\", err)\n    } else {\n        fmt.Printf(\"Key %x =\u003e Value %x\\n\", k, val)\n    }\n```\n\n\n## Implementation Notes\n\n* For constructing the BBHash, keys are `uint64`; the DBWriter\n  implementation uses Zi 
Long Tan's superfast hash function to\n  transform arbitrary bytes to uint64.\n\n* The perfect-hash index for each key is \"1\" based (i.e., it is in the closed\n  interval `[1, len(keys)]`).\n\n## License\nGPL v2.0\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopencoff%2Fgo-bbhash","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopencoff%2Fgo-bbhash","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopencoff%2Fgo-bbhash/lists"}