{"id":22912001,"url":"https://github.com/opencoff/go-chd","last_synced_at":"2025-09-13T16:39:49.595Z","repository":{"id":57562341,"uuid":"326268195","full_name":"opencoff/go-chd","owner":"opencoff","description":"Minimal Perfect Hash function via Compress Hash Displace","archived":false,"fork":false,"pushed_at":"2021-04-04T00:28:54.000Z","size":54,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-01T11:08:58.196Z","etag":null,"topics":["chd","compress-hash-displace","constant-time","golang","mphf"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/opencoff.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-01-02T20:47:18.000Z","updated_at":"2023-12-21T12:21:40.000Z","dependencies_parsed_at":"2022-09-16T19:51:12.540Z","dependency_job_id":null,"html_url":"https://github.com/opencoff/go-chd","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/opencoff/go-chd","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/opencoff%2Fgo-chd","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/opencoff%2Fgo-chd/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/opencoff%2Fgo-chd/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/opencoff%2Fgo-chd/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/opencoff","download_url":"https://codeload.github.com/opencoff/go-chd/tar.gz/refs/heads/main","sbom_url":"https
://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/opencoff%2Fgo-chd/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":260183170,"owners_count":22971196,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chd","compress-hash-displace","constant-time","golang","mphf"],"created_at":"2024-12-14T04:19:32.008Z","updated_at":"2025-06-16T15:10:51.172Z","avatar_url":"https://github.com/opencoff.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![GoDoc](https://godoc.org/github.com/opencoff/go-chd?status.svg)](https://godoc.org/github.com/opencoff/go-chd)\n[![Go Report Card](https://goreportcard.com/badge/github.com/opencoff/go-chd)](https://goreportcard.com/report/github.com/opencoff/go-chd)\n\n# go-chd - Minimal Perfect Hash Function using Compress Hash Displace\n\n## What is it?\nA library to create, query and serialize/de-serialize a minimal perfect hash function (\"MPHF\").\n\nThis is an implementation of [CHD](http://cmph.sourceforge.net/papers/esa09.pdf) -\ninspired by this [gist](https://gist.github.com/pervognsen/b21f6dd13f4bcb4ff2123f0d78fcfd17).\n\nThe library exposes the following types:\n\n- `ChdBuilder`: Represents the construction phase of the MPHF\n  as described in the paper above.\n- `Chd`: Represents a frozen MPHF over a given set of keys. You can only\n  do lookups on this type.\n- `DBWriter`: Used to construct a constant database of key-value\n  pairs - where the lookup of a given key is done in constant time\n  using `ChdBuilder`. 
Essentially, this type serializes a collection\n  of key-value pairs using `ChdBuilder` as the underlying index.\n- `DBReader`: Used for looking up key-values from a previously\n  constructed (serialized) database.\n\n*NOTE* Minimal Perfect Hash functions take a fixed input and\ngenerate a mapping to look up the items in constant time. In\nparticular, they are NOT a replacement for a traditional hash-table;\ni.e., they may yield false positives when queried using keys not\npresent during construction. In concrete terms:\n\n   Let S = {k0, k1, ... kn}  be your input key set.\n\n   If H: S -\u003e {0, .. n} is a minimal perfect hash function, then\n   H(kx) for kx NOT in S may yield an integer result (indicating\n   that kx was successfully \"looked up\").\n\nThus, if users of `Chd` are unsure of the input being passed to such a\n`Lookup()` function, they should add an additional comparison against\nthe actual key to verify. Look at `dbreader.go:Find()` for an\nexample.\n\n`DBWriter` optimizes the database if there are no values present -\ni.e., keys-only. This optimization significantly reduces the\nfile size.\n\n\n## How do I use it?\nLike any other Go library: `go get github.com/opencoff/go-chd`.\n\n## Example Program\nThere is a working example of the `DBWriter` and `DBReader` interfaces in the\nfile *example/mphdb.go*. 
This example demonstrates the following functionality:\n\n- add one or more space-delimited key/value files (first field is key, second\n  field is value)\n- add one or more CSV files (first field is key, second field is value)\n- Write the resulting MPH DB to disk\n- Read the DB and verify its integrity\n\nFirst, let's run some tests and make sure chd is working fine:\n\n```sh\n\n  $ git clone https://github.com/opencoff/go-chd\n  $ cd go-chd\n  $ make test\n\n```\n\nNow, let's build and run the example program:\n```sh\n\n  $ make\n  $ ./mphdb -h\n```\n\nThere is a helper python script to generate a very large text file of\nhostnames and IP addresses: `genhosts.py`. You can run it like so:\n\n```sh\n\n  $ python ./example/genhosts.py 192.168.0.0/16 \u003e a.txt\n```\n\nThe above example generates 65535 hostnames and corresponding IP addresses; each of the\nIP addresses is sequentially drawn from the given subnet.\n\n**NOTE** If you use a \"/8\" subnet mask you will generate a _lot_ of data (~430MB in size).\n\nOnce you have the input generated, you can feed it to the `example` program above to generate\nan MPH DB:\n```sh\n\n  $ ./mphdb foo.db a.txt\n  $ ./mphdb -V foo.db\n```\n\nIt is possible that \"mphdb\" fails to construct a DB after trying 1,000,000 times. 
In that case,\ntry lowering the \"load\" factor (default is 0.85).\n\n```sh\n  $ ./mphdb -l 0.75 foo.db a.txt\n```\n\n## Basic Usage of ChdBuilder\nAssuming you have read your keys and hashed them into `uint64` values, this is how you can use the library:\n\n```go\n\n        builder, err := chd.New(0.9)\n        if err != nil { panic(err) }\n\n        for i := range keys {\n            builder.Add(keys[i])\n        }\n\n        lookup, err := builder.Freeze()\n        if err != nil { panic(err) }\n\n        // Now, call Find() with each key to get its unique mapping.\n        // Note: Find() returns values in the closed interval [1, len(keys)]\n        for i, k := range keys {\n                j := lookup.Find(k)\n                fmt.Printf(\"%d: %#x maps to %d\\n\", i, k, j)\n        }\n\n```\n\n## Writing a DB Once, but lookup many times\nOne can construct an on-disk constant-time lookup DB using `ChdBuilder` as\nthe underlying indexing mechanism. Such a DB is useful in situations\nwhere the key/value pairs are NOT changed frequently; i.e.,\nread-dominant workloads. The typical pattern in such situations is\nto build the constant-DB _once_ for efficient retrieval and do\nlookups multiple times.\n\nThe example program in `example/` has helper routines to add keys and values from a\ntext or CSV file: see `example/text.go`.\n\n## Implementation Notes\n\n* `chd.go`: The main implementation of the CHD algorithm. It has two\n  types: one to construct and freeze an MPHF (`ChdBuilder`) and\n  another to do constant time lookups from a frozen CHD MPHF\n  (`Chd`).\n\n* `dbwriter.go`: Create a read-only, constant-time MPH lookup DB. It \n  can store arbitrary byte stream \"values\" - each of which is\n  identified by a unique `uint64` key. The DB structure is optimized\n  for reading on the most common architectures - little-endian:\n  amd64, arm64 etc.\n\n* `dbreader.go`: Provides a constant-time lookup of a previously\n  constructed CHD MPH DB. DB reads use `mmap(2)` to reduce I/O\n  bottlenecks. 
For little-endian architectures, there is no data\n  \"parsing\" of the lookup tables, offset tables etc. They are \n  interpreted in-situ from the mmap'd data. To keep the code\n  generic, every multi-byte int is converted to little-endian order\n  before use. These conversion routines are in `endian_XX.go`.\n\n* `mmap.go`: Utility functions to map byte-slices to uintXX slices\n  and vice versa.\n\n* `marshal.go`: Marshal/Unmarshal CHD MPH\n\n## License\nGPL v2.0\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopencoff%2Fgo-chd","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopencoff%2Fgo-chd","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopencoff%2Fgo-chd/lists"}