{"id":21504406,"url":"https://github.com/keilerkonzept/bitknn","last_synced_at":"2025-03-17T14:19:33.520Z","repository":{"id":257812135,"uuid":"868240691","full_name":"keilerkonzept/bitknn","owner":"keilerkonzept","description":"Fast exact k-nearest neighbors (k-NN) for 1-bit feature vectors packed into uint64s","archived":false,"fork":false,"pushed_at":"2025-03-03T19:31:55.000Z","size":142,"stargazers_count":0,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-03T20:31:31.994Z","etag":null,"topics":["bitvector","hamming-distance","hamming-distance-knn","k-nearest-neighbors","knn","uint64"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/keilerkonzept.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-05T20:58:13.000Z","updated_at":"2025-03-03T19:31:08.000Z","dependencies_parsed_at":null,"dependency_job_id":"a0d8f4ca-9490-4187-8a93-f3c792ab6024","html_url":"https://github.com/keilerkonzept/bitknn","commit_stats":null,"previous_names":["keilerkonzept/bitknn"],"tags_count":27,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/keilerkonzept%2Fbitknn","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/keilerkonzept%2Fbitknn/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/keilerkonzept%2Fbitknn/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/keilerkonzept%2Fbit
knn/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/keilerkonzept","download_url":"https://codeload.github.com/keilerkonzept/bitknn/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244047646,"owners_count":20389206,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bitvector","hamming-distance","hamming-distance-knn","k-nearest-neighbors","knn","uint64"],"created_at":"2024-11-23T18:59:12.505Z","updated_at":"2025-03-17T14:19:33.487Z","avatar_url":"https://github.com/keilerkonzept.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# bitknn\n[![Coverage](https://img.shields.io/badge/Coverage-100.0%25-brightgreen)](https://github.com/keilerkonzept/bitknn/actions/workflows/gocover.yaml)\n\n[![Go Reference](https://pkg.go.dev/badge/github.com/keilerkonzept/bitknn.svg)](https://pkg.go.dev/github.com/keilerkonzept/bitknn)\n[![Go Report Card](https://goreportcard.com/badge/github.com/keilerkonzept/bitknn)](https://goreportcard.com/report/github.com/keilerkonzept/bitknn)\n\n\n```go\nimport \"github.com/keilerkonzept/bitknn\"\n```\n\n`bitknn` is a fast [k-nearest neighbors (k-NN)](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm) library for `uint64`s, using (bitwise) Hamming distance.\n\nIf you need to classify **binary feature vectors that fit into `uint64`s**, this library might be useful. It is fast mainly because we can use cheap bitwise ops (XOR + POPCNT) to calculate distances between `uint64` values. 
For smaller datasets, the performance of the [neighbor heap](internal/heap/heap.go) is also relevant, so that part has been tuned as well.\n\nIf your vectors are **longer than 64 bits**, you can [pack](#packing-wide-data) them into `[]uint64` and classify them using the [\"wide\" model variants](#packing-wide-data). On ARM64 with NEON vector instruction support, `bitknn` can be [a bit faster still](#arm64-neon-support) on wide data.\n\nYou can optionally weight class votes by distance, or specify different vote values per data point.\n\n**Contents**\n- [Usage](#usage)\n  - [Basic usage](#basic-usage)\n  - [Packing wide data](#packing-wide-data)\n  - [ARM64 NEON Support](#arm64-neon-support)\n- [Options](#options)\n- [Benchmarks](#benchmarks)\n- [License](#license)\n\n## Usage\n\nThere are just three methods you'll typically need:\n\n- **Fit** *(data, labels, [\\[options\\]](#options))*: Create a model from a dataset.\n\n  Variants: [`bitknn.Fit`](https://pkg.go.dev/github.com/keilerkonzept/bitknn#Fit), [`bitknn.FitWide`](https://pkg.go.dev/github.com/keilerkonzept/bitknn#FitWide)\n\n- **Find** *(k, point)*: Given a point, return the indices and distances of its *k* nearest neighbors.\n\n  Variants: [`bitknn.Model.Find`](https://pkg.go.dev/github.com/keilerkonzept/bitknn#Model.Find), [`bitknn.WideModel.Find`](https://pkg.go.dev/github.com/keilerkonzept/bitknn#WideModel.Find), [`bitknn.WideModel.FindV`](https://pkg.go.dev/github.com/keilerkonzept/bitknn#WideModel.FindV) (vectorized on ARM64 with NEON instructions)\n\n- **Predict** *(k, point, votes)*: Predict the label for a given point based on its nearest neighbors, and write the label votes into the provided vote counter.\n\n  Variants: [`bitknn.Model.Predict`](https://pkg.go.dev/github.com/keilerkonzept/bitknn#Model.Predict), [`bitknn.WideModel.Predict`](https://pkg.go.dev/github.com/keilerkonzept/bitknn#WideModel.Predict), 
[`bitknn.WideModel.PredictV`](https://pkg.go.dev/github.com/keilerkonzept/bitknn#WideModel.PredictV) (vectorized on ARM64 with NEON instructions).\n\nEach of the above methods is available on either model type:\n\n- [`bitknn.Model`](https://pkg.go.dev/github.com/keilerkonzept/bitknn#Model) (64 bits)\n- [`bitknn.WideModel`](https://pkg.go.dev/github.com/keilerkonzept/bitknn#WideModel) (*N* * 64 bits)\n\n### Basic usage\n\n```go\npackage main\n\nimport (\n    \"fmt\"\n\n    \"github.com/keilerkonzept/bitknn\"\n)\n\nfunc main() {\n    // feature vectors packed into uint64s\n    data := []uint64{0b101010, 0b111000, 0b000111}\n    // class labels\n    labels := []int{0, 1, 1}\n\n    model := bitknn.Fit(data, labels, bitknn.WithLinearDistanceWeighting())\n\n    // one vote counter per class\n    votes := make([]float64, 2)\n\n    k := 2\n    model.Predict(k, 0b101011, bitknn.VoteSlice(votes))\n    // or, just return the nearest neighbors' distances and indices:\n    // distances, indices := model.Find(k, 0b101011)\n\n    fmt.Println(\"Votes:\", votes)\n\n    // you can also use a map for the votes.\n    // this is good if you have a very large number of different labels:\n    votesMap := make(map[int]float64)\n    model.Predict(k, 0b101011, bitknn.VoteMap(votesMap))\n    fmt.Println(\"Votes for 0:\", votesMap[0])\n}\n```\n\n### Packing wide data\n\nIf your vectors are longer than 64 bits, you can still use `bitknn` if you [pack](https://pkg.go.dev/github.com/keilerkonzept/bitknn/pack) them into `[]uint64`. The [`pack` package](https://pkg.go.dev/github.com/keilerkonzept/bitknn/pack) defines helper functions to pack `string`s and `[]byte`s into `[]uint64`s.\n\n\u003e It's faster to use a `[][]uint64` allocated using a flat backing slice, laid out in one contiguous memory block. 
If you already have a non-contiguous `[][]uint64`, you can use [`pack.ReallocateFlat`](https://pkg.go.dev/github.com/keilerkonzept/bitknn/pack#ReallocateFlat) to re-allocate the dataset using a flat 1D backing slice.\n\nThe wide model fitting function is [`bitknn.FitWide`](https://pkg.go.dev/github.com/keilerkonzept/bitknn#FitWide) and accepts the same [Options](#options) as the \"narrow\" one:\n\n\n```go\npackage main\n\nimport (\n    \"fmt\"\n\n    \"github.com/keilerkonzept/bitknn\"\n    \"github.com/keilerkonzept/bitknn/pack\"\n)\n\nfunc main() {\n    // feature vectors, each packed into a []uint64\n    data := [][]uint64{\n        pack.String(\"foo\"),\n        pack.String(\"bar\"),\n        pack.String(\"baz\"),\n    }\n    // class labels\n    labels := []int{0, 1, 1}\n\n    model := bitknn.FitWide(data, labels, bitknn.WithLinearDistanceWeighting())\n\n    // one vote counter per class\n    votes := make([]float64, 2)\n\n    k := 2\n    query := pack.String(\"fob\")\n    model.Predict(k, query, bitknn.VoteSlice(votes))\n\n    fmt.Println(\"Votes:\", votes)\n}\n```\n\n### ARM64 NEON Support\n\nFor ARM64 CPUs with NEON instructions, `bitknn` has a [vectorized distance function for `[]uint64`s](internal/neon/distance_arm64.s) that is about twice as fast as what the compiler generates.\n\nWhen run on such a CPU, the `*V` methods [`WideModel.FindV`](https://pkg.go.dev/github.com/keilerkonzept/bitknn#WideModel.FindV) and [`WideModel.PredictV`](https://pkg.go.dev/github.com/keilerkonzept/bitknn#WideModel.predictV) are noticeably faster than the regular `Find`/`Predict`:\n\n| Bits  | N       | k   | `Find` s/op  | `FindV` s/op | diff                   |\n|-------|---------|-----|--------------|--------------|------------------------|\n| 128   | 1000    | 3   | 2.374µ ± 0%  | 1.792µ ± 0%  | -24.54% (p=0.000 n=10) |\n| 128   | 1000    | 10  | 2.901µ ± 1%  | 2.028µ ± 1%  | -30.08% (p=0.000 n=10) |\n| 128   | 1000    | 100 | 5.472µ ± 3%  | 4.359µ ± 1%  | -20.34% (p=0.000 n=10) 
|\n| 128   | 1000000 | 3   | 2.273m ± 3%  | 1.380m ± 2%  | -39.27% (p=0.000 n=10) |\n| 128   | 1000000 | 10  | 2.261m ± 1%  | 1.406m ± 1%  | -37.84% (p=0.000 n=10) |\n| 128   | 1000000 | 100 | 2.289m ± 0%  | 1.425m ± 2%  | -37.76% (p=0.000 n=10) |\n| 640   | 1000    | 3   | 6.201µ ± 1%  | 3.716µ ± 0%  | -40.07% (p=0.000 n=10) |\n| 640   | 1000    | 10  | 6.728µ ± 1%  | 3.973µ ± 1%  | -40.96% (p=0.000 n=10) |\n| 640   | 1000    | 100 | 10.855µ ± 2% | 6.917µ ± 1%  | -36.28% (p=0.000 n=10) |\n| 640   | 1000000 | 3   | 5.832m ± 2%  | 3.337m ± 1%  | -42.78% (p=0.000 n=10) |\n| 640   | 1000000 | 10  | 5.830m ± 5%  | 3.339m ± 1%  | -42.73% (p=0.000 n=10) |\n| 640   | 1000000 | 100 | 5.872m ± 1%  | 3.361m ± 1%  | -42.77% (p=0.000 n=10) |\n| 8192  | 1000000 | 10  | 72.66m ± 1%  | 30.96m ± 3%  | -57.39% (p=0.000 n=10) |\n\n\n## Options\n\n- [`bitknn.WithLinearDistanceWeighting()`](https://pkg.go.dev/github.com/keilerkonzept/bitknn#WithLinearDistanceWeighting): Apply linear distance weighting (`1 / (1 + dist)`).\n- [`bitknn.WithQuadraticDistanceWeighting()`](https://pkg.go.dev/github.com/keilerkonzept/bitknn#WithQuadraticDistanceWeighting): Apply quadratic distance weighting (`1 / (1 + dist^2)`).\n- [`bitknn.WithDistanceWeightingFunc(f func(dist int) float64)`](https://pkg.go.dev/github.com/keilerkonzept/bitknn#WithDistanceWeightingFunc): Use a custom distance weighting function.\n- [`bitknn.WithValues(values []float64)`](https://pkg.go.dev/github.com/keilerkonzept/bitknn#WithValues): Assign vote values for each data point.\n\n\n## Benchmarks\n\n```\ngoos: darwin\ngoarch: arm64\npkg: github.com/keilerkonzept/bitknn\ncpu: Apple M1 Pro\n```\n\n| Bits | N       | k   | Model     | Op        | s/op        | B/op | allocs/op |\n|------|---------|-----|-----------|-----------|-------------|------|-----------|\n| 64   | 100     | 3   | Model     | `Predict` | 99.06n ± 2% | 0    | 0         |\n| 64   | 100     | 3   | WideModel | `Predict` | 191.6n ± 1% | 0    | 0         |\n| 64   | 
100     | 3   | Model     | `Find`    | 88.09n ± 0% | 0    | 0         |\n| 64   | 100     | 3   | WideModel | `Find`    | 182.8n ± 1% | 0    | 0         |\n| 64   | 100     | 10  | Model     | `Predict` | 225.1n ± 1% | 0    | 0         |\n| 64   | 100     | 10  | WideModel | `Predict` | 372.0n ± 1% | 0    | 0         |\n| 64   | 100     | 10  | Model     | `Find`    | 202.9n ± 1% | 0    | 0         |\n| 64   | 100     | 10  | WideModel | `Find`    | 345.2n ± 0% | 0    | 0         |\n| 64   | 1000    | 3   | Model     | `Predict` | 538.2n ± 1% | 0    | 0         |\n| 64   | 1000    | 3   | WideModel | `Predict` | 1.469µ ± 1% | 0    | 0         |\n| 64   | 1000    | 3   | Model     | `Find`    | 525.8n ± 1% | 0    | 0         |\n| 64   | 1000    | 3   | WideModel | `Find`    | 1.465µ ± 1% | 0    | 0         |\n| 64   | 1000    | 10  | Model     | `Predict` | 835.4n ± 1% | 0    | 0         |\n| 64   | 1000    | 10  | WideModel | `Predict` | 1.880µ ± 1% | 0    | 0         |\n| 64   | 1000    | 10  | Model     | `Find`    | 807.4n ± 0% | 0    | 0         |\n| 64   | 1000    | 10  | WideModel | `Find`    | 1.867µ ± 2% | 0    | 0         |\n| 64   | 1000    | 100 | Model     | `Predict` | 3.718µ ± 0% | 0    | 0         |\n| 64   | 1000    | 100 | WideModel | `Predict` | 4.935µ ± 0% | 0    | 0         |\n| 64   | 1000    | 100 | Model     | `Find`    | 3.494µ ± 0% | 0    | 0         |\n| 64   | 1000    | 100 | WideModel | `Find`    | 4.701µ ± 0% | 0    | 0         |\n| 64   | 1000000 | 3   | Model     | `Predict` | 458.8µ ± 0% | 0    | 0         |\n| 64   | 1000000 | 3   | WideModel | `Predict` | 1.301m ± 1% | 0    | 0         |\n| 64   | 1000000 | 3   | Model     | `Find`    | 457.9µ ± 1% | 0    | 0         |\n| 64   | 1000000 | 3   | WideModel | `Find`    | 1.302m ± 1% | 0    | 0         |\n| 64   | 1000000 | 10  | Model     | `Predict` | 456.9µ ± 0% | 0    | 0         |\n| 64   | 1000000 | 10  | WideModel | `Predict` | 1.295m ± 2% | 0    | 0         |\n| 64   | 1000000 
| 10  | Model     | `Find`    | 457.6µ ± 1% | 0    | 0         |\n| 64   | 1000000 | 10  | WideModel | `Find`    | 1.298m ± 1% | 0    | 0         |\n| 64   | 1000000 | 100 | Model     | `Predict` | 474.5µ ± 1% | 0    | 0         |\n| 64   | 1000000 | 100 | WideModel | `Predict` | 1.316m ± 1% | 0    | 0         |\n| 64   | 1000000 | 100 | Model     | `Find`    | 466.9µ ± 0% | 0    | 0         |\n| 64   | 1000000 | 100 | WideModel | `Find`    | 1.306m ± 0% | 0    | 0         |\n| 128  | 100     | 3   | WideModel | `Predict` | 296.7n ± 0% | 0    | 0         |\n| 128  | 100     | 3   | WideModel | `Find`    | 285.8n ± 0% | 0    | 0         |\n| 128  | 100     | 10  | WideModel | `Predict` | 467.4n ± 1% | 0    | 0         |\n| 128  | 100     | 10  | WideModel | `Find`    | 441.1n ± 1% | 0    | 0         |\n| 640  | 100     | 3   | WideModel | `Predict` | 654.6n ± 1% | 0    | 0         |\n| 640  | 100     | 3   | WideModel | `Find`    | 640.3n ± 1% | 0    | 0         |\n| 640  | 100     | 10  | WideModel | `Predict` | 850.0n ± 1% | 0    | 0         |\n| 640  | 100     | 10  | WideModel | `Find`    | 825.0n ± 0% | 0    | 0         |\n| 128  | 1000    | 3   | WideModel | `Predict` | 2.384µ ± 0% | 0    | 0         |\n| 128  | 1000    | 3   | WideModel | `Find`    | 2.374µ ± 0% | 0    | 0         |\n| 128  | 1000    | 10  | WideModel | `Predict` | 2.900µ ± 0% | 0    | 0         |\n| 128  | 1000    | 10  | WideModel | `Find`    | 2.901µ ± 1% | 0    | 0         |\n| 128  | 1000    | 100 | WideModel | `Predict` | 5.630µ ± 1% | 0    | 0         |\n| 128  | 1000    | 100 | WideModel | `Find`    | 5.472µ ± 3% | 0    | 0         |\n| 128  | 1000000 | 3   | WideModel | `Predict` | 2.266m ± 0% | 0    | 0         |\n| 128  | 1000000 | 3   | WideModel | `Find`    | 2.273m ± 3% | 0    | 0         |\n| 128  | 1000000 | 10  | WideModel | `Predict` | 2.269m ± 0% | 0    | 0         |\n| 128  | 1000000 | 10  | WideModel | `Find`    | 2.261m ± 1% | 0    | 0         |\n| 128  | 1000000 | 100 | 
WideModel | `Predict` | 2.295m ± 1% | 0    | 0         |\n| 128  | 1000000 | 100 | WideModel | `Find`    | 2.289m ± 0% | 0    | 0         |\n| 640  | 1000    | 3   | WideModel | `Predict` | 6.214µ ± 2% | 0    | 0         |\n| 640  | 1000    | 3   | WideModel | `Find`    | 6.201µ ± 1% | 0    | 0         |\n| 640  | 1000    | 10  | WideModel | `Predict` | 6.777µ ± 1% | 0    | 0         |\n| 640  | 1000    | 10  | WideModel | `Find`    | 6.728µ ± 1% | 0    | 0         |\n| 640  | 1000    | 100 | WideModel | `Predict` | 11.16µ ± 2% | 0    | 0         |\n| 640  | 1000    | 100 | WideModel | `Find`    | 10.85µ ± 2% | 0    | 0         |\n| 640  | 1000000 | 3   | WideModel | `Predict` | 5.756m ± 4% | 0    | 0         |\n| 640  | 1000000 | 3   | WideModel | `Find`    | 5.832m ± 2% | 0    | 0         |\n| 640  | 1000000 | 10  | WideModel | `Predict` | 5.842m ± 1% | 0    | 0         |\n| 640  | 1000000 | 10  | WideModel | `Find`    | 5.830m ± 5% | 0    | 0         |\n| 640  | 1000000 | 100 | WideModel | `Predict` | 5.914m ± 6% | 0    | 0         |\n| 640  | 1000000 | 100 | WideModel | `Find`    | 5.872m ± 1% | 0    | 0         |\n\n## License\n\nMIT License\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkeilerkonzept%2Fbitknn","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkeilerkonzept%2Fbitknn","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkeilerkonzept%2Fbitknn/lists"}