https://github.com/keilerkonzept/bitknn
Fast exact k-nearest neighbors (k-NN) for 1-bit feature vectors packed into uint64s
- Host: GitHub
- URL: https://github.com/keilerkonzept/bitknn
- Owner: keilerkonzept
- License: mit
- Created: 2024-10-05T20:58:13.000Z (8 months ago)
- Default Branch: master
- Last Pushed: 2025-03-03T19:31:55.000Z (3 months ago)
- Last Synced: 2025-03-03T20:31:31.994Z (3 months ago)
- Topics: bitvector, hamming-distance, hamming-distance-knn, k-nearest-neighbors, knn, uint64
- Language: Go
- Homepage:
- Size: 139 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# bitknn
[Coverage](https://github.com/keilerkonzept/bitknn/actions/workflows/gocover.yaml) [Go Reference](https://pkg.go.dev/github.com/keilerkonzept/bitknn) [Go Report Card](https://goreportcard.com/report/github.com/keilerkonzept/bitknn)

```go
import "github.com/keilerkonzept/bitknn"
```

`bitknn` is a fast [k-nearest neighbors (k-NN)](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm) library for `uint64`s, using (bitwise) Hamming distance.
If you need to classify **binary feature vectors that fit into `uint64`s**, this library might be useful. It is fast mainly because the distance between two `uint64` values can be computed with cheap bitwise operations (XOR + POPCNT). For smaller datasets, the performance of the [neighbor heap](internal/heap/heap.go) also matters, so that part has been tuned as well.
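As a minimal sketch of the XOR + POPCNT idea (the helper name `hamming` is illustrative and not part of the `bitknn` API), the Hamming distance between two packed vectors can be computed with the standard library's `math/bits`:

```go
package main

import (
	"fmt"
	"math/bits"
)

// hamming returns the number of bit positions in which a and b differ.
// XOR sets exactly the differing bits; OnesCount64 (POPCNT) counts them.
func hamming(a, b uint64) int {
	return bits.OnesCount64(a ^ b)
}

func main() {
	fmt.Println(hamming(0b101010, 0b101011)) // 1: only the lowest bit differs
	fmt.Println(hamming(0b111000, 0b000111)) // 6: all six low bits differ
}
```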
If your vectors are **longer than 64 bits**, you can [pack](#packing-wide-data) them into `[]uint64` and classify them using the ["wide" model variants](#packing-wide-data). On ARM64 with NEON vector instruction support, `bitknn` can be [a bit faster still](#arm64-neon-support) on wide data.
You can optionally weight class votes by distance, or specify a different vote value for each data point.
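To illustrate what distance-weighted voting means, here is a naive reference sketch. It is an assumption for illustration only: `weightedVotes` is a hypothetical helper that sorts the whole dataset, whereas the library uses a tuned neighbor heap. The weight function shown is the linear weighting `1 / (1 + dist)` listed under [Options](#options).

```go
package main

import (
	"fmt"
	"math/bits"
	"sort"
)

// weightedVotes is a hypothetical helper, not part of bitknn.
// It ranks all points by Hamming distance to the query, keeps the k nearest,
// and adds weight(dist) to the vote counter of each neighbor's label.
func weightedVotes(data []uint64, labels []int, query uint64, k int,
	weight func(dist int) float64, votes []float64) {
	idx := make([]int, len(data))
	for i := range idx {
		idx[i] = i
	}
	dist := func(i int) int { return bits.OnesCount64(data[i] ^ query) }
	// Stable sort: points at equal distance keep their dataset order.
	sort.SliceStable(idx, func(a, b int) bool { return dist(idx[a]) < dist(idx[b]) })
	if k > len(idx) {
		k = len(idx)
	}
	for _, i := range idx[:k] {
		votes[labels[i]] += weight(dist(i))
	}
}

func main() {
	data := []uint64{0b101010, 0b111000, 0b000111}
	labels := []int{0, 1, 1}
	linear := func(dist int) float64 { return 1 / (1 + float64(dist)) }

	votes := make([]float64, 2)
	weightedVotes(data, labels, 0b101011, 2, linear, votes)
	fmt.Println("Votes:", votes)
}
```

In this sketch the nearest neighbor (distance 1, label 0) contributes 0.5, and the second neighbor (distance 3, label 1) contributes 0.25, so closer points dominate the vote.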
**Contents**
- [Usage](#usage)
  - [Basic usage](#basic-usage)
  - [Packing wide data](#packing-wide-data)
  - [ARM64 NEON Support](#arm64-neon-support)
- [Options](#options)
- [Benchmarks](#benchmarks)
- [License](#license)

## Usage
There are just three methods you'll typically need:
- **Fit** *(data, labels, [\[options\]](#options))*: Create a model from a dataset.
  Variants: [`bitknn.Fit`](https://pkg.go.dev/github.com/keilerkonzept/bitknn#Fit), [`bitknn.FitWide`](https://pkg.go.dev/github.com/keilerkonzept/bitknn#FitWide).
- **Find** *(k, point)*: Given a point, return the indices and distances of its *k* nearest neighbors.
  Variants: [`bitknn.Model.Find`](https://pkg.go.dev/github.com/keilerkonzept/bitknn#Model.Find), [`bitknn.WideModel.Find`](https://pkg.go.dev/github.com/keilerkonzept/bitknn#WideModel.Find), [`bitknn.WideModel.FindV`](https://pkg.go.dev/github.com/keilerkonzept/bitknn#WideModel.FindV) (vectorized on ARM64 with NEON instructions).
- **Predict** *(k, point, votes)*: Predict the label for a given point based on its nearest neighbors, and write the label votes into the provided vote counter.
  Variants: [`bitknn.Model.Predict`](https://pkg.go.dev/github.com/keilerkonzept/bitknn#Model.Predict), [`bitknn.WideModel.Predict`](https://pkg.go.dev/github.com/keilerkonzept/bitknn#WideModel.Predict), [`bitknn.WideModel.PredictV`](https://pkg.go.dev/github.com/keilerkonzept/bitknn#WideModel.PredictV) (vectorized on ARM64 with NEON instructions).
Each of the above methods is available on either model type:
- [`bitknn.Model`](https://pkg.go.dev/github.com/keilerkonzept/bitknn#Model) (64 bits)
- [`bitknn.WideModel`](https://pkg.go.dev/github.com/keilerkonzept/bitknn#WideModel) (*N* × 64 bits)

### Basic usage
```go
package main

import (
	"fmt"

	"github.com/keilerkonzept/bitknn"
)

func main() {
	// feature vectors packed into uint64s
	data := []uint64{0b101010, 0b111000, 0b000111}
	// class labels
	labels := []int{0, 1, 1}

	model := bitknn.Fit(data, labels, bitknn.WithLinearDistanceWeighting())

	// one vote counter per class
	votes := make([]float64, 2)

	k := 2
	model.Predict(k, 0b101011, bitknn.VoteSlice(votes))
	// or, just return the nearest neighbors' distances and indices:
	// distances, indices := model.Find(k, 0b101011)

	fmt.Println("Votes:", votes)

	// you can also use a map for the votes.
	// this is good if you have a very large number of different labels:
	votesMap := make(map[int]float64)
	model.Predict(k, 0b101011, bitknn.VoteMap(votesMap))
	fmt.Println("Votes for 0:", votesMap[0])
}
```

### Packing wide data
If your vectors are longer than 64 bits, you can still use `bitknn` if you [pack](https://pkg.go.dev/github.com/keilerkonzept/bitknn/pack) them into `[]uint64`. The [`pack` package](https://pkg.go.dev/github.com/keilerkonzept/bitknn/pack) defines helper functions to pack `string`s and `[]byte`s into `[]uint64`s.
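To make "packing" concrete, the following sketch packs bytes into `[]uint64` and computes a wide Hamming distance as the sum of per-word distances. Both `packBytes` and `hammingWide` are hypothetical helpers written for this illustration; the little-endian, zero-padded layout is an assumption, and the real `pack` package may lay out bits differently.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
)

// packBytes is a hypothetical packer (assumed layout, not bitknn's):
// it zero-pads b to a multiple of 8 bytes and reads each 8-byte group
// as one little-endian uint64.
func packBytes(b []byte) []uint64 {
	padded := make([]byte, (len(b)+7)/8*8)
	copy(padded, b)
	out := make([]uint64, len(padded)/8)
	for i := range out {
		out[i] = binary.LittleEndian.Uint64(padded[i*8:])
	}
	return out
}

// hammingWide sums the per-word Hamming distances of two equal-length vectors.
func hammingWide(a, b []uint64) int {
	d := 0
	for i := range a {
		d += bits.OnesCount64(a[i] ^ b[i])
	}
	return d
}

func main() {
	foo, fob := packBytes([]byte("foo")), packBytes([]byte("fob"))
	// 'o' (0x6f) and 'b' (0x62) differ in 3 bits, so this prints "1 3".
	fmt.Println(len(foo), hammingWide(foo, fob))
}
```

As long as every vector is packed the same way, the distance does not depend on the byte order chosen.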
> It's faster to use a `[][]uint64` allocated using a flat backing slice, laid out in one contiguous memory block. If you already have a non-contiguous `[][]uint64`, you can use [`pack.ReallocateFlat`](https://pkg.go.dev/github.com/keilerkonzept/bitknn/pack#ReallocateFlat) to re-allocate the dataset using a flat 1d backing slice.
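The flat layout mentioned in the note can be sketched as follows. `allocateFlat` is a hypothetical helper shown only to illustrate the memory layout; for existing data, the note above points to `pack.ReallocateFlat`.

```go
package main

import "fmt"

// allocateFlat is a hypothetical helper, not part of bitknn.
// It returns n rows of `width` words each, all sliced out of one
// contiguous backing array, so consecutive rows are adjacent in memory.
func allocateFlat(n, width int) [][]uint64 {
	backing := make([]uint64, n*width)
	rows := make([][]uint64, n)
	for i := range rows {
		// The full slice expression caps each row, so an append
		// cannot spill into the next row.
		rows[i] = backing[i*width : (i+1)*width : (i+1)*width]
	}
	return rows
}

func main() {
	rows := allocateFlat(3, 2) // three 128-bit vectors
	rows[1][0] = 0xff
	fmt.Println(len(rows), len(rows[0]), cap(rows[0]))
}
```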
The wide model fitting function is [`bitknn.FitWide`](https://pkg.go.dev/github.com/keilerkonzept/bitknn#FitWide) and accepts the same [Options](#options) as the "narrow" one:
```go
package main

import (
	"fmt"

	"github.com/keilerkonzept/bitknn"
	"github.com/keilerkonzept/bitknn/pack"
)

func main() {
	// feature vectors packed into []uint64s
	data := [][]uint64{
		pack.String("foo"),
		pack.String("bar"),
		pack.String("baz"),
	}
	// class labels
	labels := []int{0, 1, 1}

	model := bitknn.FitWide(data, labels, bitknn.WithLinearDistanceWeighting())

	// one vote counter per class
	votes := make([]float64, 2)

	k := 2
	query := pack.String("fob")
	model.Predict(k, query, bitknn.VoteSlice(votes))

	fmt.Println("Votes:", votes)
}
```

### ARM64 NEON Support
For ARM64 CPUs with NEON instructions, `bitknn` has a [vectorized distance function for `[]uint64`s](internal/neon/distance_arm64.s) that is about twice as fast as what the compiler generates.
When run on such a CPU, the `*V` methods [`WideModel.FindV`](https://pkg.go.dev/github.com/keilerkonzept/bitknn#WideModel.FindV) and [`WideModel.PredictV`](https://pkg.go.dev/github.com/keilerkonzept/bitknn#WideModel.PredictV) are noticeably faster than the regular `Find`/`Predict`:
| Bits | N | k | `Find` s/op | `FindV` s/op | diff |
|-------|---------|-----|--------------|--------------|------------------------|
| 128 | 1000 | 3 | 2.374µ ± 0% | 1.792µ ± 0% | -24.54% (p=0.000 n=10) |
| 128 | 1000 | 10 | 2.901µ ± 1% | 2.028µ ± 1% | -30.08% (p=0.000 n=10) |
| 128 | 1000 | 100 | 5.472µ ± 3% | 4.359µ ± 1% | -20.34% (p=0.000 n=10) |
| 128 | 1000000 | 3 | 2.273m ± 3% | 1.380m ± 2% | -39.27% (p=0.000 n=10) |
| 128 | 1000000 | 10 | 2.261m ± 1% | 1.406m ± 1% | -37.84% (p=0.000 n=10) |
| 128 | 1000000 | 100 | 2.289m ± 0% | 1.425m ± 2% | -37.76% (p=0.000 n=10) |
| 640 | 1000 | 3 | 6.201µ ± 1% | 3.716µ ± 0% | -40.07% (p=0.000 n=10) |
| 640 | 1000 | 10 | 6.728µ ± 1% | 3.973µ ± 1% | -40.96% (p=0.000 n=10) |
| 640 | 1000 | 100 | 10.855µ ± 2% | 6.917µ ± 1% | -36.28% (p=0.000 n=10) |
| 640 | 1000000 | 3 | 5.832m ± 2% | 3.337m ± 1% | -42.78% (p=0.000 n=10) |
| 640 | 1000000 | 10 | 5.830m ± 5% | 3.339m ± 1% | -42.73% (p=0.000 n=10) |
| 640 | 1000000 | 100 | 5.872m ± 1% | 3.361m ± 1% | -42.77% (p=0.000 n=10) |
| 8192 | 1000000 | 10 | 72.66m ± 1% | 30.96m ± 3% | -57.39% (p=0.000 n=10) |

## Options
- [`bitknn.WithLinearDistanceWeighting()`](https://pkg.go.dev/github.com/keilerkonzept/bitknn#WithLinearDistanceWeighting): Apply linear distance weighting (`1 / (1 + dist)`).
- [`bitknn.WithQuadraticDistanceWeighting()`](https://pkg.go.dev/github.com/keilerkonzept/bitknn#WithQuadraticDistanceWeighting): Apply quadratic distance weighting (`1 / (1 + dist^2)`).
- [`bitknn.WithDistanceWeightingFunc(f func(dist int) float64)`](https://pkg.go.dev/github.com/keilerkonzept/bitknn#WithDistanceWeightingFunc): Use a custom distance weighting function.
- [`bitknn.WithValues(values []float64)`](https://pkg.go.dev/github.com/keilerkonzept/bitknn#WithValues): Assign vote values for each data point.

## Benchmarks
```
goos: darwin
goarch: arm64
pkg: github.com/keilerkonzept/bitknn
cpu: Apple M1 Pro
```

| Bits | N | k | Model | Op | s/op | B/op | allocs/op |
|------|---------|-----|-----------|-----------|-------------|------|-----------|
| 64 | 100 | 3 | Model | `Predict` | 99.06n ± 2% | 0 | 0 |
| 64 | 100 | 3 | WideModel | `Predict` | 191.6n ± 1% | 0 | 0 |
| 64 | 100 | 3 | Model | `Find` | 88.09n ± 0% | 0 | 0 |
| 64 | 100 | 3 | WideModel | `Find` | 182.8n ± 1% | 0 | 0 |
| 64 | 100 | 10 | Model | `Predict` | 225.1n ± 1% | 0 | 0 |
| 64 | 100 | 10 | WideModel | `Predict` | 372.0n ± 1% | 0 | 0 |
| 64 | 100 | 10 | Model | `Find` | 202.9n ± 1% | 0 | 0 |
| 64 | 100 | 10 | WideModel | `Find` | 345.2n ± 0% | 0 | 0 |
| 64 | 1000 | 3 | Model | `Predict` | 538.2n ± 1% | 0 | 0 |
| 64 | 1000 | 3 | WideModel | `Predict` | 1.469µ ± 1% | 0 | 0 |
| 64 | 1000 | 3 | Model | `Find` | 525.8n ± 1% | 0 | 0 |
| 64 | 1000 | 3 | WideModel | `Find` | 1.465µ ± 1% | 0 | 0 |
| 64 | 1000 | 10 | Model | `Predict` | 835.4n ± 1% | 0 | 0 |
| 64 | 1000 | 10 | WideModel | `Predict` | 1.880µ ± 1% | 0 | 0 |
| 64 | 1000 | 10 | Model | `Find` | 807.4n ± 0% | 0 | 0 |
| 64 | 1000 | 10 | WideModel | `Find` | 1.867µ ± 2% | 0 | 0 |
| 64 | 1000 | 100 | Model | `Predict` | 3.718µ ± 0% | 0 | 0 |
| 64 | 1000 | 100 | WideModel | `Predict` | 4.935µ ± 0% | 0 | 0 |
| 64 | 1000 | 100 | Model | `Find` | 3.494µ ± 0% | 0 | 0 |
| 64 | 1000 | 100 | WideModel | `Find` | 4.701µ ± 0% | 0 | 0 |
| 64 | 1000000 | 3 | Model | `Predict` | 458.8µ ± 0% | 0 | 0 |
| 64 | 1000000 | 3 | WideModel | `Predict` | 1.301m ± 1% | 0 | 0 |
| 64 | 1000000 | 3 | Model | `Find` | 457.9µ ± 1% | 0 | 0 |
| 64 | 1000000 | 3 | WideModel | `Find` | 1.302m ± 1% | 0 | 0 |
| 64 | 1000000 | 10 | Model | `Predict` | 456.9µ ± 0% | 0 | 0 |
| 64 | 1000000 | 10 | WideModel | `Predict` | 1.295m ± 2% | 0 | 0 |
| 64 | 1000000 | 10 | Model | `Find` | 457.6µ ± 1% | 0 | 0 |
| 64 | 1000000 | 10 | WideModel | `Find` | 1.298m ± 1% | 0 | 0 |
| 64 | 1000000 | 100 | Model | `Predict` | 474.5µ ± 1% | 0 | 0 |
| 64 | 1000000 | 100 | WideModel | `Predict` | 1.316m ± 1% | 0 | 0 |
| 64 | 1000000 | 100 | Model | `Find` | 466.9µ ± 0% | 0 | 0 |
| 64 | 1000000 | 100 | WideModel | `Find` | 1.306m ± 0% | 0 | 0 |
| 128 | 100 | 3 | WideModel | `Predict` | 296.7n ± 0% | 0 | 0 |
| 128 | 100 | 3 | WideModel | `Find` | 285.8n ± 0% | 0 | 0 |
| 128 | 100 | 10 | WideModel | `Predict` | 467.4n ± 1% | 0 | 0 |
| 128 | 100 | 10 | WideModel | `Find` | 441.1n ± 1% | 0 | 0 |
| 640 | 100 | 3 | WideModel | `Predict` | 654.6n ± 1% | 0 | 0 |
| 640 | 100 | 3 | WideModel | `Find` | 640.3n ± 1% | 0 | 0 |
| 640 | 100 | 10 | WideModel | `Predict` | 850.0n ± 1% | 0 | 0 |
| 640 | 100 | 10 | WideModel | `Find` | 825.0n ± 0% | 0 | 0 |
| 128 | 1000 | 3 | WideModel | `Predict` | 2.384µ ± 0% | 0 | 0 |
| 128 | 1000 | 3 | WideModel | `Find` | 2.374µ ± 0% | 0 | 0 |
| 128 | 1000 | 10 | WideModel | `Predict` | 2.900µ ± 0% | 0 | 0 |
| 128 | 1000 | 10 | WideModel | `Find` | 2.901µ ± 1% | 0 | 0 |
| 128 | 1000 | 100 | WideModel | `Predict` | 5.630µ ± 1% | 0 | 0 |
| 128 | 1000 | 100 | WideModel | `Find` | 5.472µ ± 3% | 0 | 0 |
| 128 | 1000000 | 3 | WideModel | `Predict` | 2.266m ± 0% | 0 | 0 |
| 128 | 1000000 | 3 | WideModel | `Find` | 2.273m ± 3% | 0 | 0 |
| 128 | 1000000 | 10 | WideModel | `Predict` | 2.269m ± 0% | 0 | 0 |
| 128 | 1000000 | 10 | WideModel | `Find` | 2.261m ± 1% | 0 | 0 |
| 128 | 1000000 | 100 | WideModel | `Predict` | 2.295m ± 1% | 0 | 0 |
| 128 | 1000000 | 100 | WideModel | `Find` | 2.289m ± 0% | 0 | 0 |
| 640 | 1000 | 3 | WideModel | `Predict` | 6.214µ ± 2% | 0 | 0 |
| 640 | 1000 | 3 | WideModel | `Find` | 6.201µ ± 1% | 0 | 0 |
| 640 | 1000 | 10 | WideModel | `Predict` | 6.777µ ± 1% | 0 | 0 |
| 640 | 1000 | 10 | WideModel | `Find` | 6.728µ ± 1% | 0 | 0 |
| 640 | 1000 | 100 | WideModel | `Predict` | 11.16µ ± 2% | 0 | 0 |
| 640 | 1000 | 100 | WideModel | `Find` | 10.85µ ± 2% | 0 | 0 |
| 640 | 1000000 | 3 | WideModel | `Predict` | 5.756m ± 4% | 0 | 0 |
| 640 | 1000000 | 3 | WideModel | `Find` | 5.832m ± 2% | 0 | 0 |
| 640 | 1000000 | 10 | WideModel | `Predict` | 5.842m ± 1% | 0 | 0 |
| 640 | 1000000 | 10 | WideModel | `Find` | 5.830m ± 5% | 0 | 0 |
| 640 | 1000000 | 100 | WideModel | `Predict` | 5.914m ± 6% | 0 | 0 |
| 640 | 1000000 | 100 | WideModel | `Find` | 5.872m ± 1% | 0 | 0 |

## License
MIT License