https://github.com/ashvardanian/usearch-binary
Binary vector search example using Unum's USearch engine and pre-computed Wikipedia embeddings from Co:here and MixedBread
https://github.com/ashvardanian/usearch-binary
binary-vector bitset vector-database vector-search
Last synced: about 1 year ago
JSON representation
Binary vector search example using Unum's USearch engine and pre-computed Wikipedia embeddings from Co:here and MixedBread
- Host: GitHub
- URL: https://github.com/ashvardanian/usearch-binary
- Owner: ashvardanian
- License: apache-2.0
- Created: 2024-03-26T03:47:01.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-04-09T06:19:08.000Z (about 2 years ago)
- Last Synced: 2025-04-11T03:02:23.851Z (about 1 year ago)
- Topics: binary-vector, bitset, vector-database, vector-search
- Language: Jupyter Notebook
- Homepage: https://github.com/unum-cloud/usearch
- Size: 66.4 KB
- Stars: 18
- Watchers: 2
- Forks: 1
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Binary Vector Search Examples for USearch
This repository contains examples for constructing binary vector-search indicies for WikiPedia embeddings available on the HuggingFace portal:
- [Co:here](https://huggingface.co/datasets/Cohere/wikipedia-2023-11-embed-multilingual-v3)
- [MixedBread.ai](https://huggingface.co/datasets/mixedbread-ai/wikipedia-embed-en-2023-11)
## Running Examples
To view the results, check out the [`bench.ipynb`](bench.ipynb).
To replicate the results, first, download the data:
```sh
$ pip install -r requirements.txt
$ python download.py
$ ls -alh mixedbread | head -n 1
> total 15G
$ ls -alh cohere | head -n 1
> total 15G
```
In both cases, the embeddings have 1024 dimensions, each represented with a single bit, packed into 128-byte vectors.
32 GBs of RAM are recommended to run the scripts.
## Optimizations
Knowing the length of embeddings is very handy for optimizations.
If the embeddings are only 1024 bits long, we only need 2 ZMM registers to store the entire vector.
We don't need any `for`-loops, then entire operation can be unrolled and inlined.
```c
inline uint64_t hamming_distance(uint8_t const* first_vector, uint8_t const* second_vector) {
__m512i const first_start = _mm512_loadu_si512((__m512i const*)(first_vector));
__m512i const first_end = _mm512_loadu_si512((__m512i const*)(first_vector + 64));
__m512i const second_start = _mm512_loadu_si512((__m512i const*)(second_vector));
__m512i const second_end = _mm512_loadu_si512((__m512i const*)(second_vector + 64));
__m512i const differences_start = _mm512_xor_epi64(first_start, second_start);
__m512i const differences_end = _mm512_xor_epi64(first_end, second_end);
__m512i const population_start = _mm512_popcnt_epi64(differences_start);
__m512i const population_end = _mm512_popcnt_epi64(differences_end);
__m512i const population = _mm512_add_epi64(population_start, population_end);
return _mm512_reduce_add_epi64(population);
}
```
To run the kernel benchmarks, use the following command:
```sh
$ python kernel.py
```
To run benchmarks over real data:
```sh
$ python kernels.py --dir cohere --limit 1e6
```