https://github.com/unum-cloud/usearch-benchmarks

Comparing USearch to FAISS and other Vector Search engines on Billion-scale datasets
https://github.com/unum-cloud/usearch-benchmarks

benchmark faiss vector-search vector-search-engine

Last synced: 3 months ago
JSON representation

Comparing USearch to FAISS and other Vector Search engines on Billion-scale datasets

Host: GitHub
URL: https://github.com/unum-cloud/usearch-benchmarks
Owner: unum-cloud
Created: 2023-09-21T07:38:07.000Z (about 2 years ago)
Default Branch: main
Last Pushed: 2023-11-29T20:39:54.000Z (almost 2 years ago)
Last Synced: 2025-05-30T15:35:16.062Z (4 months ago)
Topics: benchmark, faiss, vector-search, vector-search-engine
Language: Python
Homepage: https://www.unum.cloud/blog/2023-11-07-scaling-vector-search-with-intel
Size: 9.8 MB
Stars: 9
Watchers: 3
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

# USearch Benchmarks

This set of benchmarks is meant to test USearch capabilities for Billion-scale vector search.
It provides an alternative to the `ann-benchmarks` and the `big-ann-benchmarks` which generally operate on much smaller collections.

The main objective is to understand the scaling laws of the USearch compared to [FAISS](https://github.com/facebookresearch/faiss).
Supplementary adapters for other popular systems is also available under `index/` directory:

- Alternative HNSW implementations, like HNSWlib,
- Alternative CPU-based libraries, like SCANN,
- Vector Databases, like Qdrant, and Wevaite.

The primary dataset used for benchmarks is the [Deep1B](https://research.yandex.com/blog/benchmarks-for-billion-scale-similarity-search) dataset of 1 Billion 96-dimensional vectors, totalling at __384 GB__.
Ground-truth nearest neighbors are provided to calculate the recall metrics.

## Setup

First of all, we recommend creating a `conda` environment to isolate the dependencies:

```sh
conda create -n usearch-benchmarks python=3.10
conda activate usearch-benchmarks
```

Then install dependencies, getting an MKL-accelerated version of FAISS library.

```sh
pip install usearch hnswlib scann lancedb qdrant-client weaviate-client psutil plotly kaleido
conda install -c pytorch faiss-cpu=1.7.4 mkl=2021 blas=1.0=mkl
```

To benchmark Qdrant, you need to run their Docker container:

```sh
docker run -d -p 6333:6333 -p 6334:6334 qdrant/qdrant
```

Finally, download the [Deep1B](https://research.yandex.com/blog/benchmarks-for-billion-scale-similarity-search) dataset:

```sh
wget https://storage.yandexcloud.net/yandex-research/ann-datasets/DEEP/base.1B.fbin -P data
wget https://storage.yandexcloud.net/yandex-research/ann-datasets/DEEP/base.10M.fbin -P data # For smaller subset
```

To run the ANN benchmarks pass a configuration file:

```sh
python run.py configs/usearch_1B.json 1B # Outputs stats/*.npz file
python utils/draw_plots.py # Exports tp plots/*.png
```

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/unum-cloud/usearch-benchmarks

Awesome Lists containing this project

README