https://github.com/unum-cloud/usearch-benchmarks
Comparing USearch to FAISS and other Vector Search engines on Billion-scale datasets
benchmark faiss vector-search vector-search-engine
- Host: GitHub
- URL: https://github.com/unum-cloud/usearch-benchmarks
- Owner: unum-cloud
- Created: 2023-09-21T07:38:07.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2023-11-29T20:39:54.000Z (almost 2 years ago)
- Last Synced: 2025-05-30T15:35:16.062Z (4 months ago)
- Topics: benchmark, faiss, vector-search, vector-search-engine
- Language: Python
- Homepage: https://www.unum.cloud/blog/2023-11-07-scaling-vector-search-with-intel
- Size: 9.8 MB
- Stars: 9
- Watchers: 3
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# USearch Benchmarks
This set of benchmarks is meant to test USearch capabilities for Billion-scale vector search.
It provides an alternative to `ann-benchmarks` and `big-ann-benchmarks`, which generally operate on much smaller collections.

The main objective is to understand the scaling laws of USearch compared to [FAISS](https://github.com/facebookresearch/faiss).
Supplementary adapters for other popular systems are also available under the `index/` directory:

- Alternative HNSW implementations, like HNSWlib,
- Alternative CPU-based libraries, like SCANN,
- Vector Databases, like Qdrant and Weaviate.

The primary dataset used for benchmarks is the [Deep1B](https://research.yandex.com/blog/benchmarks-for-billion-scale-similarity-search) dataset of 1 Billion 96-dimensional vectors, totalling __384 GB__.
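The 384 GB figure follows from the dataset shape: 10^9 vectors × 96 dimensions × 4 bytes per `float32`. The sketch below checks that arithmetic and memory-maps a `.fbin` file without loading it into RAM. It assumes the layout commonly documented for these datasets (an 8-byte header holding the vector count and the dimensionality as two little-endian `uint32` values, followed by row-major `float32` data). The helper names `expected_bytes` and `open_fbin` are illustrative and are not part of this repository.

```python
import numpy as np


def expected_bytes(count: int, dims: int, itemsize: int = 4) -> int:
    """Size of the raw vector payload in bytes, excluding the 8-byte header."""
    return count * dims * itemsize


def open_fbin(path: str) -> np.memmap:
    """Memory-map an .fbin file as a (count, dims) float32 matrix.

    Assumed layout: uint32 count, uint32 dims, then float32 rows.
    """
    count, dims = np.fromfile(path, dtype="<u4", count=2)
    return np.memmap(
        path, dtype="<f4", mode="r", offset=8, shape=(int(count), int(dims))
    )


# Payload size of the full dataset in GB (10^9 bytes)
print(expected_bytes(1_000_000_000, 96) / 1e9)  # 384.0
```

Memory-mapping keeps only the pages that are actually touched in RAM, which matters when the file is larger than the machine's memory.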
Ground-truth nearest neighbors are provided to calculate the recall metrics.

## Setup

First of all, we recommend creating a `conda` environment to isolate the dependencies:
```sh
conda create -n usearch-benchmarks python=3.10
conda activate usearch-benchmarks
```

Then install the dependencies, getting an MKL-accelerated version of the FAISS library:
```sh
pip install usearch hnswlib scann lancedb qdrant-client weaviate-client psutil plotly kaleido
conda install -c pytorch faiss-cpu=1.7.4 mkl=2021 blas=1.0=mkl
```

To benchmark Qdrant, you need to run their Docker container:
```sh
docker run -d -p 6333:6333 -p 6334:6334 qdrant/qdrant
```

Finally, download the [Deep1B](https://research.yandex.com/blog/benchmarks-for-billion-scale-similarity-search) dataset:
```sh
wget https://storage.yandexcloud.net/yandex-research/ann-datasets/DEEP/base.1B.fbin -P data
wget https://storage.yandexcloud.net/yandex-research/ann-datasets/DEEP/base.10M.fbin -P data # For smaller subset
```

To run the ANN benchmarks, pass a configuration file:
```sh
python run.py configs/usearch_1B.json 1B # Outputs stats/*.npz file
python utils/draw_plots.py # Exports to plots/*.png
```
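
Recall, the metric the ground-truth neighbors are provided for, is the fraction of true nearest neighbors that an index returns. A minimal sketch of recall@k follows, assuming both the search results and the ground truth are integer arrays of shape `(queries, k)`. The function name `recall_at_k` is illustrative, and this is not necessarily how `run.py` computes the metric.

```python
import numpy as np


def recall_at_k(found: np.ndarray, truth: np.ndarray, k: int = 10) -> float:
    """Average fraction of the true top-k neighbors present in the returned top-k."""
    hits = 0
    for found_row, truth_row in zip(found[:, :k], truth[:, :k]):
        # Order within the top-k does not matter, only membership
        hits += len(np.intersect1d(found_row, truth_row))
    return hits / (truth.shape[0] * k)


# Toy example: 2 queries, 3 neighbors each
truth = np.array([[0, 1, 2], [3, 4, 5]])
found = np.array([[0, 2, 9], [5, 4, 3]])
print(recall_at_k(found, truth, k=3))  # 5 of the 6 true neighbors were found
```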