{"id":28386005,"url":"https://github.com/unum-cloud/usearch-benchmarks","last_synced_at":"2025-06-26T12:30:45.178Z","repository":{"id":198641729,"uuid":"694536206","full_name":"unum-cloud/usearch-benchmarks","owner":"unum-cloud","description":"Comparing USearch to FAISS and other Vector Search engines on Billion-scale datasets","archived":false,"fork":false,"pushed_at":"2023-11-29T20:39:54.000Z","size":10278,"stargazers_count":9,"open_issues_count":0,"forks_count":2,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-05-30T15:35:16.062Z","etag":null,"topics":["benchmark","faiss","vector-search","vector-search-engine"],"latest_commit_sha":null,"homepage":"https://www.unum.cloud/blog/2023-11-07-scaling-vector-search-with-intel","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/unum-cloud.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-09-21T07:38:07.000Z","updated_at":"2024-12-30T16:55:09.000Z","dependencies_parsed_at":"2023-10-10T18:28:54.676Z","dependency_job_id":"6da355df-681f-45dc-bb8a-15f51ba76a14","html_url":"https://github.com/unum-cloud/usearch-benchmarks","commit_stats":null,"previous_names":["unum-cloud/usearch-benchmarks"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/unum-cloud/usearch-benchmarks","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/unum-cloud%2Fusearch-benchmarks","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/unum-cloud%2Fusearch-benchmarks/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/unum-cloud%2Fusearch-benchmarks/releases","man
ifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/unum-cloud%2Fusearch-benchmarks/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/unum-cloud","download_url":"https://codeload.github.com/unum-cloud/usearch-benchmarks/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/unum-cloud%2Fusearch-benchmarks/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262067650,"owners_count":23253644,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["benchmark","faiss","vector-search","vector-search-engine"],"created_at":"2025-05-30T12:38:19.120Z","updated_at":"2025-06-26T12:30:45.165Z","avatar_url":"https://github.com/unum-cloud.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# USearch Benchmarks\n\nThis set of benchmarks is meant to test USearch capabilities for Billion-scale vector search.\nIt provides an alternative to the `ann-benchmarks` and the `big-ann-benchmarks`, which generally operate on much smaller collections.\n\nThe main objective is to understand the scaling laws of USearch compared to [FAISS](https://github.com/facebookresearch/faiss).\nSupplementary adapters for other popular systems are also available under the `index/` directory:\n\n- Alternative HNSW implementations, like HNSWlib,\n- Alternative CPU-based libraries, like ScaNN,\n- Vector Databases, like Qdrant and Weaviate.\n\nThe primary dataset used for benchmarks is the 
[Deep1B](https://research.yandex.com/blog/benchmarks-for-billion-scale-similarity-search) dataset of 1 Billion 96-dimensional vectors, totalling __384 GB__.\nGround-truth nearest neighbors are provided to calculate the recall metrics.\n\n## Setup\n\nFirst of all, we recommend creating a `conda` environment to isolate the dependencies:\n\n```sh\nconda create -n usearch-benchmarks python=3.10\nconda activate usearch-benchmarks\n```\n\nThen install the dependencies, including an MKL-accelerated version of the FAISS library:\n\n```sh\npip install usearch hnswlib scann lancedb qdrant-client weaviate-client psutil plotly kaleido\nconda install -c pytorch faiss-cpu=1.7.4 mkl=2021 blas=1.0=mkl\n```\n\nTo benchmark Qdrant, you need to run their Docker container:\n\n```sh\ndocker run -d -p 6333:6333 -p 6334:6334 qdrant/qdrant\n```\n\nFinally, download the [Deep1B](https://research.yandex.com/blog/benchmarks-for-billion-scale-similarity-search) dataset:\n\n```sh\nwget https://storage.yandexcloud.net/yandex-research/ann-datasets/DEEP/base.1B.fbin -P data\nwget https://storage.yandexcloud.net/yandex-research/ann-datasets/DEEP/base.10M.fbin -P data # For a smaller subset\n```\n\nTo run the ANN benchmarks, pass a configuration file:\n\n```sh\npython run.py configs/usearch_1B.json 1B # Outputs a stats/*.npz file\npython utils/draw_plots.py # Exports to plots/*.png\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Funum-cloud%2Fusearch-benchmarks","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Funum-cloud%2Fusearch-benchmarks","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Funum-cloud%2Fusearch-benchmarks/lists"}