https://github.com/codelibs/search-ann-benchmark

Evaluating and comparing ANN search algorithms across various platforms

# Search ANN Benchmark

Benchmark the search performance of Approximate Nearest Neighbor (ANN) algorithms implemented in various systems.
This repository contains notebooks and scripts to evaluate and compare the efficiency and accuracy of ANN searches across different platforms.

## Introduction

Approximate Nearest Neighbor (ANN) search algorithms are essential for handling high-dimensional data spaces, enabling fast and resource-efficient retrieval of similar items from large datasets.
This benchmarking suite aims to provide an empirical basis for comparing the performance of several popular ANN-enabled search systems.
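To make the trade-off concrete, the sketch below (an illustration, not code from this repository; NumPy is assumed to be available) computes exact brute-force nearest neighbors as ground truth and defines the recall@k metric that ANN benchmarks typically report: the fraction of true neighbors an approximate index recovers.

```python
import numpy as np

def exact_knn(queries, corpus, k):
    """Brute-force k-NN by full pairwise L2 distance (the ground truth ANN results are scored against)."""
    # (n_queries, n_corpus) distance matrix; infeasible at scale, which is why ANN indexes exist
    dists = np.linalg.norm(queries[:, None, :] - corpus[None, :, :], axis=-1)
    return np.argsort(dists, axis=1)[:, :k]

def recall_at_k(approx_ids, exact_ids):
    """Fraction of true nearest neighbors present in the approximate results."""
    hits = [len(set(a) & set(e)) for a, e in zip(approx_ids, exact_ids)]
    return sum(hits) / (len(exact_ids) * exact_ids.shape[1])

rng = np.random.default_rng(0)
corpus = rng.normal(size=(1000, 64)).astype("float32")
queries = rng.normal(size=(10, 64)).astype("float32")
truth = exact_knn(queries, corpus, k=10)

# Exact search trivially scores 1.0; an ANN index trades some recall for speed.
print(recall_at_k(truth, truth))  # 1.0
```

An ANN system's result ids would be passed as `approx_ids` in place of `truth` to measure how much accuracy was traded for faster queries.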

## Prerequisites

Before running the benchmarks, ensure you have the following installed:

- Docker
- Python 3.10 or higher
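A hypothetical pre-flight check for these two prerequisites might look like the following (this helper is not part of the repository; it only mirrors the list above using the standard library):

```python
import shutil
import sys

def missing_prerequisites(min_python=(3, 10)):
    """Return a list of prerequisites from the README that this machine does not satisfy."""
    missing = []
    if sys.version_info < min_python:
        missing.append("Python %d.%d or higher" % min_python)
    if shutil.which("docker") is None:  # Docker CLI must be on PATH
        missing.append("Docker")
    return missing

print(missing_prerequisites())  # empty list when both prerequisites are met
```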

## Setup Instructions

1. **Prepare the Environment:**

   Create directories for datasets and output files, then download the necessary datasets using the provided script:

   ```bash
   /bin/bash ./scripts/setup.sh
   ```

2. **Install Dependencies:**

   Install the required Python libraries:

   ```bash
   pip install -r requirements.txt
   ```

## Benchmark Notebooks

The repository includes the following Jupyter notebooks for conducting benchmarks:

| Notebook | GitHub Actions |
|------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| [Chroma](run-chroma.ipynb) | [![Run Chroma on Linux](https://github.com/marevol/search-ann-benchmark/actions/workflows/run-chroma-linux.yml/badge.svg)](https://github.com/marevol/search-ann-benchmark/actions/workflows/run-chroma-linux.yml) |
| [Elasticsearch](run-elasticsearch.ipynb) | [![Run Elasticsearch on Linux](https://github.com/marevol/search-ann-benchmark/actions/workflows/run-elasticsearch-linux.yml/badge.svg)](https://github.com/marevol/search-ann-benchmark/actions/workflows/run-elasticsearch-linux.yml) |
| [Milvus](run-milvus.ipynb) | [![Run Milvus on Linux](https://github.com/marevol/search-ann-benchmark/actions/workflows/run-milvus-linux.yml/badge.svg)](https://github.com/marevol/search-ann-benchmark/actions/workflows/run-milvus-linux.yml) |
| [OpenSearch](run-opensearch.ipynb) | [![Run OpenSearch on Linux](https://github.com/marevol/search-ann-benchmark/actions/workflows/run-opensearch-linux.yml/badge.svg)](https://github.com/marevol/search-ann-benchmark/actions/workflows/run-opensearch-linux.yml) |
| [pgvector](run-pgvector.ipynb) | [![Run PGVector on Linux](https://github.com/marevol/search-ann-benchmark/actions/workflows/run-pgvector-linux.yml/badge.svg)](https://github.com/marevol/search-ann-benchmark/actions/workflows/run-pgvector-linux.yml) |
| [Qdrant](run-qdrant.ipynb) | [![Run Qdrant on Linux](https://github.com/marevol/search-ann-benchmark/actions/workflows/run-qdrant-linux.yml/badge.svg)](https://github.com/marevol/search-ann-benchmark/actions/workflows/run-qdrant-linux.yml) |
| [Vespa](run-vespa.ipynb) | [![Run Vespa on Linux](https://github.com/marevol/search-ann-benchmark/actions/workflows/run-vespa-linux.yml/badge.svg)](https://github.com/marevol/search-ann-benchmark/actions/workflows/run-vespa-linux.yml) |
| [Weaviate](run-weaviate.ipynb) | [![Run Weaviate on Linux](https://github.com/marevol/search-ann-benchmark/actions/workflows/run-weaviate-linux.yml/badge.svg)](https://github.com/marevol/search-ann-benchmark/actions/workflows/run-weaviate-linux.yml) |

Each notebook guides you through the process of setting up the test environment, loading the dataset, executing the search queries, and analyzing the results.
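The measurement step shared by all of the notebooks can be sketched generically as a timing loop over queries (a simplified illustration, not the notebooks' actual code; `search_fn` stands in for whichever client call the target platform uses, e.g. an Elasticsearch or Qdrant query):

```python
import statistics
import time

def benchmark_queries(search_fn, queries, k=10):
    """Run each query through search_fn, recording per-query latency and result ids."""
    latencies, results = [], []
    for q in queries:
        start = time.perf_counter()
        ids = search_fn(q, k)
        latencies.append((time.perf_counter() - start) * 1000.0)  # milliseconds
        results.append(ids)
    return {
        "mean_ms": statistics.mean(latencies),
        "p95_ms": sorted(latencies)[int(0.95 * (len(latencies) - 1))],
        "results": results,  # fed into a recall computation against ground truth
    }

# Usage with a dummy search function standing in for a real platform client:
stats = benchmark_queries(lambda q, k: list(range(k)), queries=[[0.0] * 4] * 100)
print(round(stats["mean_ms"], 3), "ms mean latency over", len(stats["results"]), "queries")
```

The collected result ids are then scored against exact nearest neighbors to report precision alongside latency, which is the comparison the results page summarizes.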

## Benchmark Results

For a comparison of results across systems, including response times and precision metrics for the different ANN implementations, see the [Benchmark Results page](https://codelibs.co/benchmark/ann-benchmark.html).

## Contributing

We welcome contributions!
If you have suggestions for additional benchmarks, improvements to existing ones, or fixes for any issues, please feel free to open an issue or submit a pull request.

## License

This project is licensed under the Apache License 2.0.