{"id":21994439,"url":"https://github.com/codelibs/search-ann-benchmark","last_synced_at":"2026-04-17T05:02:17.045Z","repository":{"id":226857186,"uuid":"765181229","full_name":"codelibs/search-ann-benchmark","owner":"codelibs","description":"Evaluating and comparing ANN search algorithms across various platforms","archived":false,"fork":false,"pushed_at":"2026-04-06T14:19:21.000Z","size":477,"stargazers_count":9,"open_issues_count":0,"forks_count":2,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-04-06T16:19:04.873Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/codelibs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-02-29T12:40:51.000Z","updated_at":"2026-04-06T14:19:26.000Z","dependencies_parsed_at":"2024-03-10T07:29:51.166Z","dependency_job_id":"24870afa-d318-4720-8cd6-3f8c9b568136","html_url":"https://github.com/codelibs/search-ann-benchmark","commit_stats":null,"previous_names":["marevol/search-wikipedia-benchmark","marevol/search-ann-benchmark","codelibs/search-ann-benchmark"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/codelibs/search-ann-benchmark","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codelibs%2Fsearch-ann-benchmark","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codelibs%2Fsearch-ann-benchmark/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codelibs%2Fsearch-ann-benchmark/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codelibs%2Fsearch-ann-benchmark/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/codelibs","download_url":"https://codeload.github.com/codelibs/search-ann-benchmark/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codelibs%2Fsearch-ann-benchmark/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31915900,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-16T18:22:33.417Z","status":"online","status_checked_at":"2026-04-17T02:00:06.879Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-29T21:08:54.800Z","updated_at":"2026-04-17T05:02:17.039Z","avatar_url":"https://github.com/codelibs.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# Search ANN Benchmark\n\nBenchmark the search performance of Approximate Nearest Neighbor (ANN) algorithms implemented in various systems.\nThis repository contains a Python CLI tool to evaluate and compare the efficiency and accuracy of ANN searches across different platforms.\n\n## Introduction\n\nApproximate Nearest Neighbor (ANN) search algorithms are essential for handling high-dimensional data spaces, enabling fast and resource-efficient retrieval of similar items from large datasets.\nThis benchmarking suite aims to provide an empirical basis for comparing the performance of several popular ANN-enabled search systems.\n\n## Supported Engines\n\n| Engine | Version | GitHub Actions |\n|--------|---------|----------------|\n| Qdrant | 1.17.1 | [![Run Qdrant](https://github.com/codelibs/search-ann-benchmark/actions/workflows/run-qdrant-linux.yml/badge.svg)](https://github.com/codelibs/search-ann-benchmark/actions/workflows/run-qdrant-linux.yml) |\n| Elasticsearch | 9.3.2 | [![Run Elasticsearch](https://github.com/codelibs/search-ann-benchmark/actions/workflows/run-elasticsearch-linux.yml/badge.svg)](https://github.com/codelibs/search-ann-benchmark/actions/workflows/run-elasticsearch-linux.yml) |\n| OpenSearch | 3.5.0 | [![Run OpenSearch](https://github.com/codelibs/search-ann-benchmark/actions/workflows/run-opensearch-linux.yml/badge.svg)](https://github.com/codelibs/search-ann-benchmark/actions/workflows/run-opensearch-linux.yml) |\n| Milvus | 2.6.14 | [![Run Milvus](https://github.com/codelibs/search-ann-benchmark/actions/workflows/run-milvus-linux.yml/badge.svg)](https://github.com/codelibs/search-ann-benchmark/actions/workflows/run-milvus-linux.yml) |\n| Weaviate | 1.36.9 | [![Run Weaviate](https://github.com/codelibs/search-ann-benchmark/actions/workflows/run-weaviate-linux.yml/badge.svg)](https://github.com/codelibs/search-ann-benchmark/actions/workflows/run-weaviate-linux.yml) |\n| Vespa | 8.667.16 | [![Run Vespa](https://github.com/codelibs/search-ann-benchmark/actions/workflows/run-vespa-linux.yml/badge.svg)](https://github.com/codelibs/search-ann-benchmark/actions/workflows/run-vespa-linux.yml) |\n| pgvector | 0.8.2-pg18 | [![Run pgvector](https://github.com/codelibs/search-ann-benchmark/actions/workflows/run-pgvector-linux.yml/badge.svg)](https://github.com/codelibs/search-ann-benchmark/actions/workflows/run-pgvector-linux.yml) |\n| Chroma | 1.5.7 | [![Run Chroma](https://github.com/codelibs/search-ann-benchmark/actions/workflows/run-chroma-linux.yml/badge.svg)](https://github.com/codelibs/search-ann-benchmark/actions/workflows/run-chroma-linux.yml) |\n| Redis Stack | 7.4.0-v8 | [![Run Redis Stack](https://github.com/codelibs/search-ann-benchmark/actions/workflows/run-redisstack-linux.yml/badge.svg)](https://github.com/codelibs/search-ann-benchmark/actions/workflows/run-redisstack-linux.yml) |\n| Vald | v1.7.17 | [![Run Vald](https://github.com/codelibs/search-ann-benchmark/actions/workflows/run-vald-linux.yml/badge.svg)](https://github.com/codelibs/search-ann-benchmark/actions/workflows/run-vald-linux.yml) |\n| ClickHouse | 26.2 | [![Run ClickHouse](https://github.com/codelibs/search-ann-benchmark/actions/workflows/run-clickhouse-linux.yml/badge.svg)](https://github.com/codelibs/search-ann-benchmark/actions/workflows/run-clickhouse-linux.yml) |\n| LanceDB | 0.29.2 | [![Run LanceDB](https://github.com/codelibs/search-ann-benchmark/actions/workflows/run-lancedb-linux.yml/badge.svg)](https://github.com/codelibs/search-ann-benchmark/actions/workflows/run-lancedb-linux.yml) |\n\n## Prerequisites\n\nBefore running the benchmarks, ensure you have the following installed:\n\n- Docker\n- Python 3.10 or higher\n- uv (Python package manager)\n\n## Installation\n\n1. **Install uv (if not already installed):**\n\n    ```bash\n    curl -LsSf https://astral.sh/uv/install.sh | sh\n    ```\n\n2. **Clone the repository and install dependencies:**\n\n    ```bash\n    git clone https://github.com/codelibs/search-ann-benchmark.git\n    cd search-ann-benchmark\n    uv sync\n    ```\n\n3. **Download the dataset:**\n\n    ```bash\n    bash scripts/setup.sh\n    ```\n\n    For GitHub Actions (smaller dataset):\n    ```bash\n    bash scripts/setup.sh gha\n    ```\n\n## Usage\n\n### Run a benchmark\n\n```bash\n# Run Qdrant benchmark with default settings\nuv run search-ann-benchmark run qdrant\n\n# Run Elasticsearch with specific configuration\nuv run search-ann-benchmark run elasticsearch --target 1m-768-m48-efc200-ef100-ip\n\n# Run with quantization\nuv run search-ann-benchmark run elasticsearch --quantization int8 --variant int8\n\n# Skip filtered search benchmark\nuv run search-ann-benchmark run chroma --no-filter\n```\n\n### List available engines\n\n```bash\nuv run search-ann-benchmark list-engines\n```\n\n### List available configurations\n\n```bash\nuv run search-ann-benchmark list-targets\n```\n\n### Show configuration details\n\n```bash\nuv run search-ann-benchmark show-config qdrant --target 100k-768-m32-efc200-ef100-ip\n```\n\n### View benchmark results\n\n```bash\nuv run search-ann-benchmark show-results results.json\n```\n\n## Configuration Options\n\n### Target Configurations\n\n| Name | Index Size | HNSW M | Description |\n|------|------------|--------|-------------|\n| 100k-768-m32-efc200-ef100-ip | 100,000 | 32 | Small dataset for quick testing |\n| 1m-768-m48-efc200-ef100-ip | 1,000,000 | 48 | Medium dataset |\n| 5m-768-m48-efc200-ef100-ip | 5,000,000 | 48 | Full dataset |\n\n### Quantization Options\n\nDifferent engines support different quantization modes:\n\n- **Qdrant**: none, int8\n- **Elasticsearch**: none, int4, int8, bbq\n- **OpenSearch**: none (supports faiss engine variant)\n- **Weaviate**: none, pq\n- **pgvector**: vector, halfvec\n\n## Project Structure\n\n```\nsearch-ann-benchmark/\n├── src/search_ann_benchmark/\n│   ├── __init__.py\n│   ├── cli.py              # CLI entry point\n│   ├── config.py           # Configuration classes\n│   ├── runner.py           # Benchmark orchestration\n│   ├── core/\n│   │   ├── base.py         # Abstract engine interface\n│   │   ├── docker.py       # Docker management\n│   │   ├── embedding.py    # Embedding loader\n│   │   └── metrics.py      # Metrics calculation\n│   └── engines/\n│       ├── qdrant.py\n│       ├── elasticsearch.py\n│       ├── opensearch.py\n│       ├── milvus.py\n│       ├── weaviate.py\n│       ├── vespa.py\n│       ├── pgvector.py\n│       ├── chroma.py\n│       ├── clickhouse.py\n│       ├── lancedb.py\n│       ├── redisstack.py\n│       └── vald.py\n├── tests/\n├── scripts/\n│   ├── setup.sh            # Dataset download\n│   └── get_hardware_info.sh\n└── .github/workflows/      # CI workflows\n```\n\n## Output Format\n\nBenchmark results are saved to `results.json` with the following structure:\n\n```json\n{\n  \"variant\": \"\",\n  \"target\": \"100k-768-m32-efc200-ef100-ip\",\n  \"version\": \"1.13.6\",\n  \"settings\": { ... },\n  \"results\": {\n    \"indexing\": {\n      \"execution_time\": 123.45,\n      \"process_time\": 100.23,\n      \"container\": { ... }\n    },\n    \"top_10\": {\n      \"num_of_queries\": 10000,\n      \"took\": { \"mean\": 5.2, \"std\": 1.1, ... },\n      \"hits\": { ... },\n      \"precision\": { \"mean\": 0.95, ... }\n    },\n    \"top_100\": { ... },\n    \"top_10_filtered\": { ... },\n    \"top_100_filtered\": { ... }\n  },\n  \"timestamp\": \"2024-01-01T00:00:00\"\n}\n```\n\n## Development\n\n### Running tests\n\n```bash\nuv run pytest\n```\n\n### Code formatting\n\n```bash\nuv run ruff check --fix src tests\nuv run ruff format src tests\n```\n\n### Type checking\n\n```bash\nuv run mypy src\n```\n\n## Updating Engine Versions\n\nTo update an engine version, modify the `ENGINE_VERSION` in:\n1. The engine config class in `src/search_ann_benchmark/engines/\u003cengine\u003e.py`\n2. The corresponding workflow in `.github/workflows/run-\u003cengine\u003e-linux.yml`\n\n## Benchmark Results\n\nFor a comparison of the results, including response times and precision metrics for different ANN algorithms, see [Benchmark Results Page](https://codelibs.co/benchmark/ann-benchmark.html).\n\n## Contributing\n\nWe welcome contributions!\nIf you have suggestions for additional benchmarks, improvements to existing ones, or fixes for any issues, please feel free to open an issue or submit a pull request.\n\n## License\n\nThis project is licensed under the Apache License 2.0.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcodelibs%2Fsearch-ann-benchmark","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcodelibs%2Fsearch-ann-benchmark","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcodelibs%2Fsearch-ann-benchmark/lists"}