{"id":13815095,"url":"https://github.com/qdrant/vector-db-benchmark","last_synced_at":"2025-05-15T07:31:55.657Z","repository":{"id":61969053,"uuid":"513208282","full_name":"qdrant/vector-db-benchmark","owner":"qdrant","description":"Framework for benchmarking vector search engines","archived":false,"fork":false,"pushed_at":"2025-05-12T08:07:59.000Z","size":1264,"stargazers_count":320,"open_issues_count":31,"forks_count":105,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-05-12T09:38:44.501Z","etag":null,"topics":["benchmark","vector-database","vector-search","vector-search-engine"],"latest_commit_sha":null,"homepage":"https://qdrant.tech/benchmarks/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/qdrant.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-07-12T16:05:59.000Z","updated_at":"2025-05-12T08:07:59.000Z","dependencies_parsed_at":"2023-10-27T14:35:21.924Z","dependency_job_id":"618daaa6-9e5b-411e-8ef7-22efc1d65a83","html_url":"https://github.com/qdrant/vector-db-benchmark","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qdrant%2Fvector-db-benchmark","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qdrant%2Fvector-db-benchmark/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qdrant%2Fvector-db-benchmark/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposit
ories/qdrant%2Fvector-db-benchmark/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/qdrant","download_url":"https://codeload.github.com/qdrant/vector-db-benchmark/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254295944,"owners_count":22047175,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["benchmark","vector-database","vector-search","vector-search-engine"],"created_at":"2024-08-04T04:02:57.477Z","updated_at":"2025-05-15T07:31:55.650Z","avatar_url":"https://github.com/qdrant.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# vector-db-benchmark\n\n![Screenshot from 2022-08-23 14-10-01](https://user-images.githubusercontent.com/1935623/186516524-a61098d4-bca6-4aeb-acbe-d969cf30674e.png)\n\n\u003e [View results](https://qdrant.tech/benchmarks/)\n\nThere are various vector search engines available, and each of them may offer\na different set of features and efficiency. But how do we measure the\nperformance? There is no clear definition and in a specific case you\nmay worry about a specific thing, while not paying much attention to other aspects. This\nproject is a general framework for benchmarking different engines under the\nsame hardware constraints, so you can choose what works best for you.\n\nRunning any benchmark requires choosing an engine, a dataset and defining the\nscenario against which it should be tested. 
A specific scenario may assume\nrunning the server in a single or distributed mode, a different client\nimplementation and the number of client instances.\n\n## How to run a benchmark?\n\nBenchmarks are implemented in server-client mode, meaning that the server\nruns on one machine and the client runs on another.\n\n### Run the server\n\nAll engines are served using docker compose. The configurations are in the [servers](./engine/servers/) directory.\n\nTo launch the server instance, run the following command:\n\n```bash\ncd ./engine/servers/\u003cengine-configuration-name\u003e\ndocker compose up\n```\n\nContainers are expected to expose all necessary ports, so the client can connect to them.\n\n### Run the client\n\nInstall dependencies:\n\n```bash\npip install poetry\npoetry install\n```\n\nRun the benchmark:\n\n```bash\n$ poetry shell\n$ python run.py --help\n\nUsage: run.py [OPTIONS]\n\n  Examples:\n\n  python3 run.py --engines \"qdrant-rps-m-*-ef-*\" --datasets \"dbpedia-openai-100K-1536-angular\" # Qdrant RPS mode\n\n  python3 run.py --engines \"*-m-*-ef-*\" --datasets \"glove-*\" # All engines and their configs for glove datasets\n\nOptions:\n  --engines TEXT                  [default: *]\n  --datasets TEXT                 [default: *]\n  --host TEXT                     [default: localhost]\n  --skip-upload / --no-skip-upload\n                                  [default: no-skip-upload]\n  --install-completion            Install completion for the current shell.\n  --show-completion               Show completion for the current shell, to\n                                  copy it or customize the installation.\n  --help                          Show this message and exit.\n```\n\nThe command accepts wildcards for engine and dataset names.\nResults of the benchmarks are stored in the `./results/` directory.\n\n## How to update benchmark parameters?\n\nEach engine has a configuration file, which is used to define the parameters for the 
benchmark.\nConfiguration files are located in the [configuration](./experiments/configurations/) directory.\n\nEach step in the benchmark process uses a dedicated key in the configuration file:\n\n* `connection_params` - passed to the client during the connection phase.\n* `collection_params` - parameters used to create the collection; indexing parameters are usually defined here.\n* `upload_params` - parameters used to upload the data to the server.\n* `search_params` - passed to the client during the search phase. The framework allows multiple search configurations for the same experiment run.\n\nThe exact parameter values are specific to each engine.\n\n## How to register a dataset?\n\nDatasets are configured in the [datasets/datasets.json](./datasets/datasets.json) file.\nThe framework automatically downloads the dataset and stores it in the [datasets](./datasets/) directory.\n\n## How to implement a new engine?\n\nThere are a few base classes that you can use to implement a new engine.\n\n* `BaseConfigurator` - defines methods to create collections and set up indexing parameters.\n* `BaseUploader` - defines methods to upload the data to the server.\n* `BaseSearcher` - defines methods to search the data.\n\nSee the examples in the [clients](./engine/clients) directory.\n\nOnce all the necessary classes are implemented, you can register the engine in the [ClientFactory](./engine/clients/client_factory.py).\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqdrant%2Fvector-db-benchmark","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fqdrant%2Fvector-db-benchmark","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqdrant%2Fvector-db-benchmark/lists"}