{"id":18429080,"url":"https://github.com/lancedb/lancedb-cloud-benchmarks","last_synced_at":"2025-04-13T21:18:00.383Z","repository":{"id":260088476,"uuid":"878612163","full_name":"lancedb/lancedb-cloud-benchmarks","owner":"lancedb","description":null,"archived":false,"fork":false,"pushed_at":"2025-04-01T13:05:41.000Z","size":160,"stargazers_count":0,"open_issues_count":2,"forks_count":0,"subscribers_count":7,"default_branch":"main","last_synced_at":"2025-04-13T21:17:57.314Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lancedb.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-25T18:00:31.000Z","updated_at":"2025-04-01T13:05:45.000Z","dependencies_parsed_at":"2024-11-13T18:28:36.244Z","dependency_job_id":"67bc883e-faa5-498e-806b-9a5c1ae38273","html_url":"https://github.com/lancedb/lancedb-cloud-benchmarks","commit_stats":null,"previous_names":["lancedb/lancedb-cloud-benchmarks"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lancedb%2Flancedb-cloud-benchmarks","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lancedb%2Flancedb-cloud-benchmarks/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lancedb%2Flancedb-cloud-benchmarks/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lancedb%2Flancedb-cloud-benchmarks/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/G
itHub/owners/lancedb","download_url":"https://codeload.github.com/lancedb/lancedb-cloud-benchmarks/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248782259,"owners_count":21160717,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-06T05:15:44.127Z","updated_at":"2025-04-13T21:18:00.377Z","avatar_url":"https://github.com/lancedb.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# lancedb-cloud-benchmarks\n\nBenchmarking tools for LanceDB Cloud and LanceDB Enterprise.\n\n### Background\n\nThis benchmark script downloads the [dbpedia-entities-openai-1M](https://huggingface.co/datasets/KShivendu/dbpedia-entities-openai-1M) dataset,\ningests and indexes it into N tables in LanceDB Cloud/Enterprise, and runs vector searches on the tables.\n\nIt reports ingestion time, indexing completion time, and query performance percentiles.\n\nFurther metrics can be gathered by the [LanceDB team](mailto:contact@lancedb.com) upon request.\n\n\n### Running benchmarks\n\n1. Install uv\n\n`curl -LsSf https://astral.sh/uv/install.sh | sh`\n\nNote: on some systems, you may need to install clang (e.g. `sudo yum install clang`)\n\n2. Install dependencies\n\n```\nuv pip install . --index-strategy unsafe-best-match\n```\n\n3. Configure environment\n```\nexport LANCEDB_API_KEY=\u003cyour api key\u003e\nexport LANCEDB_DB_URI=\u003cyour db uri from lancedb cloud console, e.g. 
\"db://mydb-d5ac3e\"\u003e\nexport LANCEDB_HOST_OVERRIDE=\u003coptional uri if using lancedb enterprise\u003e\n```\n\n4. Run the benchmark\n\n`uv run bench.py`\n\n### Examples\n\nIngest the dataset into 4 tables and run 10k queries with a custom table prefix:\n\n`uv run bench.py -t 4 -q 10000 -p mytable`\n\nRun only the query benchmark against existing tables:\n\n`uv run bench.py -t 4 -q 10000 --no-ingest --no-index`\n\n\n### Help\nRun `uv run bench.py --help`\n\n### Deployment recommendations\n\nTo get representative results, run the benchmarking script in a production-like environment with adequate network bandwidth, deployed in a region or datacenter close to the LanceDB endpoint.\n\nFor LanceDB Cloud, deploy the benchmarking script in the **AWS us-east-1** region.\n\n### Scaling out\n\nAt high traffic levels, ingestion and query performance may be limited by a single Python client process. It is possible to scale out\nto larger aggregate numbers by using multiple processes or even distributing across multiple VMs. 
In this case, the result metrics will need to be aggregated\nto get the total QPS and throughput.\n\n#### Example 1: Each Process Creates a Table and Queries It\nTo run query benchmarks in parallel, you can start several benchmarking processes,\neach creating and querying its own tables.\nThe following command demonstrates how to start 4 benchmarking processes,\neach querying 4 tables with the table name prefix \"my-prefix\":\n```\nuv run bench.py -n 4 -t 4 -p my-prefix -q 10000 -r\n```\nParameters:\n`-n 4`: Number of benchmarking processes to run\n`-t 4`: Number of tables to query per process\n`-p my-prefix`: Prefix for the table names\n`-q 10000`: Number of queries to run against each table\n`-r`: Recreate the tables and indexes if they already exist\n\nAfter the initial setup, you can rerun the query performance tests without recreating tables or indexes by using the following command:\n```\nuv run bench.py -n 4 -t 4 -p my-prefix -q 10000 --no-ingest --no-index\n```\nAdditional Flags:\n`--no-ingest`: Skips the table creation step.\n`--no-index`: Skips the index creation step.\n\n#### Example 2: Multiple Processes Querying the Same Table\nIn this scenario, you first create a table with a specified name prefix and the necessary indexes.\nUse the following command to set this up:\n```\nuv run bench.py -t 1 -p my-prefix -q 0 -r\n```\nParameters:\n`-t 1`: Create one table (the same table will be queried by multiple processes).\n`-p my-prefix`: Prefix for the table name.\n`-q 0`: No queries are run during the table creation.\n`-r`: Recreate the table and indexes if they already exist.\n\nOnce the table is created, you can launch multiple processes to query against the same table.\nEach process will run 10,000 queries. 
Use the following command:\n```\nuv run bench.py -t 1 -p my-prefix --no-ingest --no-index --query-processes 5 -q 10000\n```\nParameters:\n`--query-processes 5`: Specifies that 5 processes will query the same table concurrently.\n`--no-ingest`: Skips the table creation step.\n`--no-index`: Skips the index creation step.\n`-q 10000`: Number of queries each process will run against each table","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flancedb%2Flancedb-cloud-benchmarks","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flancedb%2Flancedb-cloud-benchmarks","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flancedb%2Flancedb-cloud-benchmarks/lists"}