{"id":18976306,"url":"https://github.com/quickwit-oss/benchmarks","last_synced_at":"2025-08-20T17:05:04.653Z","repository":{"id":232543353,"uuid":"782481210","full_name":"quickwit-oss/benchmarks","owner":"quickwit-oss","description":null,"archived":false,"fork":false,"pushed_at":"2024-04-10T06:50:57.000Z","size":1,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2024-04-10T07:39:51.488Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/quickwit-oss.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2024-04-05T11:40:34.000Z","updated_at":"2024-04-10T07:39:55.443Z","dependencies_parsed_at":"2024-04-10T07:50:04.758Z","dependency_job_id":null,"html_url":"https://github.com/quickwit-oss/benchmarks","commit_stats":null,"previous_names":["quickwit-oss/benchmarks"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/quickwit-oss%2Fbenchmarks","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/quickwit-oss%2Fbenchmarks/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/quickwit-oss%2Fbenchmarks/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/quickwit-oss%2Fbenchmarks/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/quickwit-oss","download_url":"https://codeload.github.com/quickwit-oss/benchmarks/tar.gz/refs/heads/main","host":{"
name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239978057,"owners_count":19728271,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-08T15:23:31.093Z","updated_at":"2025-02-21T07:43:22.238Z","avatar_url":"https://github.com/quickwit-oss.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Benchmark for Logs \u0026 Traces Search Engines\n\n## Overview\n\nThis benchmark is designed to measure the performance of various search engines for logs and traces use cases and, more generally, for append-only semi-structured data.\n\nThe benchmark makes use of two datasets:\n- A 1TB dataset sampled from the [GitHub Archive](https://www.gharchive.org/) dataset.\n- A 1TB log dataset generated with the [elastic-integration-corpus-generator-tool](https://github.com/elastic/elastic-integration-corpus-generator-tool).\n\nWe plan to add a trace dataset soon.\n\nThe supported engines are:\n- [Quickwit](https://quickwit.io)\n- [Elasticsearch](https://www.elastic.co/)\n- [Loki](https://grafana.com/oss/loki/) (only for generated logs)\n\n\n## Prerequisites\n\n### Dependencies\n\n- [Make](https://www.gnu.org/software/make/) to ease the running of the benchmark.\n- [Docker](https://docs.docker.com/get-docker/) to run the benchmarked engines, including the Python API.\n- [Python3](https://www.python.org/downloads/) to download the dataset and run queries against the benchmarked engines.\n- [Rust](https://www.rust-lang.org/tools/install) and `openssl-devel` to build the ingestion tool `qbench`.\n- 
[gcloud](https://cloud.google.com/sdk/docs/install) to download datasets.\n- Various Python packages installed with `pip install -r requirements.txt`\n\n### Build qbench\n\n```bash\n\ncd qbench\ncargo build --release\n\n```\n\n### Download datasets\n\nFor the generated logs dataset:\n\n```bash\nmkdir -p datasets\ngcloud storage cp \"gs://quickwit-datasets-public/benchmarks/generated-logs/generated-logs-v1-????.ndjson.gz\" datasets/\n```\n\n## Running the benchmark manually\n\n### Start engines\n\nFor each engine you want to benchmark, go to its subdirectory `engines/\u003cengine_name\u003e` and run `make start`.\n\n### Indexing phase\n\n```bash\n\npython3 run.py --engine quickwit --storage SSD --track generated-logs --instance m1 --tags my-bench-run --indexing-only\n\n```\n\nBy default, this exports results to the [benchmark\nservice](service/README.md) accessible at [this\naddress](https://qw-benchmarks.104.155.161.122.nip.io).\nThe first time this runs, you will be redirected to a web page where\nyou should log in with your Google account and pass back a token to `run.py` (just follow the\ninstructions the tool prints).\nExporting to the benchmark service can be disabled by passing the flag `--export-to-endpoint \"\"`.\n\nAfter indexing (and if exporting to the service was not disabled), the tool will print a URL to access results, e.g.:\nhttps://qw-benchmarks.104.155.161.122.nip.io/?run_ids=678\n\nResults will also be saved to a `results/{track}.{engine}.{tag}.{instance}/indexing-results.json` file.\n\n```json\n{\n  \"doc_per_second\": 8752.761519421289,\n  \"engine\": \"quickwit\",\n  \"index\": \"generated-logs\",\n  \"indexing_duration_secs\": 1603.68884367,\n  \"mb_bytes_per_second\": 22.77175235654048,\n  \"num_indexed_bytes\": 18840178633,\n  \"num_indexed_docs\": 14036706,\n  \"num_ingested_bytes\": 36518805205,\n  \"num_ingested_docs\": 14036706,\n  \"num_splits\": 12\n}\n```\n\n\n### Execute the queries\n\n```bash\n\npython3 run.py --engine quickwit --storage SSD --track generated-logs 
--instance m1 --tags my-bench-run --search-only\n\n```\n\nThe results will also be exported to the service and saved to a `results/{track}.{engine}.{tag}.{instance}/search-results.json` file.\n\n```json\n{\n    \"engine\": \"quickwit\",\n    \"index\": \"generated-logs\",\n    \"queries\": [\n        {\n            \"id\": 0,\n            \"query\": {\n                \"query\": \"payload.description:the\",\n                \"sort_by_field\": \"-created_at\",\n                \"max_hits\": 10\n            },\n            \"tags\": [\n                \"search\"\n            ],\n            \"count\": 138290,\n            \"duration\": [\n                8843,\n                9131,\n                9614\n            ],\n            \"engine_duration\": [\n                7040,\n                7173,\n                7508\n            ]\n        }\n    ]\n}\n```\n\n## Exploring results\n\nUse the [Benchmark Service web page](https://qw-benchmarks.104.155.161.122.nip.io).\n\n### Run comparison\n\nThe default page allows selecting and comparing runs:\n[example](https://qw-benchmarks.104.155.161.122.nip.io/?run_ids=779,780,771,772).\n\nRuns are identified by a numerical ID and are automatically named\n`\u003cengine\u003e.\u003cstorage\u003e.\u003cinstance\u003e.\u003cshort_commit_hash\u003e.\u003ctag\u003e`.\nFor now, names are allowed to collide, i.e. a given name can refer to\nmultiple runs. In that case, selecting a name in the list of runs to\ncompare will show the most recent indexing run with that name, and the\nmost recent search run with that name.\n\nTips:\n- The URL of the page is a permanent link to the runs shown. 
This is a\n  convenient way to share results.\n- Clicking on the run name in the comparison shows the raw run results\n  with additional information.\n- It's fine if a run only has indexing or search results.\n- The full list of runs is loaded when the web page is loaded, so you\n  may need to reload it to see your latest runs.\n\n### Graphs\n\nThe [graphs\npage](https://qw-benchmarks.104.155.161.122.nip.io/?page=graphs)\nallows plotting graphs of indexing and search run results over time\n([example](https://qw-benchmarks.104.155.161.122.nip.io/?page=graphs\u0026track=generated-logs\u0026run_filter_display_name=quickwit.pd-ssd.c3-standard-4.docker_edge)). Only\nruns whose `source` is `continuous_benchmarking` or `github_workflow` are\nshown there. Runs are identified by a string\n`\u003cengine\u003e.\u003cstorage\u003e.\u003cinstance\u003e.\u003ctag\u003e` (note the absence of the commit\nhash), which refers to a series of indexing and search runs over time.\n\nTips:\n- The URL of the page is a permanent link to the series of runs\n  shown. Later visits may show additional data points.\n- Clicking on a point in any graph opens the comparison page between\n  the run that contributed the point and the run that contributed the\n  previous point.\n\n\n### Running the service\n\nSee [here](service/README.md) for running the benchmark service.\n\n## Loki VS Quickwit (WIP)\n\nDetails of the comparison can be found [here](loki_quickwit_benchmark.md).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fquickwit-oss%2Fbenchmarks","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fquickwit-oss%2Fbenchmarks","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fquickwit-oss%2Fbenchmarks/lists"}