https://github.com/quickwit-oss/benchmarks
https://github.com/quickwit-oss/benchmarks
Last synced: 10 months ago
JSON representation
- Host: GitHub
- URL: https://github.com/quickwit-oss/benchmarks
- Owner: quickwit-oss
- License: mit
- Created: 2024-04-05T11:40:34.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-04-10T06:50:57.000Z (about 2 years ago)
- Last Synced: 2024-04-10T07:39:51.488Z (about 2 years ago)
- Language: Rust
- Size: 1000 Bytes
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Benchmark for Logs & Traces Search Engines
## Overview
This benchmark is designed to measure the performance of various search engines for logs and traces use cases and more generally for append-only semi-structured data.
The benchmark makes use of two datatsets:
- A 1TB dataset sampled from the [GitHub Archive](https://www.gharchive.org/) dataset.
- A 1TB log datasets generated with the [https://github.com/elastic/elastic-integration-corpus-generator-tool](elastic-integration-corpus-generator-tool)
We plan to add a trace dataset soon.
The supported engines are:
- [Quickwit](https://quickwit.io)
- [Elasticsearch](https://www.elastic.co/)
- [Loki](https://grafana.com/oss/loki/) (only for generated logs)
## Prerequisites
### Dependencies
- [Make](https://www.gnu.org/software/make/) to ease the running of the benchmark.
- [Docker](https://docs.docker.com/get-docker/) to run the benchmarked engines, including the Python API.
- [Python3](https://www.python.org/downloads/) to download the dataset and run queries against the benchmarked engines.
- [Rust](https://www.rust-lang.org/tools/install) and `openssl-devel` to build the ingestion tool `qbench`.
- [gcloud](https://cloud.google.com/sdk/docs/install) to download datasets.
- Various python packages installed with `pip install -r requirements.txt`
### Build qbench
```bash
cd qbench
cargo build --release
```
### Download datasets
For the generated logs dataset:
```bash
mkdir -p datasets
gcloud storage cp "gs://quickwit-datasets-public/benchmarks/generated-logs/generated-logs-v1-????.ndjson.gz" datasets/
```
## Running the benchmark manually
### Start engines
Go to desired engines subdirs `engines/` and run `make start`.
### Indexing phase
```bash
python3 run.py --engine quickwit --storage SSD --track generated-logs --instance m1 --tags my-bench-run --indexing-only
```
By default this will export results to the [benchmark
service](service/README.md) accessible at [this
address](https://qw-benchmarks.104.155.161.122.nip.io).
The first time this runs, you will be re-directed to a web page where
you should login with you Google account and pass back a token to run.py (just follow the
instructions the tool prints).
Exporting to the benchmark service can be disabled by passing the flag `--export-to-endpoint ""`
After indexing (and if exporting to the service was not disabled), the tool will print a URL to access results, e.g.:
https://qw-benchmarks.104.155.161.122.nip.io/?run_ids=678
Results will also be saved to a `results/{track}.{engine}.{tag}.{instance}/indexing-results.json` file.
```json
{
"doc_per_second": 8752.761519421289,
"engine": "quickwit",
"index": "generated-logs",
"indexing_duration_secs": 1603.68884367,
"mb_bytes_per_second": 22.77175235654048,
"num_indexed_bytes": 18840178633,
"num_indexed_docs": 14036706,
"num_ingested_bytes": 36518805205,
"num_ingested_docs": 14036706,
"num_splits": 12
}
```
### Execute the queries
```bash
python3 run.py --engine quickwit --storage SSD --track generated-logs --instance m1 --tags my-bench-run --search-only
```
The results will also be exported to the service and saved to a `results/{track}.{engine}.{tag}.{instance}/search-results.json` file.
```json
{
"engine": "quickwit",
"index": "generated-logs",
"queries": [
{
"id": 0,
"query": {
"query": "payload.description:the",
"sort_by_field": "-created_at",
"max_hits": 10
},
"tags": [
"search"
],
"count": 138290,
"duration": [
8843,
9131,
9614
],
"engine_duration": [
7040,
7173,
7508
]
}
]
}
```
## Exploring results
Use the [Benchmark Service web page](https://qw-benchmarks.104.155.161.122.nip.io).
### Run comparison
The default page allows selecting and comparing runs:
[example](https://qw-benchmarks.104.155.161.122.nip.io/?run_ids=779,780,771,772).
Runs are identified by a numerical ID and are automatically named
`....`.
For now, names are allowed to collide, i.e. a given name can refer to
multiple runs. In that case, selecting a name in the list of runs to
compare will show the most recent indexing run with that name, and the
most recent search run with that name.
Tips:
- The URL of the page is a permanent link to the runs shown. This is
convenient way to share results.
- Clicking on the run name in the comparison shows the raw run results
with additional information.
- It's fine if a run only has indexing or search results.
- The full list of runs is loaded when the web page is loaded, so you
may need to reload it to see your latest runs.
### Graphs
The [graphs
page](https://qw-benchmarks.104.155.161.122.nip.io/?page=graphs)
allows plotting graphs of indexing and search run results over time
([example](https://qw-benchmarks.104.155.161.122.nip.io/?page=graphs&track=generated-logs&run_filter_display_name=quickwit.pd-ssd.c3-standard-4.docker_edge)). Only
runs with `source` `continuous_benchmarking` or `github_workflow` are
shown there. Runs are identified by a string
`...` (note the absence of commit
hash) which refers to a series of indexing and search runs over time.
Tip:
- The URL of the page is a permanent link to the series of runs
shown. Later visits can contain additional data points.
- Clicking on a point in any graph opens the comparison page between
the run that contributed the point to the run that contributed the
previous point.
### Running the service
See [here](service/README.md) for running the benchmark service.
## Loki VS Quickwit (WIP)
Details of the comparison can be found [here](loki_quickwit_benchmark.md).