https://github.com/jalalmostafa/scits

A tool to benchmark Time-series databases

benchmark benchmark-framework benchmarking benchmarking-suite benchmarks clickhouse database database-management databases influxdb metrics postgres postgresql scientific-computing scientific-publications time-series timescale timescaledb timeseries


Host: GitHub
URL: https://github.com/jalalmostafa/scits
Owner: jalalmostafa
Created: 2021-11-17T10:42:53.000Z (over 3 years ago)
Default Branch: master
Last Pushed: 2024-08-27T10:20:59.000Z (10 months ago)
Last Synced: 2025-03-30T14:51:12.630Z (3 months ago)
Topics: benchmark, benchmark-framework, benchmarking, benchmarking-suite, benchmarks, clickhouse, database, database-management, databases, influxdb, metrics, postgres, postgresql, scientific-computing, scientific-publications, time-series, timescale, timescaledb, timeseries
Language: Jupyter Notebook
Homepage:
Size: 130 MB
Stars: 13
Watchers: 2
Forks: 4
Open Issues: 0
Metadata Files:
- Readme: README.md


# SciTS v2, 2023 update

A tool to benchmark Time-series on different databases

- reworked architecture
- adds mixed, online workloads
- adds regular and irregular ingestion modes
- adds multiple values per time series ("Dimensions")
- adds limited queries
- adds CLI arguments
- adds a ClientLatency metric to measure differences in local processing

For questions on these features, please contact [email protected].

Requires the .NET 7.x cross-platform framework.

## Citation

[![DOI](https://zenodo.org/badge/429005385.svg)](https://zenodo.org/badge/latestdoi/429005385)

Please cite our work:

> Jalal Mostafa, Sara Wehbi, Suren Chilingaryan, and Andreas Kopmann. 2022. SciTS: A Benchmark for Time-Series Databases in Scientific Experiments and Industrial Internet of Things. In 34th International Conference on Scientific and Statistical Database Management (SSDBM 2022). Association for Computing Machinery, New York, NY, USA, Article 12, 1–11. https://doi.org/10.1145/3538712.3538723

### Bibtex

```bibtex
@inproceedings{10.1145/3538712.3538723,
author = {Mostafa, Jalal and Wehbi, Sara and Chilingaryan, Suren and Kopmann, Andreas},
title = {SciTS: A Benchmark for Time-Series Databases in Scientific Experiments and Industrial Internet of Things},
year = {2022},
isbn = {9781450396677},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3538712.3538723},
doi = {10.1145/3538712.3538723},
abstract = {Time-series data has an increasingly growing usage in Industrial Internet of Things (IIoT) and large-scale scientific experiments. Managing time-series data needs a storage engine that can keep up with their constantly growing volumes while providing an acceptable query latency. While traditional ACID databases favor consistency over performance, many time-series databases with novel storage engines have been developed to provide better ingestion performance and lower query latency. To understand how the unique design of a time-series database affects its performance, we design SciTS, a highly extensible and parameterizable benchmark for time-series data. The benchmark studies the data ingestion capabilities of time-series databases especially as they grow larger in size. It also studies the latencies of 5 practical queries from the scientific experiments use case. We use SciTS to evaluate the performance of 4 databases of 4 distinct storage engines: ClickHouse, InfluxDB, TimescaleDB, and PostgreSQL.},
booktitle = {Proceedings of the 34th International Conference on Scientific and Statistical Database Management},
articleno = {12},
numpages = {11},
keywords = {time-series databases, database management systems, industrial internet of things, scientific experiments, sensor data, time-series},
location = {Copenhagen, Denmark},
series = {SSDBM '22}
}
```

# How to run

1. Create your workload as `App.config` (case-sensitive) in `BenchmarkTool`.
2. Edit the connection strings to your database servers in the workload file.
3. Choose the target database in the workload file using the `TargetDatabase` element.
4. Run `dotnet run --project BenchmarkTool write` if it is an ingestion workload,
   or `dotnet run --project BenchmarkTool read` if it is a query workload.
5. Use `Scripts/ccache.sh` to clear the cache between query tests.

## Additional Command Line options:

`dotnet run --project BenchmarkTool [action] [regular/irregular] [DatabaseNameDB]`

Available Actions:

* read: start the specified retrieval and aggregation workloads.
* write: start the ingestion workload across the specified batch sizes, numbers of clients, and dimensions.
* mixed-AggQueries: start the online, mixed workload benchmark as a mixture of aggregated queries and ingestion, using the specified ingestion parameters.
* mixed-LimitedQueries: start the online, mixed workload benchmark as a mixture of queried and ingested data points, according to the specified percentage parameter and the requested ingestion parameters. For example, 100% means that as many data points are retrieved as are ingested.
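
When scripting several runs, it can help to assemble the command line programmatically. The following is a minimal sketch, not part of SciTS: the helper name `build_command` is hypothetical, it assumes the positional order shown in the usage line above, and the database name in the example is only a placeholder for whatever your workload expects.

```python
import shlex

# Actions and ingestion modes as listed in this README.
ACTIONS = ("read", "write", "mixed-AggQueries", "mixed-LimitedQueries")
MODES = ("regular", "irregular")


def build_command(action, mode=None, database=None, project="BenchmarkTool"):
    """Return the `dotnet run` argument list for one benchmark run.

    Hypothetical helper: it only mirrors the usage line
    `dotnet run --project BenchmarkTool [action] [regular/irregular] [DatabaseNameDB]`.
    """
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    command = ["dotnet", "run", "--project", project, action]
    if mode is not None:
        if mode not in MODES:
            raise ValueError(f"unknown mode: {mode!r}")
        command.append(mode)
    if database is not None:
        # The arguments are positional, so a database name needs a mode before it.
        if mode is None:
            raise ValueError("a database name requires a mode before it")
        command.append(database)
    return command


if __name__ == "__main__":
    # "TimescaleDB" is a placeholder value, not a confirmed accepted name.
    print(shlex.join(build_command("write", "regular", "TimescaleDB")))
```

The returned list can be passed directly to `subprocess.run` from the repository root.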

## System Metrics using Glances

This tool uses [glances](https://github.com/nicolargo/glances/) to collect system metrics from the database server.

1. Install glances with all plugins on the database server using `pip install glances[all]`
2. Run glances REST API on the database server using `glances -w --disable-webui`
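
To check that the Glances REST API is reachable from the benchmark machine, you can query it by hand. The sketch below is illustrative only and does not show how SciTS itself consumes the API. It assumes the default Glances web server port (61208) and an API path of the form `/api/<version>/<plugin>`, where the version number depends on the installed Glances release (3 for Glances 3.x, 4 for Glances 4.x). The host name `db-server` is hypothetical.

```python
import json
from urllib.request import urlopen


def glances_url(host, plugin, port=61208, api_version=3):
    """Build the REST URL for one Glances plugin (e.g. "cpu" or "mem").

    Assumptions: default port 61208, and an API version that matches the
    Glances release installed on the database server.
    """
    return f"http://{host}:{port}/api/{api_version}/{plugin}"


def fetch_plugin(host, plugin, port=61208, api_version=3, timeout=5.0):
    """Fetch and decode the JSON statistics of one plugin."""
    url = glances_url(host, plugin, port=port, api_version=api_version)
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # "db-server" is a hypothetical host name; replace it with your own.
    print(glances_url("db-server", "cpu"))
```

Calling `fetch_plugin("db-server", "cpu")` should return a dictionary of CPU statistics if the server started in step 2 is running and reachable.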

## Workload Definition Files

You can open `Default-App.config`, edit it, and save it as `App.config`.
It has the following content:
```xml

```
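
Before starting a run, it can be useful to confirm which target database a workload file selects. The following sketch assumes the conventional .NET `appSettings` layout (`<add key="..." value="..."/>` entries). That layout, the sample document, and the value `ExampleDB` are assumptions for illustration; the `Default-App.config` shipped with the repository is the authoritative reference.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the conventional .NET appSettings layout.
SAMPLE = """<configuration>
  <appSettings>
    <add key="TargetDatabase" value="ExampleDB" />
  </appSettings>
</configuration>
"""


def read_settings(xml_text):
    """Return the appSettings entries of a config document as a dict."""
    root = ET.fromstring(xml_text)
    return {
        entry.get("key"): entry.get("value")
        for entry in root.iterfind("./appSettings/add")
    }


if __name__ == "__main__":
    settings = read_settings(SAMPLE)
    print(settings.get("TargetDatabase"))
```

To inspect a real file, read its text (for example with `pathlib.Path("BenchmarkTool/App.config").read_text()`) and pass it to `read_settings`.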

### Workload Files

You can choose from the available workloads by choosing a `*.config` file from `Workloads` folder.
The file-to-workload mapping is as follows:

| Workload | Workload file |
| ----------- | ---------------------------------- |
| **2022 workloads** | |
| Q1 | query-q1.config |
| Q2 | query-q2.config |
| Q3 | query-q3.config |
| Q4 | query-q4.config |
| Q5 | query-q5.config |
| Batching | ingestion-batching-1client.config |
| Concurrency | ingestion-batching-nclients.config |
| Scaling | ingestion-scaling.config |
| **2023 workloads** | |
| Collection of 2023 workloads | test2023.sh |

#### Timescale
We observed abnormally high latencies and other failures with Npgsql, so we embedded a Python script that runs the queries.
You therefore need to configure the location of the Python shared library. For Python 3.10, for example,
run `whereis libpython3.10.so` and copy the resulting path.
Then open `TimescaleDB.cs` and edit the string on line 134, `Runtime.PythonDLL = "/usr/lib/_Architecture_-linux-gnu/libpython3.10.so";`,
so that it points to your Python location.
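
As an alternative to `whereis`, the interpreter can report where its own shared library lives. This is a minimal sketch, not part of SciTS: the function name `find_libpython` is hypothetical, and it relies on the `LIBDIR`, `MULTIARCH`, `INSTSONAME`, and `LDLIBRARY` build variables, which are typically set on Linux builds and may be missing elsewhere. Run it with the same Python installation you want to embed.

```python
import os
import sysconfig


def find_libpython():
    """Return existing paths to this interpreter's shared libpython, if any."""
    names = []
    for var in ("INSTSONAME", "LDLIBRARY"):
        value = sysconfig.get_config_var(var)
        # Skip unset variables and static archives, which cannot be loaded as a DLL.
        if value and not value.endswith(".a") and value not in names:
            names.append(value)

    directories = []
    libdir = sysconfig.get_config_var("LIBDIR")
    multiarch = sysconfig.get_config_var("MULTIARCH")
    if libdir:
        directories.append(libdir)
        if multiarch:
            # Debian-style layouts keep the library in an architecture subfolder.
            directories.append(os.path.join(libdir, multiarch))

    found = []
    for directory in directories:
        for name in names:
            path = os.path.join(directory, name)
            if os.path.isfile(path) and path not in found:
                found.append(path)
    return found


if __name__ == "__main__":
    for path in find_libpython():
        print(path)
```

An empty result usually means the interpreter was built without a shared library, in which case a different Python installation is needed.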
