{"id":25790487,"url":"https://github.com/hpi-epic/espbench","last_synced_at":"2025-02-27T12:07:12.471Z","repository":{"id":54783583,"uuid":"241661423","full_name":"hpi-epic/ESPBench","owner":"hpi-epic","description":"ESPBench - The Enterprise Stream Processing Benchmark","archived":false,"fork":false,"pushed_at":"2023-12-27T16:18:02.000Z","size":1239,"stargazers_count":13,"open_issues_count":0,"forks_count":2,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-04-21T03:12:42.969Z","etag":null,"topics":["benchmark","stream-processing"],"latest_commit_sha":null,"homepage":"","language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hpi-epic.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-02-19T16:11:18.000Z","updated_at":"2023-12-15T17:42:17.000Z","dependencies_parsed_at":"2022-08-14T02:51:20.035Z","dependency_job_id":null,"html_url":"https://github.com/hpi-epic/ESPBench","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hpi-epic%2FESPBench","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hpi-epic%2FESPBench/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hpi-epic%2FESPBench/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hpi-epic%2FESPBench/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hpi-epic","download_url":"https://codeload.github.com/hpi-epic/ESPBench/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github
","repositories_count":241010213,"owners_count":19893502,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["benchmark","stream-processing"],"created_at":"2025-02-27T12:07:11.769Z","updated_at":"2025-02-27T12:07:12.450Z","avatar_url":"https://github.com/hpi-epic.png","language":"Scala","funding_links":[],"categories":[],"sub_categories":[],"readme":"ESPBench - The Enterprise Stream Processing Benchmark \n---\n[![DOI](https://zenodo.org/badge/241661423.svg)](https://zenodo.org/badge/latestdoi/241661423)\n\nThis repository contains *ESPBench*, the enterprise streaming benchmark. It allows comparing [data stream processing systems](https://en.wikipedia.org/wiki/Stream_processing) (DSPSs) and architectures in an enterprise context, i.e. in an environemnt where streaming data is integrated with existing, historical business data.\nThis repository contains the ESPBench toolkit, example configurations, and an example query implementation using [Apache Beam](https://github.com/apache/beam). \n\nFor further details, see:\n*Hesse, Guenter, et al. \"ESPBench: The Enterprise Stream Processing Benchmark\", ACM/SPEC International Conference on Performance \nEngineering (ICPE), 2021*\nand the [ESPBench example implementation results](https://github.com/guenter-hesse/ESPBenchExperiments).\n\n#### Table of Contents  \n[1. ESPBench Architecture](#1-espbench-architecture)\u003cbr/\u003e\n[2. ESPBench Process](#2-espbench-process)\u003cbr/\u003e\n[3. Structure of the Project](#3-structure-of-the-project)\u003cbr/\u003e\n[4. ESPBench Setup and Execution](#4-espbench-setup)\u003cbr/\u003e\n[5. 
ESPBench Results](#5-espbench-results)\n\n## 1. ESPBench Architecture \u003ca name=\"1-espbench-architecture\"/\u003e\nThe overall architecture is visualized in the image below:\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"https://owncloud.hpi.de/index.php/apps/files_sharing/ajax/publicpreview.php?x=2378\u0026y=848\u0026a=true\u0026file=HesseBenchArch.png\u0026t=piRX6IDNo5Gy9bp\u0026scalingup=0\" width=\"600\"\u003e\n\u003c/p\u003e\n\nInput data is sent to [Apache Kafka](https://kafka.apache.org/) by the data sender tool, which is [part of this repository](https://github.com/Gnni/EnterpriseStreamingBenchmark/tree/master/tools/datasender).\nThe DSPS runs the queries. It gets the data from Apache Kafka as input as well as from the enterprise database management system (DBMS) when required by the query.\nAfter the configured time of the benchmark run is over, the validator and result calculator, which are also part of this repository, can check the query result correctness and compute the benchmark results, i.e., the latencies.\n\n## 2. ESPBench Process \u003ca name=\"2-espbench-process\"/\u003e\n\nA brief process overview of ESPBench is visualized in the activity diagram below:\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"https://owncloud.hpi.de/index.php/apps/files_sharing/ajax/publicpreview.php?x=2378\u0026y=848\u0026a=true\u0026file=Screenshot%25202020-02-19%2520at%252014.13.11.png\u0026t=gBJplJAIye1dvy0\u0026scalingup=0\" width=\"900\"\u003e\n\u003c/p\u003e\n\nThe entire process is automated using [Ansible](https://www.ansible.com/) scripts.\n\n## 3. 
Structure of the Project \u003ca name=\"3-structure-of-the-project\"/\u003e\n\n```\nESPBench\n│   README.md\n│   .gitignore\n│   .gitlab-ci.yml    \n│   build.sbt\n│   scalastyle-config.xml\n│\n└───ci\n│   │   Dockerfile\n│\n└───implementation\n│   └───beam\n│       └───...\n│   \n└───project\n│   │   build.properties\n│   │   Dependencies.scala\n│\n└───tools\n    └───commons\n    │   └───src\n    │       └───...\n    │       │   commons.conf\n    │\n    └───configuration\n    │   └───group_vars\n    │   └───plays\n    │   └───roles\n    │   └───...\n    │   │   ansible.cfg\n    │   │   hosts\n    │\n    └───datasender\n    │   └───src\n    │       └───...\n    │       │   datasender.conf\n    │\n    └───tpc-c_gen\n    │   └───src\n    │       └───...\n    │       │   tpc-c.properties\n    │\n    └───util\n    │   └───...\n    │\n    └───validator\n        └───...\n```\n\n## 4. ESPBench Setup and Execution \u003ca name=\"4-espbench-setup\"/\u003e\nYou find brief instructions in the following. 
If you are looking for a more detailed setup and execution description, \nplease have a look at [this file](docs/ESPBenchSetupAndExecutionDetailed.md).\n\nAll steps are tested on Ubuntu servers.\n\n- Create a user `benchmarker` with sudo access on all involved machines\n- Ensure that this user can connect to all machines via SSH without a password, e.g., by adding the SSH public key to the `authorized_keys` files\n- Install `Apache Kafka` and the DSPSs to be tested under `/opt/` - you can find example configurations in the `tools/configuration` directory\n- Make Apache Kafka a service:\n  - Run `sudo apt install policykit-1`\n  - Copy `tools/configuration/kafka/etc/init.d/kafka` to `/etc/init.d/` on the Apache Kafka servers\n  - Run `update-rc.d kafka defaults`\n- Install `PostgreSQL` on one server (you can use another DBMS, but that requires some adaptations in the tools)\n- Create the directory `Benchmarks` in the home directory of the user `benchmarker` and clone the repository into this folder\n- The project can be built using `sbt assembly`\n\n###### tools/commons/commons.conf\n- Define the Apache Kafka topic prefix and the benchmark run number, which affect the names of the Apache Kafka topics created by the Ansible scripts\n- Define the query you want to execute (the config for each query is given in a comment)\n- Define the sending interval, which determines the pause between sending two records - a pause of, e.g., 1,000,000 ns would result in an input rate of 1,000 records/s\n- Define the benchmark duration\n- Define the Apache Kafka bootstrap servers and ZooKeeper servers\n\n###### tools/configuration\n- After cloning the repository and setting up the systems, change to the `tools/configuration` directory and start the benchmark (as shown in the activity diagram above) from there, e.g., via `ansible-playbook -vvvv plays/benchmark-runner-beam.yml` for the example implementation of this repository. 
The number of `v` characters defines the level of verbosity\n- Adapt the `group_vars/all` files if needed\n- The directories `plays` and `roles` contain several Ansible files, which can be adapted if needed. The starting point that represents the entire process is `plays/benchmark-runner-beam.yml` for the example implementation. These scripts also contain information about, e.g., how to start the data sender or the data generation.\n- The `hosts` file needs to be edited, i.e., the servers' IP addresses need to be entered\n\n###### tools/datasender\n- Input data is taken from the DEBS 2012 Grand Challenge and can be downloaded from [ftp://ftp.mi.fu-berlin.de/pub/debs2012/](ftp://ftp.mi.fu-berlin.de/pub/debs2012/)\n- This data file needs to be converted using the `dos2unix` command and then duplicated, so that there are two input files.\n- The two files need to be extended by a machine ID using the following commands (adapt file names):\n  - First file: `awk 'BEGIN { FS = OFS = \"\\t\" } { $(NF+1) = 1; print $0 }' input1.csv \u003e output1.csv`\n  - Second file: `awk 'BEGIN { FS = OFS = \"\\t\" } { $(NF+1) = 2; print $0 }' input2.csv \u003e output2.csv`\n- A third input file is `production_times.csv`, which is generated by the TPC-C data generator that is part of this project.\n- The file `datasender.conf` contains Apache Kafka producer configs and the location of the input data files.\n- The file `src/main/resources/application.conf` needs the correct DBMS configuration.\n\n###### tools/tpc-c_gen\nThe `tpc-c.properties` file contains the default settings for the number of warehouses and the data output directory. 
Changing the output directory requires corresponding adaptations in the Ansible scripts.\n\n###### tools/validator\nThe file `src/main/resources/application.conf` needs the correct DBMS configuration.\n\n###### Example Implementation\nIf you want to use the example implementation, you need to adapt at least two files accordingly:\n- `implementation/beam/src/main/resources/beam.properties`\n- The `beamRunner` variable in `build.sbt`\n\n## 5. ESPBench Results \u003ca name=\"5-espbench-results\"/\u003e\nThe validator creates a `logs` directory that contains information such as query result correctness and latencies.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhpi-epic%2Fespbench","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhpi-epic%2Fespbench","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhpi-epic%2Fespbench/lists"}