{"id":28501983,"url":"https://github.com/fluent/benchmark-framework","last_synced_at":"2025-07-05T02:31:33.519Z","repository":{"id":235296672,"uuid":"790459629","full_name":"fluent/benchmark-framework","owner":"fluent","description":"Local benchmark framework for log agents","archived":false,"fork":false,"pushed_at":"2024-06-26T16:41:32.000Z","size":2010,"stargazers_count":1,"open_issues_count":0,"forks_count":2,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-06-08T16:08:35.008Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fluent.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-04-22T23:15:08.000Z","updated_at":"2025-01-02T21:10:19.000Z","dependencies_parsed_at":"2024-04-23T00:31:53.285Z","dependency_job_id":"feea65aa-513c-4746-8d44-cb0d4f1a9dc9","html_url":"https://github.com/fluent/benchmark-framework","commit_stats":null,"previous_names":["fluent/benchmark-framework"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/fluent/benchmark-framework","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fluent%2Fbenchmark-framework","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fluent%2Fbenchmark-framework/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fluent%2Fbenchmark-framework/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fluent%2Fbenchmark-framework/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fluent","download_url":"https://codeload.github.com/fluent/benchmark-framework/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fluent%2Fbenchmark-framework/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":263671743,"owners_count":23494025,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-08T16:08:40.384Z","updated_at":"2025-07-05T02:31:33.513Z","avatar_url":"https://github.com/fluent.png","language":"Python","readme":"# Log Processor Benchmark\n\nThis is a generic benchmark to measure the performance of log processors for given scenarios.\n\nA scenario is a particular test that can be configured and executed with fluent-bit, fluentd, \nstanza, and vector.\n\n## Setup Environment\n\nBefore run program must create as  virtual environment, follow these steps:\n\n1. 
1. Set Up a Virtual Environment

A virtual environment allows you to create an isolated Python environment where you can install packages and run programs independently of your system-wide Python installation.

`pip install virtualenv`

Now, create a new virtual environment. Navigate to your project directory in the terminal and run:

`virtualenv venv`

This command creates a directory named venv which contains a complete Python environment isolated from your system Python.

Activate the virtual environment:

* On macOS/Linux:

`source venv/bin/activate`

* On Windows:

`venv\Scripts\activate`

After activation, your terminal prompt will change to indicate that you are now using the virtual environment ((venv) will typically appear at the beginning of the prompt).

2. Install Dependencies

The framework's dependencies are specified in the requirements.txt file; install them into your virtual environment using pip:

`pip install -r requirements.txt`

### Description of the directory structure

Each scenario's scenario.py file describes the scenario and its sub-scenarios (i.e. running the same scenario with different data sets).
The config folder contains the scenario-specific configuration file for each log processor.
The name of each config sub-folder has to be one of: fluent-bit, stanza, fluentd, vector.

Each scenario has its own folder inside the scenarios folder, under the following structure:

### benchmark-framework > scenarios > http_http:

This scenario sends JSON log lines via HTTP requests to the log processor.
The output of the log processor points to HTTP as well.
For the HTTP output, an https-benchmark-server instance is started by the scenario.
The scenario is done once all sent HTTP requests are received by the backend or the maximum scenario time has elapsed.

### benchmark-framework > scenarios > http_null:

This scenario sends JSON log lines via HTTP requests to the log processor.
The output of the log processor points to NULL.
The scenario is done once all HTTP requests are sent to the log processor.
This means it measures how fast a log processor can consume/buffer the log requests.

### benchmark-framework > scenarios > tail_http:

This scenario creates a log file in the data folder.
Once the log processor is started, it processes this pre-existing file.
The output of the log processor points to HTTP.
For the HTTP output, an https-benchmark-server instance is started by the scenario.
The scenario is done once all the log lines are received by the backend or the maximum scenario time has elapsed.

### benchmark-framework > scenarios > tail_null:

This scenario creates a log file in the data folder.
Once the log processor is started, it processes this pre-existing file.
The output of the log processor points to NULL.
The scenario is considered done once the configured scenario time has elapsed.

### benchmark-framework > scenarios > tcp_null:

This scenario sends JSON log lines via TCP socket to the log processor.
The output of the log processor points to NULL.
The scenario is done once all the TCP requests are sent.

### benchmark-framework > scenarios > tcp_tcp:

This scenario sends JSON log lines via TCP socket to the log processor.
The output of the log processor points to a TCP socket as well.
For the TCP output, a socket server instance is started by the scenario.
The scenario is done once all sent requests are received by the backend or the maximum scenario time has elapsed.
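As a rough illustration of what the TCP scenarios exercise, the sketch below pushes newline-delimited JSON log lines at a log processor's TCP input. It is not the framework's actual sender; the host, port, and line count are made-up values.

```python
# Illustrative sketch (not the framework's code): sending JSON log lines
# over TCP, as the tcp_null/tcp_tcp scenarios do. Values are hypothetical.
import json
import socket

HOST, PORT = "127.0.0.1", 5170   # assumed TCP input address of the log processor
LINE_COUNT = 10_000              # assumed number of log lines to send

def send_json_lines(host: str, port: int, count: int) -> None:
    """Send `count` JSON log lines over a single TCP connection."""
    with socket.create_connection((host, port)) as sock:
        for i in range(count):
            record = {"seq": i, "message": "benchmark log line"}
            # Log processors typically expect one JSON object per line.
            sock.sendall((json.dumps(record) + "\n").encode("utf-8"))

if __name__ == "__main__":
    send_json_lines(HOST, PORT, LINE_COUNT)
```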
## Prerequisites

1. Python Interpreter

The benchmark framework requires Python 3 (tested with 3.9.10, 3.10, and 3.12) and the Python dependencies listed
in the [requirements.txt](requirements.txt) file.

Install the dependencies in your Python virtual environment by running
`pip install -r requirements.txt` (or `pip3 install -r requirements.txt` outside a Python 3 virtual environment).

2. Log Processor

In addition, you need to have the log processor executables on your path:

* [Download fluent-bit](https://docs.fluentbit.io/manual/installation/getting-started-with-fluent-bit)
* [Download stanza](https://github.com/observIQ/stanza)
* [Download fluentd](https://www.fluentd.org/download)
* [Download vector](https://github.com/vectordotdev/vector)
* [Download otel-collector](https://opentelemetry.io/docs/collector/installation/)

3. For the HTTP scenarios you need the [https-benchmark-server](https://github.com/chronosphereio/calyptia-https-benchmark-server) on your path as well.

The https-benchmark-server is a program written in Go. The linked repository provides Docker images, but the
benchmark (via benchmark.py) looks for the executable on your operating system's path, so you need an
environment for compiling and building the https-benchmark-server executable. Once you have built it, copy it
to a directory on your path, or add the directory where https-benchmark-server is located to the search path.

4. Environment variable

Please ensure the PYTHONPYCACHEPREFIX environment variable is set (e.g. /tmp/.pycache) to avoid __pycache__ directories in the project.

## Limitations on macOS

psutil has limitations on macOS.

I/O counters access (io_counters): on macOS, psutil's io_counters() function is not supported, resulting in an
AttributeError when attempting to access this property for processes.

Alternative: there is no direct alternative in psutil on macOS to obtain I/O counters.
For detailed I/O information, you may need OS-specific tools such as dtrace.

Due to this limitation, tests cannot tally input/output operations on macOS; collecting this metric in
monitor_pid.py will fail, but the failure is non-blocking and the program will continue.

The failure due to the library limitation occurs in:

#### def _get_io_read(proc, withchildren)
#### def _get_io_write(proc, withchildren)
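To make the failure mode concrete, here is a minimal sketch of how an I/O metric helper can degrade gracefully where psutil lacks io_counters(). It mirrors the role of _get_io_read/_get_io_write but is not their actual code; the function name and signature are illustrative.

```python
# Illustrative sketch (not the framework's code) of tolerating psutil's
# missing io_counters() on macOS, as described above.
import psutil

def get_io_read_bytes(proc: psutil.Process, with_children: bool = True):
    """Return total bytes read by a process (and optionally its children),
    or None where psutil does not support io_counters (e.g. macOS)."""
    try:
        total = proc.io_counters().read_bytes
        if with_children:
            for child in proc.children(recursive=True):
                total += child.io_counters().read_bytes
        return total
    except (AttributeError, psutil.Error):
        # AttributeError: io_counters() is unavailable on this platform
        # (the macOS case); psutil.Error covers children exiting between
        # enumeration and the counter read. Non-blocking: report no value.
        return None
```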
## Config / Run the Benchmark

This version of benchmark-framework is configured via a YAML file: log-processor.yaml.

File structure:
```yaml
agents:
  - name: fluent-bit
    version: 1.8
    path: /opt/fluent-bit/bin/fluent-bit
  - name: vector
    version: 0.21.0
    path: /home/aditya/.vector/bin/vector
  - name: stanza
    version: 0.3.0
    path: /home/stanza/bin/stanza
  - name: otel-collector
    version: 0.103.0-dev
    path: /opt/opentelemetry-collector-contrib/bin/otelcontribcol
scenarios:
  type:
    - tail_http
    - http_http
    - http_null
    - tail_null
    - tcp_null
    - tcp_tcp
  agents_scenarios:
    - fluent-bit
    - vector
    - stanza
    - fluentd
    - otel-collector
logging:
  version: 1
  handlers:
    console:
      level: DEBUG
      stream: ext://sys.stdout
    file:
      level: DEBUG
      filename: default.log
  root:
    level: DEBUG
    handlers: [file]
```
In this structure you can define:

### agents:

These are the agents that are available to be executed (name, version & path).

### scenarios > type:

These are the types of scenarios that are currently available.
All those that appear in this list will be executed. To exclude one,
simply add the # symbol in front of the list item, for example:
```yaml
scenarios:
  type:
    - tail_http
    - http_http
    #- http_null
    - tail_null
    #- tcp_null
    #- tcp_tcp
```
Entries starting with # are ignored.

### scenarios > agents_scenarios:

These are the agents that will be executed for the defined scenarios. Similar to the previous point,
if you don't want to execute a particular agent, simply add the # symbol in front of the agent's
list item, for example:

```yaml
agents_scenarios:
    - fluent-bit
    #- vector
    #- stanza
    #- fluentd
    - otel-collector
```
There is also a section for the benchmark program's own log output, which sets the name of the
output log file and whether logging is sent to console or file:

logging > handlers > file > filename: indicates the name of the output file.
This filename can be modified here in log-processor.yaml or specified on the command line:

`benchmark.py --logfile <filename>.log`

The command-line value takes priority over the filename specified in the logging section of the YAML.
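As a rough sketch of how this override could be wired up (assuming PyYAML from requirements.txt and log-processor.yaml in the working directory; this is not benchmark.py's actual implementation):

```python
# Hypothetical sketch: load log-processor.yaml and let --logfile override
# the filename configured under logging > handlers > file.
import argparse
import logging.config

import yaml  # PyYAML

parser = argparse.ArgumentParser()
parser.add_argument("--logfile", help="override the YAML log filename")
args = parser.parse_args()

with open("log-processor.yaml") as fh:
    config = yaml.safe_load(fh)

if args.logfile:
    # The command-line value wins over the YAML filename, as described above.
    config["logging"]["handlers"]["file"]["filename"] = args.logfile

# The YAML's logging block mirrors Python's dictConfig schema, minus the
# handler classes; filling those in here is an assumption for the sketch.
config["logging"]["handlers"]["console"].setdefault("class", "logging.StreamHandler")
config["logging"]["handlers"]["file"].setdefault("class", "logging.FileHandler")
logging.config.dictConfig(config["logging"])

logging.getLogger(__name__).debug("logging configured")
```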
## Execution of benchmark.py

If executed as `python benchmark.py` without parameters, the configuration in log-processor.yaml is used.
The YAML configuration always takes precedence over the scenarios and agents specified via the command line;
only the log output file (--logfile) takes precedence over the YAML configuration file.

`python benchmark.py`

Only when the configuration file log-processor.yaml is not available does this run all scenarios for all agents (fluent-bit, fluentd, stanza, and vector).

## Run the Benchmark

`python benchmark.py`

This runs all scenarios for all agents configured in log-processor.yaml (fluent-bit, fluentd, stanza, and vector).
As seen before, log-processor.yaml configures the agents, specifies where to find each one, lists the available
scenarios to execute, and specifies which agents to run.

If you need to use command-line parameters, follow these instructions:

To run a specific scenario or a set of them, specify the --scenarios parameter
followed by the scenario names, separated by commas.

Example:

`python benchmark.py --scenarios tail_null,tail_http`

To run a specific log processor or a set of them, specify the --logprocessors parameter followed by the log processor names, separated by commas.

Example:

`python benchmark.py --scenarios tail_null --logprocessors fluent-bit`

The available scenarios for the --scenarios parameter are:

* http_http
* http_null
* tail_http
* tail_null
* tcp_null
* tcp_tcp

The available log processors for the --logprocessors parameter (or in log-processor.yaml) are:

* fluent-bit
* fluentd
* stanza
* vector
* otel-collector (aka OpenTelemetry Collector)

## Benchmark Results

Information about the system where the benchmark was executed is persisted in system_info.txt,
inside a folder generated during each run named:

* scenario_<date>_<sequencenumber>

Example:

###### benchmark_framework/results/scenario_20240520_103157

The results for each scenario are stored in the results folder under the scenario's name folder.
The data is kept in CSV files, and there are graphs in PNG format.

Example:

###### benchmark_framework/scenarios_tcp_null/results

In addition, you can start a dashboard server to view the results:

`python dashboard.py`

Then go to [http://localhost:8050](http://localhost:8050) to see the results per scenario.

![Example with Fluent Bit](imgs/FluentBit_LogProcessorBenchmarkResults.png)

![Example with OpenTelemetry](imgs/LogProcessorBenchmarkResults.png)

dashboard.py takes the last folder from the results directory to display in the browser.

## Adding Scenarios

To add a new scenario, start by copying the scenarios/_scenario_template folder.
Name the new folder after your scenario, for example based on the input and output the scenario uses.
There is also a README.md in each scenario that describes what the scenario does.

Each scenario consists of the following folders:

### config:
Contains sub-folders per log processor that should be executed for this scenario.
Please note that the folder names and config file names are expected to be identical to the other scenarios,
i.e.: /config/fluent-bit/fluent-bit.conf, /config/fluentd/fluentd.conf, /config/vector/vector.toml, /config/stanza/config.yaml

### data:
If your scenario requires some input data, it should be placed into this folder.

### tmp:
Temporary folder that will be cleared before each scenario execution.

### results:
Results of the scenario run.

## How it works

The benchmark framework will execute scenario.py in the following order (a minimal skeleton is sketched after this list):

***scenario.init()*** → allows you to initialize the scenario, i.e. start/prepare the input

***scenario.get_description()*** → provides the scenario description to the framework

→→ after init, the benchmark framework will start the log processor and the monitoring

***scenario.wait()*** → waits until the scenario is done; you can also start input/output here if it makes sense for your scenario

→→ the log processor and monitoring will be stopped

***scenario.cleanup()*** → stops input and output and does cleanup

***scenario.get_input_description()*** → if there is an input metric, the scenario has to provide a description along with the metric
***scenario.get_input_metric()***
***scenario.get_output_description()*** → if there is an output metric, the scenario has to provide a description along with the metric
***scenario.get_output_metric()***
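The skeleton below follows the lifecycle above. The method names come from the list; the signatures and return values are assumptions, not the framework's actual contract.

```python
# Hypothetical skeleton of a scenario.py, following the call order
# described above. Signatures and return values are assumptions.

class Scenario:
    def init(self):
        """Initialize the scenario, i.e. start/prepare the input."""

    def get_description(self) -> str:
        """Describe the scenario to the framework."""
        return "example: JSON lines in, NULL out"

    # The framework starts the log processor and monitoring here.

    def wait(self):
        """Block until the scenario is done (all data sent/received,
        or the maximum scenario time has elapsed)."""

    # The framework stops the log processor and monitoring here.

    def cleanup(self):
        """Stop input/output helpers and clean up temporary files."""

    def get_input_description(self) -> str:
        return "log lines sent"

    def get_input_metric(self) -> int:
        return 0  # e.g. count of lines pushed to the log processor

    def get_output_description(self) -> str:
        return "log lines received"

    def get_output_metric(self) -> int:
        return 0  # e.g. count of lines observed at the backend
```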
## General Info

- This project was originally started at calyptia/benchmark-framework
- The project was then moved to chronosphereio/calyptia-benchmark-framework (archived)
- The project has been moved in full to fluent/benchmark-framework