{"id":25176698,"url":"https://github.com/conduitio/streaming-benchmarks","last_synced_at":"2025-07-30T01:33:32.477Z","repository":{"id":53091663,"uuid":"521233591","full_name":"ConduitIO/streaming-benchmarks","owner":"ConduitIO","description":"Benchmarks for Conduit and other data streaming tools.","archived":false,"fork":false,"pushed_at":"2025-05-29T10:03:38.000Z","size":8807,"stargazers_count":4,"open_issues_count":4,"forks_count":0,"subscribers_count":7,"default_branch":"main","last_synced_at":"2025-06-11T09:46:28.359Z","etag":null,"topics":["benchmark","conduit","data-streaming"],"latest_commit_sha":null,"homepage":"","language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ConduitIO.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-08-04T11:09:16.000Z","updated_at":"2025-05-21T00:08:24.000Z","dependencies_parsed_at":"2024-04-25T10:48:14.288Z","dependency_job_id":"e074739b-1873-4fd1-9192-37ae2c925bc1","html_url":"https://github.com/ConduitIO/streaming-benchmarks","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ConduitIO/streaming-benchmarks","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ConduitIO%2Fstreaming-benchmarks","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ConduitIO%2Fstreaming-benchmarks/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ConduitIO%2Fstreaming-benchmarks/releases","manifests_url":"https://repos.ecosyste.ms/api/v1
/hosts/GitHub/repositories/ConduitIO%2Fstreaming-benchmarks/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ConduitIO","download_url":"https://codeload.github.com/ConduitIO/streaming-benchmarks/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ConduitIO%2Fstreaming-benchmarks/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267792674,"owners_count":24144930,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-29T02:00:12.549Z","response_time":2574,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["benchmark","conduit","data-streaming"],"created_at":"2025-02-09T13:17:55.572Z","updated_at":"2025-07-30T01:33:32.459Z","avatar_url":"https://github.com/ConduitIO.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Streaming Benchmarks\n\nThis repository contains the tools and scripts used to run performance\nbenchmarks for data streaming tools like\n[Conduit](https://github.com/conduitio/conduit) and\n[Kafka Connect](https://docs.confluent.io/platform/current/connect/index.html).\nThe tools are tested under the same conditions using\n[Benchi](https://github.com/conduitio/benchi), to ensure that the results are\ncomparable.\n\n## Results\n\nThe raw results of the benchmarks can be found in the [results](./results)\ndirectory. 
Here we post only the aggregated results.\n\n### Benchmark: MongoDB to Kafka\n\n\u003e [Click here](./results/mongo-kafka) to see the full results.\n\nThis benchmark tests the performance of the data pipeline when reading from a\nMongoDB source and writing to a Kafka destination. We tested the speed at which\nthe tools read the data in snapshot mode and CDC mode. We also tested\nthe [official MongoDB connector](https://www.mongodb.com/docs/kafka-connector/current/)\nas well as [the Debezium connector](https://debezium.io/documentation/reference/stable/connectors/mongodb.html).\n\n[Compared to the official connector](./results/mongo-kafka/20250422), Conduit’s\nCPU usage is higher by around 13% in snapshots and 28% in CDC. When it comes to\nmemory usage, we see a bigger gap, this time with Conduit using less memory\n(390 MB, about 68% less) than Kafka Connect (1200 MB).\n\nWhile the snapshot message rates are pretty close (Conduit’s message rate is\nabout 9% higher), we see a greater gap in CDC, where Conduit’s message rate is\nabout 52% higher.\n\nOur [comparison between Conduit and Debezium's Mongo connector](./results/mongo-kafka/20250428)\nshowed differences in throughput, CPU usage, and memory usage. Conduit's Mongo connector delivered 17% higher\nCDC message throughput (37k msg/s vs. 32k msg/s). Kafka Connect with Debezium\nused 25% more CPU and required 7.5 GB of memory compared to Conduit's 350 MB.\n\nNote that tests for the official connector and the Debezium connector were\nconducted on different machines, resulting in varying throughput rates for\nConduit. Due to this testing environment difference, throughput comparisons\nbetween the official connector and Debezium should be interpreted with caution.\n\n### Benchmark: Postgres to Kafka\n\n\u003e [Click here](./results/postgres-kafka/20250508) to see the full results.\n\nThis benchmark tests the performance of the data pipeline when reading from a\nPostgres source and writing to a Kafka destination. 
We evaluated how quickly\nthe tools process data in both snapshot mode and CDC (Change Data Capture) mode.\n\nThe comparison included [Conduit](https://github.com/conduitio/conduit) and\n[Kafka Connect](https://docs.confluent.io/platform/current/connect/index.html)\nwith the [Debezium Postgres connector](https://debezium.io/documentation/reference/stable/connectors/postgresql.html).\n\nCompared to Kafka Connect, Conduit’s message throughput was about 7% higher in\nCDC mode (48,060 msg/s vs. 44,889 msg/s) and about 3% higher in snapshot mode\n(70,753 msg/s vs. 68,783 msg/s).\n\nIn CDC mode, Conduit used dramatically less memory (110 MB vs. 6,863 MB, or about\n98% less) and required about 25% less CPU (110% vs. 147%).\n\nIn snapshot mode, Conduit used about 18% less memory (2,234 MB vs. 2,729 MB), but\nrequired about 25% more CPU (231% vs. 184%).\n\nThese results highlight Conduit’s strong performance, particularly in CDC scenarios,\nwhere it achieved higher throughput and much lower memory usage and CPU consumption.\n\nIn snapshot mode, although Conduit’s throughput was slightly higher and memory usage\nwas lower, its CPU usage was higher. Overall, Conduit offers a compelling choice for\nPostgres-to-Kafka pipelines where efficiency and throughput are critical.\n\n### Benchmark: Kafka to Snowflake\n\n\u003e [Click here](./results/kafka-snowflake/20250417) to see the full results.\n\nThis benchmark tests the performance of the data pipeline when reading from a\nKafka source and writing to a [Snowflake](https://www.snowflake.com/) destination.\n\nWe found that Conduit processed 13,333 messages per second, while Kafka\nConnect processed 66,400 messages per second. The test was run for 1\nminute, and the results were aggregated over the entire test duration.\n\nIt's important to note what caused the difference in throughput. Both tools\nfunction entirely differently and have different use cases. 
Kafka Connect\ndumps the raw data into Snowflake, letting the user transform the data in\nSnowflake in later steps. Conduit, on the other hand, transforms the data\nas it flows through the pipeline, inserting the transformed data into proper\ncolumns in Snowflake. Additionally, Conduit deduplicates the data, while Kafka\nConnect does not. This means that Conduit does more work than Kafka Connect,\nwhich is reflected in the throughput numbers.\n\n### Benchmark: MySQL to Kafka\n\n\u003e [Click here](./results/mysql-kafka/20250507) to see the full results.\n\nThis benchmark tests the performance of the data pipeline when reading from a\nMySQL source and writing to a Kafka destination. We evaluated how quickly\nthe tools process data in both snapshot mode and CDC (Change Data Capture) mode.\n\nThe comparison included [Conduit](https://github.com/conduitio/conduit) and\n[Kafka Connect](https://docs.confluent.io/platform/current/connect/index.html)\nwith the [Debezium MySQL connector](https://debezium.io/documentation/reference/stable/connectors/mysql.html).\n\nCompared to Kafka Connect, and across all tested EC2 instance types, from low-resource to high-resource\nenvironments, Conduit consistently proves to be a lean, reliable, and production-ready tool for\nMySQL-to-Kafka pipelines.\n\nUsing the EC2 instance type c7a.large, _Kafka Connect couldn’t even start_, while Conduit ran smoothly,\ndelivering solid throughput with minimal resource usage.\n\nUsing the EC2 instance type c7a.xlarge, Kafka Connect achieved higher throughput, but Conduit held close,\nusing ~75% less memory while maintaining competitive performance. 
Even though Conduit\nused more CPU (around 80% vs Kafka Connect’s 60%), it did so intentionally and efficiently,\nmaking better use of available system resources.\n\n## Running the benchmarks\n\nTo run the benchmarks yourself, you need to have Docker and Docker Compose\ninstalled on your machine (see [Docker Desktop](https://docs.docker.com/desktop/)).\n\nRun all benchmarks using:\n\n```sh\nmake install-tools run-all\n```\n\nThe [`Makefile`](./Makefile) contains a number of useful targets to make it easy\nto work with the benchmarks. Use `make help` to see the available targets.\n\n```sh\n$ make help\ninstall-tools   Install all tools required for benchmarking.\ninstall-benchi  Install latest version of benchi.\ninstall-csvtk   Install csvtk for processing CSV files.\nlint            Lint all benchmarks.\nlist            List all benchmarks.\nrun-all         Run all benchmarks. Optionally add \"run-\u003cbenchmark-name\u003e\" to run a specific benchmark.\nrun-%           Run a specific benchmark.\nrmi-conduit     Remove the Conduit docker image (use when built-in connectors get added or upgraded).\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fconduitio%2Fstreaming-benchmarks","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fconduitio%2Fstreaming-benchmarks","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fconduitio%2Fstreaming-benchmarks/lists"}