{"id":45907524,"url":"https://github.com/awslabs/amazon-msk-data-generator","last_synced_at":"2026-03-13T15:00:51.394Z","repository":{"id":66103445,"uuid":"399592418","full_name":"awslabs/amazon-msk-data-generator","owner":"awslabs","description":"Data generator for Amazon MSK","archived":false,"fork":false,"pushed_at":"2024-05-07T16:20:02.000Z","size":1002,"stargazers_count":18,"open_issues_count":6,"forks_count":6,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-09-09T13:27:02.460Z","etag":null,"topics":["msk"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"epl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/awslabs.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-08-24T20:10:22.000Z","updated_at":"2025-07-22T15:27:30.000Z","dependencies_parsed_at":"2023-02-26T23:15:22.137Z","dependency_job_id":null,"html_url":"https://github.com/awslabs/amazon-msk-data-generator","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/awslabs/amazon-msk-data-generator","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/awslabs%2Famazon-msk-data-generator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/awslabs%2Famazon-msk-data-generator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/awslabs%2Famazon-msk-data-generator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/awslabs%2Famazon-msk-data-gen
erator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/awslabs","download_url":"https://codeload.github.com/awslabs/amazon-msk-data-generator/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/awslabs%2Famazon-msk-data-generator/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30469098,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-13T11:00:43.441Z","status":"ssl_error","status_checked_at":"2026-03-13T11:00:23.173Z","response_time":60,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["msk"],"created_at":"2026-02-28T04:00:15.452Z","updated_at":"2026-03-13T15:00:51.385Z","avatar_url":"https://github.com/awslabs.png","language":"Java","funding_links":[],"categories":["Data Generators \u0026 Testing"],"sub_categories":["FreeSWITCH"],"readme":"# Amazon MSK Data Generator\n\nMSK Data Generator is a translation of the *awesome* Voluble Apache Kafka\ndata generator from Clojure to Java.  (Link in Resources Section below)\n\nThe killer feature is being able to generate\nevents which reference other generated events.  
(AKA: cross-reference, reference-able, joinable, etc.)\n\nFor example, we can generate one stream of Order events containing a customer_id (as well as price, sku, quantity, etc.)\nand at the same time, we can generate a different stream of Customer events containing a customer_id (as well as first name, last name, location, etc.)\nThe dynamically generated Customer event customer_id can reference the Order event customer_id.\n\n#### Why does this matter?\n\nMultiple streams of \"joinable\" data are especially useful when building\nstream processor applications (in `Kinesis Data Analytics for Apache Flink` or `Kinesis Data\nAnalytics Studio` for example) which perform joins.\n\nFor an example, see the AWS Big Data Blog post [Query your Amazon MSK topics interactively using Amazon Kinesis Data Analytics Studio](https://aws.amazon.com/blogs/big-data/query-your-amazon-msk-topics-interactively-using-amazon-kinesis-data-analytics-studio/)\n\n#### Why translate to Java?\n\nBy translating to Java, the hope is we open up the potential of wider community\ncollaboration.  (Nothing against Clojure mind you!  It's just that more folks know Java.)\n\nThis project can likely be used outside of Amazon MSK, but to start at least, the focus will be making\nthis generator easy to use with Amazon MSK.\n\n#### Further Context\n\nMSK Data Generator is deployed and configured as a Kafka Connect _Source_,\nso basic knowledge of Kafka Connect will be helpful.\n\nLike many dynamic data generation projects, the key component is the use\nof the Java Faker library.  Knowing more about Java Faker capabilities and options will be helpful.  
\nSee the link in the Resources section below.\n\n## Getting Started\n\nMSK Data Generator can be deployed in a variety of ways including:\n\n* [Deploying in a container running in Elastic Container Service](./docs/msk-data-gen-container-deploy.md)\n\n* [Deploying as a Kafka Connect source connector in MSK Connect](./docs/msk-connect-deploy.md)\n\n## Customizing Data Generation Configuration\n\nThere are 5 essential constructs to understand when customizing key-value data generation:\n\n1. **Directives** `genk`, `genkp`, `genv`, and `genvp`\n\n2. **Generators** `with` or `matching`\n\n3. **Attribute** the name of the field to generate data for\n\n4. **Qualifiers** `sometimes`\n\n5. **Expressions** based on Java Faker\n\nFor example, consider the following configuration:\n\n```\n\"genkp.customer.with\": \"#{Internet.uuid}\",\n\"genv.customer.name.with\": \"#{Name.full_name}\",\n\"genv.customer.gender.with\": \"#{Demographic.sex}\",\n\"genv.customer.favorite_beer.with\": \"#{Beer.name}\",\n\"genv.customer.state.with\": \"#{Address.state}\",\n\n\"genkp.order.with\": \"#{Internet.uuid}\",\n\"genv.order.product_id.with\": \"#{number.number_between '101','109'}\",\n\"genv.order.quantity.with\": \"#{number.number_between '1','5'}\",\n\"genv.order.customer_id.matching\": \"customer.key\"\n```\n\nThis config will generate data to the `customer` and `order` topics and _assumes_ the MSK cluster has been configured to allow auto topic creation OR the `customer` and `order` topics have already been created.\n\nFor example, the above configuration will create 2 events with every iteration similar to the following:\n\n`customer` event with a key of `0c88cbb7-eb4a-44f0-83aa-00957761b3b6` (a random string, because the key is generated with `Internet.uuid` from Java Faker) and JSON payload of\n\n```\n{\n   \"favorite_beer\": \"Weihenstephaner Hefeweissbier\",\n   \"gender\": \"Male\",\n   \"name\": \"Miss Gilbert Luettgen\",\n   \"state\": \"Oregon\"\n}\n```\n\n`order` event with a random string key of 
`dc236186-9037-45a0-8b91-a3c2b50f0582` (again, because of `Internet.uuid`)\nand a JSON payload of\n\n```\n{\n   \"quantity\": \"4\",\n   \"product_id\": \"132\",\n   \"customer_id\": \"0c88cbb7-eb4a-44f0-83aa-00957761b3b6\"\n}\n```\n\nNotice how the `order` event `customer_id` value references the previously generated `customer` key field?  (Hint: with this kind of data generation, we can test our join code!)\n\nThis also highlights the difference between `with` and `matching` in configuration: `with` generates a new value from an expression, while `matching` reuses a value already generated for another topic (here, `customer.key`).\n\nIn this example, `with` uses methods available from Java Faker. See the [API docs](https://dius.github.io/java-faker/apidocs/) and compare the class methods with the configuration above, such as `Name.full_name`, `Beer.name`, etc.\n\n\nWith the example above and the 5 previously mentioned essential constructs in mind, the sequence is:\n\n`directive.topic.attribute-or-qualifier.generator: expression`\n\nFor further information on data generation configuration options, check both the Voluble README and some of the\n[examples in this repo](./examples/)\n\n\n## External References\n\n* Voluble (basis for this project) https://github.com/MichaelDrogalis/voluble\n\n* Java Faker https://github.com/DiUS/java-faker\n\n* Java Faker API docs https://dius.github.io/java-faker/apidocs/\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fawslabs%2Famazon-msk-data-generator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fawslabs%2Famazon-msk-data-generator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fawslabs%2Famazon-msk-data-generator/lists"}