# data-producer

Fake data producer for Kafka, console and HTTP endpoints. Licensed under Apache-2.0.

## Usage

```bash
docker run --name=data-producer -d --restart=always \
        -e ENV=production \
        -e OUTPUT="console,kafka,webhook" \
        -e VERBOSE="true" \
        -e MODE=default \
        -e EVENT_SCENARIO=random \
        -e PERIOD_IN_MS=10000 \
        -e NUM_OF_USERS=3 \
        -e SESSION_PER_USER=5 \
        -e EVENTS_PER_SESSION=20 \
        -e APP_IDS="LovelyApp,LoveliestApp,HappyApp,HappiestApp" \
        -e EVENT_DATE_RANGE="1D" \
        -e SEND_USERS="true" \
        -e ADD_USER_DEMOGRAPHICS="false" \
        -e DATE_FORMAT="YYYY-MM-DDTHH:mm:ssZ" \
        -e REDIS_HOST=redis \
        -e REDIS_PORT=6379 \
        -e BROKERS=broker1:9092,broker2:19092,broker3:29092 \
        -e CREATE_TOPICS="events:1:1,users:1:1" \
        -e TOPIC_USERS=users \
        -e TOPIC_EVENTS=events \
        -e FORMAT=avro \
        -e WRITE_TO_MULTI_TOPICS="event:events-json:json,event:events-avro:avro:events-avro-value" \
        -e SCHEMA_REGISTRY=http://schema-registry:8081 \
        -e MESSAGE_KEY="appId" \
        -e WEBHOOK_URL=http://localhost:3000/v1/events \
        -e WEBHOOK_HEADERS='x-api-key:f33be30e-7695-4817-9f0c-03cb567c5732,lovely-header:value' \
        -e FUNNEL_TEMPLATE="{\"steps\":[{\"name\":\"A\",\"attributes\":{\"a_key_1\":\"word\",\"a_key_2\":\"number\"},\"probability\":0.6},{\"name\":\"B\",\"attributes\":{\"b_key_1\":\"amount\",\"b_key_2\":\"uuid\"},\"probability\":0.5},{\"name\":\"C\",\"probability\":0.9,\"attributes\":{\"c_key_1\":\"boolean\"}}]}" \
        -e EXPLODE="false" \
        canelmas/data-producer:4.7.0
```

Images are available on [Docker Hub](https://hub.docker.com/r/canelmas/data-producer).

## Env Variables

### `ENV`

- __`production`__ : Messages are written to the outputs listed in `OUTPUT`.
- __`development`__ : Messages are written to the console only.

Default is __development__.

### `VERBOSE`

- __`"true"`__ : Kafka record metadata is written to the console for each message.
- __`"false"`__ : Kafka record metadata is not written to the console.

This option only makes sense when `ENV=production`.

Default is __false__.

### `OUTPUT`

- __`console`__ : Generated data is written to the console.
- __`kafka`__ : Generated data is written to Kafka.
- __`webhook`__ : Generated data is posted to the webhook set in `WEBHOOK_URL`.

Several outputs can be combined as a comma-separated list, e.g. `OUTPUT="console,kafka,webhook"`.

Default is __console__.

### `MODE`

- __`default`__ : Events and users are generated and written to Kafka.
- __`create-users`__ : Generates users and writes them to Redis, without writing any records to Kafka.
- __`use-redis`__ : Generates events using random users stored in Redis. Events are written to Kafka; users are not.
- __`send-users`__ : Writes the users stored in Redis to Kafka.

The `create-users`, `use-redis` and `send-users` modes require `REDIS_HOST` and `REDIS_PORT` to be set.

Default is __`default`__.

### `EVENT_SCENARIO`

- __`random`__ : Chooses among the `view`, `commerce`, `custom` and `apm` types each time an event is generated.
- __`view`__ : Only `viewStart` and `viewStop` events are generated, at random.
- __`commerce`__ : Only `purchase`, `purchaseError`, `viewProduct`, `viewCategory`, `search`, `clearCart`, `addToWishList`, `removeFromWishList`, `startCheckout`, `addToCart` and `removeFromCart` events are generated, at random.
- __`apm`__ : Only `httpCall` and `networkError` events are generated, at random.
- __`custom`__ : Custom events are generated; see [here](https://github.com/canelmas/data-producer/blob/8bb80243ae6d996fcebee69f596438c093fd1988/generators/custom_events.js#L258) for the list of custom events.

Default is __`random`__.

### `PERIOD_IN_MS`

Interval, in milliseconds, at which events and users are generated and sent.

Default is __5000__.

### `NUM_OF_USERS`

Number of users to generate and send in each period.

Default is __1__.

### `SESSIONS_PER_USER`

Number of sessions to generate for each user within each period.

Default is __1__.

### `EVENTS_PER_SESSION`

Number of events to generate for each user session.

Default is __5__.

### `DEVICE_ID`

Device ID to use for every event.

Once this option is set, only one user is generated and used throughout event generation.

Default is __undefined__.

### `APP_IDS`

Comma-separated app names; one of them is picked at random as the `appId` of each event (e.g. `APP_IDS=DemoApp,FooApp,BarApp,ZooWebApp`).

Default is __DemoApp__.

### `SEND_USERS`

Whether generated user data should be written to the outputs.

Default is __true__.

### `ADD_USER_DEMOGRAPHICS`

Whether a set of demographic attributes should be generated and set for each user.

See [here](https://github.com/canelmas/data-producer/blob/8bb80243ae6d996fcebee69f596438c093fd1988/generators/user_generator.js#L42) for the list of demographic attributes generated.

Default is __false__.

### `EXCLUDE_SESSION_EVENTS`

Whether `clientSessionStart` and `clientSessionStop` events should be excluded, i.e. not sent.

Default is __false__.

### `DATE_FORMAT`

Date format used in the generated event and user data.

See [moment.js](https://momentjs.com/) or this [cheatsheet](https://devhints.io/moment) for format options.

Default is __YYYY-MM-DDTHH:mm:ssZ__.

### `REDIS_HOST`

Redis host.

This option is required only when running one of the following modes: `create-users`, `use-redis`, `send-users`.

Default is __undefined__.

### `REDIS_PORT`

Redis port.

This option is required only when running one of the following modes: `create-users`, `use-redis`, `send-users`.

Default is __undefined__.

### `BROKERS`

Comma-separated Kafka brokers, e.g. `BROKERS=kafka1:19092,kafka2:29092,kafka3:39092`.

Default is __localhost:19092__.

### `CREATE_TOPICS`

If set, the listed Kafka topics are created.

This configuration expects a comma-separated list of entries in the following format: `topic name(required):number of partitions(required):replication factor(required)`

For example, `CREATE_TOPICS=A:3:1,B:1:1` creates two topics named A and B. Topic A has 3 partitions and a replication factor of 1, whereas topic B has both its partition count and its replication factor set to 1.

Default is __undefined__.

By default, event and user data are written to the `events` and `users` topics respectively (see `TOPIC_EVENTS` and `TOPIC_USERS`), so make sure these topics exist, or create them by setting this parameter.

The only exception is `WRITE_TO_MULTI_TOPICS`: when it is set, event and user data are written not to the default topics (`events` and `users`) but to the topics specified in `WRITE_TO_MULTI_TOPICS`.

### `TOPIC_USERS`

Name of the topic the Kafka producer sends user data to.

Default is __users__.

This parameter is ignored if `WRITE_TO_MULTI_TOPICS` is used.

### `TOPIC_EVENTS`

Name of the topic the Kafka producer sends event data to.

Default is __events__.

This parameter is ignored if `WRITE_TO_MULTI_TOPICS` is used.

### `FORMAT`

Serialization format of Kafka records. Only `json` and `avro` are supported.

Default is __json__.

`SCHEMA_REGISTRY` is required when this parameter is set to `avro`.

### `SCHEMA_REGISTRY`

Schema Registry URL; required if the `avro` format is used.

Default is __undefined__.

### `WRITE_TO_MULTI_TOPICS`

Useful when the same record must be written to multiple topics.

This configuration expects a comma-separated list of entries in the following format:
`entity type(required):topic name(required):serialization format(required):subject name if avro is used(optional)`

Supported entity types are `user` for user data and `event` for event data.

For example, to write event messages to two different topics, the first in JSON and the second in Avro, the following configuration may be used:

`WRITE_TO_MULTI_TOPICS=event:events-json:json,event:events-avro:avro:events-avro-value`

This tells the producer to write the `event` entity (generated event data) to both

- the `events-json` topic in `json` format, and
- the `events-avro` topic in `avro` format, with subject name `events-avro-value`.

If `avro` is used, make sure to set `SCHEMA_REGISTRY` and to register the schema under the subject name (here `events-avro-value`).

Default is __undefined__.

### `WEBHOOK_URL`

Webhook URL to post generated data to.

Only events are posted to the webhook; users are omitted.

Default is __undefined__.

### `WEBHOOK_HEADERS`

Comma-separated `name:value` headers to send with requests to `WEBHOOK_URL` (e.g. `x-api-key:ABCD-XYZ,lovely-header:lovely-header-value`).

Default is __undefined__.

### `FUNNEL_TEMPLATE`

Escaped funnel template JSON string.

Once this property is set, only funnel events are created, according to the template.

The __`probability`__ of each funnel step indicates how likely that event is to be generated, e.g. **1** means the event is generated every time, while **0.2** means it is generated with a 20% probability.

When a funnel event is not generated, a **`random`** event is generated instead.

__`attributes`__ accepts `boolean`, `word`, `amount`, `uuid` and `number` as values, which set the type of random value generated for each attribute. Default is __word__.

You may use [this tool](https://tools.knowledgewalls.com/jsontostring) to convert JSON to a string.

Below is a sample template:

```json
{
    "steps": [
        {
            "name": "A",
            "attributes": {
                "a_key_1": "word",
                "a_key_2": "number",
                "a_key_3": "amount",
                "a_key_4": "uuid",
                "a_key_5": "boolean"
            },
            "probability": 0.8
        },
        {
            "name": "B",
            "attributes": {
                "b_key_1": "amount",
                "b_key_2": "uuid"
            },
            "probability": 0.5
        },
        {
            "name": "C",
            "attributes": {
                "c_key_1": "boolean"
            },
            "probability": 0.6
        }
    ]
}
```

Default is __undefined__.

### `EVENT_DATE_RANGE`

Amount of time to subtract from now() when generating event times; e.g. setting this value to `15D` makes each event time be selected at random from a window starting 15 days ago.

`M`, `D` and `Y` are supported.

Default is __1M__.

### `EVENT_DATE`

A specific date to use when generating events.

Expected format is `YYYY-MM-DD`.

Default is __undefined__.

### `MESSAGE_KEY`

Kafka message key.

`aid`, `deviceId`, `eventId`, `appId` and `eventName` are supported.

Default is __null__.

### `EXPLODE`

Experimental boolean flag. When set to `true`, the JSON is exploded into a separate event for each key in `attributes`. For example,

```json
{
    "attributes": {
        "category": "Chair",
        "currency": "USD",
        "price": 804
    }
}
```

is transformed into 3 separate events:

```json
{
    "attributes": {
        "dim_name": "category",
        "dim_value_string": "Chair"
    }
}
{
    "attributes": {
        "dim_name": "currency",
        "dim_value_string": "USD"
    }
}
{
    "attributes": {
        "dim_name": "price",
        "dim_value_double": 804
    }
}
```

In this example the string values end up in `dim_value_string` and the numeric value in `dim_value_double`.

Default is __false__.
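As `CREATE_TOPICS` notes, the default `events`/`users` topics must already exist unless the producer creates them. If you would rather create them yourself, a `CREATE_TOPICS`-style value can be expanded into Kafka's own `kafka-topics.sh` commands. The sketch below is not part of data-producer. It assumes a Kafka distribution whose `kafka-topics.sh` supports `--bootstrap-server`, and it uses the default broker address from `BROKERS`. `topics_to_commands` is a hypothetical helper that only prints the commands.

```bash
# Hypothetical helper: expand "name:partitions:replication,..." (the
# CREATE_TOPICS format) into kafka-topics.sh commands. Prints, never runs.
topics_to_commands() {
  local spec="$1" bootstrap="$2" entry name partitions replication
  local -a entries
  IFS=',' read -r -a entries <<< "$spec"
  for entry in "${entries[@]}"; do
    IFS=':' read -r name partitions replication <<< "$entry"
    # All three fields are required by the documented format.
    if [[ -z "$name" || -z "$partitions" || -z "$replication" ]]; then
      echo "invalid entry: $entry" >&2
      return 1
    fi
    echo "kafka-topics.sh --bootstrap-server $bootstrap --create --topic $name --partitions $partitions --replication-factor $replication"
  done
}

topics_to_commands "A:3:1,B:1:1" "localhost:19092"
# kafka-topics.sh --bootstrap-server localhost:19092 --create --topic A --partitions 3 --replication-factor 1
# kafka-topics.sh --bootstrap-server localhost:19092 --create --topic B --partitions 1 --replication-factor 1
```

An entry missing any of its three fields is rejected, which mirrors the "(required)" markers in the documented format.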
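A `WRITE_TO_MULTI_TOPICS` value is easy to mistype, so it can help to split it into the documented `entity type:topic name:serialization format[:subject name]` fields before starting the container. `describe_multi_topics` is a hypothetical helper, not part of data-producer. It checks only the entity type (`user`/`event`) and the format (`json`/`avro`), and it does not contact Kafka or the Schema Registry.

```bash
# Hypothetical helper: split WRITE_TO_MULTI_TOPICS entries into their
# fields and reject unsupported entity types or formats.
describe_multi_topics() {
  local spec="$1" entry entity topic format subject
  local -a entries
  IFS=',' read -r -a entries <<< "$spec"
  for entry in "${entries[@]}"; do
    IFS=':' read -r entity topic format subject <<< "$entry"
    case "$entity" in
      user|event) ;;
      *) echo "unsupported entity type: $entity" >&2; return 1 ;;
    esac
    case "$format" in
      json) echo "$entity -> $topic (json)" ;;
      # Avro also needs SCHEMA_REGISTRY and a schema registered under the subject.
      avro) echo "$entity -> $topic (avro, subject: ${subject:-<none>})" ;;
      *) echo "unsupported format: $format" >&2; return 1 ;;
    esac
  done
}

describe_multi_topics "event:events-json:json,event:events-avro:avro:events-avro-value"
# event -> events-json (json)
# event -> events-avro (avro, subject: events-avro-value)
```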
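`WEBHOOK_HEADERS` entries use a `name:value` pattern, while HTTP tools such as `curl -H` expect `Name: value`. The hypothetical helper below converts between the two, which can be handy for reproducing the webhook request by hand. It splits each entry at its first colon so that values may themselves contain colons. Whether data-producer splits entries the same way is an assumption, not something the README states.

```bash
# Hypothetical helper: turn "name:value,name:value" into HTTP header lines,
# splitting each entry at its first colon only.
webhook_header_lines() {
  local spec="$1" entry
  local -a entries
  IFS=',' read -r -a entries <<< "$spec"
  for entry in "${entries[@]}"; do
    printf '%s: %s\n' "${entry%%:*}" "${entry#*:}"
  done
}

webhook_header_lines 'x-api-key:ABCD-XYZ,lovely-header:lovely-header-value'
# x-api-key: ABCD-XYZ
# lovely-header: lovely-header-value
```

Each printed line can be passed as one `curl -H` argument.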
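`FUNNEL_TEMPLATE` does not have to be escaped by hand or with an online converter. You can keep the template as plain JSON in a quoted heredoc and pass it through a shell variable, because double quotes inside a quoted expansion such as `"$var"` need no further escaping. This is a sketch that assumes the producer parses the variable's value as JSON. The compaction step is optional and deliberately naive.

```bash
# Keep the funnel template as readable JSON; the quoted 'EOF' disables
# any shell expansion inside it.
funnel_template=$(cat <<'EOF'
{
  "steps": [
    { "name": "A", "attributes": { "a_key_1": "word" }, "probability": 0.8 },
    { "name": "B", "attributes": { "b_key_1": "amount" }, "probability": 0.5 }
  ]
}
EOF
)

# Optional: put it on one line by deleting spaces and newlines. Naive: this
# would also strip spaces inside string values (none in this template).
funnel_template_compact=$(printf '%s' "$funnel_template" | tr -d ' \n')

echo "$funnel_template_compact"
# {"steps":[{"name":"A","attributes":{"a_key_1":"word"},"probability":0.8},{"name":"B","attributes":{"b_key_1":"amount"},"probability":0.5}]}

# Then pass it unchanged:
#   docker run ... -e FUNNEL_TEMPLATE="$funnel_template_compact" canelmas/data-producer:4.7.0
```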
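The `probability` rule in `FUNNEL_TEMPLATE` amounts to one weighted coin flip per step: the step's event is emitted with probability `p`, and otherwise a `random` event takes its place. The hypothetical `funnel_step` function below illustrates that documented rule only. It is not the producer's implementation. It uses `awk`'s `rand()` as the random source and accepts an optional seed so that the edge cases can be checked.

```bash
# Hypothetical illustration of the documented rule: print the step name
# with probability p, otherwise print "random".
funnel_step() {
  local name="$1" probability="$2" seed="${3:-$RANDOM}"
  awk -v name="$name" -v p="$probability" -v seed="$seed" \
    'BEGIN { srand(seed); if (rand() < p) print name; else print "random" }'
}

# Walk the sample template's steps: A (0.8), B (0.5), C (0.6).
for step in A:0.8 B:0.5 C:0.6; do
  funnel_step "${step%%:*}" "${step#*:}"
done
```

Because `rand()` returns values in [0, 1), a step with probability 1 always appears and a step with probability 0 never does. Values in between vary from run to run.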
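To see which window an `EVENT_DATE_RANGE` value covers, you can translate it into GNU `date`'s relative-date syntax. This sketch makes two assumptions: `D`, `M` and `Y` mean days, months and years, and the window ends at the current time. It also requires GNU coreutils `date`, because the BSD `date` on macOS takes different flags. `range_start` is a hypothetical helper name.

```bash
# Hypothetical helper: print the earliest date (UTC) covered by an
# EVENT_DATE_RANGE value such as 15D, 1M or 2Y. Requires GNU date.
range_start() {
  local range="$1" amount unit
  amount="${range%?}"   # everything except the last character
  unit="${range: -1}"   # the last character: D, M or Y
  if ! [[ "$amount" =~ ^[0-9]+$ ]]; then
    echo "invalid range: $range" >&2
    return 1
  fi
  case "$unit" in
    D) unit=days ;;
    M) unit=months ;;
    Y) unit=years ;;
    *) echo "unsupported unit: $unit" >&2; return 1 ;;
  esac
  date -u -d "$amount $unit ago" +%Y-%m-%d
}

range_start 15D   # the date 15 days before today, e.g. for the 15D example
```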