{"id":13569573,"url":"https://github.com/knaufk/flink-faker","last_synced_at":"2026-01-04T00:49:51.812Z","repository":{"id":37012436,"uuid":"300975265","full_name":"knaufk/flink-faker","owner":"knaufk","description":"A data generator source connector for Flink SQL based on data-faker.","archived":false,"fork":false,"pushed_at":"2023-07-24T21:06:58.000Z","size":1893,"stargazers_count":219,"open_issues_count":13,"forks_count":59,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-04-01T18:09:21.288Z","etag":null,"topics":["apache-flink","flink","flink-sql"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/knaufk.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2020-10-03T20:39:43.000Z","updated_at":"2025-03-19T08:05:18.000Z","dependencies_parsed_at":"2024-01-14T03:49:03.391Z","dependency_job_id":"bf451698-9626-40d9-9eb4-84e77369e09e","html_url":"https://github.com/knaufk/flink-faker","commit_stats":null,"previous_names":[],"tags_count":10,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/knaufk%2Fflink-faker","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/knaufk%2Fflink-faker/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/knaufk%2Fflink-faker/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/knaufk%2Fflink-faker/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/knaufk","download_url":"https://codeload.github.co
m/knaufk/flink-faker/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247128702,"owners_count":20888232,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache-flink","flink","flink-sql"],"created_at":"2024-08-01T14:00:41.518Z","updated_at":"2026-01-04T00:49:51.765Z","avatar_url":"https://github.com/knaufk.png","language":"Java","funding_links":[],"categories":["Java"],"sub_categories":[],"readme":"![Build Status](https://github.com/knaufk/flink-faker/actions/workflows/ci.yml/badge.svg?branch=master)\n\n# flink-faker\n\nflink-faker is an Apache Flink [table source](https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/connectors/)\nthat generates fake data based on the [Data Faker](https://github.com/datafaker-net/datafaker) expression\nprovided for each column.\n\nCheck out this [demo web application](https://java-faker.herokuapp.com/) for some example Java Faker expressions (fully compatible with Data Faker) and the [Data Faker documentation](https://www.datafaker.net/documentation/providers/).\n\nThis project is inspired by [voluble](https://github.com/MichaelDrogalis/voluble).\n\n## Package\n\nTo build the flink-faker JAR yourself, run:\n\n```shell script\nmvn clean package\n```\n\n## Compatibility Matrix\n\n| Flink Version | flink-faker Version |\n|---------------|---------------------|\n| 1.11          | 0.1.x - 0.4.x       |\n| 1.12          | 0.1.x - 0.4.x       |\n| 1.13          | 0.1.x - 0.4.x       |\n| 1.14          | 0.1.x - 0.4.x       |\n| 1.15          | 0.5.0               |\n| 1.16          | 0.5.1+              |\n| 1.17    
      | 0.5.1+              |\n\nNo automated tests check this compatibility, so please treat this table as \"best knowledge\". If you notice any incompatibility, please open an issue.\n\n## Using flink-faker with the Flink SQL Client\n\n1. Download Flink from the [Apache Flink website](https://flink.apache.org/downloads.html).\n2. Download the flink-faker JAR from the [Releases](https://github.com/knaufk/flink-faker/releases) page (or [build it yourself](#package)).\n3. Put the downloaded flink-faker JAR into the `lib/` directory of your Flink distribution.\n4. (Re)Start a [Flink cluster](https://ci.apache.org/projects/flink/flink-docs-stable/docs/try-flink/local_installation/#step-2-start-a-cluster).\n5. (Re)Start the [Flink SQL Client](https://ci.apache.org/projects/flink/flink-docs-stable/docs/dev/table/sqlclient/).\n\n## Usage\n\n### As ScanTableSource\n\nThe following table produces an unbounded stream of superheroes. Because `fields.power.null-rate` is `0.05`, about 5% of the `power` values are `NULL`:\n\n```sql\nCREATE TEMPORARY TABLE heroes (\n  `name` STRING,\n  `power` STRING,\n  `age` INT\n) WITH (\n  'connector' = 'faker',\n  'fields.name.expression' = '#{superhero.name}',\n  'fields.power.expression' = '#{superhero.power}',\n  'fields.power.null-rate' = '0.05',\n  'fields.age.expression' = '#{number.numberBetween ''0'',''1000''}'\n);\n```\n```sql\nSELECT * FROM heroes;\n```\n\n### As LookupTableSource\n\n```sql\nCREATE TEMPORARY TABLE location_updates (\n  `character_id` INT,\n  `location` STRING,\n  `proctime` AS PROCTIME()\n)\nWITH (\n  'connector' = 'faker',\n  'fields.character_id.expression' = '#{number.numberBetween ''0'',''100''}',\n  'fields.location.expression' = '#{harry_potter.location}'\n);\n```\n```sql\nCREATE TEMPORARY TABLE characters (\n  `character_id` INT,\n  `name` STRING\n)\nWITH (\n  'connector' = 'faker',\n  'fields.character_id.expression' = '#{number.numberBetween ''0'',''100''}',\n  'fields.name.expression' = '#{harry_potter.characters}'\n);\n```\n\nThe two tables can then be combined in a processing-time lookup join, with `characters` acting as the lookup table:\n\n```sql\nSELECT\n  c.character_id,\n  l.location,\n  c.name\nFROM location_updates AS l\nJOIN characters FOR SYSTEM_TIME AS OF proctime AS c\nON l.character_id = 
c.character_id;\n```\n\nCurrently, the `faker` source supports the following data types:\n\n* `CHAR`\n* `VARCHAR`\n* `STRING`\n* `TINYINT`\n* `SMALLINT`\n* `INTEGER`\n* `BIGINT`\n* `FLOAT`\n* `DOUBLE`\n* `DECIMAL`\n* `BOOLEAN`\n* `TIMESTAMP`\n* `DATE`\n* `TIME`\n* `ARRAY`\n* `MAP`\n* `MULTISET`\n* `ROW`\n\n### Connector Options\n\n| Connector Option            | Default | Description                                                                                                                      |\n|-----------------------------|---------|----------------------------------------------------------------------------------------------------------------------------------|\n| `number-of-rows`            | None    | The number of rows to produce. If this option is set, the source is bounded; otherwise, it is unbounded and runs indefinitely.   |\n| `rows-per-second`           | 10000   | The maximum rate at which the source produces records.                                                                           |\n| `fields.\u003cfield\u003e.expression` | None    | The [Data Faker](https://www.datafaker.net/documentation/expressions/) expression to generate the values for this field.         
|\n| `fields.\u003cfield\u003e.null-rate`  | 0.0     | Fraction of rows for which this field is `null`.                                                                                 |\n| `fields.\u003cfield\u003e.length`     | 1       | Number of elements in an `ARRAY`, `MAP`, or `MULTISET` field.                                                                    |\n\n### On Timestamps\n\nFor columns of type `TIMESTAMP` and `DATE`, the corresponding Data Faker expression must return a timestamp formatted as `uuuu-MM-dd hh:mi:ss[.nnnnnnnnn]`.\nTypically, you would use one of the following expressions:\n\n```sql\nCREATE TEMPORARY TABLE timestamp_time_and_date_example (\n  `timestamp1` TIMESTAMP(3),\n  `timestamp2` TIMESTAMP(3),\n  `timestamp3` TIMESTAMP(3),\n  `time`       TIME,\n  `date1`      DATE,\n  `date2`      DATE\n)\nWITH (\n  'connector' = 'faker',\n  'fields.timestamp1.expression' = '#{date.past ''15'',''SECONDS''}',\n  'fields.timestamp2.expression' = '#{date.past ''15'',''5'',''SECONDS''}',\n  'fields.timestamp3.expression' = '#{date.future ''15'',''5'',''SECONDS''}',\n  'fields.time.expression' = '#{time.future ''15'',''5'',''SECONDS''}',\n  'fields.date1.expression' = '#{date.birthday}',\n  'fields.date2.expression' = '#{date.birthday ''1'',''100''}'\n);\n```\n```sql\nSELECT * FROM timestamp_time_and_date_example;\n```\n\n* For `timestamp1`, Data Faker generates a random timestamp at most 15 seconds in the past.\n* For `timestamp2`, Data Faker generates a random timestamp between 5 and 15 seconds in the past.\n* For `timestamp3`, Data Faker generates a random timestamp between 5 and 15 seconds in the future.\n* For `time`, Data Faker generates a random time between 5 and 15 seconds in the future.\n* For `date1`, Data Faker generates a random birthday between 18 and 65 years ago.\n* For `date2`, Data Faker generates a random birthday between 1 and 100 
years ago.\n\n### On Collection Data Types\n\nThe following example shows how to use the `ARRAY`, `MULTISET`, `MAP`, and `ROW` types. A `MAP` column takes separate `key` and `value` expressions, a `ROW` column takes one expression per nested field, and `length` sets the number of elements generated for `ARRAY`, `MULTISET`, and `MAP` columns.\n\n```sql\nCREATE TEMPORARY TABLE hp (\n  `character-with-age` MAP\u003cSTRING,INT\u003e,\n  `spells` MULTISET\u003cSTRING\u003e,\n  `locations` ARRAY\u003cSTRING\u003e,\n  `house-points` ROW\u003c`house` STRING, `points` INT\u003e\n) WITH (\n  'connector' = 'faker',\n  'fields.character-with-age.key.expression' = '#{harry_potter.character}',\n  'fields.character-with-age.value.expression' = '#{number.numberBetween ''10'',''100''}',\n  'fields.character-with-age.length' = '2',\n  'fields.spells.expression' = '#{harry_potter.spell}',\n  'fields.spells.length' = '5',\n  'fields.locations.expression' = '#{harry_potter.location}',\n  'fields.locations.length' = '3',\n  'fields.house-points.house.expression' = '#{harry_potter.house}',\n  'fields.house-points.points.expression' = '#{number.numberBetween ''10'',''100''}'\n);\n```\n```sql\nSELECT * FROM hp;\n```\n\n### \"One Of\" Columns\n\nData Faker can pick a random value from a list of options via the `Options.option` expression:\n\n```sql\nCREATE TEMPORARY TABLE orders (\n  `order_id` INT,\n  `order_status` STRING\n)\nWITH (\n  'connector' = 'faker',\n  'fields.order_id.expression' = '#{number.numberBetween ''0'',''100''}',\n  'fields.order_status.expression' = '#{Options.option ''RECEIVED'',''SHIPPED'',''CANCELLED''}'\n);\n```\n```sql\nSELECT * FROM orders;\n```\n\n## License\n\nCopyright © 2020-2023 Konstantin Knauf\n\nDistributed under the Apache License, Version 2.0.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fknaufk%2Fflink-faker","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fknaufk%2Fflink-faker","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fknaufk%2Fflink-faker/lists"}