{"id":29874950,"url":"https://github.com/datasqrl/flink-sql-runner","last_synced_at":"2026-02-10T20:01:42.067Z","repository":{"id":259110641,"uuid":"875021571","full_name":"DataSQRL/flink-sql-runner","owner":"DataSQRL","description":"Dockerized runner, utilities, and functions for FlinkSQL applications","archived":false,"fork":false,"pushed_at":"2026-02-09T12:17:56.000Z","size":20226,"stargazers_count":27,"open_issues_count":6,"forks_count":2,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-02-09T17:25:50.782Z","etag":null,"topics":["docker","flink","flinksql","kubernetes"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DataSQRL.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-10-18T23:17:19.000Z","updated_at":"2026-02-09T12:17:58.000Z","dependencies_parsed_at":"2025-03-24T18:21:56.898Z","dependency_job_id":"1a9e9f60-fb48-4f5b-99b6-5df44308fb7a","html_url":"https://github.com/DataSQRL/flink-sql-runner","commit_stats":null,"previous_names":["datasqrl/flink-jar-runner","datasqrl/flink-sql-runner"],"tags_count":21,"template":false,"template_full_name":null,"purl":"pkg:github/DataSQRL/flink-sql-runner","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DataSQRL%2Fflink-sql-runner","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DataSQRL%2Fflink-sql-runner/tags","releases_url":"https://repos.ecosyste.ms/api/
v1/hosts/GitHub/repositories/DataSQRL%2Fflink-sql-runner/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DataSQRL%2Fflink-sql-runner/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DataSQRL","download_url":"https://codeload.github.com/DataSQRL/flink-sql-runner/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DataSQRL%2Fflink-sql-runner/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29314703,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-10T17:48:59.043Z","status":"ssl_error","status_checked_at":"2026-02-10T17:45:37.240Z","response_time":65,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["docker","flink","flinksql","kubernetes"],"created_at":"2025-07-31T01:45:34.986Z","updated_at":"2026-02-10T20:01:42.060Z","avatar_url":"https://github.com/DataSQRL.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![GitHub release](https://img.shields.io/github/v/release/DataSQRL/flink-sql-runner?sort=semver)](https://github.com/DataSQRL/flink-sql-runner/releases)\n[![Docker Image Version](https://img.shields.io/docker/v/datasqrl/flink-sql-runner?sort=semver)](https://hub.docker.com/r/datasqrl/flink-sql-runner/tags)\n[![Maven 
Central](https://img.shields.io/maven-central/v/com.datasqrl.flinkrunner/flink-sql-runner)](https://repo1.maven.org/maven2/com/datasqrl/flinkrunner/flink-sql-runner/)\n\n# Flink SQL Runner\n\n\u003cimg src=\"stdlib-docs/img/runner_logo.png\" alt=\"Flink SQL Runner Logo\" width=\"300\" align=\"right\" /\u003e\n\nTools and extensions for running Apache Flink SQL applications, including Docker images, data types, connectors, function libraries, and formats.\n\nThis repository contains core components for running Flink SQL applications in production using the Flink Kubernetes Operator, without manual JAR assembly or custom infrastructure.\n\nThe individual components are modular and the project is composable to make it easy to create your own custom Flink SQL runner.\n\n## Features\n\n- 📝 **SQL Script Execution**: Run SQL scripts directly with Flink.\n- 🧾 **Compiled Plan Execution**: Run pre-compiled Flink SQL plans to manage production deployments and versioning.\n- 🔄 **Environment Variable Substitution**: Inject environment variables `${VAR}` into SQL scripts and configs at runtime.\n- 📦 **JAR Dependency Management**: Reference local directories with required JARs (e.g. 
UDFs).\n- 🌍 **Kubernetes-Friendly**: Built to run with the Flink Kubernetes Operator.\n- 🔧 **Function Infrastructure**: Utilities for writing and loading UDFs as system functions.\n- 🪄 **Flink Extensions**:\n    - 💀 Dead-letter queue support in Kafka for poison message handling.\n    - 🚀 Native JSON and Vector types with JSON format and PostgreSQL connector support.\n    - 📚 Function libraries for additional functionality in Flink SQL (advanced math, OpenAI, etc)\n    - ⚙️ Additional configuration options for CSV format.\n\n---\n\n## Flink SQL Runner Usage\n\nYou can use the docker image to run Flink SQL scripts or compiled plans locally or in Kubernetes.\nThe docker image contains the executable flink-sql-runner.jar file which supports the following command line arguments:\n\n| Argument           | Description                                                                                                                                                                                          |\n|--------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| `-p, --planfile`   | Compiled plan (i.e. 
JSON file) to execute                                                                                                                                                            |\n| `-s, --sqlfile`    | Flink SQL script to execute                                                                                                                                                                          |\n| `-c, --config-dir` | Directory containing the [Flink configuration YAML file](https://nightlies.apache.org/flink/flink-docs-release-2.2/docs/deployment/config/)                                                          |\n| `-u, --udfpath`    | Path to folder that contains JAR files that implement user defined functions (UDFs) or other runtime extensions for Flink                                                                            |\n| `-m, --mode`       | Optional argument to specify [Flink execution mode](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/dev/datastream/execution_mode/) (`STREAMING` (default), `BATCH`, or `AUTOMATIC`) |\n\n\u003e [!WARNING]\n\u003e The runner expects either a Flink SQL script or a compiled plan - not both.\n\u003e \n\u003e The `--mode` argument - even if it is explicitly set - will be ignored if the Flink YAML configuration set via `--config-dir` contains `execution.runtime-mode`.\n\nWe strongly recommend to run compiled plans for production Flink SQL applications since they support\nlifecycle management of applications, are stable across Flink versions, and provide more control over\nthe executed JobGraph.\nYou can use the [SQRL compiler](https://github.com/DataSQRL/sqrl/) to compile Flink SQL applications to compiled plans.\n\n### Running Locally\n\nTo run Flink SQL Runner locally using Docker in a self-contained cluster (JobManager and TaskManager in a single container):\n\n1\\. 
Create your SQL script\nPut your Flink SQL (e.g., `flink.sql`) in a local directory, such as:\n\n```bash\n./sql-scripts/flink.sql\n```\n\n2\\. Run the Docker image\nThis starts a full standalone Flink session cluster in one container:\n\n```bash\ndocker run -d --rm -it \\\n  -p 8081:8081 \\\n  -v \"$PWD/sql-scripts\":/flink/sql \\\n  --name runner \\\n  datasqrl/flink-sql-runner:0.9.3-flink-2.2 \\\n  cluster\n```\n\n3\\. Submit your SQL job\nIn a separate terminal, run:\n\n```bash\ndocker exec -it runner flink run flink-sql-runner.jar --sqlfile /flink/sql/flink.sql\n```\n\nThe job will be submitted to the embedded JobManager and executed using the local TaskManager.\n\n\u003e [!NOTE]  \n\u003e The `flink-sql-runner.jar` is a symlink placed in the Flink root directory (`/opt/flink`) for easier access, but the actual file resides in its own plugin directory: `/opt/flink/plugins/flink-sql-runner`.\n\u003e It is possible to add any Flink arguments or run any accessible JAR, just like with a vanilla `flink run` command.\n\n4\\. Inspect output\nIf your SQL uses the print connector as a sink, you can check logs via:\n\n```bash\ndocker exec -it runner bash -c \"cat /opt/flink/log/$(ls /opt/flink/log | grep 'flink--taskexecutor' | grep '.out')\"\n```\n\nOr use the Flink UI at http://localhost:8081 to monitor jobs.\n\n### Running in Kubernetes with Flink Operator\n\nHere's how to use the Flink SQL Runner with the Flink Operator on Kubernetes:\n\n1\\. 
Prepare Your Files: Ensure that your SQL script (`statements.sql`) or compiled plan (`compiled_plan.json`) and any JAR files are accessible within your container.\n\nExample Helm chart configuration:\n\n```yaml\napiVersion: flink.apache.org/v1beta1\nkind: FlinkDeployment\nmetadata:\n  name: sql-example\nspec:\n  image: datasqrl/flink-sql-runner:latest\n  flinkVersion: v1_19\n  flinkConfiguration:\n    taskmanager.numberOfTaskSlots: \"1\"\n  serviceAccount: flink\n  jobManager:\n    resource:\n      memory: \"2048m\"\n      cpu: 1\n  taskManager:\n    resource:\n      memory: \"2048m\"\n      cpu: 1\n  job:\n    jarURI: http://raw.github.com/datasqrl/releases/0.9.3/flink-sql-runner.jar\n    args: [\"--sqlfile\", \"/opt/flink/usrlib/sql-scripts/statements.sql\", \"--planfile\", \"/opt/flink/usrlib/sql-scripts/compiled_plan.json\", \"--udfpath\", \"/opt/flink/usrlib/jars\"]\n    parallelism: 1\n    upgradeMode: stateless\n```\n\n\u003e [!WARNING]\n\u003e The example above lists both `--sqlfile` and `--planfile` for illustration only. Configure either the SQL script OR the compiled plan - not both.\n\n2\\. Deploy with Helm:\n\n```bash\nhelm install sql-example -f \u003cyour-helm-values\u003e.yaml \u003cyour-helm-chart\u003e\n```\n\n### Environment Variable Substitution\n\nFlink SQL Runner automatically substitutes environment variables in your configuration files, SQL scripts, and compiled plans for secrets and environment-specific configuration. 
Environment variables must be of the form `${ENV_VARIABLE}` and inside of strings.\n\nFor example, `${DATA_PATH}` is an environment variable inside the connector configuration of a table that is substituted at runtime:\n```sql\nCREATE TEMPORARY TABLE `MyTable` (\n  ...\n) WITH (\n  'connector' = 'filesystem',\n  'format' = 'json',\n  'path' = '${DATA_PATH}/applications.jsonl',\n  'source.monitor-interval' = '1'\n);\n```\n\n### Default Environment Variables\n\nThe Flink SQL Runner automatically provides default values for deployment-specific environment variables if they are not already set in your environment.\nThe following environment variables are automatically set with default values:\n\n| Variable               | Default Value                | Description                                                                           |\n|------------------------|------------------------------|---------------------------------------------------------------------------------------|\n| `DEPLOYMENT_ID`        | Random UUID                  | A unique identifier for the deployment (e.g., `550e8400-e29b-41d4-a716-446655440000`) |\n| `DEPLOYMENT_TIMESTAMP` | Current time in milliseconds | The deployment timestamp in milliseconds since epoch (e.g., `1704067200000`)          |\n\nThese defaults are applied at runtime before environment variable substitution occurs, making them available for use in SQL scripts and compiled plans even when not explicitly set in your environment.\n\nExample usage in a SQL script:\n```sql\nSELECT '${DEPLOYMENT_ID}' AS deployment_id, CAST('${DEPLOYMENT_TIMESTAMP}' AS BIGINT) AS deployment_timestamp;\n```\n\n### Building Your Own Flink SQL Runner\n\nThe Flink SQL runner is published to Maven Central and you can add it as a dependency in your project to extend\nthe runner to suit your needs.\n\n- Maven:\n\n```xml\n\u003cdependency\u003e\n  \u003cgroupId\u003ecom.datasqrl.flinkrunner\u003c/groupId\u003e\n  
\u003cartifactId\u003eflink-sql-runner\u003c/artifactId\u003e\n  \u003cversion\u003e0.9.3\u003c/version\u003e\n\u003c/dependency\u003e\n```\n- Gradle:\n\n```groovy\nimplementation 'com.datasqrl.flinkrunner:flink-sql-runner:0.9.3'\n```\n---\n\n## Flink Extensions\n\nThe Flink SQL Runner contains a few extensions to the Flink runtime.\n\n### Dead-Letter-Queue Support for Kafka Sources\n\nIf a Flink SQL application fails to deserialize a message from a Kafka topic, the entire job can fail. \n\nThis project implements the `kafka-safe` and `upsert-kafka-safe` [connectors](connectors/kafka-safe) which extend the respective kafka connectors with dead-letter-queue support, so that messages which fail to deserialize can be logged, or sent to a dead-letter-queue, instead of failing the job.\n\nIn addition to the configuration options exposed by the original kafka connectors, the `-safe` versions support the following optional configuration options:\n\n| Options                    | Default | Type   | Description                                                                                                                       |\n|----------------------------|---------|--------|-----------------------------------------------------------------------------------------------------------------------------------|\n| scan.deser-failure.handler | none    | String | Use `log` to output failed messages to the logger, `kafka` to output failed messages to a kafka topic, or `none` to fail the job. |\n| scan.deser-failure.topic   | -       | String | The topic for the dead-letter-queue that failed messages are written to. Required when the handler is configured to `kafka`.      
|\n\n\u003e [!NOTE]  \n\u003e The dead-letter-queue producer will use the same Kafka configuration, that is provided for the Flink SQL table that reads the data.\n\n### JSONB Type\n\nThis project adds a [binary JSON type](types/json-type) and associated functions for more efficient JSON handling that does not serialize from and to string repeatedly.\n\nNative JSON type support is also extended to the [JSON format](formats/flexible-json-format) called `flexible-json` for writing JSON data as nested documents (instead of strings) as well as the [JDBC connector for PostgreSQL](connectors/postgresql-connector) to write JSON data to JSONB columns.\n\nThe binary JSON type is supported by [these system functions](stdlib-docs/system-functions.yml).\n\n### Vector Type\n\nThis project adds a native [Vector type](types/vector-type) and associated functions for more efficient handling of vectors (e.g. for content embeddings).\n\nNative Vector type support is also extended to the [JDBC connector for PostgreSQL](connectors/postgresql-connector) to write vector data to vector columns for the `pgvector` extension.\n\nThe native vector type is supported by [these system functions](stdlib-docs/system-functions.yml).\n\n### Function Libraries\n\n\u003cimg src=\"stdlib-docs/img/sqrl_functions_logo.png\" alt=\"Flink SQL Runner Logo\" width=\"300\" align=\"right\" /\u003e\n\nImplementation of Flink SQL and SQRL functions that can be added as user-defined functions (UDFs) to support additional functionality.\n\n* [Math](stdlib-docs/library-functions.yml): Advanced math functions\n* [Iceberg](stdlib-docs/library-functions.yml): Helper functions for advanced Iceberg functionality.\n* [OpenAI](stdlib-docs/library-functions.yml): Function for calling completions, structured data extraction, and vector embeddings.\n\n## Usage\n\n### Within DataSQRL\n\nIf you are using the [DataSQRL framework](https://github.com/DataSQRL/sqrl) to compile your SQRL project, you can import the function library 
as follows:\n\n`IMPORT stdlib.[library-name].*`\n\nwhere `[library-name]` is replaced with the name of the library, e.g. `stdlib.math.*`.\n\nTo import a single function:\n\n`IMPORT stdlib.[library-name].[function-name]`\n\ne.g. `stdlib.text.split`.\n\n### Flink SQL Runner\n\nTo use a function library with the Flink SQL Runner:\n\n1. Copy the JAR file for the function library to the UDF directory that is passed via the `--udfpath` argument.\n2. Declare the function in your Flink SQL script:\n```sql\nCREATE FUNCTION TheFunctionToAdd AS 'com.datasqrl.flinkrunner.[library-name].[function-name]';\n```\n\nwhere you replace `[library-name]` with the name of the function library and `[function-name]` with the name of the function.\n\n### Custom Flink Implementation\n\nIf you are building your own Flink SQL runner, you can depend on the function modules and load the functions into your project.\n\n### CSV Format\n\nThe `flexible-csv` format extends the standard CSV format with a configuration option `skip-header` to skip the first row in a CSV file (i.e. the header).\n\n---\n\n## Community Contributions\n\nContributions are welcome! Feel free to open an issue or submit a [pull request](https://github.com/DataSQRL/flink-sql-runner/pulls) on GitHub.\n\n### Code Formatting\n\nThis repo uses `spotless-maven-plugin` for code formatting.\n\nTo automatically apply formatting before each commit:\n\n```bash\n./scripts/install-git-hooks.sh\n```\n\nThis sets `core.hooksPath` to `.githooks` in your local clone.\n\n### Releasing\nThe release process is fully automated and driven by GitHub releases. Just [create a new release](https://github.com/DataSQRL/flink-sql-runner/releases/new) and a GitHub Action will take care of the rest. The new release version will match the `tag`, so you must use [semver](https://semver.org/) when selecting the tag name.\n\n### License\nThis project is licensed under the Apache 2 License. 
See the [LICENSE](https://github.com/DataSQRL/flink-sql-runner/blob/main/LICENSE) file for details.\n\n### Contact \u0026 Support\nFor any questions or support, please open an [issue](https://github.com/DataSQRL/flink-sql-runner/issues) in the GitHub repository.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatasqrl%2Fflink-sql-runner","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatasqrl%2Fflink-sql-runner","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatasqrl%2Fflink-sql-runner/lists"}