{"id":21514850,"url":"https://github.com/getindata/flink-elastic-catalog","last_synced_at":"2025-08-02T09:14:40.961Z","repository":{"id":74526105,"uuid":"592365388","full_name":"getindata/flink-elastic-catalog","owner":"getindata","description":"Flink Catalog for Elasticsearch.","archived":false,"fork":false,"pushed_at":"2025-01-08T12:29:11.000Z","size":492,"stargazers_count":4,"open_issues_count":1,"forks_count":2,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-04-09T20:11:27.542Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/getindata.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2023-01-23T15:22:33.000Z","updated_at":"2025-01-08T12:06:06.000Z","dependencies_parsed_at":null,"dependency_job_id":"fd05343f-49c6-44f4-9eb1-08873edcaf0a","html_url":"https://github.com/getindata/flink-elastic-catalog","commit_stats":null,"previous_names":[],"tags_count":10,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/getindata%2Fflink-elastic-catalog","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/getindata%2Fflink-elastic-catalog/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/getindata%2Fflink-elastic-catalog/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/getindata%2Fflink-elastic-catalog/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/getindata","download_url":"http
s://codeload.github.com/getindata/flink-elastic-catalog/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248103872,"owners_count":21048245,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-23T23:53:13.345Z","updated_at":"2025-04-09T20:11:32.555Z","avatar_url":"https://github.com/getindata.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# flink-elastic-catalog\n\n---\n\n## Description\n\nThis is an implementation of a [Flink Catalog](https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/catalogs/)\nfor [Elastic](https://www.elastic.co/).\n\n---\n\n## Possible Operations\n\n- `listDatabases` Lists Databases in a catalog.\n- `databaseExists` Checks if a database exists.\n- `listTables` Lists Tables in a Database.\n- `tableExists` Checks if a table exists.\n- `getTable` Gets the metadata information about the table. This consists of table schema and table properties. 
Table properties contain, among others, the `CONNECTOR`, `BASE_URL`, `TABLE_NAME` and `SCAN_PARTITION` options.\n\n---\n\n## Scan options\n\nIf we want tables in a catalog to be partitioned by a column, we should specify scan options.\nIt is possible to set up [Scan options](https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/connectors/table/jdbc/#scan-partition-column:~:text=than%201%20second.-,scan.partition.column,-optional) while defining a catalog.\n\nThere are 2 types of scan options for Elastic Catalog:\n\n### Default scan options for a catalog\nWe can specify default partitioning options for all tables in a catalog. If no options for a table are specified, these options will be used to\nselect a column for partitioning, and the number of partitions for a table will be calculated based on the catalog default options.\n\n- `catalog.default.scan.partition.column.name` Specify what column to use for table partitioning by default. The default option will be used\nfor all tables in a catalog. We can override the column used for partitioning a table by specifying table specific scan options.\n- `catalog.default.scan.partition.size` Specify how many elements should be placed in a single partition. The number of\npartitions will be calculated based on the number of elements and the default size of a partition. If we want a particular table\nto have an exact number of partitions, we can specify that number using table specific scan options.\n\n### Table specific scan options\nThese options can be useful if we know that not all tables in a catalog should be partitioned in the same way. Here\nwe can specify partitioning options for selected tables.\n\n- `properties.scan.{tablename}.partition.column.name` Specify the name of the column to use for partitioning a table.\nCorresponds to the `scan.partition.column` option.\n- `properties.scan.{tablename}.partition.number` Specify the number of partitions for a table. 
Corresponds to the `scan.partition.num` option.\n\nFor both options specified above, we should replace `{tablename}` with the name of the table that we want the options to apply to.\nWe can provide these options for multiple tables.\n\n### Index patterns\nIf we specify an index pattern, a Flink table will be created in the catalog that, instead of targeting a single index in Elastic, targets all indexes that match\nthe pattern provided. This is useful if we want to write Flink SQL that reads similar data from many similar tables instead of a single one.\nThe resulting Flink table will contain all columns found in matching tables and will use all the data from matching tables.\nThis table will have the same name as the pattern.\n\n- `properties.index.patterns` Specify patterns for which we want to create Flink tables. We can specify multiple index patterns by\nseparating them with a comma `,` sign.\n\nThe Flink tables created this way can also be partitioned just as other Flink tables by providing default catalog scan options or table specific scan options.\n\n### Time attributes\n\nIt is possible to add a `proctime` column to each catalog table.\n\n```properties\ncatalog.add-proctime-column=true\n```\n\n---\n\n## Rules for overwriting catalog scan options\n\n### No scan options were provided\nIt is not necessary to provide either default scan options for a catalog or table specific scan options. If no scan options are provided,\nno tables in a catalog will be partitioned.\n\n### Only default scan options for a catalog were provided\nIf only default catalog scan options were provided, all tables in a catalog will be partitioned in a similar way. The same column name will be used for partitioning all tables, and\nthe number of partitions for each table will depend on the number of records in that table. 
All tables will have the same maximum number of elements in a partition.\n\n### Only table specific scan options were provided\nIf we want a specific table to be partitioned and leave the rest of the tables unpartitioned, we have to provide both table specific scan options.\n\n### Both catalog default scan options and table specific scan options were provided\nTable specific scan options take priority over catalog default scan options when deciding how to partition a table.\nIf we specify both a catalog default partition column name and a table specific partition column name, the table specific partition column name is used.\nA similar thing happens when we specify both a catalog default scan partition size and a table specific partition number. Instead of calculating the number of partitions for a table\nbased on the count of elements, the table will have exactly the number of partitions provided for it.\n\n---\n\n## Calculation of scan partition bounds\nIf a table is partitioned, meaning that we specified catalog default scan options or table specific scan options, the upper and lower bounds will be calculated.\nAs specified in the Flink documentation, the `properties.scan.{tablename}.partition.column.name` option works for numeric and temporal data types.\nThe `scan.partition.lower-bound` will be calculated as the lowest value of the partition column in the table.\nThe `scan.partition.upper-bound` will be calculated as the highest value of the partition column in the table.\n\n---\n\n## Note that\nIf we want a table to be partitioned, it is necessary that we provide a catalog default or table specific option for the partition column to use and\na catalog default partition size or table specific partition number option for deciding how many partitions to use for a table.\nIf only one of the two is provided, we will receive an error.\n\n---\n\n## Implementation details\n\nA few `org.apache.flink.*` classes have been copied and shaded due to incorrect JDBC validation 
(`org.apache.flink.connector.jdbc.catalog.JdbcCatalogUtils.validateJdbcUrl`).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgetindata%2Fflink-elastic-catalog","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgetindata%2Fflink-elastic-catalog","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgetindata%2Fflink-elastic-catalog/lists"}