{"id":18358986,"url":"https://github.com/ev2900/iceberg_glue_register_table","last_synced_at":"2026-05-04T05:32:19.182Z","repository":{"id":225878026,"uuid":"767040231","full_name":"ev2900/Iceberg_Glue_register_table","owner":"ev2900","description":"Example using the Iceberg register_table command with AWS Glue and Glue Data Catalog","archived":false,"fork":false,"pushed_at":"2026-04-02T14:27:16.000Z","size":633,"stargazers_count":1,"open_issues_count":0,"forks_count":3,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-04-03T03:04:53.825Z","etag":null,"topics":["apache-iceberg","aws","aws-glue","aws-glue-data-catalog","glue","iceberg"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ev2900.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-03-04T15:44:07.000Z","updated_at":"2026-04-02T14:27:17.000Z","dependencies_parsed_at":"2025-12-07T04:07:22.894Z","dependency_job_id":null,"html_url":"https://github.com/ev2900/Iceberg_Glue_register_table","commit_stats":null,"previous_names":["ev2900/iceberg_glue_register_table"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ev2900/Iceberg_Glue_register_table","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ev2900%2FIceberg_Glue_register_table","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ev2900%2FIceberg_Glue_register_table/tags","releases_
url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ev2900%2FIceberg_Glue_register_table/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ev2900%2FIceberg_Glue_register_table/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ev2900","download_url":"https://codeload.github.com/ev2900/Iceberg_Glue_register_table/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ev2900%2FIceberg_Glue_register_table/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32596525,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-03T22:12:39.696Z","status":"online","status_checked_at":"2026-05-04T02:00:06.625Z","response_time":58,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache-iceberg","aws","aws-glue","aws-glue-data-catalog","glue","iceberg"],"created_at":"2024-11-05T22:20:17.631Z","updated_at":"2026-05-04T05:32:19.177Z","avatar_url":"https://github.com/ev2900.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Iceberg Glue - register_table\n\n\u003cimg width=\"275\" alt=\"map-user\" src=\"https://img.shields.io/badge/cloudformation template deployments-154-blue\"\u003e \u003cimg width=\"85\" alt=\"map-user\" src=\"https://img.shields.io/badge/views-1815-green\"\u003e \u003cimg width=\"125\" alt=\"map-user\" 
src=\"https://img.shields.io/badge/unique visits-661-green\"\u003e\n\nThe Apache Iceberg ```register_table``` procedure can be used to register an existing Iceberg metadata file as a new data catalog table. This functionality is especially useful in data catalog migrations.\n\n\u003e [!CAUTION]\n\u003e ```register_table``` will **NOT** change the S3 absolute paths in the Iceberg metadata files.\n\u003e\n\u003e If you want to change the S3 absolute paths because you are migrating the table storage (not just the catalog), you need to first use the ```rewrite_table_path``` procedure to update the S3 absolute paths in the metadata files. The documentation for this procedure is [HERE](https://iceberg.apache.org/docs/1.9.0/spark-procedures/#rewrite_table_path).\n\u003e\n\u003e After you run ```rewrite_table_path``` you can use ```register_table``` referencing the updated metadata.\n\u003e\n\u003e This is only applicable if you are moving the Iceberg table to a different S3 bucket or prefix. If you are leaving the S3 location the same and just migrating the table to a new data catalog entry, you can skip the ```rewrite_table_path``` procedure and go straight to ```register_table```.\n\nThe use case for ```register_table``` is when your Iceberg data files and metadata files are **staying in the same S3 location** but you want to register them as a new data catalog table.\n\n## Example using AWS Glue and Glue Data Catalog\n\nLaunch the CloudFormation stack below to walk through an example. In the example you will create an Iceberg table in the Glue Data Catalog database ```iceberg``` via a Glue job. Then you will use another Glue job to register the table you created in a different Glue Data Catalog database, ```icebergregister```.\n\n### Launch the CloudFormation stack\n\nClick the button below to launch a CloudFormation stack. 
The stack will deploy everything we need, including Glue jobs, Glue Data Catalog databases, S3 buckets, etc.\n\n\u003e [!WARNING]\n\u003e The CloudFormation stack creates IAM role(s) that have ADMIN permissions. This is not appropriate for production deployments. Scope these roles down before using this CloudFormation template in production.\n\n\u003e [!NOTE]\n\u003e The Glue jobs this CloudFormation stack deploys use Iceberg version 1.10.0.\n\n[![Launch CloudFormation Stack](https://sharkech-public.s3.amazonaws.com/misc-public/cloudformation-launch-stack.png)](https://console.aws.amazon.com/cloudformation/home#/stacks/new?stackName=iceberg-register-table\u0026templateURL=https://sharkech-public.s3.amazonaws.com/misc-public/glue_iceberg_register_table.yaml)\n\n### Run the Glue job to Create Iceberg table\n\nOpen the [Glue Console](https://us-east-1.console.aws.amazon.com/gluestudio/home). Select the ETL jobs section, click on the *0 Create Iceberg Table* job and then *Run job*.\n\n\u003cimg width=\"700\" alt=\"quick_setup\" src=\"https://github.com/ev2900/Iceberg_Glue_register_table/blob/main/REAME/run_glue_job_1.PNG\"\u003e\n\nThis will create a table in the Glue Data Catalog database *iceberg*.\n\n\u003cimg width=\"700\" alt=\"quick_setup\" src=\"https://github.com/ev2900/Iceberg_Glue_register_table/blob/main/REAME/glue_table_1.PNG\"\u003e\n\n### Update and run the Glue job to register the Iceberg table\n\nOpen the [Glue Console](https://us-east-1.console.aws.amazon.com/gluestudio/home). 
Select the ETL jobs section and click on edit.\n\n\u003cimg width=\"700\" alt=\"quick_setup\" src=\"https://github.com/ev2900/Iceberg_Glue_register_table/blob/main/REAME/edit_job_1.png\"\u003e\n\nIn the Glue script we need to edit the query:\n\n```\nCALL glue_catalog.system.register_table(\n  table =\u003e 'icebergregister.registersampledataicebergtable',\n  metadata_file =\u003e 's3://\u003cbucket-name\u003e/iceberg/iceberg.db/sampledataicebergtable/metadata/\u003cmost-recent-snapshot-file\u003e.metadata.json'\n)\n```\n\nSpecifically, you need to replace the ```\u003cbucket-name\u003e``` and ```\u003cmost-recent-snapshot-file\u003e``` placeholders. The ```metadata_file``` argument of ```register_table``` should reference the most recent *.metadata.json* file. This *.metadata.json* file was created when you ran the *0_create_iceberg_table.py* job to create the initial Iceberg table. You can find the name of the S3 bucket and the name of the most recent *.metadata.json* file by navigating through the [S3 console](https://us-east-1.console.aws.amazon.com/s3/home).\n\nOnce you update the Glue script, **Save** and **Run** the job.\n\nAfter running the Glue job, the Glue Data Catalog will have a new table *registersampledataicebergtable* in the *icebergregister* database.\n\n\u003cimg width=\"700\" alt=\"quick_setup\" src=\"https://github.com/ev2900/Iceberg_Glue_register_table/blob/main/REAME/registered_table.PNG\"\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fev2900%2Ficeberg_glue_register_table","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fev2900%2Ficeberg_glue_register_table","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fev2900%2Ficeberg_glue_register_table/lists"}