https://github.com/anatol-ju/iceberg-tables-example
Example of how to create Apache Iceberg tables using AWS CDK (TypeScript).
- Host: GitHub
- URL: https://github.com/anatol-ju/iceberg-tables-example
- Owner: anatol-ju
- Created: 2025-06-05T19:48:31.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2025-06-07T14:21:46.000Z (4 months ago)
- Last Synced: 2025-06-07T15:26:23.304Z (4 months ago)
- Topics: aws, cdk-examples, iceberg
- Language: TypeScript
- Homepage:
- Size: 22.5 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Iceberg Table Infrastructure with AWS CDK
This project demonstrates how to define, provision, and manage Apache Iceberg tables in AWS using the AWS Cloud Development Kit (CDK). It creates a scalable, modular data lakehouse foundation with proper infrastructure-as-code practices.
> 🛠️ This repository was created for demonstration purposes and is part of my engineering portfolio. While it can be adapted for real use cases, it is not actively maintained for production.
## Overview
This repository provisions:
- **S3 buckets** to store Iceberg table data and schema files.
- **Glue Catalog databases and tables** compatible with Iceberg.
- **Athena queries via custom AWS SDK resources** for declarative table creation.
- **Automatic schema parsing from JSON** to Iceberg-compatible SQL.
- **Parameter storage in AWS SSM** for safe retrieval of table details.

It supports flexible schema definitions, optional SQL overrides, and SSM-based configuration for secure and reusable environments.
---
## 🧱 Project Structure
```bash
iceberg-tables-example/
├── bin/
│   └── createIcebergTables.ts    # CDK entry point to deploy infrastructure
├── lib/
│   ├── interfaces.ts             # TypeScript interfaces for table configuration
│   ├── bucketStack.ts            # CDK stack for creating secure S3 buckets
│   ├── icebergTableStack.ts      # CDK stack for deploying Iceberg tables
│   ├── utils.ts                  # Utilities for schema parsing, SSM access, and validation
│   └── versionedStack.ts         # Base class for versioned CDK stacks
├── data/
│   └── schemas/                  # JSON schema files for Iceberg tables
├── package.json
└── README.md                     # Project documentation (this file)
```

---
## 🧠 Features
- Environment-aware deployments via `EnvAwareStackProps`
- Custom SQL support with `onCreateQuery`, `onUpdateQuery`, and `onDeleteQuery`
- JSON Schema → SQL column mapping with custom type conversions
- SSM-resolved parameters for runtime bucket config and outputs
- Partitioned table support for efficient querying
- Reusable IAM roles with scoped permissions
- Schema upload to S3 for transparency and auditing

---
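The JSON Schema → SQL column mapping listed among the features can be pictured with a minimal sketch. The `JsonSchema` shape, the `SQL_TYPES` table, and the `toColumnDefinitions` helper are assumptions made for illustration; the actual interfaces in `lib/interfaces.ts` and the parser in `lib/utils.ts` may look different.

```typescript
// Hypothetical shapes for this sketch; the repository's real interfaces may differ.
type JsonType = "string" | "integer" | "number" | "boolean";

interface JsonSchema {
  properties: Record<string, { type: JsonType }>;
}

// Assumed conversion table from JSON Schema types to Athena/Iceberg column types.
const SQL_TYPES: Record<JsonType, string> = {
  string: "string",
  integer: "int",
  number: "double",
  boolean: "boolean",
};

// Turn each schema property into a "name type" column definition.
function toColumnDefinitions(schema: JsonSchema): string[] {
  return Object.entries(schema.properties).map(
    ([name, prop]) => `${name} ${SQL_TYPES[prop.type]}`
  );
}

const columns = toColumnDefinitions({
  properties: {
    id: { type: "integer" },
    name: { type: "string" },
    score: { type: "number" },
  },
});

console.log(columns.join(", ")); // id int, name string, score double
```

Keeping the conversion in a single lookup table makes it easy to see which JSON types are supported and to extend the list later.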
## 📦 Prerequisites
- Node.js ≥ 16
- AWS CDK v2
- AWS credentials with permissions for:
  - S3
  - Athena
  - Glue
  - SSM
  - IAM

Install dependencies:
```bash
yarn install
```

---
## Deploying the Stack
1. Configure your environment.
   Edit the `stackProps` and `environment` settings in `bin/createIcebergTables.ts`.
2. Add your JSON schema.
   Place your Iceberg-compatible schema in `data/schemas/your_table.schema.json`.
3. Define your table properties.
   Adjust an existing `TableBuildProps` object or add a new one in `bin/createIcebergTables.ts`.
4. Deploy.
   ```
   yarn deploy:dev
   ```

This will:
- Create the bucket
- Upload the schema to S3
- Create an Iceberg table using Athena
- Store key table metadata in SSM

---
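To illustrate the table-creation step, here is a rough sketch of how a `CREATE TABLE` statement for Athena could be assembled, including the optional `onCreateQuery` override. The `IcebergTableSpec` interface and the `buildCreateQuery` helper are hypothetical and are not taken from the repository; only the `TBLPROPERTIES ('table_type' = 'ICEBERG')` clause and the general statement layout follow Athena's documented Iceberg DDL.

```typescript
// Hypothetical input shape; the repository's TableBuildProps may look different.
interface IcebergTableSpec {
  database: string;
  table: string;
  columns: string[];      // column definitions such as "id int"
  partitionBy?: string[]; // column names or Iceberg transforms such as "day(ts)"
  location: string;       // S3 URI where the table data lives
  onCreateQuery?: string; // optional SQL override
}

function buildCreateQuery(spec: IcebergTableSpec): string {
  // A user-supplied override wins over the generated statement.
  if (spec.onCreateQuery) {
    return spec.onCreateQuery;
  }
  const parts = [
    `CREATE TABLE ${spec.database}.${spec.table} (${spec.columns.join(", ")})`,
  ];
  if (spec.partitionBy && spec.partitionBy.length > 0) {
    parts.push(`PARTITIONED BY (${spec.partitionBy.join(", ")})`);
  }
  parts.push(`LOCATION '${spec.location}'`);
  // This table property is what makes Athena create an Iceberg table.
  parts.push(`TBLPROPERTIES ('table_type' = 'ICEBERG')`);
  return parts.join("\n");
}

const ddl = buildCreateQuery({
  database: "demo_db",
  table: "events",
  columns: ["id int", "ts timestamp", "payload string"],
  partitionBy: ["day(ts)"],
  location: "s3://example-bucket/tables/events/",
});

console.log(ddl);
```

The database, table, and bucket names in the usage example are placeholders. In an Athena Iceberg table, partition columns are declared in the regular column list as well, which is why `ts` appears in both places.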
## 🧪 Example Schema Mapping
Here's an example of a JSON schema-to-Iceberg conversion using the mapping feature:
```typescript
const mapping = {
  "json_str": {
    "json_map": {
      "type": "map",
      "properties": {
        "key": { "type": "string" },
        "value": { "type": "integer" }
      }
    }
  }
};
```
This will rename `json_str` to `json_map` and convert it to a `map` type in the resulting SQL schema.

---
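As a rough illustration of the rename-and-convert behaviour described in this section, the sketch below applies a mapping of the same shape to a set of columns. The `Mapping` type, the `SCALARS` table, and the `applyMapping` helper are assumptions made for this example; the repository's own implementation may work differently.

```typescript
// Hypothetical types mirroring the mapping example; not taken from the repository.
interface MapTarget {
  type: "map";
  properties: { key: { type: string }; value: { type: string } };
}
type Mapping = Record<string, Record<string, MapTarget>>;

// Assumed JSON-to-SQL scalar conversions.
const SCALARS: Record<string, string> = { string: "string", integer: "int" };

function sqlScalar(jsonType: string): string {
  const sql = SCALARS[jsonType];
  if (sql === undefined) {
    throw new Error(`Unsupported type: ${jsonType}`);
  }
  return sql;
}

// columns: source column name -> SQL type. Returns a new record with the mapping applied.
function applyMapping(
  columns: Record<string, string>,
  mapping: Mapping
): Record<string, string> {
  const result: Record<string, string> = {};
  for (const [name, sqlType] of Object.entries(columns)) {
    const override = mapping[name];
    if (override === undefined) {
      result[name] = sqlType; // unmapped columns pass through unchanged
      continue;
    }
    // Mapped columns are replaced by the renamed column with a map type.
    for (const [newName, target] of Object.entries(override)) {
      const { key, value } = target.properties;
      result[newName] = `map<${sqlScalar(key.type)}, ${sqlScalar(value.type)}>`;
    }
  }
  return result;
}

const mapped = applyMapping(
  { id: "int", json_str: "string" },
  {
    json_str: {
      json_map: {
        type: "map",
        properties: { key: { type: "string" }, value: { type: "integer" } },
      },
    },
  }
);

console.log(mapped); // { id: 'int', json_map: 'map<string, int>' }
```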
## Outputs
After deployment, the following will be saved in AWS Systems Manager Parameter Store:
- Table name
- Table ARN
- Table S3 location
- Output S3 path for Athena
- Path to schema in S3

These can be referenced across your infrastructure for consistency.
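One way to picture these outputs is as a set of parameters under a shared prefix. The sketch below is purely illustrative: the `TableOutputs` fields, the parameter names, and the prefix are assumptions, and the names this project actually writes to Parameter Store may differ.

```typescript
// Hypothetical output shape; field names are assumptions for this sketch.
interface TableOutputs {
  tableName: string;
  tableArn: string;
  tableLocation: string;
  athenaOutputPath: string;
  schemaPath: string;
}

// Assumed naming convention: one parameter per value under a shared prefix.
function toSsmParameters(
  prefix: string,
  outputs: TableOutputs
): Record<string, string> {
  return {
    [`${prefix}/table-name`]: outputs.tableName,
    [`${prefix}/table-arn`]: outputs.tableArn,
    [`${prefix}/table-location`]: outputs.tableLocation,
    [`${prefix}/athena-output-path`]: outputs.athenaOutputPath,
    [`${prefix}/schema-path`]: outputs.schemaPath,
  };
}

// Placeholder values; the account ID, region, and bucket names are made up.
const params = toSsmParameters("/iceberg/demo_db/events", {
  tableName: "events",
  tableArn: "arn:aws:glue:eu-central-1:123456789012:table/demo_db/events",
  tableLocation: "s3://example-bucket/tables/events/",
  athenaOutputPath: "s3://example-bucket/athena-results/",
  schemaPath: "s3://example-bucket/schemas/events.schema.json",
});

for (const [name, value] of Object.entries(params)) {
  console.log(`${name} = ${value}`);
}
```

A predictable prefix lets other stacks look up a table's details by name instead of hard-coding ARNs or S3 paths.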
---
## Learn More
- [Apache Iceberg Docs](https://iceberg.apache.org/docs/nightly/)
- [AWS CDK Docs](https://docs.aws.amazon.com/cdk/v2/guide/home.html)
- [AWS Athena Iceberg Setup](https://docs.aws.amazon.com/athena/latest/ug/querying-iceberg.html)
- [Glue Overview](https://docs.aws.amazon.com/glue/latest/dg/what-is-glue.html)

---
## 🧑‍💻 Author
Anatol Jurenkow
Cloud Data Engineer | AWS CDK Enthusiast | Iceberg Fan
[GitHub](https://github.com/anatol-ju) · [LinkedIn](https://de.linkedin.com/in/anatol-jurenkow)
---
## License
"This project is for portfolio purposes only. Please contact me if you'd like to reuse or adapt this code."