An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with apache-iceberg

A curated list of projects in awesome lists tagged with apache-iceberg.

https://github.com/matanolabs/matano

Open source security data lake for threat hunting, detection & response, and cybersecurity analytics at petabyte scale on AWS

alerting apache-iceberg aws aws-security big-data cloud cloud-native cloud-security cybersecurity detection-engineering dfir log-analytics log-management rust secops security security-tools serverless siem threat-hunting

Last synced: 14 May 2025

https://github.com/apache/incubator-xtable

Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.

apache-hudi apache-iceberg delta-lake

Last synced: 13 Apr 2025

https://github.com/dominikhei/Local-Data-LakeHouse

Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testing.

apache-iceberg data-lake data-lakehouse hive-metastore lakehouse minio trino

Last synced: 07 May 2026

https://github.com/dacort/modern-data-lake-storage-layers

Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work

amazon-emr apache-hudi apache-iceberg aws delta-lake hudi iceberg

Last synced: 31 Oct 2025

https://github.com/aws-solutions-library-samples/guidance-for-developing-data-and-ai-foundation-with-amazon-sagemaker

DAIVI is a reference solution with IaC modules to accelerate development of Data, Analytics, AI and Visualization applications on AWS using the next generation Amazon SageMaker Unified Studio. The goal of the DAIVI solution is to provide engineers with sample infrastructure-as-code modules and application modules to build their data platforms.

apache-iceberg sagemaker sagemaker-studio terraform

Last synced: 31 May 2026

https://github.com/guidok91/spark-movies-etl

Spark data pipeline that processes movie ratings data.

apache-iceberg data-engineering data-pipeline elt etl pyspark spark uv

Last synced: 11 Mar 2026

https://github.com/datazip-inc/olake-ui

Frontend & BFF (Backend for Frontend) for Olake. This includes the UI code and the backend code for storing sync configurations and orchestrating syncs.

apache-iceberg change-data-capture data-engineering database elt elt-pipeline etl etl-pipeline hacktoberfest ui

Last synced: 23 Apr 2026

https://github.com/bodo-ai/denali

An open-source, community-driven REST catalog for Apache Iceberg!

apache-iceberg catalog go golang iceberg

Last synced: 06 Jul 2025

https://github.com/aws-samples/iceberg-streaming-examples

This repo contains examples of high-throughput ingestion using Apache Spark and Apache Iceberg. These examples cover IoT and CDC scenarios using best practices. The code can be deployed into any Spark-compatible engine like Amazon EMR Serverless or AWS Glue. A fully local developer environment is also provided.

apache-iceberg apache-spark structured-streaming

Last synced: 29 Oct 2025

https://github.com/guidok91/spark-structured-streaming-kafka

Spark Structured Streaming data pipeline that processes movie ratings data in real-time.

apache-iceberg apache-kafka apache-spark data-engineering etl kafka pyspark real-time spark spark-structured-streaming streaming

Last synced: 11 Mar 2026

https://github.com/gordonmurray/apache_flink_and_iceberg

Using Apache Flink to write to S3 in Apache Iceberg format

apache-flink apache-iceberg parquet s3

Last synced: 12 Apr 2025

https://github.com/jesufemi-o/iceberg-integration-framework

A proof-of-concept (PoC) open framework to manage data ingestion into Apache Iceberg tables

apache-iceberg lakehouse-platform pyiceberg

Last synced: 06 Mar 2026

https://github.com/sidequery/dlt-iceberg

An Iceberg destination for DLT that supports REST catalogs

apache-iceberg data-engineering datalake dlt dlthub etl iceberg

Last synced: 09 Feb 2026

https://github.com/bahbosque/delta-to-iceberg-aws-glue

Tool to migrate Delta Lake tables to Apache Iceberg using AWS Glue and S3

apache-iceberg aws aws-glue-data-catalog data-lake delta-lake migration-tool open-source spark

Last synced: 03 Jul 2025

https://github.com/ev2900/iceberg_emr_athena

Resources from a virtual tech talk / workshop - Set Up and Use Apache Iceberg Tables on Your Data Lake

apache-iceberg athena aws emr spark

Last synced: 01 Aug 2025

https://github.com/ev2900/iceberg_update_metadata_script

Python script that updates S3 file paths in Iceberg metadata files (metadata.json + Avro)

apache-iceberg aws aws-glue glue iceberg python

Last synced: 13 Apr 2025

https://github.com/joewood/react-iceberg

React Components to visualize Apache Iceberg tables

apache-arrow apache-iceberg apache-spark avro devcontainer docker-compose minio reactjs s3

Last synced: 11 Apr 2026

https://github.com/hussein-awala/gdpr-compliant-lakehouse

This repository is a demonstration of how to handle GDPR export and delete requests in an Iceberg Lakehouse to make it GDPR-compliant.

apache-iceberg apache-spark datalake gdpr lakehouse

Last synced: 18 May 2026

https://github.com/j3-signalroom/apache_flink-kickstarter

Examples of Apache Flink® applications showcasing the DataStream API and Table API in Java and Python, featuring AWS, GitHub, Terraform, and Apache Iceberg.

apache-flink apache-iceberg aws-glue aws-parameter-store aws-s3 aws-secrets-manager flink flink-examples flink-kafka flink-stream-processing github-actions iceberg snowflake streamlit-dashboard terraform-cloud

Last synced: 16 Mar 2025

https://github.com/kameshsampath/polaris-spark-devbox

A development environment for experimenting with Apache Polaris and Iceberg

apache-iceberg apache-polaris apache-spark jupyter-notebooks

Last synced: 19 May 2026

https://github.com/ev2900/mongodb_streams_glue_iceberg

Process MongoDB change streams via AWS Glue with Iceberg to keep a copy of a collection in S3 up to date

apache-iceberg aws-glue glue mondodb mongodb-change-streams python

Last synced: 15 Oct 2025

https://github.com/ev2900/emr_studio_iceberg

Apache Iceberg examples designed to be run on AWS Elastic Map Reduce (EMR) via EMR Studio or EMR Notebooks

apache-iceberg aws elastic-map-reduce emr iceberg

Last synced: 02 May 2026

https://github.com/ev2900/iceberg_glue_register_table

Example using the Iceberg register_table command with AWS Glue and Glue Data Catalog

apache-iceberg aws aws-glue aws-glue-data-catalog glue iceberg

Last synced: 04 May 2026

https://github.com/j3-signalroom/supercharge_streamlit-apache_flink

Engaging, interactive visualizations crafted with Streamlit, seamlessly powered by Apache Flink in batch mode to reveal deep insights from data.

apache-flink apache-iceberg aws-glue-data-catalog flink flink-sql iceberg kafka pyflink streamlit streamlit-dashboard

Last synced: 22 May 2026

https://github.com/j3-signalroom/linux_flink_with_iceberg

Apache Flink Docker image with Apache Iceberg support for Linux (i.e., non-Mac M chip).

apache-flink apache-iceberg flink iceberg

Last synced: 18 Mar 2026

https://github.com/johnymontana/hands-on-havasu-geoparquet

Notebook to accompany the "Hands-On With Havasu & GeoParquet" livestream

apache-iceberg apache-sedona geoparquet parquet sedonadb

Last synced: 12 Oct 2025

https://github.com/ivanyu/icebreaker

A GUI for Apache Iceberg REST Catalog

apache-iceberg gui iceberg swing

Last synced: 05 Apr 2025

https://github.com/marcinthecloud/iceberg.rest

An Apache Iceberg REST Catalog explorer - view namespaces, tables, stats, metadata, schema evolution, and more.

apache-iceberg claude-code cloudflare cloudflare-workers iceberg iceberg-rest

Last synced: 01 May 2026

https://github.com/jiatangzhi/master_thesis

This project implements my master’s thesis on building a scalable, ACID-compliant data lakehouse architecture for IoT and industrial workloads, integrating AWS Glue, S3, Athena, and Grafana with Iceberg to evaluate Copy-on-Write vs Merge-on-Read performance.

apache-iceberg aws-glue aws-s3 batch-processing data-engineering data-lakehouse distributed-systems grafana iot-data mqtt open-table-format python3 schema-evolution spark

Last synced: 04 May 2026

https://github.com/dmschauer/wap-pattern-iceberg-pyspark-aws-glue

This repository shows how to implement the Write-Audit-Publish (WAP) pattern using Apache Spark and Apache Iceberg. It's aimed at Data Engineers who want to get started quickly.

apache-iceberg apache-spark aws aws-glue iceberg pyspark spark

Last synced: 08 Feb 2026

https://github.com/theades/serverless-data-lakehouse

This is an example project showing how to build a serverless data lakehouse on AWS using Terraform, Apache Iceberg and Spark.

apache-iceberg apache-spark aws data-engineering data-lakehouse terraform

Last synced: 09 Feb 2026

https://github.com/j3-signalroom/mac_flink_with_iceberg

Apache Flink Docker image with Apache Iceberg support for Mac M2, M3, or M4 chips.

apache-flink apache-iceberg flink iceberg

Last synced: 18 Mar 2026

https://github.com/ardnaile/trabalho-eng-dados

Implementation of Apache Spark with Delta Lake and Apache Iceberg

apache-iceberg apache-spark delta-lake docker

Last synced: 06 May 2026

https://github.com/dmschauer/wap-pattern-pyspark-aws-glue

This repository shows how to implement the Write-Audit-Publish (WAP) pattern using Apache Spark and Apache Iceberg. It's aimed at Data Engineers who want to get started quickly.

apache-iceberg apache-spark aws aws-glue iceberg pyspark python

Last synced: 04 Feb 2026

https://github.com/dgroomes/iceberg-playground

📚 Learning and exploring Apache Iceberg

apache-iceberg

Last synced: 07 Feb 2026

https://github.com/subhamay-bhattacharyya/snowflake-de-azure-iceberg-tables

Terraform-based reference implementation for building Snowflake Data Engineering workloads on Azure using external Iceberg tables, Azure Blob Storage, and modular CI/CD pipelines with GitHub Actions.

adls-gen2 apache-iceberg azure ci-cd data-engineering github-actions infrastructure-as-code lakehouse snowflake terraform

Last synced: 02 May 2026